OpenGL Forward Renderer
#1
https://www.youtube.com/watch?v=wc5bXKyoNYc

This just shows what I have done so far in my basic C++ OpenGL forward renderer. At the moment everything is really basic, but I am hoping to expand on this to help me understand more about shaders and C++ in general. I started by following these tutorials https://www.youtube.com/watch?v=DkiKgQRiMRU until I got to the last one and then just carried it on myself.
I am using SDL2, GLM and GLEW to do all of this at the moment, in Visual Studio 2013.
What do you think? Smile
#2
Wow, nice! Smile

Also, and this is just my belief, stick with forward rendering. Deferred seems to have certain limitations (with the number of materials and anti-aliasing), so just stick with the traditional way for now and focus on rendering more nice eye candy.
#3
Thanks. Yeah, I looked into deferred rendering and for now it's not worth it with all the transparency issues. Since I am not using many lights or many meshes at the moment, it doesn't really need deferred rendering, but I will probably try it much later on down the line.
#4
Wow, that was a quick reply!

Yeah. Deferred rendering is both attractive and at the same time not so much. It solves the problem with many light sources but introduces other problems and limitations. It (or something similar) will possibly be the standard approach in the future. But personally I believe that if one just splits up larger models (like terrain) into smaller pieces and renders each with only a few of the brightest lights closest to it, one could get enough lights in the scene (if there's even such a thing? Wink ).
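To illustrate, here is a minimal CPU-side sketch of that per-chunk light picking (hypothetical Light and Chunk types, GLM for the math; this is not code from the renderer in the video):
Code:
#include <algorithm>
#include <vector>
#include <glm/glm.hpp>

struct Light { glm::vec3 position; glm::vec3 color; float radius; };
struct Chunk { glm::vec3 center; };

// Pick the few lights that matter most for one chunk of terrain: closest first,
// weighted a little by brightness so a bright light slightly further away can still win.
std::vector<Light> pickLights(const Chunk& chunk, std::vector<Light> lights, size_t maxLights)
{
    std::sort(lights.begin(), lights.end(), [&](const Light& a, const Light& b) {
        float scoreA = glm::length(a.color) / (1.0f + glm::distance(a.position, chunk.center));
        float scoreB = glm::length(b.color) / (1.0f + glm::distance(b.position, chunk.center));
        return scoreA > scoreB; // higher score = more important
    });
    if (lights.size() > maxLights)
        lights.resize(maxLights);
    return lights; // only these get sent to the chunk's shader
}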

I once ran a benchmark on an old Intel CPU+GPU motherboard and could get up to 3 lights for every single model in the scene. Any more than that and I think some hardware limitation caused serious performance loss. But that was with legacy OpenGL. I'm not even sure it would be a problem with GLSL (it probably just comes down to the actual raw GPU power then). And a modern mid-/high-end GPU these days should have so much processing power that... it almost makes me drool.
#5
Hehe, yeah. I still have a long way to go though. I need to integrate an FBO to do some sort of shadows, which will probably mean creating a new shader just for shadows, but at the moment I am quite busy with uni work as this is just a side project to help me learn C++ and OpenGL. Big Grin
#6
And once you know how to use FBOs, you can start on reflections... Wink
#7
Don't forget all the post processing. Big Grin
But I guess refractions are much shinier, though very intensive.
#8
Slinger Wrote: Wow, nice! Smile

Also, and this is just my belief, stick with forward rendering. Deferred seems to have certain limitations (with the number of materials and anti-aliasing), so just stick with the traditional way for now and focus on rendering more nice eye candy.

The limitation on materials is definitely not a deferred rendering issue; it's how you batch your objects. This is an issue with whatever rendering pipeline was built on top of it, not the technique itself - besides the roughly 8-times larger computation cost of deferred (on Shader Model 3; far less cost on SM4+).

E.g.: everything that shares the same material instance gets batched together (statically or dynamically). If two instances have the same shader, you still batch those two instances, and all the meshes that use them, as two separate materials. Say you have a CarPaint shader and two players have their cars painted different colors: those two cars are drawn separately (except of course if you use one static shader and send the cars' colors down in an instance buffer VBO, which is how I used to do it in DirectX 9).

Instead of one material per shader, it's one material per VARIATION in the shader's parameters. So you still have one shader, but maybe two materials referencing that shader, because two entities using that shader have different textures or different colors.

For example: in my game, I have one loaded shader for all space ships where:
 frigate_01 on team 1 is blue
 frigate_01 on team 2 is red

ALL team 1 frigate_01 entities are batched together as ONE draw call, so if the blue team has 60 frigate_01 entities, all 60 are drawn via instancing. Then the same for the red team. In an SM3 deferred renderer, both teams' frigates are rendered several times per frame - but on SM4, if the deferred renderer supports it, this might be limited to 3 or 4 times instead of a possibly higher figure.
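To make that concrete, one draw call per material variation could be sketched roughly like this in C++/OpenGL (hypothetical MaterialKey/Batch types; it assumes the per-instance transforms already live in an instanced VBO attribute of the batch's VAO):
Code:
#include <GL/glew.h>
#include <map>

// A "material variation": same shader, different parameters (texture, team color, ...).
struct MaterialKey
{
    GLuint shader;
    GLuint texture; // plus color, etc.
    bool operator<(const MaterialKey& o) const
    { return shader != o.shader ? shader < o.shader : texture < o.texture; }
};

struct Batch { GLuint vao; GLsizei indexCount; GLsizei instanceCount; };

// All blue frigates in one call, all red frigates in another.
void drawBatched(const std::map<MaterialKey, Batch>& batches)
{
    for (const auto& b : batches)
    {
        glUseProgram(b.first.shader);
        glBindTexture(GL_TEXTURE_2D, b.first.texture);
        glBindVertexArray(b.second.vao); // per-instance transforms come from an instanced attribute
        glDrawElementsInstanced(GL_TRIANGLES, b.second.indexCount,
                                GL_UNSIGNED_INT, 0, b.second.instanceCount);
    }
    glBindVertexArray(0);
}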

You cycle on a per-material-instance basis - render all objects with that material to all deferred render targets at the same time (look up Shader Model 4-5 GPU features: you can literally render to multiple render targets in one shader pass, which means you can render to 4 render textures on some hardware these days in only one draw call - I'm not sure how the OpenGL implementation works, but HLSL in DX10+ has semantics for this). SM3 deferred renderers (DX9 and the GL equivalents), however, will always be slower.
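For the OpenGL side of multiple render targets, it basically comes down to several color attachments on one FBO plus a glDrawBuffers call, with the fragment shader writing one output per target. A rough sketch (hypothetical three-target layout and formats, assuming GLEW and a GL 3.x context; a depth attachment would still be needed in practice):
Code:
#include <GL/glew.h>

// Attach several color textures to one FBO, then tell GL to write to all of them at once.
GLuint makeGBuffer(int width, int height, GLuint tex[3])
{
    GLuint fbo;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);

    glGenTextures(3, tex);
    for (int i = 0; i < 3; ++i)   // e.g. 0 = albedo, 1 = normals, 2 = depth/misc
    {
        glBindTexture(GL_TEXTURE_2D, tex[i]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_RGBA, GL_FLOAT, 0);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, tex[i], 0);
    }
    GLenum bufs[3] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2 };
    glDrawBuffers(3, bufs); // one geometry pass now fills all three targets

    // the fragment shader then declares e.g. layout(location = 0) out vec4 outAlbedo; and so on
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    return fbo;
}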

But these days, with 256-bit GPUs, the ALU limit going up and so on, the only real problem most game engines have, regardless of deferred or forward, is fill rate. You can never get enough fill rate - it doesn't matter which way you go - the goddamned fill rate (especially when you want to render a shitload of transparencies).

Material limitations will be in the renderer or the graphics pipeline (eg: the game engine), specifically.


(2014-12-17, 06:12 PM)Slinger Wrote: Wow, that was a quick reply!

Yeah. Deferred rendering is both attractive and at the same time not so much. It solves the problem with many light sources but introduces other problems and limitations. It (or something similar) will possibly be the standard approach in the future. But personally I believe that if one just splits up larger models (like terrain) into smaller pieces and renders each with only a few of the brightest lights closest to it, one could get enough lights in the scene (if there's even such a thing? Wink ).

Most forward and deferred renderers usually have only a handful of lights being sent to a shader, regardless of how many lights are defined in the actual game world (keep in mind shader instruction limits).


For forward surface shading / pixel-shader lighting, AND in deferred mode, Unity for example will always select no more than 8 lights and pass their info down a constant uniform array in the shader. This is similar for things like Source, UDK, the Doom engine, etc., where the number of lights they pass varies. So regardless of forward or deferred, you may have 100 lights in a scene, but you will never render all of them at once, even if they are all literally in the same place over the same geometry.
 
Code:
// EG: in a shader

// colors of the closest set of lights to the camera
uniform float3 _myLightColors[8];

// xyz = world-space position, w = light type (0 = point, 1 = spot, -1 = directional)
uniform float4 _myLightPosition[8];

///////////////////////////////////////////////////////

// alternatively you could define a struct:
struct ROLLCAGE_LIGHT
{
    float4 position; // xyz = pos, w = type
    float4 color;    // xyz = rgb, w = attenuation distance (if not a directional light)
    // other per-light data, e.g. spotlight cone, spread, stuff calculated before everything is rendered,
    // light bounce if the engine supports GI/BRDF stuff, etc.

    // you can even store a WorldSpaceLightDir and a ViewSpaceLightDir - you can have 12 interpolators
    // in a struct (to stay SM3 compatible), so might as well use them
};

// and store all of them in a uniform array - in this case 16 lights; these are used in the additive passes:
uniform ROLLCAGE_LIGHT g_world_lights[16];

uniform ROLLCAGE_LIGHT g_mainLight; // a separate light which is always the main directional light, used by the base geometry pass

// LightDirection can be calculated per vertex here (or per pixel in the fragment shader):
v2f vert(vert_struct IN)
{
    v2f o;
    for (int i = 0; i < 16; i++)
        o.lightDir[i] = normalize(g_world_lights[i].position.xyz - IN.vertex.xyz);
    // pass to the fragment shader...
    return o;
}

float4 frag(pixel_struct IN)
{
    // lighting and base diffuse calculation
}
There isn't much point in having more than these 8 or 16 lights in any type of renderer, additionally because:
 - when rendering the base (object colors) in the scene, you just need one ADDITIVE pass in a shader to add the lighting for all 8 or 16 lights. If you had 50 lights you would need many more passes - an inefficient (and wasteful) use of the precious and extremely limited ALU slots available on most GPUs. Simply pick only the 8 (or 16) closest lights in the scene (CPU side), and that's it; you can render them all at once (per object in forward mode) - in deferred mode that's all at once for EVERYTHING within their attenuation ranges.

The cost of trying to send 40 light sources into a GPU shader program would be unbearable, or useless. If your game world has 50 lights, you only select the 8 closest. If you only have two lights, then you just pass empty vectors for the remaining 6 slots to the GPU shader program. This is done for both forward and deferred modes in a lot of game engines. If you think you can handle the light-sorting overhead on the CPU side: select all lights in the scene that are flagged as visible to a camera, sort them back to front (by distance from the camera, so index 0 is the closest light), and pass only the first few lights to the shader.
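A small sketch of that CPU-side selection and upload, reusing the uniform names from the snippet above (GLSL-style equivalents of those uniforms assumed; hypothetical Light type, GLM for the math):
Code:
#include <GL/glew.h>
#include <algorithm>
#include <vector>
#include <glm/glm.hpp>
#include <glm/gtc/type_ptr.hpp>

struct Light { glm::vec3 position; glm::vec3 color; float type; };

// Sort the visible lights by distance to the camera, upload the closest 8,
// and pad the remaining slots with zeros so the shader loop adds nothing for them.
void uploadLights(GLuint program, std::vector<Light> lights, const glm::vec3& camPos)
{
    std::sort(lights.begin(), lights.end(), [&](const Light& a, const Light& b) {
        return glm::distance(a.position, camPos) < glm::distance(b.position, camPos);
    });

    glm::vec4 positions[8];
    glm::vec3 colors[8];
    for (int i = 0; i < 8; ++i)
    {
        if (i < (int)lights.size()) {
            positions[i] = glm::vec4(lights[i].position, lights[i].type);
            colors[i]    = lights[i].color;
        } else {
            positions[i] = glm::vec4(0.0f); // empty slot
            colors[i]    = glm::vec3(0.0f); // zero color contributes nothing
        }
    }
    glUseProgram(program);
    glUniform4fv(glGetUniformLocation(program, "_myLightPosition"), 8, glm::value_ptr(positions[0]));
    glUniform3fv(glGetUniformLocation(program, "_myLightColors"),   8, glm::value_ptr(colors[0]));
}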

For your main (sun) directional light you can use a static uniform vector for direction and color (and attenuation only for shadow maps), as you can still have one sun and zero point lights - while still being able to support multiple directional lights using arrays.
If you have only one light source (the directional light), then you can skip the additive light passes altogether for everything.

The real problem will come with shadowmapping - if you want multiple lights to cast shadows.


Forward rendering:
Pass 1: render the base pass for the geometry (diffuse, main directional light, etc.)
Pass 2: render the geometry again, but with an additive blend mode, with lighting calculations and colors only, over the same mesh (don't clear the depth buffer) - ZWrite should in most cases be set OFF for this pass

(A DepthNormals texture can also exist in a forward renderer, so DLAA is possible here, but you won't be able to do any HDR if the GPU is handling MSAA for you, as this will cause artifacts in your DepthNormals texture buffer.)
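In OpenGL state terms that base + additive loop might look roughly like this (a sketch with hypothetical bind/draw helpers, not any particular engine's code):
Code:
#include <GL/glew.h>

// Pass 1: base pass writes depth and the ambient/main-light color.
// Pass 2: re-draw the same geometry additively with the extra lights,
//         testing against (but not writing) the depth laid down by pass 1.
void forwardRender(/* scene, shaders, lights ... */)
{
    // --- base pass ---
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);
    glDepthMask(GL_TRUE);
    glDisable(GL_BLEND);
    // bindBasePassShader(); drawOpaqueGeometry();   // hypothetical helpers

    // --- additive lighting pass ---
    glDepthFunc(GL_EQUAL);        // only shade the surfaces that won the depth test
    glDepthMask(GL_FALSE);        // ZWrite OFF, as described above
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);  // add the light contribution on top of the base pass
    // bindLightPassShader(); uploadLights(...); drawOpaqueGeometry();

    // restore state for whatever comes next (transparents, 2D, ...)
    glDepthFunc(GL_LESS);
    glDepthMask(GL_TRUE);
    glDisable(GL_BLEND);
}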

Deferred rendering:
 - render the depth (or a DepthNormals [screen-space normals] texture) of everything in the scene (a geometry pass that outputs only depth, and the normal transformed into camera view space)
 -- any other passes that the deferred rendering pipeline has

-- Geometry pass: render the base pass (color, with the main directional (sun) light color) to the geometry render target
-- Geometry pass: render the additive pass (add the lighting calculations from up to 8 of the closest light sources)


If you want real-time shadow mapping you will need YET another pass for everything anyhow - stencil shadows are performance-heavy in comparison to real-time shadow mapping via render targets (even Carmack gave up on them).
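Since shadow mapping via render targets keeps coming up: the FBO side of a basic depth-only shadow map is quite small. A sketch (hypothetical helper name and resolution, assuming GLEW):
Code:
#include <GL/glew.h>

// Depth-only FBO for a single shadow-casting light. Render the scene into it from
// the light's point of view, then sample the depth texture in the lighting shader.
GLuint makeShadowMap(int size, GLuint* depthTex)
{
    glGenTextures(1, depthTex);
    glBindTexture(GL_TEXTURE_2D, *depthTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, size, size, 0,
                 GL_DEPTH_COMPONENT, GL_FLOAT, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

    GLuint fbo;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, *depthTex, 0);
    glDrawBuffer(GL_NONE); // no color output at all - depth only
    glReadBuffer(GL_NONE);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    return fbo;
}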


If you want to use 16 lights, you can have interpolators in GLSL/HLSL shaders with 16 vector4 data types (at least in HLSL/Cg) - but do keep in mind the limitations (especially the maximum instruction count) of the shader model too. On SM2 and SM3 there are some serious limitations to avoid, regardless of whether your engine is DirectX or OpenGL, deferred or forward. So if you want Racaged to one day run on an iPhone, Windows Phone, or even Android, you are going to be VERY VERY restricted (and render textures are still unsupported on a lot of mobile devices, so no AA, deferred and such on most small devices anyhow).

In a forward renderer you can render "layers" by simply clearing the depth buffer (in both deferred and forward), so some kind of bit-flag implementation to tag in-game renderables can also determine what order to render things in, and which lights should affect them.
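A tiny sketch of that layer-bit idea (hypothetical layer names and Renderable type):
Code:
#include <GL/glew.h>
#include <vector>

// Tag renderables with a layer bit; draw layer by layer, clearing depth between layers
// so e.g. a cockpit/HUD layer always ends up on top of the world layer.
enum LayerBits { LAYER_WORLD = 1 << 0, LAYER_COCKPIT = 1 << 1, LAYER_HUD = 1 << 2 };

struct Renderable { unsigned layerMask; /* mesh, material, ... */ };

void renderLayers(const std::vector<Renderable>& objects)
{
    const unsigned layers[] = { LAYER_WORLD, LAYER_COCKPIT, LAYER_HUD };
    for (unsigned layer : layers)
    {
        glClear(GL_DEPTH_BUFFER_BIT); // each layer gets a fresh depth buffer
        for (const Renderable& r : objects)
            if (r.layerMask & layer)
            {
                // draw(r); -- also a natural place to decide which lights affect this layer
            }
    }
}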

As an optimization you could do your lighting calculations in the vertex shader instead of the pixel shader, but this means you cannot have fancy normal mapping, bump mapping, relief mapping and such (though you can still have rim lighting and faked fresnel). You can also bake lightmaps (textures) for objects that don't move (emission); once a baked lightmap is created, you won't ever need to recalculate it (unless your lighting conditions change). Lightmaps can either be their own deferred screen-space render target, or even be baked on a per-geometry basis (meaning a forward renderer can support them).



(2014-12-18, 08:36 PM)tonythedemon Wrote: Don't forget all the post processing.  Big Grin
But I guess refractions are much shinier, though very intensive.
I think you meant to say "reflections" ;)  but I'm going to answer on refractions first:

I have a shader I wrote for refractions. It's pretty straightforward:
 - Render everything behind the object (this is known as a GrabPass in Unity HLSL/CG).
 - Sample that "everything behind me" texture in the object's shader with appropriate UV offsets (usually calculated from the normals).
 - This means the geometry has two passes:
 - -> one pass that grabs the current state of the geometry render target,
 - -> another pass that renders the actual geometry,
 - -> and the cost of refractions will depend on how many refractive objects there are.
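Outside of Unity, the "grab" step can be done in plain OpenGL by copying the current framebuffer into a texture right before drawing the refractive object. A sketch (it assumes grabTex was already allocated at the framebuffer's size with glTexImage2D):
Code:
#include <GL/glew.h>

// Copy what has been rendered so far into grabTex, so the refraction shader can sample it.
void grabBackground(GLuint grabTex, int width, int height)
{
    glBindTexture(GL_TEXTURE_2D, grabTex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, width, height);
    // now bind grabTex when drawing the refractive mesh and offset its UVs by the normals
}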

However, for fake refractions (cheap):
 -- have a global reflection cube map (can either be a single realtime reflection probe, or a static cube map),
 -- use a screen-space refraction offset (however you need it), instead of UVs calculated from a reflection vector.
 
Now in regards to reflections, you have 3 real choices:
 -- a static cube map
 -- real-time reflections (expensive if you want high res),
 -- BRDF-style reflections (sample a low-resolution cube map rendered by the main camera - apply blur to the cube when it is sampled - add some fresnel - and voila: fake HDRI that takes the immediate surroundings into account)

Real-time reflection is simply you rendering a camera in 6 directions (each rendering to a cube face) and passing the cube texture to a shader. After that, all you need in your shader is something like this:
 
Code:
// _SpecCube1: invert the normal and blur for the BRDF sample; blend
// both accordingly for FAKE HDRI + reflections combined (if your engine has no GI support).
// The spec cube is a uniform cube map rendered only once every frame and used by any shader that needs it:
fixed3 ref  = texCUBE(_SpecCube0, reflect(viewDir, normal)).rgb;
fixed3 hdri = MY_BLURRY_SAMPLER_CUBE_FUNCTION(_SpecCube1, reflect(viewDir, -normal));
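On the C++/OpenGL side, the render-a-camera-in-6-directions part boils down to attaching each cube face to an FBO in turn. A rough sketch (assumes GLEW, a radians-based GLM, a cubemap texture and an FBO with a depth attachment already created, and a hypothetical renderScene(view, proj) helper):
Code:
#include <GL/glew.h>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Render the scene once per cube face; the result can then be sampled
// with reflect(viewDir, normal) as in the shader snippet above.
void renderCubemap(GLuint fbo, GLuint cubeTex, int size, const glm::vec3& probePos)
{
    const glm::vec3 dirs[6] = { glm::vec3( 1,0,0), glm::vec3(-1,0,0), glm::vec3(0, 1,0),
                                glm::vec3(0,-1,0), glm::vec3(0,0, 1), glm::vec3(0,0,-1) };
    const glm::vec3 ups[6]  = { glm::vec3(0,-1,0), glm::vec3(0,-1,0), glm::vec3(0,0, 1),
                                glm::vec3(0,0,-1), glm::vec3(0,-1,0), glm::vec3(0,-1,0) };
    glm::mat4 proj = glm::perspective(glm::radians(90.0f), 1.0f, 0.1f, 1000.0f); // 90 degree fov per face

    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glViewport(0, 0, size, size);
    for (int face = 0; face < 6; ++face)
    {
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, cubeTex, 0);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glm::mat4 view = glm::lookAt(probePos, probePos + dirs[face], ups[face]);
        // renderScene(view, proj); // hypothetical: draw everything except the reflective object itself
    }
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}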

If you want to calculate normals from a grayscale heightmap (if you don't have tangents and binormals - I currently use this) to use with a reflection, the following is UnityCG/HLSL but it applies pretty much anywhere:

Code:
// Returns a normal from a grid of heights (heights sampled from a heightmap texture)
// to create bump-map normals. For real normal/parallax/relief mapping you need tangents/binormals in addition to this.
// h_N == center texel
// h_A, h_B, h_C, h_D == up (or down if your 0,0 texel is bottom left), right, down (or up), left
inline float3 computeNormals(float h_A, float h_B, float h_C, float h_D, float h_N, float heightScale)
{
    // to make it easier we offset the points such that N is "0" height
    float3 va = { 0, 1, (h_A - h_N) * heightScale };
    float3 vb = { 1, 0, (h_B - h_N) * heightScale };
    float3 vc = { 0, -1, (h_C - h_N) * heightScale };
    float3 vd = { -1, 0, (h_D - h_N) * heightScale };
    // the cross product of each pair of vectors yields the normal of each tri - return the average normal of all 4 tris
    // note the -4 on the end if you don't wanna screw this up
    float3 average_n = (cross(va, vb) + cross(vb, vc) + cross(vc, vd) + cross(vd, va)) / -4;
    return (average_n);
}

Code:
// - where my samples are taken with UV offsets like this
// (texel = 1.0 / texture size; the A/B/C/D offsets are +/- one texel from the center):
float2 uv[5] =
{
    float2(IN.uv_tex1.x,         IN.uv_tex1.y + texel),   // 0:A
    float2(IN.uv_tex1.x + texel, IN.uv_tex1.y),           // 1:B
    float2(IN.uv_tex1.x,         IN.uv_tex1.y - texel),   // 2:C
    float2(IN.uv_tex1.x - texel, IN.uv_tex1.y),           // 3:D

    IN.uv_tex1                                             // 4:N : center texel
};

// so usage is: computeNormals(A, B, C, D, N, [FLOAT]);
float3 normals = computeNormals(h[0], h[1], h[2], h[3], h[4], 2);
The attached image shows what these two look like in my own project, where I have both an HDRI sample as well as reflections on my ships - both are actually sampled from only one cubemap rendered by the main camera.
 


Refractions/reflections are not a post-processing effect unless you are applying them directly to the camera's result itself after everything in the scene has already been rendered. If you are rendering reflections you will need to render the reflections from the camera before rendering the objects (don't do realtime reflections on a per-geometry basis please, you will eat up a huge chunk of your GPU doing so, especially as far as VRAM is concerned) - one or two camera-relative cubemaps are enough for most cases. You could also reflect relative to an imaginary plane (e.g. water, wet floors, shiny kitchen tiles) into a 2D reflection map.

In deferred rendering, anti-aliasing is a post-process thing (and better yet, you can use whatever anti-aliasing algorithm you want), because you are doing it in a shader instead of letting the GPU do it for you. Most MSAA/DLAA/NFAA-style techniques require a depth buffer anyhow, while AA such as DLAA requires a depth AND a screen-space normals texture (which can be done in both forward and deferred modes).

Anti-aliasing requires a depth buffer because it uses it to detect the edges between objects, and then smooths only the pixels (err, texels) that need to be smoothed in the color buffer. With DLAA and others that require screen-space normals, they can detect actual edges, folds and creases on the geometry itself (which typical MSAA/FSAA cannot), in return allowing all sharp edges on an object, including those where it overlaps itself, to be anti-aliased.

Yeah, deferred means more work (and heavier computation due to rendering the same geometry something like 6 times over, plus more confusing screen-space transformations), but you can get far better image quality. Alpha transparency is a bitch in both forward and deferred rendering regardless - it is, and always will be, super expensive for as long as people keep rendering depth-sorted alpha-transparent particle systems (which of course they can't live without).
#9
My screenshot didn't upload; trying again:


Attached Files Image(s)
   
#10
Wow, thanks for all the information Big Grin

At the moment I am just trying to get FBOs working properly. I have got an FBO working on a 3D plane, but I can't get any 2D plane to render onto the screen.
So far I have: [Image: r7RzMGV.png]

But I also want to render the texture to a quad that covers the whole screen.
When I try to bind and use a new shader program and draw, the image looks like this: [Image: C6fYKJY.png]

My code looks something like this:
Code:
     camera.switchToPerspective();
     shader.bind();
     // render the 3D scene
     shader.unbind();

     camera.switchToOrtho();
     glDisable(GL_DEPTH_TEST);
     shader2D.bind();

     // render the 2D quad textured with the FBO's color attachment

     shader2D.unbind();
     glEnable(GL_DEPTH_TEST);

     // flip buffers

I don't know what I could be doing wrong, unless it's something to do with the 2D shader, but I don't get any errors back from the error checking I do after compiling the shaders.
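For reference, a fleshed-out version of that 2D pass could look roughly like the sketch below (hypothetical names; if the quad's vertices are already in NDC, from -1 to 1, the ortho switch isn't even needed and the vertex shader can pass the positions straight through):
Code:
#include <GL/glew.h>

// Draw the FBO's color texture onto a quad covering the whole screen.
// Assumes quadVAO holds a unit quad in NDC (-1..1) as two triangles.
void drawFullscreenQuad(GLuint shader2D, GLuint fboColorTex, GLuint quadVAO)
{
    glDisable(GL_DEPTH_TEST);
    glUseProgram(shader2D);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, fboColorTex);
    glUniform1i(glGetUniformLocation(shader2D, "screenTexture"), 0); // hypothetical sampler name
    glBindVertexArray(quadVAO);
    glDrawArrays(GL_TRIANGLES, 0, 6);
    glBindVertexArray(0);
    glUseProgram(0);
    glEnable(GL_DEPTH_TEST);
}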

Edit:

I finally worked it out. It was because I was passing a shader by value instead of by pointer. Such a simple mistake  Angry  Ah well, at least I can draw 2D now:
[Image: JyFazeq.png?1]
#11
Codie: lots and lots of goodies in that post. Smile I'll be re-reading it many times.

Also, regarding my previous statements about deferred (I've been thinking about this and never gotten around to actually writing it down):
* Using multiple materials should be possible to implement by storing all materials in a texture or UBO array, and then using one of the elements in the g-buffer as an index into it (a rough sketch follows after this list). Shouldn't be any real performance problem on modern hardware.
* And anti-aliasing was not really an issue. One could just manually implement it (effectively supersampling) by using a larger-resolution buffer and sampling it down to the render resolution.
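For the UBO-array idea in the first bullet, the C++ side might be sketched like this (hypothetical MaterialGPU layout and "Materials" block name; the lighting shader would then index the array with the material id read from the g-buffer):
Code:
#include <GL/glew.h>

// std140 layout: keep each material padded to 16-byte multiples.
struct MaterialGPU
{
    float baseColor[4];     // rgb + unused
    float specRoughness[4]; // x = specular, y = roughness, rest padding
};

// Upload all materials once; in GLSL the block would look like
//   layout(std140) uniform Materials { MaterialGPU materials[256]; };
// and the lighting pass reads materials[index] with the index taken from the g-buffer.
GLuint makeMaterialUBO(const MaterialGPU* mats, int count, GLuint program)
{
    GLuint ubo;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, count * sizeof(MaterialGPU), mats, GL_STATIC_DRAW);

    GLuint blockIndex = glGetUniformBlockIndex(program, "Materials");
    glUniformBlockBinding(program, blockIndex, 0); // bind the block to binding point 0
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, 0);
    return ubo;
}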

I guess the only real issue with deferred I can't accept is that it just can't be implemented for translucent fragments... Undecided

Also, forward rendering can render with a single pass (render all lights and ambient lighting at once), but then one doesn't get the benefit of early-Z discard of unseen pixels. Of course an early-Z pass can't be done for translucent materials. The only reason to limit passes like this, I think, is mobile platforms and certain drivers with high CPU-GPU bottlenecks (such as the open/free Radeon Xorg drivers) - but a single early-Z pass is usually still more advantage than disadvantage.
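A minimal sketch of such an early-Z pre-pass in OpenGL state terms (hypothetical draw helpers):
Code:
#include <GL/glew.h>

// Early-Z pre-pass: lay down depth only (no color), then the expensive shading pass
// uses GL_EQUAL so each visible pixel is shaded exactly once.
void renderWithDepthPrepass(/* scene ... */)
{
    // pass 0: depth only
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);
    glDepthMask(GL_TRUE);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    // drawOpaqueGeometry(depthOnlyShader); // hypothetical

    // pass 1: full shading, depth test only
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthFunc(GL_EQUAL);
    glDepthMask(GL_FALSE);
    // drawOpaqueGeometry(lightingShader);

    // translucent geometry is drawn afterwards, sorted, without the pre-pass
    glDepthFunc(GL_LEQUAL);
    glDepthMask(GL_TRUE);
}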