Rendering to the Screen

How the camera draws the world

For the camera to render an object, it needs to convert the object's position (x, y coordinates measured in blocks) into pixels on the canvas (the screen). The camera also needs to offset everything it draws based on its own location. It applies this scaling and offsetting to every object in the world. Here are the equations used in Paper:

blockSizeX = canvasWidth / viewportSizeX (width of one block in pixels)

blockSizeY = canvasHeight / viewportSizeY (height of one block in pixels)

locationX = (objectX - cameraX) * blockSizeX (X location of block relative to camera in pixels)

locationY = (objectY - cameraY) * blockSizeY (Y location of block relative to camera in pixels)
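The four equations above can be sketched as a single function. This is a hypothetical Python sketch, not Paper's actual code, and the parameter names are my own:

```python
# Sketch of Paper's block-to-pixel conversion (names are assumed, not Paper's).
def world_to_screen(object_x, object_y, camera_x, camera_y,
                    canvas_width, canvas_height,
                    viewport_size_x, viewport_size_y):
    """Convert an object's block coordinates to pixel coordinates on the canvas."""
    block_size_x = canvas_width / viewport_size_x   # width of one block in pixels
    block_size_y = canvas_height / viewport_size_y  # height of one block in pixels
    location_x = (object_x - camera_x) * block_size_x
    location_y = (object_y - camera_y) * block_size_y
    return location_x, location_y

# A 960x540 canvas showing 16x9 blocks: each block is 60x60 pixels,
# so a block 3 right and 2 down from the camera lands at (180, 120).
print(world_to_screen(5, 3, 2, 1, 960, 540, 16, 9))  # (180.0, 120.0)
```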

3D geometry in other game engines

My Paper game engine renders all blocks as 2D squares, but 3D game engines use arrangements of points, lines between points, and faces (closed loops of lines) to make up a 3D model. A face can only contain 3 points because this guarantees it is flat: you can always draw a plane through 3 points, so if one point moves the whole face tilts with it, unlike moving one corner of a square, which can bend it out of the plane. Every face in a 3D model is therefore based on triangles; they are 2D planes in a 3D world. For example, a square would be made up of 2 triangles and a cube would need 12. Note that some game engines may appear to support non-triangle faces, but under the hood these are converted into triangles before they get rendered.

Triangles can also be split into 2 right-angled triangles, so Pythagoras's Theorem can be used to calculate distances, and trigonometry to calculate angles, in a game. In 2D, Pythagoras's Theorem has the formula:

c = √(x² + y²)

In 3D there are 3 coordinates: x = left/right, y = up/down and z = backward/forward (although some engines swap these letters around), so 3D Pythagoras has the formula:

c = √(x² + y² + z²)
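The two formulas translate directly into code. A quick sketch, using the classic 3-4-5 and 2-3-6-7 right triangles as checks:

```python
import math

def distance_2d(x, y):
    """2D Pythagoras: length of the hypotenuse c = sqrt(x^2 + y^2)."""
    return math.sqrt(x**2 + y**2)

def distance_3d(x, y, z):
    """3D Pythagoras: same idea with one extra squared term."""
    return math.sqrt(x**2 + y**2 + z**2)

print(distance_2d(3, 4))     # 5.0
print(distance_3d(2, 3, 6))  # 7.0
```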

3D rendering

There are two main types of rendering in 3D: orthographic and perspective. Orthographic rendering basically ignores one of the dimensions and draws the scene as if it were 2D, but this is not how our eyes see. Instead, we see in perspective, which means that things look smaller as they get further away. To render using the perspective method, points are converted into positions relative to the camera's position and orientation. This involves a lot of complicated trigonometry using vectors, matrices (due to the 3-coordinate system) and Euler angles that I won't go into further! Some more maths then projects those points through the camera's pinhole and onto its sensor (a 2D plane). This is easiest to visualise with an illustration.
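The core of the pinhole projection is just dividing by depth. This sketch assumes the hard part (transforming points into camera space) has already been done, so the camera sits at the origin looking down +z:

```python
# Minimal pinhole projection sketch. Assumes points are already in camera
# space (camera at the origin, looking down +z, no rotation) - a real engine
# would apply the vector/matrix transforms first.
def project(x, y, z, focal_length=1.0):
    """Project a 3D point through the pinhole onto the 2D sensor plane."""
    # Dividing by z is what makes distant things look smaller.
    return (focal_length * x / z, focal_length * y / z)

print(project(2, 1, 4))  # (0.5, 0.25)
print(project(2, 1, 8))  # (0.25, 0.125) - the same point twice as far away
                         # appears half the size on screen
```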

3D rotations

When you use 3 numbers (one per axis) to specify rotation, a situation called 'gimbal lock' can occur where 2 of the planes of rotation line up. Imagine a gyroscope: if 2 of its rotating rings align, it locks so you can now only rotate in one direction. This can cause weird effects in games, so to avoid it some game engines use quaternions instead of Euler angles to represent rotations. I don't know much about this except that quaternions use 4 numbers rather than 3 to represent a rotation, which is what lets them avoid gimbal lock.

3D lighting

In Paper, everything is rendered with the exact colour specified, but this is not what happens in real life. If an object is in shadow it appears darker, and if it is in daylight it appears lighter. Other game engines try to calculate this lighting effect. There are three main ways to do this. Rasterising projects each 3D shape onto a 2D plane and then fills in all the pixels whose centres fall inside the shape's boundary. Shaders that work like this are quick and optimised, which allows them to run in real time (good for games), but producing photorealistic results takes more effort, such as precomputing lighting ahead of time ('baking').

Ray tracing is more realistic because it simulates particles of light. The camera shoots out a ray for every pixel on its screen, and the rendering engine calculates the intersections of those rays with objects in the scene. If a ray hits something, the engine calculates further rays going from that point to the light sources, and if those hit something, rinse and repeat. This recursion allows for reflections, refractions, reflections of reflections and so on. So although I said before that ray tracing simulates particles of light, the rendering engine actually traces them in reverse, from the camera back towards the lights. Ray tracing uses complicated equations to calculate intersection points, and as there are so many of them, it can be very slow, so it is used more for CGI and VFX. In animated movies, rendering a few seconds of footage can take hours of computer time!
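Those "complicated equations" are intersection tests. The simplest one is ray versus sphere, which reduces to solving a quadratic. A sketch (the scene and names are my own, not from any particular engine):

```python
import math

# Core ray-tracing step: does a ray hit a sphere, and how far along is the hit?
# Substituting the ray "origin + t*direction" into the sphere equation
# |point - centre|^2 = radius^2 gives a quadratic in t.
def ray_sphere_hit(origin, direction, centre, radius):
    """Return the distance t to the nearest hit, or None if the ray misses."""
    ox, oy, oz = (origin[i] - centre[i] for i in range(3))
    dx, dy, dz = direction
    a = dx*dx + dy*dy + dz*dz
    b = 2 * (ox*dx + oy*dy + oz*dz)
    c = ox*ox + oy*oy + oz*oz - radius*radius
    disc = b*b - 4*a*c        # discriminant: negative means no intersection
    if disc < 0:
        return None           # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / (2*a)  # nearer of the two solutions
    return t if t >= 0 else None        # ignore hits behind the camera

# Ray from the origin straight down +z, toward a unit sphere centred at z=5:
# it hits the near surface at distance 4.
print(ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1))  # 4.0
```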

Ray marching is very similar to ray tracing but is the least well known of the three. Instead of using complicated maths to calculate exactly where the ray hits something, the rendering engine uses a distance estimator, which takes in the current position of the ray and returns how far it can safely march (move forward) before possibly hitting something. When the distance estimator returns a very small value, the engine assumes the ray has hit a surface. Because the distance estimator is an equation, not 3D geometry, the engine can simulate infinite detail: you could take a simple shape and translate, rotate, reflect or scale it recursively, giving infinitely fine structure.
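The march loop itself is short. This sketch uses a sphere as the distance estimator (my own assumed scene, matching the ray-tracing example above):

```python
import math

# Distance estimator for an assumed scene: one unit sphere centred at z=5.
def sphere_distance(point, centre=(0, 0, 5), radius=1.0):
    """How far can a ray march from `point` before it could hit the sphere?"""
    return math.dist(point, centre) - radius

def ray_march(origin, direction, max_steps=100, hit_threshold=1e-4):
    """March a ray forward until the distance estimator says we've hit."""
    t = 0.0  # total distance travelled along the ray
    for _ in range(max_steps):
        point = tuple(origin[i] + t * direction[i] for i in range(3))
        d = sphere_distance(point)
        if d < hit_threshold:
            return t   # close enough: assume the ray has hit the surface
        t += d         # safe to march this far without passing through anything
    return None        # gave up: the ray probably escaped the scene

# Same setup as the ray-tracing example: the march converges on distance 4.
print(ray_march((0, 0, 0), (0, 0, 1)))
```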

Image of ray tracing created by Henrik, distributed via Wikimedia Commons