# 2D omnidirectional shadow mapping with Three.Js

In this post, I will talk about how I implemented a 2D omnidirectional shadow mapping algorithm (that is, given a 2D world with obstacles defined by polygon and a set of point lights, render this world by taking into account the shadows produced by the interaction between the lights and the obstacles). You can check out a live demo here (WebGL support is required to run it).

Although I’ve already implemented 3D shadow maps for spot lights in the past, the present case opens up space for exciting new ways of approaching this problem. As far as I know, Three.Js’ shadow implementations are specifically designed for 3D scenes. Moreover, I haven’t seen any references to omnidirectional shadow maps in this library (which doesn’t mean it’s not there; it might just not be documented). Therefore, instead of using the default implementation of shadow maps, I decided to code my own and explore the new possibilities I had in mind, including the use of a 1D depth map.

## A naive approach

Let’s start by the way it should not be done. The first idea that came to my mind was to use a single 1D depth map, where I would project the world into a circular 1D buffer, as if the camera was a circle inside the 2D plane. The equation for this is very simple: Given the light position $\mathbf{l}$ and a vertex $\mathbf{p}$, its projected coordinate on such a buffer would be given by

$\mathbf{p}'=\frac{\arctan(\mathbf{p}_y-\mathbf{l}_y,\mathbf{p}_x-\mathbf{l}_x)}{2\pi}$.

This formula would go into the vertex shader, where it would receive the polygon vertices $\mathbf{p}$ and forward the projected vertex $\mathbf{p}'$ to the rasterizer. The GPU would, then, interpolate straight lines between the projected vertices (note that, for this to work, the models had to be rendered in wireframe mode, which I’ll talk about later) and the fragment shader would output the depth of each pixel to a framebuffer, which would later be used to determine whether or not a point in the map world lit.

Although very simple, this approach has a fundamental problem. As we are trying to model a circular entity (a hypothetical circular camera) with a non-circular buffer, the discontinuity between the edges is not something our GPU is aware of. This means that polygons that are projected in the discontinuous region of our circular buffer will be incorrectly rasterized, as shows the image below.

## From cube mapping to triangle mapping

Another way to model a omnidirectional camera is by performing the 2D equivalent of the cube mapping algorithm: By breaking down the “field of view” of the light into several sub-regions and modelling each one as a perspective camera. Similarly to the circular buffer, we only need to render each sub-region into a 1D buffer. Also, given that performance was a concern, I decided to divide the light into only three different regions, since each sub-region will require one render pass.

Each perspective projection was defined by a Three.Js PerspectiveCamera object with a horizontal fov of 120°, and having the $\mathbf{z}$ vector as its “up” vector. To move the lights around, we translate these “cameras” along with it. Since they cover the whole field of view of the light, there’s no need to rotate them. In my implementation, I set the aspect ratio to be 1:1 because the PerspectiveCamera constructor receives the vertical fov as its parameter, not the horizontal. Therefore, a 1:1 aspect would make both the horizontal and vertical field of view to be of 120°.

This is how I set up these cameras:

fboCameras = [
new THREE.PerspectiveCamera(360/3, 1, 0.2, maxDistance),
new THREE.PerspectiveCamera(360/3, 1, 0.2, maxDistance),
new THREE.PerspectiveCamera(360/3, 1, 0.2, maxDistance)
];

// Projection matrix to be passed to each depth pass.
// Notice that all lights have the same projection matrix.
fboProjection = fboCameras[i].projectionMatrix;

for(var i = 0; i < 3; ++i) {
fboCameras[i].position.set(0.0,0.0, 0.0);
// These cameras are inside our plane, which lies
// in the XY axes at Z=0. Therefore, our up vector
// must be the Z vector.
fboCameras[i].up.set(0,0,1);

// Matrix that transforms points from world coordinates
// to light coordinates (world2camera)
fboView.push(fboCameras[i].matrixWorldInverse);
}

// Camera [0] corresponds to camera (A) in the image below
fboCameras[0].lookAt(new THREE.Vector3(0, 1, 0));
// Camera [1] corresponds to camera (B) in the image below
fboCameras[1].lookAt(
new THREE.Vector3(-Math.sin(120*Math.PI/180),
Math.cos(120*Math.PI/180), 0)
);
// Camera [2] corresponds to camera (C) in the image below
fboCameras[2].lookAt(
new THREE.Vector3(Math.sin(120*Math.PI/180),
Math.cos(120*Math.PI/180), 0)
);


Each camera will require one framebuffer where it will render its depth view of the world. I have created three buffers with width of 1024 each, and had their format set to RGBA (the reason for this will be explained later).

## Rasterizing a flat polygon into a 1D buffer

As you might already know, a polygon has no thickness. As such, we can’t expect our polygons to be rasterized by the GPU during the light passes, since they would be projected as lines without any thickness. In order to solve this problem, we need to give thickness to our polygons, transforming their 1D projections from lines to rectangles. I’ve thought of two ways to achieve that:

• Render the primitives in wireframe mode. This is not very effective, as line joints may not be connected after rendering.
• Extrude the polygons. In other words, transform them into prisms.

Considering the potential problem of unconnected lines related to the first solution, I decided to extrude all my polygons (in my demo, it was just a matter of using cubes instead of squares).

## To float textures or to fake float?

In a desktop implementation of OpenGL, it would be possible to make the output of each light pass to be written to a float texture, which is highly desirable as we are rendering the depth of our scene relative to each light. Unfortunately, floating point textures are not a standard webgl feature. They require the extension OES_texture_float, which might not be available on some clients.

Instead of using this extension and running the risk of making a demo that’s not playable by some people, I decided to pack the (floating point) depth into an RGBA pixel, which is conveniently 32bits wide. There is a caution that needs to be taken for this to work. From the perspective of the Fragment Shader, each component of gl_FragColor is a float; however, each of them will be converted to an unsigned byte to be stored in the RGBA texture. Moreover, these components are assumed to be in the range [0,1], which will be mapped to the [0, 255] range when converted to bytes. Therefore, in order to pack a depth value inside an RGBA texel, we need to make sure that it can be broken down into four floats, each one lying in the range [0, 1] but with enough precision that it will not lose information after converted to a byte.

An easy way to comply with the range restriction is by using the NDC z coordinate of your vertex, as it is (non-linearly) mapped to the [0,1] range. You can get this value from the gl_FragCoord variable inside your fragment shader program. Provided that a normalized depth is used, it can be broken down into four components (that will be stored as four bytes in the GPU), and can be recombined back into a single float during the scene render pass.

To pack a float as a vec4 in the fragment shader, I used this function proposed by Z Goddard:

vec4 pack_depth( const in float depth ) {
const vec4 bit_shift = vec4( 256.0 * 256.0 * 256.0,
256.0 * 256.0,
256.0, 1.0 );
const vec4 bit_mask  = vec4( 0.0,
1.0 / 256.0,
1.0 / 256.0,
1.0 / 256.0 );
vec4 res = fract( depth * bit_shift );
return res;

}


To unpack a float that’s packed in a vec4, I took this function from ThreeJS’ shadow mapping library:

float unpackDepth( const in vec4 rgba_depth ) {

const vec4 bit_shift = vec4( 1.0 /
( 256.0 * 256.0 * 256.0 ),
1.0 / ( 256.0 * 256.0 ),
1.0 / 256.0, 1.0 );
float depth = dot( rgba_depth, bit_shift );
return depth;

}


## Second render pass: A light region for each fragment

During the scene render pass, we need to determine from which framebuffer each fragment will have its depth compared to. Just like in the shadow mapping algorithm, if the depth of a fragment is smaller than the value stored in its corresponding framebuffer, then this fragment is lit; otherwise, it is not. Given the fragment position $\mathbf{f}$ and the light position ${^w\mathbf{l}}$, the index of the framebuffer that it should sample from can be determined by:

$\lfloor \frac{\arctan(\mathbf{f}_y-{^w\mathbf{l}}_y, \mathbf{f}_x-{^w\mathbf{l}}_x)+7\pi/6}{4\pi/6} +1\rfloor \mod 3$

This equation assumes that you are using three framebuffers (which is our case, since we have one render target for each light). It also assumes that the subdivisions of the light are setup in such a way that region (A) has index 0, (B) is 1 and (C) has index 2 in the render target vector.

The light position ${^w\mathbf{l}}$ must be specified in world coordinates (alternatively, the fragment position $\mathbf{f}$ could be specified with respect to the light, which means we wouldn’t need the term ${^w\mathbf{l}}$ in the equation above). Hence, the light position in world coordinates can be obtained from the view matrix from any of the light subdivisions. Given the light position ${^w\mathbf{l}}$ and its rotation matrix $\mathbf{R}$ (which is actually related to the rotation of the projection plane we are dealing with; not with the light itself), the view matrix of the corresponding render target transforms a point in world coordinates $\mathbf{^wp}$ to the light reference $\mathbf{^lp}$ as follows:

${^l\mathbf{p}}=\mathbf{R}^{-1}\;{^w\mathbf{p}}-\mathbf{R}^{-1}\;{^w\mathbf{l}}.$

This equation tells us that the translational component of the view matrix is given by $\mathbf{t}=-\mathbf{R}^{-1}\;{^w\mathbf{l}}$. Therefore, the light position can be computed as ${^w\mathbf{l}}=-\mathbf{R}\;\mathbf{t}$, where $\mathbf{R}$ is the inverse of the rotational part of the view matrix.

Computing the light position like this certainly feels awfully complicated and unnecessary, considering that we would only need multiply the fragment position by the light’s view matrix in order to get the fragment position with respect to the light. However, if you consider that $\mathbf{R}$ has a trivial format for the light region (A), you will notice that computing the light position is much less costly:

$\mathbf{R}^{-1}= \left( \begin{array}{ccc}1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{array} \right), \; \mathbf{R} = \left( \begin{array}{ccc}1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{array} \right)$

From this, we can derive ${^w\mathbf{l}}=\langle \mathbf{t}_x, -\mathbf{t}_z, \mathbf{t}_y \rangle$, where t is the translational part of the view matrix of the region (A).

One may argue that we still have to compute the fragment position with respect to the light in order to get its projective coordinates in the shadow map; however, as we don’t know which light region the fragment belongs to, we could end up by performing two view matrix multiplications (one to determine which light region we should use, and another to project the fragment to its actual light region).

## Draw the volumetric light

The impression of having a volumetric light can be given by making the light intensity at the fragments to saturate where it’s close to the light position, and make it quickly decay to an arbitrary gray shade where we’d like to have the edge of our light source. I achieved this by making the light intensity at the fragment to be the sum of two equations: a gaussian probability density function (also known as the bell curve), which gives the light shape, and a term that’s inversely proportional to the fragment distance squared, which is inspired by how light intensity actually decays:

$\mathbf{I}=\frac{1}{\sigma \sqrt{2\pi} }e^{-\frac{d^2}{2\sigma} } + \frac{\alpha}{1 + d^2}$

## Rendering multiple lights

In my first attempt to add multiple lights to the scene, I created multidimensional arrays of framebuffers and view matrices (one dimension for different lights and another for the light regions). This meant a hardcoded limit for the maximum amount of lights that could be rendered simultaneously. Also, this limit turned out to be around 5 lights in my machine, which is understandable considering the amount of framebuffers created in the process, but isn’t practical.

Therefore, I decided to have four render passes for each light: three to produce the shadow maps and one to render the scene. The scene render pass would take as input the shadow maps from its correspondent light and a buffer with the scene rendered by the previous light, on top of which it would accumulate its lighting effect. Since reading and writing to the same FBO doesn’t seem to result in a well defined behavior, I had two scene framebuffers, which I alternately used as input/output during this process.

Consider an example where there are three lights. In the first pass, the framebuffer A will be cleared to the ambient color and be given as input to the the shader responsible for rendering the scene with the first light. This scene will be written on the framebuffer B. On the second pass, the FBO B will be drawn on FBO A, and on top of it, the second light will be rendered. Finally, during the third pass, the image on FBO A will be drawn on the screen buffer, and on top of it, we will add the third and final light. Notice that on the last pass, there’s no need to draw on top of a framebuffer, as its output is the final result we want to display.

## Final thoughts

The implemented solution is generic in two senses: firstly, obstacles can be modeled by using any polygon, provided that they are extruded along the Z axis; secondly, there’s no theoretical limit for the amount of lights (although an ultimate limit will be imposed by the available GPU power in the form of a framerate drop-down).

In terms of shadow quality, this implementation is not concerned with smoothing the transition between shadowed and lit regions, which is still possible with a shadow mapping algorithm. Also, we can observe some undesired effects around an obstacle when the projection plane of a light region gets inside of it (which can happen when the light is either near or inside an obstacle). Although this looks like a precision issue with the depth map, I would investigate this problem further before proposing either a solution or a workaround for it.

Finally, notice that I haven’t regarded the possibility having the camera move in our scenario. Although allowing for camera motion would require some subtle motifications in the general algorithm, I assume it would be accompanied by a larger world with more obstacles, which would require some special optimizations in order to cull out irrelevant objects (which would need to take into account both the light position and the camera range of view). Also, the Z range for the projection matrix of each light region could be automatically adjusted so as to only consider relevant obstacles, reducing the effect of Z conflicts in the shadow map.