Configuring GDB ImageWatch to visualize custom buffer types


gdb-imagewatch is a tool that can be used in conjunction with GDB to visualize in-memory buffers during debug sessions. In the most typical scenario, it is used to visualize the contents of an OpenCV image being processed by the debugged application. To help with the debugging task, it has many features inspired by the Visual Studio ImageWatch plugin, such as pixel value visualization, support for multiple buffer types and dumping buffers to files, while also offering other helpful features such as buffer rotation.

Since the gdb-imagewatch plugin was created with extensibility in mind, it is very easy to adapt it so that it can support as many buffer types as needed by the programmer. In this post, I will show how to adapt the plugin so that it can display the contents of a custom C++ Image class.

Let’s begin with the sample code to be debugged:

// Compile this code with a C++11 compliant compiler; on
// gcc, use the -std=c++11 flag.
#include <iostream>

class Image {
public:
  Image(int width, int height, int channels) :
  width_(width), height_(height), channels_(channels) {
    contents_ = new unsigned char[width*height*channels];
  }

  // Forbid duplicating references to the same pointer
  Image(const Image&) = delete;
  Image& operator=(const Image&) = delete;

  // Swap pointers in case of move semantics
  Image(Image&& other) {
    *this = std::move(other);
  }
  Image& operator=(Image&& other) {
    width_ = other.width_;
    height_ = other.height_;
    channels_ = other.channels_;
    contents_ = other.contents_;

    other.contents_ = nullptr;

    return *this;
  }

  unsigned char& operator()(int row,
                            int col,
                            int channel) {
      return contents_[(row*width_ + col) * channels_ +
                        channel];
  }

  ~Image() {
    delete[] contents_;
  }

  int width() {
    return width_;
  }

  int height() {
    return height_;
  }

  int channels() {
    return channels_;
  }

  unsigned char* contents() {
    return contents_;
  }

// The visibility of the fields is irrelevant to the
// plugin
private:
  unsigned char* contents_;
  int width_;
  int height_;
  int channels_;
};

int main() {
  Image gradient(50, 50, 1);
  for(int row = 0 ; row < gradient.height(); ++row) {
      for(int col = 0; col < gradient.width(); ++col) {
          gradient(row, col, 0) = col;
      }
  }

  return 0;
}

This sample defines a very simple Image class that behaves similarly to a unique_ptr, and has no Region of Interest (ROI) functionality. In order to support it, we need to modify the script resources/giwscripts/giwtype.py. There are two functions in this file that need to be adapted in order for our new Image class to be recognized by gdb-imagewatch.

The first one is is_symbol_observable(symbol). When GDB pauses (e.g. when it hits a breakpoint), it passes control to gdb-imagewatch, which in turn inspects all visible symbols (variables) in that particular context. The parameter symbol is a GDB object with information about that symbol (such as its name and type), and the function is_symbol_observable must return True if the received symbol belongs to a buffer type (and False otherwise). For the Image class presented in this example, the function must check whether the type of the given symbol is either an Image or a pointer to one.

The second function is get_buffer_metadata(obj_name, picked_obj, debugger_bridge). Once is_symbol_observable has decided that a particular symbol is a buffer, the gdb-imagewatch plugin must also receive the relevant, buffer-specific data for that symbol. This data can be obtained by fetching the values of the Image class fields from picked_obj, which can be treated as if it were a dict. The function get_buffer_metadata(obj_name, picked_obj, debugger_bridge) must return a dict containing the following elements:

  • display_name: Name of the buffer as it must appear in the ImageWatch window. Can be customized to also show its typename, for instance.
  • pointer: Pointer to the beginning of the buffer data.
  • width: Buffer width. Must be int.
  • height: Buffer height. Must be int.
  • channels: Number of channels. Must be int.
  • type: Numeric type of each element contained in the buffer. Must be one of the following: GIW_TYPES_UINT8, GIW_TYPES_UINT16, GIW_TYPES_INT16, GIW_TYPES_INT32, GIW_TYPES_FLOAT32, GIW_TYPES_FLOAT64
  • row_stride: Number of pixels that exist between two vertically adjacent pixels. Also known as row step. Must be int.
  • pixel_layout: String describing how internal channels should be ordered for display purposes. The default value for buffers of 3 and 4 channels is 'bgra', and 'rgba' for images of 1 and 2 channels. This string must contain exactly four characters, and each one must be one of 'r', 'g', 'b' or 'a'. Repeated channels, such as 'rrgg' are also valid.

For the sample Image class described above, this is what the script resources/giwscripts/giwtype.py should look like:

# -*- coding: utf-8 -*-

"""
This module is concerned with the analysis of each
variable found by the debugger, as well as identifying
and describing the buffers that should be plotted in the
ImageWatch window.
"""

from giwscripts import symbols


def get_buffer_metadata(obj_name,
                        picked_obj,
                        debugger_bridge):
  buffer = debugger_bridge.get_casted_pointer('char',
    picked_obj['contents_'])
  if buffer == 0x0:
    raise Exception('Received null buffer!')

  width = int(picked_obj['width_'])
  height = int(picked_obj['height_'])

  channels = int(picked_obj['channels_'])
  row_stride = width
  pixel_layout = 'rgba'

  # Our Image class only has one buffer type: unsigned
  # char
  type_value = symbols.GIW_TYPES_UINT8

  return {
    'display_name': str(picked_obj.type) + ' ' +
                    obj_name,
    'pointer': buffer,
    'width': width,
    'height': height,
    'channels': channels,
    'type': type_value,
    'row_stride': row_stride,
    'pixel_layout': pixel_layout
  }


def is_symbol_observable(symbol):
  """
  Returns true if the given symbol is of observable type
  (the type of the buffer you are working with). Make
  sure to check for pointers of your type as well
  """
  symbol_type = str(symbol.type)
  # In practice, you should use regular expressions to
  # handle all type modifiers gracefully
  return symbol_type == 'Image' or \
         symbol_type == 'Image *'

After making any changes to resources/giwscripts/giwtype.py, you must rebuild/reinstall the plugin in order for the changes to take effect.

Final thoughts

There’s nothing preventing you from making more sophisticated changes to the resources/giwscripts/giwtype.py script. For instance, there may be multiple buffer types that you’d like to be able to inspect simultaneously, and thus is_symbol_observable(symbol) could check for all of them, while get_buffer_metadata(obj_name, picked_obj, debugger_bridge) could return different fields depending on picked_obj.type.


2D omnidirectional shadow mapping with Three.Js

A 2D scene is visualized by a camera placed on top of it. Black squares represent obstacles that light cannot pass through. In this scene, we have three light sources.

In this post, I will talk about how I implemented a 2D omnidirectional shadow mapping algorithm (that is, given a 2D world with obstacles defined by polygons and a set of point lights, render this world by taking into account the shadows produced by the interaction between the lights and the obstacles). You can check out a live demo here (WebGL support is required to run it).

Although I’ve already implemented 3D shadow maps for spot lights in the past, the present case opens up space for exciting new ways of approaching this problem. As far as I know, Three.Js’ shadow implementations are specifically designed for 3D scenes. Moreover, I haven’t seen any references to omnidirectional shadow maps in this library (which doesn’t mean it’s not there; it might just not be documented). Therefore, instead of using the default implementation of shadow maps, I decided to code my own and explore the new possibilities I had in mind, including the use of a 1D depth map.

A naive approach

Let’s start with the way it should not be done. The first idea that came to my mind was to use a single 1D depth map, where I would project the world into a circular 1D buffer, as if the camera were a circle inside the 2D plane. The equation for this is very simple: given the light position \mathbf{l} and a vertex \mathbf{p}, its projected coordinate on such a buffer would be given by

\mathbf{p}'=\frac{\arctan(\mathbf{p}_y-\mathbf{l}_y,\mathbf{p}_x-\mathbf{l}_x)}{2\pi}.

This formula would go into the vertex shader, which would receive the polygon vertices \mathbf{p} and forward the projected vertex \mathbf{p}' to the rasterizer. The GPU would then interpolate straight lines between the projected vertices (note that, for this to work, the models had to be rendered in wireframe mode, which I’ll talk about later) and the fragment shader would output the depth of each pixel to a framebuffer, which would later be used to determine whether or not a point in the world is lit.
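For concreteness, here is a minimal GLSL sketch of such a vertex shader. It is only a sketch of the idea above, not the demo’s actual code: it assumes the position attribute and modelMatrix uniform that Three.Js injects into ShaderMaterial programs, lightPosition is an illustrative uniform holding \mathbf{l}, and the depth handling is reduced to a single varying.

uniform vec2 lightPosition;    // l, in world coordinates (illustrative name)
varying float distToLight;     // consumed by a depth-writing fragment shader

void main() {
    // p: the polygon vertex, expressed in the XY plane of the 2D world
    vec2 p = (modelMatrix * vec4(position, 1.0)).xy;
    vec2 d = p - lightPosition;
    distToLight = length(d);

    // The equation above: angle around the light mapped to (-0.5, 0.5],
    // then scaled by 2 so it spans the NDC x range (-1, 1] of the 1D buffer.
    float projected = atan(d.y, d.x) / (2.0 * 3.14159265358979);
    gl_Position = vec4(2.0 * projected, 0.0, 0.0, 1.0);
}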

Although very simple, this approach has a fundamental problem. As we are trying to model a circular entity (a hypothetical circular camera) with a non-circular buffer, the discontinuity between the edges is not something our GPU is aware of. This means that polygons that are projected onto the discontinuous region of our circular buffer will be incorrectly rasterized, as shown in the image below.

From cube mapping to triangle mapping

Another way to model an omnidirectional camera is by performing the 2D equivalent of the cube mapping algorithm: breaking down the “field of view” of the light into several sub-regions and modelling each one as a perspective camera. Similarly to the circular buffer, we only need to render each sub-region into a 1D buffer. Also, given that performance was a concern, I decided to divide the light into only three regions, since each sub-region requires one render pass.

Each perspective projection was defined by a Three.Js PerspectiveCamera object with a horizontal fov of 120°, having the \mathbf{z} vector as its “up” vector. To move a light around, we translate these “cameras” along with it. Since they cover the whole field of view of the light, there’s no need to rotate them. In my implementation, I set the aspect ratio to 1:1 because the PerspectiveCamera constructor receives the vertical fov as its parameter, not the horizontal; therefore, a 1:1 aspect makes both the horizontal and vertical fields of view 120°.

This is how I set up these cameras:

fboCameras = [
    new THREE.PerspectiveCamera(360/3, 1, 0.2, maxDistance),
    new THREE.PerspectiveCamera(360/3, 1, 0.2, maxDistance),
    new THREE.PerspectiveCamera(360/3, 1, 0.2, maxDistance)
];

// Projection matrix to be passed to each depth pass.
// Notice that all three cameras share the same projection matrix.
fboProjection = fboCameras[0].projectionMatrix;

for(var i = 0; i < 3; ++i) {
    fboCameras[i].position.set(0.0,0.0, 0.0);
    // These cameras are inside our plane, which lies
    // in the XY axes at Z=0. Therefore, our up vector
    // must be the Z vector.
    fboCameras[i].up.set(0,0,1);

    // Matrix that transforms points from world coordinates
    // to light coordinates (world2camera)
    fboView.push(fboCameras[i].matrixWorldInverse);
}

// Camera [0] corresponds to camera (A) in the image below
fboCameras[0].lookAt(new THREE.Vector3(0, 1, 0));
// Camera [1] corresponds to camera (B) in the image below
fboCameras[1].lookAt(
    new THREE.Vector3(-Math.sin(120*Math.PI/180),
                      Math.cos(120*Math.PI/180), 0)
);
// Camera [2] corresponds to camera (C) in the image below
fboCameras[2].lookAt(
    new THREE.Vector3(Math.sin(120*Math.PI/180),
                      Math.cos(120*Math.PI/180), 0)
);

Each camera will require one framebuffer where it will render its depth view of the world. I have created three render targets, each 1024 pixels wide, and set their format to RGBA (the reason for this will be explained later).

Rasterizing a flat polygon into a 1D buffer

As you might already know, a polygon has no thickness. As such, we can’t expect our polygons to be rasterized by the GPU during the light passes, since they would be projected as lines without any thickness. In order to solve this problem, we need to give thickness to our polygons, transforming their 1D projections from lines to rectangles. I’ve thought of two ways to achieve that:

  • Render the primitives in wireframe mode. This is not very effective, as line joints may not be connected after rendering.
  • Extrude the polygons. In other words, transform them into prisms.

Considering the potential problem of unconnected lines related to the first solution, I decided to extrude all my polygons (in my demo, it was just a matter of using cubes instead of squares).

To float textures or to fake float?

In a desktop implementation of OpenGL, it would be possible to have the output of each light pass written to a float texture, which is highly desirable as we are rendering the depth of our scene relative to each light. Unfortunately, floating point textures are not a standard WebGL feature. They require the extension OES_texture_float, which might not be available on some clients.

Instead of using this extension and running the risk of making a demo that’s not playable by some people, I decided to pack the (floating point) depth into an RGBA pixel, which is conveniently 32 bits wide. There is one caveat for this to work. From the perspective of the fragment shader, each component of gl_FragColor is a float; however, each of them will be converted to an unsigned byte to be stored in the RGBA texture. Moreover, these components are assumed to be in the range [0, 1], which will be mapped to the [0, 255] range when converted to bytes. Therefore, in order to pack a depth value inside an RGBA texel, we need to make sure that it can be broken down into four floats, each one lying in the range [0, 1] but with enough precision that it will not lose information after being converted to a byte.

An easy way to comply with the range restriction is by using the NDC z coordinate of your vertex, as it is (non-linearly) mapped to the [0,1] range. You can get this value from the gl_FragCoord variable inside your fragment shader program. Provided that a normalized depth is used, it can be broken down into four components (that will be stored as four bytes in the GPU), and can be recombined back into a single float during the scene render pass.

To pack a float as a vec4 in the fragment shader, I used this function proposed by Z Goddard:

vec4 pack_depth( const in float depth ) {
    const vec4 bit_shift = vec4( 256.0 * 256.0 * 256.0,
                                 256.0 * 256.0,
                                 256.0, 1.0 );
    const vec4 bit_mask  = vec4( 0.0,
                                 1.0 / 256.0,
                                 1.0 / 256.0,
                                 1.0 / 256.0 );
    vec4 res = fract( depth * bit_shift );
    res -= res.xxyz * bit_mask;
    return res;

}
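In the light-pass fragment shader, a minimal way to use it (assuming the default depth range, so that gl_FragCoord.z is already in [0, 1]) would be:

void main(void) {
    // gl_FragCoord.z is the fragment's window-space depth, in [0, 1]
    gl_FragColor = pack_depth(gl_FragCoord.z);
}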

To unpack a float that’s packed in a vec4, I took this function from ThreeJS’ shadow mapping library:

float unpackDepth( const in vec4 rgba_depth ) {

    const vec4 bit_shift = vec4( 1.0 /
                                 ( 256.0 * 256.0 * 256.0 ),
                                 1.0 / ( 256.0 * 256.0 ),
                                 1.0 / 256.0, 1.0 );
    float depth = dot( rgba_depth, bit_shift );
    return depth;

}

Second render pass: A light region for each fragment

During the scene render pass, we need to determine from which framebuffer each fragment will have its depth compared to. Just like in the shadow mapping algorithm, if the depth of a fragment is smaller than the value stored in its corresponding framebuffer, then this fragment is lit; otherwise, it is not. Given the fragment position \mathbf{f} and the light position {^w\mathbf{l}}, the index of the framebuffer that it should sample from can be determined by:

\lfloor \frac{\arctan(\mathbf{f}_y-{^w\mathbf{l}}_y, \mathbf{f}_x-{^w\mathbf{l}}_x)+7\pi/6}{4\pi/6} +1\rfloor \mod 3

This equation assumes that you are using three framebuffers (which is our case, since we have one render target for each light region). It also assumes that the subdivisions of the light are set up in such a way that region (A) has index 0, (B) has index 1 and (C) has index 2 in the render target vector.
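As a reference, this is a minimal GLSL sketch of that selection for the scene-pass fragment shader; it assumes the fragment and light positions are available in world coordinates, and the function and variable names are illustrative rather than taken from the demo.

int pickLightRegion(vec2 fragPos, vec2 lightPos) {
    const float PI = 3.14159265358979;
    vec2 d = fragPos - lightPos;
    float angle = atan(d.y, d.x);   // in (-PI, PI]

    // Same expression as the equation above; (A) -> 0, (B) -> 1, (C) -> 2
    float index = floor((angle + 7.0 * PI / 6.0) / (4.0 * PI / 6.0) + 1.0);
    return int(mod(index, 3.0));
}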

The light position {^w\mathbf{l}} must be specified in world coordinates (alternatively, the fragment position \mathbf{f} could be specified with respect to the light, which means we wouldn’t need the term {^w\mathbf{l}} in the equation above). The light position in world coordinates can be obtained from the view matrix of any of the light subdivisions. Given the light position {^w\mathbf{l}} and its rotation matrix \mathbf{R} (which is actually related to the rotation of the projection plane we are dealing with, not to the light itself), the view matrix of the corresponding render target transforms a point in world coordinates \mathbf{^wp} to the light reference \mathbf{^lp} as follows:

{^l\mathbf{p}}=\mathbf{R}^{-1}\;{^w\mathbf{p}}-\mathbf{R}^{-1}\;{^w\mathbf{l}}.

This equation tells us that the translational component of the view matrix is given by \mathbf{t}=-\mathbf{R}^{-1}\;{^w\mathbf{l}}. Therefore, the light position can be computed as {^w\mathbf{l}}=-\mathbf{R}\;\mathbf{t}, where \mathbf{R} is the inverse of the rotational part of the view matrix.

Computing the light position like this certainly feels awfully complicated and unnecessary, considering that we would only need to multiply the fragment position by the light’s view matrix in order to get the fragment position with respect to the light. However, if you consider that \mathbf{R} has a trivial form for the light region (A), you will notice that computing the light position is much less costly:

\mathbf{R}^{-1}= \left( \begin{array}{ccc}1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{array} \right), \; \mathbf{R} = \left( \begin{array}{ccc}1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{array} \right)

From this, we can derive {^w\mathbf{l}}=\langle -\mathbf{t}_x, \mathbf{t}_z, -\mathbf{t}_y \rangle, where \mathbf{t} is the translational part of the view matrix of region (A).
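In GLSL, that shortcut boils down to something like the sketch below, assuming the view matrix of region (A) is passed in as a uniform (the uniform name is illustrative).

uniform mat4 viewMatrixA;   // world-to-light view matrix of region (A)

vec3 lightWorldPosition() {
    // Translational part t of the view matrix (4th column, column-major)
    vec3 t = viewMatrixA[3].xyz;
    // w_l = -R * t, with R as derived above for region (A)
    return vec3(-t.x, t.z, -t.y);
}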

One may argue that we still have to compute the fragment position with respect to the light in order to get its projective coordinates in the shadow map; however, as we don’t know which light region the fragment belongs to, we could end up performing two view matrix multiplications (one to determine which light region we should use, and another to project the fragment onto its actual light region).
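For completeness, the comparison itself ends up looking roughly like the hedged sketch below, using the unpackDepth function shown earlier. It assumes the view/projection matrices and the packed 1D depth map of the already-selected region are available as uniforms (in the real shader the three maps are distinguished by the index computed above), and the small depth bias is an illustrative value.

uniform mat4 lightView;         // view matrix of the selected light region
uniform mat4 lightProjection;   // shared 120-degree projection matrix
uniform sampler2D depthMap;     // 1024x1 render target with packed depth

bool isLit(vec3 worldPos) {
    vec4 lightClip = lightProjection * lightView * vec4(worldPos, 1.0);
    vec3 ndc = lightClip.xyz / lightClip.w;

    float u = ndc.x * 0.5 + 0.5;            // coordinate in the 1D shadow map
    float stored = unpackDepth(texture2D(depthMap, vec2(u, 0.5)));
    float fragDepth = ndc.z * 0.5 + 0.5;    // same [0, 1] convention as the light pass

    return fragDepth <= stored + 0.005;     // small bias to reduce shadow acne
}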

Draw the volumetric light

The impression of having a volumetric light can be given by making the light intensity saturate at fragments close to the light position, and quickly decay to an arbitrary gray shade where we’d like the edge of our light source to be. I achieved this by making the light intensity at a fragment the sum of two terms: a Gaussian probability density function (also known as the bell curve), which gives the light its shape, and a term that’s inversely proportional to the squared distance to the fragment, which is inspired by how light intensity actually decays:

\mathbf{I}=\frac{1}{\sigma \sqrt{2\pi} }e^{-\frac{d^2}{2\sigma} } + \frac{\alpha}{1 + d^2}

As it decays very quickly, the Gaussian term is responsible for rendering the light outline, while the inverse-square term is what mostly illuminates the environment.
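A minimal GLSL version of this intensity profile could look like the sketch below, with sigma and alpha as illustrative tuning constants.

float volumetricIntensity(float d, float sigma, float alpha) {
    const float PI = 3.14159265358979;
    float bell    = exp(-(d * d) / (2.0 * sigma)) / (sigma * sqrt(2.0 * PI));
    float falloff = alpha / (1.0 + d * d);
    return bell + falloff;
}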

Rendering multiple lights

In my first attempt to add multiple lights to the scene, I created multidimensional arrays of framebuffers and view matrices (one dimension for different lights and another for the light regions). This meant a hardcoded limit for the maximum number of lights that could be rendered simultaneously. Also, this limit turned out to be around 5 lights on my machine, which is understandable considering the number of framebuffers created in the process, but isn’t practical.

Therefore, I decided to have four render passes for each light: three to produce the shadow maps and one to render the scene. The scene render pass takes as input the shadow maps from its corresponding light and a buffer with the scene rendered by the previous light, on top of which it accumulates its lighting effect. Since reading from and writing to the same FBO doesn’t seem to result in well-defined behavior, I had two scene framebuffers, which I alternately used as input/output during this process.

Consider an example where there are three lights. In the first pass, framebuffer A will be cleared to the ambient color and given as input to the shader responsible for rendering the scene with the first light. This scene will be written to framebuffer B. On the second pass, FBO B will be drawn on FBO A, and on top of it, the second light will be rendered. Finally, during the third pass, the image on FBO A will be drawn on the screen buffer, and on top of it, we will add the third and final light. Notice that on the last pass, there’s no need to draw into a framebuffer, as its output is the final result we want to display.

[Diagram: the two scene framebuffers alternating as input and output across the light passes]
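The accumulation step of each scene pass can be sketched in GLSL as below; previousPass and resolution are illustrative uniform names, and the current light’s contribution is hardcoded where, in the real shader, the shadow test and the intensity profile described earlier would be evaluated.

precision highp float;

uniform sampler2D previousPass;   // scene rendered with the previous lights
uniform vec2 resolution;          // canvas size in pixels

void main(void) {
    vec2 screenUv = gl_FragCoord.xy / resolution;
    vec3 accumulated = texture2D(previousPass, screenUv).rgb;

    // Placeholder for the current light's contribution
    vec3 currentLight = vec3(0.1);

    gl_FragColor = vec4(accumulated + currentLight, 1.0);
}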

Final thoughts

The implemented solution is generic in two senses: firstly, obstacles can be modeled using any polygon, provided that they are extruded along the Z axis; secondly, there’s no theoretical limit on the number of lights (although an ultimate limit will be imposed by the available GPU power in the form of a framerate drop).

In terms of shadow quality, this implementation is not concerned with smoothing the transition between shadowed and lit regions, which is still possible with a shadow mapping algorithm. Also, we can observe some undesired effects around an obstacle when the projection plane of a light region gets inside of it (which can happen when the light is either near or inside an obstacle). Although this looks like a precision issue with the depth map, I would investigate this problem further before proposing either a solution or a workaround for it.

Finally, notice that I haven’t regarded the possibility of having the camera move in our scenario. Although allowing for camera motion would require some subtle modifications to the general algorithm, I assume it would be accompanied by a larger world with more obstacles, which would require some special optimizations in order to cull out irrelevant objects (taking into account both the light positions and the camera’s range of view). Also, the Z range for the projection matrix of each light region could be automatically adjusted so as to only consider relevant obstacles, reducing the effect of Z conflicts in the shadow map.

A mathematical overview of UVA 855

Lunch in Grid City, or UVA 855, is a problem created for the Portuguese regional of the ACM-ICPC. The premise of the problem is that you are given a set of coordinates S=\{\mathbf{p}_1,\mathbf{p}_2,\dots \mathbf{p}_n\}, n \leq 50000, in a grid, and have to find a coordinate \mathbf{c} that minimizes the sum of the Manhattan distances between all points in S and \mathbf{c}. Formally:

\mathbf{c}=\displaystyle{\arg\!\min}_{p} \displaystyle\sum_{i=1}^{n}{|\mathbf{p}_{i_x}-\mathbf{p}_x|+|\mathbf{p}_{i_y}-\mathbf{p}_y|}

A naive approach would be to compute the distances from all points in S to all possible points in the grid, and pick the point with the smallest score. This has complexity \mathcal{O}(san), where s and a are the height and width of the grid, respectively. Since 1 \leq s,a \leq 1000, this approach would obviously not work.
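To put that in perspective, the worst case would require on the order of

s \cdot a \cdot n = 1000 \cdot 1000 \cdot 50000 = 5\times10^{10}

distance evaluations, far more than what fits in a typical time limit.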

Simplifying the definition

The summation that defines the score for a candidate \mathbf{p} can be broken down into two separate summations. This simplifies the problem and helps us to better analyse it:

\displaystyle\mathbf{c}={\arg\!\min}_{p} \sum_{i=1}^{n}{|\mathbf{p}_{i_x}-\mathbf{p}_x|} + \sum_{i=1}^{n}{|\mathbf{p}_{i_y}-\mathbf{p}_y|}

What this equation tells us is that we can solve the problem by separately optimizing each axis, since they are independent. Let’s focus on the x axis for the sake of clarity.


Each component of p belongs to S

The problem specification gives a valuable hint: it gives the impression that \mathbf{p} \in S. Although this is not entirely true, we can prove that \mathbf{p}_x = \mathbf{p}_{i_x} for some \mathbf{p}_i \in S and \mathbf{p}_y = \mathbf{p}_{j_y} for some \mathbf{p}_j \in S.

Suppose that there is a potential solution \mathbf{p} such that \mathbf{p}_x \neq \mathbf{p}_{i_x} for all \mathbf{p}_i \in S. The score on this particular dimension is given by:

\displaystyle\alpha={\!\min} \sum_{i=1}^{n}{|\mathbf{p}_{i_x}-\mathbf{p}_x|} (I)

Let k be the number of points to the left of \mathbf{p}, and l the number to its right. If we instead take the solution to be the x coordinate of the closest point on its left, the new score would be:

\displaystyle\alpha'_l=\alpha-k\Delta x_l + l\Delta x_l=\alpha-(k-l)\Delta x_l ,

where \Delta x_l is the difference between the x coordinate of the closest point to the left and \mathbf{p}_x. Here we can notice that, if k \geq l, then \alpha'_l \leq \alpha, which means the point to the left would be preferred over our hypothetical solution. Moreover, if k < l, then picking a point to the right would give us the new (and better) score

\displaystyle\alpha'_r=\alpha-l\Delta x_r + k\Delta x_r=\alpha-(l-k)\Delta x_r ,

where \Delta x_r is the difference computed by using the closest point to the right.

From this analysis, we conclude that our search space for \mathbf{p}_x is reduced to the x coordinates of all points \mathbf{p}_i \in S. The same argument can be made for \mathbf{p}_y (where its search space is the y component of all points in S).

The solution as a median

Let \mathbf{p}_x be the x value that minimizes the distances on the x axis. Its score \alpha is computed as given by (I). If we were to pick up the left neighbor of \mathbf{p}_x, the new score would be:

\alpha'_l=\alpha-k\Delta x_l + (l+1)\Delta x_l=\alpha-(k-l-1)\Delta x_l.

Since \alpha'_l>\alpha (otherwise, \mathbf{p}_x wouldn’t be a solution), we can say that:

\alpha-(k-l-1)\Delta x_l>\alpha,\\(k-l-1)\Delta x_l<0,\\k<l+1,\\k \leq l. (II)

This tells us that the number of points to the left of \mathbf{p} should not be larger than the number of points to its right. Another relation between k and l can be found by analyzing the score \alpha'_r obtained by picking the point to the right of \mathbf{p}:

\alpha'_r=\alpha-l\Delta x_r + (k+1)\Delta x_r=\alpha-(l-k-1)\Delta x_r.

In this case, \alpha'_r \geq \alpha. By exploring this relation:

\alpha-(l-k-1)\Delta x_r \geq \alpha,\\ (l-k-1)\Delta x_r \leq 0,\\ k \geq l-1. (III)

By combining (II) and (III), we have that l-1 \leq k \leq l. If the total number of points n is odd, then k+l=n-1 is even, so k and l have the same parity; this rules out k=l-1 and leaves k=l. On the other hand, if n is even, then k+l is odd, so k can’t be equal to l. Therefore, k=l-1.

In general terms, if you have a sorted collection of x coordinates in X, the coordinate \mathbf{p}_x that gives the best score \alpha is given by

\mathbf{p}_x=X[{\lfloor\frac{n-1}{2}\rfloor}],

assuming indexing that starts at 0. This is essentially the median of the collection X. Once again, the same argument also applies to the y coordinates.
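As a quick sanity check with some illustrative values: for the sorted x coordinates X=\langle 2, 3, 7, 9 \rangle (so n=4), the formula picks

\mathbf{p}_x=X[{\lfloor\frac{4-1}{2}\rfloor}]=X[1]=3,

which leaves k=1 point to its left and l=2 points to its right, satisfying k=l-1.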

Final thoughts

We conclude that, in order to solve the Lunch in Grid City problem, we have to:

  • Create two different vectors, one with all x coordinates and another with all y coordinates of the given points in S (referred to as the list of friends in the problem description);
  • Sort each vector;
  • Return \langle X[{\lfloor\frac{n-1}{2}\rfloor}], Y[{\lfloor\frac{n-1}{2}\rfloor}] \rangle .

This solution has complexity \mathcal{O}(n\log n), where the bottleneck is the sorting algorithm.

Loading a skeletal animation? Linearize it!

When I first studied the principles behind skeletal animation, I decided to implement it by interpolating between poses on the CPU and doing the skinning in the vertex shader for performance reasons. I was thrilled to see some computational weight being transferred from the CPU to the GPU, where it could be handled with heavy parallelization.

As performance was my main concern, I also tried to squeeze the most out of my CPU in the pose interpolation process by avoiding conditional logic (which, in turn, avoids branch mispredictions), interpolating quaternions with LERP instead of SLERP (the animation difference is not noticeable if you have a large number of samples), normalizing vectors with the invsqrt procedure and using function pointers to distinguish bones that require interpolation from the ones that do not — which, eventually, I realized was not such a good idea: although a procedure call may be decoded in the fetch pipeline stage, an indirect function call depends on an address that might not be available in cache, which could cause the pipeline to stall for a (very slow) memory fetch.

When I wrote my first implementation, I found out that there was a myriad of possible interpolation methods, which would have been chosen by the designer in the modelling software: linear, constant, Bezier curves and so on. It didn’t sound like a good idea to implement all those methods, as this would clutter my code with features that could fit very well in a large game engine, but not in my time-constrained demo. Also, implementing some of these techniques would defeat my purpose of having a blazing fast skeletal animation demo.

The simpler, the better

Facing this issue, I decided to approximate whatever curve the designer used to animate the model into a series of line segments, with equally spaced joints. This way, I would only have to implement a linear interpolation method, and at the same time, would be capable of loading and animating any model.


As I was using Autodesk’s FBX SDK, sampling the interpolation function was straightforward, since it provides a function to sample poses from each bone. My core sampling function (which only had to be run once, after loading the model) looks like this:

for (int i = 0; i < numSamples; ++i) {
    // Fetches keyframe
    Eigen::Map<Matrix4d> pose_double((double*)
                         evaluator->GetNodeLocalTransform
                                      (node, sampleTime));
    Matrix4f pose = pose_double.cast<float>();

    // Grabs translation
    Vector3f translation = pose.block<3,1>(0,3);

    // Grabs scale
    Vector3f scale(pose.block<3,1>(0,0).norm(),
                    pose.block<3,1>(0,1).norm(),
                    pose.block<3,1>(0,2).norm());

    // Grabs rotation
    float quat_data[4];
    Map<Quaternionf> rot_quat(quat_data);
    Map<Vector4f> rot_vec4(quat_data);
    rot_quat = Quaternionf(pose.block<3,3>(0,0));

    // Pushes key frames and their corresponding time
    translations.push_back(translation);
    scales.push_back(scale);
    attitudes.push_back(rot_vec4);

    sampleTime += skipTime;
}

Interpolating between keyframes

After transforming the animation function of all bones into a sequence of linear functions, your interpolation code can be greatly simplified. This is what my implementation looks like:

// Compute current and next keyframe indices
int current_index = int(t*framerate);
int next_index = (current_index+1)%translations.size();

// Compute interpolation factor
float alpha = t*framerate-float(current_index);

// Interpolate translation
Vector3f translation = translations[current_index]+
                       (translations[next_index]-
                        translations[current_index])*alpha;

// Interpolate rotation
Vector4f attitude = attitudes[current_index]+
                    (attitudes[next_index]-
                     attitudes[current_index])*alpha;
attitude = attitude*invsqrt(attitude[0]*attitude[0] +
                            attitude[1]*attitude[1] +
                            attitude[2]*attitude[2] +
                            attitude[3]*attitude[3]);

// Interpolate scale
Vector3f scale = scales[current_index]+
                 (scales[next_index]-
                  scales[current_index])*alpha;

// Return matrix
output << 0,0,0, translation[0],
          0,0,0, translation[1],
          0,0,0, translation[2],
          0,0,0, 1;

Matrix3f rs = Map<Quaternionf>(attitude.data()).matrix();
rs.col(0) *= scale[0];
rs.col(1) *= scale[1];
rs.col(2) *= scale[2];
output.block<3,3>(0,0) = rs;

Obviously, this ignores a large amount of concepts (some of which are FBX-specific), but none fall within the scope of this article.

Final thoughts

It turns out that loading an FBX model takes quite some time, and even more so if you linearize your animations. This may also be true for many other model formats out there, including .blend. That’s one of the reasons why it’s good to have a custom model format to ship with your games, designed in such a way that you don’t need to pre-process your model before it’s ready to be rendered.

One thing I plan to study deeper is how this linearization process plays with animation blending. I currently expect this to be straightforward, but some hardcore optimization can make it a little bit tricky.

Finally, in this post I glossed over two very important libraries, Eigen and the FBX SDK. In the future, I will be writing more detailed posts about each of them.

Superior texture filter for dynamic pixel art upscaling

This article is an extension of a previously discussed topic: how to perform large texture magnification for pixelated games without aliasing between texels. My first post described a method to achieve this by assuming that your polygons remain at approximately the same size after they are rendered.

From left to right, the knight is rendered at: close distance with 15k pixels²/texel; medium distance with 49 pixels²/texel; far distance with 9 pixels²/texel. In all frames, α=0.1.

Constant pixels with variable α

The ideal value for the α parameter will depend on how many pixels are filling each texel after the texture is rendered. To have a smooth transition between texels without blurring them under extreme magnifications, nor aliasing them under small magnifications, we can keep the number of pixels at the edge of each texel constant, regardless of how far the polygon is.


The derivative \frac{wdu}{dx} gives how many texels there are for each screen pixel, which is smaller than 1.0 when the texture is magnified. Therefore, if you want to have k pixels at the edge of your texels, your α must be k\frac{wdu}{dx}. Since we also have the v texture coordinate, the same will apply to it, which means α will be a 2D vector:

k\langle\frac{wdu}{dx},\frac{hdv}{dy}\rangle

This should give you an antialiased magnification that works independently of the polygon position, as shown in the figure below:

From left to right, the knight is rendered at: close distance with 15k pixels²/texel; medium distance with 49 pixels²/texel; far distance with 9 pixels²/texel. In all frames, k=0.7.

WebGL implementation

The derivatives with respect to screen space can be calculated by functions dFdx and dFdy, but in OpenGL ES/WebGL they require the extension GL_OES_standard_derivatives. As of Three.js r84, this extension can be activated by setting the derivatives property of your shader material object to true.

Our vertex shader remains the same as in our previous post:

varying vec2 vUv;

void main()
{
  const float w = 32.0;
  const float h = 64.0;

  vUv = uv * vec2(w, h);
  gl_Position = projectionMatrix * modelViewMatrix *
                vec4(position, 1.0 );
}

The few modifications will come in the fragment shader:

precision highp float;

varying vec2 vUv;
uniform sampler2D texSampler;

void main(void) {
  const float w = 32.0;
  const float h = 64.0;

  // I chose this k because it looked nice in
  // my demo
  vec2 alpha = 0.7 * vec2(dFdx(vUv.x),
                          dFdy(vUv.y));
  vec2 x = fract(vUv);
  vec2 x_ = clamp(0.5 / alpha * x, 0.0, 0.5) +
            clamp(0.5 / alpha * (x - 1.0) + 0.5,
                  0.0, 0.5);

  vec2 texCoord = (floor(vUv) + x_) / vec2(w, h);
  gl_FragColor = texture2D(texSampler, texCoord);
}

To see the WebGL demo in action, click here.

Final thoughts

We have presented a more general approach to the manual texture filter previously discussed. Despite its advantage of automatically detecting the best α for the polygon being rendered, it depends on an OpenGL ES extension that might not be available on certain devices. Add to that the performance impact of the dFdx/dFdy functions, and the previous method will certainly be preferred if your polygons do not change in size during your game.

It’s also important to notice that, under small magnifications, the texture can look a bit blurry depending on the value k that was chosen.

Manual texture filtering for pixelated games in WebGL

Left: Linear filtering; middle: nearest neighbor sampling; right: custom texture filter. The knight is part of the Wesnoth Frankenpack. Click here to see texture filter demo in action.

If you’re writing a pixelated game that performs large magnification of textures, you’re probably using the nearest texture filter, so your game will look like the knight in the middle instead of the one on the left.

The problem with nearest texel sampling is that it’s susceptible to aliasing if the texels are not aligned with the screen pixels, which can happen if you apply transformations such as rotation and shearing to your textured polygons. Ideally, we would like to have smooth transitions between neighboring texels in the final image, like the knight on the right in the image above.

Manual texture filtering

One way to achieve this result is by performing linear interpolation between texels on the edges of each texel in the fragment shader, but sampling the nearest texel everywhere else. A simple way to achieve this is by activating WebGL’s linear filtering, and playing with UV coordinates so that the graphics card will perform the actual interpolation between texels for you.


We know that the texture coordinates t’ in a nearest filter can be calculated by:

\textbf{t}'=\frac{\left \lfloor{\langle w,h\rangle\textbf{t}}\right \rfloor+\langle0.5,0.5\rangle}{\langle w,h\rangle},

where w and h are the texture width and height, respectively. The offset makes our fragment shader sample at the center of each texel, which is important since we have enabled linear filtering. In order to have smooth transitions between texels, this offset should be replaced by a function that increases linearly at the margin of the texel, remains constant at its “blocky” region (with a value of 0.5) and then increases to 1.0 on the opposite margin of the texel, like this:

[Plot: the offset as a function of the position inside the texel]

By doing this with the UV coordinates, the video card will automatically interpolate your texels whenever they are sampled at their margins, effectively producing an anti-aliased blocky effect like the knight shown above.

Closed formula for the offset function

The offset function displayed in the plot above could easily be implemented with a bunch of conditional logic, but I personally steer clear of conditional statements in GLSL programs for performance and legibility reasons. Having said that, our offset function can be formulated as the sum of two clamped linear functions, illustrated below:

[Plot: the offset function as the sum of two clamped linear functions]

Here, x is the fractional part of the texture coordinate u after it is scaled from [0, 1] to [0, w]. That is x=\text{fract}(uw). The same logic also applies to the texture coordinate v, which leads to the following formula:

\textbf{x}_{uv}=\text{fract}(\textbf{t}\langle w,h\rangle)

Meaning of the α parameter

The value of α determines how smooth the transition between texels will be, and it must be in the range ]0, 0.5[. As α approaches 0, the transition between texels becomes crisp, since there is almost no room for linear interpolation — that is, the final result approaches that of the nearest filter. For α=0.5, every coordinate inside the texels will be subject to linear interpolation, which is equivalent to just using a linear filter. The ideal value for α really depends on how stretched your textures will be: the larger the stretching, the smaller your α should be.

Ideally, your program should automatically determine the best value of α given the depth of the fragment and your camera parameters (including the canvas size), but that’s something I’ll talk about in the future.

Putting it all together

The final equation that gives us the uv coordinates that smooth a magnified, pixelated texture is as follows:

\textbf{t}'=\frac{\left \lfloor{\langle w,h\rangle\textbf{t}}\right \rfloor+\textbf{x}'_{uv}}{\langle w,h\rangle},

where \textbf{x}'_{uv}=\text{clamp}(\frac{\textbf{x}_{uv}}{2\alpha},0,0.5)+\text{clamp}(\frac{\textbf{x}_{uv}-\langle 1, 1 \rangle}{2\alpha}+\langle 0.5, 0.5 \rangle,0,0.5), which matches the fragment shader below.

The term \langle w,h\rangle\textbf{t} can be computed in the vertex shader. If you’re on OpenGL, you could also try to use the flat modifier to disable interpolating it, and see if that gives any performance boost. In WebGL GLSL, the vertex and fragment shaders for our filter are as follows:

varying vec2 vUv;

void main()
{
  const float w = 32.0;
  const float h = 64.0;

  vUv = uv * vec2(w, h);
  gl_Position = projectionMatrix * modelViewMatrix *
                vec4(position, 1.0 );
}

precision highp float;

varying vec2 vUv;
uniform sampler2D texSampler;

void main(void) {
  const float w = 32.0;
  const float h = 64.0;

  // I chose this alpha because it looked nice in
  // my demo
  vec2 alpha = vec2(0.07);
  vec2 x = fract(vUv);
  vec2 x_ = clamp(0.5 / alpha * x, 0.0, 0.5) +
            clamp(0.5 / alpha * (x - 1.0) + 0.5,
                  0.0, 0.5);

  vec2 texCoord = (floor(vUv) + x_) / vec2(w, h);
  gl_FragColor = texture2D(texSampler, texCoord);
}

Notice that some attributes and uniforms are not declared in my code. That’s because I used THREE.JS to make my demo.

Final thoughts

The discussed method is indicated if your scene matches the following two criteria:

  • You perform large magnification of your texture: The knight texture used in the header of this article has 32×64 pixels, but was scaled up to cover a rectangle of size 264×413 pixels (before the rotation was applied). That’s the equivalent of taking each texel and making it more than 8 times bigger. In cases like this, linear filtering will just make the texture blurry, and nearest filtering might introduce unwanted aliasing.
  • Your objects undergo rotation or shearing: There’s a reason why I rotated the knight shown in the header of this article: if the texels were aligned with the screen pixels, then there would be no aliasing at all and a nearest filter would suffice for my purpose.

Update

This discussion has been extended here, where I talk about a way to automatically compute the best α independently of the polygon position.

Custom shaders with Three.JS: Uniforms, textures and lighting

If you’re familiar with WebGL and GLSL programming and have started using three.js, you’ll eventually run into a situation where you want to code your own shader, but at the same time use the resources that this library provides you with. In this post, I’ll show you how to set up a custom shader with a three.js geometry, pass it your own uniforms, bind a texture to a particular uniform and receive all the lights that you’ve added to the scene.

Source code for demo

The source code below has been tested with three.js r84 and can be visualized here.

<html>
<head>
<script src="three.min.js"></script>
<script src="render.js"></script>
<script id="vertShader" type="shader">
varying vec2 vUv;
varying vec3 vecPos;
varying vec3 vecNormal;
 
void main() {
  vUv = uv;
  // Since the light is in camera coordinates,
  // I'll need the vertex position in camera coords too
  vecPos = (modelViewMatrix * vec4(position, 1.0)).xyz;
  // That's NOT exactly how you should transform your
  // normals but this will work fine, since my model
  // matrix is pretty basic
  vecNormal = (modelViewMatrix * vec4(normal, 0.0)).xyz;
  gl_Position = projectionMatrix *
                vec4(vecPos, 1.0);
}
</script>
<script id="fragShader" type="shader">
precision highp float;
 
varying vec2 vUv;
varying vec3 vecPos;
varying vec3 vecNormal;
 
uniform float lightIntensity;
uniform sampler2D textureSampler;

struct PointLight {
  vec3 color;
  vec3 position; // light position, in camera coordinates
  float distance; // used for attenuation purposes. Since
                  // we're writing our own shader, it can
                  // really be anything we want (as long as
                  // we assign it to our light in its
                  // "distance" field
};

uniform PointLight pointLights[NUM_POINT_LIGHTS];
 
void main(void) {
  // Pretty basic lambertian lighting...
  vec4 addedLights = vec4(0.0,
                          0.0,
                          0.0,
                          1.0);
  for(int l = 0; l < NUM_POINT_LIGHTS; l++) {
      vec3 lightDirection = normalize(vecPos
                            - pointLights[l].position);
      addedLights.rgb += clamp(dot(-lightDirection,
                               vecNormal), 0.0, 1.0)
                         * pointLights[l].color
                         * lightIntensity;
  }
  gl_FragColor = texture2D(textureSampler, vUv)
                 * addedLights;
}
</script>
</head>
<body style="margin: 0px;" onload="init()"></body>
</html>

And this is the source for the render.js file:

// standard global variables
var scene, camera, renderer, textureLoader, light;

// Character 3d object
var character = null;

// FUNCTIONS
function init() {
  // SCENE
  scene = new THREE.Scene();
  textureLoader = new THREE.TextureLoader();

  // CAMERA
  var SCREEN_WIDTH = window.innerWidth;
  var SCREEN_HEIGHT = window.innerHeight;
  var VIEW_ANGLE = 45;
  var ASPECT = SCREEN_WIDTH / SCREEN_HEIGHT;
  var NEAR = 0.1;
  var FAR = 1000;
  camera = new THREE.PerspectiveCamera(VIEW_ANGLE, ASPECT,
                                       NEAR, FAR);
  scene.add(camera);
  camera.position.set(0,0,5);
  camera.lookAt(scene.position);

  // RENDERER
  renderer = new THREE.WebGLRenderer({
    antialias:true,
    alpha: true
  });
  renderer.setSize(SCREEN_WIDTH, SCREEN_HEIGHT);
  var container = document.body;
  container.appendChild( renderer.domElement );

  // Create light
  light = new THREE.PointLight(0xffffff, 1.0);
  // We want it to be very close to our character
  light.position.set(0.0, 0.0, 0.1);
  scene.add(light);

  // Create character
  character = buildCharacter();
  scene.add(character);

  // Start animation
  animate();
}

var buildCharacter = (function() {
  var _geo = null;

  // Share the same geometry across all planar objects
  function getPlaneGeometry() {
    if(_geo == null) {
      _geo = new THREE.PlaneGeometry(1.0, 1.0);
    }

    return _geo;
  };

  return function() {
    var g = getPlaneGeometry();
    var creatureImage = textureLoader.load('texture.png');
    creatureImage.magFilter = THREE.NearestFilter;

    var mat = new THREE.ShaderMaterial({
        uniforms: THREE.UniformsUtils.merge([
            THREE.UniformsLib['lights'],
            {
                lightIntensity: {type: 'f', value: 1.0},
                textureSampler: {type: 't', value: null}
            }
        ]),
        vertexShader: document.
                      getElementById('vertShader').text,
        fragmentShader: document.
                        getElementById('fragShader').text,
        transparent: true,
        lights: true
    });
    // THREE.UniformsUtils.merge() call THREE.clone() on
    // each uniform. We don't want our texture to be
    // duplicated, so I assign it to the uniform value
    // right here.
    mat.uniforms.textureSampler.value = creatureImage;

    var obj = new THREE.Mesh(g, mat);

    return obj;
  }
})();

function animate() {
  // Update light profile
  var timestampNow = new Date().getTime()/1000.0;
  var lightIntensity = 0.75 +
                       0.25 * Math.cos(timestampNow *
                                       Math.PI);

  character.material.uniforms
           .lightIntensity.value = lightIntensity;
  light.color.setHSL(lightIntensity, 1.0, 0.5);

  // Render scene
  renderer.render(scene, camera);
  requestAnimationFrame(animate);
}

There’s nothing special about the init() function: it sets up the WebGL renderer, creates the scene, the camera and a point light, and adds an object to the scene. The function getPlaneGeometry() just instantiates a THREE.PlaneGeometry, which contains vertices, normals and texture coordinates. The anonymous function returned by buildCharacter() creates our mesh object. In three.js, a mesh is a “renderable” type formed by combining a geometry (from which three.js will create a vertex buffer object) and a material (which is basically a shader with metadata, such as uniforms). Here, we create a ShaderMaterial, which allows us to specify our custom vertex and fragment shaders. The “uniforms” parameter takes an object with the format:

UNIFORM_NAME: {
  type: UNIFORM_TYPE,
  value: UNIFORM_VALUE
}

For a guide on accepted formats, consult here. In this example, I created a float uniform named lightIntensity and a texture sampler uniform named, well, textureSampler.

Finally, in the animate function I update the lightIntensity uniform with a simple sinusoidal function (and the light color along with it). Three.js will automatically send the current value of each uniform to the GPU every time you call render.

Vertex shader and implicit attributes and uniforms

If you paid close attention to the vertex shader code, you’ve probably noticed that the declaration for the “position” attribute seems to be missing. The same also applies to the projectionMatrix and modelViewMatrix. In fact, three.js modifies the vertex shader given to it by appending the declaration of several attributes and uniforms. To understand the fields that are automatically created by three.js, refer to the WebGLProgram documentation.

Texture Sampler Uniform

In three.js, texture sampler uniforms have a type "t" and must be assigned to THREE.Texture objects. If your texture is an image, there’s a helper function that simplifies this task:

// It accepts other formats too
var loader = new THREE.TextureLoader();
var texture = loader.load("path/to/image.png");
We will draw this guy inside our polygon. Kudos to Stephen “Redshrike” Challener and William.Thompsonj

After choosing the texture image (remember to use an image whose dimensions are powers of two), our shader material is initialized in the mat variable.

Here I’ve loaded the image file and created a THREE.Texture object. I also set the magnification filter to nearest (I want my texture to be pixelated as I scale it up). More information on the filter constants used by three.js can be found on the texture constants documentation. You can also see the fields for the THREE.Texture object here.

Another important property introduced is the transparent flag, since my texture has some regions with alpha=0. By setting the transparent flag to true, three.js will automatically call gl.enable(GL_BLEND) when this object is about to be drawn. It also defers the rendering of transparent objects to after opaque objects are drawn. Also, three.js draws these objects ordered from the farthest to the closest ones.

Now that we’ve seen how to create a shader material, let’s take a look at the shader code itself.

Dealing with lights

In order to deal with light objects in your shader, you have to manually setup the required uniforms both on the material object and in your shader code.

Setting up the material

The first thing you have to do is to enable the lights flag in the material object by setting this field to true.

You also have to include the required light uniforms in the uniform list of your material object. Since three.js already has a public field with all light uniforms (and there are a lot of them) in THREE.UniformsLib['lights'], you can merge them into your uniforms object with the function THREE.UniformsUtils.merge:

    var mat = new THREE.ShaderMaterial({
        uniforms: THREE.UniformsUtils.merge([
            THREE.UniformsLib['lights'],
            {
                lightIntensity: {type: 'f', value: 1.0},
                textureSampler: {type: 't', value: null}
            }
        ]),
        vertexShader: document.
                      getElementById('vertShader').text,
        fragmentShader: document.
                        getElementById('fragShader').text,
        transparent: true,
        lights: true
    });
    // THREE.UniformsUtils.merge() call THREE.clone() on
    // each uniform. We don't want our texture to be
    // duplicated, so I assign it to the uniform value
    // right here.
    mat.uniforms.textureSampler.value = creatureImage;

Create some lights

In order to test our demo, we create a point light right in front of our object. This must be done before the character creation, making sure that THREE.UniformsLib['lights'] contains the required light objects.

  light = new THREE.PointLight(0xffffff, 1.0);
  // We want it to be very close to our character
  light.position.set(0.0, 0.0, 0.1);
  scene.add(light);

Setting up the shader

In our demo, three.js automatically sends all your lights to our light uniforms. Since we are using a point light, we’ll need to declare the following uniform:

struct PointLight {
  vec3 color;
  vec3 position; // light position, in camera coordinates
  float distance; // used for attenuation purposes. Since
                  // we're writing our own shader, it can
                  // really be anything we want (as long
                  // as we assign it to our light in its
                  // "distance" field
};

uniform PointLight pointLights[NUM_POINT_LIGHTS];

Notice that the PointLight structure is in (a partial version of) the format expected by three.js to provide the data relative to point lights to the shader program. In the UniformsLib documentation you can see a list of all such uniforms provided by THREE.UniformsLib. You can also see their shader implementation here. Notice that the constant NUM_POINT_LIGHTS is automatically created by three.js, so there’s no need to worry about it.

And that’s it! This is what the final character looks like:

Whoa! Now you look like a badass, sir!

Adding lights at any moment (aka adding lights at runtime)

So you want to add a light after your shader programs were compiled and are running (for instance, you may have a candle being lit in the middle of the gameplay). Two steps are required. First, add the light to the scene, as explained previously. Then, for each material (make sure you have access to them), set its needsUpdate flag to true. That should get your shader recompiled to take into account the new number of lights.

var light = new THREE.PointLight(0xffffff, 1.0);
light.position.set(0.0,0.0,0.1);
scene.add(light);
material.needsUpdate = true;

If you have access to your character (and any objects that contain a material), you can access its material in the “material” field:

character.material.needsUpdate = true;

Notice that you will have to do that for all materials in your scene (or at least those you think will be affected). Also, you only have to trigger shader recompilation if the number of lights of a given type changes, since that is what changes the NUM_*_LIGHTS constants. Keep in mind that, if you add a point light and remove a spotlight, you will still need to update your materials, as both the number of spotlights and the number of point lights have changed.

Final thoughts

This is a pretty basic example that shows how to setup a custom shader with textures and three.js lights. Although the final effect could have been easily achieved without touching any shader, I hope this will prove useful to someone willing to write a shader that’s not available in the standard three.js material list.

Considering the fact that most cool effects are achievable by multipass rendering, I’ll eventually extend this post in the future to cover that subject as well.