Complex Polygon Rendering using OpenCL

Chris Benesch
9 min readDec 8, 2019

For those of us who have to deal with GIS data, or other complex polygons we are used to seeing algorithms even in the best geometry tools that have N² performance. We twist and contort our code to get every last bit of CPU resources out. Once we get it computing properly then we have to wait again for everything to render. OpenCL has been a great tool for parallel computing, but its non-deterministic requirement makes such calculations unfeasible. I wish to share with you an approach that I recently used that gave me very good results, using both outlines and filled polygons. The latter part in OpenCL I will share code with as it is non-intuitive, and probably why you landed here in the first place.

The first thing I did was to load shape files that had the data I wished to display. Let’s say worldwide land coverage. I used the GDAL library to load and transform them into a WGS-84 format. This may seem pointless now, but the end result of this whole project is to handle display of shapefiles that could come from anywhere and in any supported projection, ie. Albers, etc.. Getting them into standard lat/lon coordinates is a good starting point. I loaded and created new persistent objects that were a geometry collection of one polygon per collection object. Sometimes the shapefiles might come in with multiple polygons per feature broken up by political regions, but for us we don’t care about that. When a request comes in, it has a map projection type and its boundaries. We take that and use it to create a boundary. A boundary is a very simple polygon that represents your eventual screen extents, ie -140 to -90 longitude by 50 to 20 latitude.

The next step is clipping. We don’t want to process the worlds points completely, just the ones we are interested in. The OGRGeometry object has a convenient Intersection() function that we can use with our boundary polygon to clip the polygon in question to the appropriate size. The drawback is this function is at least N*N in performance based on the number of points. To minimize that, first we get the “Envelope” of the boundary polygon and the polygon in question from the OGRGeometryCollection. We can then compare the two objects and see if they overlap, which will be much, much faster since its just literally a handful of comparison statements. If it overlaps, it gets thrown onto the pile to process. I am using QT so I can take advantage of the QThreadPool object and it handles making sure I don’t create more threads than the system has resources for. Outside of this process, I have a QAtomicInt which I increment each time I add an object (polygon to clip) to the thread pool and the job itself decrements the variable. The output is then added to a list of polygons guarded by a mutex. After sifting through our master OGRGeometryCollection we just wait for it to be done in a loop like the following:

while(atom_int > 0)
{
qApp->processEvents();
}

The thinking is that since the function is N*N with regard to point count, trying to do them all in one go will give me

(E(Npts)*Npolys)^2

I used epsilon notation to show the arithmetic mean shorthand and distinguish it from regular parentheses. As you can guess that would get very expensive, and it was. For a standard cut of the continental US it took about 2 seconds to sift through a list of 3000 polygons with 320,000 points (low resolution data), and 75 seconds for high resolution data. That’s when I tried using the CPU parallel processing and found my time reduced dramatically. 12ms for low resolution and 500ms for high resolution. For reference these times are on an Acer G73 gaming laptop with 12 cores @ 4GHz and a GT-1070 GPU

The next step was aesthetic and computationally accelerating. From those polygons, I wanted no two points to be less than 3 pixels away from each other. Hence was born the ShapePoint structure:

struct ShapePoint {
unsigned char closed;
unsigned int shapenum;
unsigned int pointnum;
float lon;
float lat;
float screenX;
float screenY;
};

The fields are pretty self explanatory, except that polygon holes are considered separate polygons. We still don’t know screen x and y coordinates. After filling an array with them, we call our first OpenCL function. lltoxy_<projection type>. You can call it with the size of the array in elements as a one dimensional work unit, and the purpose of the function is to based off the projection information and the point, to fill in the screenX and screenY coordinates. It can be argued that with a sufficiently small number of points that this could be more expensive to set up and execute than it gains in speed, but for the vast majority of cases, it is a time saver. I also use 32 bit objects as that is what the GPU is designed to do. 4 byte/pixel images and 4 byte boundary memory access, and 4 byte float operations are much faster than 64 bit float math. Unless there is a good reason not to, following these guidelines will yield the fastest possible GPU code.

Next we take said array which now has screenX and screenY filled in and do good ol pythagoras theorem on it to get our distance. We start at a new shapenum boundary and go point by point out until the difference of x squared plus the difference of y squared is greater than our pixel density squared, in this case 9. We use that to create an array of segments.

typedef struct Segment {
unsigned short x1;
unsigned short y1;
unsigned short x2;
unsigned short y2;
} Segment;

If the ShapePoint.closed field is > 0 then make sure and add a last segment to the starting point. If the ending polygon has > 4 segments, add it to the list, otherwise its screen clutter. This yields a clean looking display.

Next up comes the rendering. For this, I created a buffer of the size of the image (width * height) times 4. Then we can create an ImageHandle struct.

typedef struct ImageHandle {
volatile __global unsigned int *imgMemory;
unsigned short w;
unsigned short h;
} ImageHandle;

And for drawing lines, the convenience function

void plotPoint(ImageHandle *h,unsigned x,unsigned y)
{
unsigned offset;
unsigned char dp,dv;
if (x > h->w || y > h->h)
return;
offset = (y*h->w)+x;
h->imgMemory[offset]++;
}

The rest for lines is pretty straightforward.

Now the hard part was doing filled polygons. I tried floodfill, I tried using QT’s polygon engine to render after the fact in the CPU, and none of them worked real well. Land data is simply too complex to handle the traditional way. We have three known factors:

1 . Our polygons are closed and have boundary lines at the edges of the screen

2 . No two points in a polygon will share the same x and y coordinate with the exception of the first and last.

3 . Because of 1 and 2 the ray tracing algorithm would work with this scenario, in this case x -> width

So, lets begin our function:

__kernel void floodfill(__global struct ProjectionInfoData *pi,volatile __global unsigned int *imgdata,__global Segment *segments,unsigned int n_segs)
{
const unsigned x = get_global_id(0);
const unsigned y = get_global_id(1);
ImageHandle h;
h.imgMemory = imgdata;
h.w = pi->pixelWidth;
h.h = pi->pixelHeight;

unsigned int n_crs = 0;

Dont worry about the ProjectionInfoData struct, all we need to know now is the width and height. We started this process with a two dimensional work object. First off is obviously the creation of the ImageHandle struct, and initializing the variable n_crs. The point of this whole function is to determine if n_crs is odd or even and if odd, plot the point at the x and y value given. Like any good programmer dealing with speed as a top priority, Lets find obvious escape conditions first. We begin our loop looking through the pointer to segments of size n_segs.

for (int i = 0; i < n_segs; i++)
{

/* get the easy ones out of the way first, these could lead to an early return */

/* handle horizontal segments */
if (segments[i].y1 == segments[i].y2 &&
segments[i].y1 == y && (
(x <= segments[i].x1 && x >= segments[i].x2) ||
(x <= segments[i].x2 && x >= segments[i].x1)))
{
plotPoint(&h,x,y);
return;
}

/* handle equals segment end points */
if ((x == segments[i].x1 && y == segments[i].y1) ||
(x == segments[i].x2 && y == segments[i].y2))
{
plotPoint(&h,x,y);
return;
}

We obviously want to plot points ON the line. Next we can skip ones we dont care about:

/* Exclude ones that wont qualify right off the bat */

/* Too far left */

if (segments[i].x1 < x && segments[i].x2 < x)
continue;

/* Too high */

if (segments[i].y1 < y && segments[i].y2 < y)
continue;

/* Too low */

if (segments[i].y1 > y && segments[i].y2 > y)
continue;

Now, an interesting condition arose during the development of this, when you are at the y value of a vertex, since two segments both have the y value in question, you get counted as two crossings. So in our case, we chose to only count downward facing segments as a crossing if we are on a vertex

/* We know it crosses in y, if its off to the right, just add crossing count */

if (segments[i].x1 > x && segments[i].x2 > x)
{
/* if its at the vertex of two segments discard */
if (segments[i].y1 == y || segments[i].y2 == y)
{
/* so only count ones going downwards */
if ((segments[i].y1 > y && segments[i].y2 == y) || (segments[i].y1 == y && segments[i].y2 > y))
{
n_crs++;
}
} else {
n_crs++;
}
continue;
}

Now, we got the easy ones out of the way, now we have to handle when we are inside the line domain by x values and see where in y we cross it. First off, the easy ones, and also eliminate that pesky vertex problem from above:

/* watch out for vertexes again */

if (segments[i].y1 == y || segments[i].y2 == y)
{
if (segments[i].y1 > y && segments[i].y2 == y && segments[i].x2 > x)
{
n_crs++;
}

if (segments[i].y2 > y && segments[i].y1 == y && segments[i].x1 > x)
{
n_crs++;
}
continue;
}

Now, to determine our x crossing value, we needed to do a bit of math. Given our points x1,y1,x2,y2, and our y value:

That gives us our x_intersection function:

unsigned short x_intersection(__global Segment *s,unsigned short y)
{
if (s->y1 == s->y2 && y == s->y1)
return s->y1;
if (s->y1 == y)
return s->x1;
if (s->y2 == y)
return s->x2;
float v = s->x2 — s->x1;
v *= y-s->y1;
v /= s->y2 — s->y1;
v += s->x1;
unsigned short rv = v;
return rv;
}

A little sanity checking above to make sure we arent being passed junk, and continuing on with our original function:

unsigned short xintsct = x_intersection(&segments[i],y);
/* If we are on the line, plot */
if (xintsct == x)
{
plotPoint(&h,x,y);
return;
}

/* count it as a crossing if its to the right only */
if (xintsct > x)
n_crs++;

And that concludes our loop for each segment. At the end, we finally do:

if (n_crs & 1) // Odd number of crossings = inside poly

plotPoint(&h,x,y);

It seems like a lot of work, but it runs surprisingly fast. This whole process from request to image return was less than 0.1 seconds for low res data and 1.2 for high res.

Here is low res CONUS with the line version in dark blue, the filled polygons tan, and the country borders in black overlaid. This whole image set took just under a second to generate:

And here is high res data around the island off the coast of Virginia, same colors, about 2.5 seconds from request to image return.

It took me hours and hours of fruitless google searching and weeks of banging my head against the wall to get good performance out of this, so hopefully this can help someone. Please feel free to reach out with questions or comments, or better yet, clap for this article and follow me :-)

Cheers,

Chris Benesch

--

--

Chris Benesch

Professional software engineer. Math enthusiast. Entrepreneur at heart. http://www.beneschtech.com