Collision Detection in 2D
An explanation of the Separating Axis Theorem
If you have ever played a video game, you have relied on algorithms that solve the hard problem of collision detection, probably without even knowing about it. The solutions are often ingenious and beautifully answer the question "what do we need all this math for?". This post is a deep dive into one of those solutions: an algorithm based on the Separating Axis Theorem (SAT).
In all the demos of this article, you can drag around the polygons (and their points), as well as rotate polygons around their centers by pressing the A or D keys.
What you see in the demo above is based on the SAT (Separating Axis Theorem) algorithm. We'll derive this idea and how it can be used for collision detection, with mathematics and in code, step by step.
The main idea behind it is rather simple.
If you can draw a line between two shapes, without touching either of them, then the two shapes do not collide.
If you can find such a line, you know for sure that the two shapes are separated from each other by (at least) that line. That's why this line can be called a "separating axis", and that's exactly where the name of the theorem comes from.
But how does the computer search for the lines that separate two shapes? It's using shadows.
If you shine a light onto an object from very, very far away and track where the shadow falls on a wall, you get what is mathematically known as a "projection". Projections are a useful concept in linear algebra and we need them for the SAT algorithm.
For the next section you should be comfortable with some basic vector concepts. If you aren't, here's a summary of the concepts you will need.
There is a mathematical expression for how to calculate where that projection (shadow) of a point (object) would fall on a line (wall).
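Given the points $p$, $a$, and $b$, where $p$ is the point we want to project and $a$ and $b$ are the two points defining the line $\overline{ab}$, we can find the projection of $p$ onto the line like so:

$$\operatorname{proj}_{\overline{ab}}(p) = a + \frac{(p - a) \cdot (b - a)}{(b - a) \cdot (b - a)} \, (b - a)$$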
The same formula can be expressed in code:
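Here is a minimal sketch of what such a function could look like. The `Vec2` type and the helper names `sub`, `dot`, and `projectPointOntoLine` are assumptions made for illustration, not necessarily what the demos use.

```typescript
// Hypothetical 2D vector type; all names in this sketch are illustrative.
type Vec2 = { x: number; y: number };

const sub = (u: Vec2, v: Vec2): Vec2 => ({ x: u.x - v.x, y: u.y - v.y });
const dot = (u: Vec2, v: Vec2): number => u.x * v.x + u.y * v.y;

// Projects the point p onto the infinite line through a and b.
function projectPointOntoLine(p: Vec2, a: Vec2, b: Vec2): Vec2 {
  const ab = sub(b, a); // direction of the line
  const ap = sub(p, a); // arrow from a to p
  const lengthSquared = dot(ab, ab);
  if (lengthSquared === 0) {
    // a and b coincide, so they do not define a line; fall back to a.
    return { x: a.x, y: a.y };
  }
  const t = dot(ap, ab) / lengthSquared; // how far along ab the shadow falls
  return { x: a.x + t * ab.x, y: a.y + t * ab.y }; // shift back onto the line
}
```

Dividing by the squared length of the direction means the line's direction does not have to be normalized first.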
Drawing this out onto the canvas yields a line going through $a$ and $b$ and an arrow going from $a$ to $p$, which casts a shadow on the line from $a$ to the projected point.
The same can be done for any two vectors. For a line, we needed two points to get the direction of the line, and we shifted the result onto the line by adding the point a. For just two vectors, the whole idea can be expressed more compactly, and it is a useful operation to have in general.
The same idea in TypeScript:
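A minimal sketch, assuming a plain `Vec2` object type. The name `projectVector` is hypothetical, and returning the zero vector when projecting onto a zero-length vector is a choice made for this sketch.

```typescript
// Hypothetical 2D vector type; names are illustrative.
type Vec2 = { x: number; y: number };

const dot = (u: Vec2, v: Vec2): number => u.x * v.x + u.y * v.y;

// Projects vector a onto vector b: the part of a that points along b.
function projectVector(a: Vec2, b: Vec2): Vec2 {
  const bLengthSquared = dot(b, b);
  if (bLengthSquared === 0) {
    // There is no direction to project onto.
    return { x: 0, y: 0 };
  }
  const scale = dot(a, b) / bLengthSquared;
  return { x: scale * b.x, y: scale * b.y };
}
```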
There is also the idea of a scalar projection. A scalar is just a number, and the result of a scalar projection is just a number. A scalar projection calculates the (signed) length of the projection of the vector $a$ in the direction of the vector $b$. It can be computed like so:

$$s = a \cdot \hat{b} = \frac{a \cdot b}{\lVert b \rVert}$$

Notice the little hat on the b: it means that this is a unit vector.
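As a small sketch (the function name `scalarProjection` and the `Vec2` type are assumptions for illustration), the scalar projection is the dot product with the normalized direction:

```typescript
// Hypothetical 2D vector type; names are illustrative.
type Vec2 = { x: number; y: number };

const dot = (u: Vec2, v: Vec2): number => u.x * v.x + u.y * v.y;

// Signed length of the shadow that a casts in the direction of b.
function scalarProjection(a: Vec2, b: Vec2): number {
  const bLength = Math.hypot(b.x, b.y);
  if (bLength === 0) {
    return 0; // no direction to measure along (a choice made for this sketch)
  }
  return dot(a, b) / bLength; // same as dot(a, unit vector of b)
}
```

The result is negative when a points away from b, which is what later makes it possible to order points along a line.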
Projecting a Whole Polygon
In the SAT Algorithm the projections of the polygons determine whether or not a single ray of light could shine through in between them. If there is such a ray of light, the two can't collide. Otherwise, how could the light get through?
The ray of light is the separating axis.
So the goal is easier now: For the two polygons we only need to find two shadows (projections) that don't overlap to determine that they don't collide.
However, to get there, we first need to be able to project a whole polygon onto a line. Casting the shadow for a whole object, not just for a single point...
Luckily, a polygon is just a set of points. Usually, the points of a polygon are called vertices in computer graphics. If we were to project all the vertices of a polygon we would still have to figure out which points are the "left" and "right" edge of the shadow.
There are multiple approaches to do this. One that I find particularly fascinating, and the one used in the demos below, is a support function.
Essentially a support function finds a point among a set of points (here the vertices of the polygon we want to project), that is furthest along in a particular direction.
If we run the support function in the two opposite directions of the line we want to project onto, we get exactly two points.
We can then project these two support points onto the line, which gives us the "left" and "right" edge of the shadow.
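A support function can be sketched in a few lines. The names `support` and `shadowEndpoints` are hypothetical, and a polygon is assumed to be a plain array of vertices.

```typescript
// Hypothetical 2D vector type; names are illustrative.
type Vec2 = { x: number; y: number };

const dot = (u: Vec2, v: Vec2): number => u.x * v.x + u.y * v.y;

// Returns the vertex that lies furthest along the given direction.
function support(vertices: Vec2[], direction: Vec2): Vec2 {
  if (vertices.length === 0) {
    throw new Error("a polygon needs at least one vertex");
  }
  let best = vertices[0];
  let bestDot = dot(best, direction);
  for (const v of vertices) {
    const d = dot(v, direction);
    if (d > bestDot) {
      best = v;
      bestDot = d;
    }
  }
  return best;
}

// The two vertices whose projections form the ends of the shadow
// on a line with direction `axis`.
function shadowEndpoints(vertices: Vec2[], axis: Vec2): [Vec2, Vec2] {
  const opposite = { x: -axis.x, y: -axis.y };
  return [support(vertices, opposite), support(vertices, axis)];
}
```

Only the dot product is compared, so the direction does not need to be a unit vector.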
Projection for 2 Polygons
All of this also works for two polygons at once, of course! The question we would like to answer, given the two shadows, is whether or not they overlap. Since the shadows are two line segments that we know to be collinear, we can use a simple test like the one below to check whether or not they overlap.
There are also versions of this test that use the dot product to determine how far along the line the points are, and then compare the minimum and maximum of each shadow against each other to determine whether or not they overlap. Which algorithm we use doesn't matter much in the end; the important thing is that we can test whether the two shadows overlap. In the example below, if the shadows overlap we change the color to red.
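Here is a sketch of the dot-product variant. The names `Interval`, `projectPolygon`, and `intervalsOverlap` are assumptions for illustration, and treating shadows that merely touch as overlapping is a choice made for this sketch.

```typescript
// Hypothetical types; names are illustrative.
type Vec2 = { x: number; y: number };
type Interval = { min: number; max: number };

const dot = (u: Vec2, v: Vec2): number => u.x * v.x + u.y * v.y;

// The shadow of a polygon on an axis, measured as distances along the axis.
function projectPolygon(vertices: Vec2[], axis: Vec2): Interval {
  let min = Infinity;
  let max = -Infinity;
  for (const v of vertices) {
    const d = dot(v, axis);
    min = Math.min(min, d);
    max = Math.max(max, d);
  }
  return { min, max };
}

// Two intervals overlap if each one starts before the other one ends.
function intervalsOverlap(a: Interval, b: Interval): boolean {
  return a.min <= b.max && b.min <= a.max;
}
```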
The Last Idea
How should we pick the lines that we project onto? Isn't there an infinity of possible lines we could choose from?
Luckily for us, there is one last insight that makes the SAT algorithm work: we only have to check a single axis for every edge of the polygons we want to test for collisions. Namely, we need to check each of the normals of the edges and see if the shadows projected onto that normal overlap. You can see that in the example below, where we are slowly cycling through the different normals, one by one!
The only thing we are interested in from the normal is its direction, so we move it to the origin and draw a line through the origin in that direction. Usually, normals would be drawn directly on the edge of the polygon.
The edge to which the normal belongs is colored yellow in the example. You can see that the line drawn through the origin is always perpendicular to the yellow edge, which has to be the case because its direction is taken from the normal.
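Computing the normals is short: a vector (x, y) rotated by 90° is (y, -x). A minimal sketch, assuming the polygon is an array of vertices in order around the shape (the name `edgeNormals` is hypothetical):

```typescript
// Hypothetical 2D vector type; names are illustrative.
type Vec2 = { x: number; y: number };

// Returns one perpendicular vector per edge. The normals are not
// normalized, because only their direction matters here.
function edgeNormals(vertices: Vec2[]): Vec2[] {
  return vertices.map((v, i) => {
    const next = vertices[(i + 1) % vertices.length]; // wrap around to close the shape
    const edge = { x: next.x - v.x, y: next.y - v.y };
    return { x: edge.y, y: -edge.x }; // the edge rotated by 90 degrees
  });
}
```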
In the above example, if you make the shapes overlap, what do you notice about their shadows?
In the demo above, we stop for a second at each step just for visualization purposes. We could just check all of the normals "at once". If all shadows overlap (and show up in red), then the shapes also have to overlap. They are colliding. That's the whole meaning behind the separating axis theorem.
To sum up the algorithm:
- Cast shadows onto the normals of two polygons
- Check if the shadows overlap
- If all shadows overlap, there is a collision.
- If you find a single pair of shadows that doesn't overlap, there can't be a collision, because you have found a separating axis.
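The steps above can be sketched as a single function. This is an illustration under stated assumptions rather than the code behind the demos: both polygons are convex arrays of vertices, and the names `edgeNormals`, `project`, and `polygonsCollide` are hypothetical.

```typescript
// Hypothetical 2D vector type; names are illustrative.
type Vec2 = { x: number; y: number };

const dot = (u: Vec2, v: Vec2): number => u.x * v.x + u.y * v.y;

// One perpendicular direction per edge; only the direction matters.
function edgeNormals(vertices: Vec2[]): Vec2[] {
  return vertices.map((v, i) => {
    const next = vertices[(i + 1) % vertices.length];
    return { x: next.y - v.y, y: -(next.x - v.x) };
  });
}

// The shadow of a polygon on an axis, as a [min, max] interval.
function project(vertices: Vec2[], axis: Vec2): [number, number] {
  const dots = vertices.map((v) => dot(v, axis));
  return [Math.min(...dots), Math.max(...dots)];
}

// Returns true if the two convex polygons overlap (touching counts).
function polygonsCollide(a: Vec2[], b: Vec2[]): boolean {
  const axes = [...edgeNormals(a), ...edgeNormals(b)];
  for (const axis of axes) {
    const [minA, maxA] = project(a, axis);
    const [minB, maxB] = project(b, axis);
    if (maxA < minB || maxB < minA) {
      return false; // found a gap between the shadows: a separating axis
    }
  }
  return true; // every pair of shadows overlaps
}
```

Returning as soon as one gap is found is what makes the test cheap for shapes that are far apart.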
Now you might say, wait a second... we don't yet prevent the shapes from intersecting with each other, do we?
And you would be right.
This is an important distinction in collision detection algorithms: calculating whether two shapes collide is one problem, and deciding what to do about that collision is another.
This second problem is usually known as calculating a collision response. So, how could we appropriately respond to the collision?
Adding a Collision Response
Luckily for us, SAT is easy to adapt to get out a collision response – namely, if you find that all shadows (and therefore the shapes) overlap, you can find the pair of shadows that overlaps the least, and push the two shapes apart in that direction until the shadows don't touch anymore.
This idea of the "overlap" between the two shadows is related to an idea known as the MTV, the Minimum Translation Vector. The axis on which the shadows overlap the least gives us a direction, and the size of that overlap gives us a distance. Together they form a vector that tells us exactly how far and in which direction the two shapes have to move so that at least one pair of shadows stops overlapping. It represents the minimum amount of work necessary to push the shapes apart. So we just apply half of the vector as a translation to one shape and the opposite half to the other shape.
Let's see how we could implement that idea in code:
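What follows is a sketch of how such a function might look. The signature of `getResponseForCollision`, its return type (`Vec2 | null`), and all helper names are assumptions. It also assumes convex polygons without zero-length edges, and it does not handle one shape being completely inside the other.

```typescript
// Hypothetical 2D vector type; names are illustrative.
type Vec2 = { x: number; y: number };

const dot = (u: Vec2, v: Vec2): number => u.x * v.x + u.y * v.y;

// Unit-length normals, so that overlaps on different axes are comparable.
function unitNormals(vertices: Vec2[]): Vec2[] {
  return vertices.map((v, i) => {
    const next = vertices[(i + 1) % vertices.length];
    const ex = next.x - v.x;
    const ey = next.y - v.y;
    const length = Math.hypot(ex, ey);
    return { x: ey / length, y: -ex / length };
  });
}

function project(vertices: Vec2[], axis: Vec2): [number, number] {
  const dots = vertices.map((v) => dot(v, axis));
  return [Math.min(...dots), Math.max(...dots)];
}

// Returns the MTV pointing from a towards b, or null if there is no overlap.
function getResponseForCollision(a: Vec2[], b: Vec2[]): Vec2 | null {
  let smallestOverlap = Infinity;
  let bestAxis: Vec2 | null = null;
  for (const axis of [...unitNormals(a), ...unitNormals(b)]) {
    const [minA, maxA] = project(a, axis);
    const [minB, maxB] = project(b, axis);
    const overlap = Math.min(maxA, maxB) - Math.max(minA, minB);
    if (overlap <= 0) {
      return null; // separating axis found (or the shapes only touch)
    }
    if (overlap < smallestOverlap) {
      smallestOverlap = overlap;
      // Flip the axis if needed so that it points from a towards b.
      const pointsTowardsB = minA + maxA < minB + maxB;
      bestAxis = pointsTowardsB ? axis : { x: -axis.x, y: -axis.y };
    }
  }
  if (bestAxis === null) {
    return null; // no axes at all, e.g. empty polygons
  }
  return { x: bestAxis.x * smallestOverlap, y: bestAxis.y * smallestOverlap };
}
```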
Then we can use that getResponseForCollision function like so:
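A sketch of the usage side, assuming `getResponseForCollision` returns either `null` or an MTV that points from the first shape towards the second. To keep the example self-contained, the MTV is passed in directly; `translate` and `separate` are hypothetical helper names.

```typescript
// Hypothetical 2D vector type; names are illustrative.
type Vec2 = { x: number; y: number };

// Returns a moved copy of the polygon.
const translate = (vertices: Vec2[], offset: Vec2): Vec2[] =>
  vertices.map((v) => ({ x: v.x + offset.x, y: v.y + offset.y }));

// Pushes both polygons apart by half of the MTV each.
// `mtv` is assumed to point from a towards b, or to be null if
// the shapes do not collide.
function separate(a: Vec2[], b: Vec2[], mtv: Vec2 | null): [Vec2[], Vec2[]] {
  if (mtv === null) {
    return [a, b]; // nothing to resolve
  }
  const half = { x: mtv.x / 2, y: mtv.y / 2 };
  return [translate(a, { x: -half.x, y: -half.y }), translate(b, half)];
}
```

Splitting the correction evenly treats both shapes as equally movable. If one of them were a fixed wall, the full vector would go to the other shape instead.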
And here is the demo:
The Problem of Concavity
Cool, so we are done now, right? Well, check this out:
Well... that didn't go too well huh? The problem is that what you just saw were so-called "concave" shapes... All the polygons you've seen so far up to this point were "convex" and therefore just worked.
Convex and concave express whether something bulges in- or outwards. I remember them with the word "cave" as in concave. A cave goes inward, and so does a concave object. In other words, concave shapes are shapes that have some kind of indentation in them. A star would be a concave shape, or something looking like a V or U.
Handling collisions with concave shapes directly is very tricky... so instead, we don't bother and just chop the concave shapes up into simple, convex shapes and then run the collision detection algorithm on those.
To do that, we are going to use a process called triangulation. Chopping an arbitrary shape up into triangles is not a trivial problem and has more than one solution! We'll look at one of them, known as the "Ear Clipping Algorithm".
The Ear Clipping Algorithm or how to triangulate Polygons
Triangulating Polygons is a tough problem because there are so many weird edge cases to deal with. There have been whole papers written about how to efficiently do it (see resources below) and I won't go into too much detail here, and neither will I attempt to fix all the possible edge cases.
The algorithm we are going to use for triangulating concave shapes into a bunch of convex triangles is known as "Ear Clipping".
The basic idea is to find "ears" – triangular pieces of the polygon that "stick out", like an ear, and then "clip" them out of the polygon, reducing the problem to a smaller polygon – for which we can then find another ear to clip, and so on until only a last triangle is left.
If you turn off the visualization, you can drag around the polygon vertices to try out different shapes!
Let's define what an "ear" is more formally. Three consecutive vertices $v_{i-1}$, $v_i$, and $v_{i+1}$ form an ear if two conditions are met:
- The interior angle between the edges $\overline{v_{i-1} v_i}$ and $\overline{v_i v_{i+1}}$ has to be smaller than 180°.
- The triangle formed by the three vertices cannot contain any other vertices of the polygon.
When an ear is found, we put it into our list of triangles for the triangulation, remove its middle vertex (the tip of the ear) from the list of vertices to check, and keep looping over the remaining vertices to look for another ear. We can keep doing this until just 3 vertices are left; those form the last triangle. In code this would look something like this:
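Here is a minimal sketch of that loop. It assumes a simple polygon (no holes, no self-intersections) whose vertices are listed in counter-clockwise order, and it makes no attempt to handle the edge cases mentioned above. All names (`cross`, `pointInTriangle`, `triangulate`) are hypothetical.

```typescript
// Hypothetical types; names are illustrative.
type Vec2 = { x: number; y: number };
type Triangle = [Vec2, Vec2, Vec2];

// z-component of (b - a) x (c - a); positive if a -> b -> c turns left.
const cross = (a: Vec2, b: Vec2, c: Vec2): number =>
  (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);

// True if p lies inside or on the counter-clockwise triangle abc.
function pointInTriangle(p: Vec2, a: Vec2, b: Vec2, c: Vec2): boolean {
  return cross(a, b, p) >= 0 && cross(b, c, p) >= 0 && cross(c, a, p) >= 0;
}

function triangulate(polygon: Vec2[]): Triangle[] {
  const vertices = [...polygon]; // work on a copy
  const triangles: Triangle[] = [];
  while (vertices.length > 3) {
    let clipped = false;
    for (let i = 0; i < vertices.length; i++) {
      const n = vertices.length;
      const prev = vertices[(i + n - 1) % n];
      const curr = vertices[i];
      const next = vertices[(i + 1) % n];
      // Condition 1: interior angle below 180 degrees (a strict left turn).
      if (cross(prev, curr, next) <= 0) continue;
      // Condition 2: no other vertex lies inside the candidate triangle.
      const containsOther = vertices.some(
        (v) => v !== prev && v !== curr && v !== next && pointInTriangle(v, prev, curr, next)
      );
      if (containsOther) continue;
      triangles.push([prev, curr, next]);
      vertices.splice(i, 1); // clip the tip of the ear
      clipped = true;
      break;
    }
    if (!clipped) {
      throw new Error("no ear found; is the polygon simple and counter-clockwise?");
    }
  }
  if (vertices.length === 3) {
    triangles.push([vertices[0], vertices[1], vertices[2]]); // the last triangle
  }
  return triangles;
}
```

Each resulting triangle is convex, so it can be handed to the SAT check one by one.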
For now, this is it. We can have collision detection and responses for both convex and concave shapes.
However... Collision detection is a tricky business and there are still edge cases where the above breaks down.
The first that comes to mind: curved shapes. Anything that doesn't have vertices doesn't work with the above algorithms, and we would need to convert it into a polygon first. Also, the triangulation implemented here doesn't handle holes, and it breaks if the interior angle between two edges of a polygon is exactly equal to 180° or if the polygon intersects itself. Most of these problems could be tackled by using a better triangulation algorithm. In case you are curious, there is a list of resources down below.
Another problem could be objects or shapes that move very quickly. If they move quickly enough that the distance covered in one update step is bigger than the thing they should collide with, they could simply pass through (or "over") the other object. In games, bullets and thin walls typically show this kind of behavior. It is as if the shape were "tunneling" through the other shape, because it moved so fast that there never was a position update where the two were overlapping...
Yet another problem is performance when there are lots of things that could potentially collide with each other. All of the above code is written for clarity rather than performance, so there are improvements to be gained by "simply" rewriting the collision code we have, but the problem of checking collisions for many shapes at once remains $O(n^2)$ in the number of shapes $n$. Which is bad...
What if n is more than 2? If it is 10, 100, or 1000? Collision detection algorithms of any kind are expensive, and if we have to check every single polygon against every single other polygon, the number of checks grows quadratically: there are $n(n-1)/2$ pairs, so 45 checks for 10 shapes, 4,950 for 100, and 499,500 for 1,000. Eventually that slows everything down to a halt.
This means we have to use some clever data structure or algorithm to reduce the number of collisions to check.
One way to do so would be to divide up the "space" into separate regions or "buckets". Because two shapes that are very very far apart can not intersect (unless one of them is big) we could cut down on the number of collision detections we have to do by a lot, because only things from the same bucket could potentially collide with one another.
The data structure that could be used for that is called a Quadtree (in 2D) or an Octree (in 3D). Here's an image of the idea in 3D:
But coding these is reserved for another time.
If you came all this way – congratulations! I hope you learned something. You can inspect most of the code for the demos of this website at the repo for this page over here.
If you want to read more, I have assembled a list of resources for this project that I used while building. Here you go: