The Solution: Introduction to G2O


This page introduces G2O at a conceptual level. To learn how it is implemented in Invisible Map, visit the page on Invisible Map Generation (the backend processing code).

G2O, short for “general graph optimization”, is a framework for solving SLAM problems that was first proposed in 2011 and is now fairly standard. G2O, like most other SLAM solutions, represents measurements as a graph. Each vertex contains a pose: a position and orientation, represented either as a vector of 3 position elements followed by 3 (or 4) orientation elements, or as a 4x4 homogeneous matrix combining a rotation and a translation. One representation is easily converted to the other. Each edge holds a rigid transformation representing the conversion from one vertex's pose to the next.
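To make the two pose representations concrete, here is an illustrative sketch (not code from Invisible Map) of packing a position-plus-quaternion vector into a 4x4 homogeneous matrix:

```python
import numpy as np

def quat_to_rot(qx, qy, qz, qw):
    """Convert a unit quaternion to a 3x3 rotation matrix."""
    return np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])

def pose_vector_to_matrix(pose):
    """[x, y, z, qx, qy, qz, qw] -> 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = quat_to_rot(*pose[3:])  # upper-left 3x3 block is the rotation
    T[:3, 3] = pose[:3]                 # last column holds the translation
    return T
```

Going the other direction just reads the translation out of the last column and converts the 3x3 block back to a quaternion.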

The most important intuition to get first is that the edge values (the measurements) are fixed, while the vertex values (the poses) are variable. G2O will not change the measurements, instead altering the poses to best match them. The result will (hopefully) be a more accurate version of the already recorded map.

Definitions

The Math

For those who want the full understanding, we will work through all the math here, step by step. If you don’t care, feel free to skip this section.

First, define \(X^*\) to be the ideal poses: \(X^* = \operatorname*{argmin}_X F(X)\), where \(F(X) = \sum\limits_{i,j} e_{ij}(X)^T \Omega_{ij} e_{ij}(X)\) is the total error, \(e_{ij}(X)\) is the error of the edge between vertices \(i\) and \(j\), and \(\Omega_{ij}\) is that edge's information matrix (a weight on the measurement). Given an initial guess \(\check{X}\), typically defined as the initial pose guesses given by measurement, define \(\Delta X\) so that \(\check{X} + \Delta X = X^*\). This way, we just need to find \(\Delta X\). This may seem like a small difference, but it will help a lot.

Now we need to minimize \(F(\check{X} + \Delta X)\). Substituting into the error function, we get: $$F(\check{X} + \Delta X) = \sum\limits_{i,j} e_{ij} (\check{X} + \Delta X)^T \Omega_{ij} e_{ij} (\check{X} + \Delta X)$$ However, we do not know exactly what \(e_{ij}\) is, so we cannot just plug in \(\check{X} + \Delta X\) without first knowing \(\Delta X\) (which is what we want to find!). We can, however, approximate it through linearization with the following equation: $$e_{ij}(\check{X} + \Delta X) \approx e_{ij}(\check{X}) + J_{ij}\Delta X$$ where \(J_{ij}\) is the Jacobian of \(e_{ij}\). If the gradient is the vector-to-scalar analog of the derivative, the Jacobian is the vector-to-vector analog of the gradient: a matrix of partial derivatives of each output variable with respect to each input variable. This effectively fits a tangent plane to \(e_{ij}\) at \(\check{X}\), then takes a step of size \(\Delta X\) along that plane. For sufficiently smooth functions, this will be a pretty good estimate of the resulting value. Notice that this only works with the sum \(\check{X} + \Delta X\); this is why we restated the problem this way in the first place.
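To see the linearization in action, here is a small sketch (the error function below is made up for illustration, not one from G2O) that builds a Jacobian numerically and checks the first-order approximation \(e(\check{X}) + J\Delta X\) against the true value:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Matrix of partial derivatives of each output of f w.r.t. each input."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - fx) / eps  # finite-difference column i
    return J

# A toy vector-to-vector "error" function (purely illustrative).
e = lambda x: np.array([x[0]**2 - x[1], np.sin(x[1])])
x0 = np.array([1.0, 0.5])   # plays the role of X-check
dx = np.array([0.01, -0.02])  # plays the role of Delta X

J = numerical_jacobian(e, x0)
approx = e(x0) + J @ dx   # linearized prediction
exact = e(x0 + dx)        # true value after the step
```

For a small step like this, `approx` and `exact` agree to several decimal places, which is exactly why the linearization is a usable stand-in for the unknown \(e_{ij}(\check{X} + \Delta X)\).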

From here, it will be a lot of math and simplifying. You can follow along or just skip to the final answer. We will take \(e_{ij} = e_{ij}(\check{X})\) for simplicity. $$F_{ij}(\check{X} + \Delta X) = e_{ij}(\check{X} + \Delta X)^T \Omega_{ij} e_{ij} (\check{X} + \Delta X)$$ $$F_{ij}(\check{X} + \Delta X) = (e_{ij} + J_{ij}\Delta X)^T \Omega_{ij} (e_{ij} + J_{ij}\Delta X)$$ $$F_{ij}(\check{X} + \Delta X) = (e_{ij}^T + \Delta X^T J_{ij}^T) \Omega_{ij} (e_{ij} + J_{ij}\Delta X)$$ $$F_{ij}(\check{X} + \Delta X) = e_{ij}^T\Omega_{ij}e_{ij} + e_{ij}^T\Omega_{ij}J_{ij}\Delta X + \Delta X^T J_{ij}^T\Omega_{ij}e_{ij} + \Delta X^T J_{ij}^T\Omega_{ij}J_{ij}\Delta X$$

Since each cross term is a scalar (and therefore equals its own transpose) and \(\Omega_{ij}\) is symmetric, the two middle terms are equal and combine: $$F_{ij}(\check{X} + \Delta X) = e_{ij}^T\Omega_{ij}e_{ij} + 2e_{ij}^T\Omega_{ij}J_{ij}\Delta X + \Delta X^T J_{ij}^T\Omega_{ij}J_{ij}\Delta X$$

Defining $$c_{ij} = e_{ij}^T\Omega_{ij}e_{ij} \;\;\;\;\;\;\; b_{ij} = J_{ij}^T\Omega_{ij}e_{ij} \;\;\;\;\;\;\; H_{ij} = J_{ij}^T\Omega_{ij}J_{ij}$$ and noting that \(2e_{ij}^T\Omega_{ij}J_{ij}\Delta X = 2b_{ij}^T\Delta X\), we get $$F_{ij}(\check{X} + \Delta X) = c_{ij} + 2b_{ij}^T\Delta X + \Delta X^T H_{ij}\Delta X$$ Summing over all edges gives $$F(\check{X} + \Delta X) = c + 2b^T\Delta X + \Delta X^T H\Delta X$$

Now we have \(F(\check{X} + \Delta X)\) in terms of constants and \(\Delta X\). To find the minimum, we take the derivative and set it equal to 0. Or, more accurately, take the gradient with respect to \(\Delta X\) and set it equal to \(\vec{0}\): $$\nabla F(\check{X} + \Delta X) = \nabla(c + 2b^T\Delta X + \Delta X^T H \Delta X) = \vec{0}$$ $$2b + 2H\Delta X = \vec{0}$$ $$H\Delta X = -b$$ $$\Delta X = -H^{-1}b$$ And now we have a definition for \(\Delta X\) given \(\check{X}\), \(e_{ij}(X)\), and \(J_{ij}(X)\), and can solve for it accordingly. From here, all we need to do is add \(\Delta X\) to \(\check{X}\) to get \(X^*\) and we are done.
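As a toy end-to-end illustration (not the Invisible Map implementation), here is a one-dimensional pose graph solved with a single \(\Delta X = -H^{-1}b\) step. The edge measurements stay fixed while the poses move; because the errors here are linear in the poses, one step reaches the optimum:

```python
import numpy as np

# A 1-D pose graph: three poses on a line.
# Edges store measured relative displacements (fixed); poses are variable.
# The two odometry edges say "+1.0 each", but a direct 0->2 measurement
# says "+2.1", so the optimizer must spread the disagreement out.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.1)]  # (i, j, measured x_j - x_i)
x = np.array([0.0, 1.0, 2.0])  # initial guess X-check, from odometry

H = np.zeros((3, 3))
b = np.zeros(3)
for i, j, z in edges:
    e = (x[j] - x[i]) - z                    # error for this edge
    J = np.zeros(3)
    J[i], J[j] = -1.0, 1.0                   # Jacobian row: d e / d x
    omega = 1.0                              # information (weight)
    H += omega * np.outer(J, J)              # accumulate H = J^T Omega J
    b += omega * J * e                       # accumulate b = J^T Omega e

# Anchor pose 0 at its current value so H is invertible
# (otherwise the whole graph can slide freely along the line).
H[0, :] = 0.0; H[:, 0] = 0.0; H[0, 0] = 1.0; b[0] = 0.0

dx = -np.linalg.solve(H, b)  # Delta X = -H^{-1} b
x = x + dx                   # X* = X-check + Delta X
```

The result places pose 1 at roughly 1.033 and pose 2 at roughly 2.067: the 0.1 of disagreement gets distributed across all three edges rather than dumped on one.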

Sparse Bundle Adjustment (SBA)

Sparse Bundle Adjustment (SBA) is an add-on to G2O to make it a little more accurate in the case of detecting things like April Tags with a camera. The camera does not actually find the center of the tag; it finds the four corners of the tag, storing them as separate detections. In a normal implementation of G2O, this is represented as four different vertices and four different edges. These four vertices, while representing pretty much the same thing, are only connected indirectly, through the odometry nodes. Theoretically, this should not matter, but small changes in the tag (curved surfaces, slightly too big or too small, etc.) can cause the optimizer to move these corners independently, resulting in a very skewed tag in the final map.

SBA attempts to fix this by creating an extra node in the “middle” of the other four. The odometry node connects to this, then this node connects to the four corner nodes. These nodes are set as fixed relative to the main tag node, so the optimizer can only move the center of the tag, not each corner independently. Most of this is abstracted away in a special type of Edge in SparseOptimizer, and by storing only the position of the center node in our Graph. The only notable change is that this position is stored as an x-y position on the phone screen, in pixels, instead of a pose, in meters and radians.
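The "fixed relative to the center" idea can be sketched as follows; the tag size and the tag-frame convention here are assumptions for illustration, not Invisible Map's actual values:

```python
import numpy as np

# Hypothetical tag side length in meters (assumed value for illustration).
TAG_SIZE = 0.2

def corner_positions(center_pose):
    """Given the 4x4 world pose of a tag's center node, return the
    world-frame positions of its four corners. The corner offsets are
    constants in the tag's own frame, so moving the center node moves
    all four corners rigidly together."""
    s = TAG_SIZE / 2
    # Corner offsets in the tag's frame (tag lies in its local x-y plane),
    # written as homogeneous coordinates.
    corners_tag = np.array([[-s, -s, 0.0, 1.0],
                            [ s, -s, 0.0, 1.0],
                            [ s,  s, 0.0, 1.0],
                            [-s,  s, 0.0, 1.0]])
    return (center_pose @ corners_tag.T).T[:, :3]
```

Because the optimizer only ever adjusts `center_pose`, the corners cannot drift apart, which is precisely what prevents the skewed-tag artifact described above.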

With the switch to mapping primarily with LIDAR phones, SBA is no longer needed. With LIDAR, when we detect a tag, we shoot a ray in its direction. If the ray intersects a vertical plane (which it should, since the tag is mounted on a wall!), we store the transformation to this intersection instead of the transformation to the tag. This pose lies on the wall, with the same orientation as the wall. While this is much more accurate thanks to LIDAR, it also makes it harder to convert that pose back to pixels, as SBA requires. Therefore, we have largely abandoned SBA, at least for the time being.
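The ray-cast step amounts to a standard ray-plane intersection; the sketch below uses made-up vectors rather than values from the app:

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Intersect a ray with a plane. Returns the hit point, or None if the
    ray is parallel to the plane or the plane is behind the ray origin."""
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None  # ray runs parallel to the plane
    t = np.dot(plane_normal, plane_point - origin) / denom
    if t < 0:
        return None  # intersection is behind the camera
    return origin + t * direction

# Camera at the origin looking down +z toward a wall 2 m away,
# whose normal faces back toward the camera.
hit = ray_plane_intersection(np.array([0.0, 0.0, 0.0]),
                             np.array([0.0, 0.0, 1.0]),
                             np.array([0.0, 0.0, 2.0]),
                             np.array([0.0, 0.0, -1.0]))
```

The stored pose then combines this hit point with the wall plane's orientation, which is what makes the measurement land flat on the wall.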