The Model: Graphs in the Invisible Map

Most SLAM solutions represent sensor data as a graph. In graph-based SLAM, the vertices (nodes) are typically camera poses and sometimes landmarks, and the edges between vertices are constraints (measured physical relationships between them). In the Invisible Map, the nodes are camera poses, AprilTag poses, and dummy nodes (explained further in the Dummy Nodes section). The edges are the transformations from the camera pose to the tags and dummy nodes at each frame, along with edges between camera poses (see the Edges section). Edges are constraints and thus their values are fixed. This is the most important intuition to gain: SLAM changes the positions of the nodes to best fit the edges, not the other way around.
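To make the "nodes move, edges stay fixed" idea concrete, here is a minimal sketch of such a graph. It is not the Invisible Map data model: the class names (`NodeKind`, `Node`, `Edge`, `Graph`) are hypothetical, and poses are reduced to plain 3D positions to keep the example short.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from enum import Enum


class NodeKind(Enum):
    CAMERA = "camera"
    TAG = "tag"
    DUMMY = "dummy"


@dataclass
class Node:
    node_id: int
    kind: NodeKind
    pose: list  # current estimate; an optimizer is free to change this


@dataclass(frozen=True)
class Edge:
    start_id: int
    end_id: int
    measurement: tuple  # fixed observation; frozen so it cannot be reassigned


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        self.edges.append(edge)


graph = Graph()
graph.add_node(Node(0, NodeKind.CAMERA, [0.0, 0.0, 0.0]))
graph.add_node(Node(1, NodeKind.TAG, [0.0, 0.0, -1.0]))
graph.add_edge(Edge(0, 1, (0.0, 0.0, -1.2)))

# An optimizer would move the node estimate to agree with the edge...
graph.nodes[1].pose = [0.0, 0.0, -1.2]

# ...whereas the edge measurement itself cannot be changed.
try:
    graph.edges[0].measurement = (0.0, 0.0, -1.0)
    edge_was_modified = True
except FrozenInstanceError:
    edge_was_modified = False
```

Freezing the `Edge` class is only a way to mirror the intuition in code; a real optimizer enforces it by treating edge measurements as inputs and node poses as the variables to solve for.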

Camera, tag, and dummy node positions are represented in the world reference frame using a 4-by-4 transformation matrix. The upper-left 3-by-3 block represents the rotation, and the first three entries of the 4th column represent the translation from the origin of the world reference frame. The world reference frame has the y-axis oriented vertically, aligned with gravity, and the x- and z-axes are oriented based on the starting position of the phone.
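The layout of such a matrix can be sketched in a few lines of plain Python. The helper names below (`make_transform`, `rotation_about_y`, `apply_transform`) are hypothetical and exist only for illustration; they are not taken from the Invisible Map code.

```python
import math


def make_transform(rotation, translation):
    """Assemble a 4-by-4 homogeneous transform from a 3-by-3 rotation and a translation."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def rotation_about_y(angle):
    """Rotation matrix for a right-handed rotation of `angle` radians about the y-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def apply_transform(transform, point):
    """Map a 3D point through the transform using homogeneous coordinates."""
    homogeneous = list(point) + [1.0]
    result = [sum(row[j] * homogeneous[j] for j in range(4)) for row in transform]
    return result[:3]


# A pose turned 90 degrees about the vertical axis, located at (1, 2, 3).
pose = make_transform(rotation_about_y(math.pi / 2), [1.0, 2.0, 3.0])

# The translation is read from the first three entries of the 4th column.
translation = [row[3] for row in pose[:3]]

# A point one unit along the pose's local z-axis, expressed in the world frame.
world_point = apply_transform(pose, [0.0, 0.0, 1.0])
```

Here `translation` recovers the position passed in, and `world_point` is the local point rotated about y and then shifted by that translation.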

When not part of a matrix, orientation is typically represented as a quaternion. Quaternions are a 4-dimensional extension of complex numbers that turn out to be well suited to representing rotations. The imaginary components (x, y, and z) point along the axis in 3D around which to rotate, scaled by the sine of half the rotation angle. The w component is the cosine of half the rotation angle, so it encodes the amount to rotate. Because of this construction, the magnitude of a rotation quaternion is always 1. This is only a brief overview of quaternions, but it should be most of what you need to work on any part of Invisible Map. If you would like to learn more, the YouTube channel 3Blue1Brown has a series of videos on quaternions, along with an interactive website.
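As a small illustration of how the axis and the amount of rotation end up in the four components, the sketch below builds a rotation quaternion from an axis and an angle. The function name and the (x, y, z, w) component order are assumptions made for this example; check the order used by whichever library you are working with.

```python
import math


def quaternion_from_axis_angle(axis, angle):
    """Return (x, y, z, w) for a rotation of `angle` radians about `axis`."""
    length = math.sqrt(sum(a * a for a in axis))
    half = angle / 2.0
    scale = math.sin(half) / length  # also normalizes the axis
    return (axis[0] * scale, axis[1] * scale, axis[2] * scale, math.cos(half))


def norm(q):
    """Magnitude of a quaternion treated as a 4-vector."""
    return math.sqrt(sum(c * c for c in q))


# A quarter turn about the vertical (y) axis.
q = quaternion_from_axis_angle([0.0, 1.0, 0.0], math.pi / 2)
magnitude = norm(q)
```

For this quarter turn, only the y and w components are non-zero, and `magnitude` comes out as 1 up to floating-point rounding.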

Camera Nodes

The camera nodes are the phone odometry data points captured during map creation. InvisibleMapCreator captures 10 frames per second, and ARKit tracks the transform of the camera pose. The app then stores the information as the user records the map.

Tag Nodes

The C++ code in InvisibleMap is used to detect and recognize tags. How the AprilTag detection itself works is not necessary to know. The important piece is that the app will save the position of the tags as a transform in the world frame. Each separate detection of a tag is saved as a separate data point.

Dummy Nodes

Dummy nodes are added in the back-end processing before sending the graph through G2O for optimization. Their purpose is to restrict the rotation of the camera nodes in order to prevent them from tilting and causing the path to stray vertically. We will work through the conceptual math below, but feel free to skip this section.

Dummy Node Math

The \(\chi^2\) for an edge is given by the equation \(\chi^2 = e^T \Omega e\). These terms are defined in more detail on the G2O page in the definitions section. In short, \(e\) is the error between vertices of the edge and \(\Omega\) is the information matrix providing scale factors for the errors.
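The quadratic form \(e^T \Omega e\) can be evaluated directly. The sketch below uses a made-up two-component error and a diagonal information matrix purely for illustration; the function name `chi_squared` is hypothetical and this is not how G2O is invoked.

```python
def chi_squared(error, information):
    """Compute e^T * Omega * e for an error vector and a square information matrix."""
    n = len(error)
    # weighted = Omega * e
    weighted = [sum(information[i][j] * error[j] for j in range(n)) for i in range(n)]
    # e^T * (Omega * e)
    return sum(error[i] * weighted[i] for i in range(n))


error = [0.1, -0.2]
information = [[4.0, 0.0], [0.0, 1.0]]  # first component is weighted four times as heavily
chi2 = chi_squared(error, information)
```

With a diagonal \(\Omega\), each squared error component is simply multiplied by its weight, which is why a larger entry in the information matrix makes the optimizer care more about that component.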

Each camera node has a single associated dummy node, and both nodes are connected by an edge. The error can be represented as a vector: $$\begin{bmatrix} \Delta x \\ \Delta y \\ \Delta z \\ \Delta q_x \\ \Delta q_y \\ \Delta q_z \end{bmatrix}$$ where \(\Delta x, \Delta y, \Delta z\) are the errors in displacement and \(\Delta q_x, \Delta q_y, \Delta q_z\) are the errors in the quaternion components (used for rotation, e.g. yaw, pitch, roll). The information matrix \(\Omega\) is a 6-by-6 matrix.

The goal of dummy nodes is to prevent G2O from rotating the camera nodes about any axis except the global y-axis, which is aligned with gravity. This is because a rotation about the global y-axis is the only rotation that does not change the projection of gravity onto the basis vectors of the camera's local coordinate system. If the camera nodes were allowed to rotate about other axes, the path could warp up and down vertically as the poses get tilted. Since dummy nodes only restrict rotation, the information matrix has weights of 0 for anything involving the three displacement components (only the bottom right 3-by-3 matrix has content, the rest is zeros). Putting this and the representation of the error vector above together and substituting into the \(\chi^2\) function, we obtain: $$\chi^2 = \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta z \\ \Delta q_x \\ \Delta q_y \\ \Delta q_z \end{bmatrix} ^ T \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & . & . & . \\ 0 & 0 & 0 & . & . & . \\ 0 & 0 & 0 & . & . & . \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta z \\ \Delta q_x \\ \Delta q_y \\ \Delta q_z \end{bmatrix}$$ The content of the bottom right 3-by-3 matrix will be explained next.

Next, we define three unit vectors \(u, v, g\). Only one of these vectors needs to be defined specifically; we'll define \(u\). The other two are essentially arbitrary as long as they are both orthogonal to \(u\). We define \(u\) as $$u = \frac{q_{spin\,about\,y} - q_{original}}{\lVert q_{spin\,about\,y} - q_{original} \rVert}$$ where \(q_{spin\,about\,y}\) is the vector of the quaternion components after being rotated some amount about the y-axis by the graph optimization, and \(q_{original}\) is the original quaternion vector.
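One way to compute these vectors numerically is sketched below. Several choices here are assumptions rather than anything stated in the text: quaternions are stored as (x, y, z, w), the spin about the global y-axis is applied by left-multiplying with the Hamilton product, only the x, y, z components are used so that \(u\) matches the 3-by-3 block, the spin amount `delta` is an arbitrary small angle, and \(v\) and \(g\) are completed with cross products. All function names are hypothetical.

```python
import math


def quaternion_multiply(a, b):
    """Hamilton product of two quaternions given as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )


def normalize(vector):
    length = math.sqrt(sum(c * c for c in vector))
    return [c / length for c in vector]


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def spin_direction_about_y(q_original, delta=1e-3):
    """Unit vector along the change in (x, y, z) caused by a small spin about global y."""
    half = delta / 2.0
    spin = (0.0, math.sin(half), 0.0, math.cos(half))
    q_spin = quaternion_multiply(spin, q_original)
    difference = [q_spin[i] - q_original[i] for i in range(3)]
    return normalize(difference)


def orthonormal_complement(u):
    """Pick two unit vectors orthogonal to u and to each other."""
    # Use a helper axis that is not nearly parallel to u.
    helper = [1.0, 0.0, 0.0] if abs(u[0]) < 0.9 else [0.0, 1.0, 0.0]
    v = normalize(cross(u, helper))
    g = cross(u, v)
    return v, g


# Example: a camera with no rotation relative to the world frame.
u = spin_direction_about_y((0.0, 0.0, 0.0, 1.0))
v, g = orthonormal_complement(u)
```

For the identity orientation used here, \(u\) points along the y component, as expected for a spin about the y-axis; for other orientations it will generally have several non-zero components.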

The bottom right 3-by-3 matrix can now be defined as $$uu^Tc_1 + vv^Tc_2 + gg^Tc_3$$ where \(c_1, c_2, c_3\) are weights. Substituting this matrix into the \(\chi^2\) function and simplifying to remove all the zero terms, we obtain: $$\Delta q^T (uu^Tc_1 + vv^Tc_2 + gg^Tc_3) \Delta q$$ where \(\Delta q\) is \(\begin{bmatrix} \Delta q_x \\ \Delta q_y \\ \Delta q_z \end{bmatrix}\). Since \(\Delta q^T u u^T \Delta q = (\Delta q^T u)^2\), and likewise for \(v\) and \(g\), this can be further simplified to $$c_1(\Delta q^T u)^2 + c_2(\Delta q^T v)^2 + c_3(\Delta q^T g)^2$$ From this equation we can see how a change in \(q\) about different axes would increase or decrease each of the three terms. For the dummy nodes to allow rotation about the y-axis and not any others, we set \(c_1\) to be small and the other two weights to be large. As a result, rotations about the y-axis have little effect on \(\chi^2\), while other rotations have a large effect.
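The effect of the weights can be checked numerically. The sketch below builds the 3-by-3 block, embeds it in the 6-by-6 information matrix, and compares two errors of the same size, one along \(u\) and one along \(v\). The basis vectors, the weight values, and all function names are hypothetical choices for illustration, not values taken from the Invisible Map code.

```python
def rotation_weight_block(u, v, g, c1, c2, c3):
    """Build u u^T c1 + v v^T c2 + g g^T c3 as a 3-by-3 nested list."""
    block = [[0.0] * 3 for _ in range(3)]
    for axis, weight in ((u, c1), (v, c2), (g, c3)):
        for i in range(3):
            for j in range(3):
                block[i][j] += weight * axis[i] * axis[j]
    return block


def dummy_information_matrix(block):
    """Place the 3-by-3 block in the bottom right of an otherwise zero 6-by-6 matrix."""
    omega = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            omega[i + 3][j + 3] = block[i][j]
    return omega


def chi_squared(error, information):
    """Compute e^T * Omega * e."""
    n = len(error)
    weighted = [sum(information[i][j] * error[j] for j in range(n)) for i in range(n)]
    return sum(error[i] * weighted[i] for i in range(n))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical orthonormal basis and weights: small c1, large c2 and c3.
u, v, g = [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
c1, c2, c3 = 0.01, 100.0, 100.0
omega = dummy_information_matrix(rotation_weight_block(u, v, g, c1, c2, c3))

# Same-sized rotation errors: one along u (the allowed spin), one along v (a tilt).
# The displacement entries of spin_error are non-zero on purpose; they carry zero weight.
spin_error = [0.5, -0.3, 0.2, 0.0, 0.05, 0.0]
tilt_error = [0.0, 0.0, 0.0, 0.05, 0.0, 0.0]
chi2_spin = chi_squared(spin_error, omega)
chi2_tilt = chi_squared(tilt_error, omega)

# The simplified form c1 (dq.u)^2 + c2 (dq.v)^2 + c3 (dq.g)^2 gives the same value.
dq = spin_error[3:]
chi2_spin_simplified = c1 * dot(dq, u) ** 2 + c2 * dot(dq, v) ** 2 + c3 * dot(dq, g) ** 2
```

With these weights the tilt error is penalized far more heavily than the spin error of the same size, which is the behavior the dummy nodes are meant to produce.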

Edges

In the Invisible Map code, there are four types of edges: odometry to odometry, odometry to tag, odometry to dummy, and a special edge with three vertices that connects an odometry node, a tag anchor node (the center), and a tag corner. This last edge type is used only when the optimization uses Sparse Bundle Adjustment (SBA). Without SBA, only the tag anchor is used.