Graph Walks
All FIB object types are allocated from a VPP memory pool [1]. The objects are thus susceptible to memory re-allocation, so a bare “C” pointer cannot be used to refer to a child or parent. Instead there is the concept of a fib_node_ptr_t, which is a tuple of (type, index). The type indicates what type of object it is (and hence which pool to use) and the index is the index in that pool. This allows for the safe retrieval of any object type.
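The (type, index) idea can be illustrated with a minimal sketch. Everything here is hypothetical (the toy_ names, the two object types, the int payload and the realloc-based pool); it is not VPP's pool implementation or the real fib_node_ptr_t definition, only a demonstration of why an index survives a re-allocation that would invalidate a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical object types; the real FIB has more. */
typedef enum { TOY_TYPE_ENTRY, TOY_TYPE_PATH_LIST, TOY_N_TYPES } toy_type_t;

/* The (type, index) tuple used in place of a bare pointer. */
typedef struct { toy_type_t type; uint32_t index; } toy_node_ptr_t;

/* One growable pool per type; each toy object is just an int payload. */
typedef struct { int *elts; uint32_t len, cap; } toy_pool_t;

static toy_pool_t toy_pools[TOY_N_TYPES];

/* Allocate an object. realloc() may move the whole pool, which is
 * exactly why callers keep the returned handle and not a pointer. */
static toy_node_ptr_t toy_alloc(toy_type_t type, int payload)
{
    toy_pool_t *p = &toy_pools[type];
    toy_node_ptr_t ptr;

    if (p->len == p->cap) {
        uint32_t cap = p->cap ? 2 * p->cap : 1;
        int *elts = realloc(p->elts, cap * sizeof(*elts));
        assert(elts != NULL);
        p->elts = elts;
        p->cap = cap;
    }
    p->elts[p->len] = payload;
    ptr.type = type;
    ptr.index = p->len++;
    return ptr;
}

/* Resolve a handle: the type selects the pool, the index selects the
 * element. The result is only valid until the next allocation. */
static int *toy_get(toy_node_ptr_t ptr)
{
    toy_pool_t *p = &toy_pools[ptr.type];
    assert(ptr.index < p->len);
    return &p->elts[ptr.index];
}
```

A caller would store the toy_node_ptr_t returned by toy_alloc and call toy_get each time it needs the object, never caching the returned pointer across allocations.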
When a child resolves via a parent it does so knowing the type of that parent. The child to parent relationship is thus fully known to the child, and hence a forward walk of the graph (from child to parent) is trivial. However, a parent does not choose its children; it does not even choose their type, so a back-walk (from parent to child) needs a type-independent way to reach them. All object types that form part of the FIB control plane graph inherit from a single base class, fib_node_t. A fib_node_t identifies the object’s index, and its associated virtual function table provides the parent a mechanism to visit that object during the walk. The reason for a back-walk is to inform all children that the state of the parent has changed in some way, and that the child may itself need to update.
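The base-class-plus-virtual-function-table arrangement might look roughly like the sketch below. All names (vt_node_t, vt_vft_t, vt_register, vt_visit and the example entry type) are invented for illustration and are not the actual fib_node_t API; the point is only that the visitor needs nothing beyond a type and an index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { VT_TYPE_ENTRY, VT_TYPE_PATH_LIST, VT_N_TYPES } vt_type_t;

/* Base "class": embedded as the first member of every graph object. */
typedef struct { vt_type_t type; uint32_t index; } vt_node_t;

/* Per-type virtual functions, registered once for each object type. */
typedef struct {
    vt_node_t *(*get)(uint32_t index);   /* pool index -> base object */
    void (*back_walk)(vt_node_t *node);  /* the parent changed: react */
} vt_vft_t;

static vt_vft_t vt_vfts[VT_N_TYPES];

static void vt_register(vt_type_t type, vt_vft_t vft)
{
    vt_vfts[type] = vft;
}

/* The parent visits a child knowing only its (type, index). */
static void vt_visit(vt_type_t type, uint32_t index)
{
    assert(type < VT_N_TYPES && vt_vfts[type].get != NULL);
    vt_node_t *node = vt_vfts[type].get(index);
    vt_vfts[type].back_walk(node);
}

/* One concrete type: an entry that counts the notifications it gets. */
typedef struct { vt_node_t node; int n_updates; } vt_entry_t;

static vt_entry_t vt_entries[4];

static vt_node_t *vt_entry_get(uint32_t index)
{
    assert(index < 4);
    return &vt_entries[index].node;
}

static void vt_entry_back_walk(vt_node_t *node)
{
    /* Valid downcast: node is the first member of vt_entry_t. */
    ((vt_entry_t *)node)->n_updates++;
}
```

Each object type would call vt_register once at start-up; after that a parent can notify any child with vt_visit without knowing what kind of object it is.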
To support the many-to-one, child-to-parent, relationship a parent must maintain a list of its children. The requirements of this list are:
- O(1) insertion and delete time. Several child-parent relationships are made/broken during route addition/deletion.
- Ordering. High priority children are at the front, low priority at the back (see section Fast Convergence).
- Insertion at arbitrary locations.
To realise these requirements the child-list is a doubly linked-list, where each element contains a fib_node_ptr_t. The VPP pool memory model applies to the list elements, so they are also identified by an index. When a child is added to a list, the index of the new element is returned to it. Using this index the element can be removed in constant time. The list supports ‘push-front’ and ‘push-back’ semantics for ordering. To walk the children of a parent is then to iterate this list.
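A minimal sketch of such an index-linked list follows. It assumes a fixed-size element pool with no slot reuse and uses made-up cl_ names; VPP's actual list code is not reproduced here. Insertion at an arbitrary location is omitted for brevity and would follow the same pattern as the two push functions.

```c
#include <assert.h>
#include <stdint.h>

#define CL_INVALID  UINT32_MAX
#define CL_MAX_ELTS 64   /* fixed-size pool keeps the sketch short */

/* One list element: the child's (type, index) plus sibling links,
 * all expressed as pool indices and never as pointers. */
typedef struct {
    uint32_t child_type, child_index;
    uint32_t prev, next;
} cl_elt_t;

typedef struct { uint32_t head, tail, n_elts; } cl_list_t;

static cl_elt_t cl_pool[CL_MAX_ELTS];
static uint32_t cl_n_used;

static void cl_init(cl_list_t *l)
{
    l->head = l->tail = CL_INVALID;
    l->n_elts = 0;
}

static uint32_t cl_elt_alloc(uint32_t type, uint32_t index)
{
    assert(cl_n_used < CL_MAX_ELTS);   /* no free list in this sketch */
    cl_elt_t *e = &cl_pool[cl_n_used];
    e->child_type = type;
    e->child_index = index;
    e->prev = e->next = CL_INVALID;
    return cl_n_used++;
}

/* Both push functions return the element index, which the child
 * keeps as its handle for later removal. */
static uint32_t cl_push_front(cl_list_t *l, uint32_t type, uint32_t index)
{
    uint32_t ei = cl_elt_alloc(type, index);
    cl_pool[ei].next = l->head;
    if (l->head != CL_INVALID) cl_pool[l->head].prev = ei;
    else l->tail = ei;
    l->head = ei;
    l->n_elts++;
    return ei;
}

static uint32_t cl_push_back(cl_list_t *l, uint32_t type, uint32_t index)
{
    uint32_t ei = cl_elt_alloc(type, index);
    cl_pool[ei].prev = l->tail;
    if (l->tail != CL_INVALID) cl_pool[l->tail].next = ei;
    else l->head = ei;
    l->tail = ei;
    l->n_elts++;
    return ei;
}

/* O(1) removal: unlink using the stored neighbour indices. */
static void cl_remove(cl_list_t *l, uint32_t ei)
{
    cl_elt_t *e = &cl_pool[ei];
    if (e->prev != CL_INVALID) cl_pool[e->prev].next = e->next;
    else l->head = e->next;
    if (e->next != CL_INVALID) cl_pool[e->next].prev = e->prev;
    else l->tail = e->prev;
    e->prev = e->next = CL_INVALID;
    l->n_elts--;
}
```

Walking the children is then a loop from l->head following the next indices until CL_INVALID is reached.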
A back-walk of the graph is a depth-first search in which all children at all levels of the hierarchy are visited. Such walks can therefore encounter all object instances in the FIB control plane graph, numbering in the millions. A FIB control-plane graph is cyclic in the presence of a recursion loop, so the walk implementation has mechanisms to detect this and exit early.
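One common way to make a depth-first walk terminate on a cyclic graph is to flag each node while it is on the current walk's stack. The sketch below uses that technique as an assumption; the text does not say which detection mechanism the FIB actually uses, and the bw_ names and the use of bare pointers (for brevity only) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BW_MAX_CHILDREN 4

typedef struct bw_node {
    struct bw_node *children[BW_MAX_CHILDREN]; /* pointers for brevity */
    size_t n_children;
    bool in_walk;   /* set while the node is on the current walk's stack */
    int n_visits;   /* how many times a walk has notified this node */
} bw_node_t;

static void bw_add_child(bw_node_t *parent, bw_node_t *child)
{
    assert(parent->n_children < BW_MAX_CHILDREN);
    parent->children[parent->n_children++] = child;
}

/* Depth-first back-walk: notify the node, then every descendant.
 * Returns true if a recursion loop was met and the walk cut short. */
static bool bw_back_walk(bw_node_t *node)
{
    bool looped = false;

    if (node->in_walk)
        return true;            /* already on the stack: a loop */

    node->in_walk = true;
    node->n_visits++;
    for (size_t i = 0; i < node->n_children; i++) {
        if (bw_back_walk(node->children[i]))
            looped = true;
    }
    node->in_walk = false;
    return looped;
}
```

Note that a node reachable through two different parents is visited twice in this sketch; only a path that returns to a node still on the stack is treated as a loop.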
A back-walk can be either synchronous or asynchronous. A synchronous walk will visit the entire section of the graph before control is returned to the caller. An asynchronous walk will queue the walk to a background process, to run at a later time, and immediately return to the caller. To implement asynchronous walks a fib_walk_t object is added to the front of the parent’s child list. As children are visited the fib_walk_t object advances through the list. Since it is inserted in the list, when the walk suspends and resumes, it can continue at the correct location. It is also safe with respect to the deletion of children from the list. New children are added to the head of the list, and so will not encounter the walk, but since they are new, they already have the up-to-date state of the parent.
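The marker-in-the-list technique can be sketched as follows. This is a simplified model under stated assumptions: a single global list, a fixed-size pool, and invented aw_ names rather than the real fib_walk_t code. The marker is an ordinary list element that hops over one child per step, so a suspended walk keeps its place even if children are removed or added in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AW_INVALID  UINT32_MAX
#define AW_MAX_ELTS 32

typedef struct {
    uint32_t prev, next;
    bool is_walk;    /* true for the walk marker, false for a child */
    int n_visits;    /* children only: times a walk has visited */
} aw_elt_t;

static aw_elt_t aw_pool[AW_MAX_ELTS];
static uint32_t aw_n_used;
static uint32_t aw_head = AW_INVALID;

/* Link element ei after element `after`; AW_INVALID means "at front". */
static void aw_link_after(uint32_t ei, uint32_t after)
{
    aw_elt_t *e = &aw_pool[ei];
    e->prev = after;
    e->next = (after == AW_INVALID) ? aw_head : aw_pool[after].next;
    if (e->next != AW_INVALID) aw_pool[e->next].prev = ei;
    if (after == AW_INVALID) aw_head = ei;
    else aw_pool[after].next = ei;
}

static void aw_unlink(uint32_t ei)
{
    aw_elt_t *e = &aw_pool[ei];
    if (e->prev != AW_INVALID) aw_pool[e->prev].next = e->next;
    else aw_head = e->next;
    if (e->next != AW_INVALID) aw_pool[e->next].prev = e->prev;
}

/* Add a child (is_walk == false) or start a walk (is_walk == true). */
static uint32_t aw_push_front(bool is_walk)
{
    assert(aw_n_used < AW_MAX_ELTS);
    uint32_t ei = aw_n_used++;
    aw_pool[ei].is_walk = is_walk;
    aw_pool[ei].n_visits = 0;
    aw_link_after(ei, AW_INVALID);
    return ei;
}

/* Handle up to `budget` elements, then suspend. Returns true once the
 * marker has reached the end of the list and been removed. */
static bool aw_walk_advance(uint32_t walk, int budget)
{
    while (budget-- > 0) {
        uint32_t next = aw_pool[walk].next;
        if (next == AW_INVALID) {
            aw_unlink(walk);
            return true;
        }
        if (!aw_pool[next].is_walk)
            aw_pool[next].n_visits++;   /* "visit" the child */
        aw_unlink(walk);                /* hop the marker over it */
        aw_link_after(walk, next);
    }
    return false;
}
```

A background process would call aw_walk_advance repeatedly with a small budget until it returns true, yielding between calls.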
A VLIB process ‘fib-walk’ runs to perform the asynchronous walks. VLIB has no priority scheduling between respective processes, so the fib-walk process does work in small increments so that it does not block the main route download process. Since the main download process effectively has priority, numerous asynchronous back-walks can be started on the same parent instance before the fib-walk process can run. FIB is a ‘final state’ application. If a parent changes n times, it is not necessary for the children to also update n times; instead it is only necessary that each child updates to the latest, or final, state. Consequently, when multiple walks on a parent (and hence potential updates to a child) are queued, these walks can be merged into a single walk. This is the main reason the walks are designed this way: to eliminate (as much as possible) redundant work and thus converge the system as fast as possible.
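The effect of merging can be shown with a deliberately simplified model: at most one pending walk per parent, with later requests folded into it. The reason flags, the per-parent table and the mq_ names are all assumptions made for this sketch; how and where the real implementation merges walks is not described in the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MQ_MAX_PARENTS 8

/* Hypothetical reasons for a back-walk, combinable as bit flags. */
enum { MQ_REASON_RESOLVE = 1 << 0, MQ_REASON_ADJ_DOWN = 1 << 1 };

typedef struct { bool queued; uint32_t reasons; } mq_walk_t;

static mq_walk_t mq_pending[MQ_MAX_PARENTS]; /* one walk per parent */
static int mq_n_queued;

/* Queue a walk and return immediately. A request for a parent that
 * already has a pending walk is merged into it. */
static void mq_walk_async(uint32_t parent, uint32_t reasons)
{
    assert(parent < MQ_MAX_PARENTS);
    if (!mq_pending[parent].queued) {
        mq_pending[parent].queued = true;
        mq_n_queued++;
    }
    mq_pending[parent].reasons |= reasons;
}

/* Background step: take one pending walk off the table and return
 * its merged reasons, or 0 if nothing is pending. The children
 * would be visited once, here, with the final state. */
static uint32_t mq_process_one(void)
{
    for (uint32_t p = 0; p < MQ_MAX_PARENTS; p++) {
        if (mq_pending[p].queued) {
            uint32_t reasons = mq_pending[p].reasons;
            mq_pending[p].queued = false;
            mq_pending[p].reasons = 0;
            mq_n_queued--;
            return reasons;
        }
    }
    return 0;
}
```

If the parent changes three times before the background process runs, the children are still walked only once, with the union of the reasons.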
Choosing between a synchronous and an asynchronous walk is therefore a trade-off between the time it takes to propagate a change in the parent to all of its children and the time it takes to act on a single route update. For example, if a route update were to affect millions of child recursive routes, then with a synchronous walk the rate at which such updates could be processed would depend on the number of child recursive routes, which is undesirable. At the time of writing FIB2.0 uses synchronous walks in all locations except when walking the children of a path-list that has more than 32 [3] children. This avoids the case mentioned above.
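The selection rule stated above can be written as a small predicate. The type names and the function are hypothetical; only the rule itself (asynchronous for a path-list with more than 32 children, synchronous otherwise) comes from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Threshold taken from the text; the name is made up for the sketch. */
#define WK_ASYNC_THRESHOLD 32u

typedef enum { WK_NODE_ENTRY, WK_NODE_PATH_LIST } wk_node_type_t;
typedef enum { WK_MODE_SYNC, WK_MODE_ASYNC } wk_mode_t;

/* Pick the walk mode for a parent of the given type and child count. */
static wk_mode_t wk_choose_mode(wk_node_type_t parent_type,
                                uint32_t n_children)
{
    if (parent_type == WK_NODE_PATH_LIST &&
        n_children > WK_ASYNC_THRESHOLD)
        return WK_MODE_ASYNC;   /* popular path-list: defer the walk */
    return WK_MODE_SYNC;        /* everything else: walk immediately */
}
```

Keeping the decision in one place makes the trade-off easy to retune if the threshold changes.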
Footnotes: