Multi-Architecture Graph Node Cookbook

In the context of graph node dispatch functions, it's easy enough to use the vpp multi-architecture support setup. The point of the scheme is simple: for performance-critical nodes, generate multiple CPU hardware-dependent versions of the node dispatch functions, and pick the best one at runtime.

The vpp scheme is simple enough to use, but details matter.

100,000 foot view

We compile entire graph node dispatch function implementation files multiple times. These compilations give rise to multiple versions of the graph node dispatch functions. Per-node constructor functions interrogate the CPU hardware, select the node dispatch function variant to use, and set the vlib_node_registration_t ".function" member to the address of the selected variant.
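To make the constructor idea concrete, here is a minimal standalone sketch. It is not the vlib implementation: every "demo_" name is hypothetical, the two variants are written out by hand instead of coming from repeated compilation, and it assumes a GCC- or Clang-compatible compiler (for __attribute__((constructor)) and the x86 __builtin_cpu_supports builtin).

```c
/* Hypothetical stand-ins; none of these names come from vlib. */
typedef unsigned long (*demo_node_fn_t) (unsigned long n_vectors);

typedef struct
{
  const char *name;
  const char *variant;		/* which variant the constructor picked */
  demo_node_fn_t function;	/* deliberately not set in the initializer */
} demo_node_registration_t;

/* In vpp the variants come from compiling one file several times.
   Here they are simply written out twice. */
static unsigned long
demo_node_fn_generic (unsigned long n_vectors)
{
  return n_vectors;
}

static unsigned long
demo_node_fn_avx2 (unsigned long n_vectors)
{
  return n_vectors;
}

demo_node_registration_t demo_node = { .name = "demo" };

/* Interrogate the CPU. Falls back to "no" on compilers or
   architectures without the x86 feature-test builtins. */
static int
demo_cpu_has_avx2 (void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx2") != 0;
#else
  return 0;
#endif
}

/* Runs before main(): select a variant and fill in .function. */
static void __attribute__ ((constructor))
demo_node_multiarch_select (void)
{
  if (demo_cpu_has_avx2 ())
    {
      demo_node.variant = "avx2";
      demo_node.function = demo_node_fn_avx2;
    }
  else
    {
      demo_node.variant = "generic";
      demo_node.function = demo_node_fn_generic;
    }
}
```

By the time main() starts, demo_node.function already points at one of the two variants, and no caller ever had to choose.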


Declare the node dispatch function as shown, using the VLIB_NODE_FN macro. The name of the node function MUST match the name of the graph node.

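VLIB_NODE_FN (ip4_sdp_node) (vlib_main_t * vm, vlib_node_runtime_t * node,
                             vlib_frame_t * frame)
{
  /* Pick the traced or untraced instantiation of the inline worker */
  if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE))
    return ip46_sdp_inline (vm, node, frame, 1 /* is_ip4 */ ,
                        1 /* is_trace */ );
  else
    return ip46_sdp_inline (vm, node, frame, 1 /* is_ip4 */ ,
                        0 /* is_trace */ );
}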
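Why must the names match? A plausible reason, sketched below under loud assumptions: a VLIB_NODE_FN-style macro can paste the node name into both the function name and a constructor that refers to the registration symbol of the same name. DEMO_NODE_FN, demo_node_registration_t, and demo_sdp_node are all hypothetical, and the real macro in vlib does more than this.

```c
/* Hypothetical stand-in for vlib_node_registration_t. */
typedef struct
{
  const char *name;
  unsigned long (*function) (unsigned long n_vectors);
} demo_node_registration_t;

/* Hypothetical simplification of a VLIB_NODE_FN-style macro.
   It forward-declares <node>_fn, emits a constructor that stores
   its address in the registration named <node>, then opens the
   function definition that the caller completes. */
#define DEMO_NODE_FN(node)                                      \
  static unsigned long node##_fn (unsigned long n_vectors);     \
  static void __attribute__ ((constructor))                     \
  node##_multiarch_register (void)                              \
  {                                                             \
    extern demo_node_registration_t node;                       \
    node.function = node##_fn;                                  \
  }                                                             \
  static unsigned long node##_fn

/* The registration: note that .function is not initialized here. */
demo_node_registration_t demo_sdp_node = { .name = "demo-sdp" };

/* The dispatch function, declared with the same name as the node. */
DEMO_NODE_FN (demo_sdp_node) (unsigned long n_vectors)
{
  return n_vectors;
}
```

If the name handed to the macro did not match the registration symbol, the extern reference inside the constructor would have nothing to resolve against and the link would fail.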

We need to generate precisely one copy of the vlib_node_registration_t, error strings, and packet trace decode function.

Simply bracket these items with "#ifndef CLIB_MARCH_VARIANT...#endif":

#ifndef CLIB_MARCH_VARIANT
static u8 *
format_sdp_trace (u8 * s, va_list * args)
{ ... }
#endif

#ifndef CLIB_MARCH_VARIANT
static char *sdp_error_strings[] = {
#define _(sym,string) string,
  foreach_sdp_error
#undef _
};
#endif


#ifndef CLIB_MARCH_VARIANT
VLIB_REGISTER_NODE (ip4_sdp_node) =
{
  // DO NOT set the .function structure member.
  // The multiarch selection __attribute__((constructor)) function
  // takes care of it at runtime
  .name = "ip4-sdp",
  .vector_size = sizeof (u32),
  .format_trace = format_sdp_trace,

  .n_errors = ARRAY_LEN(sdp_error_strings),
  .error_strings = sdp_error_strings,

  .n_next_nodes = SDP_N_NEXT,

  /* edit / add dispositions here */
  .next_nodes =
  {
    [SDP_NEXT_DROP] = "ip4-drop",
  },
};
#endif

To belabor the point: do not set the ".function" member! That's the job of the multi-arch selection __attribute__((constructor)) function.
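The "precisely one copy" rule and the "_(sym,string)" idiom can be tried out in isolation. The sketch below is standalone and assumes nothing from vpp: DEMO_MARCH_VARIANT stands in for CLIB_MARCH_VARIANT, and the error list is made up. It also assumes the error enum is wanted in every compilation of the file, since dispatch code typically indexes counters by it, while the strings are only emitted when no variant macro is defined.

```c
#define DEMO_ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

/* Hypothetical error list: one line per counter. */
#define foreach_demo_error                      \
  _(PROCESSED, "processed packets")             \
  _(DROPPED, "dropped packets")

/* Expanded in every compilation of the file: the enum values. */
typedef enum
{
#define _(sym,string) DEMO_ERROR_##sym,
  foreach_demo_error
#undef _
  DEMO_N_ERROR,
} demo_error_t;

/* Expanded only in the base compilation, where the variant macro
   is not defined: the single shared copy of the strings. */
#ifndef DEMO_MARCH_VARIANT
static char *demo_error_strings[] = {
#define _(sym,string) string,
  foreach_demo_error
#undef _
};
#endif
```

Building the same file again with -DDEMO_MARCH_VARIANT=avx2 (a made-up value) would keep the enum and drop the string table, which is the behavior the guard is there to provide.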

Always inline node dispatch functions

It's typical for a graph dispatch function to contain one or more calls to an inline function. See above. If your node dispatch function is structured that way, make ABSOLUTELY CERTAIN to use the "always_inline" macro:

always_inline uword
ip46_sdp_inline (vlib_main_t * vm, vlib_node_runtime_t * node,
             vlib_frame_t * frame,
             int is_ip4, int is_trace)
{ ... }

Otherwise, the compiler is highly likely NOT to build multiple versions of the guts of your dispatch function.

It's fairly easy to spot this mistake in "perf top." If you see, for example, a bunch of functions with names of the form "xxx_node_fn_avx2" in the profile, BUT your brand-new node function shows up with a name of the form "xxx_inline.isra.1", it's quite likely that the inline was declared "static inline" instead of "always_inline".
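Here is a small standalone sketch of the pattern. The demo_always_inline macro is an assumption about what an always_inline-style macro looks like on GCC and Clang; vpp's own definition may differ by build type. All function names are hypothetical. Because is_ip4 and is_trace are compile-time constants at each call site and the worker is forced inline, the compiler can specialize the loop separately in each caller.

```c
/* Assumed shape of an always_inline-style macro (GCC/Clang). */
#define demo_always_inline static inline __attribute__ ((__always_inline__))

/* Counts how many times the traced path ran. */
static unsigned long demo_trace_count;

/* The "guts": counts packets whose IP version nibble matches. */
demo_always_inline unsigned long
demo_sdp_inline (const unsigned char *first_bytes, unsigned long n_packets,
		 int is_ip4, int is_trace)
{
  unsigned long i, n_matched = 0;
  const unsigned char want = is_ip4 ? 4 : 6;

  for (i = 0; i < n_packets; i++)
    {
      if ((first_bytes[i] >> 4) == want)
	n_matched++;
      if (is_trace)
	demo_trace_count++;
    }
  return n_matched;
}

/* The dispatch wrapper: each call passes constants, so each
   inlined copy of the loop is specialized for its flags. */
unsigned long
demo_ip4_fn (const unsigned char *first_bytes, unsigned long n_packets,
	     int trace_enabled)
{
  if (trace_enabled)
    return demo_sdp_inline (first_bytes, n_packets, 1 /* is_ip4 */ ,
			    1 /* is_trace */ );
  else
    return demo_sdp_inline (first_bytes, n_packets, 1 /* is_ip4 */ ,
			    0 /* is_trace */ );
}
```

With plain "static inline" the compiler is free to emit one shared out-of-line copy of demo_sdp_inline instead, which is the situation the "xxx_inline.isra.1" symptom points to.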

Modify CMakeLists.txt

If the component in question already lists "MULTIARCH_SOURCES", simply add the indicated .c file to the list. Otherwise, add as shown below. Note that the added file "new_multiarch_node.c" should appear in both SOURCES and MULTIARCH_SOURCES:

add_vpp_plugin(myplugin
  SOURCES
  new_multiarch_node.c
  ...

  MULTIARCH_SOURCES
  new_multiarch_node.c
  ...
)