ACL plugin constant-time lookup
The initial implementation of ACL plugin performs a trivial for() cycle, going through the assigned ACLs on a per-packet basis. This is not very efficient, even if for very short ACLs due to its simplicity it can beat more advanced methods.
However, to cover the case of longer ACLs with acceptable performance, we need to have a better way of matching. This write-up proposes a mechanism to make a lookup from O(M) where M is number of entries to O(N) where N is number of different mask combinations.
Preparation of ACL(s)
The ACL plugin will maintain a global list of “mask types”, i.e. the specific configurations of “do not care” bits within the ACEs. Upon the creation of a new ACL, a pass will be made through all the ACEs, to assign and possibly allocate the “mask type number”.
Each ACL has a structure hash_acl_info_t representing the “hash-based” parts of information related to that ACL, primarily the array of hash_ace_info_t structures - each of the members of that array corresponding to one of the rules (ACEs) in the original ACL, for this they have a pair of (acl_index, ace_index) to keep track, predominantly for debugging.
Why do we need a whole separate structure, and are not adding new fields to the existing rule structure? First, encapsulation, to minimize the pollution of the main ACL code with the hash-based lookup artifacts. Second, one rule may correspond to more than one “hash-based” ACE. In fact, most of the rules do correspond to two of those. Why ?
Consider that the current ACL lookup logic is that if a packet is not the initial fragment, and there is an L4 entry acting on the packet, the comparison will be made only on the L4 protocol field value rather than on the protocol and port values. This behavior is governed by l4_match_nonfirst_fragment flag in the acl_main, and is needed to maintain the compatibility with the existing software switch implementation.
While for the sequential check in single_acl_match_5tuple() it is very easy to implement by just breaking out at the right moment, in case of hash-based matching this cost us two checks: one on full 5-tuple and the flag pkt.is_nonfirst_fragment being zero, the second on 3-tuple and the flag pkt.is_nonfirst_fragment being one, with the second check triggered by the acl_main.l4_match_nonfirst_fragment setting being the default 1. This dictates the necessity of having a “match” field in a given hash_ace_info_t element, which would reflect the value we are supposed to match after applying the mask.
There can be other circumstances when it might be beneficial to expand the given rule in the original ACL into multiple - for example, as an optimization within the port range handling for small port ranges (this is not done as of the time of writing).
Assigning ACLs to an interface
Once the ACL list is assigned to an interface, or, rather, a new ACL is added to the list of the existing ACLs applied to the interface, we need to update the bihash accelerating the lookup.
All the entries for the lookups are stored within a single 48_8
bihash, which captures the 5-tuple from the packet as well as the
miscellaneous per-packet information flags, e.g. l4_valid,
is_non_first_fragment, and so on. To facilitate the use of the single
bihash by all the interfaces, the is_ip6, is_input, sw_if_index
are part of the key, as well as mask_type_index - the latter being
necessary because there can be entries with the same value but different
permit ::/0, permit::/128.
At the moment of an ACL being applied to an interface, we need to walk the list of hash_ace_info_t entries corresponding to that ACL, and update the bihash with the keys corresponding to the match values in these entries.
The value of the hash match contains the index into a per-sw_if_index vector of applied_ace_hash_entry_t elements, as well as a couple of flags: shadowed (optimization: if this flag on a matched entry is zero, means we can stop the lookup early and declare a match - see below), and need_portrange_check - meaning that what matched was a superset of the actual match, and we need to perform an extra check.
Also, upon insertion, we must keep in mind there can be multiple applied_ace_hash_entry_t for the same key and must keep a list of those. This is necessary to incrementally apply/unapply the ACLs as part of the ACL vector: say, two ACLs have “permit 2001:db8::1/128 any” - we should be able to retain the entry for the second ACL even if we have deleted the first one. Also, in case there are two entries with the same key but different port ranges, say 0..42 and 142..65535 - we need to be able to sequentially match on those if we decide not to expand them into individual port-specific entries.
The simple single-packet lookup is defined in multi_acl_match_get_applied_ace_index, which returns the index of the applied hash ACE if there was a match, or ~0 if there wasn’t.
The future optimized per-packet lookup may be batched in three phases:
Prepare the keys in the per-worker vector by doing logical AND of original 5-tuple record with the elements of the mask vector.
Lookup the keys in the bihash in a batch manner, collecting the result with lowest u64 (acl index within vector, ACE index) from the hash lookup value, and performing the list walk if necessary (for portranges).
Take the action from the ACL record as defined by (ACL#, ACE#) from the resulting lookup winner, or, if no match found, then perform default deny.
During the phase of combining multiple ACLs into one rulebase, when they are applied to interface, we also can perform several optimizations.
If a given ACE is a strict subset of another ACE located up in the linear search order, we can ignore this ACE completely - because by definition it will never match. We will call such an ACE redundant. Here is an example:
permit 2001:db8:1::/48 2001:db8:2::/48 (B) deny 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A)
A bit more formally, we can define this relationship of an ACE A to ACE B as:
redundant(aceA, aceB) := (contains(protoB, protoA) && contains(srcB, srcA) && contains(dstB, dstA) && is_after(A, B))
Here as “contains” we define an operation operating on the sets defined by the protocol, (srcIP, srcPortDefinition) and (dstIP, dstPortDefinition) respectively, and returning true if all the elements represented by the second argument are represented by the first argument. The “is_after” is true if A is located below B in the ruleset.
If a given ACE does not intersect at all with any other ACE in front of it, we can mark it as such.
Then during the sequence of the lookups the successful hit on this ACE means we do not need to look up other mask combinations - thus potentially significantly speeding up the match process. Here is an example, assuming we have the following ACL:
permit 2001:db8:1::/48 2001:db8:2::/48 (B) deny 2001:db8:3::/48 2001:db8:2:1::/64 (A)
In this case if we match the second entry, we do not need to check whether we have matched the first one - the source addresses are completely different. We call such an ACE independent from another.
We can define this as
independent(aceA, aceB) := (!intersect(protoA, protoB) || !intersect(srcA, srcB) || !intersect(dstA, dstB))
where intersect is defined as operation returning true if there are elements belonging to the sets of both arguments.
If the entry A is neither redundant nor independent from B, and is below B in the ruleset, we call such an entry shadowed by B, here is an example:
deny tcp 2001:db8:1::/48 2001:db8:2::/48 (B) permit 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A)
This means the earlier rule “carves out” a subset of A, thus leaving a “shadow”. (Evidently, the action needs to be different for the shadow to have an effect, but for for the terminology sake we do not care).
The more formal definition:
shadowed(aceA, aceB) := !redundant(aceA, aceB) && !independent(aceA, aceB) && is_after(aceA, aceB)
Using this terminology, any ruleset can be represented as a DAG (Directed Acyclic Graph), with the bottom being the implicit “deny any”, pointing to the set of rules shadowing it or the ones it is redundant for.
These rules may in turn be shadowing each other. There is no cycles in this graph because of the natural order of the rules - the rule located closer to the end of the ruleset can never shadow or make redundant a rule higher up.
The optimization that enables can allow for is to skip matching certain masks on a per-lookup basis - if a given rule has matched, the only adjustments that can happen is the match with one of the shadowing rules.
Also, another avenue for the optimization can be starting the lookup process with the mask type that maximizes the chances of the independent ACE match, thus resulting in an ACE lookup being a single hash table hit.
All the new routines are located in a separate file, so we can cleanly experiment with a different approach if this does not fit all of the use cases.
The constant-time lookup within the data path has the API with the same signature as:
u8 multi_acl_match_5tuple (u32 sw_if_index, fa_5tuple_t * pkt_5tuple, int is_l2, int is_ip6, int is_input, u32 * acl_match_p, u32 * rule_match_p, u32 * trace_bitmap)
There should be a new upper-level function with the same signature, which will make a decision whether to use a linear lookup, or to use the constant-time lookup implemented by this work, or to add some other optimizations (e.g. by keeping the cache of the last N lookups).
The calls to the routine doing preparatory work should happen in
acl_add_list() after creating the linear-lookup structures, and the
routine doing the preparatory work populating the hashtable should be
acl_interface_add_del_inout_acl() or its callees.
The initial implementation will be geared towards looking up a single match at a time, with the subsequent optimizations possible to make the lookup for more than one packet.