Paper Study - SMORE Semi Oblivious Traffic Engineering
10 Mar 2020** This is a study & review of the Opera paper. I’m not an author.
CONTRA
Motivation
In order to achieve efficient routing protocols, a system should supports:
- Traffic Engineering: prefer least utilized paths
- Routing Constraints: middle-box traversal, specifying certain routes
- Fast Adaption: update routing upon condition change
Conga/Hula: supports only least utilized path policy, not RC and not for arbitrary topology (only uses tree-like topo)
Contra: performance aware, programmable, general (for wide range of policies and topologies)
High level view:
Contra compiles high level policies into performance-aware routing protocols in data plane
Language
Path constraints are defined with regular expression, metric
Example: S can use F1/F2 with least util to reach D, drop for other path
Sample language to map paths to ranks based on their performance
Policies
Widest Shortest Path: least utilized path among shortest paths
Source-local preference: regular expression to specify paths
Compilation using Merlin (a language for provisioning network resources)
-
Automata are generated from policies. (b)&(c)
-
Intermedia representation (product graph) is generated with automata, checked with actual topology (d)
- forwarding tables are populated during runtime (e) when probes arrive carrying information on the pathway (metrics, next hop switch)
- Synthesize P4 programs to track policy states during runtime
Product Graph
Policy -> automata + topology = Product Graph
Each PG edge is a policy-compliant path for the topology
Forwarding
Uses forwarding table populated by the probes and automata for forwarding decisions. Example 1, node B to D, can go B-D or B-C-D, because metric for B-C-D is 0.2 < 0.3 of B-D, selects B-C-D. Example 2, for node A, metric of A-B-D is larger, but the automata says if ABD then 0
< 0.4 of A-C-D, selects A-B-D.
Probes
[D0 is a destination node, B0 and A1 are source nodes]
D0 initializes and sends probes with performance metrics, and the probes will be sent subsequently to the sources. Nodes along the way can know the path metrics based on information received and update forwarding table of its own and the probe to inform path condition change to its neighbors.
####Challenges:
- Transient loop in arbitrary topology: bc the algorithm is distance-vector + probes traversal takes time, asynchronous updates can cause loops (whereas fat-tree is tree-based, no loops by nature)
- See version number in probes to forced synchronization. e.g. probes 1 will not be used when probes 2 arrives regardless of their arrival time
- Probes cannot be sent too frequently as it will cause some nodes using newer probes, some use older probes. Limit probes sending rate to at least 0.5RTT to ensure old probes are propagated through whole pathway before sending new probes.
- Out-of-order delivery:
- Use flowlet switching to group packets in the same flow to avoid reordering at the receiver (work done by FLARE)
- Introduces looping because flowlet switching entries expire at different times, could form loops occasionally: count hops to monitor loops (distance-vector)
- Introduces policy violation because flowlet switching is constraint oblivious: add PG tags (policy) to flowlet entry so that flowlet becomes aware of the policy
- Constraint violation: because of loops + path metrics updates, could violate hard coded constraint (go through firewall…)
- Use Merlin to compute a product graph from automata (policies) and topology with probes and tags. In PG, tag represents the state of the automata (policy). At switch, tags will be checked to enforce global user policy.
Eval
Fat-tree topo:
Fig8: symmetric fat-tree
Fig9: asymmetric fat-tree with failed link
metrics: FCT, task completion time
Arbitrary topo:
Hula/Conga will not work, beats SPAIN
left = fast, up = more -> upleft means short latency
left = load balanced, up = more -> upleft means good balancing