Paper Study - SMORE Semi Oblivious Traffic Engineering
10 Mar 2020** This is a study & review of the Opera paper. I’m not an author.
SMORE: Semi-Oblivious Traffic Engineering Schema
Intro: Problems
- current TE focuses on performance or robustness, not both
- goal: good performance and failure tolerance.
- oblivious routing algorithm
- low-stretch, diverse, naturally load-balancing
- dynamically adjust sending rates
- combination is powerful and novel
FE System Study
-
FE components
- which path to take: path selection, slow -> infrequent
- how much traffic: rate adaptation, fast -> continuous
- challenge: to select small set of paths, to flexibly handle traffic scenarios.
-
Path properties metrics
- low-stretch: low latency
- high diversity: good robustness
- good balancing: better performance
- good design needs to be capacity aware (not to over-utilize low-capacity links) & globally optimized (considers all pairs simultaneously)
SMORE Design
-
Oblivious routing by Racke’s paper : a system of optional paths is chosen in advance for every source-destination pair, and every packet for that pair must travel along one of these optional paths, without consideration for demand.
- Why oblivious? To anticipate demands is challenging, and time consuming. Actual condition may differ a lot from anticipation. Avoids overfitting -> naturally robust.
- Routing tree / decomposition tree: In each iteration, algorithm generates a new routing tree with approximation algorithm –> low-stretch; updates weights of each link based on its capacity and cumulative utilization in the previous tree (penalize further use of already heavily utilized links) –> load-balancing; randomized intermediate destinations –> diversity.
-
Rate Adaption: computes updated value of path weights by minimizing MLU (max link util) using a Linear Program, LP.
- paths are fixed –> LP computation is fast
Evaluation
Realistic simulation on Facebook’s backbone network carrying production traffic
- lighter congestions
- no over-subscription for SMORE, whereas ECMP can over-subscribe 100X.
- latency is competitive with others
- SMORE overall
- low congestion loss (red → 0)
- high throughput (black → 1)
- (a) SMORE’s MLU performance (green) is greatly better other’s, and competitive to Optimal (using MCF, multi-commodity flow, to minimize MLU) which has the following cons:
- Solving MCF is slow thus limit responsiveness to changing demands. On average 25s for an optimized LP solver to solve MCF instances for a WAN.
- Small changes in TMs (traffic matrix) can lead to a significantly different solution to MCF, which leads to excessive routing churn.
- Ensuring continuous consistent updates to forwarding state at routers adds significant management complexity.
- (b) SMORE is also robust (tolerant to failures): failure losses (blue on the bottom next to zero) is minimal and stable