Zibo's Website A digital collection of my work and life

Paper Study - SMORE Semi Oblivious Traffic Engineering

** This is a study & review of the Opera paper. I’m not an author.

SMORE: Semi-Oblivious Traffic Engineering Schema

Intro: Problems

  • current TE focuses on performance or robustness, not both
  • goal: good performance and failure tolerance.
  • oblivious routing algorithm
    • low-stretch, diverse, naturally load-balancing
  • dynamically adjust sending rates
  • combination is powerful and novel

FE System Study

  • FE components

    • which path to take: path selection, slow -> infrequent
    • how much traffic: rate adaptation, fast -> continuous
    • challenge: to select small set of paths, to flexibly handle traffic scenarios.

    Screen Shot 2020-03-09 at 14.10.20

  • Path properties metrics

    • low-stretch: low latency
    • high diversity: good robustness
    • good balancing: better performance
    • good design needs to be capacity aware (not to over-utilize low-capacity links) & globally optimized (considers all pairs simultaneously)

    Screen Shot 2020-03-09 at 14.08.36

SMORE Design

  • Oblivious routing by Racke’s paper : a system of optional paths is chosen in advance for every source-destination pair, and every packet for that pair must travel along one of these optional paths, without consideration for demand.

    • Why oblivious? To anticipate demands is challenging, and time consuming. Actual condition may differ a lot from anticipation. Avoids overfitting -> naturally robust.
    • Routing tree / decomposition tree: In each iteration, algorithm generates a new routing tree with approximation algorithm –> low-stretch; updates weights of each link based on its capacity and cumulative utilization in the previous tree (penalize further use of already heavily utilized links) –> load-balancing; randomized intermediate destinations –> diversity.
  • Rate Adaption: computes updated value of path weights by minimizing MLU (max link util) using a Linear Program, LP.

    • paths are fixed –> LP computation is fast

    Screen Shot 2020-03-09 at 15.28.11

Evaluation

Realistic simulation on Facebook’s backbone network carrying production traffic

Screen Shot 2020-03-09 at 15.32.31

  • lighter congestions
  • no over-subscription for SMORE, whereas ECMP can over-subscribe 100X.
  • latency is competitive with others

Screen Shot 2020-03-09 at 15.34.07

  • SMORE overall
    • low congestion loss (red → 0)
    • high throughput (black → 1)
  • (a) SMORE’s MLU performance (green) is greatly better other’s, and competitive to Optimal (using MCF, multi-commodity flow, to minimize MLU) which has the following cons:
    • Solving MCF is slow thus limit responsiveness to changing demands. On average 25s for an optimized LP solver to solve MCF instances for a WAN.
    • Small changes in TMs (traffic matrix) can lead to a significantly different solution to MCF, which leads to excessive routing churn.
    • Ensuring continuous consistent updates to forwarding state at routers adds significant management complexity.
  • (b) SMORE is also robust (tolerant to failures): failure losses (blue on the bottom next to zero) is minimal and stable
[ paper academic ]