Paper Study - ExLL: An Extremely Low-latency Congestion Control for Mobile Cellular Networks
10 Feb 2020. This is a study & review of the ExLL paper. I’m not an author.
ExLL: An Extremely Low-latency Congestion Control for Mobile Cellular Networks
Intro
-
Data latency: devastating effect on applications, such as self-driving cars, autonomous robots, and remote surgery (5G apps)
- TCP sessions often have long delays due to bufferbloat: over-buffering at the bottleneck link caused by TCP’s loose congestion control
- More papers confirm that bufferbloat is severe in cellular networks
Cellular Network Characteristics for Protocol Design:
- Downlink Scheduling: the base station (BS) schedules downlink packets towards multiple user equipments (UEs) at 1 ms granularity (a.k.a. transmission time interval, TTI), based on both the signal strength reported by each UE and the current traffic load -> Infer cellular link bandwidth instantly with reception pattern rather than explicitly probing for bandwidth or measuring for a long period of time
- Uplink Scheduling: the BS needs to grant uplink transmission eligibility to each UE, which happens at a regular interval known as the SR (scheduling request) periodicity. Ignoring this periodicity -> underestimated minimum RTT -> congestion control (CC) runs under unrealistic operating conditions.
- Minimum RTT is not affected by the channel condition between UE and BS (due to adaptive MCS (modulation and coding scheme) selection used in LTE systems, which approximately eliminates MAC-layer retransmissions), nor by other UEs connected to the same BS, since per-UE queues are isolated
Existing Protocols:
Testing done with Android phones and LTE connections
- Low latency and high throughput can’t be achieved together
- ExLL:
- Target: Low latency with high throughput
- Estimations done at UE -> ExLL is a receiver-driven CC
- Estimates bandwidth of links by analyzing packet reception pattern in downlink
- Estimates min RTT with SR periodicity in uplink
- Uses control feedback from FAST to compute the receive window (RWND)
- Server sets its congestion window, CWND, based on RWND from the receiver -> immediately deployable
- ExLL can also be sender-driven, with a small performance gap
Observation
-
Measurement Set-up
-
Max throughput and min RTT:
- Moving device (change in RSSI, signal strength): stable RTT, unstable throughput
-
Min RTT over different RSSI values:
- Min RTT not much affected by RSSI values
-
Per-UE queue:
- Per-UE queue is important because delay in one UE will not affect other devices in the same Base Station (BS).
ExLL’s Network Inference
-
Downlink: efficient probing is thought to be important
-
Challenge: injecting packets that cause network over-buffering to probe bandwidth is bad for latency
-
Solutions:
- Extracting physical layer parameters
- Only supported by specific vendors
- Requires software tools that need a rooted device
- Machine Learning
- Takes time to learn and respond -> latency in response to traffic load/channel condition change
- ExLL: estimates bandwidth rather than probing
-
Scheduling Pattern
- Figure 4 (a): At initial stage, the number of received packets in a radio frame increases rapidly as CWND increases -> downlink behavior temporarily depends on CWND growth
- Figure 4 (b): the patterns of allocated subframes change, but the total number of packet receptions per radio frame becomes stable ( N(f) = 159, 162, 167 )
-
Downlink bandwidth estimation
- F(·): packet reception during one radio frame (received bytes in one radio frame / 10 ms)
- C(·): estimation of max bandwidth (before splitting for UEs)
- F(·) ≈ Measured throughput
- C(·) in (a) ≈ 300 Mbps (30 MHz capacity), C(·) in (b) ≈ 400 Mbps (40 MHz capacity)
-
Poor channel condition is detected & C(·) is adjusted in (a) at t = 14s
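A minimal sketch of how F(·) could be computed at the receiver, assuming packet arrivals are available as (arrival time in ms, size in bytes) pairs. The function name and the input format are my own for illustration, not from the paper, and C(·) is not modelled here.

```python
from collections import defaultdict

FRAME_MS = 10  # one LTE radio frame lasts 10 ms


def per_frame_throughput_mbps(packets):
    """Return {frame_index: throughput in Mbps} for (arrival_ms, size_bytes) pairs.

    Hypothetical helper: bins packets into 10 ms radio frames and converts
    the bytes received in each frame into a rate, i.e. F(.) from the notes.
    """
    bytes_per_frame = defaultdict(int)
    for arrival_ms, size_bytes in packets:
        bytes_per_frame[int(arrival_ms // FRAME_MS)] += size_bytes
    # bytes * 8 bits over FRAME_MS milliseconds, expressed in Mbps
    return {
        frame: total * 8 / (FRAME_MS * 1000)
        for frame, total in sorted(bytes_per_frame.items())
    }


# 160 packets of 1500 bytes all landing inside the first radio frame
sample = [(i * 0.05, 1500) for i in range(160)]
rates = per_frame_throughput_mbps(sample)
```

Because the estimate only needs one radio frame of arrivals, it updates every 10 ms without injecting extra probe traffic.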
-
-
Uplink
- Scheduling pattern
- Granularity of (a) is ~ 1ms, (b) is ~ 10ms -> SR (scheduling request) periodicity for uplink is 10ms.
- CDF (cumulative distribution function) of RTT: 37 - 47 ms, avg = 42 ms, max - min = 10 ms = SR periodicity
- Challenge: using the min RTT as the measurement or target for controlling CWND -> too conservative, loses throughput
- Solution: new estimation technique that incorporates SR periodicity.
ExLL Design
-
Control Algorithm
-
Based on FAST control equation:
\(w_{i+1} = (1 - \gamma)w_i + \gamma(\frac{mRE_i}{R_i}w_i + \alpha)\)
- where \(\gamma \in (0,1]\), \(\alpha > 0\), and \(w_i\), \(R_i\), \(mRE_i\) are the CWND, RTT, and minimum RTT estimate at time \(i\)
-
CWND grows at constant factor: $\alpha$
-
$ \alpha $ determines the amount of queueing in the bottleneck of a flow, which accumulates for multiple flows, the agility of bandwidth adaptation, and the robustness in maintaining high throughput (many inherent trade-offs)
-
Example: small $ \alpha $
- Pros: small queue, low latency
- Cons: slow adaptation; RTT fluctuation -> CWND over-reduced -> empty queue -> less throughput
-
ExLL solves the above trade-offs by
\[\alpha \to \alpha(1-\frac{T_i}{MTE_i})\]
- $ T_i $ and $ MTE_i $ are the measured throughput and the maximum throughput estimate at time $ i $
\(w_{i+1} = (1 - \gamma)w_i + \gamma(\frac{mRE_i}{R_i}w_i + \alpha(1-\frac{T_i}{MTE_i}))\) Eq.3
-
Large $ \alpha $ : agile and robust bandwidth probing + no over-buffering at the bottleneck link, because the growth factor $ \alpha(1-\frac{T_i}{MTE_i}) \to 0 $ as $ T_i \to MTE_i $
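To make the ExLL window update concrete, here is a small sketch of one update step. The default values for `alpha` and `gamma` are arbitrary assumptions chosen for illustration, not values from the paper.

```python
def exll_window_update(w, rtt, mre, tput, mte, alpha=10.0, gamma=0.5):
    """One step of the ExLL window update described in the notes.

    w     : current window (packets)
    rtt   : current RTT R_i
    mre   : minimum RTT estimate mRE_i (same unit as rtt)
    tput  : measured throughput T_i
    mte   : maximum throughput estimate MTE_i (same unit as tput)
    alpha, gamma : illustrative defaults only (assumed, not from the paper)
    """
    growth = alpha * (1.0 - tput / mte)  # shrinks to 0 as T_i approaches MTE_i
    return (1.0 - gamma) * w + gamma * ((mre / rtt) * w + growth)


# Link idle (T_i = 0), no queueing (RTT == mRE): the window grows by gamma * alpha.
grow = exll_window_update(w=100, rtt=40, mre=40, tput=0, mte=100)
# Link saturated (T_i == MTE_i), no queueing: the window holds steady.
hold = exll_window_update(w=100, rtt=40, mre=40, tput=100, mte=100)
# Queue building (RTT > mRE) at full throughput: the window shrinks.
shrink = exll_window_update(w=100, rtt=50, mre=40, tput=100, mte=100)
```

The three calls cover the regimes in the notes: probing while below the bandwidth estimate, a fixed point at the estimate, and backing off once RTT rises above the minimum RTT estimate.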
-
-
State Transition (workflow) receiver-driven
-
Observation mode
- When Cubic’s CWND <= bandwidth estimate
-
Control mode
- When Cubic’s CWND > bandwidth estimate
- compute $ w_{i+1} $ and report it back to the server as RWND
-
Recovery logic (more on this later)
- Packet loss or CWND reduced <= RWND
- Store RWND value, stop updating RWND until CWND >= RWND again
- Cubic reset
- Restart ExLL from observation mode
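The receiver-driven workflow above can be sketched as a small state machine. This is my own reading of the notes, not the authors' code: the `Mode` enum and `next_mode` function are hypothetical, and the bandwidth estimate is assumed to be already expressed as a window in packets so that it can be compared with Cubic's CWND.

```python
from enum import Enum


class Mode(Enum):
    OBSERVATION = "observation"
    CONTROL = "control"
    RECOVERY = "recovery"


def next_mode(mode, cubic_cwnd, bw_estimate_pkts, rwnd, timeout=False):
    """Return the next ExLL mode (illustrative sketch, assumed semantics).

    cubic_cwnd       : the sender's Cubic CWND in packets
    bw_estimate_pkts : bandwidth estimate converted to a window in packets
    rwnd             : last RWND reported (kept frozen while in RECOVERY)
    timeout          : True when Cubic resets after a timeout
    """
    if timeout:
        return Mode.OBSERVATION  # Cubic reset: restart from observation mode
    if mode is Mode.CONTROL and cubic_cwnd < rwnd:
        return Mode.RECOVERY  # loss pushed CWND below RWND: stop updating RWND
    if mode is Mode.RECOVERY:
        # Resume control only once CWND has climbed back to the stored RWND.
        return Mode.CONTROL if cubic_cwnd >= rwnd else Mode.RECOVERY
    return Mode.CONTROL if cubic_cwnd > bw_estimate_pkts else Mode.OBSERVATION
```

For example, `next_mode(Mode.OBSERVATION, 120, 100, 0)` moves to control mode because Cubic's window has exceeded the estimate, while a later loss that drops CWND to 60 with RWND at 100 moves the flow into recovery.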
-
-
Sender-driven: plug-in for Cubic
- ExLL Control mode
- When \(\frac{cwnd^C}{mRE_S} > MTE_S\)
- Calculate $ cwnd^E_i $ from Eq.3
- Overrides Cubic when $ cwnd^E < cwnd^C $
-
MTE calculation
- F(·) is updated per radio frame (10 ms), much faster than the sender’s information, which is delayed by an RTT (~40+ ms).
- Sender-driven: $ MTE_S = \frac{cwnd}{\Delta t} $ , where $ \Delta t = t_{last Ack} - t_{first Ack} $; the difference from the receiver-side estimate is small in actual experiment measurements
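A hedged sketch of the sender-side pieces: the ACK-based throughput estimate and the rule for when the ExLL window overrides Cubic. Function names and units (bytes, seconds) are assumptions for illustration.

```python
def sender_mte(cwnd_bytes, t_first_ack, t_last_ack):
    """MTE_S = cwnd / (t_lastAck - t_firstAck), in bytes per second.

    Returns None when the ACK spacing is not positive (assumed guard,
    not described in the notes).
    """
    delta_t = t_last_ack - t_first_ack
    if delta_t <= 0:
        return None
    return cwnd_bytes / delta_t


def effective_cwnd(cwnd_cubic, cwnd_exll, mre_s, mte_s):
    """Pick the window the sender actually uses.

    Control mode applies when Cubic's implied rate, cwnd^C / mRE_S,
    exceeds MTE_S; in that mode ExLL overrides Cubic only if its
    window is the smaller one.
    """
    in_control_mode = cwnd_cubic / mre_s > mte_s
    if in_control_mode and cwnd_exll < cwnd_cubic:
        return cwnd_exll
    return cwnd_cubic
```

For instance, with `mre_s = 0.05` and `mte_s = 3000`, a Cubic window of 200 implies a rate of 4000, so an ExLL window of 150 takes over; a Cubic window of 100 stays below the estimate and is left untouched.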
-
mRE calculation
-
Observation: track minimum, average RTT per packet with mpRTT, apRTT
-
Control
- $ mRE = mpRTT + D(\hat T^{SR}) $, where $ \hat T^{SR} = 2 \cdot (apRTT - mpRTT) $ and D(·) finds the matching SR periodicity in (5, 10, 20, 40, 80) ms
-
Experiment: an eNB that uses an SR periodicity of 10 ms ≈ $\hat T^{SR}$
- eNBs with 10, 20, 30 ms SR periodicity
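The mRE rule can be checked against the uplink numbers noted earlier (RTT spread 37 - 47 ms, average 42 ms). The sketch below assumes that "matching" means picking the nearest candidate periodicity; that choice and the function names are mine.

```python
SR_PERIODS_MS = (5, 10, 20, 40, 80)  # candidate SR periodicities from the notes


def match_sr_period(estimate_ms):
    """D(.): nearest candidate SR periodicity (nearest-match is an assumption)."""
    return min(SR_PERIODS_MS, key=lambda period: abs(period - estimate_ms))


def min_rtt_estimate(rtts_ms):
    """mRE = mpRTT + D(2 * (apRTT - mpRTT)) over per-packet RTT samples in ms."""
    mp_rtt = min(rtts_ms)                 # minimum per-packet RTT
    ap_rtt = sum(rtts_ms) / len(rtts_ms)  # average per-packet RTT
    return mp_rtt + match_sr_period(2 * (ap_rtt - mp_rtt))


# RTT samples spread evenly over 37..47 ms, as in the uplink observation
samples = list(range(37, 48))
mre = min_rtt_estimate(samples)
```

With these samples the average sits 5 ms above the minimum, so the estimated SR periodicity is 10 ms and mRE comes out at 47 ms, above the raw minimum of 37 ms.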
-
-
Recovery from loss or timeout
- CWND reduced to < RWND for packet loss
- Store RWND value, stop updating RWND until CWND >= RWND again
- Cubic resets, CWND = default initial CWND, for timeout
- Restart ExLL from observation mode
Evaluation
- Receiver- vs. sender-side ExLL: marginal difference, receiver-side slightly better
- A real LTE network, mRTT = 50ms, max throughput = 150 Mbps.
- Both show similar results: stable RTT and throughput, both close to the LTE limits.
- CWND of sender-side ExLL less stable.
- In multiple LTE networks
- Sender-side ExLL slightly worse than receiver-side
- LTE, RTT = 50ms, Tput = 100 -> 50 Mbps
- Smooth transition from 100 Mbps to 50 Mbps
- In different mobility scenarios
- Static Channel:
- 90 Mbps, 50ms, ExLL vs. BBR: more accurate RTT, CWND, less fluctuation in throughput
- High throughput, close to Cubic + low latency, close to BDP, PropRate
- Mobile Channel
- Smaller RTT, smoother CWND in congested time, 20-40, 80-100
- Outperforms others by a large margin
- Multiple Flows
- Fairness (Tput) much better than BBR
- Non-cellular bottleneck adaptation
- 0-30s: non-cellular bottleneck: observation mode
- 30-60s: cellular bottleneck: control mode
- beyond 60s: non-cellular bottleneck: observation mode
- Runs as Cubic for fairness in first 120s, when non-cellular bottleneck disappears, fully utilize bandwidth
- Fair share of bandwidth with other ExLL flows
Applications Performance
- Shows significant improvement in application-level speed
Related Works
- For Internet
- Vegas, FAST: Limited deployment, complex.
- BBR: high latency
- Copa: low throughput
- For Datacenter
- pHost: unrealistic assumptions: core is free, size of flows is known in advance
- ExpressPass: new switch function, overhead in packets
- For Cellular:
- DRWA, CQIC, Verus, CLAW, PropRate: worse performance
Conclusion
- Develop novel techniques that can estimate the cellular link bandwidth and realistic minimum RTT without explicit probing, which can be easily extended to next-generation cellular technologies such as 5G.
- Incorporate the control logic of FAST into ExLL to minimize the latency even in dynamic cellular channel conditions.
- Implement ExLL in both receiver- and sender-side versions, which gives wider deployment opportunities. The receiver-side ExLL can provide an immediate solution for untouched commodity servers, while the sender-side ExLL can provide a fundamental solution for 5G URLLC (ultra-reliable low-latency communication).