SWIFT: Predictive Fast Reroute

SWIFT is the first BGP fast reroute framework that works upon remote transit outages. Remote outages are challenging because they can happen in any network, at any time, and can affect any set of destinations. Furthermore, the Internet converges slowly upon remote outages, resulting in downtime for users.

SWIFT has two key ingredients. First, it quickly localizes the outage and predicts the affected traffic. Second, it reroutes the affected traffic to unaffected backup paths with just a few data-plane rule updates, using precomputed data-plane tags and a 2-stage forwarding table.

[Figure: before the failure]

BGP converges slowly

Upon a remote outage, BGP messages propagate router by router through the network to announce or withdraw routes for prefixes. This propagation is slow, as each router involved takes time to process the messages. For example, in the scenario depicted above, upon the failure of link (5,6), AS 1 may wait tens of seconds before it learns the new routes to reach the 10k prefixes advertised by AS 6. We computed the duration of the bursts of withdrawals observed on 213 BGP sessions during one month, and found that receiving a full burst often takes several seconds. In addition, updating the forwarding entries for all the affected prefixes is also time-consuming. In the end, convergence may last tens of seconds, sometimes minutes.
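
The burst measurement above can be reproduced with a simple grouping heuristic: withdrawals separated by less than a small time gap belong to the same burst. A minimal Python sketch (the 2-second gap threshold and the function name are illustrative assumptions, not the exact methodology used in the measurement):

```python
def group_bursts(timestamps, max_gap=2.0):
    """Group withdrawal timestamps (in seconds) into bursts.

    Two consecutive withdrawals belong to the same burst if they are
    at most `max_gap` seconds apart.
    """
    bursts = []
    current = []
    for t in sorted(timestamps):
        if current and t - current[-1] > max_gap:
            bursts.append(current)
            current = []
        current.append(t)
    if current:
        bursts.append(current)
    return bursts

# Duration of a burst = last withdrawal time - first withdrawal time.
withdrawals = [0.0, 0.3, 0.9, 1.5, 10.0, 10.2, 10.4]
durations = [b[-1] - b[0] for b in group_bursts(withdrawals)]
```

Applied to a month of BGP session logs, this kind of grouping is what yields the per-burst durations discussed above.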

[Figure: before the failure]

Remote outages often happen

Upon the failure of link (5,6), AS 5 starts propagating BGP messages to its neighbors. If AS 5 and AS 2 have no backup routes for the prefixes advertised by AS 6, they drop the traffic for these prefixes and send BGP withdrawals to their neighbors. During the convergence, AS 1 keeps sending traffic on the path (2,5,6), resulting in downtime for AS 1. We found that BGP peers often receive large bursts of withdrawals (see the figure on the right). We confirmed through private discussions with operators that bursts of withdrawals do indeed cause downtime.

[Figure: before the failure]

Operators care about slow BGP convergence! Take a look at our survey of 72 operators.

Fast and accurate outage localization

Immediately after receiving the first BGP messages of a burst, a SWIFTED router runs an inference algorithm to localize the outage and predict which prefixes will be affected—a sort of time-bound Root Cause Analysis (RCA). Based on this inference, the SWIFTED router reroutes the potentially affected prefixes to paths unaffected by the inferred failure. The figure below shows that the SWIFT inference algorithm is accurate for more than 85% of the bursts. Accurate inferences enable SWIFT to fast reroute, and also ensure safety, since SWIFT uses backup routes that are not affected by the failure. For example, upon the failure of the link between AS 5 and AS 6, SWIFT uses AS 3 as next-hop because the path AS 3 advertises does not use the link (5,6).
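
To make the idea concrete, here is a toy Python sketch of such a localization step. It scores each AS link by the fraction of the prefixes traversing it that were already withdrawn, and predicts that every prefix crossing the best-scoring link is affected. The names are illustrative and the real SWIFT algorithm uses a more elaborate fit score; this is a sketch of the principle, not the paper's algorithm:

```python
from collections import defaultdict

def infer_failed_link(rib, withdrawn):
    """Guess which AS link failed from a partial burst of withdrawals.

    rib: dict mapping prefix -> AS path (tuple of AS numbers)
    withdrawn: set of prefixes already withdrawn in the burst
    Returns the suspected link and the set of prefixes crossing it
    (the traffic a SWIFTED router would preventively reroute).
    """
    total = defaultdict(int)   # prefixes routed over each link
    hit = defaultdict(int)     # withdrawn prefixes over each link
    for prefix, path in rib.items():
        for link in zip(path, path[1:]):
            total[link] += 1
            if prefix in withdrawn:
                hit[link] += 1
    # The link whose prefixes were withdrawn in the largest proportion.
    link = max(total, key=lambda l: hit[l] / total[l])
    affected = {p for p, path in rib.items() if link in zip(path, path[1:])}
    return link, affected

# Toy example mirroring the scenario above: prefixes of AS 6 reached
# via the path (1,2,5,6), and link (5,6) failing.
rib = {
    "p1": (1, 2, 5, 6),
    "p2": (1, 2, 5, 6),
    "p3": (1, 2, 5, 7),
    "p4": (1, 3, 6),
}
link, affected = infer_failed_link(rib, withdrawn={"p1", "p2"})
```

Here only "p1" and "p2" are withdrawn so far, yet link (5,6) already stands out: every prefix crossing it was withdrawn, while links (1,2) and (2,5) also carry the unaffected "p3".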

Fast and flexible data-plane rerouting

SWIFT includes a data-plane encoding scheme that enables a router to flexibly match and reroute the traffic for all the prefixes affected by a remote outage with only a few data-plane rule updates. SWIFT relies on a 2-stage forwarding table to speed up data-plane updates. The first stage tags traversing packets with an encoding of the AS path along which they are forwarded, together with the backup next-hop to use should any link on that path fail. The second stage forwards packets according to these tags. By matching on portions of the tags, a SWIFTED router can quickly select the packets traversing any given AS link(s) and reroute them to a precomputed next-hop. The figure below shows the convergence time of a recent Cisco router upon a remote failure affecting 290k prefixes: the router takes 110 seconds to converge, while its SWIFTED version converges within 2 seconds.
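
A toy Python model of the 2-stage table may help. Stage 1 maps each prefix to a tag encoding the AS links on its path plus its primary next-hop; stage 2 normally forwards on that next-hop, but a single reroute rule matching a link identifier inside the tag redirects every affected prefix at once. The class name and the set-based "tag" are simplifications of the paper's bit-level encoding:

```python
class SwiftTable:
    """Simplified model of SWIFT's 2-stage forwarding table."""

    def __init__(self):
        self.stage1 = {}   # prefix -> (tag, primary next-hop)
        self.stage2 = []   # ordered (failed link, backup next-hop) rules

    def install_route(self, prefix, as_path, next_hop):
        # The tag encodes the AS links the traffic will traverse.
        tag = frozenset(zip(as_path, as_path[1:]))
        self.stage1[prefix] = (tag, next_hop)

    def reroute_link(self, failed_link, backup_next_hop):
        # ONE rule update reroutes every prefix crossing failed_link.
        self.stage2.insert(0, (failed_link, backup_next_hop))

    def forward(self, prefix):
        tag, next_hop = self.stage1[prefix]
        for link, backup in self.stage2:   # stage 2: match on the tag
            if link in tag:
                return backup
        return next_hop

# Two prefixes via AS 2; only the one crossing (5,6) gets rerouted.
table = SwiftTable()
table.install_route("p1", (2, 5, 6), next_hop=2)
table.install_route("p2", (2, 5, 7), next_hop=2)
table.reroute_link((5, 6), backup_next_hop=3)
```

The key property this illustrates is that the number of stage-2 updates depends on the number of failed links, not on the number of affected prefixes.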

[Figure: before the failure]

[Figure: before the failure]

Is SWIFT useful if PIC or MPLS Fast Reroute is already used? Yes

Prefix Independent Convergence (PIC) and MPLS fast reroute only work upon local failures (e.g., on your peering links), not remote ones (e.g., in your neighbor's network). SWIFT works upon both local and remote outages.

Is SWIFT deployable on an arbitrary set of nodes? Yes, under reasonable assumptions

We proved that SWIFT rerouting (i) causes no forwarding loop and (ii) can only be beneficial, irrespective of the set of SWIFTED routers. As a result, you can safely deploy SWIFT on one or all of your routers, independently of whether other ASes also deploy SWIFT.

Does SWIFT require cooperation with other networks to be deployable? No

SWIFT does not require any modification of BGP. In addition, data-plane tags are not propagated to other routers: they are added by the SWIFTED router (in the first stage of the forwarding table) and removed by the same router at the egress. For example, if the destination MAC field is used to carry the data-plane tags, the SWIFTED router writes the tag into the destination MAC field in the first stage of the forwarding table, and removes it in the second stage by setting the destination MAC address to that of the actual next-hop. Deploying SWIFT is therefore invisible to other networks, and no cooperation is required.
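
As a sketch of how a 48-bit tag could ride in the destination MAC field (the exact bit layout is an assumption for illustration; the paper's encoding packs path and next-hop information into specific bit ranges):

```python
def tag_to_mac(tag_bits: int) -> str:
    """Pack a 48-bit SWIFT tag into a MAC-address-shaped string."""
    assert 0 <= tag_bits < 2 ** 48, "tag must fit in 48 bits"
    raw = tag_bits.to_bytes(6, "big")
    return ":".join(f"{b:02x}" for b in raw)

def mac_to_tag(mac: str) -> int:
    """Recover the tag from the destination MAC field."""
    return int(mac.replace(":", ""), 16)

# Stage 1 writes the tag; stage 2 matches on it, then overwrites the
# field with the real next-hop's MAC, so the tag never leaves the router.
mac = tag_to_mac(0x123456)
```

Because the rewrite happens before the packet leaves the second stage, neighbors only ever see ordinary Ethernet frames.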

Does SWIFT update the backup next-hops when it receives a BGP advertisement or withdrawal? Yes

Whenever the SWIFTED router receives an advertisement or a withdrawal for a prefix, it recomputes the backup next-hops to use for that prefix in case of a failure, and updates the data-plane tags accordingly. Only one data-plane rule update is required, in the first stage of the forwarding table, to update the data-plane tags of a prefix.

Does SWIFT comply with routing policies? Yes

When SWIFT computes the backup next-hops, it can take into account policies defined by the operators. For example, operators can prevent SWIFT from using an expensive link with a provider rather than a more convenient one with a customer.

Does SWIFT lose traffic if it reroutes unaffected prefixes (false positives)? No, under reasonable assumptions

When SWIFT reroutes, it always uses valid backup paths. Thus, rerouting the traffic for an unaffected prefix does not result in downtime for that prefix. However, other metrics such as delay or jitter can change while that prefix is rerouted.

SWIFT: Predictive Fast Reroute, SIGCOMM'17. Los Angeles, USA. July, 2017.
(paper, technical report, bibtex)

by Thomas Holterbach, Stefano Vissicchio, Alberto Dainotti, Laurent Vanbever

Boosting the BGP convergence in SDXes with SWIFT, SIGCOMM'17. Los Angeles, USA. July, 2017. Demo (paper, poster, bibtex)

by Philipp Mao, Rudiger Birkner, Thomas Holterbach, Laurent Vanbever

SIGCOMM 2017 (August 2017)

by Thomas Holterbach

We release a Python-based implementation of SWIFT and its algorithms.

You can use our implementation to quickly evaluate the benefit SWIFT would provide on your BGP data.
The source code is publicly available on GitHub.

We provide a VM to show SWIFT in action.

Just run one script in the VM and a virtual network with Quagga routers is built.
Simulate a remote failure and see the benefit of SWIFT!
You can download the VM here. The instructions on how to use it are available in the GitHub repository.

Blink is the first data-driven fast reroute framework that works upon remote transit outages. Remote outages are challenging because they can happen in any network, at any time, and can affect any set of destinations. As the Internet converges slowly, remote outages often result in downtime for users.

Blink leverages emerging programmable switches to detect remote failures directly in the data plane by looking at TCP-induced signals. Upon detecting a failure, Blink quickly restores connectivity by rerouting traffic to working backup paths, at line rate.

It can take O(minutes) to receive the first sign of a failure from the control plane

We illustrate slow control-plane convergence with a case study: we measured how long the first BGP updates took to propagate after the Time Warner Cable (TWC) networks were affected by an outage on August 27, 2014. The figure on the right depicts the CDFs of the time between the outage and the first BGP withdrawal that each BGP peer from RIPE RIS and RouteViews received from each TWC AS (continuous lines): more than half of the BGP peers took more than a minute to receive the first update. The CDFs of the time between the outage and the last prefix withdrawal for each AS (dashed lines) show that BGP convergence can be as slow as several minutes.


[Figure: convergence time upon the TWC outage]

TCP flows exhibit a quick and pronounced behavior upon a remote outage

Upon an outage, the retransmission timer of a TCP source eventually expires, triggering it to retransmit the first unacknowledged packet, again and again, following an exponential backoff. Since this behavior is shared by all TCP flows, when multiple flows experience the same failure, the signal obtained by counting the overall retransmissions consists of “retransmission waves”. The figure on the right shows the retransmission count for a trace that we generated with the ns-3 simulator after simulating a link failure. The first retransmissions arrive quickly after the failure, most of them within a short period of time. Blink leverages this signal to quickly infer failures in the data plane.
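
The timing of these waves follows directly from the backoff rule: after each expiry the source retransmits and doubles its retransmission timeout (RTO). A small sketch, where the 200ms initial RTO and 10s horizon are illustrative values:

```python
def retransmission_times(rto, failure_time=0.0, horizon=10.0):
    """Times at which a TCP source retransmits its first unacked
    packet after a failure, with exponential backoff (RTO doubles
    after every retransmission)."""
    times = []
    t = failure_time
    while True:
        t += rto
        if t > horizon:
            break
        times.append(t)
        rto *= 2
    return times

# With a 200ms initial RTO: retransmissions near 0.2s, 0.6s, 1.4s, ...
waves = retransmission_times(0.2)
```

Since many flows share similar RTOs, their first retransmissions cluster into a pronounced wave right after the failure, which is exactly the signal Blink counts.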


[Figure: TCP behavior upon an outage]

Accurate failure inference

Blink infers failures by looking, on a per-prefix basis, at consecutive TCP retransmissions carrying the same next expected sequence number. To fit in a programmable switch with limited resources, Blink monitors a sample of active flows for each monitored prefix. For better accuracy, Blink uses a sliding window to count the number of flows experiencing retransmissions over time. Whenever the majority of the flows destined to a given prefix experience retransmissions within a time window, Blink infers a failure for that prefix. The figure below shows the True Positive Rate of Blink as a function of the traffic pattern, extracted from 15 real traffic traces. Blink achieves a True Positive Rate above 80% for most traffic patterns (green bars), close to an "upper bound" variant of Blink (gray bars) that monitors all flows instead of just a sample.
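
A toy Python version of this sliding-window majority test (the window size, sample size, and class name are illustrative; the real Blink implements this with registers in the switch pipeline, in P4):

```python
from collections import deque

class BlinkMonitor:
    """Per-prefix inference sketch: infer a failure when a majority of
    the sampled flows retransmit within a sliding time window."""

    def __init__(self, n_flows=64, window=0.8):
        self.n_flows = n_flows      # number of sampled flows
        self.window = window        # sliding window, in seconds
        self.events = deque()       # (time, flow_id) retransmissions

    def on_retransmission(self, time, flow_id):
        """Record one retransmission; return True if a failure is
        inferred for this prefix."""
        self.events.append((time, flow_id))
        # Expire events that fell out of the window.
        while self.events and time - self.events[0][0] > self.window:
            self.events.popleft()
        flows_retransmitting = {f for _, f in self.events}
        return len(flows_retransmitting) > self.n_flows // 2
```

With 4 sampled flows, for instance, a third distinct flow retransmitting inside the window tips the count past the majority threshold and triggers the inference.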

Quick connectivity recovery

Whenever Blink detects a failure, it immediately activates, at line rate, backup paths pre-populated by the control plane. Because the rerouting is performed entirely in the data plane, Blink could introduce forwarding issues such as blackholes or loops. The good news is that, upon rerouting, Blink sends a few of the monitored flows to each backup next-hop for a short period of time and uses them to assess (similarly to how it detects failures) which next-hops work. After this period, Blink reroutes all the traffic to the best working backup next-hop. To illustrate how fast Blink reroutes traffic, we ran it on a Barefoot Tofino switch and measured how fast it restores connectivity upon a failure. The figure below shows that the Tofino switch restored connectivity via a backup link within 460ms with sub-1ms RTTs, and within 1.1s with RTTs ranging from 10ms to 300ms.
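
The final selection step can be sketched as follows (names and the boolean probe outcome are simplifications of Blink's actual per-next-hop flow monitoring):

```python
def select_backup(backups, probe_results):
    """Pick the most-preferred backup next-hop whose probe flows
    resumed after rerouting.

    backups: backup next-hops in preference order
    probe_results: dict next_hop -> True if the few flows probed via
                   that next-hop stopped retransmitting (path works)
    Returns the chosen next-hop, or None if no backup works.
    """
    for next_hop in backups:
        if probe_results.get(next_hop):
            return next_hop
    return None

# Next-hop 3 turns out to be broken too, so all traffic goes to 4.
chosen = select_backup([3, 4], {3: False, 4: True})
```

This is why a broken backup path only blackholes the handful of probe flows for a short assessment period, rather than all the rerouted traffic.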



[Figure: before the failure]

[Figure: before the failure]

Blink: Fast Connectivity Recovery Entirely in the Data Plane, NSDI'19. Boston, USA. February, 2019.
(paper, bibtex)

by Thomas Holterbach, Edgar Costa Molero, Maria Apostolaki, Stefano Vissicchio, Alberto Dainotti, Laurent Vanbever

We release a Python-based implementation of Blink.

You can use our implementation to try Blink on traffic traces and quickly see how it would perform.
The source code is publicly available on GitHub.

We provide all the materials required to run Blink in a virtual environment.

This includes the implementation of Blink in P4_16, a set of scripts and tools to easily build a VM using vagrant and run a mininet-based network with p4 switches, as well as the instructions to run Blink.
All of this is also publicly available on GitHub.