Chameleon
Taming the transient while reconfiguring BGP
Abstract
BGP reconfigurations are a daily occurrence for most network operators, especially in large networks. Yet, performing safe and robust BGP reconfiguration changes is still an open problem. Few BGP reconfiguration techniques exist, and they are either (i) unsafe, because they ignore transient states, which can easily lead to invariant violations; or (ii) impractical, as they duplicate the entire routing and forwarding states, and require special hardware.
In this paper, we introduce Chameleon, the first BGP reconfiguration framework capable of maintaining correctness throughout a reconfiguration campaign while relying on standard BGP functionalities and minimizing state duplication. Akin to concurrency coordination in distributed systems, Chameleon models the reconfiguration process with happens-before relations. This modeling allows us to capture the safety properties of transient BGP states. We then use this knowledge to precisely control the BGP route propagation and convergence, so that input invariants are provably preserved at any time during the reconfiguration.
We fully implement Chameleon and evaluate it in both testbeds and simulations, on real-world topologies and large-scale reconfiguration scenarios. In most experiments, our system computes reconfiguration plans within a minute, and performs them from start to finish in a few minutes, with minimal overhead.
Overview
Chameleon generates a reconfiguration plan in three consecutive steps.
- The analyzer describes the space of concurrent convergence processes by analyzing the initial and final configuration (the input to Chameleon) and computing happens-before relations between routing states of different routers.
- The scheduler explores the space of convergence processes spanned by the happens-before relations to find one that satisfies the specification. It describes this convergence process as a node schedule that captures which routes are selected at which time.
- The compiler transforms this node schedule into a reconfiguration plan, that is, a sequence of temporary configuration commands and local conditions for synchronization.
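The scheduling step above can be pictured with a toy search. The sketch below is not Chameleon's algorithm: it brute-forces every ordering, and the names `respects`, `find_schedule`, and `no_early_n3` are hypothetical. It assumes that each router migrates exactly once, that happens-before relations are given as ordered pairs of routers, and that the specification is a predicate over the set of routers migrated so far.

```python
from itertools import permutations


def respects(order, happens_before):
    """True if, for every pair (a, b), router a is scheduled before router b."""
    pos = {node: i for i, node in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in happens_before)


def find_schedule(nodes, happens_before, invariant):
    """Return the first ordering that respects the happens-before pairs and
    keeps the invariant true after every single migration, or None."""
    for order in permutations(nodes):
        if not respects(order, happens_before):
            continue
        migrated = set()
        for node in order:
            migrated.add(node)
            if not invariant(frozenset(migrated)):
                break  # this ordering violates the specification
        else:
            return list(order)
    return None


# Hypothetical input: n2 must switch before n1 can learn the new route.
nodes = ["n1", "n2", "n3"]
happens_before = [("n2", "n1")]


def no_early_n3(migrated):
    """Hypothetical invariant: n3 must not migrate while n1 still uses the old route."""
    return not ("n3" in migrated and "n1" not in migrated)


schedule = find_schedule(nodes, happens_before, no_early_n3)
print(schedule)  # ['n2', 'n1', 'n3']
```

Exhaustive enumeration only works for a handful of routers; it is used here purely to make the roles of the happens-before relations and the specification visible.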
Chameleon’s workflow is visualized in the figure above based on an example. You can interactively play with this example below using our web-based network simulator!
Example
The following demonstrates a reconfiguration plan computed by Chameleon.
Initially, all routers forward traffic to the network e1, which advertises a route for the prefix 100.0.0.0/24.
The network e6 advertises the same prefix, but its route has a lower local preference.
The reconfiguration lowers the local preference of the route towards e1 from 200 to 50, causing all routers to forward traffic to the right (towards e6) instead of to the left (towards e1).
            
The network is configured with two route reflectors: n2 on the top-left and n5 on the bottom-right.
All other routers peer with both n2 and n5.
Select the "BGP Config" layer to see the current configuration.
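The effect of the local-preference change described above can be reproduced with a minimal sketch of the first step of the BGP decision process (highest local preference wins). The `Route` class and `best_route` function are hypothetical, and the value 100 for the route towards e6 is an assumption: the example only states that it is lower than the initial 200 for e1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    egress: str      # external network the route leads to
    local_pref: int


def best_route(routes):
    """Pick the route with the highest local preference.
    Later tie-breakers of the BGP decision process are ignored here."""
    return max(routes, key=lambda r: r.local_pref)


PREFIX = "100.0.0.0/24"
E6_LOCAL_PREF = 100  # assumption: any value between 50 and 200 gives the same outcome

before = [Route(PREFIX, "e1", 200), Route(PREFIX, "e6", E6_LOCAL_PREF)]
after = [Route(PREFIX, "e1", 50), Route(PREFIX, "e6", E6_LOCAL_PREF)]

print(best_route(before).egress)  # e1
print(best_route(after).egress)   # e6
```

The sketch only compares the initial and final states. What happens in between, while routers switch one after another, is the transient that Chameleon controls.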
            
Explanation of the Reconfiguration Plan
The reconfiguration plan (and its current progress) can be shown by clicking on "Migration". The right-hand side then shows the three main phases of the plan: the setup, update, and cleanup phases.
In the setup phase, Chameleon establishes two temporary BGP sessions (one between n1 and n4, and one between n3 and n6) and sets the initial preferences for all routers.
Similarly, the cleanup phase removes all preferences and the temporary BGP sessions (which are no longer used).
In both the setup and cleanup phases, the forwarding state of the network remains unchanged.
            
The update phase performs the actual reconfiguration. It is divided into multiple steps (five in this example). Each step consists of several individual commands that together migrate a single router. The commands have pre- and post-conditions to ensure synchronization.
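The idea of commands guarded by pre- and post-conditions can be sketched as follows. This is an illustration under stated assumptions, not Chameleon's plan format: the `Command` class, `run_step`, the state representation (a map from router to selected egress), and the example conditions are all hypothetical. A real executor would presumably wait until a condition holds, whereas this sketch fails immediately.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]  # router name -> egress it currently forwards to


@dataclass
class Command:
    description: str
    precondition: Callable[[State], bool]
    apply: Callable[[State], None]
    postcondition: Callable[[State], bool]


def run_step(commands: List[Command], state: State) -> List[str]:
    """Run the commands of one step in order, checking each condition."""
    log = []
    for cmd in commands:
        if not cmd.precondition(state):
            raise RuntimeError(f"precondition failed: {cmd.description}")
        cmd.apply(state)
        if not cmd.postcondition(state):
            raise RuntimeError(f"postcondition failed: {cmd.description}")
        log.append(cmd.description)
    return log


# Hypothetical step that migrates n4 only after n1 already forwards to e6.
step_n4 = [
    Command(
        description="n4: prefer the route towards e6",
        precondition=lambda s: s["n1"] == "e6",
        apply=lambda s: s.update(n4="e6"),
        postcondition=lambda s: s["n4"] == "e6",
    ),
]

state = {"n1": "e6", "n4": "e1"}
executed = run_step(step_n4, state)
print(state["n4"])  # e6
```

Because each command checks only conditions on the observed state, routers can be synchronized without a global lock: a step simply does not proceed until the routers it depends on have reached the expected state.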
BibTeX
@inproceedings{schneider2023taming,
    title = {{Taming the transient while reconfiguring BGP}},
    author = {Schneider, Tibor and Schmid, Roland and Vissicchio, Stefano and Vanbever, Laurent},
    booktitle = {Proceedings of the ACM SIGCOMM 2023 Conference},
    series = {ACM SIGCOMM '23},
    year = {2023},
    pages = {77--93},
    numpages = {17},
    address = {New York, NY, USA},
    location = {New York, NY, USA},
    url = {https://doi.org/10.1145/3603269.3604855},
    doi = {10.1145/3603269.3604855},
    isbn = {9798400702365},
    publisher = {Association for Computing Machinery},
    keywords = {reconfiguration, network update, border gateway protocol (BGP), convergence, scheduling},
}