If you have ever watched a multi-tenant overlay suddenly mix traffic from two completely unrelated clients into the same VXLAN tunnel, you know the sinking feeling that follows. The root cause is almost never a bug in the underlay. It is a routing policy that was designed without tenant awareness—policies that treat all flows as equal, ignoring the tenant context embedded in packet headers or metadata. This article is for platform engineers and network architects who have already deployed basic multi-tenant overlays and now need to evolve them into systems that preserve flow integrity across tenant boundaries. We will walk through the design of tenant-aware routing policies, covering core mechanisms, a repeatable workflow, tooling considerations, variations for different deployment models, and the most common failure modes.
The Problem: What Breaks When Routing Ignores Tenants
Without explicit tenant awareness in routing policies, overlay networks exhibit three classes of failure that erode flow integrity. First, cross-tenant flow interleaving occurs when traffic from two tenants shares the same overlay path and, because ECMP hashes on the five-tuple only, packets from distinct tenants are interleaved in a way that breaks session state on stateful middleboxes. Second, policy conflicts at gateway boundaries emerge when a single gateway processes traffic for multiple tenants but applies a uniform routing rule, for example sending all traffic through a single inspection appliance that then cannot distinguish tenant A from tenant B. Third, flow asymmetry appears when forward and return paths pass through different tenant contexts, causing stateful firewalls to drop legitimate traffic.
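To make the first failure class concrete, here is a minimal sketch of a path selector whose hash input is the five-tuple only, next to one that also includes the tenant's VNI. The `Flow` type, the key functions, and the use of CRC32 as a stand-in for a hardware ECMP hash are all assumptions made for illustration, not a description of any particular switch.

```python
import zlib
from typing import NamedTuple


class Flow(NamedTuple):
    vni: int        # tenant identifier carried by the overlay
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def five_tuple_key(flow: Flow) -> bytes:
    """Hash input that ignores the tenant: only the five-tuple."""
    parts = (flow.src_ip, flow.dst_ip, flow.proto, flow.src_port, flow.dst_port)
    return "|".join(str(p) for p in parts).encode()


def tenant_aware_key(flow: Flow) -> bytes:
    """Hash input that prepends the VNI to the five-tuple."""
    return f"{flow.vni}|".encode() + five_tuple_key(flow)


def pick_path(key: bytes, num_paths: int) -> int:
    """Deterministic stand-in for an ECMP hash: CRC32 modulo the path count."""
    return zlib.crc32(key) % num_paths


# Two tenants, identical five-tuples.
tenant_a = Flow(100, "10.0.0.5", "10.0.1.9", 6, 40000, 80)
tenant_b = Flow(200, "10.0.0.5", "10.0.1.9", 6, 40000, 80)

# With a five-tuple-only key the two flows are indistinguishable to the selector.
same_bucket = pick_path(five_tuple_key(tenant_a), 4) == pick_path(five_tuple_key(tenant_b), 4)

# With the VNI in the key, the selector at least sees two different flows.
keys_differ = tenant_aware_key(tenant_a) != tenant_aware_key(tenant_b)

print(same_bucket, keys_differ)
```

Including the VNI in the key does not guarantee that the two tenants land on different paths (hashes can still collide); it only ensures that the path decision is made per tenant flow rather than per five-tuple.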
In a typical real-world scenario, a platform team deploys an overlay using VXLAN or Geneve and assigns each tenant a unique VNI (Virtual Network Identifier). The underlay routing, however, remains tenant-agnostic: BGP or OSPF sees only the outer IP headers. When two tenants generate flows with the same five-tuple (e.g., both are HTTP clients talking to the same server IP), the underlay ECMP hash may place them on different paths, but the overlay's forwarding state—like MAC tables or ARP caches—might conflate them. The result: intermittent timeouts, misrouted packets, and a debugging nightmare.
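The conflation of forwarding state can be illustrated with two toy MAC tables: one keyed by MAC address alone, the other keyed by the (VNI, MAC) pair. This is a simplified sketch; the table names, the `learn` function, and the addresses are hypothetical.

```python
from typing import Dict, Tuple

# Tenant-blind table: MAC address -> remote tunnel endpoint.
mac_only_table: Dict[str, str] = {}

# Tenant-scoped table: (VNI, MAC address) -> remote tunnel endpoint.
tenant_scoped_table: Dict[Tuple[int, str], str] = {}


def learn(vni: int, mac: str, vtep_ip: str) -> None:
    """Record where a MAC address was seen, in both table styles."""
    mac_only_table[mac] = vtep_ip
    tenant_scoped_table[(vni, mac)] = vtep_ip


# Two tenants happen to use the same MAC address behind different endpoints.
learn(100, "02:00:00:00:00:01", "192.0.2.11")
learn(200, "02:00:00:00:00:01", "192.0.2.22")

# The tenant-blind table kept only the last entry; tenant A's was overwritten.
print(len(mac_only_table), mac_only_table["02:00:00:00:00:01"])

# The tenant-scoped table keeps both entries apart.
print(len(tenant_scoped_table), tenant_scoped_table[(100, "02:00:00:00:00:01")])
```

In the tenant-blind table, traffic for tenant A would now be sent to tenant B's endpoint, which is the kind of intermittent misrouting described above.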
Why Traditional Policy Frameworks Fall Short
Most routing policy languages—whether based on route maps, prefix lists, or BGP communities—operate on packet headers that do not include tenant identifiers. Even when the overlay encapsulates the tenant ID in a VNI or a Geneve option, the routing decision layer (e.g., the underlay router) typically ignores these fields. Overlay-agnostic routing policies cannot enforce tenant isolation at the forwarding plane. This forces teams to resort to brute-force solutions like dedicated underlay VLANs per tenant, which defeat the purpose of an overlay.
Prerequisites: What You Need Before Designing Tenant-Aware Policies
Before you can write a single routing policy that respects tenant boundaries, you must establish three foundations: a consistent tenant identification scheme, an overlay encapsulation that preserves that identity across hops, and a control plane that can propagate tenant context to routing decision points.
Tenant Identification Scheme
Every packet must carry a stable, unambiguous tenant identifier from the moment it enters the overlay until it exits. Common schemes include VNI in VXLAN, VSID in NVGRE, or custom Geneve TLV options. The identifier must be consistent across all endpoints—if tenant A is VNI 100 on one host, it must be VNI 100 everywhere. This sounds trivial, but in practice, many teams assign VNIs per subnet or per virtual network, not per tenant, leading to overlaps when a tenant spans multiple subnets.
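A consistency check along these lines can be run against whatever inventory holds your VNI assignments. The sketch below assumes the article's one-VNI-per-tenant model and a hypothetical list of (host, tenant, VNI) records; it reports tenants that were given more than one VNI and VNIs that are shared by more than one tenant.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Assignment = Tuple[str, str, int]  # (host, tenant, vni)


def find_vni_conflicts(
    assignments: Iterable[Assignment],
) -> Tuple[Dict[str, List[int]], Dict[int, List[str]]]:
    """Return (tenants with several VNIs, VNIs used by several tenants)."""
    vnis_by_tenant = defaultdict(set)
    tenants_by_vni = defaultdict(set)
    for _host, tenant, vni in assignments:
        vnis_by_tenant[tenant].add(vni)
        tenants_by_vni[vni].add(tenant)

    split = {t: sorted(v) for t, v in vnis_by_tenant.items() if len(v) > 1}
    shared = {v: sorted(t) for v, t in tenants_by_vni.items() if len(t) > 1}
    return split, shared


inventory = [
    ("host1", "tenant-a", 100),
    ("host2", "tenant-a", 101),  # same tenant, different VNI (per-subnet style)
    ("host3", "tenant-b", 100),  # different tenant, reused VNI
]

split_tenants, shared_vnis = find_vni_conflicts(inventory)
print(split_tenants)
print(shared_vnis)
```

If your design intentionally gives a tenant several VNIs (for example one per subnet plus one for routing), the first report is expected to be non-empty and only the second one signals a real overlap.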
Overlay Encapsulation That Preserves Identity
Choose an encapsulation that supports the identifier at a layer visible to routing policies. VXLAN carries a 24-bit VNI in the VXLAN header, which sits behind the outer IP and UDP headers; underlay routers can inspect it only if they are overlay-aware (e.g., switches that perform VXLAN termination). Geneve also carries a 24-bit VNI and adds variable-length options, allowing you to embed richer tenant metadata. However, not all network devices can parse Geneve options; if your underlay routers are not overlay-aware, the tenant ID remains invisible to them, and you must rely on host-based routing policies instead.
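For reference, the VXLAN header defined in RFC 7348 is 8 bytes long: an 8-bit flags field (with the I flag, 0x08, marking the VNI as valid), 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The sketch below builds and parses that header with the standard library; the function names are illustrative.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348
VXLAN_HEADER_LEN = 8


def build_vxlan_header(vni: int) -> bytes:
    """Pack an 8-byte VXLAN header carrying the given 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags in the top byte, 24 reserved bits.
    # Word 1: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    if len(header) < VXLAN_HEADER_LEN:
        raise ValueError("VXLAN header is 8 bytes long")
    word0, word1 = struct.unpack("!II", header[:VXLAN_HEADER_LEN])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word1 >> 8


header = build_vxlan_header(100)
print(header.hex(), parse_vni(header))
```

A device that wants to match on the tenant has to reach this header, which is why a router that only looks at the outer IP header cannot see the VNI.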
Control Plane Integration
Tenant-aware routing requires a control plane that communicates tenant context to the routing daemon. EVPN with MPLS or VXLAN is the most common choice: it carries the VNI in route advertisements, allowing BGP to associate each route with a specific tenant. Alternatively, a centralized SDN controller can program tenant-specific forwarding rules on each host or switch. Without such integration, you end up with static, per-tenant routing tables that are brittle and hard to maintain.
Core Workflow: Designing Tenant-Aware Routing Policies Step by Step
The following workflow assumes you have the prerequisites in place. It produces a set of routing policies that preserve flow integrity by ensuring all packets from a given tenant follow consistent, isolated paths.
Step 1: Map Tenant Identifiers to Routing Domains
Assign each tenant a unique routing domain: a virtual routing and forwarding (VRF) instance on each overlay endpoint. The VRF isolates the tenant's routing table, so routes from tenant A never leak into tenant B's table. On EVPN/VXLAN fabrics, this maps directly: each VNI belongs to a VRF. Ensure that each VRF's import and export route targets (RTs) are unique to the tenant, for example by deriving them from the tenant's VNI, and that they are identical on every endpoint that hosts the tenant.
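The relationship between VNIs, VRFs, and route targets can be modelled in a few lines. The sketch below assumes a route target of the form ASN:VNI, which is a common convention but not the only one; the `Vrf` class, `make_vrf`, `advertise`, and the AS number 65000 are hypothetical. A route is imported into another VRF only when the source's export RTs intersect that VRF's import RTs.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Vrf:
    name: str
    vni: int
    import_rts: FrozenSet[str]
    export_rts: FrozenSet[str]
    routes: Dict[str, str] = field(default_factory=dict)  # prefix -> next hop


def make_vrf(tenant: str, vni: int, asn: int = 65000) -> Vrf:
    """Create a tenant VRF whose import/export RT is derived from its VNI."""
    rt = f"{asn}:{vni}"
    return Vrf(f"vrf-{tenant}", vni, frozenset({rt}), frozenset({rt}))


def advertise(source: Vrf, prefix: str, next_hop: str, fabric: List[Vrf]) -> None:
    """Install a route locally, then import it wherever the RTs match."""
    source.routes[prefix] = next_hop
    for vrf in fabric:
        if vrf is not source and source.export_rts & vrf.import_rts:
            vrf.routes[prefix] = next_hop


# Tenant A lives on two leaves, tenant B on one.
vrf_a_leaf1 = make_vrf("a", 100)
vrf_a_leaf2 = make_vrf("a", 100)
vrf_b_leaf1 = make_vrf("b", 200)
fabric = [vrf_a_leaf1, vrf_a_leaf2, vrf_b_leaf1]

advertise(vrf_a_leaf1, "10.0.0.0/24", "192.0.2.11", fabric)

print(vrf_a_leaf2.routes)  # tenant A's other leaf imported the route
print(vrf_b_leaf1.routes)  # tenant B's table is untouched
```

The same model also shows how leaks happen: giving tenant B's VRF an import RT of 65000:100 would pull tenant A's prefix into its table.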
Step 2: Define Tenant-Aware Forwarding Policies
Write forwarding policies that use the tenant identifier as a matching criterion. For example, on an EVPN gateway, you might configure a policy that matches on VNI 100 and applies a specific next-hop list (e.g., tenant A's firewall cluster). On a host-based overlay like Calico or Cilium, this translates to network policies that include tenant labels. The key is that the policy must be evaluated after the tenant ID is resolved, not before.
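As a rough model of such a policy, the sketch below resolves the tenant first and only then looks up that tenant's next-hop list, picking one member by a hash of the flow so that a given flow keeps using the same appliance. The policy table, the next-hop names, and the packet dictionary layout are invented for illustration.

```python
import zlib
from typing import Dict, List

# Hypothetical policy table: VNI -> ordered next-hop list for that tenant.
TENANT_NEXT_HOPS: Dict[int, List[str]] = {
    100: ["fw-a-1", "fw-a-2"],  # tenant A's firewall cluster
    200: ["fw-b-1"],            # tenant B's firewall
}


def resolve_tenant(packet: dict) -> int:
    """Step 1: resolve the tenant ID. Refuse packets without one."""
    vni = packet.get("vni")
    if vni is None:
        raise ValueError("tenant not resolved; refusing to evaluate policy")
    return vni


def select_next_hop(packet: dict) -> str:
    """Step 2: evaluate the tenant's policy, only after the tenant is known."""
    vni = resolve_tenant(packet)
    next_hops = TENANT_NEXT_HOPS.get(vni)
    if not next_hops:
        raise LookupError(f"no policy for VNI {vni}")
    flow = "|".join(
        str(packet[k]) for k in ("src_ip", "dst_ip", "proto", "src_port", "dst_port")
    )
    # Per-flow choice inside the tenant's own next-hop list.
    return next_hops[zlib.crc32(flow.encode()) % len(next_hops)]


packet = {"vni": 100, "src_ip": "10.0.0.5", "dst_ip": "10.0.1.9",
          "proto": 6, "src_port": 40000, "dst_port": 80}
print(select_next_hop(packet))
```

Because the lookup is scoped to the tenant's own list, a packet from VNI 100 can never be handed to tenant B's firewall, whatever its five-tuple hashes to.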
Step 3: Enforce Symmetric Return Paths
Flow integrity demands that return packets follow the same path as forward packets, at least through stateful devices. To achieve this, configure the overlay endpoints to use the same tenant identifier for return traffic. In VXLAN, each packet carries a single VNI, so the VNI on return packets must match the VNI used in the forward direction. Some implementations (e.g., with symmetric IRB) handle this automatically; others require explicit policy to associate VNIs on both sides.
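The effect of a VNI mismatch on a stateful device can be sketched with a toy connection tracker that scopes its sessions by tenant. The `TenantConntrack` class and its method names are hypothetical and much simpler than a real firewall's state machine; the point is only that return traffic arriving with a different VNI finds no matching session.

```python
from typing import Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class TenantConntrack:
    """Toy stateful filter whose sessions are keyed by (VNI, flow)."""

    def __init__(self) -> None:
        self._sessions: Set[Tuple[int, int, Endpoint, Endpoint]] = set()

    def outbound(self, vni: int, proto: int, src: Endpoint, dst: Endpoint) -> None:
        """Record state for a forward-direction packet."""
        self._sessions.add((vni, proto, src, dst))

    def inbound_allowed(self, vni: int, proto: int, src: Endpoint, dst: Endpoint) -> bool:
        """Allow a return packet only if the reversed flow exists for this VNI."""
        return (vni, proto, dst, src) in self._sessions


fw = TenantConntrack()
client, server = ("10.0.0.5", 40000), ("10.0.1.9", 80)

fw.outbound(100, 6, client, server)

# Return traffic carrying the same VNI matches the session.
print(fw.inbound_allowed(100, 6, server, client))

# The same return packet tagged with another VNI finds no state and is dropped.
print(fw.inbound_allowed(200, 6, server, client))
```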
Step 4: Test with Multi-Tenant Traffic Generators
Simulate concurrent flows from multiple tenants, using the same five-tuple where possible, and verify that each flow remains within its designated VRF and path. Check that stateful firewalls see consistent tenant context—for example, by inspecting the VNI in the outer header on both directions. Automated testing is critical here; manual testing rarely catches interleaving issues.
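A test of this kind can be prototyped before any real traffic generator is involved. The sketch below generates flows for several tenants that deliberately share the same five-tuples, runs them through a forwarder under test, and lists every flow that ended up in a VRF other than its own. The forwarder functions and the VNI-to-VRF table are placeholders; in practice the forwarder would be replaced by a probe of the real data path.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, List

EXPECTED_VRF: Dict[int, str] = {100: "vrf-a", 200: "vrf-b"}


def generate_flows(vnis: Iterable[int], src_ports: Iterable[int]) -> Iterator[dict]:
    """Yield flows whose five-tuples are identical across tenants."""
    for vni, port in product(vnis, src_ports):
        yield {"vni": vni, "src_ip": "10.0.0.5", "dst_ip": "10.0.1.9",
               "proto": 6, "src_port": port, "dst_port": 80}


def tenant_aware_forwarder(flow: dict) -> str:
    """Forwarder that selects the VRF from the flow's VNI."""
    return EXPECTED_VRF[flow["vni"]]


def tenant_blind_forwarder(flow: dict) -> str:
    """Deliberately broken forwarder: ignores the VNI entirely."""
    return "vrf-a"


def find_leaks(forwarder: Callable[[dict], str], flows: List[dict]) -> List[dict]:
    """Return every flow that landed outside its tenant's VRF."""
    return [f for f in flows if forwarder(f) != EXPECTED_VRF[f["vni"]]]


flows = list(generate_flows([100, 200], range(40000, 40003)))

print(len(find_leaks(tenant_aware_forwarder, flows)))
print(len(find_leaks(tenant_blind_forwarder, flows)))
```

Keeping a known-bad forwarder in the suite is a cheap way to confirm that the check itself can detect a leak.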
Tools, Setup, and Environment Realities
Tenant-aware routing policies depend heavily on the overlay technology and the control plane you choose. We evaluate three common approaches: EVPN/VXLAN fabrics, host-based overlays (e.g., Cilium, Calico), and SDN controllers with OpenFlow.
EVPN/VXLAN Fabrics
This is the most mature option for data center overlays. Modern switches from Cisco (NX-OS), Arista (EOS), and Juniper (Junos) support VXLAN termination and can match on VNI in route policies. For example, an Arista EOS switch can use a route-map with the VNI as a match criterion. The limitation is that underlay routers without overlay awareness cannot use these policies; you need a fabric where every L3 hop is overlay-capable. Many teams deploy a spine-leaf fabric with VXLAN-capable leaf switches and overlay-aware spines, or use an IP fabric with EVPN and rely on the leaf switches as the routing policy enforcement points.
Host-Based Overlays
In Kubernetes or VM environments, host networking layers like Cilium (with eBPF) or Calico (with iptables or eBPF) can enforce tenant-aware policies at the host level. These tools use labels or annotations to assign tenant identity, then translate them into forwarding rules. The advantage is that the host can inspect the full packet, including encapsulation options, and apply fine-grained policies per flow. The downside is that policies are only enforced at the host: traffic that leaves the host (e.g., to a physical gateway) loses this context unless the gateway is also tenant-aware.
SDN Controllers
Controllers like VMware NSX or OpenDaylight can program tenant-aware forwarding on both virtual and physical switches. They maintain a global view of tenant identifiers and can push flow entries that match on VNI or Geneve options. This approach offers the richest policy capabilities but introduces a control-plane dependency that can become a bottleneck or a single point of failure. Many teams use a hybrid model: SDN for policy orchestration, with EVPN handling route distribution across the fabric.
Variations for Different Constraints
Not every environment can support full EVPN/VXLAN fabrics or host-based overlays. Here are three common variations and how to adapt the workflow.
Hybrid Cloud with Physical and Virtual Gateways
When a tenant spans on-premises and cloud, the overlay must extend across both domains. In this scenario, use a cloud-native transit service like AWS Transit Gateway or Azure Virtual WAN, where the provider's own network identifiers (e.g., AWS VPC IDs) play a role similar to a VNI. The challenge is that the cloud provider's routing policies may not accept custom identifiers. The workaround is to map each tenant to a separate cloud VPC or virtual network, and then use peering with tenant-specific route tables. This approach preserves flow integrity but at the cost of increased VPC management overhead.
Edge Deployments with Limited Compute
Edge nodes often lack the CPU to run full eBPF or VXLAN termination. In such cases, simplify the overlay to use MPLS with a single label per tenant, or use GRE tunnels with a tenant-specific key. The routing policy then matches on the MPLS label or GRE key. The trade-off is that MPLS and GRE offer less encapsulation flexibility than VXLAN, and the number of tenants is bounded by the identifier space: 20 bits for an MPLS label (smaller than VXLAN's 24-bit VNI) and 32 bits for a GRE key.
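The identifier spaces are easy to compare with a little arithmetic. The snippet below prints the number of distinct values for each field width; it also subtracts the MPLS labels 0 through 15, which are reserved for special purposes, to get the count actually available for tenants.

```python
# Field widths in bits for the tenant identifiers discussed in this article.
ID_BITS = {"MPLS label": 20, "VXLAN VNI": 24, "GRE key": 32}

for name, bits in ID_BITS.items():
    print(f"{name}: {bits} bits -> {2**bits:,} values")

# MPLS labels 0-15 are reserved, so they cannot be assigned to tenants.
MPLS_RESERVED_LABELS = 16
usable_mpls_labels = 2**ID_BITS["MPLS label"] - MPLS_RESERVED_LABELS
print(f"usable MPLS labels: {usable_mpls_labels:,}")
```

In practice the usable range is often narrower still, because platforms carve the label space into blocks for different purposes; treat these numbers as upper bounds.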
Legacy Underlay with No Overlay Awareness
If your underlay routers cannot parse VNI or Geneve options, you must enforce tenant-aware policies entirely at the overlay endpoints (hosts or virtual switches). This means the host must perform all routing decisions, including path selection and load balancing, based on tenant identity. Use a host-based overlay like Calico with BGP peering to the underlay, but ensure that the host's forwarding table includes tenant-specific next hops. The limitation is that traffic that leaves the host to a non-aware router may still experience flow interleaving; to mitigate this, avoid ECMP on the underlay for inter-tenant traffic, or use per-tenant tunnels that are pinned to specific underlay paths.
Pitfalls, Debugging, and What to Check When It Fails
Even with careful design, tenant-aware routing policies fail in predictable ways. Here are the most common issues and how to diagnose them.
Pitfall 1: VNI Mismatch at Gateway Boundaries
When traffic crosses from an overlay to a non-overlay network (e.g., to the internet), the gateway must strip the encapsulation and then apply tenant-aware routing. If the gateway does not preserve the tenant mapping, return traffic may be injected into the wrong VRF. Check: Verify that the gateway's routing table uses the correct VNI-to-VRF mapping for return traffic. Use show vrf and show ip route vrf on the gateway to confirm.
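When many gateways are involved, the comparison can be automated. The sketch below assumes you already have the intended VNI-to-VRF mapping and a mapping collected from a gateway by some means of your own (how it is collected is outside this example); `diff_vni_vrf` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def diff_vni_vrf(
    expected: Dict[int, str], observed: Dict[int, str]
) -> Tuple[List[int], List[int], Dict[int, Tuple[str, str]]]:
    """Compare intended and observed VNI-to-VRF mappings.

    Returns (VNIs missing on the gateway, VNIs present but not intended,
    VNIs bound to the wrong VRF as {vni: (expected_vrf, observed_vrf)}).
    """
    missing = sorted(v for v in expected if v not in observed)
    unexpected = sorted(v for v in observed if v not in expected)
    mismatched = {
        v: (expected[v], observed[v])
        for v in expected
        if v in observed and expected[v] != observed[v]
    }
    return missing, unexpected, mismatched


intended = {100: "vrf-a", 200: "vrf-b", 300: "vrf-c"}
on_gateway = {100: "vrf-a", 200: "vrf-a", 400: "vrf-d"}

missing, unexpected, mismatched = diff_vni_vrf(intended, on_gateway)
print(missing, unexpected, mismatched)
```

A non-empty `mismatched` result is the case described above: return traffic for that VNI would be injected into another tenant's VRF.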
Pitfall 2: ECMP Hashing on Outer Headers
Underlay ECMP often hashes on the outer IP header only, ignoring VNI. This can cause flows from the same tenant to take different underlay paths, breaking flow integrity if a stateful device is in the path. Check: Configure the underlay to include the VNI in the hash computation, or disable ECMP for overlay traffic and use a single path per tenant. Many switch vendors support