Troubleshooting Localized DHCP Failures on Aruba CX Switches

A practical walkthrough of troubleshooting a localized DHCP failure on an Aruba 6200 CX switch caused by strict DHCP snooping and stale client bindings.

Have you ever stared at a network issue that just didn't make logical sense?

As network engineers, we're used to things being relatively deterministic. But recently, I ran into a localized DHCP anomaly on an Aruba 6200 running AOS-CX that perfectly illustrates what happens when strict Layer 2 security features collide with modern wireless client behavior.

Here’s a look at how I troubleshot the issue, found the smoking gun, and ultimately fixed it.

The Setup and the Symptom

My environment is a standard hierarchical design: a firewall acting as the DHCP server, a core switch, and multiple access switches. Connected to those access switches are wireless Access Points (APs) broadcasting various SSIDs mapped to specific VLANs.

Everything was humming along perfectly until one morning, out of nowhere, wireless clients connected to one specific access switch stopped receiving IP addresses.

To make things even weirder, the issue was incredibly specific:

  • It was localized: Clients on the exact same VLANs, but connected to other access switches, were pulling IPs without breaking a sweat.
  • It was partial: On the affected switch, clients on VLAN 20 were getting IPs just fine. But clients on VLANs 10, 11, and 12 were getting APIPA addresses (169.254.x.x) and dropping off the network.
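If you need to spot this symptom across a list of client addresses, a quick check for the link-local range helps. This is a minimal sketch using Python's standard ipaddress module; the sample addresses are hypothetical and not taken from the affected network.

```python
import ipaddress

def is_apipa(address: str) -> bool:
    """Return True if the address falls in the IPv4 link-local range 169.254.0.0/16."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip.is_link_local

# Hypothetical client addresses, for illustration only.
for addr in ["169.254.17.3", "10.0.10.25"]:
    label = "APIPA (no DHCP lease)" if is_apipa(addr) else "not APIPA"
    print(addr, label)
```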

Because the other access switches were working perfectly, I could immediately rule out a few major suspects. The firewall wasn't out of DHCP leases, and the core switch wasn't suffering from a broken IP Helper or routing misconfiguration. DHCP was clearly working end to end for the rest of the building. It was only failing for clients behind this one access switch.

The problem had to be on the uplink, the access switch itself, or the switch ports facing the APs.

The Troubleshooting Process

My first instinct was to check the Layer 2 path and look for any standard misconfigurations.

  • VLAN Database & Trunks: Had someone accidentally pruned a VLAN? Nope. I verified that VLANs 10-12 and 20 existed in the local VLAN database and were permitted across the trunk uplink to the core.
  • Spanning Tree (STP): I checked the AP-facing ports to ensure a topology change hadn't thrown those specific VLANs into a Blocking state. The ports were correctly configured as edge ports with root and BPDU guards in place. The STP state was happily Forwarding.
  • DHCP Snooping Trust: The most common culprit for localized DHCP failures is an untrusted uplink. If a switch reboots and loses an unsaved trust setting, it'll do exactly what it's supposed to do: drop server-originated DHCP messages, such as Offers and ACKs, that arrive on the now-untrusted uplink.
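The trust check in the last bullet can be pictured with a small model. This is a simplified sketch of the general DHCP snooping idea, not the switch's actual implementation; the function name and message labels are my own.

```python
# DHCP message types that only a server should ever send.
SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}

def snooping_verdict(message_type: str, port_trusted: bool) -> str:
    """Decide what a snooping switch does with a DHCP message on a port.

    Simplified: server messages are only accepted on trusted ports.
    Real implementations apply further checks (for example MAC verification).
    """
    if message_type in SERVER_MESSAGES and not port_trusted:
        return "drop"
    return "forward"

# An Offer arriving on an untrusted uplink is dropped; on a trusted one it passes.
print(snooping_verdict("OFFER", port_trusted=False))
print(snooping_verdict("OFFER", port_trusted=True))
```

Client messages such as Discover and Request are forwarded either way in this model, which is why a missing trust statement shows up as clients that ask for an address but never hear back.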

I jumped into the CLI to verify the trust state. Here is what I saw:

ACCESS-SW-01# show dhcpv4-snooping

 DHCPv4-Snooping Information
  DHCPv4-Snooping         : Yes         Verify MAC Address  : Yes
  Allow Overwrite Binding : No          Enabled VLANs       : 10-12,20...
  Trust VxLAN Tunnels     : Yes

 Port Information
                  Max      Static    Dynamic
  Port     Trust  Bindings Bindings  Bindings
  -------- -----  -------- --------  --------
  1/1/X    No     1024     0         18       <-- (AP Port)
  1/1/Y    Yes    0        0         0        <-- (Uplink to Core)

The uplink (1/1/Y) was explicitly set to Trust: Yes. So, a missing trust statement wasn't the issue either.

The Smoking Gun

I decided to stop guessing and look at the actual system events. I checked the logs around the exact time the clients were failing to connect.

There it was:

ACCESS-SW-01# show events
...
LOG_WARN|...|Drop offer from <DHCP_SERVER_IP> of already assigned address <CLIENT_IP_1> to <MAC_ADDRESS_A>.
LOG_WARN|...|Drop offer from <DHCP_SERVER_IP> of already assigned address <CLIENT_IP_2> to <MAC_ADDRESS_B>.
LOG_WARN|...|Drop offer from <DHCP_SERVER_IP> of already assigned address <CLIENT_IP_1> to <MAC_ADDRESS_A>.

The logs from the IP Source Address Validation daemon were explicitly telling me what was happening.

The DHCP server was doing its job perfectly. It was receiving the DHCP Discovers, allocating an IP, and sending valid DHCP Offers back down the wire. However, the access switch was actively intercepting and dropping those offers before they ever reached the AP.
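When the event log is long, it helps to tally which IP and MAC pairs are being rejected. The sketch below matches only the message text shown above; the log prefix stays elided, and the server, client, and MAC values in the sample are hypothetical placeholders from documentation ranges.

```python
import re
from collections import Counter

# Matches only the message portion; the fields before it are not parsed.
DROP_RE = re.compile(
    r"Drop offer from (?P<server>\S+) of already assigned address "
    r"(?P<ip>\S+) to (?P<mac>\S+?)\.?$"
)

def count_dropped_offers(lines):
    """Count dropped offers per (client IP, client MAC) pair."""
    counts = Counter()
    for line in lines:
        match = DROP_RE.search(line.strip())
        if match:
            counts[(match["ip"], match["mac"])] += 1
    return counts

# Hypothetical sample lines, for illustration only.
sample = [
    "LOG_WARN|...|Drop offer from 192.0.2.1 of already assigned address 192.0.2.50 to 00:00:5e:00:53:01.",
    "LOG_WARN|...|Drop offer from 192.0.2.1 of already assigned address 192.0.2.51 to 00:00:5e:00:53:02.",
    "LOG_WARN|...|Drop offer from 192.0.2.1 of already assigned address 192.0.2.50 to 00:00:5e:00:53:01.",
]

for (ip, mac), hits in count_dropped_offers(sample).most_common():
    print(ip, mac, hits)
```

Pairs that repeat are clients retrying and getting rejected each time, which is a strong hint that the block is persistent rather than a one-off race.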

The "Aha!" Moment

Why was the switch dropping valid offers? The log held the clue: already assigned address.

I looked back at my show dhcpv4-snooping output, and one specific configuration line suddenly jumped out at me:

Allow Overwrite Binding : No

In a wireless environment, clients roam constantly from AP to AP. Furthermore, modern iOS and Android devices aggressively use randomized, private MAC addresses.

Because of this behavior, the DHCP server was trying to offer an IP address to a client, but the access switch already had a stale entry in its local DHCP snooping database binding that specific IP to an old session or a previous MAC address.

With overwriting disabled, the switch assumed the DHCP server was attempting to hand out an IP that was already in use. It treated this as an IP conflict or a potential spoofing attempt, and dropped the valid offer to protect the network.

This also explained why VLAN 20 was still working. The IP pool for VLAN 20 simply hadn't encountered a stale binding conflict yet. The pools for VLANs 10-12, however, had accumulated many stale snooping entries on this particular access switch.
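The behavior described above can be sketched as a toy binding table. This is a simplified model built from the explanation in this post, not the switch's real logic: the class and method names are hypothetical, and the table is updated when the offer is seen purely to keep the example short.

```python
class SnoopingTable:
    """Toy model of a DHCP snooping binding table keyed by IP address."""

    def __init__(self, allow_overwrite: bool) -> None:
        self.allow_overwrite = allow_overwrite
        self.bindings: dict[str, str] = {}  # IP address -> MAC address

    def learn(self, ip: str, mac: str) -> None:
        """Record a binding from an earlier lease."""
        self.bindings[ip] = mac

    def handle_offer(self, ip: str, mac: str) -> str:
        """Return 'drop' or 'forward' for an offer of `ip` to `mac`."""
        owner = self.bindings.get(ip)
        if owner is not None and owner != mac and not self.allow_overwrite:
            return "drop"  # IP already bound to another MAC, overwrite not allowed
        self.bindings[ip] = mac  # simplification: update the binding right away
        return "forward"

# Hypothetical addresses: a stale binding left behind by an old MAC.
strict = SnoopingTable(allow_overwrite=False)
strict.learn("192.0.2.50", "00:00:5e:00:53:01")
print(strict.handle_offer("192.0.2.50", "00:00:5e:00:53:99"))

relaxed = SnoopingTable(allow_overwrite=True)
relaxed.learn("192.0.2.50", "00:00:5e:00:53:01")
print(relaxed.handle_offer("192.0.2.50", "00:00:5e:00:53:99"))
```

In this model, an address with no existing binding is always forwarded, which mirrors why VLAN 20 kept working while the other pools did not.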

The Fix

The uplink was already trusted, so what I needed was for the switch to defer to the DHCP server when its own binding table disagreed. If a trusted server issues a new lease for an IP that the switch already has in its local database, the switch should just overwrite the old entry with the new MAC address.

1. The Immediate Fix

I needed a quick win to get clients online without waiting for the stale timers to age out naturally. I flushed the current DHCPv4 snooping binding table, which cleared the localized conflicts instantly:

ACCESS-SW-01# clear dhcpv4-snooping binding all

Right away, the wireless clients on VLANs 10-12 pulled IP addresses and connected.

2. The Permanent Fix

To make sure I didn't get a ticket about this again tomorrow, I updated the DHCP snooping configuration globally to allow overwriting bindings:

ACCESS-SW-01# configure
ACCESS-SW-01(config)# dhcpv4-snooping allow-overwrite-binding
ACCESS-SW-01(config)# write memory
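To confirm the change across several switches, you can re-run the show command and check the Allow Overwrite Binding field. The sketch below parses the header layout shown earlier in this post. How you collect the output (SSH, API, copy and paste) is left open, and the "after" text is an assumed version of that output with only the one field flipped.

```python
import re

# Each header line holds one or two "Name : Value" pairs.
FIELD_RE = re.compile(r"([A-Za-z][\w \-]*?)\s+:\s+(\S+)")

def parse_settings(output: str) -> dict[str, str]:
    """Collect 'Name : Value' pairs from the snooping summary header."""
    settings = {}
    for line in output.splitlines():
        for key, value in FIELD_RE.findall(line):
            settings[key] = value
    return settings

def overwrite_enabled(output: str) -> bool:
    return parse_settings(output).get("Allow Overwrite Binding") == "Yes"

BEFORE = """
 DHCPv4-Snooping Information
  DHCPv4-Snooping         : Yes         Verify MAC Address  : Yes
  Allow Overwrite Binding : No          Enabled VLANs       : 10-12,20...
  Trust VxLAN Tunnels     : Yes
"""

# Assumption: after the fix, only this field changes in the output.
AFTER = BEFORE.replace("Allow Overwrite Binding : No ", "Allow Overwrite Binding : Yes")

print("before:", overwrite_enabled(BEFORE))
print("after:", overwrite_enabled(AFTER))
```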

The Takeaway

When deploying strict Layer 2 security features like DHCP Snooping and IP Source Guard in a wireless-heavy environment, you absolutely have to account for client mobility and MAC randomization. Stale bindings aren't just a possibility; they are an inevitability.

Enabling binding overwrites ensures your security features still protect against rogue DHCP servers, without accidentally causing localized Denial of Service conditions for valid roaming clients.