Tuesday, August 20, 2024

Advanced tips for better Arista network debugging

 

Hello!

It has been a long time since my last post. In my professional journey I discovered Arista networking: I had to deploy and engineer over 400 switches (for a new datacenter), so I did a LOT of troubleshooting and I wanted to share some advanced tips.

Arista EOS is close to Cisco IOS but not identical. It is also entirely Linux-based (CentOS 7 at the time of writing), so we can use the power of Linux on top of the particularities of EOS.

1) Check if your route-map works

These commands show whether a given prefix is accepted (inbound) or advertised (outbound) by the currently configured route-map, and which statement matches:

Commands

show bgp debug policy inbound neighbor { <neighbor address> | all } ipv4 unicast [ vrf <vrf name> ] [ route-map <route-map> ] <prefix> 

show bgp debug policy outbound neighbor <neighbor address> ipv4 unicast [ vrf <vrf name> ] [ route-map <route-map> ] <prefix>

Example output

% show bgp debug policy inbound neighbor 10.1.2.1 ipv4 unicast vrf red route-map foo 10.100.20.0/24 

NLRI 10.100.20.0/24, received from 10.1.2.1

route-map foo
  seq 10 permit
     match as 1 (failed)
     Seq result: fall through to next sequence
  seq 20 permit
     match as 1 (matched)
     sub-route-map sub_foo (permit)
        seq 10 permit
           match as 1 (matched)
           Seq result: permit
        Route map result: permit, matching sequence 10
     Seq result: permit
  Route map result: permit, matching sequence 20

Associated link (an Arista account is required): https://www.arista.com/en/support/toi/eos-4-22-1f/14267-route-map-debugging-cli
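The example output follows first-match semantics: sequences are evaluated in ascending order, the first sequence whose match clauses all succeed decides the result, and a route that matches no sequence is denied implicitly. Below is a minimal Python model of that logic, only to make the trace easier to read. It is a sketch under assumptions, not how EOS evaluates policies: routes are plain dicts, match clauses are predicates, sub-route-maps and set clauses are not modeled, and the `peer_as` attribute and the sequences in the usage block are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sequence:
    """One route-map sequence: an action plus the match clauses it requires."""
    seq: int
    action: str  # "permit" or "deny"
    # (label, predicate) pairs; every predicate must return True for the sequence to match
    matches: list = field(default_factory=list)


def evaluate(name, sequences, route):
    """Evaluate a route (dict of attributes) against a route-map, first match wins.

    Returns (result, matching_seq, trace). Sub-route-maps and set clauses
    are deliberately not modeled.
    """
    trace = [f"route-map {name}"]
    for entry in sorted(sequences, key=lambda item: item.seq):
        trace.append(f"  seq {entry.seq} {entry.action}")
        matched_all = True
        for label, predicate in entry.matches:
            ok = predicate(route)
            trace.append(f"     {label} ({'matched' if ok else 'failed'})")
            if not ok:
                matched_all = False
                break  # one failed clause is enough to skip this sequence
        if matched_all:
            trace.append(f"     Seq result: {entry.action}")
            return entry.action, entry.seq, trace
        trace.append("     Seq result: fall through to next sequence")
    # No sequence matched: route-maps end with an implicit deny
    return "deny", None, trace


if __name__ == "__main__":
    # Hypothetical route-map: seq 10 expects peer AS 2, seq 20 expects peer AS 1
    foo = [
        Sequence(10, "permit", [("match as 2", lambda r: r["peer_as"] == 2)]),
        Sequence(20, "permit", [("match as 1", lambda r: r["peer_as"] == 1)]),
    ]
    result, seq, trace = evaluate("foo", foo, {"prefix": "10.100.20.0/24", "peer_as": 1})
    print("\n".join(trace))
    print(f"  Route map result: {result}, matching sequence {seq}")
```

A sequence with no match clause matches everything, which is why a trailing empty permit or deny sequence is a common way to make the final decision explicit.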

2) Check if your configuration is coherent

These commands help you verify that your configuration is coherent.

2.1 Consistency

Starting with EOS 4.30.2, you can run a global consistency check to verify that what you configured is actually used and correctly linked to other parts of the configuration (for example, prefix-lists and route-maps).

Commands

show configuration consistency policy

Example output

% SWLAB#show configuration consistency policy

Undefined references

Feature                Result       Detail
---------------------- ------------ -----------------------------------------
IPv6 access list       warn         lab-control-plane-acl is undefined
Route map              warn         ROUTE_LEAKING_PFX is undefined
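To make "undefined reference" concrete, here is a minimal Python sketch that scans a configuration text, collects the names that are defined and the names that are referenced, and reports the difference. This is a hypothetical simplification, not how EOS implements the consistency check: the regular expressions cover only a couple of syntax forms, and the sample configuration (RM_IN, PL_MISSING, and so on) is invented for illustration.

```python
import re

# Hypothetical, heavily simplified patterns covering only a few EOS syntax forms.
DEFINITIONS = {
    "Prefix list": re.compile(r"ip prefix-list (\S+)"),
    "Route map": re.compile(r"route-map (\S+) (?:permit|deny)\b"),
}
REFERENCES = {
    "Prefix list": re.compile(r"match ip address prefix-list (\S+)"),
    "Route map": re.compile(r"neighbor \S+ route-map (\S+) (?:in|out)\b"),
}


def undefined_references(config: str):
    """Return sorted (feature, name) pairs that are referenced but never defined."""
    defined = {feature: set() for feature in DEFINITIONS}
    referenced = {feature: set() for feature in REFERENCES}
    for raw_line in config.splitlines():
        line = raw_line.strip()
        # re.match anchors at the start of the (stripped) line
        for feature, pattern in DEFINITIONS.items():
            if m := pattern.match(line):
                defined[feature].add(m.group(1))
        for feature, pattern in REFERENCES.items():
            if m := pattern.match(line):
                referenced[feature].add(m.group(1))
    return sorted(
        (feature, name)
        for feature in REFERENCES
        for name in referenced[feature] - defined[feature]
    )


# Invented sample configuration
SAMPLE = """
ip prefix-list PL_LOOPBACKS seq 10 permit 10.0.0.0/8 le 32
!
route-map RM_IN permit 10
   match ip address prefix-list PL_LOOPBACKS
route-map RM_OUT permit 10
   match ip address prefix-list PL_MISSING
!
router bgp 65001
   neighbor 10.1.2.1 route-map RM_IN in
   neighbor 10.1.2.1 route-map ROUTE_LEAKING_PFX out
"""

if __name__ == "__main__":
    for feature, name in undefined_references(SAMPLE):
        print(f"{feature:<22} warn         {name} is undefined")
```

With the sample above, PL_MISSING and ROUTE_LEAKING_PFX are reported, while RM_OUT (defined but never applied) is not, because this sketch only looks for missing definitions.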

2.2 config-sanity

The config-sanity commands have been part of EOS for a long time and are very useful: for each feature, they check by category whether everything it needs is configured.

Commands

show vxlan config-sanity
show route-map config-sanity
show mlag config-sanity

Example output

% SWLAB#show vxlan config-sanity

category                           result   detail
---------------------------------- -------- -----------------------------------------
Local VTEP Configuration Check     WARN
VLAN-VNI Map                       WARN     VLAN 20 does not exist
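The "VLAN 20 does not exist" warning means a VLAN is mapped to a VNI under the Vxlan interface without being created on the switch. A minimal Python sketch of that one check, assuming the configuration is available as text and uses single-VLAN `vxlan vlan <id> vni <vni>` lines (VLAN-range mappings are not handled). It only illustrates the idea and is not the EOS implementation; the sample configuration is hypothetical.

```python
import re

VLAN_RE = re.compile(r"vlan ([\d,-]+)")             # e.g. "vlan 10" or "vlan 10,20-22"
MAP_RE = re.compile(r"vxlan vlan (\d+) vni (\d+)")   # single VLAN-to-VNI mapping


def expand_vlans(spec: str) -> set:
    """Expand a VLAN list such as '10,20-22' into {10, 20, 21, 22}."""
    vlans = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            vlans.update(range(int(first), int(last) + 1))
        else:
            vlans.add(int(part))
    return vlans


def vlan_vni_warnings(config: str):
    """Warn about VLAN-VNI mappings whose VLAN is not defined on the switch."""
    defined, mapped = set(), {}
    for raw_line in config.splitlines():
        line = raw_line.strip()
        if m := VLAN_RE.fullmatch(line):
            defined |= expand_vlans(m.group(1))
        elif m := MAP_RE.fullmatch(line):
            mapped[int(m.group(1))] = int(m.group(2))
    return [f"VLAN {vlan} does not exist" for vlan in sorted(mapped) if vlan not in defined]


# Hypothetical configuration: VLAN 20 is mapped but never created
SAMPLE = """
vlan 10
   name SERVERS
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan vlan 10 vni 10010
   vxlan vlan 20 vni 10020
"""

if __name__ == "__main__":
    for warning in vlan_vni_warnings(SAMPLE):
        print(f"{'VLAN-VNI Map':<35}WARN     {warning}")
```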

3) Check the control plane

Thanks to the Linux base, you can capture control-plane packets live on the switch.

Warning: depending on your EOS version and hardware model, there are sometimes bugs linked to packet capture, such as the switch freezing completely (requiring a hard power cycle). See bug 836750.

First, enter the VRF context you want to analyze, then drop into the Linux bash shell and capture on the interface.

Commands

cli vrf MYVRF
bash
sudo su -
ip a    # check the interfaces within the VRF environment
tcpdump -nni vlan971    # -nn: no host/port name resolution, -i: interface to capture on

Since this is a full Linux operating system, you can use any other command you like, such as top or netstat.
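Given the freeze bug mentioned above, it helps to always bound a capture. A minimal Python sketch that builds such a tcpdump command line with a packet limit (`-c`) and an optional pcap output file (`-w`). Two assumptions to verify on your own switch: that EOS exposes each VRF as a Linux network namespace named `ns-<VRF>` (check with `ip netns list`), and that `/mnt/flash` is where flash is mounted in bash. The function name and the default count are illustrative only.

```python
import shlex


def build_capture_cmd(interface, vrf=None, count=1000, outfile=None):
    """Build a bounded tcpdump command line as an argument list.

    Assumption: each VRF is a Linux network namespace named 'ns-<VRF>'.
    """
    if count <= 0:
        raise ValueError("count must be positive so the capture always stops")
    cmd = []
    if vrf:
        # Run tcpdump inside the VRF's network namespace (assumed naming)
        cmd += ["ip", "netns", "exec", f"ns-{vrf}"]
    # -nn: no host/port name resolution, -i: interface, -c: stop after N packets
    cmd += ["tcpdump", "-nn", "-i", interface, "-c", str(count)]
    if outfile:
        cmd += ["-w", outfile]  # write raw packets to a pcap file instead of printing them
    return cmd


if __name__ == "__main__":
    cmd = build_capture_cmd("vlan971", vrf="MYVRF", count=200,
                            outfile="/mnt/flash/vlan971.pcap")
    print(shlex.join(cmd))
```

The printed line can be pasted into the switch's bash shell (as root); writing to a file with `-w` also avoids flooding the terminal when traffic is heavy.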

4) Capture data-plane packets

Capturing packets is never without risk: for example, too much throughput could trigger a bug or overload the switch, so beware!

A good solution is to use the Recirculation Channel feature to help with the CPU load.

Let's say I want to capture data packets on interface Ethernet3 (Tx and Rx), and that Ethernet34 is unused.

Commands

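conf t
!
! Recirculation channel used to mirror traffic to the CPU
interface recirc-channel 1
   switchport recirculation features cpu-mirror
!
! Unused port Ethernet34 is looped back and becomes a member of the recirculation channel
interface et34
   description Recirc-channel1
   traffic-loopback source system device mac
   channel-group recirculation 1
   no shut
!
! Mirror Ethernet3 in both directions (Rx and Tx) to the CPU
monitor session MonitorAvecRecirc source Ethernet3 both
monitor session MonitorAvecRecirc destination cpu
end
!
! Launch tcpdump
tcpdump monitor MonitorAvecRecirc > tcpdump_data.pcap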

tcpdump via the CLI is limited, and filters do not work, but it is still very helpful for troubleshooting some cases.

 ---

That's it for my post on Arista. See you soon !

