pwru is one of the best tools for troubleshooting Linux network issues. The name stands for "Packet, Where Are You?".
How does it work?
eBPF allows us to attach hooks to kernel functions. When a hooked kernel function is executed, the attached eBPF program can perform additional actions, such as recording the function and its parameters and then printing them out.
The Linux kernel exposes its symbol table through /proc/kallsyms, a file that pwru reads to locate the functions related to skb (sk_buff, the kernel's data structure for network packets). pwru then hooks into these functions, which allows eBPF to track the exact path a packet takes through the kernel stack.
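To make the symbol lookup concrete, here is a minimal sketch of parsing text in the /proc/kallsyms format and keeping only function symbols. It is not pwru's code: the sample addresses are made up, and the helper names (parse_kallsyms, text_symbols, candidate_hooks) and the list of wanted functions are assumptions for illustration only.

```python
# Illustrative sample in /proc/kallsyms format: "<address> <type> <name> [module]".
# The addresses are made up for this sketch.
SAMPLE_KALLSYMS = """\
ffffffff81a00000 T ip_rcv
ffffffff81a00100 T ip_output
ffffffff81a00200 t ip_finish_output
ffffffff81a00300 T kfree_skb_reason
ffffffff82000000 D jiffies
ffffffffc0a00000 t vxlan_xmit\t[vxlan]
"""


def parse_kallsyms(text):
    """Yield (address, type, name, module) for each well-formed line."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        addr, sym_type, name = fields[0], fields[1], fields[2]
        # Symbols that belong to a loadable module carry a trailing "[module]".
        module = fields[3].strip("[]") if len(fields) > 3 else None
        yield int(addr, 16), sym_type, name, module


def text_symbols(text):
    """Return the names of text-section (function) symbols: types 't' and 'T'."""
    return [name for _, sym_type, name, _ in parse_kallsyms(text)
            if sym_type in ("t", "T")]


def candidate_hooks(text, wanted):
    """Keep only the wanted function names that the symbol table actually lists.

    'wanted' is a hypothetical input; how pwru decides which functions are
    skb-related is not modeled here.
    """
    available = set(text_symbols(text))
    return [name for name in wanted if name in available]
```

On a real system the same helpers could be fed the contents of /proc/kallsyms; without root privileges the addresses are typically shown as zeros, but the names are still readable.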
By leveraging this method, pwru can diagnose almost any network connectivity issue on Linux. By tracing the function path, it quickly determines which functions have processed the packet and which ones have not. Cross-referencing the functions with the source code helps identify the root cause of the issue.
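The reasoning step described above can be sketched in a few lines. The example assumes the trace has already been reduced to an ordered list of function names; the sample list is hypothetical and is not real pwru output, and the helper names are made up for this sketch.

```python
# Kernel functions that free an skb; seeing one in a trace usually marks a drop.
DROP_FUNCTIONS = {"kfree_skb", "kfree_skb_reason"}

# Hypothetical ordered trace of function names for one packet (not real pwru output).
SAMPLE_TRACE = [
    "ip_local_out",
    "ip_output",
    "ip_finish_output2",
    "__dev_queue_xmit",
    "kfree_skb_reason",
]


def last_function_before_drop(trace):
    """Return the function seen just before the first drop function, or None."""
    previous = None
    for func in trace:
        if func in DROP_FUNCTIONS:
            return previous
        previous = func
    return None  # the packet was never dropped in this trace


def missing_functions(trace, expected):
    """Return the expected functions that never processed the packet."""
    seen = set(trace)
    return [func for func in expected if func not in seen]
```

With the sample trace, last_function_before_drop points at the function to look up in the kernel source, and missing_functions shows which expected steps the packet never reached.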
Installation and Usage
Installing pwru is simple:
apt install pwru
To start tracing packets, use the following command:
pwru icmp and dst host 1.1.1.1
The filter uses the same pcap-filter syntax as tcpdump.
Case Study: Troubleshooting a Linux Network Connectivity Issue with pwru
Today, we encountered another case of network connectivity failure on Linux. The network structure was quite simple: the default route pointed to a VXLAN driver interface, where packets were encapsulated before being sent out via a physical interface. This kind of issue is perfect for pwru, as it can directly reveal the exact code path a packet takes through the Linux networking stack.
To trace the packet, we used the following command:
pwru --filter-track-skb --all-kmods dst 10.1.1.100
The output from pwru was as follows:
From the output, we could see that the packet was dropped. Although the reason was listed only as SKB_DROP_REASON_NOT_SPECIFIED, this wasn’t a problem because we could identify that the function executed right before the drop was vxlan_get_route. From there, we could inspect the relevant Linux source code:
    if (!IS_ERR(rt)) {
        if (rt->dst.dev == dev) {
            netdev_dbg(dev, "circular route to %pI4\n", &daddr);
            ip_rt_put(rt);
            return ERR_PTR(-ELOOP);
        }
        *saddr = fl4.saddr;
        if (use_cache)
            dst_cache_set_ip4(dst_cache, &rt->dst, fl4.saddr);
    } else {
        netdev_dbg(dev, "no route to %pI4\n", &daddr);
        return ERR_PTR(-ENETUNREACH);
    }
From the source code, we confirmed that there were two possible reasons for the failure in vxlan_get_route:
- Circular route (-ELOOP): The route lookup for the encapsulated packet resolved to the VXLAN device itself (rt->dst.dev == dev). Sending the packet out that way would feed it back into the same device, where it would be re-encapsulated and reprocessed over and over.
- Missing route (-ENETUNREACH): No valid route existed for the encapsulated packet, so it was dropped.
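The two branches can be modeled in a few lines to make the decision explicit. This is only a sketch of the logic in the excerpt, not kernel code: classify_route_lookup and the device names are hypothetical, and a failed lookup is represented here by None.

```python
import errno


def classify_route_lookup(route_dev, vxlan_dev):
    """Mirror the two error branches of the excerpt.

    route_dev: output device chosen by the route lookup, or None if it failed.
    vxlan_dev: the VXLAN device that is trying to transmit the packet.
    Returns 0 on success or a negative errno value, as kernel code does.
    """
    if route_dev is None:
        return -errno.ENETUNREACH  # "no route to ..."
    if route_dev == vxlan_dev:
        return -errno.ELOOP  # "circular route to ..."
    return 0


def describe(result):
    """Turn the return value into a short human-readable cause."""
    if result == -errno.ELOOP:
        return "circular route"
    if result == -errno.ENETUNREACH:
        return "missing route"
    return "ok"
```

For example, describe(classify_route_lookup("vxlan0", "vxlan0")) reports a circular route, while a lookup that returns a physical interface such as "eth0" passes.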
By inspecting the IP routing table, we discovered that an extra route entry had caused a routing loop. Deleting this incorrect route immediately resolved the issue.
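To illustrate how a single extra route can produce this kind of loop, here is a simplified longest-prefix-match lookup over a made-up routing table. The prefixes, the device names (vxlan0, eth0) and the remote tunnel endpoint address are all hypothetical, and real kernel routing also considers metrics, policy rules and multiple tables, which this sketch ignores.

```python
import ipaddress


def lookup(routes, destination):
    """Longest-prefix match: return the output device for destination, or None."""
    addr = ipaddress.ip_address(destination)
    best = None  # (network, device) of the most specific match so far
    for prefix, dev in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else None


def is_circular(routes, remote_endpoint, vxlan_dev):
    """True if the encapsulated packet would be routed back into the VXLAN device."""
    return lookup(routes, remote_endpoint) == vxlan_dev


# Hypothetical healthy table: default route via the VXLAN device,
# underlay network reachable through the physical interface.
GOOD_ROUTES = [
    ("0.0.0.0/0", "vxlan0"),
    ("192.0.2.0/24", "eth0"),
]

# The same table plus one extra, more specific route that sends the
# tunnel endpoint itself through the VXLAN device.
BAD_ROUTES = GOOD_ROUTES + [("192.0.2.10/32", "vxlan0")]
```

With GOOD_ROUTES the endpoint 192.0.2.10 resolves to eth0; with BAD_ROUTES the /32 entry wins the match and the lookup returns vxlan0, which is the circular condition the kernel code rejects.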
Why pwru Matters
Without pwru, we would have needed to rely on networking expertise to manually check various points in the system before eventually identifying the misconfigured routing table as the root cause. However, with pwru, we were able to systematically trace the packet’s path and pinpoint the exact reason for the failure much more efficiently.
Note: This post is translated and republished with authorization from https://www.kawabangga.com/posts/6879