Hello,

as previously discussed in private with Jeff and David, I am encountering a deadlock problem somewhere in netlink in 2.4 kernels, from at least 2.4.21 up to 2.4.25-pre7. At first I thought it was related to the TG3 driver that I was using, but after many steps in the wrong direction, I finally found an easy way to reproduce it on other NICs (e1000 & 3c59x). I have added lots of traces in the kernel to track the dev->refcnt changes, but I now need help to understand what I gathered.

To reproduce it, you need an application which binds to a multicast address. Since I first noticed the problem with keepalived, I'm sticking to it for this example, but I was able to reproduce the same problem with ntpd a few minutes ago.

The problem is the following:

  1/ configure an interface up with an address
  2/ start keepalived. Keepalived registers itself to receive netlink
     broadcasts (link and address groups), and sets a multicast address
     for VRRP on the interfaces.
  3/ now flush all addresses on this interface
  4/ then put the link down
  5/ then stop keepalived
  6/ now rmmod
  => it hangs in unregister_netdevice() with dev->refcnt=2

Now simply swap operations 3 and 4 (addr vs link):

  1/ configure an interface up with an address
  2/ start keepalived.
  3/ then put the link down
  4/ now flush all addresses on this interface
  5/ then stop keepalived
  6/ now rmmod
  => no problem at all

Stopping keepalived after the ip link down or ip addr flush is OK too.

I have tried suggestions by Alexandre Cassen to disable either link or address group registration in keepalived, but it did not change anything at all. I even set the group to zero, but the problem persisted, which led me to try ntp to confirm that this was in fact a multicast problem. Anyway, "ip monitor" does not cause this trouble, so I'm now certain that just listening to netlink broadcasts does not cause this problem.

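
The two sequences above can be sketched as a small dry-run script. This is only an illustration, not part of the original test setup: it prints the commands instead of running them, the function name repro_steps and the one-letter order argument (a = flush addresses, l = link down, k = stop keepalived) are assumptions made here, and the interface, module, address and config path are simply the ones used in the traces.

```shell
#!/bin/sh
# Dry run: print the reproduction commands for a given teardown order.
# The order argument is a shorthand assumed for this sketch:
#   a = flush addresses, l = set link down, k = stop keepalived.
IFACE=${IFACE:-eth2}      # interface name taken from the traces
MODULE=${MODULE:-e1000}   # driver module taken from the traces

repro_steps() {
    order=$1
    # Common setup: load the driver, add an address, bring the link up,
    # then start the multicast listener.
    echo "modprobe $MODULE"
    echo "ip addr add 1.2.3.0/24 dev $IFACE"
    echo "ip link set $IFACE up"
    echo "keepalived --vrrp -f /var/state/vrrp.conf"
    # Emit the teardown commands one letter at a time.
    while [ -n "$order" ]; do
        rest=${order#?}            # everything after the first letter
        step=${order%"$rest"}      # the first letter itself
        case $step in
            a) echo "ip addr flush dev $IFACE" ;;
            l) echo "ip link set $IFACE down" ;;
            k) echo "killall keepalived" ;;
            *) echo "unknown step: $step" >&2; return 1 ;;
        esac
        order=$rest
    done
    echo "rmmod $MODULE"
}

# When called with an argument, print the sequence for that order.
if [ $# -gt 0 ]; then
    repro_steps "$1"
fi
```

With this sketch, "repro_steps alk" prints the order that hangs in rmmod, and "repro_steps lak" prints the order that works. Piping the output to "sh -x" as root would actually execute it, which should only be done on a test machine.
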
BTW, if I delete the addresses by hand instead of flushing them, it does not help either.

So I put lots of printk's in the kernel to track dev->refcnt at several places, and I now have the following traces, which include all the printk(refcnt) output (not displayed here), along with diffs between them. The end of the name tells the order of removal: A=addr, L=link, K=keepalived. So "trace.kal" describes the following operations, where keepalived was stopped, then the address was flushed, then the link was set down.

  root: ##### TRACE ##### modprobe e1000
  root: ##### TRACE ##### ip addr
  root: ##### TRACE ##### ip addr add 1.2.3.0/24 dev eth2
  root: ##### TRACE ##### ip addr
  root: ##### TRACE ##### ip link set eth2 up
  root: ##### TRACE ##### keepalived --vrrp -f /var/state/vrrp.conf
  root: ##### TRACE ##### ip addr
  root: ##### TRACE ##### ip addr flush dev eth2
  root: ##### TRACE ##### ip link set eth2 down
  root: ##### TRACE ##### killall keepalived
  root: ##### TRACE ##### ip addr
  root: ##### TRACE ##### rmmod e1000

Since there are important differences between the logs, and I have already spent several days on the problem, I think that there is something obvious in front of me that I cannot see. I have uploaded the traces and the side-by-side diffs on this site (not posted here because they're about 10kB each):

  http://w.ods.org/debug/pb-mcast/

I really hope that someone with better knowledge will be able either to point to the problem, or to narrow it down so that I have some clues about where to add traces or what to try, because I'm really out of ideas now.

Thanks in advance,
Willy