[ENBD] 2.4.26a crashes

Edward Muller enbd@lists.community.tummy.com
07 Jan 2002 09:52:44 -0500


I've had two crashes like this in the last week. I'm not 100% sure they
are being caused by ENBD but it's one of two suspects.

Twice over the last week my primary machine in a two-node cluster locked
up with the following (or similar) messages:

nbd: #10333[0]: nbd_rollback Rollback req c19090c0 from slot 0!

(Someone read this to me over the phone, so I'm sorry if it's not 100%
accurate.)

I'm not swapping over ENBD. I have two ENBD devices, /dev/nd/a/0 and
/dev/nd/b/0. /dev/nd/a/0 is disk #2 in /dev/md0, which is a RAID1 array.
/dev/nd/b/0 is disk #2 in /dev/md1, which is a RAID1 array. /dev/md0 is
mounted as /home. /dev/md1 is mounted as /var.

When the above error message happens the machine locks up. I can't log in,
although I can still ping the network interfaces, and I can hit enter on
the screen and it scrolls.

I am using eepro100 network cards (two of them, one of them dedicated to
ENBD) on a 2.4.16 kernel. Until Saturday (the crashes happened before
that) I was using the open source eepro100 driver; now I am using the
e100 driver from Intel. My understanding is that the open source eepro100
driver is buggy and may be causing my problems (this is the other
suspect).