[Linux-ha-dev] Root cause for machine lockups in CVS head version

Andrea Arcangeli andrea at suse.de
Wed Jan 14 06:54:31 MST 2004


On Tue, Jan 13, 2004 at 09:58:08PM -0700, Alan Robertson wrote:
> If I set the priority of keventd higher than that of heartbeat, then the 
> hang doesn't happen.  It was 100% reproducible before.  Now it doesn't seem 
> to happen (!).  /me bows down to Andrea.
> 
> So, it does involve the kernel, it is a bug, and fixing it involves 
> diddling with keventd.

Yes.

However, it's not clear whether the fix I did so far is strong enough.
I'm unsure whether you're running a kernel with the fix already applied;
if you are, you shouldn't need to tweak the keventd priority.

	keventd_task->policy = SCHED_RR;
	keventd_task->rt_priority = MAX_RT_PRIO-1;

The two lines above should be enough to give keventd the maximum
priority, or at worst a priority equal to (not lower than) that of the
other SCHED_RR tasks. Equal priority is fine; only a lower priority is a
problem.
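To illustrate why equal priority is fine and only a lower one is a problem, here is a toy single-CPU model of SCHED_RR written for this note. It is not the kernel's scheduler code: it assumes every task is CPU-bound and always runnable, and it shrinks the round-robin timeslice to one tick. `struct rr_task` and `simulate_rr()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record for this toy model (not a kernel struct). */
struct rr_task {
    const char *name;
    int rt_priority;  /* larger value = more urgent, as with rt_priority */
    int ticks_run;    /* output: ticks of CPU this task received */
};

/*
 * Toy single-CPU model of SCHED_RR among always-runnable tasks:
 * at every tick the highest rt_priority wins, and tasks sharing that
 * priority take turns (timeslice simplified to one tick).
 */
static void simulate_rr(struct rr_task *tasks, size_t n, int ticks)
{
    assert(tasks != NULL && n > 0);
    size_t last = n - 1;  /* so the first pick starts at index 0 */

    for (size_t i = 0; i < n; i++)
        tasks[i].ticks_run = 0;

    for (int t = 0; t < ticks; t++) {
        /* Find the highest priority among the runnable tasks. */
        int best = tasks[0].rt_priority;
        for (size_t i = 1; i < n; i++)
            if (tasks[i].rt_priority > best)
                best = tasks[i].rt_priority;

        /* Round-robin: the next task after `last` at that priority runs. */
        for (size_t k = 1; k <= n; k++) {
            size_t i = (last + k) % n;
            if (tasks[i].rt_priority == best) {
                tasks[i].ticks_run++;
                last = i;
                break;
            }
        }
    }
}
```

With a CPU hog and a keventd stand-in both at rt_priority 99, ten ticks split 5/5; drop the keventd stand-in to 98 and it gets 0 ticks, which is the starvation that keeps the console from context switching.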

I wonder whether I made an off-by-one mistake in the rt_priority setting
above, or a similar thinko.

Can you verify whether your kernel has the two lines above in
kernel/context.c? If you didn't upgrade the kernel after installation, a
grep in /usr/src/linux/kernel/context.c should tell you.
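Besides grepping the source, one can ask the running kernel which policy and priority keventd actually has. Below is a minimal sketch using the POSIX calls sched_getscheduler(), sched_getparam() and sched_get_priority_max(); the helper names get_rt_setting() and is_max_rr() are hypothetical, and you would pass keventd's PID as shown by ps. It assumes that MAX_RT_PRIO-1 in the kernel corresponds to the value sched_get_priority_max(SCHED_RR) reports to user space (99 on common Linux kernels); if that does not hold on your kernel, compare against the number directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical helper: read the scheduling policy and real-time priority
 * of `pid` (0 means the calling process). Returns 0 on success, -1 on
 * error with errno set by the failing libc call. */
static int get_rt_setting(pid_t pid, int *policy, int *prio)
{
    struct sched_param param;

    assert(policy != NULL && prio != NULL);
    int p = sched_getscheduler(pid);
    if (p == -1)
        return -1;
    if (sched_getparam(pid, &param) == -1)
        return -1;
    *policy = p;
    *prio = param.sched_priority;
    return 0;
}

/* Hypothetical helper: true if (policy, prio) is SCHED_RR at the highest
 * priority the running kernel reports for SCHED_RR. */
static bool is_max_rr(int policy, int prio)
{
    return policy == SCHED_RR && prio == sched_get_priority_max(SCHED_RR);
}
```

If get_rt_setting() succeeds for keventd's PID but is_max_rr() returns false, that would suggest the fix is not active in the running kernel.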

The kernel shouldn't require tweaking keventd's priority to let the
console context switch under SCHED_RR load.

