[Linux-ha-dev] Root cause for machine lockups in CVS head version

Alan Robertson alanr at unix.sh
Wed Jan 7 10:04:53 MST 2004


Zhu, Yi wrote:
> On Wed, 7 Jan 2004, Zhu, Yi wrote:
> 
> 
>>I tried cl_shortsleep(), and it works. I think it's a good idea!
> 
> 
> I apologize for my mistake. The cl_shortsleep() solution only works on the
> 2.6 kernel, not on 2.4. I tested vanilla 2.4.20 and Red Hat 9's 2.4.20-8;
> both locked up the machine, while 2.6.0 works well.
> 
> 
>>One thing I don't quite understand is from the nanosleep man page:
>>
>>"If the process is scheduled under a real-time policy like
>>SCHED_FIFO or SCHED_RR, then pauses of up to 2 ms will be performed
>>as busy waits with microsecond precision."
>>
>>Does this mean the kernel may not put the process on the wait queue
>>at all if the sleep time is less than 2 ms (e.g., 0)?
> 
> 
> To answer my own question: the man page describes the 2.4 kernel. In the
> 2.6 kernel, which introduces the O(1) scheduler, the sys_nanosleep()
> implementation changed, so the kernel puts a process calling nanosleep()
> on the wait queue regardless of the sleep time. The 2.4 kernel implements
> sleeps shorter than 2 ms for real-time processes with udelay(), which
> busy-loops on the processor.
> 
> This answers the question above, but not completely: the default Red Hat 9
> kernel (2.4.20-8) also uses the O(1) scheduler, yet it still locks up! :(
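
If the lockups come from a process running under SCHED_FIFO or SCHED_RR spinning in sub-2 ms nanosleep() calls on 2.4 (the thread suggests this but does not confirm it, since the Red Hat 9 kernel still locks up), one possible workaround is to always request slightly more than the 2 ms busy-wait limit quoted from the man page, so the kernel has to put the caller on a wait queue. The following is a minimal sketch under that assumption; the helper short_blocking_sleep() and the constant BUSYWAIT_LIMIT_NS are hypothetical names, not part of heartbeat, and this is not how cl_shortsleep() is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Threshold from the nanosleep(2) man page quoted above: on 2.4 kernels a
 * SCHED_FIFO/SCHED_RR process asking for a pause of up to 2 ms is
 * busy-waited with udelay() instead of being put on a wait queue. */
#define BUSYWAIT_LIMIT_NS 2000000L

/*
 * Hypothetical helper (not heartbeat's cl_shortsleep()): sleep for at least
 * `ns` nanoseconds, but always for slightly longer than the busy-wait limit,
 * so the kernel must deschedule the caller even under a real-time policy.
 * A signal interruption is resumed with the remaining time.
 * Returns 0 on success, -1 on any other nanosleep() error.
 */
int short_blocking_sleep(long ns)
{
    struct timespec req, rem;

    assert(ns >= 0 && ns < 1000000000L);   /* sub-second pauses only */
    if (ns <= BUSYWAIT_LIMIT_NS)
        ns = BUSYWAIT_LIMIT_NS + 1000L;    /* 2.001 ms: just past the limit */

    req.tv_sec = 0;
    req.tv_nsec = ns;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;                         /* interrupted: sleep what is left */
    }
    return 0;
}
```

For example, short_blocking_sleep(0) should give up the CPU for a little over 2 ms on both 2.4 and 2.6. The trade-off is latency: the actual pause may be rounded up to the kernel's timer granularity, which can be considerably longer than 2 ms.
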
> 
> 
> I propose a fix like this:
> 
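> case EAGAIN:
> 	/* Part of the message is still outstanding: ask recv()
> 	 * to wait for the rest (MSG_WAITALL). */
> 	if (remaining_data > 0) {
> 		recv(conn_info->s, msg_begin, len, MSG_WAITALL);
> 	}
> 	break;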
> 
> If you agree, I'll create a patch for this. It's too late today (or rather,
> too early). :)
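
Whether the proposed recv(..., MSG_WAITALL) call helps depends on how conn_info->s is opened, which the thread does not say. On Linux, a socket in O_NONBLOCK mode generally stays non-blocking even when MSG_WAITALL is passed, so the call can still return EAGAIN or a short count. A minimal sketch of an alternative, assuming a stream socket and a hypothetical helper named recv_all(): on EAGAIN, sleep in poll() until the socket is readable, then continue reading after the bytes already received.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of heartbeat): read exactly `len` bytes from
 * a possibly non-blocking stream socket into `buf`. On EAGAIN it blocks in
 * poll() until the socket is readable instead of spinning, and it resumes
 * after the bytes already received rather than overwriting them.
 * Returns the byte count (less than len only on orderly EOF), or -1 on error.
 */
ssize_t recv_all(int fd, char *buf, size_t len)
{
    size_t got = 0;

    assert(buf != NULL || len == 0);
    while (got < len) {
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n > 0) {
            got += (size_t)n;
        } else if (n == 0) {
            break;                          /* peer closed the connection */
        } else if (errno == EINTR) {
            continue;                       /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            /* Wait in the kernel until data arrives (no timeout). */
            if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
                return -1;
        } else {
            return -1;                      /* real error */
        }
    }
    return (ssize_t)got;
}
```

Here poll() with a timeout of -1 waits indefinitely; a real IPC layer would probably want a finite timeout so that a stalled peer cannot hang the caller.
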


Hmmm...

Does this fix the problem?

-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions." - William Wilberforce


