[reid.s@usa.net: mailing list...]

Reid Steiner reid_steiner@hotmail.com
Mon, 30 Sep 2002 23:30:29 +0000


Hi Nick, thanks for the reply!  I'm not sure if I'm supposed to post to the 
list or reply to you.

No failover occurs if I HUP Apache myself.  The only files that actually 
rotate on Sunday night have to be a week old, and they are the system logs 
and httpd-error.log.  httpd-access.log is rotated any time it hits 600MB 
(that's why I have the logrotate script in cron.hourly, to monitor it).  The 
only thing I can think of is that fos or pulse is getting a SIGTERM and 
restarting, which makes it inaccessible to the backup server, which then 
takes over.  Here is the log entry from the last failover:

web2:

Sep 29 01:01:04 web2 syslogd 1.4.1: restart.
Sep 29 01:01:19 web2 nanny[1954]: READ returned error 104:Connection reset 
by peer
Sep 29 01:01:19 web2 nanny[1954]: Exiting due to connection failure of 
xx.xx.xxx.2:80
Sep 29 01:01:19 web2 fos[1926]: Monitor for service xxx.xxx.xxx.3:80 exited. 
This is a failover condition!
Sep 29 01:01:19 web2 fos[1926]: will now exit to notify pulse...
Sep 29 01:01:21 web2 pulse[1924]: fos process exited -- performing service 
failover
Sep 29 01:01:21 web2 pulse[1924]: running command  "/usr/sbin/fos" 
"--active" "-c" "/etc/sysconfig/ha/lvs.cf" "--nofork"
Sep 29 01:01:21 web2 pulse[5310]: DEBUG -- device = eth0:1
Sep 29 01:01:21 web2 pulse[5310]: DEBUG -- floatAddr = xxx.xxx.xxx.3
Sep 29 01:01:21 web2 pulse[5310]: DEBUG -- vip_nmask = 0.0.0.0
Sep 29 01:01:21 web2 pulse[5311]: running command  "/sbin/ifconfig" "eth0:1" 
"xxx.xxx.xxx.3" "up"
Sep 29 01:01:21 web2 pulse[5311]: DEBUG -- Executing '/sbin/ifconfig eth0:1 
xxx.xxx.xxx.2 netmask 0.0.0.0 up'
Sep 29 01:01:21 web2 fos[5309]: SIOCGIFADDR failed: Cannot assign requested 
address
Sep 29 01:01:21 web2 fos[5309]: Stopping local services (if any)


web1:

Sep 29 01:01:05 web1 syslogd 1.4.1: restart.
Sep 29 01:01:16 web1 pulse[1889]: PARTNER HAS TOLD US TO GO INACTIVE!
Sep 29 01:01:16 web1 fos[1934]: Shutting down due to signal 15
Sep 29 01:01:16 web1 fos[1934]: Shutting down local service 64.62.133.3:80
Sep 29 01:01:16 web1 fos[1934]: running command  "/etc/rc.d/init.d/httpd" 
"stop"
Sep 29 01:01:18 web1 httpd: httpd shutdown succeeded
Sep 29 01:01:18 web1 fos[1934]: will now exit to notify pulse...
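
To make the nanny error easier to reason about, here is a rough Python 
sketch of the kind of send/expect check that the send and expect settings 
in the config quoted further down suggest.  This is not piranha's code: the 
names probe and check_service are made up for illustration, and comparing 
only the first bytes of the reply against expect is my assumption.  The 
point is that a reset connection (error 104) counts as a failed check just 
like a wrong answer would.

```python
import socket


def probe(sock, send, expect):
    """Send `send` on a connected socket; True if the reply starts with `expect`."""
    try:
        sock.sendall(send)
        data = b""
        # Read only as many bytes as are needed for the comparison.
        while len(data) < len(expect):
            chunk = sock.recv(len(expect) - len(data))
            if not chunk:
                # Peer closed the connection before answering.
                return False
            data += chunk
    except OSError:
        # Covers timeouts, broken pipes and errno 104 (connection reset by peer).
        return False
    return data == expect


def check_service(host, port, send, expect, timeout):
    """Connect to host:port and run probe(); any connection error is a failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return probe(sock, send, expect)
    except OSError:
        return False
```

With the values from the config this would be called as 
check_service(vip, 80, b"GET / HTTP/1.0\r\n\r\n", b"HTTP", 18), where vip 
stands for the floating address.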



>From: Nicholas Garratt <nick@clickatell.com>
>To: linux-ha@muc.de
>Subject: Re: [reid.s@usa.net: mailing list...]
>Date: Mon, 30 Sep 2002 14:45:04 +0200
>
>hi
>
>the nasty thing with logrotate is its need to HUP or stop/start the daemon 
>writing to the log it's rotating. apache has a very nice solution which I 
>use for some daemons I never want to have to HUP, namely rotatelogs. 
>Basically it reads from stdin and writes to a file with a timestamp 
>(unixtime). rotatelogs then rotates the log it writes to at a configurable 
>interval without ever having to close stdin, which means the daemon 
>generating the log data never has to reopen its log files.
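
For the archive, a rough Python sketch of the idea described above.  It is 
not rotatelogs itself, only an illustration: the function names and the 
injected clock parameter are made up, and the sketch assumes the file suffix 
is the unixtime at which the current interval started (a multiple of the 
interval).  The input stream is read until EOF and never reopened; only the 
output file changes.

```python
def rotated_name(base, now, interval):
    """Return base.<start>, where <start> is the unixtime at which the
    current rotation interval began (always a multiple of interval)."""
    start = int(now) - int(now) % interval
    return "%s.%d" % (base, start)


def copy_with_rotation(stream, base, interval, clock):
    """Copy lines from stream into timestamped files.

    stream is only ever read, never closed or reopened here, so the
    process writing into it does not need a HUP when the output rotates.
    Returns the list of file names written, in order.
    """
    current_name = None
    out = None
    written = []
    try:
        for line in stream:
            name = rotated_name(base, clock(), interval)
            if name != current_name:
                # Interval boundary crossed: switch output files.
                if out is not None:
                    out.close()
                out = open(name, "a")
                current_name = name
                written.append(name)
            out.write(line)
            out.flush()
    finally:
        if out is not None:
            out.close()
    return written
```

In real use stream would be sys.stdin and clock would be time.time, e.g. 
copy_with_rotation(sys.stdin, "/var/log/httpd/access_log", 86400, time.time), 
with the path being only an example.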
>
>the obvious problem with this solution is that the daemon needs to be able 
>to log to a pipe or stdout. daemons that insist on opening a log file can 
>be tricked with a named pipe, although I haven't played with this much...
>
>nick
>
>
>>
>>Hi Armin, I'm trying to post to the mailing list and I'd like to post 
>>this:
>>
>>I'm writing because I have issues similar to those of a few people who 
>>have posted concerning piranha and logrotate.
>>
>>Some have posted that they "remedied" the problem by making logs rotate 
>>only after an increased size as opposed to date/time/normal log file size. 
>>  I'm running RedHat 7.3 on two web servers using Piranha piranha-0.5.3-9 
>>in fos mode.  I have the httpd-access logs rotating at 600MB and logrotate 
>>is in cron.hourly.  All other files in logrotate.d are rotated weekly 
>>(apache  ftpd  rpm  syslog).
>>
>>I have no fail-overs during the week.  However, when the weekly files 
>>rotate on Sunday morning, piranha fails over and fails back.  I only 
>>really notice because I have httpd running on the backup web server (for a 
>>bulletin board, to balance the load between the servers) and it goes down 
>>after the re-failover.
>>
>>When I HUP apache, logrotate or syslogd, I can't get it to happen. However 
>>when I HUP fos on the primary node, the same behavior occurs *almost*.  
>>All that's missing in the logs when I HUP pulse myself is:
>>
>>Sep 29 01:01:19 web2 nanny[1954]: READ returned error 104:Connection reset 
>>by peer
>>
>>
>>Here's what my lvs.cf looks like:
>>
>>
>>primary = xxx.xxx.xxx.1
>>service = fos
>>rsh_command = ssh
>>backup_active = 1
>>backup = xxx.xxx.xxx.2
>>heartbeat = 1
>>heartbeat_port = 539
>>keepalive = 18
>>deadtime = 36
>>network = direct
>>debug_level = NONE
>>failover failover {
>>      address = xxx.xxx.xxx.3 eth1:1
>>      active = 1
>>      port = 80
>>      timeout = 18
>>      send = "GET / HTTP/1.0\r\n\r\n"
>>      expect = "HTTP"
>>      start_cmd = "/etc/rc.d/init.d/httpd start"
>>      stop_cmd = "/etc/rc.d/init.d/httpd stop"
>>}
>>
>>
>>
>>Thanks in advance for any thoughts/direction.
>>
>>
>>
>>Reid Steiner
>>
>>----- End forwarded message -----
>>
>>--
>>Armin Gruner                   ____                          mailto:ag@muc.de
>>``Nur wer sich aendert, bleibt \  /                    http://www.muc.de/~ag/
>>sich treu'' - Wolf Biermann     \/ PGP Key: 0x72DBE671 or finger -l ag@muc.de
>
>
>--
>--------------------------------
>www.clickatell.com
>Any message, anywhere
>Phone: +27 21 9487150



