[Linux-HA] Config Problems
Andrew Beekhof
beekhof at gmail.com
Tue Mar 14 04:55:41 MST 2006
On 3/12/06, CA Lists <lists at creativeanvil.com> wrote:
> Hi All,
>
> I've searched the site, searched the mailing list archives, etc., and fought with
> this for almost a month. Finally, I'm giving in and asking. I'm having some
> issues getting heartbeat 2.0.2 set up (RHEL 4 on Intel EM64T) in an
> Active/Passive cluster for Apache 2. With both nodes up, the first node fails
> over immediately and then Apache goes down on the secondary node. If I
> enable heartbeat only on the primary node, it starts up, detects that the
> other node is dead, and decides to start the resources. It brings up the IP,
> but kills it right away:
>
> PING 192.168.1.10 (192.168.1.10): 56 data bytes
> 64 bytes from 192.168.1.10: icmp_seq=36 ttl=64 time=0.366 ms
> 64 bytes from 192.168.1.10: icmp_seq=37 ttl=64 time=0.3 ms
> 64 bytes from 192.168.1.10: icmp_seq=38 ttl=64 time=0.328 ms
> 64 bytes from 192.168.1.10: icmp_seq=39 ttl=64 time=5.132 ms
> 64 bytes from 192.168.1.10: icmp_seq=40 ttl=64 time=0.345 ms
> 64 bytes from 192.168.1.10: icmp_seq=41 ttl=64 time=0.389 ms
>
> (STOPPED RESPONDING)
>
> --- 192.168.1.10 ping statistics ---
> 168 packets transmitted, 6 packets received, 96% packet loss
> round-trip min/avg/max = 0.3/1.143/5.132 ms
If apache fails, the whole group is moved to another node, which would
explain the IP being stopped.
What you need to find out is why apache won't start or stay started.
And the reason for that is likely related to this log:
> apache[3416]: 2006/03/06_22:06:40 ERROR: (98)Address already in use:
> make_sock: could not bind to address 192.168.1.10:80 no listening sockets
> available, shutting down
Beyond that, I can't help you.
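One possible cause worth ruling out (an assumption; nothing in this thread confirms it): the Apache documentation for the Listen directive notes that multiple Listen directives for the same IP address and port result in an "Address already in use" error, so httpd can fail this way even when no other process owns port 80. Depending on the Apache version and platform, a bare "Listen 80" combined with "Listen 192.168.1.10:80" may trigger the same error. The shell sketch below, built around a hypothetical helper named check_listen, flags both patterns in a single config file. It does not follow Include directives, so any files pulled in from elsewhere have to be checked separately.

```shell
#!/bin/sh
# check_listen is a hypothetical helper, not part of Apache or heartbeat.
# It reads one Apache config file (Include directives are NOT followed)
# and reports Listen lines that would ask httpd to bind a port twice.
check_listen() {
    # Collect the first argument of every uncommented Listen line,
    # normalising a bare port such as "80" to "*:80" (all addresses).
    addrs=$(awk 'tolower($1) == "listen" {
                     a = $2
                     if (a !~ /:/) a = "*:" a
                     print a
                 }' "$1")

    {
        # 1. The same address:port listed more than once.
        printf '%s\n' "$addrs" | sort | uniq -d | sed 's/^/duplicate: /'

        # 2. A specific address on a port that is also bound on all
        #    addresses (may or may not fail, depending on Apache version).
        printf '%s\n' "$addrs" | while read -r a; do
            case "$a" in
                ''|'*:'*|0.0.0.0:*) continue ;;
            esac
            port=${a##*:}
            if printf '%s\n' "$addrs" | grep -qxE "(\*|0\.0\.0\.0):$port"; then
                echo "overlaps wildcard: $a"
            fi
        done
    } | sort -u
}
```

For example, check_listen /usr/local/apache2/conf/httpd.conf. No output means no duplicate or overlapping Listen lines were found in that one file.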
>
> So, a look in the logs shows:
>
> crmd[3190]: 2006/03/06_22:06:40 info: mask(lrm.c:do_lrm_rsc_op): Performing
> op start on apache_group:apache
> apache[3416]: 2006/03/06_22:06:40 ERROR: (98)Address already in use:
> make_sock: could not bind to address 192.168.1.10:80 no listening sockets
> available, shutting down Unable to open logs
>
> -- SKIP --
>
> crmd[3190]: 2006/03/06_22:06:40 WARN: mask(lrm.c:do_lrm_event): LRM
> operation (3) monitor on apache_group:ip_resource_1 cancelled
> IPaddr[3496]: 2006/03/06_22:06:41 INFO: /sbin/route -n del -host
> 192.168.1.10
> IPaddr[3496]: 2006/03/06_22:06:41 INFO: /sbin/ifconfig eth0:0 down
> IPaddr[3496]: 2006/03/06_22:06:41 INFO: IP Address 192.168.1.10 released
>
> If someone wants the full log, I have it.
>
> So I ran lsof -i :80 at boot to make sure nothing was listening on port 80,
> and kept checking until after apache 'failed'. Nothing was running on it.
> However, without my starting apache, I get this:
>
> # ps aux | grep 'httpd'
> root 3334 0.0 0.7 86004 7464 ? Ss 22:06 0:00 /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> daemon 3374 0.0 0.7 86004 7484 ? S 22:06 0:00 /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> daemon 3375 0.0 0.7 86004 7484 ? S 22:06 0:00 /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> daemon 3376 0.0 0.7 86004 7484 ? S 22:06 0:00 /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> daemon 3377 0.0 0.7 86004 7484 ? S 22:06 0:00 /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> daemon 3378 0.0 0.7 86004 7484 ? S 22:06 0:00 /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> root 3541 0.0 0.0 51080 672 pts/0 S+ 22:19 0:00 grep httpd
>
> If I bring up the 192.168.1.10 IP address on the machine, all is fine.
> Nothing else on the network is using .10; I am positive of that, because I
> have checked.
>
> /etc/ha.d/ha.cf contains:
>
> logfacility daemon
> logfile /var/log/ha-log
> keepalive 1
> deadtime 10
> warntime 5
> initdead 120 # depends on your hardware
> udpport 694
> ping 192.168.1.1
> bcast eth0
> auto_failback off
> node creativeweb1
> node creativeweb2
> use_logd no
> compression bz2
> compression_threshold 2
> crm yes
> respawn hacluster /usr/lib/heartbeat/ipfail
>
> /var/lib/heartbeat/crm/cib.xml contains:
>
> <cib generated="true" admin_epoch="0" epoch="12" num_updates="1850"
> have_quorum="true" num_peers="1" origin="creativeweb1"
> cib_feature_revision="1" last_written="Mon Mar 6 22:06:41 2006"
> dc_uuid="5ceded9a-2332-4995-8840-f28198093da1" debug_source="finalize_join"
> ccm_transition="1">
>   <configuration>
>     <crm_config>
>       <nvpair id="transition_idle_timeout" name="transition_idle_timeout" value="120s"/>
>       <nvpair id="no_quorum_policy" name="no_quorum_policy" value="ignore"/>
>     </crm_config>
>     <nodes>
>       <node id="5ceded9a-2332-4995-8840-f28198093da1" uname="creativeweb1"
>           type="member"/>
>     </nodes>
>     <resources>
>       <group id="apache_group">
>         <primitive id="ip_resource_1" class="ocf" type="IPaddr" provider="heartbeat">
>           <operations>
>             <op id="1" interval="5s" name="monitor" timeout="5s"/>
>           </operations>
>           <instance_attributes>
>             <attributes>
>               <nvpair name="ip" value="192.168.1.10"
>                   id="e71700d9-9907-414e-9674-32ccad73ac2a"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>         <primitive id="apache" class="ocf" type="apache" provider="heartbeat">
>           <operations>
>             <op id="3" name="monitor" interval="10s" timeout="10s"/>
>           </operations>
>         </primitive>
>       </group>
>     </resources>
>     <constraints>
>       <rsc_location id="run_apache_group" rsc="apache_group">
>         <rule id="pref_run_apache_group" score="100">
>           <expression attribute="#uname" operation="eq"
>               value="creativeweb1" id="6bf2195f-dbe1-4127-a2f2-0a37eb1297a2"/>
>         </rule>
>       </rsc_location>
>     </constraints>
>   </configuration>
>   <status>
>     <node_state id="5ceded9a-2332-4995-8840-f28198093da1"
>         uname="creativeweb1" in_ccm="true" join="member" origin="do_lrm_query"
>         crmd="online" ha="active" expected="member">
>       <lrm>
>         <lrm_resources>
>           <lrm_resource id="apache_group:ip_resource_1" rsc_state="stopped"
>               op_status="0" rc_code="0" last_op="stop">
>             <lrm_rsc_op id="apache_group:ip_resource_1_start_0"
>                 operation="start" op_status="0" call_id="2" rc_code="0"
>                 origin="do_update_resource"
>                 transition_key="0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 transition_magic="0:0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 rsc_state="running"/>
>             <lrm_rsc_op id="apache_group:ip_resource_1_monitor_5000"
>                 operation="monitor" op_status="0" call_id="3" rc_code="0"
>                 origin="do_update_resource"
>                 transition_key="0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 transition_magic="0:0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 rsc_state="running"/>
>             <lrm_rsc_op id="apache_group:ip_resource_1_stop_0"
>                 operation="stop" origin="do_update_resource"
>                 transition_key="2:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 transition_magic="0:2:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 rsc_state="stopped" call_id="11" rc_code="0" op_status="0"/>
>           </lrm_resource>
>           <lrm_resource id="apache_group:apache" rsc_state="stopped" op_status="0"
>               rc_code="0" last_op="stop">
>             <lrm_rsc_op id="apache_group:apache_start_0" operation="start"
>                 op_status="4" call_id="8" rc_code="1" origin="do_update_resource"
>                 transition_key="1:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 transition_magic="4:1:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 rsc_state="start_failed"/>
>             <lrm_rsc_op id="apache_group:apache_monitor_10000"
>                 operation="monitor" op_status="4" call_id="5" rc_code="6"
>                 origin="do_update_resource"
>                 transition_key="0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 transition_magic="4:0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 rsc_state="monitor_failed"/>
>             <lrm_rsc_op id="apache_group:apache_stop_0" operation="stop"
>                 origin="do_update_resource"
>                 transition_key="2:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 transition_magic="0:2:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
>                 rsc_state="stopped" call_id="9" rc_code="0" op_status="0"/>
>           </lrm_resource>
>         </lrm_resources>
>       </lrm>
>     </node_state>
>   </status>
> </cib>
>
> Any help is much appreciated. Please let me know if you need anything else
> to help me troubleshoot. Thanks,
>
> Rob
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>