[Linux-HA] Config Problems

Andrew Beekhof beekhof at gmail.com
Tue Mar 14 04:55:41 MST 2006


On 3/12/06, CA Lists <lists at creativeanvil.com> wrote:
> Hi All,
>
> I've searched the site, searched mailing list archives, etc, and fought with
> this for almost a month. Finally, I'm giving in and asking. I'm having some
> issues getting heartbeat 2.0.2 set up (RHEL 4 on Intel EM64T) in an
> Active/Passive cluster for apache 2. With both nodes up, the first fails
> over immediately and then apache goes down on the secondary node. If I just
> have heartbeat enabled on the primary node, it starts up and detects the
> other node is dead. So, it decides to start up. It brings up the ip, but
> kills it right away:
>
> PING 192.168.1.10 (192.168.1.10): 56 data bytes
> 64 bytes from 192.168.1.10: icmp_seq=36 ttl=64 time=0.366 ms
> 64 bytes from 192.168.1.10: icmp_seq=37 ttl=64 time=0.3 ms
> 64 bytes from 192.168.1.10: icmp_seq=38 ttl=64 time=0.328 ms
> 64 bytes from 192.168.1.10: icmp_seq=39 ttl=64 time=5.132 ms
> 64 bytes from 192.168.1.10: icmp_seq=40 ttl=64 time=0.345 ms
> 64 bytes from 192.168.1.10: icmp_seq=41 ttl=64 time=0.389 ms
>
> (STOPPED RESPONDING)
>
> --- 192.168.1.10 ping statistics ---
> 168 packets transmitted, 6 packets received, 96% packet loss
> round-trip min/avg/max = 0.3/1.143/5.132 ms

If apache fails, then the whole group will get moved to another node -
which would explain the IP being stopped.

What you need to find out is why apache won't start/stay started.
And the reason for that is likely related to this log:
 > apache[3416]:   2006/03/06_22:06:40 ERROR: (98)Address already in use:
 > make_sock: could not bind to address 192.168.1.10:80 no listening sockets
 > available, shutting down
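
A rough sketch of how one might chase that down, not a definitive
procedure. The helper names bind_target and port_of are made up for
illustration. The log line is the one quoted above, and the httpd.conf
path is taken from your ps output. The script only pulls the address and
port out of the error; the actual checks are left as comments because
they need root on the affected node.

```shell
#!/bin/sh
# Hypothetical helper: read an Apache "could not bind" error line on
# stdin and print the address:port it failed to bind.
bind_target() {
    sed -n 's/.*could not bind to address \([^ ]*\).*/\1/p'
}

# Hypothetical helper: print the port part of an address:port pair.
port_of() {
    printf '%s\n' "${1##*:}"
}

# The error line from the log quoted above, joined onto one line.
line='apache[3416]: 2006/03/06_22:06:40 ERROR: (98)Address already in use: make_sock: could not bind to address 192.168.1.10:80 no listening sockets available, shutting down'

target=$(printf '%s\n' "$line" | bind_target)
port=$(port_of "$target")
echo "bind target: $target (port $port)"

# Checks to run by hand as root, right after the failed start:
#   lsof -i :"$port"                  # which process holds the port
#   netstat -tlnp | grep ":$port "    # same question via netstat
#   grep -n '^[[:space:]]*Listen' /usr/local/apache2/conf/httpd.conf
```

If nothing is listening on the port, overlapping Listen directives in the
apache config (for example a plain "Listen 80" together with
"Listen 192.168.1.10:80") can produce the same "Address already in use"
message, which is what the last grep is meant to check. Your ps output
also shows httpd processes started at 22:06 that you say you did not
start yourself, so it is worth finding out what launched them.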

More than that I can't help you.

>
> So, a look in the logs shows:
>
> crmd[3190]: 2006/03/06_22:06:40 info: mask(lrm.c:do_lrm_rsc_op): Performing
> op start on apache_group:apache
> apache[3416]:   2006/03/06_22:06:40 ERROR: (98)Address already in use:
> make_sock: could not bind to address 192.168.1.10:80 no listening sockets
> available, shutting down Unable to open logs
>
> -- SKIP --
>
> crmd[3190]: 2006/03/06_22:06:40 WARN: mask(lrm.c:do_lrm_event): LRM
> operation (3) monitor on apache_group:ip_resource_1 cancelled
> IPaddr[3496]:   2006/03/06_22:06:41 INFO: /sbin/route -n del -host
> 192.168.1.10
> IPaddr[3496]:   2006/03/06_22:06:41 INFO: /sbin/ifconfig eth0:0 down
> IPaddr[3496]:   2006/03/06_22:06:41 INFO: IP Address 192.168.1.10 released
>
> If someone wants the full log, I have it.
>
> So, I did a lsof -i:80 on boot to ensure nothing was running on port 80 and
> continued checking until after apache 'failed'. Nothing was running.
> However, without me starting apache, I get this:
>
> # ps aux | grep 'httpd'
> root      3334  0.0  0.7 86004 7464 ?        Ss   22:06   0:00
> /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> daemon    3374  0.0  0.7 86004 7484 ?        S    22:06   0:00
> /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> daemon    3375  0.0  0.7 86004 7484 ?        S    22:06   0:00
> /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> daemon    3376  0.0  0.7 86004 7484 ?        S    22:06   0:00
> /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> daemon    3377  0.0  0.7 86004 7484 ?        S    22:06   0:00
> /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> daemon    3378  0.0  0.7 86004 7484 ?        S    22:06   0:00
> /usr/local/apache2/bin/httpd -DSTATUS -f /usr/local/apache2/conf/httpd.conf
> root      3541  0.0  0.0 51080  672 pts/0    S+   22:19   0:00 grep httpd
>
> If I bring up the 192.168.1.10 IP address on the machine, all is fine. There
> is nothing else on the network running on .10, I am positive of that. I have
> checked that.
>
> /etc/ha.d/ha.cf contains:
>
> logfacility daemon
> logfile /var/log/ha-log
> keepalive 1
> deadtime 10
> warntime 5
> initdead 120 # depend on your hardware
> udpport 694
> ping 192.168.1.1
> bcast eth0
> auto_failback off
> node creativeweb1
> node creativeweb2
> use_logd no
> compression     bz2
> compression_threshold 2
> crm yes
> respawn hacluster /usr/lib/heartbeat/ipfail
>
> /var/lib/heartbeat/crm/cib.xml contains:
>
> <cib generated="true" admin_epoch="0" epoch="12" num_updates="1850"
> have_quorum="true" num_peers="1" origin="creativeweb1"
> cib_feature_revision="1" last_written="Mon Mar  6 22:06:41 2006"
> dc_uuid="5ceded9a-2332-4995-8840-f28198093da1" debug_source="finalize_join"
> ccm_transition="1">
>    <configuration>
>      <crm_config>
>        <nvpair id="transition_idle_timeout" name="transitition_idle_timeout"
> value="120s"/>
>        <nvpair id="no_quorum_policy" name="no_quorum_policy"
> value="ignore"/>
>      </crm_config>
>      <nodes>
>        <node id="5ceded9a-2332-4995-8840-f28198093da1" uname="creativeweb1"
> type="member"/>
>      </nodes>
>      <resources>
>        <group id="apache_group">
>          <primitive id="ip_resource_1" class="ocf" type="IPaddr"
> provider="heartbeat">
>            <operations>
>              <op id="1" interval="5s" name="monitor" timeout="5s"/>
>            </operations>
>            <instance_attributes>
>              <attributes>
>                <nvpair name="ip" value="192.168.1.10"
> id="e71700d9-9907-414e-9674-32ccad73ac2a"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>          <primitive id="apache" class="ocf" type="apache"
> provider="heartbeat">
>            <operations>
>              <op id="3" name="monitor" interval="10s" timeout="10s"/>
>            </operations>
>          </primitive>
>        </group>
>      </resources>
>      <constraints>
>        <rsc_location id="run_apache_group" rsc="apache_group">
>          <rule id="pref_run_apache_group" score="100">
>            <expression attribute="#uname" operation="eq"
> value="creativeweb1" id="6bf2195f-dbe1-4127-a2f2-0a37eb1297a2"/>
>          </rule>
>        </rsc_location>
>      </constraints>
>    </configuration>
>    <status>
>      <node_state id="5ceded9a-2332-4995-8840-f28198093da1"
> uname="creativeweb1" in_ccm="true" join="member" origin="do_lrm_query"
> crmd="online" ha="active" expected="member">
>        <lrm>
>          <lrm_resources>
>            <lrm_resource id="apache_group:ip_resource_1" rsc_state="stopped"
> op_status="0" rc_code="0" last_op="stop">
>              <lrm_rsc_op id="apache_group:ip_resource_1_start_0"
> operation="start" op_status="0" call_id="2" rc_code="0"
> origin="do_update_resource"
> transition_key="0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> transition_magic="0:0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> rsc_state="running"/>
>              <lrm_rsc_op id="apache_group:ip_resource_1_monitor_5000"
> operation="monitor" op_status="0" call_id="3" rc_code="0"
> origin="do_update_resource"
> transition_key="0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> transition_magic="0:0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> rsc_state="running"/>
>              <lrm_rsc_op id="apache_group:ip_resource_1_stop_0"
> operation="stop" origin="do_update_resource"
> transition_key="2:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> transition_magic="0:2:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> rsc_state="stopped" call_id="11" rc_code="0" op_status="0"/>
>            </lrm_resource>
>            <lrm_resource id="apache_group:apache" rsc_state="stopped" op_status="0"
> rc_code="0" last_op="stop">
>              <lrm_rsc_op id="apache_group:apache_start_0" operation="start"
> op_status="4" call_id="8" rc_code="1" origin="do_update_resource"
> transition_key="1:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> transition_magic="4:1:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> rsc_state="start_failed"/>
>              <lrm_rsc_op id="apache_group:apache_monitor_10000"
> operation="monitor" op_status="4" call_id="5" rc_code="6"
> origin="do_update_resource"
> transition_key="0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> transition_magic="4:0:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> rsc_state="monitor_failed"/>
>              <lrm_rsc_op id="apache_group:apache_stop_0" operation="stop"
> origin="do_update_resource"
> transition_key="2:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> transition_magic="0:2:2e5aa48e-46d0-475d-ba4e-b20e0c2b9fe2"
> rsc_state="stopped" call_id="9" rc_code="0" op_status="0"/>
>            </lrm_resource>
>          </lrm_resources>
>        </lrm>
>      </node_state>
>    </status>
>  </cib>
>
> Any help is much appreciated. Please let me know anything additional that
> you may need to help me troubleshoot. Thanks,
>
> Rob

