[Linux-ha-dev] A STONITH plugin for checking whether the target
node is kdumping or not.
Satomi TANIGUCHI
taniguchis at intellilink.co.jp
Tue Oct 21 04:44:15 MDT 2008
Hi Dejan,
Thank you very very much for taking care of it!
I'm posting a patch that makes the condition for finding and reading the value of
kdump_check_user stricter.
It's for Linux-HA Dev a29f1b78dfe5.
I'm sorry to bother you again.
And I attached a patch for mkdumprd with almost the same modification.
It's for mkdumprd version 5.0.39.
You said "This patch has to go elsewhere, to whoever maintains mkdumprd".
I looked into it, but there is no general way to do that, because adding functions
to mkdumprd is up to each distributor...
And kdumpchecker can't work properly with an unpatched mkdumprd.
So, would you apply this as a document like README?
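To illustrate what a stricter lookup could look like, here is a minimal sketch. It is not the attached kdumpcheck_getvalue.patch, whose contents are not shown in this archive. It assumes that the setting name is the first whitespace-separated field of a line and the value is the second; the helper name get_conf_value is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the attached patch): print the value of
# an exactly-named setting in a kdump.conf-style file.
get_conf_value() {
    conf_file="$1"
    key="$2"
    # $1 == key compares the whole first field, so commented-out lines
    # ("#kdump_check_user x") and longer names ("kdump_check_user_old x")
    # are ignored. If the key appears more than once, the last one wins.
    awk -v key="${key}" '
        $1 == key { val = $2 }
        END { if (val != "") print val }
    ' "${conf_file}"
}

# Usage: fall back to the default user when nothing is configured.
# user=`get_conf_value /etc/kdump.conf kdump_check_user`
# USERNAME="${user:-kdumpchecker}"
```

Comparing the whole field avoids the partial matches that a word-boundary regular expression can still let through.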
Regards,
Satomi TANIGUCHI
Dejan Muhamedagic wrote:
> Hi Satomi-san,
>
> On Tue, Oct 14, 2008 at 03:06:27PM +0900, Satomi TANIGUCHI wrote:
>> Hi Dejan,
>>
>> Thank you so much for your comments!
>> I modified and tested the patch.
>>
>>
>> Dejan Muhamedagic wrote:
>>> Hi Satomi-san,
>>>
>>> On Wed, Oct 08, 2008 at 02:55:57PM +0900, Satomi TANIGUCHI wrote:
>>>> Hi lists,
>>>>
>>>> I'm posting a STONITH plugin which checks whether the target node is kdumping
>>>> or not.
>>>> There are some steps to use this, but I believe this plugin is helpful for
>>>> failure analysis.
>>>> See attached README for details about how to use this.
>>>>
>>>> There are 2 patches.
>>>> The patch named "kdumpcheck.patch" is for Linux-HA-dev(1eae6aaf1af8).
>>>> And the patch named "mkdumprd_for_kdumpcheck.patch" is
>>>> for mkdumprd version 5.0.39.
>>>>
>>>> If you're interested, please give me your comments.
>>>> Any comments and suggestions are really appreciated.
>>> The script (kdumpcheck) looks fine to me. Just a few points.
>>>
>>> The use of upper case variable names: Typically, those denote
>>> global (or exported) environment variables. Vars which should
>>> live only within a function (though that's not possible with
>>> Bourne shell) should be lower case and, probably, have shorter
>>> names. Excessive use of upper case strains eyes more than the
>>> lower case. That is unless you're a VMS user ;-)
>> I changed all non-global variables' names to lower and shorter strings.
>> Thanks!
>>
>>> Leave "function" and "local" keywords out, unless you want to use
>>> /bin/bash for the script, but I don't see why would that be
>>> necessary.
>> I deleted "function" and "local".
>> And now check_identity_file() and check_user_existence() require no argument.
>>
>>> I wonder if the status function should depend on ping-ing the
>>> target node.
>> The ping-ing is just to confirm that the node on which the kdumpcheck plugin
>> is running knows the hostnames in hostlist.
>> If the target node is not listed in hostlist,
>> kdumpcheck will fail to STONITH the node.
>> Is it redundant?
>> I referred to the ssh STONITH plugin when I wrote this processing...
>> I think it is necessary for the case in which a user writes a wrong hostname
>> in hostlist or /etc/hosts.
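As an aside on the question above: if the only goal is to confirm that the hostnames resolve, a name lookup that sends nothing to the target node would also do. This is a minimal sketch and not what the plugin does; it assumes getent is available (glibc-based Linux systems), and hosts_resolvable is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every name in the
# space-separated list ($1) resolves through the system resolver.
# Assumption: getent(1) is installed.
hosts_resolvable() {
    for h in $1; do
        if ! getent hosts "${h}" > /dev/null 2>&1; then
            echo "cannot resolve ${h}" >&2
            return 1
        fi
    done
    return 0
}

# Usage inside a status operation:
# hosts_resolvable "${hostlist}" || exit 1
```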
>>
>>> Document that this works only on Linux.
>> I added NOTE in README's introduction.
>
> Applied the patch.
>
> Cheers,
>
> Dejan
>
>>
>> Best Regards,
>> Satomi TANIGUCHI
>>
>>
>>
>>> Cheers,
>>>
>>> Dejan
>>>
>>>> Best Regards,
>>>> Satomi TANIGUCHI
>>> _______________________________________________________
>>> Linux-HA-Dev: Linux-HA-Dev at lists.linux-ha.org
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>>> Home Page: http://linux-ha.org/
>
>> diff -urN org/configure.in mod/configure.in
>> --- org/configure.in 2008-10-14 10:24:16.000000000 +0900
>> +++ mod/configure.in 2008-10-14 10:25:17.000000000 +0900
>> @@ -2665,6 +2665,7 @@
>> lib/plugins/stonith/external/riloe \
>> lib/plugins/stonith/external/ssh \
>> lib/plugins/stonith/external/hmchttp \
>> + lib/plugins/stonith/external/kdumpcheck \
>> lib/plugins/stonith/external/xen0-ha \
>> lib/plugins/stonith/external/drac5 \
>> lib/plugins/HBcompress/Makefile \
>> diff -urN org/lib/plugins/stonith/external/Makefile.am mod/lib/plugins/stonith/external/Makefile.am
>> --- org/lib/plugins/stonith/external/Makefile.am 2008-10-14 10:24:17.000000000 +0900
>> +++ mod/lib/plugins/stonith/external/Makefile.am 2008-10-14 10:25:17.000000000 +0900
>> @@ -20,13 +20,13 @@
>> MAINTAINERCLEANFILES = Makefile.in
>>
>> EXTRA_DIST = drac5 ibmrsa-telnet ipmi rackpdu vmware xen0 \
>> - xen0-ha-dom0-stonith-helper sbd
>> + xen0-ha-dom0-stonith-helper sbd kdumpcheck
>>
>> extdir = $(stonith_ext_plugindir)
>>
>> helperdir = $(stonith_plugindir)
>>
>> ext_SCRIPTS = drac5 ibmrsa ibmrsa-telnet ipmi riloe ssh vmware rackpdu xen0 hmchttp \
>> - xen0-ha sbd
>> + xen0-ha sbd kdumpcheck
>>
>> helper_SCRIPTS = xen0-ha-dom0-stonith-helper
>> diff -urN org/lib/plugins/stonith/external/kdumpcheck.in mod/lib/plugins/stonith/external/kdumpcheck.in
>> --- org/lib/plugins/stonith/external/kdumpcheck.in 1970-01-01 09:00:00.000000000 +0900
>> +++ mod/lib/plugins/stonith/external/kdumpcheck.in 2008-10-14 10:02:03.000000000 +0900
>> @@ -0,0 +1,288 @@
>> +#!/bin/sh
>> +#
>> +# External STONITH module to check kdump.
>> +#
>> +# Copyright (c) 2008 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of version 2 of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful, but
>> +# WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>> +#
>> +# Further, this software is distributed without any warranty that it is
>> +# free of the rightful claim of any third person regarding infringement
>> +# or the like. Any license provided herein, whether implied or
>> +# otherwise, applies only to this software file. Patent licenses, if
>> +# any, provided herein do not apply to combinations of this program with
>> +# other software, or any other product whatsoever.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
>> +#
>> +
>> +SSH_COMMAND="@SSH@ -q -x -o PasswordAuthentication=no -o StrictHostKeyChecking=no -n"
>> +#Set default user name.
>> +USERNAME="kdumpchecker"
>> +#Initialize identity file-path options for ssh command
>> +IDENTITY_OPTS=""
>> +
>> +#For debug print.
>> +DEBUG=1
>> +if [ -n "${DEBUG}" ]; then
>> + DEBUG_FILE=/var/log/ha-kdumpcheck.log
>> + touch ${DEBUG_FILE}
>> + chmod 600 ${DEBUG_FILE}
>> +
>> + exec 2>> ${DEBUG_FILE}
>> + OUTPUT='>&2'
>> +fi
>> +
>> +print_debug() {
>> + if [ -n "${DEBUG}" ]; then
>> + cat >&2
>> + else
>> + cat > /dev/null 2>&1
>> + fi
>> +}
>> +
>> +#Rewrite the hostlist to accept "," as a delimiter for hostnames too.
>> +hostlist=`echo ${hostlist} | tr ',' ' '`
>> +
>> +##
>> +# Check the parameter hostlist is set or not.
>> +# If not, exit with 6 (ERR_CONFIGURED).
>> +##
>> +check_hostlist() {
>> + if [ -z "${hostlist}" ]; then
>> + echo "`date`::ERROR: hostlist is empty." | print_debug
>> + exit 6 #ERR_CONFIGURED
>> + fi
>> +}
>> +
>> +##
>> +# Set kdump check user name to USERNAME.
>> +# always return 0.
>> +##
>> +get_username() {
>> + kdump_conf="/etc/kdump.conf"
>> + config_name="kdump_check_user"
>> +
>> + if [ ! -f "${kdump_conf}" ]; then
>> + echo "`date`::DEBUG: ${kdump_conf} doesn't exist." | print_debug
>> + return 0
>> + fi
>> +
>> + tmp=`grep "^\s*${config_name}\>" ${kdump_conf} | awk '{print $2}'`
>> + if [ -n "${tmp}" ]; then
>> + USERNAME="${tmp}"
>> + fi
>> +
>> + echo "`date`::DEBUG: kdump check user name is ${USERNAME}." | print_debug
>> +}
>> +
>> +##
>> +# Check the specified or default identity file exists or not.
>> +# If not, exit with 6 (ERR_CONFIGURED).
>> +##
>> +check_identity_file() {
>> + IDENTITY_OPTS=""
>> + if [ -n "${identity_file}" ]; then
>> + if [ ! -f "${identity_file}" ]; then
>> + echo "`date`::ERROR: ${identity_file} doesn't exist." | print_debug
>> + exit 6 #ERR_CONFIGURED
>> + fi
>> + IDENTITY_OPTS="-i ${identity_file}"
>> + else
>> + flg_file_exists=0
>> + homedir=`eval echo "~${USERNAME}"`
>> + for filename in "${homedir}/.ssh/id_rsa" \
>> + "${homedir}/.ssh/id_dsa" \
>> + "${homedir}/.ssh/identity"
>> + do
>> + if [ -f "${filename}" ]; then
>> + flg_file_exists=1
>> + IDENTITY_OPTS="${IDENTITY_OPTS} -i ${filename}"
>> + fi
>> + done
>> + if [ ${flg_file_exists} -eq 0 ]; then
>> + echo "`date`::ERROR: ${USERNAME}'s identity file for ssh command" \
>> + " doesn't exist." | print_debug
>> + exit 6 #ERR_CONFIGURED
>> + fi
>> + fi
>> +}
>> +
>> +##
>> +# Check the user to check doing kdump exists or not.
>> +# If not, exit with 6 (ERR_CONFIGURED).
>> +##
>> +check_user_existence() {
>> +
>> + # Get kdump check user name and check whether he exists or not.
>> + grep -q "^${USERNAME}\>" /etc/passwd > /dev/null 2>&1
>> + ret=$?
>> + if [ ${ret} != 0 ]; then
>> + echo "`date`::ERROR: user ${USERNAME} doesn't exist." \
>> + "please confirm \"kdump_check_user\" setting in /etc/kdump.conf." \
>> + "(default user name is \"kdumpchecker\")" | print_debug
>> + exit 6 #ERR_CONFIGURED
>> + fi
>> +}
>> +
>> +##
>> +# Check the target node is kdumping or not.
>> +# arg1 : target node name.
>> +# ret : 0 -> the target is kdumping.
>> +# : 1 -> the target is _not_ kdumping.
>> +# : else -> failed to check.
>> +##
>> +check_kdump() {
>> + target_node="$1"
>> +
>> + # Get kdump check user name.
>> + get_username
>> + check_user_existence
>> + exec_cmd="${SSH_COMMAND} -l ${USERNAME}"
>> +
>> + # Specify kdump check user's identity file for ssh command.
>> + check_identity_file
>> + exec_cmd="${exec_cmd} ${IDENTITY_OPTS}"
>> +
>> + # Now, check the target!
>> + # In advance, Write the following setting at the head of
>> + # kdump_check_user's public key in authorized_keys file on target node.
>> + # command="test -s /proc/vmcore", \
>> + # no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty
>> + echo "`date`::DEBUG: execute the command" \
>> + "[${exec_cmd} ${target_node}]." | print_debug
>> + ${exec_cmd} ${target_node} > /dev/null 2>&1
>> + ret=$?
>> + echo "`date`::DEBUG: the command's result is ${ret}." | print_debug
>> +
>> + #ret -> 0 : vmcore file's size is not zero. the node is kdumping.
>> + #ret -> 1 : the node is _not_ kdumping (vmcore didn't exist or
>> + # its size is zero). It still needs to be STONITH'ed.
>> + #ret -> 255 : the ssh command failed.
>> + # else : Maybe command strings in authorized_keys is wrong...
>> + return ${ret}
>> +}
>> +
>> +###
>> +#
>> +# Main function.
>> +#
>> +###
>> +case $1 in
>> +gethosts)
>> + check_hostlist
>> + for hostname in ${hostlist} ; do
>> + echo "${hostname}"
>> + done
>> + exit 0
>> + ;;
>> +on)
>> + # This plugin does only check whether a target node is kdumping or not.
>> + exit 1
>> + ;;
>> +reset|off)
>> + check_hostlist
>> + ret=1
>> + for hostname in ${hostlist}
>> + do
>> + if [ "${hostname}" != "$2" ]; then
>> + continue
>> + fi
>> + while [ 1 ]
>> + do
>> + check_kdump "$2"
>> + ret=$?
>> + if [ ${ret} -ne 255 ]; then
>> + exit ${ret}
>> + fi
>> + #255 means the ssh command itself failed,
>> + #for example a connection failure because the network has not started yet
>> + #in the 2nd kernel on the target node.
>> + #So, retry the check after a little while.
>> + sleep 1
>> + done
>> + done
>> + exit ${ret}
>> + ;;
>> +status)
>> + check_hostlist
>> + for hostname in ${hostlist}
>> + do
>> + if ping -w1 -c1 "${hostname}" 2>&1 | grep "unknown host"
>> + then
>> + exit 1
>> + fi
>> + done
>> + get_username
>> + check_user_existence
>> + check_identity_file
>> + exit 0
>> + ;;
>> +getconfignames)
>> + echo "hostlist identity_file"
>> + exit 0
>> + ;;
>> +getinfo-devid)
>> + echo "kdump check STONITH device"
>> + exit 0
>> + ;;
>> +getinfo-devname)
>> + echo "kdump check STONITH external device"
>> + exit 0
>> + ;;
>> +getinfo-devdescr)
>> + echo "ssh-based kdump checker"
>> + echo "To check whether a target node is dumping or not."
>> + exit 0
>> + ;;
>> +getinfo-devurl)
>> + echo "kdump -> http://lse.sourceforge.net/kdump/"
>> + echo "ssh -> http://openssh.org"
>> + exit 0
>> + ;;
>> +getinfo-xml)
>> + cat << SSHXML
>> +<parameters>
>> +<parameter name="hostlist" unique="1" required="1">
>> +<content type="string" />
>> +<shortdesc lang="en">
>> +Hostlist
>> +</shortdesc>
>> +<longdesc lang="en">
>> +The list of hosts that the STONITH device controls
>> +</longdesc>
>> +</parameter>
>> +
>> +<parameter name="identity_file" unique="1" required="0">
>> +<content type="string" />
>> +<shortdesc lang="en">
>> +Identity file's full path for kdump check user
>> +</shortdesc>
>> +<longdesc lang="en">
>> +The full path of kdump check user's identity file for ssh command.
>> +The identity in the specified file have to be restricted to execute
>> +only the following command.
>> +"test -s /proc/vmcore"
>> +Default: kdump check user's default identity file path.
>> +NOTE: You can specify kdump check user name in /etc/kdump.conf.
>> + The parameter name is "kdump_check_user".
>> + Default user is "kdumpchecker".
>> +</longdesc>
>> +</parameter>
>> +
>> +</parameters>
>> +SSHXML
>> + exit 0
>> + ;;
>> +*)
>> + exit 1
>> + ;;
>> +esac
>
>> Kdump check STONITH plugin "kdumpcheck"
>> 1. Introduction
>> This plugin's purpose is to avoid STONITH for a node which is doing kdump.
>> It confirms whether the node is doing kdump or not when STONITH reset or
>> off operation is executed.
>> If the target node is doing kdump, this plugin considers that STONITH
>> succeeded. If not, it considers that STONITH failed.
>>
>> NOTE: This plugin cannot shut down or start up a node,
>> so it has to be used together with another STONITH plugin.
>> When this plugin fails, the next plugin, which can kill a node,
>> is executed.
>> NOTE: This plugin works only on Linux.
>>
>> 2. The way to check
>> When STONITH reset or off is executed, kdumpcheck connects to the target
>> node, and checks the size of /proc/vmcore.
>> It judges that the target node is _not_ doing kdump when the size of
>> /proc/vmcore on the node is zero, or the file doesn't exist.
>> Then kdumpcheck returns "STONITH failed" to stonithd, and the next plugin
>> is executed.
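The decision described in this section can be summarized as a small sketch. It only restates how the comments in the script interpret the exit status of the remote "test -s /proc/vmcore"; the function name describe_check_result is hypothetical and is not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: describe the exit status ($1) returned by the
# ssh command that runs "test -s /proc/vmcore" on the target node.
describe_check_result() {
    case "$1" in
        0)   echo "kdumping: treat STONITH as succeeded" ;;
        1)   echo "not kdumping: report failure so the next plugin runs" ;;
        255) echo "ssh failed: retry after a short wait" ;;
        *)   echo "unexpected: check the command in authorized_keys" ;;
    esac
}

# Usage:
# ssh -l kdumpchecker target_node; describe_check_result $?
```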
>>
>> 3. Expanding mkdumprd
>> This plugin requires a non-root user and an ssh connection even in the 2nd kernel.
>> So, you need to apply mkdumprd_for_kdumpcheck.patch to /sbin/mkdumprd.
>> This patch is tested with mkdumprd version 5.0.39.
>> The patch adds the following functions:
>> i) Start udevd with specified .rules files.
>> ii) Bring the specified network interface up.
>> iii) Start sshd.
>> iv) Add the specified user to the 2nd kernel.
>> The user is to check whether the node is doing kdump or not.
>> v) Execute sync command after dumping.
>>
>> NOTE: Extensions i) to iv) apply only when a filesystem partition
>> is specified as the location where the vmcore should be dumped.
>>
>> 4. Parameters
>> kdumpcheck's parameters are the following.
>> hostlist : The list of hosts that the STONITH device controls.
>> delimiter is "," or " ".
>> indispensable setting. (default:none)
>> identity_file: a full-path of the private key file for the user
>> who checks doing kdump.
>> (default: $HOME/.ssh/id_rsa, $HOME/.ssh/id_dsa and
>> $HOME/.ssh/identity)
>>
>> NOTE: To execute this plugin first, set the highest priority to this plugin
>> in all STONITH resources.
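As a small illustration of the hostlist parameter: the script converts "," to " " and then acts only on a target that is named in the list. The sketch below shows that matching in isolation; in_hostlist is a hypothetical name, not a function of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the target ($2) is named in the
# hostlist ($1), whose entries are separated by "," or " ".
in_hostlist() {
    for h in `echo "$1" | tr ',' ' '`; do
        if [ "${h}" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

# Usage:
# in_hostlist "node1,node2" "node2" && echo "node2 is controlled by this device"
```

Names are compared as whole strings, so "node" does not match "node1".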
>>
>> 5. How to Use
>> To use this tool, do the following steps on all nodes in the cluster.
>> 1) Add a user for checking whether a node is doing kdump.
>> ex.)
>> # useradd kdumpchecker
>> # passwd kdumpchecker
>> 2) Allow passwordless login from the node which will do STONITH to all
>> target nodes for the user added at step 1).
>> ex.)
>> $ cd
>> $ mkdir .ssh
>> $ chmod 700 .ssh
>> $ cd .ssh
>> $ ssh-keygen (generate authentication keys with empty passphrase)
>> $ scp id_rsa.pub kdumpchecker@target_node:"~/.ssh/."
>> $ ssh kdumpchecker@target_node
>> $ cd ~/.ssh
>> $ cat id_rsa.pub >> authorized_keys
>> $ chmod 600 authorized_keys
>> $ rm id_rsa.pub
>> 3) Limit the command that the user can execute.
>> Put the following option at the head of the line containing the user's
>> public key in the target node's authorized_keys file.
>> [command="test -s /proc/vmcore"]
>> Also add other options (like no-pty, no-port-forwarding and so on)
>> according to your security policy.
>> ex.)
>> $ vi ~/.ssh/authorized_keys
>> command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAA..snip..== kdumpchecker@node1
>> 4) Add settings in /etc/kdump.conf.
>> network_device : network interface name to check doing kdump.
>> indispensable setting. (default: none)
>> kdump_check_user : user name to check doing kdump.
>> specify non-root user.
>> (default: "kdumpchecker")
>> udev_rules : .rules files' names.
>> specify if you use udev for mapping devices.
>> specified files have to be in /etc/udev/rules.d/.
>> you can specify two or more files.
>> delimiter is "," or " ". (default: none)
>> ex.)
>> # vi /etc/kdump.conf
>> ext3 /dev/sda1
>> network_device eth0
>> kdump_check_user kdumpchecker
>> udev_rules 10-if.rules
>> 5) Apply the patch to /sbin/mkdumprd.
>> # cd /sbin
>> # patch -p 1 < mkdumprd_for_kdumpcheck.patch
>> 6) Restart kdump service.
>> # service kdump restart
>> 7) Describe cib.xml to set STONITH plugin.
>> (See "4. Parameters" and "6. Appendix")
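For step 3), the restricted entry can also be generated rather than typed into an editor. This is a minimal sketch that uses exactly the options from the example in step 3; restrict_pubkey is a hypothetical helper name, and the option set should follow your own security policy.

```shell
#!/bin/sh
# Hypothetical helper: prefix a public key line ($1) with the forced
# command and the restrictive options shown in step 3.
restrict_pubkey() {
    opts='command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty'
    printf '%s %s\n' "${opts}" "$1"
}

# Usage on the target node, as the kdump check user, in place of the
# plain "cat id_rsa.pub >> authorized_keys" from step 2:
# restrict_pubkey "`cat id_rsa.pub`" >> ~/.ssh/authorized_keys
```

The whole entry must stay on a single line of authorized_keys, which is why the helper emits the options and the key together.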
>>
>> 6. Appendix
>> A sample cib.xml.
>> <clone id="clnStonith">
>> <instance_attributes id="instance_attributes.id238245a">
>> <nvpair id="clone0_clone_max" name="clone_max" value="2"/>
>> <nvpair id="clone0_clone_node_max" name="clone_node_max" value="1"/>
>> </instance_attributes>
>> <group id="grpStonith">
>> <instance_attributes id="instance_attributes.id2382455"/>
>> <primitive id="grpStonith-kdumpcheck" class="stonith" type="external/kdumpcheck">
>> <instance_attributes id="instance_attributes.id238240a">
>> <nvpair id="nvpair.id238240b" name="hostlist" value="node1,node2"/>
>> <nvpair id="nvpair.id238240c" name="priority" value="1"/>
>> <nvpair id="nvpair.id2382408b" name="stonith-timeout" value="30s"/>
>> </instance_attributes>
>> <operations>
>> <op id="grpStonith-kdumpcheck-start" name="start" interval="0" timeout="300" on-fail="restart"/>
>> <op id="grpStonith-kdumpcheck-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/>
>> <op id="grpStonith-kdumpcheck-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
>> </operations>
>> <meta_attributes id="primitive-grpStonith-kdump-check.meta"/>
>> </primitive>
>> <primitive id="grpStonith-ssh" class="stonith" type="external/ssh">
>> <instance_attributes id="instance_attributes.id2382402a">
>> <nvpair id="nvpair.id2382408a" name="hostlist" value="node1,node2"/>
>> <nvpair id="nvpair.id238066b" name="priority" value="2"/>
>> <nvpair id="nvpair.id2382408c" name="stonith-timeout" value="60s"/>
>> </instance_attributes>
>> <operations>
>> <op id="grpStonith-ssh-start" name="start" interval="0" timeout="300" on-fail="restart"/>
>> <op id="grpStonith-ssh-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/>
>> <op id="grpStonith-ssh-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
>> </operations>
>> <meta_attributes id="primitive-grpStonith-ssh.meta"/>
>> </primitive>
>> </group>
>> </clone>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kdumpcheck_getvalue.patch
Type: text/x-patch
Size: 1147 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha-dev/attachments/20081021/673b80e4/kdumpcheck_getvalue-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mkdumprd_for_kdumpcheck.patch
Type: text/x-patch
Size: 6447 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha-dev/attachments/20081021/673b80e4/mkdumprd_for_kdumpcheck-0001.bin