[Linux-ha-dev] heartbeat should detect and recover from corrupt CIB
Andrew Beekhof
beekhof at gmail.com
Fri Mar 2 01:03:19 MST 2007
On 3/2/07, Aníbal Monsalve Salazar <anibal at sgi.com> wrote:
> Hello,
>
> Russell Coker found that "under XFS failure modes a recently created
> file may end up filled with zeros if there is a power outage (IPMI
> fence) at an inconvenient time. Heartbeat keeps a backup copy of
> /var/lib/heartbeat/crm/cib.xml but if the primary copy is filled with
> zeros it doesn't use the backup!"
>
> I created the following patch. It has been tested and found to work
> under the circumstances described above.
Coincidentally, I already applied something equivalent to the first
part of this patch a couple of days ago, but the second half looks
like a good addition too.
Thanks!
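
For anyone following the thread, the failure Russell describes (a primary cib.xml filled with zeros while the backup copy is intact) could be detected roughly as sketched below. This is a standalone illustration, not heartbeat code: file_looks_blank() and pick_cib_file() are hypothetical names, and the check assumes that a missing, empty, or all-NUL file should be treated as unusable.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the file is missing, unreadable,
 * empty, or contains only NUL bytes; returns 0 as soon as any
 * non-zero byte is seen. */
static int file_looks_blank(const char *path)
{
    unsigned char buf[4096];
    size_t n, i;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL) {
        return 1;
    }
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
        for (i = 0; i < n; i++) {
            if (buf[i] != 0) {
                fclose(fp);
                return 0;
            }
        }
    }
    fclose(fp);
    return 1;
}

/* Hypothetical helper: prefer the primary copy, fall back to the
 * backup, and return NULL if neither looks usable. */
static const char *pick_cib_file(const char *primary, const char *backup)
{
    if (!file_looks_blank(primary)) {
        return primary;
    }
    if (!file_looks_blank(backup)) {
        return backup;
    }
    return NULL;
}
```

A caller would pass the paths of the primary and backup copies and still parse and validate whichever file comes back, since a non-blank file can be corrupt in other ways.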
>
> --- lib/crm/common/xml.c~ 2007-01-12 13:57:08.000000000 +1100
> +++ lib/crm/common/xml.c 2007-03-02 14:41:03.352014250 +1100
> @@ -634,6 +634,11 @@
>
>       /* establish the file with correct permissions */
>       file_output_strm = fopen(filename, "w");
> +     if(file_output_strm == NULL) {
> +             crm_free(buffer);
> +             cl_perror("Cannot open %s", filename);
> +             return -1;
> +     }
>       fclose(file_output_strm);
>       chmod(filename, cib_mode);
>
> @@ -684,7 +689,11 @@
>               if(res < 0) {
>                       cl_perror("Cannot write output to %s",filename);
>               }
> -             fflush(file_output_strm);
> +             if(fflush(file_output_strm) == EOF || fsync(fileno(file_output_strm)) < 0) {
> +                     cl_perror("fflush or fsync error on %s", filename);
> +                     fclose(file_output_strm);
> +                     return -1;
> +             }
>       }
>       fclose(file_output_strm);
>       crm_free(buffer);
>
> Aníbal
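
The second hunk flushes stdio buffers with fflush() and then asks the kernel to push the data to disk with fsync(). A common complement, which is not part of the patch above, is to write to a temporary file and rename() it into place only after fsync() succeeds, so a crash leaves either the old or the new file. A minimal POSIX sketch follows; write_file_durably() and the ".tmp" suffix are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `data` to `path` via a temporary file.
 * Returns 0 on success, -1 on any failure (the target is left as it was). */
static int write_file_durably(const char *path, const char *data)
{
    char tmp_path[4096];
    size_t len = strlen(data);
    FILE *fp;
    int n;

    n = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp_path)) {
        return -1;      /* path too long for the buffer */
    }
    fp = fopen(tmp_path, "w");
    if (fp == NULL) {
        return -1;
    }
    /* Write, drain the stdio buffer, then force the data to disk. */
    if (fwrite(data, 1, len, fp) != len
        || fflush(fp) == EOF
        || fsync(fileno(fp)) < 0) {
        fclose(fp);
        unlink(tmp_path);
        return -1;
    }
    if (fclose(fp) == EOF) {
        unlink(tmp_path);
        return -1;
    }
    /* rename() atomically replaces the target on POSIX filesystems. */
    if (rename(tmp_path, path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

For full durability of the rename itself, the containing directory would also need an fsync(); that step is left out here to keep the sketch short.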