[Linux-ha-dev] heartbeat should detect and recover from corrupt CIB
Aníbal Monsalve Salazar
anibal at sgi.com
Fri Mar 2 00:11:27 MST 2007
Hello,
Russell Coker found that "under XFS failure modes a recently created
file may end up filled with zeros if there is a power outage (IPMI
fence) at an inconvenient time. Heartbeat keeps a backup copy of
/var/lib/heartbeat/crm/cib.xml but if the primary copy is filled with
zeros it doesn't use the backup!"
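The failure described above leaves a file whose bytes are all zero. A
minimal sketch of how such a file could be recognised before it is
parsed, so that a caller could decide to fall back to the backup copy.
The helper name file_is_all_zeros is hypothetical and is not part of
Heartbeat:

```c
#include <stdio.h>

/* Hypothetical helper, not part of Heartbeat.
 * Returns  1 if the file is non-empty and every byte is zero,
 *          0 if the file is empty or contains a non-zero byte,
 *         -1 if the file cannot be opened or read. */
int file_is_all_zeros(const char *path)
{
    unsigned char buf[4096];
    size_t n, i, total = 0;
    FILE *f = fopen(path, "rb");

    if (f == NULL) {
        return -1;
    }
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {
        for (i = 0; i < n; i++) {
            if (buf[i] != 0) {
                /* Real content found: the file is not zero-filled. */
                fclose(f);
                return 0;
            }
        }
        total += n;
    }
    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return total > 0 ? 1 : 0;
}
```

A caller could treat a return value of 1 for cib.xml as a sign of
corruption and try the backup copy instead; how Heartbeat would locate
and validate that backup is outside this sketch.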
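I created the following patch for lib/crm/common/xml.c. It checks the
result of fopen() before the stream is used, and it calls fsync() after
fflush() so that the CIB data is pushed to disk before the file is
closed. A failure in any of these calls is now reported and makes the
function return -1. The patch has been tested and found to work under
the circumstances described above.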
--- lib/crm/common/xml.c~ 2007-01-12 13:57:08.000000000 +1100
+++ lib/crm/common/xml.c 2007-03-02 14:41:03.352014250 +1100
@@ -634,6 +634,11 @@
 	/* establish the file with correct permissions */
 	file_output_strm = fopen(filename, "w");
+	if(file_output_strm == NULL) {
+		crm_free(buffer);
+		cl_perror("Cannot open %s", filename);
+		return -1;
+	}
 	fclose(file_output_strm);
 	chmod(filename, cib_mode);
@@ -684,7 +689,11 @@
 		if(res < 0) {
 			cl_perror("Cannot write output to %s",filename);
 		}
-		fflush(file_output_strm);
+		if(fflush(file_output_strm) == EOF || fsync(fileno(file_output_strm)) < 0) {
+			cl_perror("fflush or fsync error on %s", filename);
+			fclose(file_output_strm);
+			return -1;
+		}
 	}
 	fclose(file_output_strm);
 	crm_free(buffer);
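
For readers who want to try the idea outside the Heartbeat tree, the
write path the patch aims for can be sketched as a standalone function.
This is only an illustration under stated assumptions: the name
write_buffer_durably is hypothetical, perror() stands in for
cl_perror(), and the buffer is assumed to be a NUL-terminated string
owned by the caller.

```c
/* Request POSIX declarations (fileno, fsync) on strict compilers. */
#define _POSIX_C_SOURCE 200112L

#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not Heartbeat code.
 * Writes a NUL-terminated buffer to filename. Returns 0 only after
 * fflush(), fsync() and fclose() have all succeeded, -1 otherwise. */
int write_buffer_durably(const char *filename, const char *buffer)
{
    FILE *strm = fopen(filename, "w");

    if (strm == NULL) {
        perror("fopen");
        return -1;
    }
    if (fputs(buffer, strm) == EOF) {
        perror("fputs");
        fclose(strm);
        return -1;
    }
    /* fflush() hands the stdio buffer to the kernel;
     * fsync() asks the kernel to push the data to the storage device. */
    if (fflush(strm) == EOF || fsync(fileno(strm)) < 0) {
        perror("fflush or fsync");
        fclose(strm);
        return -1;
    }
    if (fclose(strm) == EOF) {
        perror("fclose");
        return -1;
    }
    return 0;
}
```

The sketch reports the error before doing any other cleanup so that
errno still refers to the failing call, and it leaves ownership of the
buffer with the caller so that every error path releases the same
resources.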
Aníbal