[Linux-ha-dev] heartbeat should detect and recover from corrupt CIB

Aníbal Monsalve Salazar anibal at sgi.com
Fri Mar 2 00:11:27 MST 2007


Hello,

Russell Coker found that "under XFS failure modes a recently created
file may end up filled with zeros if there is a power outage (IPMI
fence) at an inconvenient time. Heartbeat keeps a backup copy of
/var/lib/heartbeat/crm/cib.xml but if the primary copy is filled with
zeros it doesn't use the backup!" 
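
The failure mode described above can be detected before the file is parsed. The following is only a minimal sketch, not heartbeat code: file_is_blank() and pick_usable_copy() are hypothetical helpers, and the two paths are passed in by the caller because the backup file name is not given here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of heartbeat.
 * Returns 1 if the file is empty or holds only NUL bytes,
 * 0 if it holds at least one other byte,
 * -1 if it cannot be opened or read. */
int file_is_blank(const char *path)
{
    unsigned char buf[4096];
    size_t n, i;
    FILE *fp;

    assert(path != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
        for (i = 0; i < n; i++) {
            if (buf[i] != 0) {
                fclose(fp);
                return 0;
            }
        }
    }
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return 1;
}

/* Hypothetical selection logic: prefer the primary copy and fall back
 * to the backup when the primary is missing, unreadable or blank.
 * Returns NULL when neither copy has any content. */
const char *pick_usable_copy(const char *primary, const char *backup)
{
    if (file_is_blank(primary) == 0) {
        return primary;
    }
    if (file_is_blank(backup) == 0) {
        return backup;
    }
    return NULL;
}
```

A non-blank file can still be truncated or otherwise invalid XML, so a check like this would only complement the parser's own error handling.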

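I created the following patch. It makes the write routine in
lib/crm/common/xml.c check the result of fopen() and call fsync()
after fflush(), returning an error if any of them fails. It has been
tested and found to work under the circumstances described above.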

--- lib/crm/common/xml.c~	2007-01-12 13:57:08.000000000 +1100
+++ lib/crm/common/xml.c	2007-03-02 14:41:03.352014250 +1100
@@ -634,6 +634,11 @@
 
 	/* establish the file with correct permissions */
 	file_output_strm = fopen(filename, "w");
+	if(file_output_strm == NULL) {
+		crm_free(buffer);
+		cl_perror("Cannot open %s", filename);
+		return -1;
+	}
 	fclose(file_output_strm);
 	chmod(filename, cib_mode);
 
@@ -684,7 +689,11 @@
 		if(res < 0) {
 			cl_perror("Cannot write output to %s",filename);
 		}
-		fflush(file_output_strm);
+		if(fflush(file_output_strm) == EOF || fsync(fileno(file_output_strm)) < 0) {
+			cl_perror("fflush or fsync error on %s", filename);
+			fclose(file_output_strm);
+			return -1;
+		}
 	}
 	fclose(file_output_strm);
 	crm_free(buffer);
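
For readers without the heartbeat tree at hand, the same open / write / fflush / fsync sequence can be sketched as a stand-alone function. This is an illustration under assumptions, not the patched code: write_text_durably() is a hypothetical name, perror() stands in for cl_perror(), and the caller is assumed to own the text buffer.

```c
#define _POSIX_C_SOURCE 200112L /* for fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-alone function, not part of heartbeat.
 * Writes text to filename and forces it to disk.
 * Returns 0 on success, -1 on any failure. */
int write_text_durably(const char *filename, const char *text)
{
    FILE *fp;
    int rc = 0;

    assert(filename != NULL && text != NULL);

    fp = fopen(filename, "w");
    if (fp == NULL) {
        perror("fopen");
        return -1;
    }
    if (fputs(text, fp) == EOF) {
        perror("fputs");
        rc = -1;
    }
    /* fflush() hands the stdio buffer to the kernel ... */
    if (rc == 0 && fflush(fp) == EOF) {
        perror("fflush");
        rc = -1;
    }
    /* ... and fsync() asks the kernel to write it to the device. */
    if (rc == 0 && fsync(fileno(fp)) < 0) {
        perror("fsync");
        rc = -1;
    }
    /* The stream is closed on every path; fclose() can fail too. */
    if (fclose(fp) == EOF) {
        perror("fclose");
        rc = -1;
    }
    return rc;
}
```

Funnelling every failure through a single return code keeps the cleanup in one place, so the stream is closed exactly once whichever step fails.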

Aníbal

