[ENBD] Doubled requests

Peter T. Breuer ptb at it.uc3m.es
Wed Jan 14 03:42:32 MST 2004


"Also sprach Arne Wiebalck:"
> On Wed, 14 Jan 2004, Peter T. Breuer wrote:
> Here we go ...
> 
> data of the suspicious buffer_head:
> 
> Jan 14 11:18:53 e005 kernel: cr: Suspicious block number: 73749 (last
> 73749) d6e7ba80 (d6e7ba80)
> Jan 14 11:18:53 e005 kernel: cr:
> Jan 14 11:18:53 e005 kernel: 	 bh d6e7ba80
> Jan 14 11:18:53 e005 kernel: 	 b_next 00000000
> Jan 14 11:18:53 e005 kernel: 	 b_blocknr 73749
> Jan 14 11:18:53 e005 kernel: 	 b_size 1024
> Jan 14 11:18:53 e005 kernel: 	 b_list 1
> Jan 14 11:18:53 e005 kernel: 	 b_dev 25856
> Jan 14 11:18:53 e005 kernel: 	 b_count 2
> Jan 14 11:18:53 e005 kernel: 	 b_rdev 25856
> Jan 14 11:18:53 e005 kernel: 	 b_state 31d
> Jan 14 11:18:53 e005 kernel: 	 b_flushtime 360963
> Jan 14 11:18:53 e005 kernel: 	 b_next_free d6e7ba20
> Jan 14 11:18:53 e005 kernel: 	 b_prev_free d7c5d440
> Jan 14 11:18:53 e005 kernel: 	 b_this_page d6e7ba20
> Jan 14 11:18:53 e005 kernel: 	 b_reqnext 00000000
> Jan 14 11:18:53 e005 kernel: 	 b_pprev 00000000
> Jan 14 11:18:53 e005 kernel: 	 b_data cacae400
> Jan 14 11:18:53 e005 kernel: 	 b_page c11dae04
> Jan 14 11:18:53 e005 kernel: 	 b_end_io c0139390
> Jan 14 11:18:53 e005 kernel: 	 b_private c0139390
> Jan 14 11:18:53 e005 kernel: 	 b_rsector 147498
> Jan 14 11:18:53 e005 kernel: 	 b_wait d6e7bac8
> Jan 14 11:18:53 e005 kernel: 	 b_inode_buffers d6e7bac8

Hmmmmmm ... not that informative. Forget it. I also see the problem.
I instrumented file.c in a very naive fashion, placing the following
code in fileit():

         case WRITE: // 1
           DEBUG("+[%u@%Lu]", len, self->lastpoint);

           res = write(fd, buf, len);
           if (1) {
               // instrumentation
               static u64 lastpoint = -1LL, point;
               static int lastlen = -1;
               point = seek(fd, 0LL, SEEK_CUR);
               if (!(lastpoint+lastlen <= point || lastpoint >= point+len) ){
                   PERR("two consecutive writes to the same place!\n");
                   PERR("%#Lx-%#Lx and %#Lx-%Lx!\n",
                           lastpoint, lastpoint+lastlen,
                           point, point+len);
               }
               lastpoint = point;
               lastlen = len;
           }

and ...

file 12180: <#1007> fileit two consecutive writes to the same place!
file 12180: <#1010> fileit 0x131000-0x131400 and 0x131000-131400!
file 12180: <#1007> fileit two consecutive writes to the same place!
file 12180: <#1010> fileit 0x131400-0x131800 and 0x131400-131800!

That's 1K blocks too. I'll ivestigae.

> I also decoded the state bits
> 
> Jan 14 11:18:53 e005 kernel: cr: 1 No: 73749 --    Uptodate:1    Dirty:0
> Lock:4    Req:8    Mapped:16    New:0    Async:0    Wait_IO:0    flushTime
> 360963
> 
> I will also go and get some info from the "last" bh this one collides
> with ...
> 
> 
> > OK - so I understand that your device is being written to by the enbd
> > server? You should be able to confirm the writes in the server itself -
> 
> That's right.

OK.

> > I would try and confirm your reading by instrumenting the write method.
> > You probably can take the instrumentation from your driver, no?
> 
> I also thought about counting the writes inside the server. This would
> give a clue if enbd (server, client, driver) or the kernel between
> enbd-server and my driver is the source.

I'll now look for it.

> > Oh - well then it cannot be a collision issue in the driver :-).
> > And if the driver does not say that it has done a rollback, it has not
> > repeated a reuest attempt.
> 
> UP doesn't help.
> 
> BUT: it seems that you already "impemented" a switch for the doubled
> requests, it's called "merge_requests"! in one of the last tests I forgot
> to activate  it and now it transferred several 100,000 writes with no
> twins. I'll investigate further ....

Hmm. Maybe an artefact of effective block size?


Peter


More information about the ENBD mailing list