[ENBD] Doubled requests
Peter T. Breuer
ptb at it.uc3m.es
Wed Jan 14 03:42:32 MST 2004
"Also sprach Arne Wiebalck:"
> On Wed, 14 Jan 2004, Peter T. Breuer wrote:
> Here we go ...
>
> data of the suspicious buffer_head:
>
> Jan 14 11:18:53 e005 kernel: cr: Suspicious block number: 73749 (last
> 73749) d6e7ba80 (d6e7ba80)
> Jan 14 11:18:53 e005 kernel: cr:
> Jan 14 11:18:53 e005 kernel: bh d6e7ba80
> Jan 14 11:18:53 e005 kernel: b_next 00000000
> Jan 14 11:18:53 e005 kernel: b_blocknr 73749
> Jan 14 11:18:53 e005 kernel: b_size 1024
> Jan 14 11:18:53 e005 kernel: b_list 1
> Jan 14 11:18:53 e005 kernel: b_dev 25856
> Jan 14 11:18:53 e005 kernel: b_count 2
> Jan 14 11:18:53 e005 kernel: b_rdev 25856
> Jan 14 11:18:53 e005 kernel: b_state 31d
> Jan 14 11:18:53 e005 kernel: b_flushtime 360963
> Jan 14 11:18:53 e005 kernel: b_next_free d6e7ba20
> Jan 14 11:18:53 e005 kernel: b_prev_free d7c5d440
> Jan 14 11:18:53 e005 kernel: b_this_page d6e7ba20
> Jan 14 11:18:53 e005 kernel: b_reqnext 00000000
> Jan 14 11:18:53 e005 kernel: b_pprev 00000000
> Jan 14 11:18:53 e005 kernel: b_data cacae400
> Jan 14 11:18:53 e005 kernel: b_page c11dae04
> Jan 14 11:18:53 e005 kernel: b_end_io c0139390
> Jan 14 11:18:53 e005 kernel: b_private c0139390
> Jan 14 11:18:53 e005 kernel: b_rsector 147498
> Jan 14 11:18:53 e005 kernel: b_wait d6e7bac8
> Jan 14 11:18:53 e005 kernel: b_inode_buffers d6e7bac8
Hmmmmmm ... not that informative. Forget it. I also see the problem.
I instrumented file.c in a very naive fashion, placing the following
code in fileit():
case WRITE: // 1
DEBUG("+[%u@%Lu]", len, self->lastpoint);
res = write(fd, buf, len);
if (1) {
// instrumentation
static u64 lastpoint = -1LL, point;
static int lastlen = -1;
point = seek(fd, 0LL, SEEK_CUR);
if (!(lastpoint+lastlen <= point || lastpoint >= point+len) ){
PERR("two consecutive writes to the same place!\n");
PERR("%#Lx-%#Lx and %#Lx-%Lx!\n",
lastpoint, lastpoint+lastlen,
point, point+len);
}
lastpoint = point;
lastlen = len;
}
and ...
file 12180: <#1007> fileit two consecutive writes to the same place!
file 12180: <#1010> fileit 0x131000-0x131400 and 0x131000-131400!
file 12180: <#1007> fileit two consecutive writes to the same place!
file 12180: <#1010> fileit 0x131400-0x131800 and 0x131400-131800!
That's 1K blocks too. I'll ivestigae.
> I also decoded the state bits
>
> Jan 14 11:18:53 e005 kernel: cr: 1 No: 73749 -- Uptodate:1 Dirty:0
> Lock:4 Req:8 Mapped:16 New:0 Async:0 Wait_IO:0 flushTime
> 360963
>
> I will also go and get some info from the "last" bh this one collides
> with ...
>
>
> > OK - so I understand that your device is being written to by the enbd
> > server? You should be able to confirm the writes in the server itself -
>
> That's right.
OK.
> > I would try and confirm your reading by instrumenting the write method.
> > You probably can take the instrumentation from your driver, no?
>
> I also thought about counting the writes inside the server. This would
> give a clue if enbd (server, client, driver) or the kernel between
> enbd-server and my driver is the source.
I'll now look for it.
> > Oh - well then it cannot be a collision issue in the driver :-).
> > And if the driver does not say that it has done a rollback, it has not
> > repeated a reuest attempt.
>
> UP doesn't help.
>
> BUT: it seems that you already "impemented" a switch for the doubled
> requests, it's called "merge_requests"! in one of the last tests I forgot
> to activate it and now it transferred several 100,000 writes with no
> twins. I'll investigate further ....
Hmm. Maybe an artefact of effective block size?
Peter
More information about the ENBD
mailing list