Message ID | 20110408084334.GA28360@stefanha-thinkpad.localdomain (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Apr 8, 2011 at 9:43 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > On Mon, Mar 28, 2011 at 04:15:57PM -0700, Josh Durgin wrote: >> librbd stacks on top of librados to provide access >> to rbd images. >> >> Using librbd simplifies the qemu code, and allows >> qemu to use new versions of the rbd format >> with few (if any) changes. >> >> Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> >> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> >> --- >> block/rbd.c | 785 +++++++++++++++-------------------------------------- >> block/rbd_types.h | 71 ----- >> configure | 33 +-- >> 3 files changed, 221 insertions(+), 668 deletions(-) >> delete mode 100644 block/rbd_types.h > > Hi Josh, > I have applied your patches onto qemu.git/master and am running > ceph.git/master. > > Unfortunately qemu-iotests fails for me. I forgot to mention that qemu-iotests lives at: git://git.kernel.org/pub/scm/linux/kernel/git/hch/qemu-iotests.git Stefan -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/08/2011 01:43 AM, Stefan Hajnoczi wrote: > On Mon, Mar 28, 2011 at 04:15:57PM -0700, Josh Durgin wrote: >> librbd stacks on top of librados to provide access >> to rbd images. >> >> Using librbd simplifies the qemu code, and allows >> qemu to use new versions of the rbd format >> with few (if any) changes. >> >> Signed-off-by: Josh Durgin<josh.durgin@dreamhost.com> >> Signed-off-by: Yehuda Sadeh<yehuda@hq.newdream.net> >> --- >> block/rbd.c | 785 +++++++++++++++-------------------------------------- >> block/rbd_types.h | 71 ----- >> configure | 33 +-- >> 3 files changed, 221 insertions(+), 668 deletions(-) >> delete mode 100644 block/rbd_types.h > > Hi Josh, > I have applied your patches onto qemu.git/master and am running > ceph.git/master. > > Unfortunately qemu-iotests fails for me. Thanks for testing and the detailed instructions! I'm looking into this now. Josh Durgin -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/08/2011 01:43 AM, Stefan Hajnoczi wrote: > On Mon, Mar 28, 2011 at 04:15:57PM -0700, Josh Durgin wrote: >> librbd stacks on top of librados to provide access >> to rbd images. >> >> Using librbd simplifies the qemu code, and allows >> qemu to use new versions of the rbd format >> with few (if any) changes. >> >> Signed-off-by: Josh Durgin<josh.durgin@dreamhost.com> >> Signed-off-by: Yehuda Sadeh<yehuda@hq.newdream.net> >> --- >> block/rbd.c | 785 +++++++++++++++-------------------------------------- >> block/rbd_types.h | 71 ----- >> configure | 33 +-- >> 3 files changed, 221 insertions(+), 668 deletions(-) >> delete mode 100644 block/rbd_types.h > > Hi Josh, > I have applied your patches onto qemu.git/master and am running > ceph.git/master. > > Unfortunately qemu-iotests fails for me. > > > Test 016 seems to hang in qemu-io -g -c write -P 66 128M 512 > rbd:rbd/t.raw. I can reproduce this consistently. Here is the > backtrace of the hung process (not consuming CPU, probably deadlocked): This hung because it wasn't checking the return value of rbd_aio_write. I've fixed this in the for-qemu branch of http://ceph.newdream.net/git/qemu-kvm.git. Also, the existing rbd implementation is not 'growable' - writing to a large offset will not expand the rbd image correctly. Should we implement bdrv_truncate to support this (librbd has a resize operation)? Is bdrv_truncate useful outside of qemu-img and qemu-io? > Test 008 failed with an assertion but succeeded when run again. I think > this is a race condition: This is likely a use-after-free, but I haven't been able to find the race condition yet (or reproduce it). Could you get a backtrace from the core file? Thanks, Josh -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 12, 2011 at 1:18 AM, Josh Durgin <josh.durgin@dreamhost.com> wrote: > On 04/08/2011 01:43 AM, Stefan Hajnoczi wrote: >> >> On Mon, Mar 28, 2011 at 04:15:57PM -0700, Josh Durgin wrote: >>> >>> librbd stacks on top of librados to provide access >>> to rbd images. >>> >>> Using librbd simplifies the qemu code, and allows >>> qemu to use new versions of the rbd format >>> with few (if any) changes. >>> >>> Signed-off-by: Josh Durgin<josh.durgin@dreamhost.com> >>> Signed-off-by: Yehuda Sadeh<yehuda@hq.newdream.net> >>> --- >>> block/rbd.c | 785 >>> +++++++++++++++-------------------------------------- >>> block/rbd_types.h | 71 ----- >>> configure | 33 +-- >>> 3 files changed, 221 insertions(+), 668 deletions(-) >>> delete mode 100644 block/rbd_types.h >> >> Hi Josh, >> I have applied your patches onto qemu.git/master and am running >> ceph.git/master. >> >> Unfortunately qemu-iotests fails for me. >> >> >> Test 016 seems to hang in qemu-io -g -c write -P 66 128M 512 >> rbd:rbd/t.raw. I can reproduce this consistently. Here is the >> backtrace of the hung process (not consuming CPU, probably deadlocked): > > This hung because it wasn't checking the return value of rbd_aio_write. > I've fixed this in the for-qemu branch of > http://ceph.newdream.net/git/qemu-kvm.git. Also, the existing rbd > implementation is not 'growable' - writing to a large offset will not expand > the rbd image correctly. Should we implement bdrv_truncate to support this > (librbd has a resize operation)? Is bdrv_truncate useful outside of qemu-img > and qemu-io? If librbd has a resize operation then it would be nice to wire up bdrv_truncate() for completeness. Note that bdrv_truncate() can also be called online using the block_resize monitor command. Since rbd devices are not growable we should fix qemu-iotests to skip 016 for rbd. >> Test 008 failed with an assertion but succeeded when run again. I think >> this is a race condition: > > This is likely a use-after-free, but I haven't been able to find the race > condition yet (or reproduce it). Could you get a backtrace from the core > file? Unfortunately I have no core file and wasn't able to reproduce it again. Is qemu-iotests passing for you now? Stefan -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 12 Apr 2011, Stefan Hajnoczi wrote: > On Tue, Apr 12, 2011 at 1:18 AM, Josh Durgin <josh.durgin@dreamhost.com> wrote: > > On 04/08/2011 01:43 AM, Stefan Hajnoczi wrote: > >> > >> On Mon, Mar 28, 2011 at 04:15:57PM -0700, Josh Durgin wrote: > >>> > >>> librbd stacks on top of librados to provide access > >>> to rbd images. > >>> > >>> Using librbd simplifies the qemu code, and allows > >>> qemu to use new versions of the rbd format > >>> with few (if any) changes. > >>> > >>> Signed-off-by: Josh Durgin<josh.durgin@dreamhost.com> > >>> Signed-off-by: Yehuda Sadeh<yehuda@hq.newdream.net> > >>> --- > >>> block/rbd.c | 785 > >>> +++++++++++++++-------------------------------------- > >>> block/rbd_types.h | 71 ----- > >>> configure | 33 +-- > >>> 3 files changed, 221 insertions(+), 668 deletions(-) > >>> delete mode 100644 block/rbd_types.h > >> > >> Hi Josh, > >> I have applied your patches onto qemu.git/master and am running > >> ceph.git/master. > >> > >> Unfortunately qemu-iotests fails for me. > >> > >> > >> Test 016 seems to hang in qemu-io -g -c write -P 66 128M 512 > >> rbd:rbd/t.raw. I can reproduce this consistently. Here is the > >> backtrace of the hung process (not consuming CPU, probably deadlocked): > > > > This hung because it wasn't checking the return value of rbd_aio_write. > > I've fixed this in the for-qemu branch of > > http://ceph.newdream.net/git/qemu-kvm.git. Also, the existing rbd > > implementation is not 'growable' - writing to a large offset will not expand > > the rbd image correctly. Should we implement bdrv_truncate to support this > > (librbd has a resize operation)? Is bdrv_truncate useful outside of qemu-img > > and qemu-io? > > If librbd has a resize operation then it would be nice to wire up > bdrv_truncate() for completeness. Note that bdrv_truncate() can also > be called online using the block_resize monitor command. > > Since rbd devices are not growable we should fix qemu-iotests to skip > 016 for rbd. There is a resize operation, but it's expected that you'll use it for any bdev size change (grow or shrink). Does qemu grow a device by writing to the (new) highest offset, or is there another operation that should be wired up? We want to avoid a situation where RBD isn't aware of the qemu bdev resize and has to grow a bit each time we write to a larger offset, as resize is a somewhat expensive operation... Thanks! sage
On 04/12/2011 01:54 AM, Stefan Hajnoczi wrote:
> Is qemu-iotests passing for you now?
Yes, they all pass when 016 is skipped.
Josh
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 12, 2011 at 4:38 PM, Sage Weil <sage@newdream.net> wrote: > On Tue, 12 Apr 2011, Stefan Hajnoczi wrote: >> On Tue, Apr 12, 2011 at 1:18 AM, Josh Durgin <josh.durgin@dreamhost.com> wrote: >> > On 04/08/2011 01:43 AM, Stefan Hajnoczi wrote: >> >> >> >> On Mon, Mar 28, 2011 at 04:15:57PM -0700, Josh Durgin wrote: >> >>> >> >>> librbd stacks on top of librados to provide access >> >>> to rbd images. >> >>> >> >>> Using librbd simplifies the qemu code, and allows >> >>> qemu to use new versions of the rbd format >> >>> with few (if any) changes. >> >>> >> >>> Signed-off-by: Josh Durgin<josh.durgin@dreamhost.com> >> >>> Signed-off-by: Yehuda Sadeh<yehuda@hq.newdream.net> >> >>> --- >> >>> block/rbd.c | 785 >> >>> +++++++++++++++-------------------------------------- >> >>> block/rbd_types.h | 71 ----- >> >>> configure | 33 +-- >> >>> 3 files changed, 221 insertions(+), 668 deletions(-) >> >>> delete mode 100644 block/rbd_types.h >> >> >> >> Hi Josh, >> >> I have applied your patches onto qemu.git/master and am running >> >> ceph.git/master. >> >> >> >> Unfortunately qemu-iotests fails for me. >> >> >> >> >> >> Test 016 seems to hang in qemu-io -g -c write -P 66 128M 512 >> >> rbd:rbd/t.raw. I can reproduce this consistently. Here is the >> >> backtrace of the hung process (not consuming CPU, probably deadlocked): >> > >> > This hung because it wasn't checking the return value of rbd_aio_write. >> > I've fixed this in the for-qemu branch of >> > http://ceph.newdream.net/git/qemu-kvm.git. Also, the existing rbd >> > implementation is not 'growable' - writing to a large offset will not expand >> > the rbd image correctly. Should we implement bdrv_truncate to support this >> > (librbd has a resize operation)? Is bdrv_truncate useful outside of qemu-img >> > and qemu-io? >> >> If librbd has a resize operation then it would be nice to wire up >> bdrv_truncate() for completeness. Note that bdrv_truncate() can also >> be called online using the block_resize monitor command. >> >> Since rbd devices are not growable we should fix qemu-iotests to skip >> 016 for rbd. > > There is a resize operation, but it's expected that you'll use it for any > bdev size change (grow or shrink). Does qemu grow a device by writing to > the (new) highest offset, or is there another operation that should be > wired up? We want to avoid a situation where RBD isn't aware of the qemu > bdev resize and has to grow a bit each time we write to a larger offset, > as resize is a somewhat expensive operation... Good it sounds like RBD and QEMU have similar concepts here. The bdrv_truncate() operation is a (rare) image resize operation. It is not the extend-beyond-EOF grow operation which QEMU simply performs as a write beyond bdrv_getlength() bytes. Stefan -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- 008.out 2010-12-07 16:18:18.762829295 +0000 +++ 008.out.bad 2011-04-08 08:18:31.562761417 +0100 @@ -2,8 +2,31 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 == reading whole image == -read 134217728/134217728 bytes at offset 0 -128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +common/Mutex.h: In function 'void Mutex::Lock(bool)', in thread '0x7f263e057720' +common/Mutex.h: 118: FAILED assert(r == 0) + ceph version 0.25-577-gd941422 (commit:d94142221153ec985c699ad69c3925136f3a30de) + 1: (librbd::aio_read(librbd::ImageCtx*, unsigned long, unsigned long, char*, librbd::AioCompletion*)+0x726) [0x7f263c248db6] + 2: /home/stefanha/qemu/qemu-io() [0x435e7d] + 3: /home/stefanha/qemu/qemu-io() [0x435f70] + 4: /home/stefanha/qemu/qemu-io() [0x411d4c] + ceph version 0.25-577-gd941422 (commit:d94142221153ec985c699ad69c3925136f3a30de) + 1: (librbd::aio_read(librbd::ImageCtx*, unsigned long, unsigned long, char*, librbd::AioCompletion*)+0x726) [0x7f263c248db6] + 2: /home/stefanha/qemu/qemu-io() [0x435e7d] + 3: /home/stefanha/qemu/qemu-io() [0x435f70] + 4: /home/stefanha/qemu/qemu-io() [0x411d4c] +terminate called after throwing an instance of 'ceph::FailedAssertion' +common/Mutex.h: In function 'void Mutex::Lock(bool)', in thread '0x7f263e057720' +common/Mutex.h: 118: FAILED assert(r == 0) + ceph version 0.25-577-gd941422 (commit:d94142221153ec985c699ad69c3925136f3a30de) + 1: (librbd::aio_read(librbd::ImageCtx*, unsigned long, unsigned long, char*, librbd::AioCompletion*)+0x726) [0x7f263c248db6] + 2: /home/stefanha/qemu/qemu-io() [0x435e7d] + 3: /home/stefanha/qemu/qemu-io() [0x435f70] + 4: /home/stefanha/qemu/qemu-io() [0x411d4c] + ceph version 0.25-577-gd941422 (commit:d94142221153ec985c699ad69c3925136f3a30de) + 1: (librbd::aio_read(librbd::ImageCtx*, unsigned long, unsigned long, char*, librbd::AioCompletion*)+0x726) [0x7f263c248db6] + 2: /home/stefanha/qemu/qemu-io() [0x435e7d] + 3: /home/stefanha/qemu/qemu-io() [0x435f70] + 4: /home/stefanha/qemu/qemu-io() [0x411d4c] Do you have a chance to look into this? Please let me know if you need more information. I run like this: $ cd qemu-iotests $ ln -s ~/ceph/src/ceph.conf . $ LD_LIBRARY_PATH=/home/stefanha/ceph/src/.libs PATH=~/qemu/x86_64-softmmu/:~/qemu:~/ceph/src:$PATH TEST_DIR=rbd ./check -rbd I've also temporarily hacked qemu-iotests/common.config to accept rbd pool names: diff --git a/common.config b/common.config index bdd0530..c4c2eb6 100644 --- a/common.config +++ b/common.config @@ -102,14 +102,14 @@ export QEMU_IO="$QEMU_IO_PROG $QEMU_IO_OPTIONS" [ -f /etc/qemu-iotest.config ] && . /etc/qemu-iotest.config -if [ ! -e "$TEST_DIR" ]; then +if [ -z "$TEST_DIR" ]; then TEST_DIR=`pwd`/scratch fi -if [ ! -d "$TEST_DIR" ]; then - echo "common.config: Error: \$TEST_DIR ($TEST_DIR) is not a directory" - exit 1 -fi +#if [ ! -d "$TEST_DIR" ]; then +# echo "common.config: Error: \$TEST_DIR ($TEST_DIR) is not a directory" +# exit 1 +#fi _readlink() {