Message ID | 1353415495-30561-1-git-send-email-s.priebe@profihost.ag (mailing list archive) |
---|---|
State | New, archived |

On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote:
> rbd / rados tends to return pretty often length of writes
> or discarded blocks. These values might be bigger than int.
>
> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
> ---
>  block/rbd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Looks good, but I want to check whether this fixes a bug you've hit.
Please indicate details of the bug and how to reproduce it in the commit
message.

Stefan

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Hi Stefan,

On 20.11.2012 17:29, Stefan Hajnoczi wrote:
> On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote:
>> rbd / rados tends to return pretty often length of writes
>> or discarded blocks. These values might be bigger than int.
>>
>> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
>> ---
>>  block/rbd.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Looks good, but I want to check whether this fixes a bug you've hit.
> Please indicate details of the bug and how to reproduce it in the commit
> message.

You get various I/O errors in the client, as negative return values
indicate I/O errors. When librbd now returns a big positive value,
block/rbd tries to store it in acb->ret, which is an int. The value then
wraps around and becomes negative. After that, block/rbd thinks this is
an I/O error and reports it to the guest.

Stefan
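
A minimal C sketch of the truncation described above, assuming GCC or Clang on a two's-complement host: converting an out-of-range int64_t to int is implementation-defined in C, and both compilers document that they reduce the value modulo 2^32. The helpers `store_in_int()` and `looks_like_io_error()` are hypothetical stand-ins, not QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for `acb->ret = rcb->size;` while ret is an int.
 * Assumption: GCC/Clang semantics, i.e. only the low 32 bits survive. */
static int store_in_int(int64_t size)
{
    int ret = (int)size;
    return ret;
}

/* Hypothetical stand-in for the driver's convention:
 * any negative ret is treated as an I/O error. */
static bool looks_like_io_error(int64_t ret)
{
    return ret < 0;
}
```

With a 3 GiB completion size (0xC0000000), `store_in_int()` returns -1073741824, so a `ret < 0` check flags a successful request as failed. The result depends on bit 31 of the size, though: exactly 500 GiB (0x7D00000000) truncates to 0, which such a check would not flag at all, so the corruption is not always a visible I/O error.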

On Tue, Nov 20, 2012 at 8:16 PM, Stefan Priebe <s.priebe@profihost.ag> wrote:
> Hi Stefan,
>
> On 20.11.2012 17:29, Stefan Hajnoczi wrote:
>
>> On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote:
>>>
>>> rbd / rados tends to return pretty often length of writes
>>> or discarded blocks. These values might be bigger than int.
>>>
>>> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
>>> ---
>>>  block/rbd.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>>
>> Looks good, but I want to check whether this fixes a bug you've hit.
>> Please indicate details of the bug and how to reproduce it in the commit
>> message.
>
>
> You get various I/O errors in the client, as negative return values
> indicate I/O errors. When librbd now returns a big positive value,
> block/rbd tries to store it in acb->ret, which is an int. The value then
> wraps around and becomes negative. After that, block/rbd thinks this is
> an I/O error and reports it to the guest.

It's still not clear whether this is a bug that you can reproduce.
After all, the ret value would have to be >2^31, which is a 2+ GB
request!

I'm asking whether this is a critical bug fix that needs to go into
QEMU 1.3-rc1 because of a real-world issue.

Stefan
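
Worked out as plain arithmetic, and assuming a 32-bit `int`, the threshold mentioned above compares with a whole-device discard as follows:

$$
\begin{aligned}
\texttt{INT\_MAX} &= 2^{31} - 1 = 2\,147\,483\,647\ \text{bytes} \approx 2\ \text{GiB},\\
500\ \text{GB} &= 5 \times 10^{11}\ \text{bytes} \approx 233 \times \texttt{INT\_MAX}.
\end{aligned}
$$

A single request covering a device of that size is therefore far beyond what an `int` result can represent.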

On 21.11.2012 07:41, Stefan Hajnoczi wrote:
> On Tue, Nov 20, 2012 at 8:16 PM, Stefan Priebe <s.priebe@profihost.ag> wrote:
>> Hi Stefan,
>>
>> On 20.11.2012 17:29, Stefan Hajnoczi wrote:
>>
>>> On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote:
>>>>
>>>> rbd / rados tends to return pretty often length of writes
>>>> or discarded blocks. These values might be bigger than int.
>>>>
>>>> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
>>>> ---
>>>>  block/rbd.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>>
>>> Looks good, but I want to check whether this fixes a bug you've hit.
>>> Please indicate details of the bug and how to reproduce it in the commit
>>> message.
>>
>>
>> You get various I/O errors in the client, as negative return values
>> indicate I/O errors. When librbd now returns a big positive value,
>> block/rbd tries to store it in acb->ret, which is an int. The value then
>> wraps around and becomes negative. After that, block/rbd thinks this is
>> an I/O error and reports it to the guest.
>
> It's still not clear whether this is a bug that you can reproduce.
> After all, the ret value would have to be >2^31, which is a 2+ GB
> request!

Yes, and that is exactly the case.

Look here:

    if (acb->cmd == RBD_AIO_WRITE ||
        acb->cmd == RBD_AIO_DISCARD) {
        if (r < 0) {
            acb->ret = r;
            acb->error = 1;
        } else if (!acb->error) {
            acb->ret = rcb->size;
        }
    }

It sets acb->ret to rcb->size. But the size of a DISCARD covering a
whole device might be 500 GB, or nowadays even several TB.

Greets,
Stefan
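
The write/discard completion branch quoted above can be modelled in isolation. This is a minimal, hypothetical sketch: `SketchCmd`, `SketchAIOCB` and `complete_write_or_discard()` are invented names that keep only the fields the branch touches, with `ret` widened to int64_t as the patch does. It is not the actual QEMU callback.

```c
#include <stdint.h>

/* Hypothetical mirror of the RBD_AIO_* commands. */
typedef enum { SKETCH_AIO_READ, SKETCH_AIO_WRITE, SKETCH_AIO_DISCARD } SketchCmd;

/* Hypothetical, trimmed-down RBDAIOCB with the patched field type. */
typedef struct {
    SketchCmd cmd;
    int64_t ret;   /* was `int ret` before the patch */
    int error;
} SketchAIOCB;

/* Same logic as the quoted branch: a negative r is an error code,
 * otherwise the request size becomes the result (unless an earlier
 * error was already recorded). */
static void complete_write_or_discard(SketchAIOCB *acb, int64_t r, int64_t size)
{
    if (acb->cmd != SKETCH_AIO_WRITE && acb->cmd != SKETCH_AIO_DISCARD) {
        return;   /* reads take a different path in the driver */
    }
    if (r < 0) {
        acb->ret = r;
        acb->error = 1;
    } else if (!acb->error) {
        acb->ret = size;
    }
}
```

With the 64-bit field, a 3 GiB discard result is stored intact and stays positive, while a negative `r` still marks the request as failed.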

On Wed, Nov 21, 2012 at 08:47:16AM +0100, Stefan Priebe - Profihost AG wrote:
> On 21.11.2012 07:41, Stefan Hajnoczi wrote:
> >On Tue, Nov 20, 2012 at 8:16 PM, Stefan Priebe <s.priebe@profihost.ag> wrote:
> >>Hi Stefan,
> >>
> >>On 20.11.2012 17:29, Stefan Hajnoczi wrote:
> >>
> >>>On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote:
> >>>>
> >>>>rbd / rados tends to return pretty often length of writes
> >>>>or discarded blocks. These values might be bigger than int.
> >>>>
> >>>>Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
> >>>>---
> >>>> block/rbd.c | 4 ++--
> >>>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>>
> >>>Looks good, but I want to check whether this fixes a bug you've hit.
> >>>Please indicate details of the bug and how to reproduce it in the commit
> >>>message.
> >>
> >>
> >>You get various I/O errors in the client, as negative return values
> >>indicate I/O errors. When librbd now returns a big positive value,
> >>block/rbd tries to store it in acb->ret, which is an int. The value then
> >>wraps around and becomes negative. After that, block/rbd thinks this is
> >>an I/O error and reports it to the guest.
> >
> >It's still not clear whether this is a bug that you can reproduce.
> >After all, the ret value would have to be >2^31, which is a 2+ GB
> >request!
> Yes, and that is exactly the case.
>
> Look here:
>
>     if (acb->cmd == RBD_AIO_WRITE ||
>         acb->cmd == RBD_AIO_DISCARD) {
>         if (r < 0) {
>             acb->ret = r;
>             acb->error = 1;
>         } else if (!acb->error) {
>             acb->ret = rcb->size;
>         }
>     }
>
> It sets acb->ret to rcb->size. But the size of a DISCARD covering a
> whole device might be 500 GB, or nowadays even several TB.

We're going in circles here. I know the types are wrong in the code and
your patch fixes them; that's why I said it looks good in my first reply.

QEMU is currently in hard freeze and only critical patches should go in.
Providing steps to reproduce the bug helps me decide whether this patch
should still be merged for QEMU 1.3-rc1.

Anyway, the patch is straightforward. I have applied it to my block tree
and it will be in QEMU 1.3-rc1:
https://github.com/stefanha/qemu/commits/block

Stefan

On 21.11.2012 09:26, Stefan Hajnoczi wrote:
> On Wed, Nov 21, 2012 at 08:47:16AM +0100, Stefan Priebe - Profihost AG wrote:
>> On 21.11.2012 07:41, Stefan Hajnoczi wrote:
> We're going in circles here. I know the types are wrong in the code and
> your patch fixes them; that's why I said it looks good in my first reply.

Sorry, I'm not so familiar with processes like these.

> QEMU is currently in hard freeze and only critical patches should go in.
> Providing steps to reproduce the bug helps me decide whether this patch
> should still be merged for QEMU 1.3-rc1.
>
> Anyway, the patch is straightforward. I have applied it to my block tree
> and it will be in QEMU 1.3-rc1:
> https://github.com/stefanha/qemu/commits/block

Thanks!

The steps to reproduce are: run mkfs.xfs -f on a whole device whose size
in bytes does not fit in an int. mkfs.xfs sends a discard. It is
important that you use scsi-hd and set discard_granularity=512;
otherwise discard support is disabled for rbd.

Could you have a look at my other rbd fix too? It fixes a race between
task cancellation and writes. The same race was fixed in iscsi this
summer.

Greets,
Stefan
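
To trigger the same guest-side request without mkfs.xfs, a single discard spanning the whole device can be approximated with the Linux `BLKGETSIZE64` and `BLKDISCARD` ioctls. This is a hedged sketch under assumptions: it only roughly mirrors what mkfs.xfs does by default, the guest kernel may split the request before it reaches QEMU, and `discard_whole_device()` and `discard_overflows_int()` are hypothetical helpers. It is destructive: it discards all data on the target device.

```c
#include <fcntl.h>
#include <limits.h>
#include <linux/fs.h>     /* BLKGETSIZE64, BLKDISCARD */
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* True if a single request of `bytes` cannot be stored in an int. */
static bool discard_overflows_int(uint64_t bytes)
{
    return bytes > (uint64_t)INT_MAX;
}

/* DESTRUCTIVE: discards the entire block device at `path`.
 * On success, stores the device size in *out_bytes (if non-NULL)
 * and returns 0; returns -1 on any failure. */
static int discard_whole_device(const char *path, uint64_t *out_bytes)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        return -1;
    }

    uint64_t size = 0;
    if (ioctl(fd, BLKGETSIZE64, &size) < 0) {
        close(fd);
        return -1;
    }

    uint64_t range[2] = { 0, size };   /* start offset, length in bytes */
    int rc = ioctl(fd, BLKDISCARD, range);
    close(fd);
    if (rc < 0) {
        return -1;
    }
    if (out_bytes != NULL) {
        *out_bytes = size;
    }
    return 0;
}
```

Run inside the guest against the scsi-hd disk (larger than 2 GiB, with discard_granularity=512 as described), this should exercise the same path as mkfs.xfs; `discard_overflows_int()` tells you in advance whether the device size alone exceeds `INT_MAX`.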

On Wed, Nov 21, 2012 at 09:33:08AM +0100, Stefan Priebe - Profihost AG wrote:
> On 21.11.2012 09:26, Stefan Hajnoczi wrote:
> >On Wed, Nov 21, 2012 at 08:47:16AM +0100, Stefan Priebe - Profihost AG wrote:
> >>On 21.11.2012 07:41, Stefan Hajnoczi wrote:
> >QEMU is currently in hard freeze and only critical patches should go in.
> >Providing steps to reproduce the bug helps me decide whether this patch
> >should still be merged for QEMU 1.3-rc1.
> >
> >Anyway, the patch is straightforward. I have applied it to my block tree
> >and it will be in QEMU 1.3-rc1:
> >https://github.com/stefanha/qemu/commits/block
>
> Thanks!
>
> The steps to reproduce are: run mkfs.xfs -f on a whole device whose size
> in bytes does not fit in an int. mkfs.xfs sends a discard. It is
> important that you use scsi-hd and set discard_granularity=512;
> otherwise discard support is disabled for rbd.

Excellent, thanks! I will add it to the commit description.

> Could you have a look at my other rbd fix too? It fixes a race between
> task cancellation and writes. The same race was fixed in iscsi this
> summer.

Yes.

Stefan

On 20.11.2012 13:44, Stefan Priebe wrote:
> rbd / rados tends to return pretty often length of writes
> or discarded blocks. These values might be bigger than int.
>
> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
> ---
>  block/rbd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index f57d0c6..6bf9c2e 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -69,7 +69,7 @@ typedef enum {
>  typedef struct RBDAIOCB {
>      BlockDriverAIOCB common;
>      QEMUBH *bh;
> -    int ret;
> +    int64_t ret;
>      QEMUIOVector *qiov;
>      char *bounce;
>      RBDAIOCmd cmd;
> @@ -87,7 +87,7 @@ typedef struct RADOSCB {
>      int done;
>      int64_t size;
>      char *buf;
> -    int ret;
> +    int64_t ret;
>  } RADOSCB;
>
>  #define RBD_FD_READ 0

Why do you use int64_t instead of off_t?
If the value is related to file sizes, off_t would be a good choice.

Stefan W.

Not sure about off_t. What are its minimum and maximum values?

Stefan

On 21.11.2012 at 18:03, Stefan Weil <sw@weilnetz.de> wrote:

> On 20.11.2012 13:44, Stefan Priebe wrote:
>> rbd / rados tends to return pretty often length of writes
>> or discarded blocks. These values might be bigger than int.
>>
>> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
>> ---
>>  block/rbd.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/rbd.c b/block/rbd.c
>> index f57d0c6..6bf9c2e 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -69,7 +69,7 @@ typedef enum {
>>  typedef struct RBDAIOCB {
>>      BlockDriverAIOCB common;
>>      QEMUBH *bh;
>> -    int ret;
>> +    int64_t ret;
>>      QEMUIOVector *qiov;
>>      char *bounce;
>>      RBDAIOCmd cmd;
>> @@ -87,7 +87,7 @@ typedef struct RADOSCB {
>>      int done;
>>      int64_t size;
>>      char *buf;
>> -    int ret;
>> +    int64_t ret;
>>  } RADOSCB;
>>
>>  #define RBD_FD_READ 0
>
>
> Why do you use int64_t instead of off_t?
> If the value is related to file sizes, off_t would be a good choice.
>
> Stefan W.
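
On the off_t range question: POSIX only requires off_t (and ssize_t) to be signed integer types, so the actual range depends on the platform and, on 32-bit glibc systems, on `_FILE_OFFSET_BITS`. A minimal sketch that reports the ranges on the build host, assuming two's-complement types without padding bits (as on mainstream Linux targets); the `SIGNED_MAX_OF` macro and the print helpers are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>    /* off_t, ssize_t */

/* Maximum of a signed integer type up to 8 bytes wide, assuming two's
 * complement and no padding bits. The minimum is then -max - 1. */
#define SIGNED_MAX_OF(type) \
    ((int64_t)((UINT64_C(1) << (sizeof(type) * 8 - 1)) - 1))

static void print_signed_range(const char *name, size_t bytes, int64_t max)
{
    printf("%-8s %zu bytes, min %" PRId64 ", max %" PRId64 "\n",
           name, bytes, -max - 1, max);
}

static void print_ranges(void)
{
    print_signed_range("int", sizeof(int), SIGNED_MAX_OF(int));
    print_signed_range("off_t", sizeof(off_t), SIGNED_MAX_OF(off_t));
    print_signed_range("ssize_t", sizeof(ssize_t), SIGNED_MAX_OF(ssize_t));
    print_signed_range("int64_t", sizeof(int64_t), SIGNED_MAX_OF(int64_t));
}
```

On a 64-bit Linux host, off_t, ssize_t and int64_t all report 8 bytes; on a 32-bit host built without `_FILE_OFFSET_BITS=64`, off_t is typically only 4 bytes wide.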

On 21 November 2012 17:03, Stefan Weil <sw@weilnetz.de> wrote:
> Why do you use int64_t instead of off_t?
> If the value is related to file sizes, off_t would be a good choice.

Looking at the librbd API (which is where the size and ret
values come from), it uses size_t and ssize_t for these.
So I think ssize_t is probably the right type for ret
(and size) in our structs here.

-- PMM

On 21.11.2012 23:32, Peter Maydell wrote:
> On 21 November 2012 17:03, Stefan Weil <sw@weilnetz.de> wrote:
>> Why do you use int64_t instead of off_t?
>> If the value is related to file sizes, off_t would be a good choice.
>
> Looking at the librbd API (which is where the size and ret
> values come from), it uses size_t and ssize_t for these.
> So I think ssize_t is probably the right type for ret
> (and size) in our structs here.

This sounds reasonable, but does ssize_t support negative values? We
need them for error values.

Greets,
Stefan

On 22 November 2012 08:23, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> On 21.11.2012 23:32, Peter Maydell wrote:
>> Looking at the librbd API (which is where the size and ret
>> values come from), it uses size_t and ssize_t for these.
>> So I think ssize_t is probably the right type for ret
>> (and size) in our structs here.
>
>
> This sounds reasonable, but does ssize_t support negative values? We
> need them for error values.

Yes, the first 's' in ssize_t means 'signed', and that is the
difference between it and size_t.

-- PMM
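
The convention discussed above (a negative errno on failure, a byte count on success, both held in ssize_t) can be checked with a small sketch. `rbd_style_result()` and `is_error()` are hypothetical helpers, not librbd functions. One caveat, assuming a common ILP32 or LP64 ABI: ssize_t has the same width as a pointer, so it only holds byte counts above 2 GiB on 64-bit hosts; on a 32-bit host it is as narrow as int.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>    /* ssize_t */

/* Hypothetical helper following the librbd-style convention:
 * -EIO on failure, otherwise the number of bytes processed. */
static ssize_t rbd_style_result(bool failed, size_t bytes)
{
    return failed ? -EIO : (ssize_t)bytes;
}

/* Negative values signal errors, exactly as with the old int field. */
static bool is_error(ssize_t ret)
{
    return ret < 0;
}
```

On a 64-bit host, a 3 GiB result round-trips through ssize_t unchanged and is not mistaken for an error, while -EIO is still recognised as one.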

Hello,

I sent a new patch using ssize_t
(Subject: [PATCH] overflow of int ret: use ssize_t for ret).

Stefan

On 22.11.2012 09:40, Peter Maydell wrote:
> On 22 November 2012 08:23, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>> On 21.11.2012 23:32, Peter Maydell wrote:
>>> Looking at the librbd API (which is where the size and ret
>>> values come from), it uses size_t and ssize_t for these.
>>> So I think ssize_t is probably the right type for ret
>>> (and size) in our structs here.
>>
>>
>> This sounds reasonable, but does ssize_t support negative values? We
>> need them for error values.
>
> Yes, the first 's' in ssize_t means 'signed', and that is the
> difference between it and size_t.
>
> -- PMM

rbd / rados tends to return pretty often length of writes
or discarded blocks. These values might be bigger than int.

Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
---
 block/rbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f57d0c6..6bf9c2e 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -69,7 +69,7 @@ typedef enum {
 typedef struct RBDAIOCB {
     BlockDriverAIOCB common;
     QEMUBH *bh;
-    int ret;
+    int64_t ret;
     QEMUIOVector *qiov;
     char *bounce;
     RBDAIOCmd cmd;
@@ -87,7 +87,7 @@ typedef struct RADOSCB {
     int done;
     int64_t size;
     char *buf;
-    int ret;
+    int64_t ret;
 } RADOSCB;
 
 #define RBD_FD_READ 0