Message ID | 1353575275-1343-1-git-send-email-s.priebe@profihost.ag (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Signed-off-by: Stefan Priebe <s.priebe@profihost.ag> Am 22.11.2012 10:07, schrieb Stefan Priebe: > When acb->cmd is WRITE or DISCARD block/rbd stores rcb->size into acb->ret > > Look here: > if (acb->cmd == RBD_AIO_WRITE || > acb->cmd == RBD_AIO_DISCARD) { > if (r< 0) { > acb->ret = r; > acb->error = 1; > } else if (!acb->error) { > acb->ret = rcb->size; > } > > right now acb->ret is just an int and we might get an overflow if size is too big. > For discards rcb->size holds the size of the discard - this might be some TB if you > discard a whole device. > > The steps to reproduce are: > mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends a discard. Important is that you use scsi-hd and set discard_granularity=512. Otherwise rbd disabled discard support. > --- > block/rbd.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/block/rbd.c b/block/rbd.c > index 5a0f79f..0384c6c 100644 > --- a/block/rbd.c > +++ b/block/rbd.c > @@ -69,7 +69,7 @@ typedef enum { > typedef struct RBDAIOCB { > BlockDriverAIOCB common; > QEMUBH *bh; > - int ret; > + ssize_t ret; > QEMUIOVector *qiov; > char *bounce; > RBDAIOCmd cmd; > @@ -86,7 +86,7 @@ typedef struct RADOSCB { > int done; > int64_t size; > char *buf; > - int ret; > + ssize_t ret; > } RADOSCB; > > #define RBD_FD_READ 0 -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Nov 22, 2012 at 10:07 AM, Stefan Priebe <s.priebe@profihost.ag> wrote: > diff --git a/block/rbd.c b/block/rbd.c > index 5a0f79f..0384c6c 100644 > --- a/block/rbd.c > +++ b/block/rbd.c > @@ -69,7 +69,7 @@ typedef enum { > typedef struct RBDAIOCB { > BlockDriverAIOCB common; > QEMUBH *bh; > - int ret; > + ssize_t ret; > QEMUIOVector *qiov; > char *bounce; > RBDAIOCmd cmd; > @@ -86,7 +86,7 @@ typedef struct RADOSCB { > int done; > int64_t size; > char *buf; > - int ret; > + ssize_t ret; > } RADOSCB; > > #define RBD_FD_READ 0 I preferred your previous patch: ssize_t on 32-bit hosts has sizeof(ssize_t) == 4. In qemu_rbd_complete_aio() we may assign acb->ret = rcb->size. Here the size field is int64_t, so ssize_t ret would truncate the value. But BlockDriverCompetionFunc only takes an int argument so we're back to square one. The types are busted and changing the type of ret won't fix that :(. Stefan -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 23 November 2012 14:11, Stefan Hajnoczi <stefanha@gmail.com> wrote: > On Thu, Nov 22, 2012 at 10:07 AM, Stefan Priebe <s.priebe@profihost.ag> wrote: >> diff --git a/block/rbd.c b/block/rbd.c >> index 5a0f79f..0384c6c 100644 >> --- a/block/rbd.c >> +++ b/block/rbd.c >> @@ -69,7 +69,7 @@ typedef enum { >> typedef struct RBDAIOCB { >> BlockDriverAIOCB common; >> QEMUBH *bh; >> - int ret; >> + ssize_t ret; >> QEMUIOVector *qiov; >> char *bounce; >> RBDAIOCmd cmd; >> @@ -86,7 +86,7 @@ typedef struct RADOSCB { >> int done; >> int64_t size; >> char *buf; >> - int ret; >> + ssize_t ret; >> } RADOSCB; >> >> #define RBD_FD_READ 0 > > I preferred your previous patch: > > ssize_t on 32-bit hosts has sizeof(ssize_t) == 4. In > qemu_rbd_complete_aio() we may assign acb->ret = rcb->size. Here the > size field is int64_t, so ssize_t ret would truncate the value. The rcb size field should be a size_t: it is used for calling rbd_aio_write and rbd_aio_read so if we've overflowed 32 bits then we've already got a problem there. -- PMM -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, i'm not a ceph or inktank guy. I can't made any decision on what to change. At least right now you'll see failing I/O's in your guest, when you discard whole disks. I could fix this for me with int64 and with ssize_t. So if i should resend another patch i need a concrete advise how to patch. Thanks! Greets, Stefan Am 23.11.2012 15:15, schrieb Peter Maydell: > On 23 November 2012 14:11, Stefan Hajnoczi <stefanha@gmail.com> wrote: >> On Thu, Nov 22, 2012 at 10:07 AM, Stefan Priebe <s.priebe@profihost.ag> wrote: >>> diff --git a/block/rbd.c b/block/rbd.c >>> index 5a0f79f..0384c6c 100644 >>> --- a/block/rbd.c >>> +++ b/block/rbd.c >>> @@ -69,7 +69,7 @@ typedef enum { >>> typedef struct RBDAIOCB { >>> BlockDriverAIOCB common; >>> QEMUBH *bh; >>> - int ret; >>> + ssize_t ret; >>> QEMUIOVector *qiov; >>> char *bounce; >>> RBDAIOCmd cmd; >>> @@ -86,7 +86,7 @@ typedef struct RADOSCB { >>> int done; >>> int64_t size; >>> char *buf; >>> - int ret; >>> + ssize_t ret; >>> } RADOSCB; >>> >>> #define RBD_FD_READ 0 >> >> I preferred your previous patch: >> >> ssize_t on 32-bit hosts has sizeof(ssize_t) == 4. In >> qemu_rbd_complete_aio() we may assign acb->ret = rcb->size. Here the >> size field is int64_t, so ssize_t ret would truncate the value. > > The rcb size field should be a size_t: it is used for calling > rbd_aio_write and rbd_aio_read so if we've overflowed 32 bits > then we've already got a problem there. > > -- PMM > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Nov 23, 2012 at 3:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote: > On 23 November 2012 14:11, Stefan Hajnoczi <stefanha@gmail.com> wrote: >> On Thu, Nov 22, 2012 at 10:07 AM, Stefan Priebe <s.priebe@profihost.ag> wrote: >>> diff --git a/block/rbd.c b/block/rbd.c >>> index 5a0f79f..0384c6c 100644 >>> --- a/block/rbd.c >>> +++ b/block/rbd.c >>> @@ -69,7 +69,7 @@ typedef enum { >>> typedef struct RBDAIOCB { >>> BlockDriverAIOCB common; >>> QEMUBH *bh; >>> - int ret; >>> + ssize_t ret; >>> QEMUIOVector *qiov; >>> char *bounce; >>> RBDAIOCmd cmd; >>> @@ -86,7 +86,7 @@ typedef struct RADOSCB { >>> int done; >>> int64_t size; >>> char *buf; >>> - int ret; >>> + ssize_t ret; >>> } RADOSCB; >>> >>> #define RBD_FD_READ 0 >> >> I preferred your previous patch: >> >> ssize_t on 32-bit hosts has sizeof(ssize_t) == 4. In >> qemu_rbd_complete_aio() we may assign acb->ret = rcb->size. Here the >> size field is int64_t, so ssize_t ret would truncate the value. > > The rcb size field should be a size_t: it is used for calling > rbd_aio_write and rbd_aio_read so if we've overflowed 32 bits > then we've already got a problem there. You are right that size_t is fine for read/write. But size_t is not fine for discard: 1. librbd takes uint64_t for discard. 2. Completing a discard request uses size. 3. BlockDriverCompletionFunc only passes int ret value <--- broken for large discard Stefan Priebe has been discarding large regions (e.g. >4 GB must be possible on a 32-bit host). The previous int64_t patch didn't clean up types completely but it made things better. AFAICT we need to be able to discard >4 GB so ssize_t/size_t doesn't cut it on 32-bit hosts. Stefan -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/block/rbd.c b/block/rbd.c index 5a0f79f..0384c6c 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -69,7 +69,7 @@ typedef enum { typedef struct RBDAIOCB { BlockDriverAIOCB common; QEMUBH *bh; - int ret; + ssize_t ret; QEMUIOVector *qiov; char *bounce; RBDAIOCmd cmd; @@ -86,7 +86,7 @@ typedef struct RADOSCB { int done; int64_t size; char *buf; - int ret; + ssize_t ret; } RADOSCB; #define RBD_FD_READ 0