Message ID: 1525682645-30510-9-git-send-email-dongsheng.yang@easystack.cn (mailing list archive)
State: New, archived
On Mon, May 7, 2018 at 10:44 AM, Dongsheng Yang <dongsheng.yang@easystack.cn> wrote:
> Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
> ---
>  drivers/block/rbd.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 39ca0d7..a341929 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1453,6 +1453,7 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
>  	struct ceph_osd_request *osd_req = obj_request->osd_req;
>
>  	osd_req->r_flags = CEPH_OSD_FLAG_WRITE;
> +	osd_req->r_abort_on_full = true;
>  	ktime_get_real_ts(&osd_req->r_mtime);
>  	osd_req->r_data_offset = obj_request->ex.oe_off;
>  }

Blocking instead of returning -ENOSPC is intentional.  See

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d29adb34a94715174c88ca93e8aba955850c9bde
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a1ea2dbff11547a8e664f143c1ffefc586a577a

and associated tracker tickets.

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Thanx Ilya,
    IIUC, the problem was that the block layer would translate ENOSPC
to EIO, which could then cause filesystem corruption if the cluster
went full -> not full again soon.

But Christoph introduced a new block status code type in 2017, which
propagates the correct error number to the users of the block layer.

Please check this commit:

https://github.com/torvalds/linux/commit/2a842acab109f40f0d7d10b38e9ca88390628996

Thanx
Yang

On 05/07/2018 10:30 PM, Ilya Dryomov wrote:
> On Mon, May 7, 2018 at 10:44 AM, Dongsheng Yang
> <dongsheng.yang@easystack.cn> wrote:
>> Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
[...]
> Blocking instead of returning -ENOSPC is intentional.  See
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d29adb34a94715174c88ca93e8aba955850c9bde
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a1ea2dbff11547a8e664f143c1ffefc586a577a
>
> and associated tracker tickets.
To be more specific, the ENOSPC was introduced by
a9d6ceb838755c24dde8a0ca02c3378926fc63db.

Please check:

https://github.com/torvalds/linux/commit/a9d6ceb838755c24dde8a0ca02c3378926fc63db

Or, maybe we can keep the default behavior of blocking IO, but add an
rbd option like abort_on_full to return ENOSPC for the cases where we
really don't want any IO to block in our program.

Thanx

On 05/08/2018 10:58 AM, Dongsheng Yang wrote:
> Thanx Ilya,
>     IIUC, the problem was that the block layer would translate ENOSPC
> to EIO, which could then cause filesystem corruption if the cluster
> went full -> not full again soon.
[...]
> On 05/07/2018 10:30 PM, Ilya Dryomov wrote:
>> Blocking instead of returning -ENOSPC is intentional.  See
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d29adb34a94715174c88ca93e8aba955850c9bde
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a1ea2dbff11547a8e664f143c1ffefc586a577a
>>
>> and associated tracker tickets.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 8, 2018 at 4:58 AM, Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
> Thanx Ilya,
>     IIUC, the problem was that the block layer would translate ENOSPC
> to EIO, which could then cause filesystem corruption if the cluster
> went full -> not full again soon.
>
> But Christoph introduced a new block status code type in 2017, which
> propagates the correct error number to the users of the block layer.
>
> Please check this commit:
>
> https://github.com/torvalds/linux/commit/2a842acab109f40f0d7d10b38e9ca88390628996

IIRC the way it used to work is that an error code would always be
propagated, in the sense that bios would be completed with whatever
error code was set by the block driver (even something crazy like
-EBFONT).  Christoph's patch constrained that to a set of blk_status_t
codes.

The particular error code doesn't really matter.  Whether it's -EIO or
-ENOSPC, a filesystem nowadays will either remount itself read-only or
shut down.  xfs used to retry indefinitely in certain cases, but that
has changed and is now configurable via /sys/fs/xfs.

Thanks,

                Ilya
On Tue, May 8, 2018 at 8:13 AM, Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
> To be more specific, the ENOSPC was introduced by
> a9d6ceb838755c24dde8a0ca02c3378926fc63db.
>
> Please check:
>
> https://github.com/torvalds/linux/commit/a9d6ceb838755c24dde8a0ca02c3378926fc63db

This commit added an error string for -ENOSPC, so that "critical space
allocation error" is printed instead of "I/O error".  It didn't change
anything about generic error code propagation.

> Or, maybe we can keep the default behavior of blocking IO, but add an
> rbd option like abort_on_full to return ENOSPC for the cases where we
> really don't want any IO to block in our program.

What is your use case?  Are you using raw block devices with pool
quotas enabled?

Thanks,

                Ilya
On Fri, May 11, 2018 at 9:16 AM, Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
> On 05/10/2018 10:56 PM, Ilya Dryomov wrote:
>> What is your use case?  Are you using raw block devices with pool
>> quotas enabled?
>
> Both raw block and xfs.
>
> One of the problem cases is this:
> We provide a container service for customers.  Each project has
> a pool with a quota, and the customers in this project can create
> containers with a specified size of disk, but that's thin provisioned.
>
> When they reach the quota, the problem is that the processes in the
> containers which are doing the writing get blocked and go into
> D state.  Then we can't kill those processes or stop the containers.
>
> Customers have complained several times:
>
> "Just tell me what happened, rather than suspending everything and
> forbidding me to cancel."
>
> So we want to just return an error to the user and allow them to do
> a cleanup and recover their containers.

Right, so you are using pool quotas.  Support for pool quotas is
a relatively recent addition (since 4.7), so you are hitting all the
sharp edges.  This is exactly how it works in userspace -- no changes
were made.

Hanging processes in D state is definitely undesirable, but I'm
curious: what would the recovery look like?  There is nothing that can
be done on the customer side once the pool is marked full.  It will
remain full until an administrative action is taken.

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/11/2018 09:09 PM, Ilya Dryomov wrote:
> On Fri, May 11, 2018 at 9:16 AM, Dongsheng Yang
> <dongsheng.yang@easystack.cn> wrote:
[...]
> Hanging processes in D state is definitely undesirable, but I'm
> curious: what would the recovery look like?  There is nothing that can
> be done on the customer side once the pool is marked full.  It will
> remain full until an administrative action is taken.

We can remove some unused rbd images in this case:

https://github.com/ceph/ceph/pull/12627

In our deployment, there is a reset button on the web UI for each
container.  When users find there is no space left, they can remove
some unused volumes.  Then we can click the reset button of each
container, which resets the container status -- actually, it just
stops and then starts the container again.  After that the container
is running again.

Thanx
On Mon, May 14, 2018 at 7:43 AM, Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
> On 05/11/2018 09:09 PM, Ilya Dryomov wrote:
[...]
> We can remove some unused rbd images in this case:
>
> https://github.com/ceph/ceph/pull/12627
>
> In our deployment, there is a reset button on the web UI for each
> container.  When users find there is no space left, they can remove
> some unused volumes.  Then we can click the reset button of each
> container, which resets the container status -- actually, it just
> stops and then starts the container again.  After that the container
> is running again.

I see, removing an image is one example.  I'd be fine with adding an
option to return -ENOSPC, off by default.

Josh, any objections?  Your patch that made blocking optional was
dropped, and I'm trying to remember why.  Was it just because the OSDs
had to be changed to drop such writes on the floor, the problem being
that someone with an old osdmap wouldn't know whether to resend?

Thanks,

                Ilya
On 05/15/2018 06:34 AM, Ilya Dryomov wrote:
> On Mon, May 14, 2018 at 7:43 AM, Dongsheng Yang
> <dongsheng.yang@easystack.cn> wrote:
[...]
> I see, removing an image is one example.  I'd be fine with adding an
> option to return -ENOSPC, off by default.
>
> Josh, any objections?  Your patch that made blocking optional was
> dropped, and I'm trying to remember why.  Was it just because the OSDs
> had to be changed to drop such writes on the floor, the problem being
> that someone with an old osdmap wouldn't know whether to resend?

Yes, that was the reason at the time.  Since the kernel is passing
ENOSPC up the stack now, I see no reason not to add a krbd option to
do so.

Josh
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 39ca0d7..a341929 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1453,6 +1453,7 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
 	struct ceph_osd_request *osd_req = obj_request->osd_req;
 
 	osd_req->r_flags = CEPH_OSD_FLAG_WRITE;
+	osd_req->r_abort_on_full = true;
 	ktime_get_real_ts(&osd_req->r_mtime);
 	osd_req->r_data_offset = obj_request->ex.oe_off;
 }
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
---
 drivers/block/rbd.c | 1 +
 1 file changed, 1 insertion(+)