
Re: [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit

Message ID CAOi1vP_7La=nCs71wG3=HbYLaxM3uBFytv8pjGJLuR_uuB0GLg@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Ilya Dryomov March 23, 2018, 9:34 a.m. UTC
On Fri, Mar 23, 2018 at 10:31 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
> On Thu, Mar 22, 2018 at 12:57 PM, 亀井仁志 / KAMEI,HITOSHI
> <hitoshi.kamei.xm@hitachi.com> wrote:
>> Hi Yang,
>>
>>> I am not sure this is the best way to handle this case. What about adding an option, as in "rbd map -o thick rbd/test"?
>>
>> I will add such an option to the rbd map command to manipulate image settings, so that
>> end users do not have to change the settings directly via the sysfs file.
>>
>>>       @@ -4011,6 +4012,15 @@ static void rbd_queue_workfn(struct work_struct *work)
>>>                       goto err;
>>>               }
>>>
>>>       +       /* Ignore/skip discard requests for thick-provision image */
>>>
>>> Just ignore? or return -EOPNOTSUPP?
>>
>> Thanks, I think -EOPNOTSUPP is better because user programs cannot know
>> the result of the requested operation when the kernel rbd driver silently
>> ignores discard requests, which would probably mislead them.
>>
>>> In addition, we should not ignore the REQ_OP_WRITE_ZEROES.
>>
>> Relatedly, a REQ_OP_WRITE_ZEROES request should also return
>> -EOPNOTSUPP instead of being ignored. I think -EOPNOTSUPP is
>> better for this request too, because the kernel rbd driver can then
>> expect user programs to write the zero data themselves.
>
> REQ_OP_WRITE_ZEROES should continue to work; we just need to make
> sure it never issues truncates or deletes and instead writes zeroes
> explicitly.
>
> I think we should be explicit about the fact that discard is not
> supported instead of accepting the discard request and failing it in
> rbd_queue_workfn().  Attached patch is what I have in mind.

Now with the patch.

Thanks,

                Ilya
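
To make the intended semantics concrete, here is a minimal userspace sketch of the behaviour discussed above: with trimming disabled, discard is refused with -EOPNOTSUPP, while write zeroes still succeeds by writing zeroes explicitly and never releases space. This is a toy model only, not the driver code; struct toy_image, toy_discard() and toy_write_zeroes() are hypothetical names invented for illustration, and the "released" counter merely stands in for deprovisioning.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a mapped image; not a kernel structure. */
struct toy_image {
	unsigned char *data;	/* backing bytes */
	size_t size;		/* image size in bytes */
	size_t released;	/* bytes "deprovisioned" so far */
	bool trim;		/* false models mapping with "notrim" */
};

/* Return true if [off, off + len) lies inside the image. */
bool toy_range_ok(const struct toy_image *img, size_t off, size_t len)
{
	return off <= img->size && len <= img->size - off;
}

/* Discard: refused outright when trimming is off, so the caller sees it. */
int toy_discard(struct toy_image *img, size_t off, size_t len)
{
	assert(img != NULL);
	if (!img->trim)
		return -EOPNOTSUPP;
	if (!toy_range_ok(img, off, len))
		return -EINVAL;
	memset(img->data + off, 0, len);
	img->released += len;	/* space is handed back */
	return 0;
}

/* Write zeroes: always works; only the offloaded path may release space. */
int toy_write_zeroes(struct toy_image *img, size_t off, size_t len)
{
	assert(img != NULL);
	if (!toy_range_ok(img, off, len))
		return -EINVAL;
	memset(img->data + off, 0, len);	/* explicit zeroing */
	if (img->trim)
		img->released += len;
	return 0;
}
```

In this model a caller that gets -EOPNOTSUPP from toy_discard() knows that nothing was discarded, which is the point of failing the request rather than silently ignoring it.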

Comments

KAMEI Hitoshi March 26, 2018, 12:31 p.m. UTC | #1
Hi Ilya,

I think your patch fully serves our purpose, and I confirmed by testing
in my environment that a kernel with the patch works well.

I added the notrim option to rbd map command in accordance with
your kernel rbd driver patch, and I pushed the patch to GitHub
(https://github.com/hitoshikamei/ceph/tree/rbdmap-notrim).

So, could you please merge your patch into the kernel? Once your patch is merged,
my patch will work, so I will then send the pull request.

Regards,

-- 
Hitoshi Kamei
Ilya Dryomov March 26, 2018, 1:24 p.m. UTC | #2
On Mon, Mar 26, 2018 at 2:31 PM, 亀井仁志 / KAMEI,HITOSHI
<hitoshi.kamei.xm@hitachi.com> wrote:
> Hi Ilya,
>
> I think your patch fully serves our purpose, and I confirmed by testing
> in my environment that a kernel with the patch works well.
>
> I added the notrim option to rbd map command in accordance with
> your kernel rbd driver patch, and I pushed the patch to GitHub
> (https://github.com/hitoshikamei/ceph/tree/rbdmap-notrim).

Change the man page text to say

* notrim - Turn off discard and write zeroes offload support to avoid
  deprovisioning a fully provisioned image (since 4.17).  When enabled, discard
  requests will fail with -EOPNOTSUPP and write zeroes requests will fall back
  to manually zeroing.

and merge it into the first patch.

>
> So, could you please merge your patch into the kernel? Once your patch is merged,
> my patch will work, so I will then send the pull request.

https://github.com/ceph/ceph-client/commit/7ddd22f2dd1d82da9a4cc0c54dc10760a53964f0

You can open the PR right now, we just won't merge it until the
kernel patch is in.

Thanks,

                Ilya

Patch

From 1db405366371cb12e5182815c06f6f491af4b63c Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov@gmail.com>
Date: Fri, 23 Mar 2018 09:14:47 +0100
Subject: [PATCH] rbd: notrim map option

Add an option to turn off discard and write zeroes offload support to
avoid deprovisioning a fully provisioned image.  When enabled, discard
requests will fail with -EOPNOTSUPP, write zeroes requests will fall
back to manually zeroing.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 drivers/block/rbd.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 1a90e5dd7470..dbe440a6531e 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -733,6 +733,7 @@  enum {
 	Opt_read_write,
 	Opt_lock_on_read,
 	Opt_exclusive,
+	Opt_notrim,
 	Opt_err
 };
 
@@ -746,6 +747,7 @@  static match_table_t rbd_opts_tokens = {
 	{Opt_read_write, "rw"},		/* Alternate spelling */
 	{Opt_lock_on_read, "lock_on_read"},
 	{Opt_exclusive, "exclusive"},
+	{Opt_notrim, "notrim"},
 	{Opt_err, NULL}
 };
 
@@ -754,12 +756,14 @@  struct rbd_options {
 	bool	read_only;
 	bool	lock_on_read;
 	bool	exclusive;
+	bool	trim;
 };
 
 #define RBD_QUEUE_DEPTH_DEFAULT	BLKDEV_MAX_RQ
 #define RBD_READ_ONLY_DEFAULT	false
 #define RBD_LOCK_ON_READ_DEFAULT false
 #define RBD_EXCLUSIVE_DEFAULT	false
+#define RBD_TRIM_DEFAULT	true
 
 static int parse_rbd_opts_token(char *c, void *private)
 {
@@ -801,6 +805,9 @@  static int parse_rbd_opts_token(char *c, void *private)
 	case Opt_exclusive:
 		rbd_opts->exclusive = true;
 		break;
+	case Opt_notrim:
+		rbd_opts->trim = false;
+		break;
 	default:
 		/* libceph prints "bad option" msg */
 		return -EINVAL;
@@ -3940,11 +3947,12 @@  static int rbd_init_disk(struct rbd_device *rbd_dev)
 	blk_queue_io_min(q, objset_bytes);
 	blk_queue_io_opt(q, objset_bytes);
 
-	/* enable the discard support */
-	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
-	q->limits.discard_granularity = objset_bytes;
-	blk_queue_max_discard_sectors(q, objset_bytes >> SECTOR_SHIFT);
-	blk_queue_max_write_zeroes_sectors(q, objset_bytes >> SECTOR_SHIFT);
+	if (rbd_dev->opts->trim) {
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+		q->limits.discard_granularity = objset_bytes;
+		blk_queue_max_discard_sectors(q, objset_bytes >> SECTOR_SHIFT);
+		blk_queue_max_write_zeroes_sectors(q, objset_bytes >> SECTOR_SHIFT);
+	}
 
 	if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
 		q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
@@ -5170,6 +5178,7 @@  static int rbd_add_parse_args(const char *buf,
 	rbd_opts->queue_depth = RBD_QUEUE_DEPTH_DEFAULT;
 	rbd_opts->lock_on_read = RBD_LOCK_ON_READ_DEFAULT;
 	rbd_opts->exclusive = RBD_EXCLUSIVE_DEFAULT;
+	rbd_opts->trim = RBD_TRIM_DEFAULT;
 
 	copts = ceph_parse_options(options, mon_addrs,
 					mon_addrs + mon_addrs_size - 1,
-- 
2.4.3
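
The option handling in the patch follows a simple pattern: trim defaults to true and the "notrim" token flips it to false. The standalone sketch below mirrors that pattern in userspace so it can be compiled and tried outside the kernel. It is an illustration under assumptions, not the driver's parser: struct toy_rbd_options, toy_tok_is() and toy_parse_opts() are hypothetical names, and the hand-rolled comma splitting stands in for the kernel's match_table_t machinery.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced version of the options struct in the patch. */
struct toy_rbd_options {
	bool read_only;
	bool trim;	/* true by default, cleared by "notrim" */
};

/* Return true if the n-byte token at s equals the string name. */
bool toy_tok_is(const char *s, size_t n, const char *name)
{
	return strlen(name) == n && strncmp(s, name, n) == 0;
}

/*
 * Parse a comma-separated option string such as "ro,notrim".
 * Returns 0 on success or -EINVAL on an unknown token.
 */
int toy_parse_opts(const char *s, struct toy_rbd_options *opts)
{
	assert(s != NULL && opts != NULL);

	/* Defaults first, as rbd_add_parse_args() does in the patch. */
	opts->read_only = false;
	opts->trim = true;

	while (*s != '\0') {
		size_t n = strcspn(s, ",");	/* length of current token */

		if (toy_tok_is(s, n, "notrim"))
			opts->trim = false;
		else if (toy_tok_is(s, n, "ro"))
			opts->read_only = true;
		else if (toy_tok_is(s, n, "rw"))
			opts->read_only = false;
		else if (n != 0)
			return -EINVAL;	/* unknown option */

		s += n;
		if (*s == ',')
			s++;	/* skip the separator */
	}
	return 0;
}
```

Storing the positive sense (trim) and keeping the negative spelling only in the user-facing token lets the consumer test the flag directly, as rbd_init_disk() does with rbd_dev->opts->trim.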