From patchwork Fri Apr 12 02:19:05 2013
X-Patchwork-Submitter: Alex Elder
X-Patchwork-Id: 2433951
Message-ID: <51676F19.8080408@inktank.com>
Date: Thu, 11 Apr 2013 21:19:05 -0500
From: Alex Elder
To: ceph-devel@vger.kernel.org
Subject: [PATCH 11/11] rbd: implement layered reads
References: <51676E0F.2010504@inktank.com>
In-Reply-To: <51676E0F.2010504@inktank.com>

Implement layered read requests for format 2 rbd images.

If an rbd image is a clone of a snapshot, the snapshot will be the
clone's "parent" image.  When an object read request on a clone
comes back with ENOENT, it indicates that the clone is not yet
populated with that portion of the image's data, and the parent
image should be consulted to satisfy the read.

When this occurs, a new image request is created, directed to the
parent image.  The offset and length of the new image request are
the same as the image-relative offset and length of the object
request that produced ENOENT.  Data from the parent image therefore
satisfies the object read request for the original image request.

While this code works, it will not be active until we enable the
layering feature (by adding RBD_FEATURE_LAYERING to the value of
RBD_FEATURES_SUPPORTED).

Signed-off-by: Alex Elder
---
 drivers/block/rbd.c | 97 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 85 insertions(+), 12 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 5c129c5..13a381b 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -398,6 +398,8 @@ void rbd_warn(struct rbd_device *rbd_dev, const char *fmt, ...)
 # define rbd_assert(expr) ((void) 0)
 #endif /* !RBD_DEBUG */
 
+static void rbd_img_parent_read(struct rbd_obj_request *obj_request);
+
 static int rbd_dev_refresh(struct rbd_device *rbd_dev, u64 *hver);
 static int rbd_dev_v2_refresh(struct rbd_device *rbd_dev, u64 *hver);
 
@@ -1336,9 +1338,15 @@ static void rbd_osd_trivial_callback(struct rbd_obj_request *obj_request)
 
 static void rbd_osd_read_callback(struct rbd_obj_request *obj_request)
 {
-	dout("%s: obj %p result %d %llu/%llu\n", __func__, obj_request,
-		obj_request->result, obj_request->xferred, obj_request->length);
-	if (obj_request->img_request)
+	struct rbd_img_request *img_request = obj_request->img_request;
+	bool layered = img_request && img_request_layered_test(img_request);
+
+	dout("%s: obj %p img %p result %d %llu/%llu\n", __func__,
+		obj_request, img_request, obj_request->result,
+		obj_request->xferred, obj_request->length);
+	if (layered && obj_request->result == -ENOENT)
+		rbd_img_parent_read(obj_request);
+	else if (img_request)
 		rbd_img_obj_request_read_callback(obj_request);
 	else
 		obj_request_done_set(obj_request);
@@ -1349,9 +1357,8 @@ static void rbd_osd_write_callback(struct rbd_obj_request *obj_request)
 	dout("%s: obj %p result %d %llu\n", __func__, obj_request,
 		obj_request->result, obj_request->length);
 	/*
-	 * There is no such thing as a successful short write.
-	 * Our xferred value is the number of bytes transferred
-	 * back. Set it to our originally-requested length.
+	 * There is no such thing as a successful short write. Set
+	 * it to our originally-requested length.
 	 */
 	obj_request->xferred = obj_request->length;
 	obj_request_done_set(obj_request);
@@ -1391,7 +1398,7 @@ static void rbd_osd_req_callback(struct ceph_osd_request *osd_req,
 	 * passed to blk_end_request(), which takes an unsigned int.
 	 */
 	obj_request->xferred = osd_req->r_reply_op_len[0];
-	rbd_assert(obj_request->xferred < (u64) UINT_MAX);
+	rbd_assert(obj_request->xferred < (u64)UINT_MAX);
 	opcode = osd_req->r_ops[0].op;
 	switch (opcode) {
 	case CEPH_OSD_OP_READ:
@@ -1607,7 +1614,6 @@ static struct rbd_img_request *rbd_img_request_create(
 	INIT_LIST_HEAD(&img_request->obj_requests);
 	kref_init(&img_request->kref);
 
-	(void) img_request_layered_test(img_request);	/* Avoid a warning */
 	rbd_img_request_get(img_request);	/* Avoid a warning */
 	rbd_img_request_put(img_request);	/* TEMPORARY */
 
@@ -1635,6 +1641,9 @@ static void rbd_img_request_destroy(struct kref *kref)
 	if (img_request_write_test(img_request))
 		ceph_put_snap_context(img_request->snapc);
 
+	if (img_request_child_test(img_request))
+		rbd_obj_request_put(img_request->obj_request);
+
 	kfree(img_request);
 }
 
@@ -1643,13 +1652,11 @@ static bool rbd_img_obj_end_request(struct rbd_obj_request *obj_request)
 	struct rbd_img_request *img_request;
 	unsigned int xferred;
 	int result;
+	bool more;
 
 	rbd_assert(obj_request_img_data_test(obj_request));
 	img_request = obj_request->img_request;
 
-	rbd_assert(!img_request_child_test(img_request));
-	rbd_assert(img_request->rq != NULL);
-
 	rbd_assert(obj_request->xferred <= (u64)UINT_MAX);
 	xferred = (unsigned int)obj_request->xferred;
 	result = obj_request->result;
@@ -1666,7 +1673,15 @@ static bool rbd_img_obj_end_request(struct rbd_obj_request *obj_request)
 		img_request->result = result;
 	}
 
-	return blk_end_request(img_request->rq, result, xferred);
+	if (img_request_child_test(img_request)) {
+		rbd_assert(img_request->obj_request != NULL);
+		more = obj_request->which < img_request->obj_request_count - 1;
+	} else {
+		rbd_assert(img_request->rq != NULL);
+		more = blk_end_request(img_request->rq, result, xferred);
+	}
+
+	return more;
 }
 
 static void rbd_img_obj_callback(struct rbd_obj_request *obj_request)
@@ -1811,6 +1826,64 @@ static int rbd_img_request_submit(struct rbd_img_request *img_request)
 	return 0;
 }
 
+static void rbd_img_parent_read_callback(struct rbd_img_request *img_request)
+{
+	struct rbd_obj_request *obj_request;
+
+	rbd_assert(img_request_child_test(img_request));
+
+	obj_request = img_request->obj_request;
+	rbd_assert(obj_request != NULL);
+	obj_request->result = img_request->result;
+	obj_request->xferred = img_request->xferred;
+
+	rbd_img_obj_request_read_callback(obj_request);
+	rbd_obj_request_complete(obj_request);
+}
+
+static void rbd_img_parent_read(struct rbd_obj_request *obj_request)
+{
+	struct rbd_device *rbd_dev;
+	struct rbd_img_request *img_request;
+	int result;
+
+	rbd_assert(obj_request_img_data_test(obj_request));
+	rbd_assert(obj_request->img_request != NULL);
+	rbd_assert(obj_request->result == (s32) -ENOENT);
+	rbd_assert(obj_request->type == OBJ_REQUEST_BIO);
+
+	rbd_dev = obj_request->img_request->rbd_dev;
+	rbd_assert(rbd_dev->parent != NULL);
+	/* rbd_read_finish(obj_request, obj_request->length); */
+	img_request = rbd_img_request_create(rbd_dev->parent,
+						obj_request->img_offset,
+						obj_request->length,
+						false, true);
+	result = -ENOMEM;
+	if (!img_request)
+		goto out_err;
+
+	rbd_obj_request_get(obj_request);
+	img_request->obj_request = obj_request;
+
+	result = rbd_img_request_fill_bio(img_request, obj_request->bio_list);
+	if (result)
+		goto out_err;
+
+	img_request->callback = rbd_img_parent_read_callback;
+	result = rbd_img_request_submit(img_request);
+	if (result)
+		goto out_err;
+
+	return;
+out_err:
+	if (img_request)
+		rbd_img_request_put(img_request);
+	obj_request->result = result;
+	obj_request->xferred = 0;
+	obj_request_done_set(obj_request);
+}
+
 static int rbd_obj_notify_ack(struct rbd_device *rbd_dev,
 				   u64 ver, u64 notify_id)
 {
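
For readers who want to see the fallback in isolation, here is a
minimal userspace sketch of the read path described in the commit
message.  It is only a toy model, not the driver code: struct
toy_image, toy_obj_read(), toy_layered_read(), OBJ_SIZE and OBJ_COUNT
are all hypothetical names invented for this illustration, and the
reads are synchronous where the driver's are asynchronous.  The one
point it tries to capture is that a read returning -ENOENT is
reissued to the parent with the same image-relative offset and
length.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE  8u	/* toy object size; an assumption for this sketch */
#define OBJ_COUNT 4u	/* objects per toy image */

/* Hypothetical model of an image; not the kernel's struct rbd_device. */
struct toy_image {
	const struct toy_image *parent;		/* NULL for a base image */
	const char *objects[OBJ_COUNT];		/* NULL: not yet populated */
};

/* Read len bytes at image-relative offset img_off from a single object. */
static int toy_obj_read(const struct toy_image *img, size_t img_off,
			size_t len, char *buf)
{
	size_t which = img_off / OBJ_SIZE;	/* object index in the image */
	size_t obj_off = img_off % OBJ_SIZE;	/* offset within that object */

	/* Like an object request, the read must stay within one object. */
	assert(which < OBJ_COUNT && obj_off + len <= OBJ_SIZE);
	if (!img->objects[which])
		return -ENOENT;
	memcpy(buf, img->objects[which] + obj_off, len);
	return 0;
}

/*
 * Layered read: if the object is missing and the image has a parent,
 * retry on the parent with the same image-relative offset and length.
 */
static int toy_layered_read(const struct toy_image *img, size_t img_off,
			    size_t len, char *buf)
{
	int result = toy_obj_read(img, img_off, len, buf);

	if (result == -ENOENT && img->parent)
		result = toy_layered_read(img->parent, img_off, len, buf);
	return result;
}
```

With a clone whose object 0 is populated and a parent holding objects
0 and 1, a read at offset 2 is served by the clone and a read at
offset 10 is served by the parent.  A read of an object that exists
in neither image returns -ENOENT in this sketch.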