From patchwork Tue Feb  7 12:28:25 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 9559753
From: Jeff Layton
To: ceph-devel@vger.kernel.org
Cc: zyan@redhat.com, sage@redhat.com, idryomov@gmail.com, jspray@redhat.com
Subject: [PATCH v3 2/5] libceph: add ceph_osdc_abort_on_full
Date: Tue, 7 Feb 2017 07:28:25 -0500
Message-Id: <20170207122828.5550-3-jlayton@redhat.com>
In-Reply-To: <20170207122828.5550-1-jlayton@redhat.com>
References: <20170207122828.5550-1-jlayton@redhat.com>
Sender: ceph-devel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: ceph-devel@vger.kernel.org

From: John Spray

When a Ceph volume hits capacity, a flag is set in the OSD map to
indicate that, and a new map is sprayed around the cluster. When the
cephfs client sees that, we want it to shut down any OSD writes that
are in progress with an -ENOSPC error, as they'll just hang otherwise.

Add a routine that checks whether there is an out-of-space condition in
the cluster. If there is, it walks each OSD's request tree and aborts
any request that has r_abort_on_full set, completing it with -ENOSPC.

Also, add a callback to the osdc that gets called on map updates and a
way for upper layers to register that callback.

[ jlayton: code style cleanup and adaptation to new osd msg handling ]

Signed-off-by: John Spray
Signed-off-by: Jeff Layton
---
 include/linux/ceph/osd_client.h |  4 ++++
 net/ceph/osd_client.c           | 52 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 5da666cc5891..1aaf4851f180 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -21,6 +21,7 @@ struct ceph_osd_client;
 /*
  * completion callback for async writepages
  */
+typedef void (*ceph_osdc_map_callback_t)(struct ceph_osd_client *);
 typedef void (*ceph_osdc_callback_t)(struct ceph_osd_request *);
 typedef void (*ceph_osdc_unsafe_callback_t)(struct ceph_osd_request *, bool);
 
@@ -290,6 +291,8 @@ struct ceph_osd_client {
 	struct ceph_msgpool msgpool_op_reply;
 
 	struct workqueue_struct *notify_wq;
+
+	ceph_osdc_map_callback_t map_cb;
 };
 
 static inline bool ceph_osdmap_flag(struct ceph_osd_client *osdc, int flag)
@@ -392,6 +395,7 @@ extern void ceph_osdc_put_request(struct ceph_osd_request *req);
 extern int ceph_osdc_start_request(struct ceph_osd_client *osdc,
 				   struct ceph_osd_request *req,
 				   bool nofail);
+extern u32 ceph_osdc_abort_on_full(struct ceph_osd_client *osdc);
 extern void ceph_osdc_cancel_request(struct ceph_osd_request *req);
 extern int ceph_osdc_wait_request(struct ceph_osd_client *osdc,
 				  struct ceph_osd_request *req);
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index f68bb42da240..5a4f60000a73 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 
 #define OSD_OPREPLY_FRONT_LEN 512
 
@@ -1777,6 +1778,54 @@ static void complete_request(struct ceph_osd_request *req, int err)
 	ceph_osdc_put_request(req);
 }
 
+/*
+ * Drop all pending requests that have r_abort_on_full set and
+ * complete them with -ENOSPC as the return code.
+ *
+ * Returns the highest OSD map epoch of a request that was
+ * cancelled, or 0 if none were cancelled.
+ */
+u32 ceph_osdc_abort_on_full(struct ceph_osd_client *osdc)
+{
+	struct ceph_osd_request *req;
+	struct ceph_osd *osd;
+	struct rb_node *m, *n;
+	u32 latest_epoch = 0;
+	bool osdmap_full = ceph_osdmap_flag(osdc, CEPH_OSDMAP_FULL);
+
+	lockdep_assert_held(&osdc->lock);
+
+	dout("enter abort_on_full\n");
+
+	if (!osdmap_full && !have_pool_full(osdc))
+		goto out;
+
+	for (n = rb_first(&osdc->osds); n; n = rb_next(n)) {
+		osd = rb_entry(n, struct ceph_osd, o_node);
+		mutex_lock(&osd->lock);
+		m = rb_first(&osd->o_requests);
+		while (m) {
+			req = rb_entry(m, struct ceph_osd_request, r_node);
+			m = rb_next(m);
+
+			if (req->r_abort_on_full &&
+			    (osdmap_full || pool_full(osdc, req->r_t.base_oloc.pool))) {
+				u32 cur_epoch = le32_to_cpu(req->r_replay_version.epoch);
+
+				dout("%s: abort tid=%llu flags 0x%x\n", __func__, req->r_tid, req->r_flags);
+				complete_request(req, -ENOSPC);
+				if (cur_epoch > latest_epoch)
+					latest_epoch = cur_epoch;
+			}
+		}
+		mutex_unlock(&osd->lock);
+	}
+out:
+	dout("return abort_on_full latest_epoch=%u\n", latest_epoch);
+	return latest_epoch;
+}
+EXPORT_SYMBOL(ceph_osdc_abort_on_full);
+
 static void cancel_map_check(struct ceph_osd_request *req)
 {
 	struct ceph_osd_client *osdc = req->r_osdc;
@@ -3292,6 +3341,8 @@ void ceph_osdc_handle_map(struct ceph_osd_client *osdc, struct ceph_msg *msg)
 
 	ceph_monc_got_map(&osdc->client->monc, CEPH_SUB_OSDMAP,
 			  osdc->osdmap->epoch);
+	if (osdc->map_cb)
+		osdc->map_cb(osdc);
 	up_write(&osdc->lock);
 	wake_up_all(&osdc->client->auth_wq);
 	return;
@@ -4096,6 +4147,7 @@ int ceph_osdc_init(struct ceph_osd_client *osdc, struct ceph_client *client)
 	osdc->linger_requests = RB_ROOT;
 	osdc->map_checks = RB_ROOT;
 	osdc->linger_map_checks = RB_ROOT;
+	osdc->map_cb = NULL;
 	INIT_DELAYED_WORK(&osdc->timeout_work, handle_timeout);
 	INIT_DELAYED_WORK(&osdc->osds_timeout_work, handle_osds_timeout);
 
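
For readers who want to reason about the abort policy outside the kernel, here is a rough userspace sketch of what the patch does on a map update. It is not libceph code. Every name in it (struct model_req, struct model_client, model_abort_on_full, model_handle_map, model_on_map) is a hypothetical stand-in, and a flat array replaces the per-OSD rbtrees and locking. It only illustrates the selection rule (r_abort_on_full set, and either the map or the request's pool is full), the returned highest epoch, and the NULL-checked map callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pending OSD request (not the kernel struct). */
struct model_req {
	bool abort_on_full;	/* plays the role of r_abort_on_full */
	int pool;		/* index of the pool the request targets */
	uint32_t epoch;		/* map epoch recorded for the request */
	int result;		/* 0 while pending, -ENOSPC once aborted */
};

struct model_client;
typedef void (*model_map_cb_t)(struct model_client *);

/* Hypothetical stand-in for the OSD client state touched by the patch. */
struct model_client {
	bool map_full;			/* plays the role of CEPH_OSDMAP_FULL */
	const bool *pool_full;		/* per-pool "full" flags */
	size_t npools;
	struct model_req *reqs;		/* flat array instead of rbtrees */
	size_t nreqs;
	model_map_cb_t map_cb;		/* optional hook run on map updates */
	uint32_t last_aborted_epoch;	/* filled in by the example callback */
};

static bool model_pool_full(const struct model_client *c, int pool)
{
	return pool >= 0 && (size_t)pool < c->npools && c->pool_full[pool];
}

/*
 * Abort every pending request that opted in via abort_on_full when the
 * whole map or the request's own pool is full. Returns the highest epoch
 * among the aborted requests, or 0 if nothing was aborted.
 */
static uint32_t model_abort_on_full(struct model_client *c)
{
	uint32_t latest = 0;
	size_t i;

	assert(c != NULL);
	for (i = 0; i < c->nreqs; i++) {
		struct model_req *req = &c->reqs[i];

		if (req->result != 0 || !req->abort_on_full)
			continue;
		if (!c->map_full && !model_pool_full(c, req->pool))
			continue;

		req->result = -ENOSPC;
		if (req->epoch > latest)
			latest = req->epoch;
	}
	return latest;
}

/* Example callback an upper layer might install in map_cb. */
static void model_on_map(struct model_client *c)
{
	c->last_aborted_epoch = model_abort_on_full(c);
}

/* Mirrors the hook added to the map handler: call map_cb only if set. */
static void model_handle_map(struct model_client *c)
{
	if (c->map_cb)
		c->map_cb(c);
}
```

In this model an upper layer sets map_cb to model_on_map and the map handler invokes it after each update. Requests that did not opt in are left pending even when the map is full, which matches the r_abort_on_full check in the patch.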