From patchwork Wed Apr 12 11:08:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208892 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D86F5C7619A for ; Wed, 12 Apr 2023 11:25:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230256AbjDLLZU (ORCPT ); Wed, 12 Apr 2023 07:25:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51644 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231286AbjDLLZJ (ORCPT ); Wed, 12 Apr 2023 07:25:09 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C1F21FE0 for ; Wed, 12 Apr 2023 04:24:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298619; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pln8x1pSf1F/UGh2SztM6dmsayn0LV29yUk+OwBN0Z4=; b=K3vcvBWkPHu5VCN9giKZo8d13ekwEOQhDCacdYsZzSOz5N2WpmwU9MEgiADJ95eLLXXTFm YEL1hNJixM48OpwnCggH2zm0EDsLHCb42iaAmkLm9jgK6HKAvPCjovZhBh/mLt9s9B/WYT 66uE5lkqWI4NAc2lXmwASOXNVu1tcfo= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-345-0gfWU1LwP2qdLJd-Y-ykng-1; Wed, 12 Apr 2023 07:09:52 -0400 X-MC-Unique: 0gfWU1LwP2qdLJd-Y-ykng-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C43B5858289; Wed, 12 Apr 2023 11:09:51 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id BC3DFC15BB8; Wed, 12 Apr 2023 11:09:47 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 01/71] libceph: add spinlock around osd->o_requests Date: Wed, 12 Apr 2023 19:08:20 +0800 Message-Id: <20230412110930.176835-2-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton In a later patch, we're going to need to search for a request in the rbtree, but taking the o_mutex is inconvenient as we already hold the con mutex at the point where we need it. Add a new spinlock that we take when inserting and erasing entries from the o_requests tree. Search of the rbtree can be done with either the mutex or the spinlock, but insertion and removal requires both. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- include/linux/ceph/osd_client.h | 8 +++++++- net/ceph/osd_client.c | 5 +++++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index fb6be72104df..92addef18738 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -29,7 +29,12 @@ typedef void (*ceph_osdc_callback_t)(struct ceph_osd_request *); #define CEPH_HOMELESS_OSD -1 -/* a given osd we're communicating with */ +/* + * A given osd we're communicating with. + * + * Note that the o_requests tree can be searched while holding the "lock" mutex + * or the "o_requests_lock" spinlock. Insertion or removal requires both! + */ struct ceph_osd { refcount_t o_ref; struct ceph_osd_client *o_osdc; @@ -37,6 +42,7 @@ struct ceph_osd { int o_incarnation; struct rb_node o_node; struct ceph_connection o_con; + spinlock_t o_requests_lock; struct rb_root o_requests; struct rb_root o_linger_requests; struct rb_root o_backoff_mappings; diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 11c04e7d928e..a827995faba7 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -1177,6 +1177,7 @@ static void osd_init(struct ceph_osd *osd) { refcount_set(&osd->o_ref, 1); RB_CLEAR_NODE(&osd->o_node); + spin_lock_init(&osd->o_requests_lock); osd->o_requests = RB_ROOT; osd->o_linger_requests = RB_ROOT; osd->o_backoff_mappings = RB_ROOT; @@ -1406,7 +1407,9 @@ static void link_request(struct ceph_osd *osd, struct ceph_osd_request *req) atomic_inc(&osd->o_osdc->num_homeless); get_osd(osd); + spin_lock(&osd->o_requests_lock); insert_request(&osd->o_requests, req); + spin_unlock(&osd->o_requests_lock); req->r_osd = osd; } @@ -1418,7 +1421,9 @@ static void unlink_request(struct ceph_osd *osd, struct ceph_osd_request *req) req, req->r_tid); req->r_osd = NULL; + spin_lock(&osd->o_requests_lock); erase_request(&osd->o_requests, req); + spin_unlock(&osd->o_requests_lock); put_osd(osd); if (!osd_homeless(osd)) From patchwork Wed Apr 12 11:08:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208799 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5397C77B6E for ; Wed, 12 Apr 2023 11:12:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229562AbjDLLMN (ORCPT ); Wed, 12 Apr 2023 07:12:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229706AbjDLLML (ORCPT ); Wed, 12 Apr 2023 07:12:11 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EC6544231 for ; Wed, 12 Apr 2023 04:10:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297798; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WxTAmDMYgzgLkWMI/WAhc/godkDRKr+m/AcMachmlaQ=; b=QkL5aU8qLIRbW8eJnsVoRlHvirc09/K2ZmwBBIB1Bg13c8AFn9VGOuYwVhb2Cb9e1s2Kmm Nfs5FIIAahkPE4vL1SeeQB4MuscgD74JuIlmrw7NS5hVm8FlYQCuay0/NFFtahu43smil6 +izJLlWMgtxy2bXOZiZdgzj0kp3eNJE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-55-a0dupIXXM7OYPHZZAWt5TQ-1; Wed, 12 Apr 2023 07:09:57 -0400 X-MC-Unique: a0dupIXXM7OYPHZZAWt5TQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9A086101A531; Wed, 12 Apr 2023 11:09:56 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id AD341C16029; Wed, 12 Apr 2023 11:09:52 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 02/71] libceph: define struct ceph_sparse_extent and add some helpers Date: Wed, 12 Apr 2023 19:08:21 +0800 Message-Id: <20230412110930.176835-3-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton When the OSD sends back a sparse read reply, it contains an array of these structures. Define the structure and add a couple of helpers for dealing with them. Also add a place in struct ceph_osd_req_op to store the extent buffer, and code to free it if it's populated when the req is torn down. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- include/linux/ceph/osd_client.h | 43 ++++++++++++++++++++++++++++++++- net/ceph/osd_client.c | 13 ++++++++++ 2 files changed, 55 insertions(+), 1 deletion(-) diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 92addef18738..05da1e755b7b 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -29,6 +29,17 @@ typedef void (*ceph_osdc_callback_t)(struct ceph_osd_request *); #define CEPH_HOMELESS_OSD -1 +/* + * A single extent in a SPARSE_READ reply. + * + * Note that these come from the OSD as little-endian values. On BE arches, + * we convert them in-place after receipt. + */ +struct ceph_sparse_extent { + u64 off; + u64 len; +} __packed; + /* * A given osd we're communicating with. * @@ -104,6 +115,8 @@ struct ceph_osd_req_op { u64 offset, length; u64 truncate_size; u32 truncate_seq; + int sparse_ext_cnt; + struct ceph_sparse_extent *sparse_ext; struct ceph_osd_data osd_data; } extent; struct { @@ -510,6 +523,20 @@ extern struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *, u32 truncate_seq, u64 truncate_size, bool use_mempool); +int __ceph_alloc_sparse_ext_map(struct ceph_osd_req_op *op, int cnt); + +/* + * How big an extent array should we preallocate for a sparse read? This is + * just a starting value. If we get more than this back from the OSD, the + * receiver will reallocate. + */ +#define CEPH_SPARSE_EXT_ARRAY_INITIAL 16 + +static inline int ceph_alloc_sparse_ext_map(struct ceph_osd_req_op *op) +{ + return __ceph_alloc_sparse_ext_map(op, CEPH_SPARSE_EXT_ARRAY_INITIAL); +} + extern void ceph_osdc_get_request(struct ceph_osd_request *req); extern void ceph_osdc_put_request(struct ceph_osd_request *req); @@ -564,5 +591,19 @@ int ceph_osdc_list_watchers(struct ceph_osd_client *osdc, struct ceph_object_locator *oloc, struct ceph_watch_item **watchers, u32 *num_watchers); -#endif +/* Find offset into the buffer of the end of the extent map */ +static inline u64 ceph_sparse_ext_map_end(struct ceph_osd_req_op *op) +{ + struct ceph_sparse_extent *ext; + + /* No extents? No data */ + if (op->extent.sparse_ext_cnt == 0) + return 0; + + ext = &op->extent.sparse_ext[op->extent.sparse_ext_cnt - 1]; + + return ext->off + ext->len - op->extent.offset; +} + +#endif diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index a827995faba7..e2f1d1dcbb84 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -378,6 +378,7 @@ static void osd_req_op_data_release(struct ceph_osd_request *osd_req, case CEPH_OSD_OP_READ: case CEPH_OSD_OP_WRITE: case CEPH_OSD_OP_WRITEFULL: + kfree(op->extent.sparse_ext); ceph_osd_data_release(&op->extent.osd_data); break; case CEPH_OSD_OP_CALL: @@ -1120,6 +1121,18 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc, } EXPORT_SYMBOL(ceph_osdc_new_request); +int __ceph_alloc_sparse_ext_map(struct ceph_osd_req_op *op, int cnt) +{ + op->extent.sparse_ext_cnt = cnt; + op->extent.sparse_ext = kmalloc_array(cnt, + sizeof(*op->extent.sparse_ext), + GFP_NOFS); + if (!op->extent.sparse_ext) + return -ENOMEM; + return 0; +} +EXPORT_SYMBOL(__ceph_alloc_sparse_ext_map); + /* * We keep osd requests in an rbtree, sorted by ->r_tid. */ From patchwork Wed Apr 12 11:08:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208804 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FB3BC77B72 for ; Wed, 12 Apr 2023 11:12:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229813AbjDLLMi (ORCPT ); Wed, 12 Apr 2023 07:12:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36158 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229774AbjDLLMh (ORCPT ); Wed, 12 Apr 2023 07:12:37 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 45BDB7A99 for ; Wed, 12 Apr 2023 04:11:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297805; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7bT15AT5lWEVSEz45J4jMu2uC7rM1rLDy15G2ALqDR8=; b=NcOt7wsIPc/Zcv2cH5cNRkAN1a/w0P7LBleOa9y1Vu9blYWz/9OFTYsIfxQs7xaY19TNQ8 /daDn6EcTcUs0jwOcartwViBPogVgf/3aE7EwP957IMcvOJqbPQkQVCgrtoqVx3aVhy8RZ LtVrcJ/ZcHtCwaZ71HDQzEOth3tcgZE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-612-wL2OoVGkNQ-BARPXPDCD9Q-1; Wed, 12 Apr 2023 07:10:01 -0400 X-MC-Unique: wL2OoVGkNQ-BARPXPDCD9Q-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 8D5BA88B7A0; Wed, 12 Apr 2023 11:10:01 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8DDEBC15BB8; Wed, 12 Apr 2023 11:09:57 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 03/71] libceph: add sparse read support to msgr2 crc state machine Date: Wed, 12 Apr 2023 19:08:22 +0800 Message-Id: <20230412110930.176835-4-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Add support for a new sparse_read ceph_connection operation. The idea is that the client driver can define this operation use it to do special handling for incoming reads. The alloc_msg routine will look at the request and determine whether the reply is expected to be sparse. If it is, then we'll dispatch to a different set of state machine states that will repeatedly call the driver's sparse_read op to get length and placement info for reading the extent map, and the extents themselves. This necessitates adding some new field to some other structs: - The msg gets a new bool to track whether it's a sparse_read request. - A new field is added to the cursor to track the amount remaining in the current extent. This is used to cap the read from the socket into the msg_data - Handing a revoke with all of this is particularly difficult, so I've added a new data_len_remain field to the v2 connection info, and then use that to skip that much on a revoke. We may want to expand the use of that to the normal read path as well, just for consistency's sake. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- include/linux/ceph/messenger.h | 28 ++++++ net/ceph/messenger.c | 1 + net/ceph/messenger_v2.c | 167 +++++++++++++++++++++++++++++++-- 3 files changed, 187 insertions(+), 9 deletions(-) diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index 99c1726be6ee..8a6938fa324e 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -17,6 +17,7 @@ struct ceph_msg; struct ceph_connection; +struct ceph_msg_data_cursor; /* * Ceph defines these callbacks for handling connection events. @@ -70,6 +71,30 @@ struct ceph_connection_operations { int used_proto, int result, const int *allowed_protos, int proto_cnt, const int *allowed_modes, int mode_cnt); + + /** + * sparse_read: read sparse data + * @con: connection we're reading from + * @cursor: data cursor for reading extents + * @buf: optional buffer to read into + * + * This should be called more than once, each time setting up to + * receive an extent into the current cursor position, and zeroing + * the holes between them. + * + * Returns amount of data to be read (in bytes), 0 if reading is + * complete, or -errno if there was an error. + * + * If @buf is set on a >0 return, then the data should be read into + * the provided buffer. Otherwise, it should be read into the cursor. + * + * The sparse read operation is expected to initialize the cursor + * with a length covering up to the end of the last extent. + */ + int (*sparse_read)(struct ceph_connection *con, + struct ceph_msg_data_cursor *cursor, + char **buf); + }; /* use format string %s%lld */ @@ -207,6 +232,7 @@ struct ceph_msg_data_cursor { struct ceph_msg_data *data; /* current data item */ size_t resid; /* bytes not yet consumed */ + int sr_resid; /* residual sparse_read len */ bool need_crc; /* crc update needed */ union { #ifdef CONFIG_BLOCK @@ -251,6 +277,7 @@ struct ceph_msg { struct kref kref; bool more_to_follow; bool needs_out_seq; + bool sparse_read; int front_alloc_len; struct ceph_msgpool *pool; @@ -395,6 +422,7 @@ struct ceph_connection_v2_info { void *conn_bufs[16]; int conn_buf_cnt; + int data_len_remain; struct kvec in_sign_kvecs[8]; struct kvec out_sign_kvecs[8]; diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index cd7b0bf5369e..3bc3c72a6d4f 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -1013,6 +1013,7 @@ void ceph_msg_data_cursor_init(struct ceph_msg_data_cursor *cursor, cursor->total_resid = length; cursor->data = msg->data; + cursor->sr_resid = 0; __ceph_msg_data_cursor_init(cursor); } diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c index 301a991dc6a6..a0becd553d7c 100644 --- a/net/ceph/messenger_v2.c +++ b/net/ceph/messenger_v2.c @@ -52,14 +52,16 @@ #define FRAME_LATE_STATUS_COMPLETE 0xe #define FRAME_LATE_STATUS_ABORTED_MASK 0xf -#define IN_S_HANDLE_PREAMBLE 1 -#define IN_S_HANDLE_CONTROL 2 -#define IN_S_HANDLE_CONTROL_REMAINDER 3 -#define IN_S_PREPARE_READ_DATA 4 -#define IN_S_PREPARE_READ_DATA_CONT 5 -#define IN_S_PREPARE_READ_ENC_PAGE 6 -#define IN_S_HANDLE_EPILOGUE 7 -#define IN_S_FINISH_SKIP 8 +#define IN_S_HANDLE_PREAMBLE 1 +#define IN_S_HANDLE_CONTROL 2 +#define IN_S_HANDLE_CONTROL_REMAINDER 3 +#define IN_S_PREPARE_READ_DATA 4 +#define IN_S_PREPARE_READ_DATA_CONT 5 +#define IN_S_PREPARE_READ_ENC_PAGE 6 +#define IN_S_PREPARE_SPARSE_DATA 7 +#define IN_S_PREPARE_SPARSE_DATA_CONT 8 +#define IN_S_HANDLE_EPILOGUE 9 +#define IN_S_FINISH_SKIP 10 #define OUT_S_QUEUE_DATA 1 #define OUT_S_QUEUE_DATA_CONT 2 @@ -1815,6 +1817,123 @@ static void prepare_read_data_cont(struct ceph_connection *con) con->v2.in_state = IN_S_HANDLE_EPILOGUE; } +static int prepare_sparse_read_cont(struct ceph_connection *con) +{ + int ret; + struct bio_vec bv; + char *buf = NULL; + struct ceph_msg_data_cursor *cursor = &con->v2.in_cursor; + + WARN_ON(con->v2.in_state != IN_S_PREPARE_SPARSE_DATA_CONT); + + if (iov_iter_is_bvec(&con->v2.in_iter)) { + if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) { + con->in_data_crc = crc32c(con->in_data_crc, + page_address(con->bounce_page), + con->v2.in_bvec.bv_len); + get_bvec_at(cursor, &bv); + memcpy_to_page(bv.bv_page, bv.bv_offset, + page_address(con->bounce_page), + con->v2.in_bvec.bv_len); + } else { + con->in_data_crc = ceph_crc32c_page(con->in_data_crc, + con->v2.in_bvec.bv_page, + con->v2.in_bvec.bv_offset, + con->v2.in_bvec.bv_len); + } + + ceph_msg_data_advance(cursor, con->v2.in_bvec.bv_len); + cursor->sr_resid -= con->v2.in_bvec.bv_len; + dout("%s: advance by 0x%x sr_resid 0x%x\n", __func__, + con->v2.in_bvec.bv_len, cursor->sr_resid); + WARN_ON_ONCE(cursor->sr_resid > cursor->total_resid); + if (cursor->sr_resid) { + get_bvec_at(cursor, &bv); + if (bv.bv_len > cursor->sr_resid) + bv.bv_len = cursor->sr_resid; + if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) { + bv.bv_page = con->bounce_page; + bv.bv_offset = 0; + } + set_in_bvec(con, &bv); + con->v2.data_len_remain -= bv.bv_len; + return 0; + } + } else if (iov_iter_is_kvec(&con->v2.in_iter)) { + /* On first call, we have no kvec so don't compute crc */ + if (con->v2.in_kvec_cnt) { + WARN_ON_ONCE(con->v2.in_kvec_cnt > 1); + con->in_data_crc = crc32c(con->in_data_crc, + con->v2.in_kvecs[0].iov_base, + con->v2.in_kvecs[0].iov_len); + } + } else { + return -EIO; + } + + /* get next extent */ + ret = con->ops->sparse_read(con, cursor, &buf); + if (ret <= 0) { + if (ret < 0) + return ret; + + reset_in_kvecs(con); + add_in_kvec(con, con->v2.in_buf, CEPH_EPILOGUE_PLAIN_LEN); + con->v2.in_state = IN_S_HANDLE_EPILOGUE; + return 0; + } + + if (buf) { + /* receive into buffer */ + reset_in_kvecs(con); + add_in_kvec(con, buf, ret); + con->v2.data_len_remain -= ret; + return 0; + } + + if (ret > cursor->total_resid) { + pr_warn("%s: ret 0x%x total_resid 0x%zx resid 0x%zx\n", + __func__, ret, cursor->total_resid, cursor->resid); + return -EIO; + } + get_bvec_at(cursor, &bv); + if (bv.bv_len > cursor->sr_resid) + bv.bv_len = cursor->sr_resid; + if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) { + if (unlikely(!con->bounce_page)) { + con->bounce_page = alloc_page(GFP_NOIO); + if (!con->bounce_page) { + pr_err("failed to allocate bounce page\n"); + return -ENOMEM; + } + } + + bv.bv_page = con->bounce_page; + bv.bv_offset = 0; + } + set_in_bvec(con, &bv); + con->v2.data_len_remain -= ret; + return ret; +} + +static int prepare_sparse_read_data(struct ceph_connection *con) +{ + struct ceph_msg *msg = con->in_msg; + + dout("%s: starting sparse read\n", __func__); + + if (WARN_ON_ONCE(!con->ops->sparse_read)) + return -EOPNOTSUPP; + + if (!con_secure(con)) + con->in_data_crc = -1; + + reset_in_kvecs(con); + con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT; + con->v2.data_len_remain = data_len(msg); + return prepare_sparse_read_cont(con); +} + static int prepare_read_tail_plain(struct ceph_connection *con) { struct ceph_msg *msg = con->in_msg; @@ -1835,7 +1954,10 @@ static int prepare_read_tail_plain(struct ceph_connection *con) } if (data_len(msg)) { - con->v2.in_state = IN_S_PREPARE_READ_DATA; + if (msg->sparse_read) + con->v2.in_state = IN_S_PREPARE_SPARSE_DATA; + else + con->v2.in_state = IN_S_PREPARE_READ_DATA; } else { add_in_kvec(con, con->v2.in_buf, CEPH_EPILOGUE_PLAIN_LEN); con->v2.in_state = IN_S_HANDLE_EPILOGUE; @@ -2888,6 +3010,12 @@ static int populate_in_iter(struct ceph_connection *con) prepare_read_enc_page(con); ret = 0; break; + case IN_S_PREPARE_SPARSE_DATA: + ret = prepare_sparse_read_data(con); + break; + case IN_S_PREPARE_SPARSE_DATA_CONT: + ret = prepare_sparse_read_cont(con); + break; case IN_S_HANDLE_EPILOGUE: ret = handle_epilogue(con); break; @@ -3479,6 +3607,23 @@ static void revoke_at_prepare_read_enc_page(struct ceph_connection *con) con->v2.in_state = IN_S_FINISH_SKIP; } +static void revoke_at_prepare_sparse_data(struct ceph_connection *con) +{ + int resid; /* current piece of data */ + int remaining; + + WARN_ON(con_secure(con)); + WARN_ON(!data_len(con->in_msg)); + WARN_ON(!iov_iter_is_bvec(&con->v2.in_iter)); + resid = iov_iter_count(&con->v2.in_iter); + dout("%s con %p resid %d\n", __func__, con, resid); + + remaining = CEPH_EPILOGUE_PLAIN_LEN + con->v2.data_len_remain; + con->v2.in_iter.count -= resid; + set_in_skip(con, resid + remaining); + con->v2.in_state = IN_S_FINISH_SKIP; +} + static void revoke_at_handle_epilogue(struct ceph_connection *con) { int resid; @@ -3495,6 +3640,7 @@ static void revoke_at_handle_epilogue(struct ceph_connection *con) void ceph_con_v2_revoke_incoming(struct ceph_connection *con) { switch (con->v2.in_state) { + case IN_S_PREPARE_SPARSE_DATA: case IN_S_PREPARE_READ_DATA: revoke_at_prepare_read_data(con); break; @@ -3504,6 +3650,9 @@ void ceph_con_v2_revoke_incoming(struct ceph_connection *con) case IN_S_PREPARE_READ_ENC_PAGE: revoke_at_prepare_read_enc_page(con); break; + case IN_S_PREPARE_SPARSE_DATA_CONT: + revoke_at_prepare_sparse_data(con); + break; case IN_S_HANDLE_EPILOGUE: revoke_at_handle_epilogue(con); break; From patchwork Wed Apr 12 11:08:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208890 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64BDEC7619A for ; Wed, 12 Apr 2023 11:25:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230109AbjDLLZQ (ORCPT ); Wed, 12 Apr 2023 07:25:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51428 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231185AbjDLLZG (ORCPT ); Wed, 12 Apr 2023 07:25:06 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0B842B7 for ; Wed, 12 Apr 2023 04:23:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298607; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=K/nN5nvUsGY3pstbw2xzohA61i1uQoz97reRF91S9JE=; b=F3Ml8RhPqQF3TKBcSC16J+ZN+jOOqy/3KutLueeMQDhgaj0xENiaBMkZ7MNFJVRMmjbDCG Bv2vFQL98K7rFH/lrKLUiNLlIE8z2KAGy9kjY8VgMtTKRKA5tsOIym2Xq3CfF18Xna80Mw s5w+BtFNPYR0y6PhBUwfTbFWv/Bi/Qs= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-647-OZE3BjUVPpKunGbmcMp4bg-1; Wed, 12 Apr 2023 07:10:06 -0400 X-MC-Unique: OZE3BjUVPpKunGbmcMp4bg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id F22801C05ABB; Wed, 12 Apr 2023 11:10:05 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5A46CC15BB8; Wed, 12 Apr 2023 11:10:01 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 04/71] libceph: add sparse read support to OSD client Date: Wed, 12 Apr 2023 19:08:23 +0800 Message-Id: <20230412110930.176835-5-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Have get_reply check for the presence of sparse read ops in the request and set the sparse_read boolean in the msg. That will queue the messenger layer to use the sparse read codepath instead of the normal data receive. Add a new sparse_read operation for the OSD client, driven by its own state machine. The messenger will repeatedly call the sparse_read operation, and it will pass back the necessary info to set up to read the next extent of data, while zero-filling the sparse regions. The state machine will stop at the end of the last extent, and will attach the extent map buffer to the ceph_osd_req_op so that the caller can use it. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- include/linux/ceph/osd_client.h | 32 ++++ net/ceph/osd_client.c | 255 +++++++++++++++++++++++++++++++- 2 files changed, 283 insertions(+), 4 deletions(-) diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 05da1e755b7b..460881c93f9a 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -40,6 +40,36 @@ struct ceph_sparse_extent { u64 len; } __packed; +/* Sparse read state machine state values */ +enum ceph_sparse_read_state { + CEPH_SPARSE_READ_HDR = 0, + CEPH_SPARSE_READ_EXTENTS, + CEPH_SPARSE_READ_DATA_LEN, + CEPH_SPARSE_READ_DATA, +}; + +/* + * A SPARSE_READ reply is a 32-bit count of extents, followed by an array of + * 64-bit offset/length pairs, and then all of the actual file data + * concatenated after it (sans holes). + * + * Unfortunately, we don't know how long the extent array is until we've + * started reading the data section of the reply. The caller should send down + * a destination buffer for the array, but we'll alloc one if it's too small + * or if the caller doesn't. + */ +struct ceph_sparse_read { + enum ceph_sparse_read_state sr_state; /* state machine state */ + u64 sr_req_off; /* orig request offset */ + u64 sr_req_len; /* orig request length */ + u64 sr_pos; /* current pos in buffer */ + int sr_index; /* current extent index */ + __le32 sr_datalen; /* length of actual data */ + u32 sr_count; /* extent count in reply */ + int sr_ext_len; /* length of extent array */ + struct ceph_sparse_extent *sr_extent; /* extent array */ +}; + /* * A given osd we're communicating with. * @@ -48,6 +78,7 @@ struct ceph_sparse_extent { */ struct ceph_osd { refcount_t o_ref; + int o_sparse_op_idx; struct ceph_osd_client *o_osdc; int o_osd; int o_incarnation; @@ -63,6 +94,7 @@ struct ceph_osd { unsigned long lru_ttl; struct list_head o_keepalive_item; struct mutex lock; + struct ceph_sparse_read o_sparse_read; }; #define CEPH_OSD_SLAB_OPS 2 diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index e2f1d1dcbb84..8534ca9c39b9 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -376,6 +376,7 @@ static void osd_req_op_data_release(struct ceph_osd_request *osd_req, switch (op->op) { case CEPH_OSD_OP_READ: + case CEPH_OSD_OP_SPARSE_READ: case CEPH_OSD_OP_WRITE: case CEPH_OSD_OP_WRITEFULL: kfree(op->extent.sparse_ext); @@ -670,6 +671,7 @@ static void get_num_data_items(struct ceph_osd_request *req, /* reply */ case CEPH_OSD_OP_STAT: case CEPH_OSD_OP_READ: + case CEPH_OSD_OP_SPARSE_READ: case CEPH_OSD_OP_LIST_WATCHERS: *num_reply_data_items += 1; break; @@ -739,7 +741,7 @@ void osd_req_op_extent_init(struct ceph_osd_request *osd_req, BUG_ON(opcode != CEPH_OSD_OP_READ && opcode != CEPH_OSD_OP_WRITE && opcode != CEPH_OSD_OP_WRITEFULL && opcode != CEPH_OSD_OP_ZERO && - opcode != CEPH_OSD_OP_TRUNCATE); + opcode != CEPH_OSD_OP_TRUNCATE && opcode != CEPH_OSD_OP_SPARSE_READ); op->extent.offset = offset; op->extent.length = length; @@ -964,6 +966,7 @@ static u32 osd_req_encode_op(struct ceph_osd_op *dst, case CEPH_OSD_OP_STAT: break; case CEPH_OSD_OP_READ: + case CEPH_OSD_OP_SPARSE_READ: case CEPH_OSD_OP_WRITE: case CEPH_OSD_OP_WRITEFULL: case CEPH_OSD_OP_ZERO: @@ -1060,7 +1063,8 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc, BUG_ON(opcode != CEPH_OSD_OP_READ && opcode != CEPH_OSD_OP_WRITE && opcode != CEPH_OSD_OP_ZERO && opcode != CEPH_OSD_OP_TRUNCATE && - opcode != CEPH_OSD_OP_CREATE && opcode != CEPH_OSD_OP_DELETE); + opcode != CEPH_OSD_OP_CREATE && opcode != CEPH_OSD_OP_DELETE && + opcode != CEPH_OSD_OP_SPARSE_READ); req = ceph_osdc_alloc_request(osdc, snapc, num_ops, use_mempool, GFP_NOFS); @@ -1201,6 +1205,13 @@ static void osd_init(struct ceph_osd *osd) mutex_init(&osd->lock); } +static void ceph_init_sparse_read(struct ceph_sparse_read *sr) +{ + kfree(sr->sr_extent); + memset(sr, '\0', sizeof(*sr)); + sr->sr_state = CEPH_SPARSE_READ_HDR; +} + static void osd_cleanup(struct ceph_osd *osd) { WARN_ON(!RB_EMPTY_NODE(&osd->o_node)); @@ -1211,6 +1222,8 @@ static void osd_cleanup(struct ceph_osd *osd) WARN_ON(!list_empty(&osd->o_osd_lru)); WARN_ON(!list_empty(&osd->o_keepalive_item)); + ceph_init_sparse_read(&osd->o_sparse_read); + if (osd->o_auth.authorizer) { WARN_ON(osd_homeless(osd)); ceph_auth_destroy_authorizer(osd->o_auth.authorizer); @@ -1230,6 +1243,9 @@ static struct ceph_osd *create_osd(struct ceph_osd_client *osdc, int onum) osd_init(osd); osd->o_osdc = osdc; osd->o_osd = onum; + osd->o_sparse_op_idx = -1; + + ceph_init_sparse_read(&osd->o_sparse_read); ceph_con_init(&osd->o_con, osd, &osd_con_ops, &osdc->client->msgr); @@ -2034,6 +2050,7 @@ static void setup_request_data(struct ceph_osd_request *req) &op->raw_data_in); break; case CEPH_OSD_OP_READ: + case CEPH_OSD_OP_SPARSE_READ: ceph_osdc_msg_data_add(reply_msg, &op->extent.osd_data); break; @@ -2453,8 +2470,10 @@ static void finish_request(struct ceph_osd_request *req) req->r_end_latency = ktime_get(); - if (req->r_osd) + if (req->r_osd) { + ceph_init_sparse_read(&req->r_osd->o_sparse_read); unlink_request(req->r_osd, req); + } atomic_dec(&osdc->num_requests); /* @@ -5358,6 +5377,24 @@ static void osd_dispatch(struct ceph_connection *con, struct ceph_msg *msg) ceph_msg_put(msg); } +/* How much sparse data was requested? */ +static u64 sparse_data_requested(struct ceph_osd_request *req) +{ + u64 len = 0; + + if (req->r_flags & CEPH_OSD_FLAG_READ) { + int i; + + for (i = 0; i < req->r_num_ops; ++i) { + struct ceph_osd_req_op *op = &req->r_ops[i]; + + if (op->op == CEPH_OSD_OP_SPARSE_READ) + len += op->extent.length; + } + } + return len; +} + /* * Lookup and return message for incoming reply. Don't try to do * anything about a larger than preallocated data portion of the @@ -5374,6 +5411,7 @@ static struct ceph_msg *get_reply(struct ceph_connection *con, int front_len = le32_to_cpu(hdr->front_len); int data_len = le32_to_cpu(hdr->data_len); u64 tid = le64_to_cpu(hdr->tid); + u64 srlen; down_read(&osdc->lock); if (!osd_registered(osd)) { @@ -5406,7 +5444,8 @@ static struct ceph_msg *get_reply(struct ceph_connection *con, req->r_reply = m; } - if (data_len > req->r_reply->data_length) { + srlen = sparse_data_requested(req); + if (!srlen && data_len > req->r_reply->data_length) { pr_warn("%s osd%d tid %llu data %d > preallocated %zu, skipping\n", __func__, osd->o_osd, req->r_tid, data_len, req->r_reply->data_length); @@ -5416,6 +5455,8 @@ static struct ceph_msg *get_reply(struct ceph_connection *con, } m = ceph_msg_get(req->r_reply); + m->sparse_read = (bool)srlen; + dout("get_reply tid %lld %p\n", tid, m); out_unlock_session: @@ -5648,9 +5689,215 @@ static int osd_check_message_signature(struct ceph_msg *msg) return ceph_auth_check_message_signature(auth, msg); } +static void advance_cursor(struct ceph_msg_data_cursor *cursor, size_t len, bool zero) +{ + while (len) { + struct page *page; + size_t poff, plen; + + page = ceph_msg_data_next(cursor, &poff, &plen); + if (plen > len) + plen = len; + if (zero) + zero_user_segment(page, poff, poff + plen); + len -= plen; + ceph_msg_data_advance(cursor, plen); + } +} + +static int prep_next_sparse_read(struct ceph_connection *con, + struct ceph_msg_data_cursor *cursor) +{ + struct ceph_osd *o = con->private; + struct ceph_sparse_read *sr = &o->o_sparse_read; + struct ceph_osd_request *req; + struct ceph_osd_req_op *op; + + spin_lock(&o->o_requests_lock); + req = lookup_request(&o->o_requests, le64_to_cpu(con->in_msg->hdr.tid)); + if (!req) { + spin_unlock(&o->o_requests_lock); + return -EBADR; + } + + if (o->o_sparse_op_idx < 0) { + u64 srlen = sparse_data_requested(req); + + dout("%s: [%d] starting new sparse read req. srlen=0x%llx\n", + __func__, o->o_osd, srlen); + ceph_msg_data_cursor_init(cursor, con->in_msg, srlen); + } else { + u64 end; + + op = &req->r_ops[o->o_sparse_op_idx]; + + WARN_ON_ONCE(op->extent.sparse_ext); + + /* hand back buffer we took earlier */ + op->extent.sparse_ext = sr->sr_extent; + sr->sr_extent = NULL; + op->extent.sparse_ext_cnt = sr->sr_count; + sr->sr_ext_len = 0; + dout("%s: [%d] completed extent array len %d cursor->resid %zd\n", + __func__, o->o_osd, op->extent.sparse_ext_cnt, cursor->resid); + /* Advance to end of data for this operation */ + end = ceph_sparse_ext_map_end(op); + if (end < sr->sr_req_len) + advance_cursor(cursor, sr->sr_req_len - end, false); + } + + ceph_init_sparse_read(sr); + + /* find next op in this request (if any) */ + while (++o->o_sparse_op_idx < req->r_num_ops) { + op = &req->r_ops[o->o_sparse_op_idx]; + if (op->op == CEPH_OSD_OP_SPARSE_READ) + goto found; + } + + /* reset for next sparse read request */ + spin_unlock(&o->o_requests_lock); + o->o_sparse_op_idx = -1; + return 0; +found: + sr->sr_req_off = op->extent.offset; + sr->sr_req_len = op->extent.length; + sr->sr_pos = sr->sr_req_off; + dout("%s: [%d] new sparse read op at idx %d 0x%llx~0x%llx\n", __func__, + o->o_osd, o->o_sparse_op_idx, sr->sr_req_off, sr->sr_req_len); + + /* hand off request's sparse extent map buffer */ + sr->sr_ext_len = op->extent.sparse_ext_cnt; + op->extent.sparse_ext_cnt = 0; + sr->sr_extent = op->extent.sparse_ext; + op->extent.sparse_ext = NULL; + + spin_unlock(&o->o_requests_lock); + return 1; +} + +#ifdef __BIG_ENDIAN +static inline void convert_extent_map(struct ceph_sparse_read *sr) +{ + int i; + + for (i = 0; i < sr->sr_count; i++) { + struct ceph_sparse_extent *ext = &sr->sr_extent[i]; + + ext->off = le64_to_cpu((__force __le64)ext->off); + ext->len = le64_to_cpu((__force __le64)ext->len); + } +} +#else +static inline void convert_extent_map(struct ceph_sparse_read *sr) +{ +} +#endif + +#define MAX_EXTENTS 4096 + +static int osd_sparse_read(struct ceph_connection *con, + struct ceph_msg_data_cursor *cursor, + char **pbuf) +{ + struct ceph_osd *o = con->private; + struct ceph_sparse_read *sr = &o->o_sparse_read; + u32 count = sr->sr_count; + u64 eoff, elen; + int ret; + + switch (sr->sr_state) { + case CEPH_SPARSE_READ_HDR: +next_op: + ret = prep_next_sparse_read(con, cursor); + if (ret <= 0) + return ret; + + /* number of extents */ + ret = sizeof(sr->sr_count); + *pbuf = (char *)&sr->sr_count; + sr->sr_state = CEPH_SPARSE_READ_EXTENTS; + break; + case CEPH_SPARSE_READ_EXTENTS: + /* Convert sr_count to host-endian */ + count = le32_to_cpu((__force __le32)sr->sr_count); + sr->sr_count = count; + dout("[%d] got %u extents\n", o->o_osd, count); + + if (count > 0) { + if (!sr->sr_extent || count > sr->sr_ext_len) { + /* + * Apply a hard cap to the number of extents. + * If we have more, assume something is wrong. + */ + if (count > MAX_EXTENTS) { + dout("%s: OSD returned 0x%x extents in a single reply!\n", + __func__, count); + return -EREMOTEIO; + } + + /* no extent array provided, or too short */ + kfree(sr->sr_extent); + sr->sr_extent = kmalloc_array(count, + sizeof(*sr->sr_extent), + GFP_NOIO); + if (!sr->sr_extent) + return -ENOMEM; + sr->sr_ext_len = count; + } + ret = count * sizeof(*sr->sr_extent); + *pbuf = (char *)sr->sr_extent; + sr->sr_state = CEPH_SPARSE_READ_DATA_LEN; + break; + } + /* No extents? Read data len */ + fallthrough; + case CEPH_SPARSE_READ_DATA_LEN: + convert_extent_map(sr); + ret = sizeof(sr->sr_datalen); + *pbuf = (char *)&sr->sr_datalen; + sr->sr_state = CEPH_SPARSE_READ_DATA; + break; + case CEPH_SPARSE_READ_DATA: + if (sr->sr_index >= count) { + sr->sr_state = CEPH_SPARSE_READ_HDR; + goto next_op; + } + + eoff = sr->sr_extent[sr->sr_index].off; + elen = sr->sr_extent[sr->sr_index].len; + + dout("[%d] ext %d off 0x%llx len 0x%llx\n", + o->o_osd, sr->sr_index, eoff, elen); + + if (elen > INT_MAX) { + dout("Sparse read extent length too long (0x%llx)\n", elen); + return -EREMOTEIO; + } + + /* zero out anything from sr_pos to start of extent */ + if (sr->sr_pos < eoff) + advance_cursor(cursor, eoff - sr->sr_pos, true); + + /* Set position to end of extent */ + sr->sr_pos = eoff + elen; + + /* send back the new length and nullify the ptr */ + cursor->sr_resid = elen; + ret = elen; + *pbuf = NULL; + + /* Bump the array index */ + ++sr->sr_index; + break; + } + return ret; +} + static const struct ceph_connection_operations osd_con_ops = { .get = osd_get_con, .put = osd_put_con, + .sparse_read = osd_sparse_read, .alloc_msg = osd_alloc_msg, .dispatch = osd_dispatch, .fault = osd_fault, From patchwork Wed Apr 12 11:08:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208801 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3FACC77B6E for ; Wed, 12 Apr 2023 11:12:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229745AbjDLLMa (ORCPT ); Wed, 12 Apr 2023 07:12:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229721AbjDLLM2 (ORCPT ); Wed, 12 Apr 2023 07:12:28 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 557497EF2 for ; Wed, 12 Apr 2023 04:11:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297816; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dtSaDEsZvchqiCmw9DWFV0kmpllOWAWB56w6zWV3gU8=; b=XCoa8GyyMerA5vcwcdi/8BLlmKhSjCMzJUl8999oNCeNlkVlUqXb4dGY335padknE4zDeH B8v9zXNt/WoLW80jEPSBkHF22VaqWbIej0+9l7IDnL1rd3K72wVxDDPSPvg+9IBt4N5eMY ghW0vhcNMQSBNzHdUhvlT9ontV/PG5U= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-454-uoI4-kSiNYm6bu3l6mZ10Q-1; Wed, 12 Apr 2023 07:10:12 -0400 X-MC-Unique: uoI4-kSiNYm6bu3l6mZ10Q-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id EE1088996E1; Wed, 12 Apr 2023 11:10:11 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 06D77C15BB8; Wed, 12 Apr 2023 11:10:06 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 05/71] libceph: support sparse reads on msgr2 secure codepath Date: Wed, 12 Apr 2023 19:08:24 +0800 Message-Id: <20230412110930.176835-6-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Add a new init_sgs_pages helper that populates the scatterlist from an arbitrary point in an array of pages. Change setup_message_sgs to take an optional pointer to an array of pages. If that's set, then the scatterlist will be set using that array instead of the cursor. When given a sparse read on a secure connection, decrypt the data in-place rather than into the final destination, by passing it the in_enc_pages array. After decrypting, run the sparse_read state machine in a loop, copying data from the decrypted pages until it's complete. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- net/ceph/messenger_v2.c | 119 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 109 insertions(+), 10 deletions(-) diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c index a0becd553d7c..a31ac13d5451 100644 --- a/net/ceph/messenger_v2.c +++ b/net/ceph/messenger_v2.c @@ -959,12 +959,48 @@ static void init_sgs_cursor(struct scatterlist **sg, } } +/** + * init_sgs_pages: set up scatterlist on an array of page pointers + * @sg: scatterlist to populate + * @pages: pointer to page array + * @dpos: position in the array to start (bytes) + * @dlen: len to add to sg (bytes) + * @pad: pointer to pad destination (if any) + * + * Populate the scatterlist from the page array, starting at an arbitrary + * byte in the array and running for a specified length. + */ +static void init_sgs_pages(struct scatterlist **sg, struct page **pages, + int dpos, int dlen, u8 *pad) +{ + int idx = dpos >> PAGE_SHIFT; + int off = offset_in_page(dpos); + int resid = dlen; + + do { + int len = min(resid, (int)PAGE_SIZE - off); + + sg_set_page(*sg, pages[idx], len, off); + *sg = sg_next(*sg); + off = 0; + ++idx; + resid -= len; + } while (resid); + + if (need_padding(dlen)) { + sg_set_buf(*sg, pad, padding_len(dlen)); + *sg = sg_next(*sg); + } +} + static int setup_message_sgs(struct sg_table *sgt, struct ceph_msg *msg, u8 *front_pad, u8 *middle_pad, u8 *data_pad, - void *epilogue, bool add_tag) + void *epilogue, struct page **pages, int dpos, + bool add_tag) { struct ceph_msg_data_cursor cursor; struct scatterlist *cur_sg; + int dlen = data_len(msg); int sg_cnt; int ret; @@ -978,9 +1014,15 @@ static int setup_message_sgs(struct sg_table *sgt, struct ceph_msg *msg, if (middle_len(msg)) sg_cnt += calc_sg_cnt(msg->middle->vec.iov_base, middle_len(msg)); - if (data_len(msg)) { - ceph_msg_data_cursor_init(&cursor, msg, data_len(msg)); - sg_cnt += calc_sg_cnt_cursor(&cursor); + if (dlen) { + if (pages) { + sg_cnt += calc_pages_for(dpos, dlen); + if (need_padding(dlen)) + sg_cnt++; + } else { + ceph_msg_data_cursor_init(&cursor, msg, dlen); + sg_cnt += calc_sg_cnt_cursor(&cursor); + } } ret = sg_alloc_table(sgt, sg_cnt, GFP_NOIO); @@ -994,9 +1036,13 @@ static int setup_message_sgs(struct sg_table *sgt, struct ceph_msg *msg, if (middle_len(msg)) init_sgs(&cur_sg, msg->middle->vec.iov_base, middle_len(msg), middle_pad); - if (data_len(msg)) { - ceph_msg_data_cursor_init(&cursor, msg, data_len(msg)); - init_sgs_cursor(&cur_sg, &cursor, data_pad); + if (dlen) { + if (pages) { + init_sgs_pages(&cur_sg, pages, dpos, dlen, data_pad); + } else { + ceph_msg_data_cursor_init(&cursor, msg, dlen); + init_sgs_cursor(&cur_sg, &cursor, data_pad); + } } WARN_ON(!sg_is_last(cur_sg)); @@ -1031,10 +1077,52 @@ static int decrypt_control_remainder(struct ceph_connection *con) padded_len(rem_len) + CEPH_GCM_TAG_LEN); } +/* Process sparse read data that lives in a buffer */ +static int process_v2_sparse_read(struct ceph_connection *con, struct page **pages, int spos) +{ + struct ceph_msg_data_cursor *cursor = &con->v2.in_cursor; + int ret; + + for (;;) { + char *buf = NULL; + + ret = con->ops->sparse_read(con, cursor, &buf); + if (ret <= 0) + return ret; + + dout("%s: sparse_read return %x buf %p\n", __func__, ret, buf); + + do { + int idx = spos >> PAGE_SHIFT; + int soff = offset_in_page(spos); + struct page *spage = con->v2.in_enc_pages[idx]; + int len = min_t(int, ret, PAGE_SIZE - soff); + + if (buf) { + memcpy_from_page(buf, spage, soff, len); + buf += len; + } else { + struct bio_vec bv; + + get_bvec_at(cursor, &bv); + len = min_t(int, len, bv.bv_len); + memcpy_page(bv.bv_page, bv.bv_offset, + spage, soff, len); + ceph_msg_data_advance(cursor, len); + } + spos += len; + ret -= len; + } while (ret); + } +} + static int decrypt_tail(struct ceph_connection *con) { struct sg_table enc_sgt = {}; struct sg_table sgt = {}; + struct page **pages = NULL; + bool sparse = con->in_msg->sparse_read; + int dpos = 0; int tail_len; int ret; @@ -1045,9 +1133,14 @@ static int decrypt_tail(struct ceph_connection *con) if (ret) goto out; + if (sparse) { + dpos = padded_len(front_len(con->in_msg) + padded_len(middle_len(con->in_msg))); + pages = con->v2.in_enc_pages; + } + ret = setup_message_sgs(&sgt, con->in_msg, FRONT_PAD(con->v2.in_buf), - MIDDLE_PAD(con->v2.in_buf), DATA_PAD(con->v2.in_buf), - con->v2.in_buf, true); + MIDDLE_PAD(con->v2.in_buf), DATA_PAD(con->v2.in_buf), + con->v2.in_buf, pages, dpos, true); if (ret) goto out; @@ -1057,6 +1150,12 @@ static int decrypt_tail(struct ceph_connection *con) if (ret) goto out; + if (sparse && data_len(con->in_msg)) { + ret = process_v2_sparse_read(con, con->v2.in_enc_pages, dpos); + if (ret) + goto out; + } + WARN_ON(!con->v2.in_enc_page_cnt); ceph_release_page_vector(con->v2.in_enc_pages, con->v2.in_enc_page_cnt); @@ -1580,7 +1679,7 @@ static int prepare_message_secure(struct ceph_connection *con) encode_epilogue_secure(con, false); ret = setup_message_sgs(&sgt, con->out_msg, zerop, zerop, zerop, - &con->v2.out_epil, false); + &con->v2.out_epil, NULL, 0, false); if (ret) goto out; From patchwork Wed Apr 12 11:08:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208800 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59303C77B6E for ; Wed, 12 Apr 2023 11:12:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229706AbjDLLMZ (ORCPT ); Wed, 12 Apr 2023 07:12:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36028 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229745AbjDLLMX (ORCPT ); Wed, 12 Apr 2023 07:12:23 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD4D67292 for ; Wed, 12 Apr 2023 04:11:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297818; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wFS0stdBhzAYzvMOJKSLzWscmtarNTNR+3mv5w55nCc=; b=fChM+GS8at5kHEwe1EUSegQFnBLcDP1O0ur6jrZK4O65iW625LXhP6YZRHZZbK9ffoNDEO e4k8AD60zwc+jXSBriuLkKzMFRBlAGlL5hBBqWevXR/yLumeRf1SWr8/BH6d48Q2Ruek9o iz8vI4jWZvCgj+efw1d/8ch15BJ2c78= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-450-X_BdbFccN72AwkxizLQ4uQ-1; Wed, 12 Apr 2023 07:10:17 -0400 X-MC-Unique: X_BdbFccN72AwkxizLQ4uQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AF6EF884EC1; Wed, 12 Apr 2023 11:10:16 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id C8F74C15BB8; Wed, 12 Apr 2023 11:10:12 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 06/71] libceph: add sparse read support to msgr1 Date: Wed, 12 Apr 2023 19:08:25 +0800 Message-Id: <20230412110930.176835-7-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Add 2 new fields to ceph_connection_v1_info to track the necessary info in sparse reads. Skip initializing the cursor for a sparse read. Break out read_partial_message_section into a wrapper around a new read_partial_message_chunk function that doesn't zero out the crc first. Add new helper functions to drive receiving into the destinations provided by the sparse_read state machine. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- include/linux/ceph/messenger.h | 4 ++ net/ceph/messenger_v1.c | 98 +++++++++++++++++++++++++++++++--- 2 files changed, 94 insertions(+), 8 deletions(-) diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index 8a6938fa324e..9fd7255172ad 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -336,6 +336,10 @@ struct ceph_connection_v1_info { int in_base_pos; /* bytes read */ + /* sparse reads */ + struct kvec in_sr_kvec; /* current location to receive into */ + u64 in_sr_len; /* amount of data in this extent */ + /* message in temps */ u8 in_tag; /* protocol control byte */ struct ceph_msg_header in_hdr; diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c index d664cb1593a7..f13e96f7f4cc 100644 --- a/net/ceph/messenger_v1.c +++ b/net/ceph/messenger_v1.c @@ -157,9 +157,9 @@ static size_t sizeof_footer(struct ceph_connection *con) static void prepare_message_data(struct ceph_msg *msg, u32 data_len) { - /* Initialize data cursor */ - - ceph_msg_data_cursor_init(&msg->cursor, msg, data_len); + /* Initialize data cursor if it's not a sparse read */ + if (!msg->sparse_read) + ceph_msg_data_cursor_init(&msg->cursor, msg, data_len); } /* @@ -964,9 +964,9 @@ static void process_ack(struct ceph_connection *con) prepare_read_tag(con); } -static int read_partial_message_section(struct ceph_connection *con, - struct kvec *section, - unsigned int sec_len, u32 *crc) +static int read_partial_message_chunk(struct ceph_connection *con, + struct kvec *section, + unsigned int sec_len, u32 *crc) { int ret, left; @@ -982,11 +982,91 @@ static int read_partial_message_section(struct ceph_connection *con, section->iov_len += ret; } if (section->iov_len == sec_len) - *crc = crc32c(0, section->iov_base, section->iov_len); + *crc = crc32c(*crc, section->iov_base, section->iov_len); return 1; } +static inline int read_partial_message_section(struct ceph_connection *con, + struct kvec *section, + unsigned int sec_len, u32 *crc) +{ + *crc = 0; + return read_partial_message_chunk(con, section, sec_len, crc); +} + +static int read_sparse_msg_extent(struct ceph_connection *con, u32 *crc) +{ + struct ceph_msg_data_cursor *cursor = &con->in_msg->cursor; + bool do_bounce = ceph_test_opt(from_msgr(con->msgr), RXBOUNCE); + + if (do_bounce && unlikely(!con->bounce_page)) { + con->bounce_page = alloc_page(GFP_NOIO); + if (!con->bounce_page) { + pr_err("failed to allocate bounce page\n"); + return -ENOMEM; + } + } + + while (cursor->sr_resid > 0) { + struct page *page, *rpage; + size_t off, len; + int ret; + + page = ceph_msg_data_next(cursor, &off, &len); + rpage = do_bounce ? con->bounce_page : page; + + /* clamp to what remains in extent */ + len = min_t(int, len, cursor->sr_resid); + ret = ceph_tcp_recvpage(con->sock, rpage, (int)off, len); + if (ret <= 0) + return ret; + *crc = ceph_crc32c_page(*crc, rpage, off, ret); + ceph_msg_data_advance(cursor, (size_t)ret); + cursor->sr_resid -= ret; + if (do_bounce) + memcpy_page(page, off, rpage, off, ret); + } + return 1; +} + +static int read_sparse_msg_data(struct ceph_connection *con) +{ + struct ceph_msg_data_cursor *cursor = &con->in_msg->cursor; + bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC); + u32 crc = 0; + int ret = 1; + + if (do_datacrc) + crc = con->in_data_crc; + + do { + if (con->v1.in_sr_kvec.iov_base) + ret = read_partial_message_chunk(con, + &con->v1.in_sr_kvec, + con->v1.in_sr_len, + &crc); + else if (cursor->sr_resid > 0) + ret = read_sparse_msg_extent(con, &crc); + + if (ret <= 0) { + if (do_datacrc) + con->in_data_crc = crc; + return ret; + } + + memset(&con->v1.in_sr_kvec, 0, sizeof(con->v1.in_sr_kvec)); + ret = con->ops->sparse_read(con, cursor, + (char **)&con->v1.in_sr_kvec.iov_base); + con->v1.in_sr_len = ret; + } while (ret > 0); + + if (do_datacrc) + con->in_data_crc = crc; + + return ret < 0 ? ret : 1; /* must return > 0 to indicate success */ +} + static int read_partial_msg_data(struct ceph_connection *con) { struct ceph_msg_data_cursor *cursor = &con->in_msg->cursor; @@ -1177,7 +1257,9 @@ static int read_partial_message(struct ceph_connection *con) if (!m->num_data_items) return -EIO; - if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) + if (m->sparse_read) + ret = read_sparse_msg_data(con); + else if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) ret = read_partial_msg_data_bounce(con); else ret = read_partial_msg_data(con); From patchwork Wed Apr 12 11:08:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208803 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 55721C7619A for ; Wed, 12 Apr 2023 11:12:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229710AbjDLLMg (ORCPT ); Wed, 12 Apr 2023 07:12:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36118 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229775AbjDLLMe (ORCPT ); Wed, 12 Apr 2023 07:12:34 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38EDB7ED1 for ; Wed, 12 Apr 2023 04:11:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297824; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3NgKuesa9DeIR7p+BNpXzkC498u9MHe6h6GDUOuUIfQ=; b=XKPjCVLCI6k0gcoeVCKwaSstR5vnw1cXWKLGLUaN4QcwcWe02Ac/QCT28+tj/ogNEIXx5K SEN5hEeDUfIWhiW1xfVStMqBG3E0NVMrP65HAJYqPmi7I/ELIOCT0Hd0VuvXdIfSNVl4kQ 2DXXWORJ7R/UL5Y6ZLD5RDTe9XgHGlg= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-208-cqSHEyvAOfWAv52OfioS0Q-1; Wed, 12 Apr 2023 07:10:21 -0400 X-MC-Unique: cqSHEyvAOfWAv52OfioS0Q-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 42FC4858F09; Wed, 12 Apr 2023 11:10:21 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 81994C15BB8; Wed, 12 Apr 2023 11:10:17 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 07/71] ceph: add new mount option to enable sparse reads Date: Wed, 12 Apr 2023 19:08:26 +0800 Message-Id: <20230412110930.176835-8-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Add a new mount option that has the client issue sparse reads instead of normal ones. The callers now preallocate an sparse extent buffer that the libceph receive code can populate and hand back after the operation completes. After a successful sparse read, we can't use the req->r_result value to determine the amount of data "read", so instead we set the received length to be from the end of the last extent in the buffer. Any interstitial holes will have been filled by the receive code. [Xiubo: fix a double free for req bug reported by Ilya] Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/addr.c | 15 +++++++++++++-- fs/ceph/file.c | 51 +++++++++++++++++++++++++++++++++++++++++-------- fs/ceph/super.c | 16 +++++++++++++++- fs/ceph/super.h | 1 + 4 files changed, 72 insertions(+), 11 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index d5335f445233..7e6abec82be8 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -219,8 +219,10 @@ static void finish_netfs_read(struct ceph_osd_request *req) struct ceph_fs_client *fsc = ceph_inode_to_client(req->r_inode); struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0); struct netfs_io_subrequest *subreq = req->r_priv; + struct ceph_osd_req_op *op = &req->r_ops[0]; int num_pages; int err = req->r_result; + bool sparse = (op->op == CEPH_OSD_OP_SPARSE_READ); ceph_update_read_metrics(&fsc->mdsc->metric, req->r_start_latency, req->r_end_latency, osd_data->length, err); @@ -229,7 +231,9 @@ static void finish_netfs_read(struct ceph_osd_request *req) subreq->len, i_size_read(req->r_inode)); /* no object means success but no data */ - if (err == -ENOENT) + if (sparse && err >= 0) + err = ceph_sparse_ext_map_end(op); + else if (err == -ENOENT) err = 0; else if (err == -EBLOCKLISTED) fsc->blocklisted = true; @@ -312,6 +316,7 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) size_t page_off; int err = 0; u64 len = subreq->len; + bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD); if (ceph_inode_is_shutdown(inode)) { err = -EIO; @@ -322,7 +327,7 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) return; req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, subreq->start, &len, - 0, 1, CEPH_OSD_OP_READ, + 0, 1, sparse ? CEPH_OSD_OP_SPARSE_READ : CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ | fsc->client->osdc.client->options->read_from_replica, NULL, ci->i_truncate_seq, ci->i_truncate_size, false); if (IS_ERR(req)) { @@ -331,6 +336,12 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) goto out; } + if (sparse) { + err = ceph_alloc_sparse_ext_map(&req->r_ops[0]); + if (err) + goto out; + } + dout("%s: pos=%llu orig_len=%zu len=%llu\n", __func__, subreq->start, subreq->len, len); iov_iter_xarray(&iter, ITER_DEST, &rreq->mapping->i_pages, subreq->start, len); err = iov_iter_get_pages_alloc2(&iter, &pages, len, &page_off); diff --git a/fs/ceph/file.c b/fs/ceph/file.c index f4d8bf7dec88..1e2a306aafe4 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -935,6 +935,7 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, u64 off = iocb->ki_pos; u64 len = iov_iter_count(to); u64 i_size = i_size_read(inode); + bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD); dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len, (file->f_flags & O_DIRECT) ? "O_DIRECT" : ""); @@ -961,10 +962,12 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, bool more; int idx; size_t left; + struct ceph_osd_req_op *op; req = ceph_osdc_new_request(osdc, &ci->i_layout, ci->i_vino, off, &len, 0, 1, - CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ, + sparse ? CEPH_OSD_OP_SPARSE_READ : CEPH_OSD_OP_READ, + CEPH_OSD_FLAG_READ, NULL, ci->i_truncate_seq, ci->i_truncate_size, false); if (IS_ERR(req)) { @@ -985,6 +988,16 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_off, false, false); + + op = &req->r_ops[0]; + if (sparse) { + ret = ceph_alloc_sparse_ext_map(op); + if (ret) { + ceph_osdc_put_request(req); + break; + } + } + ceph_osdc_start_request(osdc, req); ret = ceph_osdc_wait_request(osdc, req); @@ -993,19 +1006,24 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, req->r_end_latency, len, ret); - ceph_osdc_put_request(req); - i_size = i_size_read(inode); dout("sync_read %llu~%llu got %zd i_size %llu%s\n", off, len, ret, i_size, (more ? " MORE" : "")); - if (ret == -ENOENT) + /* Fix it to go to end of extent map */ + if (sparse && ret >= 0) + ret = ceph_sparse_ext_map_end(op); + else if (ret == -ENOENT) ret = 0; + + ceph_osdc_put_request(req); + if (ret >= 0 && ret < len && (off + ret < i_size)) { int zlen = min(len - ret, i_size - off - ret); int zoff = page_off + ret; + dout("sync_read zero gap %llu~%llu\n", - off + ret, off + ret + zlen); + off + ret, off + ret + zlen); ceph_zero_page_vector_range(zoff, zlen, pages); ret += zlen; } @@ -1124,8 +1142,10 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req) struct inode *inode = req->r_inode; struct ceph_aio_request *aio_req = req->r_priv; struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0); + struct ceph_osd_req_op *op = &req->r_ops[0]; struct ceph_client_metric *metric = &ceph_sb_to_mdsc(inode->i_sb)->metric; unsigned int len = osd_data->bvec_pos.iter.bi_size; + bool sparse = (op->op == CEPH_OSD_OP_SPARSE_READ); BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS); BUG_ON(!osd_data->num_bvecs); @@ -1146,6 +1166,8 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req) } rc = -ENOMEM; } else if (!aio_req->write) { + if (sparse && rc >= 0) + rc = ceph_sparse_ext_map_end(op); if (rc == -ENOENT) rc = 0; if (rc >= 0 && len > rc) { @@ -1282,6 +1304,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, loff_t pos = iocb->ki_pos; bool write = iov_iter_rw(iter) == WRITE; bool should_dirty = !write && user_backed_iter(iter); + bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD); if (write && ceph_snap(file_inode(file)) != CEPH_NOSNAP) return -EROFS; @@ -1309,6 +1332,8 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, while (iov_iter_count(iter) > 0) { u64 size = iov_iter_count(iter); ssize_t len; + struct ceph_osd_req_op *op; + int readop = sparse ? CEPH_OSD_OP_SPARSE_READ : CEPH_OSD_OP_READ; if (write) size = min_t(u64, size, fsc->mount_options->wsize); @@ -1319,8 +1344,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, pos, &size, 0, 1, - write ? CEPH_OSD_OP_WRITE : - CEPH_OSD_OP_READ, + write ? CEPH_OSD_OP_WRITE : readop, flags, snapc, ci->i_truncate_seq, ci->i_truncate_size, @@ -1371,6 +1395,14 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, } osd_req_op_extent_osd_data_bvecs(req, 0, bvecs, num_pages, len); + op = &req->r_ops[0]; + if (sparse) { + ret = ceph_alloc_sparse_ext_map(op); + if (ret) { + ceph_osdc_put_request(req); + break; + } + } if (aio_req) { aio_req->total_len += len; @@ -1398,8 +1430,11 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, size = i_size_read(inode); if (!write) { - if (ret == -ENOENT) + if (sparse && ret >= 0) + ret = ceph_sparse_ext_map_end(op); + else if (ret == -ENOENT) ret = 0; + if (ret >= 0 && ret < len && pos + ret < size) { struct iov_iter i; int zlen = min_t(size_t, len - ret, diff --git a/fs/ceph/super.c b/fs/ceph/super.c index 3fc48b43cab0..7aaf40669509 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -165,6 +165,7 @@ enum { Opt_copyfrom, Opt_wsync, Opt_pagecache, + Opt_sparseread, }; enum ceph_recover_session_mode { @@ -207,6 +208,7 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = { fsparam_u32 ("wsize", Opt_wsize), fsparam_flag_no ("wsync", Opt_wsync), fsparam_flag_no ("pagecache", Opt_pagecache), + fsparam_flag_no ("sparseread", Opt_sparseread), {} }; @@ -576,6 +578,12 @@ static int ceph_parse_mount_param(struct fs_context *fc, else fsopt->flags &= ~CEPH_MOUNT_OPT_NOPAGECACHE; break; + case Opt_sparseread: + if (result.negated) + fsopt->flags &= ~CEPH_MOUNT_OPT_SPARSEREAD; + else + fsopt->flags |= CEPH_MOUNT_OPT_SPARSEREAD; + break; default: BUG(); } @@ -710,9 +718,10 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root) if (!(fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)) seq_puts(m, ",wsync"); - if (fsopt->flags & CEPH_MOUNT_OPT_NOPAGECACHE) seq_puts(m, ",nopagecache"); + if (fsopt->flags & CEPH_MOUNT_OPT_SPARSEREAD) + seq_puts(m, ",sparseread"); if (fsopt->wsize != CEPH_MAX_WRITE_SIZE) seq_printf(m, ",wsize=%u", fsopt->wsize); @@ -1296,6 +1305,11 @@ static int ceph_reconfigure_fc(struct fs_context *fc) else ceph_clear_mount_opt(fsc, ASYNC_DIROPS); + if (fsopt->flags & CEPH_MOUNT_OPT_SPARSEREAD) + ceph_set_mount_opt(fsc, SPARSEREAD); + else + ceph_clear_mount_opt(fsc, SPARSEREAD); + if (strcmp_null(fsc->mount_options->mon_addr, fsopt->mon_addr)) { kfree(fsc->mount_options->mon_addr); fsc->mount_options->mon_addr = fsopt->mon_addr; diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 0fe76427d314..5d6a1872fef5 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -42,6 +42,7 @@ #define CEPH_MOUNT_OPT_NOCOPYFROM (1<<14) /* don't use RADOS 'copy-from' op */ #define CEPH_MOUNT_OPT_ASYNC_DIROPS (1<<15) /* allow async directory ops */ #define CEPH_MOUNT_OPT_NOPAGECACHE (1<<16) /* bypass pagecache altogether */ +#define CEPH_MOUNT_OPT_SPARSEREAD (1<<17) /* always do sparse reads */ #define CEPH_MOUNT_OPT_DEFAULT \ (CEPH_MOUNT_OPT_DCACHE | \ From patchwork Wed Apr 12 11:08:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208806 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59CCCC77B6E for ; Wed, 12 Apr 2023 11:12:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229909AbjDLLMq (ORCPT ); Wed, 12 Apr 2023 07:12:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36364 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229897AbjDLLMp (ORCPT ); Wed, 12 Apr 2023 07:12:45 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD00283E2 for ; Wed, 12 Apr 2023 04:11:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297829; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KMYBsFykNygeYVW3EeZY7i6UE4aJDXgvxoDpF1dzwWo=; b=UVW8l5a7HOZ1t5ijqRA/FSfYcO1AOpS49jjOh+OVThIQk58dVe5GlsnsFDldrKRLL0S6qn Ye/m75NUlta/dZJxG/DhgkrfICM7rg2LCa2FDeqLum7iYYrPfU8ehjZctQ4U9ewfoO1sjI T8pFLrtvlLsoiBqrtptrGcuoVgKliD8= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-385-Ehkm5g06NV2nrJz3P7-gxA-1; Wed, 12 Apr 2023 07:10:26 -0400 X-MC-Unique: Ehkm5g06NV2nrJz3P7-gxA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DDA3F185A78B; Wed, 12 Apr 2023 11:10:25 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3E6B6C15BB8; Wed, 12 Apr 2023 11:10:21 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 08/71] ceph: preallocate inode for ops that may create one Date: Wed, 12 Apr 2023 19:08:27 +0800 Message-Id: <20230412110930.176835-9-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton When creating a new inode, we need to determine the crypto context before we can transmit the RPC. The fscrypt API has a routine for getting a crypto context before a create occurs, but it requires an inode. Change the ceph code to preallocate an inode in advance of a create of any sort (open(), mknod(), symlink(), etc). Move the existing code that generates the ACL and SELinux blobs into this routine since that's mostly common across all the different codepaths. In most cases, we just want to allow ceph_fill_trace to use that inode after the reply comes in, so add a new field to the MDS request for it (r_new_inode). The async create codepath is a bit different though. In that case, we want to hash the inode in advance of the RPC so that it can be used before the reply comes in. If the call subsequently fails with -EJUKEBOX, then just put the references and clean up the as_ctx. Note that with this change, we now need to regenerate the as_ctx when this occurs, but it's quite rare for it to happen. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/dir.c | 70 ++++++++++++++++++++----------------- fs/ceph/file.c | 59 ++++++++++++++++++------------- fs/ceph/inode.c | 82 ++++++++++++++++++++++++++++++++++++++++---- fs/ceph/mds_client.c | 13 +++++-- fs/ceph/mds_client.h | 1 + fs/ceph/super.h | 7 +++- 6 files changed, 167 insertions(+), 65 deletions(-) diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index 0ced8b570e42..ccbbd39a5eb9 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -865,13 +865,6 @@ static int ceph_mknod(struct mnt_idmap *idmap, struct inode *dir, goto out; } - err = ceph_pre_init_acls(dir, &mode, &as_ctx); - if (err < 0) - goto out; - err = ceph_security_init_secctx(dentry, mode, &as_ctx); - if (err < 0) - goto out; - dout("mknod in dir %p dentry %p mode 0%ho rdev %d\n", dir, dentry, mode, rdev); req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS); @@ -879,6 +872,14 @@ static int ceph_mknod(struct mnt_idmap *idmap, struct inode *dir, err = PTR_ERR(req); goto out; } + + req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx); + if (IS_ERR(req->r_new_inode)) { + err = PTR_ERR(req->r_new_inode); + req->r_new_inode = NULL; + goto out_req; + } + req->r_dentry = dget(dentry); req->r_num_caps = 2; req->r_parent = dir; @@ -888,13 +889,13 @@ static int ceph_mknod(struct mnt_idmap *idmap, struct inode *dir, req->r_args.mknod.rdev = cpu_to_le32(rdev); req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL; req->r_dentry_unless = CEPH_CAP_FILE_EXCL; - if (as_ctx.pagelist) { - req->r_pagelist = as_ctx.pagelist; - as_ctx.pagelist = NULL; - } + + ceph_as_ctx_to_req(req, &as_ctx); + err = ceph_mdsc_do_request(mdsc, dir, req); if (!err && !req->r_reply_info.head->is_dentry) err = ceph_handle_notrace_create(dir, dentry); +out_req: ceph_mdsc_put_request(req); out: if (!err) @@ -917,6 +918,7 @@ static int ceph_symlink(struct mnt_idmap *idmap, struct inode *dir, struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb); struct ceph_mds_request *req; struct ceph_acl_sec_ctx as_ctx = {}; + umode_t mode = S_IFLNK | 0777; int err; if (ceph_snap(dir) != CEPH_NOSNAP) @@ -931,21 +933,24 @@ static int ceph_symlink(struct mnt_idmap *idmap, struct inode *dir, goto out; } - err = ceph_security_init_secctx(dentry, S_IFLNK | 0777, &as_ctx); - if (err < 0) - goto out; - dout("symlink in dir %p dentry %p to '%s'\n", dir, dentry, dest); req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SYMLINK, USE_AUTH_MDS); if (IS_ERR(req)) { err = PTR_ERR(req); goto out; } + + req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx); + if (IS_ERR(req->r_new_inode)) { + err = PTR_ERR(req->r_new_inode); + req->r_new_inode = NULL; + goto out_req; + } + req->r_path2 = kstrdup(dest, GFP_KERNEL); if (!req->r_path2) { err = -ENOMEM; - ceph_mdsc_put_request(req); - goto out; + goto out_req; } req->r_parent = dir; ihold(dir); @@ -955,13 +960,13 @@ static int ceph_symlink(struct mnt_idmap *idmap, struct inode *dir, req->r_num_caps = 2; req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL; req->r_dentry_unless = CEPH_CAP_FILE_EXCL; - if (as_ctx.pagelist) { - req->r_pagelist = as_ctx.pagelist; - as_ctx.pagelist = NULL; - } + + ceph_as_ctx_to_req(req, &as_ctx); + err = ceph_mdsc_do_request(mdsc, dir, req); if (!err && !req->r_reply_info.head->is_dentry) err = ceph_handle_notrace_create(dir, dentry); +out_req: ceph_mdsc_put_request(req); out: if (err) @@ -1002,13 +1007,6 @@ static int ceph_mkdir(struct mnt_idmap *idmap, struct inode *dir, goto out; } - mode |= S_IFDIR; - err = ceph_pre_init_acls(dir, &mode, &as_ctx); - if (err < 0) - goto out; - err = ceph_security_init_secctx(dentry, mode, &as_ctx); - if (err < 0) - goto out; req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS); if (IS_ERR(req)) { @@ -1016,6 +1014,14 @@ static int ceph_mkdir(struct mnt_idmap *idmap, struct inode *dir, goto out; } + mode |= S_IFDIR; + req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx); + if (IS_ERR(req->r_new_inode)) { + err = PTR_ERR(req->r_new_inode); + req->r_new_inode = NULL; + goto out_req; + } + req->r_dentry = dget(dentry); req->r_num_caps = 2; req->r_parent = dir; @@ -1024,15 +1030,15 @@ static int ceph_mkdir(struct mnt_idmap *idmap, struct inode *dir, req->r_args.mkdir.mode = cpu_to_le32(mode); req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL; req->r_dentry_unless = CEPH_CAP_FILE_EXCL; - if (as_ctx.pagelist) { - req->r_pagelist = as_ctx.pagelist; - as_ctx.pagelist = NULL; - } + + ceph_as_ctx_to_req(req, &as_ctx); + err = ceph_mdsc_do_request(mdsc, dir, req); if (!err && !req->r_reply_info.head->is_target && !req->r_reply_info.head->is_dentry) err = ceph_handle_notrace_create(dir, dentry); +out_req: ceph_mdsc_put_request(req); out: if (!err) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 1e2a306aafe4..e4dadad50f9b 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -604,7 +604,8 @@ static void ceph_async_create_cb(struct ceph_mds_client *mdsc, ceph_mdsc_release_dir_caps(req); } -static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry, +static int ceph_finish_async_create(struct inode *dir, struct inode *inode, + struct dentry *dentry, struct file *file, umode_t mode, struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as_ctx, @@ -616,7 +617,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry, struct ceph_mds_reply_info_in iinfo = { .in = &in }; struct ceph_inode_info *ci = ceph_inode(dir); struct ceph_dentry_info *di = ceph_dentry(dentry); - struct inode *inode; struct timespec64 now; struct ceph_string *pool_ns; struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb); @@ -625,10 +625,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry, ktime_get_real_ts64(&now); - inode = ceph_get_inode(dentry->d_sb, vino); - if (IS_ERR(inode)) - return PTR_ERR(inode); - iinfo.inline_version = CEPH_INLINE_NONE; iinfo.change_attr = 1; ceph_encode_timespec64(&iinfo.btime, &now); @@ -686,8 +682,7 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry, ceph_dir_clear_complete(dir); if (!d_unhashed(dentry)) d_drop(dentry); - if (inode->i_state & I_NEW) - discard_new_inode(inode); + discard_new_inode(inode); } else { struct dentry *dn; @@ -733,6 +728,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb); struct ceph_mds_client *mdsc = fsc->mdsc; struct ceph_mds_request *req; + struct inode *new_inode = NULL; struct dentry *dn; struct ceph_acl_sec_ctx as_ctx = {}; bool try_async = ceph_test_mount_opt(fsc, ASYNC_DIROPS); @@ -755,15 +751,16 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, */ flags &= ~O_TRUNC; +retry: if (flags & O_CREAT) { if (ceph_quota_is_max_files_exceeded(dir)) return -EDQUOT; - err = ceph_pre_init_acls(dir, &mode, &as_ctx); - if (err < 0) - return err; - err = ceph_security_init_secctx(dentry, mode, &as_ctx); - if (err < 0) + + new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx); + if (IS_ERR(new_inode)) { + err = PTR_ERR(new_inode); goto out_ctx; + } /* Async create can't handle more than a page of xattrs */ if (as_ctx.pagelist && !list_is_singular(&as_ctx.pagelist->head)) @@ -772,7 +769,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, /* If it's not being looked up, it's negative */ return -ENOENT; } -retry: + /* do the open */ req = prepare_open_request(dir->i_sb, flags, mode); if (IS_ERR(req)) { @@ -793,32 +790,45 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL; req->r_dentry_unless = CEPH_CAP_FILE_EXCL; - if (as_ctx.pagelist) { - req->r_pagelist = as_ctx.pagelist; - as_ctx.pagelist = NULL; - } - if (try_async && - (req->r_dir_caps = - try_prep_async_create(dir, dentry, &lo, - &req->r_deleg_ino))) { + + ceph_as_ctx_to_req(req, &as_ctx); + + if (try_async && (req->r_dir_caps = + try_prep_async_create(dir, dentry, &lo, &req->r_deleg_ino))) { + struct ceph_vino vino = { .ino = req->r_deleg_ino, + .snap = CEPH_NOSNAP }; struct ceph_dentry_info *di = ceph_dentry(dentry); set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags); req->r_args.open.flags |= cpu_to_le32(CEPH_O_EXCL); req->r_callback = ceph_async_create_cb; + /* Hash inode before RPC */ + new_inode = ceph_get_inode(dir->i_sb, vino, new_inode); + if (IS_ERR(new_inode)) { + err = PTR_ERR(new_inode); + new_inode = NULL; + goto out_req; + } + WARN_ON_ONCE(!(new_inode->i_state & I_NEW)); + spin_lock(&dentry->d_lock); di->flags |= CEPH_DENTRY_ASYNC_CREATE; spin_unlock(&dentry->d_lock); err = ceph_mdsc_submit_request(mdsc, dir, req); if (!err) { - err = ceph_finish_async_create(dir, dentry, + err = ceph_finish_async_create(dir, new_inode, dentry, file, mode, req, &as_ctx, &lo); + new_inode = NULL; } else if (err == -EJUKEBOX) { restore_deleg_ino(dir, req->r_deleg_ino); ceph_mdsc_put_request(req); + discard_new_inode(new_inode); + ceph_release_acl_sec_ctx(&as_ctx); + memset(&as_ctx, 0, sizeof(as_ctx)); + new_inode = NULL; try_async = false; ceph_put_string(rcu_dereference_raw(lo.pool_ns)); goto retry; @@ -829,6 +839,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, } set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags); + req->r_new_inode = new_inode; + new_inode = NULL; err = ceph_mdsc_do_request(mdsc, (flags & O_CREAT) ? dir : NULL, req); if (err == -ENOENT) { dentry = ceph_handle_snapdir(req, dentry); @@ -869,6 +881,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, } out_req: ceph_mdsc_put_request(req); + iput(new_inode); out_ctx: ceph_release_acl_sec_ctx(&as_ctx); dout("atomic_open result=%d\n", err); diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 30ff541c919e..28ce848fcb4a 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -52,17 +52,85 @@ static int ceph_set_ino_cb(struct inode *inode, void *data) return 0; } -struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino) +/** + * ceph_new_inode - allocate a new inode in advance of an expected create + * @dir: parent directory for new inode + * @dentry: dentry that may eventually point to new inode + * @mode: mode of new inode + * @as_ctx: pointer to inherited security context + * + * Allocate a new inode in advance of an operation to create a new inode. + * This allocates the inode and sets up the acl_sec_ctx with appropriate + * info for the new inode. + * + * Returns a pointer to the new inode or an ERR_PTR. + */ +struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry, + umode_t *mode, struct ceph_acl_sec_ctx *as_ctx) +{ + int err; + struct inode *inode; + + inode = new_inode(dir->i_sb); + if (!inode) + return ERR_PTR(-ENOMEM); + + if (!S_ISLNK(*mode)) { + err = ceph_pre_init_acls(dir, mode, as_ctx); + if (err < 0) + goto out_err; + } + + err = ceph_security_init_secctx(dentry, *mode, as_ctx); + if (err < 0) + goto out_err; + + inode->i_state = 0; + inode->i_mode = *mode; + return inode; +out_err: + iput(inode); + return ERR_PTR(err); +} + +void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as_ctx) +{ + if (as_ctx->pagelist) { + req->r_pagelist = as_ctx->pagelist; + as_ctx->pagelist = NULL; + } +} + +/** + * ceph_get_inode - find or create/hash a new inode + * @sb: superblock to search and allocate in + * @vino: vino to search for + * @newino: optional new inode to insert if one isn't found (may be NULL) + * + * Search for or insert a new inode into the hash for the given vino, and return a + * reference to it. If new is non-NULL, its reference is consumed. + */ +struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino, struct inode *newino) { struct inode *inode; if (ceph_vino_is_reserved(vino)) return ERR_PTR(-EREMOTEIO); - inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare, - ceph_set_ino_cb, &vino); - if (!inode) + if (newino) { + inode = inode_insert5(newino, (unsigned long)vino.ino, ceph_ino_compare, + ceph_set_ino_cb, &vino); + if (inode != newino) + iput(newino); + } else { + inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare, + ceph_set_ino_cb, &vino); + } + + if (!inode) { + dout("No inode found for %llx.%llx\n", vino.ino, vino.snap); return ERR_PTR(-ENOMEM); + } dout("get_inode on %llu=%llx.%llx got %p new %d\n", ceph_present_inode(inode), ceph_vinop(inode), inode, !!(inode->i_state & I_NEW)); @@ -78,7 +146,7 @@ struct inode *ceph_get_snapdir(struct inode *parent) .ino = ceph_ino(parent), .snap = CEPH_SNAPDIR, }; - struct inode *inode = ceph_get_inode(parent->i_sb, vino); + struct inode *inode = ceph_get_inode(parent->i_sb, vino, NULL); struct ceph_inode_info *ci = ceph_inode(inode); if (IS_ERR(inode)) @@ -1552,7 +1620,7 @@ static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req, vino.ino = le64_to_cpu(rde->inode.in->ino); vino.snap = le64_to_cpu(rde->inode.in->snapid); - in = ceph_get_inode(req->r_dentry->d_sb, vino); + in = ceph_get_inode(req->r_dentry->d_sb, vino, NULL); if (IS_ERR(in)) { err = PTR_ERR(in); dout("new_inode badness got %d\n", err); @@ -1754,7 +1822,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req, if (d_really_is_positive(dn)) { in = d_inode(dn); } else { - in = ceph_get_inode(parent->d_sb, tvino); + in = ceph_get_inode(parent->d_sb, tvino, NULL); if (IS_ERR(in)) { dout("new_inode badness\n"); d_drop(dn); diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index ecaf0ef11bf2..dcb25724b693 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -944,6 +944,7 @@ void ceph_mdsc_release_request(struct kref *kref) iput(req->r_parent); } iput(req->r_target_inode); + iput(req->r_new_inode); if (req->r_dentry) dput(req->r_dentry); if (req->r_old_dentry) @@ -3366,13 +3367,21 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg) /* Must find target inode outside of mutexes to avoid deadlocks */ if ((err >= 0) && rinfo->head->is_target) { - struct inode *in; + struct inode *in = xchg(&req->r_new_inode, NULL); struct ceph_vino tvino = { .ino = le64_to_cpu(rinfo->targeti.in->ino), .snap = le64_to_cpu(rinfo->targeti.in->snapid) }; - in = ceph_get_inode(mdsc->fsc->sb, tvino); + /* If we ended up opening an existing inode, discard r_new_inode */ + if (req->r_op == CEPH_MDS_OP_CREATE && !req->r_reply_info.has_create_ino) { + /* This should never happen on an async create */ + WARN_ON_ONCE(req->r_deleg_ino); + iput(in); + in = NULL; + } + + in = ceph_get_inode(mdsc->fsc->sb, tvino, in); if (IS_ERR(in)) { err = PTR_ERR(in); mutex_lock(&session->s_mutex); diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index 0598faa50e2e..ceef487abf7f 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -263,6 +263,7 @@ struct ceph_mds_request { struct inode *r_parent; /* parent dir inode */ struct inode *r_target_inode; /* resulting inode */ + struct inode *r_new_inode; /* new inode (for creates) */ #define CEPH_MDS_R_DIRECT_IS_HASH (1) /* r_direct_hash is valid */ #define CEPH_MDS_R_ABORTED (2) /* call was aborted */ diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 5d6a1872fef5..9eaa4a27b92d 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -988,6 +988,7 @@ static inline bool __ceph_have_pending_cap_snap(struct ceph_inode_info *ci) /* inode.c */ struct ceph_mds_reply_info_in; struct ceph_mds_reply_dirfrag; +struct ceph_acl_sec_ctx; extern const struct inode_operations ceph_file_iops; @@ -995,8 +996,12 @@ extern struct inode *ceph_alloc_inode(struct super_block *sb); extern void ceph_evict_inode(struct inode *inode); extern void ceph_free_inode(struct inode *inode); +struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry, + umode_t *mode, struct ceph_acl_sec_ctx *as_ctx); +void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as_ctx); + extern struct inode *ceph_get_inode(struct super_block *sb, - struct ceph_vino vino); + struct ceph_vino vino, struct inode *newino); extern struct inode *ceph_get_snapdir(struct inode *parent); extern int ceph_fill_file_size(struct inode *inode, int issued, u32 truncate_seq, u64 truncate_size, u64 size); From patchwork Wed Apr 12 11:08:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208802 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 32D8DC77B6E for ; Wed, 12 Apr 2023 11:12:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229788AbjDLLMe (ORCPT ); Wed, 12 Apr 2023 07:12:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35984 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229721AbjDLLMd (ORCPT ); Wed, 12 Apr 2023 07:12:33 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C852783E7 for ; Wed, 12 Apr 2023 04:11:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297834; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LlTCD3+Z87yxA2E+H+VHEghSNKkZRUv9GKLQzyqryIk=; b=doTgu+Uxi/pjH3vWkb2Iulx0n2KFkbA2Yd0/MuZT4+6NpgNJEH3/WdvkSG3xV0h5JM+Zeu Cf2ef7QMpERZW2JEAPJ5mz2ICQSoLAv1vQ0BySw6GN+LGAYKSbgXEhuDf/MaFp4iVGVaLG JMBF020rfB7WGCmQ7MK6LE1V4++8+5k= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-578-3dQxY2AnO0usHskVMMOaWg-1; Wed, 12 Apr 2023 07:10:31 -0400 X-MC-Unique: 3dQxY2AnO0usHskVMMOaWg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9D0C028237F3; Wed, 12 Apr 2023 11:10:30 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id AEBE4C15BB8; Wed, 12 Apr 2023 11:10:26 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 09/71] ceph: make ceph_msdc_build_path use ref-walk Date: Wed, 12 Apr 2023 19:08:28 +0800 Message-Id: <20230412110930.176835-10-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Encryption potentially requires allocation, at which point we'll need to be in a non-atomic context. Convert ceph_msdc_build_path to take dentry spinlocks and references instead of using rcu_read_lock to walk the path. This is slightly less efficient, and we may want to eventually allow using RCU when the leaf dentry isn't encrypted. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/mds_client.c | 35 +++++++++++++++++++---------------- 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index dcb25724b693..0aa2c97a0460 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -2400,7 +2400,8 @@ static inline u64 __get_oldest_tid(struct ceph_mds_client *mdsc) char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, int stop_on_nosnap) { - struct dentry *temp; + struct dentry *cur; + struct inode *inode; char *path; int pos; unsigned seq; @@ -2417,34 +2418,35 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, path[pos] = '\0'; seq = read_seqbegin(&rename_lock); - rcu_read_lock(); - temp = dentry; + cur = dget(dentry); for (;;) { - struct inode *inode; + struct dentry *temp; - spin_lock(&temp->d_lock); - inode = d_inode(temp); + spin_lock(&cur->d_lock); + inode = d_inode(cur); if (inode && ceph_snap(inode) == CEPH_SNAPDIR) { dout("build_path path+%d: %p SNAPDIR\n", - pos, temp); - } else if (stop_on_nosnap && inode && dentry != temp && + pos, cur); + } else if (stop_on_nosnap && inode && dentry != cur && ceph_snap(inode) == CEPH_NOSNAP) { - spin_unlock(&temp->d_lock); + spin_unlock(&cur->d_lock); pos++; /* get rid of any prepended '/' */ break; } else { - pos -= temp->d_name.len; + pos -= cur->d_name.len; if (pos < 0) { - spin_unlock(&temp->d_lock); + spin_unlock(&cur->d_lock); break; } - memcpy(path + pos, temp->d_name.name, temp->d_name.len); + memcpy(path + pos, cur->d_name.name, cur->d_name.len); } + temp = cur; spin_unlock(&temp->d_lock); - temp = READ_ONCE(temp->d_parent); + cur = dget_parent(temp); + dput(temp); /* Are we at the root? */ - if (IS_ROOT(temp)) + if (IS_ROOT(cur)) break; /* Are we out of buffer? */ @@ -2453,8 +2455,9 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, path[pos] = '/'; } - base = ceph_ino(d_inode(temp)); - rcu_read_unlock(); + inode = d_inode(cur); + base = inode ? ceph_ino(inode) : 0; + dput(cur); if (read_seqretry(&rename_lock, seq)) goto retry; From patchwork Wed Apr 12 11:08:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208896 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1DD2FC77B6E for ; Wed, 12 Apr 2023 11:28:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231173AbjDLL2w (ORCPT ); Wed, 12 Apr 2023 07:28:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231137AbjDLL2W (ORCPT ); Wed, 12 Apr 2023 07:28:22 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 380AE7D92 for ; Wed, 12 Apr 2023 04:27:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298826; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0Tq6og1gD4cWcZZzfNpB0nynWplvwZW81PZxhNq0PL0=; b=Z8K8BP35GqnGhf6QlhSs+DjSUQSsCfAj1hUjmZ27j24ffs2L+5wUJtqO3GoYeE13T34IQ2 4SG1+XGgTwLqtV/WfMFed5oMB5Szj7z6ihU/qcahxALlnGVW9osgu/SvL5bHFPLgKwQm7a 6XriST6CIgFPeCHQww/rl90DQMg2cj4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-587-w5PjhoGiPVyo2tiWh0L1AA-1; Wed, 12 Apr 2023 07:10:37 -0400 X-MC-Unique: w5PjhoGiPVyo2tiWh0L1AA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 955A7101A552; Wed, 12 Apr 2023 11:10:36 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7FE7AC15BBA; Wed, 12 Apr 2023 11:10:31 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Al Viro , David Howells , Xiubo Li Subject: [PATCH v18 10/71] libceph: add new iov_iter-based ceph_msg_data_type and ceph_osd_data_type Date: Wed, 12 Apr 2023 19:08:29 +0800 Message-Id: <20230412110930.176835-11-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Add an iov_iter to the unions in ceph_msg_data and ceph_msg_data_cursor. Instead of requiring a list of pages or bvecs, we can just use an iov_iter directly, and avoid extra allocations. We assume that the pages represented by the iter are pinned such that they shouldn't incur page faults, which is the case for the iov_iters created by netfs. While working on this, Al Viro informed me that he was going to change iov_iter_get_pages to auto-advance the iterator as that pattern is more or less required for ITER_PIPE anyway. We emulate that here for now by advancing in the _next op and tracking that amount in the "lastlen" field. In the event that _next is called twice without an intervening _advance, we revert the iov_iter by the remaining lastlen before calling iov_iter_get_pages. Cc: Al Viro Cc: David Howells Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Jeff Layton Signed-off-by: Xiubo Li --- include/linux/ceph/messenger.h | 8 ++++ include/linux/ceph/osd_client.h | 4 ++ net/ceph/messenger.c | 78 +++++++++++++++++++++++++++++++++ net/ceph/osd_client.c | 27 ++++++++++++ 4 files changed, 117 insertions(+) diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index 9fd7255172ad..2eaaabbe98cb 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -123,6 +123,7 @@ enum ceph_msg_data_type { CEPH_MSG_DATA_BIO, /* data source/destination is a bio list */ #endif /* CONFIG_BLOCK */ CEPH_MSG_DATA_BVECS, /* data source/destination is a bio_vec array */ + CEPH_MSG_DATA_ITER, /* data source/destination is an iov_iter */ }; #ifdef CONFIG_BLOCK @@ -224,6 +225,7 @@ struct ceph_msg_data { bool own_pages; }; struct ceph_pagelist *pagelist; + struct iov_iter iter; }; }; @@ -248,6 +250,10 @@ struct ceph_msg_data_cursor { struct page *page; /* page from list */ size_t offset; /* bytes from list */ }; + struct { + struct iov_iter iov_iter; + unsigned int lastlen; + }; }; }; @@ -605,6 +611,8 @@ void ceph_msg_data_add_bio(struct ceph_msg *msg, struct ceph_bio_iter *bio_pos, #endif /* CONFIG_BLOCK */ void ceph_msg_data_add_bvecs(struct ceph_msg *msg, struct ceph_bvec_iter *bvec_pos); +void ceph_msg_data_add_iter(struct ceph_msg *msg, + struct iov_iter *iter); struct ceph_msg *ceph_msg_new2(int type, int front_len, int max_data_items, gfp_t flags, bool can_fail); diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 460881c93f9a..680b4a70f6d4 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -108,6 +108,7 @@ enum ceph_osd_data_type { CEPH_OSD_DATA_TYPE_BIO, #endif /* CONFIG_BLOCK */ CEPH_OSD_DATA_TYPE_BVECS, + CEPH_OSD_DATA_TYPE_ITER, }; struct ceph_osd_data { @@ -131,6 +132,7 @@ struct ceph_osd_data { struct ceph_bvec_iter bvec_pos; u32 num_bvecs; }; + struct iov_iter iter; }; }; @@ -501,6 +503,8 @@ void osd_req_op_extent_osd_data_bvecs(struct ceph_osd_request *osd_req, void osd_req_op_extent_osd_data_bvec_pos(struct ceph_osd_request *osd_req, unsigned int which, struct ceph_bvec_iter *bvec_pos); +void osd_req_op_extent_osd_iter(struct ceph_osd_request *osd_req, + unsigned int which, struct iov_iter *iter); extern void osd_req_op_cls_request_data_pagelist(struct ceph_osd_request *, unsigned int which, diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index 3bc3c72a6d4f..9dce65fac0bd 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -969,6 +969,63 @@ static bool ceph_msg_data_pagelist_advance(struct ceph_msg_data_cursor *cursor, return true; } +static void ceph_msg_data_iter_cursor_init(struct ceph_msg_data_cursor *cursor, + size_t length) +{ + struct ceph_msg_data *data = cursor->data; + + cursor->iov_iter = data->iter; + cursor->lastlen = 0; + iov_iter_truncate(&cursor->iov_iter, length); + cursor->resid = iov_iter_count(&cursor->iov_iter); +} + +static struct page *ceph_msg_data_iter_next(struct ceph_msg_data_cursor *cursor, + size_t *page_offset, + size_t *length) +{ + struct page *page; + ssize_t len; + + if (cursor->lastlen) + iov_iter_revert(&cursor->iov_iter, cursor->lastlen); + + len = iov_iter_get_pages2(&cursor->iov_iter, &page, PAGE_SIZE, + 1, page_offset); + BUG_ON(len < 0); + + cursor->lastlen = len; + + /* + * FIXME: The assumption is that the pages represented by the iov_iter + * are pinned, with the references held by the upper-level + * callers, or by virtue of being under writeback. Eventually, + * we'll get an iov_iter_get_pages2 variant that doesn't take page + * refs. Until then, just put the page ref. + */ + VM_BUG_ON_PAGE(!PageWriteback(page) && page_count(page) < 2, page); + put_page(page); + + *length = min_t(size_t, len, cursor->resid); + return page; +} + +static bool ceph_msg_data_iter_advance(struct ceph_msg_data_cursor *cursor, + size_t bytes) +{ + BUG_ON(bytes > cursor->resid); + cursor->resid -= bytes; + + if (bytes < cursor->lastlen) { + cursor->lastlen -= bytes; + } else { + iov_iter_advance(&cursor->iov_iter, bytes - cursor->lastlen); + cursor->lastlen = 0; + } + + return cursor->resid; +} + /* * Message data is handled (sent or received) in pieces, where each * piece resides on a single page. The network layer might not @@ -996,6 +1053,9 @@ static void __ceph_msg_data_cursor_init(struct ceph_msg_data_cursor *cursor) case CEPH_MSG_DATA_BVECS: ceph_msg_data_bvecs_cursor_init(cursor, length); break; + case CEPH_MSG_DATA_ITER: + ceph_msg_data_iter_cursor_init(cursor, length); + break; case CEPH_MSG_DATA_NONE: default: /* BUG(); */ @@ -1043,6 +1103,9 @@ struct page *ceph_msg_data_next(struct ceph_msg_data_cursor *cursor, case CEPH_MSG_DATA_BVECS: page = ceph_msg_data_bvecs_next(cursor, page_offset, length); break; + case CEPH_MSG_DATA_ITER: + page = ceph_msg_data_iter_next(cursor, page_offset, length); + break; case CEPH_MSG_DATA_NONE: default: page = NULL; @@ -1081,6 +1144,9 @@ void ceph_msg_data_advance(struct ceph_msg_data_cursor *cursor, size_t bytes) case CEPH_MSG_DATA_BVECS: new_piece = ceph_msg_data_bvecs_advance(cursor, bytes); break; + case CEPH_MSG_DATA_ITER: + new_piece = ceph_msg_data_iter_advance(cursor, bytes); + break; case CEPH_MSG_DATA_NONE: default: BUG(); @@ -1879,6 +1945,18 @@ void ceph_msg_data_add_bvecs(struct ceph_msg *msg, } EXPORT_SYMBOL(ceph_msg_data_add_bvecs); +void ceph_msg_data_add_iter(struct ceph_msg *msg, + struct iov_iter *iter) +{ + struct ceph_msg_data *data; + + data = ceph_msg_data_add(msg); + data->type = CEPH_MSG_DATA_ITER; + data->iter = *iter; + + msg->data_length += iov_iter_count(&data->iter); +} + /* * construct a new message with given type, size * the new msg has a ref count of 1. diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 8534ca9c39b9..9138abc07f45 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -171,6 +171,13 @@ static void ceph_osd_data_bvecs_init(struct ceph_osd_data *osd_data, osd_data->num_bvecs = num_bvecs; } +static void ceph_osd_iter_init(struct ceph_osd_data *osd_data, + struct iov_iter *iter) +{ + osd_data->type = CEPH_OSD_DATA_TYPE_ITER; + osd_data->iter = *iter; +} + static struct ceph_osd_data * osd_req_op_raw_data_in(struct ceph_osd_request *osd_req, unsigned int which) { @@ -264,6 +271,22 @@ void osd_req_op_extent_osd_data_bvec_pos(struct ceph_osd_request *osd_req, } EXPORT_SYMBOL(osd_req_op_extent_osd_data_bvec_pos); +/** + * osd_req_op_extent_osd_iter - Set up an operation with an iterator buffer + * @osd_req: The request to set up + * @which: Index of the operation in which to set the iter + * @iter: The buffer iterator + */ +void osd_req_op_extent_osd_iter(struct ceph_osd_request *osd_req, + unsigned int which, struct iov_iter *iter) +{ + struct ceph_osd_data *osd_data; + + osd_data = osd_req_op_data(osd_req, which, extent, osd_data); + ceph_osd_iter_init(osd_data, iter); +} +EXPORT_SYMBOL(osd_req_op_extent_osd_iter); + static void osd_req_op_cls_request_info_pagelist( struct ceph_osd_request *osd_req, unsigned int which, struct ceph_pagelist *pagelist) @@ -346,6 +369,8 @@ static u64 ceph_osd_data_length(struct ceph_osd_data *osd_data) #endif /* CONFIG_BLOCK */ case CEPH_OSD_DATA_TYPE_BVECS: return osd_data->bvec_pos.iter.bi_size; + case CEPH_OSD_DATA_TYPE_ITER: + return iov_iter_count(&osd_data->iter); default: WARN(true, "unrecognized data type %d\n", (int)osd_data->type); return 0; @@ -954,6 +979,8 @@ static void ceph_osdc_msg_data_add(struct ceph_msg *msg, #endif } else if (osd_data->type == CEPH_OSD_DATA_TYPE_BVECS) { ceph_msg_data_add_bvecs(msg, &osd_data->bvec_pos); + } else if (osd_data->type == CEPH_OSD_DATA_TYPE_ITER) { + ceph_msg_data_add_iter(msg, &osd_data->iter); } else { BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_NONE); } From patchwork Wed Apr 12 11:08:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208893 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DA47C77B6E for ; Wed, 12 Apr 2023 11:26:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229972AbjDLL0O (ORCPT ); Wed, 12 Apr 2023 07:26:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229482AbjDLL0N (ORCPT ); Wed, 12 Apr 2023 07:26:13 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 52E1B6A6F for ; Wed, 12 Apr 2023 04:25:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298701; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=a5OV+lCCs9r8zLwuG78ML4bN1ADnyl3nHUWR4xi6tkw=; b=NL+1y+SC5R0xcts9KjiUG1uaMxyLDsePgqkHhz6BI4sbZV3bC6UzMh+Cw722PPiL4L/QGY ABwJ9gYJ3jM7gbnpemywSzugqsFDR00KSXh3HjwOcBpyW/biNmqF0nKOsm1KiEz7C00Hil jkVFeGU8EyjA9No+2XAVpwVoMDxETrQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-628-QLviYB4RPy-yesbH2hW3OA-1; Wed, 12 Apr 2023 07:10:41 -0400 X-MC-Unique: QLviYB4RPy-yesbH2hW3OA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1F7E38996E1; Wed, 12 Apr 2023 11:10:41 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6285DC15BBA; Wed, 12 Apr 2023 11:10:36 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 11/71] ceph: use osd_req_op_extent_osd_iter for netfs reads Date: Wed, 12 Apr 2023 19:08:30 +0800 Message-Id: <20230412110930.176835-12-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton The netfs layer has already pinned the pages involved before calling issue_op, so we can just pass down the iter directly instead of calling iov_iter_get_pages_alloc. Instead of having to allocate a page array, use CEPH_MSG_DATA_ITER and pass it the iov_iter directly to clone. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Jeff Layton Signed-off-by: Xiubo Li --- fs/ceph/addr.c | 19 +------------------ 1 file changed, 1 insertion(+), 18 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 7e6abec82be8..5dca574e7f05 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -220,7 +220,6 @@ static void finish_netfs_read(struct ceph_osd_request *req) struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0); struct netfs_io_subrequest *subreq = req->r_priv; struct ceph_osd_req_op *op = &req->r_ops[0]; - int num_pages; int err = req->r_result; bool sparse = (op->op == CEPH_OSD_OP_SPARSE_READ); @@ -242,9 +241,6 @@ static void finish_netfs_read(struct ceph_osd_request *req) __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); netfs_subreq_terminated(subreq, err, false); - - num_pages = calc_pages_for(osd_data->alignment, osd_data->length); - ceph_put_page_vector(osd_data->pages, num_pages, false); iput(req->r_inode); } @@ -312,8 +308,6 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) struct ceph_osd_request *req = NULL; struct ceph_vino vino = ceph_vino(inode); struct iov_iter iter; - struct page **pages; - size_t page_off; int err = 0; u64 len = subreq->len; bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD); @@ -344,18 +338,7 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) dout("%s: pos=%llu orig_len=%zu len=%llu\n", __func__, subreq->start, subreq->len, len); iov_iter_xarray(&iter, ITER_DEST, &rreq->mapping->i_pages, subreq->start, len); - err = iov_iter_get_pages_alloc2(&iter, &pages, len, &page_off); - if (err < 0) { - dout("%s: iov_ter_get_pages_alloc returned %d\n", __func__, err); - goto out; - } - - /* should always give us a page-aligned read */ - WARN_ON_ONCE(page_off); - len = err; - err = 0; - - osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0, false, false); + osd_req_op_extent_osd_iter(req, 0, &iter); req->r_callback = finish_netfs_read; req->r_priv = subreq; req->r_inode = inode; From patchwork Wed Apr 12 11:08:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208805 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0CECC77B72 for ; Wed, 12 Apr 2023 11:12:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229889AbjDLLMn (ORCPT ); Wed, 12 Apr 2023 07:12:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36306 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229757AbjDLLMm (ORCPT ); Wed, 12 Apr 2023 07:12:42 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF31CB7 for ; Wed, 12 Apr 2023 04:11:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297848; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=085dLFpxzhOxavNkrvcnEjJ4714mkPiHqHJHajXw8QE=; b=JxiinEM8PS9NFT9/ypdBKjKfzZnmLbOqxar+JzyfCdFGkZCr49Tauo5dQlWxh9IVZ2MYBY UC3QxVZ6ZiFk+IDm5WG7GtHIFgmQW4LVCC1rsvLXMUY8UpbQ9cX30af6QTavHVBXl+R1Na t8SRS/8FUl5Ejl5hfHHuT942/9lvp3I= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-22-q1hmq-kcP4SzNjyY88vmYw-1; Wed, 12 Apr 2023 07:10:46 -0400 X-MC-Unique: q1hmq-kcP4SzNjyY88vmYw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 398E4800B35; Wed, 12 Apr 2023 11:10:46 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2CC74C15BB8; Wed, 12 Apr 2023 11:10:41 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 12/71] ceph: fscrypt_auth handling for ceph Date: Wed, 12 Apr 2023 19:08:31 +0800 Message-Id: <20230412110930.176835-13-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Most fscrypt-enabled filesystems store the crypto context in an xattr, but that's problematic for ceph as xatts are governed by the XATTR cap, but we really want the crypto context as part of the AUTH cap. Because of this, the MDS has added two new inode metadata fields: fscrypt_auth and fscrypt_file. The former is used to hold the crypto context, and the latter is used to track the real file size. Parse new fscrypt_auth and fscrypt_file fields in inode traces. For now, we don't use fscrypt_file, but fscrypt_auth is used to hold the fscrypt context. Allow the client to use a setattr request for setting the fscrypt_auth field. Since this is not a standard setattr request from the VFS, we add a new field to __ceph_setattr that carries ceph-specific inode attrs. Have the set_context op do a setattr that sets the fscrypt_auth value, and get_context just return the contents of that field (since it should always be available). Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/Makefile | 1 + fs/ceph/acl.c | 4 +- fs/ceph/crypto.c | 76 ++++++++++++++++++++++++ fs/ceph/crypto.h | 36 ++++++++++++ fs/ceph/inode.c | 62 ++++++++++++++++++- fs/ceph/mds_client.c | 111 ++++++++++++++++++++++++++++++++--- fs/ceph/mds_client.h | 7 +++ fs/ceph/super.c | 3 + fs/ceph/super.h | 14 ++++- include/linux/ceph/ceph_fs.h | 21 ++++--- 10 files changed, 313 insertions(+), 22 deletions(-) create mode 100644 fs/ceph/crypto.c create mode 100644 fs/ceph/crypto.h diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile index 50c635dc7f71..1f77ca04c426 100644 --- a/fs/ceph/Makefile +++ b/fs/ceph/Makefile @@ -12,3 +12,4 @@ ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \ ceph-$(CONFIG_CEPH_FSCACHE) += cache.o ceph-$(CONFIG_CEPH_FS_POSIX_ACL) += acl.o +ceph-$(CONFIG_FS_ENCRYPTION) += crypto.o diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c index 6945a938d396..8a56f979c7cb 100644 --- a/fs/ceph/acl.c +++ b/fs/ceph/acl.c @@ -140,7 +140,7 @@ int ceph_set_acl(struct mnt_idmap *idmap, struct dentry *dentry, newattrs.ia_ctime = current_time(inode); newattrs.ia_mode = new_mode; newattrs.ia_valid = ATTR_MODE | ATTR_CTIME; - ret = __ceph_setattr(inode, &newattrs); + ret = __ceph_setattr(inode, &newattrs, NULL); if (ret) goto out_free; } @@ -151,7 +151,7 @@ int ceph_set_acl(struct mnt_idmap *idmap, struct dentry *dentry, newattrs.ia_ctime = old_ctime; newattrs.ia_mode = old_mode; newattrs.ia_valid = ATTR_MODE | ATTR_CTIME; - __ceph_setattr(inode, &newattrs); + __ceph_setattr(inode, &newattrs, NULL); } goto out_free; } diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c new file mode 100644 index 000000000000..a513ff373b13 --- /dev/null +++ b/fs/ceph/crypto.c @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include + +#include "super.h" +#include "crypto.h" + +static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len) +{ + struct ceph_inode_info *ci = ceph_inode(inode); + struct ceph_fscrypt_auth *cfa = (struct ceph_fscrypt_auth *)ci->fscrypt_auth; + u32 ctxlen; + + /* Non existent or too short? */ + if (!cfa || (ci->fscrypt_auth_len < (offsetof(struct ceph_fscrypt_auth, cfa_blob) + 1))) + return -ENOBUFS; + + /* Some format we don't recognize? */ + if (le32_to_cpu(cfa->cfa_version) != CEPH_FSCRYPT_AUTH_VERSION) + return -ENOBUFS; + + ctxlen = le32_to_cpu(cfa->cfa_blob_len); + if (len < ctxlen) + return -ERANGE; + + memcpy(ctx, cfa->cfa_blob, ctxlen); + return ctxlen; +} + +static int ceph_crypt_set_context(struct inode *inode, const void *ctx, size_t len, void *fs_data) +{ + int ret; + struct iattr attr = { }; + struct ceph_iattr cia = { }; + struct ceph_fscrypt_auth *cfa; + + WARN_ON_ONCE(fs_data); + + if (len > FSCRYPT_SET_CONTEXT_MAX_SIZE) + return -EINVAL; + + cfa = kzalloc(sizeof(*cfa), GFP_KERNEL); + if (!cfa) + return -ENOMEM; + + cfa->cfa_version = cpu_to_le32(CEPH_FSCRYPT_AUTH_VERSION); + cfa->cfa_blob_len = cpu_to_le32(len); + memcpy(cfa->cfa_blob, ctx, len); + + cia.fscrypt_auth = cfa; + + ret = __ceph_setattr(inode, &attr, &cia); + if (ret == 0) + inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED); + kfree(cia.fscrypt_auth); + return ret; +} + +static bool ceph_crypt_empty_dir(struct inode *inode) +{ + struct ceph_inode_info *ci = ceph_inode(inode); + + return ci->i_rsubdirs + ci->i_rfiles == 1; +} + +static struct fscrypt_operations ceph_fscrypt_ops = { + .get_context = ceph_crypt_get_context, + .set_context = ceph_crypt_set_context, + .empty_dir = ceph_crypt_empty_dir, +}; + +void ceph_fscrypt_set_ops(struct super_block *sb) +{ + fscrypt_set_ops(sb, &ceph_fscrypt_ops); +} diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h new file mode 100644 index 000000000000..6dca674f79b8 --- /dev/null +++ b/fs/ceph/crypto.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Ceph fscrypt functionality + */ + +#ifndef _CEPH_CRYPTO_H +#define _CEPH_CRYPTO_H + +#include + +struct ceph_fscrypt_auth { + __le32 cfa_version; + __le32 cfa_blob_len; + u8 cfa_blob[FSCRYPT_SET_CONTEXT_MAX_SIZE]; +} __packed; + +#define CEPH_FSCRYPT_AUTH_VERSION 1 +static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa) +{ + u32 ctxsize = le32_to_cpu(fa->cfa_blob_len); + + return offsetof(struct ceph_fscrypt_auth, cfa_blob) + ctxsize; +} + +#ifdef CONFIG_FS_ENCRYPTION +void ceph_fscrypt_set_ops(struct super_block *sb); + +#else /* CONFIG_FS_ENCRYPTION */ + +static inline void ceph_fscrypt_set_ops(struct super_block *sb) +{ +} + +#endif /* CONFIG_FS_ENCRYPTION */ + +#endif diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 28ce848fcb4a..5884e20ec4f0 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -14,10 +14,12 @@ #include #include #include +#include #include "super.h" #include "mds_client.h" #include "cache.h" +#include "crypto.h" #include /* @@ -616,6 +618,10 @@ struct inode *ceph_alloc_inode(struct super_block *sb) INIT_WORK(&ci->i_work, ceph_inode_work); ci->i_work_mask = 0; memset(&ci->i_btime, '\0', sizeof(ci->i_btime)); +#ifdef CONFIG_FS_ENCRYPTION + ci->fscrypt_auth = NULL; + ci->fscrypt_auth_len = 0; +#endif return &ci->netfs.inode; } @@ -624,6 +630,9 @@ void ceph_free_inode(struct inode *inode) struct ceph_inode_info *ci = ceph_inode(inode); kfree(ci->i_symlink); +#ifdef CONFIG_FS_ENCRYPTION + kfree(ci->fscrypt_auth); +#endif kmem_cache_free(ceph_inode_cachep, ci); } @@ -644,6 +653,7 @@ void ceph_evict_inode(struct inode *inode) clear_inode(inode); ceph_fscache_unregister_inode_cookie(ci); + fscrypt_put_encryption_info(inode); __ceph_remove_caps(ci); @@ -1023,6 +1033,16 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, xattr_blob = NULL; } +#ifdef CONFIG_FS_ENCRYPTION + if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) { + ci->fscrypt_auth_len = iinfo->fscrypt_auth_len; + ci->fscrypt_auth = iinfo->fscrypt_auth; + iinfo->fscrypt_auth = NULL; + iinfo->fscrypt_auth_len = 0; + inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED); + } +#endif + /* finally update i_version */ if (le64_to_cpu(info->version) > ci->i_version) ci->i_version = le64_to_cpu(info->version); @@ -2078,7 +2098,7 @@ static const struct inode_operations ceph_symlink_iops = { .listxattr = ceph_listxattr, }; -int __ceph_setattr(struct inode *inode, struct iattr *attr) +int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia) { struct ceph_inode_info *ci = ceph_inode(inode); unsigned int ia_valid = attr->ia_valid; @@ -2118,6 +2138,43 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr) } dout("setattr %p issued %s\n", inode, ceph_cap_string(issued)); +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) + if (cia && cia->fscrypt_auth) { + u32 len = ceph_fscrypt_auth_len(cia->fscrypt_auth); + + if (len > sizeof(*cia->fscrypt_auth)) { + err = -EINVAL; + spin_unlock(&ci->i_ceph_lock); + goto out; + } + + dout("setattr %llx:%llx fscrypt_auth len %u to %u)\n", + ceph_vinop(inode), ci->fscrypt_auth_len, len); + + /* It should never be re-set once set */ + WARN_ON_ONCE(ci->fscrypt_auth); + + if (issued & CEPH_CAP_AUTH_EXCL) { + dirtied |= CEPH_CAP_AUTH_EXCL; + kfree(ci->fscrypt_auth); + ci->fscrypt_auth = (u8 *)cia->fscrypt_auth; + ci->fscrypt_auth_len = len; + } else if ((issued & CEPH_CAP_AUTH_SHARED) == 0 || + ci->fscrypt_auth_len != len || + memcmp(ci->fscrypt_auth, cia->fscrypt_auth, len)) { + req->r_fscrypt_auth = cia->fscrypt_auth; + mask |= CEPH_SETATTR_FSCRYPT_AUTH; + release |= CEPH_CAP_AUTH_SHARED; + } + cia->fscrypt_auth = NULL; + } +#else + if (cia && cia->fscrypt_auth) { + err = -EINVAL; + spin_unlock(&ci->i_ceph_lock); + goto out; + } +#endif /* CONFIG_FS_ENCRYPTION */ if (ia_valid & ATTR_UID) { dout("setattr %p uid %d -> %d\n", inode, @@ -2281,6 +2338,7 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr) req->r_stamp = attr->ia_ctime; err = ceph_mdsc_do_request(mdsc, NULL, req); } +out: dout("setattr %p result=%d (%s locally, %d remote)\n", inode, err, ceph_cap_string(dirtied), mask); @@ -2321,7 +2379,7 @@ int ceph_setattr(struct mnt_idmap *idmap, struct dentry *dentry, ceph_quota_is_max_bytes_exceeded(inode, attr->ia_size)) return -EDQUOT; - err = __ceph_setattr(inode, attr); + err = __ceph_setattr(inode, attr, NULL); if (err >= 0 && (attr->ia_valid & ATTR_MODE)) err = posix_acl_chmod(&nop_mnt_idmap, dentry, attr->ia_mode); diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 0aa2c97a0460..a72f1961d3e1 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -15,6 +15,7 @@ #include "super.h" #include "mds_client.h" +#include "crypto.h" #include #include @@ -184,8 +185,52 @@ static int parse_reply_info_in(void **p, void *end, info->rsnaps = 0; } + if (struct_v >= 5) { + u32 alen; + + ceph_decode_32_safe(p, end, alen, bad); + + while (alen--) { + u32 len; + + /* key */ + ceph_decode_32_safe(p, end, len, bad); + ceph_decode_skip_n(p, end, len, bad); + /* value */ + ceph_decode_32_safe(p, end, len, bad); + ceph_decode_skip_n(p, end, len, bad); + } + } + + /* fscrypt flag -- ignore */ + if (struct_v >= 6) + ceph_decode_skip_8(p, end, bad); + + info->fscrypt_auth = NULL; + info->fscrypt_auth_len = 0; + info->fscrypt_file = NULL; + info->fscrypt_file_len = 0; + if (struct_v >= 7) { + ceph_decode_32_safe(p, end, info->fscrypt_auth_len, bad); + if (info->fscrypt_auth_len) { + info->fscrypt_auth = kmalloc(info->fscrypt_auth_len, GFP_KERNEL); + if (!info->fscrypt_auth) + return -ENOMEM; + ceph_decode_copy_safe(p, end, info->fscrypt_auth, + info->fscrypt_auth_len, bad); + } + ceph_decode_32_safe(p, end, info->fscrypt_file_len, bad); + if (info->fscrypt_file_len) { + info->fscrypt_file = kmalloc(info->fscrypt_file_len, GFP_KERNEL); + if (!info->fscrypt_file) + return -ENOMEM; + ceph_decode_copy_safe(p, end, info->fscrypt_file, + info->fscrypt_file_len, bad); + } + } *p = end; } else { + /* legacy (unversioned) struct */ if (features & CEPH_FEATURE_MDS_INLINE_DATA) { ceph_decode_64_safe(p, end, info->inline_version, bad); ceph_decode_32_safe(p, end, info->inline_len, bad); @@ -650,8 +695,21 @@ static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg, static void destroy_reply_info(struct ceph_mds_reply_info_parsed *info) { + int i; + + kfree(info->diri.fscrypt_auth); + kfree(info->diri.fscrypt_file); + kfree(info->targeti.fscrypt_auth); + kfree(info->targeti.fscrypt_file); if (!info->dir_entries) return; + + for (i = 0; i < info->dir_nr; i++) { + struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i; + + kfree(rde->inode.fscrypt_auth); + kfree(rde->inode.fscrypt_file); + } free_pages((unsigned long)info->dir_entries, get_order(info->dir_buf_size)); } @@ -965,6 +1023,7 @@ void ceph_mdsc_release_request(struct kref *kref) put_cred(req->r_cred); if (req->r_pagelist) ceph_pagelist_release(req->r_pagelist); + kfree(req->r_fscrypt_auth); put_request_session(req); ceph_unreserve_caps(req->r_mdsc, &req->r_caps_reservation); WARN_ON_ONCE(!list_empty(&req->r_wait)); @@ -2556,8 +2615,7 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry, return r; } -static void encode_timestamp_and_gids(void **p, - const struct ceph_mds_request *req) +static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *req) { struct ceph_timespec ts; int i; @@ -2570,6 +2628,20 @@ static void encode_timestamp_and_gids(void **p, for (i = 0; i < req->r_cred->group_info->ngroups; i++) ceph_encode_64(p, from_kgid(&init_user_ns, req->r_cred->group_info->gid[i])); + + /* v5: altname (TODO: skip for now) */ + ceph_encode_32(p, 0); + + /* v6: fscrypt_auth and fscrypt_file */ + if (req->r_fscrypt_auth) { + u32 authlen = ceph_fscrypt_auth_len(req->r_fscrypt_auth); + + ceph_encode_32(p, authlen); + ceph_encode_copy(p, req->r_fscrypt_auth, authlen); + } else { + ceph_encode_32(p, 0); + } + ceph_encode_32(p, 0); // fscrypt_file for now } /* @@ -2614,12 +2686,14 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, goto out_free1; } + /* head */ len = legacy ? sizeof(*head) : sizeof(struct ceph_mds_request_head); - len += pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) + - sizeof(struct ceph_timespec); - len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups); - /* calculate (max) length for cap releases */ + /* filepaths */ + len += 2 * (1 + sizeof(u32) + sizeof(u64)); + len += pathlen1 + pathlen2; + + /* cap releases */ len += sizeof(struct ceph_mds_request_release) * (!!req->r_inode_drop + !!req->r_dentry_drop + !!req->r_old_inode_drop + !!req->r_old_dentry_drop); @@ -2629,6 +2703,25 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, if (req->r_old_dentry_drop) len += pathlen2; + /* MClientRequest tail */ + + /* req->r_stamp */ + len += sizeof(struct ceph_timespec); + + /* gid list */ + len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups); + + /* alternate name */ + len += sizeof(u32); // TODO + + /* fscrypt_auth */ + len += sizeof(u32); // fscrypt_auth + if (req->r_fscrypt_auth) + len += ceph_fscrypt_auth_len(req->r_fscrypt_auth); + + /* fscrypt_file */ + len += sizeof(u32); + msg = ceph_msg_new2(CEPH_MSG_CLIENT_REQUEST, len, 1, GFP_NOFS, false); if (!msg) { msg = ERR_PTR(-ENOMEM); @@ -2648,7 +2741,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, } else { struct ceph_mds_request_head *new_head = msg->front.iov_base; - msg->hdr.version = cpu_to_le16(4); + msg->hdr.version = cpu_to_le16(6); new_head->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION); head = (struct ceph_mds_request_head_old *)&new_head->oldest_client_tid; p = msg->front.iov_base + sizeof(*new_head); @@ -2699,7 +2792,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, head->num_releases = cpu_to_le16(releases); - encode_timestamp_and_gids(&p, req); + encode_mclientrequest_tail(&p, req); if (WARN_ON_ONCE(p > end)) { ceph_msg_put(msg); @@ -2829,7 +2922,7 @@ static int __prepare_send_request(struct ceph_mds_session *session, rhead->num_releases = 0; p = msg->front.iov_base + req->r_request_release_offset; - encode_timestamp_and_gids(&p, req); + encode_mclientrequest_tail(&p, req); msg->front.iov_len = p - msg->front.iov_base; msg->hdr.front_len = cpu_to_le32(msg->front.iov_len); diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index ceef487abf7f..00838adaf694 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -86,6 +86,10 @@ struct ceph_mds_reply_info_in { s32 dir_pin; struct ceph_timespec btime; struct ceph_timespec snap_btime; + u8 *fscrypt_auth; + u8 *fscrypt_file; + u32 fscrypt_auth_len; + u32 fscrypt_file_len; u64 rsnaps; u64 change_attr; }; @@ -278,6 +282,9 @@ struct ceph_mds_request { struct mutex r_fill_mutex; union ceph_mds_request_args r_args; + + struct ceph_fscrypt_auth *r_fscrypt_auth; + int r_fmode; /* file mode, if expecting cap */ int r_request_release_offset; const struct cred *r_cred; diff --git a/fs/ceph/super.c b/fs/ceph/super.c index 7aaf40669509..88e3f3bacc4e 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -20,6 +20,7 @@ #include "super.h" #include "mds_client.h" #include "cache.h" +#include "crypto.h" #include #include @@ -1135,6 +1136,8 @@ static int ceph_set_super(struct super_block *s, struct fs_context *fc) s->s_time_max = U32_MAX; s->s_flags |= SB_NODIRATIME | SB_NOATIME; + ceph_fscrypt_set_ops(s); + ret = set_anon_super_fc(s, fc); if (ret != 0) fsc->sb = NULL; diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 9eaa4a27b92d..8d4984a8b15b 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -452,6 +452,13 @@ struct ceph_inode_info { struct work_struct i_work; unsigned long i_work_mask; + +#ifdef CONFIG_FS_ENCRYPTION + u32 fscrypt_auth_len; + u32 fscrypt_file_len; + u8 *fscrypt_auth; + u8 *fscrypt_file; +#endif }; static inline struct ceph_inode_info * @@ -1060,7 +1067,12 @@ static inline int ceph_do_getattr(struct inode *inode, int mask, bool force) } extern int ceph_permission(struct mnt_idmap *idmap, struct inode *inode, int mask); -extern int __ceph_setattr(struct inode *inode, struct iattr *attr); + +struct ceph_iattr { + struct ceph_fscrypt_auth *fscrypt_auth; +}; + +extern int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia); extern int ceph_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *attr); extern int ceph_getattr(struct mnt_idmap *idmap, diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 49586ff26152..45f8ce61e103 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -359,14 +359,19 @@ enum { extern const char *ceph_mds_op_name(int op); - -#define CEPH_SETATTR_MODE 1 -#define CEPH_SETATTR_UID 2 -#define CEPH_SETATTR_GID 4 -#define CEPH_SETATTR_MTIME 8 -#define CEPH_SETATTR_ATIME 16 -#define CEPH_SETATTR_SIZE 32 -#define CEPH_SETATTR_CTIME 64 +#define CEPH_SETATTR_MODE (1 << 0) +#define CEPH_SETATTR_UID (1 << 1) +#define CEPH_SETATTR_GID (1 << 2) +#define CEPH_SETATTR_MTIME (1 << 3) +#define CEPH_SETATTR_ATIME (1 << 4) +#define CEPH_SETATTR_SIZE (1 << 5) +#define CEPH_SETATTR_CTIME (1 << 6) +#define CEPH_SETATTR_MTIME_NOW (1 << 7) +#define CEPH_SETATTR_ATIME_NOW (1 << 8) +#define CEPH_SETATTR_BTIME (1 << 9) +#define CEPH_SETATTR_KILL_SGUID (1 << 10) +#define CEPH_SETATTR_FSCRYPT_AUTH (1 << 11) +#define CEPH_SETATTR_FSCRYPT_FILE (1 << 12) /* * Ceph setxattr request flags. From patchwork Wed Apr 12 11:08:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208810 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8752BC7619A for ; Wed, 12 Apr 2023 11:12:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229957AbjDLLM4 (ORCPT ); Wed, 12 Apr 2023 07:12:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229920AbjDLLMx (ORCPT ); Wed, 12 Apr 2023 07:12:53 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7FE8B7DAF for ; Wed, 12 Apr 2023 04:11:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297854; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fJNQmz6TG0r6TqL5bOlAsY4mrqtq1/xQMMNlSZVeYV0=; b=CEt/ivzKign3KIh7tmiUC1dIeGlkL6zLJskvt1NH3tmSUV1FQ5H2MiaPd0CkE7bP6/TDWO dTQa53u8k8gmx2yD6O45duyEsJO7kRUnMr5QYkvPCYMn6EpiFywQmY+P+zbtXPPxQdC15o BI1LdVjD4Zal/fc2WDD4+UojrKHsLIQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-354-XU4G6uxsNsC5hgIhcwY_KQ-1; Wed, 12 Apr 2023 07:10:51 -0400 X-MC-Unique: XU4G6uxsNsC5hgIhcwY_KQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9B786800B23; Wed, 12 Apr 2023 11:10:50 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 14A8CC15BB8; Wed, 12 Apr 2023 11:10:46 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 13/71] ceph: ensure that we accept a new context from MDS for new inodes Date: Wed, 12 Apr 2023 19:08:32 +0800 Message-Id: <20230412110930.176835-14-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/inode.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 5884e20ec4f0..36ab0bb56a07 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -944,6 +944,17 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, __ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files); +#ifdef CONFIG_FS_ENCRYPTION + if (iinfo->fscrypt_auth_len && (inode->i_state & I_NEW)) { + kfree(ci->fscrypt_auth); + ci->fscrypt_auth_len = iinfo->fscrypt_auth_len; + ci->fscrypt_auth = iinfo->fscrypt_auth; + iinfo->fscrypt_auth = NULL; + iinfo->fscrypt_auth_len = 0; + inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED); + } +#endif + if ((new_version || (new_issued & CEPH_CAP_AUTH_SHARED)) && (issued & CEPH_CAP_AUTH_EXCL) == 0) { inode->i_mode = mode; @@ -1033,16 +1044,6 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, xattr_blob = NULL; } -#ifdef CONFIG_FS_ENCRYPTION - if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) { - ci->fscrypt_auth_len = iinfo->fscrypt_auth_len; - ci->fscrypt_auth = iinfo->fscrypt_auth; - iinfo->fscrypt_auth = NULL; - iinfo->fscrypt_auth_len = 0; - inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED); - } -#endif - /* finally update i_version */ if (le64_to_cpu(info->version) > ci->i_version) ci->i_version = le64_to_cpu(info->version); From patchwork Wed Apr 12 11:08:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208808 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 007CBC7619A for ; Wed, 12 Apr 2023 11:12:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229825AbjDLLMx (ORCPT ); Wed, 12 Apr 2023 07:12:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36242 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229903AbjDLLMw (ORCPT ); Wed, 12 Apr 2023 07:12:52 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF2367DBD for ; Wed, 12 Apr 2023 04:11:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297860; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tASoPZoMDuH7DHvW5BfRz9fJSLuKVFwSChgwysDYxvE=; b=Jyl8f0Xa6uEGpQ+CeKYh5XWOUkseTgbnI90D8gaVtlgbGc3oJGhJQXqUn2kThCWXy6tYXy RapZ/FM+/CJ6m4rnlpCtI4n2f2SVn4uVvqU7PwJB4jMvfWAL3aBih6KopL1Qricw/CvZlH CrBaSp+HZOHp73yn+6MWqaI7t1UKLAM= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-7-b49hcUAoPea61NI5B0fNCQ-1; Wed, 12 Apr 2023 07:10:55 -0400 X-MC-Unique: b49hcUAoPea61NI5B0fNCQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7EB2828237CD; Wed, 12 Apr 2023 11:10:55 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6C03FC15BB8; Wed, 12 Apr 2023 11:10:51 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 14/71] ceph: add support for fscrypt_auth/fscrypt_file to cap messages Date: Wed, 12 Apr 2023 19:08:33 +0800 Message-Id: <20230412110930.176835-15-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Add support for new version 12 cap messages that carry the new fscrypt_auth and fscrypt_file fields from the inode. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/caps.c | 76 +++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 63 insertions(+), 13 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index e02c5ea3f108..3cca23394769 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -14,6 +14,7 @@ #include "super.h" #include "mds_client.h" #include "cache.h" +#include "crypto.h" #include #include @@ -1216,15 +1217,12 @@ struct cap_msg_args { umode_t mode; bool inline_data; bool wake; + u32 fscrypt_auth_len; + u32 fscrypt_file_len; + u8 fscrypt_auth[sizeof(struct ceph_fscrypt_auth)]; // for context + u8 fscrypt_file[sizeof(u64)]; // for size }; -/* - * cap struct size + flock buffer size + inline version + inline data size + - * osd_epoch_barrier + oldest_flush_tid - */ -#define CAP_MSG_SIZE (sizeof(struct ceph_mds_caps) + \ - 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4) - /* Marshal up the cap msg to the MDS */ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg) { @@ -1240,7 +1238,7 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg) arg->size, arg->max_size, arg->xattr_version, arg->xattr_buf ? (int)arg->xattr_buf->vec.iov_len : 0); - msg->hdr.version = cpu_to_le16(10); + msg->hdr.version = cpu_to_le16(12); msg->hdr.tid = cpu_to_le64(arg->flush_tid); fc = msg->front.iov_base; @@ -1311,6 +1309,21 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg) /* Advisory flags (version 10) */ ceph_encode_32(&p, arg->flags); + + /* dirstats (version 11) - these are r/o on the client */ + ceph_encode_64(&p, 0); + ceph_encode_64(&p, 0); + +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) + /* fscrypt_auth and fscrypt_file (version 12) */ + ceph_encode_32(&p, arg->fscrypt_auth_len); + ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len); + ceph_encode_32(&p, arg->fscrypt_file_len); + ceph_encode_copy(&p, arg->fscrypt_file, arg->fscrypt_file_len); +#else /* CONFIG_FS_ENCRYPTION */ + ceph_encode_32(&p, 0); + ceph_encode_32(&p, 0); +#endif /* CONFIG_FS_ENCRYPTION */ } /* @@ -1432,8 +1445,37 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap, } } arg->flags = flags; +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) + if (ci->fscrypt_auth_len && + WARN_ON_ONCE(ci->fscrypt_auth_len > sizeof(struct ceph_fscrypt_auth))) { + /* Don't set this if it's too big */ + arg->fscrypt_auth_len = 0; + } else { + arg->fscrypt_auth_len = ci->fscrypt_auth_len; + memcpy(arg->fscrypt_auth, ci->fscrypt_auth, + min_t(size_t, ci->fscrypt_auth_len, sizeof(arg->fscrypt_auth))); + } + /* FIXME: use this to track "real" size */ + arg->fscrypt_file_len = 0; +#endif /* CONFIG_FS_ENCRYPTION */ } +#define CAP_MSG_FIXED_FIELDS (sizeof(struct ceph_mds_caps) + \ + 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 4 + 4) + +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) +static inline int cap_msg_size(struct cap_msg_args *arg) +{ + return CAP_MSG_FIXED_FIELDS + arg->fscrypt_auth_len + + arg->fscrypt_file_len; +} +#else +static inline int cap_msg_size(struct cap_msg_args *arg) +{ + return CAP_MSG_FIXED_FIELDS; +} +#endif /* CONFIG_FS_ENCRYPTION */ + /* * Send a cap msg on the given inode. * @@ -1444,7 +1486,7 @@ static void __send_cap(struct cap_msg_args *arg, struct ceph_inode_info *ci) struct ceph_msg *msg; struct inode *inode = &ci->netfs.inode; - msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, CAP_MSG_SIZE, GFP_NOFS, false); + msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, cap_msg_size(arg), GFP_NOFS, false); if (!msg) { pr_err("error allocating cap msg: ino (%llx.%llx) flushing %s tid %llu, requeuing cap.\n", ceph_vinop(inode), ceph_cap_string(arg->dirty), @@ -1470,10 +1512,6 @@ static inline int __send_flush_snap(struct inode *inode, struct cap_msg_args arg; struct ceph_msg *msg; - msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, CAP_MSG_SIZE, GFP_NOFS, false); - if (!msg) - return -ENOMEM; - arg.session = session; arg.ino = ceph_vino(inode).ino; arg.cid = 0; @@ -1511,6 +1549,18 @@ static inline int __send_flush_snap(struct inode *inode, arg.flags = 0; arg.wake = false; + /* + * No fscrypt_auth changes from a capsnap. It will need + * to update fscrypt_file on size changes (TODO). + */ + arg.fscrypt_auth_len = 0; + arg.fscrypt_file_len = 0; + + msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, cap_msg_size(&arg), + GFP_NOFS, false); + if (!msg) + return -ENOMEM; + encode_cap_msg(msg, &arg); ceph_con_send(&arg.session->s_con, msg); return 0; From patchwork Wed Apr 12 11:08:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208815 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A1135C77B6E for ; Wed, 12 Apr 2023 11:13:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229944AbjDLLNN (ORCPT ); Wed, 12 Apr 2023 07:13:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229971AbjDLLNL (ORCPT ); Wed, 12 Apr 2023 07:13:11 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9AE1383DE for ; Wed, 12 Apr 2023 04:12:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297863; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=D+NSgmP5LS22QeQEbH9+qg1lPqXHX2JdmCG9rY8CvN0=; b=St4uD03Ia/cS0HV3hmgURjG/gJZY95iJFH9sIDfKOjDd8Tc8vH/dLbxMuh3Lwqddkfjc75 FQsjnDu1/ntQ5g8QCRclDgSgGi7clawXqjT3YTyE7ghAIxjsHzqoQ3NMY67zGdzd+mEXIa 2YElwmHc3FAFE7J2+eSaAYVZn+ho+4c= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-283-77hLMT9eNoK0mj5YDLJVhA-1; Wed, 12 Apr 2023 07:11:00 -0400 X-MC-Unique: 77hLMT9eNoK0mj5YDLJVhA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3AB0E1C08795; Wed, 12 Apr 2023 11:11:00 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4B274C15BB8; Wed, 12 Apr 2023 11:10:55 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 15/71] ceph: implement -o test_dummy_encryption mount option Date: Wed, 12 Apr 2023 19:08:34 +0800 Message-Id: <20230412110930.176835-16-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Add support for the test_dummy_encryption mount option. This allows us to test the encrypted codepaths in ceph without having to manually set keys, etc. Luís Henriques has fixed a potential 'fsc->fsc_dummy_enc_policy' memory leak bug in mount error path. Cc: Luís Henriques Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton Signed-off-by: Xiubo Li --- fs/ceph/crypto.c | 53 ++++++++++++++++++++++++++++++ fs/ceph/crypto.h | 26 +++++++++++++++ fs/ceph/inode.c | 10 ++++-- fs/ceph/super.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++-- fs/ceph/super.h | 9 ++++- fs/ceph/xattr.c | 3 ++ 6 files changed, 180 insertions(+), 6 deletions(-) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index a513ff373b13..fd3192917e8d 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -4,6 +4,7 @@ #include #include "super.h" +#include "mds_client.h" #include "crypto.h" static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len) @@ -64,9 +65,15 @@ static bool ceph_crypt_empty_dir(struct inode *inode) return ci->i_rsubdirs + ci->i_rfiles == 1; } +static const union fscrypt_policy *ceph_get_dummy_policy(struct super_block *sb) +{ + return ceph_sb_to_client(sb)->fsc_dummy_enc_policy.policy; +} + static struct fscrypt_operations ceph_fscrypt_ops = { .get_context = ceph_crypt_get_context, .set_context = ceph_crypt_set_context, + .get_dummy_policy = ceph_get_dummy_policy, .empty_dir = ceph_crypt_empty_dir, }; @@ -74,3 +81,49 @@ void ceph_fscrypt_set_ops(struct super_block *sb) { fscrypt_set_ops(sb, &ceph_fscrypt_ops); } + +void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc) +{ + fscrypt_free_dummy_policy(&fsc->fsc_dummy_enc_policy); +} + +int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode, + struct ceph_acl_sec_ctx *as) +{ + int ret, ctxsize; + bool encrypted = false; + struct ceph_inode_info *ci = ceph_inode(inode); + + ret = fscrypt_prepare_new_inode(dir, inode, &encrypted); + if (ret) + return ret; + if (!encrypted) + return 0; + + as->fscrypt_auth = kzalloc(sizeof(*as->fscrypt_auth), GFP_KERNEL); + if (!as->fscrypt_auth) + return -ENOMEM; + + ctxsize = fscrypt_context_for_new_inode(as->fscrypt_auth->cfa_blob, inode); + if (ctxsize < 0) + return ctxsize; + + as->fscrypt_auth->cfa_version = cpu_to_le32(CEPH_FSCRYPT_AUTH_VERSION); + as->fscrypt_auth->cfa_blob_len = cpu_to_le32(ctxsize); + + WARN_ON_ONCE(ci->fscrypt_auth); + kfree(ci->fscrypt_auth); + ci->fscrypt_auth_len = ceph_fscrypt_auth_len(as->fscrypt_auth); + ci->fscrypt_auth = kmemdup(as->fscrypt_auth, ci->fscrypt_auth_len, GFP_KERNEL); + if (!ci->fscrypt_auth) + return -ENOMEM; + + inode->i_flags |= S_ENCRYPTED; + + return 0; +} + +void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as) +{ + swap(req->r_fscrypt_auth, as->fscrypt_auth); +} diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index 6dca674f79b8..cb00fe42d5b7 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -8,6 +8,10 @@ #include +struct ceph_fs_client; +struct ceph_acl_sec_ctx; +struct ceph_mds_request; + struct ceph_fscrypt_auth { __le32 cfa_version; __le32 cfa_blob_len; @@ -25,12 +29,34 @@ static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa) #ifdef CONFIG_FS_ENCRYPTION void ceph_fscrypt_set_ops(struct super_block *sb); +void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc); + +int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode, + struct ceph_acl_sec_ctx *as); +void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as); + #else /* CONFIG_FS_ENCRYPTION */ static inline void ceph_fscrypt_set_ops(struct super_block *sb) { } +static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc) +{ +} + +static inline int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode, + struct ceph_acl_sec_ctx *as) +{ + if (IS_ENCRYPTED(dir)) + return -EOPNOTSUPP; + return 0; +} + +static inline void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, + struct ceph_acl_sec_ctx *as_ctx) +{ +} #endif /* CONFIG_FS_ENCRYPTION */ #endif diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 36ab0bb56a07..8b6b12a5a5e2 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -83,12 +83,17 @@ struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry, goto out_err; } + inode->i_state = 0; + inode->i_mode = *mode; + err = ceph_security_init_secctx(dentry, *mode, as_ctx); if (err < 0) goto out_err; - inode->i_state = 0; - inode->i_mode = *mode; + err = ceph_fscrypt_prepare_context(dir, inode, as_ctx); + if (err) + goto out_err; + return inode; out_err: iput(inode); @@ -101,6 +106,7 @@ void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *a req->r_pagelist = as_ctx->pagelist; as_ctx->pagelist = NULL; } + ceph_fscrypt_as_ctx_to_req(req, as_ctx); } /** diff --git a/fs/ceph/super.c b/fs/ceph/super.c index 88e3f3bacc4e..b9dd2fa36d8b 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -47,6 +47,7 @@ static void ceph_put_super(struct super_block *s) struct ceph_fs_client *fsc = ceph_sb_to_client(s); dout("put_super\n"); + ceph_fscrypt_free_dummy_policy(fsc); ceph_mdsc_close_sessions(fsc->mdsc); } @@ -152,6 +153,7 @@ enum { Opt_recover_session, Opt_source, Opt_mon_addr, + Opt_test_dummy_encryption, /* string args above */ Opt_dirstat, Opt_rbytes, @@ -194,6 +196,7 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = { fsparam_string ("fsc", Opt_fscache), // fsc=... fsparam_flag_no ("ino32", Opt_ino32), fsparam_string ("mds_namespace", Opt_mds_namespace), + fsparam_string ("mon_addr", Opt_mon_addr), fsparam_flag_no ("poolperm", Opt_poolperm), fsparam_flag_no ("quotadf", Opt_quotadf), fsparam_u32 ("rasize", Opt_rasize), @@ -205,7 +208,8 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = { fsparam_u32 ("rsize", Opt_rsize), fsparam_string ("snapdirname", Opt_snapdirname), fsparam_string ("source", Opt_source), - fsparam_string ("mon_addr", Opt_mon_addr), + fsparam_flag ("test_dummy_encryption", Opt_test_dummy_encryption), + fsparam_string ("test_dummy_encryption", Opt_test_dummy_encryption), fsparam_u32 ("wsize", Opt_wsize), fsparam_flag_no ("wsync", Opt_wsync), fsparam_flag_no ("pagecache", Opt_pagecache), @@ -585,6 +589,23 @@ static int ceph_parse_mount_param(struct fs_context *fc, else fsopt->flags |= CEPH_MOUNT_OPT_SPARSEREAD; break; + case Opt_test_dummy_encryption: +#ifdef CONFIG_FS_ENCRYPTION + fscrypt_free_dummy_policy(&fsopt->dummy_enc_policy); + ret = fscrypt_parse_test_dummy_encryption(param, + &fsopt->dummy_enc_policy); + if (ret == -EINVAL) { + warnfc(fc, "Value of option \"%s\" is unrecognized", + param->key); + } else if (ret == -EEXIST) { + warnfc(fc, "Conflicting test_dummy_encryption options"); + ret = -EINVAL; + } +#else + warnfc(fc, + "FS encryption not supported: test_dummy_encryption mount option ignored"); +#endif + break; default: BUG(); } @@ -605,6 +626,7 @@ static void destroy_mount_options(struct ceph_mount_options *args) kfree(args->server_path); kfree(args->fscache_uniq); kfree(args->mon_addr); + fscrypt_free_dummy_policy(&args->dummy_enc_policy); kfree(args); } @@ -724,6 +746,8 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root) if (fsopt->flags & CEPH_MOUNT_OPT_SPARSEREAD) seq_puts(m, ",sparseread"); + fscrypt_show_test_dummy_encryption(m, ',', root->d_sb); + if (fsopt->wsize != CEPH_MAX_WRITE_SIZE) seq_printf(m, ",wsize=%u", fsopt->wsize); if (fsopt->rsize != CEPH_MAX_READ_SIZE) @@ -1062,6 +1086,50 @@ static struct dentry *open_root_dentry(struct ceph_fs_client *fsc, return root; } +#ifdef CONFIG_FS_ENCRYPTION +static int ceph_apply_test_dummy_encryption(struct super_block *sb, + struct fs_context *fc, + struct ceph_mount_options *fsopt) +{ + struct ceph_fs_client *fsc = sb->s_fs_info; + + if (!fscrypt_is_dummy_policy_set(&fsopt->dummy_enc_policy)) + return 0; + + /* No changing encryption context on remount. */ + if (fc->purpose == FS_CONTEXT_FOR_RECONFIGURE && + !fscrypt_is_dummy_policy_set(&fsc->fsc_dummy_enc_policy)) { + if (fscrypt_dummy_policies_equal(&fsopt->dummy_enc_policy, + &fsc->fsc_dummy_enc_policy)) + return 0; + errorfc(fc, "Can't set test_dummy_encryption on remount"); + return -EINVAL; + } + + /* Also make sure fsopt doesn't contain a conflicting value. */ + if (fscrypt_is_dummy_policy_set(&fsc->fsc_dummy_enc_policy)) { + if (fscrypt_dummy_policies_equal(&fsopt->dummy_enc_policy, + &fsc->fsc_dummy_enc_policy)) + return 0; + errorfc(fc, "Conflicting test_dummy_encryption options"); + return -EINVAL; + } + + fsc->fsc_dummy_enc_policy = fsopt->dummy_enc_policy; + memset(&fsopt->dummy_enc_policy, 0, sizeof(fsopt->dummy_enc_policy)); + + warnfc(fc, "test_dummy_encryption mode enabled"); + return 0; +} +#else +static int ceph_apply_test_dummy_encryption(struct super_block *sb, + struct fs_context *fc, + struct ceph_mount_options *fsopt) +{ + return 0; +} +#endif + /* * mount: join the ceph cluster, and open root directory. */ @@ -1090,6 +1158,10 @@ static struct dentry *ceph_real_mount(struct ceph_fs_client *fsc, goto out; } + err = ceph_apply_test_dummy_encryption(fsc->sb, fc, fsc->mount_options); + if (err) + goto out; + dout("mount opening path '%s'\n", path); ceph_fs_debugfs_init(fsc); @@ -1111,6 +1183,7 @@ static struct dentry *ceph_real_mount(struct ceph_fs_client *fsc, out: mutex_unlock(&fsc->client->mount_mutex); + ceph_fscrypt_free_dummy_policy(fsc); return ERR_PTR(err); } @@ -1299,9 +1372,15 @@ static void ceph_free_fc(struct fs_context *fc) static int ceph_reconfigure_fc(struct fs_context *fc) { + int err; struct ceph_parse_opts_ctx *pctx = fc->fs_private; struct ceph_mount_options *fsopt = pctx->opts; - struct ceph_fs_client *fsc = ceph_sb_to_client(fc->root->d_sb); + struct super_block *sb = fc->root->d_sb; + struct ceph_fs_client *fsc = ceph_sb_to_client(sb); + + err = ceph_apply_test_dummy_encryption(sb, fc, fsopt); + if (err) + return err; if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS) ceph_set_mount_opt(fsc, ASYNC_DIROPS); @@ -1320,7 +1399,7 @@ static int ceph_reconfigure_fc(struct fs_context *fc) pr_notice("ceph: monitor addresses recorded, but not used for reconnection"); } - sync_filesystem(fc->root->d_sb); + sync_filesystem(sb); return 0; } diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 8d4984a8b15b..355701a1ac70 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -22,6 +22,7 @@ #include #include +#include "crypto.h" /* large granularity for statfs utilization stats to facilitate * large volume sizes on 32-bit machines. */ @@ -99,6 +100,7 @@ struct ceph_mount_options { char *server_path; /* default NULL (means "/") */ char *fscache_uniq; /* default NULL */ char *mon_addr; + struct fscrypt_dummy_policy dummy_enc_policy; }; /* mount state */ @@ -155,9 +157,11 @@ struct ceph_fs_client { #ifdef CONFIG_CEPH_FSCACHE struct fscache_volume *fscache; #endif +#ifdef CONFIG_FS_ENCRYPTION + struct fscrypt_dummy_policy fsc_dummy_enc_policy; +#endif }; - /* * File i/o capability. This tracks shared state with the metadata * server that allows us to cache or writeback attributes or to read @@ -1106,6 +1110,9 @@ struct ceph_acl_sec_ctx { #ifdef CONFIG_CEPH_FS_SECURITY_LABEL void *sec_ctx; u32 sec_ctxlen; +#endif +#ifdef CONFIG_FS_ENCRYPTION + struct ceph_fscrypt_auth *fscrypt_auth; #endif struct ceph_pagelist *pagelist; }; diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c index f65b07cc33a2..d2242d7b082e 100644 --- a/fs/ceph/xattr.c +++ b/fs/ceph/xattr.c @@ -1401,6 +1401,9 @@ void ceph_release_acl_sec_ctx(struct ceph_acl_sec_ctx *as_ctx) #endif #ifdef CONFIG_CEPH_FS_SECURITY_LABEL security_release_secctx(as_ctx->sec_ctx, as_ctx->sec_ctxlen); +#endif +#ifdef CONFIG_FS_ENCRYPTION + kfree(as_ctx->fscrypt_auth); #endif if (as_ctx->pagelist) ceph_pagelist_release(as_ctx->pagelist); From patchwork Wed Apr 12 11:08:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208807 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB567C77B6E for ; Wed, 12 Apr 2023 11:12:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229914AbjDLLMu (ORCPT ); Wed, 12 Apr 2023 07:12:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36244 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229825AbjDLLMt (ORCPT ); Wed, 12 Apr 2023 07:12:49 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8CFF47D97 for ; Wed, 12 Apr 2023 04:11:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297868; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tzDmkwK1M6hKhU+OESUxNMu5vf1KN2eV3V1fRZonik4=; b=i6HXhlPzN2N7oDmah8xoUPd0U2pcYDnHj9brZ4H85X2VPUQtHFqnDGGJNQaWwwXsDlhZhE pzg1lx9kUCtIl28MtvfdzDNgNuTtIKUoySQKRHs5vbBYso2iuGfRaSoGEChPk6LIYSqnN5 8j1KLkQA7HwXvcVY1kPoC56aF81ev7M= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-107-Cu47rbwANn2leq_rXg86Qg-1; Wed, 12 Apr 2023 07:11:05 -0400 X-MC-Unique: Cu47rbwANn2leq_rXg86Qg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 369433C0F37F; Wed, 12 Apr 2023 11:11:05 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 43C28C15BB8; Wed, 12 Apr 2023 11:11:00 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 16/71] ceph: decode alternate_name in lease info Date: Wed, 12 Apr 2023 19:08:35 +0800 Message-Id: <20230412110930.176835-17-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Ceph is a bit different from local filesystems, in that we don't want to store filenames as raw binary data, since we may also be dealing with clients that don't support fscrypt. We could just base64-encode the encrypted filenames, but that could leave us with filenames longer than NAME_MAX. It turns out that the MDS doesn't care much about filename length, but the clients do. To manage this, we've added a new "alternate name" field that can be optionally added to any dentry that we'll use to store the binary crypttext of the filename if its base64-encoded value will be longer than NAME_MAX. When a dentry has one of these names attached, the MDS will send it along in the lease info, which we can then store for later usage. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Jeff Layton Signed-off-by: Xiubo Li --- fs/ceph/mds_client.c | 43 +++++++++++++++++++++++++++++++++---------- fs/ceph/mds_client.h | 5 +++++ 2 files changed, 38 insertions(+), 10 deletions(-) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index a72f1961d3e1..d371e8bfe19d 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -308,27 +308,47 @@ static int parse_reply_info_dir(void **p, void *end, static int parse_reply_info_lease(void **p, void *end, struct ceph_mds_reply_lease **lease, - u64 features) + u64 features, u32 *altname_len, u8 **altname) { + u8 struct_v; + u32 struct_len; + void *lend; + if (features == (u64)-1) { - u8 struct_v, struct_compat; - u32 struct_len; + u8 struct_compat; + ceph_decode_8_safe(p, end, struct_v, bad); ceph_decode_8_safe(p, end, struct_compat, bad); + /* struct_v is expected to be >= 1. we only understand * encoding whose struct_compat == 1. */ if (!struct_v || struct_compat != 1) goto bad; + ceph_decode_32_safe(p, end, struct_len, bad); - ceph_decode_need(p, end, struct_len, bad); - end = *p + struct_len; + } else { + struct_len = sizeof(**lease); + *altname_len = 0; + *altname = NULL; } - ceph_decode_need(p, end, sizeof(**lease), bad); + lend = *p + struct_len; + ceph_decode_need(p, end, struct_len, bad); *lease = *p; *p += sizeof(**lease); - if (features == (u64)-1) - *p = end; + + if (features == (u64)-1) { + if (struct_v >= 2) { + ceph_decode_32_safe(p, end, *altname_len, bad); + ceph_decode_need(p, end, *altname_len, bad); + *altname = *p; + *p += *altname_len; + } else { + *altname = NULL; + *altname_len = 0; + } + } + *p = lend; return 0; bad: return -EIO; @@ -358,7 +378,8 @@ static int parse_reply_info_trace(void **p, void *end, info->dname = *p; *p += info->dname_len; - err = parse_reply_info_lease(p, end, &info->dlease, features); + err = parse_reply_info_lease(p, end, &info->dlease, features, + &info->altname_len, &info->altname); if (err < 0) goto out_bad; } @@ -425,9 +446,11 @@ static int parse_reply_info_readdir(void **p, void *end, dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name); /* dentry lease */ - err = parse_reply_info_lease(p, end, &rde->lease, features); + err = parse_reply_info_lease(p, end, &rde->lease, features, + &rde->altname_len, &rde->altname); if (err) goto out_bad; + /* inode */ err = parse_reply_info_in(p, end, &rde->inode, features); if (err < 0) diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index 00838adaf694..b3e3fcefef4a 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -44,6 +44,7 @@ enum ceph_feature_type { CEPHFS_FEATURE_MULTI_RECONNECT, \ CEPHFS_FEATURE_DELEG_INO, \ CEPHFS_FEATURE_METRIC_COLLECT, \ + CEPHFS_FEATURE_ALTERNATE_NAME, \ CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \ CEPHFS_FEATURE_OP_GETVXATTR, \ } @@ -96,7 +97,9 @@ struct ceph_mds_reply_info_in { struct ceph_mds_reply_dir_entry { char *name; + u8 *altname; u32 name_len; + u32 altname_len; struct ceph_mds_reply_lease *lease; struct ceph_mds_reply_info_in inode; loff_t offset; @@ -120,7 +123,9 @@ struct ceph_mds_reply_info_parsed { struct ceph_mds_reply_info_in diri, targeti; struct ceph_mds_reply_dirfrag *dirfrag; char *dname; + u8 *altname; u32 dname_len; + u32 altname_len; struct ceph_mds_reply_lease *dlease; struct ceph_mds_reply_xattr xattr_info; From patchwork Wed Apr 12 11:08:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208809 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54784C77B71 for ; Wed, 12 Apr 2023 11:12:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229903AbjDLLMz (ORCPT ); Wed, 12 Apr 2023 07:12:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36554 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229933AbjDLLMy (ORCPT ); Wed, 12 Apr 2023 07:12:54 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B10AC7EC1 for ; Wed, 12 Apr 2023 04:11:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297872; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=V+2eMWyrtN5UzMycRrfA7OT0wxrT9+zFhCg6iiisbG0=; b=Qjn48fFdmWwzX/OvXC0MRAu6NXx9tMUXLEbIh4vjFSI4A3TVM0YPuJsYsztMi0o5wu/2ui DbgKaSp4h4rA0bLLVxWoMldDvbshyYLWeEvBFxgeRSznYywy6aMXJwxA829f+EDBxnhGJ5 vURMR3hzehXOm0GvcORorMWvC5/egg4= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-347-VKXseXAZM467nqJrS18saA-1; Wed, 12 Apr 2023 07:11:10 -0400 X-MC-Unique: VKXseXAZM467nqJrS18saA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 19DBB28237C4; Wed, 12 Apr 2023 11:11:10 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1B4D7C15BB8; Wed, 12 Apr 2023 11:11:05 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 17/71] ceph: add fscrypt ioctls Date: Wed, 12 Apr 2023 19:08:36 +0800 Message-Id: <20230412110930.176835-18-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton We gate most of the ioctls on MDS feature support. The exception is the key removal and status functions that we still want to work if the MDS's were to (inexplicably) lose the feature. For the set_policy ioctl, we take Fs caps to ensure that nothing can create files in the directory while the ioctl is running. That should be enough to ensure that the "empty_dir" check is reliable. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/ioctl.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 83 insertions(+) diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c index deac817647eb..2a5c48107026 100644 --- a/fs/ceph/ioctl.c +++ b/fs/ceph/ioctl.c @@ -6,6 +6,7 @@ #include "mds_client.h" #include "ioctl.h" #include +#include /* * ioctls @@ -268,8 +269,54 @@ static long ceph_ioctl_syncio(struct file *file) return 0; } +static int vet_mds_for_fscrypt(struct file *file) +{ + int i, ret = -EOPNOTSUPP; + struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(file_inode(file)->i_sb); + + mutex_lock(&mdsc->mutex); + for (i = 0; i < mdsc->max_sessions; i++) { + struct ceph_mds_session *s = mdsc->sessions[i]; + + if (!s) + continue; + if (test_bit(CEPHFS_FEATURE_ALTERNATE_NAME, &s->s_features)) + ret = 0; + break; + } + mutex_unlock(&mdsc->mutex); + return ret; +} + +static long ceph_set_encryption_policy(struct file *file, unsigned long arg) +{ + int ret, got = 0; + struct inode *inode = file_inode(file); + struct ceph_inode_info *ci = ceph_inode(inode); + + ret = vet_mds_for_fscrypt(file); + if (ret) + return ret; + + /* + * Ensure we hold these caps so that we _know_ that the rstats check + * in the empty_dir check is reliable. + */ + ret = ceph_get_caps(file, CEPH_CAP_FILE_SHARED, 0, -1, &got); + if (ret) + return ret; + + ret = fscrypt_ioctl_set_policy(file, (const void __user *)arg); + if (got) + ceph_put_cap_refs(ci, got); + + return ret; +} + long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { + int ret; + dout("ioctl file %p cmd %u arg %lu\n", file, cmd, arg); switch (cmd) { case CEPH_IOC_GET_LAYOUT: @@ -289,6 +336,42 @@ long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg) case CEPH_IOC_SYNCIO: return ceph_ioctl_syncio(file); + + case FS_IOC_SET_ENCRYPTION_POLICY: + return ceph_set_encryption_policy(file, arg); + + case FS_IOC_GET_ENCRYPTION_POLICY: + ret = vet_mds_for_fscrypt(file); + if (ret) + return ret; + return fscrypt_ioctl_get_policy(file, (void __user *)arg); + + case FS_IOC_GET_ENCRYPTION_POLICY_EX: + ret = vet_mds_for_fscrypt(file); + if (ret) + return ret; + return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg); + + case FS_IOC_ADD_ENCRYPTION_KEY: + ret = vet_mds_for_fscrypt(file); + if (ret) + return ret; + return fscrypt_ioctl_add_key(file, (void __user *)arg); + + case FS_IOC_REMOVE_ENCRYPTION_KEY: + return fscrypt_ioctl_remove_key(file, (void __user *)arg); + + case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS: + return fscrypt_ioctl_remove_key_all_users(file, (void __user *)arg); + + case FS_IOC_GET_ENCRYPTION_KEY_STATUS: + return fscrypt_ioctl_get_key_status(file, (void __user *)arg); + + case FS_IOC_GET_ENCRYPTION_NONCE: + ret = vet_mds_for_fscrypt(file); + if (ret) + return ret; + return fscrypt_ioctl_get_nonce(file, (void __user *)arg); } return -ENOTTY; From patchwork Wed Apr 12 11:08:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208811 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA1DFC77B71 for ; Wed, 12 Apr 2023 11:12:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229955AbjDLLM5 (ORCPT ); Wed, 12 Apr 2023 07:12:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36552 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229555AbjDLLM4 (ORCPT ); Wed, 12 Apr 2023 07:12:56 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B6B7F7DA0 for ; Wed, 12 Apr 2023 04:11:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297879; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UlTMnX+6hCLFJK8VcYvqjXNsov60PvDnXkXlkAVoTcg=; b=U9iJUMdAak0jTwU4ZE75b2a7/qgPLjIBCymWhULD5ZO/SOq84SqCfx9oBsnnLAQQhDX4ru b/Py6kwR96jJaIjuMUgKD1RPIAQJHMdZXT62rP0MNSQ1FVWUqkv31Ff0abtI7RlqOhRqZr 2DkYpiShHEIp6kwmPNHqTKN5uJavB/Y= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-609-AEHJ3f7YMSm4oFQ0_L2qkA-1; Wed, 12 Apr 2023 07:11:15 -0400 X-MC-Unique: AEHJ3f7YMSm4oFQ0_L2qkA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 14CDB1C08795; Wed, 12 Apr 2023 11:11:15 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3541EC15BB8; Wed, 12 Apr 2023 11:11:10 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 18/71] ceph: make the ioctl cmd more readable in debug log Date: Wed, 12 Apr 2023 19:08:37 +0800 Message-Id: <20230412110930.176835-19-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li ioctl file 0000000004e6b054 cmd 2148296211 arg 824635143532 The numerical cmd valye in the ioctl debug log message is too hard to understand even when you look at it in the code. Make it more readable. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/ioctl.c | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c index 2a5c48107026..7f0e181d4329 100644 --- a/fs/ceph/ioctl.c +++ b/fs/ceph/ioctl.c @@ -313,11 +313,48 @@ static long ceph_set_encryption_policy(struct file *file, unsigned long arg) return ret; } +static const char *ceph_ioctl_cmd_name(const unsigned int cmd) +{ + switch (cmd) { + case CEPH_IOC_GET_LAYOUT: + return "get_layout"; + case CEPH_IOC_SET_LAYOUT: + return "set_layout"; + case CEPH_IOC_SET_LAYOUT_POLICY: + return "set_layout_policy"; + case CEPH_IOC_GET_DATALOC: + return "get_dataloc"; + case CEPH_IOC_LAZYIO: + return "lazyio"; + case CEPH_IOC_SYNCIO: + return "syncio"; + case FS_IOC_SET_ENCRYPTION_POLICY: + return "set encryption_policy"; + case FS_IOC_GET_ENCRYPTION_POLICY: + return "get_encryption_policy"; + case FS_IOC_GET_ENCRYPTION_POLICY_EX: + return "get_encryption_policy_ex"; + case FS_IOC_ADD_ENCRYPTION_KEY: + return "add_encryption_key"; + case FS_IOC_REMOVE_ENCRYPTION_KEY: + return "remove_encryption_key"; + case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS: + return "remove_encryption_key_all_users"; + case FS_IOC_GET_ENCRYPTION_KEY_STATUS: + return "get_encryption_key_status"; + case FS_IOC_GET_ENCRYPTION_NONCE: + return "get_encryption_nonce"; + default: + return "unknown"; + } +} + long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { int ret; - dout("ioctl file %p cmd %u arg %lu\n", file, cmd, arg); + dout("ioctl file %p cmd %s arg %lu\n", file, + ceph_ioctl_cmd_name(cmd), arg); switch (cmd) { case CEPH_IOC_GET_LAYOUT: return ceph_ioctl_get_layout(file, (void __user *)arg); From patchwork Wed Apr 12 11:08:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208812 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7277C7619A for ; Wed, 12 Apr 2023 11:12:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229884AbjDLLM6 (ORCPT ); Wed, 12 Apr 2023 07:12:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229971AbjDLLM4 (ORCPT ); Wed, 12 Apr 2023 07:12:56 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 021C46592 for ; Wed, 12 Apr 2023 04:11:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297883; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LHS9cNWtOWSOEauG1jdTIaZn1BpR3ObUlXsHzIxwOAo=; b=cpU+82bQR4/CfvFpkdDavfQzkINvMBag8wPbE8q2UGMt7A8K9Vy+jMKJoT4u0JXfXqiQCq TN3v6naMi5HY/RGDZD7jeO7CblQ3CRpONYf3DYPRJ1UnkV4/fX8oX0f4DMMh/o5AFFVC71 055uaO3xSePnrYpzF38+KOOxqSEcXFo= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-594-pGiDIdw0PsKNe3qOfCBNyQ-1; Wed, 12 Apr 2023 07:11:20 -0400 X-MC-Unique: pGiDIdw0PsKNe3qOfCBNyQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E34651C08795; Wed, 12 Apr 2023 11:11:19 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id EBE53C15BBA; Wed, 12 Apr 2023 11:11:15 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 19/71] ceph: add base64 endcoding routines for encrypted names Date: Wed, 12 Apr 2023 19:08:38 +0800 Message-Id: <20230412110930.176835-20-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques The base64url encoding used by fscrypt includes the '_' character, which may cause problems in snapshot names (if the name starts with '_'). Thus, use the base64 encoding defined for IMAP mailbox names (RFC 3501), which uses '+' and ',' instead of '-' and '_'. Tested-by: Luís Henriques Tested-by: Venky Shankar Signed-off-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/crypto.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/ceph/crypto.h | 32 ++++++++++++++++++++++++++ 2 files changed, 92 insertions(+) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index fd3192917e8d..947ac98119aa 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -1,4 +1,11 @@ // SPDX-License-Identifier: GPL-2.0 +/* + * The base64 encode/decode code was copied from fscrypt: + * Copyright (C) 2015, Google, Inc. + * Copyright (C) 2015, Motorola Mobility + * Written by Uday Savagaonkar, 2014. + * Modified by Jaegeuk Kim, 2015. + */ #include #include #include @@ -7,6 +14,59 @@ #include "mds_client.h" #include "crypto.h" +/* + * The base64url encoding used by fscrypt includes the '_' character, which may + * cause problems in snapshot names (which can not starts with '_'). Thus, we + * used the base64 encoding defined for IMAP mailbox names (RFC 3501) instead, + * which replaces '-' and '_' by '+' and ','. + */ +static const char base64_table[65] = + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,"; + +int ceph_base64_encode(const u8 *src, int srclen, char *dst) +{ + u32 ac = 0; + int bits = 0; + int i; + char *cp = dst; + + for (i = 0; i < srclen; i++) { + ac = (ac << 8) | src[i]; + bits += 8; + do { + bits -= 6; + *cp++ = base64_table[(ac >> bits) & 0x3f]; + } while (bits >= 6); + } + if (bits) + *cp++ = base64_table[(ac << (6 - bits)) & 0x3f]; + return cp - dst; +} + +int ceph_base64_decode(const char *src, int srclen, u8 *dst) +{ + u32 ac = 0; + int bits = 0; + int i; + u8 *bp = dst; + + for (i = 0; i < srclen; i++) { + const char *p = strchr(base64_table, src[i]); + + if (p == NULL || src[i] == 0) + return -1; + ac = (ac << 6) | (p - base64_table); + bits += 6; + if (bits >= 8) { + bits -= 8; + *bp++ = (u8)(ac >> bits); + } + } + if (ac & ((1 << bits) - 1)) + return -1; + return bp - dst; +} + static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len) { struct ceph_inode_info *ci = ceph_inode(inode); diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index cb00fe42d5b7..f5d38d8a1995 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -27,6 +27,38 @@ static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa) } #ifdef CONFIG_FS_ENCRYPTION +/* + * We want to encrypt filenames when creating them, but the encrypted + * versions of those names may have illegal characters in them. To mitigate + * that, we base64 encode them, but that gives us a result that can exceed + * NAME_MAX. + * + * Follow a similar scheme to fscrypt itself, and cap the filename to a + * smaller size. If the ciphertext name is longer than the value below, then + * sha256 hash the remaining bytes. + * + * For the fscrypt_nokey_name struct the dirhash[2] member is useless in ceph + * so the corresponding struct will be: + * + * struct fscrypt_ceph_nokey_name { + * u8 bytes[157]; + * u8 sha256[SHA256_DIGEST_SIZE]; + * }; // 180 bytes => 240 bytes base64-encoded, which is <= NAME_MAX (255) + * + * (240 bytes is the maximum size allowed for snapshot names to take into + * account the format: '__'.) + * + * Note that for long names that end up having their tail portion hashed, we + * must also store the full encrypted name (in the dentry's alternate_name + * field). + */ +#define CEPH_NOHASH_NAME_MAX (180 - SHA256_DIGEST_SIZE) + +#define CEPH_BASE64_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3) + +int ceph_base64_encode(const u8 *src, int srclen, char *dst); +int ceph_base64_decode(const char *src, int srclen, u8 *dst); + void ceph_fscrypt_set_ops(struct super_block *sb); void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc); From patchwork Wed Apr 12 11:08:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208813 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B619C77B71 for ; Wed, 12 Apr 2023 11:13:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229985AbjDLLM7 (ORCPT ); Wed, 12 Apr 2023 07:12:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36692 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229965AbjDLLM5 (ORCPT ); Wed, 12 Apr 2023 07:12:57 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B61BF6A68 for ; Wed, 12 Apr 2023 04:11:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297888; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SWhckWNoxdbZejAFuEdA07DEsv9demG8LIKzx79ew/o=; b=BhhbHmhdsL5bfgIy60R9Uuuv6HNfxjUrFv3SJV+CoNgrf16E3HP9epdYidhrGfDy9ZQdd+ MG1blToymPpkN3p7gMc3FF+hUQJZr0qecmkZ3t4xDia75L2xmzsA1A8nyB1Sw2nFzKOAyN /afyW4EMDn+Z8/54qCF8tZYWgvr81s0= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-617-3SCzA1GeMMe8cg8kQWS-EQ-1; Wed, 12 Apr 2023 07:11:24 -0400 X-MC-Unique: 3SCzA1GeMMe8cg8kQWS-EQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 68C0F101A54F; Wed, 12 Apr 2023 11:11:24 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id B1D04C15BB8; Wed, 12 Apr 2023 11:11:20 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 20/71] ceph: add encrypted fname handling to ceph_mdsc_build_path Date: Wed, 12 Apr 2023 19:08:39 +0800 Message-Id: <20230412110930.176835-21-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Allow ceph_mdsc_build_path to encrypt and base64 encode the filename when the parent is encrypted and we're sending the path to the MDS. In most cases, we just encrypt the filenames and base64 encode them, but when the name is longer than CEPH_NOHASH_NAME_MAX, we use a similar scheme to fscrypt proper, and hash the remaning bits with sha256. When doing this, we then send along the full crypttext of the name in the new alternate_name field of the MClientRequest. The MDS can then send that along in readdir responses and traces. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/crypto.c | 47 ++++++++++++++++++++++++++ fs/ceph/crypto.h | 8 +++++ fs/ceph/mds_client.c | 80 ++++++++++++++++++++++++++++++++++---------- 3 files changed, 117 insertions(+), 18 deletions(-) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index 947ac98119aa..e54ea4c3e478 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -187,3 +187,50 @@ void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_se { swap(req->r_fscrypt_auth, as->fscrypt_auth); } + +int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf) +{ + u32 len; + int elen; + int ret; + u8 *cryptbuf; + + WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)); + + /* + * Convert cleartext d_name to ciphertext. If result is longer than + * CEPH_NOHASH_NAME_MAX, sha256 the remaining bytes + * + * See: fscrypt_setup_filename + */ + if (!fscrypt_fname_encrypted_size(parent, dentry->d_name.len, NAME_MAX, &len)) + return -ENAMETOOLONG; + + /* Allocate a buffer appropriate to hold the result */ + cryptbuf = kmalloc(len > CEPH_NOHASH_NAME_MAX ? NAME_MAX : len, GFP_KERNEL); + if (!cryptbuf) + return -ENOMEM; + + ret = fscrypt_fname_encrypt(parent, &dentry->d_name, cryptbuf, len); + if (ret) { + kfree(cryptbuf); + return ret; + } + + /* hash the end if the name is long enough */ + if (len > CEPH_NOHASH_NAME_MAX) { + u8 hash[SHA256_DIGEST_SIZE]; + u8 *extra = cryptbuf + CEPH_NOHASH_NAME_MAX; + + /* hash the extra bytes and overwrite crypttext beyond that point with it */ + sha256(extra, len - CEPH_NOHASH_NAME_MAX, hash); + memcpy(extra, hash, SHA256_DIGEST_SIZE); + len = CEPH_NOHASH_NAME_MAX + SHA256_DIGEST_SIZE; + } + + /* base64 encode the encrypted name */ + elen = ceph_base64_encode(cryptbuf, len, buf); + kfree(cryptbuf); + dout("base64-encoded ciphertext name = %.*s\n", elen, buf); + return elen; +} diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index f5d38d8a1995..8abf10604722 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -6,6 +6,7 @@ #ifndef _CEPH_CRYPTO_H #define _CEPH_CRYPTO_H +#include #include struct ceph_fs_client; @@ -66,6 +67,7 @@ void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc); int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode, struct ceph_acl_sec_ctx *as); void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as); +int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf); #else /* CONFIG_FS_ENCRYPTION */ @@ -89,6 +91,12 @@ static inline void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as_ctx) { } + +static inline int ceph_encode_encrypted_fname(const struct inode *parent, + struct dentry *dentry, char *buf) +{ + return -EOPNOTSUPP; +} #endif /* CONFIG_FS_ENCRYPTION */ #endif diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index d371e8bfe19d..a5e5349aac38 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -14,6 +14,7 @@ #include #include "super.h" +#include "crypto.h" #include "mds_client.h" #include "crypto.h" @@ -2469,18 +2470,27 @@ static inline u64 __get_oldest_tid(struct ceph_mds_client *mdsc) return mdsc->oldest_tid; } -/* - * Build a dentry's path. Allocate on heap; caller must kfree. Based - * on build_path_from_dentry in fs/cifs/dir.c. +/** + * ceph_mdsc_build_path - build a path string to a given dentry + * @dentry: dentry to which path should be built + * @plen: returned length of string + * @pbase: returned base inode number + * @for_wire: is this path going to be sent to the MDS? + * + * Build a string that represents the path to the dentry. This is mostly called + * for two different purposes: * - * If @stop_on_nosnap, generate path relative to the first non-snapped - * inode. + * 1) we need to build a path string to send to the MDS (for_wire == true) + * 2) we need a path string for local presentation (e.g. debugfs) (for_wire == false) + * + * The path is built in reverse, starting with the dentry. Walk back up toward + * the root, building the path until the first non-snapped inode is reached (for_wire) + * or the root inode is reached (!for_wire). * * Encode hidden .snap dirs as a double /, i.e. * foo/.snap/bar -> foo//bar */ -char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, - int stop_on_nosnap) +char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, int for_wire) { struct dentry *cur; struct inode *inode; @@ -2502,30 +2512,65 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, seq = read_seqbegin(&rename_lock); cur = dget(dentry); for (;;) { - struct dentry *temp; + struct dentry *parent; spin_lock(&cur->d_lock); inode = d_inode(cur); if (inode && ceph_snap(inode) == CEPH_SNAPDIR) { dout("build_path path+%d: %p SNAPDIR\n", pos, cur); - } else if (stop_on_nosnap && inode && dentry != cur && - ceph_snap(inode) == CEPH_NOSNAP) { + spin_unlock(&cur->d_lock); + parent = dget_parent(cur); + } else if (for_wire && inode && dentry != cur && ceph_snap(inode) == CEPH_NOSNAP) { spin_unlock(&cur->d_lock); pos++; /* get rid of any prepended '/' */ break; - } else { + } else if (!for_wire || !IS_ENCRYPTED(d_inode(cur->d_parent))) { pos -= cur->d_name.len; if (pos < 0) { spin_unlock(&cur->d_lock); break; } memcpy(path + pos, cur->d_name.name, cur->d_name.len); + spin_unlock(&cur->d_lock); + parent = dget_parent(cur); + } else { + int len, ret; + char buf[NAME_MAX]; + + /* + * Proactively copy name into buf, in case we need to present + * it as-is. + */ + memcpy(buf, cur->d_name.name, cur->d_name.len); + len = cur->d_name.len; + spin_unlock(&cur->d_lock); + parent = dget_parent(cur); + + ret = __fscrypt_prepare_readdir(d_inode(parent)); + if (ret < 0) { + dput(parent); + dput(cur); + return ERR_PTR(ret); + } + + if (fscrypt_has_encryption_key(d_inode(parent))) { + len = ceph_encode_encrypted_fname(d_inode(parent), cur, buf); + if (len < 0) { + dput(parent); + dput(cur); + return ERR_PTR(len); + } + } + pos -= len; + if (pos < 0) { + dput(parent); + break; + } + memcpy(path + pos, buf, len); } - temp = cur; - spin_unlock(&temp->d_lock); - cur = dget_parent(temp); - dput(temp); + dput(cur); + cur = parent; /* Are we at the root? */ if (IS_ROOT(cur)) @@ -2549,8 +2594,7 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, * A rename didn't occur, but somehow we didn't end up where * we thought we would. Throw a warning and try again. */ - pr_warn("build_path did not end path lookup where " - "expected, pos is %d\n", pos); + pr_warn("build_path did not end path lookup where expected (pos = %d)\n", pos); goto retry; } @@ -2570,7 +2614,7 @@ static int build_dentry_path(struct dentry *dentry, struct inode *dir, rcu_read_lock(); if (!dir) dir = d_inode_rcu(dentry->d_parent); - if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP) { + if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP && !IS_ENCRYPTED(dir)) { *pino = ceph_ino(dir); rcu_read_unlock(); *ppath = dentry->d_name.name; From patchwork Wed Apr 12 11:08:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208814 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A43EC7619A for ; Wed, 12 Apr 2023 11:13:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229585AbjDLLNL (ORCPT ); Wed, 12 Apr 2023 07:13:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36634 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230045AbjDLLNH (ORCPT ); Wed, 12 Apr 2023 07:13:07 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 11CE94202 for ; Wed, 12 Apr 2023 04:12:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297893; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zs2lQcuqaw6h/nFNoZJ7131ly8DAwuryOj091vKsJEs=; b=YsBZ6v9KGacw3g/T2Acfz9Ec8J6ObvahrGL+Cv3wXzVsO2caCj7OIwltwzQc1n32JN9QpI 5LWPdsJ3HVuicwWicrkwugBlIbpM9dC6HcXXOZNf3QDJXa9L3Xu2MI2dIn8eM2sxTfWkvb lv1idPN4/wkufcevojEKERn7xH14oms= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-251-QOzeNDHLOBW5ET77HFBQug-1; Wed, 12 Apr 2023 07:11:29 -0400 X-MC-Unique: QOzeNDHLOBW5ET77HFBQug-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id BFE591C08795; Wed, 12 Apr 2023 11:11:28 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 313A5C15BB8; Wed, 12 Apr 2023 11:11:24 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 21/71] ceph: send altname in MClientRequest Date: Wed, 12 Apr 2023 19:08:40 +0800 Message-Id: <20230412110930.176835-22-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton In the event that we have a filename longer than CEPH_NOHASH_NAME_MAX, we'll need to hash the tail of the filename. The client however will still need to know the full name of the file if it has a key. To support this, the MClientRequest field has grown a new alternate_name field that we populate with the full (binary) crypttext of the filename. This is then transmitted to the clients in readdir or traces as part of the dentry lease. Add support for populating this field when the filenames are very long. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/mds_client.c | 75 +++++++++++++++++++++++++++++++++++++++++--- fs/ceph/mds_client.h | 3 ++ 2 files changed, 73 insertions(+), 5 deletions(-) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index a5e5349aac38..579faa2c76a1 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1048,6 +1048,7 @@ void ceph_mdsc_release_request(struct kref *kref) if (req->r_pagelist) ceph_pagelist_release(req->r_pagelist); kfree(req->r_fscrypt_auth); + kfree(req->r_altname); put_request_session(req); ceph_unreserve_caps(req->r_mdsc, &req->r_caps_reservation); WARN_ON_ONCE(!list_empty(&req->r_wait)); @@ -2470,6 +2471,63 @@ static inline u64 __get_oldest_tid(struct ceph_mds_client *mdsc) return mdsc->oldest_tid; } +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) +static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen) +{ + struct inode *dir = req->r_parent; + struct dentry *dentry = req->r_dentry; + u8 *cryptbuf = NULL; + u32 len = 0; + int ret = 0; + + /* only encode if we have parent and dentry */ + if (!dir || !dentry) + goto success; + + /* No-op unless this is encrypted */ + if (!IS_ENCRYPTED(dir)) + goto success; + + ret = __fscrypt_prepare_readdir(dir); + if (ret) + return ERR_PTR(ret); + + /* No key? Just ignore it. */ + if (!fscrypt_has_encryption_key(dir)) + goto success; + + if (!fscrypt_fname_encrypted_size(dir, dentry->d_name.len, NAME_MAX, &len)) { + WARN_ON_ONCE(1); + return ERR_PTR(-ENAMETOOLONG); + } + + /* No need to append altname if name is short enough */ + if (len <= CEPH_NOHASH_NAME_MAX) { + len = 0; + goto success; + } + + cryptbuf = kmalloc(len, GFP_KERNEL); + if (!cryptbuf) + return ERR_PTR(-ENOMEM); + + ret = fscrypt_fname_encrypt(dir, &dentry->d_name, cryptbuf, len); + if (ret) { + kfree(cryptbuf); + return ERR_PTR(ret); + } +success: + *plen = len; + return cryptbuf; +} +#else +static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen) +{ + *plen = 0; + return NULL; +} +#endif + /** * ceph_mdsc_build_path - build a path string to a given dentry * @dentry: dentry to which path should be built @@ -2690,14 +2748,15 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request * ceph_encode_timespec64(&ts, &req->r_stamp); ceph_encode_copy(p, &ts, sizeof(ts)); - /* gid_list */ + /* v4: gid_list */ ceph_encode_32(p, req->r_cred->group_info->ngroups); for (i = 0; i < req->r_cred->group_info->ngroups; i++) ceph_encode_64(p, from_kgid(&init_user_ns, req->r_cred->group_info->gid[i])); - /* v5: altname (TODO: skip for now) */ - ceph_encode_32(p, 0); + /* v5: altname */ + ceph_encode_32(p, req->r_altname_len); + ceph_encode_copy(p, req->r_altname, req->r_altname_len); /* v6: fscrypt_auth and fscrypt_file */ if (req->r_fscrypt_auth) { @@ -2753,7 +2812,13 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, goto out_free1; } - /* head */ + req->r_altname = get_fscrypt_altname(req, &req->r_altname_len); + if (IS_ERR(req->r_altname)) { + msg = ERR_CAST(req->r_altname); + req->r_altname = NULL; + goto out_free2; + } + len = legacy ? sizeof(*head) : sizeof(struct ceph_mds_request_head); /* filepaths */ @@ -2779,7 +2844,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups); /* alternate name */ - len += sizeof(u32); // TODO + len += sizeof(u32) + req->r_altname_len; /* fscrypt_auth */ len += sizeof(u32); // fscrypt_auth diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index b3e3fcefef4a..e7af8090b760 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -290,6 +290,9 @@ struct ceph_mds_request { struct ceph_fscrypt_auth *r_fscrypt_auth; + u8 *r_altname; /* fscrypt binary crypttext for long filenames */ + u32 r_altname_len; /* length of r_altname */ + int r_fmode; /* file mode, if expecting cap */ int r_request_release_offset; const struct cred *r_cred; From patchwork Wed Apr 12 11:08:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208818 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EEDFAC7619A for ; Wed, 12 Apr 2023 11:13:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230008AbjDLLNp (ORCPT ); Wed, 12 Apr 2023 07:13:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229897AbjDLLNm (ORCPT ); Wed, 12 Apr 2023 07:13:42 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 951297EDD for ; Wed, 12 Apr 2023 04:12:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297897; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9CBK3NJsr4QCz/I6uuja9hdI3lMYCI5uL7GJdgN0VpY=; b=hyUkjUyP7P1ztaz5brIR1f9st3yJqFDaPpH6Gd4gtJJdmFw2GCrM5fAbbahcvBDZEYM5yy 3D91ujOMjMMtTWlirAKEwCaUe6shs8qIacKkX5aZSFToWT2QEBBHut1Q2b6VngHZX6yHa3 wlwwiomojc3mqiOKBSFn9XQEqqh2tSk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-484-3_Iz8DflOROLRkTpkeTA5A-1; Wed, 12 Apr 2023 07:11:33 -0400 X-MC-Unique: 3_Iz8DflOROLRkTpkeTA5A-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3A59388B7A8; Wed, 12 Apr 2023 11:11:33 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 978F9C15BB8; Wed, 12 Apr 2023 11:11:29 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 22/71] ceph: encode encrypted name in dentry release Date: Wed, 12 Apr 2023 19:08:41 +0800 Message-Id: <20230412110930.176835-23-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Encode encrypted dentry names when sending a dentry release request. Also add a more helpful comment over ceph_encode_dentry_release. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/caps.c | 32 ++++++++++++++++++++++++++++---- fs/ceph/mds_client.c | 20 ++++++++++++++++---- 2 files changed, 44 insertions(+), 8 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 3cca23394769..7fb1003bdc7e 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -4635,6 +4635,18 @@ int ceph_encode_inode_release(void **p, struct inode *inode, return ret; } +/** + * ceph_encode_dentry_release - encode a dentry release into an outgoing request + * @p: outgoing request buffer + * @dentry: dentry to release + * @dir: dir to release it from + * @mds: mds that we're speaking to + * @drop: caps being dropped + * @unless: unless we have these caps + * + * Encode a dentry release into an outgoing request buffer. Returns 1 if the + * thing was released, or a negative error code otherwise. + */ int ceph_encode_dentry_release(void **p, struct dentry *dentry, struct inode *dir, int mds, int drop, int unless) @@ -4667,13 +4679,25 @@ int ceph_encode_dentry_release(void **p, struct dentry *dentry, if (ret && di->lease_session && di->lease_session->s_mds == mds) { dout("encode_dentry_release %p mds%d seq %d\n", dentry, mds, (int)di->lease_seq); - rel->dname_len = cpu_to_le32(dentry->d_name.len); - memcpy(*p, dentry->d_name.name, dentry->d_name.len); - *p += dentry->d_name.len; rel->dname_seq = cpu_to_le32(di->lease_seq); __ceph_mdsc_drop_dentry_lease(dentry); + spin_unlock(&dentry->d_lock); + if (IS_ENCRYPTED(dir) && fscrypt_has_encryption_key(dir)) { + int ret2 = ceph_encode_encrypted_fname(dir, dentry, *p); + + if (ret2 < 0) + return ret2; + + rel->dname_len = cpu_to_le32(ret2); + *p += ret2; + } else { + rel->dname_len = cpu_to_le32(dentry->d_name.len); + memcpy(*p, dentry->d_name.name, dentry->d_name.len); + *p += dentry->d_name.len; + } + } else { + spin_unlock(&dentry->d_lock); } - spin_unlock(&dentry->d_lock); return ret; } diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 579faa2c76a1..1296ee127f3c 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -2903,15 +2903,23 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, req->r_inode ? req->r_inode : d_inode(req->r_dentry), mds, req->r_inode_drop, req->r_inode_unless, req->r_op == CEPH_MDS_OP_READDIR); - if (req->r_dentry_drop) - releases += ceph_encode_dentry_release(&p, req->r_dentry, + if (req->r_dentry_drop) { + ret = ceph_encode_dentry_release(&p, req->r_dentry, req->r_parent, mds, req->r_dentry_drop, req->r_dentry_unless); - if (req->r_old_dentry_drop) - releases += ceph_encode_dentry_release(&p, req->r_old_dentry, + if (ret < 0) + goto out_err; + releases += ret; + } + if (req->r_old_dentry_drop) { + ret = ceph_encode_dentry_release(&p, req->r_old_dentry, req->r_old_dentry_dir, mds, req->r_old_dentry_drop, req->r_old_dentry_unless); + if (ret < 0) + goto out_err; + releases += ret; + } if (req->r_old_inode_drop) releases += ceph_encode_inode_release(&p, d_inode(req->r_old_dentry), @@ -2953,6 +2961,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, ceph_mdsc_free_path((char *)path1, pathlen1); out: return msg; +out_err: + ceph_msg_put(msg); + msg = ERR_PTR(ret); + goto out_free2; } /* From patchwork Wed Apr 12 11:08:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208833 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3AAC6C77B6E for ; Wed, 12 Apr 2023 11:14:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229863AbjDLLOZ (ORCPT ); Wed, 12 Apr 2023 07:14:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229928AbjDLLOV (ORCPT ); Wed, 12 Apr 2023 07:14:21 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E52E36A5D for ; Wed, 12 Apr 2023 04:13:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297901; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kK7byRy8KW8dSWT6M3MaOK6w0U2My2Ytl28gWkfO0Mk=; b=QyvAeMa1tkRJ+R8qXzb3F3+XNgY1LaYPF7vIKpNKPGU+FpxFW5VdP1XG3IdimVxMJ2nvM8 v7qgm5sAxmOpXBIb5Hd9f/uaVxf2RdZCx27AMmi5E4g7y1BPork5MdMZWMyXwCR6wY18HU EJSkTiLAYZbNw9nrWuga8/Am+kBHjH0= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-453-Qx_Pqy08Mv-L-8ZqXk-Axw-1; Wed, 12 Apr 2023 07:11:38 -0400 X-MC-Unique: Qx_Pqy08Mv-L-8ZqXk-Axw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E2E403C0F362; Wed, 12 Apr 2023 11:11:37 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 05677C15BB8; Wed, 12 Apr 2023 11:11:33 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 23/71] ceph: properly set DCACHE_NOKEY_NAME flag in lookup Date: Wed, 12 Apr 2023 19:08:42 +0800 Message-Id: <20230412110930.176835-24-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton This is required so that we know to invalidate these dentries when the directory is unlocked. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/dir.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index ccbbd39a5eb9..f5e04f63c8e6 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -760,6 +760,17 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry, if (dentry->d_name.len > NAME_MAX) return ERR_PTR(-ENAMETOOLONG); + if (IS_ENCRYPTED(dir)) { + err = __fscrypt_prepare_readdir(dir); + if (err) + return ERR_PTR(err); + if (!fscrypt_has_encryption_key(dir)) { + spin_lock(&dentry->d_lock); + dentry->d_flags |= DCACHE_NOKEY_NAME; + spin_unlock(&dentry->d_lock); + } + } + /* can we conclude ENOENT locally? */ if (d_really_is_negative(dentry)) { struct ceph_inode_info *ci = ceph_inode(dir); From patchwork Wed Apr 12 11:08:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208816 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3FF4C77B6E for ; Wed, 12 Apr 2023 11:13:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230039AbjDLLNi (ORCPT ); Wed, 12 Apr 2023 07:13:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37242 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230034AbjDLLNe (ORCPT ); Wed, 12 Apr 2023 07:13:34 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CAF4F7EE1 for ; Wed, 12 Apr 2023 04:12:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297906; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FOeK7G/UA0IGFaRqADqcTHDnh6atk2Z1+7QovQPsKBw=; b=bswjzjPffRZpzCLy6WRG18BlIW4Fhv+u2j5uO5gTNGg5Mk3gABrapcre+RVNUgJkjbfvtU ZbKH6ur1hhul0djpwQwbUyi1A44npRwzWF72SfnySgYF7J3SBNzMtx/0wIt9EKJpjvyAAZ bfa1ZPp0YYBmOyB3j3I0EntTTov6nfA= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-298-GAcWooisMNiMaHtAhQqL5Q-1; Wed, 12 Apr 2023 07:11:43 -0400 X-MC-Unique: GAcWooisMNiMaHtAhQqL5Q-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C6D513C0F37F; Wed, 12 Apr 2023 11:11:42 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id E486FC15BB8; Wed, 12 Apr 2023 11:11:38 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 24/71] ceph: set DCACHE_NOKEY_NAME in atomic open Date: Wed, 12 Apr 2023 19:08:43 +0800 Message-Id: <20230412110930.176835-25-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Atomic open can act as a lookup if handed a dentry that is negative on the MDS. Ensure that we set DCACHE_NOKEY_NAME on the dentry in atomic_open, if we don't have the key for the parent. Otherwise, we can end up validating the dentry inappropriately if someone later adds a key. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/file.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index e4dadad50f9b..1d63537be61c 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -784,6 +784,13 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, req->r_args.open.mask = cpu_to_le32(mask); req->r_parent = dir; ihold(dir); + if (IS_ENCRYPTED(dir)) { + if (!fscrypt_has_encryption_key(dir)) { + spin_lock(&dentry->d_lock); + dentry->d_flags |= DCACHE_NOKEY_NAME; + spin_unlock(&dentry->d_lock); + } + } if (flags & O_CREAT) { struct ceph_file_layout lo; From patchwork Wed Apr 12 11:08:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208817 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B97A4C77B6E for ; Wed, 12 Apr 2023 11:13:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230006AbjDLLNo (ORCPT ); Wed, 12 Apr 2023 07:13:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37270 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229810AbjDLLNm (ORCPT ); Wed, 12 Apr 2023 07:13:42 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9ABD97D8B for ; Wed, 12 Apr 2023 04:12:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297911; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=t8OjJkEljpkYorAeFRAa2jBe4Al6iSVdzCinyxqzKEU=; b=Pi+tg0AHsTmVO2q2mgjgx91/Q8vd3aGqtG8u39zlO7RiB0i6BWoB2JIcJdlYZ+0i3LHQii chkb0Vn58eGlGphCRSz68c3F2d7waTTkdm799377O4Joj+Ho0Ggiun2MOu/U0aU22zVQcG uIBMp/EjKA2eiQhljgKZ1QR7+C3RcIM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-542-LD9cAJw_NeW8PztDEz--HA-1; Wed, 12 Apr 2023 07:11:47 -0400 X-MC-Unique: LD9cAJw_NeW8PztDEz--HA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5A8B9800B23; Wed, 12 Apr 2023 11:11:47 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8EC10C15BB8; Wed, 12 Apr 2023 11:11:43 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 25/71] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Date: Wed, 12 Apr 2023 19:08:44 +0800 Message-Id: <20230412110930.176835-26-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton If we have a dentry which represents a no-key name, then we need to test whether the parent directory's encryption key has since been added. Do that before we test anything else about the dentry. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/dir.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index f5e04f63c8e6..ffdeccb699ae 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) struct inode *dir, *inode; struct ceph_mds_client *mdsc; + valid = fscrypt_d_revalidate(dentry, flags); + if (valid <= 0) + return valid; + if (flags & LOOKUP_RCU) { parent = READ_ONCE(dentry->d_parent); dir = d_inode_rcu(parent); @@ -1782,8 +1786,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) inode = d_inode(dentry); } - dout("d_revalidate %p '%pd' inode %p offset 0x%llx\n", dentry, - dentry, inode, ceph_dentry(dentry)->offset); + dout("d_revalidate %p '%pd' inode %p offset 0x%llx nokey %d\n", dentry, + dentry, inode, ceph_dentry(dentry)->offset, !!(dentry->d_flags & DCACHE_NOKEY_NAME)); mdsc = ceph_sb_to_client(dir->i_sb)->mdsc; From patchwork Wed Apr 12 11:08:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208819 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 864E1C77B6E for ; Wed, 12 Apr 2023 11:13:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229971AbjDLLNr (ORCPT ); Wed, 12 Apr 2023 07:13:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229810AbjDLLNp (ORCPT ); Wed, 12 Apr 2023 07:13:45 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 445596E9A for ; Wed, 12 Apr 2023 04:12:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297915; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VahqIYWbeVWTl/tnPE8MOrQgib481Em0csjEISf1KFk=; b=W9nbsSKUQPbyi2XIczKC0/MIztV5CSQc7hf8Sm1feEMrpsBwCp3f+4CPPOHhm5wKLlsxbN F3nPigRd2Il5Pu75rolqQja22PhQ58xu3rX+FhbAlVYozxTTthchzcaSF4o1vyvlGdvY/v jub9RyRksy/nsPv1I1w89NHGqih0828= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-558-jIekjrFpNzGE_r-Qz2a93g-1; Wed, 12 Apr 2023 07:11:52 -0400 X-MC-Unique: jIekjrFpNzGE_r-Qz2a93g-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C41C7858F09; Wed, 12 Apr 2023 11:11:51 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 231E4C15BB8; Wed, 12 Apr 2023 11:11:47 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 26/71] ceph: add helpers for converting names for userland presentation Date: Wed, 12 Apr 2023 19:08:45 +0800 Message-Id: <20230412110930.176835-27-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Define a new ceph_fname struct that we can use to carry information about encrypted dentry names. Add helpers for working with these objects, including ceph_fname_to_usr which formats an encrypted filename for userland presentation. [ xiubli: s/FSCRYPT_BASE64URL_CHARS/CEPH_BASE64_CHARS/ reported by kernel test robot ] Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/crypto.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/ceph/crypto.h | 41 ++++++++++++++++++++++++++ 2 files changed, 117 insertions(+) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index e54ea4c3e478..03f4b9e73884 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -234,3 +234,79 @@ int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentr dout("base64-encoded ciphertext name = %.*s\n", elen, buf); return elen; } + +/** + * ceph_fname_to_usr - convert a filename for userland presentation + * @fname: ceph_fname to be converted + * @tname: temporary name buffer to use for conversion (may be NULL) + * @oname: where converted name should be placed + * @is_nokey: set to true if key wasn't available during conversion (may be NULL) + * + * Given a filename (usually from the MDS), format it for presentation to + * userland. If @parent is not encrypted, just pass it back as-is. + * + * Otherwise, base64 decode the string, and then ask fscrypt to format it + * for userland presentation. + * + * Returns 0 on success or negative error code on error. + */ +int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, + struct fscrypt_str *oname, bool *is_nokey) +{ + int ret; + struct fscrypt_str _tname = FSTR_INIT(NULL, 0); + struct fscrypt_str iname; + + if (!IS_ENCRYPTED(fname->dir)) { + oname->name = fname->name; + oname->len = fname->name_len; + return 0; + } + + /* Sanity check that the resulting name will fit in the buffer */ + if (fname->name_len > CEPH_BASE64_CHARS(NAME_MAX)) + return -EIO; + + ret = __fscrypt_prepare_readdir(fname->dir); + if (ret) + return ret; + + /* + * Use the raw dentry name as sent by the MDS instead of + * generating a nokey name via fscrypt. + */ + if (!fscrypt_has_encryption_key(fname->dir)) { + memcpy(oname->name, fname->name, fname->name_len); + oname->len = fname->name_len; + if (is_nokey) + *is_nokey = true; + return 0; + } + + if (fname->ctext_len == 0) { + int declen; + + if (!tname) { + ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname); + if (ret) + return ret; + tname = &_tname; + } + + declen = fscrypt_base64url_decode(fname->name, fname->name_len, tname->name); + if (declen <= 0) { + ret = -EIO; + goto out; + } + iname.name = tname->name; + iname.len = declen; + } else { + iname.name = fname->ctext; + iname.len = fname->ctext_len; + } + + ret = fscrypt_fname_disk_to_usr(fname->dir, 0, 0, &iname, oname); +out: + fscrypt_fname_free_buffer(&_tname); + return ret; +} diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index 8abf10604722..f231d6e3ba0f 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -13,6 +13,14 @@ struct ceph_fs_client; struct ceph_acl_sec_ctx; struct ceph_mds_request; +struct ceph_fname { + struct inode *dir; + char *name; // b64 encoded, possibly hashed + unsigned char *ctext; // binary crypttext (if any) + u32 name_len; // length of name buffer + u32 ctext_len; // length of crypttext +}; + struct ceph_fscrypt_auth { __le32 cfa_version; __le32 cfa_blob_len; @@ -69,6 +77,22 @@ int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode, void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as); int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf); +static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname) +{ + if (!IS_ENCRYPTED(parent)) + return 0; + return fscrypt_fname_alloc_buffer(NAME_MAX, fname); +} + +static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_str *fname) +{ + if (IS_ENCRYPTED(parent)) + fscrypt_fname_free_buffer(fname); +} + +int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, + struct fscrypt_str *oname, bool *is_nokey); + #else /* CONFIG_FS_ENCRYPTION */ static inline void ceph_fscrypt_set_ops(struct super_block *sb) @@ -97,6 +121,23 @@ static inline int ceph_encode_encrypted_fname(const struct inode *parent, { return -EOPNOTSUPP; } + +static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname) +{ + return 0; +} + +static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_str *fname) +{ +} + +static inline int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, + struct fscrypt_str *oname, bool *is_nokey) +{ + oname->name = fname->name; + oname->len = fname->name_len; + return 0; +} #endif /* CONFIG_FS_ENCRYPTION */ #endif From patchwork Wed Apr 12 11:08:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208835 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A68C3C7619A for ; Wed, 12 Apr 2023 11:14:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229988AbjDLLOm (ORCPT ); Wed, 12 Apr 2023 07:14:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38876 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229886AbjDLLOk (ORCPT ); Wed, 12 Apr 2023 07:14:40 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F7EF769F for ; Wed, 12 Apr 2023 04:13:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297920; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=k/Xf0LdqODJjnbZB4Bt6QPfxbVf1qHAj0bbk8++uDaA=; b=Ns3AdYUkQFz5YK/wzwkB4EqBB8irYC5apQTnwJQ9/SYlrEf9Z/UpOy9URdp1ZBmSRLOl9y VPDpimq0hfSHTaLJ3YFIJapFhPSB9bt0kB5pKJLye+q7+nEX4s5QMYvUQBUOItG+fMgt+j ugzPGhXI2B38DJHzQIXFrrif0MFRvT0= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-627-HPaQUZbkNbaHyYzkd3bnRg-1; Wed, 12 Apr 2023 07:11:57 -0400 X-MC-Unique: HPaQUZbkNbaHyYzkd3bnRg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5393D28237CA; Wed, 12 Apr 2023 11:11:57 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id B6098C15BB8; Wed, 12 Apr 2023 11:11:52 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 27/71] ceph: fix base64 encoded name's length check in ceph_fname_to_usr() Date: Wed, 12 Apr 2023 19:08:46 +0800 Message-Id: <20230412110930.176835-28-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li The fname->name is based64_encoded names and the max long shouldn't exceed the NAME_MAX. The FSCRYPT_BASE64URL_CHARS(NAME_MAX) will be 255 * 4 / 3. [ xiubli: s/FSCRYPT_BASE64URL_CHARS/CEPH_BASE64_CHARS/ reported by kernel test robot ] Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/crypto.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index 03f4b9e73884..2222f4968f74 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -264,7 +264,7 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, } /* Sanity check that the resulting name will fit in the buffer */ - if (fname->name_len > CEPH_BASE64_CHARS(NAME_MAX)) + if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX) return -EIO; ret = __fscrypt_prepare_readdir(fname->dir); From patchwork Wed Apr 12 11:08:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208820 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 899E9C77B6E for ; Wed, 12 Apr 2023 11:14:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230052AbjDLLOJ (ORCPT ); Wed, 12 Apr 2023 07:14:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229886AbjDLLOG (ORCPT ); Wed, 12 Apr 2023 07:14:06 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3610376A9 for ; Wed, 12 Apr 2023 04:13:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297927; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WMQ3u7g01qDSths6JY2KCp6r8JttFjC6BKuIl5Zkfs0=; b=CpPAoSdCWULz+gA7VaHynsxDmPyU/Yox65/rjoefB0MQ1gVLDZT5IlwvHYq2rtaGtg6+Sm H5LzupVI0tpRVB+evjrw3Sg9y73kJouwIP9+k5MyaHhTbIvzNou+tYZOnLYuwalOguH/lV N7WhqcvQhCLpxoEOJWNFFNFmCWQ9rsU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-588-wpFICgTbPw-UtHzQjkULfQ-1; Wed, 12 Apr 2023 07:12:04 -0400 X-MC-Unique: wpFICgTbPw-UtHzQjkULfQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AAB0C857FBD; Wed, 12 Apr 2023 11:12:03 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 53A6CC15BBA; Wed, 12 Apr 2023 11:11:57 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 28/71] ceph: add fscrypt support to ceph_fill_trace Date: Wed, 12 Apr 2023 19:08:47 +0800 Message-Id: <20230412110930.176835-29-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton When we get a dentry in a trace, decrypt the name so we can properly instantiate the dentry. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/inode.c | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 8b6b12a5a5e2..9a1d8ad9bf86 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -1405,8 +1405,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req) if (dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME && test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) && !test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) { + bool is_nokey = false; struct qstr dname; struct dentry *dn, *parent; + struct fscrypt_str oname = FSTR_INIT(NULL, 0); + struct ceph_fname fname = { .dir = dir, + .name = rinfo->dname, + .ctext = rinfo->altname, + .name_len = rinfo->dname_len, + .ctext_len = rinfo->altname_len }; BUG_ON(!rinfo->head->is_target); BUG_ON(req->r_dentry); @@ -1414,8 +1421,20 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req) parent = d_find_any_alias(dir); BUG_ON(!parent); - dname.name = rinfo->dname; - dname.len = rinfo->dname_len; + err = ceph_fname_alloc_buffer(dir, &oname); + if (err < 0) { + dput(parent); + goto done; + } + + err = ceph_fname_to_usr(&fname, NULL, &oname, &is_nokey); + if (err < 0) { + dput(parent); + ceph_fname_free_buffer(dir, &oname); + goto done; + } + dname.name = oname.name; + dname.len = oname.len; dname.hash = full_name_hash(parent, dname.name, dname.len); tvino.ino = le64_to_cpu(rinfo->targeti.in->ino); tvino.snap = le64_to_cpu(rinfo->targeti.in->snapid); @@ -1430,9 +1449,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req) dname.len, dname.name, dn); if (!dn) { dput(parent); + ceph_fname_free_buffer(dir, &oname); err = -ENOMEM; goto done; } + if (is_nokey) { + spin_lock(&dn->d_lock); + dn->d_flags |= DCACHE_NOKEY_NAME; + spin_unlock(&dn->d_lock); + } err = 0; } else if (d_really_is_positive(dn) && (ceph_ino(d_inode(dn)) != tvino.ino || @@ -1444,6 +1469,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req) dput(dn); goto retry_lookup; } + ceph_fname_free_buffer(dir, &oname); req->r_dentry = dn; dput(parent); From patchwork Wed Apr 12 11:08:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208832 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D5ACC7619A for ; Wed, 12 Apr 2023 11:14:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229759AbjDLLOX (ORCPT ); Wed, 12 Apr 2023 07:14:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229593AbjDLLOO (ORCPT ); Wed, 12 Apr 2023 07:14:14 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D5EB869A for ; Wed, 12 Apr 2023 04:13:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297931; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cxw3EeZh2k6HGqYnUqjsz7PrPx3elpaHNnXaSzZ2oXs=; b=KYpaFlq6cB/nlawD73k8QIWWE/4UFRO8GuTvFNzyRokyNjIm02C9Ox3EcqJgHM1QGWDvt2 PgclT7BN65OllOTl8rUJxGEutC3S7h176xtU6jfHJNqb5b9eXRqm9c9VA8tV0shNkLUxBv wootLxjBJ5/DTxUHL/Qc5xxOK60DxTM= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-610-zSejOBTjNhign7fsyVGTSQ-1; Wed, 12 Apr 2023 07:12:08 -0400 X-MC-Unique: zSejOBTjNhign7fsyVGTSQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4C9A13813F58; Wed, 12 Apr 2023 11:12:08 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7A92FC15BB8; Wed, 12 Apr 2023 11:12:04 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 29/71] ceph: pass the request to parse_reply_info_readdir() Date: Wed, 12 Apr 2023 19:08:48 +0800 Message-Id: <20230412110930.176835-30-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li Instead of passing just the r_reply_info to the readdir reply parser, pass the request pointer directly instead. This will facilitate implementing readdir on fscrypted directories. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/mds_client.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 1296ee127f3c..37e183d383b1 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -406,9 +406,10 @@ static int parse_reply_info_trace(void **p, void *end, * parse readdir results */ static int parse_reply_info_readdir(void **p, void *end, - struct ceph_mds_reply_info_parsed *info, - u64 features) + struct ceph_mds_request *req, + u64 features) { + struct ceph_mds_reply_info_parsed *info = &req->r_reply_info; u32 num, i = 0; int err; @@ -650,15 +651,16 @@ static int parse_reply_info_getvxattr(void **p, void *end, * parse extra results */ static int parse_reply_info_extra(void **p, void *end, - struct ceph_mds_reply_info_parsed *info, + struct ceph_mds_request *req, u64 features, struct ceph_mds_session *s) { + struct ceph_mds_reply_info_parsed *info = &req->r_reply_info; u32 op = le32_to_cpu(info->head->op); if (op == CEPH_MDS_OP_GETFILELOCK) return parse_reply_info_filelock(p, end, info, features); else if (op == CEPH_MDS_OP_READDIR || op == CEPH_MDS_OP_LSSNAP) - return parse_reply_info_readdir(p, end, info, features); + return parse_reply_info_readdir(p, end, req, features); else if (op == CEPH_MDS_OP_CREATE) return parse_reply_info_create(p, end, info, features, s); else if (op == CEPH_MDS_OP_GETVXATTR) @@ -671,9 +673,9 @@ static int parse_reply_info_extra(void **p, void *end, * parse entire mds reply */ static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg, - struct ceph_mds_reply_info_parsed *info, - u64 features) + struct ceph_mds_request *req, u64 features) { + struct ceph_mds_reply_info_parsed *info = &req->r_reply_info; void *p, *end; u32 len; int err; @@ -695,7 +697,7 @@ static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg, ceph_decode_32_safe(&p, end, len, bad); if (len > 0) { ceph_decode_need(&p, end, len, bad); - err = parse_reply_info_extra(&p, p+len, info, features, s); + err = parse_reply_info_extra(&p, p+len, req, features, s); if (err < 0) goto out_bad; } @@ -3598,14 +3600,14 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg) } dout("handle_reply tid %lld result %d\n", tid, result); - rinfo = &req->r_reply_info; if (test_bit(CEPHFS_FEATURE_REPLY_ENCODING, &session->s_features)) - err = parse_reply_info(session, msg, rinfo, (u64)-1); + err = parse_reply_info(session, msg, req, (u64)-1); else - err = parse_reply_info(session, msg, rinfo, session->s_con.peer_features); + err = parse_reply_info(session, msg, req, session->s_con.peer_features); mutex_unlock(&mdsc->mutex); /* Must find target inode outside of mutexes to avoid deadlocks */ + rinfo = &req->r_reply_info; if ((err >= 0) && rinfo->head->is_target) { struct inode *in = xchg(&req->r_new_inode, NULL); struct ceph_vino tvino = { From patchwork Wed Apr 12 11:08:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208831 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 342A3C77B6E for ; Wed, 12 Apr 2023 11:14:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230012AbjDLLOW (ORCPT ); Wed, 12 Apr 2023 07:14:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38034 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230071AbjDLLOL (ORCPT ); Wed, 12 Apr 2023 07:14:11 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BBA5D7EEA for ; Wed, 12 Apr 2023 04:13:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297934; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4PEAoAmsMrqjG/zsNSNYfaqUPJN5p349wjqQCgnwnWA=; b=W1WrDJY4wlMMDBmAp7CJKUcNSvSsvcQ2HuzxirPF86gkONVrlV2VrZFkyHfPz5eUIoawn8 KyG9c1k8FiKflVwDrEgkVWJh8B+MzutG6G9EPITt160GBEG6A6tzH1ZtfGaTE25pkNQQ+W lkiyMkK3Crwyk3wJjrymyggWQQCghrE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-152-TTpAk4MVMxKVHjuao8nP3Q-1; Wed, 12 Apr 2023 07:12:13 -0400 X-MC-Unique: TTpAk4MVMxKVHjuao8nP3Q-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C1D94811E7C; Wed, 12 Apr 2023 11:12:12 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 259E7C15BC0; Wed, 12 Apr 2023 11:12:08 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 30/71] ceph: add ceph_encode_encrypted_dname() helper Date: Wed, 12 Apr 2023 19:08:49 +0800 Message-Id: <20230412110930.176835-31-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li Add a new helper that basically calls ceph_encode_encrypted_fname, but with a qstr pointer instead of a dentry pointer. This will make it simpler to decrypt names in a readdir reply, before we have a dentry. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/crypto.c | 11 ++++++++--- fs/ceph/crypto.h | 8 ++++++++ 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index 2222f4968f74..ee0e0db2cf0e 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -188,7 +188,7 @@ void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_se swap(req->r_fscrypt_auth, as->fscrypt_auth); } -int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf) +int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, char *buf) { u32 len; int elen; @@ -203,7 +203,7 @@ int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentr * * See: fscrypt_setup_filename */ - if (!fscrypt_fname_encrypted_size(parent, dentry->d_name.len, NAME_MAX, &len)) + if (!fscrypt_fname_encrypted_size(parent, d_name->len, NAME_MAX, &len)) return -ENAMETOOLONG; /* Allocate a buffer appropriate to hold the result */ @@ -211,7 +211,7 @@ int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentr if (!cryptbuf) return -ENOMEM; - ret = fscrypt_fname_encrypt(parent, &dentry->d_name, cryptbuf, len); + ret = fscrypt_fname_encrypt(parent, d_name, cryptbuf, len); if (ret) { kfree(cryptbuf); return ret; @@ -235,6 +235,11 @@ int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentr return elen; } +int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf) +{ + return ceph_encode_encrypted_dname(parent, &dentry->d_name, buf); +} + /** * ceph_fname_to_usr - convert a filename for userland presentation * @fname: ceph_fname to be converted diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index f231d6e3ba0f..704e5a0e96f7 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -75,6 +75,7 @@ void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc); int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode, struct ceph_acl_sec_ctx *as); void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as); +int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, char *buf); int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf); static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname) @@ -116,6 +117,13 @@ static inline void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, { } +static inline int ceph_encode_encrypted_dname(const struct inode *parent, + struct qstr *d_name, char *buf) +{ + memcpy(buf, d_name->name, d_name->len); + return d_name->len; +} + static inline int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf) { From patchwork Wed Apr 12 11:08:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208834 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4071FC77B6E for ; Wed, 12 Apr 2023 11:14:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229783AbjDLLOd (ORCPT ); Wed, 12 Apr 2023 07:14:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229758AbjDLLOc (ORCPT ); Wed, 12 Apr 2023 07:14:32 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0A524231 for ; Wed, 12 Apr 2023 04:13:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297940; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AmRUUZE7ZbZE1kkVUfNHBnD8fTU9MNmdK4jJHhdSMpI=; b=YZTipY3oLVQ7gnqGqzUrg/ytgCa4YBiottwlUrjs7VavrHDQzouBsE/oMS6ccddXxaVnYR TJkp1LEWXyo+QQJ9yRSEUKHJJwfdfR99N/K73XHzQPfPRdoqCHEbLVE7VOwgFtzGIHNJIO XKlAki7uWFVJW+yKPEw7CLVLk19k5lk= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-195-LkmVlXqxPYG6s0CL3W0gFQ-1; Wed, 12 Apr 2023 07:12:19 -0400 X-MC-Unique: LkmVlXqxPYG6s0CL3W0gFQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6B65428237C7; Wed, 12 Apr 2023 11:12:18 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id AE181C15BB8; Wed, 12 Apr 2023 11:12:13 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 31/71] ceph: add support to readdir for encrypted filenames Date: Wed, 12 Apr 2023 19:08:50 +0800 Message-Id: <20230412110930.176835-32-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li Once we've decrypted the names in a readdir reply, we no longer need the crypttext, so overwrite them in ceph_mds_reply_dir_entry with the unencrypted names. Then in both ceph_readdir_prepopulate() and ceph_readdir() we will use the dencrypted name directly. [ jlayton: convert some BUG_ONs into error returns ] Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/crypto.c | 12 +++++-- fs/ceph/crypto.h | 1 + fs/ceph/dir.c | 35 +++++++++++++++---- fs/ceph/inode.c | 12 ++++--- fs/ceph/mds_client.c | 81 ++++++++++++++++++++++++++++++++++++++++---- fs/ceph/mds_client.h | 4 +-- 6 files changed, 124 insertions(+), 21 deletions(-) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index ee0e0db2cf0e..1803ec7a69a9 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -195,7 +195,10 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, int ret; u8 *cryptbuf; - WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)); + if (!fscrypt_has_encryption_key(parent)) { + memcpy(buf, d_name->name, d_name->len); + return d_name->len; + } /* * Convert cleartext d_name to ciphertext. If result is longer than @@ -237,6 +240,8 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf) { + WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)); + return ceph_encode_encrypted_dname(parent, &dentry->d_name, buf); } @@ -281,7 +286,10 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, * generating a nokey name via fscrypt. */ if (!fscrypt_has_encryption_key(fname->dir)) { - memcpy(oname->name, fname->name, fname->name_len); + if (fname->no_copy) + oname->name = fname->name; + else + memcpy(oname->name, fname->name, fname->name_len); oname->len = fname->name_len; if (is_nokey) *is_nokey = true; diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index 704e5a0e96f7..05db33f1a421 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -19,6 +19,7 @@ struct ceph_fname { unsigned char *ctext; // binary crypttext (if any) u32 name_len; // length of name buffer u32 ctext_len; // length of crypttext + bool no_copy; }; struct ceph_fscrypt_auth { diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index ffdeccb699ae..b3add6cbc8dd 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -9,6 +9,7 @@ #include "super.h" #include "mds_client.h" +#include "crypto.h" /* * Directory operations: readdir, lookup, create, link, unlink, @@ -241,7 +242,9 @@ static int __dcache_readdir(struct file *file, struct dir_context *ctx, di = ceph_dentry(dentry); if (d_unhashed(dentry) || d_really_is_negative(dentry) || - di->lease_shared_gen != shared_gen) { + di->lease_shared_gen != shared_gen || + ((dentry->d_flags & DCACHE_NOKEY_NAME) && + fscrypt_has_encryption_key(dir))) { spin_unlock(&dentry->d_lock); dput(dentry); err = -EAGAIN; @@ -340,6 +343,10 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) ctx->pos = 2; } + err = fscrypt_prepare_readdir(inode); + if (err) + return err; + spin_lock(&ci->i_ceph_lock); /* request Fx cap. if have Fx, we don't need to release Fs cap * for later create/unlink. */ @@ -389,6 +396,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS); if (IS_ERR(req)) return PTR_ERR(req); + err = ceph_alloc_readdir_reply_buffer(req, inode); if (err) { ceph_mdsc_put_request(req); @@ -402,11 +410,20 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) req->r_inode_drop = CEPH_CAP_FILE_EXCL; } if (dfi->last_name) { - req->r_path2 = kstrdup(dfi->last_name, GFP_KERNEL); + struct qstr d_name = { .name = dfi->last_name, + .len = strlen(dfi->last_name) }; + + req->r_path2 = kzalloc(NAME_MAX + 1, GFP_KERNEL); if (!req->r_path2) { ceph_mdsc_put_request(req); return -ENOMEM; } + + err = ceph_encode_encrypted_dname(inode, &d_name, req->r_path2); + if (err < 0) { + ceph_mdsc_put_request(req); + return err; + } } else if (is_hash_order(ctx->pos)) { req->r_args.readdir.offset_hash = cpu_to_le32(fpos_hash(ctx->pos)); @@ -511,15 +528,20 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) for (; i < rinfo->dir_nr; i++) { struct ceph_mds_reply_dir_entry *rde = rinfo->dir_entries + i; - BUG_ON(rde->offset < ctx->pos); + if (rde->offset < ctx->pos) { + pr_warn("%s: rde->offset 0x%llx ctx->pos 0x%llx\n", + __func__, rde->offset, ctx->pos); + return -EIO; + } + + if (WARN_ON_ONCE(!rde->inode.in)) + return -EIO; ctx->pos = rde->offset; dout("readdir (%d/%d) -> %llx '%.*s' %p\n", i, rinfo->dir_nr, ctx->pos, rde->name_len, rde->name, &rde->inode.in); - BUG_ON(!rde->inode.in); - if (!dir_emit(ctx, rde->name, rde->name_len, ceph_present_ino(inode->i_sb, le64_to_cpu(rde->inode.in->ino)), le32_to_cpu(rde->inode.in->mode) >> 12)) { @@ -532,6 +554,8 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) dout("filldir stopping us...\n"); return 0; } + + /* Reset the lengths to their original allocated vals */ ctx->pos++; } @@ -586,7 +610,6 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) dfi->dir_ordered_count); spin_unlock(&ci->i_ceph_lock); } - dout("readdir %p file %p done.\n", inode, file); return 0; } diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 9a1d8ad9bf86..85507b057a64 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -1751,7 +1751,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req, struct ceph_mds_session *session) { struct dentry *parent = req->r_dentry; - struct ceph_inode_info *ci = ceph_inode(d_inode(parent)); + struct inode *inode = d_inode(parent); + struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info; struct qstr dname; struct dentry *dn; @@ -1825,9 +1826,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req, tvino.snap = le64_to_cpu(rde->inode.in->snapid); if (rinfo->hash_order) { - u32 hash = ceph_str_hash(ci->i_dir_layout.dl_dir_hash, - rde->name, rde->name_len); - hash = ceph_frag_value(hash); + u32 hash = ceph_frag_value(rde->raw_hash); if (hash != last_hash) fpos_offset = 2; last_hash = hash; @@ -1850,6 +1849,11 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req, err = -ENOMEM; goto out; } + if (rde->is_nokey) { + spin_lock(&dn->d_lock); + dn->d_flags |= DCACHE_NOKEY_NAME; + spin_unlock(&dn->d_lock); + } } else if (d_really_is_positive(dn) && (ceph_ino(d_inode(dn)) != tvino.ino || ceph_snap(d_inode(dn)) != tvino.snap)) { diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 37e183d383b1..cc5342a7f40e 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -439,20 +439,87 @@ static int parse_reply_info_readdir(void **p, void *end, info->dir_nr = num; while (num) { + struct inode *inode = d_inode(req->r_dentry); + struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i; + struct fscrypt_str tname = FSTR_INIT(NULL, 0); + struct fscrypt_str oname = FSTR_INIT(NULL, 0); + struct ceph_fname fname; + u32 altname_len, _name_len; + u8 *altname, *_name; + /* dentry */ - ceph_decode_32_safe(p, end, rde->name_len, bad); - ceph_decode_need(p, end, rde->name_len, bad); - rde->name = *p; - *p += rde->name_len; - dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name); + ceph_decode_32_safe(p, end, _name_len, bad); + ceph_decode_need(p, end, _name_len, bad); + _name = *p; + *p += _name_len; + dout("parsed dir dname '%.*s'\n", _name_len, _name); + + if (info->hash_order) + rde->raw_hash = ceph_str_hash(ci->i_dir_layout.dl_dir_hash, + _name, _name_len); /* dentry lease */ err = parse_reply_info_lease(p, end, &rde->lease, features, - &rde->altname_len, &rde->altname); + &altname_len, &altname); if (err) goto out_bad; + /* + * Try to dencrypt the dentry names and update them + * in the ceph_mds_reply_dir_entry struct. + */ + fname.dir = inode; + fname.name = _name; + fname.name_len = _name_len; + fname.ctext = altname; + fname.ctext_len = altname_len; + /* + * The _name_len maybe larger than altname_len, such as + * when the human readable name length is in range of + * (CEPH_NOHASH_NAME_MAX, CEPH_NOHASH_NAME_MAX + SHA256_DIGEST_SIZE), + * then the copy in ceph_fname_to_usr will corrupt the + * data if there has no encryption key. + * + * Just set the no_copy flag and then if there has no + * encryption key the oname.name will be assigned to + * _name always. + */ + fname.no_copy = true; + if (altname_len == 0) { + /* + * Set tname to _name, and this will be used + * to do the base64_decode in-place. It's + * safe because the decoded string should + * always be shorter, which is 3/4 of origin + * string. + */ + tname.name = _name; + + /* + * Set oname to _name too, and this will be + * used to do the dencryption in-place. + */ + oname.name = _name; + oname.len = _name_len; + } else { + /* + * This will do the decryption only in-place + * from altname cryptext directly. + */ + oname.name = altname; + oname.len = altname_len; + } + rde->is_nokey = false; + err = ceph_fname_to_usr(&fname, &tname, &oname, &rde->is_nokey); + if (err) { + pr_err("%s unable to decode %.*s, got %d\n", __func__, + _name_len, _name, err); + goto out_bad; + } + rde->name = oname.name; + rde->name_len = oname.len; + /* inode */ err = parse_reply_info_in(p, end, &rde->inode, features); if (err < 0) @@ -3666,7 +3733,7 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg) if (err == 0) { if (result == 0 && (req->r_op == CEPH_MDS_OP_READDIR || req->r_op == CEPH_MDS_OP_LSSNAP)) - ceph_readdir_prepopulate(req, req->r_session); + err = ceph_readdir_prepopulate(req, req->r_session); } current->journal_info = NULL; mutex_unlock(&req->r_fill_mutex); diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index e7af8090b760..728b7d72bf76 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -96,10 +96,10 @@ struct ceph_mds_reply_info_in { }; struct ceph_mds_reply_dir_entry { + bool is_nokey; char *name; - u8 *altname; u32 name_len; - u32 altname_len; + u32 raw_hash; struct ceph_mds_reply_lease *lease; struct ceph_mds_reply_info_in inode; loff_t offset; From patchwork Wed Apr 12 11:08:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208861 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1803C7619A for ; Wed, 12 Apr 2023 11:18:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230295AbjDLLSW (ORCPT ); Wed, 12 Apr 2023 07:18:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42220 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230176AbjDLLSL (ORCPT ); Wed, 12 Apr 2023 07:18:11 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 423C97D87 for ; Wed, 12 Apr 2023 04:17:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298160; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZAUwhOl1JeoVeE5cTt/xFQqlLbZZqncCkROe459a9Oc=; b=euKeeL/QOWpTfVDCYj6XRjCPp6/AWEG+qpYjPGDptkPptb/3QnCG59GkyDuLi5HyEphJgJ KeUSJdxogNZtAOOsdw8y4UgZWyd+40Zk+xEPEkIrsMpCkamsltdieCS9jjSuOx8Kyzx/y9 jz3e+Vm16NSNE/F/tbBvK5iWcUaNt3s= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-487-kuZgHUnuMwKlGymd8o3Flg-1; Wed, 12 Apr 2023 07:12:23 -0400 X-MC-Unique: kuZgHUnuMwKlGymd8o3Flg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 8927588B7A0; Wed, 12 Apr 2023 11:12:23 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9C58AC15BB8; Wed, 12 Apr 2023 11:12:19 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 32/71] ceph: create symlinks with encrypted and base64-encoded targets Date: Wed, 12 Apr 2023 19:08:51 +0800 Message-Id: <20230412110930.176835-33-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton When creating symlinks in encrypted directories, encrypt and base64-encode the target with the new inode's key before sending to the MDS. When filling a symlinked inode, base64-decode it into a buffer that we'll keep in ci->i_symlink. When get_link is called, decrypt the buffer into a new one that will hang off i_link. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/crypto.c | 2 +- fs/ceph/dir.c | 51 +++++++++++++++++++--- fs/ceph/inode.c | 107 +++++++++++++++++++++++++++++++++++++++++------ 3 files changed, 142 insertions(+), 18 deletions(-) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index 1803ec7a69a9..5b807f8f4c69 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -306,7 +306,7 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, tname = &_tname; } - declen = fscrypt_base64url_decode(fname->name, fname->name_len, tname->name); + declen = ceph_base64_decode(fname->name, fname->name_len, tname->name); if (declen <= 0) { ret = -EIO; goto out; diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index b3add6cbc8dd..3784fdd77e3e 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -946,6 +946,40 @@ static int ceph_create(struct mnt_idmap *idmap, struct inode *dir, return ceph_mknod(idmap, dir, dentry, mode, 0); } +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) +static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest) +{ + int err; + int len = strlen(dest); + struct fscrypt_str osd_link = FSTR_INIT(NULL, 0); + + err = fscrypt_prepare_symlink(req->r_parent, dest, len, PATH_MAX, &osd_link); + if (err) + goto out; + + err = fscrypt_encrypt_symlink(req->r_new_inode, dest, len, &osd_link); + if (err) + goto out; + + req->r_path2 = kmalloc(CEPH_BASE64_CHARS(osd_link.len) + 1, GFP_KERNEL); + if (!req->r_path2) { + err = -ENOMEM; + goto out; + } + + len = ceph_base64_encode(osd_link.name, osd_link.len, req->r_path2); + req->r_path2[len] = '\0'; +out: + fscrypt_fname_free_buffer(&osd_link); + return err; +} +#else +static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest) +{ + return -EOPNOTSUPP; +} +#endif + static int ceph_symlink(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, const char *dest) { @@ -981,14 +1015,21 @@ static int ceph_symlink(struct mnt_idmap *idmap, struct inode *dir, goto out_req; } - req->r_path2 = kstrdup(dest, GFP_KERNEL); - if (!req->r_path2) { - err = -ENOMEM; - goto out_req; - } req->r_parent = dir; ihold(dir); + if (IS_ENCRYPTED(req->r_new_inode)) { + err = prep_encrypted_symlink_target(req, dest); + if (err) + goto out_req; + } else { + req->r_path2 = kstrdup(dest, GFP_KERNEL); + if (!req->r_path2) { + err = -ENOMEM; + goto out_req; + } + } + set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags); req->r_dentry = dget(dentry); req->r_num_caps = 2; diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 85507b057a64..17de62acb5d4 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -35,6 +35,7 @@ */ static const struct inode_operations ceph_symlink_iops; +static const struct inode_operations ceph_encrypted_symlink_iops; static void ceph_inode_work(struct work_struct *work); @@ -639,6 +640,7 @@ void ceph_free_inode(struct inode *inode) #ifdef CONFIG_FS_ENCRYPTION kfree(ci->fscrypt_auth); #endif + fscrypt_free_inode(inode); kmem_cache_free(ceph_inode_cachep, ci); } @@ -836,6 +838,34 @@ void ceph_fill_file_time(struct inode *inode, int issued, inode, time_warp_seq, ci->i_time_warp_seq); } +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) +static int decode_encrypted_symlink(const char *encsym, int enclen, u8 **decsym) +{ + int declen; + u8 *sym; + + sym = kmalloc(enclen + 1, GFP_NOFS); + if (!sym) + return -ENOMEM; + + declen = ceph_base64_decode(encsym, enclen, sym); + if (declen < 0) { + pr_err("%s: can't decode symlink (%d). Content: %.*s\n", + __func__, declen, enclen, encsym); + kfree(sym); + return -EIO; + } + sym[declen + 1] = '\0'; + *decsym = sym; + return declen; +} +#else +static int decode_encrypted_symlink(const char *encsym, int symlen, u8 **decsym) +{ + return -EOPNOTSUPP; +} +#endif + /* * Populate an inode based on info from mds. May be called on new or * existing inodes. @@ -1070,26 +1100,39 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, inode->i_fop = &ceph_file_fops; break; case S_IFLNK: - inode->i_op = &ceph_symlink_iops; if (!ci->i_symlink) { u32 symlen = iinfo->symlink_len; char *sym; spin_unlock(&ci->i_ceph_lock); - if (symlen != i_size_read(inode)) { - pr_err("%s %llx.%llx BAD symlink " - "size %lld\n", __func__, - ceph_vinop(inode), - i_size_read(inode)); + if (IS_ENCRYPTED(inode)) { + if (symlen != i_size_read(inode)) + pr_err("%s %llx.%llx BAD symlink size %lld\n", + __func__, ceph_vinop(inode), i_size_read(inode)); + + err = decode_encrypted_symlink(iinfo->symlink, symlen, (u8 **)&sym); + if (err < 0) { + pr_err("%s decoding encrypted symlink failed: %d\n", + __func__, err); + goto out; + } + symlen = err; i_size_write(inode, symlen); inode->i_blocks = calc_inode_blocks(symlen); - } + } else { + if (symlen != i_size_read(inode)) { + pr_err("%s %llx.%llx BAD symlink size %lld\n", + __func__, ceph_vinop(inode), i_size_read(inode)); + i_size_write(inode, symlen); + inode->i_blocks = calc_inode_blocks(symlen); + } - err = -ENOMEM; - sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS); - if (!sym) - goto out; + err = -ENOMEM; + sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS); + if (!sym) + goto out; + } spin_lock(&ci->i_ceph_lock); if (!ci->i_symlink) @@ -1097,7 +1140,17 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, else kfree(sym); /* lost a race */ } - inode->i_link = ci->i_symlink; + + if (IS_ENCRYPTED(inode)) { + /* + * Encrypted symlinks need to be decrypted before we can + * cache their targets in i_link. Don't touch it here. + */ + inode->i_op = &ceph_encrypted_symlink_iops; + } else { + inode->i_link = ci->i_symlink; + inode->i_op = &ceph_symlink_iops; + } break; case S_IFDIR: inode->i_op = &ceph_dir_iops; @@ -2125,6 +2178,29 @@ static void ceph_inode_work(struct work_struct *work) iput(inode); } +static const char *ceph_encrypted_get_link(struct dentry *dentry, struct inode *inode, + struct delayed_call *done) +{ + struct ceph_inode_info *ci = ceph_inode(inode); + + if (!dentry) + return ERR_PTR(-ECHILD); + + return fscrypt_get_symlink(inode, ci->i_symlink, i_size_read(inode), done); +} + +static int ceph_encrypted_symlink_getattr(struct mnt_idmap *idmap, + const struct path *path, struct kstat *stat, + u32 request_mask, unsigned int query_flags) +{ + int ret; + + ret = ceph_getattr(idmap, path, stat, request_mask, query_flags); + if (ret) + return ret; + return fscrypt_symlink_getattr(path, stat); +} + /* * symlinks */ @@ -2135,6 +2211,13 @@ static const struct inode_operations ceph_symlink_iops = { .listxattr = ceph_listxattr, }; +static const struct inode_operations ceph_encrypted_symlink_iops = { + .get_link = ceph_encrypted_get_link, + .setattr = ceph_setattr, + .getattr = ceph_encrypted_symlink_getattr, + .listxattr = ceph_listxattr, +}; + int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia) { struct ceph_inode_info *ci = ceph_inode(inode); From patchwork Wed Apr 12 11:08:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208841 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78DE1C77B6E for ; Wed, 12 Apr 2023 11:15:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229834AbjDLLPj (ORCPT ); Wed, 12 Apr 2023 07:15:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39776 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229663AbjDLLPi (ORCPT ); Wed, 12 Apr 2023 07:15:38 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9BB2E7EEA for ; Wed, 12 Apr 2023 04:14:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297952; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gULHX1l2XVUL1zFbI7naJsmfFkwFjIyfe92rqjRdECE=; b=O3sOEcSeD312gRSi3QbemUjRyIEv+X4oDg2fjpCjmWK7h8R/bft6oebCssGiMnGtoe0Npp JDGFX4WshxFeuO4c01JevVvzhM20oYHOurKth/07pkX0q5wTVwmNrp/+DGkEvBOFeTEEaA 6i+0mnyg4p+W83lmokeL4QlhL4jVBNI= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-223-WEb9QCHpODSOpUmxIGCtOw-1; Wed, 12 Apr 2023 07:12:29 -0400 X-MC-Unique: WEb9QCHpODSOpUmxIGCtOw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CA0B93813F3E; Wed, 12 Apr 2023 11:12:28 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9F609C15BB8; Wed, 12 Apr 2023 11:12:24 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 33/71] ceph: make ceph_get_name decrypt filenames Date: Wed, 12 Apr 2023 19:08:52 +0800 Message-Id: <20230412110930.176835-34-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton When we do a lookupino to the MDS, we get a filename in the trace. ceph_get_name uses that name directly, so we must properly decrypt it before copying it to the name buffer. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/export.c | 44 ++++++++++++++++++++++++++++++++------------ 1 file changed, 32 insertions(+), 12 deletions(-) diff --git a/fs/ceph/export.c b/fs/ceph/export.c index f780e4e0d062..8559990a59a5 100644 --- a/fs/ceph/export.c +++ b/fs/ceph/export.c @@ -7,6 +7,7 @@ #include "super.h" #include "mds_client.h" +#include "crypto.h" /* * Basic fh @@ -535,7 +536,9 @@ static int ceph_get_name(struct dentry *parent, char *name, { struct ceph_mds_client *mdsc; struct ceph_mds_request *req; + struct inode *dir = d_inode(parent); struct inode *inode = d_inode(child); + struct ceph_mds_reply_info_parsed *rinfo; int err; if (ceph_snap(inode) != CEPH_NOSNAP) @@ -547,30 +550,47 @@ static int ceph_get_name(struct dentry *parent, char *name, if (IS_ERR(req)) return PTR_ERR(req); - inode_lock(d_inode(parent)); - + inode_lock(dir); req->r_inode = inode; ihold(inode); req->r_ino2 = ceph_vino(d_inode(parent)); - req->r_parent = d_inode(parent); - ihold(req->r_parent); + req->r_parent = dir; + ihold(dir); set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags); req->r_num_caps = 2; err = ceph_mdsc_do_request(mdsc, NULL, req); + inode_unlock(dir); - inode_unlock(d_inode(parent)); + if (err) + goto out; - if (!err) { - struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info; + rinfo = &req->r_reply_info; + if (!IS_ENCRYPTED(dir)) { memcpy(name, rinfo->dname, rinfo->dname_len); name[rinfo->dname_len] = 0; - dout("get_name %p ino %llx.%llx name %s\n", - child, ceph_vinop(inode), name); } else { - dout("get_name %p ino %llx.%llx err %d\n", - child, ceph_vinop(inode), err); - } + struct fscrypt_str oname = FSTR_INIT(NULL, 0); + struct ceph_fname fname = { .dir = dir, + .name = rinfo->dname, + .ctext = rinfo->altname, + .name_len = rinfo->dname_len, + .ctext_len = rinfo->altname_len }; + + err = ceph_fname_alloc_buffer(dir, &oname); + if (err < 0) + goto out; + err = ceph_fname_to_usr(&fname, NULL, &oname, NULL); + if (!err) { + memcpy(name, oname.name, oname.len); + name[oname.len] = 0; + } + ceph_fname_free_buffer(dir, &oname); + } +out: + dout("get_name %p ino %llx.%llx err %d %s%s\n", + child, ceph_vinop(inode), err, + err ? "" : "name ", err ? "" : name); ceph_mdsc_put_request(req); return err; } From patchwork Wed Apr 12 11:08:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208863 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62981C7619A for ; Wed, 12 Apr 2023 11:18:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230199AbjDLLS1 (ORCPT ); Wed, 12 Apr 2023 07:18:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230254AbjDLLSM (ORCPT ); Wed, 12 Apr 2023 07:18:12 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9DCFF8685 for ; Wed, 12 Apr 2023 04:17:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298161; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MlCH+rzxZabFiLMQk+HCe5kjPrXg0DThVZA8JtN41BA=; b=ITkkAJcITlCAKwPIcJn4hRMkRoMYcHuirS3/aJZJILRxW/RodJfmP9SjQ9KUmps3cGnVPY wWN4vh+0vQmipinPndbf2v7KbHY9raAz84ZZlDtQnOX2+BcRaAlGiZognq7Afg0uJmfwWs OIWiMBi1aMXeS+hSexzSggIPUt1CMCg= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-564-zb6NmuslNeSYF322DxGU-w-1; Wed, 12 Apr 2023 07:12:34 -0400 X-MC-Unique: zb6NmuslNeSYF322DxGU-w-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B6D421C08795; Wed, 12 Apr 2023 11:12:33 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9E20AC15BB8; Wed, 12 Apr 2023 11:12:29 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 34/71] ceph: add a new ceph.fscrypt.auth vxattr Date: Wed, 12 Apr 2023 19:08:53 +0800 Message-Id: <20230412110930.176835-35-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Give the client a way to get at the xattr from userland, mostly for future debugging purposes. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/xattr.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c index d2242d7b082e..69500ac96d43 100644 --- a/fs/ceph/xattr.c +++ b/fs/ceph/xattr.c @@ -352,6 +352,23 @@ static ssize_t ceph_vxattrcb_auth_mds(struct ceph_inode_info *ci, return ret; } +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) +static bool ceph_vxattrcb_fscrypt_auth_exists(struct ceph_inode_info *ci) +{ + return ci->fscrypt_auth_len; +} + +static ssize_t ceph_vxattrcb_fscrypt_auth(struct ceph_inode_info *ci, char *val, size_t size) +{ + if (size) { + if (size < ci->fscrypt_auth_len) + return -ERANGE; + memcpy(val, ci->fscrypt_auth, ci->fscrypt_auth_len); + } + return ci->fscrypt_auth_len; +} +#endif /* CONFIG_FS_ENCRYPTION */ + #define CEPH_XATTR_NAME(_type, _name) XATTR_CEPH_PREFIX #_type "." #_name #define CEPH_XATTR_NAME2(_type, _name, _name2) \ XATTR_CEPH_PREFIX #_type "." #_name "." #_name2 @@ -500,6 +517,15 @@ static struct ceph_vxattr ceph_common_vxattrs[] = { .exists_cb = NULL, .flags = VXATTR_FLAG_READONLY, }, +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) + { + .name = "ceph.fscrypt.auth", + .name_size = sizeof("ceph.fscrypt.auth"), + .getxattr_cb = ceph_vxattrcb_fscrypt_auth, + .exists_cb = ceph_vxattrcb_fscrypt_auth_exists, + .flags = VXATTR_FLAG_READONLY, + }, +#endif /* CONFIG_FS_ENCRYPTION */ { .name = NULL, 0 } /* Required table terminator */ }; From patchwork Wed Apr 12 11:08:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208860 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ADE7DC7619A for ; Wed, 12 Apr 2023 11:18:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230157AbjDLLST (ORCPT ); Wed, 12 Apr 2023 07:18:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230173AbjDLLSL (ORCPT ); Wed, 12 Apr 2023 07:18:11 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 289C383F0 for ; Wed, 12 Apr 2023 04:17:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298161; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1TYz9inwhhU36C7LvFvFOJ7q/eac4TMUKJ8VImRAH90=; b=guMQc/7oOme/hFl5gbOzl3y7saDHjvvr+o003v6kisnbUjC8nCQhB9VPxmc9Qi9HT0kt1z htnneAZSoy8dYkLXB08bxoM9hOhfm9SQ8DeqsaQ3NIdsJ+rUPLafuOFwnp+KWhoejCn23+ vXxSE5kvI62T9NyvF0bP7y9mPCnL4NQ= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-606-Sn8PsZPbN_61vjBraqqsEQ-1; Wed, 12 Apr 2023 07:12:40 -0400 X-MC-Unique: Sn8PsZPbN_61vjBraqqsEQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 94BAD3813F43; Wed, 12 Apr 2023 11:12:39 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id DE373C15BB8; Wed, 12 Apr 2023 11:12:34 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 35/71] ceph: add some fscrypt guardrails Date: Wed, 12 Apr 2023 19:08:54 +0800 Message-Id: <20230412110930.176835-36-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Add the appropriate calls into fscrypt for various actions, including link, rename, setattr, and the open codepaths. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/dir.c | 8 ++++++++ fs/ceph/file.c | 14 +++++++++++++- fs/ceph/inode.c | 4 ++++ 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index 3784fdd77e3e..adf44d99d120 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -1138,6 +1138,10 @@ static int ceph_link(struct dentry *old_dentry, struct inode *dir, if (ceph_snap(dir) != CEPH_NOSNAP) return -EROFS; + err = fscrypt_prepare_link(old_dentry, dir, dentry); + if (err) + return err; + dout("link in dir %p old_dentry %p dentry %p\n", dir, old_dentry, dentry); req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_LINK, USE_AUTH_MDS); @@ -1379,6 +1383,10 @@ static int ceph_rename(struct mnt_idmap *idmap, struct inode *old_dir, if (err) return err; + err = fscrypt_prepare_rename(old_dir, old_dentry, new_dir, new_dentry, flags); + if (err) + return err; + dout("rename dir %p dentry %p to dir %p dentry %p\n", old_dir, old_dentry, new_dir, new_dentry); req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS); diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 1d63537be61c..e5c01cd634eb 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -366,8 +366,13 @@ int ceph_open(struct inode *inode, struct file *file) /* filter out O_CREAT|O_EXCL; vfs did that already. yuck. */ flags = file->f_flags & ~(O_CREAT|O_EXCL); - if (S_ISDIR(inode->i_mode)) + if (S_ISDIR(inode->i_mode)) { flags = O_DIRECTORY; /* mds likes to know */ + } else if (S_ISREG(inode->i_mode)) { + err = fscrypt_file_open(inode, file); + if (err) + return err; + } dout("open inode %p ino %llx.%llx file %p flags %d (%d)\n", inode, ceph_vinop(inode), file, flags, file->f_flags); @@ -876,6 +881,13 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, dout("atomic_open finish_no_open on dn %p\n", dn); err = finish_no_open(file, dn); } else { + if (IS_ENCRYPTED(dir) && + !fscrypt_has_permitted_context(dir, d_inode(dentry))) { + pr_warn("Inconsistent encryption context (parent %llx:%llx child %llx:%llx)\n", + ceph_vinop(dir), ceph_vinop(d_inode(dentry))); + goto out_req; + } + dout("atomic_open finish_open on dn %p\n", dn); if (req->r_op == CEPH_MDS_OP_CREATE && req->r_reply_info.has_create_ino) { struct inode *newino = d_inode(dentry); diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 17de62acb5d4..282c8af0a49c 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -2487,6 +2487,10 @@ int ceph_setattr(struct mnt_idmap *idmap, struct dentry *dentry, if (ceph_inode_is_shutdown(inode)) return -ESTALE; + err = fscrypt_prepare_setattr(dentry, attr); + if (err) + return err; + err = setattr_prepare(&nop_mnt_idmap, dentry, attr); if (err != 0) return err; From patchwork Wed Apr 12 11:08:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208836 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36DA5C7619A for ; Wed, 12 Apr 2023 11:15:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229700AbjDLLO6 (ORCPT ); Wed, 12 Apr 2023 07:14:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229940AbjDLLO4 (ORCPT ); Wed, 12 Apr 2023 07:14:56 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B00E7B7 for ; Wed, 12 Apr 2023 04:13:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297969; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0kYCJF9DGGtp4ZqNQo6CgNdWCJ4wx95TuHBVUhFXafg=; b=XChlid5gBfRmOt60ugqhmXPePckEJh2Q4pNAZiHcUzN2VjyWpwi5E+HmdBNP8FK5ZV7SYE N/yscKAeljl2Gqu5oZp6n14jAY9eDbsML9arZ8P668lRlLE608sqL2v269F8twP3Cg4cxX 7rNnelAkoogYr9FxhqDdMZqbYFyY/fI= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-625-VQpY8KJ-NnyGs6LP-l8Olw-1; Wed, 12 Apr 2023 07:12:46 -0400 X-MC-Unique: VQpY8KJ-NnyGs6LP-l8Olw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7774A811E7C; Wed, 12 Apr 2023 11:12:45 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id C7B13C15BB8; Wed, 12 Apr 2023 11:12:40 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 36/71] ceph: allow encrypting a directory while not having Ax caps Date: Wed, 12 Apr 2023 19:08:55 +0800 Message-Id: <20230412110930.176835-37-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques If a client doesn't have Fx caps on a directory, it will get errors while trying encrypt it: ceph: handle_cap_grant: cap grant attempt to change fscrypt_auth on non-I_NEW inode (old len 0 new len 48) fscrypt (ceph, inode 1099511627812): Error -105 getting encryption context A simple way to reproduce this is to use two clients: client1 # mkdir /mnt/mydir client2 # ls /mnt/mydir client1 # fscrypt encrypt /mnt/mydir client1 # echo hello > /mnt/mydir/world This happens because, in __ceph_setattr(), we only initialize ci->fscrypt_auth if we have Ax and ceph_fill_inode() won't use the fscrypt_auth received if the inode state isn't I_NEW. Fix it by allowing ceph_fill_inode() to also set ci->fscrypt_auth if the inode doesn't have it set already. Tested-by: Luís Henriques Tested-by: Venky Shankar Signed-off-by: Luís Henriques Signed-off-by: Xiubo Li --- fs/ceph/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 282c8af0a49c..e88335e05b74 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -981,7 +981,8 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, __ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files); #ifdef CONFIG_FS_ENCRYPTION - if (iinfo->fscrypt_auth_len && (inode->i_state & I_NEW)) { + if (iinfo->fscrypt_auth_len && + ((inode->i_state & I_NEW) || (ci->fscrypt_auth_len == 0))) { kfree(ci->fscrypt_auth); ci->fscrypt_auth_len = iinfo->fscrypt_auth_len; ci->fscrypt_auth = iinfo->fscrypt_auth; From patchwork Wed Apr 12 11:08:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208887 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54D59C7619A for ; Wed, 12 Apr 2023 11:25:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229981AbjDLLZM (ORCPT ); Wed, 12 Apr 2023 07:25:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51144 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230212AbjDLLYu (ORCPT ); Wed, 12 Apr 2023 07:24:50 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D53702D40 for ; Wed, 12 Apr 2023 04:23:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298589; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1tiYX6foTTZEZeqEoQcul8NnL0v2KKrEg9JON5Oi+Ao=; b=W1L1GGAyIiKmllfnWfl6gwbiZr/0wxCKb2l5s8MHQLaHSeXNTEcAm7pef9FVi6gzkArpyb hvRka7soY0aSfvVwn+qjUmN/UzL8KMLtAKaDSt1uYOp6N311wsMzih933B1JQsMBbWXj+V fmcviq3lkkV1ijAs4pUg2BHpAZ4Nnak= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-191-tiemV38pOOusksPHfDQIcQ-1; Wed, 12 Apr 2023 07:12:50 -0400 X-MC-Unique: tiemV38pOOusksPHfDQIcQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0FD2B1C08796; Wed, 12 Apr 2023 11:12:50 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 44601C15BB8; Wed, 12 Apr 2023 11:12:45 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 37/71] ceph: mark directory as non-complete after loading key Date: Wed, 12 Apr 2023 19:08:56 +0800 Message-Id: <20230412110930.176835-38-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques When setting a directory's crypt context, ceph_dir_clear_complete() needs to be called otherwise if it was complete before, any existing (old) dentry will still be valid. This patch adds a wrapper around __fscrypt_prepare_readdir() which will ensure a directory is marked as non-complete if key status changes. [ xiubli: revise the commit title as Milind pointed out ] Tested-by: Luís Henriques Tested-by: Venky Shankar Signed-off-by: Luís Henriques Signed-off-by: Xiubo Li --- fs/ceph/crypto.c | 35 +++++++++++++++++++++++++++++++++-- fs/ceph/crypto.h | 6 ++++++ fs/ceph/dir.c | 8 ++++---- fs/ceph/mds_client.c | 6 +++--- 4 files changed, 46 insertions(+), 9 deletions(-) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index 5b807f8f4c69..fe47fbdaead9 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -277,8 +277,8 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX) return -EIO; - ret = __fscrypt_prepare_readdir(fname->dir); - if (ret) + ret = ceph_fscrypt_prepare_readdir(fname->dir); + if (ret < 0) return ret; /* @@ -323,3 +323,34 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, fscrypt_fname_free_buffer(&_tname); return ret; } + +/** + * ceph_fscrypt_prepare_readdir - simple __fscrypt_prepare_readdir() wrapper + * @dir: directory inode for readdir prep + * + * Simple wrapper around __fscrypt_prepare_readdir() that will mark directory as + * non-complete if this call results in having the directory unlocked. + * + * Returns: + * 1 - if directory was locked and key is now loaded (i.e. dir is unlocked) + * 0 - if directory is still locked + * < 0 - if __fscrypt_prepare_readdir() fails + */ +int ceph_fscrypt_prepare_readdir(struct inode *dir) +{ + bool had_key = fscrypt_has_encryption_key(dir); + int err; + + if (!IS_ENCRYPTED(dir)) + return 0; + + err = __fscrypt_prepare_readdir(dir); + if (err) + return err; + if (!had_key && fscrypt_has_encryption_key(dir)) { + /* directory just got unlocked, mark it as not complete */ + ceph_dir_clear_complete(dir); + return 1; + } + return 0; +} diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index 05db33f1a421..f8d5f33f708a 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -94,6 +94,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_s int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, struct fscrypt_str *oname, bool *is_nokey); +int ceph_fscrypt_prepare_readdir(struct inode *dir); #else /* CONFIG_FS_ENCRYPTION */ @@ -147,6 +148,11 @@ static inline int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscry oname->len = fname->name_len; return 0; } + +static inline int ceph_fscrypt_prepare_readdir(struct inode *dir) +{ + return 0; +} #endif /* CONFIG_FS_ENCRYPTION */ #endif diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index adf44d99d120..4746e53bdf51 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -343,8 +343,8 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) ctx->pos = 2; } - err = fscrypt_prepare_readdir(inode); - if (err) + err = ceph_fscrypt_prepare_readdir(inode); + if (err < 0) return err; spin_lock(&ci->i_ceph_lock); @@ -784,8 +784,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry, return ERR_PTR(-ENAMETOOLONG); if (IS_ENCRYPTED(dir)) { - err = __fscrypt_prepare_readdir(dir); - if (err) + err = ceph_fscrypt_prepare_readdir(dir); + if (err < 0) return ERR_PTR(err); if (!fscrypt_has_encryption_key(dir)) { spin_lock(&dentry->d_lock); diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index cc5342a7f40e..d1dcd6cfaeba 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -2557,8 +2557,8 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen) if (!IS_ENCRYPTED(dir)) goto success; - ret = __fscrypt_prepare_readdir(dir); - if (ret) + ret = ceph_fscrypt_prepare_readdir(dir); + if (ret < 0) return ERR_PTR(ret); /* No key? Just ignore it. */ @@ -2674,7 +2674,7 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, int for spin_unlock(&cur->d_lock); parent = dget_parent(cur); - ret = __fscrypt_prepare_readdir(d_inode(parent)); + ret = ceph_fscrypt_prepare_readdir(d_inode(parent)); if (ret < 0) { dput(parent); dput(cur); From patchwork Wed Apr 12 11:08:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208891 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5856CC77B71 for ; Wed, 12 Apr 2023 11:25:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230252AbjDLLZS (ORCPT ); Wed, 12 Apr 2023 07:25:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230313AbjDLLYx (ORCPT ); Wed, 12 Apr 2023 07:24:53 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA4C37DB6 for ; Wed, 12 Apr 2023 04:23:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298595; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UH9BnOoUP7le6hWvWAcwAS2eKdPwPFko3U6ROvEq/ps=; b=Pis9C/DyeOM3iisDDTa3T/UxOLwjfP5vvJ1Z0cWVflqkrz0MjYgM+rn9u1F26h7NuRBfBv u7CFHCaBmM4aLs52RPllGlenKCs8rKSBoj8tpEal2ZvkT10RfaSAFZRo1ACCOeye4QKA77 yUwKY1gokFFxSFpxM8MHGVMj6ctX2A4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-256-Ix8zkKEtNrWVoIn0-7L3Og-1; Wed, 12 Apr 2023 07:12:55 -0400 X-MC-Unique: Ix8zkKEtNrWVoIn0-7L3Og-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7F63585A588; Wed, 12 Apr 2023 11:12:54 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id D1806C15BB8; Wed, 12 Apr 2023 11:12:50 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 38/71] ceph: don't allow changing layout on encrypted files/directories Date: Wed, 12 Apr 2023 19:08:57 +0800 Message-Id: <20230412110930.176835-39-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques Encryption is currently only supported on files/directories with layouts where stripe_count=1. Forbid changing layouts when encryption is involved. Tested-by: Luís Henriques Tested-by: Venky Shankar Signed-off-by: Luis Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/ioctl.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c index 7f0e181d4329..679402bd80ba 100644 --- a/fs/ceph/ioctl.c +++ b/fs/ceph/ioctl.c @@ -294,6 +294,10 @@ static long ceph_set_encryption_policy(struct file *file, unsigned long arg) struct inode *inode = file_inode(file); struct ceph_inode_info *ci = ceph_inode(inode); + /* encrypted directories can't have striped layout */ + if (ci->i_layout.stripe_count > 1) + return -EINVAL; + ret = vet_mds_for_fscrypt(file); if (ret) return ret; From patchwork Wed Apr 12 11:08:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208842 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C6BFC7619A for ; Wed, 12 Apr 2023 11:15:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230090AbjDLLPl (ORCPT ); Wed, 12 Apr 2023 07:15:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39906 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230061AbjDLLPj (ORCPT ); Wed, 12 Apr 2023 07:15:39 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 08630869B for ; Wed, 12 Apr 2023 04:14:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297986; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Imu+8bQRL/WgKjA27tBjIQtmPrl+XWCSdZsDH5+PX2E=; b=P6RhzbvofF9d+jD2DOgeI3t917skVwusiSkRSSAPbOuM4DXftwu2gbFXq92mktXf0RvgHE hPC05+s15eFjXWaymd/TYQ3A97no/iXhlbqAF9OD/keSsEBUF4/nc9w2DAQpjmJjOkUWz1 2AyxjcaFQ5CMaITooN11s1IS40BzL1s= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-445-gj5YmG_CNaWIGRyznM1huw-1; Wed, 12 Apr 2023 07:13:01 -0400 X-MC-Unique: gj5YmG_CNaWIGRyznM1huw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5B4A63813F3E; Wed, 12 Apr 2023 11:12:59 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 50988C15BBA; Wed, 12 Apr 2023 11:12:54 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 39/71] libceph: add CEPH_OSD_OP_ASSERT_VER support Date: Wed, 12 Apr 2023 19:08:58 +0800 Message-Id: <20230412110930.176835-40-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton ...and record the user_version in the reply in a new field in ceph_osd_request, so we can populate the assert_ver appropriately. Shuffle the fields a bit too so that the new field fits in an existing hole on x86_64. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- include/linux/ceph/osd_client.h | 6 +++++- include/linux/ceph/rados.h | 4 ++++ net/ceph/osd_client.c | 5 +++++ 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 680b4a70f6d4..41bcd71cfa7a 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -198,6 +198,9 @@ struct ceph_osd_req_op { u32 src_fadvise_flags; struct ceph_osd_data osd_data; } copy_from; + struct { + u64 ver; + } assert_ver; }; }; @@ -252,6 +255,7 @@ struct ceph_osd_request { struct ceph_osd_client *r_osdc; struct kref r_kref; bool r_mempool; + bool r_linger; /* don't resend on failure */ struct completion r_completion; /* private to osd_client.c */ ceph_osdc_callback_t r_callback; @@ -264,9 +268,9 @@ struct ceph_osd_request { struct ceph_snap_context *r_snapc; /* for writes */ struct timespec64 r_mtime; /* ditto */ u64 r_data_offset; /* ditto */ - bool r_linger; /* don't resend on failure */ /* internal */ + u64 r_version; /* data version sent in reply */ unsigned long r_stamp; /* jiffies, send or check time */ unsigned long r_start_stamp; /* jiffies */ ktime_t r_start_latency; /* ktime_t */ diff --git a/include/linux/ceph/rados.h b/include/linux/ceph/rados.h index 43a7a1573b51..73c3efbec36c 100644 --- a/include/linux/ceph/rados.h +++ b/include/linux/ceph/rados.h @@ -523,6 +523,10 @@ struct ceph_osd_op { struct { __le64 cookie; } __attribute__ ((packed)) notify; + struct { + __le64 unused; + __le64 ver; + } __attribute__ ((packed)) assert_ver; struct { __le64 offset, length; __le64 src_offset; diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 9138abc07f45..2dcb1140ed21 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -1048,6 +1048,10 @@ static u32 osd_req_encode_op(struct ceph_osd_op *dst, dst->copy_from.src_fadvise_flags = cpu_to_le32(src->copy_from.src_fadvise_flags); break; + case CEPH_OSD_OP_ASSERT_VER: + dst->assert_ver.unused = cpu_to_le64(0); + dst->assert_ver.ver = cpu_to_le64(src->assert_ver.ver); + break; default: pr_err("unsupported osd opcode %s\n", ceph_osd_op_name(src->op)); @@ -3852,6 +3856,7 @@ static void handle_reply(struct ceph_osd *osd, struct ceph_msg *msg) * one (type of) reply back. */ WARN_ON(!(m.flags & CEPH_OSD_FLAG_ONDISK)); + req->r_version = m.user_version; req->r_result = m.result ?: data_len; finish_request(req); mutex_unlock(&osd->lock); From patchwork Wed Apr 12 11:08:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208838 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03732C7619A for ; Wed, 12 Apr 2023 11:15:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229697AbjDLLP1 (ORCPT ); Wed, 12 Apr 2023 07:15:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229663AbjDLLPY (ORCPT ); Wed, 12 Apr 2023 07:15:24 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 08E6D6E9A for ; Wed, 12 Apr 2023 04:14:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681297987; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DjEwLRVXPF6Azrj1pp/qdxnIDvtrKIO08jJ77z8Y6rw=; b=VPbFDySxUmwdipKjZKVCLdETWKmrxmCbk5w3m4jxILILnfggxrrzjgIT08Sti7BY3obNOS LFLKL44kuCgPx7kKywrDuWOnj5THJZbJY8V0axcGR0qLWRym8c1i4LiUAjwpCSVREIyyK/ nkIG/Nn+mJ4S2Py10ZUnf56ryg9xz+4= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-344-Q-76Bui0Nn2UOAUrhjkONA-1; Wed, 12 Apr 2023 07:13:04 -0400 X-MC-Unique: Q-76Bui0Nn2UOAUrhjkONA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 22C931C08799; Wed, 12 Apr 2023 11:13:04 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 23455C15BBA; Wed, 12 Apr 2023 11:12:59 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 40/71] ceph: size handling for encrypted inodes in cap updates Date: Wed, 12 Apr 2023 19:08:59 +0800 Message-Id: <20230412110930.176835-41-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Transmit the rounded-up size as the normal size, and fill out the fscrypt_file field with the real file size. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/caps.c | 43 +++++++++++++++++++++++++------------------ fs/ceph/crypto.h | 4 ++++ 2 files changed, 29 insertions(+), 18 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 7fb1003bdc7e..2ac9ee492616 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -1217,10 +1217,9 @@ struct cap_msg_args { umode_t mode; bool inline_data; bool wake; + bool encrypted; u32 fscrypt_auth_len; - u32 fscrypt_file_len; u8 fscrypt_auth[sizeof(struct ceph_fscrypt_auth)]; // for context - u8 fscrypt_file[sizeof(u64)]; // for size }; /* Marshal up the cap msg to the MDS */ @@ -1255,7 +1254,12 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg) fc->ino = cpu_to_le64(arg->ino); fc->snap_follows = cpu_to_le64(arg->follows); - fc->size = cpu_to_le64(arg->size); +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) + if (arg->encrypted) + fc->size = cpu_to_le64(round_up(arg->size, CEPH_FSCRYPT_BLOCK_SIZE)); + else +#endif + fc->size = cpu_to_le64(arg->size); fc->max_size = cpu_to_le64(arg->max_size); ceph_encode_timespec64(&fc->mtime, &arg->mtime); ceph_encode_timespec64(&fc->atime, &arg->atime); @@ -1315,11 +1319,17 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg) ceph_encode_64(&p, 0); #if IS_ENABLED(CONFIG_FS_ENCRYPTION) - /* fscrypt_auth and fscrypt_file (version 12) */ + /* + * fscrypt_auth and fscrypt_file (version 12) + * + * fscrypt_auth holds the crypto context (if any). fscrypt_file + * tracks the real i_size as an __le64 field (and we use a rounded-up + * i_size in * the traditional size field). + */ ceph_encode_32(&p, arg->fscrypt_auth_len); ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len); - ceph_encode_32(&p, arg->fscrypt_file_len); - ceph_encode_copy(&p, arg->fscrypt_file, arg->fscrypt_file_len); + ceph_encode_32(&p, sizeof(__le64)); + ceph_encode_64(&p, arg->size); #else /* CONFIG_FS_ENCRYPTION */ ceph_encode_32(&p, 0); ceph_encode_32(&p, 0); @@ -1391,7 +1401,6 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap, arg->follows = flushing ? ci->i_head_snapc->seq : 0; arg->flush_tid = flush_tid; arg->oldest_flush_tid = oldest_flush_tid; - arg->size = i_size_read(inode); ci->i_reported_size = arg->size; arg->max_size = ci->i_wanted_max_size; @@ -1445,6 +1454,7 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap, } } arg->flags = flags; + arg->encrypted = IS_ENCRYPTED(inode); #if IS_ENABLED(CONFIG_FS_ENCRYPTION) if (ci->fscrypt_auth_len && WARN_ON_ONCE(ci->fscrypt_auth_len > sizeof(struct ceph_fscrypt_auth))) { @@ -1455,21 +1465,21 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap, memcpy(arg->fscrypt_auth, ci->fscrypt_auth, min_t(size_t, ci->fscrypt_auth_len, sizeof(arg->fscrypt_auth))); } - /* FIXME: use this to track "real" size */ - arg->fscrypt_file_len = 0; #endif /* CONFIG_FS_ENCRYPTION */ } +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) #define CAP_MSG_FIXED_FIELDS (sizeof(struct ceph_mds_caps) + \ - 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 4 + 4) + 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 4 + 4 + 8) -#if IS_ENABLED(CONFIG_FS_ENCRYPTION) static inline int cap_msg_size(struct cap_msg_args *arg) { - return CAP_MSG_FIXED_FIELDS + arg->fscrypt_auth_len + - arg->fscrypt_file_len; + return CAP_MSG_FIXED_FIELDS + arg->fscrypt_auth_len; } #else +#define CAP_MSG_FIXED_FIELDS (sizeof(struct ceph_mds_caps) + \ + 4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 4 + 4) + static inline int cap_msg_size(struct cap_msg_args *arg) { return CAP_MSG_FIXED_FIELDS; @@ -1548,13 +1558,10 @@ static inline int __send_flush_snap(struct inode *inode, arg.inline_data = capsnap->inline_data; arg.flags = 0; arg.wake = false; + arg.encrypted = IS_ENCRYPTED(inode); - /* - * No fscrypt_auth changes from a capsnap. It will need - * to update fscrypt_file on size changes (TODO). - */ + /* No fscrypt_auth changes from a capsnap.*/ arg.fscrypt_auth_len = 0; - arg.fscrypt_file_len = 0; msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, cap_msg_size(&arg), GFP_NOFS, false); diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index f8d5f33f708a..80acb23d0bb4 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -9,6 +9,10 @@ #include #include +#define CEPH_FSCRYPT_BLOCK_SHIFT 12 +#define CEPH_FSCRYPT_BLOCK_SIZE (_AC(1, UL) << CEPH_FSCRYPT_BLOCK_SHIFT) +#define CEPH_FSCRYPT_BLOCK_MASK (~(CEPH_FSCRYPT_BLOCK_SIZE-1)) + struct ceph_fs_client; struct ceph_acl_sec_ctx; struct ceph_mds_request; From patchwork Wed Apr 12 11:09:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208889 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D671C77B77 for ; Wed, 12 Apr 2023 11:25:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230212AbjDLLZP (ORCPT ); Wed, 12 Apr 2023 07:25:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51596 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231270AbjDLLZI (ORCPT ); Wed, 12 Apr 2023 07:25:08 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7EEF510FA for ; Wed, 12 Apr 2023 04:24:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298610; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Y9nIBhVP7JMISq/h/+Jat0P8T5PwCyah4skqjbFHaio=; b=HFBS/gz/osM+7HoN5dDTNEAEtNNPxKYK8HvQ7HaCFkfr9Cl5OoTOrSApepUP40yhzFpzev iaO4RteFx3Tjuszr8q19dGlnID0S+kpPIQorkJ9sK9phoptMTU0j5FRnWrcrdSdResYdCU nspa+T+fxcGkWB4Txwe6p525dT5yXaM= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-363-hUVVwf5BOoKm2mqg2ae5eA-1; Wed, 12 Apr 2023 07:13:10 -0400 X-MC-Unique: hUVVwf5BOoKm2mqg2ae5eA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 07C2E1C0514A; Wed, 12 Apr 2023 11:13:10 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0D33BC15BB8; Wed, 12 Apr 2023 11:13:04 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 41/71] ceph: fscrypt_file field handling in MClientRequest messages Date: Wed, 12 Apr 2023 19:09:00 +0800 Message-Id: <20230412110930.176835-42-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton For encrypted inodes, transmit a rounded-up size to the MDS as the normal file size and send the real inode size in fscrypt_file field. Also, fix up creates and truncates to also transmit fscrypt_file. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/dir.c | 3 +++ fs/ceph/file.c | 1 + fs/ceph/inode.c | 18 ++++++++++++++++-- fs/ceph/mds_client.c | 9 ++++++++- fs/ceph/mds_client.h | 2 ++ 5 files changed, 30 insertions(+), 3 deletions(-) diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index 4746e53bdf51..98a9b1592ba6 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -914,6 +914,9 @@ static int ceph_mknod(struct mnt_idmap *idmap, struct inode *dir, goto out_req; } + if (S_ISREG(mode) && IS_ENCRYPTED(dir)) + set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); + req->r_dentry = dget(dentry); req->r_num_caps = 2; req->r_parent = dir; diff --git a/fs/ceph/file.c b/fs/ceph/file.c index e5c01cd634eb..f869ab31685a 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -790,6 +790,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, req->r_parent = dir; ihold(dir); if (IS_ENCRYPTED(dir)) { + set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); if (!fscrypt_has_encryption_key(dir)) { spin_lock(&dentry->d_lock); dentry->d_flags |= DCACHE_NOKEY_NAME; diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index e88335e05b74..4c5ced950821 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -2378,11 +2378,25 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c } } else if ((issued & CEPH_CAP_FILE_SHARED) == 0 || attr->ia_size != isize) { - req->r_args.setattr.size = cpu_to_le64(attr->ia_size); - req->r_args.setattr.old_size = cpu_to_le64(isize); mask |= CEPH_SETATTR_SIZE; release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL | CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR; + if (IS_ENCRYPTED(inode) && attr->ia_size) { + set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); + mask |= CEPH_SETATTR_FSCRYPT_FILE; + req->r_args.setattr.size = + cpu_to_le64(round_up(attr->ia_size, + CEPH_FSCRYPT_BLOCK_SIZE)); + req->r_args.setattr.old_size = + cpu_to_le64(round_up(isize, + CEPH_FSCRYPT_BLOCK_SIZE)); + req->r_fscrypt_file = attr->ia_size; + /* FIXME: client must zero out any partial blocks! */ + } else { + req->r_args.setattr.size = cpu_to_le64(attr->ia_size); + req->r_args.setattr.old_size = cpu_to_le64(isize); + req->r_fscrypt_file = 0; + } } } if (ia_valid & ATTR_MTIME) { diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index d1dcd6cfaeba..85d639f75ea1 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -2836,7 +2836,12 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request * } else { ceph_encode_32(p, 0); } - ceph_encode_32(p, 0); // fscrypt_file for now + if (test_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags)) { + ceph_encode_32(p, sizeof(__le64)); + ceph_encode_64(p, req->r_fscrypt_file); + } else { + ceph_encode_32(p, 0); + } } /* @@ -2922,6 +2927,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, /* fscrypt_file */ len += sizeof(u32); + if (test_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags)) + len += sizeof(__le64); msg = ceph_msg_new2(CEPH_MSG_CLIENT_REQUEST, len, 1, GFP_NOFS, false); if (!msg) { diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index 728b7d72bf76..81a1f9a4ac3b 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -282,6 +282,7 @@ struct ceph_mds_request { #define CEPH_MDS_R_DID_PREPOPULATE (6) /* prepopulated readdir */ #define CEPH_MDS_R_PARENT_LOCKED (7) /* is r_parent->i_rwsem wlocked? */ #define CEPH_MDS_R_ASYNC (8) /* async request */ +#define CEPH_MDS_R_FSCRYPT_FILE (9) /* must marshal fscrypt_file field */ unsigned long r_req_flags; struct mutex r_fill_mutex; @@ -289,6 +290,7 @@ struct ceph_mds_request { union ceph_mds_request_args r_args; struct ceph_fscrypt_auth *r_fscrypt_auth; + u64 r_fscrypt_file; u8 *r_altname; /* fscrypt binary crypttext for long filenames */ u32 r_altname_len; /* length of r_altname */ From patchwork Wed Apr 12 11:09:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208888 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2414C77B6E for ; Wed, 12 Apr 2023 11:25:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230190AbjDLLZN (ORCPT ); Wed, 12 Apr 2023 07:25:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231315AbjDLLZK (ORCPT ); Wed, 12 Apr 2023 07:25:10 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E72A76A3 for ; Wed, 12 Apr 2023 04:24:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298616; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tVbOw4rYOvYvRxXHvd0mrUDExDGJh1AnndTZKzYquiA=; b=eG8IF8S38nSg1prbR1igtqHfdDvu0JdrhAyqNUHaCrbbfZMnob4N2A4d5xk37LuHWaSpiO quufwmSOwuVgwl3lT7HIwmI3cwUAQEHzOY9n+0bCRh9jxBEf1WjADzL1PEezGtHHIF/5w/ vqr+731y3ghsEvC/8CBlN4jv6ZPs1k0= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-126-9WLUxf5sP2eyLLORDMq7FA-1; Wed, 12 Apr 2023 07:13:14 -0400 X-MC-Unique: 9WLUxf5sP2eyLLORDMq7FA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 55208101A552; Wed, 12 Apr 2023 11:13:14 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id C8B0BC15BB8; Wed, 12 Apr 2023 11:13:10 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 42/71] ceph: get file size from fscrypt_file when present in inode traces Date: Wed, 12 Apr 2023 19:09:01 +0800 Message-Id: <20230412110930.176835-43-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li When we get an inode trace from the MDS, grab the fscrypt_file field if the inode is encrypted, and use it to populate the i_size field instead of the regular inode size field. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/inode.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 4c5ced950821..db54cc44a82f 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -1025,6 +1025,7 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, if (new_version || (new_issued & (CEPH_CAP_ANY_FILE_RD | CEPH_CAP_ANY_FILE_WR))) { + u64 size = le64_to_cpu(info->size); s64 old_pool = ci->i_layout.pool_id; struct ceph_string *old_ns; @@ -1038,10 +1039,21 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, pool_ns = old_ns; + if (IS_ENCRYPTED(inode) && size && (iinfo->fscrypt_file_len == sizeof(__le64))) { + u64 fsize = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file); + + if (size == round_up(fsize, CEPH_FSCRYPT_BLOCK_SIZE)) { + size = fsize; + } else { + pr_warn("fscrypt size mismatch: size=%llu fscrypt_file=%llu, discarding fscrypt_file size.\n", + info->size, size); + } + } + queue_trunc = ceph_fill_file_size(inode, issued, le32_to_cpu(info->truncate_seq), le64_to_cpu(info->truncate_size), - le64_to_cpu(info->size)); + size); /* only update max_size on auth cap */ if ((info->cap.flags & CEPH_CAP_FLAG_AUTH) && ci->i_max_size != le64_to_cpu(info->max_size)) { From patchwork Wed Apr 12 11:09:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208837 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 25595C77B6E for ; Wed, 12 Apr 2023 11:15:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230014AbjDLLPZ (ORCPT ); Wed, 12 Apr 2023 07:15:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229696AbjDLLPY (ORCPT ); Wed, 12 Apr 2023 07:15:24 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0FD37D87 for ; Wed, 12 Apr 2023 04:14:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298002; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UijHpydNfb4eqqegPMMOWSHULMQpNI0L25HHR8EbHUo=; b=izSsB+7OvxE/xdcCN9irXVNnODPYqq6moyDFcqlcU3Bby2d1hfjqi+S9OecJ+wny+cHuOT bwUzlVjUSD9fmgyOO0vS+A24DkOqPq1wTTaXySbTWIU/C+rXlW6Rk8ManZB2P+yh8JmW3h iCayeojf0/mJ/nGK5t7Dg+EiWPKM6Aw= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-609-FJsUTJX5OOKmsH4BJuRRmA-1; Wed, 12 Apr 2023 07:13:19 -0400 X-MC-Unique: FJsUTJX5OOKmsH4BJuRRmA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0C2AA800B35; Wed, 12 Apr 2023 11:13:19 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 46154C15BB8; Wed, 12 Apr 2023 11:13:14 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 43/71] ceph: handle fscrypt fields in cap messages from MDS Date: Wed, 12 Apr 2023 19:09:02 +0800 Message-Id: <20230412110930.176835-44-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Handle the new fscrypt_file and fscrypt_auth fields in cap messages. Use them to populate new fields in cap_extra_info and update the inode with those values. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/caps.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 76 insertions(+), 2 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 2ac9ee492616..a0601a00b9d3 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -3369,6 +3369,9 @@ struct cap_extra_info { /* currently issued */ int issued; struct timespec64 btime; + u8 *fscrypt_auth; + u32 fscrypt_auth_len; + u64 fscrypt_file_size; }; /* @@ -3401,6 +3404,14 @@ static void handle_cap_grant(struct inode *inode, bool deleted_inode = false; bool fill_inline = false; + /* + * If there is at least one crypto block then we'll trust fscrypt_file_size. + * If the real length of the file is 0, then ignore it (it has probably been + * truncated down to 0 by the MDS). + */ + if (IS_ENCRYPTED(inode) && size) + size = extra_info->fscrypt_file_size; + dout("handle_cap_grant inode %p cap %p mds%d seq %d %s\n", inode, cap, session->s_mds, seq, ceph_cap_string(newcaps)); dout(" size %llu max_size %llu, i_size %llu\n", size, max_size, @@ -3467,6 +3478,10 @@ static void handle_cap_grant(struct inode *inode, dout("%p mode 0%o uid.gid %d.%d\n", inode, inode->i_mode, from_kuid(&init_user_ns, inode->i_uid), from_kgid(&init_user_ns, inode->i_gid)); + + WARN_ON_ONCE(ci->fscrypt_auth_len != extra_info->fscrypt_auth_len || + memcmp(ci->fscrypt_auth, extra_info->fscrypt_auth, + ci->fscrypt_auth_len)); } if ((newcaps & CEPH_CAP_LINK_SHARED) && @@ -3878,7 +3893,8 @@ static void handle_cap_flushsnap_ack(struct inode *inode, u64 flush_tid, */ static bool handle_cap_trunc(struct inode *inode, struct ceph_mds_caps *trunc, - struct ceph_mds_session *session) + struct ceph_mds_session *session, + struct cap_extra_info *extra_info) { struct ceph_inode_info *ci = ceph_inode(inode); int mds = session->s_mds; @@ -3895,6 +3911,14 @@ static bool handle_cap_trunc(struct inode *inode, issued |= implemented | dirty; + /* + * If there is at least one crypto block then we'll trust fscrypt_file_size. + * If the real length of the file is 0, then ignore it (it has probably been + * truncated down to 0 by the MDS). + */ + if (IS_ENCRYPTED(inode) && size) + size = extra_info->fscrypt_file_size; + dout("handle_cap_trunc inode %p mds%d seq %d to %lld seq %d\n", inode, mds, seq, truncate_size, truncate_seq); queue_trunc = ceph_fill_file_size(inode, issued, @@ -4116,6 +4140,49 @@ static void handle_cap_import(struct ceph_mds_client *mdsc, *target_cap = cap; } +#ifdef CONFIG_FS_ENCRYPTION +static int parse_fscrypt_fields(void **p, void *end, struct cap_extra_info *extra) +{ + u32 len; + + ceph_decode_32_safe(p, end, extra->fscrypt_auth_len, bad); + if (extra->fscrypt_auth_len) { + ceph_decode_need(p, end, extra->fscrypt_auth_len, bad); + extra->fscrypt_auth = kmalloc(extra->fscrypt_auth_len, GFP_KERNEL); + if (!extra->fscrypt_auth) + return -ENOMEM; + ceph_decode_copy_safe(p, end, extra->fscrypt_auth, + extra->fscrypt_auth_len, bad); + } + + ceph_decode_32_safe(p, end, len, bad); + if (len >= sizeof(u64)) { + ceph_decode_64_safe(p, end, extra->fscrypt_file_size, bad); + len -= sizeof(u64); + } + ceph_decode_skip_n(p, end, len, bad); + return 0; +bad: + return -EIO; +} +#else +static int parse_fscrypt_fields(void **p, void *end, struct cap_extra_info *extra) +{ + u32 len; + + /* Don't care about these fields unless we're encryption-capable */ + ceph_decode_32_safe(p, end, len, bad); + if (len) + ceph_decode_skip_n(p, end, len, bad); + ceph_decode_32_safe(p, end, len, bad); + if (len) + ceph_decode_skip_n(p, end, len, bad); + return 0; +bad: + return -EIO; +} +#endif + /* * Handle a caps message from the MDS. * @@ -4235,6 +4302,11 @@ void ceph_handle_caps(struct ceph_mds_session *session, ceph_decode_64_safe(&p, end, extra_info.nsubdirs, bad); } + if (msg_version >= 12) { + if (parse_fscrypt_fields(&p, end, &extra_info)) + goto bad; + } + /* lookup ino */ inode = ceph_find_inode(mdsc->fsc->sb, vino); dout(" op %s ino %llx.%llx inode %p\n", ceph_cap_op_name(op), vino.ino, @@ -4335,7 +4407,8 @@ void ceph_handle_caps(struct ceph_mds_session *session, break; case CEPH_CAP_OP_TRUNC: - queue_trunc = handle_cap_trunc(inode, h, session); + queue_trunc = handle_cap_trunc(inode, h, session, + &extra_info); spin_unlock(&ci->i_ceph_lock); if (queue_trunc) ceph_queue_vmtruncate(inode); @@ -4358,6 +4431,7 @@ void ceph_handle_caps(struct ceph_mds_session *session, if (close_sessions) ceph_mdsc_close_sessions(mdsc); + kfree(extra_info.fscrypt_auth); return; flush_cap_releases: From patchwork Wed Apr 12 11:09:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208840 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1632AC77B6E for ; Wed, 12 Apr 2023 11:15:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230047AbjDLLPe (ORCPT ); Wed, 12 Apr 2023 07:15:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39592 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230061AbjDLLPb (ORCPT ); Wed, 12 Apr 2023 07:15:31 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DFB7483FA for ; Wed, 12 Apr 2023 04:14:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298009; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VfqG0LD8bgo/ABXnX91I7Y08x4iCOZffh4V4AHkFOVU=; b=fpyDm/v22CjoNXbLPyaQYHUC/3WgZ1zJ17CR3PYKQc+kHL27OoVKIunVTMJwvC9XOw1eOf UohjyXj1zQUoXu2qQwcwrg4Z0OKmwOUCdvkcICHIUcnoYa43B1xZNgNtvC+nqJ8sZ2FVCT 2UWxjK0TNoLme3oNO544Jxg1sVzYkRQ= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-61-3iCraslfO72hafUJ-Y2Kpw-1; Wed, 12 Apr 2023 07:13:23 -0400 X-MC-Unique: 3iCraslfO72hafUJ-Y2Kpw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7C29028237C9; Wed, 12 Apr 2023 11:13:23 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id C9022C15BB8; Wed, 12 Apr 2023 11:13:19 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 44/71] ceph: update WARN_ON message to pr_warn Date: Wed, 12 Apr 2023 19:09:03 +0800 Message-Id: <20230412110930.176835-45-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Give some more helpful info Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/caps.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index a0601a00b9d3..10e69857f692 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -3478,10 +3478,13 @@ static void handle_cap_grant(struct inode *inode, dout("%p mode 0%o uid.gid %d.%d\n", inode, inode->i_mode, from_kuid(&init_user_ns, inode->i_uid), from_kgid(&init_user_ns, inode->i_gid)); - - WARN_ON_ONCE(ci->fscrypt_auth_len != extra_info->fscrypt_auth_len || - memcmp(ci->fscrypt_auth, extra_info->fscrypt_auth, - ci->fscrypt_auth_len)); +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) + if (ci->fscrypt_auth_len != extra_info->fscrypt_auth_len || + memcmp(ci->fscrypt_auth, extra_info->fscrypt_auth, + ci->fscrypt_auth_len)) + pr_warn_ratelimited("%s: cap grant attempt to change fscrypt_auth on non-I_NEW inode (old len %d new len %d)\n", + __func__, ci->fscrypt_auth_len, extra_info->fscrypt_auth_len); +#endif } if ((newcaps & CEPH_CAP_LINK_SHARED) && From patchwork Wed Apr 12 11:09:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208839 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7C53C7619A for ; Wed, 12 Apr 2023 11:15:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230035AbjDLLPd (ORCPT ); Wed, 12 Apr 2023 07:15:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39730 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230047AbjDLLPb (ORCPT ); Wed, 12 Apr 2023 07:15:31 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 54AC9868B for ; Wed, 12 Apr 2023 04:14:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298011; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Nr1aAWtciEvh5lMfBblUcJ9h3PvSN5dvkPx3t7UgDlA=; b=MVRpXKYQedvxdDGBw/i4zhMpCA9WOKiUZlCTk/Z9b67iD1Kij1JcUEgIaoGdV7t0+vegMD mrX60QvQCMoR9VbqYO9mpUMRYeuGi/eWNyz2RRZyu2C6QBFIOWm7e4nrKM8BTcGn+aHwX5 dyHzXGXV0lf6CtyWVm8Bq3DmdAnUFwk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-122-_TIvBB4DP_KjFdI_DOJ6dQ-1; Wed, 12 Apr 2023 07:13:28 -0400 X-MC-Unique: _TIvBB4DP_KjFdI_DOJ6dQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D42EA185A794; Wed, 12 Apr 2023 11:13:27 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 49096C15BB8; Wed, 12 Apr 2023 11:13:23 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 45/71] ceph: add __ceph_get_caps helper support Date: Wed, 12 Apr 2023 19:09:04 +0800 Message-Id: <20230412110930.176835-46-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li Break out the guts of ceph_get_caps into a helper that takes an inode and ceph_file_info instead of a file pointer. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/caps.c | 17 ++++++++++++----- fs/ceph/super.h | 2 ++ 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 10e69857f692..6379c0070492 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -2952,10 +2952,9 @@ int ceph_try_get_caps(struct inode *inode, int need, int want, * due to a small max_size, make sure we check_max_size (and possibly * ask the mds) so we don't get hung up indefinitely. */ -int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got) +int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need, + int want, loff_t endoff, int *got) { - struct ceph_file_info *fi = filp->private_data; - struct inode *inode = file_inode(filp); struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); int ret, _got, flags; @@ -2964,7 +2963,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got if (ret < 0) return ret; - if ((fi->fmode & CEPH_FILE_MODE_WR) && + if (fi && (fi->fmode & CEPH_FILE_MODE_WR) && fi->filp_gen != READ_ONCE(fsc->filp_gen)) return -EBADF; @@ -3017,7 +3016,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got continue; } - if ((fi->fmode & CEPH_FILE_MODE_WR) && + if (fi && (fi->fmode & CEPH_FILE_MODE_WR) && fi->filp_gen != READ_ONCE(fsc->filp_gen)) { if (ret >= 0 && _got) ceph_put_cap_refs(ci, _got); @@ -3080,6 +3079,14 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got return 0; } +int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got) +{ + struct ceph_file_info *fi = filp->private_data; + struct inode *inode = file_inode(filp); + + return __ceph_get_caps(inode, fi, need, want, endoff, got); +} + /* * Take cap refs. Caller must already know we hold at least one ref * on the caps in question or we don't know this is safe. diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 355701a1ac70..844b6bed616d 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1249,6 +1249,8 @@ extern int ceph_encode_dentry_release(void **p, struct dentry *dn, struct inode *dir, int mds, int drop, int unless); +extern int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, + int need, int want, loff_t endoff, int *got); extern int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got); extern int ceph_try_get_caps(struct inode *inode, From patchwork Wed Apr 12 11:09:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208867 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40E1CC77B6E for ; Wed, 12 Apr 2023 11:21:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229966AbjDLLVv (ORCPT ); Wed, 12 Apr 2023 07:21:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47948 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229963AbjDLLVu (ORCPT ); Wed, 12 Apr 2023 07:21:50 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7CBDC76A4 for ; Wed, 12 Apr 2023 04:20:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298401; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=p1PU9hlaWbPkXdb+MWviNoq7BYFxon0yCNUC3EqHWlM=; b=jOw45wo1HcehJToQT4R/05N4Bq9Ybjw7b43EYt6EohcRGOVlel2JRlCi+32a2RBwPH6NVV qVgIlV6oG3O2pvY8piMHbwnUGBXurlYZH76OA2b+n5arusXw6dIuHfguzvelJaltIJXT4N dDop5pO+KfxSBUI/HcW5kXuesjZNlBg= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-618-TpcXxLFpPL-OGIy68uA6YQ-1; Wed, 12 Apr 2023 07:13:32 -0400 X-MC-Unique: TpcXxLFpPL-OGIy68uA6YQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4BD8A3C0F37F; Wed, 12 Apr 2023 11:13:32 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id A0557C15BB8; Wed, 12 Apr 2023 11:13:28 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 46/71] ceph: add __ceph_sync_read helper support Date: Wed, 12 Apr 2023 19:09:05 +0800 Message-Id: <20230412110930.176835-47-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li Turn the guts of ceph_sync_read into a new helper that takes an inode and an offset instead of a kiocb struct, and make ceph_sync_read call the helper as a wrapper. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/file.c | 36 ++++++++++++++++++++++++------------ fs/ceph/super.h | 2 ++ 2 files changed, 26 insertions(+), 12 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index f869ab31685a..ab55caeed2c6 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -956,22 +956,22 @@ enum { * If we get a short result from the OSD, check against i_size; we need to * only return a short read to the caller if we hit EOF. */ -static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, - int *retry_op) +ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, + struct iov_iter *to, int *retry_op) { - struct file *file = iocb->ki_filp; - struct inode *inode = file_inode(file); struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); struct ceph_osd_client *osdc = &fsc->client->osdc; ssize_t ret; - u64 off = iocb->ki_pos; + u64 off = *ki_pos; u64 len = iov_iter_count(to); u64 i_size = i_size_read(inode); bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD); - dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len, - (file->f_flags & O_DIRECT) ? "O_DIRECT" : ""); + dout("sync_read on inode %p %llx~%llx\n", inode, *ki_pos, len); + + if (ceph_inode_is_shutdown(inode)) + return -EIO; if (!len) return 0; @@ -1089,14 +1089,14 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, break; } - if (off > iocb->ki_pos) { + if (off > *ki_pos) { if (off >= i_size) { *retry_op = CHECK_EOF; - ret = i_size - iocb->ki_pos; - iocb->ki_pos = i_size; + ret = i_size - *ki_pos; + *ki_pos = i_size; } else { - ret = off - iocb->ki_pos; - iocb->ki_pos = off; + ret = off - *ki_pos; + *ki_pos = off; } } @@ -1104,6 +1104,18 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, return ret; } +static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, + int *retry_op) +{ + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + + dout("sync_read on file %p %llx~%zx %s\n", file, iocb->ki_pos, + iov_iter_count(to), (file->f_flags & O_DIRECT) ? "O_DIRECT" : ""); + + return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op); +} + struct ceph_aio_request { struct kiocb *iocb; size_t total_len; diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 844b6bed616d..09cf4a4351dc 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1286,6 +1286,8 @@ extern int ceph_renew_caps(struct inode *inode, int fmode); extern int ceph_open(struct inode *inode, struct file *file); extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry, struct file *file, unsigned flags, umode_t mode); +extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, + struct iov_iter *to, int *retry_op); extern int ceph_release(struct inode *inode, struct file *filp); extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page, char *data, size_t len); From patchwork Wed Apr 12 11:09:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208843 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0D85C77B6E for ; Wed, 12 Apr 2023 11:15:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229824AbjDLLPx (ORCPT ); Wed, 12 Apr 2023 07:15:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40092 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229692AbjDLLPv (ORCPT ); Wed, 12 Apr 2023 07:15:51 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA3B5769F for ; Wed, 12 Apr 2023 04:14:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298021; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=saLOzyOxNv2xHQ/WhlnRB39qTQDd6pRuvC1XhWizOi8=; b=icZLggEx29FGtZWUIkMt2Q/sQDqP/x0P2v2lPbDmZWyCi71ODfbi99EydVOOmm1BBi6fqH pV9jIqMTHsLYwfngDaRAiH1mCfds5Ss3zk5rsijhCguuxZgS+gBHxkBNK2wURMfayuAm4y GPaSqm7REjnSDyIb4htJSJEqmnhEVQU= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-424-h6tXc0YrNxarL3J2L9G9Hw-1; Wed, 12 Apr 2023 07:13:37 -0400 X-MC-Unique: h6tXc0YrNxarL3J2L9G9Hw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2850D1C0514A; Wed, 12 Apr 2023 11:13:37 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 24D8EC15BB8; Wed, 12 Apr 2023 11:13:32 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 47/71] ceph: add object version support for sync read Date: Wed, 12 Apr 2023 19:09:06 +0800 Message-Id: <20230412110930.176835-48-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li Always return the last object's version. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/file.c | 12 ++++++++++-- fs/ceph/super.h | 3 ++- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index ab55caeed2c6..df321933153a 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -957,7 +957,8 @@ enum { * only return a short read to the caller if we hit EOF. */ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, - struct iov_iter *to, int *retry_op) + struct iov_iter *to, int *retry_op, + u64 *last_objver) { struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); @@ -967,6 +968,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, u64 len = iov_iter_count(to); u64 i_size = i_size_read(inode); bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD); + u64 objver = 0; dout("sync_read on inode %p %llx~%llx\n", inode, *ki_pos, len); @@ -1039,6 +1041,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, req->r_end_latency, len, ret); + if (ret > 0) + objver = req->r_version; + i_size = i_size_read(inode); dout("sync_read %llu~%llu got %zd i_size %llu%s\n", off, len, ret, i_size, (more ? " MORE" : "")); @@ -1100,6 +1105,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, } } + if (last_objver && ret > 0) + *last_objver = objver; + dout("sync_read result %zd retry_op %d\n", ret, *retry_op); return ret; } @@ -1113,7 +1121,7 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, dout("sync_read on file %p %llx~%zx %s\n", file, iocb->ki_pos, iov_iter_count(to), (file->f_flags & O_DIRECT) ? "O_DIRECT" : ""); - return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op); + return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op, NULL); } struct ceph_aio_request { diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 09cf4a4351dc..f4659b2a4731 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1287,7 +1287,8 @@ extern int ceph_open(struct inode *inode, struct file *file); extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry, struct file *file, unsigned flags, umode_t mode); extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, - struct iov_iter *to, int *retry_op); + struct iov_iter *to, int *retry_op, + u64 *last_objver); extern int ceph_release(struct inode *inode, struct file *filp); extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page, char *data, size_t len); From patchwork Wed Apr 12 11:09:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208886 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60CD6C77B6E for ; Wed, 12 Apr 2023 11:24:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229509AbjDLLYb (ORCPT ); Wed, 12 Apr 2023 07:24:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229521AbjDLLYa (ORCPT ); Wed, 12 Apr 2023 07:24:30 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3A8487A99 for ; Wed, 12 Apr 2023 04:23:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298580; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5On+rVz56FHbW2aerpXazdW0B2HkFT2soLwUNWVtZHE=; b=EWN4cfRkqKTnoKZWhkp7EYy+VamwJ3+ws7IUJT7hmfbId+SVyXpZ68wITqpwoY6i4f7cOX pzp3yO4pex5eWRFni3DfgT+7nQg9dFtA4pEPIv9qoFn6IQw1ensi+t/dcd6VbU1vMo9dfj DXXGrIwOi1ENb80YnM20frTTkSaZ6cc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-191-OlWjnmTcOo-5hFj-fPDH1Q-1; Wed, 12 Apr 2023 07:13:42 -0400 X-MC-Unique: OlWjnmTcOo-5hFj-fPDH1Q-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id F218E886462; Wed, 12 Apr 2023 11:13:41 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id E710EC15BB8; Wed, 12 Apr 2023 11:13:37 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 48/71] ceph: add infrastructure for file encryption and decryption Date: Wed, 12 Apr 2023 19:09:07 +0800 Message-Id: <20230412110930.176835-49-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton ...and allow test_dummy_encryption to bypass content encryption if mounted with test_dummy_encryption=clear. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/crypto.c | 177 +++++++++++++++++++++++++++++++++++++++++++++++ fs/ceph/crypto.h | 71 +++++++++++++++++++ fs/ceph/super.c | 6 ++ fs/ceph/super.h | 1 + 4 files changed, 255 insertions(+) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index fe47fbdaead9..35e292045e9d 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -9,6 +9,7 @@ #include #include #include +#include #include "super.h" #include "mds_client.h" @@ -354,3 +355,179 @@ int ceph_fscrypt_prepare_readdir(struct inode *dir) } return 0; } + +int ceph_fscrypt_decrypt_block_inplace(const struct inode *inode, + struct page *page, unsigned int len, + unsigned int offs, u64 lblk_num) +{ + struct ceph_mount_options *opt = ceph_inode_to_client(inode)->mount_options; + + if (opt->flags & CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR) + return 0; + + dout("%s: len %u offs %u blk %llu\n", __func__, len, offs, lblk_num); + return fscrypt_decrypt_block_inplace(inode, page, len, offs, lblk_num); +} + +int ceph_fscrypt_encrypt_block_inplace(const struct inode *inode, + struct page *page, unsigned int len, + unsigned int offs, u64 lblk_num, gfp_t gfp_flags) +{ + struct ceph_mount_options *opt = ceph_inode_to_client(inode)->mount_options; + + if (opt->flags & CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR) + return 0; + + dout("%s: len %u offs %u blk %llu\n", __func__, len, offs, lblk_num); + return fscrypt_encrypt_block_inplace(inode, page, len, offs, lblk_num, gfp_flags); +} + +/** + * ceph_fscrypt_decrypt_pages - decrypt an array of pages + * @inode: pointer to inode associated with these pages + * @page: pointer to page array + * @off: offset into the file that the read data starts + * @len: max length to decrypt + * + * Decrypt an array of fscrypt'ed pages and return the amount of + * data decrypted. Any data in the page prior to the start of the + * first complete block in the read is ignored. Any incomplete + * crypto blocks at the end of the array are ignored (and should + * probably be zeroed by the caller). + * + * Returns the length of the decrypted data or a negative errno. + */ +int ceph_fscrypt_decrypt_pages(struct inode *inode, struct page **page, u64 off, int len) +{ + int i, num_blocks; + u64 baseblk = off >> CEPH_FSCRYPT_BLOCK_SHIFT; + int ret = 0; + + /* + * We can't deal with partial blocks on an encrypted file, so mask off + * the last bit. + */ + num_blocks = ceph_fscrypt_blocks(off, len & CEPH_FSCRYPT_BLOCK_MASK); + + /* Decrypt each block */ + for (i = 0; i < num_blocks; ++i) { + int blkoff = i << CEPH_FSCRYPT_BLOCK_SHIFT; + int pgidx = blkoff >> PAGE_SHIFT; + unsigned int pgoffs = offset_in_page(blkoff); + int fret; + + fret = ceph_fscrypt_decrypt_block_inplace(inode, page[pgidx], + CEPH_FSCRYPT_BLOCK_SIZE, pgoffs, + baseblk + i); + if (fret < 0) { + if (ret == 0) + ret = fret; + break; + } + ret += CEPH_FSCRYPT_BLOCK_SIZE; + } + return ret; +} + +/** + * ceph_fscrypt_decrypt_extents: decrypt received extents in given buffer + * @inode: inode associated with pages being decrypted + * @page: pointer to page array + * @off: offset into the file that the data in page[0] starts + * @map: pointer to extent array + * @ext_cnt: length of extent array + * + * Given an extent map and a page array, decrypt the received data in-place, + * skipping holes. Returns the offset into buffer of end of last decrypted + * block. + */ +int ceph_fscrypt_decrypt_extents(struct inode *inode, struct page **page, u64 off, + struct ceph_sparse_extent *map, u32 ext_cnt) +{ + int i, ret = 0; + struct ceph_inode_info *ci = ceph_inode(inode); + u64 objno, objoff; + u32 xlen; + + /* Nothing to do for empty array */ + if (ext_cnt == 0) { + dout("%s: empty array, ret 0\n", __func__); + return 0; + } + + ceph_calc_file_object_mapping(&ci->i_layout, off, map[0].len, + &objno, &objoff, &xlen); + + for (i = 0; i < ext_cnt; ++i) { + struct ceph_sparse_extent *ext = &map[i]; + int pgsoff = ext->off - objoff; + int pgidx = pgsoff >> PAGE_SHIFT; + int fret; + + if ((ext->off | ext->len) & ~CEPH_FSCRYPT_BLOCK_MASK) { + pr_warn("%s: bad encrypted sparse extent idx %d off %llx len %llx\n", + __func__, i, ext->off, ext->len); + return -EIO; + } + fret = ceph_fscrypt_decrypt_pages(inode, &page[pgidx], + off + pgsoff, ext->len); + dout("%s: [%d] 0x%llx~0x%llx fret %d\n", __func__, i, + ext->off, ext->len, fret); + if (fret < 0) { + if (ret == 0) + ret = fret; + break; + } + ret = pgsoff + fret; + } + dout("%s: ret %d\n", __func__, ret); + return ret; +} + +/** + * ceph_fscrypt_encrypt_pages - encrypt an array of pages + * @inode: pointer to inode associated with these pages + * @page: pointer to page array + * @off: offset into the file that the data starts + * @len: max length to encrypt + * @gfp: gfp flags to use for allocation + * + * Decrypt an array of cleartext pages and return the amount of + * data encrypted. Any data in the page prior to the start of the + * first complete block in the read is ignored. Any incomplete + * crypto blocks at the end of the array are ignored. + * + * Returns the length of the encrypted data or a negative errno. + */ +int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, u64 off, + int len, gfp_t gfp) +{ + int i, num_blocks; + u64 baseblk = off >> CEPH_FSCRYPT_BLOCK_SHIFT; + int ret = 0; + + /* + * We can't deal with partial blocks on an encrypted file, so mask off + * the last bit. + */ + num_blocks = ceph_fscrypt_blocks(off, len & CEPH_FSCRYPT_BLOCK_MASK); + + /* Encrypt each block */ + for (i = 0; i < num_blocks; ++i) { + int blkoff = i << CEPH_FSCRYPT_BLOCK_SHIFT; + int pgidx = blkoff >> PAGE_SHIFT; + unsigned int pgoffs = offset_in_page(blkoff); + int fret; + + fret = ceph_fscrypt_encrypt_block_inplace(inode, page[pgidx], + CEPH_FSCRYPT_BLOCK_SIZE, pgoffs, + baseblk + i, gfp); + if (fret < 0) { + if (ret == 0) + ret = fret; + break; + } + ret += CEPH_FSCRYPT_BLOCK_SIZE; + } + return ret; +} diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index 80acb23d0bb4..887f191cc423 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -100,6 +100,40 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, struct fscrypt_str *oname, bool *is_nokey); int ceph_fscrypt_prepare_readdir(struct inode *dir); +static inline unsigned int ceph_fscrypt_blocks(u64 off, u64 len) +{ + /* crypto blocks cannot span more than one page */ + BUILD_BUG_ON(CEPH_FSCRYPT_BLOCK_SHIFT > PAGE_SHIFT); + + return ((off+len+CEPH_FSCRYPT_BLOCK_SIZE-1) >> CEPH_FSCRYPT_BLOCK_SHIFT) - + (off >> CEPH_FSCRYPT_BLOCK_SHIFT); +} + +/* + * If we have an encrypted inode then we must adjust the offset and + * range of the on-the-wire read to cover an entire encryption block. + * The copy will be done using the original offset and length, after + * we've decrypted the result. + */ +static inline void ceph_fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64 *len) +{ + if (IS_ENCRYPTED(inode)) { + *len = ceph_fscrypt_blocks(*off, *len) * CEPH_FSCRYPT_BLOCK_SIZE; + *off &= CEPH_FSCRYPT_BLOCK_MASK; + } +} + +int ceph_fscrypt_decrypt_block_inplace(const struct inode *inode, + struct page *page, unsigned int len, + unsigned int offs, u64 lblk_num); +int ceph_fscrypt_encrypt_block_inplace(const struct inode *inode, + struct page *page, unsigned int len, + unsigned int offs, u64 lblk_num, gfp_t gfp_flags); +int ceph_fscrypt_decrypt_pages(struct inode *inode, struct page **page, u64 off, int len); +int ceph_fscrypt_decrypt_extents(struct inode *inode, struct page **page, u64 off, + struct ceph_sparse_extent *map, u32 ext_cnt); +int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, u64 off, + int len, gfp_t gfp); #else /* CONFIG_FS_ENCRYPTION */ static inline void ceph_fscrypt_set_ops(struct super_block *sb) @@ -157,6 +191,43 @@ static inline int ceph_fscrypt_prepare_readdir(struct inode *dir) { return 0; } + +static inline void ceph_fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64 *len) +{ +} + +static inline int ceph_fscrypt_decrypt_block_inplace(const struct inode *inode, + struct page *page, unsigned int len, + unsigned int offs, u64 lblk_num) +{ + return 0; +} + +static inline int ceph_fscrypt_encrypt_block_inplace(const struct inode *inode, + struct page *page, unsigned int len, + unsigned int offs, u64 lblk_num, gfp_t gfp_flags) +{ + return 0; +} + +static inline int ceph_fscrypt_decrypt_pages(struct inode *inode, struct page **page, + u64 off, int len) +{ + return 0; +} + +static inline int ceph_fscrypt_decrypt_extents(struct inode *inode, struct page **page, + u64 off, struct ceph_sparse_extent *map, + u32 ext_cnt) +{ + return 0; +} + +static inline int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, + u64 off, int len, gfp_t gfp) +{ + return 0; +} #endif /* CONFIG_FS_ENCRYPTION */ #endif diff --git a/fs/ceph/super.c b/fs/ceph/super.c index b9dd2fa36d8b..4b0a070d5c6d 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -591,6 +591,12 @@ static int ceph_parse_mount_param(struct fs_context *fc, break; case Opt_test_dummy_encryption: #ifdef CONFIG_FS_ENCRYPTION + /* HACK: allow for cleartext "encryption" in files for testing */ + if (param->string && !strcmp(param->string, "clear")) { + fsopt->flags |= CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR; + kfree(param->string); + param->string = NULL; + } fscrypt_free_dummy_policy(&fsopt->dummy_enc_policy); ret = fscrypt_parse_test_dummy_encryption(param, &fsopt->dummy_enc_policy); diff --git a/fs/ceph/super.h b/fs/ceph/super.h index f4659b2a4731..e23bfd9191b3 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -44,6 +44,7 @@ #define CEPH_MOUNT_OPT_ASYNC_DIROPS (1<<15) /* allow async directory ops */ #define CEPH_MOUNT_OPT_NOPAGECACHE (1<<16) /* bypass pagecache altogether */ #define CEPH_MOUNT_OPT_SPARSEREAD (1<<17) /* always do sparse reads */ +#define CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR (1<<18) /* don't actually encrypt content */ #define CEPH_MOUNT_OPT_DEFAULT \ (CEPH_MOUNT_OPT_DCACHE | \ From patchwork Wed Apr 12 11:09:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208895 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04639C77B78 for ; Wed, 12 Apr 2023 11:28:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231324AbjDLL21 (ORCPT ); Wed, 12 Apr 2023 07:28:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55298 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230377AbjDLL2O (ORCPT ); Wed, 12 Apr 2023 07:28:14 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 04C2D76A6 for ; Wed, 12 Apr 2023 04:27:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298823; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3jv89bIaHop2aMspi6CpWLcJ0SgzgwqihYj1Ol7dF+k=; b=Y/Q2JhS5Owae51E6OdJyO3Xnxz1T1JZCQi2jsGUCvdWF56GXhCzGaAMJZdOy79myCr7+8i jDNqOKGd9H+WZdkNEVtw7WhHFksQJKvqAtmC7DS9ZYtjS5JiV7OQ/gMRepMkCTF5/gx771 s+dioQenagkqUuWxMxyuEhLmbQ1gj00= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-104-NRuUt52YNhKVZ-cZl4VFPw-1; Wed, 12 Apr 2023 07:13:46 -0400 X-MC-Unique: NRuUt52YNhKVZ-cZl4VFPw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6C5B1811E7C; Wed, 12 Apr 2023 11:13:46 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id D94B5C15BB8; Wed, 12 Apr 2023 11:13:42 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 49/71] ceph: add truncate size handling support for fscrypt Date: Wed, 12 Apr 2023 19:09:08 +0800 Message-Id: <20230412110930.176835-50-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li This will transfer the encrypted last block contents to the MDS along with the truncate request only when the new size is smaller and not aligned to the fscrypt BLOCK size. When the last block is located in the file hole, the truncate request will only contain the header. The MDS could fail to do the truncate if there has another client or process has already updated the RADOS object which contains the last block, and will return -EAGAIN, then the kclient needs to retry it. The RMW will take around 50ms, and will let it retry 20 times for now. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/crypto.h | 21 +++++ fs/ceph/inode.c | 198 +++++++++++++++++++++++++++++++++++++++++++++-- fs/ceph/super.h | 5 ++ 3 files changed, 217 insertions(+), 7 deletions(-) diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index 887f191cc423..db6b399645ba 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -26,6 +26,27 @@ struct ceph_fname { bool no_copy; }; +/* + * Header for the crypted file when truncating the size, this + * will be sent to MDS, and the MDS will update the encrypted + * last block and then truncate the size. + */ +struct ceph_fscrypt_truncate_size_header { + __u8 ver; + __u8 compat; + + /* + * It will be sizeof(assert_ver + file_offset + block_size) + * if the last block is empty when it's located in a file + * hole. Or the data_len will plus CEPH_FSCRYPT_BLOCK_SIZE. + */ + __le32 data_len; + + __le64 change_attr; + __le64 file_offset; + __le32 block_size; +} __packed; + struct ceph_fscrypt_auth { __le32 cfa_version; __le32 cfa_blob_len; diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index db54cc44a82f..50664f7b18e3 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -595,6 +595,7 @@ struct inode *ceph_alloc_inode(struct super_block *sb) ci->i_truncate_seq = 0; ci->i_truncate_size = 0; ci->i_truncate_pending = 0; + ci->i_truncate_pagecache_size = 0; ci->i_max_size = 0; ci->i_reported_size = 0; @@ -766,6 +767,10 @@ int ceph_fill_file_size(struct inode *inode, int issued, dout("truncate_size %lld -> %llu\n", ci->i_truncate_size, truncate_size); ci->i_truncate_size = truncate_size; + if (IS_ENCRYPTED(inode)) + ci->i_truncate_pagecache_size = size; + else + ci->i_truncate_pagecache_size = truncate_size; } return queue_trunc; } @@ -2140,7 +2145,7 @@ void __ceph_do_pending_vmtruncate(struct inode *inode) /* there should be no reader or writer */ WARN_ON_ONCE(ci->i_rd_ref || ci->i_wr_ref); - to = ci->i_truncate_size; + to = ci->i_truncate_pagecache_size; wrbuffer_refs = ci->i_wrbuffer_ref; dout("__do_pending_vmtruncate %p (%d) to %lld\n", inode, ci->i_truncate_pending, to); @@ -2150,7 +2155,7 @@ void __ceph_do_pending_vmtruncate(struct inode *inode) truncate_pagecache(inode, to); spin_lock(&ci->i_ceph_lock); - if (to == ci->i_truncate_size) { + if (to == ci->i_truncate_pagecache_size) { ci->i_truncate_pending = 0; finish = 1; } @@ -2231,6 +2236,142 @@ static const struct inode_operations ceph_encrypted_symlink_iops = { .listxattr = ceph_listxattr, }; +/* + * Transfer the encrypted last block to the MDS and the MDS + * will help update it when truncating a smaller size. + * + * We don't support a PAGE_SIZE that is smaller than the + * CEPH_FSCRYPT_BLOCK_SIZE. + */ +static int fill_fscrypt_truncate(struct inode *inode, + struct ceph_mds_request *req, + struct iattr *attr) +{ + struct ceph_inode_info *ci = ceph_inode(inode); + int boff = attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE; + loff_t pos, orig_pos = round_down(attr->ia_size, CEPH_FSCRYPT_BLOCK_SIZE); + u64 block = orig_pos >> CEPH_FSCRYPT_BLOCK_SHIFT; + struct ceph_pagelist *pagelist = NULL; + struct kvec iov = {0}; + struct iov_iter iter; + struct page *page = NULL; + struct ceph_fscrypt_truncate_size_header header; + int retry_op = 0; + int len = CEPH_FSCRYPT_BLOCK_SIZE; + loff_t i_size = i_size_read(inode); + int got, ret, issued; + u64 objver; + + ret = __ceph_get_caps(inode, NULL, CEPH_CAP_FILE_RD, 0, -1, &got); + if (ret < 0) + return ret; + + issued = __ceph_caps_issued(ci, NULL); + + dout("%s size %lld -> %lld got cap refs on %s, issued %s\n", __func__, + i_size, attr->ia_size, ceph_cap_string(got), + ceph_cap_string(issued)); + + /* Try to writeback the dirty pagecaches */ + if (issued & (CEPH_CAP_FILE_BUFFER)) { + loff_t lend = orig_pos + CEPH_FSCRYPT_BLOCK_SHIFT - 1; + ret = filemap_write_and_wait_range(inode->i_mapping, + orig_pos, lend); + if (ret < 0) + goto out; + } + + page = __page_cache_alloc(GFP_KERNEL); + if (page == NULL) { + ret = -ENOMEM; + goto out; + } + + pagelist = ceph_pagelist_alloc(GFP_KERNEL); + if (!pagelist) { + ret = -ENOMEM; + goto out; + } + + iov.iov_base = kmap_local_page(page); + iov.iov_len = len; + iov_iter_kvec(&iter, READ, &iov, 1, len); + + pos = orig_pos; + ret = __ceph_sync_read(inode, &pos, &iter, &retry_op, &objver); + if (ret < 0) + goto out; + + /* Insert the header first */ + header.ver = 1; + header.compat = 1; + header.change_attr = cpu_to_le64(inode_peek_iversion_raw(inode)); + + /* + * Always set the block_size to CEPH_FSCRYPT_BLOCK_SIZE, + * because in MDS it may need this to do the truncate. + */ + header.block_size = cpu_to_le32(CEPH_FSCRYPT_BLOCK_SIZE); + + /* + * If we hit a hole here, we should just skip filling + * the fscrypt for the request, because once the fscrypt + * is enabled, the file will be split into many blocks + * with the size of CEPH_FSCRYPT_BLOCK_SIZE, if there + * has a hole, the hole size should be multiple of block + * size. + * + * If the Rados object doesn't exist, it will be set to 0. + */ + if (!objver) { + dout("%s hit hole, ppos %lld < size %lld\n", __func__, + pos, i_size); + + header.data_len = cpu_to_le32(8 + 8 + 4); + header.file_offset = 0; + ret = 0; + } else { + header.data_len = cpu_to_le32(8 + 8 + 4 + CEPH_FSCRYPT_BLOCK_SIZE); + header.file_offset = cpu_to_le64(orig_pos); + + /* truncate and zero out the extra contents for the last block */ + memset(iov.iov_base + boff, 0, PAGE_SIZE - boff); + + /* encrypt the last block */ + ret = ceph_fscrypt_encrypt_block_inplace(inode, page, + CEPH_FSCRYPT_BLOCK_SIZE, + 0, block, + GFP_KERNEL); + if (ret) + goto out; + } + + /* Insert the header */ + ret = ceph_pagelist_append(pagelist, &header, sizeof(header)); + if (ret) + goto out; + + if (header.block_size) { + /* Append the last block contents to pagelist */ + ret = ceph_pagelist_append(pagelist, iov.iov_base, + CEPH_FSCRYPT_BLOCK_SIZE); + if (ret) + goto out; + } + req->r_pagelist = pagelist; +out: + dout("%s %p size dropping cap refs on %s\n", __func__, + inode, ceph_cap_string(got)); + ceph_put_cap_refs(ci, got); + if (iov.iov_base) + kunmap_local(iov.iov_base); + if (page) + __free_pages(page, 0); + if (ret && pagelist) + ceph_pagelist_release(pagelist); + return ret; +} + int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia) { struct ceph_inode_info *ci = ceph_inode(inode); @@ -2238,13 +2379,17 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c struct ceph_mds_request *req; struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc; struct ceph_cap_flush *prealloc_cf; + loff_t isize = i_size_read(inode); int issued; int release = 0, dirtied = 0; int mask = 0; int err = 0; int inode_dirty_flags = 0; bool lock_snap_rwsem = false; + bool fill_fscrypt; + int truncate_retry = 20; /* The RMW will take around 50ms */ +retry: prealloc_cf = ceph_alloc_cap_flush(); if (!prealloc_cf) return -ENOMEM; @@ -2256,6 +2401,7 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c return PTR_ERR(req); } + fill_fscrypt = false; spin_lock(&ci->i_ceph_lock); issued = __ceph_caps_issued(ci, NULL); @@ -2377,10 +2523,27 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c } } if (ia_valid & ATTR_SIZE) { - loff_t isize = i_size_read(inode); - dout("setattr %p size %lld -> %lld\n", inode, isize, attr->ia_size); - if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) { + /* + * Only when the new size is smaller and not aligned to + * CEPH_FSCRYPT_BLOCK_SIZE will the RMW is needed. + */ + if (IS_ENCRYPTED(inode) && attr->ia_size < isize && + (attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE)) { + mask |= CEPH_SETATTR_SIZE; + release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL | + CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR; + set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); + mask |= CEPH_SETATTR_FSCRYPT_FILE; + req->r_args.setattr.size = + cpu_to_le64(round_up(attr->ia_size, + CEPH_FSCRYPT_BLOCK_SIZE)); + req->r_args.setattr.old_size = + cpu_to_le64(round_up(isize, + CEPH_FSCRYPT_BLOCK_SIZE)); + req->r_fscrypt_file = attr->ia_size; + fill_fscrypt = true; + } else if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) { if (attr->ia_size > isize) { i_size_write(inode, attr->ia_size); inode->i_blocks = calc_inode_blocks(attr->ia_size); @@ -2403,7 +2566,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c cpu_to_le64(round_up(isize, CEPH_FSCRYPT_BLOCK_SIZE)); req->r_fscrypt_file = attr->ia_size; - /* FIXME: client must zero out any partial blocks! */ } else { req->r_args.setattr.size = cpu_to_le64(attr->ia_size); req->r_args.setattr.old_size = cpu_to_le64(isize); @@ -2470,8 +2632,10 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c release &= issued; spin_unlock(&ci->i_ceph_lock); - if (lock_snap_rwsem) + if (lock_snap_rwsem) { up_read(&mdsc->snap_rwsem); + lock_snap_rwsem = false; + } if (inode_dirty_flags) __mark_inode_dirty(inode, inode_dirty_flags); @@ -2483,7 +2647,27 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c req->r_args.setattr.mask = cpu_to_le32(mask); req->r_num_caps = 1; req->r_stamp = attr->ia_ctime; + if (fill_fscrypt) { + err = fill_fscrypt_truncate(inode, req, attr); + if (err) + goto out; + } + + /* + * The truncate request will return -EAGAIN when the + * last block has been updated just before the MDS + * successfully gets the xlock for the FILE lock. To + * avoid corrupting the file contents we need to retry + * it. + */ err = ceph_mdsc_do_request(mdsc, NULL, req); + if (err == -EAGAIN && truncate_retry--) { + dout("setattr %p result=%d (%s locally, %d remote), retry it!\n", + inode, err, ceph_cap_string(dirtied), mask); + ceph_mdsc_put_request(req); + ceph_free_cap_flush(prealloc_cf); + goto retry; + } } out: dout("setattr %p result=%d (%s locally, %d remote)\n", inode, err, diff --git a/fs/ceph/super.h b/fs/ceph/super.h index e23bfd9191b3..a785e5cb9b40 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -427,6 +427,11 @@ struct ceph_inode_info { u32 i_truncate_seq; /* last truncate to smaller size */ u64 i_truncate_size; /* and the size we last truncated down to */ int i_truncate_pending; /* still need to call vmtruncate */ + /* + * For none fscrypt case it equals to i_truncate_size or it will + * equals to fscrypt_file_size + */ + u64 i_truncate_pagecache_size; u64 i_max_size; /* max file size authorized by mds */ u64 i_reported_size; /* (max_)size reported to or requested of mds */ From patchwork Wed Apr 12 11:09:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208844 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DC795C77B6E for ; Wed, 12 Apr 2023 11:16:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229692AbjDLLQK (ORCPT ); Wed, 12 Apr 2023 07:16:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40318 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230045AbjDLLQH (ORCPT ); Wed, 12 Apr 2023 07:16:07 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DED638687 for ; Wed, 12 Apr 2023 04:14:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298033; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wGDY0llPdUOZbR/c5nKwopu+Fff+ftLZmhhEzsb7sSM=; b=CXoy0u/YgP/8MkHT4svntqfWQBjUtgB29SPjzn8Zgmqg2oQuUw2c5LN5a3COMsYo00WQIK bCozQpMuDr/YygAevi/sqyJc/1sJeFZjnuS6SQbaIgoRWK1wizm+yO2vRg2G09tUpexQCz 8Jq7FeVKEZNKYej1TF2bKnL5wHT8HSI= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-230-DpQ-R25iMRSRfbwsfrfxUw-1; Wed, 12 Apr 2023 07:13:51 -0400 X-MC-Unique: DpQ-R25iMRSRfbwsfrfxUw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3059828237C4; Wed, 12 Apr 2023 11:13:51 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3EF4CC15BB8; Wed, 12 Apr 2023 11:13:46 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 50/71] libceph: allow ceph_osdc_new_request to accept a multi-op read Date: Wed, 12 Apr 2023 19:09:09 +0800 Message-Id: <20230412110930.176835-51-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Currently we have some special-casing for multi-op writes, but in the case of a read, we can't really handle it. All of the current multi-op callers call it with CEPH_OSD_FLAG_WRITE set. Have ceph_osdc_new_request check for CEPH_OSD_FLAG_READ and if it's set, allocate multiple reply ops instead of multiple request ops. If neither flag is set, return -EINVAL. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- net/ceph/osd_client.c | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 2dcb1140ed21..78b622178a3d 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -1136,15 +1136,30 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc, if (flags & CEPH_OSD_FLAG_WRITE) req->r_data_offset = off; - if (num_ops > 1) + if (num_ops > 1) { + int num_req_ops, num_rep_ops; + /* - * This is a special case for ceph_writepages_start(), but it - * also covers ceph_uninline_data(). If more multi-op request - * use cases emerge, we will need a separate helper. + * If this is a multi-op write request, assume that we'll need + * request ops. If it's a multi-op read then assume we'll need + * reply ops. Anything else and call it -EINVAL. */ - r = __ceph_osdc_alloc_messages(req, GFP_NOFS, num_ops, 0); - else + if (flags & CEPH_OSD_FLAG_WRITE) { + num_req_ops = num_ops; + num_rep_ops = 0; + } else if (flags & CEPH_OSD_FLAG_READ) { + num_req_ops = 0; + num_rep_ops = num_ops; + } else { + r = -EINVAL; + goto fail; + } + + r = __ceph_osdc_alloc_messages(req, GFP_NOFS, num_req_ops, + num_rep_ops); + } else { r = ceph_osdc_alloc_messages(req, GFP_NOFS); + } if (r) goto fail; From patchwork Wed Apr 12 11:09:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208852 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB229C7619A for ; Wed, 12 Apr 2023 11:17:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230197AbjDLLRL (ORCPT ); Wed, 12 Apr 2023 07:17:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41318 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230113AbjDLLRE (ORCPT ); Wed, 12 Apr 2023 07:17:04 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 874996A69 for ; Wed, 12 Apr 2023 04:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298037; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zT1plFBgUWEqMGv4P19HzcyoJmhOCrMTWhticHODhwM=; b=WxX04onf+XedyofGkVqddhx/kdFZSqXYxd9bfXvp36L/IGlm7PqyhTNUz2vMALMhWC6neB +FrwaDJJTvV4XW4TAMQRuKu3mWorkKmiD5xog23S+uX9wONNWyKJ6JocuQ1Hxqcq9f5qG2 IPnbGWfhS7z0zWr1xLZaZRe4YYrYYJk= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-547-uJT6XuhPP6GTbewbDXOiLg-1; Wed, 12 Apr 2023 07:13:56 -0400 X-MC-Unique: uJT6XuhPP6GTbewbDXOiLg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 10B8F1C08796; Wed, 12 Apr 2023 11:13:56 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id F27E6C15BBA; Wed, 12 Apr 2023 11:13:51 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 51/71] ceph: disable fallocate for encrypted inodes Date: Wed, 12 Apr 2023 19:09:10 +0800 Message-Id: <20230412110930.176835-52-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton ...hopefully, just for now. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/file.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index df321933153a..e3d07f4049df 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -2195,6 +2195,9 @@ static long ceph_fallocate(struct file *file, int mode, if (!S_ISREG(inode->i_mode)) return -EOPNOTSUPP; + if (IS_ENCRYPTED(inode)) + return -EOPNOTSUPP; + prealloc_cf = ceph_alloc_cap_flush(); if (!prealloc_cf) return -ENOMEM; From patchwork Wed Apr 12 11:09:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208846 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1329C77B71 for ; Wed, 12 Apr 2023 11:16:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230015AbjDLLQS (ORCPT ); Wed, 12 Apr 2023 07:16:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40460 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230011AbjDLLQP (ORCPT ); Wed, 12 Apr 2023 07:16:15 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 46A2F1FE0 for ; Wed, 12 Apr 2023 04:15:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298045; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=k5AMA9hr8EFvDFi4PQNQIP0wk6TaajuQcar9R3CSrNc=; b=MjJBWn4ELK96qFdG6DzPXCq2VL83Vd/97+kZ+ZmYScb0abMdieBnivFITiFQKprzjEUmmk 7D1XhliRr+0bF/icwp9VRZJrtb/dfHlesISs/wOM1qNBXX2jqJb+SHMNyb3TtHNui9TEtd JUbFeKUgSg7/3egkqLu+rL41ZvlAdj8= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-550-1L6jwj0COcKt6vcxXdP5_A-1; Wed, 12 Apr 2023 07:14:02 -0400 X-MC-Unique: 1L6jwj0COcKt6vcxXdP5_A-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 001B91C08795; Wed, 12 Apr 2023 11:14:01 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id CDF05C15BB8; Wed, 12 Apr 2023 11:13:56 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 52/71] ceph: disable copy offload on encrypted inodes Date: Wed, 12 Apr 2023 19:09:11 +0800 Message-Id: <20230412110930.176835-53-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton If we have an encrypted inode, then the client will need to re-encrypt the contents of the new object. Disable copy offload to or from encrypted inodes. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/file.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index e3d07f4049df..f621f733898e 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -2519,6 +2519,10 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, return -EOPNOTSUPP; } + /* Every encrypted inode gets its own key, so we can't offload them */ + if (IS_ENCRYPTED(src_inode) || IS_ENCRYPTED(dst_inode)) + return -EOPNOTSUPP; + if (len < src_ci->i_layout.object_size) return -EOPNOTSUPP; /* no remote copy will be done */ From patchwork Wed Apr 12 11:09:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208845 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 71110C77B6E for ; Wed, 12 Apr 2023 11:16:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230037AbjDLLQQ (ORCPT ); Wed, 12 Apr 2023 07:16:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230036AbjDLLQM (ORCPT ); Wed, 12 Apr 2023 07:16:12 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EB76B7283 for ; Wed, 12 Apr 2023 04:15:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298049; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MjlcZTFTyvUhGjwy0YvfFc7oGfEiQN7cQhnOyj/kMeo=; b=Z4lW3xsDSj3UxI+43xLJm+X3BQPvUxf75W58Zz2j2XTJ8pjhe6fUe9GmvG5+a08RtI2Qv2 KRPgPWy8VH1J1yYr1k2rwLZGRhMNOGU1jgoUKel/n5g24Nl7ElkblbVzuNTyWrLw3Z1U7T a4I02V0mppEOiF7lbfphwvx2L7QU4HU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-604-sggqnVGZPbuBS40BEDMXPw-1; Wed, 12 Apr 2023 07:14:06 -0400 X-MC-Unique: sggqnVGZPbuBS40BEDMXPw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DFD50100DEC1; Wed, 12 Apr 2023 11:14:05 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id E1E4FC15BBA; Wed, 12 Apr 2023 11:14:01 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 53/71] ceph: don't use special DIO path for encrypted inodes Date: Wed, 12 Apr 2023 19:09:12 +0800 Message-Id: <20230412110930.176835-54-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Eventually I want to merge the synchronous and direct read codepaths, possibly via new netfs infrastructure. For now, the direct path is not crypto-enabled, so use the sync read/write paths instead. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/file.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index f621f733898e..70c9b9d7dc33 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1735,7 +1735,9 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to) ceph_cap_string(got)); if (!ceph_has_inline_data(ci)) { - if (!retry_op && (iocb->ki_flags & IOCB_DIRECT)) { + if (!retry_op && + (iocb->ki_flags & IOCB_DIRECT) && + !IS_ENCRYPTED(inode)) { ret = ceph_direct_read_write(iocb, to, NULL, NULL); if (ret >= 0 && ret < len) @@ -1961,7 +1963,7 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from) /* we might need to revert back to that point */ data = *from; - if (iocb->ki_flags & IOCB_DIRECT) + if ((iocb->ki_flags & IOCB_DIRECT) && !IS_ENCRYPTED(inode)) written = ceph_direct_read_write(iocb, &data, snapc, &prealloc_cf); else From patchwork Wed Apr 12 11:09:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208850 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E427C7619A for ; Wed, 12 Apr 2023 11:16:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230077AbjDLLQz (ORCPT ); Wed, 12 Apr 2023 07:16:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41110 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230103AbjDLLQx (ORCPT ); Wed, 12 Apr 2023 07:16:53 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1F51759F3 for ; Wed, 12 Apr 2023 04:15:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298052; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Hgce5a3AGaR9CiTtAuonYnc1lVfPuXmUoEacFtylQeM=; b=E/u+J3WJDhYpAUmgap+ZSWP+Ij5LG8jRo9GWrU3ZAnI8QyoNST52vD1VPNGgstcybn6JUS y2sqsPlzx5O0bRHDHAdjfSQIdgM9uZcTJ9rOcHd8enN49Xtem6YB2tMAlOCPFAIB3otnSN 4oUfQJM81V04nf9QmX0iWAXnqQ2R3sY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-612-tqKZhL3OMWysFqsq-IKw1w-1; Wed, 12 Apr 2023 07:14:11 -0400 X-MC-Unique: tqKZhL3OMWysFqsq-IKw1w-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4FC4E101A531; Wed, 12 Apr 2023 11:14:11 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id E586DC15BB8; Wed, 12 Apr 2023 11:14:06 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 54/71] ceph: align data in pages in ceph_sync_write Date: Wed, 12 Apr 2023 19:09:13 +0800 Message-Id: <20230412110930.176835-55-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Encrypted files will need to be dealt with in block-sized chunks and once we do that, the way that ceph_sync_write aligns the data in the bounce buffer won't be acceptable. Change it to align the data the same way it would be aligned in the pagecache. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/file.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 70c9b9d7dc33..c9d83e87e58a 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1579,6 +1579,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, bool check_caps = false; struct timespec64 mtime = current_time(inode); size_t count = iov_iter_count(from); + size_t off; if (ceph_snap(file_inode(file)) != CEPH_NOSNAP) return -EROFS; @@ -1616,12 +1617,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, break; } - /* - * write from beginning of first page, - * regardless of io alignment - */ - num_pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT; - + num_pages = calc_pages_for(pos, len); pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL); if (IS_ERR(pages)) { ret = PTR_ERR(pages); @@ -1629,9 +1625,12 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, } left = len; + off = offset_in_page(pos); for (n = 0; n < num_pages; n++) { - size_t plen = min_t(size_t, left, PAGE_SIZE); - ret = copy_page_from_iter(pages[n], 0, plen, from); + size_t plen = min_t(size_t, left, PAGE_SIZE - off); + + ret = copy_page_from_iter(pages[n], off, plen, from); + off = 0; if (ret != plen) { ret = -EFAULT; break; @@ -1646,8 +1645,9 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, req->r_inode = inode; - osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0, - false, true); + osd_req_op_extent_osd_data_pages(req, 0, pages, len, + offset_in_page(pos), + false, true); req->r_mtime = mtime; ceph_osdc_start_request(&fsc->client->osdc, req); From patchwork Wed Apr 12 11:09:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208848 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D6B6C77B72 for ; Wed, 12 Apr 2023 11:16:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230093AbjDLLQp (ORCPT ); Wed, 12 Apr 2023 07:16:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40588 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230094AbjDLLQk (ORCPT ); Wed, 12 Apr 2023 07:16:40 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9B6801BD for ; Wed, 12 Apr 2023 04:15:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298058; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xM8Wpi5DuwPytTcObbow9r1EWPtKOeVv0XB6kY4ezS8=; b=FK2liYF6rgU4AYJpyimoofhLWBN6x+Dl+pBCcNmiWfG7DJU4g1snE/39ryS2MJXDCbkTQh LHQX3mhXKqo+oOm/EFgxaGiIdHDkZRCAV8sqR0SXN2/Dx+BmKPMaC+SmdHQcB3HYhEW9p4 tLN1DOHHJEOV4ItMLNB9Y0odwcpEIaI= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-611-ODAFkaOaN6CtYy5yPpMENw-1; Wed, 12 Apr 2023 07:14:17 -0400 X-MC-Unique: ODAFkaOaN6CtYy5yPpMENw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CA55928237C2; Wed, 12 Apr 2023 11:14:16 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 663A1C15BB8; Wed, 12 Apr 2023 11:14:11 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 55/71] ceph: add read/modify/write to ceph_sync_write Date: Wed, 12 Apr 2023 19:09:14 +0800 Message-Id: <20230412110930.176835-56-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton When doing a synchronous write on an encrypted inode, we have no guarantee that the caller is writing crypto block-aligned data. When that happens, we must do a read/modify/write cycle. First, expand the range to cover complete blocks. If we had to change the original pos or length, issue a read to fill the first and/or last pages, and fetch the version of the object from the result. We then copy data into the pages as usual, encrypt the result and issue a write prefixed by an assertion that the version hasn't changed. If it has changed then we restart the whole thing again. If there is no object at that position in the file (-ENOENT), we prefix the write on an exclusive create of the object instead. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/file.c | 316 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 288 insertions(+), 28 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index c9d83e87e58a..3ba5f74acbaa 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1568,18 +1568,16 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, struct inode *inode = file_inode(file); struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); - struct ceph_vino vino; + struct ceph_osd_client *osdc = &fsc->client->osdc; struct ceph_osd_request *req; struct page **pages; u64 len; int num_pages; int written = 0; - int flags; int ret; bool check_caps = false; struct timespec64 mtime = current_time(inode); size_t count = iov_iter_count(from); - size_t off; if (ceph_snap(file_inode(file)) != CEPH_NOSNAP) return -EROFS; @@ -1599,29 +1597,234 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, if (ret < 0) dout("invalidate_inode_pages2_range returned %d\n", ret); - flags = /* CEPH_OSD_FLAG_ORDERSNAP | */ CEPH_OSD_FLAG_WRITE; - while ((len = iov_iter_count(from)) > 0) { size_t left; int n; + u64 write_pos = pos; + u64 write_len = len; + u64 objnum, objoff; + u32 xlen; + u64 assert_ver = 0; + bool rmw; + bool first, last; + struct iov_iter saved_iter = *from; + size_t off; + + ceph_fscrypt_adjust_off_and_len(inode, &write_pos, &write_len); + + /* clamp the length to the end of first object */ + ceph_calc_file_object_mapping(&ci->i_layout, write_pos, + write_len, &objnum, &objoff, + &xlen); + write_len = xlen; + + /* adjust len downward if it goes beyond current object */ + if (pos + len > write_pos + write_len) + len = write_pos + write_len - pos; - vino = ceph_vino(inode); - req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, - vino, pos, &len, 0, 1, - CEPH_OSD_OP_WRITE, flags, snapc, - ci->i_truncate_seq, - ci->i_truncate_size, - false); - if (IS_ERR(req)) { - ret = PTR_ERR(req); - break; - } + /* + * If we had to adjust the length or position to align with a + * crypto block, then we must do a read/modify/write cycle. We + * use a version assertion to redrive the thing if something + * changes in between. + */ + first = pos != write_pos; + last = (pos + len) != (write_pos + write_len); + rmw = first || last; - num_pages = calc_pages_for(pos, len); + dout("sync_write ino %llx %lld~%llu adjusted %lld~%llu -- %srmw\n", + ci->i_vino.ino, pos, len, write_pos, write_len, rmw ? "" : "no "); + + /* + * The data is emplaced into the page as it would be if it were in + * an array of pagecache pages. + */ + num_pages = calc_pages_for(write_pos, write_len); pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL); if (IS_ERR(pages)) { ret = PTR_ERR(pages); - goto out; + break; + } + + /* Do we need to preload the pages? */ + if (rmw) { + u64 first_pos = write_pos; + u64 last_pos = (write_pos + write_len) - CEPH_FSCRYPT_BLOCK_SIZE; + u64 read_len = CEPH_FSCRYPT_BLOCK_SIZE; + struct ceph_osd_req_op *op; + + /* We should only need to do this for encrypted inodes */ + WARN_ON_ONCE(!IS_ENCRYPTED(inode)); + + /* No need to do two reads if first and last blocks are same */ + if (first && last_pos == first_pos) + last = false; + + /* + * Allocate a read request for one or two extents, depending + * on how the request was aligned. + */ + req = ceph_osdc_new_request(osdc, &ci->i_layout, + ci->i_vino, first ? first_pos : last_pos, + &read_len, 0, (first && last) ? 2 : 1, + CEPH_OSD_OP_SPARSE_READ, CEPH_OSD_FLAG_READ, + NULL, ci->i_truncate_seq, + ci->i_truncate_size, false); + if (IS_ERR(req)) { + ceph_release_page_vector(pages, num_pages); + ret = PTR_ERR(req); + break; + } + + /* Something is misaligned! */ + if (read_len != CEPH_FSCRYPT_BLOCK_SIZE) { + ceph_osdc_put_request(req); + ceph_release_page_vector(pages, num_pages); + ret = -EIO; + break; + } + + /* Add extent for first block? */ + op = &req->r_ops[0]; + + if (first) { + osd_req_op_extent_osd_data_pages(req, 0, pages, + CEPH_FSCRYPT_BLOCK_SIZE, + offset_in_page(first_pos), + false, false); + /* We only expect a single extent here */ + ret = __ceph_alloc_sparse_ext_map(op, 1); + if (ret) { + ceph_osdc_put_request(req); + ceph_release_page_vector(pages, num_pages); + break; + } + } + + /* Add extent for last block */ + if (last) { + /* Init the other extent if first extent has been used */ + if (first) { + op = &req->r_ops[1]; + osd_req_op_extent_init(req, 1, CEPH_OSD_OP_SPARSE_READ, + last_pos, CEPH_FSCRYPT_BLOCK_SIZE, + ci->i_truncate_size, + ci->i_truncate_seq); + } + + ret = __ceph_alloc_sparse_ext_map(op, 1); + if (ret) { + ceph_osdc_put_request(req); + ceph_release_page_vector(pages, num_pages); + break; + } + + osd_req_op_extent_osd_data_pages(req, first ? 1 : 0, + &pages[num_pages - 1], + CEPH_FSCRYPT_BLOCK_SIZE, + offset_in_page(last_pos), + false, false); + } + + ceph_osdc_start_request(osdc, req); + ret = ceph_osdc_wait_request(osdc, req); + + /* FIXME: length field is wrong if there are 2 extents */ + ceph_update_read_metrics(&fsc->mdsc->metric, + req->r_start_latency, + req->r_end_latency, + read_len, ret); + + /* Ok if object is not already present */ + if (ret == -ENOENT) { + /* + * If there is no object, then we can't assert + * on its version. Set it to 0, and we'll use an + * exclusive create instead. + */ + ceph_osdc_put_request(req); + ret = 0; + + /* + * zero out the soon-to-be uncopied parts of the + * first and last pages. + */ + if (first) + zero_user_segment(pages[0], 0, + offset_in_page(first_pos)); + if (last) + zero_user_segment(pages[num_pages - 1], + offset_in_page(last_pos), + PAGE_SIZE); + } else { + if (ret < 0) { + ceph_osdc_put_request(req); + ceph_release_page_vector(pages, num_pages); + break; + } + + op = &req->r_ops[0]; + if (op->extent.sparse_ext_cnt == 0) { + if (first) + zero_user_segment(pages[0], 0, + offset_in_page(first_pos)); + else + zero_user_segment(pages[num_pages - 1], + offset_in_page(last_pos), + PAGE_SIZE); + } else if (op->extent.sparse_ext_cnt != 1 || + ceph_sparse_ext_map_end(op) != + CEPH_FSCRYPT_BLOCK_SIZE) { + ret = -EIO; + ceph_osdc_put_request(req); + ceph_release_page_vector(pages, num_pages); + break; + } + + if (first && last) { + op = &req->r_ops[1]; + if (op->extent.sparse_ext_cnt == 0) { + zero_user_segment(pages[num_pages - 1], + offset_in_page(last_pos), + PAGE_SIZE); + } else if (op->extent.sparse_ext_cnt != 1 || + ceph_sparse_ext_map_end(op) != + CEPH_FSCRYPT_BLOCK_SIZE) { + ret = -EIO; + ceph_osdc_put_request(req); + ceph_release_page_vector(pages, num_pages); + break; + } + } + + /* Grab assert version. It must be non-zero. */ + assert_ver = req->r_version; + WARN_ON_ONCE(ret > 0 && assert_ver == 0); + + ceph_osdc_put_request(req); + if (first) { + ret = ceph_fscrypt_decrypt_block_inplace(inode, + pages[0], + CEPH_FSCRYPT_BLOCK_SIZE, + offset_in_page(first_pos), + first_pos >> CEPH_FSCRYPT_BLOCK_SHIFT); + if (ret < 0) { + ceph_release_page_vector(pages, num_pages); + break; + } + } + if (last) { + ret = ceph_fscrypt_decrypt_block_inplace(inode, + pages[num_pages - 1], + CEPH_FSCRYPT_BLOCK_SIZE, + offset_in_page(last_pos), + last_pos >> CEPH_FSCRYPT_BLOCK_SHIFT); + if (ret < 0) { + ceph_release_page_vector(pages, num_pages); + break; + } + } + } } left = len; @@ -1629,35 +1832,90 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, for (n = 0; n < num_pages; n++) { size_t plen = min_t(size_t, left, PAGE_SIZE - off); + /* copy the data */ ret = copy_page_from_iter(pages[n], off, plen, from); - off = 0; if (ret != plen) { ret = -EFAULT; break; } + off = 0; left -= ret; } - if (ret < 0) { + dout("sync_write write failed with %d\n", ret); ceph_release_page_vector(pages, num_pages); - goto out; + break; } - req->r_inode = inode; + if (IS_ENCRYPTED(inode)) { + ret = ceph_fscrypt_encrypt_pages(inode, pages, + write_pos, write_len, + GFP_KERNEL); + if (ret < 0) { + dout("encryption failed with %d\n", ret); + ceph_release_page_vector(pages, num_pages); + break; + } + } - osd_req_op_extent_osd_data_pages(req, 0, pages, len, - offset_in_page(pos), - false, true); + req = ceph_osdc_new_request(osdc, &ci->i_layout, + ci->i_vino, write_pos, &write_len, + rmw ? 1 : 0, rmw ? 2 : 1, + CEPH_OSD_OP_WRITE, + CEPH_OSD_FLAG_WRITE, + snapc, ci->i_truncate_seq, + ci->i_truncate_size, false); + if (IS_ERR(req)) { + ret = PTR_ERR(req); + ceph_release_page_vector(pages, num_pages); + break; + } + dout("sync_write write op %lld~%llu\n", write_pos, write_len); + osd_req_op_extent_osd_data_pages(req, rmw ? 1 : 0, pages, write_len, + offset_in_page(write_pos), false, + true); + req->r_inode = inode; req->r_mtime = mtime; - ceph_osdc_start_request(&fsc->client->osdc, req); - ret = ceph_osdc_wait_request(&fsc->client->osdc, req); + + /* Set up the assertion */ + if (rmw) { + /* + * Set up the assertion. If we don't have a version number, + * then the object doesn't exist yet. Use an exclusive create + * instead of a version assertion in that case. + */ + if (assert_ver) { + osd_req_op_init(req, 0, CEPH_OSD_OP_ASSERT_VER, 0); + req->r_ops[0].assert_ver.ver = assert_ver; + } else { + osd_req_op_init(req, 0, CEPH_OSD_OP_CREATE, + CEPH_OSD_OP_FLAG_EXCL); + } + } + + ceph_osdc_start_request(osdc, req); + ret = ceph_osdc_wait_request(osdc, req); ceph_update_write_metrics(&fsc->mdsc->metric, req->r_start_latency, req->r_end_latency, len, ret); -out: ceph_osdc_put_request(req); if (ret != 0) { + dout("sync_write osd write returned %d\n", ret); + /* Version changed! Must re-do the rmw cycle */ + if ((assert_ver && (ret == -ERANGE || ret == -EOVERFLOW)) || + (!assert_ver && ret == -EEXIST)) { + /* We should only ever see this on a rmw */ + WARN_ON_ONCE(!rmw); + + /* The version should never go backward */ + WARN_ON_ONCE(ret == -EOVERFLOW); + + *from = saved_iter; + + /* FIXME: limit number of times we loop? */ + continue; + } ceph_set_error_write(ci); break; } @@ -1665,6 +1923,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, ceph_clear_error_write(ci); pos += len; written += len; + dout("sync_write written %d\n", written); if (pos > i_size_read(inode)) { check_caps = ceph_inode_set_size(inode, pos); if (check_caps) @@ -1678,6 +1937,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, ret = written; iocb->ki_pos = pos; } + dout("sync_write returning %d\n", ret); return ret; } From patchwork Wed Apr 12 11:09:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208847 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4DD01C7619A for ; Wed, 12 Apr 2023 11:16:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230053AbjDLLQc (ORCPT ); Wed, 12 Apr 2023 07:16:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40666 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229950AbjDLLQ3 (ORCPT ); Wed, 12 Apr 2023 07:16:29 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD1187EDD for ; Wed, 12 Apr 2023 04:15:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298064; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zgLvx18T9nrqh10Nfe9L7Y+rTmX9LfYvn7rnnnAmj7Y=; b=dKhdPwP3ofItXW07J2sR+VYJpyIrQyvMFAI5w9hRPHF8ZpOOaGLTbx02e0FXQVN6EGNRXa gZmdaJJrtCK+Dek1aO8fDN6vFS1eY2mcfyILASbz3MiiMe2K4OyZYhQS7y0xzLdrAUW+YF 7mLutd8nesSFRPvd+TKKh0pCcESAobc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-594--oqw74aoM7CR0DCs-3Bw6A-1; Wed, 12 Apr 2023 07:14:21 -0400 X-MC-Unique: -oqw74aoM7CR0DCs-3Bw6A-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5CE5C185A78F; Wed, 12 Apr 2023 11:14:21 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id AA165C15BB8; Wed, 12 Apr 2023 11:14:17 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 56/71] ceph: plumb in decryption during sync reads Date: Wed, 12 Apr 2023 19:09:15 +0800 Message-Id: <20230412110930.176835-57-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Switch to using sparse reads when the inode is encrypted. Note that the crypto block may be smaller than a page, but the reverse cannot be true. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/file.c | 89 ++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 65 insertions(+), 24 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 3ba5f74acbaa..0adaa89d0604 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -967,7 +967,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, u64 off = *ki_pos; u64 len = iov_iter_count(to); u64 i_size = i_size_read(inode); - bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD); + bool sparse = IS_ENCRYPTED(inode) || ceph_test_mount_opt(fsc, SPARSEREAD); u64 objver = 0; dout("sync_read on inode %p %llx~%llx\n", inode, *ki_pos, len); @@ -998,10 +998,19 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, int idx; size_t left; struct ceph_osd_req_op *op; + u64 read_off = off; + u64 read_len = len; + + /* determine new offset/length if encrypted */ + ceph_fscrypt_adjust_off_and_len(inode, &read_off, &read_len); + + dout("sync_read orig %llu~%llu reading %llu~%llu", + off, len, read_off, read_len); req = ceph_osdc_new_request(osdc, &ci->i_layout, - ci->i_vino, off, &len, 0, 1, - sparse ? CEPH_OSD_OP_SPARSE_READ : CEPH_OSD_OP_READ, + ci->i_vino, read_off, &read_len, 0, 1, + sparse ? CEPH_OSD_OP_SPARSE_READ : + CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ, NULL, ci->i_truncate_seq, ci->i_truncate_size, false); @@ -1010,10 +1019,13 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, break; } + /* adjust len downward if the request truncated the len */ + if (off + len > read_off + read_len) + len = read_off + read_len - off; more = len < iov_iter_count(to); - num_pages = calc_pages_for(off, len); - page_off = off & ~PAGE_MASK; + num_pages = calc_pages_for(read_off, read_len); + page_off = offset_in_page(off); pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL); if (IS_ERR(pages)) { ceph_osdc_put_request(req); @@ -1021,7 +1033,8 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, break; } - osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_off, + osd_req_op_extent_osd_data_pages(req, 0, pages, read_len, + offset_in_page(read_off), false, false); op = &req->r_ops[0]; @@ -1039,7 +1052,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, ceph_update_read_metrics(&fsc->mdsc->metric, req->r_start_latency, req->r_end_latency, - len, ret); + read_len, ret); if (ret > 0) objver = req->r_version; @@ -1054,8 +1067,34 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, else if (ret == -ENOENT) ret = 0; + if (ret > 0 && IS_ENCRYPTED(inode)) { + int fret; + + fret = ceph_fscrypt_decrypt_extents(inode, pages, read_off, + op->extent.sparse_ext, op->extent.sparse_ext_cnt); + if (fret < 0) { + ret = fret; + ceph_osdc_put_request(req); + break; + } + + /* account for any partial block at the beginning */ + fret -= (off - read_off); + + /* + * Short read after big offset adjustment? + * Nothing is usable, just call it a zero + * len read. + */ + fret = max(fret, 0); + + /* account for partial block at the end */ + ret = min_t(ssize_t, fret, len); + } + ceph_osdc_put_request(req); + /* Short read but not EOF? Zero out the remainder. */ if (ret >= 0 && ret < len && (off + ret < i_size)) { int zlen = min(len - ret, i_size - off - ret); int zoff = page_off + ret; @@ -1069,15 +1108,16 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, idx = 0; left = ret > 0 ? ret : 0; while (left > 0) { - size_t len, copied; - page_off = off & ~PAGE_MASK; - len = min_t(size_t, left, PAGE_SIZE - page_off); + size_t plen, copied; + + plen = min_t(size_t, left, PAGE_SIZE - page_off); SetPageUptodate(pages[idx]); copied = copy_page_to_iter(pages[idx++], - page_off, len, to); + page_off, plen, to); off += copied; left -= copied; - if (copied < len) { + page_off = 0; + if (copied < plen) { ret = -EFAULT; break; } @@ -1094,20 +1134,21 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, break; } - if (off > *ki_pos) { - if (off >= i_size) { - *retry_op = CHECK_EOF; - ret = i_size - *ki_pos; - *ki_pos = i_size; - } else { - ret = off - *ki_pos; - *ki_pos = off; + if (ret > 0) { + if (off > *ki_pos) { + if (off >= i_size) { + *retry_op = CHECK_EOF; + ret = i_size - *ki_pos; + *ki_pos = i_size; + } else { + ret = off - *ki_pos; + *ki_pos = off; + } } - } - - if (last_objver && ret > 0) - *last_objver = objver; + if (last_objver) + *last_objver = objver; + } dout("sync_read result %zd retry_op %d\n", ret, *retry_op); return ret; } From patchwork Wed Apr 12 11:09:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208857 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A498C77B72 for ; Wed, 12 Apr 2023 11:17:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230239AbjDLLRf (ORCPT ); Wed, 12 Apr 2023 07:17:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41776 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230156AbjDLLR3 (ORCPT ); Wed, 12 Apr 2023 07:17:29 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0AEA11D for ; Wed, 12 Apr 2023 04:16:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298069; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Ump5WzKeRh4K42yLqvuJ0E+8vos66dYE7Ng6Ne24GKc=; b=ZIf1SpLhRBGoqA+FhyIAEabNa01/OISg66W4aZULNT40u2q8n8rC0m1U/ChL5yTmChh+/a 5by/qNoA/r4JqyxOpAxKE/s4ZF4cH38IM5oiqelXU02tp97T/TprUsRK40eKO+s3vE+alJ So4PYrNrbpeAivXdaohsrttReJumVEE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-477-5Eq5-qsWO7y-1HRfrHB7BQ-1; Wed, 12 Apr 2023 07:14:26 -0400 X-MC-Unique: 5Eq5-qsWO7y-1HRfrHB7BQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DCD2F88646B; Wed, 12 Apr 2023 11:14:25 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 27165C15BB8; Wed, 12 Apr 2023 11:14:21 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 57/71] ceph: add fscrypt decryption support to ceph_netfs_issue_op Date: Wed, 12 Apr 2023 19:09:16 +0800 Message-Id: <20230412110930.176835-58-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Force the use of sparse reads when the inode is encrypted, and add the appropriate code to decrypt the extent map after receiving. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/addr.c | 64 +++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 55 insertions(+), 9 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 5dca574e7f05..f8d94ff4f19d 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -18,6 +18,7 @@ #include "mds_client.h" #include "cache.h" #include "metric.h" +#include "crypto.h" #include #include @@ -216,7 +217,8 @@ static bool ceph_netfs_clamp_length(struct netfs_io_subrequest *subreq) static void finish_netfs_read(struct ceph_osd_request *req) { - struct ceph_fs_client *fsc = ceph_inode_to_client(req->r_inode); + struct inode *inode = req->r_inode; + struct ceph_fs_client *fsc = ceph_inode_to_client(inode); struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0); struct netfs_io_subrequest *subreq = req->r_priv; struct ceph_osd_req_op *op = &req->r_ops[0]; @@ -230,16 +232,30 @@ static void finish_netfs_read(struct ceph_osd_request *req) subreq->len, i_size_read(req->r_inode)); /* no object means success but no data */ - if (sparse && err >= 0) - err = ceph_sparse_ext_map_end(op); - else if (err == -ENOENT) + if (err == -ENOENT) err = 0; else if (err == -EBLOCKLISTED) fsc->blocklisted = true; - if (err >= 0 && err < subreq->len) - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (err >= 0) { + if (sparse && err > 0) + err = ceph_sparse_ext_map_end(op); + if (err < subreq->len) + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (IS_ENCRYPTED(inode) && err > 0) { + err = ceph_fscrypt_decrypt_extents(inode, osd_data->pages, + subreq->start, op->extent.sparse_ext, + op->extent.sparse_ext_cnt); + if (err > subreq->len) + err = subreq->len; + } + } + if (osd_data->type == CEPH_OSD_DATA_TYPE_PAGES) { + ceph_put_page_vector(osd_data->pages, + calc_pages_for(osd_data->alignment, + osd_data->length), false); + } netfs_subreq_terminated(subreq, err, false); iput(req->r_inode); } @@ -310,7 +326,8 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) struct iov_iter iter; int err = 0; u64 len = subreq->len; - bool sparse = ceph_test_mount_opt(fsc, SPARSEREAD); + bool sparse = IS_ENCRYPTED(inode) || ceph_test_mount_opt(fsc, SPARSEREAD); + u64 off = subreq->start; if (ceph_inode_is_shutdown(inode)) { err = -EIO; @@ -320,7 +337,9 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) if (ceph_has_inline_data(ci) && ceph_netfs_issue_op_inline(subreq)) return; - req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, subreq->start, &len, + ceph_fscrypt_adjust_off_and_len(inode, &off, &len); + + req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, off, &len, 0, 1, sparse ? CEPH_OSD_OP_SPARSE_READ : CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ | fsc->client->osdc.client->options->read_from_replica, NULL, ci->i_truncate_seq, ci->i_truncate_size, false); @@ -337,8 +356,35 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) } dout("%s: pos=%llu orig_len=%zu len=%llu\n", __func__, subreq->start, subreq->len, len); + iov_iter_xarray(&iter, ITER_DEST, &rreq->mapping->i_pages, subreq->start, len); - osd_req_op_extent_osd_iter(req, 0, &iter); + + /* + * FIXME: For now, use CEPH_OSD_DATA_TYPE_PAGES instead of _ITER for + * encrypted inodes. We'd need infrastructure that handles an iov_iter + * instead of page arrays, and we don't have that as of yet. Once the + * dust settles on the write helpers and encrypt/decrypt routines for + * netfs, we should be able to rework this. + */ + if (IS_ENCRYPTED(inode)) { + struct page **pages; + size_t page_off; + + err = iov_iter_get_pages_alloc2(&iter, &pages, len, &page_off); + if (err < 0) { + dout("%s: iov_ter_get_pages_alloc returned %d\n", __func__, err); + goto out; + } + + /* should always give us a page-aligned read */ + WARN_ON_ONCE(page_off); + len = err; + err = 0; + + osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0, false, false); + } else { + osd_req_op_extent_osd_iter(req, 0, &iter); + } req->r_callback = finish_netfs_read; req->r_priv = subreq; req->r_inode = inode; From patchwork Wed Apr 12 11:09:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208972 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CCE0FC77B79 for ; Wed, 12 Apr 2023 11:30:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231297AbjDLLaD (ORCPT ); Wed, 12 Apr 2023 07:30:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55284 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231584AbjDLL3S (ORCPT ); Wed, 12 Apr 2023 07:29:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BEE3F83FC for ; Wed, 12 Apr 2023 04:28:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298889; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eOURFCEMAAqwcmUVhFHCyIlxRbrtEPen/SiwzoY5HT0=; b=iOfNLTMlml3t8CI0Uh0BOg7WfxnduUXqgVR1v40BF4PLvXhfKVPNllmfcAkBR5rzeQ5/Hv +VOM6Gc5/HGL4asyTELi4A1YrUpBlqEolPZZeH/ZkUFDuqVSsMce1vHgHO3bgnIDpZhlq8 NFVfvMQFLSgltUAfBxlu8deYCuO1l3s= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-315-pyTeQdfbPF6Sd-fhf_Bv3g-1; Wed, 12 Apr 2023 07:14:31 -0400 X-MC-Unique: pyTeQdfbPF6Sd-fhf_Bv3g-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 93A89101A54F; Wed, 12 Apr 2023 11:14:30 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id A7C2FC15BBA; Wed, 12 Apr 2023 11:14:26 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 58/71] ceph: set i_blkbits to crypto block size for encrypted inodes Date: Wed, 12 Apr 2023 19:09:17 +0800 Message-Id: <20230412110930.176835-59-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Some of the underlying infrastructure for fscrypt relies on i_blkbits being aligned to the crypto blocksize. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/inode.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 50664f7b18e3..bf05110f8532 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -976,13 +976,6 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, issued |= __ceph_caps_dirty(ci); new_issued = ~issued & info_caps; - /* directories have fl_stripe_unit set to zero */ - if (le32_to_cpu(info->layout.fl_stripe_unit)) - inode->i_blkbits = - fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; - else - inode->i_blkbits = CEPH_BLOCK_SHIFT; - __ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files); #ifdef CONFIG_FS_ENCRYPTION @@ -1009,6 +1002,15 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, ceph_decode_timespec64(&ci->i_snap_btime, &iinfo->snap_btime); } + /* directories have fl_stripe_unit set to zero */ + if (IS_ENCRYPTED(inode)) + inode->i_blkbits = CEPH_FSCRYPT_BLOCK_SHIFT; + else if (le32_to_cpu(info->layout.fl_stripe_unit)) + inode->i_blkbits = + fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; + else + inode->i_blkbits = CEPH_BLOCK_SHIFT; + if ((new_version || (new_issued & CEPH_CAP_LINK_SHARED)) && (issued & CEPH_CAP_LINK_EXCL) == 0) set_nlink(inode, le32_to_cpu(info->nlink)); From patchwork Wed Apr 12 11:09:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208858 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3AE91C7619A for ; Wed, 12 Apr 2023 11:17:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230264AbjDLLR5 (ORCPT ); Wed, 12 Apr 2023 07:17:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41890 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230268AbjDLLRn (ORCPT ); Wed, 12 Apr 2023 07:17:43 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B101D83DF for ; Wed, 12 Apr 2023 04:16:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298079; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0r4D0w6NbEVMRfKMMhOJF5L7Krevq/+fITp8vNTu41Y=; b=FTiJtOSk4DTfDYT0ivsbRc4GzeKS9RnhTKC8qiWQEnRHg0PBr+dUrD4/Mb9Bgiae677JGS c6S/tdjPzM6wVvr3371Vw7Klb310N93f9P6YE5AHR2JX5JsVRgAmHlNok/G6cPj1i4ErCz rstKiGrSw1HM29m2jridgAslMaHUXnY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-339-xtpYGZOzMTe7jf7yPzz8iw-1; Wed, 12 Apr 2023 07:14:35 -0400 X-MC-Unique: xtpYGZOzMTe7jf7yPzz8iw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 90BA1185A78B; Wed, 12 Apr 2023 11:14:35 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5B95DC15BB8; Wed, 12 Apr 2023 11:14:30 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 59/71] ceph: add encryption support to writepage Date: Wed, 12 Apr 2023 19:09:18 +0800 Message-Id: <20230412110930.176835-60-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Allow writepage to issue encrypted writes. Extend out the requested size and offset to cover complete blocks, and then encrypt and write them to the OSDs. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/addr.c | 37 +++++++++++++++++++++++++++++-------- 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index f8d94ff4f19d..b47de8891efc 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -601,10 +601,12 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc) loff_t page_off = page_offset(page); int err; loff_t len = thp_size(page); + loff_t wlen; struct ceph_writeback_ctl ceph_wbc; struct ceph_osd_client *osdc = &fsc->client->osdc; struct ceph_osd_request *req; bool caching = ceph_is_cache_enabled(inode); + struct page *bounce_page = NULL; dout("writepage %p idx %lu\n", page, page->index); @@ -640,31 +642,50 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc) if (ceph_wbc.i_size < page_off + len) len = ceph_wbc.i_size - page_off; + wlen = IS_ENCRYPTED(inode) ? round_up(len, CEPH_FSCRYPT_BLOCK_SIZE) : len; dout("writepage %p page %p index %lu on %llu~%llu snapc %p seq %lld\n", - inode, page, page->index, page_off, len, snapc, snapc->seq); + inode, page, page->index, page_off, wlen, snapc, snapc->seq); if (atomic_long_inc_return(&fsc->writeback_count) > CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb)) fsc->write_congested = true; - req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1, - CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc, - ceph_wbc.truncate_seq, ceph_wbc.truncate_size, - true); + req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), + page_off, &wlen, 0, 1, CEPH_OSD_OP_WRITE, + CEPH_OSD_FLAG_WRITE, snapc, + ceph_wbc.truncate_seq, + ceph_wbc.truncate_size, true); if (IS_ERR(req)) { redirty_page_for_writepage(wbc, page); return PTR_ERR(req); } + if (wlen < len) + len = wlen; + set_page_writeback(page); if (caching) ceph_set_page_fscache(page); ceph_fscache_write_to_cache(inode, page_off, len, caching); + if (IS_ENCRYPTED(inode)) { + bounce_page = fscrypt_encrypt_pagecache_blocks(page, CEPH_FSCRYPT_BLOCK_SIZE, + 0, GFP_NOFS); + if (IS_ERR(bounce_page)) { + redirty_page_for_writepage(wbc, page); + end_page_writeback(page); + ceph_osdc_put_request(req); + return PTR_ERR(bounce_page); + } + } + /* it may be a short write due to an object boundary */ WARN_ON_ONCE(len > thp_size(page)); - osd_req_op_extent_osd_data_pages(req, 0, &page, len, 0, false, false); - dout("writepage %llu~%llu (%llu bytes)\n", page_off, len, len); + osd_req_op_extent_osd_data_pages(req, 0, + bounce_page ? &bounce_page : &page, wlen, 0, + false, false); + dout("writepage %llu~%llu (%llu bytes, %sencrypted)\n", + page_off, len, wlen, IS_ENCRYPTED(inode) ? "" : "not "); req->r_mtime = inode->i_mtime; ceph_osdc_start_request(osdc, req); @@ -672,7 +693,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc) ceph_update_write_metrics(&fsc->mdsc->metric, req->r_start_latency, req->r_end_latency, len, err); - + fscrypt_free_bounce_page(bounce_page); ceph_osdc_put_request(req); if (err == 0) err = len; From patchwork Wed Apr 12 11:09:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208849 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89813C77B6E for ; Wed, 12 Apr 2023 11:16:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230101AbjDLLQx (ORCPT ); Wed, 12 Apr 2023 07:16:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41118 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230087AbjDLLQv (ORCPT ); Wed, 12 Apr 2023 07:16:51 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 335221730 for ; Wed, 12 Apr 2023 04:15:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298083; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=74nEdsKTAHnTfjHi6PkIFCpjCPl4iYDvizab4NavSOk=; b=jVeyTEj5VeFT4ko0N+iDUjcVIXiO8iA+SQgjTN3X4l4F7sMNNQXHiMAGEG1LqjsBSNN2EZ onUsHmEbDXrntvWbZDlv07kwLKtCBBYwHwkFcwyxSra5LCEWkJAF7NRlBekEo/w3KYVWo4 PDrIAWpv/XBL57Hmz+3aK1VNkDgmxRw= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-266-qa9jbjALMfWbyHULZY4THg-1; Wed, 12 Apr 2023 07:14:40 -0400 X-MC-Unique: qa9jbjALMfWbyHULZY4THg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0E4FA801779; Wed, 12 Apr 2023 11:14:40 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5D8E9C15BB8; Wed, 12 Apr 2023 11:14:35 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 60/71] ceph: fscrypt support for writepages Date: Wed, 12 Apr 2023 19:09:19 +0800 Message-Id: <20230412110930.176835-61-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Add the appropriate machinery to write back dirty data with encryption. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/addr.c | 63 +++++++++++++++++++++++++++++++++++++++--------- fs/ceph/crypto.h | 18 +++++++++++++- 2 files changed, 68 insertions(+), 13 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index b47de8891efc..24fac903bff3 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -562,10 +562,12 @@ static u64 get_writepages_data_length(struct inode *inode, struct page *page, u64 start) { struct ceph_inode_info *ci = ceph_inode(inode); - struct ceph_snap_context *snapc = page_snap_context(page); + struct ceph_snap_context *snapc; struct ceph_cap_snap *capsnap = NULL; u64 end = i_size_read(inode); + u64 ret; + snapc = page_snap_context(ceph_fscrypt_pagecache_page(page)); if (snapc != ci->i_head_snapc) { bool found = false; spin_lock(&ci->i_ceph_lock); @@ -580,9 +582,12 @@ static u64 get_writepages_data_length(struct inode *inode, spin_unlock(&ci->i_ceph_lock); WARN_ON(!found); } - if (end > page_offset(page) + thp_size(page)) - end = page_offset(page) + thp_size(page); - return end > start ? end - start : 0; + if (end > ceph_fscrypt_page_offset(page) + thp_size(page)) + end = ceph_fscrypt_page_offset(page) + thp_size(page); + ret = end > start ? end - start : 0; + if (ret && fscrypt_is_bounce_page(page)) + ret = round_up(ret, CEPH_FSCRYPT_BLOCK_SIZE); + return ret; } /* @@ -812,6 +817,11 @@ static void writepages_finish(struct ceph_osd_request *req) total_pages += num_pages; for (j = 0; j < num_pages; j++) { page = osd_data->pages[j]; + if (fscrypt_is_bounce_page(page)) { + page = fscrypt_pagecache_page(page); + fscrypt_free_bounce_page(osd_data->pages[j]); + osd_data->pages[j] = page; + } BUG_ON(!page); WARN_ON(!PageUptodate(page)); @@ -1073,9 +1083,28 @@ static int ceph_writepages_start(struct address_space *mapping, fsc->mount_options->congestion_kb)) fsc->write_congested = true; - pages[locked_pages++] = page; - fbatch.folios[i] = NULL; + if (IS_ENCRYPTED(inode)) { + pages[locked_pages] = + fscrypt_encrypt_pagecache_blocks(page, + PAGE_SIZE, 0, + locked_pages ? GFP_NOWAIT : GFP_NOFS); + if (IS_ERR(pages[locked_pages])) { + if (PTR_ERR(pages[locked_pages]) == -EINVAL) + pr_err("%s: inode->i_blkbits=%hhu\n", + __func__, inode->i_blkbits); + /* better not fail on first page! */ + BUG_ON(locked_pages == 0); + pages[locked_pages] = NULL; + redirty_page_for_writepage(wbc, page); + unlock_page(page); + break; + } + ++locked_pages; + } else { + pages[locked_pages++] = page; + } + fbatch.folios[i] = NULL; len += thp_size(page); } @@ -1103,7 +1132,7 @@ static int ceph_writepages_start(struct address_space *mapping, } new_request: - offset = page_offset(pages[0]); + offset = ceph_fscrypt_page_offset(pages[0]); len = wsize; req = ceph_osdc_new_request(&fsc->client->osdc, @@ -1124,8 +1153,8 @@ static int ceph_writepages_start(struct address_space *mapping, ceph_wbc.truncate_size, true); BUG_ON(IS_ERR(req)); } - BUG_ON(len < page_offset(pages[locked_pages - 1]) + - thp_size(page) - offset); + BUG_ON(len < ceph_fscrypt_page_offset(pages[locked_pages - 1]) + + thp_size(pages[locked_pages - 1]) - offset); req->r_callback = writepages_finish; req->r_inode = inode; @@ -1135,7 +1164,9 @@ static int ceph_writepages_start(struct address_space *mapping, data_pages = pages; op_idx = 0; for (i = 0; i < locked_pages; i++) { - u64 cur_offset = page_offset(pages[i]); + struct page *page = ceph_fscrypt_pagecache_page(pages[i]); + + u64 cur_offset = page_offset(page); /* * Discontinuity in page range? Ceph can handle that by just passing * multiple extents in the write op. @@ -1164,9 +1195,9 @@ static int ceph_writepages_start(struct address_space *mapping, op_idx++; } - set_page_writeback(pages[i]); + set_page_writeback(page); if (caching) - ceph_set_page_fscache(pages[i]); + ceph_set_page_fscache(page); len += thp_size(page); } ceph_fscache_write_to_cache(inode, offset, len, caching); @@ -1182,8 +1213,16 @@ static int ceph_writepages_start(struct address_space *mapping, offset); len = max(len, min_len); } + if (IS_ENCRYPTED(inode)) + len = round_up(len, CEPH_FSCRYPT_BLOCK_SIZE); + dout("writepages got pages at %llu~%llu\n", offset, len); + if (IS_ENCRYPTED(inode) && + ((offset | len) & ~CEPH_FSCRYPT_BLOCK_MASK)) + pr_warn("%s: bad encrypted write offset=%lld len=%llu\n", + __func__, offset, len); + osd_req_op_extent_osd_data_pages(req, op_idx, data_pages, len, 0, from_pool, false); osd_req_op_extent_update(req, op_idx, len); diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index db6b399645ba..0d0343906d29 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -155,6 +155,12 @@ int ceph_fscrypt_decrypt_extents(struct inode *inode, struct page **page, u64 of struct ceph_sparse_extent *map, u32 ext_cnt); int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, u64 off, int len, gfp_t gfp); + +static inline struct page *ceph_fscrypt_pagecache_page(struct page *page) +{ + return fscrypt_is_bounce_page(page) ? fscrypt_pagecache_page(page) : page; +} + #else /* CONFIG_FS_ENCRYPTION */ static inline void ceph_fscrypt_set_ops(struct super_block *sb) @@ -249,6 +255,16 @@ static inline int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page ** { return 0; } + +static inline struct page *ceph_fscrypt_pagecache_page(struct page *page) +{ + return page; +} #endif /* CONFIG_FS_ENCRYPTION */ -#endif +static inline loff_t ceph_fscrypt_page_offset(struct page *page) +{ + return page_offset(ceph_fscrypt_pagecache_page(page)); +} + +#endif /* _CEPH_CRYPTO_H */ From patchwork Wed Apr 12 11:09:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208869 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6B1CEC7619A for ; Wed, 12 Apr 2023 11:23:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229875AbjDLLXF (ORCPT ); Wed, 12 Apr 2023 07:23:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229870AbjDLLXD (ORCPT ); Wed, 12 Apr 2023 07:23:03 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A1F0C6A6F for ; Wed, 12 Apr 2023 04:21:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298454; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zzbLqrgA8EipnrKVPc8Z/iS71yh1xrU5wAVY9TB5X5A=; b=II3NtSOC/DmGdC7ZHnCvGaXFnbY5MJNcdjraapjy+fz/sBXlK3bvL2s8zOkAeeg9BuOVeY hhjDYrrs+RNmbLpkL88hfC/ttX8Al89RsbwFa87b/ZrqVjfLacbbnGUMs5JSc/BXS0LU7E mUM315FZwrLbZbytMtHy9fA/4b5aqW0= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-434-_E4nBnuLPsCG0jrQqVrSow-1; Wed, 12 Apr 2023 07:14:44 -0400 X-MC-Unique: _E4nBnuLPsCG0jrQqVrSow-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 8D67028237C4; Wed, 12 Apr 2023 11:14:44 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id D0D8CC15BB8; Wed, 12 Apr 2023 11:14:40 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 61/71] ceph: invalidate pages when doing direct/sync writes Date: Wed, 12 Apr 2023 19:09:20 +0800 Message-Id: <20230412110930.176835-62-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques When doing a direct/sync write, we need to invalidate the page cache in the range being written to. If we don't do this, the cache will include invalid data as we just did a write that avoided the page cache. In the event that invalidation fails, just ignore the error. That likely just means that we raced with another task doing a buffered write, in which case we want to leave the page intact anyway. [ jlayton: minor comment update ] Tested-by: Luís Henriques Tested-by: Venky Shankar Signed-off-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/file.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 0adaa89d0604..317087ea017e 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1632,11 +1632,6 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, return ret; ceph_fscache_invalidate(inode, false); - ret = invalidate_inode_pages2_range(inode->i_mapping, - pos >> PAGE_SHIFT, - (pos + count - 1) >> PAGE_SHIFT); - if (ret < 0) - dout("invalidate_inode_pages2_range returned %d\n", ret); while ((len = iov_iter_count(from)) > 0) { size_t left; @@ -1962,6 +1957,20 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, } ceph_clear_error_write(ci); + + /* + * We successfully wrote to a range of the file. Declare + * that region of the pagecache invalid. + */ + ret = invalidate_inode_pages2_range( + inode->i_mapping, + pos >> PAGE_SHIFT, + (pos + len - 1) >> PAGE_SHIFT); + if (ret < 0) { + dout("invalidate_inode_pages2_range returned %d\n", + ret); + ret = 0; + } pos += len; written += len; dout("sync_write written %d\n", written); From patchwork Wed Apr 12 11:09:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208851 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E754C77B71 for ; Wed, 12 Apr 2023 11:17:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230123AbjDLLQ7 (ORCPT ); Wed, 12 Apr 2023 07:16:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41194 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230100AbjDLLQ6 (ORCPT ); Wed, 12 Apr 2023 07:16:58 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E837E7D97 for ; Wed, 12 Apr 2023 04:15:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298092; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=u9QXSIk7CwnJq7ykYRU87CVgpNlN/H86m0d0EXQa9GY=; b=Us6f2Mo6WmIxlNffbhYfti1tJ8UjevPrw0bMrstDWMINYzz8bWMoxNuJXDsBd/SMp6hqbR y95WtBHwy0V9Sa8MQnoKQXOPpx2OuDsU2StUKdmoPO7oC3UghFiLJy0O8DLm3wDji6Tc23 GF8BFskXRtuIVTaL15FD4sR36jOsgdY= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-550-InUpm0WlMt2jN5GklOt2hg-1; Wed, 12 Apr 2023 07:14:49 -0400 X-MC-Unique: InUpm0WlMt2jN5GklOt2hg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 29D543C0F362; Wed, 12 Apr 2023 11:14:49 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 571B9C15BB8; Wed, 12 Apr 2023 11:14:44 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 62/71] ceph: add support for encrypted snapshot names Date: Wed, 12 Apr 2023 19:09:21 +0800 Message-Id: <20230412110930.176835-63-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques Since filenames in encrypted directories are already encrypted and shown as a base64-encoded string when the directory is locked, snapshot names should show a similar behaviour. Tested-by: Luís Henriques Tested-by: Venky Shankar Signed-off-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/inode.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index bf05110f8532..be3936b28f57 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -91,9 +91,15 @@ struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry, if (err < 0) goto out_err; - err = ceph_fscrypt_prepare_context(dir, inode, as_ctx); - if (err) - goto out_err; + /* + * We'll skip setting fscrypt context for snapshots, leaving that for + * the handle_reply(). + */ + if (ceph_snap(dir) != CEPH_SNAPDIR) { + err = ceph_fscrypt_prepare_context(dir, inode, as_ctx); + if (err) + goto out_err; + } return inode; out_err: @@ -157,6 +163,7 @@ struct inode *ceph_get_snapdir(struct inode *parent) }; struct inode *inode = ceph_get_inode(parent->i_sb, vino, NULL); struct ceph_inode_info *ci = ceph_inode(inode); + int ret = -ENOTDIR; if (IS_ERR(inode)) return inode; @@ -182,6 +189,24 @@ struct inode *ceph_get_snapdir(struct inode *parent) ci->i_rbytes = 0; ci->i_btime = ceph_inode(parent)->i_btime; +#ifdef CONFIG_FS_ENCRYPTION + /* if encrypted, just borrow fscrypt_auth from parent */ + if (IS_ENCRYPTED(parent)) { + struct ceph_inode_info *pci = ceph_inode(parent); + + ci->fscrypt_auth = kmemdup(pci->fscrypt_auth, + pci->fscrypt_auth_len, + GFP_KERNEL); + if (ci->fscrypt_auth) { + inode->i_flags |= S_ENCRYPTED; + ci->fscrypt_auth_len = pci->fscrypt_auth_len; + } else { + dout("Failed to alloc snapdir fscrypt_auth\n"); + ret = -ENOMEM; + goto err; + } + } +#endif if (inode->i_state & I_NEW) { inode->i_op = &ceph_snapdir_iops; inode->i_fop = &ceph_snapdir_fops; @@ -195,7 +220,7 @@ struct inode *ceph_get_snapdir(struct inode *parent) discard_new_inode(inode); else iput(inode); - return ERR_PTR(-ENOTDIR); + return ERR_PTR(ret); } const struct inode_operations ceph_file_iops = { From patchwork Wed Apr 12 11:09:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208853 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 056FEC77B6E for ; Wed, 12 Apr 2023 11:17:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230162AbjDLLRT (ORCPT ); Wed, 12 Apr 2023 07:17:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40460 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230157AbjDLLRK (ORCPT ); Wed, 12 Apr 2023 07:17:10 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A835C7ABB for ; Wed, 12 Apr 2023 04:16:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298097; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3BAbATd1xJgJPUynNeS30f3WMP5FoE3ulAIRL1cQX/8=; b=Gr4s/2xCS9aLxxUVxG5y8oafpNZ/69aBkEPiI8hS6++TDy4KA1eMadfpAACi2TXTS2gKh9 veVNgkF5Gl86wDZOqm0H8fP3RICZxqedYPPMvTVrO3Q2lk6jfGFSb7SrnX/8zGQTHsB/cg As6yz9MnPqeCimc9gYhpjU2lhTyIH0U= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-81-96UOSXVNM8m0lpuE5sY8Yw-1; Wed, 12 Apr 2023 07:14:54 -0400 X-MC-Unique: 96UOSXVNM8m0lpuE5sY8Yw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A88C93813F3E; Wed, 12 Apr 2023 11:14:53 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id EFF4BC15BB8; Wed, 12 Apr 2023 11:14:49 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 63/71] ceph: add support for handling encrypted snapshot names Date: Wed, 12 Apr 2023 19:09:22 +0800 Message-Id: <20230412110930.176835-64-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques When creating a snapshot, the .snap directories for every subdirectory will show the snapshot name in the "long format": # mkdir .snap/my-snap # ls my-dir/.snap/ _my-snap_1099511627782 Encrypted snapshots will need to be able to handle these snapshot names by encrypting/decrypting only the snapshot part of the string ('my-snap'). Also, since the MDS prevents snapshot names to be bigger than 240 characters it is necessary to adapt CEPH_NOHASH_NAME_MAX to accommodate this extra limitation. Tested-by: Luís Henriques Tested-by: Venky Shankar Signed-off-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/crypto.c | 192 ++++++++++++++++++++++++++++++++++++++++------- fs/ceph/crypto.h | 4 +- 2 files changed, 166 insertions(+), 30 deletions(-) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index 35e292045e9d..5bdd779217d8 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -189,16 +189,100 @@ void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_se swap(req->r_fscrypt_auth, as->fscrypt_auth); } -int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, char *buf) +/* + * User-created snapshots can't start with '_'. Snapshots that start with this + * character are special (hint: there aren't real snapshots) and use the + * following format: + * + * __ + * + * where: + * - - the real snapshot name that may need to be decrypted, + * - - the inode number (in decimal) for the actual snapshot + * + * This function parses these snapshot names and returns the inode + * . 'name_len' will also bet set with the + * length. + */ +static struct inode *parse_longname(const struct inode *parent, const char *name, + int *name_len) +{ + struct inode *dir = NULL; + struct ceph_vino vino = { .snap = CEPH_NOSNAP }; + char *inode_number; + char *name_end; + int orig_len = *name_len; + int ret = -EIO; + + /* Skip initial '_' */ + name++; + name_end = strrchr(name, '_'); + if (!name_end) { + dout("Failed to parse long snapshot name: %s\n", name); + return ERR_PTR(-EIO); + } + *name_len = (name_end - name); + if (*name_len <= 0) { + pr_err("Failed to parse long snapshot name\n"); + return ERR_PTR(-EIO); + } + + /* Get the inode number */ + inode_number = kmemdup_nul(name_end + 1, + orig_len - *name_len - 2, + GFP_KERNEL); + if (!inode_number) + return ERR_PTR(-ENOMEM); + ret = kstrtou64(inode_number, 10, &vino.ino); + if (ret) { + dout("Failed to parse inode number: %s\n", name); + dir = ERR_PTR(ret); + goto out; + } + + /* And finally the inode */ + dir = ceph_find_inode(parent->i_sb, vino); + if (!dir) { + /* This can happen if we're not mounting cephfs on the root */ + dir = ceph_get_inode(parent->i_sb, vino, NULL); + if (!dir) + dir = ERR_PTR(-ENOENT); + } + if (IS_ERR(dir)) + dout("Can't find inode %s (%s)\n", inode_number, name); + +out: + kfree(inode_number); + return dir; +} + +int ceph_encode_encrypted_dname(struct inode *parent, struct qstr *d_name, char *buf) { + struct inode *dir = parent; + struct qstr iname; u32 len; + int name_len; int elen; int ret; - u8 *cryptbuf; + u8 *cryptbuf = NULL; + + iname.name = d_name->name; + name_len = d_name->len; + + /* Handle the special case of snapshot names that start with '_' */ + if ((ceph_snap(dir) == CEPH_SNAPDIR) && (name_len > 0) && + (iname.name[0] == '_')) { + dir = parse_longname(parent, iname.name, &name_len); + if (IS_ERR(dir)) + return PTR_ERR(dir); + iname.name++; /* skip initial '_' */ + } + iname.len = name_len; - if (!fscrypt_has_encryption_key(parent)) { + if (!fscrypt_has_encryption_key(dir)) { memcpy(buf, d_name->name, d_name->len); - return d_name->len; + elen = d_name->len; + goto out; } /* @@ -207,18 +291,22 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, * * See: fscrypt_setup_filename */ - if (!fscrypt_fname_encrypted_size(parent, d_name->len, NAME_MAX, &len)) - return -ENAMETOOLONG; + if (!fscrypt_fname_encrypted_size(dir, iname.len, NAME_MAX, &len)) { + elen = -ENAMETOOLONG; + goto out; + } /* Allocate a buffer appropriate to hold the result */ cryptbuf = kmalloc(len > CEPH_NOHASH_NAME_MAX ? NAME_MAX : len, GFP_KERNEL); - if (!cryptbuf) - return -ENOMEM; + if (!cryptbuf) { + elen = -ENOMEM; + goto out; + } - ret = fscrypt_fname_encrypt(parent, d_name, cryptbuf, len); + ret = fscrypt_fname_encrypt(dir, &iname, cryptbuf, len); if (ret) { - kfree(cryptbuf); - return ret; + elen = ret; + goto out; } /* hash the end if the name is long enough */ @@ -234,12 +322,30 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, /* base64 encode the encrypted name */ elen = ceph_base64_encode(cryptbuf, len, buf); - kfree(cryptbuf); dout("base64-encoded ciphertext name = %.*s\n", elen, buf); + + /* To understand the 240 limit, see CEPH_NOHASH_NAME_MAX comments */ + WARN_ON(elen > 240); + if ((elen > 0) && (dir != parent)) { + char tmp_buf[NAME_MAX]; + + elen = snprintf(tmp_buf, sizeof(tmp_buf), "_%.*s_%ld", + elen, buf, dir->i_ino); + memcpy(buf, tmp_buf, elen); + } + +out: + kfree(cryptbuf); + if (dir != parent) { + if ((dir->i_state & I_NEW)) + discard_new_inode(dir); + else + iput(dir); + } return elen; } -int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf) +int ceph_encode_encrypted_fname(struct inode *parent, struct dentry *dentry, char *buf) { WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)); @@ -264,29 +370,42 @@ int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentr int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, struct fscrypt_str *oname, bool *is_nokey) { - int ret; + struct inode *dir = fname->dir; struct fscrypt_str _tname = FSTR_INIT(NULL, 0); struct fscrypt_str iname; - - if (!IS_ENCRYPTED(fname->dir)) { - oname->name = fname->name; - oname->len = fname->name_len; - return 0; - } + char *name = fname->name; + int name_len = fname->name_len; + int ret; /* Sanity check that the resulting name will fit in the buffer */ if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX) return -EIO; - ret = ceph_fscrypt_prepare_readdir(fname->dir); - if (ret < 0) - return ret; + /* Handle the special case of snapshot names that start with '_' */ + if ((ceph_snap(dir) == CEPH_SNAPDIR) && (name_len > 0) && + (name[0] == '_')) { + dir = parse_longname(dir, name, &name_len); + if (IS_ERR(dir)) + return PTR_ERR(dir); + name++; /* skip initial '_' */ + } + + if (!IS_ENCRYPTED(dir)) { + oname->name = fname->name; + oname->len = fname->name_len; + ret = 0; + goto out_inode; + } + + ret = ceph_fscrypt_prepare_readdir(dir); + if (ret) + goto out_inode; /* * Use the raw dentry name as sent by the MDS instead of * generating a nokey name via fscrypt. */ - if (!fscrypt_has_encryption_key(fname->dir)) { + if (!fscrypt_has_encryption_key(dir)) { if (fname->no_copy) oname->name = fname->name; else @@ -294,7 +413,8 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, oname->len = fname->name_len; if (is_nokey) *is_nokey = true; - return 0; + ret = 0; + goto out_inode; } if (fname->ctext_len == 0) { @@ -303,11 +423,11 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, if (!tname) { ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname); if (ret) - return ret; + goto out_inode; tname = &_tname; } - declen = ceph_base64_decode(fname->name, fname->name_len, tname->name); + declen = ceph_base64_decode(name, name_len, tname->name); if (declen <= 0) { ret = -EIO; goto out; @@ -319,9 +439,25 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, iname.len = fname->ctext_len; } - ret = fscrypt_fname_disk_to_usr(fname->dir, 0, 0, &iname, oname); + ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, oname); + if (!ret && (dir != fname->dir)) { + char tmp_buf[CEPH_BASE64_CHARS(NAME_MAX)]; + + name_len = snprintf(tmp_buf, sizeof(tmp_buf), "_%.*s_%ld", + oname->len, oname->name, dir->i_ino); + memcpy(oname->name, tmp_buf, name_len); + oname->len = name_len; + } + out: fscrypt_fname_free_buffer(&_tname); +out_inode: + if ((dir != fname->dir) && !IS_ERR(dir)) { + if ((dir->i_state & I_NEW)) + discard_new_inode(dir); + else + iput(dir); + } return ret; } diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index 0d0343906d29..d0e88f6d254d 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -101,8 +101,8 @@ void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc); int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode, struct ceph_acl_sec_ctx *as); void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as); -int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, char *buf); -int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf); +int ceph_encode_encrypted_dname(struct inode *parent, struct qstr *d_name, char *buf); +int ceph_encode_encrypted_fname(struct inode *parent, struct dentry *dentry, char *buf); static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname) { From patchwork Wed Apr 12 11:09:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208894 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BB207C77B6E for ; Wed, 12 Apr 2023 11:26:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230158AbjDLL0Z (ORCPT ); Wed, 12 Apr 2023 07:26:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53060 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230206AbjDLL0X (ORCPT ); Wed, 12 Apr 2023 07:26:23 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9F2A10CF for ; Wed, 12 Apr 2023 04:25:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298718; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UAdvSFv8/gEyGlx7OzHcZCSbz8FqPjAyMJDJXSeTpbE=; b=GrD9gv8V7G62Po/V/Bvq4tZqDWq5yhBAu+jNhQ+9gzHS0hkT1aIVgfZmT942hF9z6ydzRG rbKFFUqLvn5PjSyddsMKH8wLLh3Br1GGzEkkUnLnNjXj55WAnRSVtgUu9Thd6cQA5oDFY4 jwRQVEgplrq0Y6sdWO5NeY2bHQyeSgE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-403-FvgGp7G4ORChlALpuTsOPw-1; Wed, 12 Apr 2023 07:14:58 -0400 X-MC-Unique: FvgGp7G4ORChlALpuTsOPw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 67D5F101A531; Wed, 12 Apr 2023 11:14:58 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id A82F3C15BB8; Wed, 12 Apr 2023 11:14:54 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 64/71] ceph: update documentation regarding snapshot naming limitations Date: Wed, 12 Apr 2023 19:09:23 +0800 Message-Id: <20230412110930.176835-65-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques Tested-by: Luís Henriques Tested-by: Venky Shankar Signed-off-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- Documentation/filesystems/ceph.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/Documentation/filesystems/ceph.rst b/Documentation/filesystems/ceph.rst index 76ce938e7024..085f309ece60 100644 --- a/Documentation/filesystems/ceph.rst +++ b/Documentation/filesystems/ceph.rst @@ -57,6 +57,16 @@ a snapshot on any subdirectory (and its nested contents) in the system. Snapshot creation and deletion are as simple as 'mkdir .snap/foo' and 'rmdir .snap/foo'. +Snapshot names have two limitations: + +* They can not start with an underscore ('_'), as these names are reserved + for internal usage by the MDS. +* They can not exceed 240 characters in size. This is because the MDS makes + use of long snapshot names internally, which follow the format: + `__`. Since filenames in general can't have + more than 255 characters, and `` takes 13 characters, the long + snapshot names can take as much as 255 - 1 - 1 - 13 = 240. + Ceph also provides some recursive accounting on directories for nested files and bytes. That is, a 'getfattr -d foo' on any directory in the system will reveal the total number of nested regular files and From patchwork Wed Apr 12 11:09:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208866 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0BE83C77B6E for ; Wed, 12 Apr 2023 11:21:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230113AbjDLLVT (ORCPT ); Wed, 12 Apr 2023 07:21:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47382 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229963AbjDLLVS (ORCPT ); Wed, 12 Apr 2023 07:21:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF0167AB8 for ; Wed, 12 Apr 2023 04:20:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298298; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Bg9tgxRmpOgUeMZi/dkjA6rDXqD71GLVfoeAufhHYYM=; b=ep8KxNWLMu8j9uMeqTk2qXwascOBng+yRbcz85cBvTS5V0J70d8cMU7nuJk6kPG0bVfGm1 hTHM4R/50p+HHbfMMw+nxjhytytBBVNp7G2NoBebBOFcqptG1em24Hme2a42pDiw+8hc1E DshVYTpXXVBmc3K3gYWnpT0Owv/cIi4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-377-Pq0_RiRyM_Oew7jO5LcP-g-1; Wed, 12 Apr 2023 07:15:04 -0400 X-MC-Unique: Pq0_RiRyM_Oew7jO5LcP-g-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C69D1886474; Wed, 12 Apr 2023 11:15:03 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4CFB6C15BC0; Wed, 12 Apr 2023 11:14:58 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 65/71] ceph: prevent snapshots to be created in encrypted locked directories Date: Wed, 12 Apr 2023 19:09:24 +0800 Message-Id: <20230412110930.176835-66-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques With snapshot names encryption we can not allow snapshots to be created in locked directories because the names wouldn't be encrypted. This patch forces the directory to be unlocked to allow a snapshot to be created. Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Xiubo Li Signed-off-by: Luís Henriques Signed-off-by: Jeff Layton --- fs/ceph/dir.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index 98a9b1592ba6..fe48a5d26c1d 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -1084,6 +1084,11 @@ static int ceph_mkdir(struct mnt_idmap *idmap, struct inode *dir, err = -EDQUOT; goto out; } + if ((op == CEPH_MDS_OP_MKSNAP) && IS_ENCRYPTED(dir) && + !fscrypt_has_encryption_key(dir)) { + err = -ENOKEY; + goto out; + } req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS); From patchwork Wed Apr 12 11:09:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208854 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3EC83C77B6E for ; Wed, 12 Apr 2023 11:17:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230202AbjDLLRZ (ORCPT ); Wed, 12 Apr 2023 07:17:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41484 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230211AbjDLLRO (ORCPT ); Wed, 12 Apr 2023 07:17:14 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BD32172A8 for ; Wed, 12 Apr 2023 04:16:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298113; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zpyrtxc50S84feY2qqgq6AjYWH+7xLd91XiywEtGGcA=; b=E/jqPhC5hzR7sO/GTpm9i9k+Gleu1cdLeXpxVz8JQOS5TUp7z86tlJg7g9hTcrvwT4QnGk 3fKPRF2GGLwz2+dcYxIH16GR9Fi0rTT62++IZ6DEymBzkAHOMV6m5IFkJHyoAmnElXuVaC YsiCL43Xu7FRC1+UnTU0REJbIgwMlsk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-346-WhqRQa2BNiWq-PYaRqEoCQ-1; Wed, 12 Apr 2023 07:15:08 -0400 X-MC-Unique: WhqRQa2BNiWq-PYaRqEoCQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 379E888646E; Wed, 12 Apr 2023 11:15:08 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 97E60C15E69; Wed, 12 Apr 2023 11:15:04 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 66/71] ceph: report STATX_ATTR_ENCRYPTED on encrypted inodes Date: Wed, 12 Apr 2023 19:09:25 +0800 Message-Id: <20230412110930.176835-67-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Jeff Layton --- fs/ceph/inode.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index be3936b28f57..1cfcbc39f7c6 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -3012,8 +3012,12 @@ int ceph_getattr(struct mnt_idmap *idmap, const struct path *path, stat->nlink = 1 + 1 + ci->i_subdirs; } - stat->attributes_mask |= STATX_ATTR_CHANGE_MONOTONIC; stat->attributes |= STATX_ATTR_CHANGE_MONOTONIC; + if (IS_ENCRYPTED(inode)) + stat->attributes |= STATX_ATTR_ENCRYPTED; + stat->attributes_mask |= (STATX_ATTR_CHANGE_MONOTONIC | + STATX_ATTR_ENCRYPTED); + stat->result_mask = request_mask & valid_mask; return err; } From patchwork Wed Apr 12 11:09:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208855 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53AA0C7619A for ; Wed, 12 Apr 2023 11:17:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230237AbjDLLRa (ORCPT ); Wed, 12 Apr 2023 07:17:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41606 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230192AbjDLLRU (ORCPT ); Wed, 12 Apr 2023 07:17:20 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 536807AAE for ; Wed, 12 Apr 2023 04:16:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298117; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/iFv56XVYGliPNySSlKVvgKcaLwvtGXYrCQ9H02YzmQ=; b=SlD5QzYSHyVcuIPp3PE+GOHAMJB8FG9XzYtzvcqYMNc6PuawdH9MyIrIN3bmYdphN0hIQW 8Gpouh9oHkxJxpPZRvzz7Cx7baeNozHrHpDj+XPkmdF85xV5Nn55UTGAE8wli6Q5JbER0a IZ6NJSXQJZGMfPVkWGtnbd2Hwg4DzzM= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-483-GeIP29-wP2qKtKYUDSWecg-1; Wed, 12 Apr 2023 07:15:14 -0400 X-MC-Unique: GeIP29-wP2qKtKYUDSWecg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C0A533813F3D; Wed, 12 Apr 2023 11:15:13 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 06679C15BB8; Wed, 12 Apr 2023 11:15:08 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 67/71] libceph: defer removing the req from osdc just after req->r_callback Date: Wed, 12 Apr 2023 19:09:26 +0800 Message-Id: <20230412110930.176835-68-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li The sync_filesystem() will flush all the dirty buffer and submit the osd reqs to the osdc and then is blocked to wait for all the reqs to finish. But the when the reqs' replies come, the reqs will be removed from osdc just before the req->r_callback()s are called. Which means the sync_filesystem() will be woke up by leaving the req->r_callback()s are still running. This will be buggy when the waiter require the req->r_callback()s to release some resources before continuing. So we need to make sure the req->r_callback()s are called before removing the reqs from the osdc. WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0 CPU: 4 PID: 168846 Comm: umount Tainted: G S 6.1.0-rc5-ceph-g72ead199864c #1 Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015 RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0 RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202 RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00 RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000 RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000 R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40 R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000 FS: 00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: generic_shutdown_super+0x47/0x120 kill_anon_super+0x14/0x30 ceph_kill_sb+0x36/0x90 [ceph] deactivate_locked_super+0x29/0x60 cleanup_mnt+0xb8/0x140 task_work_run+0x67/0xb0 exit_to_user_mode_prepare+0x23d/0x240 syscall_exit_to_user_mode+0x25/0x60 do_syscall_64+0x40/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fd83dc39e9b URL: https://tracker.ceph.com/issues/58126 Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li --- net/ceph/osd_client.c | 43 +++++++++++++++++++++++++++++++++++-------- 1 file changed, 35 insertions(+), 8 deletions(-) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 78b622178a3d..db3d93d3e692 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -2507,7 +2507,7 @@ static void submit_request(struct ceph_osd_request *req, bool wrlocked) __submit_request(req, wrlocked); } -static void finish_request(struct ceph_osd_request *req) +static void __finish_request(struct ceph_osd_request *req) { struct ceph_osd_client *osdc = req->r_osdc; @@ -2516,12 +2516,6 @@ static void finish_request(struct ceph_osd_request *req) req->r_end_latency = ktime_get(); - if (req->r_osd) { - ceph_init_sparse_read(&req->r_osd->o_sparse_read); - unlink_request(req->r_osd, req); - } - atomic_dec(&osdc->num_requests); - /* * If an OSD has failed or returned and a request has been sent * twice, it's possible to get a reply and end up here while the @@ -2532,13 +2526,46 @@ static void finish_request(struct ceph_osd_request *req) ceph_msg_revoke_incoming(req->r_reply); } +static void __remove_request(struct ceph_osd_request *req) +{ + struct ceph_osd_client *osdc = req->r_osdc; + + dout("%s req %p tid %llu\n", __func__, req, req->r_tid); + + if (req->r_osd) { + ceph_init_sparse_read(&req->r_osd->o_sparse_read); + unlink_request(req->r_osd, req); + } + atomic_dec(&osdc->num_requests); +} + +static void finish_request(struct ceph_osd_request *req) +{ + __finish_request(req); + __remove_request(req); +} + static void __complete_request(struct ceph_osd_request *req) { + struct ceph_osd_client *osdc = req->r_osdc; + struct ceph_osd *osd = req->r_osd; + dout("%s req %p tid %llu cb %ps result %d\n", __func__, req, req->r_tid, req->r_callback, req->r_result); if (req->r_callback) req->r_callback(req); + + down_read(&osdc->lock); + if (osd) { + mutex_lock(&osd->lock); + __remove_request(req); + mutex_unlock(&osd->lock); + } else { + atomic_dec(&osdc->num_requests); + } + up_read(&osdc->lock); + complete_all(&req->r_completion); ceph_osdc_put_request(req); } @@ -3873,7 +3900,7 @@ static void handle_reply(struct ceph_osd *osd, struct ceph_msg *msg) WARN_ON(!(m.flags & CEPH_OSD_FLAG_ONDISK)); req->r_version = m.user_version; req->r_result = m.result ?: data_len; - finish_request(req); + __finish_request(req); mutex_unlock(&osd->lock); up_read(&osdc->lock); From patchwork Wed Apr 12 11:09:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208859 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7FCE5C77B6E for ; Wed, 12 Apr 2023 11:18:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230274AbjDLLSR (ORCPT ); Wed, 12 Apr 2023 07:18:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41110 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230233AbjDLLSF (ORCPT ); Wed, 12 Apr 2023 07:18:05 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1F6217ED1 for ; Wed, 12 Apr 2023 04:17:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298123; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=O06dbARyGUfJojzPMJcc4s1vKLxbsz616Jr/alzb4Qk=; b=PmnPNS9odz0HLpx9AbgdY1CW6pOnWdnMEk8iLATIp46vTH2o76CKGRCuqNvS4nThZPSRh4 IaTF0Xv8T2g1eWcHCJNHDIBndgOzeChFp1NvYBSqXqX0BYz7/aGJDa43KSiYS1OFa97bq5 KeMPOveZelW3C9AIHl0udmlRHRKQ26E= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-383-nFLd-YEUOW6sr_2JSSJJOQ-1; Wed, 12 Apr 2023 07:15:20 -0400 X-MC-Unique: nFLd-YEUOW6sr_2JSSJJOQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 99B0228237C9; Wed, 12 Apr 2023 11:15:19 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id 23E41C15BBA; Wed, 12 Apr 2023 11:15:14 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 68/71] ceph: drop the messages from MDS when unmounting Date: Wed, 12 Apr 2023 19:09:27 +0800 Message-Id: <20230412110930.176835-69-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li When unmounting and all the dirty buffer will be flushed and after the last osd request is finished the last reference of the i_count will be released. Then it will flush the dirty cap/snap to MDSs, and the unmounting won't wait the possible acks, which will ihode the inodes when updating the metadata locally but makes no sense any more, of this. This will make the evict_inodes() to skip these inodes. If encrypt is enabled the kernel generate a warning when removing the encrypt keys when the skipped inodes still hold the keyring: WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0 CPU: 4 PID: 168846 Comm: umount Tainted: G S 6.1.0-rc5-ceph-g72ead199864c #1 Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015 RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0 RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202 RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00 RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000 RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000 R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40 R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000 FS: 00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: generic_shutdown_super+0x47/0x120 kill_anon_super+0x14/0x30 ceph_kill_sb+0x36/0x90 [ceph] deactivate_locked_super+0x29/0x60 cleanup_mnt+0xb8/0x140 task_work_run+0x67/0xb0 exit_to_user_mode_prepare+0x23d/0x240 syscall_exit_to_user_mode+0x25/0x60 do_syscall_64+0x40/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fd83dc39e9b Later the kernel will crash when iput() the inodes and dereferencing the "sb->s_master_keys", which has been released by the generic_shutdown_super(). URL: https://tracker.ceph.com/issues/58126 URL: https://tracker.ceph.com/issues/59162 Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li --- fs/ceph/caps.c | 6 ++++- fs/ceph/mds_client.c | 14 ++++++++--- fs/ceph/mds_client.h | 11 ++++++++- fs/ceph/quota.c | 14 +++++------ fs/ceph/snap.c | 10 ++++---- fs/ceph/super.c | 57 ++++++++++++++++++++++++++++++++++++++++++++ fs/ceph/super.h | 3 +++ 7 files changed, 99 insertions(+), 16 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 6379c0070492..e1bb6d9c16f8 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -4222,6 +4222,9 @@ void ceph_handle_caps(struct ceph_mds_session *session, dout("handle_caps from mds%d\n", session->s_mds); + if (!ceph_inc_stopping_blocker(mdsc, session)) + return; + /* decode */ end = msg->front.iov_base + msg->front.iov_len; if (msg->front.iov_len < sizeof(*h)) @@ -4323,7 +4326,6 @@ void ceph_handle_caps(struct ceph_mds_session *session, vino.snap, inode); mutex_lock(&session->s_mutex); - inc_session_sequence(session); dout(" mds%d seq %lld cap seq %u\n", session->s_mds, session->s_seq, (unsigned)seq); @@ -4435,6 +4437,8 @@ void ceph_handle_caps(struct ceph_mds_session *session, done_unlocked: iput(inode); out: + ceph_dec_stopping_blocker(mdsc); + ceph_put_string(extra_info.pool_ns); /* Defer closing the sessions after s_mutex lock being released */ diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 85d639f75ea1..294af79c25c9 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -4877,6 +4877,9 @@ static void handle_lease(struct ceph_mds_client *mdsc, dout("handle_lease from mds%d\n", mds); + if (!ceph_inc_stopping_blocker(mdsc, session)) + return; + /* decode */ if (msg->front.iov_len < sizeof(*h) + sizeof(u32)) goto bad; @@ -4895,8 +4898,6 @@ static void handle_lease(struct ceph_mds_client *mdsc, dname.len, dname.name); mutex_lock(&session->s_mutex); - inc_session_sequence(session); - if (!inode) { dout("handle_lease no inode %llx\n", vino.ino); goto release; @@ -4958,9 +4959,13 @@ static void handle_lease(struct ceph_mds_client *mdsc, out: mutex_unlock(&session->s_mutex); iput(inode); + + ceph_dec_stopping_blocker(mdsc); return; bad: + ceph_dec_stopping_blocker(mdsc); + pr_err("corrupt lease message\n"); ceph_msg_dump(msg); } @@ -5156,6 +5161,9 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc) } init_completion(&mdsc->safe_umount_waiters); + spin_lock_init(&mdsc->stopping_lock); + atomic_set(&mdsc->stopping_blockers, 0); + init_completion(&mdsc->stopping_waiter); init_waitqueue_head(&mdsc->session_close_wq); INIT_LIST_HEAD(&mdsc->waiting_for_map); mdsc->quotarealms_inodes = RB_ROOT; @@ -5270,7 +5278,7 @@ void send_flush_mdlog(struct ceph_mds_session *s) void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc) { dout("pre_umount\n"); - mdsc->stopping = 1; + mdsc->stopping = CEPH_MDSC_STOPPING_BEGIN; ceph_mdsc_iterate_sessions(mdsc, send_flush_mdlog, true); ceph_mdsc_iterate_sessions(mdsc, lock_unlock_session, false); diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index 81a1f9a4ac3b..0f70ca3cdb21 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -398,6 +398,11 @@ struct cap_wait { int want; }; +enum { + CEPH_MDSC_STOPPING_BEGIN = 1, + CEPH_MDSC_STOPPING_FLUSHED = 2, +}; + /* * mds client state */ @@ -414,7 +419,11 @@ struct ceph_mds_client { struct ceph_mds_session **sessions; /* NULL for mds if no session */ atomic_t num_sessions; int max_sessions; /* len of sessions array */ - int stopping; /* true if shutting down */ + + spinlock_t stopping_lock; /* protect snap_empty */ + int stopping; /* the stage of shutting down */ + atomic_t stopping_blockers; + struct completion stopping_waiter; atomic64_t quotarealms_count; /* # realms with quota */ /* diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c index 64592adfe48f..bb3aac8fb833 100644 --- a/fs/ceph/quota.c +++ b/fs/ceph/quota.c @@ -47,25 +47,23 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc, struct inode *inode; struct ceph_inode_info *ci; + if (!ceph_inc_stopping_blocker(mdsc, session)) + return; + if (msg->front.iov_len < sizeof(*h)) { pr_err("%s corrupt message mds%d len %d\n", __func__, session->s_mds, (int)msg->front.iov_len); ceph_msg_dump(msg); - return; + goto out; } - /* increment msg sequence number */ - mutex_lock(&session->s_mutex); - inc_session_sequence(session); - mutex_unlock(&session->s_mutex); - /* lookup inode */ vino.ino = le64_to_cpu(h->ino); vino.snap = CEPH_NOSNAP; inode = ceph_find_inode(sb, vino); if (!inode) { pr_warn("Failed to find inode %llu\n", vino.ino); - return; + goto out; } ci = ceph_inode(inode); @@ -78,6 +76,8 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc, spin_unlock(&ci->i_ceph_lock); iput(inode); +out: + ceph_dec_stopping_blocker(mdsc); } static struct ceph_quotarealm_inode * diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c index aa8e0657fc03..23b31600ee3c 100644 --- a/fs/ceph/snap.c +++ b/fs/ceph/snap.c @@ -1011,6 +1011,9 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc, int locked_rwsem = 0; bool close_sessions = false; + if (!ceph_inc_stopping_blocker(mdsc, session)) + return; + /* decode */ if (msg->front.iov_len < sizeof(*h)) goto bad; @@ -1026,10 +1029,6 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc, dout("%s from mds%d op %s split %llx tracelen %d\n", __func__, mds, ceph_snap_op_name(op), split, trace_len); - mutex_lock(&session->s_mutex); - inc_session_sequence(session); - mutex_unlock(&session->s_mutex); - down_write(&mdsc->snap_rwsem); locked_rwsem = 1; @@ -1134,12 +1133,15 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc, up_write(&mdsc->snap_rwsem); flush_snaps(mdsc); + ceph_dec_stopping_blocker(mdsc); return; bad: pr_err("%s corrupt snap message from mds%d\n", __func__, mds); ceph_msg_dump(msg); out: + ceph_dec_stopping_blocker(mdsc); + if (locked_rwsem) up_write(&mdsc->snap_rwsem); diff --git a/fs/ceph/super.c b/fs/ceph/super.c index 4b0a070d5c6d..4f82ffb4f540 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -1467,15 +1467,72 @@ static int ceph_init_fs_context(struct fs_context *fc) return -ENOMEM; } +/* + * Return true if mdsc successfully increase blocker counter, + * or false if the mdsc is in stopping and flushed state. + */ +bool ceph_inc_stopping_blocker(struct ceph_mds_client *mdsc, + struct ceph_mds_session *session) +{ + mutex_lock(&session->s_mutex); + inc_session_sequence(session); + mutex_unlock(&session->s_mutex); + + spin_lock(&mdsc->stopping_lock); + if (mdsc->stopping >= CEPH_MDSC_STOPPING_FLUSHED) { + spin_unlock(&mdsc->stopping_lock); + return false; + } + atomic_inc(&mdsc->stopping_blockers); + spin_unlock(&mdsc->stopping_lock); + return true; +} + +void ceph_dec_stopping_blocker(struct ceph_mds_client *mdsc) +{ + spin_lock(&mdsc->stopping_lock); + if (!atomic_dec_return(&mdsc->stopping_blockers) && + mdsc->stopping >= CEPH_MDSC_STOPPING_FLUSHED) + complete_all(&mdsc->stopping_waiter); + spin_unlock(&mdsc->stopping_lock); +} + static void ceph_kill_sb(struct super_block *s) { struct ceph_fs_client *fsc = ceph_sb_to_client(s); + bool wait; dout("kill_sb %p\n", s); ceph_mdsc_pre_umount(fsc->mdsc); flush_fs_workqueues(fsc); + /* + * Though the kill_anon_super() will finally trigger the + * sync_filesystem() anyway, we still need to do it here and + * then bump the stage of shutdown. This will allow us to + * drop any further message, which will increase the inodes' + * i_count reference counters but makes no sense any more, + * from MDSs. + * + * Without this when evicting the inodes it may fail in the + * kill_anon_super(), which will trigger a warning when + * destroying the fscrypt keyring and then possibly trigger + * a further crash in ceph module when the iput() tries to + * evict the inodes later. + */ + sync_filesystem(s); + + spin_lock(&fsc->mdsc->stopping_lock); + fsc->mdsc->stopping = CEPH_MDSC_STOPPING_FLUSHED; + wait = !!atomic_read(&fsc->mdsc->stopping_blockers); + spin_unlock(&fsc->mdsc->stopping_lock); + + if (wait) { + while (atomic_read(&fsc->mdsc->stopping_blockers)) + wait_for_completion(&fsc->mdsc->stopping_waiter); + } + kill_anon_super(s); fsc->client->extra_mon_dispatch = NULL; diff --git a/fs/ceph/super.h b/fs/ceph/super.h index a785e5cb9b40..8a9afa81a76f 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1398,4 +1398,7 @@ extern bool ceph_quota_update_statfs(struct ceph_fs_client *fsc, struct kstatfs *buf); extern void ceph_cleanup_quotarealms_inodes(struct ceph_mds_client *mdsc); +bool ceph_inc_stopping_blocker(struct ceph_mds_client *mdsc, + struct ceph_mds_session *session); +void ceph_dec_stopping_blocker(struct ceph_mds_client *mdsc); #endif /* _FS_CEPH_SUPER_H */ From patchwork Wed Apr 12 11:09:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208862 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 001EAC77B71 for ; Wed, 12 Apr 2023 11:18:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230217AbjDLLS0 (ORCPT ); Wed, 12 Apr 2023 07:18:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42632 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230253AbjDLLSM (ORCPT ); Wed, 12 Apr 2023 07:18:12 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7762C76A4 for ; Wed, 12 Apr 2023 04:17:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298128; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2f4Wd4+1hMmAQHKrBiPbcRBlPeFpE/suerg34gbQisk=; b=NSKCDBKKMgk4FIzWAvGxKcLEfWYcOTlN8ESnAlKzf8h8uUbHcR8aNz1kvRo4DbNrxv4G1G fNBM8pEsdPvHn4EeEe78MAb6sLPkKUcYnzCafKteCjPkr+6ropKSvJPXnFXqwypuEv6fr9 7Ui5aTxMSNQyiztk9u5zwXGiXQyajmc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-84-rLxPDxSMOCKIS_GUcallkw-1; Wed, 12 Apr 2023 07:15:25 -0400 X-MC-Unique: rLxPDxSMOCKIS_GUcallkw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1302588B7A6; Wed, 12 Apr 2023 11:15:25 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id BD565C15BB8; Wed, 12 Apr 2023 11:15:20 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 69/71] ceph: fix updating the i_truncate_pagecache_size for fscrypt Date: Wed, 12 Apr 2023 19:09:28 +0800 Message-Id: <20230412110930.176835-70-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li When fscrypt is enabled we will align the truncate size up to the CEPH_FSCRYPT_BLOCK_SIZE always, so if we truncate the size in the same block more than once, the latter ones will be skipped being invalidated from the page caches. This will force invalidating the page caches by using the smaller size than the real file size. At the same time add more debug log and fix the debug log for truncate code. URL: https://tracker.ceph.com/issues/58834 Tested-by: Luís Henriques Tested-by: Venky Shankar Reviewed-by: Luís Henriques Signed-off-by: Xiubo Li --- fs/ceph/caps.c | 4 ++-- fs/ceph/inode.c | 35 ++++++++++++++++++++++++----------- 2 files changed, 26 insertions(+), 13 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index e1bb6d9c16f8..3c5450f9d825 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -3929,8 +3929,8 @@ static bool handle_cap_trunc(struct inode *inode, if (IS_ENCRYPTED(inode) && size) size = extra_info->fscrypt_file_size; - dout("handle_cap_trunc inode %p mds%d seq %d to %lld seq %d\n", - inode, mds, seq, truncate_size, truncate_seq); + dout("%s inode %p mds%d seq %d to %lld truncate seq %d\n", + __func__, inode, mds, seq, truncate_size, truncate_seq); queue_trunc = ceph_fill_file_size(inode, issued, truncate_seq, truncate_size, size); return queue_trunc; diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 1cfcbc39f7c6..059ebe42367c 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -763,7 +763,7 @@ int ceph_fill_file_size(struct inode *inode, int issued, ceph_fscache_update(inode); ci->i_reported_size = size; if (truncate_seq != ci->i_truncate_seq) { - dout("truncate_seq %u -> %u\n", + dout("%s truncate_seq %u -> %u\n", __func__, ci->i_truncate_seq, truncate_seq); ci->i_truncate_seq = truncate_seq; @@ -787,15 +787,26 @@ int ceph_fill_file_size(struct inode *inode, int issued, } } } - if (ceph_seq_cmp(truncate_seq, ci->i_truncate_seq) >= 0 && - ci->i_truncate_size != truncate_size) { - dout("truncate_size %lld -> %llu\n", ci->i_truncate_size, - truncate_size); + + /* + * It's possible that the new sizes of the two consecutive + * size truncations will be in the same fscrypt last block, + * and we need to truncate the corresponding page caches + * anyway. + */ + if (ceph_seq_cmp(truncate_seq, ci->i_truncate_seq) >= 0) { + dout("%s truncate_size %lld -> %llu, encrypted %d\n", __func__, + ci->i_truncate_size, truncate_size, !!IS_ENCRYPTED(inode)); + ci->i_truncate_size = truncate_size; - if (IS_ENCRYPTED(inode)) + + if (IS_ENCRYPTED(inode)) { + dout("%s truncate_pagecache_size %lld -> %llu\n", + __func__, ci->i_truncate_pagecache_size, size); ci->i_truncate_pagecache_size = size; - else + } else { ci->i_truncate_pagecache_size = truncate_size; + } } return queue_trunc; } @@ -2150,7 +2161,7 @@ void __ceph_do_pending_vmtruncate(struct inode *inode) retry: spin_lock(&ci->i_ceph_lock); if (ci->i_truncate_pending == 0) { - dout("__do_pending_vmtruncate %p none pending\n", inode); + dout("%s %p none pending\n", __func__, inode); spin_unlock(&ci->i_ceph_lock); mutex_unlock(&ci->i_truncate_mutex); return; @@ -2162,8 +2173,7 @@ void __ceph_do_pending_vmtruncate(struct inode *inode) */ if (ci->i_wrbuffer_ref_head < ci->i_wrbuffer_ref) { spin_unlock(&ci->i_ceph_lock); - dout("__do_pending_vmtruncate %p flushing snaps first\n", - inode); + dout("%s %p flushing snaps first\n", __func__, inode); filemap_write_and_wait_range(&inode->i_data, 0, inode->i_sb->s_maxbytes); goto retry; @@ -2174,7 +2184,7 @@ void __ceph_do_pending_vmtruncate(struct inode *inode) to = ci->i_truncate_pagecache_size; wrbuffer_refs = ci->i_wrbuffer_ref; - dout("__do_pending_vmtruncate %p (%d) to %lld\n", inode, + dout("%s %p (%d) to %lld\n", __func__, inode, ci->i_truncate_pending, to); spin_unlock(&ci->i_ceph_lock); @@ -2361,6 +2371,9 @@ static int fill_fscrypt_truncate(struct inode *inode, header.data_len = cpu_to_le32(8 + 8 + 4 + CEPH_FSCRYPT_BLOCK_SIZE); header.file_offset = cpu_to_le64(orig_pos); + dout("%s encrypt block boff/bsize %d/%lu\n", __func__, + boff, CEPH_FSCRYPT_BLOCK_SIZE); + /* truncate and zero out the extra contents for the last block */ memset(iov.iov_base + boff, 0, PAGE_SIZE - boff); From patchwork Wed Apr 12 11:09:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208856 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 535D3C77B71 for ; Wed, 12 Apr 2023 11:17:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230244AbjDLLRg (ORCPT ); Wed, 12 Apr 2023 07:17:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230158AbjDLLR3 (ORCPT ); Wed, 12 Apr 2023 07:17:29 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 702CD769F for ; Wed, 12 Apr 2023 04:16:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298131; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MPFxWgOF2zs7ImsQcB5cAHXbp2GFUYf1kySCWUs28ik=; b=WOj953Tm9F0nkJVK+eJeB9eOBLlWuop6ox0xe6WVFPLgw1rFhy8KKmfjOAhNHT6U6/4vuy 2k6ucJV/tOfRjCwdEmLNcTK99ZBOELOSUKl0m4lMzXEr9BeAyuoYB5XN4qNdg1kdKFL+lc hSgDUZhhwxzN/DfiL8flykQjqvohAhk= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-435-qq2lx2ssNUq_gZZgPcOI7A-1; Wed, 12 Apr 2023 07:15:30 -0400 X-MC-Unique: qq2lx2ssNUq_gZZgPcOI7A-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CA0DB28237C4; Wed, 12 Apr 2023 11:15:29 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id D1121C15BB8; Wed, 12 Apr 2023 11:15:25 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 70/71] ceph: switch ceph_open() to use new fscrypt helper Date: Wed, 12 Apr 2023 19:09:29 +0800 Message-Id: <20230412110930.176835-71-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques Instead of setting the no-key dentry in ceph_lookup(), use the new fscrypt_prepare_lookup_partial() helper. We still need to mark the directory as incomplete if the directory was just unlocked. Tested-by: Luís Henriques Tested-by: Venky Shankar Signed-off-by: Luís Henriques Signed-off-by: Xiubo Li --- fs/ceph/dir.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index fe48a5d26c1d..c28de23e12a1 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -784,14 +784,15 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry, return ERR_PTR(-ENAMETOOLONG); if (IS_ENCRYPTED(dir)) { - err = ceph_fscrypt_prepare_readdir(dir); + bool had_key = fscrypt_has_encryption_key(dir); + + err = fscrypt_prepare_lookup_partial(dir, dentry); if (err < 0) return ERR_PTR(err); - if (!fscrypt_has_encryption_key(dir)) { - spin_lock(&dentry->d_lock); - dentry->d_flags |= DCACHE_NOKEY_NAME; - spin_unlock(&dentry->d_lock); - } + + /* mark directory as incomplete if it has been unlocked */ + if (!had_key && fscrypt_has_encryption_key(dir)) + ceph_dir_clear_complete(dir); } /* can we conclude ENOENT locally? */ From patchwork Wed Apr 12 11:09:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 13208868 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22CDDC77B6E for ; Wed, 12 Apr 2023 11:21:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230000AbjDLLV5 (ORCPT ); Wed, 12 Apr 2023 07:21:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48000 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230063AbjDLLV4 (ORCPT ); Wed, 12 Apr 2023 07:21:56 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1D2F6E8D for ; Wed, 12 Apr 2023 04:20:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681298340; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=afHp7n8QxQml8QBX51zp0hsclkDOoz1gPIi7+5IWNAY=; b=gzT8nxgqS7RxzsnqeHYeeyMKUcb2DGjpbvA5hamjGkr2Jr8dR4sbje16Z+9D+K+wGcGUQw wFtOkRJob7qerv++fO+rEBF2pS0OPuHzzPkuq3L6x8Lv2PWHjymFvFCMgbEdOBC25qxvn8 fyzXJfEf9H6ZjGLhMJ2S2moV3Ws1/u4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-125-CiQvsPPqP5y1Hq-6fT-Hng-1; Wed, 12 Apr 2023 07:15:34 -0400 X-MC-Unique: CiQvsPPqP5y1Hq-6fT-Hng-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7F2D7185A792; Wed, 12 Apr 2023 11:15:34 +0000 (UTC) Received: from li-a71a4dcc-35d1-11b2-a85c-951838863c8d.ibm.com.com (ovpn-12-131.pek2.redhat.com [10.72.12.131]) by smtp.corp.redhat.com (Postfix) with ESMTP id AB00BC15BB8; Wed, 12 Apr 2023 11:15:30 +0000 (UTC) From: xiubli@redhat.com To: idryomov@gmail.com, ceph-devel@vger.kernel.org Cc: jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com, lhenriques@suse.de, Xiubo Li Subject: [PATCH v18 71/71] ceph: switch ceph_open_atomic() to use the new fscrypt helper Date: Wed, 12 Apr 2023 19:09:30 +0800 Message-Id: <20230412110930.176835-72-xiubli@redhat.com> In-Reply-To: <20230412110930.176835-1-xiubli@redhat.com> References: <20230412110930.176835-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Luís Henriques Switch ceph_atomic_open() to use new fscrypt helper function fscrypt_prepare_lookup_partial(). This fixes a bug in the atomic open operation where a dentry is incorrectly set with DCACHE_NOKEY_NAME when 'dir' has been evicted but the key is still available (for example, where there's a drop_caches). Tested-by: Luís Henriques Tested-by: Venky Shankar Signed-off-by: Luís Henriques Signed-off-by: Xiubo Li --- fs/ceph/file.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 317087ea017e..9e74ed673f93 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -791,11 +791,9 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, ihold(dir); if (IS_ENCRYPTED(dir)) { set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); - if (!fscrypt_has_encryption_key(dir)) { - spin_lock(&dentry->d_lock); - dentry->d_flags |= DCACHE_NOKEY_NAME; - spin_unlock(&dentry->d_lock); - } + err = fscrypt_prepare_lookup_partial(dir, dentry); + if (err < 0) + goto out_req; } if (flags & O_CREAT) {