From patchwork Fri May 26 21:41:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13257340
From: David Howells
To: Christoph Hellwig, David Hildenbrand, Lorenzo Stoakes
Cc: David Howells, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
 Jeff Layton, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
 Christian Brauner, Linus Torvalds, linux-fsdevel@vger.kernel.org,
 linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Andrew Morton
Subject: [PATCH v4 3/3] block: Use iov_iter_extract_pages() and page pinning in direct-io.c
Date: Fri, 26 May 2023 22:41:42 +0100
Message-Id: <20230526214142.958751-4-dhowells@redhat.com>
In-Reply-To: <20230526214142.958751-1-dhowells@redhat.com>
References: <20230526214142.958751-1-dhowells@redhat.com>

Change the old block-based direct-I/O code to use iov_iter_extract_pages()
to pin user pages or leave kernel pages unpinned rather than taking
refs when submitting bios. This makes use of the preceding patches to
not take pins on the zero page (thereby allowing insertion of zero pages
in with pinned pages) and to get additional pins on pages, allowing an
extracted page to be used in multiple bios without having to re-extract
it.

Signed-off-by: David Howells
cc: Christoph Hellwig
cc: David Hildenbrand
cc: Lorenzo Stoakes
cc: Andrew Morton
cc: Jens Axboe
cc: Al Viro
cc: Matthew Wilcox
cc: Jan Kara
cc: Jeff Layton
cc: Jason Gunthorpe
cc: Logan Gunthorpe
cc: Hillf Danton
cc: Christian Brauner
cc: Linus Torvalds
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
Reviewed-by: Christoph Hellwig
---

Notes:
    ver #3)
     - Rename need_unpin to is_pinned in struct dio.
     - page_get_additional_pin() was renamed to folio_add_pin().

    ver #2)
     - Need to set BIO_PAGE_PINNED conditionally, not BIO_PAGE_REFFED.

 fs/direct-io.c | 72 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index ad20f3428bab..0643f1bb4b59 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -42,8 +42,8 @@
 #include "internal.h"
 
 /*
- * How many user pages to map in one call to get_user_pages(). This determines
- * the size of a structure in the slab cache
+ * How many user pages to map in one call to iov_iter_extract_pages(). This
+ * determines the size of a structure in the slab cache
  */
 #define DIO_PAGES	64
 
@@ -121,12 +121,13 @@ struct dio {
 	struct inode *inode;
 	loff_t i_size;			/* i_size when submitted */
 	dio_iodone_t *end_io;		/* IO completion function */
+	bool is_pinned;			/* T if we have pins on the pages */
 
 	void *private;			/* copy from map_bh.b_private */
 
 	/* BIO completion state */
 	spinlock_t bio_lock;		/* protects BIO fields below */
-	int page_errors;		/* errno from get_user_pages() */
+	int page_errors;		/* err from iov_iter_extract_pages() */
 	int is_async;			/* is IO async ? */
 	bool defer_completion;		/* defer AIO completion to workqueue? */
 	bool should_dirty;		/* if pages should be dirtied */
@@ -165,14 +166,14 @@ static inline unsigned dio_pages_present(struct dio_submit *sdio)
  */
 static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 {
+	struct page **pages = dio->pages;
 	const enum req_op dio_op = dio->opf & REQ_OP_MASK;
 	ssize_t ret;
 
-	ret = iov_iter_get_pages2(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
-				&sdio->from);
+	ret = iov_iter_extract_pages(sdio->iter, &pages, LONG_MAX,
+				     DIO_PAGES, 0, &sdio->from);
 
 	if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) {
-		struct page *page = ZERO_PAGE(0);
 		/*
 		 * A memory fault, but the filesystem has some outstanding
 		 * mapped blocks. We need to use those blocks up to avoid
@@ -180,8 +181,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 		 */
 		if (dio->page_errors == 0)
 			dio->page_errors = ret;
-		get_page(page);
-		dio->pages[0] = page;
+		dio->pages[0] = ZERO_PAGE(0);
 		sdio->head = 0;
 		sdio->tail = 1;
 		sdio->from = 0;
@@ -201,9 +201,9 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 
 /*
  * Get another userspace page. Returns an ERR_PTR on error. Pages are
- * buffered inside the dio so that we can call get_user_pages() against a
- * decent number of pages, less frequently. To provide nicer use of the
- * L1 cache.
+ * buffered inside the dio so that we can call iov_iter_extract_pages()
+ * against a decent number of pages, less frequently. To provide nicer use of
+ * the L1 cache.
  */
 static inline struct page *dio_get_page(struct dio *dio,
 					struct dio_submit *sdio)
@@ -219,6 +219,18 @@ static inline struct page *dio_get_page(struct dio *dio,
 	return dio->pages[sdio->head];
 }
 
+static void dio_pin_page(struct dio *dio, struct page *page)
+{
+	if (dio->is_pinned)
+		folio_add_pin(page_folio(page));
+}
+
+static void dio_unpin_page(struct dio *dio, struct page *page)
+{
+	if (dio->is_pinned)
+		unpin_user_page(page);
+}
+
 /*
  * dio_complete() - called when all DIO BIO I/O has been completed
  *
@@ -402,8 +414,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 		bio->bi_end_io = dio_bio_end_aio;
 	else
 		bio->bi_end_io = dio_bio_end_io;
-	/* for now require references for all pages */
-	bio_set_flag(bio, BIO_PAGE_REFFED);
+	if (dio->is_pinned)
+		bio_set_flag(bio, BIO_PAGE_PINNED);
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
@@ -444,8 +456,9 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
  */
 static inline void dio_cleanup(struct dio *dio, struct dio_submit *sdio)
 {
-	while (sdio->head < sdio->tail)
-		put_page(dio->pages[sdio->head++]);
+	if (dio->is_pinned)
+		unpin_user_pages(dio->pages + sdio->head,
+				 sdio->tail - sdio->head);
 }
 
 /*
@@ -676,7 +689,7 @@ static inline int dio_new_bio(struct dio *dio, struct dio_submit *sdio,
  *
  * Return zero on success. Non-zero means the caller needs to start a new BIO.
  */
-static inline int dio_bio_add_page(struct dio_submit *sdio)
+static inline int dio_bio_add_page(struct dio *dio, struct dio_submit *sdio)
 {
 	int ret;
 
@@ -688,7 +701,7 @@ static inline int dio_bio_add_page(struct dio_submit *sdio)
 		 */
 		if ((sdio->cur_page_len + sdio->cur_page_offset) == PAGE_SIZE)
 			sdio->pages_in_io--;
-		get_page(sdio->cur_page);
+		dio_pin_page(dio, sdio->cur_page);
 		sdio->final_block_in_bio = sdio->cur_page_block +
 			(sdio->cur_page_len >> sdio->blkbits);
 		ret = 0;
@@ -743,11 +756,11 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
 			goto out;
 	}
 
-	if (dio_bio_add_page(sdio) != 0) {
+	if (dio_bio_add_page(dio, sdio) != 0) {
 		dio_bio_submit(dio, sdio);
 		ret = dio_new_bio(dio, sdio, sdio->cur_page_block, map_bh);
 		if (ret == 0) {
-			ret = dio_bio_add_page(sdio);
+			ret = dio_bio_add_page(dio, sdio);
 			BUG_ON(ret != 0);
 		}
 	}
@@ -804,13 +817,13 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 	 */
 	if (sdio->cur_page) {
 		ret = dio_send_cur_page(dio, sdio, map_bh);
-		put_page(sdio->cur_page);
+		dio_unpin_page(dio, sdio->cur_page);
 		sdio->cur_page = NULL;
 		if (ret)
 			return ret;
 	}
 
-	get_page(page);		/* It is in dio */
+	dio_pin_page(dio, page);		/* It is in dio */
 	sdio->cur_page = page;
 	sdio->cur_page_offset = offset;
 	sdio->cur_page_len = len;
@@ -825,7 +838,7 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 		ret = dio_send_cur_page(dio, sdio, map_bh);
 		if (sdio->bio)
 			dio_bio_submit(dio, sdio);
-		put_page(sdio->cur_page);
+		dio_unpin_page(dio, sdio->cur_page);
 		sdio->cur_page = NULL;
 	}
 	return ret;
@@ -926,7 +939,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 
 				ret = get_more_blocks(dio, sdio, map_bh);
 				if (ret) {
-					put_page(page);
+					dio_unpin_page(dio, page);
 					goto out;
 				}
 				if (!buffer_mapped(map_bh))
@@ -971,7 +984,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 
 				/* AKPM: eargh, -ENOTBLK is a hack */
 				if (dio_op == REQ_OP_WRITE) {
-					put_page(page);
+					dio_unpin_page(dio, page);
 					return -ENOTBLK;
 				}
 
@@ -984,7 +997,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 				if (sdio->block_in_file >=
 						i_size_aligned >> blkbits) {
 					/* We hit eof */
-					put_page(page);
+					dio_unpin_page(dio, page);
 					goto out;
 				}
 				zero_user(page, from, 1 << blkbits);
@@ -1024,7 +1037,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 						  sdio->next_block_for_io,
 						  map_bh);
 			if (ret) {
-				put_page(page);
+				dio_unpin_page(dio, page);
 				goto out;
 			}
 			sdio->next_block_for_io += this_chunk_blocks;
@@ -1039,8 +1052,8 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 				break;
 		}
 
-		/* Drop the ref which was taken in get_user_pages() */
-		put_page(page);
+		/* Drop the pin which was taken in iov_iter_extract_pages() */
+		dio_unpin_page(dio, page);
 	}
 out:
 	return ret;
@@ -1135,6 +1148,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		/* will be released by direct_io_worker */
 		inode_lock(inode);
 	}
+	dio->is_pinned = iov_iter_extract_will_pin(iter);
 
 	/* Once we sampled i_size check for reads beyond EOF */
 	dio->i_size = i_size_read(inode);
@@ -1259,7 +1273,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		ret2 = dio_send_cur_page(dio, &sdio, &map_bh);
 		if (retval == 0)
 			retval = ret2;
-		put_page(sdio.cur_page);
+		dio_unpin_page(dio, sdio.cur_page);
 		sdio.cur_page = NULL;
 	}
 	if (sdio.bio)
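
Illustration only, not part of the patch: a minimal userspace sketch of the pin accounting that dio_pin_page() and dio_unpin_page() are meant to keep balanced. All model_* names are hypothetical stand-ins rather than kernel types, and the assumption that bio completion drops one pin per page of a BIO_PAGE_PINNED bio comes from the description of the preceding patches, not from this diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pin count kept on a page/folio. */
struct model_page {
	int pincount;
};

/* Hypothetical stand-in for the one struct dio field that matters here. */
struct model_dio {
	bool is_pinned;		/* models dio->is_pinned */
};

/* Models dio_pin_page(): only take an extra pin if extraction pinned. */
static void model_dio_pin_page(struct model_dio *dio, struct model_page *page)
{
	if (dio->is_pinned)
		page->pincount++;
}

/* Models dio_unpin_page(): only drop a pin if extraction pinned. */
static void model_dio_unpin_page(struct model_dio *dio, struct model_page *page)
{
	if (dio->is_pinned) {
		assert(page->pincount > 0);	/* never drop a pin we lack */
		page->pincount--;
	}
}

/*
 * Walk one page through the sequence the patch sets up and return the
 * pin count left at the end.  user_backed models the result of
 * iov_iter_extract_will_pin(): true for user memory, false for kernel
 * memory, where no pins are taken at all.
 */
static int model_page_lifecycle(bool user_backed)
{
	struct model_dio dio = { .is_pinned = user_backed };
	struct model_page page = { .pincount = 0 };

	/* Extraction pins user pages only. */
	if (user_backed)
		page.pincount = 1;

	model_dio_pin_page(&dio, &page);	/* submit_page_section(): cur_page */
	model_dio_unpin_page(&dio, &page);	/* do_direct_IO(): extraction pin */
	model_dio_pin_page(&dio, &page);	/* dio_bio_add_page(): for the bio */
	model_dio_unpin_page(&dio, &page);	/* cur_page released */
	model_dio_unpin_page(&dio, &page);	/* assumed: bio completion unpins */

	return page.pincount;
}
```

Calling model_page_lifecycle(true) and model_page_lifecycle(false) should both leave the count at zero: every conditional pin has a matching conditional unpin, and kernel-backed iterators never touch the count.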