From patchwork Wed Dec 9 02:19:51 2020
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 11960169
From: Pavel Begunkov
To: Jens Axboe
Cc: Alexander Viro, linux-fsdevel@vger.kernel.org,
 io-uring@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] iov: introduce ITER_BVEC_FLAG_FIXED
Date: Wed, 9 Dec 2020 02:19:51 +0000
X-Mailer: git-send-email 2.24.0
X-Mailing-List: linux-block@vger.kernel.org

Add the ITER_BVEC_FLAG_FIXED iov iter flag, which will allow us to reuse
a passed-in bvec instead of copying it.
In particular, it means that iter->bvec won't be freed and that the page
references taken stay held until callees no longer need them, including
during asynchronous execution.

Signed-off-by: Pavel Begunkov
Signed-off-by: Christoph Hellwig
---
 fs/io_uring.c       |  1 +
 include/linux/uio.h | 14 +++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c536462920a3..9ff2805d0075 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2920,6 +2920,7 @@ static ssize_t io_import_fixed(struct io_kiocb *req, int rw,
 		}
 	}
 
+	iter->type |= ITER_BVEC_FLAG_FIXED;
 	return len;
 }
 
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 72d88566694e..af626eb970cf 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -18,6 +18,8 @@ struct kvec {
 };
 
 enum iter_type {
+	ITER_BVEC_FLAG_FIXED = 2,
+
 	/* iter types */
 	ITER_IOVEC = 4,
 	ITER_KVEC = 8,
@@ -29,8 +31,9 @@ enum iter_type {
 struct iov_iter {
 	/*
 	 * Bit 0 is the read/write bit, set if we're writing.
-	 * Bit 1 is the BVEC_FLAG_NO_REF bit, set if type is a bvec and
-	 * the caller isn't expecting to drop a page reference when done.
+	 * Bit 1 is the BVEC_FLAG_FIXED bit, set if type is a bvec and the
+	 * caller ensures that page references and memory backing bvec won't
+	 * go away until callees finish with them.
 	 */
 	unsigned int type;
 	size_t iov_offset;
@@ -52,7 +55,7 @@ struct iov_iter {
 
 static inline enum iter_type iov_iter_type(const struct iov_iter *i)
 {
-	return i->type & ~(READ | WRITE);
+	return i->type & ~(READ | WRITE | ITER_BVEC_FLAG_FIXED);
 }
 
 static inline bool iter_is_iovec(const struct iov_iter *i)
@@ -85,6 +88,11 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 	return i->type & (READ | WRITE);
 }
 
+static inline unsigned char iov_iter_bvec_fixed(const struct iov_iter *i)
+{
+	return i->type & ITER_BVEC_FLAG_FIXED;
+}
+
 /*
  * Total number of bytes covered by an iovec.
  *

From patchwork Wed Dec 9 02:19:52 2020
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 11960173
From: Pavel Begunkov
To: Jens Axboe
Cc: Alexander Viro, linux-fsdevel@vger.kernel.org,
 io-uring@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, Matthew Wilcox
Subject: [PATCH 2/2] block: no-copy bvec for direct IO
Date: Wed, 9 Dec 2020 02:19:52 +0000
Message-Id: <51905c4fcb222e14a1d5cb676364c1b4f177f582.1607477897.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.24.0
X-Mailing-List: linux-block@vger.kernel.org

The block layer spends
quite a while in blkdev_direct_IO() copying and initialising the bio's
bvec. However, if the input iterator already carries a bvec, it can be
reused in some cases, namely when the new ITER_BVEC_FLAG_FIXED flag is
set. Simple tests show a considerable performance boost, and it also
reduces the memory footprint.

Suggested-by: Matthew Wilcox
[BIO_WORKINGSET] Suggested-by: Johannes Weiner
Signed-off-by: Pavel Begunkov
---
 fs/block_dev.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index d699f3af1a09..aee5d2e4f324 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -349,6 +349,28 @@ static void blkdev_bio_end_io(struct bio *bio)
 	}
 }
 
+static int bio_iov_fixed_bvec_get_pages(struct bio *bio, struct iov_iter *iter)
+{
+	bio->bi_vcnt = iter->nr_segs;
+	bio->bi_max_vecs = iter->nr_segs;
+	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
+	bio->bi_iter.bi_bvec_done = iter->iov_offset;
+	bio->bi_iter.bi_size = iter->count;
+
+	/*
+	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
+	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
+	 * approximate it by looking at the first page and inducing it to the
+	 * whole bio
+	 */
+	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
+		bio_set_flag(bio, BIO_WORKINGSET);
+	bio_set_flag(bio, BIO_NO_PAGE_REF);
+
+	iter->count = 0;
+	return 0;
+}
+
 static ssize_t
 __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
 {
@@ -368,6 +390,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
 	    (bdev_logical_block_size(bdev) - 1))
 		return -EINVAL;
 
+	if (iov_iter_bvec_fixed(iter))
+		nr_vecs = 0;
 	bio = bio_alloc_bioset(GFP_KERNEL, nr_vecs, &blkdev_dio_pool);
 
 	dio = container_of(bio, struct blkdev_dio, bio);
@@ -398,7 +422,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
 
-		ret = bio_iov_iter_get_pages(bio, iter);
+		if (iov_iter_is_bvec(iter) && iov_iter_bvec_fixed(iter))
+			ret = bio_iov_fixed_bvec_get_pages(bio, iter);
+		else
+			ret = bio_iov_iter_get_pages(bio, iter);
+
 		if (unlikely(ret)) {
 			bio->bi_status = BLK_STS_IOERR;
 			bio_endio(bio);
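
For readers who want to poke at the bit layout outside the kernel, here is
a rough userspace model of how the series packs the read/write bit, the
new "fixed" flag and the iterator type into one word, and of the condition
patch 2/2 uses to pick the no-copy path. Everything prefixed demo_/DEMO_
is made up for illustration; the numeric values for the flag, IOVEC and
KVEC follow the uio.h hunk in patch 1/2, the value 16 for BVEC is an
assumption, and none of this is kernel code.

```c
#include <assert.h>

/* Userspace model only; all demo_/DEMO_ names are hypothetical. */
enum { DEMO_READ = 0, DEMO_WRITE = 1 };	/* bit 0: direction */

enum demo_iter_type {
	DEMO_ITER_BVEC_FLAG_FIXED = 2,	/* bit 1: caller keeps bvec and pages alive */
	DEMO_ITER_IOVEC = 4,
	DEMO_ITER_KVEC = 8,
	DEMO_ITER_BVEC = 16,		/* assumed value, not shown in the hunk */
};

struct demo_iter {
	unsigned int type;		/* direction | flag | iterator type */
};

/* Iterator type with the direction bit and the fixed flag masked out. */
static unsigned int demo_iter_type(const struct demo_iter *i)
{
	return i->type & ~(DEMO_READ | DEMO_WRITE | DEMO_ITER_BVEC_FLAG_FIXED);
}

/* Direction bit only. */
static unsigned int demo_iter_rw(const struct demo_iter *i)
{
	return i->type & (DEMO_READ | DEMO_WRITE);
}

/* Non-zero when the caller promised the bvec stays valid. */
static unsigned int demo_iter_bvec_fixed(const struct demo_iter *i)
{
	return i->type & DEMO_ITER_BVEC_FLAG_FIXED;
}

/* Same shape as the dispatch test added to __blkdev_direct_IO(). */
static int demo_can_reuse_bvec(const struct demo_iter *i)
{
	return demo_iter_type(i) == DEMO_ITER_BVEC &&
	       demo_iter_bvec_fixed(i) != 0;
}

/* Small usage example: setting the flag must not disturb type or direction. */
static void demo_check(void)
{
	struct demo_iter it = { .type = DEMO_ITER_BVEC | DEMO_WRITE };

	assert(!demo_can_reuse_bvec(&it));
	it.type |= DEMO_ITER_BVEC_FLAG_FIXED;
	assert(demo_can_reuse_bvec(&it));
	assert(demo_iter_type(&it) == DEMO_ITER_BVEC);
	assert(demo_iter_rw(&it) == DEMO_WRITE);
}
```

The point of the model is the mask in demo_iter_type(): without the extra
flag bit in the mask, a fixed bvec iterator would compare as type 18 and
stop matching the BVEC type checks.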