From patchwork Sun Oct 27 14:21:55 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leon Romanovsky <leon@kernel.org>
X-Patchwork-Id: 13852662
From: Leon Romanovsky <leon@kernel.org>
To: Jens Axboe, Jason Gunthorpe, Robin Murphy, Joerg Roedel, Will Deacon,
	Christoph Hellwig, Sagi Grimberg
Cc: Keith Busch, Bjorn Helgaas, Logan Gunthorpe, Yishai Hadas,
	Shameer Kolothum, Kevin Tian, Alex Williamson, Marek Szyprowski,
	Jérôme Glisse, Andrew Morton, Jonathan Corbet,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
	linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH 2/7] block: don't merge different kinds of P2P transfers
 in a single bio
Date: Sun, 27 Oct 2024 16:21:55 +0200
Message-ID: <34d44537a65aba6ede215a8ad882aeee028b423a.1730037261.git.leon@kernel.org>
X-Mailer: git-send-email 2.46.2
In-Reply-To:
References:
From: Christoph Hellwig

To get the dma mapping helpers out of having to check every segment for
its P2P status, ensure that a bio contains either only P2P transfers or
only non-P2P transfers, and that a P2P bio only contains ranges from a
single device.

This means we do the page zone access in the bio add path, where the
struct page should still be cache hot, and only have to do the fairly
expensive P2P topology lookup once per bio down in the dma mapping
path, and only for already-marked bios.

Signed-off-by: Christoph Hellwig
Signed-off-by: Leon Romanovsky
Reviewed-by: Logan Gunthorpe
---
 block/bio.c               | 36 +++++++++++++++++++++++++++++-------
 block/blk-map.c           | 32 ++++++++++++++++++++++++--------
 include/linux/blk_types.h |  2 ++
 3 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 2d3bc8bfb071..943a6d78cb3e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -928,8 +928,6 @@ static bool bvec_try_merge_page(struct bio_vec *bv, struct page *page,
 		return false;
 	if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
 		return false;
-	if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
-		return false;
 
 	*same_page = ((vec_end_addr & PAGE_MASK) == ((page_addr + off) &
 		     PAGE_MASK));
@@ -993,6 +991,14 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 	if (bio->bi_vcnt > 0) {
 		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
+		/*
+		 * When doing ZONE_DEVICE-based P2P transfers, all pages in a
+		 * bio must be P2P pages from the same device.
+		 */
+		if ((bio->bi_opf & REQ_P2PDMA) &&
+		    !zone_device_pages_have_same_pgmap(bv->bv_page, page))
+			return 0;
+
 		if (bvec_try_merge_hw_page(q, bv, page, len, offset,
 				same_page)) {
 			bio->bi_iter.bi_size += len;
@@ -1009,6 +1015,9 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		 */
 		if (bvec_gap_to_prev(&q->limits, bv, offset))
 			return 0;
+	} else {
+		if (is_pci_p2pdma_page(page))
+			bio->bi_opf |= REQ_P2PDMA | REQ_NOMERGE;
 	}
 
 	bvec_set_page(&bio->bi_io_vec[bio->bi_vcnt], page, len, offset);
@@ -1133,11 +1142,24 @@ static int bio_add_page_int(struct bio *bio, struct page *page,
 	if (bio->bi_iter.bi_size > UINT_MAX - len)
 		return 0;
 
-	if (bio->bi_vcnt > 0 &&
-	    bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
-				page, len, offset, same_page)) {
-		bio->bi_iter.bi_size += len;
-		return len;
+	if (bio->bi_vcnt > 0) {
+		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+
+		/*
+		 * When doing ZONE_DEVICE-based P2P transfers, all pages in a
+		 * bio must be P2P pages from the same device.
+		 */
+		if ((bio->bi_opf & REQ_P2PDMA) &&
+		    !zone_device_pages_have_same_pgmap(bv->bv_page, page))
+			return 0;
+
+		if (bvec_try_merge_page(bv, page, len, offset, same_page)) {
+			bio->bi_iter.bi_size += len;
+			return len;
+		}
+	} else {
+		if (is_pci_p2pdma_page(page))
+			bio->bi_opf |= REQ_P2PDMA | REQ_NOMERGE;
 	}
 
 	if (bio->bi_vcnt >= bio->bi_max_vecs)
diff --git a/block/blk-map.c b/block/blk-map.c
index 0e1167b23934..03192b1ca6ea 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -568,6 +568,7 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
 	const struct queue_limits *lim = &q->limits;
 	unsigned int nsegs = 0, bytes = 0;
 	struct bio *bio;
+	int error;
 	size_t i;
 
 	if (!nr_iter || (nr_iter >> SECTOR_SHIFT) > queue_max_hw_sectors(q))
@@ -588,15 +589,30 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
 	for (i = 0; i < nr_segs; i++) {
 		struct bio_vec *bv = &bvecs[i];
 
-		/*
-		 * If the queue doesn't support SG gaps and adding this
-		 * offset would create a gap, fallback to copy.
-		 */
-		if (bvprvp && bvec_gap_to_prev(lim, bvprvp, bv->bv_offset)) {
-			blk_mq_map_bio_put(bio);
-			return -EREMOTEIO;
+		error = -EREMOTEIO;
+		if (bvprvp) {
+			/*
+			 * If the queue doesn't support SG gaps and adding this
+			 * offset would create a gap, fallback to copy.
+			 */
+			if (bvec_gap_to_prev(lim, bvprvp, bv->bv_offset))
+				goto put_bio;
+
+			/*
+			 * When doing ZONE_DEVICE-based P2P transfers, all pages
+			 * in a bio must be P2P pages, and from the same device.
+			 */
+			if ((bio->bi_opf & REQ_P2PDMA) &&
+			    !zone_device_pages_have_same_pgmap(bvprvp->bv_page,
+							       bv->bv_page))
+				goto put_bio;
+		} else {
+			if (is_pci_p2pdma_page(bv->bv_page))
+				bio->bi_opf |= REQ_P2PDMA | REQ_NOMERGE;
 		}
 
+		/* check full condition */
+		error = -EINVAL;
 		if (nsegs >= nr_segs || bytes > UINT_MAX - bv->bv_len)
 			goto put_bio;
 		if (bytes + bv->bv_len > nr_iter)
@@ -611,7 +627,7 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
 	return 0;
 put_bio:
 	blk_mq_map_bio_put(bio);
-	return -EINVAL;
+	return error;
 }
 
 /**
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dce7615c35e7..94cf146e8ce6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -378,6 +378,7 @@ enum req_flag_bits {
 	__REQ_DRV,		/* for driver use */
 	__REQ_FS_PRIVATE,	/* for file system (submitter) use */
 	__REQ_ATOMIC,		/* for atomic write operations */
+	__REQ_P2PDMA,		/* contains P2P DMA pages */
 	/*
 	 * Command specific flags, keep last:
 	 */
@@ -410,6 +411,7 @@ enum req_flag_bits {
 #define REQ_DRV		(__force blk_opf_t)(1ULL << __REQ_DRV)
 #define REQ_FS_PRIVATE	(__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
 #define REQ_ATOMIC	(__force blk_opf_t)(1ULL << __REQ_ATOMIC)
+#define REQ_P2PDMA	(__force blk_opf_t)(1ULL << __REQ_P2PDMA)
 
 #define REQ_NOUNMAP	(__force blk_opf_t)(1ULL << __REQ_NOUNMAP)
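
The policy the patch enforces can be sketched as a stand-alone model: the first page added decides whether a bio is a P2P bio (setting REQ_P2PDMA), and once flagged, every later page must come from the same pgmap. The types and helper names below mirror the kernel ones but are simplified mocks for illustration only, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock of struct page: only the pgmap identity matters for this model.
 * pgmap is NULL for ordinary host memory, non-NULL for P2P device memory. */
struct page {
	const void *pgmap;
};

enum { REQ_P2PDMA = 1u << 0, REQ_NOMERGE = 1u << 1 };

/* Mock bio: last_page stands in for bi_io_vec[bi_vcnt - 1].bv_page. */
struct bio {
	unsigned int bi_opf;
	size_t bi_vcnt;
	struct page *last_page;
};

static bool is_pci_p2pdma_page(const struct page *p)
{
	return p->pgmap != NULL;
}

static bool zone_device_pages_have_same_pgmap(const struct page *a,
					      const struct page *b)
{
	return a->pgmap == b->pgmap;
}

/* Returns true if the page may join the bio under the patch's rule. */
static bool bio_may_add_page(struct bio *bio, struct page *page)
{
	if (bio->bi_vcnt > 0) {
		/* A P2P bio only accepts pages from the same device. */
		if ((bio->bi_opf & REQ_P2PDMA) &&
		    !zone_device_pages_have_same_pgmap(bio->last_page, page))
			return false;
	} else if (is_pci_p2pdma_page(page)) {
		/* The first page decides the bio's kind. */
		bio->bi_opf |= REQ_P2PDMA | REQ_NOMERGE;
	}
	bio->last_page = page;
	bio->bi_vcnt++;
	return true;
}
```

Note the asymmetry, faithful to the patch: the pgmap comparison runs only on bios already marked REQ_P2PDMA, so the expensive per-segment checks in the dma mapping path can be skipped for unmarked bios.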