From patchwork Fri Jan 26 13:29:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13532583 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 32AFFC47DDF for ; Fri, 26 Jan 2024 13:30:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B7D856B0096; Fri, 26 Jan 2024 08:30:31 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id ADEE76B0098; Fri, 26 Jan 2024 08:30:31 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9306B6B0099; Fri, 26 Jan 2024 08:30:31 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 80C6F6B0096 for ; Fri, 26 Jan 2024 08:30:31 -0500 (EST) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 59F74C073E for ; Fri, 26 Jan 2024 13:30:31 +0000 (UTC) X-FDA: 81721546662.20.100E8DB Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) by imf23.hostedemail.com (Postfix) with ESMTP id 9DF10140019 for ; Fri, 26 Jan 2024 13:30:29 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=infradead.org header.s=bombadil.20210309 header.b=teotdbtZ; dmarc=none; spf=none (imf23.hostedemail.com: domain of BATV+8230b42af99c397292d7+7460+infradead.org+hch@bombadil.srs.infradead.org has no SPF policy when checking 198.137.202.133) smtp.mailfrom=BATV+8230b42af99c397292d7+7460+infradead.org+hch@bombadil.srs.infradead.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706275829; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=R65vFoRwXEcbpDPh9sk2joksz0b+qXxKQPYH8JdojnY=; b=20TgwgtJ5i8KLdm5JRveP1BQDHK5jUdPdMAxZTgfHCHm4ONmO+lQWHgeYzaj7Ulf4hg44Y +bidXYA0DEUaSjltj1TwRkT41yNF1COblPL5N5b1T2tbyTOoklz3WKpT7gu4TTEICUvVH+ ka8gt6lVmhSkuZf4Y0MkEbvGZJD6hlg= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=infradead.org header.s=bombadil.20210309 header.b=teotdbtZ; dmarc=none; spf=none (imf23.hostedemail.com: domain of BATV+8230b42af99c397292d7+7460+infradead.org+hch@bombadil.srs.infradead.org has no SPF policy when checking 198.137.202.133) smtp.mailfrom=BATV+8230b42af99c397292d7+7460+infradead.org+hch@bombadil.srs.infradead.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706275829; a=rsa-sha256; cv=none; b=Gt7BhsqaoJh+Fe8c8swssgtURv7GZhFJclZu5bI6WKvZ18k3W8PZKUyfgTapIKuQqGhFWf 4ILKgDixUqBuCjP4zdzdkFmOlzwRTG40gKVCyS3I+yOOVjaNJQTpC9t9oyCCSeDUjrwzDM vIPCM1IIpRH5dQvSvW74OOImSiDifgI= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=R65vFoRwXEcbpDPh9sk2joksz0b+qXxKQPYH8JdojnY=; b=teotdbtZ3zhU9LE56kgDgixM/1 A5j31X1bYN9n/qobJsPEBAw4kLlLRWVtxijErFpO6jCBuoVtZBvUwayCJiojz+r492ZmlmOcizxWF QtXk0ir5sPVD+4CTcmisar6jYiNoFIn25ypY7OQW9mmelkhur/mabcYPDlgTRESfUxj0Hzt/4HKY8 my8EkmTVdLD6/qfP+RMzHHazrgZvm4hO/ZaqWx0C+CWnJZcXDPYfISHMmPQKvHKIPCgWXVMcjDit2 18yAaKw/KY5FNOkQbQQZBbmEYEZxUAJQXbbMZeLrylPm42i2swKTgNJZr2calu6A0O8b4WhGPBgRJ s3YzbtvQ==; Received: from 2a02-8389-2341-5b80-39d3-4735-9a3c-88d8.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:39d3:4735:9a3c:88d8] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.97.1 #2 (Red Hat Linux)) id 1rTMHq-00000004Czt-0kpA; Fri, 26 Jan 2024 13:30:26 +0000 From: Christoph Hellwig To: Chandan Babu R , "Darrick J. Wong" , Hugh Dickins , Andrew Morton Cc: linux-xfs@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 20/21] xfs: convert xfarray_pagesort to deal with large folios Date: Fri, 26 Jan 2024 14:29:02 +0100 Message-Id: <20240126132903.2700077-21-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240126132903.2700077-1-hch@lst.de> References: <20240126132903.2700077-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html X-Rspamd-Queue-Id: 9DF10140019 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: 85hfkygapyx6rgxrrhqzqjrhzmkby4io X-HE-Tag: 1706275829-739874 X-HE-Meta: U2FsdGVkX19wmdIdIUrQFOid6nSd9JuKAP5rxkQ8VOxd0j96UgJuY643ph/fq58pcAfqtAjeVnHd6BoPgdWFI+EbTQBa2JOGgdbDqAdTTzM6uMXM+HMj2aNvL3G2/7LU/Zbx8dr+uWlTmVHFTkuNtHR+uZTRzsWkknro9KqSbhlDmplrtfnE8CX++psN+jUJQTWhYSWGA/B+1tTbifRPWn8DNTwO2kvoAEAMI7X6m2594z+a//XsnQeMhIOpHy0H8+XK5TBlhbLrOxSg26BPfbScJmXeUdF6b7jaWpexs7dP2hUEIX99Gb7NS3LAAOf7BpvN9FMLo+z7KIkbPFGv5Tz7g+AgTQLNHbpMyH15D3BgMaI6WKpT4PVo1S4UHUYSoeYOWvrvK8crEb0E7UC1sGDxI2AOGaSN9jlk+j5EKZTicwHTQmgX/YMOBItTK+/XLbl6TVnTz/0akDb16dsAiBPXBxc2/lSsfDyyhCXFp5I9a6bgp2qkg8yqieHzhM+TaPWQAzHGK1sEXVZM9Xx1G1beqJGO5dPSETH7XrvQHfnHCpbIS7bzrXwgcqR1W85ZnVryPLmueC/ldXgWz1xFd3TqnDlrhENJAVbLEyVBnhTAXjA+1rDJ3LqdLKkZ9bT/ifRnx9bTCHegknb0H2hYkoeKGXvhigT1WDfGjPnHsIkKPRGRP0oc5MCmoIjJ9wwRYUwptENRRiRjpJ7yAvFqXZJchbsVg63rqYWkpFS+Zigh2Aj9D/Snam9sg04LT1qB5ONdsmUKRI/V8ftveHshOP1ciIqtO6GSnjx95z45jcXLndjz+45Qsb8iKYGoBMkY8ZjOkJGtF2Q5V1WZB+EuS04I+QKxtgylda/eb/lUD4WCjCHCf4to2ivLcwA0qAIajDkL93aljHIQuIIocjt+lbrkLgkPkN4FcYDvWuVBD2Mrq/CquA6DV1fFYrxbrudkhen2M+CD2Jt/BjwnX+u 5gImafOp QWYZELD5jvqly9NHfQgFOcu6xcuC/qIUWvDVW4PV5vQDqXGiIxrra7ZvCKHcoyENlMyb0K/sRZH2SN0ITYfAMkzsWAoflYQt9QfO4vQgt6Re5cviWr+KL4i1nZp7PEZFCnuX1YxHhhOhL/5A2AzRf0bQXWe+tRDHg+/XUQ783GG6Dp42cDC2GYZ+nU8wBrhiw7CVtaFieAdESB4i1SyIQ60InXnFp1n3JD3TBe+kahSY80vGhirCGlJ2Fbk8ajDq//iy3rfeqT40qNTvqPfe9ReT63RL53+5QZJxLT49DMk/AmrzaSA3lp62iC/TVJjiU4OiVowKHwmQOcbajyJwU+S0n2jLRkMG4k91d0W0ItkQtEQpw1XmLKyjOTF0/EADXdgY02LdXgjFu6X9xNPpw605960mDLFEsJIKjeOVgPW62dR23lZ0ILpBktseFU8IyewhqvDNM+o02t5P8jJBfRMI0PjE9I0kVbn5TNK4g/fPRE8mO+xXQRYnggxQvRW+gxcfN2snF9LVOvZ8= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: From: "Darrick J. Wong" Convert xfarray_pagesort to handle large folios by introducing a new xfile_get_folio routine that can return a folio of arbitrary size, and using heapsort on the full folio. This also corrects an off-by-one bug in the calculation of len in xfarray_pagesort that was papered over by xfarray_want_pagesort. Signed-off-by: Darrick J. Wong Signed-off-by: Christoph Hellwig Reviewed-by: Kent Overstreet --- fs/xfs/scrub/trace.h | 43 ++++++++- fs/xfs/scrub/xfarray.c | 201 +++++++++++++++++++---------------------- fs/xfs/scrub/xfarray.h | 10 +- 3 files changed, 143 insertions(+), 111 deletions(-) diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index c61fa7a95ef522..3a1a827828dcb9 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -956,7 +956,7 @@ TRACE_EVENT(xfarray_isort, __entry->hi - __entry->lo) ); -TRACE_EVENT(xfarray_pagesort, +TRACE_EVENT(xfarray_foliosort, TP_PROTO(struct xfarray_sortinfo *si, uint64_t lo, uint64_t hi), TP_ARGS(si, lo, hi), TP_STRUCT__entry( @@ -1027,6 +1027,47 @@ TRACE_EVENT(xfarray_sort, __entry->bytes) ); +TRACE_EVENT(xfarray_sort_scan, + TP_PROTO(struct xfarray_sortinfo *si, unsigned long long idx), + TP_ARGS(si, idx), + TP_STRUCT__entry( + __field(unsigned long, ino) + __field(unsigned long long, nr) + __field(size_t, obj_size) + __field(unsigned long long, idx) + __field(unsigned long long, folio_pos) + __field(unsigned long, folio_bytes) + __field(unsigned long long, first_idx) + __field(unsigned long long, last_idx) + ), + TP_fast_assign( + __entry->nr = si->array->nr; + __entry->obj_size = si->array->obj_size; + __entry->ino = file_inode(si->array->xfile->file)->i_ino; + __entry->idx = idx; + if (si->folio) { + __entry->folio_pos = folio_pos(si->folio); + __entry->folio_bytes = folio_size(si->folio); + __entry->first_idx = si->first_folio_idx; + __entry->last_idx = si->last_folio_idx; + } else { + __entry->folio_pos = 0; + __entry->folio_bytes = 0; + __entry->first_idx = 0; + __entry->last_idx = 0; + } + ), + TP_printk("xfino 0x%lx nr %llu objsz %zu idx %llu folio_pos 0x%llx folio_bytes 0x%lx first_idx %llu last_idx %llu", + __entry->ino, + __entry->nr, + __entry->obj_size, + __entry->idx, + __entry->folio_pos, + __entry->folio_bytes, + __entry->first_idx, + __entry->last_idx) +); + TRACE_EVENT(xfarray_sort_stats, TP_PROTO(struct xfarray_sortinfo *si, int error), TP_ARGS(si, error), diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c index 379e1db22269c7..17c982a4821d47 100644 --- a/fs/xfs/scrub/xfarray.c +++ b/fs/xfs/scrub/xfarray.c @@ -563,70 +563,42 @@ xfarray_isort( return xfile_store(si->array->xfile, scratch, len, lo_pos); } -/* Grab a page for sorting records. */ -static inline int -xfarray_sort_get_page( - struct xfarray_sortinfo *si, - loff_t pos, - uint64_t len) -{ - return xfile_get_page(si->array->xfile, pos, len, &si->xfpage); -} - -/* Release a page we grabbed for sorting records. */ -static inline int -xfarray_sort_put_page( - struct xfarray_sortinfo *si) -{ - if (!xfile_page_cached(&si->xfpage)) - return 0; - return xfile_put_page(si->array->xfile, &si->xfpage); -} - -/* Decide if these records are eligible for in-page sorting. */ -static inline bool -xfarray_want_pagesort( - struct xfarray_sortinfo *si, - xfarray_idx_t lo, - xfarray_idx_t hi) -{ - pgoff_t lo_page; - pgoff_t hi_page; - loff_t end_pos; - - /* We can only map one page at a time. */ - lo_page = xfarray_pos(si->array, lo) >> PAGE_SHIFT; - end_pos = xfarray_pos(si->array, hi) + si->array->obj_size - 1; - hi_page = end_pos >> PAGE_SHIFT; - - return lo_page == hi_page; -} - -/* Sort a bunch of records that all live in the same memory page. */ +/* + * Sort the records from lo to hi (inclusive) if they are all backed by the + * same memory folio. Returns 1 if it sorted, 0 if it did not, or a negative + * errno. + */ STATIC int -xfarray_pagesort( +xfarray_foliosort( struct xfarray_sortinfo *si, xfarray_idx_t lo, xfarray_idx_t hi) { + struct folio *folio; void *startp; loff_t lo_pos = xfarray_pos(si->array, lo); - uint64_t len = xfarray_pos(si->array, hi - lo); - int error = 0; + uint64_t len = xfarray_pos(si->array, hi - lo + 1); - trace_xfarray_pagesort(si, lo, hi); + /* No single folio could back this many records. */ + if (len > XFILE_MAX_FOLIO_SIZE) + return 0; xfarray_sort_bump_loads(si); - error = xfarray_sort_get_page(si, lo_pos, len); - if (error) - return error; + folio = xfile_get_folio(si->array->xfile, lo_pos, len, XFILE_ALLOC); + if (IS_ERR(folio)) + return PTR_ERR(folio); + if (!folio) + return 0; + + trace_xfarray_foliosort(si, lo, hi); xfarray_sort_bump_heapsorts(si); - startp = page_address(si->xfpage.page) + offset_in_page(lo_pos); + startp = folio_address(folio) + offset_in_folio(folio, lo_pos); sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL); xfarray_sort_bump_stores(si); - return xfarray_sort_put_page(si); + xfile_put_folio(si->array->xfile, folio); + return 1; } /* Return a pointer to the xfarray pivot record within the sortinfo struct. */ @@ -814,63 +786,78 @@ xfarray_qsort_push( return 0; } +static inline void +xfarray_sort_scan_done( + struct xfarray_sortinfo *si) +{ + if (si->folio) + xfile_put_folio(si->array->xfile, si->folio); + si->folio = NULL; +} + /* - * Load an element from the array into the first scratchpad and cache the page, - * if possible. + * Cache the folio backing the start of the given array element. If the array + * element is contained entirely within the folio, return a pointer to the + * cached folio. Otherwise, load the element into the scratchpad and return a + * pointer to the scratchpad. */ static inline int -xfarray_sort_load_cached( +xfarray_sort_scan( struct xfarray_sortinfo *si, xfarray_idx_t idx, - void *ptr) + void **ptrp) { loff_t idx_pos = xfarray_pos(si->array, idx); - pgoff_t startpage; - pgoff_t endpage; int error = 0; - /* - * If this load would split a page, release the cached page, if any, - * and perform a traditional read. - */ - startpage = idx_pos >> PAGE_SHIFT; - endpage = (idx_pos + si->array->obj_size - 1) >> PAGE_SHIFT; - if (startpage != endpage) { - error = xfarray_sort_put_page(si); - if (error) - return error; + if (xfarray_sort_terminated(si, &error)) + return error; - if (xfarray_sort_terminated(si, &error)) - return error; + trace_xfarray_sort_scan(si, idx); - return xfile_load(si->array->xfile, ptr, - si->array->obj_size, idx_pos); - } + /* If the cached folio doesn't cover this index, release it. */ + if (si->folio && + (idx < si->first_folio_idx || idx > si->last_folio_idx)) + xfarray_sort_scan_done(si); - /* If the cached page is not the one we want, release it. */ - if (xfile_page_cached(&si->xfpage) && - xfile_page_index(&si->xfpage) != startpage) { - error = xfarray_sort_put_page(si); - if (error) - return error; + /* Grab the first folio that backs this array element. */ + if (!si->folio) { + loff_t next_pos; + + si->folio = xfile_get_folio(si->array->xfile, idx_pos, + si->array->obj_size, XFILE_ALLOC); + if (IS_ERR(si->folio)) + return PTR_ERR(si->folio); + + si->first_folio_idx = xfarray_idx(si->array, + folio_pos(si->folio) + si->array->obj_size - 1); + + next_pos = folio_pos(si->folio) + folio_size(si->folio); + si->last_folio_idx = xfarray_idx(si->array, next_pos - 1); + if (xfarray_pos(si->array, si->last_folio_idx + 1) > next_pos) + si->last_folio_idx--; + + trace_xfarray_sort_scan(si, idx); } /* - * If we don't have a cached page (and we know the load is contained - * in a single page) then grab it. + * If this folio still doesn't cover the desired element, it must cross + * a folio boundary. Read into the scratchpad and we're done. */ - if (!xfile_page_cached(&si->xfpage)) { - if (xfarray_sort_terminated(si, &error)) - return error; + if (idx < si->first_folio_idx || idx > si->last_folio_idx) { + void *temp = xfarray_scratch(si->array); - error = xfarray_sort_get_page(si, startpage << PAGE_SHIFT, - PAGE_SIZE); + error = xfile_load(si->array->xfile, temp, si->array->obj_size, + idx_pos); if (error) return error; + + *ptrp = temp; + return 0; } - memcpy(ptr, page_address(si->xfpage.page) + offset_in_page(idx_pos), - si->array->obj_size); + /* Otherwise return a pointer to the array element in the folio. */ + *ptrp = folio_address(si->folio) + offset_in_folio(si->folio, idx_pos); return 0; } @@ -937,6 +924,8 @@ xfarray_sort( pivot = xfarray_sortinfo_pivot(si); while (si->stack_depth >= 0) { + int ret; + lo = si_lo[si->stack_depth]; hi = si_hi[si->stack_depth]; @@ -949,13 +938,13 @@ xfarray_sort( } /* - * If directly mapping the page and sorting can solve our + * If directly mapping the folio and sorting can solve our * problems, we're done. */ - if (xfarray_want_pagesort(si, lo, hi)) { - error = xfarray_pagesort(si, lo, hi); - if (error) - goto out_free; + ret = xfarray_foliosort(si, lo, hi); + if (ret < 0) + goto out_free; + if (ret == 1) { si->stack_depth--; continue; } @@ -980,25 +969,24 @@ xfarray_sort( * than the pivot is on the right side of the range. */ while (lo < hi) { + void *p; + /* * Decrement hi until it finds an a[hi] less than the * pivot value. */ - error = xfarray_sort_load_cached(si, hi, scratch); + error = xfarray_sort_scan(si, hi, &p); if (error) goto out_free; - while (xfarray_sort_cmp(si, scratch, pivot) >= 0 && - lo < hi) { + while (xfarray_sort_cmp(si, p, pivot) >= 0 && lo < hi) { hi--; - error = xfarray_sort_load_cached(si, hi, - scratch); + error = xfarray_sort_scan(si, hi, &p); if (error) goto out_free; } - error = xfarray_sort_put_page(si); - if (error) - goto out_free; - + if (p != scratch) + memcpy(scratch, p, si->array->obj_size); + xfarray_sort_scan_done(si); if (xfarray_sort_terminated(si, &error)) goto out_free; @@ -1013,21 +1001,18 @@ xfarray_sort( * Increment lo until it finds an a[lo] greater than * the pivot value. */ - error = xfarray_sort_load_cached(si, lo, scratch); + error = xfarray_sort_scan(si, lo, &p); if (error) goto out_free; - while (xfarray_sort_cmp(si, scratch, pivot) <= 0 && - lo < hi) { + while (xfarray_sort_cmp(si, p, pivot) <= 0 && lo < hi) { lo++; - error = xfarray_sort_load_cached(si, lo, - scratch); + error = xfarray_sort_scan(si, lo, &p); if (error) goto out_free; } - error = xfarray_sort_put_page(si); - if (error) - goto out_free; - + if (p != scratch) + memcpy(scratch, p, si->array->obj_size); + xfarray_sort_scan_done(si); if (xfarray_sort_terminated(si, &error)) goto out_free; diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h index 6f2862054e194d..ec643cc9fc1432 100644 --- a/fs/xfs/scrub/xfarray.h +++ b/fs/xfs/scrub/xfarray.h @@ -105,8 +105,14 @@ struct xfarray_sortinfo { /* XFARRAY_SORT_* flags; see below. */ unsigned int flags; - /* Cache a page here for faster access. */ - struct xfile_page xfpage; + /* Cache a folio here for faster scanning for pivots */ + struct folio *folio; + + /* First array index in folio that is completely readable */ + xfarray_idx_t first_folio_idx; + + /* Last array index in folio that is completely readable */ + xfarray_idx_t last_folio_idx; #ifdef DEBUG /* Performance statistics. */