From patchwork Thu Jul 7 02:35:12 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kaho Ng X-Patchwork-Id: 9248563 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 4C433607D8 for ; Tue, 26 Jul 2016 17:45:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5CB302094D for ; Tue, 26 Jul 2016 17:45:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 38C2C26B39; Tue, 26 Jul 2016 17:45:11 +0000 (UTC) Received: from oss.sgi.com (oss.sgi.com [192.48.182.195]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 888A02094D for ; Tue, 26 Jul 2016 17:44:40 +0000 (UTC) Received: from oss.sgi.com (localhost [IPv6:::1]) by oss.sgi.com (Postfix) with ESMTP id 208FD7CA1; Tue, 26 Jul 2016 12:44:38 -0500 (CDT) X-Original-To: xfs@oss.sgi.com Delivered-To: xfs@oss.sgi.com Received: from relay.sgi.com (relay1.corp.sgi.com [137.38.102.111]) by oss.sgi.com (Postfix) with ESMTP id D802A7CCF for ; Wed, 6 Jul 2016 21:35:29 -0500 (CDT) Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by relay1.corp.sgi.com (Postfix) with ESMTP id 53D8D8F80B2 for ; Wed, 6 Jul 2016 19:35:26 -0700 (PDT) X-ASG-Debug-ID: 1467858914-04cbb027356d5320001-NocioJ Received: from mail-pa0-f66.google.com (mail-pa0-f66.google.com [209.85.220.66]) by cuda.sgi.com with ESMTP id Mh50fn1QJW1Yj3fA (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO) for ; Wed, 06 Jul 2016 19:35:14 -0700 (PDT) X-Barracuda-Envelope-From: ngkaho1234@gmail.com X-Barracuda-Effective-Source-IP: mail-pa0-f66.google.com[209.85.220.66] X-Barracuda-Apparent-Source-IP: 209.85.220.66 Received: by mail-pa0-f66.google.com with SMTP id us13so481543pab.1 for ; Wed, 06 Jul 2016 19:35:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to; bh=KwsXYVfIPNs/WYvIamkwhfBAKeeverjaFLopJkGl1tU=; b=Qm2vESfI/JsbQJc/hxFwcAqCrRZAD+SlY2ViAA2eVDsRqx9ziNSU7Qx2O8GjwbzTvU U9CA8VVwy1SJT/x5EMRacHNfKGZMpxhOT+SjeOn85RgcAVs4HQ/1r6L4trVDlH9e6yB/ hdakGfjA34pvcx0dTB6vTCYhbmRADhUUNhIB6JZhOiR8qZOCpv5+5jH15CdtytYjo5yd aHG4uZjFdiM3fF5jeVS+HoHsFhjb7MnLTLAqC81qhg3Hsu18FjC80UX7u/hcAKWR8MiG osdYlAu4d0SHAkRHnWADx8p1jiS2K1C2TjeqbO+oEeudlXwnaNCRBHO87ADORijjBjye d6TQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=KwsXYVfIPNs/WYvIamkwhfBAKeeverjaFLopJkGl1tU=; b=NoE+tc7chlqwrtZwXCWGRQqf6BwtzZ7VAbsGWhc0JJF++LQ2UTVKLxdp5s0Vcywcip JjQAJufQs1mjMIvwWFMc+w84zhGuhwRind0s753awLqB+hn1YG8nbZ+y3D+4YXlARp41 kHMlLZG0XygW1TGZ1Aep0mGGWGKB0uGFDEe/AhAMhn0ioOqEl3HxBL2mQIWOE9i8shaN MEZVmKmsT6nCQvWbvGIGaijfccgKWzOjLnz809VMg05CLdw0zLSyO8ZcHcxkGp/PXvrn ebNOiHJRM6tw3WX01JKKzbnZgWnw7A1K62l06DDyGPko3Aup9coEfSdNNtXIgCWnU/9g 9dAg== X-Gm-Message-State: ALyK8tIbZsrh0Q2cpVhUxiGhzMSkzePBoNqAAXPoHBQwhXGZA/wVGOtqFXmZqdlP2ljwfdhQm17v4A0D1xV5uA== X-Received: by 10.66.236.69 with SMTP id us5mr47210288pac.69.1467858913209; Wed, 06 Jul 2016 19:35:13 -0700 (PDT) MIME-Version: 1.0 Received: by 10.66.66.37 with HTTP; Wed, 6 Jul 2016 19:35:12 -0700 (PDT) From: Kaho Ng Date: Thu, 7 Jul 2016 10:35:12 +0800 Message-ID: Subject: Some questions about the freelist allocator in XFS To: xfs@oss.sgi.com X-ASG-Orig-Subj: Some questions about the freelist allocator in XFS X-Barracuda-Connect: mail-pa0-f66.google.com[209.85.220.66] X-Barracuda-Start-Time: 1467858914 X-Barracuda-Encrypted: ECDHE-RSA-AES128-GCM-SHA256 X-Barracuda-URL: https://192.48.176.25:443/cgi-mod/mark.cgi X-Barracuda-Scan-Msg-Size: 1553 X-Barracuda-BRTS-Status: 1 X-Virus-Scanned: by bsmtpd at sgi.com X-Barracuda-Spam-Score: 0.00 X-Barracuda-Spam-Status: No, SCORE=0.00 using per-user scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.7 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.31078 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature X-Mailman-Approved-At: Tue, 26 Jul 2016 12:44:35 -0500 X-BeenThere: xfs@oss.sgi.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: XFS Filesystem from SGI List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: xfs-bounces@oss.sgi.com Sender: xfs-bounces@oss.sgi.com X-Virus-Scanned: ClamAV using ClamSMTP I am trying to investigate how freelist allocator in xfs interacts with freespace B+Tree allocator. First I prepared a patch against linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages (The kernel version used is linux-3.10.0-327.22.2.el7). Then, I wrote a simple utility to make TONS of holes in a filesystem by calling fallocate() to punch holes in a file that is almost as large as the volume size. I created an XFS filesystem image by the following steps: 1. fallocate -l 80G /mnt/disk2/xfs 2. mkfs.xfs -f -d agcount=1 /mnt/disk2/xfs Then I created a large file by fallocate: fallocate -l 85823746048 /mnt/test/abc which left only 4 blocks available in the volume finally: /dev/loop0 20961280 20961276 4 100% /mnt/test The result of xfs_bmap against /mnt/test/abc: /mnt/test/abc: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..167624503]: 83000..167707503 0 (83000..167707503) 167624504 10000 After that, I used the hole-punching utility above to create holes on the files, and captured the output of kmsg. When reading the log output, I realised that there is no B+Tree split triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). Isn't B+Tree split possible in by-size B+Tree even when truncating a longer freespace record to shorter one? But what I found in the log is only a few tree shrinks... And when reading the source code of freespace allocator I found that a B+Tree growth in this case is impossible at least... --- linux-3.10.0-327.22.2.el7.a/fs/xfs/libxfs/xfs_alloc.c 2016-07-07 03:57:19.366512544 +0800 +++ linux-3.10.0-327.22.2.el7.b/fs/xfs/libxfs/xfs_alloc.c 2016-07-07 09:30:32.178766480 +0800 @@ -309,7 +309,8 @@ xfs_alloc_fixup_trees( xfs_extlen_t flen, /* length of free extent */ xfs_agblock_t rbno, /* starting block of returned extent */ xfs_extlen_t rlen, /* length of returned extent */ - int flags) /* flags, XFSA_FIXUP_... */ + int flags, /* flags, XFSA_FIXUP_... */ + int isfl) { int error; /* error code */ int i; /* operation results */ @@ -376,15 +377,27 @@ xfs_alloc_fixup_trees( nfbno1 = rbno + rlen; nflen1 = flen - rlen; nfbno2 = NULLAGBLOCK; + if (isfl) + xfs_warn(mp, + "Case 1 Inserting: nfbno1: %d, nflen1: %d, nfbno2: %d, nflen2: %d, fbno: %d, flen: %d, rbno: %d, rlen: %d", + nfbno1, nflen1, nfbno2, nflen2, fbno, flen, rbno, rlen); } else if (rbno + rlen == fbno + flen) { nfbno1 = fbno; nflen1 = flen - rlen; nfbno2 = NULLAGBLOCK; + if (isfl) + xfs_warn(mp, + "Case 2 Inserting: nfbno1: %d, nflen1: %d, nfbno2: %d, nflen2: %d, fbno: %d, flen: %d, rbno: %d, rlen: %d", + nfbno1, nflen1, nfbno2, nflen2, fbno, flen, rbno, rlen); } else { nfbno1 = fbno; nflen1 = rbno - fbno; nfbno2 = rbno + rlen; nflen2 = (fbno + flen) - nfbno2; + if (isfl) + xfs_warn(mp, + "Case 3 Inserting: nfbno1: %d, nflen1: %d, nfbno2: %d, nflen2: %d, fbno: %d, flen: %d, rbno: %d, rlen: %d", + nfbno1, nflen1, nfbno2, nflen2, fbno, flen, rbno, rlen); } /* * Delete the entry from the by-size btree. @@ -396,19 +409,31 @@ xfs_alloc_fixup_trees( * Add new by-size btree entry(s). */ if (nfbno1 != NULLAGBLOCK) { + struct xfs_btree_block *cntblock; if ((error = xfs_alloc_lookup_eq(cnt_cur, nfbno1, nflen1, &i))) return error; + cntblock = XFS_BUF_TO_BLOCK(cnt_cur->bc_bufs[0]); XFS_WANT_CORRUPTED_RETURN(mp, i == 0); + xfs_warn(mp, + "B+Tree before insert: isfl: %d, bb_numrec: %d, addr: %llu", isfl, xfs_btree_get_numrecs(cntblock), XFS_BUF_ADDR(cnt_cur->bc_bufs[0])); if ((error = xfs_btree_insert(cnt_cur, &i))) return error; + xfs_warn(mp, + "B+Tree after insert: isfl: %d, bb_numrec: %d, addr: %llu", isfl, xfs_btree_get_numrecs(cntblock), XFS_BUF_ADDR(cnt_cur->bc_bufs[0])); XFS_WANT_CORRUPTED_RETURN(mp, i == 1); } if (nfbno2 != NULLAGBLOCK) { + struct xfs_btree_block *cntblock; + cntblock = XFS_BUF_TO_BLOCK(cnt_cur->bc_bufs[0]); if ((error = xfs_alloc_lookup_eq(cnt_cur, nfbno2, nflen2, &i))) return error; XFS_WANT_CORRUPTED_RETURN(mp, i == 0); + xfs_warn(mp, + "B+Tree before insert: isfl: %d, bb_numrec: %d, addr: %llu", isfl, xfs_btree_get_numrecs(cntblock), XFS_BUF_ADDR(cnt_cur->bc_bufs[0])); if ((error = xfs_btree_insert(cnt_cur, &i))) return error; + xfs_warn(mp, + "B+Tree after insert: isfl: %d, bb_numrec: %d, addr: %llu", isfl, xfs_btree_get_numrecs(cntblock), XFS_BUF_ADDR(cnt_cur->bc_bufs[0])); XFS_WANT_CORRUPTED_RETURN(mp, i == 1); } /* @@ -730,7 +755,7 @@ xfs_alloc_ag_vextent_exact( ASSERT(args->agbno + args->len <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length)); error = xfs_alloc_fixup_trees(cnt_cur, bno_cur, fbno, flen, args->agbno, - args->len, XFSA_FIXUP_BNO_OK); + args->len, XFSA_FIXUP_BNO_OK, args->isfl); if (error) { xfs_btree_del_cursor(cnt_cur, XFS_BTREE_ERROR); goto error0; @@ -1028,7 +1053,7 @@ restart: * Fix up the btree entries. */ if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, - ltlen, bnew, blen, XFSA_FIXUP_CNT_OK))) + ltlen, bnew, blen, XFSA_FIXUP_CNT_OK, args->isfl))) goto error0; xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR); @@ -1219,7 +1244,7 @@ restart: args->agbno = ltnew; if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen, - ltnew, rlen, XFSA_FIXUP_BNO_OK))) + ltnew, rlen, XFSA_FIXUP_BNO_OK, args->isfl))) goto error0; if (j) @@ -1420,7 +1445,7 @@ restart: bno_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp, args->agno, XFS_BTNUM_BNO); if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur, fbno, flen, - rbno, rlen, XFSA_FIXUP_CNT_OK))) + rbno, rlen, XFSA_FIXUP_CNT_OK, args->isfl))) goto error0; xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);