Message ID | 87fs1g1rac.fsf@debian-BULLSEYE-live-builder-AMD64 (mailing list archive) |
---|---|
State | Superseded, archived |
Series | [GIT,PULL] xfs: new code for 6.7 |
On Wed, 8 Nov 2023 at 02:19, Chandan Babu R <chandanbabu@kernel.org> wrote:
>
> I had performed a test merge with latest contents of torvalds/linux.git.
>
> This resulted in merge conflicts. The following diff should resolve the merge
> conflicts.

Well, your merge conflict resolution is the same as my initial
mindless one, but then when I look closer at it, it turns out that
it's wrong.

It's wrong not because the merge itself would be wrong, but because
the conflict made me look at the original, and it turns out that
commit 75d1e312bbbd ("xfs: convert to new timestamp accessors") was
buggy.

I'm actually surprised the compilers don't complain about it, because
the bug means that the new

        struct timespec64 ts;

temporary isn't actually initialized for the !XFS_DIFLAG_NEWRTBM case.

The code does

  xfs_rtpick_extent(..)
  ...
        struct timespec64 ts;
  ..
        if (!(mp->m_rbmip->i_diflags & XFS_DIFLAG_NEWRTBM)) {
                mp->m_rbmip->i_diflags |= XFS_DIFLAG_NEWRTBM;
                seq = 0;
        } else {
  ...
        ts.tv_sec = (time64_t)seq + 1;
        inode_set_atime_to_ts(VFS_I(mp->m_rbmip), ts);

and notice how 'ts.tv_nsec' is never initialized. So we'll set the
nsec part of the atime to random garbage.

Oh, I'm sure it doesn't really *matter*, but it's most certainly wrong.

I am not very happy about the whole crazy XFS model where people cast
the 'struct timespec64' pointer to an 'uint64_t' pointer, and then say
'now it's a sequence number'. This is not the only place that
happened, ie we have similar disgusting code in at least
xfs_rtfree_extent() too.

That other place in xfs_rtfree_extent() didn't have this bug - it does
inode_get_atime() unconditionally and this keeps the nsec field as-is,
but that other place has the same really ugly code.

Doing that "cast struct timespec64 to an uint64_t" is not only ugly
and wrong, it's _stupid_. The only reason it works in the first place
is that 'struct timespec64' is

        struct timespec64 {
                time64_t        tv_sec;         /* seconds */
                long            tv_nsec;        /* nanoseconds */
        };

so the first field is 'tv_sec', which is a 64-bit (signed) value.

So the cast is disgusting - and it's pointless. I don't know why it's
done that way. It would have been much cleaner to just use tv_sec, and
have a big comment about it being used as a sequence number here.

I _assume_ there's just a simple 32-bit history to this all, where at
one point it was a 32-bit tv_sec, and the cast basically used both
32-bit fields as a 64-bit sequence number. I get it. But it's most
definitely wrong now.

End result: I ended up fixing that bug and removing the bogus casts in
my merge. I *think* I got it right, but apologies in advance if I
screwed up. I only did visual inspection and build testing, no actual
real testing.

Also, xfs people may obviously have other preferences for how to deal
with the whole "now using tv_sec in the VFS inode as a 64-bit sequence
number" thing, and maybe you prefer to then update my fix to this all.
But those horrid casts certainly weren't the right way to do it.

Put another way: please do give my merge a closer look, and decide
amongst yourselves if you then want to deal with this some other way.

              Linus
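As a rough illustration of the problem described above, the user-space sketch below shows one way to keep tv_nsec well defined: read the whole timestamp first, then change only tv_sec. The types `ts64` and `fake_inode` and the helpers `fake_get_atime`, `fake_set_atime` and `bump_seq` are hypothetical stand-ins invented for this example. They are not kernel APIs, and the sketch is not claimed to match what the merge commit actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's struct timespec64. */
struct ts64 {
	int64_t	tv_sec;		/* reused here as the sequence counter */
	long	tv_nsec;	/* not part of the counter; must be preserved */
};

/* Hypothetical stand-in for an inode that carries only an atime. */
struct fake_inode {
	struct ts64	atime;
};

static struct ts64 fake_get_atime(const struct fake_inode *ip)
{
	return ip->atime;
}

static void fake_set_atime(struct fake_inode *ip, struct ts64 ts)
{
	ip->atime = ts;
}

/*
 * Return the sequence number to use for this allocation and store the
 * incremented value back in atime.tv_sec.  'first_use' models the branch
 * where the counter has never been set up and must start from zero.
 */
static uint64_t bump_seq(struct fake_inode *ip, int first_use)
{
	/* Read the full timestamp up front so tv_nsec is always initialized. */
	struct ts64 ts = fake_get_atime(ip);
	uint64_t seq = first_use ? 0 : (uint64_t)ts.tv_sec;

	ts.tv_sec = (int64_t)(seq + 1);
	fake_set_atime(ip, ts);
	return seq;
}
```

Starting from an atime of {41, 7}, `bump_seq(&ino, 0)` returns 41 and leaves {42, 7} behind: tv_nsec is carried over instead of being taken from an uninitialized stack variable, and no pointer cast is needed.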
The pull request you sent on Wed, 08 Nov 2023 15:26:29 +0530:
> https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git tags/xfs-6.7-merge-2
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/34f763262743aac0847b15711b0460ac6d6943d5
Thank you!
On Wed, Nov 08, 2023 at 01:29:16PM -0800, Linus Torvalds wrote:
> On Wed, 8 Nov 2023 at 02:19, Chandan Babu R <chandanbabu@kernel.org> wrote:
> >
> > I had performed a test merge with latest contents of torvalds/linux.git.
> >
> > This resulted in merge conflicts. The following diff should resolve the merge
> > conflicts.
>
> Well, your merge conflict resolution is the same as my initial
> mindless one, but then when I look closer at it, it turns out that
> it's wrong.
>
> It's wrong not because the merge itself would be wrong, but because
> the conflict made me look at the original, and it turns out that
> commit 75d1e312bbbd ("xfs: convert to new timestamp accessors") was
> buggy.
>
> I'm actually surprised the compilers don't complain about it, because
> the bug means that the new
>
>         struct timespec64 ts;
>
> temporary isn't actually initialized for the !XFS_DIFLAG_NEWRTBM case.
>
> The code does
>
>   xfs_rtpick_extent(..)

Oh gosh. Dave might have other things to say, but xfs_rtpick_extent is
the sort of function that I hate with the power of 1,000 suns. Back in
2.6.x it apparently did this:

	seqp = (__uint64_t *)&mp->m_rbmip->i_d.di_atime;

At the time, xfs_inode.i_d.di_atime was a struct xfs_ictimestamp:

	typedef struct xfs_ictimestamp {
		__int32_t	t_sec;		/* timestamp seconds */
		__int32_t	t_nsec;		/* timestamp nanoseconds */
	} xfs_ictimestamp_t;

So the rt allocator thinks it's maintaining a u64 new file counter in the
bitmap file's atime. The lower 32bits ended up in t_sec, and the upper
32bits ended up in t_nsec.

At some point (4.6?) the function started using the VFS i_atime field
instead of the di_atime field. On 32-bit systems the struct timespec
was still a struct of two int32 values and everything kept working the
way it always had.

On 64-bit systems, tv_sec is 64-bits which means the sequence counter
was only stored (incore, anyway) in tv_sec. XFS truncates the upper
32-bits when writing the inode to disk because (at the time) it didn't
handle y2038.

IOWs, we broke the ondisk format in 2016 and nobody noticed. Because
the allocator calls xfs_highbit64 on the sequence counter, the only
observable behavior change would be the starting location of a free
space search for the first rt file allocation after an upgrade from 4.5
to a newer kernel on a 64-bit machine. (Or going back, obviously)

But then in 4.18 or so, the VFS inode switched to timespec64, at which
point /all/ of the 32-bit kernels "migrated" to storing the sequence
counter in tv_sec and truncating it when it goes out to disk.

Then in 5.10 we added y2038 support, so post-Covid filesystems truncate
less of the sequence counter. #winning

>   ...
>         struct timespec64 ts;
>   ..
>         if (!(mp->m_rbmip->i_diflags & XFS_DIFLAG_NEWRTBM)) {
>                 mp->m_rbmip->i_diflags |= XFS_DIFLAG_NEWRTBM;
>                 seq = 0;
>         } else {
>   ...
>         ts.tv_sec = (time64_t)seq + 1;
>         inode_set_atime_to_ts(VFS_I(mp->m_rbmip), ts);

So... according to the pre-4.6 definition of the sequence counter this
is wrong, but OTOH it's not inconsistent with what was there in 6.4.

> and notice how 'ts.tv_nsec' is never initialized. So we'll set the
> nsec part of the atime to random garbage.
>
> Oh, I'm sure it doesn't really *matter*, but it's most certainly wrong.

tv_nsec isn't explicitly initialized by rtpick_extent, but IIRC mkfs
initializes the ondisk inode's tv_nsec field and the kernel reads that
into the incore inode, so I don't think it's leaking kernel memory
contents.

> I am not very happy about the whole crazy XFS model where people cast
> the 'struct timespec64' pointer to an 'uint64_t' pointer, and then say
> 'now it's a sequence number'. This is not the only place that
> happened, ie we have similar disgusting code in at least
> xfs_rtfree_extent() too.
>
> That other place in xfs_rtfree_extent() didn't have this bug - it does
> inode_get_atime() unconditionally and this keeps the nsec field as-is,
> but that other place has the same really ugly code.
>
> Doing that "cast struct timespec64 to an uint64_t" is not only ugly
> and wrong, it's _stupid_. The only reason it works in the first place
> is that 'struct timespec64' is
>
>         struct timespec64 {
>                 time64_t        tv_sec;         /* seconds */
>                 long            tv_nsec;        /* nanoseconds */
>         };
>
> so the first field is 'tv_sec', which is a 64-bit (signed) value.

(yep)

> So the cast is disgusting - and it's pointless. I don't know why it's
> done that way. It would have been much cleaner to just use tv_sec, and
> have a big comment about it being used as a sequence number here.
>
> I _assume_ there's just a simple 32-bit history to this all, where at
> one point it was a 32-bit tv_sec, and the cast basically used both
> 32-bit fields as a 64-bit sequence number. I get it. But it's most
> definitely wrong now.

I don't even think it was good C back whenever it was written, but I was
probably in high school at that point. ;)

> End result: I ended up fixing that bug and removing the bogus casts in
> my merge. I *think* I got it right, but apologies in advance if I
> screwed up. I only did visual inspection and build testing, no actual
> real testing.

My opinion is that you've kept your tree consistent with what the kernel
has been doing for the last 5 years. No comment about the s**tshow that
went on before that.

> Also, xfs people may obviously have other preferences for how to deal
> with the whole "now using tv_sec in the VFS inode as a 64-bit sequence
> number" thing, and maybe you prefer to then update my fix to this all.
> But those horrid casts certainly weren't the right way to do it.

Yeah, I can work on that for the rt modernization patchset.

> Put another way: please do give my merge a closer look, and decide
> amongst yourselves if you then want to deal with this some other way.

Let's see what the other devs say. Thank you for taking Chandan's pull
request, by the way.

--D

>
>               Linus
On Wed, Nov 08, 2023 at 02:52:00PM -0800, Darrick J. Wong wrote:
> > Also, xfs people may obviously have other preferences for how to deal
> > with the whole "now using tv_sec in the VFS inode as a 64-bit sequence
> > number" thing, and maybe you prefer to then update my fix to this all.
> > But those horrid casts certainly weren't the right way to do it.
>
> Yeah, I can work on that for the rt modernization patchset.

As someone who has just written some new code stealing this trick I
actually have a todo list item to make this less horrible as the cast
upset my stomach. But shame on me for not actually noticing that it
is buggy as well (which honestly should be the standard assumption for
casts like this).
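The history Darrick recounts earlier in the thread (a 64-bit counter laid over two 32-bit timestamp fields, later kept in a 64-bit tv_sec and truncated on writeout) can be sketched roughly as follows. This is a user-space model under stated assumptions: `disk_ts32`, `store_split`, `load_split`, `store_truncated` and `load_truncated` are names made up for the example, the fields are unsigned to keep the arithmetic simple, and explicit shifts are used instead of the pointer cast. A cast only produces the "low half in t_sec" layout on a little-endian machine, whereas the shifts behave the same regardless of byte order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a two-field 32-bit timestamp.  The structure
 * quoted in the thread uses signed 32-bit fields; unsigned fields are
 * an assumption made here to avoid sign-extension noise.
 */
struct disk_ts32 {
	uint32_t	t_sec;
	uint32_t	t_nsec;
};

/* Split layout: low half of the counter in t_sec, high half in t_nsec. */
static struct disk_ts32 store_split(uint64_t seq)
{
	struct disk_ts32 d = {
		.t_sec	= (uint32_t)(seq & 0xFFFFFFFFu),
		.t_nsec	= (uint32_t)(seq >> 32),
	};
	return d;
}

static uint64_t load_split(struct disk_ts32 d)
{
	return ((uint64_t)d.t_nsec << 32) | d.t_sec;
}

/* Truncating layout: the counter lives in tv_sec, only 32 bits persist. */
static struct disk_ts32 store_truncated(uint64_t seq, uint32_t nsec)
{
	struct disk_ts32 d = { .t_sec = (uint32_t)seq, .t_nsec = nsec };
	return d;
}

static uint64_t load_truncated(struct disk_ts32 d)
{
	return d.t_sec;
}
```

With a counter of 0x100000005, the split layout round-trips the full value, while the truncating layout reads back 5. That matches the observation in the thread that the two schemes disagree only once the counter no longer fits in 32 bits.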
On Thu, Nov 09, 2023 at 05:51:50AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 08, 2023 at 02:52:00PM -0800, Darrick J. Wong wrote:
> > > Also, xfs people may obviously have other preferences for how to deal
> > > with the whole "now using tv_sec in the VFS inode as a 64-bit sequence
> > > number" thing, and maybe you prefer to then update my fix to this all.
> > > But those horrid casts certainly weren't the right way to do it.
> >
> > Yeah, I can work on that for the rt modernization patchset.
>
> As someone who has just written some new code stealing this trick I
> actually have a todo list item to make this less horrible as the cast
> upset my stomach. But shame on me for not actually noticing that it
> is buggy as well (which honestly should be the standard assumption for
> casts like this).

Dave and I started looking at this too, and came up with: For rtgroups
filesystems, what if rtpick simply rotored the rtgroups? And what if we
didn't bother persisting the rotor value, which would make this casting
nightmare go away in the long run. It's not like we persist the agi
rotors.

--D
On Wed, Nov 08, 2023 at 11:39:45PM -0800, Darrick J. Wong wrote:
> Dave and I started looking at this too, and came up with: For rtgroups
> filesystems, what if rtpick simply rotored the rtgroups? And what if we
> didn't bother persisting the rotor value, which would make this casting
> nightmare go away in the long run. It's not like we persist the agi
> rotors.

Yep. We should still fix the cast and replace it with a proper union
or other means for pre-RTG file systems given that they will be around
for a while.
On Thu, Nov 09, 2023 at 03:46:14PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 08, 2023 at 11:39:45PM -0800, Darrick J. Wong wrote:
> > Dave and I started looking at this too, and came up with: For rtgroups
> > filesystems, what if rtpick simply rotored the rtgroups? And what if we
> > didn't bother persisting the rotor value, which would make this casting
> > nightmare go away in the long run. It's not like we persist the agi
> > rotors.
>
> Yep. We should still fix the cast and replace it with a proper union
> or other means for pre-RTG file systems given that they will be around
> for a while.

<nod> Linus' fixup stuffs the seq value in tv_sec. That's not great
since the inode writeout code then truncates the upper 32 bits, but
that's what the kernel has been doing for 5+ years now.

Dave suggested that we might restore the pre-4.6 behavior by explicitly
encoding what we used to do:

	inode->i_atime.tv_sec = seq & 0xFFFFFFFF;
	inode->i_atime.tv_nsec = seq >> 32;

(There's a helper in 6.7 for this, apparently.)

But then I pointed out that the entire rtpick sequence counter thing
merely provides a *starting point* for rtbitmap searches. So it's not
like garbled values result in metadata inconsistency. IOWs, it's
apparently benign.

IOWs, how much does anyone care about improving on Linus' fixup?

--D
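The "starting point" argument above can be made concrete with a toy search. The sketch below is not the XFS rtbitmap search; `find_free_from` and its byte-per-extent free map are hypothetical, chosen only to show why a search that wraps around finds free space, if any exists, whatever the hint happens to be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical first-fit search over a free-space map in which
 * free_map[i] != 0 means extent i is free.  The search begins at
 * 'hint' (reduced modulo n) and wraps, so every extent is examined
 * exactly once no matter what value the hint has.
 *
 * Returns the index of the first free extent found, or -1 if none.
 */
static long find_free_from(const unsigned char *free_map, size_t n,
			   uint64_t hint)
{
	size_t start, i;

	if (n == 0)
		return -1;

	start = (size_t)(hint % n);
	for (i = 0; i < n; i++) {
		size_t idx = (start + i) % n;

		if (free_map[idx])
			return (long)idx;
	}
	return -1;
}
```

In this model a garbled or truncated hint changes which free extent is returned first, but it never causes a free extent to be missed or an in-use one to be returned. That is the sense in which the thread calls the sequence counter's value benign.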
On Thu, Nov 09, 2023 at 08:38:56AM -0800, Darrick J. Wong wrote:
> Dave suggested that we might restore the pre-4.6 behavior by explicitly
> encoding what we used to do:
>
> 	inode->i_atime.tv_sec = seq & 0xFFFFFFFF;
> 	inode->i_atime.tv_nsec = seq >> 32;
>
> (There's a helper in 6.7 for this, apparently.)
>
> But then I pointed out that the entire rtpick sequence counter thing
> merely provides a *starting point* for rtbitmap searches. So it's not
> like garbled values result in metadata inconsistency. IOWs, it's
> apparently benign.
>
> IOWs, how much does anyone care about improving on Linus' fixup?

I'd really like to see the cast of a pointer to a struct type to a
scalar gone, because those tend to hide bugs. I'm not going to bother
you too much with it, promised.
On Wed, 2023-11-08 at 13:29 -0800, Linus Torvalds wrote:
> On Wed, 8 Nov 2023 at 02:19, Chandan Babu R <chandanbabu@kernel.org> wrote:
> >
> > I had performed a test merge with latest contents of torvalds/linux.git.
> >
> > This resulted in merge conflicts. The following diff should resolve the merge
> > conflicts.
>
> Well, your merge conflict resolution is the same as my initial
> mindless one, but then when I look closer at it, it turns out that
> it's wrong.
>
> It's wrong not because the merge itself would be wrong, but because
> the conflict made me look at the original, and it turns out that
> commit 75d1e312bbbd ("xfs: convert to new timestamp accessors") was
> buggy.
>
> I'm actually surprised the compilers don't complain about it, because
> the bug means that the new
>
>         struct timespec64 ts;
>
> temporary isn't actually initialized for the !XFS_DIFLAG_NEWRTBM case.
>
> The code does
>
>   xfs_rtpick_extent(..)
>   ...
>         struct timespec64 ts;
>   ..
>         if (!(mp->m_rbmip->i_diflags & XFS_DIFLAG_NEWRTBM)) {
>                 mp->m_rbmip->i_diflags |= XFS_DIFLAG_NEWRTBM;
>                 seq = 0;
>         } else {
>   ...
>         ts.tv_sec = (time64_t)seq + 1;
>         inode_set_atime_to_ts(VFS_I(mp->m_rbmip), ts);
>
> and notice how 'ts.tv_nsec' is never initialized. So we'll set the
> nsec part of the atime to random garbage.
>
> Oh, I'm sure it doesn't really *matter*, but it's most certainly wrong.
>
> I am not very happy about the whole crazy XFS model where people cast
> the 'struct timespec64' pointer to an 'uint64_t' pointer, and then say
> 'now it's a sequence number'. This is not the only place that
> happened, ie we have similar disgusting code in at least
> xfs_rtfree_extent() too.
>
> That other place in xfs_rtfree_extent() didn't have this bug - it does
> inode_get_atime() unconditionally and this keeps the nsec field as-is,
> but that other place has the same really ugly code.
>
> Doing that "cast struct timespec64 to an uint64_t" is not only ugly
> and wrong, it's _stupid_. The only reason it works in the first place
> is that 'struct timespec64' is
>
>         struct timespec64 {
>                 time64_t        tv_sec;         /* seconds */
>                 long            tv_nsec;        /* nanoseconds */
>         };
>
> so the first field is 'tv_sec', which is a 64-bit (signed) value.
>
> So the cast is disgusting - and it's pointless. I don't know why it's
> done that way. It would have been much cleaner to just use tv_sec, and
> have a big comment about it being used as a sequence number here.
>
> I _assume_ there's just a simple 32-bit history to this all, where at
> one point it was a 32-bit tv_sec, and the cast basically used both
> 32-bit fields as a 64-bit sequence number. I get it. But it's most
> definitely wrong now.
>
> End result: I ended up fixing that bug and removing the bogus casts in
> my merge. I *think* I got it right, but apologies in advance if I
> screwed up. I only did visual inspection and build testing, no actual
> real testing.
>
> Also, xfs people may obviously have other preferences for how to deal
> with the whole "now using tv_sec in the VFS inode as a 64-bit sequence
> number" thing, and maybe you prefer to then update my fix to this all.
> But those horrid casts certainly weren't the right way to do it.
>
> Put another way: please do give my merge a closer look, and decide
> amongst yourselves if you then want to deal with this some other way.
>
>               Linus

I think when I was looking at that code, I had convinced myself that
the tv_nsec field didn't matter at all, since it wasn't being used, but
I should have done a better job of preserving the existing value. Mea
culpa.

Your fixup looks right to me. Thanks for fixing it.

Cheers,
On Wed, Nov 08, 2023 at 11:39:45PM -0800, Darrick J. Wong wrote:
> On Thu, Nov 09, 2023 at 05:51:50AM +0100, Christoph Hellwig wrote:
> > On Wed, Nov 08, 2023 at 02:52:00PM -0800, Darrick J. Wong wrote:
> > > > Also, xfs people may obviously have other preferences for how to deal
> > > > with the whole "now using tv_sec in the VFS inode as a 64-bit sequence
> > > > number" thing, and maybe you prefer to then update my fix to this all.
> > > > But those horrid casts certainly weren't the right way to do it.
> > >
> > > Yeah, I can work on that for the rt modernization patchset.
> >
> > As someone who has just written some new code stealing this trick I
> > actually have a todo list item to make this less horrible as the cast
> > upset my stomach. But shame on me for not actually noticing that it
> > is buggy as well (which honestly should be the standard assumption for
> > casts like this).
>
> Dave and I started looking at this too, and came up with: For rtgroups
> filesystems, what if rtpick simply rotored the rtgroups? And what if we
> didn't bother persisting the rotor value, which would make this casting
> nightmare go away in the long run. It's not like we persist the agi
> rotors.

I think we could replace it right now with an in-memory rotor like
the mp->m_agfrotor. It really does not need to be persistent; the
current sequence based algorithm devolves to sequential ascending
block order allocation targets once the sequence number gets large
enough.

Further, the (somewhat) deterministic extent distribution it is
trying to achieve (i.e. even distribution across the rt dev) is
really only achievable in write-once workloads. The moment we start
freeing space on the rtdev, the free space is no longer uniform and
does not match the pattern the sequence-based target iterates.
Hence the layout the search target attempts to create is
unachievable and largely meaningless.

IOWs, we may as well just use an in-memory sequence number or a
random number to seed the allocation target; they will work just as
well as what we have right now without the need for persistent
sequence numbers.

Also, I think that not updating the persistent sequence number is
fine from a backwards compatibility perspective - older kernels will
just use it as it does now and newer kernels will just ignore it...

I say we just kill the whole sequence number in atime thing
completely....

-Dave.
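A minimal sketch of the in-memory rotor idea suggested above, under several assumptions: `fake_mount`, `rt_rotor` and `pick_start` are hypothetical names, the rotor simply advances past each target it hands out, and no locking or atomics are shown (a real implementation would presumably need some). Nothing is written to disk, so there is no timestamp field to overload.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-mount state; not a real XFS structure. */
struct fake_mount {
	uint64_t	rextents;	/* number of rt extents on the device */
	uint64_t	rt_rotor;	/* in-memory only, never persisted */
};

/*
 * Pick a starting extent for an allocation of 'len' extents.  The
 * result is only a search target: the allocator would still have to
 * look for free space at or near it.
 */
static uint64_t pick_start(struct fake_mount *mp, uint64_t len)
{
	uint64_t b;

	assert(len > 0 && len <= mp->rextents);

	b = mp->rt_rotor % mp->rextents;

	/* Clamp so that [b, b + len) stays inside the device. */
	if (b + len > mp->rextents)
		b = mp->rextents - len;

	/* The next caller starts just past this target, wrapping at the end. */
	mp->rt_rotor = (b + len) % mp->rextents;
	return b;
}
```

On a 100-extent device with the rotor at 0, successive requests for 30 extents would be pointed at 0, 30, 60, 70 (clamped) and then 0 again. After a remount the rotor restarts from whatever initial value is chosen, which the discussion above argues is harmless because the target is only a hint.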
Hi Linus,

Please pull this branch with changes for xfs for 6.7-rc1. The important
changes include,
1. CPU usage optimizations for realtime allocator.
2. Allowing read operations to continue while a FICLONE ioctl is being
   serviced.

The remaining changes are limited to bug fixes and code cleanups.

There was a delay in me pushing the changes to XFS' for-next branch and hence
a delay in the code changes reaching the linux-next tree. The XFS changes
reached linux-next on 31st of October. The delay was due to me having to drop
a patch from the XFS tree and having to initiate execution of the test suite
once again on October 26th. The complete test run requires around 4 days to
complete.

During a discussion, Darrick told me that in such scenarios he would limit
testing to non-fuzz tests which take around 12 hours to complete. Hence, in
hindsight, I could have limited the time taken to execute tests after dropping
the patch.

I will make sure to update XFS' for-next branch well before the merge window
period begins from next release onwards.

The changes that are part of the current pull request are contained within XFS
i.e. there are no patches which straddle across other subsystems.

I have been executing fstests on linux-next for more than a week now. There
were no new regressions found in XFS during the test run.

I had performed a test merge with latest contents of torvalds/linux.git. i.e.

305230142ae0637213bf6e04f6d9f10bbcb74af8
Author:     Linus Torvalds <torvalds@linux-foundation.org>
AuthorDate: Tue Nov 7 17:16:23 2023 -0800
Commit:     Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Tue Nov 7 17:16:23 2023 -0800

    Merge tag 'pm-6.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

This resulted in merge conflicts. The following diff should resolve the merge
conflicts.

diff --cc fs/xfs/libxfs/xfs_rtbitmap.c
index 396648acb5be,b332ab490a48..84e27b9987f8
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@@ -960,19 -931,18 +931,19 @@@ xfs_rtcheck_alloc_range
   * Free an extent in the realtime subvolume.  Length is expressed in
   * realtime extents, as is the block number.
   */
- int					/* error */
+ int
  xfs_rtfree_extent(
- 	xfs_trans_t	*tp,		/* transaction pointer */
- 	xfs_rtblock_t	bno,		/* starting block number to free */
- 	xfs_extlen_t	len)		/* length of extent freed */
+ 	struct xfs_trans	*tp,	/* transaction pointer */
+ 	xfs_rtxnum_t		start,	/* starting rtext number to free */
+ 	xfs_rtxlen_t		len)	/* length of extent freed */
  {
- 	int		error;		/* error value */
- 	xfs_mount_t	*mp;		/* file system mount structure */
- 	xfs_fsblock_t	sb;		/* summary file block number */
- 	struct xfs_buf	*sumbp = NULL;	/* summary file block buffer */
- 	struct timespec64 atime;
- 
- 	mp = tp->t_mountp;
+ 	struct xfs_mount	*mp = tp->t_mountp;
+ 	struct xfs_rtalloc_args	args = {
+ 		.mp		= mp,
+ 		.tp		= tp,
+ 	};
+ 	int			error;
++	struct timespec64	atime;
  
  	ASSERT(mp->m_rbmip->i_itemp != NULL);
  	ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL));
@@@ -1000,13 -970,46 +971,49 @@@
  	    mp->m_sb.sb_rextents) {
  		if (!(mp->m_rbmip->i_diflags & XFS_DIFLAG_NEWRTBM))
  			mp->m_rbmip->i_diflags |= XFS_DIFLAG_NEWRTBM;
 -		*(uint64_t *)&VFS_I(mp->m_rbmip)->i_atime = 0;
 +
 +		atime = inode_get_atime(VFS_I(mp->m_rbmip));
 +		*((uint64_t *)&atime) = 0;
 +		inode_set_atime_to_ts(VFS_I(mp->m_rbmip), atime);
  		xfs_trans_log_inode(tp, mp->m_rbmip, XFS_ILOG_CORE);
  	}
- 	return 0;
+ 	error = 0;
+ out:
+ 	xfs_rtbuf_cache_relse(&args);
+ 	return error;
+ }
+ 
+ /*
+  * Free some blocks in the realtime subvolume.  rtbno and rtlen are in units of
+  * rt blocks, not rt extents; must be aligned to the rt extent size; and rtlen
+  * cannot exceed XFS_MAX_BMBT_EXTLEN.
+  */
+ int
+ xfs_rtfree_blocks(
+ 	struct xfs_trans	*tp,
+ 	xfs_fsblock_t		rtbno,
+ 	xfs_filblks_t		rtlen)
+ {
+ 	struct xfs_mount	*mp = tp->t_mountp;
+ 	xfs_rtxnum_t		start;
+ 	xfs_filblks_t		len;
+ 	xfs_extlen_t		mod;
+ 
+ 	ASSERT(rtlen <= XFS_MAX_BMBT_EXTLEN);
+ 
+ 	len = xfs_rtb_to_rtxrem(mp, rtlen, &mod);
+ 	if (mod) {
+ 		ASSERT(mod == 0);
+ 		return -EIO;
+ 	}
+ 
+ 	start = xfs_rtb_to_rtxrem(mp, rtbno, &mod);
+ 	if (mod) {
+ 		ASSERT(mod == 0);
+ 		return -EIO;
+ 	}
+ 
+ 	return xfs_rtfree_extent(tp, start, len);
  }
  
  /* Find all the free records within a given range. */
diff --cc fs/xfs/xfs_rtalloc.c
index 2e1a4e5cd03d,ba66442910b1..0254c573086a
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@@ -1420,16 -1414,16 +1414,16 @@@ xfs_rtunmount_inodes
   */
  int					/* error */
  xfs_rtpick_extent(
- 	xfs_mount_t	*mp,		/* file system mount point */
- 	xfs_trans_t	*tp,		/* transaction pointer */
- 	xfs_extlen_t	len,		/* allocation length (rtextents) */
- 	xfs_rtblock_t	*pick)		/* result rt extent */
- {
- 	xfs_rtblock_t	b;		/* result block */
- 	int		log2;		/* log of sequence number */
- 	uint64_t	resid;		/* residual after log removed */
- 	uint64_t	seq;		/* sequence number of file creation */
- 	struct timespec64 ts;		/* temporary timespec64 storage */
 -	xfs_mount_t	*mp,		/* file system mount point */
 -	xfs_trans_t	*tp,		/* transaction pointer */
 -	xfs_rtxlen_t	len,		/* allocation length (rtextents) */
 -	xfs_rtxnum_t	*pick)		/* result rt extent */
++	xfs_mount_t	*mp,		/* file system mount point */
++	xfs_trans_t	*tp,		/* transaction pointer */
++	xfs_rtxlen_t	len,		/* allocation length (rtextents) */
++	xfs_rtxnum_t	*pick)		/* result rt extent */
+ {
 -	xfs_rtxnum_t	b;		/* result rtext */
 -	int		log2;		/* log of sequence number */
 -	uint64_t	resid;		/* residual after log removed */
 -	uint64_t	seq;		/* sequence number of file creation */
 -	uint64_t	*seqp;		/* pointer to seqno in inode */
++	xfs_rtxnum_t	b;		/* result rtext */
++	int		log2;		/* log of sequence number */
++	uint64_t	resid;		/* residual after log removed */
++	uint64_t	seq;		/* sequence number of file creation */
++	struct timespec64 ts;		/* temporary timespec64 storage */
  
  	ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL));
  

Please let me know if you encounter any problems.

The following changes since commit 05d3ef8bba77c1b5f98d941d8b2d4aeab8118ef1:

  Linux 6.6-rc7 (2023-10-22 12:11:21 -1000)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git tags/xfs-6.7-merge-2

for you to fetch changes up to 14a537983b228cb050ceca3a5b743d01315dc4aa:

  xfs: allow read IO and FICLONE to run concurrently (2023-10-23 12:02:26 +0530)

----------------------------------------------------------------
New code for 6.7:

  * Realtime device subsystem
    - Cleanup usage of xfs_rtblock_t and xfs_fsblock_t data types.
    - Replace open coded conversions between rt blocks and rt extents with
      calls to static inline helpers.
    - Replace open coded realtime geometry computation and macros with helper
      functions.
    - CPU usage optimizations for realtime allocator.
    - Misc. Bug fixes associated with Realtime device.
  * Allow read operations to execute while an FICLONE ioctl is being serviced.
  * Misc. bug fixes
    - Alert user when xfs_droplink() encounters an inode with a link count of
      zero.
    - Handle the case where the allocator could return zero extents when
      servicing an fallocate request.

Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

----------------------------------------------------------------
Catherine Hoang (1):
      xfs: allow read IO and FICLONE to run concurrently

Chandan Babu R (6):
      Merge tag 'realtime-fixes-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA
      Merge tag 'clean-up-realtime-units-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA
      Merge tag 'refactor-rt-unit-conversions-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA
      Merge tag 'refactor-rtbitmap-macros-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA
      Merge tag 'refactor-rtbitmap-accessors-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA
      Merge tag 'rtalloc-speedups-6.7_2023-10-19' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.7-mergeA

Cheng Lin (1):
      xfs: introduce protection for drop nlink

Christoph Hellwig (1):
      xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space

Darrick J. Wong (30):
      xfs: bump max fsgeom struct version
      xfs: hoist freeing of rt data fork extent mappings
      xfs: prevent rt growfs when quota is enabled
      xfs: rt stubs should return negative errnos when rt disabled
      xfs: fix units conversion error in xfs_bmap_del_extent_delay
      xfs: make sure maxlen is still congruent with prod when rounding down
      xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h
      xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator
      xfs: create a helper to convert rtextents to rtblocks
      xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t
      xfs: create a helper to compute leftovers of realtime extents
      xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t
      xfs: create a helper to convert extlen to rtextlen
      xfs: rename xfs_verify_rtext to xfs_verify_rtbext
      xfs: create helpers to convert rt block numbers to rt extent numbers
      xfs: convert rt extent numbers to xfs_rtxnum_t
      xfs: convert do_div calls to xfs_rtb_to_rtx helper calls
      xfs: create rt extent rounding helpers for realtime extent blocks
      xfs: convert the rtbitmap block and bit macros to static inline functions
      xfs: use shifting and masking when converting rt extents, if possible
      xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros
      xfs: convert open-coded xfs_rtword_t pointer accesses to helper
      xfs: convert rt summary macros to helpers
      xfs: create a helper to handle logging parts of rt bitmap/summary blocks
      xfs: create helpers for rtbitmap block/wordcount computations
      xfs: use accessor functions for bitmap words
      xfs: create helpers for rtsummary block/wordcount computations
      xfs: use accessor functions for summary info words
      xfs: simplify xfs_rtbuf_get calling conventions
      xfs: simplify rt bitmap/summary block accessor functions

Dave Chinner (1):
      xfs: consolidate realtime allocation arguments

Omar Sandoval (6):
      xfs: cache last bitmap block in realtime allocator
      xfs: invert the realtime summary cache
      xfs: return maximum free size from xfs_rtany_summary()
      xfs: limit maxlen based on available space in xfs_rtallocate_extent_near()
      xfs: don't try redundant allocations in xfs_rtallocate_extent_near()
      xfs: don't look for end of extent further than necessary in xfs_rtallocate_extent_near()

 fs/xfs/libxfs/xfs_bmap.c       |  45 +--
 fs/xfs/libxfs/xfs_format.h     |  34 +-
 fs/xfs/libxfs/xfs_rtbitmap.c   | 803 ++++++++++++++++++++++-------------------
 fs/xfs/libxfs/xfs_rtbitmap.h   | 383 ++++++++++++++++++++
 fs/xfs/libxfs/xfs_sb.c         |   2 +
 fs/xfs/libxfs/xfs_sb.h         |   2 +-
 fs/xfs/libxfs/xfs_trans_resv.c |  10 +-
 fs/xfs/libxfs/xfs_types.c      |   4 +-
 fs/xfs/libxfs/xfs_types.h      |  10 +-
 fs/xfs/scrub/bmap.c            |   2 +-
 fs/xfs/scrub/fscounters.c      |   2 +-
 fs/xfs/scrub/inode.c           |   3 +-
 fs/xfs/scrub/rtbitmap.c        |  28 +-
 fs/xfs/scrub/rtsummary.c       |  72 ++--
 fs/xfs/scrub/trace.c           |   1 +
 fs/xfs/scrub/trace.h           |  15 +-
 fs/xfs/xfs_bmap_util.c         |  74 ++--
 fs/xfs/xfs_file.c              |  63 +++-
 fs/xfs/xfs_fsmap.c             |  15 +-
 fs/xfs/xfs_inode.c             |  24 ++
 fs/xfs/xfs_inode.h             |   9 +
 fs/xfs/xfs_inode_item.c        |   3 +-
 fs/xfs/xfs_ioctl.c             |   5 +-
 fs/xfs/xfs_linux.h             |  12 +
 fs/xfs/xfs_mount.h             |   8 +-
 fs/xfs/xfs_ondisk.h            |   4 +
 fs/xfs/xfs_reflink.c           |   4 +
 fs/xfs/xfs_rtalloc.c           | 626 ++++++++++++++++----------------
 fs/xfs/xfs_rtalloc.h           |  94 +----
 fs/xfs/xfs_super.c             |   3 +-
 fs/xfs/xfs_trans.c             |   7 +-
 31 files changed, 1425 insertions(+), 942 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_rtbitmap.h