Message ID | 20240611041540.495840-1-mjguzik@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | vfs: partially sanitize i_state zeroing on inode creation | expand |
On Tue 11-06-24 06:15:40, Mateusz Guzik wrote: > new_inode used to have the following: > spin_lock(&inode_lock); > inodes_stat.nr_inodes++; > list_add(&inode->i_list, &inode_in_use); > list_add(&inode->i_sb_list, &sb->s_inodes); > inode->i_ino = ++last_ino; > inode->i_state = 0; > spin_unlock(&inode_lock); > > over time things disappeared, got moved around or got replaced (global > inode lock with a per-inode lock), eventually this got reduced to: > spin_lock(&inode->i_lock); > inode->i_state = 0; > spin_unlock(&inode->i_lock); > > But the lock acquire here does not synchronize against anyone. > > Additionally iget5_locked performs i_state = 0 assignment without any > locks to begin with and the two combined look confusing at best. > > It looks like the current state is a leftover which was not cleaned up. > > Ideally it would be an invariant that i_state == 0 to begin with, but > achieving that would require dealing with all filesystem alloc handlers > one by one. > > In the meantime drop the misleading locking and move i_state zeroing to > alloc_inode so that others don't need to deal with it by hand. > > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Good point. But the initialization would seem more natural in inode_init_always(), wouldn't it? And that will also address your "FIXME" comment. Honza > --- > > I diffed this against fs-next + my inode hash patch as it adds one > i_state = 0 case. Should that patch not be accepted this bit can be > easily dropped from this one. > > I brought the entire thing up quite some time ago [1] and Dave Chinner > noted that perhaps the lock has a side effect of providing memory > barriers which otherwise would not be there and which are needed by > someone. > > For new_inode and alloc_inode consumers all fences are already there > anyway due to immediate lock usage. > > Arguably new_inode_pseudo escape without it but I don't find the code at > hand to be affected in any meanignful way -- the only 2 consumers > (get_pipe_inode and sock_alloc) perform numerous other stores to the > inode immediately after. By the time it gets added to anything looking > at i_state, flushing that should be handled by whatever thing which adds > it. Mentioning this just in case. > > [1] https://lore.kernel.org/all/CAGudoHF_Y0shcU+AMRRdN5RQgs9L_HHvBH8D4K=7_0X72kYy2g@mail.gmail.com/ > > fs/inode.c | 15 +++++---------- > 1 file changed, 5 insertions(+), 10 deletions(-) > > diff --git a/fs/inode.c b/fs/inode.c > index 149adf8ab0ea..3967e68311a6 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -276,6 +276,10 @@ static struct inode *alloc_inode(struct super_block *sb) > return NULL; > } > > + /* > + * FIXME: the code should be able to assert i_state == 0 instead. > + */ > + inode->i_state = 0; > return inode; > } > > @@ -1023,14 +1027,7 @@ EXPORT_SYMBOL(get_next_ino); > */ > struct inode *new_inode_pseudo(struct super_block *sb) > { > - struct inode *inode = alloc_inode(sb); > - > - if (inode) { > - spin_lock(&inode->i_lock); > - inode->i_state = 0; > - spin_unlock(&inode->i_lock); > - } > - return inode; > + return alloc_inode(sb); > } > > /** > @@ -1254,7 +1251,6 @@ struct inode *iget5_locked(struct super_block *sb, unsigned long hashval, > struct inode *new = alloc_inode(sb); > > if (new) { > - new->i_state = 0; > inode = inode_insert5(new, hashval, test, set, data); > if (unlikely(inode != new)) > destroy_inode(new); > @@ -1297,7 +1293,6 @@ struct inode *iget5_locked_rcu(struct super_block *sb, unsigned long hashval, > > new = alloc_inode(sb); > if (new) { > - new->i_state = 0; > inode = inode_insert5(new, hashval, test, set, data); > if (unlikely(inode != new)) > destroy_inode(new); > -- > 2.43.0 >
On Tue, Jun 11, 2024 at 12:02:22PM +0200, Jan Kara wrote: > On Tue 11-06-24 06:15:40, Mateusz Guzik wrote: > > new_inode used to have the following: > > spin_lock(&inode_lock); > > inodes_stat.nr_inodes++; > > list_add(&inode->i_list, &inode_in_use); > > list_add(&inode->i_sb_list, &sb->s_inodes); > > inode->i_ino = ++last_ino; > > inode->i_state = 0; > > spin_unlock(&inode_lock); > > > > over time things disappeared, got moved around or got replaced (global > > inode lock with a per-inode lock), eventually this got reduced to: > > spin_lock(&inode->i_lock); > > inode->i_state = 0; > > spin_unlock(&inode->i_lock); > > > > But the lock acquire here does not synchronize against anyone. > > > > Additionally iget5_locked performs i_state = 0 assignment without any > > locks to begin with and the two combined look confusing at best. > > > > It looks like the current state is a leftover which was not cleaned up. > > > > Ideally it would be an invariant that i_state == 0 to begin with, but > > achieving that would require dealing with all filesystem alloc handlers > > one by one. > > > > In the meantime drop the misleading locking and move i_state zeroing to > > alloc_inode so that others don't need to deal with it by hand. > > > > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> > > Good point. But the initialization would seem more natural in > inode_init_always(), wouldn't it? And that will also address your "FIXME" > comment. > My point is that by the time the inode is destroyed some of the fields like i_state should be set to a well-known value, this one preferably plain 0. I did not patch inode_init_always because it is exported and xfs uses it in 2 spots, only one of which zeroing the thing immediately after. Another one is a little more involved, it probably would not be a problem as the value is set altered later anyway, but I don't want to mess with semantics of the func if it can be easily avoided.
On Tue, Jun 11, 2024 at 12:23:59PM +0200, Mateusz Guzik wrote: > On Tue, Jun 11, 2024 at 12:02:22PM +0200, Jan Kara wrote: > > On Tue 11-06-24 06:15:40, Mateusz Guzik wrote: > > > new_inode used to have the following: > > > spin_lock(&inode_lock); > > > inodes_stat.nr_inodes++; > > > list_add(&inode->i_list, &inode_in_use); > > > list_add(&inode->i_sb_list, &sb->s_inodes); > > > inode->i_ino = ++last_ino; > > > inode->i_state = 0; > > > spin_unlock(&inode_lock); > > > > > > over time things disappeared, got moved around or got replaced (global > > > inode lock with a per-inode lock), eventually this got reduced to: > > > spin_lock(&inode->i_lock); > > > inode->i_state = 0; > > > spin_unlock(&inode->i_lock); > > > > > > But the lock acquire here does not synchronize against anyone. > > > > > > Additionally iget5_locked performs i_state = 0 assignment without any > > > locks to begin with and the two combined look confusing at best. > > > > > > It looks like the current state is a leftover which was not cleaned up. > > > > > > Ideally it would be an invariant that i_state == 0 to begin with, but > > > achieving that would require dealing with all filesystem alloc handlers > > > one by one. > > > > > > In the meantime drop the misleading locking and move i_state zeroing to > > > alloc_inode so that others don't need to deal with it by hand. > > > > > > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> > > > > Good point. But the initialization would seem more natural in > > inode_init_always(), wouldn't it? And that will also address your "FIXME" > > comment. > > > > My point is that by the time the inode is destroyed some of the fields > like i_state should be set to a well-known value, this one preferably > plain 0. > > I did not patch inode_init_always because it is exported and xfs uses it > in 2 spots, only one of which zeroing the thing immediately after. > Another one is a little more involved, it probably would not be a > problem as the value is set altered later anyway, but I don't want to > mess with semantics of the func if it can be easily avoided. Better to move the zeroing to inode_init_always(), do the basic save/restore mod to xfs_reinit_inode(), and let us XFS people worry about whether inode_init_always() is the right thing to be calling in their code... All you'd need to do in xfs_reinit_inode() is this + unsigned long state = inode->i_state; ..... error = inode_init_always(mp->m_super, inode); ..... + inode->i_state = state; ..... And it should behave as expected. -Dave.
On Tue 11-06-24 12:23:59, Mateusz Guzik wrote: > On Tue, Jun 11, 2024 at 12:02:22PM +0200, Jan Kara wrote: > > On Tue 11-06-24 06:15:40, Mateusz Guzik wrote: > > > new_inode used to have the following: > > > spin_lock(&inode_lock); > > > inodes_stat.nr_inodes++; > > > list_add(&inode->i_list, &inode_in_use); > > > list_add(&inode->i_sb_list, &sb->s_inodes); > > > inode->i_ino = ++last_ino; > > > inode->i_state = 0; > > > spin_unlock(&inode_lock); > > > > > > over time things disappeared, got moved around or got replaced (global > > > inode lock with a per-inode lock), eventually this got reduced to: > > > spin_lock(&inode->i_lock); > > > inode->i_state = 0; > > > spin_unlock(&inode->i_lock); > > > > > > But the lock acquire here does not synchronize against anyone. > > > > > > Additionally iget5_locked performs i_state = 0 assignment without any > > > locks to begin with and the two combined look confusing at best. > > > > > > It looks like the current state is a leftover which was not cleaned up. > > > > > > Ideally it would be an invariant that i_state == 0 to begin with, but > > > achieving that would require dealing with all filesystem alloc handlers > > > one by one. > > > > > > In the meantime drop the misleading locking and move i_state zeroing to > > > alloc_inode so that others don't need to deal with it by hand. > > > > > > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> > > > > Good point. But the initialization would seem more natural in > > inode_init_always(), wouldn't it? And that will also address your "FIXME" > > comment. > > > > My point is that by the time the inode is destroyed some of the fields > like i_state should be set to a well-known value, this one preferably > plain 0. Well, i_state is set to a more or less well defined value but it is not zero. I don't see a performance difference in whether set it to 0 on freeing or on allocation and on allocation it is actually much easier to find when reading the code. > I did not patch inode_init_always because it is exported and xfs uses it > in 2 spots, only one of which zeroing the thing immediately after. > Another one is a little more involved, it probably would not be a > problem as the value is set altered later anyway, but I don't want to > mess with semantics of the func if it can be easily avoided. Well, I'd consider that as another good reason to actually clean this up. Look, inode_init_always() is used in bcachefs and xfs. bcachefs sets i_state to 0 just before calling inode_init_always(), xfs just after one call of inode_init_always() and the call in xfs_reinit_inode() is used only from xfs_iget_recycle() which sets i_state to I_NEW. So I claim that moving i_state clearing to inode_init_always() will not cause any issue and is actually desirable. Honza
On Tue, Jun 11, 2024 at 08:56:40PM +1000, Dave Chinner wrote: > On Tue, Jun 11, 2024 at 12:23:59PM +0200, Mateusz Guzik wrote: > > I did not patch inode_init_always because it is exported and xfs uses it > > in 2 spots, only one of which zeroing the thing immediately after. > > Another one is a little more involved, it probably would not be a > > problem as the value is set altered later anyway, but I don't want to > > mess with semantics of the func if it can be easily avoided. > > Better to move the zeroing to inode_init_always(), do the basic > save/restore mod to xfs_reinit_inode(), and let us XFS people worry > about whether inode_init_always() is the right thing to be calling > in their code... > > All you'd need to do in xfs_reinit_inode() is this > > + unsigned long state = inode->i_state; > > ..... > error = inode_init_always(mp->m_super, inode); > ..... > + inode->i_state = state; > ..... > > And it should behave as expected. > Ok, so what would be the logistics of submitting this? Can I submit one patch which includes the above + i_state moved to inode_init_always? Do I instead ship a two-part patchset, starting with the xfs change and stating it was your idea? Something else? Fwiw inode_init_always consumer rundown is: - fs/inode.c which is automagically covered - bcachefs pre-zeroing state before even calling inode_init_always - xfs with one spot which zeroes immediately after the call - xfs with one spot which possibly avoids zeroing
On Tue, Jun 11, 2024 at 01:05:05PM +0200, Jan Kara wrote: > On Tue 11-06-24 12:23:59, Mateusz Guzik wrote: > > On Tue, Jun 11, 2024 at 12:02:22PM +0200, Jan Kara wrote: > > > On Tue 11-06-24 06:15:40, Mateusz Guzik wrote: > > > > new_inode used to have the following: > > > > spin_lock(&inode_lock); > > > > inodes_stat.nr_inodes++; > > > > list_add(&inode->i_list, &inode_in_use); > > > > list_add(&inode->i_sb_list, &sb->s_inodes); > > > > inode->i_ino = ++last_ino; > > > > inode->i_state = 0; > > > > spin_unlock(&inode_lock); > > > > > > > > over time things disappeared, got moved around or got replaced (global > > > > inode lock with a per-inode lock), eventually this got reduced to: > > > > spin_lock(&inode->i_lock); > > > > inode->i_state = 0; > > > > spin_unlock(&inode->i_lock); > > > > > > > > But the lock acquire here does not synchronize against anyone. > > > > > > > > Additionally iget5_locked performs i_state = 0 assignment without any > > > > locks to begin with and the two combined look confusing at best. > > > > > > > > It looks like the current state is a leftover which was not cleaned up. > > > > > > > > Ideally it would be an invariant that i_state == 0 to begin with, but > > > > achieving that would require dealing with all filesystem alloc handlers > > > > one by one. > > > > > > > > In the meantime drop the misleading locking and move i_state zeroing to > > > > alloc_inode so that others don't need to deal with it by hand. > > > > > > > > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> > > > > > > Good point. But the initialization would seem more natural in > > > inode_init_always(), wouldn't it? And that will also address your "FIXME" > > > comment. > > > > > > > My point is that by the time the inode is destroyed some of the fields > > like i_state should be set to a well-known value, this one preferably > > plain 0. > > Well, i_state is set to a more or less well defined value but it is not > zero. I don't see a performance difference in whether set it to 0 on > freeing or on allocation and on allocation it is actually much easier to > find when reading the code. > I was thinking more about assertion potential, not anything perf-related, but it is a moot subject now. > > I did not patch inode_init_always because it is exported and xfs uses it > > in 2 spots, only one of which zeroing the thing immediately after. > > Another one is a little more involved, it probably would not be a > > problem as the value is set altered later anyway, but I don't want to > > mess with semantics of the func if it can be easily avoided. > > Well, I'd consider that as another good reason to actually clean this up. > Look, inode_init_always() is used in bcachefs and xfs. bcachefs sets > i_state to 0 just before calling inode_init_always(), xfs just after one > call of inode_init_always() and the call in xfs_reinit_inode() is used > only from xfs_iget_recycle() which sets i_state to I_NEW. So I claim that > moving i_state clearing to inode_init_always() will not cause any issue and > is actually desirable. > Ok, see my reply to Dave's e-mail. Just tell me how to ship this and I'll do the needful(tm).
On Tue 11-06-24 13:26:45, Mateusz Guzik wrote: > On Tue, Jun 11, 2024 at 08:56:40PM +1000, Dave Chinner wrote: > > On Tue, Jun 11, 2024 at 12:23:59PM +0200, Mateusz Guzik wrote: > > > I did not patch inode_init_always because it is exported and xfs uses it > > > in 2 spots, only one of which zeroing the thing immediately after. > > > Another one is a little more involved, it probably would not be a > > > problem as the value is set altered later anyway, but I don't want to > > > mess with semantics of the func if it can be easily avoided. > > > > Better to move the zeroing to inode_init_always(), do the basic > > save/restore mod to xfs_reinit_inode(), and let us XFS people worry > > about whether inode_init_always() is the right thing to be calling > > in their code... > > > > All you'd need to do in xfs_reinit_inode() is this > > > > + unsigned long state = inode->i_state; > > > > ..... > > error = inode_init_always(mp->m_super, inode); > > ..... > > + inode->i_state = state; > > ..... > > > > And it should behave as expected. > > > > Ok, so what would be the logistics of submitting this? > > Can I submit one patch which includes the above + i_state moved to > inode_init_always? > > Do I instead ship a two-part patchset, starting with the xfs change and > stating it was your idea? > > Something else? Well, I'd do it as 4 patches actually: 1) xfs i_state workaround in xfs_reinit_inode() 2) add i_state zeroing to inode_init_always(), drop pointless zeroing from VFS. 3) drop now pointless zeroing from xfs 4) drop now pointless zeroing from bcachefs This way also respective maintainers can easily ack the bits they care about. I can live with two as you suggest since the changes are tiny but four is IMHO a "proper" way to do things ;). Honza
diff --git a/fs/inode.c b/fs/inode.c index 149adf8ab0ea..3967e68311a6 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -276,6 +276,10 @@ static struct inode *alloc_inode(struct super_block *sb) return NULL; } + /* + * FIXME: the code should be able to assert i_state == 0 instead. + */ + inode->i_state = 0; return inode; } @@ -1023,14 +1027,7 @@ EXPORT_SYMBOL(get_next_ino); */ struct inode *new_inode_pseudo(struct super_block *sb) { - struct inode *inode = alloc_inode(sb); - - if (inode) { - spin_lock(&inode->i_lock); - inode->i_state = 0; - spin_unlock(&inode->i_lock); - } - return inode; + return alloc_inode(sb); } /** @@ -1254,7 +1251,6 @@ struct inode *iget5_locked(struct super_block *sb, unsigned long hashval, struct inode *new = alloc_inode(sb); if (new) { - new->i_state = 0; inode = inode_insert5(new, hashval, test, set, data); if (unlikely(inode != new)) destroy_inode(new); @@ -1297,7 +1293,6 @@ struct inode *iget5_locked_rcu(struct super_block *sb, unsigned long hashval, new = alloc_inode(sb); if (new) { - new->i_state = 0; inode = inode_insert5(new, hashval, test, set, data); if (unlikely(inode != new)) destroy_inode(new);
new_inode used to have the following: spin_lock(&inode_lock); inodes_stat.nr_inodes++; list_add(&inode->i_list, &inode_in_use); list_add(&inode->i_sb_list, &sb->s_inodes); inode->i_ino = ++last_ino; inode->i_state = 0; spin_unlock(&inode_lock); over time things disappeared, got moved around or got replaced (global inode lock with a per-inode lock), eventually this got reduced to: spin_lock(&inode->i_lock); inode->i_state = 0; spin_unlock(&inode->i_lock); But the lock acquire here does not synchronize against anyone. Additionally iget5_locked performs i_state = 0 assignment without any locks to begin with and the two combined look confusing at best. It looks like the current state is a leftover which was not cleaned up. Ideally it would be an invariant that i_state == 0 to begin with, but achieving that would require dealing with all filesystem alloc handlers one by one. In the meantime drop the misleading locking and move i_state zeroing to alloc_inode so that others don't need to deal with it by hand. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> --- I diffed this against fs-next + my inode hash patch as it adds one i_state = 0 case. Should that patch not be accepted this bit can be easily dropped from this one. I brought the entire thing up quite some time ago [1] and Dave Chinner noted that perhaps the lock has a side effect of providing memory barriers which otherwise would not be there and which are needed by someone. For new_inode and alloc_inode consumers all fences are already there anyway due to immediate lock usage. Arguably new_inode_pseudo escape without it but I don't find the code at hand to be affected in any meanignful way -- the only 2 consumers (get_pipe_inode and sock_alloc) perform numerous other stores to the inode immediately after. By the time it gets added to anything looking at i_state, flushing that should be handled by whatever thing which adds it. Mentioning this just in case. [1] https://lore.kernel.org/all/CAGudoHF_Y0shcU+AMRRdN5RQgs9L_HHvBH8D4K=7_0X72kYy2g@mail.gmail.com/ fs/inode.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-)