Message ID | 20180111125223.10727-1-chandan@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Thursday, January 11, 2018 6:22:23 PM IST Chandan Rajendra wrote: > generic/388 can cause the following "use after free" error to occur, > > ============================================================================= > BUG xfs_efi_item (Not tainted): Poison overwritten > ----------------------------------------------------------------------------- > > Disabling lock debugging due to kernel taint > INFO: 0x00000000292c4bd4-0x00000000292c4bd4. First byte 0x6a instead of 0x6b > INFO: Allocated in .kmem_zone_alloc+0xcc/0x190 age=79 cpu=0 pid=12436 > .__slab_alloc+0x54/0x80 > .kmem_cache_alloc+0x124/0x350 > .kmem_zone_alloc+0xcc/0x190 > .xfs_efi_init+0x48/0xf0 > .xfs_extent_free_create_intent+0x40/0x130 > .xfs_defer_intake_work+0x74/0x1e0 > .xfs_defer_finish+0xac/0x5c0 > .xfs_itruncate_extents+0x170/0x590 > .xfs_inactive_truncate+0xcc/0x170 > .xfs_inactive+0x1d8/0x2f0 > .xfs_fs_destroy_inode+0xe4/0x3d0 > .destroy_inode+0x68/0xb0 > .do_unlinkat+0x1e8/0x390 > system_call+0x58/0x6c > INFO: Freed in .xfs_efi_item_free+0x44/0x80 age=79 cpu=0 pid=12436 > .kmem_cache_free+0x120/0x2b0 > .xfs_efi_item_free+0x44/0x80 > .xfs_trans_free_items+0xd4/0x130 > .__xfs_trans_commit+0xd0/0x350 > .xfs_trans_roll+0x4c/0x90 > .xfs_defer_trans_roll+0xa4/0x2b0 > .xfs_defer_finish+0xb8/0x5c0 > .xfs_itruncate_extents+0x170/0x590 > .xfs_inactive_truncate+0xcc/0x170 > .xfs_inactive+0x1d8/0x2f0 > .xfs_fs_destroy_inode+0xe4/0x3d0 > .destroy_inode+0x68/0xb0 > .do_unlinkat+0x1e8/0x390 > system_call+0x58/0x6c > > This happens due to the following interaction, > 1. xfs_defer_finish() creates "extent free" intent item and adds it to the > list at xfs_trans->t_items. > 2. xfs_defer_trans_roll() invokes __xfs_trans_commit(). Here, if the > XFS_MOUNT_FS_SHUTDOWN flag is set, we invoke io_unlock() operation of the > corresponding log item. For "extent free" log items xfs_efi_item_unlock() > gets invoked which frees the xfs_efi_log_item. > 3. xfs_defer_trans_roll() then invokes xfs_defer_trans_abort(). Since the > xfs_defer_pending->dfp_intent is still set to the "extent free" intent > item, we invoke xfs_extent_free_abort_intent(). This accesses the > previously freed xfs_efi_log_item to decrement the ref count. > > This commit fixes the bug by changing xfs_efi_item_unlock() to invoke > xfs_efi_release() instead of xfs_efi_item_free(). The xfs_efi_log_item gets > freed when xfs_extent_free_abort_intent() invokes xfs_efi_release() once again > i.e. its refcount becomes zero and hence xfs_efi_item_free() is invoked. > > Reported-by: Christoph Hellwig <hch@lst.de> > Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> > --- > > The above mentioned "use after free" occurs with other log items. > (e.g. xfs_rui_item). If the below fix is found to be correct, I will > apply it to other affected log items as well and post a new patch. > I am assuming that the refcount for xfs_efi_item is set to 2 to account for, 1. Linking the corresponding log descriptor item to the transaction's t_items list. 2. For storing the pointer to the intent item at xfs_defer_pending->dfp_intent. Please let me know if my assumption is incorrect.
On Thu, Jan 11, 2018 at 06:22:23PM +0530, Chandan Rajendra wrote: > generic/388 can cause the following "use after free" error to occur, > > ============================================================================= > BUG xfs_efi_item (Not tainted): Poison overwritten > ----------------------------------------------------------------------------- > > Disabling lock debugging due to kernel taint > INFO: 0x00000000292c4bd4-0x00000000292c4bd4. First byte 0x6a instead of 0x6b > INFO: Allocated in .kmem_zone_alloc+0xcc/0x190 age=79 cpu=0 pid=12436 > .__slab_alloc+0x54/0x80 > .kmem_cache_alloc+0x124/0x350 > .kmem_zone_alloc+0xcc/0x190 > .xfs_efi_init+0x48/0xf0 > .xfs_extent_free_create_intent+0x40/0x130 > .xfs_defer_intake_work+0x74/0x1e0 > .xfs_defer_finish+0xac/0x5c0 > .xfs_itruncate_extents+0x170/0x590 > .xfs_inactive_truncate+0xcc/0x170 > .xfs_inactive+0x1d8/0x2f0 > .xfs_fs_destroy_inode+0xe4/0x3d0 > .destroy_inode+0x68/0xb0 > .do_unlinkat+0x1e8/0x390 > system_call+0x58/0x6c > INFO: Freed in .xfs_efi_item_free+0x44/0x80 age=79 cpu=0 pid=12436 > .kmem_cache_free+0x120/0x2b0 > .xfs_efi_item_free+0x44/0x80 > .xfs_trans_free_items+0xd4/0x130 > .__xfs_trans_commit+0xd0/0x350 > .xfs_trans_roll+0x4c/0x90 > .xfs_defer_trans_roll+0xa4/0x2b0 > .xfs_defer_finish+0xb8/0x5c0 > .xfs_itruncate_extents+0x170/0x590 > .xfs_inactive_truncate+0xcc/0x170 > .xfs_inactive+0x1d8/0x2f0 > .xfs_fs_destroy_inode+0xe4/0x3d0 > .destroy_inode+0x68/0xb0 > .do_unlinkat+0x1e8/0x390 > system_call+0x58/0x6c > > This happens due to the following interaction, > 1. xfs_defer_finish() creates "extent free" intent item and adds it to the > list at xfs_trans->t_items. > 2. xfs_defer_trans_roll() invokes __xfs_trans_commit(). Here, if the > XFS_MOUNT_FS_SHUTDOWN flag is set, we invoke io_unlock() operation of the > corresponding log item. For "extent free" log items xfs_efi_item_unlock() > gets invoked which frees the xfs_efi_log_item. > 3. xfs_defer_trans_roll() then invokes xfs_defer_trans_abort(). Since the > xfs_defer_pending->dfp_intent is still set to the "extent free" intent > item, we invoke xfs_extent_free_abort_intent(). This accesses the > previously freed xfs_efi_log_item to decrement the ref count. > > This commit fixes the bug by changing xfs_efi_item_unlock() to invoke > xfs_efi_release() instead of xfs_efi_item_free(). The xfs_efi_log_item gets > freed when xfs_extent_free_abort_intent() invokes xfs_efi_release() once again > i.e. its refcount becomes zero and hence xfs_efi_item_free() is invoked. > Interesting.. trying to refamiliarize myself with the expected reference counting rules here, I see that this goes back to the old xfs_bmap_finish() mechanism. That guy would grab an EFI, log it and then attempt to roll/commit the transaction. IIRC, the direct release covers the case where that tx commit fails. That is essentially equivalent to a tx cancel in this example, so the tx is freed and xfs_bmap_finish() is done (returns with an error). The ->iop_unlock() frees the EFI in this case explicitly because 1.) the refcount is initialized to 2, one for the log and one for the EFD and 2.) the EFD is never created in this situation. Given that, I suspect this code was correct with the bmap_finish() code and became a problem sometime later. Have you done any investigation to confirm/deny whether this is a regression since v4.7 or so? Moving along to the current code... xfs_defer_finish() creates the EFI, logs it, etc. and moves on to xfs_trans_roll(). If the tx commit fails, we're in a similar abort/cancel situation as before. The deferred ops code tracks the EFI within the dfops data structures, however, so it calls into the ->abort_intent() callout to drop the other reference and we hit this problem. AFAICT, the abort handler covers the only situation where an intent would be left hanging around with an extra reference for the done item but without the done item in existence yet. Given that, I think it's safe to change the intent log state machine handling as this patch does. The caveat is that with this change it is now required that the committer of a failed transaction is always responsible for releasing the reference on intent items for the associated done items that are not going to be created (where it wasn't required before). IOW, I suspect we could just as safely remove the abort handling code to resolve the problem. Since dfops seems to handle the situation correctly, I don't really feel strongly either way. I do think that this patch should at least: - document the historical context in the commit log, provided the above can be verified (i.e., why/since when is this a problem?) - update the comments in the intent item unlock handlers and probably the dfops code to make sure the reference counting rules are clear - include the same updates for other applicable items in the same patch ... and finally, this should be tested against a frequent shutdown workload if not done so already. E.g., run 'while [ true ]; do fsstress; umount; mount; done' in one thread and 'while [ true ]; do sleep N; xfs_io -xc "shutdown -f" ...; done' in another and verify we don't explode/hang. Brian > Reported-by: Christoph Hellwig <hch@lst.de> > Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> > --- > > The above mentioned "use after free" occurs with other log items. > (e.g. xfs_rui_item). If the below fix is found to be correct, I will > apply it to other affected log items as well and post a new patch. > > fs/xfs/xfs_extfree_item.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c > index 64da906..2119441 100644 > --- a/fs/xfs/xfs_extfree_item.c > +++ b/fs/xfs/xfs_extfree_item.c > @@ -151,7 +151,7 @@ xfs_efi_item_unlock( > struct xfs_log_item *lip) > { > if (lip->li_flags & XFS_LI_ABORTED) > - xfs_efi_item_free(EFI_ITEM(lip)); > + xfs_efi_release(EFI_ITEM(lip)); > } > > /* > -- > 2.9.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, January 12, 2018 8:32:24 PM IST Brian Foster wrote: > On Thu, Jan 11, 2018 at 06:22:23PM +0530, Chandan Rajendra wrote: > > generic/388 can cause the following "use after free" error to occur, > > > > ============================================================================= > > BUG xfs_efi_item (Not tainted): Poison overwritten > > ----------------------------------------------------------------------------- > > > > Disabling lock debugging due to kernel taint > > INFO: 0x00000000292c4bd4-0x00000000292c4bd4. First byte 0x6a instead of 0x6b > > INFO: Allocated in .kmem_zone_alloc+0xcc/0x190 age=79 cpu=0 pid=12436 > > .__slab_alloc+0x54/0x80 > > .kmem_cache_alloc+0x124/0x350 > > .kmem_zone_alloc+0xcc/0x190 > > .xfs_efi_init+0x48/0xf0 > > .xfs_extent_free_create_intent+0x40/0x130 > > .xfs_defer_intake_work+0x74/0x1e0 > > .xfs_defer_finish+0xac/0x5c0 > > .xfs_itruncate_extents+0x170/0x590 > > .xfs_inactive_truncate+0xcc/0x170 > > .xfs_inactive+0x1d8/0x2f0 > > .xfs_fs_destroy_inode+0xe4/0x3d0 > > .destroy_inode+0x68/0xb0 > > .do_unlinkat+0x1e8/0x390 > > system_call+0x58/0x6c > > INFO: Freed in .xfs_efi_item_free+0x44/0x80 age=79 cpu=0 pid=12436 > > .kmem_cache_free+0x120/0x2b0 > > .xfs_efi_item_free+0x44/0x80 > > .xfs_trans_free_items+0xd4/0x130 > > .__xfs_trans_commit+0xd0/0x350 > > .xfs_trans_roll+0x4c/0x90 > > .xfs_defer_trans_roll+0xa4/0x2b0 > > .xfs_defer_finish+0xb8/0x5c0 > > .xfs_itruncate_extents+0x170/0x590 > > .xfs_inactive_truncate+0xcc/0x170 > > .xfs_inactive+0x1d8/0x2f0 > > .xfs_fs_destroy_inode+0xe4/0x3d0 > > .destroy_inode+0x68/0xb0 > > .do_unlinkat+0x1e8/0x390 > > system_call+0x58/0x6c > > > > This happens due to the following interaction, > > 1. xfs_defer_finish() creates "extent free" intent item and adds it to the > > list at xfs_trans->t_items. > > 2. xfs_defer_trans_roll() invokes __xfs_trans_commit(). Here, if the > > XFS_MOUNT_FS_SHUTDOWN flag is set, we invoke io_unlock() operation of the > > corresponding log item. For "extent free" log items xfs_efi_item_unlock() > > gets invoked which frees the xfs_efi_log_item. > > 3. xfs_defer_trans_roll() then invokes xfs_defer_trans_abort(). Since the > > xfs_defer_pending->dfp_intent is still set to the "extent free" intent > > item, we invoke xfs_extent_free_abort_intent(). This accesses the > > previously freed xfs_efi_log_item to decrement the ref count. > > > > This commit fixes the bug by changing xfs_efi_item_unlock() to invoke > > xfs_efi_release() instead of xfs_efi_item_free(). The xfs_efi_log_item gets > > freed when xfs_extent_free_abort_intent() invokes xfs_efi_release() once again > > i.e. its refcount becomes zero and hence xfs_efi_item_free() is invoked. > > > > Interesting.. trying to refamiliarize myself with the expected reference > counting rules here, I see that this goes back to the old > xfs_bmap_finish() mechanism. That guy would grab an EFI, log it and then > attempt to roll/commit the transaction. IIRC, the direct release covers > the case where that tx commit fails. That is essentially equivalent to a > tx cancel in this example, so the tx is freed and xfs_bmap_finish() is > done (returns with an error). The ->iop_unlock() frees the EFI in this > case explicitly because 1.) the refcount is initialized to 2, one for > the log and one for the EFD and 2.) the EFD is never created in this > situation. > > Given that, I suspect this code was correct with the bmap_finish() code > and became a problem sometime later. Have you done any investigation to > confirm/deny whether this is a regression since v4.7 or so? You are right. The code was working fine with v4.6. IMHO, this was due the "committed" argument that was passed to __xfs_trans_roll(). This argument tracked if the log items were committed to the CIL. If __xfs_trans_roll() did return an error and "committed" was set to true, we just release one of the refcounts of the efi, since the efd will never be created. The final refcount will be released when the EFI is added to the AIL. If __xfs_trans_roll() did return an error and "committed" was set to false, we end up freeing the efi via ->iop_unlock() function and no action is taken in xfs_bmap_finish(). However, with the present code xfs_defer_trans_roll() ends up unconditionally (well almost; It does check for the presence of a non-NULL pointer at xfs_defer_pending->dfp_intent) invoking xfs_extent_free_abort_intent(), which access the freed memory. I am working on a new patch for fixing this. Thanks for guidance.
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 64da906..2119441 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -151,7 +151,7 @@ xfs_efi_item_unlock( struct xfs_log_item *lip) { if (lip->li_flags & XFS_LI_ABORTED) - xfs_efi_item_free(EFI_ITEM(lip)); + xfs_efi_release(EFI_ITEM(lip)); } /*
generic/388 can cause the following "use after free" error to occur, ============================================================================= BUG xfs_efi_item (Not tainted): Poison overwritten ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0x00000000292c4bd4-0x00000000292c4bd4. First byte 0x6a instead of 0x6b INFO: Allocated in .kmem_zone_alloc+0xcc/0x190 age=79 cpu=0 pid=12436 .__slab_alloc+0x54/0x80 .kmem_cache_alloc+0x124/0x350 .kmem_zone_alloc+0xcc/0x190 .xfs_efi_init+0x48/0xf0 .xfs_extent_free_create_intent+0x40/0x130 .xfs_defer_intake_work+0x74/0x1e0 .xfs_defer_finish+0xac/0x5c0 .xfs_itruncate_extents+0x170/0x590 .xfs_inactive_truncate+0xcc/0x170 .xfs_inactive+0x1d8/0x2f0 .xfs_fs_destroy_inode+0xe4/0x3d0 .destroy_inode+0x68/0xb0 .do_unlinkat+0x1e8/0x390 system_call+0x58/0x6c INFO: Freed in .xfs_efi_item_free+0x44/0x80 age=79 cpu=0 pid=12436 .kmem_cache_free+0x120/0x2b0 .xfs_efi_item_free+0x44/0x80 .xfs_trans_free_items+0xd4/0x130 .__xfs_trans_commit+0xd0/0x350 .xfs_trans_roll+0x4c/0x90 .xfs_defer_trans_roll+0xa4/0x2b0 .xfs_defer_finish+0xb8/0x5c0 .xfs_itruncate_extents+0x170/0x590 .xfs_inactive_truncate+0xcc/0x170 .xfs_inactive+0x1d8/0x2f0 .xfs_fs_destroy_inode+0xe4/0x3d0 .destroy_inode+0x68/0xb0 .do_unlinkat+0x1e8/0x390 system_call+0x58/0x6c This happens due to the following interaction, 1. xfs_defer_finish() creates "extent free" intent item and adds it to the list at xfs_trans->t_items. 2. xfs_defer_trans_roll() invokes __xfs_trans_commit(). Here, if the XFS_MOUNT_FS_SHUTDOWN flag is set, we invoke io_unlock() operation of the corresponding log item. For "extent free" log items xfs_efi_item_unlock() gets invoked which frees the xfs_efi_log_item. 3. xfs_defer_trans_roll() then invokes xfs_defer_trans_abort(). Since the xfs_defer_pending->dfp_intent is still set to the "extent free" intent item, we invoke xfs_extent_free_abort_intent(). This accesses the previously freed xfs_efi_log_item to decrement the ref count. This commit fixes the bug by changing xfs_efi_item_unlock() to invoke xfs_efi_release() instead of xfs_efi_item_free(). The xfs_efi_log_item gets freed when xfs_extent_free_abort_intent() invokes xfs_efi_release() once again i.e. its refcount becomes zero and hence xfs_efi_item_free() is invoked. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> --- The above mentioned "use after free" occurs with other log items. (e.g. xfs_rui_item). If the below fix is found to be correct, I will apply it to other affected log items as well and post a new patch. fs/xfs/xfs_extfree_item.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)