diff mbox

Btrfs: put delayed iput tracking list inside in-memory inode

Message ID 1354274683-17762-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Liu Bo Nov. 30, 2012, 11:24 a.m. UTC
This can save us a dynamic memory allocation/free.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/btrfs_inode.h |    3 +++
 fs/btrfs/inode.c       |   36 +++++++++++++-----------------------
 2 files changed, 16 insertions(+), 23 deletions(-)

Comments

Josef Bacik Dec. 11, 2012, 2:57 p.m. UTC | #1
On Fri, Nov 30, 2012 at 04:24:43AM -0700, Liu Bo wrote:
> This can save us a dynamic memory allocation/free.
> 

You can have multiple outstanding delayed iputs per inode, so this will result
in inodes still being in use on unmount, so this isn't going to work.

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Dec. 12, 2012, 1:50 a.m. UTC | #2
On Tue, Dec 11, 2012 at 09:57:50AM -0500, Josef Bacik wrote:
> On Fri, Nov 30, 2012 at 04:24:43AM -0700, Liu Bo wrote:
> > This can save us a dynamic memory allocation/free.
> > 
> 
> You can have multiple outstanding delayed iputs per inode, so this will result
> in inodes still being in use on unmount, so this isn't going to work.

Yeah, you're right, but what if we add a check

if (list_empty(&bi->iput_list))
	list_add(&bi->iput_list, &fs_info->iput_list));

Am I missing something?

thanks,
liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik Dec. 12, 2012, 2:47 p.m. UTC | #3
On Tue, Dec 11, 2012 at 06:50:40PM -0700, Liu Bo wrote:
> On Tue, Dec 11, 2012 at 09:57:50AM -0500, Josef Bacik wrote:
> > On Fri, Nov 30, 2012 at 04:24:43AM -0700, Liu Bo wrote:
> > > This can save us a dynamic memory allocation/free.
> > > 
> > 
> > You can have multiple outstanding delayed iputs per inode, so this will result
> > in inodes still being in use on unmount, so this isn't going to work.
> 
> Yeah, you're right, but what if we add a check
> 
> if (list_empty(&bi->iput_list))
> 	list_add(&bi->iput_list, &fs_info->iput_list));
> 
> Am I missing something?
> 

Yes, we have the delayed iput stuff to keep us from running the actual iput
stuff where we are for locking reasons.  If we do the if (list_empty) check then
we have to do something for the case wehre the list isn't empty so we don't leak
references to the inode, which would basically mean doing an iput which is what
we don't want to do.  We could do a hybrid approach where we do your list and
then allocate if we're already on the list, or add a counter so we know how many
references to drop.  But this as it is won't work.  (Keep in mind I've been
buried in tree logging for 3 months so if I'm wrong bear with me).  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Dec. 12, 2012, 4:05 p.m. UTC | #4
On Wed, Dec 12, 2012 at 09:47:11AM -0500, Josef Bacik wrote:
> On Tue, Dec 11, 2012 at 06:50:40PM -0700, Liu Bo wrote:
> > On Tue, Dec 11, 2012 at 09:57:50AM -0500, Josef Bacik wrote:
> > > On Fri, Nov 30, 2012 at 04:24:43AM -0700, Liu Bo wrote:
> > > > This can save us a dynamic memory allocation/free.
> > > > 
> > > 
> > > You can have multiple outstanding delayed iputs per inode, so this will result
> > > in inodes still being in use on unmount, so this isn't going to work.
> > 
> > Yeah, you're right, but what if we add a check
> > 
> > if (list_empty(&bi->iput_list))
> > 	list_add(&bi->iput_list, &fs_info->iput_list));
> > 
> > Am I missing something?
> > 
> 
> Yes, we have the delayed iput stuff to keep us from running the actual iput
> stuff where we are for locking reasons.  If we do the if (list_empty) check then
> we have to do something for the case wehre the list isn't empty so we don't leak
> references to the inode, which would basically mean doing an iput which is what
> we don't want to do.  We could do a hybrid approach where we do your list and
> then allocate if we're already on the list, or add a counter so we know how many
> references to drop.  But this as it is won't work.  (Keep in mind I've been
> buried in tree logging for 3 months so if I'm wrong bear with me).  Thanks,

I got your point, I'll add a counter here to tracking the dropped reference per
inode.

Mainly I was thinking memory allocation/free is just way too expensive. 

Thanks for reviewing it.

thanks,
liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index ed8ca7c..446ce73 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -79,6 +79,9 @@  struct btrfs_inode {
 	 */
 	struct list_head delalloc_inodes;
 
+	/* for delayed iput */
+	struct list_head iput_list;
+
 	/*
 	 * list for tracking inodes that must be sent to disk before a
 	 * rename or truncate commit
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95542a1..0c7161d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2070,34 +2070,23 @@  zeroit:
 	return -EIO;
 }
 
-struct delayed_iput {
-	struct list_head list;
-	struct inode *inode;
-};
-
-/* JDM: If this is fs-wide, why can't we add a pointer to
- * btrfs_inode instead and avoid the allocation? */
 void btrfs_add_delayed_iput(struct inode *inode)
 {
-	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
-	struct delayed_iput *delayed;
-
-	if (atomic_add_unless(&inode->i_count, -1, 1))
-		return;
-
-	delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
-	delayed->inode = inode;
+	if (atomic_add_unless(&inode->i_count, -1, 1) == 0) {
+		struct btrfs_inode *bi = BTRFS_I(inode);
+		struct btrfs_fs_info *fs_info = bi->root->fs_info;
 
-	spin_lock(&fs_info->delayed_iput_lock);
-	list_add_tail(&delayed->list, &fs_info->delayed_iputs);
-	spin_unlock(&fs_info->delayed_iput_lock);
+		spin_lock(&fs_info->delayed_iput_lock);
+		list_add_tail(&bi->iput_list, &fs_info->delayed_iputs);
+		spin_unlock(&fs_info->delayed_iput_lock);
+	}
 }
 
 void btrfs_run_delayed_iputs(struct btrfs_root *root)
 {
 	LIST_HEAD(list);
 	struct btrfs_fs_info *fs_info = root->fs_info;
-	struct delayed_iput *delayed;
+	struct btrfs_inode *bi = NULL;
 	int empty;
 
 	spin_lock(&fs_info->delayed_iput_lock);
@@ -2111,10 +2100,9 @@  void btrfs_run_delayed_iputs(struct btrfs_root *root)
 	spin_unlock(&fs_info->delayed_iput_lock);
 
 	while (!list_empty(&list)) {
-		delayed = list_entry(list.next, struct delayed_iput, list);
-		list_del(&delayed->list);
-		iput(delayed->inode);
-		kfree(delayed);
+		bi = list_entry(list.next, struct btrfs_inode, iput_list);
+		list_del_init(&bi->iput_list);
+		iput(&bi->vfs_inode);
 	}
 }
 
@@ -7097,6 +7085,7 @@  struct inode *btrfs_alloc_inode(struct super_block *sb)
 	mutex_init(&ei->delalloc_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
+	INIT_LIST_HEAD(&ei->iput_list);
 	INIT_LIST_HEAD(&ei->ordered_operations);
 	RB_CLEAR_NODE(&ei->rb_node);
 
@@ -7120,6 +7109,7 @@  void btrfs_destroy_inode(struct inode *inode)
 	WARN_ON(BTRFS_I(inode)->reserved_extents);
 	WARN_ON(BTRFS_I(inode)->delalloc_bytes);
 	WARN_ON(BTRFS_I(inode)->csum_bytes);
+	WARN_ON(!list_empty(&BTRFS_I(inode)->iput_list));
 
 	/*
 	 * This can happen where we create an inode, but somebody else also