Message ID | 548f65e2.GaTIVhOoT5YiOhC8%akpm@linux-foundation.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Thanks once again for all of this by the way. On Mon, Dec 15, 2014 at 02:51:14PM -0800, Andrew Morton wrote: > From: Weiwei Wang <wangww631@huawei.com> > Subject: ocfs2: add functions to add and remove inode in orphan dir > > Add functions to add inode to orphan dir and remove inode in orphan dir. > Here we do not call ocfs2_prepare_orphan_dir and ocfs2_orphan_add > directly. Because append O_DIRECT will add inode to orphan two and may > result in more than one orphan entry for the same inode. Just so I understand - your problem is that ocfs2_prepare_orphan_dir() will EEXIST when it finds the inode already there. The solution you have below has a couple problems, stemming from the fact that it uses the same name for an inode that is orphaned because of nlink == 0 as that of one that is orphaned because it is undergoing direct IO. Having identical names in a directory is always considered an error, even for system directories like the orphan dir. In my honest opinion we should not do that here either. Instead you could use a prefix for those entries - something like "dio-". If you modify ocfs2_prepare_orphan_dir() and __ocfs2_prepare_orphan_dir() to pass down an optional prefix which is prepended to the name you don't need a special dio prepare function. Then we don't need ocfs2_dio_prepare_orphan_dir() any more. In fact, the EEXIST check now becomes valid again and you don't need to do all that extra work in ocfs2_dio_orphan_add() to figure out whether the name is there because of nlink == 0 or we're doing dio (or both!). Actually, it really becomes trivial to special case dio in ocfs2_orphan_add() so I would suggest just adding a boolean argument and doing the bits for dio there, thus allowing us to drop ocfs2_dio_orphan_add() too. I hope this all helps. --Mark -- Mark Fasheh
Hi Mark, I much appreciate your pertinent comments on this feature. The reason we use orphan dir is to make sure allocating blocks and updating inode size consistent, and this will result an entry in orphan is not truly deleted one. To distinguish dio from unlink, we introduce a new dinode flag OCFS2_DIO_ORPHANED_FL. So after this change, once unlink and dio happen concurrently, there may be two kinds of results: 1. On the same node, only one entry will be added with flag OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear during dio. 2. On different nodes, two entries will be added with each flag OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL. Since entry name is stringified from blkno and the length is fixed OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect too much. On 2014/12/19 7:18, Mark Fasheh wrote: > Thanks once again for all of this by the way. > > On Mon, Dec 15, 2014 at 02:51:14PM -0800, Andrew Morton wrote: >> From: Weiwei Wang <wangww631@huawei.com> >> Subject: ocfs2: add functions to add and remove inode in orphan dir >> >> Add functions to add inode to orphan dir and remove inode in orphan dir. >> Here we do not call ocfs2_prepare_orphan_dir and ocfs2_orphan_add >> directly. Because append O_DIRECT will add inode to orphan two and may >> result in more than one orphan entry for the same inode. > > Just so I understand - your problem is that ocfs2_prepare_orphan_dir() will > EEXIST when it finds the inode already there. > > The solution you have below has a couple problems, stemming from the fact > that it uses the same name for an inode that is orphaned because of > nlink == 0 as that of one that is orphaned because it is undergoing direct > IO. > > Having identical names in a directory is always considered an error, even > for system directories like the orphan dir. In my honest opinion we should > not do that here either. > > Instead you could use a prefix for those entries - something like "dio-". > > If you modify ocfs2_prepare_orphan_dir() and __ocfs2_prepare_orphan_dir() > to pass down an optional prefix which is prepended to the name you don't need a > special dio prepare function. Then we don't need ocfs2_dio_prepare_orphan_dir() > any more. > > In fact, the EEXIST check now becomes valid again and you don't need to > do all that extra work in ocfs2_dio_orphan_add() to figure out whether the > name is there because of nlink == 0 or we're doing dio (or both!). > > Actually, it really becomes trivial to special case dio in > ocfs2_orphan_add() so I would suggest just adding a boolean argument and > doing the bits for dio there, thus allowing us to drop > ocfs2_dio_orphan_add() too. > > > I hope this all helps. > --Mark > > -- > Mark Fasheh > > . >
On Fri, Dec 19, 2014 at 11:37:56AM +0800, Joseph Qi wrote: > Hi Mark, > I much appreciate your pertinent comments on this feature. > > The reason we use orphan dir is to make sure allocating blocks and > updating inode size consistent, and this will result an entry in orphan > is not truly deleted one. Right, I got that from the patch. Good idea by the way. > To distinguish dio from unlink, we introduce a new dinode flag > OCFS2_DIO_ORPHANED_FL. Ok, that's also good. > So after this change, once unlink and dio happen concurrently, there > may be two kinds of results: > 1. On the same node, only one entry will be added with flag > OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear > during dio. > 2. On different nodes, two entries will be added with each flag > OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL. > > Since entry name is stringified from blkno and the length is fixed > OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect > too much. No, adding the prefix is much more preferable to intentionally corrupting the conents of a directory. That OCFS2_ORPHAN_NAMELEN is a #define does not matter, you are free to make it whatever reasonable value fits the name. The directory code naturally handles this, whtether the length is OCFS2_ORPHAN_NAMELEN or OCFS2_ORPHAN_NAMELEN + PREFIX_NAMELEN. Also if you were to run fsck against the file system it will (should) complain about having duplciate entries in the orphan dir. Aside from the corruption issue, there are usability issues too. Someone running "ls //orphan_dir:0000" will not be able to tell whether a name is there for dio or nlink==0. If you prefix it, then debugging becomes much easier. Btw, did you see my note about a readonly superblock flag for this? We're going to need to handle backward compatibility too. --Mark -- Mark Fasheh
On 2014/12/20 4:33, Mark Fasheh wrote: > On Fri, Dec 19, 2014 at 11:37:56AM +0800, Joseph Qi wrote: >> Hi Mark, >> I much appreciate your pertinent comments on this feature. >> >> The reason we use orphan dir is to make sure allocating blocks and >> updating inode size consistent, and this will result an entry in orphan >> is not truly deleted one. > > Right, I got that from the patch. Good idea by the way. > > >> To distinguish dio from unlink, we introduce a new dinode flag >> OCFS2_DIO_ORPHANED_FL. > > Ok, that's also good. > > >> So after this change, once unlink and dio happen concurrently, there >> may be two kinds of results: >> 1. On the same node, only one entry will be added with flag >> OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear >> during dio. >> 2. On different nodes, two entries will be added with each flag >> OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL. >> >> Since entry name is stringified from blkno and the length is fixed >> OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect >> too much. > > No, adding the prefix is much more preferable to intentionally corrupting > the conents of a directory. That OCFS2_ORPHAN_NAMELEN is a #define does not > matter, you are free to make it whatever reasonable value fits the name. The > directory code naturally handles this, whtether the length is > OCFS2_ORPHAN_NAMELEN or OCFS2_ORPHAN_NAMELEN + PREFIX_NAMELEN. > > Also if you were to run fsck against the file system it will (should) complain > about having duplciate entries in the orphan dir. > > Aside from the corruption issue, there are usability issues too. Someone > running "ls //orphan_dir:0000" will not be able to tell whether a name is > there for dio or nlink==0. If you prefix it, then debugging becomes much > easier. > Okay, I will try to take your advice and make a change. > > Btw, did you see my note about a readonly superblock flag for this? We're > going to need to handle backward compatibility too. Yes. At my first glance, it may need a lot change to support feature ro compat at my first glance. I need to investigate on it before do the actual change. Thanks. > --Mark > > -- > Mark Fasheh > > . >
On Mon, Dec 22, 2014 at 09:04:25AM +0800, Joseph Qi wrote: > >> So after this change, once unlink and dio happen concurrently, there > >> may be two kinds of results: > >> 1. On the same node, only one entry will be added with flag > >> OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear > >> during dio. > >> 2. On different nodes, two entries will be added with each flag > >> OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL. > >> > >> Since entry name is stringified from blkno and the length is fixed > >> OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect > >> too much. > > > > No, adding the prefix is much more preferable to intentionally corrupting > > the conents of a directory. That OCFS2_ORPHAN_NAMELEN is a #define does not > > matter, you are free to make it whatever reasonable value fits the name. The > > directory code naturally handles this, whtether the length is > > OCFS2_ORPHAN_NAMELEN or OCFS2_ORPHAN_NAMELEN + PREFIX_NAMELEN. > > > > Also if you were to run fsck against the file system it will (should) complain > > about having duplciate entries in the orphan dir. > > > > Aside from the corruption issue, there are usability issues too. Someone > > running "ls //orphan_dir:0000" will not be able to tell whether a name is > > there for dio or nlink==0. If you prefix it, then debugging becomes much > > easier. > > > Okay, I will try to take your advice and make a change. Great, thank you. > > Btw, did you see my note about a readonly superblock flag for this? We're > > going to need to handle backward compatibility too. > Yes. At my first glance, it may need a lot change to support feature ro compat > at my first glance. I need to investigate on it before do the actual change. > Thanks. Kernel should be straight forward - if the flag isn't there just use the existing fallback. Otherwise we can use the new direct_IO code. >From memory, in ocfs2-tools you'll have to update mkfs to write the flag, tunefs to turn it on/off for an existing filesystem and fsck to truncate the direct-io orphans it finds. --Mark -- Mark Fasheh
Hi Mark, On 2014/12/22 15:06, Mark Fasheh wrote: > On Mon, Dec 22, 2014 at 09:04:25AM +0800, Joseph Qi wrote: >>>> So after this change, once unlink and dio happen concurrently, there >>>> may be two kinds of results: >>>> 1. On the same node, only one entry will be added with flag >>>> OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear >>>> during dio. >>>> 2. On different nodes, two entries will be added with each flag >>>> OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL. >>>> >>>> Since entry name is stringified from blkno and the length is fixed >>>> OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect >>>> too much. >>> >>> No, adding the prefix is much more preferable to intentionally corrupting >>> the conents of a directory. That OCFS2_ORPHAN_NAMELEN is a #define does not >>> matter, you are free to make it whatever reasonable value fits the name. The >>> directory code naturally handles this, whtether the length is >>> OCFS2_ORPHAN_NAMELEN or OCFS2_ORPHAN_NAMELEN + PREFIX_NAMELEN. >>> >>> Also if you were to run fsck against the file system it will (should) complain >>> about having duplciate entries in the orphan dir. >>> >>> Aside from the corruption issue, there are usability issues too. Someone >>> running "ls //orphan_dir:0000" will not be able to tell whether a name is >>> there for dio or nlink==0. If you prefix it, then debugging becomes much >>> easier. >>> >> Okay, I will try to take your advice and make a change. > > Great, thank you. > > >>> Btw, did you see my note about a readonly superblock flag for this? We're >>> going to need to handle backward compatibility too. >> Yes. At my first glance, it may need a lot change to support feature ro compat >> at my first glance. I need to investigate on it before do the actual change. >> Thanks. > > Kernel should be straight forward - if the flag isn't there just use the > existing fallback. Otherwise we can use the new direct_IO code. > >>From memory, in ocfs2-tools you'll have to update mkfs to write the flag, > tunefs to turn it on/off for an existing filesystem and fsck to truncate the > direct-io orphans it finds. > --Mark > For the comments about setting it to a ro compat feature, I still have one more questions. If I check this feature flag in ocfs2_direct_IO, that means the existing formatted ocfs2 volume won't take this feature except we explicitly turn it on using tunefs. But I don't think there is any problem if feature is default on. Do you mean we have to treat this feature as optional and let user choose? > -- > Mark Fasheh > > . >
On Tue, Jan 13, 2015 at 04:36:14PM +0800, Joseph Qi wrote: > Hi Mark, > > On 2014/12/22 15:06, Mark Fasheh wrote: > > On Mon, Dec 22, 2014 at 09:04:25AM +0800, Joseph Qi wrote: > >>>> So after this change, once unlink and dio happen concurrently, there > >>>> may be two kinds of results: > >>>> 1. On the same node, only one entry will be added with flag > >>>> OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear > >>>> during dio. > >>>> 2. On different nodes, two entries will be added with each flag > >>>> OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL. > >>>> > >>>> Since entry name is stringified from blkno and the length is fixed > >>>> OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect > >>>> too much. > >>> > >>> No, adding the prefix is much more preferable to intentionally corrupting > >>> the conents of a directory. That OCFS2_ORPHAN_NAMELEN is a #define does not > >>> matter, you are free to make it whatever reasonable value fits the name. The > >>> directory code naturally handles this, whtether the length is > >>> OCFS2_ORPHAN_NAMELEN or OCFS2_ORPHAN_NAMELEN + PREFIX_NAMELEN. > >>> > >>> Also if you were to run fsck against the file system it will (should) complain > >>> about having duplciate entries in the orphan dir. > >>> > >>> Aside from the corruption issue, there are usability issues too. Someone > >>> running "ls //orphan_dir:0000" will not be able to tell whether a name is > >>> there for dio or nlink==0. If you prefix it, then debugging becomes much > >>> easier. > >>> > >> Okay, I will try to take your advice and make a change. > > > > Great, thank you. > > > > > >>> Btw, did you see my note about a readonly superblock flag for this? We're > >>> going to need to handle backward compatibility too. > >> Yes. At my first glance, it may need a lot change to support feature ro compat > >> at my first glance. I need to investigate on it before do the actual change. > >> Thanks. > > > > Kernel should be straight forward - if the flag isn't there just use the > > existing fallback. Otherwise we can use the new direct_IO code. > > > >>From memory, in ocfs2-tools you'll have to update mkfs to write the flag, > > tunefs to turn it on/off for an existing filesystem and fsck to truncate the > > direct-io orphans it finds. > > --Mark > > > For the comments about setting it to a ro compat feature, I still have > one more questions. > If I check this feature flag in ocfs2_direct_IO, that means the existing > formatted ocfs2 volume won't take this feature except we explicitly turn > it on using tunefs. Correct. > But I don't think there is any problem if feature is default on. > Do you mean we have to treat this feature as optional and let user > choose? Yes it should be optional, though we can always have mkfs.ocfs2 turn it on by default. If the user uses this, and the node crashes with inodes in the orphan dir they might be deleted by another node with an ealier version of the fs code. We can not allow that and therefore have to guard with an incompat bit. --Mark -- Mark Fasheh
diff -puN fs/ocfs2/journal.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir fs/ocfs2/journal.h --- a/fs/ocfs2/journal.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir +++ a/fs/ocfs2/journal.h @@ -472,6 +472,11 @@ static inline int ocfs2_unlink_credits(s * orphan dir index leaf */ #define OCFS2_DELETE_INODE_CREDITS (3 * OCFS2_INODE_UPDATE_CREDITS + 4) +/* dinode + orphan dir dinode + extent tree leaf block + orphan dir entry + + * orphan dir index root + orphan dir index leaf */ +#define OCFS2_INODE_ADD_TO_ORPHAN_CREDITS (2 * OCFS2_INODE_UPDATE_CREDITS + 4) +#define OCFS2_INODE_DEL_FROM_ORPHAN_CREDITS OCFS2_INODE_ADD_TO_ORPHAN_CREDITS + /* dinode update, old dir dinode update, new dir dinode update, old * dir dir entry, new dir dir entry, dir entry update for renaming * directory + target unlink + 3 x dir index leaves */ diff -puN fs/ocfs2/namei.c~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir fs/ocfs2/namei.c --- a/fs/ocfs2/namei.c~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir +++ a/fs/ocfs2/namei.c @@ -81,6 +81,12 @@ static int ocfs2_prepare_orphan_dir(stru char *name, struct ocfs2_dir_lookup_result *lookup); +static int ocfs2_dio_prepare_orphan_dir(struct ocfs2_super *osb, + struct inode **ret_orphan_dir, + u64 blkno, + char *name, + struct ocfs2_dir_lookup_result *lookup); + static int ocfs2_orphan_add(struct ocfs2_super *osb, handle_t *handle, struct inode *inode, @@ -89,6 +95,15 @@ static int ocfs2_orphan_add(struct ocfs2 struct ocfs2_dir_lookup_result *lookup, struct inode *orphan_dir_inode); +static int ocfs2_dio_orphan_add(struct ocfs2_super *osb, + handle_t *handle, + struct inode *inode, + struct buffer_head *fe_bh, + char *name, + struct ocfs2_dir_lookup_result *lookup, + struct inode *orphan_dir_inode, + bool orphaned); + static int ocfs2_create_symlink_data(struct ocfs2_super *osb, handle_t *handle, struct inode *inode, @@ -2137,6 +2152,51 @@ out: return ret; } +/** + * Copy from ocfs2_prepare_orphan_dir(). The difference: + * It will still lock orphan dir if entry exists. + * Caller must take care of -EEXIST and responsible for unlock. +*/ +static int ocfs2_dio_prepare_orphan_dir(struct ocfs2_super *osb, + struct inode **ret_orphan_dir, + u64 blkno, + char *name, + struct ocfs2_dir_lookup_result *lookup) +{ + struct inode *orphan_dir_inode = NULL; + struct buffer_head *orphan_dir_bh = NULL; + int ret = 0; + + ret = ocfs2_lookup_lock_orphan_dir(osb, &orphan_dir_inode, + &orphan_dir_bh); + if (ret < 0) { + mlog_errno(ret); + return ret; + } + + ret = __ocfs2_prepare_orphan_dir(orphan_dir_inode, orphan_dir_bh, + blkno, name, lookup); + if (ret < 0 && ret != -EEXIST) { + mlog_errno(ret); + goto out; + } + + *ret_orphan_dir = orphan_dir_inode; + +out: + brelse(orphan_dir_bh); + + if (ret && ret != -EEXIST) { + ocfs2_inode_unlock(orphan_dir_inode, 1); + mutex_unlock(&orphan_dir_inode->i_mutex); + iput(orphan_dir_inode); + } + + if (ret && ret != -EEXIST) + mlog_errno(ret); + return ret; +} + static int ocfs2_orphan_add(struct ocfs2_super *osb, handle_t *handle, struct inode *inode, @@ -2226,6 +2286,100 @@ leave: return status; } +/** + * Copy from ocfs2_orphan_add, the difference: + * 1. Do not add entry if already added. + * 2. Update di flags OCFS2_DIO_ORPHANED_FL and record the + * orphan slot. +*/ +static int ocfs2_dio_orphan_add(struct ocfs2_super *osb, + handle_t *handle, + struct inode *inode, + struct buffer_head *fe_bh, + char *name, + struct ocfs2_dir_lookup_result *lookup, + struct inode *orphan_dir_inode, + bool orphaned) +{ + struct buffer_head *orphan_dir_bh = NULL; + int status = 0; + struct ocfs2_dinode *orphan_fe; + struct ocfs2_dinode *fe = (struct ocfs2_dinode *) fe_bh->b_data; + + trace_ocfs2_dio_orphan_add_begin( + (unsigned long long)OCFS2_I(inode)->ip_blkno); + + status = ocfs2_read_inode_block(orphan_dir_inode, &orphan_dir_bh); + if (status < 0) { + mlog_errno(status); + goto leave; + } + + status = ocfs2_journal_access_di(handle, + INODE_CACHE(orphan_dir_inode), + orphan_dir_bh, + OCFS2_JOURNAL_ACCESS_WRITE); + if (status < 0) { + mlog_errno(status); + goto leave; + } + + /* + * We're going to journal the change of i_flags and i_dio_orphaned_slot. + * It's safe anyway, though some callers may duplicate the journaling. + * Journaling within the func just make the logic look more + * straightforward. + */ + status = ocfs2_journal_access_di(handle, + INODE_CACHE(inode), + fe_bh, + OCFS2_JOURNAL_ACCESS_WRITE); + if (status < 0) { + mlog_errno(status); + goto leave; + } + + /* we're a cluster, and nlink can change on disk from + * underneath us... */ + orphan_fe = (struct ocfs2_dinode *) orphan_dir_bh->b_data; + if (S_ISDIR(inode->i_mode)) + ocfs2_add_links_count(orphan_fe, 1); + set_nlink(orphan_dir_inode, ocfs2_read_links_count(orphan_fe)); + ocfs2_journal_dirty(handle, orphan_dir_bh); + + /* It may already be orphaned by ocfs2_unlink/ocfs2_rename */ + if (!orphaned) { + status = __ocfs2_add_entry(handle, orphan_dir_inode, name, + OCFS2_ORPHAN_NAMELEN, inode, + OCFS2_I(inode)->ip_blkno, + orphan_dir_bh, lookup); + if (status < 0) { + mlog_errno(status); + goto rollback; + } + } + + /* Update flag OCFS2_DIO_ORPHANED_FL and record the orphan slot */ + fe->i_flags |= cpu_to_le32(OCFS2_DIO_ORPHANED_FL); + fe->i_dio_orphaned_slot = cpu_to_le16(osb->slot_num); + + ocfs2_journal_dirty(handle, fe_bh); + + trace_ocfs2_dio_orphan_add_end((unsigned long long)OCFS2_I(inode)->ip_blkno, + osb->slot_num); + +rollback: + if (status < 0) { + if (S_ISDIR(inode->i_mode)) + ocfs2_add_links_count(orphan_fe, -1); + set_nlink(orphan_dir_inode, ocfs2_read_links_count(orphan_fe)); + } + +leave: + brelse(orphan_dir_bh); + + return status; +} /* unlike orphan_add, we expect the orphan dir to already be locked here. */ int ocfs2_orphan_del(struct ocfs2_super *osb, handle_t *handle, @@ -2500,6 +2654,167 @@ leave: return status; } +int ocfs2_add_inode_to_orphan(struct ocfs2_super *osb, + struct inode *inode) +{ + char orphan_name[OCFS2_ORPHAN_NAMELEN + 1]; + struct inode *orphan_dir_inode = NULL; + struct ocfs2_dir_lookup_result orphan_insert = { NULL, }; + struct buffer_head *di_bh = NULL; + int status = 0; + handle_t *handle = NULL; + struct ocfs2_dinode *di = NULL; + bool orphaned = false; + + status = ocfs2_inode_lock(inode, &di_bh, 1); + if (status < 0) { + mlog_errno(status); + goto bail; + } + + status = ocfs2_dio_prepare_orphan_dir(osb, &orphan_dir_inode, + OCFS2_I(inode)->ip_blkno, + orphan_name, + &orphan_insert); + if (status < 0 && status != -EEXIST) { + mlog_errno(status); + goto bail_unlock_inode; + } else if (status == -EEXIST) { + mlog(ML_NOTICE, "inode %llu already added to " + "orphan dir %llu.\n", + OCFS2_I(inode)->ip_blkno, + OCFS2_I(orphan_dir_inode)->ip_blkno); + di = (struct ocfs2_dinode *) di_bh->b_data; + if (!(di->i_flags & le32_to_cpu(OCFS2_ORPHANED_FL))) { + mlog_errno(status); + goto bail_unlock_orphan; + } + orphaned = true; + } + + handle = ocfs2_start_trans(osb, + OCFS2_INODE_ADD_TO_ORPHAN_CREDITS); + if (IS_ERR(handle)) { + status = PTR_ERR(handle); + goto bail_unlock_orphan; + } + + status = ocfs2_dio_orphan_add(osb, handle, inode, di_bh, orphan_name, + &orphan_insert, orphan_dir_inode, orphaned); + if (status) + mlog_errno(status); + + ocfs2_commit_trans(osb, handle); + +bail_unlock_orphan: + ocfs2_inode_unlock(orphan_dir_inode, 1); + mutex_unlock(&orphan_dir_inode->i_mutex); + iput(orphan_dir_inode); + + ocfs2_free_dir_lookup_result(&orphan_insert); + +bail_unlock_inode: + ocfs2_inode_unlock(inode, 1); + brelse(di_bh); + +bail: + return status; +} + +int ocfs2_del_inode_from_orphan(struct ocfs2_super *osb, + struct inode *inode, int update_isize, + loff_t end) +{ + struct inode *orphan_dir_inode = NULL; + struct buffer_head *orphan_dir_bh = NULL; + struct buffer_head *di_bh = NULL; + struct ocfs2_dinode *di = NULL; + handle_t *handle = NULL; + int status = 0; + + status = ocfs2_inode_lock(inode, &di_bh, 1); + if (status < 0) { + mlog_errno(status); + goto bail; + } + di = (struct ocfs2_dinode *) di_bh->b_data; + + orphan_dir_inode = ocfs2_get_system_file_inode(osb, + ORPHAN_DIR_SYSTEM_INODE, + le16_to_cpu(di->i_dio_orphaned_slot)); + if (!orphan_dir_inode) { + status = -EEXIST; + mlog_errno(status); + goto bail_unlock_inode; + } + + mutex_lock(&orphan_dir_inode->i_mutex); + status = ocfs2_inode_lock(orphan_dir_inode, &orphan_dir_bh, 1); + if (status < 0) { + mutex_unlock(&orphan_dir_inode->i_mutex); + iput(orphan_dir_inode); + mlog_errno(status); + goto bail_unlock_inode; + } + + handle = ocfs2_start_trans(osb, + OCFS2_INODE_DEL_FROM_ORPHAN_CREDITS); + if (IS_ERR(handle)) { + status = PTR_ERR(handle); + goto bail_unlock_orphan; + } + + BUG_ON(!(di->i_flags & cpu_to_le32(OCFS2_DIO_ORPHANED_FL))); + + /* Only delete entry if OCFS2_ORPHANED_FL not set, or + * there are two entries added */ + if (!(di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL)) || + (di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL) && + (di->i_orphaned_slot != di->i_dio_orphaned_slot))) { + status = ocfs2_orphan_del(osb, handle, orphan_dir_inode, + inode, orphan_dir_bh); + if (status < 0) { + mlog_errno(status); + goto bail_commit; + } + } + + status = ocfs2_journal_access_di(handle, + INODE_CACHE(inode), + di_bh, + OCFS2_JOURNAL_ACCESS_WRITE); + if (status < 0) { + mlog_errno(status); + goto bail_commit; + } + + di->i_flags &= ~cpu_to_le32(OCFS2_DIO_ORPHANED_FL); + di->i_dio_orphaned_slot = 0; + + if (update_isize) { + status = ocfs2_set_inode_size(handle, inode, di_bh, end); + if (status) + mlog_errno(status); + } else + ocfs2_journal_dirty(handle, di_bh); + +bail_commit: + ocfs2_commit_trans(osb, handle); + +bail_unlock_orphan: + ocfs2_inode_unlock(orphan_dir_inode, 1); + mutex_unlock(&orphan_dir_inode->i_mutex); + brelse(orphan_dir_bh); + iput(orphan_dir_inode); + +bail_unlock_inode: + ocfs2_inode_unlock(inode, 1); + brelse(di_bh); + +bail: + return status; +} + int ocfs2_mv_orphaned_inode_to_new(struct inode *dir, struct inode *inode, struct dentry *dentry) diff -puN fs/ocfs2/namei.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir fs/ocfs2/namei.h --- a/fs/ocfs2/namei.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir +++ a/fs/ocfs2/namei.h @@ -38,6 +38,11 @@ int ocfs2_orphan_del(struct ocfs2_super int ocfs2_create_inode_in_orphan(struct inode *dir, int mode, struct inode **new_inode); +int ocfs2_add_inode_to_orphan(struct ocfs2_super *osb, + struct inode *inode); +int ocfs2_del_inode_from_orphan(struct ocfs2_super *osb, + struct inode *inode, int update_isize, + loff_t end); int ocfs2_mv_orphaned_inode_to_new(struct inode *dir, struct inode *new_inode, struct dentry *new_dentry); diff -puN fs/ocfs2/ocfs2_fs.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir fs/ocfs2/ocfs2_fs.h --- a/fs/ocfs2/ocfs2_fs.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir +++ a/fs/ocfs2/ocfs2_fs.h @@ -229,6 +229,7 @@ #define OCFS2_CHAIN_FL (0x00000400) /* Chain allocator */ #define OCFS2_DEALLOC_FL (0x00000800) /* Truncate log */ #define OCFS2_QUOTA_FL (0x00001000) /* Quota file */ +#define OCFS2_DIO_ORPHANED_FL (0X00002000) /* On the orphan list especially for dio */ /* * Flags on ocfs2_dinode.i_dyn_features @@ -729,7 +730,9 @@ struct ocfs2_dinode { inode belongs to. Only valid if allocated from a discontiguous block group */ -/*A0*/ __le64 i_reserved2[3]; +/*A0*/ __le16 i_dio_orphaned_slot; /* only used for append dio write */ + __le16 i_reserved1[3]; + __le64 i_reserved2[2]; /*B8*/ union { __le64 i_pad1; /* Generic way to refer to this 64bit union */ diff -puN fs/ocfs2/ocfs2_trace.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir fs/ocfs2/ocfs2_trace.h --- a/fs/ocfs2/ocfs2_trace.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir +++ a/fs/ocfs2/ocfs2_trace.h @@ -2373,6 +2373,9 @@ DEFINE_OCFS2_ULL_EVENT(ocfs2_orphan_add_ DEFINE_OCFS2_ULL_UINT_EVENT(ocfs2_orphan_add_end); +DEFINE_OCFS2_ULL_EVENT(ocfs2_dio_orphan_add_begin); +DEFINE_OCFS2_ULL_UINT_EVENT(ocfs2_dio_orphan_add_end); + TRACE_EVENT(ocfs2_orphan_del, TP_PROTO(unsigned long long dir, const char *name, int namelen), TP_ARGS(dir, name, namelen),