Message ID | 51C2BC1F.2010106@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, 20 Jun 2013 16:23:59 +0800 shencanquan <shencanquan@huawei.com> wrote: > llseek requires ocfs2 inode lock for updating the file size in SEEK_END. > because the file size maybe update on another node. > if it not . after call llseek in SEEK_END. the position is old. > > this bug can be reproduce the following scenario: > at first ,we dd a test fileA,the file size is 10k. > on NodeA: > --------- > 1) open the test fileA, lseek the end of file. and print the position. > 2) close the test fileA > > on NodeB: > 1) open the test fileA, append the 5k data to test FileA. > 2) lseek the end of file. and print the position. > 3) close file. > > at first we run the test program1 on NodeA , the result is 10k. > and then run the test program2 on NodeB, the result is 15k. > at last, we run the test program1 on NodeA again, the result is 10k. > > after apply this patch. the three step result is 15k. > > ... > > --- a/fs/ocfs2/file.c > +++ b/fs/ocfs2/file.c > @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence) > case SEEK_SET: > break; > case SEEK_END: > + /* SEEK_END requires the OCFS2 inode lock for the file > + * because it references the file's size. > + */ > + ret = ocfs2_inode_lock(inode, NULL, 0); > + if (ret < 0) { > + mlog_errno(ret); > + goto out; > + } > offset += inode->i_size; > + ocfs2_inode_unlock(inode, 0); > break; I don't understand this. The lock for inode->i_size is inode->i_mutex, and we're already holding i_mutex here. The current mainline code looks correct. My guess is that there is some other code path which is modifying inode->i_size without holding inode->i_mutex, and while holding ocfs2_inode_lock(). If so, that code is surely wrong - it should hold i_mutex while modifying i_size. Also, safely reading i_size should be performed via i_size_read(), and modifications to i_size should use i_size_write(). And all this is only really applicable to 32-bit CPUs, which you probably aren't using. So.... please let's take a second look at this.
On 2013/6/27 5:18, Andrew Morton wrote: > On Thu, 20 Jun 2013 16:23:59 +0800 shencanquan <shencanquan@huawei.com> wrote: > >> llseek requires ocfs2 inode lock for updating the file size in SEEK_END. >> because the file size maybe update on another node. >> if it not . after call llseek in SEEK_END. the position is old. >> >> this bug can be reproduce the following scenario: >> at first ,we dd a test fileA,the file size is 10k. >> on NodeA: >> --------- >> 1) open the test fileA, lseek the end of file. and print the position. >> 2) close the test fileA >> >> on NodeB: >> 1) open the test fileA, append the 5k data to test FileA. >> 2) lseek the end of file. and print the position. >> 3) close file. >> >> at first we run the test program1 on NodeA , the result is 10k. >> and then run the test program2 on NodeB, the result is 15k. >> at last, we run the test program1 on NodeA again, the result is 10k. >> >> after apply this patch. the three step result is 15k. >> >> ... >> >> --- a/fs/ocfs2/file.c >> +++ b/fs/ocfs2/file.c >> @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence) >> case SEEK_SET: >> break; >> case SEEK_END: >> + /* SEEK_END requires the OCFS2 inode lock for the file >> + * because it references the file's size. >> + */ >> + ret = ocfs2_inode_lock(inode, NULL, 0); >> + if (ret < 0) { >> + mlog_errno(ret); >> + goto out; >> + } >> offset += inode->i_size; >> + ocfs2_inode_unlock(inode, 0); >> break; > > I don't understand this. The lock for inode->i_size is inode->i_mutex, > and we're already holding i_mutex here. The current mainline code > looks correct. > > My guess is that there is some other code path which is modifying > inode->i_size without holding inode->i_mutex, and while holding > ocfs2_inode_lock(). If so, that code is surely wrong - it should hold > i_mutex while modifying i_size. inode->i_mutex lock only protect the inode size on the same machine. because ocfs2 is cluster file system. so other machine maybe append the file. and modify the file size. and so before get the inode size, it need cluster lock, when other node unlock cluster lock it will write the inode size to disk. and this machine acquire the lock it will read the inode size from disk. > > Also, safely reading i_size should be performed via i_size_read(), and > modifications to i_size should use i_size_write(). > ok, thanks for your comments, I will modify this and resent the patch. > And all this is only really applicable to 32-bit CPUs, which you > probably aren't using. I don't understand this. > > So.... please let's take a second look at this. > > . >
On Thu, 27 Jun 2013 09:19:52 +0800 shencanquan <shencanquan@huawei.com> wrote: > On 2013/6/27 5:18, Andrew Morton wrote: > > > > > My guess is that there is some other code path which is modifying > > inode->i_size without holding inode->i_mutex, and while holding > > ocfs2_inode_lock(). If so, that code is surely wrong - it should hold > > i_mutex while modifying i_size. > > > inode->i_mutex lock only protect the inode size on the same machine. ah ;) > > And all this is only really applicable to 32-bit CPUs, which you > > probably aren't using. > > I don't understand this. The i_size_read/i_size_write infrastructure is designed to efficiently handle the situation where a 32-bit machine reads a 64-bit number which might be undergoing modification on another CPU. We don't want the reading CPU to see an invalid number when the writing CPU is in the middle of modifying the two 32-bit words.
On 2013/6/27 9:25, Andrew Morton wrote: > On Thu, 27 Jun 2013 09:19:52 +0800 shencanquan <shencanquan@huawei.com> wrote: > >> On 2013/6/27 5:18, Andrew Morton wrote: >> >>> >>> My guess is that there is some other code path which is modifying >>> inode->i_size without holding inode->i_mutex, and while holding >>> ocfs2_inode_lock(). If so, that code is surely wrong - it should hold >>> i_mutex while modifying i_size. >> >> >> inode->i_mutex lock only protect the inode size on the same machine. > > ah ;) > >>> And all this is only really applicable to 32-bit CPUs, which you >>> probably aren't using. >> >> I don't understand this. > > The i_size_read/i_size_write infrastructure is designed to efficiently > handle the situation where a 32-bit machine reads a 64-bit number which > might be undergoing modification on another CPU. We don't want the > reading CPU to see an invalid number when the writing CPU is in the > middle of modifying the two 32-bit words. > > > ok. thanks.
AFAIR, this behavior has been there since day 1 and changing it will impact performance negatively. I would recommend against making this change for one app. On Wed, Jun 26, 2013 at 6:50 PM, shencanquan <shencanquan@huawei.com> wrote: > On 2013/6/27 9:25, Andrew Morton wrote: > > > On Thu, 27 Jun 2013 09:19:52 +0800 shencanquan <shencanquan@huawei.com> > wrote: > > > >> On 2013/6/27 5:18, Andrew Morton wrote: > >> > >>> > >>> My guess is that there is some other code path which is modifying > >>> inode->i_size without holding inode->i_mutex, and while holding > >>> ocfs2_inode_lock(). If so, that code is surely wrong - it should hold > >>> i_mutex while modifying i_size. > >> > >> > >> inode->i_mutex lock only protect the inode size on the same machine. > > > > ah ;) > > > >>> And all this is only really applicable to 32-bit CPUs, which you > >>> probably aren't using. > >> > >> I don't understand this. > > > > The i_size_read/i_size_write infrastructure is designed to efficiently > > handle the situation where a 32-bit machine reads a 64-bit number which > > might be undergoing modification on another CPU. We don't want the > > reading CPU to see an invalid number when the writing CPU is in the > > middle of modifying the two 32-bit words. > > > > > > > > ok. thanks. > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel >
On 2013/6/27 11:34, Sunil Mushran wrote: > AFAIR, this behavior has been there since day 1 and changing it will impact performance negatively. I would recommend against making this > change for one app. > I think it will impact the performance negatively. but I think it is very very little, because in cluster lock function, the inode size will update from dlm in lvb. and it don't update by disk (another machine write file size to disk and this machine read file size from disk to update the inode size) if the file size update by disk. I think it will impact the performance very much. > > On Wed, Jun 26, 2013 at 6:50 PM, shencanquan <shencanquan@huawei.com <mailto:shencanquan@huawei.com>> wrote: > > On 2013/6/27 9:25, Andrew Morton wrote: > > > On Thu, 27 Jun 2013 09:19:52 +0800 shencanquan <shencanquan@huawei.com <mailto:shencanquan@huawei.com>> wrote: > > > >> On 2013/6/27 5:18, Andrew Morton wrote: > >> > >>> > >>> My guess is that there is some other code path which is modifying > >>> inode->i_size without holding inode->i_mutex, and while holding > >>> ocfs2_inode_lock(). If so, that code is surely wrong - it should hold > >>> i_mutex while modifying i_size. > >> > >> > >> inode->i_mutex lock only protect the inode size on the same machine. > > > > ah ;) > > > >>> And all this is only really applicable to 32-bit CPUs, which you > >>> probably aren't using. > >> > >> I don't understand this. > > > > The i_size_read/i_size_write infrastructure is designed to efficiently > > handle the situation where a 32-bit machine reads a 64-bit number which > > might be undergoing modification on another CPU. We don't want the > > reading CPU to see an invalid number when the writing CPU is in the > > middle of modifying the two 32-bit words. > > > > > > > > ok. thanks. > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com <mailto:Ocfs2-devel@oss.oracle.com> > https://oss.oracle.com/mailman/listinfo/ocfs2-devel > >
On Wed, 26 Jun 2013 20:34:19 -0700 Sunil Mushran <sunil.mushran@gmail.com> wrote: > AFAIR, this behavior has been there since day 1 and changing it will impact > performance negatively. I would recommend against making this change for > one app. Well, it's a bug fix isn't it? SEEK_END on a 15k file is presently returning 10k. Obviously tradeoffs are involved - is there any way in which this behaviour can be fixed without serious performance damage?
The qs is whether this change is required for a real problem or not. If so, what is that logic that gets tripped up by this behaviour. On Thu, Jun 27, 2013 at 3:08 PM, Andrew Morton <akpm@linux-foundation.org>wrote: > On Wed, 26 Jun 2013 20:34:19 -0700 Sunil Mushran <sunil.mushran@gmail.com> > wrote: > > > AFAIR, this behavior has been there since day 1 and changing it will > impact > > performance negatively. I would recommend against making this change for > > one app. > > Well, it's a bug fix isn't it? SEEK_END on a 15k file is presently > returning > 10k. > > Obviously tradeoffs are involved - is there any way in which this > behaviour can be fixed without serious performance damage? > >
On 06/28/2013 06:59 AM, Sunil Mushran wrote: > The qs is whether this change is required for a real problem or not. If > so, what is that logic > that gets tripped up by this behaviour. IMHO, there have a couple of commands in Coreutils would be affected by this behavior: - tail(1) with "-c bytes" option. - wc(1): when counting only bytes, it will try to figure out the file size through lseek (SEEK_END - SEEK_CUR) when counting only bytes. Other popular programs need get the file size via SEEK_END only in some corner cases. e.g, if the input file is not a regular file or the utility failed to fetch the file size via fstat(2), they will call lseek as an alternative approach, like split(1), truncate(1), head(1) with '-c bytes' option. Generally, the user applications might fetch the file size through fstat(2) which will go through ocfs2_getattr() so that it's safe to us because we perform inode re-validation before filling up the generic attributes. IOWs, the user application could be changed accordingly to avoid this issue. However, this most likely a workaround to some extent, and we have a bug for the SEEK_END business. As per our current discussion, one big concern of us is regarding the performance with this fix, I'd like to consider it from another point of view, that is the user should take all the blame to themselves if they would like to run SEEK_END bounds operation on OCFS2, does it sounds make sense? :-P. If yes, it would be better to consolidate the code comments to indicate the performance negatively and it is not advisable to fetch file size via SEEK_END on OCFS2. Thanks, -Jeff > > > On Thu, Jun 27, 2013 at 3:08 PM, Andrew Morton > <akpm@linux-foundation.org <mailto:akpm@linux-foundation.org>> wrote: > > On Wed, 26 Jun 2013 20:34:19 -0700 Sunil Mushran > <sunil.mushran@gmail.com <mailto:sunil.mushran@gmail.com>> wrote: > > > AFAIR, this behavior has been there since day 1 and changing it > will impact > > performance negatively. I would recommend against making this > change for > > one app. > > Well, it's a bug fix isn't it? SEEK_END on a 15k file is presently > returning > 10k. > > Obviously tradeoffs are involved - is there any way in which this > behaviour can be fixed without serious performance damage? > > > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel
On Fri, Jun 28, 2013 at 05:11:25PM +0800, Jeff Liu wrote: > On 06/28/2013 06:59 AM, Sunil Mushran wrote: > > > The qs is whether this change is required for a real problem or not. If > > so, what is that logic > > that gets tripped up by this behaviour. This is a straight correctness fix. We take the hit and apply it. I'll check the actual patch on that email. Joel > > > IMHO, there have a couple of commands in Coreutils would be affected by this > behavior: > - tail(1) with "-c bytes" option. > - wc(1): when counting only bytes, it will try to figure out the file size > through lseek (SEEK_END - SEEK_CUR) when counting only bytes. > > Other popular programs need get the file size via SEEK_END only in some corner > cases. e.g, if the input file is not a regular file or the utility failed to > fetch the file size via fstat(2), they will call lseek as an alternative approach, > like split(1), truncate(1), head(1) with '-c bytes' option. > > Generally, the user applications might fetch the file size through fstat(2) which > will go through ocfs2_getattr() so that it's safe to us because we perform inode > re-validation before filling up the generic attributes. > > IOWs, the user application could be changed accordingly to avoid this issue. > However, this most likely a workaround to some extent, and we have a bug for > the SEEK_END business. > > As per our current discussion, one big concern of us is regarding the performance > with this fix, I'd like to consider it from another point of view, that is the user > should take all the blame to themselves if they would like to run SEEK_END bounds > operation on OCFS2, does it sounds make sense? :-P. > If yes, it would be better to consolidate the code comments to indicate the > performance negatively and it is not advisable to fetch file size via SEEK_END > on OCFS2. > > Thanks, > -Jeff > > > > > > > On Thu, Jun 27, 2013 at 3:08 PM, Andrew Morton > > <akpm@linux-foundation.org <mailto:akpm@linux-foundation.org>> wrote: > > > > On Wed, 26 Jun 2013 20:34:19 -0700 Sunil Mushran > > <sunil.mushran@gmail.com <mailto:sunil.mushran@gmail.com>> wrote: > > > > > AFAIR, this behavior has been there since day 1 and changing it > > will impact > > > performance negatively. I would recommend against making this > > change for > > > one app. > > > > Well, it's a bug fix isn't it? SEEK_END on a 15k file is presently > > returning > > 10k. > > > > Obviously tradeoffs are involved - is there any way in which this > > behaviour can be fixed without serious performance damage? > > > > > > > > > > _______________________________________________ > > Ocfs2-devel mailing list > > Ocfs2-devel@oss.oracle.com > > https://oss.oracle.com/mailman/listinfo/ocfs2-devel > > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel
On Thu, Jun 20, 2013 at 04:23:59PM +0800, shencanquan wrote: > llseek requires ocfs2 inode lock for updating the file size in SEEK_END. > because the file size maybe update on another node. > if it not . after call llseek in SEEK_END. the position is old. > > this bug can be reproduce the following scenario: > at first ,we dd a test fileA,the file size is 10k. > on NodeA: > --------- > 1) open the test fileA, lseek the end of file. and print the position. > 2) close the test fileA > > on NodeB: > 1) open the test fileA, append the 5k data to test FileA. > 2) lseek the end of file. and print the position. > 3) close file. > > at first we run the test program1 on NodeA , the result is 10k. > and then run the test program2 on NodeB, the result is 15k. > at last, we run the test program1 on NodeA again, the result is 10k. > > after apply this patch. the three step result is 15k. > > Signed-off-by: jensen <shencanquan@huawei.com> > --- > fs/ocfs2/file.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c > index ff54014..3afd24c 100644 > --- a/fs/ocfs2/file.c > +++ b/fs/ocfs2/file.c > @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence) > case SEEK_SET: > break; > case SEEK_END: > + /* SEEK_END requires the OCFS2 inode lock for the file > + * because it references the file's size. > + */ > + ret = ocfs2_inode_lock(inode, NULL, 0); > + if (ret < 0) { > + mlog_errno(ret); > + goto out; > + } > offset += inode->i_size; > + ocfs2_inode_unlock(inode, 0); Why wouldn't ocfs2_rw_lock() work? Just because we dont get the LVB from it? Joel > break; > case SEEK_CUR: > if (offset == 0) { > -- > 1.7.9.7 > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel
On 2013/6/29 21:37, Joel Becker wrote: > On Thu, Jun 20, 2013 at 04:23:59PM +0800, shencanquan wrote: >> llseek requires ocfs2 inode lock for updating the file size in SEEK_END. >> because the file size maybe update on another node. >> if it not . after call llseek in SEEK_END. the position is old. >> >> this bug can be reproduce the following scenario: >> at first ,we dd a test fileA,the file size is 10k. >> on NodeA: >> --------- >> 1) open the test fileA, lseek the end of file. and print the position. >> 2) close the test fileA >> >> on NodeB: >> 1) open the test fileA, append the 5k data to test FileA. >> 2) lseek the end of file. and print the position. >> 3) close file. >> >> at first we run the test program1 on NodeA , the result is 10k. >> and then run the test program2 on NodeB, the result is 15k. >> at last, we run the test program1 on NodeA again, the result is 10k. >> >> after apply this patch. the three step result is 15k. >> >> Signed-off-by: jensen <shencanquan@huawei.com> >> --- >> fs/ocfs2/file.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >> index ff54014..3afd24c 100644 >> --- a/fs/ocfs2/file.c >> +++ b/fs/ocfs2/file.c >> @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence) >> case SEEK_SET: >> break; >> case SEEK_END: >> + /* SEEK_END requires the OCFS2 inode lock for the file >> + * because it references the file's size. >> + */ >> + ret = ocfs2_inode_lock(inode, NULL, 0); >> + if (ret < 0) { >> + mlog_errno(ret); >> + goto out; >> + } >> offset += inode->i_size; >> + ocfs2_inode_unlock(inode, 0); > > Why wouldn't ocfs2_rw_lock() work? Just because we dont get the LVB > from it? > Yes. we want to update the inode size from lvb. I also think the file size maybe protected by inode lock. not rw lock. > Joel > >> break; >> case SEEK_CUR: >> if (offset == 0) { >> -- >> 1.7.9.7 >> >> >> _______________________________________________ >> Ocfs2-devel mailing list >> Ocfs2-devel@oss.oracle.com >> https://oss.oracle.com/mailman/listinfo/ocfs2-devel >
On Mon, Jul 01, 2013 at 09:38:31AM +0800, Jensen wrote: > On 2013/6/29 21:37, Joel Becker wrote: > > > On Thu, Jun 20, 2013 at 04:23:59PM +0800, shencanquan wrote: > >> llseek requires ocfs2 inode lock for updating the file size in SEEK_END. > >> because the file size maybe update on another node. > >> if it not . after call llseek in SEEK_END. the position is old. > >> > >> this bug can be reproduce the following scenario: > >> at first ,we dd a test fileA,the file size is 10k. > >> on NodeA: > >> --------- > >> 1) open the test fileA, lseek the end of file. and print the position. > >> 2) close the test fileA > >> > >> on NodeB: > >> 1) open the test fileA, append the 5k data to test FileA. > >> 2) lseek the end of file. and print the position. > >> 3) close file. > >> > >> at first we run the test program1 on NodeA , the result is 10k. > >> and then run the test program2 on NodeB, the result is 15k. > >> at last, we run the test program1 on NodeA again, the result is 10k. > >> > >> after apply this patch. the three step result is 15k. > >> > >> Signed-off-by: jensen <shencanquan@huawei.com> > >> --- > >> fs/ocfs2/file.c | 9 +++++++++ > >> 1 file changed, 9 insertions(+) > >> > >> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c > >> index ff54014..3afd24c 100644 > >> --- a/fs/ocfs2/file.c > >> +++ b/fs/ocfs2/file.c > >> @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence) > >> case SEEK_SET: > >> break; > >> case SEEK_END: > >> + /* SEEK_END requires the OCFS2 inode lock for the file > >> + * because it references the file's size. > >> + */ > >> + ret = ocfs2_inode_lock(inode, NULL, 0); > >> + if (ret < 0) { > >> + mlog_errno(ret); > >> + goto out; > >> + } > >> offset += inode->i_size; > >> + ocfs2_inode_unlock(inode, 0); > > > > Why wouldn't ocfs2_rw_lock() work? Just because we dont get the LVB > > from it? > > > > > Yes. we want to update the inode size from lvb. > > I also think the file size maybe protected by inode lock. not rw lock. Correct, if you want to get the most up to date i_size you'll have to be holding the meta (inode) lock. --Mark -- Mark Fasheh
On Tue, Jul 02, 2013 at 12:58:26PM -0700, Mark Fasheh wrote: > On Mon, Jul 01, 2013 at 09:38:31AM +0800, Jensen wrote: > > On 2013/6/29 21:37, Joel Becker wrote: > > > > > On Thu, Jun 20, 2013 at 04:23:59PM +0800, shencanquan wrote: > > >> llseek requires ocfs2 inode lock for updating the file size in SEEK_END. > > >> because the file size maybe update on another node. > > >> if it not . after call llseek in SEEK_END. the position is old. > > >> > > >> this bug can be reproduce the following scenario: > > >> at first ,we dd a test fileA,the file size is 10k. > > >> on NodeA: > > >> --------- > > >> 1) open the test fileA, lseek the end of file. and print the position. > > >> 2) close the test fileA > > >> > > >> on NodeB: > > >> 1) open the test fileA, append the 5k data to test FileA. > > >> 2) lseek the end of file. and print the position. > > >> 3) close file. > > >> > > >> at first we run the test program1 on NodeA , the result is 10k. > > >> and then run the test program2 on NodeB, the result is 15k. > > >> at last, we run the test program1 on NodeA again, the result is 10k. > > >> > > >> after apply this patch. the three step result is 15k. > > >> > > >> Signed-off-by: jensen <shencanquan@huawei.com> > > >> --- > > >> fs/ocfs2/file.c | 9 +++++++++ > > >> 1 file changed, 9 insertions(+) > > >> > > >> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c > > >> index ff54014..3afd24c 100644 > > >> --- a/fs/ocfs2/file.c > > >> +++ b/fs/ocfs2/file.c > > >> @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence) > > >> case SEEK_SET: > > >> break; > > >> case SEEK_END: > > >> + /* SEEK_END requires the OCFS2 inode lock for the file > > >> + * because it references the file's size. > > >> + */ > > >> + ret = ocfs2_inode_lock(inode, NULL, 0); > > >> + if (ret < 0) { > > >> + mlog_errno(ret); > > >> + goto out; > > >> + } > > >> offset += inode->i_size; > > >> + ocfs2_inode_unlock(inode, 0); > > > > > > Why wouldn't ocfs2_rw_lock() work? Just because we dont get the LVB > > > from it? > > > > > > > > > Yes. we want to update the inode size from lvb. > > > > I also think the file size maybe protected by inode lock. not rw lock. > > Correct, if you want to get the most up to date i_size you'll have to be > holding the meta (inode) lock. Ok, then: Acked-by: Joel Becker <jlbec@evilplan.org> > --Mark > > -- > Mark Fasheh > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index ff54014..3afd24c 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence) case SEEK_SET: break; case SEEK_END: + /* SEEK_END requires the OCFS2 inode lock for the file + * because it references the file's size. + */ + ret = ocfs2_inode_lock(inode, NULL, 0); + if (ret < 0) { + mlog_errno(ret); + goto out; + } offset += inode->i_size; + ocfs2_inode_unlock(inode, 0); break; case SEEK_CUR: if (offset == 0) {
llseek requires ocfs2 inode lock for updating the file size in SEEK_END. because the file size maybe update on another node. if it not . after call llseek in SEEK_END. the position is old. this bug can be reproduce the following scenario: at first ,we dd a test fileA,the file size is 10k. on NodeA: --------- 1) open the test fileA, lseek the end of file. and print the position. 2) close the test fileA on NodeB: 1) open the test fileA, append the 5k data to test FileA. 2) lseek the end of file. and print the position. 3) close file. at first we run the test program1 on NodeA , the result is 10k. and then run the test program2 on NodeB, the result is 15k. at last, we run the test program1 on NodeA again, the result is 10k. after apply this patch. the three step result is 15k. Signed-off-by: jensen <shencanquan@huawei.com> --- fs/ocfs2/file.c | 9 +++++++++ 1 file changed, 9 insertions(+)