Message ID | 1353999029-3975-6-git-send-email-piastry@etersoft.ru (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, 27 Nov 2012 10:50:28 +0400 Pavel Shilovsky <piastry@etersoft.ru> wrote: > If we have a read oplock and set a read lock in it, we can't write to the > locked area - so, filemap_fdatawrite may fail with a no information for a > userspace application even if we request a write to non-locked area. Fix > this by replacing it with filemap_write_and_wait call and sending non-page > write in a error case. > > While this may end up with two write requests to the server, we can be sure > that our data will be the same at the server and the page cache - the next read > on this file gets the valid data. > > Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> > --- > fs/cifs/file.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c > index f8fe1bd..89efd85 100644 > --- a/fs/cifs/file.c > +++ b/fs/cifs/file.c > @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov, > */ > if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) { > ssize_t written; > - int rc; > > written = generic_file_aio_write(iocb, iov, nr_segs, pos); > - rc = filemap_fdatawrite(inode->i_mapping); > - if (rc) > - return (ssize_t)rc; > - > - return written; > + /* try page write at first */ > + if (!filemap_write_and_wait(inode->i_mapping)) > + return written; > + /* page write failed - try from pos to pos+len-1 */ > } > #endif > Bleh -- nasty. I guess this will work though... Wonder if there's some way to populate the cache and then just mark the pages clean without sending out writes? That would be a better solution IMO, but I guess we can live with this for now... Acked-by: Jeff Layton <jlayton@redhat.com> -- To unsubscribe from this list: send the line "unsubscribe linux-cifs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2012/11/27 Jeff Layton <jlayton@redhat.com>: > On Tue, 27 Nov 2012 10:50:28 +0400 > Pavel Shilovsky <piastry@etersoft.ru> wrote: > >> If we have a read oplock and set a read lock in it, we can't write to the >> locked area - so, filemap_fdatawrite may fail with a no information for a >> userspace application even if we request a write to non-locked area. Fix >> this by replacing it with filemap_write_and_wait call and sending non-page >> write in a error case. >> >> While this may end up with two write requests to the server, we can be sure >> that our data will be the same at the server and the page cache - the next read >> on this file gets the valid data. >> >> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> >> --- >> fs/cifs/file.c | 10 ++++------ >> 1 file changed, 4 insertions(+), 6 deletions(-) >> >> diff --git a/fs/cifs/file.c b/fs/cifs/file.c >> index f8fe1bd..89efd85 100644 >> --- a/fs/cifs/file.c >> +++ b/fs/cifs/file.c >> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov, >> */ >> if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) { >> ssize_t written; >> - int rc; >> >> written = generic_file_aio_write(iocb, iov, nr_segs, pos); >> - rc = filemap_fdatawrite(inode->i_mapping); >> - if (rc) >> - return (ssize_t)rc; >> - >> - return written; >> + /* try page write at first */ >> + if (!filemap_write_and_wait(inode->i_mapping)) >> + return written; >> + /* page write failed - try from pos to pos+len-1 */ >> } >> #endif >> > > Bleh -- nasty. I guess this will work though... > > Wonder if there's some way to populate the cache and then just mark the > pages clean without sending out writes? That would be a better solution > IMO, but I guess we can live with this for now... > > Acked-by: Jeff Layton <jlayton@redhat.com> Thanks for reviewing the patchset. As for this patch I have the followon patch: http://git.altlinux.org/people/piastry/public/?p=cifs-2.6.git;a=commitdiff;h=11673376e6840c686f0c6e51892b16ad1945887e that allows us to populate the page cache without making pages dirty. I didn't have enough time to test it well (that's why I haven't posted it yet) - going to do it tomorrow. But you are welcome to comment the approach.
Am curious how you noticed this problem? Did you have a test case? On Tue, Nov 27, 2012 at 12:50 AM, Pavel Shilovsky <piastry@etersoft.ru> wrote: > If we have a read oplock and set a read lock in it, we can't write to the > locked area - so, filemap_fdatawrite may fail with a no information for a > userspace application even if we request a write to non-locked area. Fix > this by replacing it with filemap_write_and_wait call and sending non-page > write in a error case. > > While this may end up with two write requests to the server, we can be sure > that our data will be the same at the server and the page cache - the next read > on this file gets the valid data. > > Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> > --- > fs/cifs/file.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c > index f8fe1bd..89efd85 100644 > --- a/fs/cifs/file.c > +++ b/fs/cifs/file.c > @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov, > */ > if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) { > ssize_t written; > - int rc; > > written = generic_file_aio_write(iocb, iov, nr_segs, pos); > - rc = filemap_fdatawrite(inode->i_mapping); > - if (rc) > - return (ssize_t)rc; > - > - return written; > + /* try page write at first */ > + if (!filemap_write_and_wait(inode->i_mapping)) > + return written; > + /* page write failed - try from pos to pos+len-1 */ > } > #endif > > -- > 1.7.10.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
2012/11/27 Steve French <smfrench@gmail.com>:
> Am curious how you noticed this problem? Did you have a test case?
The test case that reproduces the problem:
1) client1 opens "filename" (file that already has enough data ~~ 4096
bytes is ok)
2) client1 sets a read lock a file from pos=1 to pos=2
3) client2 opens "filename" - client1 gets oplock break to level II
and push lock from #2 to the server.
4) client1 writes 'a' to a file (from pos 0) - got status OK, but the
server reject the write (due to the client1 writes the whole page from
pos=0 to pos=4096 and conflicts with the read lock from pos=1 to
pos=2) - ERROR.
5) client2 reads 1 byte and got the old data (not 'a') because the
server returns it's version of the data
It is suitable for both CIFS and SMB2.
On Tue, 27 Nov 2012 18:57:22 +0400 Pavel Shilovsky <piastry@etersoft.ru> wrote: > 2012/11/27 Jeff Layton <jlayton@redhat.com>: > > On Tue, 27 Nov 2012 10:50:28 +0400 > > Pavel Shilovsky <piastry@etersoft.ru> wrote: > > > >> If we have a read oplock and set a read lock in it, we can't write to the > >> locked area - so, filemap_fdatawrite may fail with a no information for a > >> userspace application even if we request a write to non-locked area. Fix > >> this by replacing it with filemap_write_and_wait call and sending non-page > >> write in a error case. > >> > >> While this may end up with two write requests to the server, we can be sure > >> that our data will be the same at the server and the page cache - the next read > >> on this file gets the valid data. > >> > >> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> > >> --- > >> fs/cifs/file.c | 10 ++++------ > >> 1 file changed, 4 insertions(+), 6 deletions(-) > >> > >> diff --git a/fs/cifs/file.c b/fs/cifs/file.c > >> index f8fe1bd..89efd85 100644 > >> --- a/fs/cifs/file.c > >> +++ b/fs/cifs/file.c > >> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov, > >> */ > >> if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) { > >> ssize_t written; > >> - int rc; > >> > >> written = generic_file_aio_write(iocb, iov, nr_segs, pos); > >> - rc = filemap_fdatawrite(inode->i_mapping); > >> - if (rc) > >> - return (ssize_t)rc; > >> - > >> - return written; > >> + /* try page write at first */ > >> + if (!filemap_write_and_wait(inode->i_mapping)) > >> + return written; > >> + /* page write failed - try from pos to pos+len-1 */ > >> } > >> #endif > >> > > > > Bleh -- nasty. I guess this will work though... > > > > Wonder if there's some way to populate the cache and then just mark the > > pages clean without sending out writes? That would be a better solution > > IMO, but I guess we can live with this for now... > > > > Acked-by: Jeff Layton <jlayton@redhat.com> > > Thanks for reviewing the patchset. > > As for this patch I have the followon patch: > > http://git.altlinux.org/people/piastry/public/?p=cifs-2.6.git;a=commitdiff;h=11673376e6840c686f0c6e51892b16ad1945887e > > that allows us to populate the page cache without making pages dirty. > I didn't have enough time to test it well (that's why I haven't posted > it yet) - going to do it tomorrow. But you are welcome to comment the > approach. > The only thing that makes me wary here is that you're setting this flag on the inode. Could there ever be a situation where another task might be writing to this inode at the same time and needs to set them dirty? If not, then I'm not sure I see the need for a new bool in the inode. It might be simpler to just check what sort of oplock you have in write_end instead. There's also a lot of logic around what sort of locking you're doing here too. I think we ought to do the same sort of I/O regardless of whether POSIX locks are being used or not.
2012/11/28 Jeff Layton <jlayton@redhat.com>: > On Tue, 27 Nov 2012 18:57:22 +0400 > Pavel Shilovsky <piastry@etersoft.ru> wrote: > >> 2012/11/27 Jeff Layton <jlayton@redhat.com>: >> > On Tue, 27 Nov 2012 10:50:28 +0400 >> > Pavel Shilovsky <piastry@etersoft.ru> wrote: >> > >> >> If we have a read oplock and set a read lock in it, we can't write to the >> >> locked area - so, filemap_fdatawrite may fail with a no information for a >> >> userspace application even if we request a write to non-locked area. Fix >> >> this by replacing it with filemap_write_and_wait call and sending non-page >> >> write in a error case. >> >> >> >> While this may end up with two write requests to the server, we can be sure >> >> that our data will be the same at the server and the page cache - the next read >> >> on this file gets the valid data. >> >> >> >> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> >> >> --- >> >> fs/cifs/file.c | 10 ++++------ >> >> 1 file changed, 4 insertions(+), 6 deletions(-) >> >> >> >> diff --git a/fs/cifs/file.c b/fs/cifs/file.c >> >> index f8fe1bd..89efd85 100644 >> >> --- a/fs/cifs/file.c >> >> +++ b/fs/cifs/file.c >> >> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov, >> >> */ >> >> if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) { >> >> ssize_t written; >> >> - int rc; >> >> >> >> written = generic_file_aio_write(iocb, iov, nr_segs, pos); >> >> - rc = filemap_fdatawrite(inode->i_mapping); >> >> - if (rc) >> >> - return (ssize_t)rc; >> >> - >> >> - return written; >> >> + /* try page write at first */ >> >> + if (!filemap_write_and_wait(inode->i_mapping)) >> >> + return written; >> >> + /* page write failed - try from pos to pos+len-1 */ >> >> } >> >> #endif >> >> >> > >> > Bleh -- nasty. I guess this will work though... >> > >> > Wonder if there's some way to populate the cache and then just mark the >> > pages clean without sending out writes? That would be a better solution >> > IMO, but I guess we can live with this for now... >> > >> > Acked-by: Jeff Layton <jlayton@redhat.com> >> >> Thanks for reviewing the patchset. >> >> As for this patch I have the followon patch: >> >> http://git.altlinux.org/people/piastry/public/?p=cifs-2.6.git;a=commitdiff;h=11673376e6840c686f0c6e51892b16ad1945887e >> >> that allows us to populate the page cache without making pages dirty. >> I didn't have enough time to test it well (that's why I haven't posted >> it yet) - going to do it tomorrow. But you are welcome to comment the >> approach. >> > > The only thing that makes me wary here is that you're setting this flag > on the inode. Could there ever be a situation where another task might > be writing to this inode at the same time and needs to set them dirty? > > If not, then I'm not sure I see the need for a new bool in the inode. > It might be simpler to just check what sort of oplock you have in > write_end instead. > > There's also a lot of logic around what sort of locking you're doing > here too. I think we ought to do the same sort of I/O regardless of > whether POSIX locks are being used or not. > There are some places where VFS code uses write_end call (through pagecache_write_end) but I didn't find any place where cifs code can hit it. So, I think we can assume now that cifs_write_end is called only from generic_file_aio_write codepath. If so, we can be sure that only on process may want to set pages dirty through cifs_write_end due to i_mutex lock. It seems your are right and we can use clientCanCacheAll value directly from cifs_write_end - will make changes. Also, I think I can merge these two patches into one. Thanks.
On Wed, 28 Nov 2012 16:12:56 +0400 Pavel Shilovsky <piastry@etersoft.ru> wrote: > 2012/11/28 Jeff Layton <jlayton@redhat.com>: > > On Tue, 27 Nov 2012 18:57:22 +0400 > > Pavel Shilovsky <piastry@etersoft.ru> wrote: > > > >> 2012/11/27 Jeff Layton <jlayton@redhat.com>: > >> > On Tue, 27 Nov 2012 10:50:28 +0400 > >> > Pavel Shilovsky <piastry@etersoft.ru> wrote: > >> > > >> >> If we have a read oplock and set a read lock in it, we can't write to the > >> >> locked area - so, filemap_fdatawrite may fail with a no information for a > >> >> userspace application even if we request a write to non-locked area. Fix > >> >> this by replacing it with filemap_write_and_wait call and sending non-page > >> >> write in a error case. > >> >> > >> >> While this may end up with two write requests to the server, we can be sure > >> >> that our data will be the same at the server and the page cache - the next read > >> >> on this file gets the valid data. > >> >> > >> >> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> > >> >> --- > >> >> fs/cifs/file.c | 10 ++++------ > >> >> 1 file changed, 4 insertions(+), 6 deletions(-) > >> >> > >> >> diff --git a/fs/cifs/file.c b/fs/cifs/file.c > >> >> index f8fe1bd..89efd85 100644 > >> >> --- a/fs/cifs/file.c > >> >> +++ b/fs/cifs/file.c > >> >> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov, > >> >> */ > >> >> if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) { > >> >> ssize_t written; > >> >> - int rc; > >> >> > >> >> written = generic_file_aio_write(iocb, iov, nr_segs, pos); > >> >> - rc = filemap_fdatawrite(inode->i_mapping); > >> >> - if (rc) > >> >> - return (ssize_t)rc; > >> >> - > >> >> - return written; > >> >> + /* try page write at first */ > >> >> + if (!filemap_write_and_wait(inode->i_mapping)) > >> >> + return written; > >> >> + /* page write failed - try from pos to pos+len-1 */ > >> >> } > >> >> #endif > >> >> > >> > > >> > Bleh -- nasty. I guess this will work though... > >> > > >> > Wonder if there's some way to populate the cache and then just mark the > >> > pages clean without sending out writes? That would be a better solution > >> > IMO, but I guess we can live with this for now... > >> > > >> > Acked-by: Jeff Layton <jlayton@redhat.com> > >> > >> Thanks for reviewing the patchset. > >> > >> As for this patch I have the followon patch: > >> > >> http://git.altlinux.org/people/piastry/public/?p=cifs-2.6.git;a=commitdiff;h=11673376e6840c686f0c6e51892b16ad1945887e > >> > >> that allows us to populate the page cache without making pages dirty. > >> I didn't have enough time to test it well (that's why I haven't posted > >> it yet) - going to do it tomorrow. But you are welcome to comment the > >> approach. > >> > > > > The only thing that makes me wary here is that you're setting this flag > > on the inode. Could there ever be a situation where another task might > > be writing to this inode at the same time and needs to set them dirty? > > > > If not, then I'm not sure I see the need for a new bool in the inode. > > It might be simpler to just check what sort of oplock you have in > > write_end instead. > > > > There's also a lot of logic around what sort of locking you're doing > > here too. I think we ought to do the same sort of I/O regardless of > > whether POSIX locks are being used or not. > > > > There are some places where VFS code uses write_end call (through > pagecache_write_end) but I didn't find any place where cifs code can > hit it. So, I think we can assume now that cifs_write_end is called > only from generic_file_aio_write codepath. If so, we can be sure that > only on process may want to set pages dirty through cifs_write_end due > to i_mutex lock. > > It seems your are right and we can use clientCanCacheAll value > directly from cifs_write_end - will make changes. Also, I think I can > merge these two patches into one. > > Thanks. > Note though, that it matters what sort of cache= option is in force too. If cache=loose then you *do* want to set the page dirty.
2012/11/28 Jeff Layton <jlayton@redhat.com>: > On Wed, 28 Nov 2012 16:12:56 +0400 > Pavel Shilovsky <piastry@etersoft.ru> wrote: > >> 2012/11/28 Jeff Layton <jlayton@redhat.com>: >> > On Tue, 27 Nov 2012 18:57:22 +0400 >> > Pavel Shilovsky <piastry@etersoft.ru> wrote: >> > >> >> 2012/11/27 Jeff Layton <jlayton@redhat.com>: >> >> > On Tue, 27 Nov 2012 10:50:28 +0400 >> >> > Pavel Shilovsky <piastry@etersoft.ru> wrote: >> >> > >> >> >> If we have a read oplock and set a read lock in it, we can't write to the >> >> >> locked area - so, filemap_fdatawrite may fail with a no information for a >> >> >> userspace application even if we request a write to non-locked area. Fix >> >> >> this by replacing it with filemap_write_and_wait call and sending non-page >> >> >> write in a error case. >> >> >> >> >> >> While this may end up with two write requests to the server, we can be sure >> >> >> that our data will be the same at the server and the page cache - the next read >> >> >> on this file gets the valid data. >> >> >> >> >> >> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> >> >> >> --- >> >> >> fs/cifs/file.c | 10 ++++------ >> >> >> 1 file changed, 4 insertions(+), 6 deletions(-) >> >> >> >> >> >> diff --git a/fs/cifs/file.c b/fs/cifs/file.c >> >> >> index f8fe1bd..89efd85 100644 >> >> >> --- a/fs/cifs/file.c >> >> >> +++ b/fs/cifs/file.c >> >> >> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov, >> >> >> */ >> >> >> if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) { >> >> >> ssize_t written; >> >> >> - int rc; >> >> >> >> >> >> written = generic_file_aio_write(iocb, iov, nr_segs, pos); >> >> >> - rc = filemap_fdatawrite(inode->i_mapping); >> >> >> - if (rc) >> >> >> - return (ssize_t)rc; >> >> >> - >> >> >> - return written; >> >> >> + /* try page write at first */ >> >> >> + if (!filemap_write_and_wait(inode->i_mapping)) >> >> >> + return written; >> >> >> + /* page write failed - try from pos to pos+len-1 */ >> >> >> } >> >> >> #endif >> >> >> >> >> > >> >> > Bleh -- nasty. I guess this will work though... >> >> > >> >> > Wonder if there's some way to populate the cache and then just mark the >> >> > pages clean without sending out writes? That would be a better solution >> >> > IMO, but I guess we can live with this for now... >> >> > >> >> > Acked-by: Jeff Layton <jlayton@redhat.com> >> >> >> >> Thanks for reviewing the patchset. >> >> >> >> As for this patch I have the followon patch: >> >> >> >> http://git.altlinux.org/people/piastry/public/?p=cifs-2.6.git;a=commitdiff;h=11673376e6840c686f0c6e51892b16ad1945887e >> >> >> >> that allows us to populate the page cache without making pages dirty. >> >> I didn't have enough time to test it well (that's why I haven't posted >> >> it yet) - going to do it tomorrow. But you are welcome to comment the >> >> approach. >> >> >> > >> > The only thing that makes me wary here is that you're setting this flag >> > on the inode. Could there ever be a situation where another task might >> > be writing to this inode at the same time and needs to set them dirty? >> > >> > If not, then I'm not sure I see the need for a new bool in the inode. >> > It might be simpler to just check what sort of oplock you have in >> > write_end instead. >> > >> > There's also a lot of logic around what sort of locking you're doing >> > here too. I think we ought to do the same sort of I/O regardless of >> > whether POSIX locks are being used or not. >> > >> >> There are some places where VFS code uses write_end call (through >> pagecache_write_end) but I didn't find any place where cifs code can >> hit it. So, I think we can assume now that cifs_write_end is called >> only from generic_file_aio_write codepath. If so, we can be sure that >> only on process may want to set pages dirty through cifs_write_end due >> to i_mutex lock. >> >> It seems your are right and we can use clientCanCacheAll value >> directly from cifs_write_end - will make changes. Also, I think I can >> merge these two patches into one. >> >> Thanks. >> > > Note though, that it matters what sort of cache= option is in force > too. If cache=loose then you *do* want to set the page dirty. This makes sense, thanks.
2012/11/28 Jeff Layton <jlayton@redhat.com>: > There's also a lot of logic around what sort of locking you're doing > here too. I think we ought to do the same sort of I/O regardless of > whether POSIX locks are being used or not. We can use cifs_writev for both POSIX and mandatory variants but I divided them to make POSIX variant work faster (no need to check hold a semaphore, walk through a lock list, etc).
On Wed, 28 Nov 2012 17:55:41 +0400 Pavel Shilovsky <piastry@etersoft.ru> wrote: > 2012/11/28 Jeff Layton <jlayton@redhat.com>: > > There's also a lot of logic around what sort of locking you're doing > > here too. I think we ought to do the same sort of I/O regardless of > > whether POSIX locks are being used or not. > > We can use cifs_writev for both POSIX and mandatory variants but I > divided them to make POSIX variant work faster (no need to check hold > a semaphore, walk through a lock list, etc). > Refresh my memory -- why do we need to handle writes differently when POSIX vs. non-POSIX locking is in force? It seems to me that that shouldn't matter and the behavior should be solely a function of what sort of oplock you have.
2012/11/28 Jeff Layton <jlayton@redhat.com>: > On Wed, 28 Nov 2012 17:55:41 +0400 > Pavel Shilovsky <piastry@etersoft.ru> wrote: > >> 2012/11/28 Jeff Layton <jlayton@redhat.com>: >> > There's also a lot of logic around what sort of locking you're doing >> > here too. I think we ought to do the same sort of I/O regardless of >> > whether POSIX locks are being used or not. >> >> We can use cifs_writev for both POSIX and mandatory variants but I >> divided them to make POSIX variant work faster (no need to check hold >> a semaphore, walk through a lock list, etc). >> > > Refresh my memory -- why do we need to handle writes differently when > POSIX vs. non-POSIX locking is in force? It seems to me that that > shouldn't matter and the behavior should be solely a function of what > sort of oplock you have. A write request can have conflict with mandatory locks set on a file. That's why we need to check for lock conflicts before issue the write/read.
On Wed, 28 Nov 2012 18:07:46 +0400 Pavel Shilovsky <piastry@etersoft.ru> wrote: > 2012/11/28 Jeff Layton <jlayton@redhat.com>: > > On Wed, 28 Nov 2012 17:55:41 +0400 > > Pavel Shilovsky <piastry@etersoft.ru> wrote: > > > >> 2012/11/28 Jeff Layton <jlayton@redhat.com>: > >> > There's also a lot of logic around what sort of locking you're doing > >> > here too. I think we ought to do the same sort of I/O regardless of > >> > whether POSIX locks are being used or not. > >> > >> We can use cifs_writev for both POSIX and mandatory variants but I > >> divided them to make POSIX variant work faster (no need to check hold > >> a semaphore, walk through a lock list, etc). > >> > > > > Refresh my memory -- why do we need to handle writes differently when > > POSIX vs. non-POSIX locking is in force? It seems to me that that > > shouldn't matter and the behavior should be solely a function of what > > sort of oplock you have. > > A write request can have conflict with mandatory locks set on a file. > That's why we need to check for lock conflicts before issue the > write/read. > Hmmm...and the server might not know about the lock if it's cached. Fair enough.
2012/11/28 Jeff Layton <jlayton@redhat.com>: > On Wed, 28 Nov 2012 18:07:46 +0400 > Pavel Shilovsky <piastry@etersoft.ru> wrote: > >> 2012/11/28 Jeff Layton <jlayton@redhat.com>: >> > On Wed, 28 Nov 2012 17:55:41 +0400 >> > Pavel Shilovsky <piastry@etersoft.ru> wrote: >> > >> >> 2012/11/28 Jeff Layton <jlayton@redhat.com>: >> >> > There's also a lot of logic around what sort of locking you're doing >> >> > here too. I think we ought to do the same sort of I/O regardless of >> >> > whether POSIX locks are being used or not. >> >> >> >> We can use cifs_writev for both POSIX and mandatory variants but I >> >> divided them to make POSIX variant work faster (no need to check hold >> >> a semaphore, walk through a lock list, etc). >> >> >> > >> > Refresh my memory -- why do we need to handle writes differently when >> > POSIX vs. non-POSIX locking is in force? It seems to me that that >> > shouldn't matter and the behavior should be solely a function of what >> > sort of oplock you have. >> >> A write request can have conflict with mandatory locks set on a file. >> That's why we need to check for lock conflicts before issue the >> write/read. >> > > Hmmm...and the server might not know about the lock if it's cached. > Fair enough. Yes. Also there may be a situation when we have oplock level II and brlocks sent to the server (we can only cache brlock in exclusive oplock case): we can check locks locally too and don't send a write to the server if conflicts exist - save performance.
diff --git a/fs/cifs/file.c b/fs/cifs/file.c index f8fe1bd..89efd85 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov, */ if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) { ssize_t written; - int rc; written = generic_file_aio_write(iocb, iov, nr_segs, pos); - rc = filemap_fdatawrite(inode->i_mapping); - if (rc) - return (ssize_t)rc; - - return written; + /* try page write at first */ + if (!filemap_write_and_wait(inode->i_mapping)) + return written; + /* page write failed - try from pos to pos+len-1 */ } #endif
If we have a read oplock and set a read lock in it, we can't write to the locked area - so, filemap_fdatawrite may fail with a no information for a userspace application even if we request a write to non-locked area. Fix this by replacing it with filemap_write_and_wait call and sending non-page write in a error case. While this may end up with two write requests to the server, we can be sure that our data will be the same at the server and the page cache - the next read on this file gets the valid data. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> --- fs/cifs/file.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)