Message ID | 1571776423-32000-1-git-send-email-dwysocha@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] cifs: Fix cifsInodeInfo lock_sem deadlock with multiple readers | expand |
----- Original Message ----- > From: "Dave Wysochanski" <dwysocha@redhat.com> > To: linux-cifs@vger.kernel.org > Sent: Wednesday, 23 October, 2019 6:33:43 AM > Subject: [RFC PATCH] cifs: Fix cifsInodeInfo lock_sem deadlock with multiple readers > > NOTE: I have verified this fixes the problem but have not run > locking tests yet. > > There's a deadlock that is possible that can easily be seen with > multiple readers open/read/close of the same file. The deadlock > is due to a reader calling down_read(lock_sem) and holding > it across the full IO, even if a network or server disruption > occurs and the session has to be reconnected. Upon reconnect, > cifs_relock_file is called where down_read(lock_sem) is called > a second time. Normally this is not a problem, but if there is > another process that calls down_write(lock_sem) in between the > first and second reader call to down_read(lock_sem), this will > cause a deadlock. The caller of down_write (often either > _cifsFileInfo_put that is just removing and freeing cifsLockInfo > structures from the list of locks, or cifs_new_fileinfo, which > is just attaching cifs_fid_locks to cifsInodeInfo->llist), will > block due to the reader's first down_read(lock_sem) that obtains > the semaphore (read IO in flight). And then when the server > comes back up, the reader that holds calls down_read(lock_sem) > a second time, and this time is blocked too because of the > blocked in down_write (rw_semaphores would starve writers if > this was not the case). Interestingly enough, the callers of > down_write in the simple test case was not adding a > conflicting lock at all, just either opening or closing the > file, and modifying the list of locks attached to cifsInodeInfo, > this ends up tripping up the reader process and causing the > deadlock. > > The root of the problem is that lock_sem both protects the > cifsInodeInfo fields (such as the lllist - the list of locks), > but is also being re-used to avoid a conflicting lock coming > in while IO is in flight. Add a new semaphore that tracks > just the IO in flight, and must be obtained before adding > a new lock. While this does add another layer of complexity > and a semaphore ordering that must be obeyed to avoid new > deadlocks, it does clealy solve the underlying problem. The patch is hard to read(not your fault) since "patch" decided that almost all changes are in cifs_closedir() :-( You are reverting 560d388950. it is unfortunate because I think we should make either cifs_reopen_file() or cifs_strict_readv() using down_read_nested() to suppress the warnings from the validator or else we will get a lot of these log entries in dmesg (almost) everytime we get a reconnect. A different approach, could be to change _cifsFileInfo_put() to use a down_write_trylock()-sleep() loop instead of a blocking down_write() call. I.e. something like this ? (and then the same at the other places where we have a deadlock vulnerable down_write() call) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 936e03892e2a..530af080dc61 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -464,7 +464,8 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler) * Delete any outstanding lock records. We'll lose them when the file * is closed anyway. */ - down_write(&cifsi->lock_sem); + while (!down_write_trylock(&cifsi->lock_sem)) + msleep(125); list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) { list_del(&li->llist); cifs_del_lock_waiters(li); > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> > --- > fs/cifs/cifsfs.c | 1 + > fs/cifs/cifsglob.h | 1 + > fs/cifs/file.c | 24 +++++++++++++++++++----- > 3 files changed, 21 insertions(+), 5 deletions(-) > > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c > index c049c7b3aa87..10f614324e4e 100644 > --- a/fs/cifs/cifsfs.c > +++ b/fs/cifs/cifsfs.c > @@ -1336,6 +1336,7 @@ static ssize_t cifs_copy_file_range(struct file > *src_file, loff_t off, > > inode_init_once(&cifsi->vfs_inode); > init_rwsem(&cifsi->lock_sem); > + init_rwsem(&cifsi->io_inflight_sem); > } > > static int __init > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h > index 50dfd9049370..40e8358dc1cc 100644 > --- a/fs/cifs/cifsglob.h > +++ b/fs/cifs/cifsglob.h > @@ -1392,6 +1392,7 @@ struct cifsInodeInfo { > bool can_cache_brlcks; > struct list_head llist; /* locks helb by this inode */ > struct rw_semaphore lock_sem; /* protect the fields above */ > + struct rw_semaphore io_inflight_sem; /* Used to avoid lock conflicts */ > /* BB add in lists for dirty pages i.e. write caching info for oplock */ > struct list_head openFileList; > spinlock_t open_file_lock; /* protects openFileList */ > diff --git a/fs/cifs/file.c b/fs/cifs/file.c > index 5ad15de2bb4f..417baa7f5dd3 100644 > --- a/fs/cifs/file.c > +++ b/fs/cifs/file.c > @@ -621,7 +621,7 @@ int cifs_open(struct inode *inode, struct file *file) > struct cifs_tcon *tcon = tlink_tcon(cfile->tlink); > int rc = 0; > > - down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING); > + down_read(&cinode->lock_sem); > if (cinode->can_cache_brlcks) { > /* can cache locks - no need to relock */ > up_read(&cinode->lock_sem); > @@ -973,6 +973,7 @@ int cifs_closedir(struct inode *inode, struct file *file) > struct cifs_fid_locks *cur; > struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry)); > > + down_read(&cinode->lock_sem); > list_for_each_entry(cur, &cinode->llist, llist) { > rc = cifs_find_fid_lock_conflict(cur, offset, length, type, > flags, cfile, conf_lock, > @@ -980,6 +981,7 @@ int cifs_closedir(struct inode *inode, struct file *file) > if (rc) > break; > } > + up_read(&cinode->lock_sem); > > return rc; > } > @@ -1027,9 +1029,11 @@ int cifs_closedir(struct inode *inode, struct file > *file) > cifs_lock_add(struct cifsFileInfo *cfile, struct cifsLockInfo *lock) > { > struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry)); > + down_write(&cinode->io_inflight_sem); > down_write(&cinode->lock_sem); > list_add_tail(&lock->llist, &cfile->llist->locks); > up_write(&cinode->lock_sem); > + up_write(&cinode->io_inflight_sem); > } > > /* > @@ -1049,6 +1053,7 @@ int cifs_closedir(struct inode *inode, struct file > *file) > > try_again: > exist = false; > + down_write(&cinode->io_inflight_sem); > down_write(&cinode->lock_sem); > > exist = cifs_find_lock_conflict(cfile, lock->offset, lock->length, > @@ -1057,6 +1062,7 @@ int cifs_closedir(struct inode *inode, struct file > *file) > if (!exist && cinode->can_cache_brlcks) { > list_add_tail(&lock->llist, &cfile->llist->locks); > up_write(&cinode->lock_sem); > + up_write(&cinode->io_inflight_sem); > return rc; > } > > @@ -1077,6 +1083,7 @@ int cifs_closedir(struct inode *inode, struct file > *file) > } > > up_write(&cinode->lock_sem); > + up_write(&cinode->io_inflight_sem); > return rc; > } > > @@ -1125,14 +1132,17 @@ int cifs_closedir(struct inode *inode, struct file > *file) > return rc; > > try_again: > + down_write(&cinode->io_inflight_sem); > down_write(&cinode->lock_sem); > if (!cinode->can_cache_brlcks) { > up_write(&cinode->lock_sem); > + down_write(&cinode->io_inflight_sem); > return rc; > } > > rc = posix_lock_file(file, flock, NULL); > up_write(&cinode->lock_sem); > + up_write(&cinode->io_inflight_sem); > if (rc == FILE_LOCK_DEFERRED) { > rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker); > if (!rc) > @@ -1331,6 +1341,7 @@ struct lock_to_push { > int rc = 0; > > /* we are going to update can_cache_brlcks here - need a write access */ > + down_write(&cinode->io_inflight_sem); > down_write(&cinode->lock_sem); > if (!cinode->can_cache_brlcks) { > up_write(&cinode->lock_sem); > @@ -1346,6 +1357,7 @@ struct lock_to_push { > > cinode->can_cache_brlcks = false; > up_write(&cinode->lock_sem); > + up_write(&cinode->io_inflight_sem); > return rc; > } > > @@ -1522,6 +1534,7 @@ struct lock_to_push { > if (!buf) > return -ENOMEM; > > + down_write(&cinode->io_inflight_sem); > down_write(&cinode->lock_sem); > for (i = 0; i < 2; i++) { > cur = buf; > @@ -1593,6 +1606,7 @@ struct lock_to_push { > } > > up_write(&cinode->lock_sem); > + up_write(&cinode->io_inflight_sem); > kfree(buf); > return rc; > } > @@ -3148,7 +3162,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct > iov_iter *from) > * We need to hold the sem to be sure nobody modifies lock list > * with a brlock that prevents writing. > */ > - down_read(&cinode->lock_sem); > + down_read(&cinode->io_inflight_sem); > > rc = generic_write_checks(iocb, from); > if (rc <= 0) > @@ -3161,7 +3175,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct > iov_iter *from) > else > rc = -EACCES; > out: > - up_read(&cinode->lock_sem); > + up_read(&cinode->io_inflight_sem); > inode_unlock(inode); > > if (rc > 0) > @@ -3887,12 +3901,12 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct > iov_iter *to) > * We need to hold the sem to be sure nobody modifies lock list > * with a brlock that prevents reading. > */ > - down_read(&cinode->lock_sem); > + down_read(&cinode->io_inflight_sem); > if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(to), > tcon->ses->server->vals->shared_lock_type, > 0, NULL, CIFS_READ_OP)) > rc = generic_file_read_iter(iocb, to); > - up_read(&cinode->lock_sem); > + up_read(&cinode->io_inflight_sem); > return rc; > } > > -- > 1.8.3.1 > >
On Wed, Oct 23, 2019 at 12:50 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote: > > > > > > ----- Original Message ----- > > From: "Dave Wysochanski" <dwysocha@redhat.com> > > To: linux-cifs@vger.kernel.org > > Sent: Wednesday, 23 October, 2019 6:33:43 AM > > Subject: [RFC PATCH] cifs: Fix cifsInodeInfo lock_sem deadlock with multiple readers > > > > NOTE: I have verified this fixes the problem but have not run > > locking tests yet. > > > > There's a deadlock that is possible that can easily be seen with > > multiple readers open/read/close of the same file. The deadlock > > is due to a reader calling down_read(lock_sem) and holding > > it across the full IO, even if a network or server disruption > > occurs and the session has to be reconnected. Upon reconnect, > > cifs_relock_file is called where down_read(lock_sem) is called > > a second time. Normally this is not a problem, but if there is > > another process that calls down_write(lock_sem) in between the > > first and second reader call to down_read(lock_sem), this will > > cause a deadlock. The caller of down_write (often either > > _cifsFileInfo_put that is just removing and freeing cifsLockInfo > > structures from the list of locks, or cifs_new_fileinfo, which > > is just attaching cifs_fid_locks to cifsInodeInfo->llist), will > > block due to the reader's first down_read(lock_sem) that obtains > > the semaphore (read IO in flight). And then when the server > > comes back up, the reader that holds calls down_read(lock_sem) > > a second time, and this time is blocked too because of the > > blocked in down_write (rw_semaphores would starve writers if > > this was not the case). Interestingly enough, the callers of > > down_write in the simple test case was not adding a > > conflicting lock at all, just either opening or closing the > > file, and modifying the list of locks attached to cifsInodeInfo, > > this ends up tripping up the reader process and causing the > > deadlock. > > > > The root of the problem is that lock_sem both protects the > > cifsInodeInfo fields (such as the lllist - the list of locks), > > but is also being re-used to avoid a conflicting lock coming > > in while IO is in flight. Add a new semaphore that tracks > > just the IO in flight, and must be obtained before adding > > a new lock. While this does add another layer of complexity > > and a semaphore ordering that must be obeyed to avoid new > > deadlocks, it does clealy solve the underlying problem. > > The patch is hard to read(not your fault) since "patch" decided that almost all changes are in cifs_closedir() :-( > Sorry I should have also said in the header that the patch was built on top of 5.4-rc4 plus Pavel's patch: "CIFS: Fix retry mid list corruption on reconnects" Also, there is an obvious bug in this patch where I am taking lock_sem inside cifs_find_lock_conflict but that does not work for obvious reasons - needs to be moved back to callers. FWIW, I have a v2 patch that fixes that and I have started some locking tests on. > > You are reverting 560d388950. it is unfortunate because I think we should make either cifs_reopen_file() or cifs_strict_readv() > using down_read_nested() to suppress the warnings from the validator or else we will get a lot of these log entries in dmesg > (almost) everytime we get a reconnect. > I think you misunderstood the patch or the header didn't explain that part. It is actually removing the top level holding of lock_sem across the IO, because that leads to the deadlock - the root is that one process calls down_read twice, which IMO is erroneous. Instead of leaving the code paths where down_read can be called twice, there is a new rw_semaphore, inflight_io_sem, that protects the IO from someone adding a brlock that would restrict the IO. The lock_sem still protects the same fields from modification, it just does not cover "preventing someone from adding a conflicting brlock during IO". > A different approach, could be to change _cifsFileInfo_put() to use a down_write_trylock()-sleep() loop instead of a blocking down_write() call. > > I.e. something like this ? > (and then the same at the other places where we have a deadlock vulnerable down_write() call) > Yes your approach is less lines way to go about preventing the deadlock. The biggest downsides I see is that it does not remove the code paths which call down_read twice (which is dangerous for the reason here), and it does not eliminate (but does reduce) the lock contention due to multi-use nature of lock_sem (not only does it protect modification to the fields, it also doubles as preventing the conflicting brlock during IO). I agree it is simpler and avoids adding another semaphore so it may be a better approach. For your suggested approach, I'm not sure how many places there are that really matter as far as calls to down_write. Looks like there are 9 different calls to down_write(cifsi->lock_sem), and I know in the reproducer at least 2 of them matter. I would say we should do all 9 to be thorough, not leave anything to chance, and have consistent handling of lock_sem. What do you think? > diff --git a/fs/cifs/file.c b/fs/cifs/file.c > index 936e03892e2a..530af080dc61 100644 > --- a/fs/cifs/file.c > +++ b/fs/cifs/file.c > @@ -464,7 +464,8 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler) > * Delete any outstanding lock records. We'll lose them when the file > * is closed anyway. > */ > - down_write(&cifsi->lock_sem); > + while (!down_write_trylock(&cifsi->lock_sem)) > + msleep(125); > list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) { > list_del(&li->llist); > cifs_del_lock_waiters(li); > > > > > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> > > --- > > fs/cifs/cifsfs.c | 1 + > > fs/cifs/cifsglob.h | 1 + > > fs/cifs/file.c | 24 +++++++++++++++++++----- > > 3 files changed, 21 insertions(+), 5 deletions(-) > > > > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c > > index c049c7b3aa87..10f614324e4e 100644 > > --- a/fs/cifs/cifsfs.c > > +++ b/fs/cifs/cifsfs.c > > @@ -1336,6 +1336,7 @@ static ssize_t cifs_copy_file_range(struct file > > *src_file, loff_t off, > > > > inode_init_once(&cifsi->vfs_inode); > > init_rwsem(&cifsi->lock_sem); > > + init_rwsem(&cifsi->io_inflight_sem); > > } > > > > static int __init > > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h > > index 50dfd9049370..40e8358dc1cc 100644 > > --- a/fs/cifs/cifsglob.h > > +++ b/fs/cifs/cifsglob.h > > @@ -1392,6 +1392,7 @@ struct cifsInodeInfo { > > bool can_cache_brlcks; > > struct list_head llist; /* locks helb by this inode */ > > struct rw_semaphore lock_sem; /* protect the fields above */ > > + struct rw_semaphore io_inflight_sem; /* Used to avoid lock conflicts */ > > /* BB add in lists for dirty pages i.e. write caching info for oplock */ > > struct list_head openFileList; > > spinlock_t open_file_lock; /* protects openFileList */ > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c > > index 5ad15de2bb4f..417baa7f5dd3 100644 > > --- a/fs/cifs/file.c > > +++ b/fs/cifs/file.c > > @@ -621,7 +621,7 @@ int cifs_open(struct inode *inode, struct file *file) > > struct cifs_tcon *tcon = tlink_tcon(cfile->tlink); > > int rc = 0; > > > > - down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING); > > + down_read(&cinode->lock_sem); > > if (cinode->can_cache_brlcks) { > > /* can cache locks - no need to relock */ > > up_read(&cinode->lock_sem); > > @@ -973,6 +973,7 @@ int cifs_closedir(struct inode *inode, struct file *file) > > struct cifs_fid_locks *cur; > > struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry)); > > > > + down_read(&cinode->lock_sem); > > list_for_each_entry(cur, &cinode->llist, llist) { > > rc = cifs_find_fid_lock_conflict(cur, offset, length, type, > > flags, cfile, conf_lock, > > @@ -980,6 +981,7 @@ int cifs_closedir(struct inode *inode, struct file *file) > > if (rc) > > break; > > } > > + up_read(&cinode->lock_sem); > > > > return rc; > > } > > @@ -1027,9 +1029,11 @@ int cifs_closedir(struct inode *inode, struct file > > *file) > > cifs_lock_add(struct cifsFileInfo *cfile, struct cifsLockInfo *lock) > > { > > struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry)); > > + down_write(&cinode->io_inflight_sem); > > down_write(&cinode->lock_sem); > > list_add_tail(&lock->llist, &cfile->llist->locks); > > up_write(&cinode->lock_sem); > > + up_write(&cinode->io_inflight_sem); > > } > > > > /* > > @@ -1049,6 +1053,7 @@ int cifs_closedir(struct inode *inode, struct file > > *file) > > > > try_again: > > exist = false; > > + down_write(&cinode->io_inflight_sem); > > down_write(&cinode->lock_sem); > > > > exist = cifs_find_lock_conflict(cfile, lock->offset, lock->length, > > @@ -1057,6 +1062,7 @@ int cifs_closedir(struct inode *inode, struct file > > *file) > > if (!exist && cinode->can_cache_brlcks) { > > list_add_tail(&lock->llist, &cfile->llist->locks); > > up_write(&cinode->lock_sem); > > + up_write(&cinode->io_inflight_sem); > > return rc; > > } > > > > @@ -1077,6 +1083,7 @@ int cifs_closedir(struct inode *inode, struct file > > *file) > > } > > > > up_write(&cinode->lock_sem); > > + up_write(&cinode->io_inflight_sem); > > return rc; > > } > > > > @@ -1125,14 +1132,17 @@ int cifs_closedir(struct inode *inode, struct file > > *file) > > return rc; > > > > try_again: > > + down_write(&cinode->io_inflight_sem); > > down_write(&cinode->lock_sem); > > if (!cinode->can_cache_brlcks) { > > up_write(&cinode->lock_sem); > > + down_write(&cinode->io_inflight_sem); > > return rc; > > } > > > > rc = posix_lock_file(file, flock, NULL); > > up_write(&cinode->lock_sem); > > + up_write(&cinode->io_inflight_sem); > > if (rc == FILE_LOCK_DEFERRED) { > > rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker); > > if (!rc) > > @@ -1331,6 +1341,7 @@ struct lock_to_push { > > int rc = 0; > > > > /* we are going to update can_cache_brlcks here - need a write access */ > > + down_write(&cinode->io_inflight_sem); > > down_write(&cinode->lock_sem); > > if (!cinode->can_cache_brlcks) { > > up_write(&cinode->lock_sem); > > @@ -1346,6 +1357,7 @@ struct lock_to_push { > > > > cinode->can_cache_brlcks = false; > > up_write(&cinode->lock_sem); > > + up_write(&cinode->io_inflight_sem); > > return rc; > > } > > > > @@ -1522,6 +1534,7 @@ struct lock_to_push { > > if (!buf) > > return -ENOMEM; > > > > + down_write(&cinode->io_inflight_sem); > > down_write(&cinode->lock_sem); > > for (i = 0; i < 2; i++) { > > cur = buf; > > @@ -1593,6 +1606,7 @@ struct lock_to_push { > > } > > > > up_write(&cinode->lock_sem); > > + up_write(&cinode->io_inflight_sem); > > kfree(buf); > > return rc; > > } > > @@ -3148,7 +3162,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct > > iov_iter *from) > > * We need to hold the sem to be sure nobody modifies lock list > > * with a brlock that prevents writing. > > */ > > - down_read(&cinode->lock_sem); > > + down_read(&cinode->io_inflight_sem); > > > > rc = generic_write_checks(iocb, from); > > if (rc <= 0) > > @@ -3161,7 +3175,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct > > iov_iter *from) > > else > > rc = -EACCES; > > out: > > - up_read(&cinode->lock_sem); > > + up_read(&cinode->io_inflight_sem); > > inode_unlock(inode); > > > > if (rc > 0) > > @@ -3887,12 +3901,12 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct > > iov_iter *to) > > * We need to hold the sem to be sure nobody modifies lock list > > * with a brlock that prevents reading. > > */ > > - down_read(&cinode->lock_sem); > > + down_read(&cinode->io_inflight_sem); > > if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(to), > > tcon->ses->server->vals->shared_lock_type, > > 0, NULL, CIFS_READ_OP)) > > rc = generic_file_read_iter(iocb, to); > > - up_read(&cinode->lock_sem); > > + up_read(&cinode->io_inflight_sem); > > return rc; > > } > > > > -- > > 1.8.3.1 > > > >
----- Original Message ----- > From: "David Wysochanski" <dwysocha@redhat.com> > To: "Ronnie Sahlberg" <lsahlber@redhat.com> > Cc: "linux-cifs" <linux-cifs@vger.kernel.org> > Sent: Wednesday, 23 October, 2019 6:35:34 PM > Subject: Re: [RFC PATCH] cifs: Fix cifsInodeInfo lock_sem deadlock with multiple readers > > On Wed, Oct 23, 2019 at 12:50 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote: > > > > > > > > > > > > ----- Original Message ----- > > > From: "Dave Wysochanski" <dwysocha@redhat.com> > > > To: linux-cifs@vger.kernel.org > > > Sent: Wednesday, 23 October, 2019 6:33:43 AM > > > Subject: [RFC PATCH] cifs: Fix cifsInodeInfo lock_sem deadlock with > > > multiple readers > > > > > > NOTE: I have verified this fixes the problem but have not run > > > locking tests yet. > > > > > > There's a deadlock that is possible that can easily be seen with > > > multiple readers open/read/close of the same file. The deadlock > > > is due to a reader calling down_read(lock_sem) and holding > > > it across the full IO, even if a network or server disruption > > > occurs and the session has to be reconnected. Upon reconnect, > > > cifs_relock_file is called where down_read(lock_sem) is called > > > a second time. Normally this is not a problem, but if there is > > > another process that calls down_write(lock_sem) in between the > > > first and second reader call to down_read(lock_sem), this will > > > cause a deadlock. The caller of down_write (often either > > > _cifsFileInfo_put that is just removing and freeing cifsLockInfo > > > structures from the list of locks, or cifs_new_fileinfo, which > > > is just attaching cifs_fid_locks to cifsInodeInfo->llist), will > > > block due to the reader's first down_read(lock_sem) that obtains > > > the semaphore (read IO in flight). And then when the server > > > comes back up, the reader that holds calls down_read(lock_sem) > > > a second time, and this time is blocked too because of the > > > blocked in down_write (rw_semaphores would starve writers if > > > this was not the case). Interestingly enough, the callers of > > > down_write in the simple test case was not adding a > > > conflicting lock at all, just either opening or closing the > > > file, and modifying the list of locks attached to cifsInodeInfo, > > > this ends up tripping up the reader process and causing the > > > deadlock. > > > > > > The root of the problem is that lock_sem both protects the > > > cifsInodeInfo fields (such as the lllist - the list of locks), > > > but is also being re-used to avoid a conflicting lock coming > > > in while IO is in flight. Add a new semaphore that tracks > > > just the IO in flight, and must be obtained before adding > > > a new lock. While this does add another layer of complexity > > > and a semaphore ordering that must be obeyed to avoid new > > > deadlocks, it does clealy solve the underlying problem. > > > > The patch is hard to read(not your fault) since "patch" decided that almost > > all changes are in cifs_closedir() :-( > > > > Sorry I should have also said in the header that the patch was built > on top of 5.4-rc4 plus Pavel's patch: > "CIFS: Fix retry mid list corruption on reconnects" > > Also, there is an obvious bug in this patch where I am taking lock_sem > inside cifs_find_lock_conflict but that > does not work for obvious reasons - needs to be moved back to callers. > FWIW, I have a v2 patch that fixes > that and I have started some locking tests on. > > > > > You are reverting 560d388950. it is unfortunate because I think we should > > make either cifs_reopen_file() or cifs_strict_readv() > > using down_read_nested() to suppress the warnings from the validator or > > else we will get a lot of these log entries in dmesg > > (almost) everytime we get a reconnect. > > > > I think you misunderstood the patch or the header didn't explain that > part. It is actually removing the top level holding of > lock_sem across the IO, because that leads to the deadlock - the root > is that one process calls down_read twice, which > IMO is erroneous. > > Instead of leaving the code paths where down_read can be called twice, > there is a new rw_semaphore, inflight_io_sem, that > protects the IO from someone adding a brlock that would restrict the > IO. The lock_sem still protects the same fields from > modification, it just does not cover "preventing someone from adding a > conflicting brlock during IO". > > > > A different approach, could be to change _cifsFileInfo_put() to use a > > down_write_trylock()-sleep() loop instead of a blocking down_write() call. > > > > I.e. something like this ? > > (and then the same at the other places where we have a deadlock vulnerable > > down_write() call) > > > > Yes your approach is less lines way to go about preventing the > deadlock. The biggest downsides I see is that it does not > remove the code paths which call down_read twice (which is dangerous > for the reason here), and it does not eliminate > (but does reduce) the lock contention due to multi-use nature of > lock_sem (not only does it protect modification to the > fields, it also doubles as preventing the conflicting brlock during > IO). I agree it is simpler and avoids adding another > semaphore so it may be a better approach. > > For your suggested approach, I'm not sure how many places there are > that really matter as far as calls to down_write. > Looks like there are 9 different calls to down_write(cifsi->lock_sem), > and I know in the reproducer at least 2 of them matter. > I would say we should do all 9 to be thorough, not leave anything to > chance, and have consistent handling of lock_sem. > What do you think? I agree 100%. Lets do it for all 9 instances and add a comment to the header where we specify that we must always use a down_write_trylock()/msleep() instead of down_write() for this because we may need to call down_read() recursively in some codepaths. Doing it for all places whether we strictly need to or not, today, is most future proof since the number of places where we MUST do this can change in the future with semi-unrelated patches and then we would accidentally reintroduce the deadlock. Can you try a simple patch where we do this for all 9 places and see if fixes the deadlock? And send to the list if it works? Since the patch is so trivial being just a - down_write() + if (!down_write_trylock()) + msleep() it will be trivial to backport it to earlier versions. regards ronnie sahlberg > > > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c > > index 936e03892e2a..530af080dc61 100644 > > --- a/fs/cifs/file.c > > +++ b/fs/cifs/file.c > > @@ -464,7 +464,8 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, > > bool wait_oplock_handler) > > * Delete any outstanding lock records. We'll lose them when the > > file > > * is closed anyway. > > */ > > - down_write(&cifsi->lock_sem); > > + while (!down_write_trylock(&cifsi->lock_sem)) > > + msleep(125); > > list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) > > { > > list_del(&li->llist); > > cifs_del_lock_waiters(li); > > > > > > > > > > > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> > > > --- > > > fs/cifs/cifsfs.c | 1 + > > > fs/cifs/cifsglob.h | 1 + > > > fs/cifs/file.c | 24 +++++++++++++++++++----- > > > 3 files changed, 21 insertions(+), 5 deletions(-) > > > > > > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c > > > index c049c7b3aa87..10f614324e4e 100644 > > > --- a/fs/cifs/cifsfs.c > > > +++ b/fs/cifs/cifsfs.c > > > @@ -1336,6 +1336,7 @@ static ssize_t cifs_copy_file_range(struct file > > > *src_file, loff_t off, > > > > > > inode_init_once(&cifsi->vfs_inode); > > > init_rwsem(&cifsi->lock_sem); > > > + init_rwsem(&cifsi->io_inflight_sem); > > > } > > > > > > static int __init > > > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h > > > index 50dfd9049370..40e8358dc1cc 100644 > > > --- a/fs/cifs/cifsglob.h > > > +++ b/fs/cifs/cifsglob.h > > > @@ -1392,6 +1392,7 @@ struct cifsInodeInfo { > > > bool can_cache_brlcks; > > > struct list_head llist; /* locks helb by this inode */ > > > struct rw_semaphore lock_sem; /* protect the fields above */ > > > + struct rw_semaphore io_inflight_sem; /* Used to avoid lock > > > conflicts */ > > > /* BB add in lists for dirty pages i.e. write caching info for > > > oplock */ > > > struct list_head openFileList; > > > spinlock_t open_file_lock; /* protects openFileList */ > > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c > > > index 5ad15de2bb4f..417baa7f5dd3 100644 > > > --- a/fs/cifs/file.c > > > +++ b/fs/cifs/file.c > > > @@ -621,7 +621,7 @@ int cifs_open(struct inode *inode, struct file *file) > > > struct cifs_tcon *tcon = tlink_tcon(cfile->tlink); > > > int rc = 0; > > > > > > - down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING); > > > + down_read(&cinode->lock_sem); > > > if (cinode->can_cache_brlcks) { > > > /* can cache locks - no need to relock */ > > > up_read(&cinode->lock_sem); > > > @@ -973,6 +973,7 @@ int cifs_closedir(struct inode *inode, struct file > > > *file) > > > struct cifs_fid_locks *cur; > > > struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry)); > > > > > > + down_read(&cinode->lock_sem); > > > list_for_each_entry(cur, &cinode->llist, llist) { > > > rc = cifs_find_fid_lock_conflict(cur, offset, length, type, > > > flags, cfile, conf_lock, > > > @@ -980,6 +981,7 @@ int cifs_closedir(struct inode *inode, struct file > > > *file) > > > if (rc) > > > break; > > > } > > > + up_read(&cinode->lock_sem); > > > > > > return rc; > > > } > > > @@ -1027,9 +1029,11 @@ int cifs_closedir(struct inode *inode, struct file > > > *file) > > > cifs_lock_add(struct cifsFileInfo *cfile, struct cifsLockInfo *lock) > > > { > > > struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry)); > > > + down_write(&cinode->io_inflight_sem); > > > down_write(&cinode->lock_sem); > > > list_add_tail(&lock->llist, &cfile->llist->locks); > > > up_write(&cinode->lock_sem); > > > + up_write(&cinode->io_inflight_sem); > > > } > > > > > > /* > > > @@ -1049,6 +1053,7 @@ int cifs_closedir(struct inode *inode, struct file > > > *file) > > > > > > try_again: > > > exist = false; > > > + down_write(&cinode->io_inflight_sem); > > > down_write(&cinode->lock_sem); > > > > > > exist = cifs_find_lock_conflict(cfile, lock->offset, lock->length, > > > @@ -1057,6 +1062,7 @@ int cifs_closedir(struct inode *inode, struct file > > > *file) > > > if (!exist && cinode->can_cache_brlcks) { > > > list_add_tail(&lock->llist, &cfile->llist->locks); > > > up_write(&cinode->lock_sem); > > > + up_write(&cinode->io_inflight_sem); > > > return rc; > > > } > > > > > > @@ -1077,6 +1083,7 @@ int cifs_closedir(struct inode *inode, struct file > > > *file) > > > } > > > > > > up_write(&cinode->lock_sem); > > > + up_write(&cinode->io_inflight_sem); > > > return rc; > > > } > > > > > > @@ -1125,14 +1132,17 @@ int cifs_closedir(struct inode *inode, struct > > > file > > > *file) > > > return rc; > > > > > > try_again: > > > + down_write(&cinode->io_inflight_sem); > > > down_write(&cinode->lock_sem); > > > if (!cinode->can_cache_brlcks) { > > > up_write(&cinode->lock_sem); > > > + down_write(&cinode->io_inflight_sem); > > > return rc; > > > } > > > > > > rc = posix_lock_file(file, flock, NULL); > > > up_write(&cinode->lock_sem); > > > + up_write(&cinode->io_inflight_sem); > > > if (rc == FILE_LOCK_DEFERRED) { > > > rc = wait_event_interruptible(flock->fl_wait, > > > !flock->fl_blocker); > > > if (!rc) > > > @@ -1331,6 +1341,7 @@ struct lock_to_push { > > > int rc = 0; > > > > > > /* we are going to update can_cache_brlcks here - need a write > > > access */ > > > + down_write(&cinode->io_inflight_sem); > > > down_write(&cinode->lock_sem); > > > if (!cinode->can_cache_brlcks) { > > > up_write(&cinode->lock_sem); > > > @@ -1346,6 +1357,7 @@ struct lock_to_push { > > > > > > cinode->can_cache_brlcks = false; > > > up_write(&cinode->lock_sem); > > > + up_write(&cinode->io_inflight_sem); > > > return rc; > > > } > > > > > > @@ -1522,6 +1534,7 @@ struct lock_to_push { > > > if (!buf) > > > return -ENOMEM; > > > > > > + down_write(&cinode->io_inflight_sem); > > > down_write(&cinode->lock_sem); > > > for (i = 0; i < 2; i++) { > > > cur = buf; > > > @@ -1593,6 +1606,7 @@ struct lock_to_push { > > > } > > > > > > up_write(&cinode->lock_sem); > > > + up_write(&cinode->io_inflight_sem); > > > kfree(buf); > > > return rc; > > > } > > > @@ -3148,7 +3162,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct > > > iov_iter *from) > > > * We need to hold the sem to be sure nobody modifies lock list > > > * with a brlock that prevents writing. > > > */ > > > - down_read(&cinode->lock_sem); > > > + down_read(&cinode->io_inflight_sem); > > > > > > rc = generic_write_checks(iocb, from); > > > if (rc <= 0) > > > @@ -3161,7 +3175,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct > > > iov_iter *from) > > > else > > > rc = -EACCES; > > > out: > > > - up_read(&cinode->lock_sem); > > > + up_read(&cinode->io_inflight_sem); > > > inode_unlock(inode); > > > > > > if (rc > 0) > > > @@ -3887,12 +3901,12 @@ ssize_t cifs_user_readv(struct kiocb *iocb, > > > struct > > > iov_iter *to) > > > * We need to hold the sem to be sure nobody modifies lock list > > > * with a brlock that prevents reading. > > > */ > > > - down_read(&cinode->lock_sem); > > > + down_read(&cinode->io_inflight_sem); > > > if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, > > > iov_iter_count(to), > > > tcon->ses->server->vals->shared_lock_type, > > > 0, NULL, CIFS_READ_OP)) > > > rc = generic_file_read_iter(iocb, to); > > > - up_read(&cinode->lock_sem); > > > + up_read(&cinode->io_inflight_sem); > > > return rc; > > > } > > > > > > -- > > > 1.8.3.1 > > > > > > >
On Wed, Oct 23, 2019 at 5:01 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote: > > > > > > ----- Original Message ----- > > From: "David Wysochanski" <dwysocha@redhat.com> > > To: "Ronnie Sahlberg" <lsahlber@redhat.com> > > Cc: "linux-cifs" <linux-cifs@vger.kernel.org> > > Sent: Wednesday, 23 October, 2019 6:35:34 PM > > Subject: Re: [RFC PATCH] cifs: Fix cifsInodeInfo lock_sem deadlock with multiple readers > > > > On Wed, Oct 23, 2019 at 12:50 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote: > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > From: "Dave Wysochanski" <dwysocha@redhat.com> > > > > To: linux-cifs@vger.kernel.org > > > > Sent: Wednesday, 23 October, 2019 6:33:43 AM > > > > Subject: [RFC PATCH] cifs: Fix cifsInodeInfo lock_sem deadlock with > > > > multiple readers > > > > > > > > NOTE: I have verified this fixes the problem but have not run > > > > locking tests yet. > > > > > > > > There's a deadlock that is possible that can easily be seen with > > > > multiple readers open/read/close of the same file. The deadlock > > > > is due to a reader calling down_read(lock_sem) and holding > > > > it across the full IO, even if a network or server disruption > > > > occurs and the session has to be reconnected. Upon reconnect, > > > > cifs_relock_file is called where down_read(lock_sem) is called > > > > a second time. Normally this is not a problem, but if there is > > > > another process that calls down_write(lock_sem) in between the > > > > first and second reader call to down_read(lock_sem), this will > > > > cause a deadlock. The caller of down_write (often either > > > > _cifsFileInfo_put that is just removing and freeing cifsLockInfo > > > > structures from the list of locks, or cifs_new_fileinfo, which > > > > is just attaching cifs_fid_locks to cifsInodeInfo->llist), will > > > > block due to the reader's first down_read(lock_sem) that obtains > > > > the semaphore (read IO in flight). And then when the server > > > > comes back up, the reader that holds calls down_read(lock_sem) > > > > a second time, and this time is blocked too because of the > > > > blocked in down_write (rw_semaphores would starve writers if > > > > this was not the case). Interestingly enough, the callers of > > > > down_write in the simple test case was not adding a > > > > conflicting lock at all, just either opening or closing the > > > > file, and modifying the list of locks attached to cifsInodeInfo, > > > > this ends up tripping up the reader process and causing the > > > > deadlock. > > > > > > > > The root of the problem is that lock_sem both protects the > > > > cifsInodeInfo fields (such as the lllist - the list of locks), > > > > but is also being re-used to avoid a conflicting lock coming > > > > in while IO is in flight. Add a new semaphore that tracks > > > > just the IO in flight, and must be obtained before adding > > > > a new lock. While this does add another layer of complexity > > > > and a semaphore ordering that must be obeyed to avoid new > > > > deadlocks, it does clealy solve the underlying problem. > > > > > > The patch is hard to read(not your fault) since "patch" decided that almost > > > all changes are in cifs_closedir() :-( > > > > > > > Sorry I should have also said in the header that the patch was built > > on top of 5.4-rc4 plus Pavel's patch: > > "CIFS: Fix retry mid list corruption on reconnects" > > > > Also, there is an obvious bug in this patch where I am taking lock_sem > > inside cifs_find_lock_conflict but that > > does not work for obvious reasons - needs to be moved back to callers. > > FWIW, I have a v2 patch that fixes > > that and I have started some locking tests on. > > > > > > > > You are reverting 560d388950. it is unfortunate because I think we should > > > make either cifs_reopen_file() or cifs_strict_readv() > > > using down_read_nested() to suppress the warnings from the validator or > > > else we will get a lot of these log entries in dmesg > > > (almost) everytime we get a reconnect. > > > > > > > I think you misunderstood the patch or the header didn't explain that > > part. It is actually removing the top level holding of > > lock_sem across the IO, because that leads to the deadlock - the root > > is that one process calls down_read twice, which > > IMO is erroneous. > > > > Instead of leaving the code paths where down_read can be called twice, > > there is a new rw_semaphore, inflight_io_sem, that > > protects the IO from someone adding a brlock that would restrict the > > IO. The lock_sem still protects the same fields from > > modification, it just does not cover "preventing someone from adding a > > conflicting brlock during IO". > > > > > > > A different approach, could be to change _cifsFileInfo_put() to use a > > > down_write_trylock()-sleep() loop instead of a blocking down_write() call. > > > > > > I.e. something like this ? > > > (and then the same at the other places where we have a deadlock vulnerable > > > down_write() call) > > > > > > > Yes your approach is less lines way to go about preventing the > > deadlock. The biggest downsides I see is that it does not > > remove the code paths which call down_read twice (which is dangerous > > for the reason here), and it does not eliminate > > (but does reduce) the lock contention due to multi-use nature of > > lock_sem (not only does it protect modification to the > > fields, it also doubles as preventing the conflicting brlock during > > IO). I agree it is simpler and avoids adding another > > semaphore so it may be a better approach. > > > > For your suggested approach, I'm not sure how many places there are > > that really matter as far as calls to down_write. > > Looks like there are 9 different calls to down_write(cifsi->lock_sem), > > and I know in the reproducer at least 2 of them matter. > > > > > I would say we should do all 9 to be thorough, not leave anything to > > chance, and have consistent handling of lock_sem. > > What do you think? > > I agree 100%. > > Lets do it for all 9 instances and add a comment to the header where we specify that > we must always use a down_write_trylock()/msleep() instead of down_write() for this because we may need to call down_read() recursively in some > codepaths. > Ok. > Doing it for all places whether we strictly need to or not, today, is most future proof since the number of places where we MUST > do this can change in the future with semi-unrelated patches and then we would accidentally reintroduce the deadlock. > Yes that is right, I agree. > > Can you try a simple patch where we do this for all 9 places and see if fixes the deadlock? > And send to the list if it works? > > Since the patch is so trivial being just a > - down_write() > + if (!down_write_trylock()) > + msleep() > it will be trivial to backport it to earlier versions. > > Yes I already did the sample patch like this changing all 9 places and anticipating it may be a preferred patch. I'm testing it now and if it all goes well I'll write-up a nice header and post it. > > regards > ronnie sahlberg > > > > > > > > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c > > > index 936e03892e2a..530af080dc61 100644 > > > --- a/fs/cifs/file.c > > > +++ b/fs/cifs/file.c > > > @@ -464,7 +464,8 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, > > > bool wait_oplock_handler) > > > * Delete any outstanding lock records. We'll lose them when the > > > file > > > * is closed anyway. > > > */ > > > - down_write(&cifsi->lock_sem); > > > + while (!down_write_trylock(&cifsi->lock_sem)) > > > + msleep(125); > > > list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) > > > { > > > list_del(&li->llist); > > > cifs_del_lock_waiters(li); > > > > > > > > > > > > > > > > > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> > > > > --- > > > > fs/cifs/cifsfs.c | 1 + > > > > fs/cifs/cifsglob.h | 1 + > > > > fs/cifs/file.c | 24 +++++++++++++++++++----- > > > > 3 files changed, 21 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c > > > > index c049c7b3aa87..10f614324e4e 100644 > > > > --- a/fs/cifs/cifsfs.c > > > > +++ b/fs/cifs/cifsfs.c > > > > @@ -1336,6 +1336,7 @@ static ssize_t cifs_copy_file_range(struct file > > > > *src_file, loff_t off, > > > > > > > > inode_init_once(&cifsi->vfs_inode); > > > > init_rwsem(&cifsi->lock_sem); > > > > + init_rwsem(&cifsi->io_inflight_sem); > > > > } > > > > > > > > static int __init > > > > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h > > > > index 50dfd9049370..40e8358dc1cc 100644 > > > > --- a/fs/cifs/cifsglob.h > > > > +++ b/fs/cifs/cifsglob.h > > > > @@ -1392,6 +1392,7 @@ struct cifsInodeInfo { > > > > bool can_cache_brlcks; > > > > struct list_head llist; /* locks helb by this inode */ > > > > struct rw_semaphore lock_sem; /* protect the fields above */ > > > > + struct rw_semaphore io_inflight_sem; /* Used to avoid lock > > > > conflicts */ > > > > /* BB add in lists for dirty pages i.e. write caching info for > > > > oplock */ > > > > struct list_head openFileList; > > > > spinlock_t open_file_lock; /* protects openFileList */ > > > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c > > > > index 5ad15de2bb4f..417baa7f5dd3 100644 > > > > --- a/fs/cifs/file.c > > > > +++ b/fs/cifs/file.c > > > > @@ -621,7 +621,7 @@ int cifs_open(struct inode *inode, struct file *file) > > > > struct cifs_tcon *tcon = tlink_tcon(cfile->tlink); > > > > int rc = 0; > > > > > > > > - down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING); > > > > + down_read(&cinode->lock_sem); > > > > if (cinode->can_cache_brlcks) { > > > > /* can cache locks - no need to relock */ > > > > up_read(&cinode->lock_sem); > > > > @@ -973,6 +973,7 @@ int cifs_closedir(struct inode *inode, struct file > > > > *file) > > > > struct cifs_fid_locks *cur; > > > > struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry)); > > > > > > > > + down_read(&cinode->lock_sem); > > > > list_for_each_entry(cur, &cinode->llist, llist) { > > > > rc = cifs_find_fid_lock_conflict(cur, offset, length, type, > > > > flags, cfile, conf_lock, > > > > @@ -980,6 +981,7 @@ int cifs_closedir(struct inode *inode, struct file > > > > *file) > > > > if (rc) > > > > break; > > > > } > > > > + up_read(&cinode->lock_sem); > > > > > > > > return rc; > > > > } > > > > @@ -1027,9 +1029,11 @@ int cifs_closedir(struct inode *inode, struct file > > > > *file) > > > > cifs_lock_add(struct cifsFileInfo *cfile, struct cifsLockInfo *lock) > > > > { > > > > struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry)); > > > > + down_write(&cinode->io_inflight_sem); > > > > down_write(&cinode->lock_sem); > > > > list_add_tail(&lock->llist, &cfile->llist->locks); > > > > up_write(&cinode->lock_sem); > > > > + up_write(&cinode->io_inflight_sem); > > > > } > > > > > > > > /* > > > > @@ -1049,6 +1053,7 @@ int cifs_closedir(struct inode *inode, struct file > > > > *file) > > > > > > > > try_again: > > > > exist = false; > > > > + down_write(&cinode->io_inflight_sem); > > > > down_write(&cinode->lock_sem); > > > > > > > > exist = cifs_find_lock_conflict(cfile, lock->offset, lock->length, > > > > @@ -1057,6 +1062,7 @@ int cifs_closedir(struct inode *inode, struct file > > > > *file) > > > > if (!exist && cinode->can_cache_brlcks) { > > > > list_add_tail(&lock->llist, &cfile->llist->locks); > > > > up_write(&cinode->lock_sem); > > > > + up_write(&cinode->io_inflight_sem); > > > > return rc; > > > > } > > > > > > > > @@ -1077,6 +1083,7 @@ int cifs_closedir(struct inode *inode, struct file > > > > *file) > > > > } > > > > > > > > up_write(&cinode->lock_sem); > > > > + up_write(&cinode->io_inflight_sem); > > > > return rc; > > > > } > > > > > > > > @@ -1125,14 +1132,17 @@ int cifs_closedir(struct inode *inode, struct > > > > file > > > > *file) > > > > return rc; > > > > > > > > try_again: > > > > + down_write(&cinode->io_inflight_sem); > > > > down_write(&cinode->lock_sem); > > > > if (!cinode->can_cache_brlcks) { > > > > up_write(&cinode->lock_sem); > > > > + down_write(&cinode->io_inflight_sem); > > > > return rc; > > > > } > > > > > > > > rc = posix_lock_file(file, flock, NULL); > > > > up_write(&cinode->lock_sem); > > > > + up_write(&cinode->io_inflight_sem); > > > > if (rc == FILE_LOCK_DEFERRED) { > > > > rc = wait_event_interruptible(flock->fl_wait, > > > > !flock->fl_blocker); > > > > if (!rc) > > > > @@ -1331,6 +1341,7 @@ struct lock_to_push { > > > > int rc = 0; > > > > > > > > /* we are going to update can_cache_brlcks here - need a write > > > > access */ > > > > + down_write(&cinode->io_inflight_sem); > > > > down_write(&cinode->lock_sem); > > > > if (!cinode->can_cache_brlcks) { > > > > up_write(&cinode->lock_sem); > > > > @@ -1346,6 +1357,7 @@ struct lock_to_push { > > > > > > > > cinode->can_cache_brlcks = false; > > > > up_write(&cinode->lock_sem); > > > > + up_write(&cinode->io_inflight_sem); > > > > return rc; > > > > } > > > > > > > > @@ -1522,6 +1534,7 @@ struct lock_to_push { > > > > if (!buf) > > > > return -ENOMEM; > > > > > > > > + down_write(&cinode->io_inflight_sem); > > > > down_write(&cinode->lock_sem); > > > > for (i = 0; i < 2; i++) { > > > > cur = buf; > > > > @@ -1593,6 +1606,7 @@ struct lock_to_push { > > > > } > > > > > > > > up_write(&cinode->lock_sem); > > > > + up_write(&cinode->io_inflight_sem); > > > > kfree(buf); > > > > return rc; > > > > } > > > > @@ -3148,7 +3162,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct > > > > iov_iter *from) > > > > * We need to hold the sem to be sure nobody modifies lock list > > > > * with a brlock that prevents writing. > > > > */ > > > > - down_read(&cinode->lock_sem); > > > > + down_read(&cinode->io_inflight_sem); > > > > > > > > rc = generic_write_checks(iocb, from); > > > > if (rc <= 0) > > > > @@ -3161,7 +3175,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct > > > > iov_iter *from) > > > > else > > > > rc = -EACCES; > > > > out: > > > > - up_read(&cinode->lock_sem); > > > > + up_read(&cinode->io_inflight_sem); > > > > inode_unlock(inode); > > > > > > > > if (rc > 0) > > > > @@ -3887,12 +3901,12 @@ ssize_t cifs_user_readv(struct kiocb *iocb, > > > > struct > > > > iov_iter *to) > > > > * We need to hold the sem to be sure nobody modifies lock list > > > > * with a brlock that prevents reading. > > > > */ > > > > - down_read(&cinode->lock_sem); > > > > + down_read(&cinode->io_inflight_sem); > > > > if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, > > > > iov_iter_count(to), > > > > tcon->ses->server->vals->shared_lock_type, > > > > 0, NULL, CIFS_READ_OP)) > > > > rc = generic_file_read_iter(iocb, to); > > > > - up_read(&cinode->lock_sem); > > > > + up_read(&cinode->io_inflight_sem); > > > > return rc; > > > > } > > > > > > > > -- > > > > 1.8.3.1 > > > > > > > > > >
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index c049c7b3aa87..10f614324e4e 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -1336,6 +1336,7 @@ static ssize_t cifs_copy_file_range(struct file *src_file, loff_t off, inode_init_once(&cifsi->vfs_inode); init_rwsem(&cifsi->lock_sem); + init_rwsem(&cifsi->io_inflight_sem); } static int __init diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 50dfd9049370..40e8358dc1cc 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -1392,6 +1392,7 @@ struct cifsInodeInfo { bool can_cache_brlcks; struct list_head llist; /* locks helb by this inode */ struct rw_semaphore lock_sem; /* protect the fields above */ + struct rw_semaphore io_inflight_sem; /* Used to avoid lock conflicts */ /* BB add in lists for dirty pages i.e. write caching info for oplock */ struct list_head openFileList; spinlock_t open_file_lock; /* protects openFileList */ diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 5ad15de2bb4f..417baa7f5dd3 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -621,7 +621,7 @@ int cifs_open(struct inode *inode, struct file *file) struct cifs_tcon *tcon = tlink_tcon(cfile->tlink); int rc = 0; - down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING); + down_read(&cinode->lock_sem); if (cinode->can_cache_brlcks) { /* can cache locks - no need to relock */ up_read(&cinode->lock_sem); @@ -973,6 +973,7 @@ int cifs_closedir(struct inode *inode, struct file *file) struct cifs_fid_locks *cur; struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry)); + down_read(&cinode->lock_sem); list_for_each_entry(cur, &cinode->llist, llist) { rc = cifs_find_fid_lock_conflict(cur, offset, length, type, flags, cfile, conf_lock, @@ -980,6 +981,7 @@ int cifs_closedir(struct inode *inode, struct file *file) if (rc) break; } + up_read(&cinode->lock_sem); return rc; } @@ -1027,9 +1029,11 @@ int cifs_closedir(struct inode *inode, struct file *file) cifs_lock_add(struct cifsFileInfo *cfile, struct cifsLockInfo *lock) { struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry)); + down_write(&cinode->io_inflight_sem); down_write(&cinode->lock_sem); list_add_tail(&lock->llist, &cfile->llist->locks); up_write(&cinode->lock_sem); + up_write(&cinode->io_inflight_sem); } /* @@ -1049,6 +1053,7 @@ int cifs_closedir(struct inode *inode, struct file *file) try_again: exist = false; + down_write(&cinode->io_inflight_sem); down_write(&cinode->lock_sem); exist = cifs_find_lock_conflict(cfile, lock->offset, lock->length, @@ -1057,6 +1062,7 @@ int cifs_closedir(struct inode *inode, struct file *file) if (!exist && cinode->can_cache_brlcks) { list_add_tail(&lock->llist, &cfile->llist->locks); up_write(&cinode->lock_sem); + up_write(&cinode->io_inflight_sem); return rc; } @@ -1077,6 +1083,7 @@ int cifs_closedir(struct inode *inode, struct file *file) } up_write(&cinode->lock_sem); + up_write(&cinode->io_inflight_sem); return rc; } @@ -1125,14 +1132,17 @@ int cifs_closedir(struct inode *inode, struct file *file) return rc; try_again: + down_write(&cinode->io_inflight_sem); down_write(&cinode->lock_sem); if (!cinode->can_cache_brlcks) { up_write(&cinode->lock_sem); + down_write(&cinode->io_inflight_sem); return rc; } rc = posix_lock_file(file, flock, NULL); up_write(&cinode->lock_sem); + up_write(&cinode->io_inflight_sem); if (rc == FILE_LOCK_DEFERRED) { rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker); if (!rc) @@ -1331,6 +1341,7 @@ struct lock_to_push { int rc = 0; /* we are going to update can_cache_brlcks here - need a write access */ + down_write(&cinode->io_inflight_sem); down_write(&cinode->lock_sem); if (!cinode->can_cache_brlcks) { up_write(&cinode->lock_sem); @@ -1346,6 +1357,7 @@ struct lock_to_push { cinode->can_cache_brlcks = false; up_write(&cinode->lock_sem); + up_write(&cinode->io_inflight_sem); return rc; } @@ -1522,6 +1534,7 @@ struct lock_to_push { if (!buf) return -ENOMEM; + down_write(&cinode->io_inflight_sem); down_write(&cinode->lock_sem); for (i = 0; i < 2; i++) { cur = buf; @@ -1593,6 +1606,7 @@ struct lock_to_push { } up_write(&cinode->lock_sem); + up_write(&cinode->io_inflight_sem); kfree(buf); return rc; } @@ -3148,7 +3162,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from) * We need to hold the sem to be sure nobody modifies lock list * with a brlock that prevents writing. */ - down_read(&cinode->lock_sem); + down_read(&cinode->io_inflight_sem); rc = generic_write_checks(iocb, from); if (rc <= 0) @@ -3161,7 +3175,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from) else rc = -EACCES; out: - up_read(&cinode->lock_sem); + up_read(&cinode->io_inflight_sem); inode_unlock(inode); if (rc > 0) @@ -3887,12 +3901,12 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to) * We need to hold the sem to be sure nobody modifies lock list * with a brlock that prevents reading. */ - down_read(&cinode->lock_sem); + down_read(&cinode->io_inflight_sem); if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(to), tcon->ses->server->vals->shared_lock_type, 0, NULL, CIFS_READ_OP)) rc = generic_file_read_iter(iocb, to); - up_read(&cinode->lock_sem); + up_read(&cinode->io_inflight_sem); return rc; }
NOTE: I have verified this fixes the problem but have not run locking tests yet. There's a deadlock that is possible that can easily be seen with multiple readers open/read/close of the same file. The deadlock is due to a reader calling down_read(lock_sem) and holding it across the full IO, even if a network or server disruption occurs and the session has to be reconnected. Upon reconnect, cifs_relock_file is called where down_read(lock_sem) is called a second time. Normally this is not a problem, but if there is another process that calls down_write(lock_sem) in between the first and second reader call to down_read(lock_sem), this will cause a deadlock. The caller of down_write (often either _cifsFileInfo_put that is just removing and freeing cifsLockInfo structures from the list of locks, or cifs_new_fileinfo, which is just attaching cifs_fid_locks to cifsInodeInfo->llist), will block due to the reader's first down_read(lock_sem) that obtains the semaphore (read IO in flight). And then when the server comes back up, the reader that holds calls down_read(lock_sem) a second time, and this time is blocked too because of the blocked in down_write (rw_semaphores would starve writers if this was not the case). Interestingly enough, the callers of down_write in the simple test case was not adding a conflicting lock at all, just either opening or closing the file, and modifying the list of locks attached to cifsInodeInfo, this ends up tripping up the reader process and causing the deadlock. The root of the problem is that lock_sem both protects the cifsInodeInfo fields (such as the lllist - the list of locks), but is also being re-used to avoid a conflicting lock coming in while IO is in flight. Add a new semaphore that tracks just the IO in flight, and must be obtained before adding a new lock. While this does add another layer of complexity and a semaphore ordering that must be obeyed to avoid new deadlocks, it does clealy solve the underlying problem. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> --- fs/cifs/cifsfs.c | 1 + fs/cifs/cifsglob.h | 1 + fs/cifs/file.c | 24 +++++++++++++++++++----- 3 files changed, 21 insertions(+), 5 deletions(-)