Message ID: 1434970396-19644-1-git-send-email-a.sangwan@samsung.com (mailing list archive)
State: New, archived
+ Lino Sanfilippo

> For deleting the fsnotify_mark related to an inode, there are 2 paths in
> the kernel. When the inotify fd is closed, all the marks belonging to a
> group are removed one by one in fsnotify_clear_marks_by_group_flags. The
> other path is when the inode is removed from user space by unlink;
> fsnotify_destroy_mark is called to delete a single mark.
> There is a race between these 2 paths which is caused by the temporary
> release of the mark_mutex inside fsnotify_destroy_mark_locked.
> The race happens when the inotify app monitoring the file(s) exits,
> triggering fsnotify_clear_marks_by_group_flags to delete the marks.
> This function uses the lmark pointer to move to the next node after a safe
> removal of the current node. In parallel, if there is an rm call for a
> file such that lmark is pointing to the mark which is removed by this rm
> call, lmark ends up pointing to freed memory. Now, when we try to move to
> the next node using lmark, it triggers an invalid virtual address crash.
> Although fsnotify_clear_marks_by_group_flags and fsnotify_destroy_mark are
> synchronized by mark_mutex, both of these functions call
> fsnotify_destroy_mark_locked, which releases the mark_mutex and acquires
> it again, creating a subtle race window. There seems to be no reason for
> releasing mark_mutex, so this patch removes the mutex_unlock call.
>
> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
> Reviewed-by: Amit Sahrawat <a.sahrawat@samsung.com>
> ---
>  fs/notify/mark.c | 4 ----
>  1 files changed, 0 insertions(+), 4 deletions(-)
>
> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index 92e48c7..4ee419f 100755
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -157,8 +157,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>
>  	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
>  		iput(inode);
> -	/* release lock temporarily */
> -	mutex_unlock(&group->mark_mutex);
>
>  	spin_lock(&destroy_lock);
>  	list_add(&mark->g_list, &destroy_list);
> @@ -191,8 +189,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>  	 */
>
>  	atomic_dec(&group->num_marks);
> -
> -	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
>  }
>
>  void fsnotify_destroy_mark(struct fsnotify_mark *mark,
> --
> 1.7.7
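[To make the race easier to follow, here is a simplified sketch of the clearing loop the patch description refers to. It is modeled on fs/notify/mark.c of kernels from that era but is not a verbatim copy; field names such as marks_list are recalled from memory and may differ slightly.]

```c
#include <linux/fsnotify_backend.h>
#include <linux/list.h>
#include <linux/mutex.h>

/* Simplified sketch of fsnotify_clear_marks_by_group_flags(); not verbatim. */
void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
					 unsigned int flags)
{
	struct fsnotify_mark *lmark, *mark;

	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
		if (mark->flags & flags) {
			fsnotify_get_mark(mark);
			/*
			 * Drops and re-takes group->mark_mutex internally.
			 * While the mutex is dropped, an unlink() running
			 * fsnotify_destroy_mark() can remove (and eventually
			 * free) the mark that lmark already caches as the
			 * "next" entry ...
			 */
			fsnotify_destroy_mark_locked(mark, group);
			fsnotify_put_mark(mark);
		}
		/* ... so advancing the loop through lmark can hit freed memory. */
	}
	mutex_unlock(&group->mark_mutex);
}
```

[The list_for_each_entry_safe() cursor only protects against the loop removing the current entry itself; it offers no protection when the cached next entry is removed by a concurrent path while the lock is dropped.]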
On Mon 22-06-15 16:23:16, Ashish Sangwan wrote:
> For deleting the fsnotify_mark related to an inode, there are 2 paths in
> the kernel. When the inotify fd is closed, all the marks belonging to a
> group are removed one by one in fsnotify_clear_marks_by_group_flags. The
> other path is when the inode is removed from user space by unlink;
> fsnotify_destroy_mark is called to delete a single mark.
> There is a race between these 2 paths which is caused by the temporary
> release of the mark_mutex inside fsnotify_destroy_mark_locked.
> The race happens when the inotify app monitoring the file(s) exits,
> triggering fsnotify_clear_marks_by_group_flags to delete the marks.
> This function uses the lmark pointer to move to the next node after a safe
> removal of the current node. In parallel, if there is an rm call for a
> file such that lmark is pointing to the mark which is removed by this rm
> call, lmark ends up pointing to freed memory. Now, when we try to move to
> the next node using lmark, it triggers an invalid virtual address crash.
> Although fsnotify_clear_marks_by_group_flags and fsnotify_destroy_mark are
> synchronized by mark_mutex, both of these functions call
> fsnotify_destroy_mark_locked, which releases the mark_mutex and acquires
> it again, creating a subtle race window. There seems to be no reason for
> releasing mark_mutex, so this patch removes the mutex_unlock call.

Thanks for the report and the analysis. I agree with your problem analysis.
Indeed the loop in fsnotify_clear_marks_by_group_flags() isn't safe against
us dropping the mark_mutex inside fsnotify_destroy_mark_locked(). However,
mark_mutex is dropped in fsnotify_destroy_mark_locked() for a purpose. We
call the ->freeing_mark() callback from there, and that should be called
without mark_mutex. In particular, inotify uses this callback to send the
IN_IGNORED event, and that code certainly isn't prepared to be called under
mark_mutex; you would likely introduce interesting deadlock possibilities
there.

Looking into this in more detail, it might be worthwhile to revisit how
mark_mutex is used, since at least fanotify and dnotify use it for more than
just protection of the group's list of marks, and untangling this would
simplify things. But that's a longer term goal.

A relatively simple fix for your issue is to split the group's list of marks
into a list of inode marks and a list of mount marks. Then destroying
becomes much simpler because we always discard the whole list (or both of
them), and we can easily avoid problems with list corruption when dropping
the mark_mutex. I can write the patch later, or you can do that if you are
interested.

								Honza

>
> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
> Reviewed-by: Amit Sahrawat <a.sahrawat@samsung.com>
> ---
>  fs/notify/mark.c | 4 ----
>  1 files changed, 0 insertions(+), 4 deletions(-)
>
> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index 92e48c7..4ee419f 100755
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -157,8 +157,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>
>  	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
>  		iput(inode);
> -	/* release lock temporarily */
> -	mutex_unlock(&group->mark_mutex);
>
>  	spin_lock(&destroy_lock);
>  	list_add(&mark->g_list, &destroy_list);
> @@ -191,8 +189,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>  	 */
>
>  	atomic_dec(&group->num_marks);
> -
> -	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
>  }
>
>  void fsnotify_destroy_mark(struct fsnotify_mark *mark,
> --
> 1.7.7
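[As an illustration of why "we always discard the whole list" sidesteps the problem. This is a hypothetical sketch only, not the patch that was later written, and the function name is invented: if every mark on a list is going to be destroyed anyway, the loop can pop the first entry each time under mark_mutex and never keeps a cached next pointer across the window where fsnotify_destroy_mark_locked() drops the lock.]

```c
/*
 * Hypothetical illustration only; the name and exact shape are not taken
 * from this thread or from any merged patch.
 */
static void fsnotify_clear_whole_mark_list(struct fsnotify_group *group,
					   struct list_head *head)
{
	struct fsnotify_mark *mark;

	while (1) {
		mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
		if (list_empty(head)) {
			mutex_unlock(&group->mark_mutex);
			break;
		}
		/* Re-read the list head each time; no stale cursor exists. */
		mark = list_first_entry(head, struct fsnotify_mark, g_list);
		fsnotify_get_mark(mark);
		/* May drop and re-take mark_mutex internally. */
		fsnotify_destroy_mark_locked(mark, group);
		mutex_unlock(&group->mark_mutex);
		fsnotify_put_mark(mark);
	}
}
```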
Hi,

On 23.06.2015 12:25, Jan Kara wrote:
> On Mon 22-06-15 16:23:16, Ashish Sangwan wrote:
>> For deleting the fsnotify_mark related to an inode, there are 2 paths in
>> the kernel. When the inotify fd is closed, all the marks belonging to a
>> group are removed one by one in fsnotify_clear_marks_by_group_flags. The
>> other path is when the inode is removed from user space by unlink;
>> fsnotify_destroy_mark is called to delete a single mark.
>> There is a race between these 2 paths which is caused by the temporary
>> release of the mark_mutex inside fsnotify_destroy_mark_locked.
>> The race happens when the inotify app monitoring the file(s) exits,
>> triggering fsnotify_clear_marks_by_group_flags to delete the marks.
>> This function uses the lmark pointer to move to the next node after a
>> safe removal of the current node. In parallel, if there is an rm call
>> for a file such that lmark is pointing to the mark which is removed by
>> this rm call, lmark ends up pointing to freed memory. Now, when we try
>> to move to the next node using lmark, it triggers an invalid virtual
>> address crash.
>> Although fsnotify_clear_marks_by_group_flags and fsnotify_destroy_mark
>> are synchronized by mark_mutex, both of these functions call
>> fsnotify_destroy_mark_locked, which releases the mark_mutex and acquires
>> it again, creating a subtle race window. There seems to be no reason for
>> releasing mark_mutex, so this patch removes the mutex_unlock call.
>
> Thanks for the report and the analysis. I agree with your problem
> analysis. Indeed the loop in fsnotify_clear_marks_by_group_flags() isn't
> safe against us dropping the mark_mutex inside
> fsnotify_destroy_mark_locked(). However, mark_mutex is dropped in
> fsnotify_destroy_mark_locked() for a purpose. We call the
> ->freeing_mark() callback from there, and that should be called without
> mark_mutex. In particular, inotify uses this callback to send the
> IN_IGNORED event, and that code certainly isn't prepared to be called
> under mark_mutex; you would likely introduce interesting deadlock
> possibilities there.

Why don't we call freeing_mark() from the "fsnotify_mark" thread instead of
from fsnotify_destroy_mark_locked()? Then there would no longer be a reason
for the temporary unlock, and we could close the race as Ashish suggested.
A rough sketch of what that would look like follows below.

Lino
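[For concreteness, the suggestion above amounts to moving the ->freeing_mark() call into the dedicated mark-destruction thread's loop. The sketch below is a rough reconstruction of that loop from kernels of the same era with the suggested call added; the globals (destroy_list, destroy_lock, destroy_waitq, fsnotify_mark_srcu) are recalled from memory and the exact code may differ.]

```c
/* Lives inside fs/notify/mark.c next to the globals it uses; sketch only. */
static int fsnotify_mark_destroy(void *ignored)
{
	struct fsnotify_mark *mark, *next;
	LIST_HEAD(private_destroy_list);

	for (;;) {
		spin_lock(&destroy_lock);
		/* Grab everything queued for destruction in one go. */
		list_replace_init(&destroy_list, &private_destroy_list);
		spin_unlock(&destroy_lock);

		synchronize_srcu(&fsnotify_mark_srcu);

		list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
			list_del_init(&mark->g_list);
			/* Suggested addition: mark_mutex is not held here. */
			if (mark->group->ops->freeing_mark)
				mark->group->ops->freeing_mark(mark, mark->group);
			fsnotify_put_mark(mark);
		}

		wait_event_interruptible(destroy_waitq,
					 !list_empty(&destroy_list));
	}

	return 0;
}
```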
Hi,

On Wed 24-06-15 00:30:16, Lino Sanfilippo wrote:
> On 23.06.2015 12:25, Jan Kara wrote:
>> On Mon 22-06-15 16:23:16, Ashish Sangwan wrote:
>>> For deleting the fsnotify_mark related to an inode, there are 2 paths
>>> in the kernel. When the inotify fd is closed, all the marks belonging
>>> to a group are removed one by one in
>>> fsnotify_clear_marks_by_group_flags. The other path is when the inode
>>> is removed from user space by unlink; fsnotify_destroy_mark is called
>>> to delete a single mark.
>>> There is a race between these 2 paths which is caused by the temporary
>>> release of the mark_mutex inside fsnotify_destroy_mark_locked.
>>> The race happens when the inotify app monitoring the file(s) exits,
>>> triggering fsnotify_clear_marks_by_group_flags to delete the marks.
>>> This function uses the lmark pointer to move to the next node after a
>>> safe removal of the current node. In parallel, if there is an rm call
>>> for a file such that lmark is pointing to the mark which is removed by
>>> this rm call, lmark ends up pointing to freed memory. Now, when we try
>>> to move to the next node using lmark, it triggers an invalid virtual
>>> address crash.
>>> Although fsnotify_clear_marks_by_group_flags and fsnotify_destroy_mark
>>> are synchronized by mark_mutex, both of these functions call
>>> fsnotify_destroy_mark_locked, which releases the mark_mutex and
>>> acquires it again, creating a subtle race window. There seems to be no
>>> reason for releasing mark_mutex, so this patch removes the mutex_unlock
>>> call.
>>
>> Thanks for the report and the analysis. I agree with your problem
>> analysis. Indeed the loop in fsnotify_clear_marks_by_group_flags() isn't
>> safe against us dropping the mark_mutex inside
>> fsnotify_destroy_mark_locked(). However, mark_mutex is dropped in
>> fsnotify_destroy_mark_locked() for a purpose. We call the
>> ->freeing_mark() callback from there, and that should be called without
>> mark_mutex. In particular, inotify uses this callback to send the
>> IN_IGNORED event, and that code certainly isn't prepared to be called
>> under mark_mutex; you would likely introduce interesting deadlock
>> possibilities there.
>
> Why don't we call freeing_mark() from the "fsnotify_mark" thread instead
> of from fsnotify_destroy_mark_locked()? Then there would no longer be a
> reason for the temporary unlock, and we could close the race as Ashish
> suggested.

We could do that as well. But I'd prefer to keep sending the IN_IGNORED
event from the context of the process destroying the mark (not that I'm
aware of any strong reason why that must happen; it just seems more
natural). Also, an event sent from the destruction thread would be delayed
by synchronize_srcu(). Finally, one long critical section for the
destruction of all marks belonging to a group doesn't seem ideal either.
Anyway, I'll keep this possibility in mind when implementing a solution.
Maybe it will turn out to be the most elegant way...

								Honza
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 92e48c7..4ee419f 100755
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -157,8 +157,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
 
 	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
 		iput(inode);
-	/* release lock temporarily */
-	mutex_unlock(&group->mark_mutex);
 
 	spin_lock(&destroy_lock);
 	list_add(&mark->g_list, &destroy_list);
@@ -191,8 +189,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
 	 */
 
 	atomic_dec(&group->num_marks);
-
-	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
 }
 
 void fsnotify_destroy_mark(struct fsnotify_mark *mark,