Message ID | 20230720013249.199981-1-jannh@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: Don't drop VMA locks in mm_drop_all_locks() | expand |
On Wed, Jul 19, 2023 at 6:33 PM Jann Horn <jannh@google.com> wrote: > > Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap > lock is held write-locked by the caller, and the caller is responsible for > dropping the mmap lock at a later point (which will also release the VMA > locks). > Calling vma_end_write_all() here is dangerous because the caller might have > write-locked a VMA with the expectation that it will stay write-locked > until the mmap_lock is released, as usual. > > This _almost_ becomes a problem in the following scenario: > > An anonymous VMA A and an SGX VMA B are mapped adjacent to each other. > Userspace calls munmap() on a range starting at the start address of A and > ending in the middle of B. > > Hypothetical call graph with additional notes in brackets: > > do_vmi_align_munmap > [begin first for_each_vma_range loop] > vma_start_write [on VMA A] > vma_mark_detached [on VMA A] > __split_vma [on VMA B] > sgx_vma_open [== new->vm_ops->open] > sgx_encl_mm_add > __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN] > mm_take_all_locks > mm_drop_all_locks > vma_end_write_all [drops VMA lock taken on VMA A before] > vma_start_write [on VMA B] > vma_mark_detached [on VMA B] > [end first for_each_vma_range loop] > vma_iter_clear_gfp [removes VMAs from maple tree] > mmap_write_downgrade > unmap_region > mmap_read_unlock > > In this hypothetical scenario, while do_vmi_align_munmap() thinks it still > holds a VMA write lock on VMA A, the VMA write lock has actually been > invalidated inside __split_vma(). > > The call from sgx_encl_mm_add() to __mmu_notifier_register() can't > actually happen here, as far as I understand, because we are duplicating an > existing SGX VMA, but sgx_encl_mm_add() only calls > __mmu_notifier_register() for the first SGX VMA created in a given process. > So this could only happen in fork(), not on munmap(). > But in my view it is just pure luck that this can't happen. > > Also, we wouldn't actually have any bad consequences from this in > do_vmi_align_munmap(), because by the time the bug drops the lock on VMA A, > we've already marked VMA A as detached, which makes it completely > ineligible for any VMA-locked page faults. > But again, that's just pure luck. > > So remove the vma_end_write_all(), so that VMA write locks are only ever > released on mmap_write_unlock() or mmap_write_downgrade(). Your logic makes sense to be. mm_drop_all_locks() unlocking all VMAs, even the ones which were locked before mm_take_all_locks() seems dangerous. One concern I have is that mm_take_all_locks() and mm_drop_all_locks() become asymmetric with this change: mm_take_all_locks() locks all VMAs but mm_drop_all_locks() does not release them. I think there should be an additional comment explaining this asymmetry. Another side-effect which would be nice to document in a comment is that when mm_take_all_locks() fails after it locked the VMAs, those VMAs will stay locked until mmap_write_unlock/mmap_write_downgrade. This happens because of failure mm_take_all_locks() jumps to perform mm_drop_all_locks() and this will not unlock already locked VMAs. Other than that LGTM. Thanks! > > Fixes: eeff9a5d47f8 ("mm/mmap: prevent pagefault handler from racing with mmu_notifier registration") > Cc: Suren Baghdasaryan <surenb@google.com> > Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> > --- > mm/mmap.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/mm/mmap.c b/mm/mmap.c > index 3eda23c9ebe7..1ff354b1e23c 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -3758,7 +3758,6 @@ void mm_drop_all_locks(struct mm_struct *mm) > if (vma->vm_file && vma->vm_file->f_mapping) > vm_unlock_mapping(vma->vm_file->f_mapping); > } > - vma_end_write_all(mm); > > mutex_unlock(&mm_all_locks_mutex); > } > > base-commit: bfa3037d828050896ae52f6467b6ca2489ae6fb1 > -- > 2.41.0.255.g8b1d071c50-goog >
On Thu, Jul 20, 2023 at 9:51 AM Suren Baghdasaryan <surenb@google.com> wrote: > > On Wed, Jul 19, 2023 at 6:33 PM Jann Horn <jannh@google.com> wrote: > > > > Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap > > lock is held write-locked by the caller, and the caller is responsible for > > dropping the mmap lock at a later point (which will also release the VMA > > locks). > > Calling vma_end_write_all() here is dangerous because the caller might have > > write-locked a VMA with the expectation that it will stay write-locked > > until the mmap_lock is released, as usual. > > > > This _almost_ becomes a problem in the following scenario: > > > > An anonymous VMA A and an SGX VMA B are mapped adjacent to each other. > > Userspace calls munmap() on a range starting at the start address of A and > > ending in the middle of B. > > > > Hypothetical call graph with additional notes in brackets: > > > > do_vmi_align_munmap > > [begin first for_each_vma_range loop] > > vma_start_write [on VMA A] > > vma_mark_detached [on VMA A] > > __split_vma [on VMA B] > > sgx_vma_open [== new->vm_ops->open] > > sgx_encl_mm_add > > __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN] > > mm_take_all_locks > > mm_drop_all_locks > > vma_end_write_all [drops VMA lock taken on VMA A before] > > vma_start_write [on VMA B] > > vma_mark_detached [on VMA B] > > [end first for_each_vma_range loop] > > vma_iter_clear_gfp [removes VMAs from maple tree] > > mmap_write_downgrade > > unmap_region > > mmap_read_unlock > > > > In this hypothetical scenario, while do_vmi_align_munmap() thinks it still > > holds a VMA write lock on VMA A, the VMA write lock has actually been > > invalidated inside __split_vma(). > > > > The call from sgx_encl_mm_add() to __mmu_notifier_register() can't > > actually happen here, as far as I understand, because we are duplicating an > > existing SGX VMA, but sgx_encl_mm_add() only calls > > __mmu_notifier_register() for the first SGX VMA created in a given process. > > So this could only happen in fork(), not on munmap(). > > But in my view it is just pure luck that this can't happen. > > > > Also, we wouldn't actually have any bad consequences from this in > > do_vmi_align_munmap(), because by the time the bug drops the lock on VMA A, > > we've already marked VMA A as detached, which makes it completely > > ineligible for any VMA-locked page faults. > > But again, that's just pure luck. > > > > So remove the vma_end_write_all(), so that VMA write locks are only ever > > released on mmap_write_unlock() or mmap_write_downgrade(). > > Your logic makes sense to be. mm_drop_all_locks() unlocking all VMAs, > even the ones which were locked before mm_take_all_locks() seems > dangerous. > One concern I have is that mm_take_all_locks() and mm_drop_all_locks() > become asymmetric with this change: mm_take_all_locks() locks all VMAs > but mm_drop_all_locks() does not release them. I think there should be > an additional comment explaining this asymmetry. > Another side-effect which would be nice to document in a comment is > that when mm_take_all_locks() fails after it locked the VMAs, those > VMAs will stay locked until mmap_write_unlock/mmap_write_downgrade. > This happens because of failure mm_take_all_locks() jumps to perform s/of/on in the above statement > mm_drop_all_locks() and this will not unlock already locked VMAs. > Other than that LGTM. Thanks! > > > > > Fixes: eeff9a5d47f8 ("mm/mmap: prevent pagefault handler from racing with mmu_notifier registration") > > Cc: Suren Baghdasaryan <surenb@google.com> > > Signed-off-by: Jann Horn <jannh@google.com> > > Reviewed-by: Suren Baghdasaryan <surenb@google.com> > > > --- > > mm/mmap.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > index 3eda23c9ebe7..1ff354b1e23c 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -3758,7 +3758,6 @@ void mm_drop_all_locks(struct mm_struct *mm) > > if (vma->vm_file && vma->vm_file->f_mapping) > > vm_unlock_mapping(vma->vm_file->f_mapping); > > } > > - vma_end_write_all(mm); > > > > mutex_unlock(&mm_all_locks_mutex); > > } > > > > base-commit: bfa3037d828050896ae52f6467b6ca2489ae6fb1 > > -- > > 2.41.0.255.g8b1d071c50-goog > >
On Thu, Jul 20, 2023 at 6:52 PM Suren Baghdasaryan <surenb@google.com> wrote: > On Wed, Jul 19, 2023 at 6:33 PM Jann Horn <jannh@google.com> wrote: > > > > Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap > > lock is held write-locked by the caller, and the caller is responsible for > > dropping the mmap lock at a later point (which will also release the VMA > > locks). > > Calling vma_end_write_all() here is dangerous because the caller might have > > write-locked a VMA with the expectation that it will stay write-locked > > until the mmap_lock is released, as usual. > > > > This _almost_ becomes a problem in the following scenario: > > > > An anonymous VMA A and an SGX VMA B are mapped adjacent to each other. > > Userspace calls munmap() on a range starting at the start address of A and > > ending in the middle of B. > > > > Hypothetical call graph with additional notes in brackets: > > > > do_vmi_align_munmap > > [begin first for_each_vma_range loop] > > vma_start_write [on VMA A] > > vma_mark_detached [on VMA A] > > __split_vma [on VMA B] > > sgx_vma_open [== new->vm_ops->open] > > sgx_encl_mm_add > > __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN] > > mm_take_all_locks > > mm_drop_all_locks > > vma_end_write_all [drops VMA lock taken on VMA A before] > > vma_start_write [on VMA B] > > vma_mark_detached [on VMA B] > > [end first for_each_vma_range loop] > > vma_iter_clear_gfp [removes VMAs from maple tree] > > mmap_write_downgrade > > unmap_region > > mmap_read_unlock > > > > In this hypothetical scenario, while do_vmi_align_munmap() thinks it still > > holds a VMA write lock on VMA A, the VMA write lock has actually been > > invalidated inside __split_vma(). > > > > The call from sgx_encl_mm_add() to __mmu_notifier_register() can't > > actually happen here, as far as I understand, because we are duplicating an > > existing SGX VMA, but sgx_encl_mm_add() only calls > > __mmu_notifier_register() for the first SGX VMA created in a given process. > > So this could only happen in fork(), not on munmap(). > > But in my view it is just pure luck that this can't happen. > > > > Also, we wouldn't actually have any bad consequences from this in > > do_vmi_align_munmap(), because by the time the bug drops the lock on VMA A, > > we've already marked VMA A as detached, which makes it completely > > ineligible for any VMA-locked page faults. > > But again, that's just pure luck. > > > > So remove the vma_end_write_all(), so that VMA write locks are only ever > > released on mmap_write_unlock() or mmap_write_downgrade(). > > Your logic makes sense to be. mm_drop_all_locks() unlocking all VMAs, > even the ones which were locked before mm_take_all_locks() seems > dangerous. > One concern I have is that mm_take_all_locks() and mm_drop_all_locks() > become asymmetric with this change: mm_take_all_locks() locks all VMAs > but mm_drop_all_locks() does not release them. I think there should be > an additional comment explaining this asymmetry. > Another side-effect which would be nice to document in a comment is > that when mm_take_all_locks() fails after it locked the VMAs, those > VMAs will stay locked until mmap_write_unlock/mmap_write_downgrade. > This happens because of failure mm_take_all_locks() jumps to perform > mm_drop_all_locks() and this will not unlock already locked VMAs. > Other than that LGTM. Thanks! But this is not specific to mm_drop_all_locks() at all, right? It's just fundamentally how per-VMA locks are used everywhere. Somewhere deep down in some call path, while the mmap lock is held in write mode, a VMA is marked as being written to, and then this marking persists until the mmap lock is dropped. If we want to clarify this, I guess some comments on vma_end_write_all() and vma_start_write() might help, but I think that's independent of this patch.
On Thu, Jul 20, 2023 at 10:14 AM Jann Horn <jannh@google.com> wrote: > > On Thu, Jul 20, 2023 at 6:52 PM Suren Baghdasaryan <surenb@google.com> wrote: > > On Wed, Jul 19, 2023 at 6:33 PM Jann Horn <jannh@google.com> wrote: > > > > > > Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap > > > lock is held write-locked by the caller, and the caller is responsible for > > > dropping the mmap lock at a later point (which will also release the VMA > > > locks). > > > Calling vma_end_write_all() here is dangerous because the caller might have > > > write-locked a VMA with the expectation that it will stay write-locked > > > until the mmap_lock is released, as usual. > > > > > > This _almost_ becomes a problem in the following scenario: > > > > > > An anonymous VMA A and an SGX VMA B are mapped adjacent to each other. > > > Userspace calls munmap() on a range starting at the start address of A and > > > ending in the middle of B. > > > > > > Hypothetical call graph with additional notes in brackets: > > > > > > do_vmi_align_munmap > > > [begin first for_each_vma_range loop] > > > vma_start_write [on VMA A] > > > vma_mark_detached [on VMA A] > > > __split_vma [on VMA B] > > > sgx_vma_open [== new->vm_ops->open] > > > sgx_encl_mm_add > > > __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN] > > > mm_take_all_locks > > > mm_drop_all_locks > > > vma_end_write_all [drops VMA lock taken on VMA A before] > > > vma_start_write [on VMA B] > > > vma_mark_detached [on VMA B] > > > [end first for_each_vma_range loop] > > > vma_iter_clear_gfp [removes VMAs from maple tree] > > > mmap_write_downgrade > > > unmap_region > > > mmap_read_unlock > > > > > > In this hypothetical scenario, while do_vmi_align_munmap() thinks it still > > > holds a VMA write lock on VMA A, the VMA write lock has actually been > > > invalidated inside __split_vma(). > > > > > > The call from sgx_encl_mm_add() to __mmu_notifier_register() can't > > > actually happen here, as far as I understand, because we are duplicating an > > > existing SGX VMA, but sgx_encl_mm_add() only calls > > > __mmu_notifier_register() for the first SGX VMA created in a given process. > > > So this could only happen in fork(), not on munmap(). > > > But in my view it is just pure luck that this can't happen. > > > > > > Also, we wouldn't actually have any bad consequences from this in > > > do_vmi_align_munmap(), because by the time the bug drops the lock on VMA A, > > > we've already marked VMA A as detached, which makes it completely > > > ineligible for any VMA-locked page faults. > > > But again, that's just pure luck. > > > > > > So remove the vma_end_write_all(), so that VMA write locks are only ever > > > released on mmap_write_unlock() or mmap_write_downgrade(). > > > > Your logic makes sense to be. mm_drop_all_locks() unlocking all VMAs, > > even the ones which were locked before mm_take_all_locks() seems > > dangerous. > > One concern I have is that mm_take_all_locks() and mm_drop_all_locks() > > become asymmetric with this change: mm_take_all_locks() locks all VMAs > > but mm_drop_all_locks() does not release them. I think there should be > > an additional comment explaining this asymmetry. > > Another side-effect which would be nice to document in a comment is > > that when mm_take_all_locks() fails after it locked the VMAs, those > > VMAs will stay locked until mmap_write_unlock/mmap_write_downgrade. > > This happens because of failure mm_take_all_locks() jumps to perform > > mm_drop_all_locks() and this will not unlock already locked VMAs. > > Other than that LGTM. Thanks! > > But this is not specific to mm_drop_all_locks() at all, right? It's just > fundamentally how per-VMA locks are used everywhere. Somewhere deep > down in some call path, while the mmap lock is held in write mode, a > VMA is marked as being written to, and then this marking persists > until the mmap lock is dropped. Yes, but for all other locks mm_take_all_locks()/mm_drop_all_locks() is perfectly symmetrical AFAIKT. I see pretty descriptive comment for mm_take_all_locks() explaining locking rules, so mentioning this asymmetry in there seems appropriate to me. > > If we want to clarify this, I guess some comments on > vma_end_write_all() and vma_start_write() might help, but I think > that's independent of this patch. Yes and no. This patch changes vma_end_write_all() to be called exclusively from mmap_write_unlock()/mmap_write_downgrade(), so it introduces the notion that once VMA is write-locked it's never released until mmap_lock is released/downgraded. So, I agree that a comment for vma_start_write() is appropriate but I also think it does belong in this patch. Thanks, Suren.
diff --git a/mm/mmap.c b/mm/mmap.c index 3eda23c9ebe7..1ff354b1e23c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3758,7 +3758,6 @@ void mm_drop_all_locks(struct mm_struct *mm) if (vma->vm_file && vma->vm_file->f_mapping) vm_unlock_mapping(vma->vm_file->f_mapping); } - vma_end_write_all(mm); mutex_unlock(&mm_all_locks_mutex); }
Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap lock is held write-locked by the caller, and the caller is responsible for dropping the mmap lock at a later point (which will also release the VMA locks). Calling vma_end_write_all() here is dangerous because the caller might have write-locked a VMA with the expectation that it will stay write-locked until the mmap_lock is released, as usual. This _almost_ becomes a problem in the following scenario: An anonymous VMA A and an SGX VMA B are mapped adjacent to each other. Userspace calls munmap() on a range starting at the start address of A and ending in the middle of B. Hypothetical call graph with additional notes in brackets: do_vmi_align_munmap [begin first for_each_vma_range loop] vma_start_write [on VMA A] vma_mark_detached [on VMA A] __split_vma [on VMA B] sgx_vma_open [== new->vm_ops->open] sgx_encl_mm_add __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN] mm_take_all_locks mm_drop_all_locks vma_end_write_all [drops VMA lock taken on VMA A before] vma_start_write [on VMA B] vma_mark_detached [on VMA B] [end first for_each_vma_range loop] vma_iter_clear_gfp [removes VMAs from maple tree] mmap_write_downgrade unmap_region mmap_read_unlock In this hypothetical scenario, while do_vmi_align_munmap() thinks it still holds a VMA write lock on VMA A, the VMA write lock has actually been invalidated inside __split_vma(). The call from sgx_encl_mm_add() to __mmu_notifier_register() can't actually happen here, as far as I understand, because we are duplicating an existing SGX VMA, but sgx_encl_mm_add() only calls __mmu_notifier_register() for the first SGX VMA created in a given process. So this could only happen in fork(), not on munmap(). But in my view it is just pure luck that this can't happen. Also, we wouldn't actually have any bad consequences from this in do_vmi_align_munmap(), because by the time the bug drops the lock on VMA A, we've already marked VMA A as detached, which makes it completely ineligible for any VMA-locked page faults. But again, that's just pure luck. So remove the vma_end_write_all(), so that VMA write locks are only ever released on mmap_write_unlock() or mmap_write_downgrade(). Fixes: eeff9a5d47f8 ("mm/mmap: prevent pagefault handler from racing with mmu_notifier registration") Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Jann Horn <jannh@google.com> --- mm/mmap.c | 1 - 1 file changed, 1 deletion(-) base-commit: bfa3037d828050896ae52f6467b6ca2489ae6fb1