| Message ID | 201803282126.GBC56799.tOFVFSOHJLFOQM@I-love.SAKURA.ne.jp (mailing list archive) |
| --- | --- |
| State | New, archived |
On Wed 28-03-18 21:26:43, Tetsuo Handa wrote:
> Michal Hocko wrote:
[...]
> > > Basic idea is
> > >
> > >   bool downgraded = false;
> > >   down_write(mmap_sem);
> > >   for (something_1_that_might_depend_mmap_sem_held_for_write;
> > >        something_2_that_might_depend_mmap_sem_held_for_write;
> > >        something_3_that_might_depend_mmap_sem_held_for_write) {
> > >     something_4_that_might_depend_mmap_sem_held_for_write();
> > >     if (fatal_signal_pending(current)) {
> > >       downgrade_write(mmap_sem);
> > >       downgraded = true;
> > >       break;
> > >     }
> > >     something_5_that_might_depend_mmap_sem_held_for_write();
> > >   }
> > >   if (!downgraded)
> > >     up_write(mmap_sem);
> > >   else
> > >     up_read(mmap_sem);
> > >
> > > That is, try to interrupt critical sections at locations where it is
> > > known to be safe and consistent.
> >
> > Please explain why those places are safe to interrupt.
>
> Because (regarding the downgrade_write() approach), as far as I know,
> the current thread does not access memory which needs to be protected
> with mmap_sem held for write.

But there are other threads which can be in an arbitrary code path
before they get to the signal handling code.

> > So please explain why it is safe. It is really not straightforward.
>
> So, please explain why it is not safe. How can releasing mmap_sem held
> for write at safely and consistently interruptible locations be unsafe?

Look, you are proposing a patch and the onus to justify its correctness
is on you. You are playing with code which you have only a vague
understanding of, so you should study and understand it more. You seem
to be inferring invariants from the current function scope, and that is
quite dangerous.

[...]
> > > What we need to be careful about is making changes to current->mm.
> > > I'm assuming that current->mm->mmap_sem held for read is enough for
> > > i_mmap_lock_write()/flush_dcache_mmap_lock()/vma_interval_tree_insert_after()/
> > > flush_dcache_mmap_unlock()/i_mmap_unlock_write()/is_vm_hugetlb_page()/
> > > reset_vma_resv_huge_pages()/__vma_link_rb(). But I'm not sure.
> >
> > But as soon as you downgrade the lock then all other threads can
> > interfere and perform page faults or update respective mappings.
> > Does this matter? If not then why?
>
> Why does this matter?
> I don't know what "update respective mappings" means.

E.g. any get_user_pages or an ongoing #PF can update a mapping, and so
the child process might see some inconsistencies (i.e. parts of the
address space with the previous content).

> Is that about mmap()/munmap() which need mmap_sem held for write?

No, it is about those who take it for read.

> Since mmap_sem is still held for read, operations which need
> mmap_sem held for write cannot happen.
>
> Anyway, as long as I downgrade the mmap_sem at safely and consistently
> interruptible locations, there cannot be a problem.

I have a strong feeling that the whole address space has to be copied
atomically from the address space's point of view. I might be wrong, but
it is not really obvious to me why, and if you want to downgrade the
lock there then please make sure to document what your assumptions are
and why they are valid.
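For concreteness, here is a minimal sketch (not a patch from this thread) of the downgrade_write() pattern quoted above, written against a 2018-era kernel where mm_struct still has mmap_sem and VMAs are linked via vm_next. The helper do_per_vma_work() is hypothetical; the locking calls are the real rwsem API.

```c
/*
 * Illustrative sketch only: bail out of a write-locked VMA walk by
 * downgrading mmap_sem when the caller has a fatal signal pending.
 */
#include <linux/mm.h>
#include <linux/rwsem.h>
#include <linux/sched/signal.h>

/* Hypothetical per-VMA work that relies on mmap_sem held for write. */
static void do_per_vma_work(struct vm_area_struct *vma)
{
	/* ... modify the VMA ... */
}

static int walk_vmas_killable(struct mm_struct *mm)
{
	struct vm_area_struct *vma;
	bool downgraded = false;
	int ret = 0;

	down_write(&mm->mmap_sem);
	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		do_per_vma_work(vma);
		if (fatal_signal_pending(current)) {
			/*
			 * Caller is being killed: stop early.  Writers stay
			 * excluded until up_read(), but readers (e.g. page
			 * faults) may now run concurrently.
			 */
			downgrade_write(&mm->mmap_sem);
			downgraded = true;
			ret = -EINTR;
			break;
		}
	}
	if (downgraded)
		up_read(&mm->mmap_sem);
	else
		up_write(&mm->mmap_sem);
	return ret;
}
```

The point of downgrading rather than dropping the lock entirely is that writers (mmap()/munmap()) remain excluded until up_read(), while readers such as page-fault handlers can proceed; the dispute in the thread is whether letting those readers run mid-copy leaves the child with a consistent view of the address space.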
```diff
diff --git a/kernel/fork.c b/kernel/fork.c
index 1e8c9a7..851c675 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -514,6 +514,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		if (tmp->vm_ops && tmp->vm_ops->open)
 			tmp->vm_ops->open(tmp);
 
+		if (!retval && fatal_signal_pending(current))
+			retval = -EINTR;
+
 		if (retval)
 			goto out;
 	}
```

This is safe because there is no difference (except for the error code)
between the above change and hitting the "goto fail_nomem;" path after
"mpnt = mpnt->vm_next;". Therefore, I think that interrupting at

```diff
diff --git a/kernel/fork.c b/kernel/fork.c
index 851c675..2706acc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -508,7 +508,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		rb_parent = &tmp->vm_rb;
 
 		mm->map_count++;
-		if (!(tmp->vm_flags & VM_WIPEONFORK))
+		if (fatal_signal_pending(current))
+			retval = -EINTR;
+		else if (!(tmp->vm_flags & VM_WIPEONFORK))
 			retval = copy_page_range(mm, oldmm, mpnt);
 
 		if (tmp->vm_ops && tmp->vm_ops->open)
```

is also safe.
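For reference, this is roughly how the affected portion of dup_mmap()'s per-VMA loop would read with the two hunks above applied (the second applies on top of the first, as the blob indexes show). It is reconstructed only from the context lines quoted in the diffs; the rest of the loop is elided.

```c
		/*
		 * Excerpt-style sketch of dup_mmap()'s loop body with both
		 * hunks applied; surrounding code is omitted.
		 */
		rb_parent = &tmp->vm_rb;

		mm->map_count++;
		if (fatal_signal_pending(current))
			retval = -EINTR;	/* second hunk: skip copying page tables */
		else if (!(tmp->vm_flags & VM_WIPEONFORK))
			retval = copy_page_range(mm, oldmm, mpnt);

		if (tmp->vm_ops && tmp->vm_ops->open)
			tmp->vm_ops->open(tmp);

		if (!retval && fatal_signal_pending(current))
			retval = -EINTR;	/* first hunk: recheck at the end of each iteration */

		if (retval)
			goto out;
	}
```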