Message ID | 1578292679-2592-1-git-send-email-lixinhai.lxh@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm/rmap.c: remove useless checking to child vma->vm_prev in anon_vma_clone |
On 06/01/2020 09.37, Li Xinhai wrote:
> In the fork case, dst->vm_prev is always the same as src->vm_prev when
> anon_vma_clone() is called. Remove the assignment of
> dst->vm_prev->anon_vma to dst->anon_vma, and instead explicitly assign
> the anon_vma shared by the parent vmas.

This doesn't sound right.

I see dst->vm_prev is set after anon_vma_fork(), so here it still points to the parent's prev.
So this code doesn't work as it is supposed to.

I expect this logic: if parent vmas SRC1 SRC2 .. SRCn share ANON0,
then the related child vmas DST1 DST2 .. DSTn should fork and share ANON1:
forking DST1 creates the new ANON1, and then DST2 and the following vmas share it.

Also this assumption is wrong:
> Parent has vm_prev, which implies we have vm_prev.
If the parent's prev VMA has VM_DONTCOPY, then the child's prev VMA will
not match pprev, or could even be NULL if it was the first VMA in the mm.

See patch:
https://lore.kernel.org/lkml/157830736034.8148.7070851958306750616.stgit@buzz/T/#u

I've tested it using this:

--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -847,6 +847,12 @@ static int show_smap(struct seq_file *m, void *v)
 	seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
 	show_smap_vma_flags(m, vma);
 
+	if (vma->anon_vma)
+		seq_printf(m, "AnonVMA: %p %p %d\n",
+				vma->anon_vma,
+				vma->anon_vma->parent,
+				vma->anon_vma->degree);
+
 	m_cache_vma(m, vma);
 
 	return 0;

---

#include <sys/mman.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(int argc, char **argv) {
	char *ptr;
	char buf[100];

	/* Three private anonymous pages; the first write fault allocates an anon_vma. */
	ptr = mmap(NULL, 0x3000, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (ptr == MAP_FAILED)
		return 1;
	memset(ptr, 0, 0x3000);
	/* Split the mapping into three VMAs that all share that anon_vma. */
	mprotect(ptr + 0x1000, 0x1000, PROT_READ);

	snprintf(buf, sizeof(buf), "cat /proc/%d/smaps", getpid());
	system(buf);

	if (fork()) {
		wait(NULL);
	} else {
		/* The child dumps its own smaps (AnonVMA lines need the debug patch above). */
		printf("\n\n\n");
		fflush(stdout);
		snprintf(buf, sizeof(buf), "cat /proc/%d/smaps", getpid());
		system(buf);
	}
	return 0;
}

---

> [...]
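The ordering Konstantin describes can be sketched as a toy userspace model. This is not kernel code. `struct model_vma`, `model_clone()` and `model_fork()` are hypothetical names, and anon_vmas are plain integer ids. Only the reuse check quoted in the patch is modelled; the anon_vma_chain linking is left out. Because the child VMA starts as a copy of the parent's, its vm_prev still points at the parent's previous VMA when the check runs. In this model, prev and pprev are therefore the same pointer there. The `link_before_clone` switch is a hypothetical variant for comparison only, not a reconstruction of the patch linked above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy VMA: an integer anon_vma id (0 = none) and a vm_prev link. */
struct model_vma {
	int anon_vma;
	struct model_vma *vm_prev;
};

/*
 * Models the pre-patch reuse check in anon_vma_clone(): if the parent VMA
 * shares its anon_vma with its own vm_prev, take the anon_vma of dst->vm_prev.
 * The "prev &&" guard exists only in this model. If nothing was reused, hand
 * out a fresh (nonzero) id, roughly standing in for a newly forked anon_vma.
 */
static void model_clone(struct model_vma *dst, const struct model_vma *src,
			int *next_id)
{
	const struct model_vma *prev = dst->vm_prev, *pprev = src->vm_prev;

	if (!dst->anon_vma && src->anon_vma && prev &&
	    pprev && pprev->anon_vma == src->anon_vma)
		dst->anon_vma = prev->anon_vma;
	if (!dst->anon_vma)
		dst->anon_vma = (*next_id)++;
	assert(dst->anon_vma != 0);	/* every forked VMA ends up with one */
}

/*
 * Fork n parent VMAs into child[]: copy the struct (so vm_prev still points
 * into the parent), drop the inherited anon_vma, run the clone check, and only
 * then link vm_prev to the previous child VMA. With link_before_clone set, the
 * child link is made before the check instead (hypothetical comparison only).
 */
static void model_fork(struct model_vma *child, const struct model_vma *parent,
		       size_t n, int link_before_clone, int *next_id)
{
	struct model_vma *prev = NULL;

	for (size_t i = 0; i < n; i++) {
		child[i] = parent[i];
		child[i].anon_vma = 0;
		if (link_before_clone)
			child[i].vm_prev = prev;
		model_clone(&child[i], &parent[i], next_id);
		child[i].vm_prev = prev;
		prev = &child[i];
	}
}
```

With three parent VMAs sharing id 1, the default copy-then-link order gives DST1 a fresh id. DST2 and DST3 get the parent's id 1 instead of a new shared ANON1. Linking the child list first gives all three child VMAs one fresh shared id, which matches the DST1/ANON1 expectation. The model ignores VM_DONTCOPY, where the child's vm_prev need not correspond to pprev at all.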
On 2020-01-06 at 18:43 Konstantin Khlebnikov wrote:
>On 06/01/2020 09.37, Li Xinhai wrote:
>> In the fork case, dst->vm_prev is always the same as src->vm_prev when
>> anon_vma_clone() is called. Remove the assignment of
>> dst->vm_prev->anon_vma to dst->anon_vma, and instead explicitly assign
>> the anon_vma shared by the parent vmas.
>
>This doesn't sound right.
>
>I see dst->vm_prev is set after anon_vma_fork(), so here it still points to the parent's prev.
>So this code doesn't work as it is supposed to.
>
>I expect this logic: if parent vmas SRC1 SRC2 .. SRCn share ANON0,
>then the related child vmas DST1 DST2 .. DSTn should fork and share ANON1:
>forking DST1 creates the new ANON1, and then DST2 and the following vmas share it.

This logic was not fully clarified in
https://lore.kernel.org/linux-mm/20191011072256.16275-2-richardw.yang@linux.intel.com/
I assumed that the purpose of that patch was to share the parent vma's
anon_vma with the child vmas, and that it intentionally let the first
child vma get its own new anon_vma instead of sharing one as the other
child vmas do.

>
>[...]
On Mon, Jan 6, 2020 at 4:28 PM lixinhai.lxh@gmail.com
<lixinhai.lxh@gmail.com> wrote:
>
> On 2020-01-06 at 18:43 Konstantin Khlebnikov wrote:
> >On 06/01/2020 09.37, Li Xinhai wrote:
> >[...]
> >
> >I expect this logic: if parent vmas SRC1 SRC2 .. SRCn share ANON0,
> >then the related child vmas DST1 DST2 .. DSTn should fork and share ANON1:
> >forking DST1 creates the new ANON1, and then DST2 and the following vmas share it.
>
> This logic was not fully clarified in
> https://lore.kernel.org/linux-mm/20191011072256.16275-2-richardw.yang@linux.intel.com/
> I assumed that the purpose of that patch was to share the parent vma's
> anon_vma with the child vmas, and that it intentionally let the first
> child vma get its own new anon_vma instead of sharing one as the other
> child vmas do.

Well, this more or less follows from the original design.
A page's anon-vma, together with the page offset, limits the set of vmas
scanned by rmap: it skips vmas where the page certainly cannot be mapped.

If vmas in one process share an anon-vma, they likely have non-overlapping
offsets, so there is no reason to fork a personal anon-vma for each of them
when the process forks. But it is good to fork one new anon-vma for all of
them together: then rmap can skip scanning the parent's vmas for pages
allocated/COWed in the child process. Together they act like one big vma.

>
> [...]
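Konstantin's point about rmap can be illustrated with a toy model. This is a hypothetical sketch, not the kernel's rmap walk. `struct rmap_vma`, `rmap_candidate()` and `rmap_count()` are illustrative names, and anon_vmas are integer ids. Each VMA lists the anon_vmas it is chained to and the page-offset range it covers. A walk for a page only has to look at VMAs that are chained to the page's anon_vma and whose range contains the page's offset.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_CHAIN 4

/* Hypothetical toy VMA: the anon_vma ids it is chained to and the page
 * offsets it covers, [pgoff, pgoff + npages). */
struct rmap_vma {
	int chain[MODEL_MAX_CHAIN];
	size_t nchain;
	unsigned long pgoff;
	unsigned long npages;
};

/* Must a walk for a page (anon_vma id, page offset) look at this VMA? Only if
 * the VMA is chained to that anon_vma and its offset range covers the page. */
static int rmap_candidate(const struct rmap_vma *v, int anon_vma,
			  unsigned long pgoff)
{
	assert(v->nchain <= MODEL_MAX_CHAIN);
	for (size_t i = 0; i < v->nchain; i++)
		if (v->chain[i] == anon_vma)
			return pgoff >= v->pgoff && pgoff < v->pgoff + v->npages;
	return 0;
}

/* Number of VMAs among vmas[0..n) that the walk would have to check. */
static size_t rmap_count(const struct rmap_vma *vmas, size_t n, int anon_vma,
			 unsigned long pgoff)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		hits += (size_t)rmap_candidate(&vmas[i], anon_vma, pgoff);
	return hits;
}
```

Take the layout from the test program. The parent has three one-page VMAs sharing anon_vma 1, and the child copies are chained to 1 and to a single new anon_vma 2. A page allocated before the fork (anon_vma 1, offset 1) has two candidates: one parent VMA and one child VMA. A page COWed in the child into anon_vma 2 has only the child's VMA. If the child VMAs had reused anon_vma 1 instead, a page COWed in the child would again have the parent's VMA as a candidate.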
diff --git a/mm/rmap.c b/mm/rmap.c
index b3e3819..3c912a6c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -269,10 +269,10 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
 {
 	struct anon_vma_chain *avc, *pavc;
 	struct anon_vma *root = NULL;
-	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
+	struct vm_area_struct *pprev = src->vm_prev;
 
 	/*
-	 * If parent share anon_vma with its vm_prev, keep this sharing in in
+	 * If parent share anon_vma with its vm_prev, keep this sharing in
 	 * child.
 	 *
 	 * 1. Parent has vm_prev, which implies we have vm_prev.
@@ -280,8 +280,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
 	 */
 	if (!dst->anon_vma && src->anon_vma &&
 	    pprev && pprev->anon_vma == src->anon_vma)
-		dst->anon_vma = prev->anon_vma;
-
+		dst->anon_vma = pprev->anon_vma;
 
 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
 		struct anon_vma *anon_vma;
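The comment kept by the patch assumes "1. Parent has vm_prev, which implies we have vm_prev." Konstantin's counterexample is a parent VMA flagged VM_DONTCOPY, which madvise(MADV_DONTFORK) sets, for example. Such a VMA is not copied into the child. The child's vm_prev can then correspond to a different parent VMA than pprev, or be NULL. The sketch below is hypothetical. `struct dc_vma` and `child_prev_source()` are illustrative names, and the model tracks only the VM_DONTCOPY bit of each parent VMA, in address order.

```c
#include <assert.h>

/* Hypothetical toy VMA: only records whether VM_DONTCOPY is set. */
struct dc_vma {
	int dontcopy;
};

/*
 * parent[0..n) is the parent's VMA list in address order. For parent VMA i,
 * return the index of the parent VMA whose copy becomes the child's vm_prev,
 * -1 if the copy is the first VMA in the child mm, or -2 if VMA i itself is
 * not copied. The comment's assumption holds only when this equals i - 1.
 */
static int child_prev_source(const struct dc_vma *parent, int n, int i)
{
	assert(i >= 0 && i < n);
	if (parent[i].dontcopy)
		return -2;
	for (int j = i - 1; j >= 0; j--)
		if (!parent[j].dontcopy)
			return j;
	return -1;
}
```

Take parent VMAs {copied, VM_DONTCOPY, copied}. The third VMA's pprev is the second, but its copy's vm_prev is the copy of the first. For {VM_DONTCOPY, copied}, the parent VMA has a vm_prev while its copy is the first VMA in the child mm.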
In the fork case, dst->vm_prev is always the same as src->vm_prev when
anon_vma_clone() is called. Remove the assignment of
dst->vm_prev->anon_vma to dst->anon_vma, and instead explicitly assign
the anon_vma shared by the parent vmas.

Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/rmap.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)