Message ID | 20220222054025.3412898-2-surenb@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/3] mm: refactor vm_area_struct::anon_vma_name usage code | expand |
On Mon 21-02-22 21:40:24, Suren Baghdasaryan wrote: > A deep process chain with many vmas could grow really high. This would really benefit from some numbers. With default sysctl_max_map_count (64k) and default pid_max (32k) the INT_MAX could be theoretically reached but I find it impractical because not all vmas can be anonymous same as all available pids can be consumed for a theoretical attack (if my counting is proper). On the other hand any non-default configuration with any of the values increased could hit this theoretically. > kref > refcounting interface used in anon_vma_name structure will detect > a counter overflow when it reaches REFCOUNT_SATURATED value but will > only generate a warning about broken refcounting. > To ensure anon_vma_name refcount does not overflow, stop anon_vma_name > sharing when the refcount reaches INT_MAX, which still leaves INT_MAX/2 > values before the counter reaches REFCOUNT_SATURATED. This should provide > enough headroom for raising the refcounts temporarily. > > Suggested-by: Michal Hocko <mhocko@suse.com> > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > --- > include/linux/mm_inline.h | 18 ++++++++++++++---- > mm/madvise.c | 3 +-- > 2 files changed, 15 insertions(+), 6 deletions(-) > > diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h > index 70b619442d56..b189e2638843 100644 > --- a/include/linux/mm_inline.h > +++ b/include/linux/mm_inline.h > @@ -156,15 +156,25 @@ static inline void anon_vma_name_get(struct anon_vma_name *anon_name) > > extern void anon_vma_name_put(struct anon_vma_name *anon_name); > > +static inline > +struct anon_vma_name *anon_vma_name_reuse(struct anon_vma_name *anon_name) > +{ > + /* Prevent anon_name refcount saturation early on */ > + if (kref_read(&anon_name->kref) < INT_MAX) { REFCOUNT_MAX seems to be defined by the kref framework. Other than that looks good to me. > + anon_vma_name_get(anon_name); > + return anon_name; > + > + } > + return anon_vma_name_alloc(anon_name->name); > +} > + > static inline void dup_vma_anon_name(struct vm_area_struct *orig_vma, > struct vm_area_struct *new_vma) > { > struct anon_vma_name *anon_name = vma_anon_name(orig_vma); > > - if (anon_name) { > - anon_vma_name_get(anon_name); > - new_vma->anon_name = anon_name; > - } > + if (anon_name) > + new_vma->anon_name = anon_vma_name_reuse(anon_name); > } > > static inline void free_vma_anon_name(struct vm_area_struct *vma) > diff --git a/mm/madvise.c b/mm/madvise.c > index f81d62d8ce9b..a395884aeecb 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -122,8 +122,7 @@ static int replace_vma_anon_name(struct vm_area_struct *vma, > if (anon_vma_name_eq(orig_name, anon_name)) > return 0; > > - anon_vma_name_get(anon_name); > - vma->anon_name = anon_name; > + vma->anon_name = anon_vma_name_reuse(anon_name); > anon_vma_name_put(orig_name); > > return 0; > -- > 2.35.1.473.g83b2b277ed-goog
On Tue, Feb 22, 2022 at 1:17 AM Michal Hocko <mhocko@suse.com> wrote: > > On Mon 21-02-22 21:40:24, Suren Baghdasaryan wrote: > > A deep process chain with many vmas could grow really high. > > This would really benefit from some numbers. With default > sysctl_max_map_count (64k) and default pid_max (32k) the INT_MAX could > be theoretically reached but I find it impractical because not all vmas > can be anonymous same as all available pids can be consumed for a > theoretical attack (if my counting is proper). > On the other hand any non-default configuration with any of the values > increased could hit this theoretically. re: This would really benefit from some numbers Should I just add the details you provided above into the description? Would that suffice? > > > kref > > refcounting interface used in anon_vma_name structure will detect > > a counter overflow when it reaches REFCOUNT_SATURATED value but will > > only generate a warning about broken refcounting. > > To ensure anon_vma_name refcount does not overflow, stop anon_vma_name > > sharing when the refcount reaches INT_MAX, which still leaves INT_MAX/2 > > values before the counter reaches REFCOUNT_SATURATED. This should provide > > enough headroom for raising the refcounts temporarily. > > > > Suggested-by: Michal Hocko <mhocko@suse.com> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > > --- > > include/linux/mm_inline.h | 18 ++++++++++++++---- > > mm/madvise.c | 3 +-- > > 2 files changed, 15 insertions(+), 6 deletions(-) > > > > diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h > > index 70b619442d56..b189e2638843 100644 > > --- a/include/linux/mm_inline.h > > +++ b/include/linux/mm_inline.h > > @@ -156,15 +156,25 @@ static inline void anon_vma_name_get(struct anon_vma_name *anon_name) > > > > extern void anon_vma_name_put(struct anon_vma_name *anon_name); > > > > +static inline > > +struct anon_vma_name *anon_vma_name_reuse(struct anon_vma_name *anon_name) > > +{ > > + /* Prevent anon_name refcount saturation early on */ > > + if (kref_read(&anon_name->kref) < INT_MAX) { > > REFCOUNT_MAX seems to be defined by the kref framework. Ah, indeed. I missed that. Will change to use it. > > Other than that looks good to me. Thanks for the review! > > > + anon_vma_name_get(anon_name); > > + return anon_name; > > + > > + } > > + return anon_vma_name_alloc(anon_name->name); > > +} > > + > > static inline void dup_vma_anon_name(struct vm_area_struct *orig_vma, > > struct vm_area_struct *new_vma) > > { > > struct anon_vma_name *anon_name = vma_anon_name(orig_vma); > > > > - if (anon_name) { > > - anon_vma_name_get(anon_name); > > - new_vma->anon_name = anon_name; > > - } > > + if (anon_name) > > + new_vma->anon_name = anon_vma_name_reuse(anon_name); > > } > > > > static inline void free_vma_anon_name(struct vm_area_struct *vma) > > diff --git a/mm/madvise.c b/mm/madvise.c > > index f81d62d8ce9b..a395884aeecb 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -122,8 +122,7 @@ static int replace_vma_anon_name(struct vm_area_struct *vma, > > if (anon_vma_name_eq(orig_name, anon_name)) > > return 0; > > > > - anon_vma_name_get(anon_name); > > - vma->anon_name = anon_name; > > + vma->anon_name = anon_vma_name_reuse(anon_name); > > anon_vma_name_put(orig_name); > > > > return 0; > > -- > > 2.35.1.473.g83b2b277ed-goog > > -- > Michal Hocko > SUSE Labs
On Tue, Feb 22, 2022 at 7:56 AM Suren Baghdasaryan <surenb@google.com> wrote: > > On Tue, Feb 22, 2022 at 1:17 AM Michal Hocko <mhocko@suse.com> wrote: > > > > On Mon 21-02-22 21:40:24, Suren Baghdasaryan wrote: > > > A deep process chain with many vmas could grow really high. > > > > This would really benefit from some numbers. With default > > sysctl_max_map_count (64k) and default pid_max (32k) the INT_MAX could > > be theoretically reached but I find it impractical because not all vmas > > can be anonymous same as all available pids can be consumed for a > > theoretical attack (if my counting is proper). > > On the other hand any non-default configuration with any of the values > > increased could hit this theoretically. > > re: This would really benefit from some numbers > Should I just add the details you provided above into the description? > Would that suffice? Hmm. According to the defaults you posted, with max number of processes being 32k and max number of vmas per process 64k, the max number of vmas in the system is 2147450880. That's 32767 less than REFCOUNT_MAX=INT_MAX (2147483647) and 1073774592 less than REFCOUNT_SATURATED (3221225472). So with those defaults we should never hit these limits. Are we adding this protection for systems that set non-default higher limits or am I miscalculating something? > > > > > > kref > > > refcounting interface used in anon_vma_name structure will detect > > > a counter overflow when it reaches REFCOUNT_SATURATED value but will > > > only generate a warning about broken refcounting. > > > To ensure anon_vma_name refcount does not overflow, stop anon_vma_name > > > sharing when the refcount reaches INT_MAX, which still leaves INT_MAX/2 > > > values before the counter reaches REFCOUNT_SATURATED. This should provide > > > enough headroom for raising the refcounts temporarily. > > > > > > Suggested-by: Michal Hocko <mhocko@suse.com> > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > > > --- > > > include/linux/mm_inline.h | 18 ++++++++++++++---- > > > mm/madvise.c | 3 +-- > > > 2 files changed, 15 insertions(+), 6 deletions(-) > > > > > > diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h > > > index 70b619442d56..b189e2638843 100644 > > > --- a/include/linux/mm_inline.h > > > +++ b/include/linux/mm_inline.h > > > @@ -156,15 +156,25 @@ static inline void anon_vma_name_get(struct anon_vma_name *anon_name) > > > > > > extern void anon_vma_name_put(struct anon_vma_name *anon_name); > > > > > > +static inline > > > +struct anon_vma_name *anon_vma_name_reuse(struct anon_vma_name *anon_name) > > > +{ > > > + /* Prevent anon_name refcount saturation early on */ > > > + if (kref_read(&anon_name->kref) < INT_MAX) { > > > > REFCOUNT_MAX seems to be defined by the kref framework. > > Ah, indeed. I missed that. Will change to use it. > > > > > Other than that looks good to me. > > Thanks for the review! > > > > > > + anon_vma_name_get(anon_name); > > > + return anon_name; > > > + > > > + } > > > + return anon_vma_name_alloc(anon_name->name); > > > +} > > > + > > > static inline void dup_vma_anon_name(struct vm_area_struct *orig_vma, > > > struct vm_area_struct *new_vma) > > > { > > > struct anon_vma_name *anon_name = vma_anon_name(orig_vma); > > > > > > - if (anon_name) { > > > - anon_vma_name_get(anon_name); > > > - new_vma->anon_name = anon_name; > > > - } > > > + if (anon_name) > > > + new_vma->anon_name = anon_vma_name_reuse(anon_name); > > > } > > > > > > static inline void free_vma_anon_name(struct vm_area_struct *vma) > > > diff --git a/mm/madvise.c b/mm/madvise.c > > > index f81d62d8ce9b..a395884aeecb 100644 > > > --- a/mm/madvise.c > > > +++ b/mm/madvise.c > > > @@ -122,8 +122,7 @@ static int replace_vma_anon_name(struct vm_area_struct *vma, > > > if (anon_vma_name_eq(orig_name, anon_name)) > > > return 0; > > > > > > - anon_vma_name_get(anon_name); > > > - vma->anon_name = anon_name; > > > + vma->anon_name = anon_vma_name_reuse(anon_name); > > > anon_vma_name_put(orig_name); > > > > > > return 0; > > > -- > > > 2.35.1.473.g83b2b277ed-goog > > > > -- > > Michal Hocko > > SUSE Labs
On Tue 22-02-22 19:02:08, Suren Baghdasaryan wrote: > On Tue, Feb 22, 2022 at 7:56 AM Suren Baghdasaryan <surenb@google.com> wrote: > > > > On Tue, Feb 22, 2022 at 1:17 AM Michal Hocko <mhocko@suse.com> wrote: > > > > > > On Mon 21-02-22 21:40:24, Suren Baghdasaryan wrote: > > > > A deep process chain with many vmas could grow really high. > > > > > > This would really benefit from some numbers. With default > > > sysctl_max_map_count (64k) and default pid_max (32k) the INT_MAX could > > > be theoretically reached but I find it impractical because not all vmas > > > can be anonymous same as all available pids can be consumed for a > > > theoretical attack (if my counting is proper). > > > On the other hand any non-default configuration with any of the values > > > increased could hit this theoretically. > > > > re: This would really benefit from some numbers > > Should I just add the details you provided above into the description? > > Would that suffice? > > Hmm. According to the defaults you posted, with max number of > processes being 32k and max number of vmas per process 64k, the max > number of vmas in the system is 2147450880. That's 32767 less than > REFCOUNT_MAX=INT_MAX (2147483647) and 1073774592 less than > REFCOUNT_SATURATED (3221225472). So with those defaults we should > never hit these limits. Are we adding this protection for systems that > set non-default higher limits or am I miscalculating something? Yeah, I guess this should be the message the changelog should be sending.
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 70b619442d56..b189e2638843 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -156,15 +156,25 @@ static inline void anon_vma_name_get(struct anon_vma_name *anon_name) extern void anon_vma_name_put(struct anon_vma_name *anon_name); +static inline +struct anon_vma_name *anon_vma_name_reuse(struct anon_vma_name *anon_name) +{ + /* Prevent anon_name refcount saturation early on */ + if (kref_read(&anon_name->kref) < INT_MAX) { + anon_vma_name_get(anon_name); + return anon_name; + + } + return anon_vma_name_alloc(anon_name->name); +} + static inline void dup_vma_anon_name(struct vm_area_struct *orig_vma, struct vm_area_struct *new_vma) { struct anon_vma_name *anon_name = vma_anon_name(orig_vma); - if (anon_name) { - anon_vma_name_get(anon_name); - new_vma->anon_name = anon_name; - } + if (anon_name) + new_vma->anon_name = anon_vma_name_reuse(anon_name); } static inline void free_vma_anon_name(struct vm_area_struct *vma) diff --git a/mm/madvise.c b/mm/madvise.c index f81d62d8ce9b..a395884aeecb 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -122,8 +122,7 @@ static int replace_vma_anon_name(struct vm_area_struct *vma, if (anon_vma_name_eq(orig_name, anon_name)) return 0; - anon_vma_name_get(anon_name); - vma->anon_name = anon_name; + vma->anon_name = anon_vma_name_reuse(anon_name); anon_vma_name_put(orig_name); return 0;
A deep process chain with many vmas could grow really high. kref refcounting interface used in anon_vma_name structure will detect a counter overflow when it reaches REFCOUNT_SATURATED value but will only generate a warning about broken refcounting. To ensure anon_vma_name refcount does not overflow, stop anon_vma_name sharing when the refcount reaches INT_MAX, which still leaves INT_MAX/2 values before the counter reaches REFCOUNT_SATURATED. This should provide enough headroom for raising the refcounts temporarily. Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> --- include/linux/mm_inline.h | 18 ++++++++++++++---- mm/madvise.c | 3 +-- 2 files changed, 15 insertions(+), 6 deletions(-)