Message ID | 20230103072807.19578-1-jaewon31.kim@samsung.com (mailing list archive) |
---|---|
State | New |
Series | page_alloc: avoid the negative free for meminfo available |
On Tue, Jan 03, 2023 at 04:28:07PM +0900, Jaewon Kim wrote:
> The totalreserve_pages could be higher than the free because of
> watermark high or watermark boost. Handle this situation and fix it to 0
> free size.
>
> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> ---
>  mm/page_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 218b28ee49ed..e510ae83d5f3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
>  	 * without causing swapping or OOM.
>  	 */
>  	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
> +	if (available < 0)
> +		available = 0;
>
>  	/*
>  	 * Not all the page cache can be freed, otherwise the system will
> --
> 2.17.1
>

We already reset to zero at the end of the function, wouldn't resetting to zero
here potentially skew the result?
>--------- Original Message ---------
>Sender : Lorenzo Stoakes <lstoakes@gmail.com>
>Date : 2023-01-03 16:35 (GMT+9)
>Title : Re: [PATCH] page_alloc: avoid the negative free for meminfo available
>
>On Tue, Jan 03, 2023 at 04:28:07PM +0900, Jaewon Kim wrote:
>> The totalreserve_pages could be higher than the free because of
>> watermark high or watermark boost. Handle this situation and fix it to 0
>> free size.
>>
>> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
>> ---
>>  mm/page_alloc.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 218b28ee49ed..e510ae83d5f3 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
>>  	 * without causing swapping or OOM.
>>  	 */
>>  	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
>> +	if (available < 0)
>> +		available = 0;
>>
>>  	/*
>>  	 * Not all the page cache can be freed, otherwise the system will
>> --
>> 2.17.1
>>
>
>We already reset to zero at the end of the function, wouldn't resetting to zero
>here potentially skew the result?
>

Hello

I did not mean the negative value of the final available. We should account for
the actual size by removing the improper portion of it. From that perspective,
the free term should not be negative. If it is negative, the other parts, like
pagecache and reclaimable, are effectively decreased.

Actually, pagecache and reclaimable are calculated with min, so I think
resetting to zero at the end of the function is not necessary.

BR
On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
> The totalreserve_pages could be higher than the free because of
> watermark high or watermark boost. Handle this situation and fix it to 0
> free size.

What is the actual problem you are trying to address by this change?

> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> ---
>  mm/page_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 218b28ee49ed..e510ae83d5f3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
>  	 * without causing swapping or OOM.
>  	 */
>  	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
> +	if (available < 0)
> +		available = 0;
>
>  	/*
>  	 * Not all the page cache can be freed, otherwise the system will
> --
> 2.17.1
>On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
>> The totalreserve_pages could be higher than the free because of
>> watermark high or watermark boost. Handle this situation and fix it to 0
>> free size.
>
>What is the actual problem you are trying to address by this change?

Hello

As described in the original commit,
34e431b0ae39 /proc/meminfo: provide estimated available memory
mm is trying to provide the available memory to user space.

But if free is negative, the available memory shown to userspace
would be smaller than the actual available size. The userspace
may take unwanted memory shrinking actions like process kills.

I think the logic should account for the positive size only.

BR
On Tue 03-01-23 17:20:08, 김재원 wrote:
> >On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
> >> The totalreserve_pages could be higher than the free because of
> >> watermark high or watermark boost. Handle this situation and fix it to 0
> >> free size.
> >
> >What is the actual problem you are trying to address by this change?
>
> Hello
>
> As described in the original commit,
> 34e431b0ae39 /proc/meminfo: provide estimated available memory
> mm is trying to provide the available memory to user space.
>
> But if free is negative, the available memory shown to userspace
> would be smaller than the actual available size. The userspace
> may take unwanted memory shrinking actions like process kills.

Do you have any specific example? Have you seen this happening in
practice or is this based on the code inspection?

Also does this patch actually fix anything? Say the system is really
struggling and we are under min watermark. Shouldn't that lead to
Available to be reported as 0 without even looking at other counters?

> I think the logic should account for the positive size only.
>
> BR
>> >On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
>> >> The totalreserve_pages could be higher than the free because of
>> >> watermark high or watermark boost. Handle this situation and fix it to 0
>> >> free size.
>> >
>> >What is the actual problem you are trying to address by this change?
>>
>> Hello
>>
>> As described in the original commit,
>> 34e431b0ae39 /proc/meminfo: provide estimated available memory
>> mm is trying to provide the available memory to user space.
>>
>> But if free is negative, the available memory shown to userspace
>> would be smaller than the actual available size. The userspace
>> may take unwanted memory shrinking actions like process kills.
>
>Do you have any specific example? Have you seen this happening in
>practice or is this based on the code inspection?

I found this on a device using a v5.10-based kernel.
Actually, the log was printed by user space in its own format after reading /proc/meminfo.

MemFree          38220 KB
MemAvailable     90008 KB
Active(file)    137116 KB
Inactive(file)  124128 KB
SReclaimable    100960 KB

Here's /proc/zoneinfo for wmark info.

------ ZONEINFO (/proc/zoneinfo) ------
Node 0, zone    DMA32
  pages free     17059
        min      862
        low      9790
        high     18718
        spanned  524288
        present  497920
        managed  413348
Node 0, zone   Normal
  pages free     12795
        min      1044
        low      11855
        high     22666
        spanned  8388608
        present  524288
        managed  500548

The pagecache at this time seems to be 174,664 KB.
pagecache -= min(pagecache / 2, wmark_low)
We also need to add the reclaimable and the actual free to it to get MemAvailable.

The MemAvailable should be at least this 174,664 KB, but it was only 90,008 KB
because the big wmark high of 165,536 KB seems to be used.

>
>Also does this patch actually fix anything? Say the system is really
>struggling and we are under min watermark. Shouldn't that lead to
>Available to be reported as 0 without even looking at other counters?
>

Sorry, but I did not understand. This miscalculation can happen above the min watermark.
Do you think the wmark high should be subtracted all the time, even if the free is negative?

>> I think the logic should account for the positive size only.
>>
>> BR
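The figures in the message above can be re-derived with a small user-space sketch. It assumes 4 KiB pages and treats `wmark_low` and the "wmark high" figure as the sums of the per-zone `low` and `high` values from the posted /proc/zoneinfo. The helper names are hypothetical and are not kernel functions.

```c
/* Assumption: 4 KiB pages, so one page is 4 KB. */
#define KB_PER_PAGE 4L

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/* Sum per-zone watermarks (given in pages) and convert the total to KB. */
long wmark_sum_kb(const long *wmark_pages, int nr_zones)
{
	long pages = 0;

	for (int i = 0; i < nr_zones; i++)
		pages += wmark_pages[i];
	return pages * KB_PER_PAGE;
}

/*
 * File page cache counted as available, following the quoted line
 *   pagecache -= min(pagecache / 2, wmark_low)
 * where pagecache is Active(file) + Inactive(file).
 */
long usable_pagecache_kb(long active_file_kb, long inactive_file_kb,
			 long wmark_low_kb)
{
	long pagecache = active_file_kb + inactive_file_kb;

	return pagecache - min_long(pagecache / 2, wmark_low_kb);
}
```

With the posted values, the low watermarks (9790 and 11855 pages) sum to 86,580 KB, the high watermarks (18718 and 22666 pages) sum to 165,536 KB, and the usable page cache comes out at 174,664 KB, which matches the numbers in the message.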
On Tue 03-01-23 18:22:32, 김재원 wrote:
> >Do you have any specific example? Have you seen this happening in
> >practice or is this based on the code inspection?
>
> I found this on a device using a v5.10-based kernel.
> Actually, the log was printed by user space in its own format after reading /proc/meminfo.
>
> MemFree          38220 KB
> MemAvailable     90008 KB
> Active(file)    137116 KB
> Inactive(file)  124128 KB
> SReclaimable    100960 KB
>
> Here's /proc/zoneinfo for wmark info.
>
> ------ ZONEINFO (/proc/zoneinfo) ------
> Node 0, zone    DMA32
>   pages free     17059
>         min      862
>         low      9790
>         high     18718
>         spanned  524288
>         present  497920
>         managed  413348
> Node 0, zone   Normal
>   pages free     12795
>         min      1044
>         low      11855
>         high     22666
>         spanned  8388608
>         present  524288
>         managed  500548
>
> The pagecache at this time seems to be 174,664 KB.
> pagecache -= min(pagecache / 2, wmark_low)
> We also need to add the reclaimable and the actual free to it to get MemAvailable.
>
> The MemAvailable should be at least this 174,664 KB, but it was only 90,008 KB
> because the big wmark high of 165,536 KB seems to be used.

How have you concluded that? Are you saying that a userspace would be
behaving more sanely when considering more memory to be available?
Please see more on the semantics below.

> >Also does this patch actually fix anything? Say the system is really
> >struggling and we are under min watermark. Shouldn't that lead to
> >Available to be reported as 0 without even looking at other counters?
> >
>
> Sorry, but I did not understand.

What I meant here is that the core of the high level definition says:
"An estimate of how much memory is available for starting new
applications, without swapping." If the system is close enough to watermarks
that NR_FREE_PAGES < reserves then it is likely that further memory
allocations will not do without reclaim and potentially swapout.

So the question really is whether just clamping the value to 0 is
actually making MemAvailable more "correct"? See my point?

The actual value is never going to be laser cut precise. Close to
watermark behavior will vary wildly depending on the memory
reclaimability. Kswapd might easily keep up with memory demand but it
also could get stuck. MemAvailable should be considered a hint rather
than an exact value IMHO.
>> The MemAvailable should be at least this 174,664 KB, but it was only 90,008 KB
>> because the big wmark high of 165,536 KB seems to be used.
>
>How have you concluded that? Are you saying that a userspace would be
>behaving more sanely when considering more memory to be available?
>Please see more on the semantics below.
>
>> >Also does this patch actually fix anything? Say the system is really
>> >struggling and we are under min watermark. Shouldn't that lead to
>> >Available to be reported as 0 without even looking at other counters?
>> >
>>
>> Sorry, but I did not understand.
>
>What I meant here is that the core of the high level definition says:
>"An estimate of how much memory is available for starting new
>applications, without swapping." If the system is close enough to watermarks
>that NR_FREE_PAGES < reserves then it is likely that further memory
>allocations will not do without reclaim and potentially swapout.

Yes, reclaim would be needed in that case. I think it is just a matter of perspective.
If I follow you, I think the totalreserve_pages should be considered a must-have free size.

>
>So the question really is whether just clamping the value to 0 is
>actually making MemAvailable more "correct"? See my point?
>
>The actual value is never going to be laser cut precise. Close to
>watermark behavior will vary wildly depending on the memory
>reclaimability. Kswapd might easily keep up with memory demand but it
>also could get stuck. MemAvailable should be considered a hint rather
>than an exact value IMHO.

Yes, correct, it is not perfect.
I will drop my patch.

It was a nice discussion.
Thank you
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 218b28ee49ed..e510ae83d5f3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5948,6 +5948,8 @@ long si_mem_available(void)
 	 * without causing swapping or OOM.
 	 */
 	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
+	if (available < 0)
+		available = 0;
 
 	/*
 	 * Not all the page cache can be freed, otherwise the system will
The totalreserve_pages could be higher than the free because of
watermark high or watermark boost. Handle this situation and fix it to 0
free size.

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
---
 mm/page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)
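
To see how the two-line change interacts with the clamp that already exists at the end of si_mem_available(), the calculation can be modelled roughly in user space. This is a simplified sketch and not the kernel code. It works in KB rather than pages, and the struct and function names are hypothetical. It also assumes that totalreserve_pages can be approximated by the summed high watermarks (the real value also includes lowmem reserves) and that SReclaimable stands in for the whole reclaimable term.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters involved, all in KB. */
struct mem_snapshot {
	long free_kb;         /* MemFree */
	long totalreserve_kb; /* assumption: approximated by summed high watermarks */
	long file_lru_kb;     /* Active(file) + Inactive(file) */
	long reclaimable_kb;  /* assumption: SReclaimable only */
	long wmark_low_kb;    /* summed low watermarks */
};

static long min_l(long a, long b)
{
	return a < b ? a : b;
}

/*
 * Model of the MemAvailable estimate. When clamp_free_early is true, the
 * "free - reserve" term is clamped to 0 before the other terms are added,
 * as the patch proposes. The final clamp is applied in both variants.
 */
long model_mem_available_kb(const struct mem_snapshot *s, bool clamp_free_early)
{
	long available = s->free_kb - s->totalreserve_kb;

	if (clamp_free_early && available < 0)
		available = 0;

	/* At most half of the file cache, or wmark_low, is held back. */
	available += s->file_lru_kb - min_l(s->file_lru_kb / 2, s->wmark_low_kb);

	/* The same rule is applied to reclaimable kernel memory. */
	available += s->reclaimable_kb -
		     min_l(s->reclaimable_kb / 2, s->wmark_low_kb);

	if (available < 0)
		available = 0;
	return available;
}
```

Fed with the numbers posted in the thread (free 38220, reserve 165536, file LRU 261244, reclaimable 100960, wmark_low 86580), the model gives 97,828 KB without the early clamp and 225,144 KB with it. Neither value reproduces the reported 90,008 KB exactly, which is unsurprising given the approximations above, but the gap between the two variants shows that the early clamp changes the result even though a final clamp already exists.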