Message ID: 20231108065818.19932-1-link@vivo.com (mailing list archive)
Series: Introduce unbalance proactive reclaim
Huan Yang <link@vivo.com> writes: > In some cases, we need to selectively reclaim file pages or anonymous > pages in an unbalanced manner. > > For example, when an application is pushed to the background and frozen, > it may not be opened for a long time, and we can safely reclaim the > application's anonymous pages, but we do not want to touch the file pages. > > This patchset extends the proactive reclaim interface to achieve > unbalanced reclamation. Users can control the reclamation tendency by > inputting swappiness under the original interface. Specifically, users > can input special values to extremely reclaim specific pages. From mem_cgroup_swappiness(), cgroupv2 doesn't have per-cgroup swappiness. So you need to add that firstly? > Example: > echo "1G" 200 > memory.reclaim (only reclaim anon) > echo "1G" 0 > memory.reclaim (only reclaim file) > echo "1G" 1 > memory.reclaim (only reclaim file) > > Note that when performing unbalanced reclamation, the cgroup swappiness > will be temporarily adjusted dynamically to the input value. Therefore, > if the cgroup swappiness is further modified during runtime, there may > be some errors. If cgroup swappiness will be adjusted temporarily, why not just change it via a script before/after proactive reclaiming? > However, this is acceptable because the interface is dynamically called > by the user and the timing should be controlled by the user. > > This patchset did not implement the type-based reclamation as expected > in the documentation.(anon or file) Because in addition to extreme unbalanced > reclamation, this patchset can also adapt to the reclamation tendency > allocated according to swappiness, which is more flexible. > > Self test > ======== > After applying the following patches and myself debug patch, my self-test > results are as follows: > > 1. LRU test > =========== > a. Anon unbalance reclaim > ``` > cat memory.stat | grep anon > inactive_anon 7634944 > active_anon 7741440 > > echo "200M" 200 > memory.reclaim > > cat memory.stat | grep anon > inactive_anon 0 > active_anon 0 > > cat memory.reclaim_stat_summary(self debug interface) > [22368]sh total reclaimed 0 file, 3754 anon, covered item=0 > ``` > > b. File unbalance reclaim > ``` > cat memory.stat | grep file > inactive_file 82862080 > active_file 48664576 > > echo "100M" 0 > memory.reclaim > cat memory.stat | grep file > inactive_file 34164736 > active_file 18370560 > > cat memory.reclaim_stat_summary(self debug interface) > [22368]sh total reclaimed 13732 file, 0 anon, covered item=0 > ``` > > 2. MGLRU test > ============ > a. Anon unbalance reclaim > ``` > echo y > /sys/kernel/mm/lru_gen/enabled > cat /sys/kernel/mm/lru_gen/enabled > 0x0003 > > cat memory.stat | grep anon > inactive_anon 17653760 > active_anon 1740800 > > echo "100M" 200 > memory.reclaim > > cat memory.reclaim_stat_summary > [8251]sh total reclaimed 0 file, 5393 anon, covered item=0 > ``` > > b. File unbalance reclaim > ``` > cat memory.stat | grep file > inactive_file 17858560 > active_file 5943296 > > echo "100M" 0 > memory.reclaim > > cat memory.stat | grep file > inactive_file 491520 > active_file 2764800 > cat memory.reclaim_stat_summary > [8251]sh total reclaimed 5230 file, 0 anon, covered item=0 > ``` > > Patch 1-3 implement the functionality described above. > Patch 4 aims to implement proactive reclamation to the cgroupv1 interface > for use on Android. 
> > Huan Yang (4): > mm: vmscan: LRU unbalance cgroup reclaim > mm: multi-gen LRU: MGLRU unbalance reclaim > mm: memcg: implement unbalance proactive reclaim > mm: memcg: apply proactive reclaim into cgroupv1 We will not add new features to cgroupv1 in upstream. > .../admin-guide/cgroup-v1/memory.rst | 38 +++++- > Documentation/admin-guide/cgroup-v2.rst | 16 ++- > include/linux/swap.h | 1 + > mm/memcontrol.c | 126 ++++++++++++------ > mm/vmscan.c | 38 +++++- > 5 files changed, 169 insertions(+), 50 deletions(-) -- Best Regards, Huang, Ying
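For reference, the self-test flow shown in the cover letter (sample the anon counters from memory.stat, write a request to memory.reclaim, sample again) can be driven from a small userspace helper. The sketch below makes assumptions: the cgroup path is made up for illustration, and the two-value `"<size> <swappiness>"` request form exists only with this patchset applied; it is not upstream syntax.

```
/*
 * Sketch of the self-test flow above: sample the anon counters from
 * memory.stat, write a reclaim request, then sample again.  The cgroup
 * path is an assumption, and the "<size> <swappiness>" request form
 * exists only with this patchset applied; it is not upstream syntax.
 */
#include <stdio.h>
#include <string.h>

#define CG "/sys/fs/cgroup/test"	/* assumed cgroup for the test */

static void print_anon_counters(void)
{
	char line[256];
	FILE *f = fopen(CG "/memory.stat", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "inactive_anon ", 14) ||
		    !strncmp(line, "active_anon ", 12))
			fputs(line, stdout);
	fclose(f);
}

int main(void)
{
	FILE *f;

	print_anon_counters();

	/* Proposed form: reclaim 200M, biased entirely towards anon. */
	f = fopen(CG "/memory.reclaim", "w");
	if (!f) {
		perror("memory.reclaim");
		return 1;
	}
	fprintf(f, "200M 200\n");
	fclose(f);	/* the write completes when the reclaim attempt does */

	print_anon_counters();
	return 0;
}
```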
HI Huang, Ying Thanks for reply. 在 2023/11/8 15:35, Huang, Ying 写道: > Huan Yang <link@vivo.com> writes: > >> In some cases, we need to selectively reclaim file pages or anonymous >> pages in an unbalanced manner. >> >> For example, when an application is pushed to the background and frozen, >> it may not be opened for a long time, and we can safely reclaim the >> application's anonymous pages, but we do not want to touch the file pages. >> >> This patchset extends the proactive reclaim interface to achieve >> unbalanced reclamation. Users can control the reclamation tendency by >> inputting swappiness under the original interface. Specifically, users >> can input special values to extremely reclaim specific pages. > From mem_cgroup_swappiness(), cgroupv2 doesn't have per-cgroup > swappiness. So you need to add that firstly? Sorry for this mistake, we always work on cgroupv1, so, not notice this commit 4550c4e, thank your for point that. I see this commit comment that `that's a different discussion`, but, to implements this, I will try add. > >> Example: >> echo "1G" 200 > memory.reclaim (only reclaim anon) >> echo "1G" 0 > memory.reclaim (only reclaim file) >> echo "1G" 1 > memory.reclaim (only reclaim file) >> >> Note that when performing unbalanced reclamation, the cgroup swappiness >> will be temporarily adjusted dynamically to the input value. Therefore, >> if the cgroup swappiness is further modified during runtime, there may >> be some errors. > If cgroup swappiness will be adjusted temporarily, why not just change > it via a script before/after proactive reclaiming? IMO, this unbalance reclaim only takes effect for a single command, so if it is pre-set using a script, the judgment of the reclamation tendency may become complicated. So, do you mean avoid use cgroup swappiness, just type anon or file to control this extreme unbalanced reclamation? > >> However, this is acceptable because the interface is dynamically called >> by the user and the timing should be controlled by the user. >> >> This patchset did not implement the type-based reclamation as expected >> in the documentation.(anon or file) Because in addition to extreme unbalanced >> reclamation, this patchset can also adapt to the reclamation tendency >> allocated according to swappiness, which is more flexible. >> >> Self test >> ======== >> After applying the following patches and myself debug patch, my self-test >> results are as follows: >> >> 1. LRU test >> =========== >> a. Anon unbalance reclaim >> ``` >> cat memory.stat | grep anon >> inactive_anon 7634944 >> active_anon 7741440 >> >> echo "200M" 200 > memory.reclaim >> >> cat memory.stat | grep anon >> inactive_anon 0 >> active_anon 0 >> >> cat memory.reclaim_stat_summary(self debug interface) >> [22368]sh total reclaimed 0 file, 3754 anon, covered item=0 >> ``` >> >> b. File unbalance reclaim >> ``` >> cat memory.stat | grep file >> inactive_file 82862080 >> active_file 48664576 >> >> echo "100M" 0 > memory.reclaim >> cat memory.stat | grep file >> inactive_file 34164736 >> active_file 18370560 >> >> cat memory.reclaim_stat_summary(self debug interface) >> [22368]sh total reclaimed 13732 file, 0 anon, covered item=0 >> ``` >> >> 2. MGLRU test >> ============ >> a. 
Anon unbalance reclaim >> ``` >> echo y > /sys/kernel/mm/lru_gen/enabled >> cat /sys/kernel/mm/lru_gen/enabled >> 0x0003 >> >> cat memory.stat | grep anon >> inactive_anon 17653760 >> active_anon 1740800 >> >> echo "100M" 200 > memory.reclaim >> >> cat memory.reclaim_stat_summary >> [8251]sh total reclaimed 0 file, 5393 anon, covered item=0 >> ``` >> >> b. File unbalance reclaim >> ``` >> cat memory.stat | grep file >> inactive_file 17858560 >> active_file 5943296 >> >> echo "100M" 0 > memory.reclaim >> >> cat memory.stat | grep file >> inactive_file 491520 >> active_file 2764800 >> cat memory.reclaim_stat_summary >> [8251]sh total reclaimed 5230 file, 0 anon, covered item=0 >> ``` >> >> Patch 1-3 implement the functionality described above. >> Patch 4 aims to implement proactive reclamation to the cgroupv1 interface >> for use on Android. >> >> Huan Yang (4): >> mm: vmscan: LRU unbalance cgroup reclaim >> mm: multi-gen LRU: MGLRU unbalance reclaim >> mm: memcg: implement unbalance proactive reclaim >> mm: memcg: apply proactive reclaim into cgroupv1 > We will not add new features to cgroupv1 in upstream. Thx for point that. If this feature is worth further updating, the next patchset will remove this patch. > >> .../admin-guide/cgroup-v1/memory.rst | 38 +++++- >> Documentation/admin-guide/cgroup-v2.rst | 16 ++- >> include/linux/swap.h | 1 + >> mm/memcontrol.c | 126 ++++++++++++------ >> mm/vmscan.c | 38 +++++- >> 5 files changed, 169 insertions(+), 50 deletions(-) > -- > Best Regards, > Huang, Ying Thanks, Huan Yang
+Wei Xu +David Rientjes On Tue, Nov 7, 2023 at 10:59 PM Huan Yang <link@vivo.com> wrote: > > In some cases, we need to selectively reclaim file pages or anonymous > pages in an unbalanced manner. > > For example, when an application is pushed to the background and frozen, > it may not be opened for a long time, and we can safely reclaim the > application's anonymous pages, but we do not want to touch the file pages. > > This patchset extends the proactive reclaim interface to achieve > unbalanced reclamation. Users can control the reclamation tendency by > inputting swappiness under the original interface. Specifically, users > can input special values to extremely reclaim specific pages. I proposed this a while back: https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/ The takeaway from the discussion was that swappiness is not the right way to do this. We can add separate arguments to specify types of memory to reclaim, as Roman suggested in that thread. I had some patches lying around to do that at some point, I can dig them up if that's helpful, but they are probably based on a very old kernel now, and before MGLRU landed. IIRC it wasn't very difficult, I think I added anon/file/shrinkers bits to struct scan_control and then plumbed them through to memory.reclaim. > > Example: > echo "1G" 200 > memory.reclaim (only reclaim anon) > echo "1G" 0 > memory.reclaim (only reclaim file) > echo "1G" 1 > memory.reclaim (only reclaim file) The type of interface here is nested-keyed, so if we add arguments they need to be in key=value format. Example: echo 1G swappiness=200 > memory.reclaim As I mentioned above though, I don't think swappiness is the right way of doing this. Also, without swappiness, I don't think there's a v1 vs v2 dilemma here. memory.reclaim can work as-is in cgroup v1, it just needs to be exposed there.
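To make the nested-keyed point concrete, here is a minimal sketch of how a userspace agent might issue such a request. The cgroup path is an assumption, and `swappiness=` is only the key=value shape Yosry says such an argument would have to take; it is not implied here to exist upstream. memory.reclaim itself returns -EAGAIN when less than the requested amount could be reclaimed.

```
/*
 * Sketch of issuing a nested-keyed request to memory.reclaim.  The
 * cgroup path and the "swappiness=" key are assumptions for
 * illustration; -EAGAIN from the write means only part of the
 * requested amount was reclaimed.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int memcg_reclaim(const char *cgroup, const char *request)
{
	char path[512];
	int fd, ret = 0;

	snprintf(path, sizeof(path), "%s/memory.reclaim", cgroup);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	if (write(fd, request, strlen(request)) < 0)
		ret = -errno;	/* -EAGAIN: request only partially satisfied */

	close(fd);
	return ret;
}

int main(void)
{
	/* Hypothetical cgroup and request string. */
	int err = memcg_reclaim("/sys/fs/cgroup/frozen-app", "1G swappiness=200");

	if (err)
		fprintf(stderr, "reclaim incomplete: %s\n", strerror(-err));
	return err ? 1 : 0;
}
```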
Huan Yang <link@vivo.com> writes: > HI Huang, Ying > > Thanks for reply. > > 在 2023/11/8 15:35, Huang, Ying 写道: >> Huan Yang <link@vivo.com> writes: >> >>> In some cases, we need to selectively reclaim file pages or anonymous >>> pages in an unbalanced manner. >>> >>> For example, when an application is pushed to the background and frozen, >>> it may not be opened for a long time, and we can safely reclaim the >>> application's anonymous pages, but we do not want to touch the file pages. >>> >>> This patchset extends the proactive reclaim interface to achieve >>> unbalanced reclamation. Users can control the reclamation tendency by >>> inputting swappiness under the original interface. Specifically, users >>> can input special values to extremely reclaim specific pages. >> From mem_cgroup_swappiness(), cgroupv2 doesn't have per-cgroup >> swappiness. So you need to add that firstly? > Sorry for this mistake, we always work on cgroupv1, so, not notice > this commit 4550c4e, thank your for point that. > > I see this commit comment that `that's a different discussion`, but, > to implements this, I will try add. > >> >>> Example: >>> echo "1G" 200 > memory.reclaim (only reclaim anon) >>> echo "1G" 0 > memory.reclaim (only reclaim file) >>> echo "1G" 1 > memory.reclaim (only reclaim file) >>> >>> Note that when performing unbalanced reclamation, the cgroup swappiness >>> will be temporarily adjusted dynamically to the input value. Therefore, >>> if the cgroup swappiness is further modified during runtime, there may >>> be some errors. >> If cgroup swappiness will be adjusted temporarily, why not just change >> it via a script before/after proactive reclaiming? > IMO, this unbalance reclaim only takes effect for a single command, > so if it is pre-set using a script, the judgment of the reclamation tendency > may become complicated. If swappiness == 0, then we will only reclaim file pages. If swappiness == 200, then we may still reclaim file pages. So you need a way to reclaim only anon pages? If so, can we use some special swappiness value to specify that? I don't know whether use 200 will cause regression. If so, we may need some other value, e.g. >= 65536. > So, do you mean avoid use cgroup swappiness, just type anon or file to > control > this extreme unbalanced reclamation? > >> >>> However, this is acceptable because the interface is dynamically called >>> by the user and the timing should be controlled by the user. >>> >>> This patchset did not implement the type-based reclamation as expected >>> in the documentation.(anon or file) Because in addition to extreme unbalanced >>> reclamation, this patchset can also adapt to the reclamation tendency >>> allocated according to swappiness, which is more flexible. >>> -- Best Regards, Huang, Ying
On Wed, Nov 8, 2023 at 12:11 AM Huang, Ying <ying.huang@intel.com> wrote: > > Huan Yang <link@vivo.com> writes: > > > HI Huang, Ying > > > > Thanks for reply. > > > > 在 2023/11/8 15:35, Huang, Ying 写道: > >> Huan Yang <link@vivo.com> writes: > >> > >>> In some cases, we need to selectively reclaim file pages or anonymous > >>> pages in an unbalanced manner. > >>> > >>> For example, when an application is pushed to the background and frozen, > >>> it may not be opened for a long time, and we can safely reclaim the > >>> application's anonymous pages, but we do not want to touch the file pages. > >>> > >>> This patchset extends the proactive reclaim interface to achieve > >>> unbalanced reclamation. Users can control the reclamation tendency by > >>> inputting swappiness under the original interface. Specifically, users > >>> can input special values to extremely reclaim specific pages. > >> From mem_cgroup_swappiness(), cgroupv2 doesn't have per-cgroup > >> swappiness. So you need to add that firstly? > > Sorry for this mistake, we always work on cgroupv1, so, not notice > > this commit 4550c4e, thank your for point that. > > > > I see this commit comment that `that's a different discussion`, but, > > to implements this, I will try add. > > > >> > >>> Example: > >>> echo "1G" 200 > memory.reclaim (only reclaim anon) > >>> echo "1G" 0 > memory.reclaim (only reclaim file) > >>> echo "1G" 1 > memory.reclaim (only reclaim file) > >>> > >>> Note that when performing unbalanced reclamation, the cgroup swappiness > >>> will be temporarily adjusted dynamically to the input value. Therefore, > >>> if the cgroup swappiness is further modified during runtime, there may > >>> be some errors. > >> If cgroup swappiness will be adjusted temporarily, why not just change > >> it via a script before/after proactive reclaiming? > > IMO, this unbalance reclaim only takes effect for a single command, > > so if it is pre-set using a script, the judgment of the reclamation tendency > > may become complicated. > > If swappiness == 0, then we will only reclaim file pages. If swappiness > == 200, then we may still reclaim file pages. So you need a way to > reclaim only anon pages? > > If so, can we use some special swappiness value to specify that? I > don't know whether use 200 will cause regression. If so, we may need > some other value, e.g. >= 65536. I don't think swappiness is the answer here. This has been discussed a while back, please see my response. As you mentioned, swappiness may be ignored by the kernel in some cases, and its behavior has historically changed before.
在 2023/11/8 16:14, Yosry Ahmed 写道: > On Wed, Nov 8, 2023 at 12:11 AM Huang, Ying <ying.huang@intel.com> wrote: >> Huan Yang <link@vivo.com> writes: >> >>> HI Huang, Ying >>> >>> Thanks for reply. >>> >>> 在 2023/11/8 15:35, Huang, Ying 写道: >>>> Huan Yang <link@vivo.com> writes: >>>> >>>>> In some cases, we need to selectively reclaim file pages or anonymous >>>>> pages in an unbalanced manner. >>>>> >>>>> For example, when an application is pushed to the background and frozen, >>>>> it may not be opened for a long time, and we can safely reclaim the >>>>> application's anonymous pages, but we do not want to touch the file pages. >>>>> >>>>> This patchset extends the proactive reclaim interface to achieve >>>>> unbalanced reclamation. Users can control the reclamation tendency by >>>>> inputting swappiness under the original interface. Specifically, users >>>>> can input special values to extremely reclaim specific pages. >>>> From mem_cgroup_swappiness(), cgroupv2 doesn't have per-cgroup >>>> swappiness. So you need to add that firstly? >>> Sorry for this mistake, we always work on cgroupv1, so, not notice >>> this commit 4550c4e, thank your for point that. >>> >>> I see this commit comment that `that's a different discussion`, but, >>> to implements this, I will try add. >>> >>>>> Example: >>>>> echo "1G" 200 > memory.reclaim (only reclaim anon) >>>>> echo "1G" 0 > memory.reclaim (only reclaim file) >>>>> echo "1G" 1 > memory.reclaim (only reclaim file) >>>>> >>>>> Note that when performing unbalanced reclamation, the cgroup swappiness >>>>> will be temporarily adjusted dynamically to the input value. Therefore, >>>>> if the cgroup swappiness is further modified during runtime, there may >>>>> be some errors. >>>> If cgroup swappiness will be adjusted temporarily, why not just change >>>> it via a script before/after proactive reclaiming? >>> IMO, this unbalance reclaim only takes effect for a single command, >>> so if it is pre-set using a script, the judgment of the reclamation tendency >>> may become complicated. >> If swappiness == 0, then we will only reclaim file pages. If swappiness >> == 200, then we may still reclaim file pages. So you need a way to >> reclaim only anon pages? >> >> If so, can we use some special swappiness value to specify that? I >> don't know whether use 200 will cause regression. If so, we may need >> some other value, e.g. >= 65536. > I don't think swappiness is the answer here. This has been discussed a > while back, please see my response. As you mentioned, swappiness may > be ignored by the kernel in some cases, and its behavior has > historically changed before. For type base, reclaim can have direct tendencies as well. It's good. But, what if we only want to make small adjustments to the reclamation ratio? Of course, sometimes swappiness may become ineffective.
On 2023/11/8 16:00, Yosry Ahmed wrote:
> +Wei Xu +David Rientjes
>
> On Tue, Nov 7, 2023 at 10:59 PM Huan Yang <link@vivo.com> wrote:
>> In some cases, we need to selectively reclaim file pages or anonymous
>> pages in an unbalanced manner.
>>
>> For example, when an application is pushed to the background and frozen,
>> it may not be opened for a long time, and we can safely reclaim the
>> application's anonymous pages, but we do not want to touch the file pages.
>>
>> This patchset extends the proactive reclaim interface to achieve
>> unbalanced reclamation. Users can control the reclamation tendency by
>> inputting swappiness under the original interface. Specifically, users
>> can input special values to extremely reclaim specific pages.
> I proposed this a while back:
>
> https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/

Good to know this; proactive reclaim of a single type is useful in our production too.

> The takeaway from the discussion was that swappiness is not the right
> way to do this. We can add separate arguments to specify types of
> memory to reclaim, as Roman suggested in that thread. I had some
> patches lying around to do that at some point, I can dig them up if
> that's helpful, but they are probably based on a very old kernel now,
> and before MGLRU landed. IIRC it wasn't very difficult, I think I
> added anon/file/shrinkers bits to struct scan_control and then plumbed
> them through to memory.reclaim.
>
>> Example:
>> echo "1G" 200 > memory.reclaim (only reclaim anon)
>> echo "1G" 0 > memory.reclaim (only reclaim file)
>> echo "1G" 1 > memory.reclaim (only reclaim file)
> The type of interface here is nested-keyed, so if we add arguments
> they need to be in key=value format. Example:
>
> echo 1G swappiness=200 > memory.reclaim

Yes, this is better.

> As I mentioned above though, I don't think swappiness is the right way
> of doing this. Also, without swappiness, I don't think there's a v1 vs
> v2 dilemma here. memory.reclaim can work as-is in cgroup v1, it just
> needs to be exposed there.

cgroup v1 can't use memory.reclaim today, so how should it be exposed there? By passing the memcg's ID to trigger the reclaim?
On Wed, Nov 8, 2023 at 12:26 AM Huan Yang <link@vivo.com> wrote: > > > 在 2023/11/8 16:00, Yosry Ahmed 写道: > > +Wei Xu +David Rientjes > > > > On Tue, Nov 7, 2023 at 10:59 PM Huan Yang <link@vivo.com> wrote: > >> In some cases, we need to selectively reclaim file pages or anonymous > >> pages in an unbalanced manner. > >> > >> For example, when an application is pushed to the background and frozen, > >> it may not be opened for a long time, and we can safely reclaim the > >> application's anonymous pages, but we do not want to touch the file pages. > >> > >> This patchset extends the proactive reclaim interface to achieve > >> unbalanced reclamation. Users can control the reclamation tendency by > >> inputting swappiness under the original interface. Specifically, users > >> can input special values to extremely reclaim specific pages. > > I proposed this a while back: > > > > https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/ > Well to know this, proactive reclaim single type is usefull in our > production too. > > > > The takeaway from the discussion was that swappiness is not the right > > way to do this. We can add separate arguments to specify types of > > memory to reclaim, as Roman suggested in that thread. I had some > > patches lying around to do that at some point, I can dig them up if > > that's helpful, but they are probably based on a very old kernel now, > > and before MGLRU landed. IIRC it wasn't very difficult, I think I > > added anon/file/shrinkers bits to struct scan_control and then plumbed > > them through to memory.reclaim. > > > >> Example: > >> echo "1G" 200 > memory.reclaim (only reclaim anon) > >> echo "1G" 0 > memory.reclaim (only reclaim file) > >> echo "1G" 1 > memory.reclaim (only reclaim file) > > The type of interface here is nested-keyed, so if we add arguments > > they need to be in key=value format. Example: > > > > echo 1G swappiness=200 > memory.reclaim > Yes, this is better. > > > > As I mentioned above though, I don't think swappiness is the right way > > of doing this. Also, without swappiness, I don't think there's a v1 vs > > v2 dilemma here. memory.reclaim can work as-is in cgroup v1, it just > > needs to be exposed there. > Cgroupv1 can't use memory.reclaim, so, how to exposed it? Reclaim this by > pass memcg's ID? That was mainly about the idea that cgroup v2 does not have per-memcg swappiness, so this proposal seems to be inclined towards v1, at least conceptually. Either way, we need to add memory.reclaim to the v1 files to get it to work on v1. Whether this is acceptable or not is up to the maintainers. I personally don't think it's a problem, it should work as-is for v1.
On Wed, Nov 8, 2023 at 12:22 AM Huan Yang <link@vivo.com> wrote: > > > 在 2023/11/8 16:14, Yosry Ahmed 写道: > > On Wed, Nov 8, 2023 at 12:11 AM Huang, Ying <ying.huang@intel.com> wrote: > >> Huan Yang <link@vivo.com> writes: > >> > >>> HI Huang, Ying > >>> > >>> Thanks for reply. > >>> > >>> 在 2023/11/8 15:35, Huang, Ying 写道: > >>>> Huan Yang <link@vivo.com> writes: > >>>> > >>>>> In some cases, we need to selectively reclaim file pages or anonymous > >>>>> pages in an unbalanced manner. > >>>>> > >>>>> For example, when an application is pushed to the background and frozen, > >>>>> it may not be opened for a long time, and we can safely reclaim the > >>>>> application's anonymous pages, but we do not want to touch the file pages. > >>>>> > >>>>> This patchset extends the proactive reclaim interface to achieve > >>>>> unbalanced reclamation. Users can control the reclamation tendency by > >>>>> inputting swappiness under the original interface. Specifically, users > >>>>> can input special values to extremely reclaim specific pages. > >>>> From mem_cgroup_swappiness(), cgroupv2 doesn't have per-cgroup > >>>> swappiness. So you need to add that firstly? > >>> Sorry for this mistake, we always work on cgroupv1, so, not notice > >>> this commit 4550c4e, thank your for point that. > >>> > >>> I see this commit comment that `that's a different discussion`, but, > >>> to implements this, I will try add. > >>> > >>>>> Example: > >>>>> echo "1G" 200 > memory.reclaim (only reclaim anon) > >>>>> echo "1G" 0 > memory.reclaim (only reclaim file) > >>>>> echo "1G" 1 > memory.reclaim (only reclaim file) > >>>>> > >>>>> Note that when performing unbalanced reclamation, the cgroup swappiness > >>>>> will be temporarily adjusted dynamically to the input value. Therefore, > >>>>> if the cgroup swappiness is further modified during runtime, there may > >>>>> be some errors. > >>>> If cgroup swappiness will be adjusted temporarily, why not just change > >>>> it via a script before/after proactive reclaiming? > >>> IMO, this unbalance reclaim only takes effect for a single command, > >>> so if it is pre-set using a script, the judgment of the reclamation tendency > >>> may become complicated. > >> If swappiness == 0, then we will only reclaim file pages. If swappiness > >> == 200, then we may still reclaim file pages. So you need a way to > >> reclaim only anon pages? > >> > >> If so, can we use some special swappiness value to specify that? I > >> don't know whether use 200 will cause regression. If so, we may need > >> some other value, e.g. >= 65536. > > I don't think swappiness is the answer here. This has been discussed a > > while back, please see my response. As you mentioned, swappiness may > > be ignored by the kernel in some cases, and its behavior has > > historically changed before. > > For type base, reclaim can have direct tendencies as well. It's good. > But, what if > we only want to make small adjustments to the reclamation ratio? > Of course, sometimes swappiness may become ineffective. > Is there a real use case for this? I think it's difficult to reason about swappiness and make small adjustments to the file/anon ratio based on it. I'd prefer a more concrete implementation.
在 2023/11/8 17:00, Yosry Ahmed 写道: > On Wed, Nov 8, 2023 at 12:22 AM Huan Yang <link@vivo.com> wrote: >> >> 在 2023/11/8 16:14, Yosry Ahmed 写道: >>> On Wed, Nov 8, 2023 at 12:11 AM Huang, Ying <ying.huang@intel.com> wrote: >>>> Huan Yang <link@vivo.com> writes: >>>> >>>>> HI Huang, Ying >>>>> >>>>> Thanks for reply. >>>>> >>>>> 在 2023/11/8 15:35, Huang, Ying 写道: >>>>>> Huan Yang <link@vivo.com> writes: >>>>>> >>>>>>> In some cases, we need to selectively reclaim file pages or anonymous >>>>>>> pages in an unbalanced manner. >>>>>>> >>>>>>> For example, when an application is pushed to the background and frozen, >>>>>>> it may not be opened for a long time, and we can safely reclaim the >>>>>>> application's anonymous pages, but we do not want to touch the file pages. >>>>>>> >>>>>>> This patchset extends the proactive reclaim interface to achieve >>>>>>> unbalanced reclamation. Users can control the reclamation tendency by >>>>>>> inputting swappiness under the original interface. Specifically, users >>>>>>> can input special values to extremely reclaim specific pages. >>>>>> From mem_cgroup_swappiness(), cgroupv2 doesn't have per-cgroup >>>>>> swappiness. So you need to add that firstly? >>>>> Sorry for this mistake, we always work on cgroupv1, so, not notice >>>>> this commit 4550c4e, thank your for point that. >>>>> >>>>> I see this commit comment that `that's a different discussion`, but, >>>>> to implements this, I will try add. >>>>> >>>>>>> Example: >>>>>>> echo "1G" 200 > memory.reclaim (only reclaim anon) >>>>>>> echo "1G" 0 > memory.reclaim (only reclaim file) >>>>>>> echo "1G" 1 > memory.reclaim (only reclaim file) >>>>>>> >>>>>>> Note that when performing unbalanced reclamation, the cgroup swappiness >>>>>>> will be temporarily adjusted dynamically to the input value. Therefore, >>>>>>> if the cgroup swappiness is further modified during runtime, there may >>>>>>> be some errors. >>>>>> If cgroup swappiness will be adjusted temporarily, why not just change >>>>>> it via a script before/after proactive reclaiming? >>>>> IMO, this unbalance reclaim only takes effect for a single command, >>>>> so if it is pre-set using a script, the judgment of the reclamation tendency >>>>> may become complicated. >>>> If swappiness == 0, then we will only reclaim file pages. If swappiness >>>> == 200, then we may still reclaim file pages. So you need a way to >>>> reclaim only anon pages? >>>> >>>> If so, can we use some special swappiness value to specify that? I >>>> don't know whether use 200 will cause regression. If so, we may need >>>> some other value, e.g. >= 65536. >>> I don't think swappiness is the answer here. This has been discussed a >>> while back, please see my response. As you mentioned, swappiness may >>> be ignored by the kernel in some cases, and its behavior has >>> historically changed before. >> For type base, reclaim can have direct tendencies as well. It's good. >> But, what if >> we only want to make small adjustments to the reclamation ratio? >> Of course, sometimes swappiness may become ineffective. >> > Is there a real use case for this? I think it's difficult to reason > about swappiness and make small adjustments to the file/anon ratio > based on it. I'd prefer a more concrete implementation. For example, swappiness=170 to try hard reclaim anon, a little pressure to reclaim file(expect reclaim clean file). In theory, this method can help reduce memory pressure. 
Or, for example, reclaiming 80% of the anon pages while trimming only 5% of the code (file) pages is a useful degree of control once an application has been detected as frozen for a period of time.
在 2023/11/8 16:59, Yosry Ahmed 写道: > On Wed, Nov 8, 2023 at 12:26 AM Huan Yang <link@vivo.com> wrote: >> >> 在 2023/11/8 16:00, Yosry Ahmed 写道: >>> +Wei Xu +David Rientjes >>> >>> On Tue, Nov 7, 2023 at 10:59 PM Huan Yang <link@vivo.com> wrote: >>>> In some cases, we need to selectively reclaim file pages or anonymous >>>> pages in an unbalanced manner. >>>> >>>> For example, when an application is pushed to the background and frozen, >>>> it may not be opened for a long time, and we can safely reclaim the >>>> application's anonymous pages, but we do not want to touch the file pages. >>>> >>>> This patchset extends the proactive reclaim interface to achieve >>>> unbalanced reclamation. Users can control the reclamation tendency by >>>> inputting swappiness under the original interface. Specifically, users >>>> can input special values to extremely reclaim specific pages. >>> I proposed this a while back: >>> >>> https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/ >> Well to know this, proactive reclaim single type is usefull in our >> production too. >>> The takeaway from the discussion was that swappiness is not the right >>> way to do this. We can add separate arguments to specify types of >>> memory to reclaim, as Roman suggested in that thread. I had some >>> patches lying around to do that at some point, I can dig them up if >>> that's helpful, but they are probably based on a very old kernel now, >>> and before MGLRU landed. IIRC it wasn't very difficult, I think I >>> added anon/file/shrinkers bits to struct scan_control and then plumbed >>> them through to memory.reclaim. >>> >>>> Example: >>>> echo "1G" 200 > memory.reclaim (only reclaim anon) >>>> echo "1G" 0 > memory.reclaim (only reclaim file) >>>> echo "1G" 1 > memory.reclaim (only reclaim file) >>> The type of interface here is nested-keyed, so if we add arguments >>> they need to be in key=value format. Example: >>> >>> echo 1G swappiness=200 > memory.reclaim >> Yes, this is better. >>> As I mentioned above though, I don't think swappiness is the right way >>> of doing this. Also, without swappiness, I don't think there's a v1 vs >>> v2 dilemma here. memory.reclaim can work as-is in cgroup v1, it just >>> needs to be exposed there. >> Cgroupv1 can't use memory.reclaim, so, how to exposed it? Reclaim this by >> pass memcg's ID? > That was mainly about the idea that cgroup v2 does not have per-memcg > swappiness, so this proposal seems to be inclined towards v1, at least I seem current comments of mem_cgroup_swappiness it is believed that per-memcg swappiness can be added. But, we first need to explain that using swappiness is a very useful way. And in the discussions of your patchset, end that not use it. > conceptually. Either way, we need to add memory.reclaim to the v1 > files to get it to work on v1. Whether this is acceptable or not is up > to the maintainers. I personally don't think it's a problem, it should Yes, but, I understand that cgroup v2 is a trend, so it is understandable that no new interfaces are added to v1. :) Maybe you can promoting the use of cgroupv2 on Android? > work as-is for v1.
On Wed 08-11-23 14:58:11, Huan Yang wrote: > In some cases, we need to selectively reclaim file pages or anonymous > pages in an unbalanced manner. > > For example, when an application is pushed to the background and frozen, > it may not be opened for a long time, and we can safely reclaim the > application's anonymous pages, but we do not want to touch the file pages. Could you explain why? And also why do you need to swap out in that case? > This patchset extends the proactive reclaim interface to achieve > unbalanced reclamation. Users can control the reclamation tendency by > inputting swappiness under the original interface. Specifically, users > can input special values to extremely reclaim specific pages. Other have already touched on this in other replies but v2 doesn't have a per-memcg swappiness > Example: > echo "1G" 200 > memory.reclaim (only reclaim anon) > echo "1G" 0 > memory.reclaim (only reclaim file) > echo "1G" 1 > memory.reclaim (only reclaim file) > > Note that when performing unbalanced reclamation, the cgroup swappiness > will be temporarily adjusted dynamically to the input value. Therefore, > if the cgroup swappiness is further modified during runtime, there may > be some errors. In general this is a bad semantic. The operation shouldn't have side effect that are potentially visible for another operation.
On Wed, 8 Nov 2023 14:58:11 +0800 Huan Yang <link@vivo.com> wrote: > For example, when an application is pushed to the background and frozen, > it may not be opened for a long time, and we can safely reclaim the > application's anonymous pages, but we do not want to touch the file pages. This paragraph is key to the entire patchset and it would benefit from some expanding upon. If the application is dormant, why on earth would we want to evict its text but keep its data around?
On 2023/11/8 22:06, Michal Hocko wrote:
> On Wed 08-11-23 14:58:11, Huan Yang wrote:
>> In some cases, we need to selectively reclaim file pages or anonymous
>> pages in an unbalanced manner.
>>
>> For example, when an application is pushed to the background and frozen,
>> it may not be opened for a long time, and we can safely reclaim the
>> application's anonymous pages, but we do not want to touch the file pages.
> Could you explain why? And also why do you need to swap out in that
> case?

When an application is frozen, it usually means we predict that it will not be
used for a long time. To proactively save some memory, our policy compresses
the application's private data into zram, and it also selects some of the cold
data that is already in zram and swaps it out.

The above operations assume that anonymous pages are private to the
application. After the application is frozen, compressing these pages into
zram saves memory to some extent without a risk of frequent refaults, and the
cost of a refault from zram is lower than that of IO.

>> This patchset extends the proactive reclaim interface to achieve
>> unbalanced reclamation. Users can control the reclamation tendency by
>> inputting swappiness under the original interface. Specifically, users
>> can input special values to extremely reclaim specific pages.
> Other have already touched on this in other replies but v2 doesn't have
> a per-memcg swappiness
>
>> Example:
>> echo "1G" 200 > memory.reclaim (only reclaim anon)
>> echo "1G" 0 > memory.reclaim (only reclaim file)
>> echo "1G" 1 > memory.reclaim (only reclaim file)
>>
>> Note that when performing unbalanced reclamation, the cgroup swappiness
>> will be temporarily adjusted dynamically to the input value. Therefore,
>> if the cgroup swappiness is further modified during runtime, there may
>> be some errors.
> In general this is a bad semantic. The operation shouldn't have side
> effect that are potentially visible for another operation.

So maybe we should pass the swappiness into struct scan_control and keep it
local to a single reclaim pass, so that the cgroup's swappiness itself is
never changed? Or perhaps it is simply a bad idea to use swappiness to
control unbalanced reclaim.

> --
> Michal Hocko
> SUSE Labs
On 2023/11/9 0:14, Andrew Morton wrote:
> On Wed, 8 Nov 2023 14:58:11 +0800 Huan Yang <link@vivo.com> wrote:
>
>> For example, when an application is pushed to the background and frozen,
>> it may not be opened for a long time, and we can safely reclaim the
>> application's anonymous pages, but we do not want to touch the file pages.
> This paragraph is key to the entire patchset and it would benefit from
> some expanding upon.
>
> If the application is dormant, why on earth would we want to evict its
> text but keep its data around?

In fact, we currently use this method to reclaim only an application's
anonymous pages, because we believe the refault cost of reclaiming anonymous
pages is relatively small, especially when zram is used and only the anonymous
pages of frozen applications are reclaimed proactively.
Huan Yang <link@vivo.com> writes: > 在 2023/11/8 22:06, Michal Hocko 写道: >> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >> >> On Wed 08-11-23 14:58:11, Huan Yang wrote: >>> In some cases, we need to selectively reclaim file pages or anonymous >>> pages in an unbalanced manner. >>> >>> For example, when an application is pushed to the background and frozen, >>> it may not be opened for a long time, and we can safely reclaim the >>> application's anonymous pages, but we do not want to touch the file pages. >> Could you explain why? And also why do you need to swap out in that >> case? > When an application is frozen, it usually means that we predict that > it will not be > used for a long time. In order to proactively save some memory, our > strategy will > choose to compress the application's private data into zram. And we > will also > select some of the cold application data that we think is in zram and > swap it out. > > The above operations assume that anonymous pages are private to the > application. If so, is it better only to reclaim private anonymous pages explicitly? Add another option for that? > After the application is frozen, compressing these pages into zram can > save memory > to some extent without worrying about frequent refaults. > > And the cost of refaults on zram is lower than that of IO. If so, swappiness should be high system-wise? -- Best Regards, Huang, Ying > >> >>> This patchset extends the proactive reclaim interface to achieve >>> unbalanced reclamation. Users can control the reclamation tendency by >>> inputting swappiness under the original interface. Specifically, users >>> can input special values to extremely reclaim specific pages. >> Other have already touched on this in other replies but v2 doesn't have >> a per-memcg swappiness >> >>> Example: >>> echo "1G" 200 > memory.reclaim (only reclaim anon) >>> echo "1G" 0 > memory.reclaim (only reclaim file) >>> echo "1G" 1 > memory.reclaim (only reclaim file) >>> >>> Note that when performing unbalanced reclamation, the cgroup swappiness >>> will be temporarily adjusted dynamically to the input value. Therefore, >>> if the cgroup swappiness is further modified during runtime, there may >>> be some errors. >> In general this is a bad semantic. The operation shouldn't have side >> effect that are potentially visible for another operation. > So, maybe pass swappiness into sc and keep a single reclamation ensure that > swappiness is not changed? > Or, it's a bad idea that use swappiness to control unbalance reclaim. >> -- >> Michal Hocko >> SUSE Labs
在 2023/11/9 11:15, Huang, Ying 写道: > [Some people who received this message don't often get email from ying.huang@intel.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > Huan Yang <link@vivo.com> writes: > >> 在 2023/11/8 22:06, Michal Hocko 写道: >>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>> >>> On Wed 08-11-23 14:58:11, Huan Yang wrote: >>>> In some cases, we need to selectively reclaim file pages or anonymous >>>> pages in an unbalanced manner. >>>> >>>> For example, when an application is pushed to the background and frozen, >>>> it may not be opened for a long time, and we can safely reclaim the >>>> application's anonymous pages, but we do not want to touch the file pages. >>> Could you explain why? And also why do you need to swap out in that >>> case? >> When an application is frozen, it usually means that we predict that >> it will not be >> used for a long time. In order to proactively save some memory, our >> strategy will >> choose to compress the application's private data into zram. And we >> will also >> select some of the cold application data that we think is in zram and >> swap it out. >> >> The above operations assume that anonymous pages are private to the >> application. > If so, is it better only to reclaim private anonymous pages explicitly? Yes, in practice, we only proactively compress anonymous pages and do not want to touch file pages. However, I like the phrase "Provide mechanisms, not strategies". Maybe letter zcache can use well, we can also proactively compress certain file pages at a lower cost. So, maybe give a way to only reclaim page cache is good? > Add another option for that? But, yes, I also believe that providing a way to specify the tendency to reclaim anonymous and file types can achieve a certain degree of flexibility. And swappiness-based control is currently not very accurate. > >> After the application is frozen, compressing these pages into zram can >> save memory >> to some extent without worrying about frequent refaults. >> >> And the cost of refaults on zram is lower than that of IO. > If so, swappiness should be high system-wise? Yes, I agree. Swappiness should not be used to control unbalanced reclamation. Moreover, this patchset actually use flags to control unbalanced reclamation. Therefore, the proactive reclamation interface should receive additional options (such as anon, file) instead of swappiness. > > -- > Best Regards, > Huang, Ying > >>>> This patchset extends the proactive reclaim interface to achieve >>>> unbalanced reclamation. Users can control the reclamation tendency by >>>> inputting swappiness under the original interface. Specifically, users >>>> can input special values to extremely reclaim specific pages. >>> Other have already touched on this in other replies but v2 doesn't have >>> a per-memcg swappiness >>> >>>> Example: >>>> echo "1G" 200 > memory.reclaim (only reclaim anon) >>>> echo "1G" 0 > memory.reclaim (only reclaim file) >>>> echo "1G" 1 > memory.reclaim (only reclaim file) >>>> >>>> Note that when performing unbalanced reclamation, the cgroup swappiness >>>> will be temporarily adjusted dynamically to the input value. Therefore, >>>> if the cgroup swappiness is further modified during runtime, there may >>>> be some errors. >>> In general this is a bad semantic. 
The operation shouldn't have side >>> effect that are potentially visible for another operation. >> So, maybe pass swappiness into sc and keep a single reclamation ensure that >> swappiness is not changed? >> Or, it's a bad idea that use swappiness to control unbalance reclaim. >>> -- >>> Michal Hocko >>> SUSE Labs
On Thu 09-11-23 09:56:46, Huan Yang wrote: > > 在 2023/11/8 22:06, Michal Hocko 写道: > > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > On Wed 08-11-23 14:58:11, Huan Yang wrote: > > > In some cases, we need to selectively reclaim file pages or anonymous > > > pages in an unbalanced manner. > > > > > > For example, when an application is pushed to the background and frozen, > > > it may not be opened for a long time, and we can safely reclaim the > > > application's anonymous pages, but we do not want to touch the file pages. > > Could you explain why? And also why do you need to swap out in that > > case? > > When an application is frozen, it usually means that we predict that > it will not be used for a long time. In order to proactively save some > memory, our strategy will choose to compress the application's private > data into zram. And we will also select some of the cold application > data that we think is in zram and swap it out. > > The above operations assume that anonymous pages are private to the > application. After the application is frozen, compressing these pages > into zram can save memory to some extent without worrying about > frequent refaults. Why don't you rely on the default reclaim heuristics? In other words do you have any numbers showing that a selective reclaim results in a much better behavior? How do you evaluate that? > > And the cost of refaults on zram is lower than that of IO. > > > > > > > This patchset extends the proactive reclaim interface to achieve > > > unbalanced reclamation. Users can control the reclamation tendency by > > > inputting swappiness under the original interface. Specifically, users > > > can input special values to extremely reclaim specific pages. > > Other have already touched on this in other replies but v2 doesn't have > > a per-memcg swappiness > > > > > Example: > > > echo "1G" 200 > memory.reclaim (only reclaim anon) > > > echo "1G" 0 > memory.reclaim (only reclaim file) > > > echo "1G" 1 > memory.reclaim (only reclaim file) > > > > > > Note that when performing unbalanced reclamation, the cgroup swappiness > > > will be temporarily adjusted dynamically to the input value. Therefore, > > > if the cgroup swappiness is further modified during runtime, there may > > > be some errors. > > In general this is a bad semantic. The operation shouldn't have side > > effect that are potentially visible for another operation. > So, maybe pass swappiness into sc and keep a single reclamation ensure that > swappiness is not changed? That would be a much saner approach. > Or, it's a bad idea that use swappiness to control unbalance reclaim. Memory reclaim is not really obliged to consider swappiness. In fact the actual behavior has changed several times in the past and it is safer to assume this might change in the future again.
On Thu 09-11-23 11:38:56, Huan Yang wrote: [...] > > If so, is it better only to reclaim private anonymous pages explicitly? > Yes, in practice, we only proactively compress anonymous pages and do not > want to touch file pages. If that is the case and this is mostly application centric (which you seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT) instead.
HI Michal Hocko, Thanks for your suggestion. 在 2023/11/9 17:57, Michal Hocko 写道: > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > On Thu 09-11-23 11:38:56, Huan Yang wrote: > [...] >>> If so, is it better only to reclaim private anonymous pages explicitly? >> Yes, in practice, we only proactively compress anonymous pages and do not >> want to touch file pages. > If that is the case and this is mostly application centric (which you > seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT) > instead. Madvise may not be applicable in this scenario.(IMO) This feature is aimed at a core goal, which is to compress the anonymous pages of frozen applications. How to detect that an application is frozen and determine which pages can be safely reclaimed is the responsibility of the policy part. Setting madvise for an application is an active behavior, while the above policy is a passive approach.(If I misunderstood, please let me know if there is a better way to set madvise.) > -- > Michal Hocko > SUSE Labs
On Thu 09-11-23 18:29:03, Huan Yang wrote: > HI Michal Hocko, > > Thanks for your suggestion. > > 在 2023/11/9 17:57, Michal Hocko 写道: > > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > On Thu 09-11-23 11:38:56, Huan Yang wrote: > > [...] > > > > If so, is it better only to reclaim private anonymous pages explicitly? > > > Yes, in practice, we only proactively compress anonymous pages and do not > > > want to touch file pages. > > If that is the case and this is mostly application centric (which you > > seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT) > > instead. > Madvise may not be applicable in this scenario.(IMO) > > This feature is aimed at a core goal, which is to compress the anonymous > pages > of frozen applications. > > How to detect that an application is frozen and determine which pages can be > safely reclaimed is the responsibility of the policy part. > > Setting madvise for an application is an active behavior, while the above > policy > is a passive approach.(If I misunderstood, please let me know if there is a > better > way to set madvise.) You are proposing an extension to the pro-active reclaim interface so this is an active behavior pretty much by definition. So I am really not following you here. Your agent can simply scan the address space of the application it is going to "freeze" and call pidfd_madvise(MADV_PAGEOUT) on the private memory is that is really what you want/need.
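For context, the interface Michal refers to here is process_madvise(2), which takes a pidfd and applies advice such as MADV_PAGEOUT to another process's address ranges. Below is a minimal, hedged sketch of that route: a real agent would walk /proc/<pid>/maps and build iovecs for the private anonymous mappings, whereas this example takes a single range on the command line for brevity. The fallback constants are assumptions for architectures that use the unified syscall table.

```
/*
 * Sketch of the process_madvise(2) route ("pidfd_madvise(MADV_PAGEOUT)"):
 * ask the kernel to page out one address range of another process.
 * Needs Linux >= 5.10 and suitable privileges (e.g. CAP_SYS_NICE).
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21			/* reclaim these pages */
#endif
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif
#ifndef __NR_process_madvise
#define __NR_process_madvise 440
#endif

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s <pid> <hex-start> <length>\n", argv[0]);
		return 1;
	}

	pid_t pid = (pid_t)atoi(argv[1]);
	struct iovec iov = {
		.iov_base = (void *)strtoull(argv[2], NULL, 16),
		.iov_len  = strtoull(argv[3], NULL, 0),
	};

	int pidfd = (int)syscall(__NR_pidfd_open, pid, 0);
	if (pidfd < 0) {
		perror("pidfd_open");
		return 1;
	}

	/* Ask reclaim to swap out (or compress to zram) just this range. */
	ssize_t done = syscall(__NR_process_madvise, pidfd, &iov, 1,
			       MADV_PAGEOUT, 0);
	if (done < 0)
		perror("process_madvise");
	else
		printf("advised %zd bytes\n", done);

	close(pidfd);
	return done < 0 ? 1 : 0;
}
```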
在 2023/11/9 18:39, Michal Hocko 写道: > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > On Thu 09-11-23 18:29:03, Huan Yang wrote: >> HI Michal Hocko, >> >> Thanks for your suggestion. >> >> 在 2023/11/9 17:57, Michal Hocko 写道: >>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>> >>> On Thu 09-11-23 11:38:56, Huan Yang wrote: >>> [...] >>>>> If so, is it better only to reclaim private anonymous pages explicitly? >>>> Yes, in practice, we only proactively compress anonymous pages and do not >>>> want to touch file pages. >>> If that is the case and this is mostly application centric (which you >>> seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT) >>> instead. >> Madvise may not be applicable in this scenario.(IMO) >> >> This feature is aimed at a core goal, which is to compress the anonymous >> pages >> of frozen applications. >> >> How to detect that an application is frozen and determine which pages can be >> safely reclaimed is the responsibility of the policy part. >> >> Setting madvise for an application is an active behavior, while the above >> policy >> is a passive approach.(If I misunderstood, please let me know if there is a >> better >> way to set madvise.) > You are proposing an extension to the pro-active reclaim interface so > this is an active behavior pretty much by definition. So I am really not > following you here. Your agent can simply scan the address space of the > application it is going to "freeze" and call pidfd_madvise(MADV_PAGEOUT) > on the private memory is that is really what you want/need. There is a key point here. We want to use the grouping policy of memcg to perform proactive reclamation with certain tendencies. Your suggestion is to reclaim memory by scanning the task process space. However, in the mobile field, memory is usually viewed at the granularity of an APP. Therefore, after an APP is frozen, we hope to reclaim memory uniformly according to the pre-grouped APP processes. Of course, as you suggested, madvise can also achieve this, but implementing it in the agent may be more complex.(In terms of achieving the same goal, using memcg to group all the processes of an APP and perform proactive reclamation is simpler than using madvise and scanning multiple processes of an application using an agent?) > > -- > Michal Hocko > SUSE Labs
在 2023/11/9 17:53, Michal Hocko 写道: > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > On Thu 09-11-23 09:56:46, Huan Yang wrote: >> 在 2023/11/8 22:06, Michal Hocko 写道: >>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>> >>> On Wed 08-11-23 14:58:11, Huan Yang wrote: >>>> In some cases, we need to selectively reclaim file pages or anonymous >>>> pages in an unbalanced manner. >>>> >>>> For example, when an application is pushed to the background and frozen, >>>> it may not be opened for a long time, and we can safely reclaim the >>>> application's anonymous pages, but we do not want to touch the file pages. >>> Could you explain why? And also why do you need to swap out in that >>> case? >> When an application is frozen, it usually means that we predict that >> it will not be used for a long time. In order to proactively save some >> memory, our strategy will choose to compress the application's private >> data into zram. And we will also select some of the cold application >> data that we think is in zram and swap it out. >> >> The above operations assume that anonymous pages are private to the >> application. After the application is frozen, compressing these pages >> into zram can save memory to some extent without worrying about >> frequent refaults. > Why don't you rely on the default reclaim heuristics? In other words do As I mentioned earlier, the madvise approach may not be suitable for my needs. > you have any numbers showing that a selective reclaim results in a much In the mobile field, we have a core metric called application residency. This mechanism can help us improve the application residency if we can provide a good freeze detection and proactive reclamation policy. I can only provide specific data from our internal tests, and it may be older data, and it tested using cgroup v1: In 12G ram phone, app residency improve from 29 to 38. > better behavior? How do you evaluate that? > >> And the cost of refaults on zram is lower than that of IO. >> >> >>>> This patchset extends the proactive reclaim interface to achieve >>>> unbalanced reclamation. Users can control the reclamation tendency by >>>> inputting swappiness under the original interface. Specifically, users >>>> can input special values to extremely reclaim specific pages. >>> Other have already touched on this in other replies but v2 doesn't have >>> a per-memcg swappiness >>> >>>> Example: >>>> echo "1G" 200 > memory.reclaim (only reclaim anon) >>>> echo "1G" 0 > memory.reclaim (only reclaim file) >>>> echo "1G" 1 > memory.reclaim (only reclaim file) >>>> >>>> Note that when performing unbalanced reclamation, the cgroup swappiness >>>> will be temporarily adjusted dynamically to the input value. Therefore, >>>> if the cgroup swappiness is further modified during runtime, there may >>>> be some errors. >>> In general this is a bad semantic. The operation shouldn't have side >>> effect that are potentially visible for another operation. >> So, maybe pass swappiness into sc and keep a single reclamation ensure that >> swappiness is not changed? > That would be a much saner approach. > >> Or, it's a bad idea that use swappiness to control unbalance reclaim. > Memory reclaim is not really obliged to consider swappiness. 
> In fact the actual behavior has changed several times in the past and
> it is safer to assume this might change in the future again.

Thank you for the guidance.

> --
> Michal Hocko
> SUSE Labs
On Thu 09-11-23 18:50:36, Huan Yang wrote: > > 在 2023/11/9 18:39, Michal Hocko 写道: > > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > On Thu 09-11-23 18:29:03, Huan Yang wrote: > > > HI Michal Hocko, > > > > > > Thanks for your suggestion. > > > > > > 在 2023/11/9 17:57, Michal Hocko 写道: > > > > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > > > > > On Thu 09-11-23 11:38:56, Huan Yang wrote: > > > > [...] > > > > > > If so, is it better only to reclaim private anonymous pages explicitly? > > > > > Yes, in practice, we only proactively compress anonymous pages and do not > > > > > want to touch file pages. > > > > If that is the case and this is mostly application centric (which you > > > > seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT) > > > > instead. > > > Madvise may not be applicable in this scenario.(IMO) > > > > > > This feature is aimed at a core goal, which is to compress the anonymous > > > pages > > > of frozen applications. > > > > > > How to detect that an application is frozen and determine which pages can be > > > safely reclaimed is the responsibility of the policy part. > > > > > > Setting madvise for an application is an active behavior, while the above > > > policy > > > is a passive approach.(If I misunderstood, please let me know if there is a > > > better > > > way to set madvise.) > > You are proposing an extension to the pro-active reclaim interface so > > this is an active behavior pretty much by definition. So I am really not > > following you here. Your agent can simply scan the address space of the > > application it is going to "freeze" and call pidfd_madvise(MADV_PAGEOUT) > > on the private memory is that is really what you want/need. > > There is a key point here. We want to use the grouping policy of memcg > to perform proactive reclamation with certain tendencies. Your > suggestion is to reclaim memory by scanning the task process space. > However, in the mobile field, memory is usually viewed at the > granularity of an APP. OK, sthis is likely a terminology gap on my end. By application you do not really mean a process but rather a whole cgroup. That would have been really useful to be explicit about. > Therefore, after an APP is frozen, we hope to reclaim memory uniformly > according to the pre-grouped APP processes. > > Of course, as you suggested, madvise can also achieve this, but > implementing it in the agent may be more complex.(In terms of > achieving the same goal, using memcg to group all the processes of an > APP and perform proactive reclamation is simpler than using madvise > and scanning multiple processes of an application using an agent?) It might be more involved but the primary question is whether it is usable for the specific use case. Madvise interface is not LRU aware but you are not really talking about that to be a requirement? So it would really help if you go deeper into details on how is the interface actually supposed to be used in your case. Also make sure to exaplain why you cannot use other existing interfaces. For example, why you simply don't decrease the limit of the frozen cgroup and rely on the normal reclaim process to evict the most cold memory? What are you basing your anon vs. file proportion decision on? 
In other words more details, ideally with some numbers and make sure to describe why existing APIs cannot be used.
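As an aside, the alternative raised above (shrinking the frozen cgroup's limit and letting normal reclaim pick the coldest pages) could be sketched roughly as below. The cgroup path and the 25% squeeze are illustrative assumptions; when a value below current usage is written to `memory.high`, the kernel tries to reclaim down toward it at write time and whenever the group allocates above it.

```c
/*
 * Hedged sketch of the "just lower the limit" alternative: temporarily
 * reduce memory.high on the frozen app's cgroup so normal reclaim
 * evicts its coldest pages, then lift the limit again.
 */
#include <stdio.h>

static long long read_ll(const char *path)
{
	long long v = -1;
	FILE *f = fopen(path, "r");

	if (f) {
		if (fscanf(f, "%lld", &v) != 1)
			v = -1;
		fclose(f);
	}
	return v;
}

static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	const char *cg = "/sys/fs/cgroup/apps/com.example.app"; /* assumed */
	char cur_p[256], high_p[256], buf[32];
	long long cur;

	snprintf(cur_p, sizeof(cur_p), "%s/memory.current", cg);
	snprintf(high_p, sizeof(high_p), "%s/memory.high", cg);

	cur = read_ll(cur_p);
	if (cur < 0)
		return 1;

	/* Squeeze the frozen cgroup to ~75% of its current usage. */
	snprintf(buf, sizeof(buf), "%lld", cur * 3 / 4);
	write_str(high_p, buf);

	/*
	 * The write above already tried to reclaim down to the new value;
	 * lift the limit again so the app is not throttled when it thaws.
	 */
	write_str(high_p, "max");
	return 0;
}
```

As the thread goes on to discuss, this reclaims whatever the normal heuristics pick (file and anon alike), which is exactly the behavior the patchset is trying to bias.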
On Thu 09-11-23 18:55:09, Huan Yang wrote: > > 在 2023/11/9 17:53, Michal Hocko 写道: > > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > On Thu 09-11-23 09:56:46, Huan Yang wrote: > > > 在 2023/11/8 22:06, Michal Hocko 写道: > > > > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > > > > > On Wed 08-11-23 14:58:11, Huan Yang wrote: > > > > > In some cases, we need to selectively reclaim file pages or anonymous > > > > > pages in an unbalanced manner. > > > > > > > > > > For example, when an application is pushed to the background and frozen, > > > > > it may not be opened for a long time, and we can safely reclaim the > > > > > application's anonymous pages, but we do not want to touch the file pages. > > > > Could you explain why? And also why do you need to swap out in that > > > > case? > > > When an application is frozen, it usually means that we predict that > > > it will not be used for a long time. In order to proactively save some > > > memory, our strategy will choose to compress the application's private > > > data into zram. And we will also select some of the cold application > > > data that we think is in zram and swap it out. > > > > > > The above operations assume that anonymous pages are private to the > > > application. After the application is frozen, compressing these pages > > > into zram can save memory to some extent without worrying about > > > frequent refaults. > > Why don't you rely on the default reclaim heuristics? In other words do > As I mentioned earlier, the madvise approach may not be suitable for my > needs. I was asking about default reclaim behavior not madvise here. > > you have any numbers showing that a selective reclaim results in a much > > In the mobile field, we have a core metric called application residency. As already pointed out in other reply, make sure you explain this so that we, who are not active in mobile field, can understand the metric, how it is affected by the tooling relying on this interface. > This mechanism can help us improve the application residency if we can > provide a good freeze detection and proactive reclamation policy. > > I can only provide specific data from our internal tests, and it may > be older data, and it tested using cgroup v1: > > In 12G ram phone, app residency improve from 29 to 38. cgroup v1 is in maintenance mode and new extension would need to pass even a higher feasibility test than v2 based interface. Also make sure that you are testing the current upstream kernel. Also let me stress out that you are proposing an extension to the user visible API and we will have to maintain that for ever. So make sure your justification is solid and understandable.
HI, 在 2023/11/9 20:40, Michal Hocko 写道: > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > On Thu 09-11-23 18:50:36, Huan Yang wrote: >> 在 2023/11/9 18:39, Michal Hocko 写道: >>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>> >>> On Thu 09-11-23 18:29:03, Huan Yang wrote: >>>> HI Michal Hocko, >>>> >>>> Thanks for your suggestion. >>>> >>>> 在 2023/11/9 17:57, Michal Hocko 写道: >>>>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>>>> >>>>> On Thu 09-11-23 11:38:56, Huan Yang wrote: >>>>> [...] >>>>>>> If so, is it better only to reclaim private anonymous pages explicitly? >>>>>> Yes, in practice, we only proactively compress anonymous pages and do not >>>>>> want to touch file pages. >>>>> If that is the case and this is mostly application centric (which you >>>>> seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT) >>>>> instead. >>>> Madvise may not be applicable in this scenario.(IMO) >>>> >>>> This feature is aimed at a core goal, which is to compress the anonymous >>>> pages >>>> of frozen applications. >>>> >>>> How to detect that an application is frozen and determine which pages can be >>>> safely reclaimed is the responsibility of the policy part. >>>> >>>> Setting madvise for an application is an active behavior, while the above >>>> policy >>>> is a passive approach.(If I misunderstood, please let me know if there is a >>>> better >>>> way to set madvise.) >>> You are proposing an extension to the pro-active reclaim interface so >>> this is an active behavior pretty much by definition. So I am really not >>> following you here. Your agent can simply scan the address space of the >>> application it is going to "freeze" and call pidfd_madvise(MADV_PAGEOUT) >>> on the private memory is that is really what you want/need. >> There is a key point here. We want to use the grouping policy of memcg >> to perform proactive reclamation with certain tendencies. Your >> suggestion is to reclaim memory by scanning the task process space. >> However, in the mobile field, memory is usually viewed at the >> granularity of an APP. > OK, sthis is likely a terminology gap on my end. By application you do > not really mean a process but rather a whole cgroup. That would have > been really useful to be explicit about. I'm sorry for the confusion. But, in reality, the example I gave was just the one we use here. In terms of policy, any reasonable method can be chosen to organize cgroups and reclaim memory with certain tendencies. But, let's continue the discussion assuming that memcg is grouped by application to avoid confusion. > >> Therefore, after an APP is frozen, we hope to reclaim memory uniformly >> according to the pre-grouped APP processes. >> >> Of course, as you suggested, madvise can also achieve this, but >> implementing it in the agent may be more complex.(In terms of >> achieving the same goal, using memcg to group all the processes of an >> APP and perform proactive reclamation is simpler than using madvise >> and scanning multiple processes of an application using an agent?) > It might be more involved but the primary question is whether it is > usable for the specific use case. 
> Madvise interface is not LRU aware but you are not really talking about
> that to be a requirement? So it would really help if you go deeper into
> details on how is the interface actually supposed to be used in your
> case.

In the mobile field we usually configure zram to compress anonymous
pages. With zram we can, in effect, expand the usable memory of a device
with limited hardware memory: with proper strategies, an 8GB RAM phone
can approximate the usage of a 12GB phone (or more).

In our strategy, we group memcgs by application. When the agent detects
that an application has entered the background, been frozen, and has not
been used for a long time, it slowly issues commands to reclaim that
application's anonymous pages through this interface, e.g.
`echo memory anon > memory.reclaim`.

> Also make sure to exaplain why you cannot use other existing interfaces.
> For example, why you simply don't decrease the limit of the frozen
> cgroup and rely on the normal reclaim process to evict the most cold

This is a question of reclamation tendency; simply decreasing the limit
of the frozen cgroup cannot achieve it.

> memory? What are you basing your anon vs. file proportion decision on?

When zram is configured and anonymous pages are reclaimed proactively,
the refault probability of anonymous pages is low while an application
stays frozen and is not reopened, and the cost of refaulting from zram is
relatively low.

However, file pages usually have shared properties, so even if an
application is frozen, other processes may still access its file pages.
If a limit is set and the reclamation hits file pages, it will cause a
certain amount of refault I/O, which is costly on mobile devices.

Therefore, we want a proactive reclamation interface that tends to
reclaim only anonymous pages rather than file pages. That way, more
application data can stay resident in the background, and a cold start
can be avoided when the application is reopened. (A cold start means the
application has to reload the required data and reinitialize its running
logic.)

> In other words more details, ideally with some numbers and make sure to
> describe why existing APIs cannot be used.
> --
> Michal Hocko
> SUSE Labs
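A rough sketch of the agent flow described in this message, assuming a hypothetical per-app cgroup path. Note that the `"<size> <swappiness>"` string written to `memory.reclaim` is the syntax proposed by this patchset, not something upstream `memory.reclaim` accepts today; `cgroup.freeze` is the standard cgroup v2 freezer knob.

```c
/*
 * Hedged sketch of the agent flow: freeze the app's cgroup, then ask
 * for anon-biased proactive reclaim using the proposed interface.
 */
#include <stdio.h>

static int cg_write(const char *cg, const char *file, const char *val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", cg, file);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	const char *cg = "/sys/fs/cgroup/apps/com.example.app"; /* assumed */

	/* The app went to the background: freeze the whole cgroup. */
	if (cg_write(cg, "cgroup.freeze", "1"))
		perror("cgroup.freeze");

	/*
	 * Some time later: reclaim 100M with swappiness 200, i.e. anon
	 * only under the semantics proposed in this patchset.
	 */
	if (cg_write(cg, "memory.reclaim", "100M 200"))
		perror("memory.reclaim");

	return 0;
}
```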
在 2023/11/9 20:45, Michal Hocko 写道: > [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > On Thu 09-11-23 18:55:09, Huan Yang wrote: >> 在 2023/11/9 17:53, Michal Hocko 写道: >>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>> >>> On Thu 09-11-23 09:56:46, Huan Yang wrote: >>>> 在 2023/11/8 22:06, Michal Hocko 写道: >>>>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>>>> >>>>> On Wed 08-11-23 14:58:11, Huan Yang wrote: >>>>>> In some cases, we need to selectively reclaim file pages or anonymous >>>>>> pages in an unbalanced manner. >>>>>> >>>>>> For example, when an application is pushed to the background and frozen, >>>>>> it may not be opened for a long time, and we can safely reclaim the >>>>>> application's anonymous pages, but we do not want to touch the file pages. >>>>> Could you explain why? And also why do you need to swap out in that >>>>> case? >>>> When an application is frozen, it usually means that we predict that >>>> it will not be used for a long time. In order to proactively save some >>>> memory, our strategy will choose to compress the application's private >>>> data into zram. And we will also select some of the cold application >>>> data that we think is in zram and swap it out. >>>> >>>> The above operations assume that anonymous pages are private to the >>>> application. After the application is frozen, compressing these pages >>>> into zram can save memory to some extent without worrying about >>>> frequent refaults. >>> Why don't you rely on the default reclaim heuristics? In other words do >> As I mentioned earlier, the madvise approach may not be suitable for my >> needs. > I was asking about default reclaim behavior not madvise here. Sorry for the misunderstand. > >>> you have any numbers showing that a selective reclaim results in a much >> In the mobile field, we have a core metric called application residency. > As already pointed out in other reply, make sure you explain this so > that we, who are not active in mobile field, can understand the metric, > how it is affected by the tooling relying on this interface. OK. > >> This mechanism can help us improve the application residency if we can >> provide a good freeze detection and proactive reclamation policy. >> >> I can only provide specific data from our internal tests, and it may >> be older data, and it tested using cgroup v1: >> >> In 12G ram phone, app residency improve from 29 to 38. > cgroup v1 is in maintenance mode and new extension would need to pass > even a higher feasibility test than v2 based interface. Also make sure > that you are testing the current upstream kernel. OK, if patchset v2 expect, I will change work into cgroup v2 and give test data. > > Also let me stress out that you are proposing an extension to the user > visible API and we will have to maintain that for ever. So make sure > your justification is solid and understandable. Thank you very much for your explanation. Let's focus on these discussions in another email. > -- > Michal Hocko > SUSE Labs
On Thu 09-11-23 21:07:29, Huan Yang wrote: [...] > > > Of course, as you suggested, madvise can also achieve this, but > > > implementing it in the agent may be more complex.(In terms of > > > achieving the same goal, using memcg to group all the processes of an > > > APP and perform proactive reclamation is simpler than using madvise > > > and scanning multiple processes of an application using an agent?) > > It might be more involved but the primary question is whether it is > > usable for the specific use case. Madvise interface is not LRU aware but > > you are not really talking about that to be a requirement? So it would > > really help if you go deeper into details on how is the interface > > actually supposed to be used in your case. > In mobile field, we usually configure zram to compress anonymous page. > We can approximate to expand memory usage with limited hardware memory > by using zram. > > With proper strategies, an 8GB RAM phone can approximate the usage of a 12GB > phone > (or more). > > In our strategy, we group memcg by application. When the agent detects that > an > application has entered the background, then frozen, and has not been used > for a long time, > the agent will slowly issue commands to reclaim the anonymous page of that > application. > > With this interface, `echo memory anon > memory.reclaim` This doesn't really answer my questions above. > > Also make sure to exaplain why you cannot use other existing interfaces. > > For example, why you simply don't decrease the limit of the frozen > > cgroup and rely on the normal reclaim process to evict the most cold > This is a question of reclamation tendency, and simply decreasing the limit > of the frozen cgroup cannot achieve this. Why? > > memory? What are you basing your anon vs. file proportion decision on? > When zram is configured and anonymous pages are reclaimed proactively, the > refault > probability of anonymous pages is low when an application is frozen and not > reopened. > Also, the cost of refaulting from zram is relatively low. > > However, file pages usually have shared properties, so even if an > application is frozen, > other processes may still access the file pages. If a limit is set and the > reclamation encounters > file pages, it will cause a certain amount of refault I/O, which is costly > for mobile devices. Two points here (and the reason why I am repeatedly asking for some data) 1) are you really seeing shared and actively used page cache pages being reclaimed? 2) Is the refault IO really a problem. What kind of storage those phone have that this is more significant than potentially GB of compressed anonymous memory which would need CPU to refaulted back. I mean do you have any actual numbers to show that the default reclaim strategy would lead to a less utilized or less performant system?
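One way to produce the numbers asked for above would be to sample the per-cgroup refault counters around a reclaim pass, roughly as below. The cgroup path is an assumption; `workingset_refault_anon` and `workingset_refault_file` are the cgroup v2 `memory.stat` counters (present on reasonably recent kernels).

```c
/*
 * Hedged sketch: sample workingset refault counters from the app
 * cgroup's memory.stat before and after a proactive reclaim pass plus
 * an app-restart test, so file vs. anon refaults can be compared.
 */
#include <stdio.h>
#include <string.h>

struct refaults {
	unsigned long long anon;
	unsigned long long file;
};

static int read_refaults(const char *cg, struct refaults *r)
{
	char path[256], name[64];
	unsigned long long val;
	FILE *f;

	snprintf(path, sizeof(path), "%s/memory.stat", cg);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fscanf(f, "%63s %llu", name, &val) == 2) {
		if (!strcmp(name, "workingset_refault_anon"))
			r->anon = val;
		else if (!strcmp(name, "workingset_refault_file"))
			r->file = val;
	}
	fclose(f);
	return 0;
}

int main(void)
{
	const char *cg = "/sys/fs/cgroup/apps/com.example.app"; /* assumed */
	struct refaults before = {0}, after = {0};

	read_refaults(cg, &before);
	/* ... run the proactive reclaim pass and the app-restart test ... */
	read_refaults(cg, &after);

	printf("refaults: file +%llu, anon +%llu\n",
	       after.file - before.file, after.anon - before.anon);
	return 0;
}
```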
Huan Yang <link@vivo.com> writes: > 在 2023/11/9 18:39, Michal Hocko 写道: >> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >> >> On Thu 09-11-23 18:29:03, Huan Yang wrote: >>> HI Michal Hocko, >>> >>> Thanks for your suggestion. >>> >>> 在 2023/11/9 17:57, Michal Hocko 写道: >>>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>>> >>>> On Thu 09-11-23 11:38:56, Huan Yang wrote: >>>> [...] >>>>>> If so, is it better only to reclaim private anonymous pages explicitly? >>>>> Yes, in practice, we only proactively compress anonymous pages and do not >>>>> want to touch file pages. >>>> If that is the case and this is mostly application centric (which you >>>> seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT) >>>> instead. >>> Madvise may not be applicable in this scenario.(IMO) >>> >>> This feature is aimed at a core goal, which is to compress the anonymous >>> pages >>> of frozen applications. >>> >>> How to detect that an application is frozen and determine which pages can be >>> safely reclaimed is the responsibility of the policy part. >>> >>> Setting madvise for an application is an active behavior, while the above >>> policy >>> is a passive approach.(If I misunderstood, please let me know if there is a >>> better >>> way to set madvise.) >> You are proposing an extension to the pro-active reclaim interface so >> this is an active behavior pretty much by definition. So I am really not >> following you here. Your agent can simply scan the address space of the >> application it is going to "freeze" and call pidfd_madvise(MADV_PAGEOUT) >> on the private memory is that is really what you want/need. > There is a key point here. We want to use the grouping policy of memcg > to perform > proactive reclamation with certain tendencies. Your suggestion is to > reclaim memory > by scanning the task process space. However, in the mobile field, > memory is usually > viewed at the granularity of an APP. > > Therefore, after an APP is frozen, we hope to reclaim memory uniformly > according > to the pre-grouped APP processes. > > Of course, as you suggested, madvise can also achieve this, but > implementing it in > the agent may be more complex.(In terms of achieving the same goal, > using memcg > to group all the processes of an APP and perform proactive reclamation > is simpler > than using madvise and scanning multiple processes of an application > using an agent?) I still think that it's not too complex to use process_madvise() to do this. For each process of the application, the agent can read /proc/PID/maps to get all anonymous address ranges, then call process_madvise(MADV_PAGEOUT) to reclaim pages. This can even filter out shared anonymous pages. Does this work for you? -- Best Regards, Huang, Ying
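For concreteness, here is a minimal sketch of the suggestion above, assuming headers that expose `SYS_pidfd_open` and `SYS_process_madvise` (Linux 5.10+). It keeps only private mappings with no backing path, which also skips `[heap]`, `[stack]` and shared anonymous segments; a real agent would need to be more careful.

```c
/*
 * Hedged sketch, not the patchset's code: walk /proc/PID/maps, keep
 * only private mappings with no backing file, and page them out with
 * process_madvise(MADV_PAGEOUT).
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21			/* <asm-generic/mman-common.h> */
#endif

/* Collect private anonymous ranges of @pid into @iov, return the count. */
size_t collect_private_anon_ranges(pid_t pid, struct iovec *iov, size_t max)
{
	char path[64], line[512], perms[8], file[256];
	unsigned long start, end;
	size_t n = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/maps", pid);
	f = fopen(path, "r");
	if (!f)
		return 0;
	while (n < max && fgets(line, sizeof(line), f)) {
		file[0] = '\0';
		if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %255s",
			   &start, &end, perms, file) < 3)
			continue;
		if (perms[3] != 'p' || file[0] != '\0')
			continue;	/* not a private anon mapping */
		iov[n].iov_base = (void *)start;
		iov[n].iov_len = end - start;
		n++;
	}
	fclose(f);
	return n;
}

int main(int argc, char **argv)
{
	struct iovec iov[512];
	pid_t pid;
	int pidfd;
	size_t n, i;

	if (argc < 2)
		return 1;
	pid = atoi(argv[1]);
	pidfd = syscall(SYS_pidfd_open, pid, 0);
	n = collect_private_anon_ranges(pid, iov, 512);
	if (pidfd < 0 || n == 0)
		return 1;
	for (i = 0; i < n; i++)		/* one range per call, for clarity */
		if (syscall(SYS_process_madvise, pidfd, &iov[i], 1,
			    MADV_PAGEOUT, 0) < 0)
			perror("process_madvise");
	return 0;
}
```

Each call here passes a single iovec for clarity; more ranges can be batched per call up to the usual iovec limit.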
在 2023/11/10 9:19, Huang, Ying 写道: > [Some people who received this message don't often get email from ying.huang@intel.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > Huan Yang <link@vivo.com> writes: > >> 在 2023/11/9 18:39, Michal Hocko 写道: >>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>> >>> On Thu 09-11-23 18:29:03, Huan Yang wrote: >>>> HI Michal Hocko, >>>> >>>> Thanks for your suggestion. >>>> >>>> 在 2023/11/9 17:57, Michal Hocko 写道: >>>>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>>>> >>>>> On Thu 09-11-23 11:38:56, Huan Yang wrote: >>>>> [...] >>>>>>> If so, is it better only to reclaim private anonymous pages explicitly? >>>>>> Yes, in practice, we only proactively compress anonymous pages and do not >>>>>> want to touch file pages. >>>>> If that is the case and this is mostly application centric (which you >>>>> seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT) >>>>> instead. >>>> Madvise may not be applicable in this scenario.(IMO) >>>> >>>> This feature is aimed at a core goal, which is to compress the anonymous >>>> pages >>>> of frozen applications. >>>> >>>> How to detect that an application is frozen and determine which pages can be >>>> safely reclaimed is the responsibility of the policy part. >>>> >>>> Setting madvise for an application is an active behavior, while the above >>>> policy >>>> is a passive approach.(If I misunderstood, please let me know if there is a >>>> better >>>> way to set madvise.) >>> You are proposing an extension to the pro-active reclaim interface so >>> this is an active behavior pretty much by definition. So I am really not >>> following you here. Your agent can simply scan the address space of the >>> application it is going to "freeze" and call pidfd_madvise(MADV_PAGEOUT) >>> on the private memory is that is really what you want/need. >> There is a key point here. We want to use the grouping policy of memcg >> to perform >> proactive reclamation with certain tendencies. Your suggestion is to >> reclaim memory >> by scanning the task process space. However, in the mobile field, >> memory is usually >> viewed at the granularity of an APP. >> >> Therefore, after an APP is frozen, we hope to reclaim memory uniformly >> according >> to the pre-grouped APP processes. >> >> Of course, as you suggested, madvise can also achieve this, but >> implementing it in >> the agent may be more complex.(In terms of achieving the same goal, >> using memcg >> to group all the processes of an APP and perform proactive reclamation >> is simpler >> than using madvise and scanning multiple processes of an application >> using an agent?) > I still think that it's not too complex to use process_madvise() to do > this. For each process of the application, the agent can read > /proc/PID/maps to get all anonymous address ranges, then call > process_madvise(MADV_PAGEOUT) to reclaim pages. This can even filter > out shared anonymous pages. Does this work for you? Thanks for this suggestion. This way can avoid touch shared anonymous, it's pretty well. But, I have some doubts about this, CPU resources are usually limited in embedded devices, and power consumption must also be taken into consideration. 
If this approach is adopted, the agent has to periodically scan each
frozen application and issue pageout on its address space. Wouldn't the
frequency of that active scanning make it more complex, and less suitable
for embedded devices, than reclaim based on memcg grouping?

In addition, without the LRU it is difficult to reclaim only the
partially cold anonymous data of a frozen application. For example, if I
only want to proactively reclaim 100MB of anonymous pages, the proactive
reclaim interface can rely on the LRU to pick the 100MB of coldest
anonymous pages, whereas madvise cannot do that. (If I have misunderstood
something, please correct me.)

>
> --
> Best Regards,
> Huang, Ying
在 2023/11/9 21:46, Michal Hocko 写道: > On Thu 09-11-23 21:07:29, Huan Yang wrote: > [...] >>>> Of course, as you suggested, madvise can also achieve this, but >>>> implementing it in the agent may be more complex.(In terms of >>>> achieving the same goal, using memcg to group all the processes of an >>>> APP and perform proactive reclamation is simpler than using madvise >>>> and scanning multiple processes of an application using an agent?) >>> It might be more involved but the primary question is whether it is >>> usable for the specific use case. Madvise interface is not LRU aware but >>> you are not really talking about that to be a requirement? So it would >>> really help if you go deeper into details on how is the interface >>> actually supposed to be used in your case. >> In mobile field, we usually configure zram to compress anonymous page. >> We can approximate to expand memory usage with limited hardware memory >> by using zram. >> >> With proper strategies, an 8GB RAM phone can approximate the usage of a 12GB >> phone >> (or more). >> >> In our strategy, we group memcg by application. When the agent detects that >> an >> application has entered the background, then frozen, and has not been used >> for a long time, >> the agent will slowly issue commands to reclaim the anonymous page of that >> application. >> >> With this interface, `echo memory anon > memory.reclaim` > This doesn't really answer my questions above. > >>> Also make sure to exaplain why you cannot use other existing interfaces. >>> For example, why you simply don't decrease the limit of the frozen >>> cgroup and rely on the normal reclaim process to evict the most cold >> This is a question of reclamation tendency, and simply decreasing the limit >> of the frozen cgroup cannot achieve this. > Why? Can I ask how to limit the reclamation to only anonymous pages using the limit? >>> memory? What are you basing your anon vs. file proportion decision on? >> When zram is configured and anonymous pages are reclaimed proactively, the >> refault >> probability of anonymous pages is low when an application is frozen and not >> reopened. >> Also, the cost of refaulting from zram is relatively low. >> >> However, file pages usually have shared properties, so even if an >> application is frozen, >> other processes may still access the file pages. If a limit is set and the >> reclamation encounters >> file pages, it will cause a certain amount of refault I/O, which is costly >> for mobile devices. > Two points here (and the reason why I am repeatedly asking for some > data) 1) are you really seeing shared and actively used page cache pages When we call the current proactive reclamation interface to actively reclaim memory, the debug program can usually observe that file pages are partially reclaimed. However, when we start other APPs for testing(the current reclaimed APP is in the background), the trace shows that there is a lot of block I/O for the background application. > being reclaimed? 2) Is the refault IO really a problem. What kind of > storage those phone have that this is more significant than potentially > GB of compressed anonymous memory which would need CPU to refaulted Phone typically use UFS. > back. I mean do you have any actual numbers to show that the default > reclaim strategy would lead to a less utilized or less performant > system? Also, When the application enters the foreground, the startup speed may be slower. Also trace show that here are a lot of block I/O. 
(usually 1000+ I/O requests and 200+ ms of I/O time). We usually observe
very little block I/O caused by zram refaults; zram is much faster
(read: 1698.39 MB/s, write: 995.109 MB/s) than random reads from storage
(read: 48.1907 MB/s, write: 49.1654 MB/s). These numbers come from
zram-perf, which I modified slightly to also test UFS.

Therefore, if proactive reclamation touches many file pages, the
application may become slow when it is reopened.
Huan Yang <link@vivo.com> writes: > 在 2023/11/10 9:19, Huang, Ying 写道: >> [Some people who received this message don't often get email from ying.huang@intel.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >> >> Huan Yang <link@vivo.com> writes: >> >>> 在 2023/11/9 18:39, Michal Hocko 写道: >>>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>>> >>>> On Thu 09-11-23 18:29:03, Huan Yang wrote: >>>>> HI Michal Hocko, >>>>> >>>>> Thanks for your suggestion. >>>>> >>>>> 在 2023/11/9 17:57, Michal Hocko 写道: >>>>>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>>>>> >>>>>> On Thu 09-11-23 11:38:56, Huan Yang wrote: >>>>>> [...] >>>>>>>> If so, is it better only to reclaim private anonymous pages explicitly? >>>>>>> Yes, in practice, we only proactively compress anonymous pages and do not >>>>>>> want to touch file pages. >>>>>> If that is the case and this is mostly application centric (which you >>>>>> seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT) >>>>>> instead. >>>>> Madvise may not be applicable in this scenario.(IMO) >>>>> >>>>> This feature is aimed at a core goal, which is to compress the anonymous >>>>> pages >>>>> of frozen applications. >>>>> >>>>> How to detect that an application is frozen and determine which pages can be >>>>> safely reclaimed is the responsibility of the policy part. >>>>> >>>>> Setting madvise for an application is an active behavior, while the above >>>>> policy >>>>> is a passive approach.(If I misunderstood, please let me know if there is a >>>>> better >>>>> way to set madvise.) >>>> You are proposing an extension to the pro-active reclaim interface so >>>> this is an active behavior pretty much by definition. So I am really not >>>> following you here. Your agent can simply scan the address space of the >>>> application it is going to "freeze" and call pidfd_madvise(MADV_PAGEOUT) >>>> on the private memory is that is really what you want/need. >>> There is a key point here. We want to use the grouping policy of memcg >>> to perform >>> proactive reclamation with certain tendencies. Your suggestion is to >>> reclaim memory >>> by scanning the task process space. However, in the mobile field, >>> memory is usually >>> viewed at the granularity of an APP. >>> >>> Therefore, after an APP is frozen, we hope to reclaim memory uniformly >>> according >>> to the pre-grouped APP processes. >>> >>> Of course, as you suggested, madvise can also achieve this, but >>> implementing it in >>> the agent may be more complex.(In terms of achieving the same goal, >>> using memcg >>> to group all the processes of an APP and perform proactive reclamation >>> is simpler >>> than using madvise and scanning multiple processes of an application >>> using an agent?) >> I still think that it's not too complex to use process_madvise() to do >> this. For each process of the application, the agent can read >> /proc/PID/maps to get all anonymous address ranges, then call >> process_madvise(MADV_PAGEOUT) to reclaim pages. This can even filter >> out shared anonymous pages. Does this work for you? > > Thanks for this suggestion. This way can avoid touch shared anonymous, it's > pretty well. 
> But, I have some doubts about this, CPU resources are usually limited
> in embedded devices, and power consumption must also be taken into
> consideration.
>
> If this approach is adopted, the agent needs to periodically scan
> frozen applications and set pageout for the address space. Is the
> frequency of this active operation more complex and unsuitable for
> embedded devices compared to reclamation based on memcg grouping
> features?

In a memcg based solution, when will you start the proactive reclaiming?
You can just replace the reclaiming part of the solution from memcg
proactive reclaiming to process_madvise(MADV_PAGEOUT), because you can
get the PIDs in a memcg. Is it possible?

> In addition, without LRU, it is difficult to control the reclamation
> of only partially cold anonymous page data of frozen applications. For
> example, if I only want to proactively reclaim 100MB of anonymous pages
> and issue the proactive reclamation interface, we can use the LRU
> feature to only reclaim 100MB of cold anonymous pages. However, this
> cannot be achieved through madvise.(If I have misunderstood something,
> please correct me.)

IIUC, it should be OK to reclaim all private anonymous pages of an
application in your specific use case? If you really want to restrict
the number of pages reclaimed, that is possible too: you can restrict the
size of the address range passed to process_madvise(MADV_PAGEOUT) and
check the RSS of the application. The accuracy of the number reclaimed
isn't good, but I think it should be OK in practice.

BTW: how do you know the number of pages to be reclaimed proactively in
the memcg proactive reclaiming based solution?

--
Best Regards,
Huang, Ying
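A sketch of the budgeted variant outlined above, reusing the hypothetical `collect_private_anon_ranges()` helper from the earlier process_madvise sketch and checking VmRSS before and after. The accuracy caveat stands: RSS moves for many reasons, so this only gives a rough figure.

```c
/*
 * Hedged sketch: hand at most @budget_bytes of private anonymous
 * ranges to process_madvise(MADV_PAGEOUT) in one pass, then read VmRSS
 * to estimate the effect.
 */
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/* Hypothetical helper from the earlier /proc/PID/maps sketch. */
extern size_t collect_private_anon_ranges(pid_t pid, struct iovec *iov,
					  size_t max);

static long read_vmrss_kb(pid_t pid)
{
	char path[64], line[256];
	long kb = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/status", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
			break;
	fclose(f);
	return kb;
}

int pageout_budget(int pidfd, pid_t pid, size_t budget_bytes)
{
	struct iovec iov[512];
	size_t n = collect_private_anon_ranges(pid, iov, 512);
	size_t i, used = 0;
	long before = read_vmrss_kb(pid), after;

	for (i = 0; i < n && used < budget_bytes; i++) {
		if (used + iov[i].iov_len > budget_bytes)
			iov[i].iov_len = budget_bytes - used; /* trim last */
		used += iov[i].iov_len;
		if (syscall(SYS_process_madvise, pidfd, &iov[i], 1,
			    MADV_PAGEOUT, 0) < 0)
			return -1;
	}
	after = read_vmrss_kb(pid);
	printf("paged out ~%zu bytes, VmRSS %ld kB -> %ld kB\n",
	       used, before, after);
	return 0;
}
```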
在 2023/11/10 12:00, Huang, Ying 写道: > [Some people who received this message don't often get email from ying.huang@intel.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > Huan Yang <link@vivo.com> writes: > >> 在 2023/11/10 9:19, Huang, Ying 写道: >>> [Some people who received this message don't often get email from ying.huang@intel.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>> >>> Huan Yang <link@vivo.com> writes: >>> >>>> 在 2023/11/9 18:39, Michal Hocko 写道: >>>>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>>>> >>>>> On Thu 09-11-23 18:29:03, Huan Yang wrote: >>>>>> HI Michal Hocko, >>>>>> >>>>>> Thanks for your suggestion. >>>>>> >>>>>> 在 2023/11/9 17:57, Michal Hocko 写道: >>>>>>> [Some people who received this message don't often get email from mhocko@suse.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] >>>>>>> >>>>>>> On Thu 09-11-23 11:38:56, Huan Yang wrote: >>>>>>> [...] >>>>>>>>> If so, is it better only to reclaim private anonymous pages explicitly? >>>>>>>> Yes, in practice, we only proactively compress anonymous pages and do not >>>>>>>> want to touch file pages. >>>>>>> If that is the case and this is mostly application centric (which you >>>>>>> seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT) >>>>>>> instead. >>>>>> Madvise may not be applicable in this scenario.(IMO) >>>>>> >>>>>> This feature is aimed at a core goal, which is to compress the anonymous >>>>>> pages >>>>>> of frozen applications. >>>>>> >>>>>> How to detect that an application is frozen and determine which pages can be >>>>>> safely reclaimed is the responsibility of the policy part. >>>>>> >>>>>> Setting madvise for an application is an active behavior, while the above >>>>>> policy >>>>>> is a passive approach.(If I misunderstood, please let me know if there is a >>>>>> better >>>>>> way to set madvise.) >>>>> You are proposing an extension to the pro-active reclaim interface so >>>>> this is an active behavior pretty much by definition. So I am really not >>>>> following you here. Your agent can simply scan the address space of the >>>>> application it is going to "freeze" and call pidfd_madvise(MADV_PAGEOUT) >>>>> on the private memory is that is really what you want/need. >>>> There is a key point here. We want to use the grouping policy of memcg >>>> to perform >>>> proactive reclamation with certain tendencies. Your suggestion is to >>>> reclaim memory >>>> by scanning the task process space. However, in the mobile field, >>>> memory is usually >>>> viewed at the granularity of an APP. >>>> >>>> Therefore, after an APP is frozen, we hope to reclaim memory uniformly >>>> according >>>> to the pre-grouped APP processes. >>>> >>>> Of course, as you suggested, madvise can also achieve this, but >>>> implementing it in >>>> the agent may be more complex.(In terms of achieving the same goal, >>>> using memcg >>>> to group all the processes of an APP and perform proactive reclamation >>>> is simpler >>>> than using madvise and scanning multiple processes of an application >>>> using an agent?) >>> I still think that it's not too complex to use process_madvise() to do >>> this. For each process of the application, the agent can read >>> /proc/PID/maps to get all anonymous address ranges, then call >>> process_madvise(MADV_PAGEOUT) to reclaim pages. 
This can even filter >>> out shared anonymous pages. Does this work for you? >> Thanks for this suggestion. This way can avoid touch shared anonymous, it's >> pretty well. But, I have some doubts about this, CPU resources are >> usually limited in >> embedded devices, and power consumption must also be taken into >> consideration. >> >> If this approach is adopted, the agent needs to periodically scan >> frozen applications >> and set pageout for the address space. Is the frequency of this active >> operation more >> complex and unsuitable for embedded devices compared to reclamation based on >> memcg grouping features? > In memcg based solution, when will you start the proactive reclaiming? > You can just replace the reclaiming part of the solution from memcg > proactive reclaiming to process_madvise(MADV_PAGEOUT). Because you can > get PIDs in a memcg. Is it possible? > >> In addition, without LRU, it is difficult to control the reclamation >> of only partially cold >> anonymous page data of frozen applications. For example, if I only >> want to proactively >> reclaim 100MB of anonymous pages and issue the proactive reclamation >> interface, >> we can use the LRU feature to only reclaim 100MB of cold anonymous pages. >> However, this cannot be achieved through madvise.(If I have >> misunderstood something, >> please correct me.) > IIUC, it should be OK to reclaim all private anonymous pages of an > application in your specific use case? If you really want to restrict This is a gradual process, It will not reclaim all anonymous pages at once. > the number of pages reclaimed, it's possible too. You can restrict the > size of address range to call process_madvise(MADV_PAGEOUT), and check > the RSS of the application. The accuracy of the number reclaimed isn't > good. But I think that it should OK in practice? If you only want to reclaim all anonymous memory, this can indeed be done, and fast. :) > > BTW: how do you know the number of pages to be reclaimed proactively in > memcg proactive reclaiming based solution? One point here is that we are not sure how long the frozen application will be opened, it could be 10 minutes, an hour, or even days. So we need to predict and try, gradually reclaim anonymous pages in proportion, preferably based on the LRU algorithm. For example, if the application has been frozen for 10 minutes, reclaim 5% of anonymous pages; 30min:25%anon, 1hour:75%, 1day:100%. It is even more complicated as it requires adding a mechanism for predicting failure penalties. > > -- > Best Regards, > Huang, Ying
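The escalating schedule sketched in the mail above could look roughly like this on the agent side, assuming the hypothetical per-app cgroup path again. The `anon` counter is read from cgroup v2 `memory.stat`, and the trailing swappiness value in the `memory.reclaim` write is the extension proposed by this patchset, not upstream behavior.

```c
/*
 * Hedged sketch of the time-based schedule: the longer the app has
 * been frozen, the larger the fraction of its anonymous memory the
 * agent asks the kernel to reclaim.
 */
#include <stdio.h>
#include <string.h>

static unsigned long long read_anon_bytes(const char *cg)
{
	char path[256], name[64];
	unsigned long long val, anon = 0;
	FILE *f;

	snprintf(path, sizeof(path), "%s/memory.stat", cg);
	f = fopen(path, "r");
	if (!f)
		return 0;
	while (fscanf(f, "%63s %llu", name, &val) == 2)
		if (!strcmp(name, "anon")) {
			anon = val;
			break;
		}
	fclose(f);
	return anon;
}

/* Thresholds taken from the example in the mail above. */
static int frozen_fraction_pct(unsigned long frozen_secs)
{
	if (frozen_secs >= 24 * 3600)
		return 100;
	if (frozen_secs >= 3600)
		return 75;
	if (frozen_secs >= 30 * 60)
		return 25;
	if (frozen_secs >= 10 * 60)
		return 5;
	return 0;
}

int reclaim_step(const char *cg, unsigned long frozen_secs)
{
	unsigned long long target =
		read_anon_bytes(cg) * frozen_fraction_pct(frozen_secs) / 100;
	char path[256];
	FILE *f;

	if (!target)
		return 0;
	snprintf(path, sizeof(path), "%s/memory.reclaim", cg);
	f = fopen(path, "w");
	if (!f)
		return -1;
	/* Proposed syntax: "<size> <swappiness>"; 200 means anon only. */
	fprintf(f, "%llu 200\n", target);
	return fclose(f);
}
```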
On Fri 10-11-23 11:48:49, Huan Yang wrote: [...] > Also, When the application enters the foreground, the startup speed > may be slower. Also trace show that here are a lot of block I/O. > (usually 1000+ IO count and 200+ms IO Time) We usually observe very > little block I/O caused by zram refault.(read: 1698.39MB/s, write: > 995.109MB/s), usually, it is faster than random disk reads.(read: > 48.1907MB/s write: 49.1654MB/s). This test by zram-perf and I change a > little to test UFS. > > Therefore, if the proactive reclamation encounters many file pages, > the application may become slow when it is opened. OK, this is an interesting information. From the above it seems that storage based IO refaults are order of magnitude more expensive than swap (zram in this case). That means that the memory reclaim should _in general_ prefer anonymous memory reclaim over refaulted page cache, right? Or is there any reason why "frozen" applications are any different in this case? Our traditional interface to control the anon vs. file balance has been swappiness. It is not the best interface and it has its flaws but have you experimented with the global swappiness to express that preference? What were your observations? Please note that the behavior might be really different with different kernel versions so I would really stress out that testing with the current Linus (or akpm) tree is necessary. Anyway, the more I think about that the more I am convinced that explicit anon/file extension for the memory.reclaim interface is just a wrong way to address a more fundamental underlying problem. That is, the default reclaim choice over anon vs file preference should consider the cost of the refaulting IO. This is more a property of the underlying storage than a global characteristic. In other words, say you have mutlitple storages, one that is a network based with a high latency and other that is a local fast SSD. Reclaiming a page backed by the slower storage is going to be more expensive to refault than the one backed by the fast storage. So even page cache pages are not really all the same. It is quite likely that a IO cost aspect is not really easy to integrate into the memory reclaim but it seems to me this is a better way to focus on for a better long term solution. Our existing refaulting infrastructure should help in that respect. Also MGLRU could fit for that purpose better than the traditional LRU based reclaim as the higher generations could be used for more more expensive pages.
On Fri 10-11-23 14:21:17, Huan Yang wrote:
[...]
> > BTW: how do you know the number of pages to be reclaimed proactively in
> > memcg proactive reclaiming based solution?
>
> One point here is that we are not sure how long the frozen application
> will be opened, it could be 10 minutes, an hour, or even days. So we
> need to predict and try, gradually reclaim anonymous pages in
> proportion, preferably based on the LRU algorithm. For example, if
> the application has been frozen for 10 minutes, reclaim 5% of
> anonymous pages; 30min:25%anon, 1hour:75%, 1day:100%. It is even more
> complicated as it requires adding a mechanism for predicting failure
> penalties.

Why would you make your reclaiming decisions based on time rather than
the actual memory demand? I can see how pro-active reclaim could make
head room for an unexpected memory pressure, but applying more pressure
just because of inactivity sounds rather dubious to me TBH. Why can't
you simply wait for the external memory pressure (e.g. from kswapd) to
deal with that based on the demand?
在 2023/11/10 20:32, Michal Hocko 写道: > On Fri 10-11-23 14:21:17, Huan Yang wrote: > [...] >>> BTW: how do you know the number of pages to be reclaimed proactively in >>> memcg proactive reclaiming based solution? >> One point here is that we are not sure how long the frozen application >> will be opened, it could be 10 minutes, an hour, or even days. So we >> need to predict and try, gradually reclaim anonymous pages in >> proportion, preferably based on the LRU algorithm. For example, if >> the application has been frozen for 10 minutes, reclaim 5% of >> anonymous pages; 30min:25%anon, 1hour:75%, 1day:100%. It is even more >> complicated as it requires adding a mechanism for predicting failure >> penalties. > Why would make your reclaiming decisions based on time rather than the > actual memory demand? I can see how a pro-active reclaim could make a > head room for an unexpected memory pressure but applying more pressure > just because of inactivity sound rather dubious to me TBH. Why cannot > you simply wait for the external memory pressure (e.g. from kswapd) to > deal with that based on the demand? Because the current kswapd and direct memory reclamation are a passive memory reclamation based on the watermark, and in the event of triggering these reclamation scenarios, the smoothness of the phone application cannot be guaranteed. (We often observe that when the above reclamation is triggered, there is a delay in the application startup, usually accompanied by block I/O, and some concurrency issues caused by lock design.) To ensure the smoothness of application startup, we have a module in Android called LMKD (formerly known as lowmemorykiller). Based on a certain algorithm, LMKD detects if application startup may be delayed and proactively kills inactive applications. (For example, based on factors such as refault IO and swap usage.) However, this behavior may cause the applications we want to protect to be killed, which will result in users having to wait for them to restart when they are reopened, which may affect the user experience.(For example, if the user wants to reopen the application interface they are working on, or re-enter the order interface they were viewing.) Therefore, the above proactive reclamation interface is designed to compress memory types with minimal cost for upper-layer applications based on reasonable strategies, in order to avoid triggering LMKD or memory reclamation as much as possible, even if it is not balanced.
在 2023/11/10 20:24, Michal Hocko 写道: > On Fri 10-11-23 11:48:49, Huan Yang wrote: > [...] >> Also, When the application enters the foreground, the startup speed >> may be slower. Also trace show that here are a lot of block I/O. >> (usually 1000+ IO count and 200+ms IO Time) We usually observe very >> little block I/O caused by zram refault.(read: 1698.39MB/s, write: >> 995.109MB/s), usually, it is faster than random disk reads.(read: >> 48.1907MB/s write: 49.1654MB/s). This test by zram-perf and I change a >> little to test UFS. >> >> Therefore, if the proactive reclamation encounters many file pages, >> the application may become slow when it is opened. > OK, this is an interesting information. From the above it seems that > storage based IO refaults are order of magnitude more expensive than > swap (zram in this case). That means that the memory reclaim should > _in general_ prefer anonymous memory reclaim over refaulted page cache, > right? Or is there any reason why "frozen" applications are any > different in this case? Frozen applications mean that the application process is no longer active, so once its private anonymous page data is swapped out, the anonymous pages will not be refaulted until the application becomes active again. On the contrary, page caches are usually shared. Even if the application that first read the file is no longer active, other processes may still read the file. Therefore, it is not reasonable to use the proactive reclamation interface to reclaim page caches without considering memory pressure. Then, considering the recycling cost of anonymous pages and page cache, the idea of unbalanced recycling as described above is generated. > > Our traditional interface to control the anon vs. file balance has been > swappiness. It is not the best interface and it has its flaws but > have you experimented with the global swappiness to express that > preference? What were your observations? Please note that the behavior We have tested this part and found that no version of the code has the priority control over swappiness. This means that even if we modify swappiness to 0 or 200, we cannot achieve the goal of unbalanced reclaim if some conditions are not met during the reclaim process. Under certain conditions, we may mistakenly reclaim file pages, and since we usually trigger active reclaim when there is sufficient memory(before LMKD trigger), this will cause higher block IO. This RFC code provide some flags with the highest priority to set reclaim tendencies. Currently, it can only be triggered by the active reclaim interface. > might be really different with different kernel versions so I would > really stress out that testing with the current Linus (or akpm) tree is > necessary. OK, thank you for the reminder. > > Anyway, the more I think about that the more I am convinced that > explicit anon/file extension for the memory.reclaim interface is just a > wrong way to address a more fundamental underlying problem. That is, the > default reclaim choice over anon vs file preference should consider the > cost of the refaulting IO. This is more a property of the underlying > storage than a global characteristic. In other words, say you have > mutlitple storages, one that is a network based with a high latency and > other that is a local fast SSD. Reclaiming a page backed by the slower > storage is going to be more expensive to refault than the one backed by > the fast storage. So even page cache pages are not really all the same. 
>
> It is quite likely that a IO cost aspect is not really easy to integrate
> into the memory reclaim but it seems to me this is a better way to focus
> on for a better long term solution. Our existing refaulting
> infrastructure should help in that respect. Also MGLRU could fit for
> that purpose better than the traditional LRU based reclaim as the higher
> generations could be used for more more expensive pages.

Yes, your insights are very informative. However, until such an algorithm
is in place, I think it is reasonable for the proactive reclaim interface
to offer different reclaim tendencies; that gives the policy layer more
flexibility. For example, on mobile phones we can weigh the combined
impact of refault I/O overhead and LMKD kills when choosing a reclaim
tendency for the proactive reclaim interface.
Huan Yang <link@vivo.com> writes: > 在 2023/11/10 20:24, Michal Hocko 写道: >> On Fri 10-11-23 11:48:49, Huan Yang wrote: >> [...] >>> Also, When the application enters the foreground, the startup speed >>> may be slower. Also trace show that here are a lot of block I/O. >>> (usually 1000+ IO count and 200+ms IO Time) We usually observe very >>> little block I/O caused by zram refault.(read: 1698.39MB/s, write: >>> 995.109MB/s), usually, it is faster than random disk reads.(read: >>> 48.1907MB/s write: 49.1654MB/s). This test by zram-perf and I change a >>> little to test UFS. >>> >>> Therefore, if the proactive reclamation encounters many file pages, >>> the application may become slow when it is opened. >> OK, this is an interesting information. From the above it seems that >> storage based IO refaults are order of magnitude more expensive than >> swap (zram in this case). That means that the memory reclaim should >> _in general_ prefer anonymous memory reclaim over refaulted page cache, >> right? Or is there any reason why "frozen" applications are any >> different in this case? > Frozen applications mean that the application process is no longer active, > so once its private anonymous page data is swapped out, the anonymous > pages will not be refaulted until the application becomes active again. > > On the contrary, page caches are usually shared. Even if the > application that > first read the file is no longer active, other processes may still > read the file. > Therefore, it is not reasonable to use the proactive reclamation > interface to > reclaim page caches without considering memory pressure. No. Not all page caches are shared. For example, the page caches used for use-once streaming IO. And, they should be reclaimed firstly. So, your solution may work good for your specific use cases, but it's not a general solution. Per my understanding, you want to reclaim only private pages to avoid impact the performance of other applications. Privately mapped anonymous pages is easy to be identified (And I suggest that you can find a way to avoid reclaim shared mapped anonymous pages). There's some heuristics to identify use-once page caches in reclaiming code. Why doesn't it work for your situation? [snip] -- Best Regards, Huang, Ying
在 2023/11/13 14:10, Huang, Ying 写道: > Huan Yang <link@vivo.com> writes: > >> 在 2023/11/10 20:24, Michal Hocko 写道: >>> On Fri 10-11-23 11:48:49, Huan Yang wrote: >>> [...] >>>> Also, When the application enters the foreground, the startup speed >>>> may be slower. Also trace show that here are a lot of block I/O. >>>> (usually 1000+ IO count and 200+ms IO Time) We usually observe very >>>> little block I/O caused by zram refault.(read: 1698.39MB/s, write: >>>> 995.109MB/s), usually, it is faster than random disk reads.(read: >>>> 48.1907MB/s write: 49.1654MB/s). This test by zram-perf and I change a >>>> little to test UFS. >>>> >>>> Therefore, if the proactive reclamation encounters many file pages, >>>> the application may become slow when it is opened. >>> OK, this is an interesting information. From the above it seems that >>> storage based IO refaults are order of magnitude more expensive than >>> swap (zram in this case). That means that the memory reclaim should >>> _in general_ prefer anonymous memory reclaim over refaulted page cache, >>> right? Or is there any reason why "frozen" applications are any >>> different in this case? >> Frozen applications mean that the application process is no longer active, >> so once its private anonymous page data is swapped out, the anonymous >> pages will not be refaulted until the application becomes active again. >> >> On the contrary, page caches are usually shared. Even if the >> application that >> first read the file is no longer active, other processes may still >> read the file. >> Therefore, it is not reasonable to use the proactive reclamation >> interface to >> reclaim page caches without considering memory pressure. > No. Not all page caches are shared. For example, the page caches used > for use-once streaming IO. And, they should be reclaimed firstly. Yes, but this part is done very well in MGLRU and does not require our intervention. Moreover, the reclaim speed of clean files is very fast, but compared to it, the reclaim speed of anonymous pages is a bit slower. > > So, your solution may work good for your specific use cases, but it's Yes, this approach is not universal. > not a general solution. Per my understanding, you want to reclaim only > private pages to avoid impact the performance of other applications. > Privately mapped anonymous pages is easy to be identified (And I suggest > that you can find a way to avoid reclaim shared mapped anonymous pages). Yes, it is not good to reclaim shared anonymous pages, and it needs to be identified. In the future, we will consider how to filter them. Thanks. > There's some heuristics to identify use-once page caches in reclaiming > code. Why doesn't it work for your situation? As mentioned above, the default reclaim algorithm is suitable for recycling file pages, but we do not need to intervene in it. Direct reclaim or kswapd of these use-once file pages is very fast and will not cause lag or other effects. Our overall goal is to actively and reasonably compress unused anonymous pages based on certain strategies, in order to increase available memory to a certain extent, avoid lag, and prevent applications from being killed. Therefore, using the proactive reclaim interface, combined with LRU algorithm and reclaim tendencies, is a good way to achieve our goal. > > [snip] > > -- > Best Regards, > Huang, Ying
Huan Yang <link@vivo.com> writes: > 在 2023/11/13 14:10, Huang, Ying 写道: >> Huan Yang <link@vivo.com> writes: >> >>> 在 2023/11/10 20:24, Michal Hocko 写道: >>>> On Fri 10-11-23 11:48:49, Huan Yang wrote: >>>> [...] >>>>> Also, When the application enters the foreground, the startup speed >>>>> may be slower. Also trace show that here are a lot of block I/O. >>>>> (usually 1000+ IO count and 200+ms IO Time) We usually observe very >>>>> little block I/O caused by zram refault.(read: 1698.39MB/s, write: >>>>> 995.109MB/s), usually, it is faster than random disk reads.(read: >>>>> 48.1907MB/s write: 49.1654MB/s). This test by zram-perf and I change a >>>>> little to test UFS. >>>>> >>>>> Therefore, if the proactive reclamation encounters many file pages, >>>>> the application may become slow when it is opened. >>>> OK, this is an interesting information. From the above it seems that >>>> storage based IO refaults are order of magnitude more expensive than >>>> swap (zram in this case). That means that the memory reclaim should >>>> _in general_ prefer anonymous memory reclaim over refaulted page cache, >>>> right? Or is there any reason why "frozen" applications are any >>>> different in this case? >>> Frozen applications mean that the application process is no longer active, >>> so once its private anonymous page data is swapped out, the anonymous >>> pages will not be refaulted until the application becomes active again. >>> >>> On the contrary, page caches are usually shared. Even if the >>> application that >>> first read the file is no longer active, other processes may still >>> read the file. >>> Therefore, it is not reasonable to use the proactive reclamation >>> interface to >>> reclaim page caches without considering memory pressure. >> No. Not all page caches are shared. For example, the page caches used >> for use-once streaming IO. And, they should be reclaimed firstly. > Yes, but this part is done very well in MGLRU and does not require our > intervention. > Moreover, the reclaim speed of clean files is very fast, but compared to it, > the reclaim speed of anonymous pages is a bit slower. >> >> So, your solution may work good for your specific use cases, but it's > Yes, this approach is not universal. >> not a general solution. Per my understanding, you want to reclaim only >> private pages to avoid impact the performance of other applications. >> Privately mapped anonymous pages is easy to be identified (And I suggest >> that you can find a way to avoid reclaim shared mapped anonymous pages). > Yes, it is not good to reclaim shared anonymous pages, and it needs to be > identified. In the future, we will consider how to filter them. > Thanks. >> There's some heuristics to identify use-once page caches in reclaiming >> code. Why doesn't it work for your situation? > As mentioned above, the default reclaim algorithm is suitable for recycling > file pages, but we do not need to intervene in it. > Direct reclaim or kswapd of these use-once file pages is very fast and will > not cause lag or other effects. > Our overall goal is to actively and reasonably compress unused anonymous > pages based on certain strategies, in order to increase available memory to > a certain extent, avoid lag, and prevent applications from being killed. > Therefore, using the proactive reclaim interface, combined with LRU > algorithm > and reclaim tendencies, is a good way to achieve our goal. If so, why can't you just use the proactive reclaim with some large enough swappiness? 
That will reclaim use-once page caches and compress anonymous pages. So, more applications can be kept in memory before passive reclaiming or killing background applications? -- Best Regards, Huang, Ying
在 2023/11/13 16:05, Huang, Ying 写道: > Huan Yang <link@vivo.com> writes: > >> 在 2023/11/13 14:10, Huang, Ying 写道: >>> Huan Yang <link@vivo.com> writes: >>> >>>> 在 2023/11/10 20:24, Michal Hocko 写道: >>>>> On Fri 10-11-23 11:48:49, Huan Yang wrote: >>>>> [...] >>>>>> Also, When the application enters the foreground, the startup speed >>>>>> may be slower. Also trace show that here are a lot of block I/O. >>>>>> (usually 1000+ IO count and 200+ms IO Time) We usually observe very >>>>>> little block I/O caused by zram refault.(read: 1698.39MB/s, write: >>>>>> 995.109MB/s), usually, it is faster than random disk reads.(read: >>>>>> 48.1907MB/s write: 49.1654MB/s). This test by zram-perf and I change a >>>>>> little to test UFS. >>>>>> >>>>>> Therefore, if the proactive reclamation encounters many file pages, >>>>>> the application may become slow when it is opened. >>>>> OK, this is an interesting information. From the above it seems that >>>>> storage based IO refaults are order of magnitude more expensive than >>>>> swap (zram in this case). That means that the memory reclaim should >>>>> _in general_ prefer anonymous memory reclaim over refaulted page cache, >>>>> right? Or is there any reason why "frozen" applications are any >>>>> different in this case? >>>> Frozen applications mean that the application process is no longer active, >>>> so once its private anonymous page data is swapped out, the anonymous >>>> pages will not be refaulted until the application becomes active again. >>>> >>>> On the contrary, page caches are usually shared. Even if the >>>> application that >>>> first read the file is no longer active, other processes may still >>>> read the file. >>>> Therefore, it is not reasonable to use the proactive reclamation >>>> interface to >>>> reclaim page caches without considering memory pressure. >>> No. Not all page caches are shared. For example, the page caches used >>> for use-once streaming IO. And, they should be reclaimed firstly. >> Yes, but this part is done very well in MGLRU and does not require our >> intervention. >> Moreover, the reclaim speed of clean files is very fast, but compared to it, >> the reclaim speed of anonymous pages is a bit slower. >>> So, your solution may work good for your specific use cases, but it's >> Yes, this approach is not universal. >>> not a general solution. Per my understanding, you want to reclaim only >>> private pages to avoid impact the performance of other applications. >>> Privately mapped anonymous pages is easy to be identified (And I suggest >>> that you can find a way to avoid reclaim shared mapped anonymous pages). >> Yes, it is not good to reclaim shared anonymous pages, and it needs to be >> identified. In the future, we will consider how to filter them. >> Thanks. >>> There's some heuristics to identify use-once page caches in reclaiming >>> code. Why doesn't it work for your situation? >> As mentioned above, the default reclaim algorithm is suitable for recycling >> file pages, but we do not need to intervene in it. >> Direct reclaim or kswapd of these use-once file pages is very fast and will >> not cause lag or other effects. >> Our overall goal is to actively and reasonably compress unused anonymous >> pages based on certain strategies, in order to increase available memory to >> a certain extent, avoid lag, and prevent applications from being killed. >> Therefore, using the proactive reclaim interface, combined with LRU >> algorithm >> and reclaim tendencies, is a good way to achieve our goal. 
> If so, why can't you just use the proactive reclaim with some large > enough swappiness? That will reclaim use-once page caches and compress This works very well for proactive memory reclaim that is only executed once. However, we need to perform proactive reclaim in batches: suppose that the reclaimable use-once page cache in this memcg amounts to only 5%, while we call proactive memory reclaim step by step, such as 5%, 10%, 15% ... 100%. Then, even after that 5% of use-once pages has been reclaimed, more page cache may be reclaimed due to the balancing adjustment of reclamation, and we may still touch shared file pages. (If I misunderstood anything, please correct me.) We previously used the two swappiness values 200 and 0 to adjust reclaim tendencies. However, the debug interface showed that some file pages were still reclaimed, and after the proactive reclaim, some of the reclaimed applications showed block IO and startup lag when reopened. Such incomplete control over the process may not be suitable for proactive memory reclaim. Instead, with a proactive reclaim interface with tendencies, we can issue a 5% page cache trim once and then gradually reclaim anonymous pages. > anonymous pages. So, more applications can be kept in memory before > passive reclaiming or killing background applications? > > -- > Best Regards, > Huang, Ying
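To make the staged usage described above concrete, here is a minimal userspace sketch. It assumes the "<size> <swappiness>" request format proposed by this patchset (not the stock memory.reclaim syntax), and the cgroup path and step sizes are hypothetical.

```c
/*
 * Minimal sketch of staged proactive reclaim with tendencies.
 * Assumption: the "<size> <swappiness>" request format is the extension
 * proposed by this patchset; /sys/fs/cgroup/frozen-app is a made-up memcg.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int reclaim(const char *memcg, const char *request)
{
	char path[256];
	int fd, ret = 0;

	snprintf(path, sizeof(path), "%s/memory.reclaim", memcg);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	/* memory.reclaim fails the write (EAGAIN) if the full amount
	 * could not be reclaimed. */
	if (write(fd, request, strlen(request)) < 0)
		ret = -1;
	close(fd);
	return ret;
}

int main(void)
{
	const char *memcg = "/sys/fs/cgroup/frozen-app";

	reclaim(memcg, "16M 0");	/* one small file-only trim */
	reclaim(memcg, "32M 200");	/* then staged anon-only passes */
	reclaim(memcg, "64M 200");
	return 0;
}
```

A policy daemon could scale the per-pass sizes with how long the application has been frozen, matching the 5%/10%/15% stepping described in the message above.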
On Mon 13-11-23 10:17:57, Huan Yang wrote: > > 在 2023/11/10 20:24, Michal Hocko 写道: > > On Fri 10-11-23 11:48:49, Huan Yang wrote: > > [...] > > > Also, When the application enters the foreground, the startup speed > > > may be slower. Also trace show that here are a lot of block I/O. > > > (usually 1000+ IO count and 200+ms IO Time) We usually observe very > > > little block I/O caused by zram refault.(read: 1698.39MB/s, write: > > > 995.109MB/s), usually, it is faster than random disk reads.(read: > > > 48.1907MB/s write: 49.1654MB/s). This test by zram-perf and I change a > > > little to test UFS. > > > > > > Therefore, if the proactive reclamation encounters many file pages, > > > the application may become slow when it is opened. > > OK, this is an interesting information. From the above it seems that > > storage based IO refaults are order of magnitude more expensive than > > swap (zram in this case). That means that the memory reclaim should > > _in general_ prefer anonymous memory reclaim over refaulted page cache, > > right? Or is there any reason why "frozen" applications are any > > different in this case? > Frozen applications mean that the application process is no longer active, > so once its private anonymous page data is swapped out, the anonymous > pages will not be refaulted until the application becomes active again. I was probably not clear in my question. It is quite clear that frozen applications are inactive. It is not really clear why they should be treated any differently though. Their memory will be naturally cold as the memory is not in use so why cannot we realy on the standard memory reclaim to deal with the implicit inactivity and you need to handle that explicitly? [...] > > Our traditional interface to control the anon vs. file balance has been > > swappiness. It is not the best interface and it has its flaws but > > have you experimented with the global swappiness to express that > > preference? What were your observations? Please note that the behavior > We have tested this part and found that no version of the code has the > priority control over swappiness. > > This means that even if we modify swappiness to 0 or 200, > we cannot achieve the goal of unbalanced reclaim if some conditions > are not met during the reclaim process. Under certain conditions, > we may mistakenly reclaim file pages, and since we usually trigger > active reclaim when there is sufficient memory(before LMKD trigger), > this will cause higher block IO. Yes there are heuristics which might override the global swappinness but have you investigated those cases and can show that those heuristics could be changed? [...] > > It is quite likely that a IO cost aspect is not really easy to integrate > > into the memory reclaim but it seems to me this is a better way to focus > > on for a better long term solution. Our existing refaulting > > infrastructure should help in that respect. Also MGLRU could fit for > > that purpose better than the traditional LRU based reclaim as the higher > > generations could be used for more more expensive pages. > > Yes, your insights are very informative. > > However, before our algorithm is perfected, I think it is reasonable > to provide different reclaim tendencies for the active reclaim > interface. This will provide greater flexibility for the strategy > layer. Flexibility is really nice but it comes with a price and interface cost can be really high. 
There were several attempts to make memory reclaim LRU-type specific, but I still maintain my opinion that this is not really a good abstraction. As stated above, even page cache is not all the same. A more future-proof interface should really consider the IO refault cost rather than a flat anon/file distinction.
On Mon 13-11-23 16:26:00, Huan Yang wrote: [...] > However, considering that we need to perform proactive reclaim in batches, > suppose that only 5% of the use-once page cache in this memcg can be > reclaimed, > but we need to call proactive memory reclaim step by step, such as 5%, 10%, > 15% ... 100%. You haven't really explained this, and I have asked several times IIRC. Why do you even need to do those batches? Why can't you simply rely on memory pressure triggering the memory reclaim? Do you have any actual numbers showing that being pro-active results in smaller latencies, or anything else that would show this is actually needed?
On Tue 14-11-23 10:54:05, Michal Hocko wrote: > On Mon 13-11-23 16:26:00, Huan Yang wrote: > [...] > > However, considering that we need to perform proactive reclaim in batches, > > suppose that only 5% of the use-once page cache in this memcg can be > > reclaimed, > > but we need to call proactive memory reclaim step by step, such as 5%, 10%, > > 15% ... 100%. > > You haven't really explained this and I have asked several times IIRC. > Why do you even need to do those batches? Why cannot you simply relly on > the memory pressure triggering the memory reclaim? Do you have any > actual numbers showing that being pro-active results in smaller > latencies or anything that would show this is actually needed? Just noticed dcd2eff8-400b-4ade-a5b2-becfe26b437b@vivo.com, will reply there.
On Mon 13-11-23 09:54:55, Huan Yang wrote: > > 在 2023/11/10 20:32, Michal Hocko 写道: > > On Fri 10-11-23 14:21:17, Huan Yang wrote: > > [...] > > > > BTW: how do you know the number of pages to be reclaimed proactively in > > > > memcg proactive reclaiming based solution? > > > One point here is that we are not sure how long the frozen application > > > will be opened, it could be 10 minutes, an hour, or even days. So we > > > need to predict and try, gradually reclaim anonymous pages in > > > proportion, preferably based on the LRU algorithm. For example, if > > > the application has been frozen for 10 minutes, reclaim 5% of > > > anonymous pages; 30min:25%anon, 1hour:75%, 1day:100%. It is even more > > > complicated as it requires adding a mechanism for predicting failure > > > penalties. > > Why would make your reclaiming decisions based on time rather than the > > actual memory demand? I can see how a pro-active reclaim could make a > > head room for an unexpected memory pressure but applying more pressure > > just because of inactivity sound rather dubious to me TBH. Why cannot > > you simply wait for the external memory pressure (e.g. from kswapd) to > > deal with that based on the demand? > Because the current kswapd and direct memory reclamation are a passive > memory reclamation based on the watermark, and in the event of triggering > these reclamation scenarios, the smoothness of the phone application cannot > be guaranteed. OK, so you are worried about latencies on spike memory usage. > (We often observe that when the above reclamation is triggered, there > is a delay in the application startup, usually accompanied by block > I/O, and some concurrency issues caused by lock design.) Does that mean you do not have enough head room for kswapd to keep with the memory demand? It is really hard to discuss this without some actual numbers or more specifics. > To ensure the smoothness of application startup, we have a module in > Android called LMKD (formerly known as lowmemorykiller). Based on a > certain algorithm, LMKD detects if application startup may be delayed > and proactively kills inactive applications. (For example, based on > factors such as refault IO and swap usage.) > > However, this behavior may cause the applications we want to protect > to be killed, which will result in users having to wait for them to > restart when they are reopened, which may affect the user > experience.(For example, if the user wants to reopen the application > interface they are working on, or re-enter the order interface they > were viewing.) This suggests that your LMKD doesn't pick up the right victim to kill. And I suspect this is a fundamental problem of those pro-active oom killer solutions. > Therefore, the above proactive reclamation interface is designed to > compress memory types with minimal cost for upper-layer applications > based on reasonable strategies, in order to avoid triggering LMKD or > memory reclamation as much as possible, even if it is not balanced. This would suggest that MADV_PAGEOUT is really what you are looking for. If you really aim at compressing a specific type of memory then tweking reclaim to achieve that sounds like a shortcut because madvise based solution is more involved. But that is not a solid justification for adding a new interface.
在 2023/11/14 18:04, Michal Hocko 写道: > On Mon 13-11-23 09:54:55, Huan Yang wrote: >> 在 2023/11/10 20:32, Michal Hocko 写道: >>> On Fri 10-11-23 14:21:17, Huan Yang wrote: >>> [...] >>>>> BTW: how do you know the number of pages to be reclaimed proactively in >>>>> memcg proactive reclaiming based solution? >>>> One point here is that we are not sure how long the frozen application >>>> will be opened, it could be 10 minutes, an hour, or even days. So we >>>> need to predict and try, gradually reclaim anonymous pages in >>>> proportion, preferably based on the LRU algorithm. For example, if >>>> the application has been frozen for 10 minutes, reclaim 5% of >>>> anonymous pages; 30min:25%anon, 1hour:75%, 1day:100%. It is even more >>>> complicated as it requires adding a mechanism for predicting failure >>>> penalties. >>> Why would make your reclaiming decisions based on time rather than the >>> actual memory demand? I can see how a pro-active reclaim could make a >>> head room for an unexpected memory pressure but applying more pressure >>> just because of inactivity sound rather dubious to me TBH. Why cannot >>> you simply wait for the external memory pressure (e.g. from kswapd) to >>> deal with that based on the demand? >> Because the current kswapd and direct memory reclamation are a passive >> memory reclamation based on the watermark, and in the event of triggering >> these reclamation scenarios, the smoothness of the phone application cannot >> be guaranteed. > OK, so you are worried about latencies on spike memory usage. > >> (We often observe that when the above reclamation is triggered, there >> is a delay in the application startup, usually accompanied by block >> I/O, and some concurrency issues caused by lock design.) > Does that mean you do not have enough head room for kswapd to keep with Yes, but if set high watermark a little high, the power consumption will be very high. We usually observe that kswapd will run frequently. Even if we have set a low kswapd water level, kswapd CPU usage can still be high in some extreme scenarios.(For example, when starting a large application that needs to acquire a large amount of memory in a short period of time. )However, we will not discuss it in detail here, the reasons are quite complex, and we have not yet sorted out a complete understanding of them. > the memory demand? It is really hard to discuss this without some actual > numbers or more specifics. > >> To ensure the smoothness of application startup, we have a module in >> Android called LMKD (formerly known as lowmemorykiller). Based on a >> certain algorithm, LMKD detects if application startup may be delayed >> and proactively kills inactive applications. (For example, based on >> factors such as refault IO and swap usage.) >> >> However, this behavior may cause the applications we want to protect >> to be killed, which will result in users having to wait for them to >> restart when they are reopened, which may affect the user >> experience.(For example, if the user wants to reopen the application >> interface they are working on, or re-enter the order interface they >> were viewing.) > This suggests that your LMKD doesn't pick up the right victim to kill. > And I suspect this is a fundamental problem of those pro-active oom Yes, but, our current LMKD configuration is already very conservative, which can cause lag in some scenarios, but we will not analyze the reasons in detail here. > killer solutions. 
> >> Therefore, the above proactive reclamation interface is designed to >> compress memory types with minimal cost for upper-layer applications >> based on reasonable strategies, in order to avoid triggering LMKD or >> memory reclamation as much as possible, even if it is not balanced. > This would suggest that MADV_PAGEOUT is really what you are looking for. Yes, I agree, especially since it would avoid reclaiming shared anonymous pages. However, I did some shallow research and found that MADV_PAGEOUT does not reclaim pages with mapcount != 1. Our applications are usually composed of multiple processes, and some anonymous pages are shared among them. When the application is frozen, the memory that is shared only among the processes within the application should be released, but MADV_PAGEOUT seems not to be suitable for this scenario? (If I misunderstood anything, please correct me.) In addition, I still suspect that this approach will consume a lot of resources in the strategy layer, but it is worth studying. Thanks. > If you really aim at compressing a specific type of memory then tweking > reclaim to achieve that sounds like a shortcut because madvise based > solution is more involved. But that is not a solid justification for > adding a new interface. Yes, but this RFC just adds an additional configuration option to the proactive reclaim interface, and the reclaim path gives priority to requests that carry a reclaim tendency. The `unlikely` checks this adds should not have much impact.
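For context, a minimal sketch of the madvise(MADV_PAGEOUT) route being discussed, applied by a process to its own private anonymous mapping; the mapping size is arbitrary and this is an illustration, not the patchset's code.

```c
/* Minimal sketch of madvise(MADV_PAGEOUT) on a private anonymous range.
 * Error handling is trimmed for brevity; the 64 MiB size is arbitrary. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21		/* reclaim these pages (Linux 5.4+) */
#endif

int main(void)
{
	size_t len = 64 << 20;	/* 64 MiB of private anonymous memory */
	char *buf;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 0x5a, len);	/* fault the pages in */

	/*
	 * Ask the kernel to reclaim (swap out) this range. As noted in the
	 * thread, pages mapped into more than one process are skipped, so
	 * only this process's private copies go out.
	 */
	if (madvise(buf, len, MADV_PAGEOUT))
		perror("madvise(MADV_PAGEOUT)");
	return 0;
}
```

In the frozen-application case a manager such as LMKD would more realistically use process_madvise(2) (Linux 5.10+) on ranges read from the target's /proc/<pid>/maps, which needs additional privileges (CAP_SYS_NICE) over the target; per the discussion above, pages mapped by more than one process would still be skipped.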
On Tue 14-11-23 20:37:07, Huan Yang wrote: > > 在 2023/11/14 18:04, Michal Hocko 写道: > > On Mon 13-11-23 09:54:55, Huan Yang wrote: > > > 在 2023/11/10 20:32, Michal Hocko 写道: > > > > On Fri 10-11-23 14:21:17, Huan Yang wrote: > > > > [...] > > > > > > BTW: how do you know the number of pages to be reclaimed proactively in > > > > > > memcg proactive reclaiming based solution? > > > > > One point here is that we are not sure how long the frozen application > > > > > will be opened, it could be 10 minutes, an hour, or even days. So we > > > > > need to predict and try, gradually reclaim anonymous pages in > > > > > proportion, preferably based on the LRU algorithm. For example, if > > > > > the application has been frozen for 10 minutes, reclaim 5% of > > > > > anonymous pages; 30min:25%anon, 1hour:75%, 1day:100%. It is even more > > > > > complicated as it requires adding a mechanism for predicting failure > > > > > penalties. > > > > Why would make your reclaiming decisions based on time rather than the > > > > actual memory demand? I can see how a pro-active reclaim could make a > > > > head room for an unexpected memory pressure but applying more pressure > > > > just because of inactivity sound rather dubious to me TBH. Why cannot > > > > you simply wait for the external memory pressure (e.g. from kswapd) to > > > > deal with that based on the demand? > > > Because the current kswapd and direct memory reclamation are a passive > > > memory reclamation based on the watermark, and in the event of triggering > > > these reclamation scenarios, the smoothness of the phone application cannot > > > be guaranteed. > > OK, so you are worried about latencies on spike memory usage. > > > > > (We often observe that when the above reclamation is triggered, there > > > is a delay in the application startup, usually accompanied by block > > > I/O, and some concurrency issues caused by lock design.) > > Does that mean you do not have enough head room for kswapd to keep with > > Yes, but if set high watermark a little high, the power consumption > will be very high. We usually observe that kswapd will run > frequently. Even if we have set a low kswapd water level, kswapd CPU > usage can still be high in some extreme scenarios.(For example, when > starting a large application that needs to acquire a large amount of > memory in a short period of time.)However, we will not discuss it in > detail here, the reasons are quite complex, and we have not yet sorted > out a complete understanding of them. This is definitely worth investigating further before resorting to proposing a new interface. If the kswapd consumes CPU cycles unproductively then we should look into why. If there is a big peak memory demand then that surely requires CPU capacity for the memory reclaim. The work has to be done, whether that is in kswapd or the pro-active reclaimer context. I can imagine the latter one could be invoked with a better timing in mind but that is not a trivial thing to do. There are examples where this could be driven by PSI feedback loop but from what you have mention earlier you are doing a idle time based reclaim. Anyway, this is mostly a tuning related discussion. I wanted to learn more about what you are trying to achieve and so far it seems to me you are trying to workaround some issues and a) we would like to learn about those issues and b) a new interface is unlikely a good fit to paper over a suboptimal behavior. > > This would suggest that MADV_PAGEOUT is really what you are looking > > for. 
> > Yes, I agree, especially to avoid reclaiming shared anonymous pages. > > However, I did some shallow research and found that MADV_PAGEOUT does > not reclaim pages with mapcount != 1. Our applications are usually > composed of multiple processes, and some anonymous pages are shared > among them. When the application is frozen, the memory that is only > shared among the processes within the application should be released, > but MADV_PAGEOUT seems not to be suitable for this scenario?(If I > misunderstood anything, please correct me.) Hmm, OK it seems that we are hitting some terminology problems. The discussion was about private memory so far (essentially MAP_PRIVATE); now you are talking about shared anonymous memory. That would imply shmem, and that is indeed not supported by MADV_PAGEOUT. The reason is that it poses a security risk for time-based attacks. I can imagine, though, that we could extend the behavior to support shared mappings if they do not cross a security boundary (e.g. mapped by the same user). This would require some analysis though. > In addition, I still have doubts that this approach will consume a lot > of strategy resources, but it is worth studying. > > If you really aim at compressing a specific type of memory then > > tweking reclaim to achieve that sounds like a shortcut because > > madvise based solution is more involved. But that is not a solid > > justification for adding a new interface. > Yes, but this RFC is just adding an additional configuration option to > the proactive reclaim interface. And in the reclaim path, prioritize > processing these requests with reclaim tendencies. However, using > `unlikely` judgment should not have much impact. Just adding a configuration option means a user interface contract that needs to be maintained forever. Our future reclaim algorithm might change (and in fact it has already changed quite a bit with MGLRU), and an explicit request for LRU-type-specific reclaim might not even make any sense then. See that point?
在 2023/11/14 21:03, Michal Hocko 写道: > On Tue 14-11-23 20:37:07, Huan Yang wrote: >> 在 2023/11/14 18:04, Michal Hocko 写道: >>> On Mon 13-11-23 09:54:55, Huan Yang wrote: >>>> 在 2023/11/10 20:32, Michal Hocko 写道: >>>>> On Fri 10-11-23 14:21:17, Huan Yang wrote: >>>>> [...] >>>>>>> BTW: how do you know the number of pages to be reclaimed proactively in >>>>>>> memcg proactive reclaiming based solution? >>>>>> One point here is that we are not sure how long the frozen application >>>>>> will be opened, it could be 10 minutes, an hour, or even days. So we >>>>>> need to predict and try, gradually reclaim anonymous pages in >>>>>> proportion, preferably based on the LRU algorithm. For example, if >>>>>> the application has been frozen for 10 minutes, reclaim 5% of >>>>>> anonymous pages; 30min:25%anon, 1hour:75%, 1day:100%. It is even more >>>>>> complicated as it requires adding a mechanism for predicting failure >>>>>> penalties. >>>>> Why would make your reclaiming decisions based on time rather than the >>>>> actual memory demand? I can see how a pro-active reclaim could make a >>>>> head room for an unexpected memory pressure but applying more pressure >>>>> just because of inactivity sound rather dubious to me TBH. Why cannot >>>>> you simply wait for the external memory pressure (e.g. from kswapd) to >>>>> deal with that based on the demand? >>>> Because the current kswapd and direct memory reclamation are a passive >>>> memory reclamation based on the watermark, and in the event of triggering >>>> these reclamation scenarios, the smoothness of the phone application cannot >>>> be guaranteed. >>> OK, so you are worried about latencies on spike memory usage. >>> >>>> (We often observe that when the above reclamation is triggered, there >>>> is a delay in the application startup, usually accompanied by block >>>> I/O, and some concurrency issues caused by lock design.) >>> Does that mean you do not have enough head room for kswapd to keep with >> Yes, but if set high watermark a little high, the power consumption >> will be very high. We usually observe that kswapd will run >> frequently. Even if we have set a low kswapd water level, kswapd CPU >> usage can still be high in some extreme scenarios.(For example, when >> starting a large application that needs to acquire a large amount of >> memory in a short period of time.)However, we will not discuss it in >> detail here, the reasons are quite complex, and we have not yet sorted >> out a complete understanding of them. > This is definitely worth investigating further before resorting to > proposing a new interface. If the kswapd consumes CPU cycles > unproductively then we should look into why. Yes, this is my current research objective. > > If there is a big peak memory demand then that surely requires CPU > capacity for the memory reclaim. The work has to be done, whether that > is in kswapd or the pro-active reclaimer context. I can imagine the > latter one could be invoked with a better timing in mind but that is not > a trivial thing to do. There are examples where this could be driven by > PSI feedback loop but from what you have mention earlier you are doing a > idle time based reclaim. Anyway, this is mostly a tuning related > discussion. I wanted to learn more about what you are trying to achieve > and so far it seems to me you are trying to workaround some issues and > a) we would like to learn about those issues and b) a new interface is > unlikely a good fit to paper over a suboptimal behavior. 
Our current research goal is to find a workable dynamic balance between the latency cost of passive memory reclamation and the application deaths caused by proactively killing processes. The current strategy is to use proactive memory reclamation to intervene in this process. As mentioned earlier, by proactively reclaiming anonymous pages that are deemed safe to reclaim, we can increase the currently available memory, avoid lag when starting new applications, and prevent the death of resident applications. Through the previous discussions, it seems that we have reached a consensus: although the proactive memory reclamation interface can achieve this goal, it is not the best approach. Using madvise can both build on existing mechanisms to achieve this goal and decide whether to reclaim based on the characteristics of the anon VMA, especially the anon_vma name that has been set. Therefore, I will also push for internal research on this approach. > >>> This would suggest that MADV_PAGEOUT is really what you are looking >>> for. >> Yes, I agree, especially to avoid reclaiming shared anonymous pages. >> >> However, I did some shallow research and found that MADV_PAGEOUT does >> not reclaim pages with mapcount != 1. Our applications are usually >> composed of multiple processes, and some anonymous pages are shared >> among them. When the application is frozen, the memory that is only >> shared among the processes within the application should be released, >> but MADV_PAGEOUT seems not to be suitable for this scenario?(If I >> misunderstood anything, please correct me.) > Hmm, OK it seems that we are hitting some terminology problems. The > discussion was about private memory so far (essentially MAP_PRIVATE) > now you are talking about a shared anonymous memory. That would imply > shmem and that is indeed not supported by MADV_PAGEOUT. The reason for > that is that this poses a security risk for time based attacks. I can > imagine, though, that we could extend the behavior to support shared > mappings if they do not cross a security boundary (e.g. mapped by the > same user). This would require some analysis though. OK, thanks. I have communicated with our internal team and found that this part of the memory usage will not be particularly large. > >> In addition, I still have doubts that this approach will consume a lot >> of strategy resources, but it is worth studying. >>> If you really aim at compressing a specific type of memory then >>> tweking reclaim to achieve that sounds like a shortcut because >>> madvise based solution is more involved. But that is not a solid >>> justification for adding a new interface. >> Yes, but this RFC is just adding an additional configuration option to >> the proactive reclaim interface. And in the reclaim path, prioritize >> processing these requests with reclaim tendencies. However, using >> `unlikely` judgment should not have much impact. > Just adding an adding configuration option means user interface contract > that needs to be maintained for ever. Our future reclaim algorithm migh > change (and in fact it has already changed quite a bit with MGLRU) and > explicit request for LRU type specific reclaim might not even have any > sense. See that point? Yes, I get it. This also means that if the reclaim algorithm changes, the current implementation of tendencies will need to be modified accordingly, which carries an ongoing maintenance cost. If the current implementation of tendencies cannot prove its necessity, it should remain a topic for deeper research.
This solution may be the simpler way for me to achieve our internal goals, but it may not be the best one. So, MADV_PAGEOUT is worth researching. This conversation was very beneficial for me. Thank you all very much.
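As a side note on the anon_vma name idea mentioned in the message above: anonymous VMAs can be labeled from userspace so a policy daemon can later find them in /proc/<pid>/maps and target them with (process_)madvise(MADV_PAGEOUT). A minimal sketch, assuming a kernel with CONFIG_ANON_VMA_NAME (Linux 5.17+); the region size and the name "app_cache_heap" are made up for illustration.

```c
/* Minimal sketch: name a private anonymous region so it can be picked
 * out of /proc/<pid>/maps later. Assumes CONFIG_ANON_VMA_NAME. */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#ifndef PR_SET_VMA
#define PR_SET_VMA		0x53564d41
#define PR_SET_VMA_ANON_NAME	0
#endif

int main(void)
{
	size_t len = 16 << 20;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
		  (unsigned long)p, len, (unsigned long)"app_cache_heap"))
		perror("prctl(PR_SET_VMA)");
	/* The range now shows up as [anon:app_cache_heap] in
	 * /proc/self/maps, so a policy daemon can decide whether this
	 * particular range is safe to page out. */
	return 0;
}
```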
Huan Yang <link@vivo.com> writes: > 在 2023/11/13 16:05, Huang, Ying 写道: >> Huan Yang <link@vivo.com> writes: >> >>> 在 2023/11/13 14:10, Huang, Ying 写道: >>>> Huan Yang <link@vivo.com> writes: >>>> >>>>> 在 2023/11/10 20:24, Michal Hocko 写道: >>>>>> On Fri 10-11-23 11:48:49, Huan Yang wrote: >>>>>> [...] >>>>>>> Also, When the application enters the foreground, the startup speed >>>>>>> may be slower. Also trace show that here are a lot of block I/O. >>>>>>> (usually 1000+ IO count and 200+ms IO Time) We usually observe very >>>>>>> little block I/O caused by zram refault.(read: 1698.39MB/s, write: >>>>>>> 995.109MB/s), usually, it is faster than random disk reads.(read: >>>>>>> 48.1907MB/s write: 49.1654MB/s). This test by zram-perf and I change a >>>>>>> little to test UFS. >>>>>>> >>>>>>> Therefore, if the proactive reclamation encounters many file pages, >>>>>>> the application may become slow when it is opened. >>>>>> OK, this is an interesting information. From the above it seems that >>>>>> storage based IO refaults are order of magnitude more expensive than >>>>>> swap (zram in this case). That means that the memory reclaim should >>>>>> _in general_ prefer anonymous memory reclaim over refaulted page cache, >>>>>> right? Or is there any reason why "frozen" applications are any >>>>>> different in this case? >>>>> Frozen applications mean that the application process is no longer active, >>>>> so once its private anonymous page data is swapped out, the anonymous >>>>> pages will not be refaulted until the application becomes active again. >>>>> >>>>> On the contrary, page caches are usually shared. Even if the >>>>> application that >>>>> first read the file is no longer active, other processes may still >>>>> read the file. >>>>> Therefore, it is not reasonable to use the proactive reclamation >>>>> interface to >>>>> reclaim page caches without considering memory pressure. >>>> No. Not all page caches are shared. For example, the page caches used >>>> for use-once streaming IO. And, they should be reclaimed firstly. >>> Yes, but this part is done very well in MGLRU and does not require our >>> intervention. >>> Moreover, the reclaim speed of clean files is very fast, but compared to it, >>> the reclaim speed of anonymous pages is a bit slower. >>>> So, your solution may work good for your specific use cases, but it's >>> Yes, this approach is not universal. >>>> not a general solution. Per my understanding, you want to reclaim only >>>> private pages to avoid impact the performance of other applications. >>>> Privately mapped anonymous pages is easy to be identified (And I suggest >>>> that you can find a way to avoid reclaim shared mapped anonymous pages). >>> Yes, it is not good to reclaim shared anonymous pages, and it needs to be >>> identified. In the future, we will consider how to filter them. >>> Thanks. >>>> There's some heuristics to identify use-once page caches in reclaiming >>>> code. Why doesn't it work for your situation? >>> As mentioned above, the default reclaim algorithm is suitable for recycling >>> file pages, but we do not need to intervene in it. >>> Direct reclaim or kswapd of these use-once file pages is very fast and will >>> not cause lag or other effects. >>> Our overall goal is to actively and reasonably compress unused anonymous >>> pages based on certain strategies, in order to increase available memory to >>> a certain extent, avoid lag, and prevent applications from being killed. 
>>> Therefore, using the proactive reclaim interface, combined with LRU >>> algorithm >>> and reclaim tendencies, is a good way to achieve our goal. >> If so, why can't you just use the proactive reclaim with some large >> enough swappiness? That will reclaim use-once page caches and compress > This works very well for proactive memory reclaim that is only > executed once. > However, considering that we need to perform proactive reclaim in batches, > suppose that only 5% of the use-once page cache in this memcg can be > reclaimed, > but we need to call proactive memory reclaim step by step, such as 5%, > 10%, 15% ... 100%. > Then, the page cache may be reclaimed due to the balancing adjustment > of reclamation, > even if the 5% of use-once pages are reclaimed. We may still touch on > shared file pages. > (If I misunderstood anything, please correct me.) If the proactive reclaim amount is less than the size of anonymous pages, I think that you are safe. For example, suppose the size of anonymous pages is 100MB, the size of use-once file pages is 10MB, and the size of shared file pages is 20MB. Then, if you reclaim 100MB proactively with swappiness=200, you will reclaim the 10MB of use-once file pages and 90MB of anonymous pages. The next time, if you reclaim 10MB proactively, you will still not reclaim shared file pages. > We previously used the two values of modifying swappiness to 200 and 0 > to adjust reclaim > tendencies. However, the debug interface showed that some file pages > were reclaimed, > and after being actively reclaimed, some applications and the reopened > applications that were > reclaimed had some block IO and startup lag. If so, please research why the use-once file page heuristics do not work, and try to fix it or raise the issue. > This way of having incomplete control over the process maybe is not > suitable for proactive memory > reclaim. Instead, with an proactive reclaim interface with tendencies, > we can issue a > 5% page cache trim once and then gradually reclaim anonymous pages. >> anonymous pages. So, more applications can be kept in memory before >> passive reclaiming or killing background applications? -- Best Regards, Huang, Ying