[v3,0/2] mm: tlb swap entries batch async release

Message ID 20240805153639.1057-1-justinjiang@vivo.com (mailing list archive)

zhiguojiang Aug. 5, 2024, 3:36 p.m. UTC
One of the main reasons a process with an independent mm takes a long
time to exit is the time spent releasing its swap entries. The share of
the process's memory that sits in swap grows over time, because high
memory pressure triggers reclaim of anonymous folios into swap space.
On Android devices, for example, we found that this share can reach 60%
or more after a period of time. The relatively long code path for
releasing swap entries further increases the release time.

Testing Platform: 8GB RAM
Testing procedure:
After booting, start 15 processes, then observe the memory occupied by
the last launched process at different points in time.
Example: the last launched process is com.qiyi.video
|  memory type  |  0min  |  1min  |   5min  |   10min  |   15min  |
-------------------------------------------------------------------
|     VmRSS(KB) | 453832 | 252300 |  204364 |   199944 |  199748  |
|   RssAnon(KB) | 247348 |  99296 |   71268 |    67808 |   67660  |
|   RssFile(KB) | 205536 | 152020 |  132144 |   131184 |  131136  |
|  RssShmem(KB) |   1048 |    984 |     952 |     952  |     952  |
|    VmSwap(KB) | 202692 | 334852 |  362880 |   366340 |  366488  |
| Swap ratio(%) | 30.87% | 57.03% |  63.97% |   64.69% |  64.72%  |
Note: min - minute.
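
The "Swap ratio" row is consistent with VmSwap / (VmRSS + VmSwap). That
formula is inferred from the numbers in the table and is not stated in
the original text. A minimal C sketch of the arithmetic, using a
hypothetical helper name:

```c
#include <assert.h>

/*
 * Swap ratio in percent, assuming ratio = VmSwap / (VmRSS + VmSwap).
 * Both inputs are in KB, as reported in /proc/<pid>/status.
 * swap_ratio_percent() is a hypothetical helper, not part of the patch.
 */
static double swap_ratio_percent(unsigned long vm_rss_kb,
                                 unsigned long vm_swap_kb)
{
    unsigned long total_kb = vm_rss_kb + vm_swap_kb;

    /* A process with neither resident nor swapped memory has no ratio. */
    assert(total_kb > 0);
    return 100.0 * (double)vm_swap_kb / (double)total_kb;
}
```

With the 0min column (VmRSS 453832 KB, VmSwap 202692 KB) this gives
about 30.87%, and with the 15min column about 64.72%, matching the table.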

When the system is under high memory pressure and many processes with
independent mm are running, launching a process that needs a large
amount of memory is likely to trigger the near-simultaneous killing of
many of those processes. The exiting processes then run concurrently
and occupy multiple CPU cores, which causes problems such as lag in
important processes that are not exiting.

To solve this problem, we introduce an asynchronous swap entry release
mechanism for multiple exiting processes. It isolates and caches the
swap entries held by the exiting processes and hands them over to an
asynchronous kworker, which completes the release. This allows the
exiting processes to finish quickly and free their CPU resources. We
have validated this change on Android products and achieved the
expected benefits.
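
The hand-off described above can be illustrated with a small userspace
model. This is only a sketch of the data flow (cache entries in batches,
queue the batches, let a separate worker drain them). It is not the code
in mm/mmu_gather.c: every name below is hypothetical, the batch size is
arbitrary, and the locking and workqueue scheduling that a kernel
implementation would need are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH_CAPACITY 4 /* hypothetical; kept tiny for illustration */

/* One batch of swap entries cached by an exiting process. */
struct swap_batch {
    struct swap_batch *next;
    size_t nr;
    unsigned long entries[BATCH_CAPACITY];
};

/* Shared queue drained by the worker (a real one would need a lock). */
struct release_queue {
    struct swap_batch *head;
    struct swap_batch *tail;
};

/* Per-exiting-process state: the batch currently being filled. */
struct exit_ctx {
    struct release_queue *q;
    struct swap_batch *cur;
};

/* Exit path: hand the current batch, full or partial, to the queue. */
static void flush_batch(struct exit_ctx *ctx)
{
    struct swap_batch *b = ctx->cur;

    if (!b)
        return;
    b->next = NULL;
    if (ctx->q->tail)
        ctx->q->tail->next = b;
    else
        ctx->q->head = b;
    ctx->q->tail = b;
    ctx->cur = NULL;
}

/*
 * Exit path: cache one entry instead of releasing it inline.
 * Returns 0 on success, -1 if no batch could be allocated (a caller
 * would then fall back to releasing the entry synchronously).
 */
static int cache_entry(struct exit_ctx *ctx, unsigned long entry)
{
    if (!ctx->cur) {
        ctx->cur = calloc(1, sizeof(*ctx->cur));
        if (!ctx->cur)
            return -1;
    }
    assert(ctx->cur->nr < BATCH_CAPACITY);
    ctx->cur->entries[ctx->cur->nr++] = entry;
    if (ctx->cur->nr == BATCH_CAPACITY)
        flush_batch(ctx);
    return 0;
}

/* Worker: detach all queued batches and release them. Returns the count. */
static size_t worker_drain(struct release_queue *q)
{
    struct swap_batch *b = q->head;
    size_t freed = 0;

    q->head = NULL;
    q->tail = NULL;
    while (b) {
        struct swap_batch *next = b->next;

        freed += b->nr; /* stand-in for the real per-entry release */
        free(b);
        b = next;
    }
    return freed;
}
```

In this model an exiting process only pays for an array store and an
occasional list append per entry, and it calls flush_batch() once at the
end to hand over its last partial batch. The expensive release work is
concentrated in worker_drain(), which runs in a single context rather
than in every exiting process.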

Testing Platform: 8GB RAM
Testing procedure:
After restarting the machine, start 15 app processes, then start the
camera app processes and measure their cold start time and preview
time.

Cold start time of the camera processes (unit: milliseconds):
|  seq   |   1  |   2  |   3  |   4  |   5  |   6  | average |
| before | 1498 | 1476 | 1741 | 1337 | 1367 | 1655 |   1512  |
| after  | 1396 | 1107 | 1136 | 1178 | 1071 | 1339 |   1204  |

Preview time of the camera processes (unit: milliseconds):
|  seq   |   1  |   2  |   3  |   4  |   5  |   6  | average |
| before |  267 |  402 |  504 |  513 |  161 |  265 |   352   |
| after  |  188 |  223 |  301 |  203 |  162 |  154 |   205   |

Based on the averages of the six runs above, the patch gives the
following improvements:
1. The cold start time of the camera app processes is reduced by about 20%.
2. The preview time of the camera app processes is reduced by about 42%.
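
The percentages follow from the per-run numbers in the two tables. A
short C sketch of the arithmetic, with hypothetical helper and array
names and the sample values copied from the tables:

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples, in milliseconds. */
static double mean_ms(const int *samples, size_t n)
{
    long sum = 0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += samples[i];
    return (double)sum / (double)n;
}

/* Relative reduction from 'before' to 'after', in percent. */
static double reduction_percent(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* Samples copied from the cold start and preview tables. */
static const int cold_before[6] = { 1498, 1476, 1741, 1337, 1367, 1655 };
static const int cold_after[6] = { 1396, 1107, 1136, 1178, 1071, 1339 };
static const int preview_before[6] = { 267, 402, 504, 513, 161, 265 };
static const int preview_after[6] = { 188, 223, 301, 203, 162, 154 };
```

Calling reduction_percent(mean_ms(cold_before, 6), mean_ms(cold_after, 6))
gives roughly 20.4%, and the same call on the preview arrays gives
roughly 41.7%, in line with the rounded figures quoted above.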

The mechanism offers several benefits:
1. It alleviates the high system CPU load caused by multiple exiting
   processes running simultaneously.
2. It reduces lock contention in the swap entry free path, because an
   asynchronous kworker releases the entries instead of multiple
   exiting processes running in parallel.
3. It releases the pte_present memory occupied by exiting processes
   more efficiently.

-v3:
1. Fix compilation warnings and squash the fixes into patch #2, as
 suggested by David Hildenbrand.
 Reported-by: kernel test robot <lkp@intel.com>
 Closes: https://lore.kernel.org/oe-kbuild-all/202408010150.13yZScv6-lkp@intel.com/
2. Update comments as suggested by Andrew Morton and Barry Song.

-v2:
1. Fix a compilation warning with the arch s390 config.
 Reported-by: kernel test robot <lkp@intel.com>
 Closes: https://lore.kernel.org/oe-kbuild-all/202407311703.8q8sDQ2p-lkp@intel.com/
 Closes: https://lore.kernel.org/oe-kbuild-all/202407311947.VPJNRqad-lkp@intel.com/

-v1:
 https://lore.kernel.org/linux-mm/20240730114426.511-1-justinjiang@vivo.com/

Zhiguo Jiang (2):
  mm: move task_is_dying to h headfile
  mm: tlb: add tlb swap entries batch async release

 arch/s390/include/asm/tlb.h |   8 +
 include/asm-generic/tlb.h   |  44 ++++++
 include/linux/mm_types.h    |  58 +++++++
 include/linux/oom.h         |   6 +
 mm/memcontrol.c             |   6 -
 mm/memory.c                 |   3 +-
 mm/mmu_gather.c             | 296 ++++++++++++++++++++++++++++++++++++
 7 files changed, 414 insertions(+), 7 deletions(-)