Message ID | 20221118070603.84081-1-aneesh.kumar@linux.ibm.com |
---|---|
State | New |
Series | mm/cgroup/reclaim: Fix dirty pages throttling on cgroup v1 |
On Fri, Nov 18, 2022 at 12:36:03PM +0530, Aneesh Kumar K.V wrote:
> balance_dirty_pages doesn't do the required dirty throttling on cgroupv1. See
> commit 9badce000e2c ("cgroup, writeback: don't enable cgroup writeback on
> traditional hierarchies"). Instead, the kernel depends on writeback throttling
> in shrink_folio_list to achieve the same goal. With large memory systems, the
> flusher may not be able to writeback quickly enough such that we will start
> finding pages in the shrink_folio_list already in writeback. Hence for cgroupv1
> let's do a reclaim throttle after waking up the flusher.
>
> The below test which used to fail on a 256GB system completes till the
> the file system is full with this change.
>
> root@lp2:/sys/fs/cgroup/memory# mkdir test
> root@lp2:/sys/fs/cgroup/memory# cd test/
> root@lp2:/sys/fs/cgroup/memory/test# echo 120M > memory.limit_in_bytes
> root@lp2:/sys/fs/cgroup/memory/test# echo $$ > tasks
> root@lp2:/sys/fs/cgroup/memory/test# dd if=/dev/zero of=/home/kvaneesh/test bs=1M
> Killed
>
> Cc: <stable@kernel.org>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Thanks Aneesh
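The quoted rationale ends with doing "a reclaim throttle after waking up the flusher". As a rough illustration of that idea, here is a userspace analogue (not kernel code): a reclaiming thread sleeps until the flusher reports that a given amount of writeback has completed, or until a timeout expires, whichever comes first. All names here (throttle_model, throttle_wait, throttle_account_written, fake_flusher) and the page counts are hypothetical; the kernel's reclaim_throttle() works on per-node wait queues with its own wake-up conditions, which this sketch does not attempt to reproduce.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace stand-in for a reclaim wait queue. */
struct throttle_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned long written;	/* pages whose writeback has completed */
};

void throttle_init(struct throttle_model *t)
{
	assert(t != NULL);
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->cond, NULL);
	t->written = 0;
}

/* "Flusher" side: record n completed pages and wake any throttled waiters. */
void throttle_account_written(struct throttle_model *t, unsigned long n)
{
	pthread_mutex_lock(&t->lock);
	t->written += n;
	pthread_cond_broadcast(&t->cond);
	pthread_mutex_unlock(&t->lock);
}

/*
 * "Reclaimer" side: sleep until at least `need` pages have completed
 * writeback since the snapshot `start`, or until timeout_ms elapses.
 * Returns 1 if enough progress was seen, 0 on timeout.
 */
int throttle_wait(struct throttle_model *t, unsigned long start,
		  unsigned long need, long timeout_ms)
{
	struct timespec deadline;
	int progressed;

	assert(t != NULL && timeout_ms >= 0);
	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&t->lock);
	for (;;) {
		progressed = (t->written - start >= need);
		if (progressed)
			break;
		if (pthread_cond_timedwait(&t->cond, &t->lock, &deadline) == ETIMEDOUT) {
			/* Re-check once: progress may have landed right at the deadline. */
			progressed = (t->written - start >= need);
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return progressed;
}

/* Hypothetical flusher thread: completes writeback of 64 pages after ~5 ms. */
void *fake_flusher(void *arg)
{
	struct throttle_model *t = arg;
	struct timespec delay = { 0, 5 * 1000000L };

	nanosleep(&delay, NULL);
	throttle_account_written(t, 64);
	return NULL;
}
```

The timeout bounds the stall: if the flusher makes no progress, the reclaimer resumes instead of sleeping indefinitely, while a flusher that does complete writeback releases it early.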
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 04d8b88e5216..388022c5ef2b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2514,8 +2514,20 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	 * the flushers simply cannot keep up with the allocation
 	 * rate. Nudge the flusher threads in case they are asleep.
 	 */
-	if (stat.nr_unqueued_dirty == nr_taken)
+	if (stat.nr_unqueued_dirty == nr_taken) {
 		wakeup_flusher_threads(WB_REASON_VMSCAN);
+		/*
+		 * For cgroupv1 dirty throttling is achieved by waking up
+		 * the kernel flusher here and later waiting on folios
+		 * which are in writeback to finish (see shrink_folio_list()).
+		 *
+		 * Flusher may not be able to issue writeback quickly
+		 * enough for cgroupv1 writeback throttling to work
+		 * on a large system.
+		 */
+		if (!writeback_throttling_sane(sc))
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+	}
 
 	sc->nr.dirty += stat.nr_dirty;
 	sc->nr.congested += stat.nr_congested;
balance_dirty_pages() doesn't do the required dirty throttling on cgroup v1;
see commit 9badce000e2c ("cgroup, writeback: don't enable cgroup writeback on
traditional hierarchies"). Instead, the kernel relies on writeback throttling
in shrink_folio_list() to achieve the same goal. On large-memory systems, the
flusher may not write back dirty pages quickly enough, so reclaim starts
finding folios in shrink_folio_list() that are already under writeback. Hence,
for cgroup v1, do a reclaim throttle after waking up the flusher.

The test below used to fail on a 256GB system (dd was killed, as shown); with
this change it completes, running until the file system is full.

  root@lp2:/sys/fs/cgroup/memory# mkdir test
  root@lp2:/sys/fs/cgroup/memory# cd test/
  root@lp2:/sys/fs/cgroup/memory/test# echo 120M > memory.limit_in_bytes
  root@lp2:/sys/fs/cgroup/memory/test# echo $$ > tasks
  root@lp2:/sys/fs/cgroup/memory/test# dd if=/dev/zero of=/home/kvaneesh/test bs=1M
  Killed

Cc: <stable@kernel.org>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/vmscan.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
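The dd step of the reproducer can also be driven from a small C program, for example when scripting repeated runs. The sketch below is an assumed equivalent of `dd if=/dev/zero of=PATH bs=1M`: it truncates or creates the file, then writes zero-filled 1 MiB blocks until a write fails (such as ENOSPC once the file system is full) or an optional block cap is reached. The function name fill_with_zeros and the max_blocks cap are hypothetical additions. As with dd, it must be started from a shell that was already moved into the test cgroup (`echo $$ > tasks`), since a child process inherits its parent's cgroup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE (1024 * 1024)	/* same block size as dd bs=1M */

/*
 * Write zero-filled BLOCK_SIZE blocks to `path` until a write fails
 * (e.g. ENOSPC when the file system is full) or `max_blocks` blocks have
 * been written (0 means no limit). Returns the number of complete blocks
 * written, or -1 if the buffer or the file could not be set up.
 */
long fill_with_zeros(const char *path, long max_blocks)
{
	char *buf = calloc(1, BLOCK_SIZE);	/* zeroed, like /dev/zero */
	long blocks = 0;
	int fd;

	assert(path != NULL && max_blocks >= 0);
	if (buf == NULL)
		return -1;
	/* dd truncates the output file by default, so do the same. */
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		free(buf);
		return -1;
	}
	while (max_blocks == 0 || blocks < max_blocks) {
		size_t done = 0;

		/* write() may be partial; loop until the whole block is out. */
		while (done < BLOCK_SIZE) {
			ssize_t n = write(fd, buf + done, BLOCK_SIZE - done);

			if (n < 0 && errno == EINTR)
				continue;
			if (n <= 0)
				goto out;	/* ENOSPC, EIO, ...: stop like dd */
			done += (size_t)n;
		}
		blocks++;
	}
out:
	close(fd);
	free(buf);
	return blocks;
}
```

Passing max_blocks = 0 mirrors the original dd run, which keeps writing until the file system is full (or, before the fix, until the process is killed).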