From patchwork Mon Aug 31 15:36:57 2020
X-Patchwork-Submitter: Dan Schatzberg
X-Patchwork-Id: 11746525
From: Dan Schatzberg
Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
 Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
 Shakeel Butt, Roman Gushchin, Joonsoo Kim, Chris Down, Yang Shi,
 Jakub Kicinski,
 linux-block@vger.kernel.org (open list:BLOCK LAYER),
 linux-kernel@vger.kernel.org (open list),
 cgroups@vger.kernel.org (open list:CONTROL GROUP (CGROUP)),
 linux-mm@kvack.org (open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG))
Subject: [PATCH v8 0/3] Charge loop device i/o to issuing cgroup
Date: Mon, 31 Aug 2020 11:36:57 -0400
Message-Id: <20200831153704.16848-1-schatzberg.dan@gmail.com>
X-Mailer: git-send-email 2.21.3

Much of the discussion about this has died down. There's been a
concern raised that we could generalize infrastructure across loop,
md, etc. This may be possible in the future, but it isn't clear to me
what this would look like. I'm inclined to fix the existing issue with
loop devices now (this is a problem we hit at FB) and address
consolidation with other cases if and when those need to be addressed.

Note that this series needs to be based off of Roman Gushchin's patch
(https://lkml.org/lkml/2020/8/21/1464) to compile.

Changes since V8:

* Rebased on top of Roman Gushchin's patch
  (https://lkml.org/lkml/2020/8/21/1464) which provides the nesting
  support for setting active memcg. Dropped the patch from this series
  that did the same thing.

Changes since V7:

* Rebased against Linus's branch

Changes since V6:

* Added separate spinlock for worker synchronization
* Minor style changes

Changes since V5:

* Fixed a missing css_put when failing to allocate a worker
* Minor style changes

Changes since V4:

Only patches 1 and 2 have changed.

* Fixed irq lock ordering bug
* Simplified loop detach
* Added support for nesting memalloc_use_memcg

Changes since V3:

* Fix race on loop device destruction and deferred worker cleanup
* Ensure charge on shmem_swapin_page works just like getpage
* Minor style changes

Changes since V2:

* Deferred destruction of workqueue items so in the common case there
  is no allocation needed

Changes since V1:

* Split out and reordered patches so cgroup charging changes are
  separate from kworker -> workqueue change
* Add mem_css to struct loop_cmd to simplify logic

The loop device runs all i/o to the backing file on a separate kworker
thread, which results in all i/o being charged to the root cgroup. This
allows a loop device to be used to trivially bypass resource limits
and other policy.
This patch series fixes this gap in accounting.

A simple script to demonstrate this behavior on a cgroup v2 machine:

'''
#!/bin/bash
set -e

CGROUP=/sys/fs/cgroup/test.slice
LOOP_DEV=/dev/loop0

if [[ ! -d $CGROUP ]]
then
    sudo mkdir $CGROUP
fi

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit to tmpfs -> OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
dd if=/dev/zero of=/tmp/file bs=1M count=256" || true

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit through loopback
# device -> no OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
truncate -s 512m /tmp/backing_file
losetup $LOOP_DEV /tmp/backing_file
dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
losetup -D $LOOP_DEV" || true

grep oom_kill $CGROUP/memory.events
'''

Naively charging cgroups could result in priority inversions through
the single kworker thread in the case where multiple cgroups are
reading/writing to the same loop device. This patch series makes some
minor modifications to the loop driver so that each cgroup can make
forward progress independently to avoid this inversion.

With this patch series applied, the above script triggers OOM kills
when writing through the loop device as expected.

Dan Schatzberg (3):
  loop: Use worker per cgroup instead of kworker
  mm: Charge active memcg when no mm is set
  loop: Charge i/o to mem and blk cg

 drivers/block/loop.c       | 248 ++++++++++++++++++++++++++++++-------
 drivers/block/loop.h       |  15 ++-
 include/linux/memcontrol.h |   6 +
 kernel/cgroup/cgroup.c     |   1 +
 mm/memcontrol.c            |  11 +-
 mm/shmem.c                 |   4 +-
 6 files changed, 232 insertions(+), 53 deletions(-)
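
For anyone reproducing this: the demonstration script in the cover letter
prints the oom_kill line from memory.events before and after each
write, and the reader compares the numbers by eye. A minimal sketch of
turning that into a pass/fail check follows. It assumes the usual
cgroup v2 memory.events layout of "key value" lines. The helper names
oom_kill_count and oom_kill_increased are hypothetical and not part of
this series, and the sample file stands in for $CGROUP/memory.events
so the sketch runs without root.

```shell
#!/bin/bash
# Hypothetical helper (not part of the series): print the oom_kill
# counter from a memory.events-style file of "key value" lines.
oom_kill_count() {
    awk '$1 == "oom_kill" { print $2 }' "$1"
}

# Hypothetical helper: succeed only if the counter grew between readings.
oom_kill_increased() {
    local before=$1 after=$2
    [ "$after" -gt "$before" ]
}

# Sample data in place of "$CGROUP/memory.events" (assumption: same
# format), so the comparison can be exercised on any machine.
sample=$(mktemp)
printf 'low 0\nhigh 0\nmax 12\noom 3\noom_kill 2\n' > "$sample"
before=$(oom_kill_count "$sample")

# Simulate the state after a write that exceeded memory.max.
printf 'low 0\nhigh 0\nmax 40\noom 5\noom_kill 3\n' > "$sample"
after=$(oom_kill_count "$sample")

if oom_kill_increased "$before" "$after"; then
    echo "OOM kill observed ($before -> $after)"
else
    echo "no OOM kill ($before -> $after)"
fi
rm -f "$sample"
```

On a real system the two readings would be taken around each
"sudo unshare" block. Without the series only the tmpfs case should
show an increase, and with it applied both cases should.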