From patchwork Tue Mar 16 15:36:49 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Schatzberg
X-Patchwork-Id: 12142533
From: Dan Schatzberg
To:
Cc: Jens Axboe, Tejun Heo, Zefan Li, Johannes Weiner, Andrew Morton,
 Michal Hocko, Vladimir Davydov, Hugh Dickins, Shakeel Butt,
 Roman Gushchin, Muchun Song, Alex Shi, Alexander Duyck, Chris Down,
 Yafang Shao, Wei Yang,
 linux-block@vger.kernel.org (open list:BLOCK LAYER),
 linux-kernel@vger.kernel.org (open list),
 cgroups@vger.kernel.org (open list:CONTROL GROUP (CGROUP)),
 linux-mm@kvack.org (open list:MEMORY MANAGEMENT)
Subject: [PATCH v10 0/3] Charge loop device i/o to issuing cgroup
Date: Tue, 16 Mar 2021 08:36:49 -0700
Message-Id: <20210316153655.500806-1-schatzberg.dan@gmail.com>
X-Mailer: git-send-email 2.30.2

No major changes, just rebasing and resubmitting

Changes since V10:

* Added page-cache charging to mm: Charge active memcg when no mm is set

Changes since V9:

* Rebased against Linus's branch, which now includes Roman Gushchin's
  patch that this series is based on

Changes since V8:

* Rebased on top of Roman Gushchin's patch
  (https://lkml.org/lkml/2020/8/21/1464) which provides the nesting
  support for setting active memcg. Dropped the patch from this series
  that did the same thing.

Changes since V7:

* Rebased against Linus's branch

Changes since V6:

* Added separate spinlock for worker synchronization
* Minor style changes

Changes since V5:

* Fixed a missing css_put when failing to allocate a worker
* Minor style changes

Changes since V4:

Only patches 1 and 2 have changed.

* Fixed irq lock ordering bug
* Simplified loop detach
* Added support for nesting memalloc_use_memcg

Changes since V3:

* Fix race on loop device destruction and deferred worker cleanup
* Ensure charge on shmem_swapin_page works just like getpage
* Minor style changes

Changes since V2:

* Deferred destruction of workqueue items so in the common case there
  is no allocation needed

Changes since V1:

* Split out and reordered patches so cgroup charging changes are
  separate from kworker -> workqueue change
* Add mem_css to struct loop_cmd to simplify logic

The loop device runs all i/o to the backing file on a separate kworker
thread, which results in all i/o being charged to the root cgroup. This
allows a loop device to be used to trivially bypass resource limits
and other policy. This patch series fixes this gap in accounting.

A simple script to demonstrate this behavior on a cgroupv2 machine:

'''
#!/bin/bash
set -e

CGROUP=/sys/fs/cgroup/test.slice
LOOP_DEV=/dev/loop0

if [[ ! -d $CGROUP ]]
then
    sudo mkdir $CGROUP
fi

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit to tmpfs -> OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
dd if=/dev/zero of=/tmp/file bs=1M count=256" || true

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit through loopback
# device -> no OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
truncate -s 512m /tmp/backing_file;
losetup $LOOP_DEV /tmp/backing_file;
dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
losetup -D $LOOP_DEV" || true

grep oom_kill $CGROUP/memory.events
'''

Naively charging cgroups could result in priority inversions through
the single kworker thread in the case where multiple cgroups are
reading/writing to the same loop
device. This patch series makes minor modifications to the loop driver
so that each cgroup can make forward progress independently, avoiding
this inversion.

With this patch series applied, the above script triggers OOM kills
when writing through the loop device, as expected.

Dan Schatzberg (3):
  loop: Use worker per cgroup instead of kworker
  mm: Charge active memcg when no mm is set
  loop: Charge i/o to mem and blk cg

 drivers/block/loop.c       | 248 ++++++++++++++++++++++++++++++-------
 drivers/block/loop.h       |  15 ++-
 include/linux/memcontrol.h |  11 ++
 kernel/cgroup/cgroup.c     |   1 +
 mm/filemap.c               |   2 +-
 mm/memcontrol.c            |  15 ++-
 mm/shmem.c                 |   4 +-
 7 files changed, 242 insertions(+), 54 deletions(-)
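
The before/after `grep oom_kill` checks in the demonstration script can
also be made into a numeric comparison, which is easier to assert on in
an automated test. The following is only a minimal sketch, assuming the
usual cgroup v2 memory.events layout of one "key value" pair per line.
The function names oom_kill_count and oom_kill_delta are hypothetical
and are not part of this series.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the patch series).

# Print the oom_kill counter from a cgroup v2 memory.events file.
# The key is matched exactly, so other keys that merely contain
# "oom_kill" or "oom" are not picked up.
oom_kill_count() {
    local events_file=$1
    awk '$1 == "oom_kill" { print $2 }' "$events_file"
}

# Print how much the counter grew between two readings.
oom_kill_delta() {
    local before=$1 after=$2
    echo $(( after - before ))
}
```

A possible use is to record `before=$(oom_kill_count $CGROUP/memory.events)`
ahead of the loop-device write, take a second reading afterwards, and
check `oom_kill_delta`. With the series applied the delta would be
expected to be non-zero, matching the behavior described above.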