From patchwork Fri May 8 18:30:49 2020
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 11537375
From: Johannes Weiner
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
 "Kirill A. Shutemov", Roman Gushchin, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 02/19] mm: memcontrol: fix stat-corrupting race in charge moving
Date: Fri, 8 May 2020 14:30:49 -0400
Message-Id: <20200508183105.225460-3-hannes@cmpxchg.org>
In-Reply-To: <20200508183105.225460-1-hannes@cmpxchg.org>
References: <20200508183105.225460-1-hannes@cmpxchg.org>

The move_lock is a per-memcg lock, but the VM accounting code that
needs to acquire it comes from the page and follows page->mem_cgroup
under RCU protection. That means that the page becomes unlocked not
when we drop the move_lock, but when we update page->mem_cgroup. And
that assignment doesn't imply any memory ordering.
If that pointer write gets reordered against the reads of the page
state - page_mapped, PageDirty etc. - the state may change while we
rely on it being stable and we can end up corrupting the counters.

Place an SMP memory barrier to make sure we're done with all page
state by the time the new page->mem_cgroup becomes visible.

Also replace the open-coded move_lock with a lock_page_memcg() to
make it more obvious what we're serializing against.

Signed-off-by: Johannes Weiner
Reviewed-by: Joonsoo Kim
Reviewed-by: Shakeel Butt
---
 mm/memcontrol.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 317dbbaac603..cdd29b59929b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5376,7 +5376,6 @@ static int mem_cgroup_move_account(struct page *page,
 {
 	struct lruvec *from_vec, *to_vec;
 	struct pglist_data *pgdat;
-	unsigned long flags;
 	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
 	int ret;
 	bool anon;
@@ -5403,18 +5402,13 @@ static int mem_cgroup_move_account(struct page *page,
 	from_vec = mem_cgroup_lruvec(from, pgdat);
 	to_vec = mem_cgroup_lruvec(to, pgdat);
 
-	spin_lock_irqsave(&from->move_lock, flags);
+	lock_page_memcg(page);
 
 	if (!anon && page_mapped(page)) {
 		__mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
 		__mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
 	}
 
-	/*
-	 * move_lock grabbed above and caller set from->moving_account, so
-	 * mod_memcg_page_state will serialize updates to PageDirty.
-	 * So mapping should be stable for dirty pages.
-	 */
 	if (!anon && PageDirty(page)) {
 		struct address_space *mapping = page_mapping(page);
 
@@ -5430,15 +5424,23 @@ static int mem_cgroup_move_account(struct page *page,
 	}
 
 	/*
+	 * All state has been migrated, let's switch to the new memcg.
+	 *
 	 * It is safe to change page->mem_cgroup here because the page
-	 * is referenced, charged, and isolated - we can't race with
-	 * uncharging, charging, migration, or LRU putback.
+	 * is referenced, charged, isolated, and locked: we can't race
+	 * with (un)charging, migration, LRU putback, or anything else
+	 * that would rely on a stable page->mem_cgroup.
+	 *
+	 * Note that lock_page_memcg is a memcg lock, not a page lock,
+	 * to save space. As soon as we switch page->mem_cgroup to a
+	 * new memcg that isn't locked, the above state can change
+	 * concurrently again. Make sure we're truly done with it.
 	 */
+	smp_mb();
 
-	/* caller should have done css_get */
-	page->mem_cgroup = to;
+	page->mem_cgroup = to;	/* caller should have done css_get */
 
-	spin_unlock_irqrestore(&from->move_lock, flags);
+	__unlock_page_memcg(from);
 
 	ret = 0;