From patchwork Mon Nov 5 15:22:25 2018
X-Patchwork-Submitter: Vitaly Wool
X-Patchwork-Id: 10668441
Date: Mon, 5 Nov 2018 16:22:25 +0100
From: Vitaly Wool <vitalywool@gmail.com>
To: Linux-MM <linux-mm@kvack.org>, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Oleksiy.Avramchenko@sony.com, Guenter Roeck, snild@sony.com
Subject: [PATCH] z3fold: fix possible reclaim races
Message-Id: <20181105162225.74e8837d03583a9b707cf559@gmail.com>

Reclaim and free can race on an object, which is basically fine, but in order for reclaim to be able to map a "freed" object we need to encode the object length in the handle.
handle_to_chunks() is then introduced to extract the object length from a handle, and is used during mapping.

Moreover, to avoid racing on a z3fold "headless" page release, we should not try to free such a page in z3fold_free() if the reclaim bit is set. Also, in the unlikely case of trying to reclaim a page that is being freed, we should not proceed with that page.

While at it, fix the page accounting in the reclaim function.

This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups".

Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reviewed-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Jongseok Kim
Signed-off-by: Andrew Morton
---
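A note for reviewers, not intended for the commit message: the sketch
below is a minimal userspace model of the new handle layout, for
playing with the encoding outside the kernel. The buddy id sits in the
low two bits of the handle; for a LAST bud the object length in chunks
is stored directly above them, so reclaim can map the object without
reading a header that free may already have torn down. PAGE_SHIFT,
struct hdr and main() are simplified stand-ins, not the kernel
definitions.

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))
#define BUDDY_MASK	(0x3)
#define BUDDY_SHIFT	2

enum buddy { HEADLESS = 0, FIRST, MIDDLE, LAST };

/* stand-in for the z3fold_header fields the encoding needs */
struct hdr { unsigned short first_num, last_chunks; };

static unsigned long encode_handle(unsigned long page_addr,
				   const struct hdr *h, enum buddy bud)
{
	unsigned long handle = page_addr;	/* page-aligned */

	if (bud != HEADLESS) {
		/* low two bits carry the buddy id */
		handle |= (bud + h->first_num) & BUDDY_MASK;
		/* for LAST, stash the length above the buddy bits */
		if (bud == LAST)
			handle |= (unsigned long)h->last_chunks << BUDDY_SHIFT;
	}
	return handle;
}

/* only for LAST bud, returns zero otherwise */
static unsigned short handle_to_chunks(unsigned long handle)
{
	return (handle & ~PAGE_MASK) >> BUDDY_SHIFT;
}

int main(void)
{
	struct hdr h = { .first_num = 0, .last_chunks = 11 };
	unsigned long handle = encode_handle(0x7f123000UL, &h, LAST);

	/* reclaim recovers the length from the handle alone */
	assert(handle_to_chunks(handle) == 11);
	printf("page %#lx, chunks %u\n", handle & PAGE_MASK,
	       (unsigned)handle_to_chunks(handle));
	return 0;
}

This works because the chunk count is bounded by NCHUNKS, which for 4K
pages and 64-byte chunks keeps it comfortably below the
PAGE_SHIFT - BUDDY_SHIFT bits available between the buddy id and the
page address.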
 mm/z3fold.c | 101 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 62 insertions(+), 39 deletions(-)

diff --git a/mm/z3fold.c b/mm/z3fold.c
index ff73ef8124bf..5999727ee17c 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -99,6 +99,7 @@ struct z3fold_header {
 #define NCHUNKS		((PAGE_SIZE - ZHDR_SIZE_ALIGNED) >> CHUNK_SHIFT)
 
 #define BUDDY_MASK	(0x3)
+#define BUDDY_SHIFT	2
 
 /**
  * struct z3fold_pool - stores metadata for each z3fold pool
@@ -145,7 +146,7 @@ enum z3fold_page_flags {
 	MIDDLE_CHUNK_MAPPED,
 	NEEDS_COMPACTING,
 	PAGE_STALE,
-	UNDER_RECLAIM
+	PAGE_CLAIMED, /* by either reclaim or free */
 };
 
 /*****************
@@ -174,7 +175,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page,
 	clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
 	clear_bit(NEEDS_COMPACTING, &page->private);
 	clear_bit(PAGE_STALE, &page->private);
-	clear_bit(UNDER_RECLAIM, &page->private);
+	clear_bit(PAGE_CLAIMED, &page->private);
 
 	spin_lock_init(&zhdr->page_lock);
 	kref_init(&zhdr->refcount);
@@ -223,8 +224,11 @@ static unsigned long encode_handle(struct z3fold_header *zhdr, enum buddy bud)
 	unsigned long handle;
 
 	handle = (unsigned long)zhdr;
-	if (bud != HEADLESS)
-		handle += (bud + zhdr->first_num) & BUDDY_MASK;
+	if (bud != HEADLESS) {
+		handle |= (bud + zhdr->first_num) & BUDDY_MASK;
+		if (bud == LAST)
+			handle |= (zhdr->last_chunks << BUDDY_SHIFT);
+	}
 	return handle;
 }
 
@@ -234,6 +238,12 @@ static struct z3fold_header *handle_to_z3fold_header(unsigned long handle)
 	return (struct z3fold_header *)(handle & PAGE_MASK);
 }
 
+/* only for LAST bud, returns zero otherwise */
+static unsigned short handle_to_chunks(unsigned long handle)
+{
+	return (handle & ~PAGE_MASK) >> BUDDY_SHIFT;
+}
+
 /*
  * (handle & BUDDY_MASK) < zhdr->first_num is possible in encode_handle
  * but that doesn't matter. because the masking will result in the
@@ -720,37 +730,39 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
 	page = virt_to_page(zhdr);
 
 	if (test_bit(PAGE_HEADLESS, &page->private)) {
-		/* HEADLESS page stored */
-		bud = HEADLESS;
-	} else {
-		z3fold_page_lock(zhdr);
-		bud = handle_to_buddy(handle);
-
-		switch (bud) {
-		case FIRST:
-			zhdr->first_chunks = 0;
-			break;
-		case MIDDLE:
-			zhdr->middle_chunks = 0;
-			zhdr->start_middle = 0;
-			break;
-		case LAST:
-			zhdr->last_chunks = 0;
-			break;
-		default:
-			pr_err("%s: unknown bud %d\n", __func__, bud);
-			WARN_ON(1);
-			z3fold_page_unlock(zhdr);
-			return;
+		/* if a headless page is under reclaim, just leave.
+		 * NB: we use test_and_set_bit for a reason: if the bit
+		 * has not been set before, we release this page
+		 * immediately so we don't care about its value any more.
+		 */
+		if (!test_and_set_bit(PAGE_CLAIMED, &page->private)) {
+			spin_lock(&pool->lock);
+			list_del(&page->lru);
+			spin_unlock(&pool->lock);
+			free_z3fold_page(page);
+			atomic64_dec(&pool->pages_nr);
 		}
+		return;
 	}
 
-	if (bud == HEADLESS) {
-		spin_lock(&pool->lock);
-		list_del(&page->lru);
-		spin_unlock(&pool->lock);
-		free_z3fold_page(page);
-		atomic64_dec(&pool->pages_nr);
+	/* Non-headless case */
+	z3fold_page_lock(zhdr);
+	bud = handle_to_buddy(handle);
+
+	switch (bud) {
+	case FIRST:
+		zhdr->first_chunks = 0;
+		break;
+	case MIDDLE:
+		zhdr->middle_chunks = 0;
+		break;
+	case LAST:
+		zhdr->last_chunks = 0;
+		break;
+	default:
+		pr_err("%s: unknown bud %d\n", __func__, bud);
+		WARN_ON(1);
+		z3fold_page_unlock(zhdr);
 		return;
 	}
 
@@ -758,7 +770,7 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
 		atomic64_dec(&pool->pages_nr);
 		return;
 	}
-	if (test_bit(UNDER_RECLAIM, &page->private)) {
+	if (test_bit(PAGE_CLAIMED, &page->private)) {
 		z3fold_page_unlock(zhdr);
 		return;
 	}
@@ -836,20 +848,30 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 		}
 		list_for_each_prev(pos, &pool->lru) {
 			page = list_entry(pos, struct page, lru);
+
+			/* this bit could have been set by free, in which case
+			 * we pass over to the next page in the pool.
+			 */
+			if (test_and_set_bit(PAGE_CLAIMED, &page->private))
+				continue;
+
+			zhdr = page_address(page);
 			if (test_bit(PAGE_HEADLESS, &page->private))
-				/* candidate found */
 				break;
 
-			zhdr = page_address(page);
-			if (!z3fold_page_trylock(zhdr))
+			if (!z3fold_page_trylock(zhdr)) {
+				zhdr = NULL;
 				continue; /* can't evict at this point */
+			}
 			kref_get(&zhdr->refcount);
 			list_del_init(&zhdr->buddy);
 			zhdr->cpu = -1;
-			set_bit(UNDER_RECLAIM, &page->private);
 			break;
 		}
 
+		if (!zhdr)
+			break;
+
 		list_del_init(&page->lru);
 		spin_unlock(&pool->lock);
 
@@ -898,6 +920,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 		if (test_bit(PAGE_HEADLESS, &page->private)) {
 			if (ret == 0) {
 				free_z3fold_page(page);
+				atomic64_dec(&pool->pages_nr);
 				return 0;
 			}
 			spin_lock(&pool->lock);
@@ -905,7 +928,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 			spin_unlock(&pool->lock);
 		} else {
 			z3fold_page_lock(zhdr);
-			clear_bit(UNDER_RECLAIM, &page->private);
+			clear_bit(PAGE_CLAIMED, &page->private);
 			if (kref_put(&zhdr->refcount,
 					release_z3fold_page_locked)) {
 				atomic64_dec(&pool->pages_nr);
@@ -964,7 +987,7 @@ static void *z3fold_map(struct z3fold_pool *pool, unsigned long handle)
 		set_bit(MIDDLE_CHUNK_MAPPED, &page->private);
 		break;
 	case LAST:
-		addr += PAGE_SIZE - (zhdr->last_chunks << CHUNK_SHIFT);
+		addr += PAGE_SIZE - (handle_to_chunks(handle) << CHUNK_SHIFT);
 		break;
 	default:
 		pr_err("unknown buddy id %d\n", buddy);
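
A postscript on the race itself: both z3fold_free() and
z3fold_reclaim_page() now claim the page atomically before releasing
or evicting it, so exactly one of the racing paths wins and the loser
backs off. Below is a standalone sketch of that claim pattern using
C11 atomics in place of the kernel's test_and_set_bit(); the names
are illustrative, not the z3fold code.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* stands in for struct page; PAGE_CLAIMED lives in page->private */
struct fake_page {
	atomic_flag claimed;
};

/* mimics !test_and_set_bit(PAGE_CLAIMED, ...): returns true for
 * exactly one caller, which then owns the release of the page
 */
static bool try_claim(struct fake_page *p)
{
	return !atomic_flag_test_and_set(&p->claimed);
}

int main(void)
{
	struct fake_page page = { .claimed = ATOMIC_FLAG_INIT };

	/* z3fold_free() path: wins the claim and frees the page */
	if (try_claim(&page))
		puts("free: claimed the page, releasing it");

	/* z3fold_reclaim_page() path: loses the claim and passes over
	 * to the next page on the LRU instead of touching a freed page
	 */
	if (!try_claim(&page))
		puts("reclaim: page already claimed, skipping it");
	return 0;
}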