From patchwork Mon Nov 26 16:12:40 2018
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 10698591
Date: Mon, 26 Nov 2018 08:12:40 -0800
From: Matthew Wilcox <willy@infradead.org>
To: Dan Williams
Subject: dax_lock_mapping_entry was never safe
Message-ID: <20181126161240.GH3065@bombadil.infradead.org>
Cc: linux-fsdevel@vger.kernel.org, Jan Kara, linux-nvdimm@lists.01.org

I noticed this path while I was doing the 4.19 backport of
"dax: Avoid losing wakeup in dax_lock_mapping_entry":

	xa_unlock_irq(&mapping->i_pages);
	revalidate = wait_fn();
	finish_wait(wq, &ewait.wait);
	xa_lock_irq(&mapping->i_pages);

It's not safe to call xa_lock_irq() if the mapping can have been freed
while we slept.  We'll probably get away with it; most filesystems use
a unique slab for their inodes, so you'll likely get either a freed
inode or an inode which is now the wrong inode.  But if that page has
been freed back to the page allocator, that pointer could now be
pointing at anything.

Fixing this in the current codebase is no easier than fixing it in the
4.19 codebase.  This is the best I've come up with.  Could we do better
by not using the _exclusive form of prepare_to_wait()?  I'm not
familiar with all the things that need to be considered when using this
family of interfaces.

diff --git a/fs/dax.c b/fs/dax.c
index 9bcce89ea18e..154b592b18eb 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -232,6 +232,24 @@ static void *get_unlocked_entry(struct xa_state *xas)
 	}
 }
 
+static void wait_unlocked_entry(struct xa_state *xas, void *entry)
+{
+	struct wait_exceptional_entry_queue ewait;
+	wait_queue_head_t *wq;
+
+	init_wait(&ewait.wait);
+	ewait.wait.func = wake_exceptional_entry_func;
+
+	wq = dax_entry_waitqueue(xas, entry, &ewait.key);
+	prepare_to_wait_exclusive(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);
+	xas_unlock_irq(xas);
+	/* We can no longer look at xas */
+	schedule();
+	finish_wait(wq, &ewait.wait);
+	if (waitqueue_active(wq))
+		__wake_up(wq, TASK_NORMAL, 1, &ewait.key);
+}
+
 static void put_unlocked_entry(struct xa_state *xas, void *entry)
 {
 	/* If we were the only waiter woken, wake the next one */
@@ -389,9 +407,7 @@ bool dax_lock_mapping_entry(struct page *page)
 		entry = xas_load(&xas);
 		if (dax_is_locked(entry)) {
 			rcu_read_unlock();
-			entry = get_unlocked_entry(&xas);
-			xas_unlock_irq(&xas);
-			put_unlocked_entry(&xas, entry);
+			wait_unlocked_entry(&xas, entry);
 			rcu_read_lock();
 			continue;
 		}
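
For comparison, here is a rough, untested sketch (not part of the patch
above, and assuming the surrounding fs/dax.c helpers stay as shown) of
what the non-exclusive variant asked about might look like.  Because a
single wake-up on the queue is not stopped by non-exclusive waiters,
every waiter for the entry would be woken at once, so the woken task
would presumably no longer need the waitqueue_active()/__wake_up() tail
that relays the wake to the next exclusive waiter; the usual cost is a
thundering herd when many tasks wait on the same entry.

	/*
	 * Sketch only: same as wait_unlocked_entry() in the patch above,
	 * but waiting non-exclusively, so no relay wake-up on the way out.
	 */
	static void wait_unlocked_entry(struct xa_state *xas, void *entry)
	{
		struct wait_exceptional_entry_queue ewait;
		wait_queue_head_t *wq;

		init_wait(&ewait.wait);
		ewait.wait.func = wake_exceptional_entry_func;

		wq = dax_entry_waitqueue(xas, entry, &ewait.key);
		prepare_to_wait(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);
		xas_unlock_irq(xas);
		/* We can no longer look at xas */
		schedule();
		finish_wait(wq, &ewait.wait);
	}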