From: Omar Sandoval
To: linux-fsdevel@vger.kernel.org, Al Viro
Cc: kernel-team@fb.com, "Eric W . Biederman", Tejun Heo
Subject: [PATCH v3] fs: only sync() superblocks reachable from the current namespace
Date: Fri, 26 Jan 2018 15:09:53 -0800
Message-Id: <095717754a0f35cb8502b9763500c046cb8dcb87.1517008068.git.osandov@fb.com>
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10187181

Currently, the sync() syscall is system-wide, so any process in a container
can cause significant I/O stalls across the system by calling sync().
This is even true for filesystems which are not accessible in the process'
mount namespace.

This patch scopes sync() to only write out filesystems reachable in the
current mount namespace, except for the initial mount namespace, which
still syncs everything to avoid surprises. This fixes the broken isolation
we were seeing here.

Signed-off-by: Omar Sandoval
---
v2->v3: Fix stupid missed unlock bug
v1->v2: Grab mount_lock while iterating mounts

 fs/sync.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 56 insertions(+), 14 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 6e0a2cbaf6de..654da445340a 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include "internal.h"
+#include "mount.h"
 
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
 			SYNC_FILE_RANGE_WAIT_AFTER)
@@ -68,16 +69,51 @@ int sync_filesystem(struct super_block *sb)
 }
 EXPORT_SYMBOL(sync_filesystem);
 
-static void sync_inodes_one_sb(struct super_block *sb, void *arg)
+struct sb_sync {
+	/*
+	 * Only sync superblocks reachable from this namespace. If NULL, sync
+	 * everything.
+	 */
+	struct mnt_namespace *mnt_ns;
+
+	/* ->sync_fs() wait argument. */
+	int wait;
+};
+
+static int sb_reachable(struct super_block *sb, struct mnt_namespace *mnt_ns)
 {
-	if (!sb_rdonly(sb))
+	struct mount *mnt;
+	int ret = 0;
+
+	if (!mnt_ns)
+		return 1;
+
+	read_seqlock_excl(&mount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_ns == mnt_ns) {
+			ret = 1;
+			break;
+		}
+	}
+	read_sequnlock_excl(&mount_lock);
+	return ret;
+}
+
+static void sync_inodes_one_sb(struct super_block *sb, void *p)
+{
+	struct sb_sync *arg = p;
+
+	if (!sb_rdonly(sb) && sb_reachable(sb, arg->mnt_ns))
 		sync_inodes_sb(sb);
 }
 
-static void sync_fs_one_sb(struct super_block *sb, void *arg)
+static void sync_fs_one_sb(struct super_block *sb, void *p)
 {
-	if (!sb_rdonly(sb) && sb->s_op->sync_fs)
-		sb->s_op->sync_fs(sb, *(int *)arg);
+	struct sb_sync *arg = p;
+
+	if (!sb_rdonly(sb) && sb_reachable(sb, arg->mnt_ns) &&
+	    sb->s_op->sync_fs)
+		sb->s_op->sync_fs(sb, arg->wait);
 }
 
 static void fdatawrite_one_bdev(struct block_device *bdev, void *arg)
@@ -107,12 +143,18 @@ static void fdatawait_one_bdev(struct block_device *bdev, void *arg)
  */
 SYSCALL_DEFINE0(sync)
 {
-	int nowait = 0, wait = 1;
+	struct sb_sync arg = {
+		.mnt_ns = current->nsproxy->mnt_ns,
+	};
+
+	if (arg.mnt_ns == init_task.nsproxy->mnt_ns)
+		arg.mnt_ns = NULL;
 
 	wakeup_flusher_threads(WB_REASON_SYNC);
-	iterate_supers(sync_inodes_one_sb, NULL);
-	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &wait);
+	iterate_supers(sync_inodes_one_sb, &arg);
+	iterate_supers(sync_fs_one_sb, &arg);
+	arg.wait = 1;
+	iterate_supers(sync_fs_one_sb, &arg);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	iterate_bdevs(fdatawait_one_bdev, NULL);
 	if (unlikely(laptop_mode))
@@ -122,17 +164,17 @@ SYSCALL_DEFINE0(sync)
 
 static void do_sync_work(struct work_struct *work)
 {
-	int nowait = 0;
+	struct sb_sync arg = {};
 
 	/*
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
 	 * because they were temporarily locked
 	 */
-	iterate_supers(sync_inodes_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, &arg);
+	iterate_supers(sync_fs_one_sb, &arg);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
-	iterate_supers(sync_inodes_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, &arg);
+	iterate_supers(sync_fs_one_sb, &arg);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	printk("Emergency Sync complete\n");
 	kfree(work);
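
For readers who want to poke at the reachability rule outside the kernel,
here is a minimal userspace sketch of the decision sb_reachable() makes. It
is not kernel code: the struct layouts are simplified, hypothetical
stand-ins (a plain linked list instead of list_head, no mount_lock), and
count_reachable() is a helper invented for illustration only. It assumes
nothing beyond what the patch describes: a NULL namespace means "sync
everything", otherwise a superblock qualifies only if one of its mounts
belongs to the given namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's definitions. */
struct mnt_namespace {
	int id;
};

struct mount {
	struct mnt_namespace *mnt_ns;	/* namespace this mount belongs to */
	struct mount *next;		/* next mount of the same superblock */
};

struct super_block {
	struct mount *s_mounts;		/* plain linked list, not a list_head */
};

/*
 * Same decision as sb_reachable() in the patch, minus mount_lock:
 * a NULL namespace means "sync everything".
 */
static bool sb_reachable(const struct super_block *sb,
			 const struct mnt_namespace *mnt_ns)
{
	const struct mount *mnt;

	assert(sb != NULL);
	if (!mnt_ns)
		return true;

	for (mnt = sb->s_mounts; mnt; mnt = mnt->next) {
		if (mnt->mnt_ns == mnt_ns)
			return true;
	}
	return false;
}

/* Illustration-only helper: count the superblocks a scoped sync() would visit. */
static size_t count_reachable(const struct super_block *sbs, size_t n,
			      const struct mnt_namespace *mnt_ns)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (sb_reachable(&sbs[i], mnt_ns))
			count++;
	}
	return count;
}
```

With one superblock mounted only in the host namespace and another mounted
in both host and container, the container namespace reaches just the
second, while a NULL namespace (the initial-namespace case in the patch)
reaches both.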