From patchwork Fri Jan 26 23:06:31 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10187179
From: Omar Sandoval
To: linux-fsdevel@vger.kernel.org, Al Viro
Cc: kernel-team@fb.com, "Eric W . Biederman", Tejun Heo
Subject: [PATCH v2] fs: only sync() superblocks reachable from the current namespace
Date: Fri, 26 Jan 2018 15:06:31 -0800
X-Mailer: git-send-email 2.16.1
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Omar Sandoval

Currently, the sync() syscall is system-wide, so any process in a
container can cause significant I/O stalls across the system by calling
sync(). This is even true for filesystems which are not accessible in
the process' mount namespace.

This patch scopes sync() to only write out filesystems reachable in the
current mount namespace, except for the initial mount namespace, which
still syncs everything to avoid surprises. This fixes the broken
isolation we were seeing here.

Signed-off-by: Omar Sandoval
---
Grab mount_lock while iterating mounts.

 fs/sync.c | 70 ++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 56 insertions(+), 14 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 6e0a2cbaf6de..03d27b48b972 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include "internal.h"
+#include "mount.h"
 
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
 			SYNC_FILE_RANGE_WAIT_AFTER)
@@ -68,16 +69,51 @@ int sync_filesystem(struct super_block *sb)
 }
 EXPORT_SYMBOL(sync_filesystem);
 
-static void sync_inodes_one_sb(struct super_block *sb, void *arg)
+struct sb_sync {
+	/*
+	 * Only sync superblocks reachable from this namespace. If NULL, sync
+	 * everything.
+	 */
+	struct mnt_namespace *mnt_ns;
+
+	/* ->sync_fs() wait argument. */
+	int wait;
+};
+
+static int sb_reachable(struct super_block *sb, struct mnt_namespace *mnt_ns)
+{
+	struct mount *mnt;
+	int ret = 0;
+
+	if (!mnt_ns)
+		return 1;
+
+	/* Leave the loop with break so mount_lock is always released. */
+	read_seqlock_excl(&mount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_ns == mnt_ns) {
+			ret = 1;
+			break;
+		}
+	}
+	read_sequnlock_excl(&mount_lock);
+	return ret;
+}
+
+static void sync_inodes_one_sb(struct super_block *sb, void *p)
 {
-	if (!sb_rdonly(sb))
+	struct sb_sync *arg = p;
+
+	if (!sb_rdonly(sb) && sb_reachable(sb, arg->mnt_ns))
 		sync_inodes_sb(sb);
 }
 
-static void sync_fs_one_sb(struct super_block *sb, void *arg)
+static void sync_fs_one_sb(struct super_block *sb, void *p)
 {
-	if (!sb_rdonly(sb) && sb->s_op->sync_fs)
-		sb->s_op->sync_fs(sb, *(int *)arg);
+	struct sb_sync *arg = p;
+
+	if (!sb_rdonly(sb) && sb_reachable(sb, arg->mnt_ns) &&
+	    sb->s_op->sync_fs)
+		sb->s_op->sync_fs(sb, arg->wait);
 }
 
 static void fdatawrite_one_bdev(struct block_device *bdev, void *arg)
@@ -107,12 +143,18 @@ static void fdatawait_one_bdev(struct block_device *bdev, void *arg)
  */
 SYSCALL_DEFINE0(sync)
 {
-	int nowait = 0, wait = 1;
+	struct sb_sync arg = {
+		.mnt_ns = current->nsproxy->mnt_ns,
+	};
+
+	if (arg.mnt_ns == init_task.nsproxy->mnt_ns)
+		arg.mnt_ns = NULL;
 
 	wakeup_flusher_threads(WB_REASON_SYNC);
-	iterate_supers(sync_inodes_one_sb, NULL);
-	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &wait);
+	iterate_supers(sync_inodes_one_sb, &arg);
+	iterate_supers(sync_fs_one_sb, &arg);
+	arg.wait = 1;
+	iterate_supers(sync_fs_one_sb, &arg);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	iterate_bdevs(fdatawait_one_bdev, NULL);
 	if (unlikely(laptop_mode))
@@ -122,17 +164,17 @@ SYSCALL_DEFINE0(sync)
 
 static void do_sync_work(struct work_struct *work)
 {
-	int nowait = 0;
+	struct sb_sync arg = {};
 
 	/*
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
 	 * because they were temporarily locked
 	 */
-	iterate_supers(sync_inodes_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, &arg);
+	iterate_supers(sync_fs_one_sb, &arg);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
-	iterate_supers(sync_inodes_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, &arg);
+	iterate_supers(sync_fs_one_sb, &arg);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	printk("Emergency Sync complete\n");
 	kfree(work);
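
For anyone who wants to poke at the filtering rule outside the kernel,
here is a rough userspace model of it. This is only a sketch: the
structures are simplified stand-ins, the model_* names are hypothetical
and not part of the patch, and there is no locking, so it says nothing
about the mount_lock handling. It only illustrates that a superblock is
synced when at least one of its mounts lives in the caller's namespace,
and that a NULL namespace means "sync everything".

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct mnt_namespace {
	int id;
};

struct mount {
	struct mnt_namespace *mnt_ns;	/* namespace this mount belongs to */
	struct mount *next;		/* next mount of the same superblock */
};

struct super_block {
	struct mount *mounts;		/* models sb->s_mounts */
	int synced;			/* set when the model "syncs" this sb */
};

/* Return 1 if sb has a mount in mnt_ns, or if mnt_ns is NULL. */
static int model_sb_reachable(const struct super_block *sb,
			      const struct mnt_namespace *mnt_ns)
{
	const struct mount *mnt;

	if (!mnt_ns)
		return 1;

	for (mnt = sb->mounts; mnt; mnt = mnt->next) {
		if (mnt->mnt_ns == mnt_ns)
			return 1;
	}
	return 0;
}

/*
 * Model one pass over all superblocks: mark each reachable one as
 * synced and return how many were marked.
 */
static int model_sync(struct super_block *sbs, size_t n,
		      const struct mnt_namespace *mnt_ns)
{
	size_t i;
	int count = 0;

	for (i = 0; i < n; i++) {
		sbs[i].synced = model_sb_reachable(&sbs[i], mnt_ns);
		count += sbs[i].synced;
	}
	return count;
}
```

With two superblocks, one mounted only on the host and one mounted in
both the host and a container namespace, model_sync() called with the
container namespace marks only the shared superblock, while passing
NULL marks both.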