From patchwork Fri Jan 26 22:58:39 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10187169
From: Omar Sandoval
To: linux-fsdevel@vger.kernel.org, Al Viro
Cc: kernel-team@fb.com, "Eric W . Biederman", Tejun Heo
Subject: [PATCH] fs: only sync() superblocks reachable from the current namespace
Date: Fri, 26 Jan 2018 14:58:39 -0800
Message-Id: <05434cda5cc3b461b5d70467b094904ad23fdc11.1517007510.git.osandov@fb.com>
X-Mailer: git-send-email 2.16.1
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Omar Sandoval

Currently, the sync() syscall is system-wide, so any process in a container
can cause significant I/O stalls across the system by calling sync().
This is even true for filesystems which are not accessible in the process'
mount namespace.

This patch scopes sync() to only write out filesystems reachable in the
current mount namespace, except for the initial mount namespace, which still
syncs everything to avoid surprises. This fixes the broken isolation we were
seeing here.

Signed-off-by: Omar Sandoval
---
 fs/sync.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 51 insertions(+), 14 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 6e0a2cbaf6de..bde1e3196298 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include "internal.h"
+#include "mount.h"
 
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
 			SYNC_FILE_RANGE_WAIT_AFTER)
@@ -68,16 +69,46 @@ int sync_filesystem(struct super_block *sb)
 }
 EXPORT_SYMBOL(sync_filesystem);
 
-static void sync_inodes_one_sb(struct super_block *sb, void *arg)
+struct sb_sync {
+	/*
+	 * Only sync superblocks reachable from this namespace. If NULL, sync
+	 * everything.
+	 */
+	struct mnt_namespace *mnt_ns;
+
+	/* ->sync_fs() wait argument. */
+	int wait;
+};
+
+static int sb_reachable(struct super_block *sb, struct mnt_namespace *mnt_ns)
+{
+	struct mount *mnt;
+
+	if (!mnt_ns)
+		return 1;
+
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_ns == mnt_ns)
+			return 1;
+	}
+	return 0;
+}
+
+static void sync_inodes_one_sb(struct super_block *sb, void *p)
 {
-	if (!sb_rdonly(sb))
+	struct sb_sync *arg = p;
+
+	if (!sb_rdonly(sb) && sb_reachable(sb, arg->mnt_ns))
 		sync_inodes_sb(sb);
 }
 
-static void sync_fs_one_sb(struct super_block *sb, void *arg)
+static void sync_fs_one_sb(struct super_block *sb, void *p)
 {
-	if (!sb_rdonly(sb) && sb->s_op->sync_fs)
-		sb->s_op->sync_fs(sb, *(int *)arg);
+	struct sb_sync *arg = p;
+
+	if (!sb_rdonly(sb) && sb_reachable(sb, arg->mnt_ns) &&
+	    sb->s_op->sync_fs)
+		sb->s_op->sync_fs(sb, arg->wait);
 }
 
 static void fdatawrite_one_bdev(struct block_device *bdev, void *arg)
@@ -107,12 +138,18 @@ static void fdatawait_one_bdev(struct block_device *bdev, void *arg)
  */
 SYSCALL_DEFINE0(sync)
 {
-	int nowait = 0, wait = 1;
+	struct sb_sync arg = {
+		.mnt_ns = current->nsproxy->mnt_ns,
+	};
+
+	if (arg.mnt_ns == init_task.nsproxy->mnt_ns)
+		arg.mnt_ns = NULL;
 
 	wakeup_flusher_threads(WB_REASON_SYNC);
-	iterate_supers(sync_inodes_one_sb, NULL);
-	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &wait);
+	iterate_supers(sync_inodes_one_sb, &arg);
+	iterate_supers(sync_fs_one_sb, &arg);
+	arg.wait = 1;
+	iterate_supers(sync_fs_one_sb, &arg);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	iterate_bdevs(fdatawait_one_bdev, NULL);
 	if (unlikely(laptop_mode))
@@ -122,17 +159,17 @@ SYSCALL_DEFINE0(sync)
 
 static void do_sync_work(struct work_struct *work)
 {
-	int nowait = 0;
+	struct sb_sync arg = {};
 
 	/*
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
 	 * because they were temporarily locked
 	 */
-	iterate_supers(sync_inodes_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, &arg);
+	iterate_supers(sync_fs_one_sb, &arg);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
-	iterate_supers(sync_inodes_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, &arg);
+	iterate_supers(sync_fs_one_sb, &arg);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	printk("Emergency Sync complete\n");
 	kfree(work);
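
For readers who want to experiment with the filtering rule outside the kernel, the reachability check in sb_reachable() can be modelled roughly as below. This is only a userspace sketch under stated assumptions, not kernel code: the toy_ns, toy_mount and toy_sb types, the singly linked mount list, and the toy_count_synced() helper are all hypothetical stand-ins for struct mnt_namespace, struct mount, struct super_block and the s_mounts list, and no locking is modelled.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct toy_ns {
	int id;
};

struct toy_mount {
	struct toy_ns *ns;		/* namespace this mount belongs to */
	struct toy_mount *next;		/* next mount of the same superblock */
};

struct toy_sb {
	int rdonly;			/* nonzero if the superblock is read-only */
	struct toy_mount *mounts;	/* all mounts of this superblock */
};

/* Mirrors sb_reachable(): a NULL namespace means "match everything". */
int toy_sb_reachable(const struct toy_sb *sb, const struct toy_ns *ns)
{
	const struct toy_mount *mnt;

	if (!ns)
		return 1;

	for (mnt = sb->mounts; mnt; mnt = mnt->next) {
		if (mnt->ns == ns)
			return 1;
	}
	return 0;
}

/* Count the superblocks that a sync() issued from @ns would write out. */
size_t toy_count_synced(const struct toy_sb *sbs, size_t n,
			const struct toy_ns *ns)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (!sbs[i].rdonly && toy_sb_reachable(&sbs[i], ns))
			count++;
	}
	return count;
}
```

Passing NULL as the namespace corresponds to the initial mount namespace case in the patch, where every writable superblock is still synced.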