[v2] fs: add new O_MNT flag for opening mount root from mountpoint fd

Imagine that we have an open fd, dfd, on a directory (or file), and a
new mount, mnt, is created with this directory as its mountpoint. Before
this patch we had no way to access the contents of mnt through
dfd.

You would say: who cares, we can just open it by path. But actually that
is not always possible: one can create what I call a "propagation trap",
where a propagation of mnt overmounts mnt and makes it unresolvable with
a simple open right after creation.

You can say: just pre-open dfd's parent directory, pdfd, like you
did with dfd, and you will have access to mnt. But that is not generic,
e.g. if the mountpoint is '/', it can't have a pdfd. This pdfd
pre-open also does not work when you want to create a mount under some
other mount (this can happen through propagation): there is currently no
way to access the root of such a mount after it was created. (*)

To be extra safe here, add a check that the new path which will be
opened with O_MNT does not end up under an MNT_LOCKED mount and can be
accessed. Currently I see no way to get such an fd under a locked mount,
but it is better to have a precaution here.

But why do I actually need this:

When we recreate a mount tree in CRIU, we do it by recreating one mount at
a time (we don't have mount-save / mount-restore like with iptables) and
it is quite hard to determine the right order in which mounts should be
restored: if we mount mnt it can hide directories under its mountpoint,
so do we need to first create all mounts under mnt's mountpoint and
only after that mount mnt, or can all mounts under mnt be propagated
so that we can safely mount mnt now? Moreover, if mnt is not mounted, it
can also block other mounts with other "dependencies" (for example, mnt's
child can be in a propagation group with some of mnt's undermounts and
they need to be mounted as one), and if we choose the wrong order we can
get a circular dependency and fail.

So it would be easier for us if we could create mounts in the file tree
even if the mountpoint is invisible from root. One way this could
be done is: first, to have an open fd to the mountpoint under each mount;
second, to have an open fd to each mount root.

More precisely, the algorithm is:
a) openat mpfd to a new mountpoint through the parent mount's root,
p_rootfd (which we already have), or through the mountpoint fd under a
sibling mount, s_mpfd, if our mountpoint is already overmounted.
b) create a new mount on mpfd via the /proc/<pid>/fd/<N> interface
c) openat its rootfd via O_MNT from mpfd

If we have mpfd and rootfd for each mount, then through the
/proc/<pid>/fd/<N> interface we will be able to bind-mount any part of
any already created mount to restore other mounts, and we will be able to
configure mounts, e.g. change sharing or other options, even if the
mounts are invisible from fs-root.
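
As a rough sketch of what that could look like from userspace, assuming
rootfd and mpfd were obtained as described: the helper names fd_path(),
bind_from_fd() and make_shared() are hypothetical, and /proc/self stands in
for /proc/<pid>. Only the standard mount(2) flags MS_BIND and MS_SHARED are
used.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Build "/proc/self/fd/<fd>" or, if sub is non-empty,
 * "/proc/self/fd/<fd>/<sub>". Returns 0 on success, -1 if buf is too small.
 */
int fd_path(int fd, const char *sub, char *buf, size_t len)
{
	int n;

	if (sub && *sub)
		n = snprintf(buf, len, "/proc/self/fd/%d/%s", fd, sub);
	else
		n = snprintf(buf, len, "/proc/self/fd/%d", fd);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Bind-mount <src_rootfd>/<sub> onto the directory behind dst_mpfd. */
int bind_from_fd(int src_rootfd, const char *sub, int dst_mpfd)
{
	char src[256], dst[64];

	if (fd_path(src_rootfd, sub, src, sizeof(src)) < 0 ||
	    fd_path(dst_mpfd, NULL, dst, sizeof(dst)) < 0)
		return -1;
	return mount(src, dst, NULL, MS_BIND, NULL);
}

/* Make the mount whose root is rootfd shared. */
int make_shared(int rootfd)
{
	char path[64];

	if (fd_path(rootfd, NULL, path, sizeof(path)) < 0)
		return -1;
	return mount(NULL, path, NULL, MS_SHARED, NULL);
}
```

Neither call needs the source or the target to be reachable by a path from
the filesystem root; the fds are the only handles used.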

Here is an example of how O_MNT works:

  #term1
	  #term2

  mkdir /test-mounts
  mount -t tmpfs tmpfs-test-mounts /test-mounts
  mount --make-private /test-mounts
  cd /test-mounts/
  mkdir sh1 sh2
  mount -t tmpfs tmpfs_sh sh1
  mount --make-shared sh1
  mkdir sh1/mp
  touch sh1/mp/1

	  ./test_o_mnt /test-mounts/sh1/mp

  mount -t tmpfs tmpfs_mp sh1/mp
  touch sh1/mp/2

	  input

  mount --bind sh1 sh2
  mount -t tmpfs tmpfs_prop sh2/mp
  touch sh2/mp/3

	  input

And now through the fds we have access to all three files:

  ls /proc/3799/fd/*
  /proc/3799/fd/0  /proc/3799/fd/1  /proc/3799/fd/2

  /proc/3799/fd/3:
  1

  /proc/3799/fd/4:
  2

  /proc/3799/fd/5:
  1

  /proc/3799/fd/6:
  3

  /proc/3799/fd/7:
  1
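
To double-check which mount each of these fds belongs to, one could read the
mnt_id field from /proc/<pid>/fdinfo/<N>. The sketch below is an
illustration only: parse_mnt_id() and fd_mnt_id() are hypothetical helper
names, and /proc/self is used so that a process inspects its own fds.

```c
#include <stdio.h>
#include <string.h>

/* Extract the "mnt_id:" value from fdinfo text; -1 if it is missing. */
int parse_mnt_id(const char *text)
{
	const char *p = strstr(text, "mnt_id:");
	int id;

	if (!p || sscanf(p, "mnt_id: %d", &id) != 1)
		return -1;
	return id;
}

/* Return the mount ID that fd lives on, or -1 on error. */
int fd_mnt_id(int fd)
{
	char path[64], text[1024];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(text, 1, sizeof(text) - 1, f);
	fclose(f);
	text[n] = '\0';
	return parse_mnt_id(text);
}
```

Two fds opened with and without O_MNT on the same dfd are expected to report
different mount IDs once something is mounted on top of dfd.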

Code of test_o_mnt.c:

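  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>

  /* asm-generic value from this patch; alpha, parisc and sparc use their own */
  #define O_MNT 040000000

  int main(int argc, char **argv)
  {
  	int dfd, fd, fd2;

  	if (argc != 2) {
  		printf("usage: %s <path/under/mountpoint>\n", argv[0]);
  		return 1;
  	}

  	/* fd 3: the future mountpoint, opened before anything is mounted on it */
  	dfd = open(argv[1], O_DIRECTORY);
  	if (dfd < 0) {
  		perror("open");
  		return 1;
  	}

  	/* wait until term1 has mounted tmpfs_mp on the mountpoint */
  	scanf("%*s");

  	/* fd 4: with O_MNT we get the root of the mount on top of dfd */
  	fd = openat(dfd, ".", O_DIRECTORY | O_MNT);
  	if (fd < 0) {
  		perror("openat O_MNT");
  		return 1;
  	}

  	/* fd 5: without O_MNT we stay in the directory under the mount */
  	fd2 = openat(dfd, ".", O_DIRECTORY);
  	if (fd2 < 0) {
  		perror("openat");
  		return 1;
  	}

  	/* wait until term1 has created the propagated tmpfs_prop mount */
  	scanf("%*s");

  	/* fd 6: O_MNT now yields the mount that propagation put on dfd */
  	fd = openat(dfd, ".", O_DIRECTORY | O_MNT);
  	if (fd < 0) {
  		perror("openat O_MNT");
  		return 1;
  	}

  	/* fd 7: still the plain directory; earlier fds are kept open on purpose */
  	fd2 = openat(dfd, ".", O_DIRECTORY);
  	if (fd2 < 0) {
  		perror("openat");
  		return 1;
  	}

  	/* sleep so the fds can be inspected via /proc/<pid>/fd */
  	pause();

  	return 0;
  }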

v2: add non-conflicting O_MNT values for alpha, parisc and sparc

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 arch/alpha/include/uapi/asm/fcntl.h  |  1 +
 arch/parisc/include/uapi/asm/fcntl.h |  1 +
 arch/sparc/include/uapi/asm/fcntl.h  |  1 +
 fs/fcntl.c                           |  2 +-
 fs/namei.c                           | 66 ++++++++++++++++++++++++++++
 fs/open.c                            |  2 +
 include/linux/fcntl.h                |  2 +-
 include/linux/namei.h                |  1 +
 include/uapi/asm-generic/fcntl.h     |  4 ++
 9 files changed, 78 insertions(+), 2 deletions(-)

Message-Id: <20191114090454.27903-1-ptikhomirov@virtuozzo.com>
State: New, archived
From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Layton <jlayton@kernel.org>,
    "J . Bruce Fields" <bfields@fieldses.org>,
    Arnd Bergmann <arnd@arndb.de>,
    Paul Moore <paul@paul-moore.com>,
    Richard Guy Briggs <rgb@redhat.com>,
    linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org,
    linux-arch@vger.kernel.org,
    Pavel Tikhomirov <ptikhomirov@virtuozzo.com>,
    Andrei Vagin <avagin@gmail.com>
Subject: [PATCH v2] fs: add new O_MNT flag for opening mount root from mountpoint fd
Date: Thu, 14 Nov 2019 12:04:54 +0300
