From patchwork Wed Sep 19 20:50:17 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Dmitry Safonov
X-Patchwork-Id: 10606553
From: Dmitry Safonov
To: linux-kernel@vger.kernel.org
Cc: Dmitry Safonov <0x7f454c46@gmail.com>, Dmitry Safonov, Adrian Reber,
 Andrei Vagin, Andy Lutomirski, Christian Brauner, Cyrill Gorcunov,
 "Eric W. Biederman", "H. Peter Anvin", Ingo Molnar, Jeff Dike,
 Oleg Nesterov, Pavel Emelyanov, Shuah Khan, Thomas Gleixner,
 containers@lists.linux-foundation.org, criu@openvz.org,
 linux-api@vger.kernel.org, x86@kernel.org, Alexey Dobriyan,
 linux-kselftest@vger.kernel.org
Subject: [RFC 00/20] ns: Introduce Time Namespace
Date: Wed, 19 Sep 2018 21:50:17 +0100
Message-Id: <20180919205037.9574-1-dima@arista.com>

Time virtualization has been discussed for a long time. The first
attempt to implement a time namespace was made in 2006 by Jeff Dike.
Since then, the topic has come up on and off in various discussions.

There are two main use cases for time namespaces:
1. change the date and time inside a container;
2. adjust clocks for a container restored from a checkpoint.

“It seems like this might be one of the last major obstacles keeping
migration from being used in production systems, given that not all
containers and connections can be migrated as long as a time dependency
is capable of messing it up.” (by github.com/dav-ell)

The kernel provides access to several clocks: CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME. The last two clocks are monotonic, but
their starting points are not defined and differ between running
systems. When a container is migrated from one node to another, all
clocks have to be restored to a consistent state; in other words, they
have to continue running from the points at which they were dumped.

The main idea behind this patch set is to add per-namespace offsets for
system clocks. When a process in a non-root time namespace reads a
clock, the namespace offset is added to the current value of that clock
on the host and the sum is returned.

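The offset arithmetic described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not code from the patch set: the helper names (ts_normalize, restore_offset, ns_clock_read) are hypothetical. It assumes that on restore the offset is chosen as the dumped clock value minus the destination host's current value, so that host value plus offset continues from the dumped point.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Bring tv_nsec back into [0, NSEC_PER_SEC) after an add or subtract. */
static struct timespec ts_normalize(int64_t sec, int64_t nsec)
{
    struct timespec ts;

    while (nsec >= NSEC_PER_SEC) {
        nsec -= NSEC_PER_SEC;
        sec++;
    }
    while (nsec < 0) {
        nsec += NSEC_PER_SEC;
        sec--;
    }
    ts.tv_sec = (time_t)sec;
    ts.tv_nsec = (long)nsec;
    return ts;
}

/*
 * Hypothetical restore step: pick the offset so that
 * host_now + offset == dumped, i.e. the clock resumes where it stopped.
 * The offset may be negative if the new host has been up longer.
 */
static struct timespec restore_offset(struct timespec dumped,
                                      struct timespec host_now)
{
    return ts_normalize((int64_t)dumped.tv_sec - host_now.tv_sec,
                        (int64_t)dumped.tv_nsec - host_now.tv_nsec);
}

/* Hypothetical read path: value seen inside the namespace. */
static struct timespec ns_clock_read(struct timespec host_now,
                                     struct timespec offset)
{
    /* The offset is expected to be stored in normalized form. */
    assert(offset.tv_nsec >= 0 && offset.tv_nsec < NSEC_PER_SEC);
    return ts_normalize((int64_t)host_now.tv_sec + offset.tv_sec,
                        (int64_t)host_now.tv_nsec + offset.tv_nsec);
}
```

For example, a container dumped at a CLOCK_MONOTONIC value of 100.2 s and restored on a host whose clock reads 5000.9 s would get an offset of -4900.7 s, stored as tv_sec = -4901 and tv_nsec = 300000000. Reads inside the namespace then start at 100.2 s and advance at the host's rate.
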
All offsets are placed on a separate page, which allows us to map it
into user processes as part of vvar and to use the offsets from vDSO
calls.

Currently, offsets are implemented for the CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks.

Questions to discuss:

* Clone flags exhaustion. Currently there is only one unused clone flag
  bit left, and it may be worth using it to extend the arguments of the
  clone system call.

* Realtime clock implementation details:
  Is having a simple offset enough?
  What should happen when the date and time are changed on the host?
  Is there a need to adjust vfs modification and creation times?
  How should the adjtime() syscall be implemented?

Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Adrian Reber
Cc: Andrei Vagin
Cc: Andy Lutomirski
Cc: Christian Brauner
Cc: Cyrill Gorcunov
Cc: "Eric W. Biederman"
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jeff Dike
Cc: Oleg Nesterov
Cc: Pavel Emelyanov
Cc: Shuah Khan
Cc: Thomas Gleixner
Cc: containers@lists.linux-foundation.org
Cc: criu@openvz.org
Cc: linux-api@vger.kernel.org
Cc: x86@kernel.org

Andrei Vagin (12):
  ns: Introduce Time Namespace
  timens: Add timens_offsets
  timens: Introduce CLOCK_MONOTONIC offsets
  timens: Introduce CLOCK_BOOTTIME offset
  timerfd/timens: Take into account ns clock offsets
  kernel: Take into account timens clock offsets in clock_nanosleep
  x86/vdso/timens: Add offsets page in vvar
  x86/vdso: Use set_normalized_timespec() to avoid 32 bit overflow
  posix-timers/timens: Take into account clock offsets
  selftest/timens: Add test for timerfd
  selftest/timens: Add test for clock_nanosleep
  timens/selftest: Add timer offsets test

Dmitry Safonov (8):
  timens: Shift /proc/uptime
  x86/vdso: Restrict splitting vvar vma
  x86/vdso: Purge timens page on setns()/unshare()/clone()
  x86/vdso: Look for vvar vma to purge timens page
  timens: Add align for timens_offsets
  timens: Optimize zero-offsets
  selftest: Add Time Namespace test for supported clocks
  timens/selftest: Add procfs selftest

 arch/Kconfig                                     |   5 +
 arch/x86/Kconfig                                 |   1 +
 arch/x86/entry/vdso/vclock_gettime.c             |  52 +++++
 arch/x86/entry/vdso/vdso-layout.lds.S            |   9 +-
 arch/x86/entry/vdso/vdso2c.c                     |   3 +
 arch/x86/entry/vdso/vma.c                        |  67 +++++++
 arch/x86/include/asm/vdso.h                      |   2 +
 fs/proc/namespaces.c                             |   3 +
 fs/proc/uptime.c                                 |   3 +
 fs/timerfd.c                                     |  16 +-
 include/linux/nsproxy.h                          |   1 +
 include/linux/proc_ns.h                          |   1 +
 include/linux/time_namespace.h                   |  72 +++++++
 include/linux/timens_offsets.h                   |  25 +++
 include/linux/user_namespace.h                   |   1 +
 include/uapi/linux/sched.h                       |   1 +
 init/Kconfig                                     |   8 +
 kernel/Makefile                                  |   1 +
 kernel/fork.c                                    |   3 +-
 kernel/nsproxy.c                                 |  19 +-
 kernel/time/hrtimer.c                            |   8 +
 kernel/time/posix-timers.c                       |  89 ++++++++-
 kernel/time/posix-timers.h                       |   2 +
 kernel/time_namespace.c                          | 230 +++++++++++++++++++++++
 tools/testing/selftests/timens/.gitignore        |   5 +
 tools/testing/selftests/timens/Makefile          |   6 +
 tools/testing/selftests/timens/clock_nanosleep.c |  98 ++++++++++
 tools/testing/selftests/timens/config            |   1 +
 tools/testing/selftests/timens/log.h             |  21 +++
 tools/testing/selftests/timens/procfs.c          | 145 ++++++++++++++
 tools/testing/selftests/timens/timens.c          | 196 +++++++++++++++++++
 tools/testing/selftests/timens/timer.c           |  95 ++++++++++
 tools/testing/selftests/timens/timerfd.c         |  96 ++++++++++
 33 files changed, 1272 insertions(+), 13 deletions(-)
 create mode 100644 include/linux/time_namespace.h
 create mode 100644 include/linux/timens_offsets.h
 create mode 100644 kernel/time_namespace.c
 create mode 100644 tools/testing/selftests/timens/.gitignore
 create mode 100644 tools/testing/selftests/timens/Makefile
 create mode 100644 tools/testing/selftests/timens/clock_nanosleep.c
 create mode 100644 tools/testing/selftests/timens/config
 create mode 100644 tools/testing/selftests/timens/log.h
 create mode 100644 tools/testing/selftests/timens/procfs.c
 create mode 100644 tools/testing/selftests/timens/timens.c
 create mode 100644 tools/testing/selftests/timens/timer.c
 create mode 100644 tools/testing/selftests/timens/timerfd.c