From patchwork Thu Apr 27 10:57:16 2017
X-Patchwork-Submitter: Kirill Tkhai
X-Patchwork-Id: 9702727
Subject: [PATCH v2] pid_ns: Introduce ioctl to set vector of ns_last_pid's on ns hierarhy
From: Kirill Tkhai
Date: Thu, 27 Apr 2017 13:57:16 +0300
Message-ID: <149329053642.12846.16389129928422677700.stgit@localhost.localdomain>
X-Mailing-List: linux-fsdevel@vger.kernel.org

While implementing nested pid namespace support in CRIU (Checkpoint/Restore In Userspace), we ran into a problem: there is no efficient way to create a task with a specific NSpid.

After commit 49f4d8b93ccf ("pidns: Capture the user namespace and filter ns_last_pid"), ns_last_pid can be set only on the task's active pid_ns and on no other pid namespace (before that commit, it was possible to write to pid_ns_for_children). Thus, if a task being restored in a container has more than one pid_ns level, the restorer must keep a helper task in every pid namespace of that task's pid_ns hierarchy.

This is a big problem, because communicating with a helper in every pid_ns of the hierarchy is not cheap: creating a single task requires waking up many helpers, regardless of how you communicate with them.

This patch tries to solve the problem. It introduces a new pid_ns ioctl, NS_SET_LAST_PID_VEC, which sets a vector of last pids on a pid_ns hierarchy. The vector is passed as an array of pids in struct ns_ioc_pid_vec, in reverse order (innermost namespace first): the first number is ns_last_pid of the opened namespace, the second is for its parent, and so on.

So, if you have a pid namespace hierarchy like:

    pid_ns1 (grandparent)
       |
       v
    pid_ns2 (parent)
       |
       v
    pid_ns3 (child)

and pid_ns3 is open, then the corresponding vector is {last_ns_pid3, last_ns_pid2, last_ns_pid1}. The vector may be shorter and cover fewer levels, for example {last_ns_pid3, last_ns_pid2} or even {last_ns_pid3}, depending on which levels you want to populate.

v2: Kill the pid_ns->child_reaper check, as it's impossible to have such a pid namespace file open.
    Use the generic namespaces ioctl() number.
    Pass pids as an array, not as a string.
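A userspace caller of the proposed ioctl might look roughly like the sketch below. It assumes NS_SET_LAST_PID_VEC and struct ns_ioc_pid_vec are copied from this patch (they are not assumed to exist in installed kernel headers), takes NSIO = 0xb7 from include/uapi/linux/nsfs.h, and uses the hypothetical helper names build_pid_vec() and set_last_pid_vec(). Following the ordering described above, pids[0] targets the opened namespace and later entries walk up through its ancestors; a kernel without this patch should be expected to reject the request (most likely with ENOTTY).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>

#ifndef NSIO
#define NSIO 0xb7	/* value from include/uapi/linux/nsfs.h */
#endif
#ifndef NS_SET_LAST_PID_VEC
/* Proposed by this patch; assumed absent from installed headers. */
#define NS_SET_LAST_PID_VEC _IO(NSIO, 0x5)
#endif

/* Userspace copy of the patch's struct; the C99 flexible array pid[]
 * has the same layout as the kernel's zero-length pid[0]. */
struct ns_ioc_pid_vec {
	unsigned int nr;
	pid_t pid[];
};

/* Hypothetical helper: pids[0] is ns_last_pid for the opened (innermost)
 * namespace, pids[1] for its parent, and so on. Caller frees. */
struct ns_ioc_pid_vec *build_pid_vec(const pid_t *pids, unsigned int nr)
{
	struct ns_ioc_pid_vec *vec;

	/* Same bounds the kernel side enforces: 1..32 levels. */
	if (nr < 1 || nr > 32) {
		errno = EINVAL;
		return NULL;
	}
	assert(pids != NULL);

	vec = malloc(sizeof(*vec) + nr * sizeof(vec->pid[0]));
	if (!vec)
		return NULL;	/* malloc() sets errno to ENOMEM */
	vec->nr = nr;
	for (unsigned int i = 0; i < nr; i++)
		vec->pid[i] = pids[i];
	return vec;
}

/* Hypothetical helper: ns_fd is an open pid namespace file, such as
 * /proc/<pid>/ns/pid. Returns 0 on success, -1 with errno set. */
int set_last_pid_vec(int ns_fd, const pid_t *pids, unsigned int nr)
{
	struct ns_ioc_pid_vec *vec = build_pid_vec(pids, nr);
	int ret, saved_errno;

	if (!vec)
		return -1;
	ret = ioctl(ns_fd, NS_SET_LAST_PID_VEC, vec);
	saved_errno = errno;	/* keep the ioctl's errno across free() */
	free(vec);
	errno = saved_errno;
	return ret;
}
```

For instance, to have the next task whose innermost namespace is pid_ns3 receive NSpid 5, 17 and 42 in pid_ns3, pid_ns2 and pid_ns1, a restorer would pass {4, 16, 41}, on the assumption that the pid allocator starts searching at last_pid + 1 and those pids are free.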
Signed-off-by: Kirill Tkhai
---
 fs/nsfs.c                     |  5 +++++
 include/linux/pid_namespace.h | 12 ++++++++++++
 include/uapi/linux/nsfs.h     |  7 +++++++
 kernel/pid_namespace.c        | 35 +++++++++++++++++++++++++++++++++++
 4 files changed, 59 insertions(+)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 323f492e0822..f669a1552003 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -186,6 +187,10 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl,
 		argp = (uid_t __user *) arg;
 		uid = from_kuid_munged(current_user_ns(), user_ns->owner);
 		return put_user(uid, argp);
+	case NS_SET_LAST_PID_VEC:
+		if (ns->ops->type != CLONE_NEWPID)
+			return -EINVAL;
+		return pidns_set_last_pid_vec(ns, (void *)arg);
 	default:
 		return -ENOTTY;
 	}
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index c2a989dee876..c8dc4173a4e8 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 
 struct pidmap {
 	atomic_t nr_free;
@@ -103,6 +104,17 @@ static inline int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
 }
 #endif /* CONFIG_PID_NS */
 
+#if defined(CONFIG_PID_NS) && defined(CONFIG_CHECKPOINT_RESTORE)
+extern long pidns_set_last_pid_vec(struct ns_common *ns,
+				   struct ns_ioc_pid_vec __user *vec);
+#else /* CONFIG_PID_NS && CONFIG_CHECKPOINT_RESTORE */
+static inline long pidns_set_last_pid_vec(struct ns_common *ns,
+					  struct ns_ioc_pid_vec __user *vec)
+{
+	return -ENOTTY;
+}
+#endif /* CONFIG_PID_NS && CONFIG_CHECKPOINT_RESTORE */
+
 extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk);
 void pidhash_init(void);
 void pidmap_init(void);
diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
index 1a3ca79f466b..59adf97d1d63 100644
--- a/include/uapi/linux/nsfs.h
+++ b/include/uapi/linux/nsfs.h
@@ -14,5 +14,12 @@
 #define NS_GET_NSTYPE		_IO(NSIO, 0x3)
 /* Get owner UID (in the caller's user namespace) for a user namespace */
 #define NS_GET_OWNER_UID	_IO(NSIO, 0x4)
+/* Set a vector of ns_last_pid for a pid namespace stack */
+#define NS_SET_LAST_PID_VEC	_IO(NSIO, 0x5)
+
+struct ns_ioc_pid_vec {
+	unsigned int nr;
+	pid_t pid[0];
+};
 
 #endif /* __LINUX_NSFS_H */
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index de461aa0bf9a..f68ee8793606 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 
 struct pid_cache {
 	int nr_ids;
@@ -428,6 +429,40 @@ static struct ns_common *pidns_get_parent(struct ns_common *ns)
 	return &get_pid_ns(pid_ns)->ns;
 }
 
+#ifdef CONFIG_CHECKPOINT_RESTORE
+long pidns_set_last_pid_vec(struct ns_common *ns,
+			    struct ns_ioc_pid_vec __user *vec)
+{
+	struct pid_namespace *pid_ns = to_pid_ns(ns);
+	pid_t pid, __user *pid_ptr;
+	unsigned int nr;
+
+	if (get_user(nr, &vec->nr))
+		return -EFAULT;
+	if (nr > 32 || nr < 1)
+		return -EINVAL;
+
+	pid_ptr = &vec->pid[0];
+	do {
+		if (!ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
+			return -EPERM;
+
+		if (get_user(pid, pid_ptr))
+			return -EFAULT;
+		if (pid < 0 || pid > pid_max)
+			return -EINVAL;
+
+		/* Write directly: see the comment in pid_ns_ctl_handler() */
+		pid_ns->last_pid = pid;
+
+		pid_ns = pid_ns->parent;
+		pid_ptr++;
+	} while (--nr > 0 && pid_ns);
+
+	return nr ? -EINVAL : 0;
+}
+#endif /* CONFIG_CHECKPOINT_RESTORE */
+
 static struct user_namespace *pidns_owner(struct ns_common *ns)
 {
 	return to_pid_ns(ns)->user_ns;
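To reason about the return codes of pidns_set_last_pid_vec(), its loop can be modelled in plain userspace C. In this sketch, struct model_pid_ns, its may_admin flag (standing in for ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN)) and the pid_max parameter (standing in for the kernel's global pid_max) are hypothetical; get_user() faults are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid_namespace: only the fields the
 * ioctl touches. may_admin models ns_capable(user_ns, CAP_SYS_ADMIN). */
struct model_pid_ns {
	int last_pid;
	int may_admin;
	struct model_pid_ns *parent;
};

/* Mirrors the control flow of pidns_set_last_pid_vec(); pids[0] belongs
 * to ns itself, pids[1] to its parent, and so on. */
long model_set_last_pid_vec(struct model_pid_ns *ns, const int *pids,
			    unsigned int nr, int pid_max)
{
	if (nr > 32 || nr < 1)
		return -EINVAL;

	do {
		if (!ns->may_admin)
			return -EPERM;
		if (*pids < 0 || *pids > pid_max)
			return -EINVAL;

		/* Written immediately: earlier levels stay updated even if
		 * a later level fails. */
		ns->last_pid = *pids;

		ns = ns->parent;
		pids++;
	} while (--nr > 0 && ns);

	/* nr != 0 here means the vector was longer than the hierarchy. */
	return nr ? -EINVAL : 0;
}
```

As the model makes visible, each level is written before the next one is checked, so an -EPERM or -EINVAL returned partway through (including for a vector longer than the hierarchy) leaves the inner levels already updated.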