From patchwork Thu Feb 8 15:07:37 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirill Tkhai
X-Patchwork-Id: 10207247
Subject: [PATCH] inotify: Extend ioctl to allow to request id of new watch descriptor
From: Kirill Tkhai
To: akpm@linux-foundation.org, jack@suse.cz, amir73il@gmail.com, linux-fsdevel@vger.kernel.org, gorcunov@virtuozzo.com, ktkhai@virtuozzo.com
Date: Thu, 08 Feb 2018 18:07:37 +0300
Message-ID: <151810242614.30935.12876744458891870220.stgit@localhost.localdomain>
User-Agent: StGit/0.18
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

A watch descriptor is the id of a watch created by inotify_add_watch().
It is allocated in inotify_add_to_idr() and takes values starting
from 1. Every new inotify watch obtains the next available number
(usually old + 1), as served by idr_alloc_cyclic().

The CRIU (Checkpoint/Restore In Userspace) project supports inotify
files and restores watch descriptors with the same numbers they had
before the dump. Since there was no kernel support, we had to use
a loop to add a watch with a specific descriptor id:

	while (1) {
		int wd;

		wd = inotify_add_watch(inotify_fd, path, mask);
		if (wd < 0) {
			break;
		} else if (wd == desired_wd_id) {
			ret = 0;
			break;
		}

		inotify_rm_watch(inotify_fd, wd);
	}

(The actual code is at the link below:
 https://github.com/checkpoint-restore/criu/blob/v3.7/criu/fsnotify.c#L577)

The loop is suboptimal and very expensive, but since there was no better
kernel support, it was the only way to restore the descriptor. Fortunately,
we mostly met descriptors with small ids, and this approach worked well
enough. But recently containers whose inotify watch descriptors have
large ids have started to appear, and this approach stopped working
altogether.
When the descriptor id is around 0x34d71d6, the restoring process
spins in a busy loop for a long time, the restore hangs, and the delay
in migrating from node to node is easily noticeable.

This patch aims to solve this problem. It introduces a new ioctl,
INOTIFY_IOC_SETNEXTWD, which allows userspace to request the number
of the next created watch descriptor. It simply calls the
idr_set_cursor() primitive to populate idr::idr_next, so that the next
idr_alloc_cyclic() allocation will return this id if it is not
occupied. This is the way some other resources are restored from
userspace. For example, /proc/sys/kernel/ns_last_pid works the same
way for task pids.

The new code is under the CONFIG_CHECKPOINT_RESTORE #define, so small
systems may exclude it.

The only change in the generic inotify part is the idr_alloc_cyclic()
end argument. We had 0 there, and the idr subsystem replaced it with
INT_MAX in idr_get_free(). So the max possible id was INT_MAX (see
idr_get_free() again).

Since I need INOTIFY_IDR_END to check the ioctl's third argument, it is
better defined as a positive number. But when a non-zero value is
passed to idr_get_free(), this function decrements it. Also,
idr_alloc_cyclic() defines @end as an int argument. So it is impossible
to pass a positive @end argument to idr_alloc_cyclic() and get the
INT_MAX id. After this patch, inotify watch descriptor ids will take
numbers in [1, INT_MAX-1]; INT_MAX will be unavailable.

Signed-off-by: Kirill Tkhai
Reviewed-by: Cyrill Gorcunov
---
 fs/notify/inotify/inotify_user.c |   19 ++++++++++++++++++-
 include/uapi/linux/inotify.h     |    8 ++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 5c29bf16814f..3c824e252c84 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -44,6 +44,9 @@
 
 #include
 
+#define INOTIFY_IDR_START	1
+#define INOTIFY_IDR_END		S32_MAX
+
 /* configurable via /proc/sys/fs/inotify/ */
 static int inotify_max_queued_events __read_mostly;
 
@@ -285,6 +288,7 @@ static int inotify_release(struct inode *ignored, struct file *file)
 static long inotify_ioctl(struct file *file, unsigned int cmd,
 			  unsigned long arg)
 {
+	struct inotify_group_private_data *data __maybe_unused;
 	struct fsnotify_group *group;
 	struct fsnotify_event *fsn_event;
 	void __user *p;
@@ -293,6 +297,7 @@ static long inotify_ioctl(struct file *file, unsigned int cmd,
 
 	group = file->private_data;
 	p = (void __user *) arg;
+	data = &group->inotify_data;
 
 	pr_debug("%s: group=%p cmd=%u\n", __func__, group, cmd);
 
@@ -307,6 +312,17 @@ static long inotify_ioctl(struct file *file, unsigned int cmd,
 		spin_unlock(&group->notification_lock);
 		ret = put_user(send_len, (int __user *) p);
 		break;
+#ifdef CONFIG_CHECKPOINT_RESTORE
+	case INOTIFY_IOC_SETNEXTWD:
+		ret = -EINVAL;
+		if (arg >= INOTIFY_IDR_START && arg < INOTIFY_IDR_END) {
+			spin_lock(&data->idr_lock);
+			idr_set_cursor(&data->idr, (unsigned int)arg);
+			spin_unlock(&data->idr_lock);
+			ret = 0;
+		}
+		break;
+#endif /* CONFIG_CHECKPOINT_RESTORE */
 	}
 
 	return ret;
@@ -349,7 +365,8 @@ static int inotify_add_to_idr(struct idr *idr, spinlock_t *idr_lock,
 	idr_preload(GFP_KERNEL);
 	spin_lock(idr_lock);
 
-	ret = idr_alloc_cyclic(idr, i_mark, 1, 0, GFP_NOWAIT);
+	ret = idr_alloc_cyclic(idr, i_mark, INOTIFY_IDR_START,
+				INOTIFY_IDR_END, GFP_NOWAIT);
 	if (ret >= 0) {
 		/* we added the mark to the idr, take a reference */
 		i_mark->wd = ret;
diff --git a/include/uapi/linux/inotify.h b/include/uapi/linux/inotify.h
index 5474461683db..245489342c14 100644
--- a/include/uapi/linux/inotify.h
+++ b/include/uapi/linux/inotify.h
@@ -71,5 +71,13 @@ struct inotify_event {
 #define IN_CLOEXEC O_CLOEXEC
 #define IN_NONBLOCK O_NONBLOCK
 
+/*
+ * ioctl numbers: inotify uses 'I' prefix for all ioctls,
+ * except historical FIONREAD, which based on 'T'.
+ *
+ * INOTIFY_IOC_SETNEXTWD: set desired number of next created
+ * watch descriptor.
+ */
+#define INOTIFY_IOC_SETNEXTWD	_IOW('I', 0, __s32)
 
 #endif /* _UAPI_LINUX_INOTIFY_H */