From patchwork Mon Nov 5 15:43:43 2018
From: Waiman Long
To: "Luis R. Rodriguez", Kees Cook, Andrew Morton, Jonathan Corbet
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-doc@vger.kernel.org, Al Viro, Matthew Wilcox,
    "Eric W. Biederman", Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
    Waiman Long
Subject: [PATCH v10 1/4] ipc: Allow boot time extension of IPCMNI from 32k to 8M
Date: Mon, 5 Nov 2018 10:43:43 -0500
Message-Id: <1541432626-27780-2-git-send-email-longman@redhat.com>
In-Reply-To: <1541432626-27780-1-git-send-email-longman@redhat.com>

The maximum number of unique System V IPC identifiers was limited to 32k.
That limit should be big enough for most use cases. However, some users
are asking for more, especially those migrating from Solaris, which uses
24 bits for unique identifiers. To satisfy the needs of those users, a new
boot time kernel option "ipcmni_extend" is added to extend the IPCMNI
value to 8M. This is a 256X increase, which hopefully is big enough for
them.

Using this new option changes the pattern of the IPC identifiers returned
by functions like shmget(2). An application that depends on such a pattern
may not work properly. So it should only be used if the users really need
more than 32k unique IPC numbers.

This new option does have the side effect of reducing the maximum number
of unique sequence numbers from 64k down to 256. So it is a trade-off.

The computation of a new IPC id is not done in the performance critical
path. So a little bit of additional overhead shouldn't have any real
performance impact.
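The sketch below makes the bit layout concrete. It is a small userspace program, not kernel code. It packs an index and a sequence number into an ID and unpacks them again, using the same shift-and-mask arithmetic that the util.h changes introduce. The helper names `make_id()`, `id_to_idx()`, `id_to_seq()` and `seq_max()` are hypothetical stand-ins for the kernel macros.

```c
#include <limits.h>

/* Shift values mirror IPCMNI_SHIFT and IPCMNI_EXTEND_SHIFT in ipc/util.h. */
#define DEFAULT_SHIFT	15
#define EXTEND_SHIFT	23

/* Hypothetical helpers mirroring the kernel's ID arithmetic. */

/* Pack a sequence number and an index, as ipc_idr_alloc() does. */
static inline int make_id(int seq, int idx, int shift)
{
	return (seq << shift) + idx;
}

/* Low bits hold the index (cf. ipcid_to_idx()). */
static inline int id_to_idx(int id, int shift)
{
	return id & ((1 << shift) - 1);
}

/* High bits hold the sequence number (cf. ipcid_to_seqx()). */
static inline int id_to_seq(int id, int shift)
{
	return id >> shift;
}

/* Largest sequence number that keeps the ID a positive int (cf. IPCID_SEQ_MAX). */
static inline int seq_max(int shift)
{
	return INT_MAX >> shift;
}
```

With the default layout, `make_id(3, 42, DEFAULT_SHIFT)` is 98346 and `seq_max(DEFAULT_SHIFT)` is 65535. With the extended layout, `make_id(3, 42, EXTEND_SHIFT)` is 25165866 and `seq_max(EXTEND_SHIFT)` drops to 255, which is the 64k-to-256 trade-off mentioned above.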
Signed-off-by: Waiman Long
---
 Documentation/admin-guide/kernel-parameters.txt |  3 ++
 ipc/ipc_sysctl.c                                | 12 ++++++-
 ipc/util.c                                      | 10 +++---
 ipc/util.h                                      | 44 ++++++++++++++++++++-----
 4 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b90fe3b..0449e0c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1795,6 +1795,9 @@
 	ip=		[IP_PNP]
 			See Documentation/filesystems/nfs/nfsroot.txt.
 
+	ipcmni_extend	[KNL] Extend the maximum number of unique System V
+			IPC identifiers from 32,768 to 8,388,608.
+
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
 
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 49f9bf4..73b7782 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -120,7 +120,8 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int zero;
 static int one = 1;
 static int int_max = INT_MAX;
-static int ipc_mni = IPCMNI;
+int ipc_mni = IPCMNI;
+int ipc_mni_shift = IPCMNI_SHIFT;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -246,3 +247,12 @@ static int __init ipc_sysctl_init(void)
 }
 
 device_initcall(ipc_sysctl_init);
+
+static int __init ipc_mni_extend(char *str)
+{
+	ipc_mni = IPCMNI_EXTEND;
+	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	pr_info("IPCMNI extended to %d.\n", ipc_mni);
+	return 0;
+}
+early_param("ipcmni_extend", ipc_mni_extend);
diff --git a/ipc/util.c b/ipc/util.c
index 0af0575..07ae117 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -110,7 +110,7 @@ static int __init ipc_init(void)
  * @ids: ipc identifier set
  *
  * Set up the sequence range to use for the ipc identifier range (limited
- * below IPCMNI) then initialise the keys hashtable and ids idr.
+ * below ipc_mni) then initialise the keys hashtable and ids idr.
  */
 void ipc_init_ids(struct ipc_ids *ids)
 {
@@ -226,7 +226,7 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 					0, GFP_NOWAIT);
 	}
 	if (idx >= 0)
-		new->id = SEQ_MULTIPLIER * new->seq + idx;
+		new->id = (new->seq << IPCMNI_SEQ_SHIFT) + idx;
 	return idx;
 }
 
@@ -254,8 +254,8 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 	/* 1) Initialize the refcount so that ipc_rcu_putref works */
 	refcount_set(&new->refcount, 1);
 
-	if (limit > IPCMNI)
-		limit = IPCMNI;
+	if (limit > ipc_mni)
+		limit = ipc_mni;
 
 	if (ids->in_use >= limit)
 		return -ENOSPC;
@@ -738,7 +738,7 @@ static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t pos,
 	if (total >= ids->in_use)
 		return NULL;
 
-	for (; pos < IPCMNI; pos++) {
+	for (; pos < ipc_mni; pos++) {
 		ipc = idr_find(&ids->ipcs_idr, pos);
 		if (ipc != NULL) {
 			*new_pos = pos + 1;
diff --git a/ipc/util.h b/ipc/util.h
index d768fdb..640f916 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -15,8 +15,34 @@
 #include
 #include
 
-#define IPCMNI 32768  /* <= MAX_INT limit for ipc arrays (including sysctl changes) */
-#define SEQ_MULTIPLIER	(IPCMNI)
+/*
+ * The IPC ID contains 2 separate numbers - index and sequence number.
+ * By default,
+ *   bits  0-14: index (32k, 15 bits)
+ *   bits 15-30: sequence number (64k, 16 bits)
+ *
+ * When IPCMNI extension mode is turned on, the composition changes:
+ *   bits  0-22: index (8M, 23 bits)
+ *   bits 23-30: sequence number (256, 8 bits)
+ */
+#define IPCMNI_SHIFT		15
+#define IPCMNI_EXTEND_SHIFT	23
+#define IPCMNI			(1 << IPCMNI_SHIFT)
+#define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)
+
+#ifdef CONFIG_SYSVIPC_SYSCTL
+extern int ipc_mni;
+extern int ipc_mni_shift;
+
+#define IPCMNI_SEQ_SHIFT	ipc_mni_shift
+#define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
+
+#else /* CONFIG_SYSVIPC_SYSCTL */
+
+#define ipc_mni			IPCMNI
+#define IPCMNI_SEQ_SHIFT	IPCMNI_SHIFT
+#define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
+#endif /* CONFIG_SYSVIPC_SYSCTL */
 
 void sem_init(void);
 void msg_init(void);
@@ -96,9 +122,9 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
 #define IPC_MSG_IDS	1
 #define IPC_SHM_IDS	2
 
-#define ipcid_to_idx(id) ((id) % SEQ_MULTIPLIER)
-#define ipcid_to_seqx(id) ((id) / SEQ_MULTIPLIER)
-#define IPCID_SEQ_MAX min_t(int, INT_MAX/SEQ_MULTIPLIER, USHRT_MAX)
+#define ipcid_to_idx(id)	((id) & IPCMNI_IDX_MASK)
+#define ipcid_to_seqx(id)	((id) >> IPCMNI_SEQ_SHIFT)
+#define IPCID_SEQ_MAX		(INT_MAX >> IPCMNI_SEQ_SHIFT)
 
 /* must be called with ids->rwsem acquired for writing */
 int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);
@@ -123,8 +149,8 @@ static inline int ipc_get_maxidx(struct ipc_ids *ids)
 	if (ids->in_use == 0)
 		return -1;
 
-	if (ids->in_use == IPCMNI)
-		return IPCMNI - 1;
+	if (ids->in_use == ipc_mni)
+		return ipc_mni - 1;
 
 	return ids->max_idx;
 }
@@ -219,10 +245,10 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 static inline int sem_check_semmni(struct ipc_namespace *ns) {
 	/*
-	 * Check semmni range [0, IPCMNI]
+	 * Check semmni range [0, ipc_mni]
 	 * semmni is the last element of sem_ctls[4] array
 	 */
-	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > IPCMNI))
+	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > ipc_mni))
 		? -ERANGE : 0;
 }

From patchwork Mon Nov 5 15:43:44 2018
From: Waiman Long
To: "Luis R. Rodriguez", Kees Cook, Andrew Morton, Jonathan Corbet
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-doc@vger.kernel.org, Al Viro, Matthew Wilcox,
    "Eric W. Biederman", Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
    Waiman Long
Subject: [PATCH v10 2/4] ipc: Conserve sequence numbers in extended IPCMNI mode
Date: Mon, 5 Nov 2018 10:43:44 -0500
Message-Id: <1541432626-27780-3-git-send-email-longman@redhat.com>
In-Reply-To: <1541432626-27780-1-git-send-email-longman@redhat.com>

The sequence number mixed into the IPC IDs is probably there to avoid ID
reuse in userspace as much as possible. In extended IPCMNI mode, the
number of usable sequence numbers is greatly reduced, leading to a higher
chance of ID reuse.

To address this issue, we need to conserve the sequence number space as
much as possible. Right now, the sequence number is incremented for every
new ID created. In reality, the sequence number only needs to be
incremented when one or more IDs have been removed since the last
increment. That is enough to make sure the removed IDs will not be reused
when a new one is built. This is done only in the new extended IPCMNI mode.
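Here is a minimal userspace model of the rule described above. It assumes a hypothetical `struct id_space` that holds only the fields this change touches: the current sequence number and the new "deleted" flag. `next_seq()` follows the same steps as the updated `ipc_idr_alloc()` hunk, with the `conserve` argument standing in for `ipc_mni_extended`. `remove_id()` plays the role of the line added to `ipc_rmid()`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct ipc_ids used here. */
struct id_space {
	unsigned short seq;	/* current sequence number */
	bool deleted;		/* set when any ID has been removed */
	int seq_max;		/* IPCID_SEQ_MAX for the active layout */
};

/*
 * Pick the sequence number for a new ID. Without conservation the
 * counter advances on every allocation; with it, the counter advances
 * only if an ID was removed since the last advance.
 */
static inline unsigned short next_seq(struct id_space *s, bool conserve)
{
	if (!conserve || s->deleted) {
		s->seq++;
		if (s->seq > s->seq_max)
			s->seq = 0;	/* wrap around */
		s->deleted = false;
	}
	return s->seq;
}

/* Counterpart of setting ids->deleted in ipc_rmid(). */
static inline void remove_id(struct id_space *s)
{
	s->deleted = true;
}
```

In conserve mode, IDs created back to back share the same sequence number. The counter moves only after `remove_id()` has been called. With a `seq_max` of 255, as in extended mode, the counter wraps from 255 back to 0.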
Signed-off-by: Waiman Long
---
 include/linux/ipc_namespace.h |  1 +
 ipc/ipc_sysctl.c              |  2 ++
 ipc/util.c                    | 17 ++++++++++++++---
 ipc/util.h                    |  2 ++
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 6ab8c1b..7d5f553 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -16,6 +16,7 @@
 struct ipc_ids {
 	int in_use;
 	unsigned short seq;
+	unsigned short deleted;
 	struct rw_semaphore rwsem;
 	struct idr ipcs_idr;
 	int max_idx;
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 73b7782..d9ac6ca 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -122,6 +122,7 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int int_max = INT_MAX;
 int ipc_mni = IPCMNI;
 int ipc_mni_shift = IPCMNI_SHIFT;
+bool ipc_mni_extended;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
 {
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	ipc_mni_extended = true;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/util.c b/ipc/util.c
index 07ae117..6ae0007 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -115,6 +115,7 @@ static int __init ipc_init(void)
 void ipc_init_ids(struct ipc_ids *ids)
 {
 	ids->in_use = 0;
+	ids->deleted = false;
 	ids->seq = 0;
 	init_rwsem(&ids->rwsem);
 	rhashtable_init(&ids->key_ht, &ipc_kht_params);
@@ -198,6 +199,11 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 {
 	int idx, next_id = -1;
 
+/*
+ * To conserve sequence number space with extended ipc_mni when new ID
+ * is built, the sequence number is incremented only when one or more
+ * IDs have been removed previously.
+ */
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	next_id = ids->next_id;
 	ids->next_id = -1;
@@ -216,9 +222,13 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */
 
 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
-		new->seq = ids->seq++;
-		if (ids->seq > IPCID_SEQ_MAX)
-			ids->seq = 0;
+		if (!ipc_mni_extended || ids->deleted) {
+			ids->seq++;
+			if (ids->seq > IPCID_SEQ_MAX)
+				ids->seq = 0;
+			ids->deleted = false;
+		}
+		new->seq = ids->seq;
 		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
 	} else {
 		new->seq = ipcid_to_seqx(next_id);
@@ -436,6 +446,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
 	idr_remove(&ids->ipcs_idr, idx);
 	ipc_kht_remove(ids, ipcp);
 	ids->in_use--;
+	ids->deleted = true;
 	ipcp->deleted = true;
 
 	if (unlikely(idx == ids->max_idx)) {
diff --git a/ipc/util.h b/ipc/util.h
index 640f916..1f19729 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -33,6 +33,7 @@
 #ifdef CONFIG_SYSVIPC_SYSCTL
 extern int ipc_mni;
 extern int ipc_mni_shift;
+extern bool ipc_mni_extended;
 
 #define IPCMNI_SEQ_SHIFT	ipc_mni_shift
 #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
@@ -40,6 +41,7 @@
 #else /* CONFIG_SYSVIPC_SYSCTL */
 
 #define ipc_mni			IPCMNI
+#define ipc_mni_extended	false
 #define IPCMNI_SEQ_SHIFT	IPCMNI_SHIFT
 #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */

From patchwork Mon Nov 5 15:43:45 2018
From: Waiman Long
To: "Luis R. Rodriguez", Kees Cook, Andrew Morton, Jonathan Corbet
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-doc@vger.kernel.org, Al Viro, Matthew Wilcox,
    "Eric W. Biederman", Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
    Waiman Long
Subject: [PATCH v10 3/4] ipc: Make the new sequence number generation mode available to all
Date: Mon, 5 Nov 2018 10:43:45 -0500
Message-Id: <1541432626-27780-4-git-send-email-longman@redhat.com>
In-Reply-To: <1541432626-27780-1-git-send-email-longman@redhat.com>

The new IPC id sequence number increment mode can also help reduce the
chance of IPC id reuse when the ipcmni_extend boot command line parameter
isn't specified. So a new ipcid_mode sysctl parameter is added to select
the sequence number generation mode: legacy or delete.

In the legacy mode, the sequence number is incremented every time a new ID
is generated. In the delete mode, the number is incremented only if one or
more IDs have been previously deleted. The default is the legacy mode
without ipcmni_extend and the delete mode with it.

The new ipcid_mode parameter is set separately for each IPC namespace.
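The sketch below illustrates the practical difference between the two modes. It counts how many sequence-number increments a hypothetical workload consumes: `rounds` rounds, each creating `n` IDs and then deleting all of them. The local enum mirrors the `enum ipc_id_mode` added by this patch, and `bump_seq()` applies the same test as the updated `ipc_idr_alloc()`. The workload and function names are illustrative only, and wrap-around is ignored.

```c
#include <stdbool.h>

/* Mirrors the two values of enum ipc_id_mode added by this patch. */
enum ipc_id_mode {
	ipc_id_legacy,	/* sequence # incremented on every allocation */
	ipc_id_delete,	/* sequence # incremented only after a deletion */
};

/* Same condition the updated ipc_idr_alloc() checks before bumping seq. */
static inline bool bump_seq(enum ipc_id_mode mode, bool deleted)
{
	return mode == ipc_id_legacy || deleted;
}

/*
 * Hypothetical workload: @rounds rounds, each creating @n IDs and then
 * deleting all of them. Returns the number of sequence increments,
 * ignoring wrap-around.
 */
static inline long seq_increments(enum ipc_id_mode mode, int n, int rounds)
{
	bool deleted = false;
	long bumps = 0;

	for (int r = 0; r < rounds; r++) {
		for (int i = 0; i < n; i++) {
			if (bump_seq(mode, deleted)) {
				bumps++;
				deleted = false;	/* flag is consumed by the bump */
			}
		}
		deleted = true;	/* every ID created this round is removed */
	}
	return bumps;
}
```

For 10 rounds of 100 IDs, legacy mode consumes 1000 increments while delete mode consumes 9. Extended IPCMNI mode has only 256 sequence numbers, so in this workload legacy mode wraps the counter after roughly 256 creations. Delete mode needs roughly 256 create/delete rounds to wrap.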
Signed-off-by: Waiman Long
---
 Documentation/sysctl/kernel.txt | 17 +++++++++++++++++
 include/linux/ipc_namespace.h   | 12 ++++++++++++
 ipc/ipc_sysctl.c                | 10 ++++++++++
 ipc/msg.c                       |  3 ++-
 ipc/namespace.c                 |  2 ++
 ipc/sem.c                       |  3 ++-
 ipc/shm.c                       |  3 ++-
 ipc/util.c                      | 10 ++++++----
 ipc/util.h                      |  3 ++-
 9 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 37a6795..91bada1 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -41,6 +41,7 @@ show up in /proc/sys/kernel:
 - hung_task_check_interval_secs
 - hung_task_warnings
 - hyperv_record_panic_msg
+- ipcid_mode
 - kexec_load_disabled
 - kptr_restrict
 - l2cr                        [ PPC only ]
@@ -398,6 +399,22 @@ Controls whether the panic kmsg data should be reported to Hyper-V.
 
 ==============================================================
 
+ipcid_mode:
+
+Controls how the IPC ids returned by msgget(), semget() and shmget()
+are being generated.
+
+0: legacy mode
+1: delete mode
+
+There are two components in an IPC id - an integer identifier and a
+sequence number. In the legacy mode, the sequence number is incremented
+every time a new id is generated. In the delete mode, the sequence number
+is only incremented if one or more ids have been previously deleted. The
+delete mode reduces the chance that a given id will be reused again.
+
+==============================================================
+
 kexec_load_disabled:
 
 A toggle indicating if the kexec_load syscall has been disabled. This
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 7d5f553..79d9d50 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -26,6 +26,15 @@ struct ipc_ids {
 	struct rhashtable key_ht;
 };
 
+/*
+ * IPC id generation mode for controlling how the IPC id returned by
+ * {msg,sem,shm}get() is being generated.
+ */
+enum ipc_id_mode {
+	ipc_id_legacy,	/* Sequence # incremented on every allocation */
+	ipc_id_delete,	/* Sequence # incremented only if an ID was deleted */
+};
+
 struct ipc_namespace {
 	refcount_t	count;
 	struct ipc_ids	ids[3];
@@ -39,6 +48,9 @@ struct ipc_namespace {
 	atomic_t	msg_bytes;
 	atomic_t	msg_hdrs;
 
+	/* IPC id generation mode */
+	unsigned int	ipcid_mode;
+
 	size_t		shm_ctlmax;
 	size_t		shm_ctlall;
 	unsigned long	shm_tot;
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index d9ac6ca..4c30e62 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -200,6 +200,15 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 		.mode		= 0644,
 		.proc_handler	= proc_ipc_sem_dointvec,
 	},
+	{
+		.procname	= "ipcid_mode",
+		.data		= &init_ipc_ns.ipcid_mode,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_ipc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	{
 		.procname	= "sem_next_id",
@@ -254,6 +263,7 @@ static int __init ipc_mni_extend(char *str)
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
 	ipc_mni_extended = true;
+	init_ipc_ns.ipcid_mode = ipc_id_delete;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/msg.c b/ipc/msg.c
index 0833c64..b401ba2 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -161,7 +161,8 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
 	INIT_LIST_HEAD(&msq->q_senders);
 
 	/* ipc_addid() locks msq upon success. */
-	retval = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni);
+	retval = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni,
+			   ns->ipcid_mode);
 	if (retval < 0) {
 		ipc_rcu_putref(&msq->q_perm, msg_rcu_free);
 		return retval;
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 2160779..8b62cbd0 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -55,6 +55,8 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	ns->user_ns = get_user_ns(user_ns);
 	ns->ucounts = ucounts;
 
+	ns->ipcid_mode = ipc_mni_extended ? ipc_id_delete : ipc_id_legacy;
+
 	err = mq_init_ns(ns);
 	if (err)
 		goto fail_put;
diff --git a/ipc/sem.c b/ipc/sem.c
index 745dc61..51af634 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -553,7 +553,8 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
 	sma->sem_ctime = ktime_get_real_seconds();
 
 	/* ipc_addid() locks sma upon success. */
-	retval = ipc_addid(&sem_ids(ns), &sma->sem_perm, ns->sc_semmni);
+	retval = ipc_addid(&sem_ids(ns), &sma->sem_perm, ns->sc_semmni,
+			   ns->ipcid_mode);
 	if (retval < 0) {
 		ipc_rcu_putref(&sma->sem_perm, sem_rcu_free);
 		return retval;
diff --git a/ipc/shm.c b/ipc/shm.c
index 0842411..182ae4c 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -676,7 +676,8 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 	shp->shm_creator = current;
 
 	/* ipc_addid() locks shp upon success. */
-	error = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni);
+	error = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni,
+			  ns->ipcid_mode);
 	if (error < 0)
 		goto no_id;
 
diff --git a/ipc/util.c b/ipc/util.c
index 6ae0007..04c8e31 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -195,7 +195,8 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
  * The caller must own kern_ipc_perm.lock.of the new object.
  * On error, the function returns a (negative) error code.
  */
-static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
+static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new,
+				int idmode)
 {
 	int idx, next_id = -1;
 
@@ -222,7 +223,7 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */
 
 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
-		if (!ipc_mni_extended || ids->deleted) {
+		if (idmode == ipc_id_legacy || ids->deleted) {
 			ids->seq++;
 			if (ids->seq > IPCID_SEQ_MAX)
 				ids->seq = 0;
@@ -255,7 +256,8 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
  *
  * Called with writer ipc_ids.rwsem held.
  */
-int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
+int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit,
+	      int idmode)
 {
 	kuid_t euid;
 	kgid_t egid;
@@ -282,7 +284,7 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 
 	new->deleted = false;
 
-	idx = ipc_idr_alloc(ids, new);
+	idx = ipc_idr_alloc(ids, new, idmode);
 	idr_preload_end();
 
 	if (idx >= 0 && new->key != IPC_PRIVATE) {
diff --git a/ipc/util.h b/ipc/util.h
index 1f19729..c8f2d0ed 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -129,7 +129,8 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
 #define IPCID_SEQ_MAX		(INT_MAX >> IPCMNI_SEQ_SHIFT)
 
 /* must be called with ids->rwsem acquired for writing */
-int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);
+int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit,
+	      int idmode);
 
 /* must be called with both locks acquired. */
 void ipc_rmid(struct ipc_ids *, struct kern_ipc_perm *);

From patchwork Mon Nov 5 15:43:46 2018
From: Waiman Long
To: "Luis R. Rodriguez", Kees Cook, Andrew Morton, Jonathan Corbet
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-doc@vger.kernel.org, Al Viro, Matthew Wilcox,
    "Eric W. Biederman", Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
    Waiman Long
Subject: [PATCH v10 4/4] ipc: Add a cyclic mode for id generation
Date: Mon, 5 Nov 2018 10:43:46 -0500
Message-Id: <1541432626-27780-5-git-send-email-longman@redhat.com>
In-Reply-To: <1541432626-27780-1-git-send-email-longman@redhat.com>

The idea of using the cyclic mode to reduce id reuse came from Manfred
Spraul. Cyclic id allocation may add a little memory and performance
overhead, but id allocation is a slow path anyway, so a bit of overhead
shouldn't be an issue.

This patch differs from his in that the cyclic mode is not the default
and has to be explicitly opted into by users who want it.

Note that in cyclic mode, an identifier may be larger than the given IPC
mni number.

Signed-off-by: Waiman Long
---
 Documentation/sysctl/kernel.txt | 10 ++++++++--
 include/linux/ipc_namespace.h   |  1 +
 ipc/ipc_sysctl.c                |  3 ++-
 ipc/util.c                      |  6 +++++-
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 91bada1..6de0679 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -406,12 +406,18 @@ are being generated.
 
 0: legacy mode
 1: delete mode
+2: cyclic mode
 
 There are two components in an IPC id - an integer identifier and a
 sequence number. In the legacy mode, the sequence number is incremented
 every time a new id is generated. In the delete mode, the sequence number
-is only incremented if one or more ids have been previously deleted. The
-delete mode reduces the chance that a given id will be reused again.
+is only incremented if one or more ids have been previously deleted.
+In the cyclic mode, the sequence number increments in the same way as the
+delete mode, but the identifier is allocated cyclically through the whole
+IPC identifier number space instead of using the lowest available number.
+
+The cyclic mode has the lowest chance of IPC id reuse followed by the
+delete mode and the legacy mode.
 
 ==============================================================
 
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 79d9d50..481dc02 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -33,6 +33,7 @@ struct ipc_ids {
 enum ipc_id_mode {
 	ipc_id_legacy,	/* Sequence # incremented on every allocation */
 	ipc_id_delete,	/* Sequence # incremented only if an ID was deleted */
+	ipc_id_cyclic,	/* Identifier is allocated cyclically */
 };
 
 struct ipc_namespace {
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 4c30e62..8d114d0 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -119,6 +119,7 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 
 static int zero;
 static int one = 1;
+static int two = 2;
 static int int_max = INT_MAX;
 int ipc_mni = IPCMNI;
 int ipc_mni_shift = IPCMNI_SHIFT;
@@ -207,7 +208,7 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 		.mode		= 0644,
 		.proc_handler	= proc_ipc_dointvec_minmax,
 		.extra1		= &zero,
-		.extra2		= &one,
+		.extra2		= &two,
 	},
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	{
diff --git a/ipc/util.c b/ipc/util.c
index 04c8e31..8d73d17 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -230,7 +230,11 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new,
 			ids->deleted = false;
 		}
 		new->seq = ids->seq;
-		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
+		if (idmode == ipc_id_cyclic)
+			idx = idr_alloc_cyclic(&ids->ipcs_idr, new, 0, ipc_mni,
+					       GFP_NOWAIT);
+		else
+			idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
 	} else {
 		new->seq = ipcid_to_seqx(next_id);
 		idx = idr_alloc(&ids->ipcs_idr, new, ipcid_to_idx(next_id),