Message ID: 1551379645-819-4-git-send-email-longman@redhat.com (mailing list archive)
State: New, archived
Series: ipc: Increase IPCMNI limit
Hi Waiman,

On 2/28/19 7:47 PM, Waiman Long wrote:
> For ipcmni_extend mode, the sequence number space is only 7 bits. So
> the chance of id reuse is relatively high compared with the non-extended
> mode.
>
> To alleviate this id reuse problem, the id allocation will be done
> cyclically to cycle through all the 24-bit id space before wrapping
> around when in ipcmni_extend mode. This may cause the use of more memory
> in term of the number of xa_nodes allocated as well as potentially more
> cachelines used as the xa_nodes may be spread more sparsely in this case.
>
> There is probably a slight memory and performance cost in doing cyclic
> id allocation. For applications that really need more than 32k unique IPC
> identifiers, this is a small price to pay to avoid the id reuse problem.

Have you measured it?

I have observed -3% for semop() for a 4 level radix tree compared to a
1-level radix tree, and I'm a bit reluctant to accept that.
Especially as the percentage will increase if the syscall overhead goes
down again (-> less spectre impact).

[...]

> --- a/ipc/util.c
> +++ b/ipc/util.c
> @@ -221,7 +221,12 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>   	 */
>
>  	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
> -		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
> +		if (ipc_mni_extended)
> +			idx = idr_alloc_cyclic(&ids->ipcs_idr, new, 0, ipc_mni,
> +					       GFP_NOWAIT);
> +		else
> +			idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
> +
>  		if ((idx <= ids->last_idx) && (++ids->seq > IPCID_SEQ_MAX))
>  			ids->seq = 0;

I don't like it that there are two different codepaths.

Attached is a different proposal:
Always use cyclic allocation, with some logic to minimize the additional
radix tree levels.

What do you think?

--
    Manfred

From 491ea87cc3022e50c02caae009f9aeba2b6ddcb4 Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Sun, 17 Mar 2019 06:29:00 +0100
Subject: [PATCH 2/2] ipc: Do cyclic id allocation for the ipc objects.

For ipcmni_extend mode, the sequence number space is only 7 bits. So
the chance of id reuse is relatively high compared with the non-extended
mode.

To alleviate this id reuse problem, this patch enables cyclic allocation
for the index to the radix tree (idx).
The disadvantage is that this can cause a slight slow-down of the fast
path, as the radix tree could be higher than necessary.

To limit the radix tree height, I have chosen the following limits:
- 1) The cycling is done over in_use*1.5.
- 2) At least, the cycling is done over
     "normal" ipcmni mode: RADIX_TREE_MAP_SIZE elements
     "ipcmni_extended": 4096 elements

Result:
- for normal mode:
  No change for <= 42 active ipc elements. With more than 42
  active ipc elements, a 2nd level would be added to the radix tree.
  Without cyclic allocation, a 2nd level would be added only with
  more than 63 active elements.

- for extended mode:
  Cycling always creates at least a 2-level radix tree.
  With more than 2730 active objects, a 3rd level would be added,
  instead of > 4095 active objects until the 3rd level is added
  without cyclic allocation.

For a 2-level radix tree compared to a 1-level radix tree, I have
observed < 1% performance impact. I consider this a good compromise.

Notes:
1) Normal "x=semget();y=semget();" is unaffected: Then the idx
   is e.g. a and a+1, regardless of whether idr_alloc() or
   idr_alloc_cyclic() is used.

2) The -1% happens in a microbenchmark after this situation:
   x=semget();
   for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);}
   y=semget();
   Now perform semget calls on x and y that do not sleep.

3) The worst-case reuse cycle time is unfortunately unaffected:
   If you have 2^24-1 ipc objects allocated, and get/remove the last
   possible element in a loop, then the id is reused after
   128 get/remove pairs.

Performance check:
A microbenchmark that performs no-op semop() randomly on two IDs,
with only these two IDs allocated.
The IDs were set using /proc/sys/kernel/sem_next_id.
The test was run 5 times, averages are shown.

1 & 2:        Base (6.22 seconds for 10.000.000 semops)
1 & 40:       -0.2%
1 & 3348:     -0.8%
1 & 27348:    -1.6%
1 & 15777204: -3.2%

Or: ~12.6 cpu cycles per additional radix tree level.
The cpu is an Intel I3-5010U. ~1300 cpu cycles/syscall is slower
than what I remember (spectre impact?).

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 ipc/ipc_sysctl.c |  2 ++
 ipc/util.c       | 10 +++++++++-
 ipc/util.h       |  3 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 73b7782eccf4..bfaae457810c 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -122,6 +122,7 @@ static int one = 1;
 static int int_max = INT_MAX;
 int ipc_mni = IPCMNI;
 int ipc_mni_shift = IPCMNI_SHIFT;
+int ipc_min_cycle = RADIX_TREE_MAP_SIZE;

 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
 {
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	ipc_min_cycle = IPCMNI_EXTEND_MIN_CYCLE;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/util.c b/ipc/util.c
index 6e0fe3410423..90764b67af51 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -221,9 +221,17 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */

 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
+		int max_idx;
+
+		max_idx = ids->in_use*3/2;
+		if (max_idx > ipc_mni)
+			max_idx = ipc_mni;
+		if (max_idx < ipc_min_cycle)
+			max_idx = ipc_min_cycle;

 		/* allocate the idx, with a NULL struct kern_ipc_perm */
-		idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
+		idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx,
+				       GFP_NOWAIT);

 		if (idx >= 0) {
 			/*
diff --git a/ipc/util.h b/ipc/util.h
index 8c834ed39012..ef4e86bb2db8 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -27,12 +27,14 @@
  */
 #define IPCMNI_SHIFT		15
 #define IPCMNI_EXTEND_SHIFT	24
+#define IPCMNI_EXTEND_MIN_CYCLE	(2 << 12)
 #define IPCMNI			(1 << IPCMNI_SHIFT)
 #define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)

 #ifdef CONFIG_SYSVIPC_SYSCTL
 extern int ipc_mni;
 extern int ipc_mni_shift;
+extern int ipc_min_cycle;

 #define ipcmni_seq_shift()	ipc_mni_shift
 #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
@@ -40,6 +42,7 @@ extern int ipc_mni_shift;
 #else /* CONFIG_SYSVIPC_SYSCTL */

 #define ipc_mni			IPCMNI
+#define ipc_min_cycle		RADIX_TREE_MAP_SIZE
 #define ipcmni_seq_shift()	IPCMNI_SHIFT
 #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */
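For readers who want to reproduce the effect, a minimal user-space sketch of
the kind of microbenchmark described above could look like the following. It
is an illustration only, not the program used for the numbers above: it burns
a range of ids with a create/remove loop instead of writing to
/proc/sys/kernel/sem_next_id, it alternates between the two ids instead of
picking them randomly, and the iteration counts and the wait-for-zero semop()
are assumptions.

/* Hedged sketch: create two semaphore sets whose idx values are far
 * apart, then time no-op semop() calls that never sleep (wait-for-zero
 * on a semaphore that is already zero).
 */
#include <stdio.h>
#include <time.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
        int x, y, t, i;
        struct sembuf op = { .sem_num = 0, .sem_op = 0, .sem_flg = IPC_NOWAIT };
        struct timespec t0, t1;

        x = semget(IPC_PRIVATE, 1, 0600);               /* first id */
        for (i = 0; i < 4000; i++) {                    /* burn 4000 ids */
                t = semget(IPC_PRIVATE, 1, 0600);
                semctl(t, 0, IPC_RMID);
        }
        y = semget(IPC_PRIVATE, 1, 0600);               /* id far away from x */
        if (x < 0 || y < 0) {
                perror("semget");
                return 1;
        }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < 10000000; i++)                  /* no-op semop()s */
                semop((i & 1) ? x : y, &op, 1);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("%.2f s\n", (t1.tv_sec - t0.tv_sec) +
                           (t1.tv_nsec - t0.tv_nsec) / 1e9);

        semctl(x, 0, IPC_RMID);
        semctl(y, 0, IPC_RMID);
        return 0;
}

The interesting number is how the reported runtime changes as the distance
between the two idx values, and therefore the radix tree height, grows.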
On 03/17/2019 02:27 PM, Manfred Spraul wrote:
> Hi Waiman,
>
> On 2/28/19 7:47 PM, Waiman Long wrote:
>> For ipcmni_extend mode, the sequence number space is only 7 bits. So
>> the chance of id reuse is relatively high compared with the non-extended
>> mode.
>>
>> To alleviate this id reuse problem, the id allocation will be done
>> cyclically to cycle through all the 24-bit id space before wrapping
>> around when in ipcmni_extend mode. This may cause the use of more memory
>> in term of the number of xa_nodes allocated as well as potentially more
>> cachelines used as the xa_nodes may be spread more sparsely in this
>> case.
>>
>> There is probably a slight memory and performance cost in doing cyclic
>> id allocation. For applications that really need more than 32k unique
>> IPC identifiers, this is a small price to pay to avoid the id reuse
>> problem.
>
> Have you measured it?
>
> I have observed -3% for semop() for a 4 level radix tree compared to a
> 1-level radix tree, and I'm a bit reluctant to accept that.
> Especially as the percentage will increase if the syscall overhead
> goes down again (-> less spectre impact).
>

It is both Spectre (retpoline) and Meltdown (PTI). PTI is not needed on
AMD CPUs, and so you may see a bit higher slowdown.

> [...]
>
>> --- a/ipc/util.c
>> +++ b/ipc/util.c
>> @@ -221,7 +221,12 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>>   	 */
>>  	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
>> -		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
>> +		if (ipc_mni_extended)
>> +			idx = idr_alloc_cyclic(&ids->ipcs_idr, new, 0, ipc_mni,
>> +					       GFP_NOWAIT);
>> +		else
>> +			idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
>> +
>>  		if ((idx <= ids->last_idx) && (++ids->seq > IPCID_SEQ_MAX))
>>  			ids->seq = 0;
>
> I don't like it that there are two different codepaths.
>
> Attached is a different proposal:
>
> Always use cyclic allocation, with some logic to minimize the
> additional radix tree levels.
>
> What do you think?

Your proposed patch looks good. I saw that you use max_idx to limit the
radix tree nesting level, which mitigates my concern about using more
memory and getting slower performance. I do have some minor comments about
the patch in a later email.

-Longman
On 03/18/2019 02:37 PM, Waiman Long wrote:
> On 03/17/2019 02:27 PM, Manfred Spraul wrote:
>> Hi Waiman,
>>
>> On 2/28/19 7:47 PM, Waiman Long wrote:
>>> For ipcmni_extend mode, the sequence number space is only 7 bits. So
>>> the chance of id reuse is relatively high compared with the non-extended
>>> mode.
>>>
>>> To alleviate this id reuse problem, the id allocation will be done
>>> cyclically to cycle through all the 24-bit id space before wrapping
>>> around when in ipcmni_extend mode. This may cause the use of more memory
>>> in term of the number of xa_nodes allocated as well as potentially more
>>> cachelines used as the xa_nodes may be spread more sparsely in this
>>> case.
>>>
>>> There is probably a slight memory and performance cost in doing cyclic
>>> id allocation. For applications that really need more than 32k unique
>>> IPC identifiers, this is a small price to pay to avoid the id reuse
>>> problem.
>>
>> Have you measured it?
>>
>> I have observed -3% for semop() for a 4 level radix tree compared to a
>> 1-level radix tree, and I'm a bit reluctant to accept that.
>> Especially as the percentage will increase if the syscall overhead
>> goes down again (-> less spectre impact).
>>
> It is both Spectre (retpoline) and Meltdown (PTI). PTI is not needed on
> AMD CPUs, and so you may see a bit higher slowdown.

The use of idr_replace() in your previous patch may also slow that code
path a bit, which would reduce the performance difference that you saw.
This is actually my main concern with using idr_replace() as suggested by
Matthew, but I am OK with using it if people think absolute correctness
is more important.

Cheers,
Longman
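For context, the idr_replace() pattern being referenced reserves the index
first with a NULL pointer and only publishes the object once its id and
sequence number are set. The snippet below is a hedged reconstruction pieced
together from the hunks quoted in this thread, not the actual patch under
discussion; the exact sequence-number handling and surrounding locking are
assumptions.

/* Hedged sketch: reserve the idx with a NULL entry, then publish the
 * fully initialized object with idr_replace().  Reconstructed from the
 * hunks quoted in this thread; the real patch may differ in detail.
 */
idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx, GFP_NOWAIT);
if (idx >= 0) {
        if (idx <= ids->last_idx && ++ids->seq > IPCID_SEQ_MAX)
                ids->seq = 0;
        ids->last_idx = idx;

        new->seq = ids->seq;
        new->id = (new->seq << ipcmni_seq_shift()) + idx;

        /* publish the pointer only after id/seq are set */
        idr_replace(&ids->ipcs_idr, new, idx);
}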
Hi Waiman,

On 3/18/19 7:46 PM, Waiman Long wrote:
>> --- a/ipc/util.c
>> +++ b/ipc/util.c
>> @@ -221,9 +221,17 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>>   	 */
>>
>>  	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
>> +		int max_idx;
>> +
>> +		max_idx = ids->in_use*3/2;
>> +		if (max_idx > ipc_mni)
>> +			max_idx = ipc_mni;
>> +		if (max_idx < ipc_min_cycle)
>> +			max_idx = ipc_min_cycle;
>
> Why don't you use the min() and max() macros which will make it easier
> to read?
>

Changed.

>>
>>  		/* allocate the idx, with a NULL struct kern_ipc_perm */
>> -		idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
>> +		idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx,
>> +				       GFP_NOWAIT);
>>
>>  		if (idx >= 0) {
>>  			/*
>> diff --git a/ipc/util.h b/ipc/util.h
>> index 8c834ed39012..ef4e86bb2db8 100644
>> --- a/ipc/util.h
>> +++ b/ipc/util.h
>> @@ -27,12 +27,14 @@
>>   */
>>  #define IPCMNI_SHIFT		15
>>  #define IPCMNI_EXTEND_SHIFT	24
>> +#define IPCMNI_EXTEND_MIN_CYCLE	(2 << 12)
>
> How about
>
> #define IPCMNI_EXTEND_MIN_CYCLE (RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE)
>

Good idea. Actually, "2<<12" was the initial guess. And then I noticed
that this ends up as a two-level radix tree during testing :-)

Updated patch attached.

--
    Manfred

From 844c9d78cea41983a89c820bd5265ceded59883b Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Sun, 17 Mar 2019 06:29:00 +0100
Subject: [PATCH 2/2] ipc: Do cyclic id allocation for the ipc object.

For ipcmni_extend mode, the sequence number space is only 7 bits. So
the chance of id reuse is relatively high compared with the non-extended
mode.

To alleviate this id reuse problem, this patch enables cyclic allocation
for the index to the radix tree (idx).
The disadvantage is that this can cause a slight slow-down of the fast
path, as the radix tree could be higher than necessary.

To limit the radix tree height, I have chosen the following limits:
- 1) The cycling is done over in_use*1.5.
- 2) At least, the cycling is done over
     "normal" ipcmni mode: RADIX_TREE_MAP_SIZE elements
     "ipcmni_extended": 4096 elements

Result:
- for normal mode:
  No change for <= 42 active ipc elements. With more than 42
  active ipc elements, a 2nd level would be added to the radix tree.
  Without cyclic allocation, a 2nd level would be added only with
  more than 63 active elements.

- for extended mode:
  Cycling always creates at least a 2-level radix tree.
  With more than 2730 active objects, a 3rd level would be added,
  instead of > 4095 active objects until the 3rd level is added
  without cyclic allocation.

For a 2-level radix tree compared to a 1-level radix tree, I have
observed < 1% performance impact.

Notes:
1) Normal "x=semget();y=semget();" is unaffected: Then the idx
   is e.g. a and a+1, regardless of whether idr_alloc() or
   idr_alloc_cyclic() is used.

2) The -1% happens in a microbenchmark after this situation:
   x=semget();
   for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);}
   y=semget();
   Now perform semget calls on x and y that do not sleep.

3) The worst-case reuse cycle time is unfortunately unaffected:
   If you have 2^24-1 ipc objects allocated, and get/remove the last
   possible element in a loop, then the id is reused after
   128 get/remove pairs.

Performance check:
A microbenchmark that performs no-op semop() randomly on two IDs,
with only these two IDs allocated.
The IDs were set using /proc/sys/kernel/sem_next_id.
The test was run 5 times, averages are shown.

1 & 2:        Base (6.22 seconds for 10.000.000 semops)
1 & 40:       -0.2%
1 & 3348:     -0.8%
1 & 27348:    -1.6%
1 & 15777204: -3.2%

Or: ~12.6 cpu cycles per additional radix tree level.
The cpu is an Intel I3-5010U. ~1300 cpu cycles/syscall is slower
than what I remember (spectre impact?).

V2 of the patch:
- use "min" and "max"
- use RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE instead of (2<<12).

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 ipc/ipc_sysctl.c | 2 ++
 ipc/util.c       | 7 ++++++-
 ipc/util.h       | 3 +++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 73b7782eccf4..bfaae457810c 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -122,6 +122,7 @@ static int one = 1;
 static int int_max = INT_MAX;
 int ipc_mni = IPCMNI;
 int ipc_mni_shift = IPCMNI_SHIFT;
+int ipc_min_cycle = RADIX_TREE_MAP_SIZE;

 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
 {
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	ipc_min_cycle = IPCMNI_EXTEND_MIN_CYCLE;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/util.c b/ipc/util.c
index 6e0fe3410423..1a492afb1d8b 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -221,9 +221,14 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */

 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
+		int max_idx;
+
+		max_idx = max(ids->in_use*3/2, ipc_min_cycle);
+		max_idx = min(max_idx, ipc_mni);

 		/* allocate the idx, with a NULL struct kern_ipc_perm */
-		idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
+		idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx,
+				       GFP_NOWAIT);

 		if (idx >= 0) {
 			/*
diff --git a/ipc/util.h b/ipc/util.h
index 8c834ed39012..d316399f0c32 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -27,12 +27,14 @@
  */
 #define IPCMNI_SHIFT		15
 #define IPCMNI_EXTEND_SHIFT	24
+#define IPCMNI_EXTEND_MIN_CYCLE	(RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE)
 #define IPCMNI			(1 << IPCMNI_SHIFT)
 #define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)

 #ifdef CONFIG_SYSVIPC_SYSCTL
 extern int ipc_mni;
 extern int ipc_mni_shift;
+extern int ipc_min_cycle;

 #define ipcmni_seq_shift()	ipc_mni_shift
 #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
@@ -40,6 +42,7 @@ extern int ipc_mni_shift;
 #else /* CONFIG_SYSVIPC_SYSCTL */

 #define ipc_mni			IPCMNI
+#define ipc_min_cycle		RADIX_TREE_MAP_SIZE
 #define ipcmni_seq_shift()	IPCMNI_SHIFT
 #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */
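To see why the 7-bit sequence number makes reuse visible so quickly, here is
a small, hedged user-space sketch (my own illustration, not from this
thread): it creates and removes a semaphore set in a loop and reports when an
id value comes back. On an otherwise idle IPC namespace without cyclic
allocation, extended mode's 128 sequence values mean the first id repeats
after roughly 128 create/remove pairs; with the cyclic allocation above, the
reuse distance is roughly multiplied by the size of the cycling range (at
least 4096 in extended mode). The loop bound is arbitrary.

/* Hedged illustration: measure after how many create/remove pairs the
 * kernel hands back the same semaphore id.
 */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
        long i;
        int first, id;

        first = semget(IPC_PRIVATE, 1, 0600);
        if (first < 0) {
                perror("semget");
                return 1;
        }
        semctl(first, 0, IPC_RMID);

        for (i = 1; i <= (1L << 22); i++) {
                id = semget(IPC_PRIVATE, 1, 0600);
                semctl(id, 0, IPC_RMID);
                if (id == first) {
                        printf("id %d reused after %ld create/remove pairs\n",
                               id, i);
                        return 0;
                }
        }
        printf("no reuse of id %d within %ld pairs\n", first, i - 1);
        return 0;
}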
On 03/19/2019 02:18 PM, Manfred Spraul wrote:
> From 844c9d78cea41983a89c820bd5265ceded59883b Mon Sep 17 00:00:00 2001
> From: Manfred Spraul <manfred@colorfullife.com>
> Date: Sun, 17 Mar 2019 06:29:00 +0100
> Subject: [PATCH 2/2] ipc: Do cyclic id allocation for the ipc object.
>
> For ipcmni_extend mode, the sequence number space is only 7 bits. So
> the chance of id reuse is relatively high compared with the non-extended
> mode.
>
> To alleviate this id reuse problem, this patch enables cyclic allocation
> for the index to the radix tree (idx).
> The disadvantage is that this can cause a slight slow-down of the fast
> path, as the radix tree could be higher than necessary.
>
> To limit the radix tree height, I have chosen the following limits:
> - 1) The cycling is done over in_use*1.5.
> - 2) At least, the cycling is done over
>      "normal" ipcnmi mode: RADIX_TREE_MAP_SIZE elements
>      "ipcmni_extended": 4096 elements
>
> Result:
> - for normal mode:
>   No change for <= 42 active ipc elements. With more than 42
>   active ipc elements, a 2nd level would be added to the radix tree.
>   Without cyclic allocation, a 2nd level would be added only with
>   more than 63 active elements.
>
> - for extended mode:
>   Cycling creates always at least a 2-level radix tree.
>   With more than 2730 active objects, a 3rd level would be added,
>   instead of > 4095 active objects until the 3rd level is added
>   without cyclic allocation.
>
> For a 2-level radix tree compared to a 1-level radix tree, I have
> observed < 1% performance impact.
>
> Notes:
> 1) Normal "x=semget();y=semget();" is unaffected: Then the idx
>    is e.g. a and a+1, regardless if idr_alloc() or idr_alloc_cyclic()
>    is used.
>
> 2) The -1% happens in a microbenchmark after this situation:
>    x=semget();
>    for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);}
>    y=semget();
>    Now perform semget calls on x and y that do not sleep.
>
> 3) The worst-case reuse cycle time is unfortunately unaffected:
>    If you have 2^24-1 ipc objects allocated, and get/remove the last
>    possible element in a loop, then the id is reused after 128
>    get/remove pairs.
>
> Performance check:
> A microbenchmark that performes no-op semop() randomly on two IDs,
> with only these two IDs allocated.
> The IDs were set using /proc/sys/kernel/sem_next_id.
> The test was run 5 times, averages are shown.
>
> 1 & 2: Base (6.22 seconds for 10.000.000 semops)
> 1 & 40: -0.2%
> 1 & 3348: - 0.8%
> 1 & 27348: - 1.6%
> 1 & 15777204: - 3.2%
>
> Or: ~12.6 cpu cycles per additional radix tree level.
> The cpu is an Intel I3-5010U. ~1300 cpu cycles/syscall is slower
> than what I remember (spectre impact?).
>
> V2 of the patch:
> - use "min" and "max"
> - use RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE instead of
>   (2<<12).
>
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> ---
>  ipc/ipc_sysctl.c | 2 ++
>  ipc/util.c       | 7 ++++++-
>  ipc/util.h       | 3 +++
>  3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> index 73b7782eccf4..bfaae457810c 100644
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -122,6 +122,7 @@ static int one = 1;
>  static int int_max = INT_MAX;
>  int ipc_mni = IPCMNI;
>  int ipc_mni_shift = IPCMNI_SHIFT;
> +int ipc_min_cycle = RADIX_TREE_MAP_SIZE;
>
>  static struct ctl_table ipc_kern_table[] = {
>  	{
> @@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
>  {
>  	ipc_mni = IPCMNI_EXTEND;
>  	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
> +	ipc_min_cycle = IPCMNI_EXTEND_MIN_CYCLE;
>  	pr_info("IPCMNI extended to %d.\n", ipc_mni);
>  	return 0;
>  }
> diff --git a/ipc/util.c b/ipc/util.c
> index 6e0fe3410423..1a492afb1d8b 100644
> --- a/ipc/util.c
> +++ b/ipc/util.c
> @@ -221,9 +221,14 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>   	 */
>
>  	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
> +		int max_idx;
> +
> +		max_idx = max(ids->in_use*3/2, ipc_min_cycle);
> +		max_idx = min(max_idx, ipc_mni);
>
>  		/* allocate the idx, with a NULL struct kern_ipc_perm */
> -		idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
> +		idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx,
> +				       GFP_NOWAIT);
>
>  		if (idx >= 0) {
>  			/*
> diff --git a/ipc/util.h b/ipc/util.h
> index 8c834ed39012..d316399f0c32 100644
> --- a/ipc/util.h
> +++ b/ipc/util.h
> @@ -27,12 +27,14 @@
>   */
>  #define IPCMNI_SHIFT		15
>  #define IPCMNI_EXTEND_SHIFT	24
> +#define IPCMNI_EXTEND_MIN_CYCLE	(RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE)
>  #define IPCMNI			(1 << IPCMNI_SHIFT)
>  #define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)
>
>  #ifdef CONFIG_SYSVIPC_SYSCTL
>  extern int ipc_mni;
>  extern int ipc_mni_shift;
> +extern int ipc_min_cycle;
>
>  #define ipcmni_seq_shift()	ipc_mni_shift
>  #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
> @@ -40,6 +42,7 @@ extern int ipc_mni_shift;
>  #else /* CONFIG_SYSVIPC_SYSCTL */
>
>  #define ipc_mni			IPCMNI
> +#define ipc_min_cycle		RADIX_TREE_MAP_SIZE
>  #define ipcmni_seq_shift()	IPCMNI_SHIFT
>  #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
>  #endif /* CONFIG_SYSVIPC_SYSCTL */
> --
> 2.17.2

Acked-by: Waiman Long <longman@redhat.com>
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 074b775..bb851d0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1813,7 +1813,10 @@
 			See Documentation/filesystems/nfs/nfsroot.txt.

 	ipcmni_extend	[KNL] Extend the maximum number of unique System V
-			IPC identifiers from 32,768 to 16,777,216.
+			IPC identifiers from 32,768 to 16,777,216. Also do
+			cyclical identifier allocation through the entire
+			24-bit identifier space to reduce the chance of
+			identifier reuse.

 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 73b7782..d9ac6ca 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -122,6 +122,7 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int int_max = INT_MAX;
 int ipc_mni = IPCMNI;
 int ipc_mni_shift = IPCMNI_SHIFT;
+bool ipc_mni_extended;

 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
 {
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	ipc_mni_extended = true;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/util.c b/ipc/util.c
index 0a835a4..78e14ac 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -221,7 +221,12 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */

 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
-		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
+		if (ipc_mni_extended)
+			idx = idr_alloc_cyclic(&ids->ipcs_idr, new, 0, ipc_mni,
+					       GFP_NOWAIT);
+		else
+			idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
+
 		if ((idx <= ids->last_idx) && (++ids->seq > IPCID_SEQ_MAX))
 			ids->seq = 0;
 		new->seq = ids->seq;
diff --git a/ipc/util.h b/ipc/util.h
index 6a88d51..9f0dd79 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -33,6 +33,7 @@
 #ifdef CONFIG_SYSVIPC_SYSCTL
 extern int ipc_mni;
 extern int ipc_mni_shift;
+extern bool ipc_mni_extended;

 #define IPCMNI_SEQ_SHIFT	ipc_mni_shift
 #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
@@ -40,6 +41,7 @@
 #else /* CONFIG_SYSVIPC_SYSCTL */

 #define ipc_mni			IPCMNI
+#define ipc_mni_extended	false
 #define IPCMNI_SEQ_SHIFT	IPCMNI_SHIFT
 #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */
For ipcmni_extend mode, the sequence number space is only 7 bits. So
the chance of id reuse is relatively high compared with the non-extended
mode.

To alleviate this id reuse problem, the id allocation will be done
cyclically, cycling through all of the 24-bit id space before wrapping
around when in ipcmni_extend mode. This may cause the use of more memory
in terms of the number of xa_nodes allocated, as well as potentially more
cachelines used, as the xa_nodes may be spread more sparsely in this case.

There is probably a slight memory and performance cost in doing cyclic
id allocation. For applications that really need more than 32k unique IPC
identifiers, this is a small price to pay to avoid the id reuse problem.
As a result, the chance of id reuse should be even smaller in
ipcmni_extend mode. Users who worry about id reuse can turn on
ipcmni_extend mode even if they don't need more than 32k IPC identifiers.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 5 ++++-
 ipc/ipc_sysctl.c                                | 2 ++
 ipc/util.c                                      | 7 ++++++-
 ipc/util.h                                      | 2 ++
 4 files changed, 14 insertions(+), 2 deletions(-)
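As a side note for readers following the 7-bit / 24-bit discussion: an IPC id
combines a sequence number and a radix tree index, and the split between the
two is what ipcmni_extend changes. The snippet below is a hedged, user-space
illustration derived from the ipc/util.h hunks shown in this thread; the
constants and the decompose() helper are illustrative, not kernel API.

/* Hedged sketch: decompose an id into idx and seq under both modes. */
#include <stdio.h>

#define IPCMNI_SHIFT            15
#define IPCMNI_EXTEND_SHIFT     24

static void decompose(int id, int shift)
{
        int idx = id & ((1 << shift) - 1);      /* IPCMNI_IDX_MASK  */
        int seq = id >> shift;                  /* sequence number  */

        printf("id=%d -> idx=%d seq=%d (seq space: %d values)\n",
               id, idx, seq, (0x7fffffff >> shift) + 1);
}

int main(void)
{
        int id = 98305;                 /* e.g. seq=3, idx=1 in normal mode */

        decompose(id, IPCMNI_SHIFT);            /* 32k ids, ~64k seq values */
        decompose(id, IPCMNI_EXTEND_SHIFT);     /* 16M ids, 128 seq values  */
        return 0;
}

With only 128 sequence values in extended mode, cycling the idx instead of
reusing the lowest free slot is what pushes the id reuse distance back out.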