Message ID | 857cb160a981b5719d8ed6a3e5e7c456915c64fa.1654086665.git.legion@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | API extension for handling sysctl | expand |
On Wed, Jun 1, 2022 at 6:20 AM Alexey Gladkov <legion@kernel.org> wrote: > > Dynamic memory allocation is needed to modify .data and specify the per > namespace parameter. The new sysctl API is allowed to get rid of the > need for such modification. Ok, this is looking better. That said, a few comments: > > diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c > index ef313ecfb53a..833b670c38f3 100644 > --- a/ipc/ipc_sysctl.c > +++ b/ipc/ipc_sysctl.c > @@ -68,26 +68,94 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write, > return ret; > } > > +static inline void *data_from_ns(struct ctl_context *ctx, struct ctl_table *table); > + > +static int ipc_sys_open(struct ctl_context *ctx, struct inode *inode, struct file *file) > +{ > + struct ipc_namespace *ns = current->nsproxy->ipc_ns; > + > + // For now, we only allow changes in init_user_ns. > + if (ns->user_ns != &init_user_ns) > + return -EPERM; > + > +#ifdef CONFIG_CHECKPOINT_RESTORE > + int index = (ctx->table - ipc_sysctls); > + > + switch (index) { > + case IPC_SYSCTL_SEM_NEXT_ID: > + case IPC_SYSCTL_MSG_NEXT_ID: [...] I don't think you actually even compile-tested this, because you're using these IPC_SYSCTL_SEM_NEXT_ID etc enums before you even declared them later in the same file. > +static ssize_t ipc_sys_read(struct ctl_context *ctx, struct file *file, > + char *buffer, size_t *lenp, loff_t *ppos) > +{ > + struct ctl_table table = *ctx->table; > + table.data = data_from_ns(ctx, ctx->table); > + return table.proc_handler(&table, 0, buffer, lenp, ppos); > +} Can we please fix the names and the types of this new 'ctx' structure? Yes, yes, I know the old legacy "sysctl table" is horribly named, and uses "ctl_table". But let's just write it out. It's not some random control table for anything. It's a very odd and specific thing: "sysctl". Let's use the full name. Also, Please just make that "ctl_data" member in that "ctl_context" struct not just have a real name, but a real type. Make it literally be struct ipc_namespace *ipc_ns; and if we end up with other things wanting other pointers, just add a new one (or make a union if we care about the size of that allocation, which I don't see any reason we'd do when it's literally just like a couple of pointers in size). There is no reason to have some pseudo-generic "void *ctl_data" that makes it ambiguous and allows for type confusion and isn't self-documenting. I'd rather have a properly typed pointer that is just initialized to NULL and is not always used or needed, but always has a clear case for *what* it would be used for. Yes, yes, we have f_private etc for things that are really very very generic and have arbitrary users. But 'sysctl' is not that kind of truly generic use. I wish we didn't have that silly "create a temporary ctl_table entry" either, and I wish it was properly named. But it's not worth the pointless churn to fix old bad interfaces. But the new ones should have better names, and try to avoid those bad old decisions. But yeah, I think this all is a step in the right direction. And maybe some of those cases and old 'ctl_table' things can be migrated to just using individual read() functions entirely. The whole 'ctl_table' model was broken, and came from the bad old days with an actual 'sysctl()' system call. Because I think it would be lovely if people would move away from the 'sysctl table' approach entirely for cases where that makes sense, and these guys that already need special handling are very much in that situation. Linus
On Wed, Jun 01, 2022 at 09:45:15AM -0700, Linus Torvalds wrote: > On Wed, Jun 1, 2022 at 6:20 AM Alexey Gladkov <legion@kernel.org> wrote: > > > > Dynamic memory allocation is needed to modify .data and specify the per > > namespace parameter. The new sysctl API is allowed to get rid of the > > need for such modification. > > Ok, this is looking better. That said, a few comments: > > > > > diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c > > index ef313ecfb53a..833b670c38f3 100644 > > --- a/ipc/ipc_sysctl.c > > +++ b/ipc/ipc_sysctl.c > > @@ -68,26 +68,94 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write, > > return ret; > > } > > > > +static inline void *data_from_ns(struct ctl_context *ctx, struct ctl_table *table); > > + > > +static int ipc_sys_open(struct ctl_context *ctx, struct inode *inode, struct file *file) > > +{ > > + struct ipc_namespace *ns = current->nsproxy->ipc_ns; > > + > > + // For now, we only allow changes in init_user_ns. > > + if (ns->user_ns != &init_user_ns) > > + return -EPERM; > > + > > +#ifdef CONFIG_CHECKPOINT_RESTORE > > + int index = (ctx->table - ipc_sysctls); > > + > > + switch (index) { > > + case IPC_SYSCTL_SEM_NEXT_ID: > > + case IPC_SYSCTL_MSG_NEXT_ID https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/cgroup.c#n1392: > [...] > > I don't think you actually even compile-tested this, because you're > using these IPC_SYSCTL_SEM_NEXT_ID etc enums before you even declared > them later in the same file. I did it but without CONFIG_CHECKPOINT_RESTORE. This is where I'm not sure who can write to ipc sysctls inside ipc_namespace. > > +static ssize_t ipc_sys_read(struct ctl_context *ctx, struct file *file, > > + char *buffer, size_t *lenp, loff_t *ppos) > > +{ > > + struct ctl_table table = *ctx->table; > > + table.data = data_from_ns(ctx, ctx->table); > > + return table.proc_handler(&table, 0, buffer, lenp, ppos); > > +} > > Can we please fix the names and the types of this new 'ctx' structure? > > Yes, yes, I know the old legacy "sysctl table" is horribly named, and > uses "ctl_table". Sure. > But let's just write it out. It's not some random control table for > anything. It's a very odd and specific thing: "sysctl". Let's use the > full name. > > Also, Please just make that "ctl_data" member in that "ctl_context" > struct not just have a real name, but a real type. Make it literally > be > > struct ipc_namespace *ipc_ns; > > and if we end up with other things wanting other pointers, just add a > new one (or make a union if we care about the size of that allocation, > which I don't see any reason we'd do when it's literally just like a > couple of pointers in size). > > There is no reason to have some pseudo-generic "void *ctl_data" that > makes it ambiguous and allows for type confusion and isn't > self-documenting. I'd rather have a properly typed pointer that is > just initialized to NULL and is not always used or needed, but always > has a clear case for *what* it would be used for. > > Yes, yes, we have f_private etc for things that are really very very > generic and have arbitrary users. But 'sysctl' is not that kind of > truly generic use. Yep. I made ctl_data in the same way as f_private. My idea is that if someone needs to store more than one pointer, they can put a struct there. But it turned out that at least now, apart from ipc_namespace, nothing is needed. > I wish we didn't have that silly "create a temporary ctl_table entry" > either, and I wish it was properly named. But it's not worth the > pointless churn to fix old bad interfaces. But the new ones should > have better names, and try to avoid those bad old decisions. Currently temporary ctl_table is the main strategy for handling sysctl entries. Perhaps it will be possible to get rid of this if we add another get_data() that would return what is currently placed in .data in ctl_table. I mean make getting .data dynamic. > But yeah, I think this all is a step in the right direction. And maybe > some of those cases and old 'ctl_table' things can be migrated to just > using individual read() functions entirely. The whole 'ctl_table' > model was broken, and came from the bad old days with an actual > 'sysctl()' system call. I'm not sure how to get rid of ctl_table since net sysctls are heavily dependent on it. I was wondering if it's possible to get rid of ctl_table but if it's not possible to rewrite everything to some kind of new magic API, then keeping two of them would be a nightmare. Another problem is that ctl_table is being used by __cgroup_bpf_run_filter_sysctl. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/cgroup.c#n1392 > Because I think it would be lovely if people would move away from the > 'sysctl table' approach entirely for cases where that makes sense, and > these guys that already need special handling are very much in that > situation. Since you think that these patches are a step in the right direction, then I will prepare the first version with your comments in mind.
On Wed, Jun 1, 2022 at 11:25 AM Alexey Gladkov <legion@kernel.org> wrote: > > I'm not sure how to get rid of ctl_table since net sysctls are heavily > dependent on it. I don't actually think it's worth getting rid of entirely, because there's just a lot of simple cases where it "JustWorks(tm)" and having just that table entry describe all the semantics is not wrong at all. The name may suck, but hey, it's not a big deal. Changing it now would be more pain than it's worth. No, I was more thinking that things that already need more infrastructure than that simple static ctl_table entry might be better off trying to migrate to your new "proper read op" model, and having more of that dynamic behavior in the read op. The whole "create dynamic ctl_table entries on the fly" model works, but it's kind of ugly. Anyway, I think all of this is "I think there is more room for cleanup in this area", and maybe we'll never have enough motivation to actually do that. Your patches seem to fix the extant issue with the ipc namespace, and the truly disgusting parts (although maybe there are other truly disgusting things hiding - I didn't go look for them). Linus
On Wed, Jun 01, 2022 at 11:34:18AM -0700, Linus Torvalds wrote: > On Wed, Jun 1, 2022 at 11:25 AM Alexey Gladkov <legion@kernel.org> wrote: > > > > I'm not sure how to get rid of ctl_table since net sysctls are heavily > > dependent on it. > > I don't actually think it's worth getting rid of entirely, because > there's just a lot of simple cases where it "JustWorks(tm)" and having > just that table entry describe all the semantics is not wrong at all. > > The name may suck, but hey, it's not a big deal. Changing it now would > be more pain than it's worth. > > No, I was more thinking that things that already need more > infrastructure than that simple static ctl_table entry might be better > off trying to migrate to your new "proper read op" model, and having > more of that dynamic behavior in the read op. This was part of my plan. I wanted to step by step try migrating other sysctls to use open/read/write where it makes sense. To be honest, it was Eric Biederman who came up with the idea to separate open, read and write. I am very grateful to him. > The whole "create dynamic ctl_table entries on the fly" model works, > but it's kind of ugly. > > Anyway, I think all of this is "I think there is more room for cleanup > in this area", and maybe we'll never have enough motivation to > actually do that. > > Your patches seem to fix the extant issue with the ipc namespace, and > the truly disgusting parts (although maybe there are other truly > disgusting things hiding - I didn't go look for them). I also hope to try and fix the f_cred issue.
On Wed, Jun 01, 2022 at 11:34:18AM -0700, Linus Torvalds wrote: > On Wed, Jun 1, 2022 at 11:25 AM Alexey Gladkov <legion@kernel.org> wrote: > > > > I'm not sure how to get rid of ctl_table since net sysctls are heavily > > dependent on it. > > Anyway, I think all of this is "I think there is more room for cleanup > in this area", and maybe we'll never have enough motivation to > actually do that. That's a good summary of the cleanup with sysctls so I appreciate the nudges in the right directions. Luis
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index e3e8c8662b49..51c2c247c447 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -191,22 +191,4 @@ static inline bool setup_mq_sysctls(struct ipc_namespace *ns) } #endif /* CONFIG_POSIX_MQUEUE_SYSCTL */ - -#ifdef CONFIG_SYSVIPC_SYSCTL - -bool setup_ipc_sysctls(struct ipc_namespace *ns); -void retire_ipc_sysctls(struct ipc_namespace *ns); - -#else /* CONFIG_SYSVIPC_SYSCTL */ - -static inline void retire_ipc_sysctls(struct ipc_namespace *ns) -{ -} - -static inline bool setup_ipc_sysctls(struct ipc_namespace *ns) -{ - return true; -} - -#endif /* CONFIG_SYSVIPC_SYSCTL */ #endif diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index ef313ecfb53a..833b670c38f3 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -68,26 +68,94 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write, return ret; } +static inline void *data_from_ns(struct ctl_context *ctx, struct ctl_table *table); + +static int ipc_sys_open(struct ctl_context *ctx, struct inode *inode, struct file *file) +{ + struct ipc_namespace *ns = current->nsproxy->ipc_ns; + + // For now, we only allow changes in init_user_ns. + if (ns->user_ns != &init_user_ns) + return -EPERM; + +#ifdef CONFIG_CHECKPOINT_RESTORE + int index = (ctx->table - ipc_sysctls); + + switch (index) { + case IPC_SYSCTL_SEM_NEXT_ID: + case IPC_SYSCTL_MSG_NEXT_ID: + case IPC_SYSCTL_SHM_NEXT_ID: + if (!checkpoint_restore_ns_capable(ns->user_ns)) + return -EPERM; + break; + } +#endif + ctx->ctl_data = ns; + return 0; +} + +static ssize_t ipc_sys_read(struct ctl_context *ctx, struct file *file, + char *buffer, size_t *lenp, loff_t *ppos) +{ + struct ctl_table table = *ctx->table; + table.data = data_from_ns(ctx, ctx->table); + return table.proc_handler(&table, 0, buffer, lenp, ppos); +} + +static ssize_t ipc_sys_write(struct ctl_context *ctx, struct file *file, + char *buffer, size_t *lenp, loff_t *ppos) +{ + struct ctl_table table = *ctx->table; + table.data = data_from_ns(ctx, ctx->table); + return table.proc_handler(&table, 1, buffer, lenp, ppos); +} + +static struct ctl_fops ipc_sys_fops = { + .open = ipc_sys_open, + .read = ipc_sys_read, + .write = ipc_sys_write, +}; + int ipc_mni = IPCMNI; int ipc_mni_shift = IPCMNI_SHIFT; int ipc_min_cycle = RADIX_TREE_MAP_SIZE; +enum { + IPC_SYSCTL_SHMMAX, + IPC_SYSCTL_SHMALL, + IPC_SYSCTL_SHMMNI, + IPC_SYSCTL_SHM_RMID_FORCED, + IPC_SYSCTL_MSGMAX, + IPC_SYSCTL_MSGMNI, + IPC_SYSCTL_AUTO_MSGMNI, + IPC_SYSCTL_MSGMNB, + IPC_SYSCTL_SEM, +#ifdef CONFIG_CHECKPOINT_RESTORE + IPC_SYSCTL_SEM_NEXT_ID, + IPC_SYSCTL_MSG_NEXT_ID, + IPC_SYSCTL_SHM_NEXT_ID, +#endif + IPC_SYSCTL_COUNTS +}; + static struct ctl_table ipc_sysctls[] = { - { + [IPC_SYSCTL_SHMMAX] = { .procname = "shmmax", .data = &init_ipc_ns.shm_ctlmax, .maxlen = sizeof(init_ipc_ns.shm_ctlmax), .mode = 0644, - .proc_handler = proc_doulongvec_minmax, + .proc_handler = proc_doulongvec_minmax, + .ctl_fops = &ipc_sys_fops, }, - { + [IPC_SYSCTL_SHMALL] = { .procname = "shmall", .data = &init_ipc_ns.shm_ctlall, .maxlen = sizeof(init_ipc_ns.shm_ctlall), .mode = 0644, - .proc_handler = proc_doulongvec_minmax, + .proc_handler = proc_doulongvec_minmax, + .ctl_fops = &ipc_sys_fops, }, - { + [IPC_SYSCTL_SHMMNI] = { .procname = "shmmni", .data = &init_ipc_ns.shm_ctlmni, .maxlen = sizeof(init_ipc_ns.shm_ctlmni), @@ -95,8 +163,9 @@ static struct ctl_table ipc_sysctls[] = { .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = &ipc_mni, + .ctl_fops = &ipc_sys_fops, }, - { + [IPC_SYSCTL_SHM_RMID_FORCED] = { .procname = "shm_rmid_forced", .data = &init_ipc_ns.shm_rmid_forced, .maxlen = sizeof(init_ipc_ns.shm_rmid_forced), @@ -104,8 +173,9 @@ static struct ctl_table ipc_sysctls[] = { .proc_handler = proc_ipc_dointvec_minmax_orphans, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, + .ctl_fops = &ipc_sys_fops, }, - { + [IPC_SYSCTL_MSGMAX] = { .procname = "msgmax", .data = &init_ipc_ns.msg_ctlmax, .maxlen = sizeof(init_ipc_ns.msg_ctlmax), @@ -113,8 +183,9 @@ static struct ctl_table ipc_sysctls[] = { .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, + .ctl_fops = &ipc_sys_fops, }, - { + [IPC_SYSCTL_MSGMNI] = { .procname = "msgmni", .data = &init_ipc_ns.msg_ctlmni, .maxlen = sizeof(init_ipc_ns.msg_ctlmni), @@ -122,8 +193,9 @@ static struct ctl_table ipc_sysctls[] = { .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = &ipc_mni, + .ctl_fops = &ipc_sys_fops, }, - { + [IPC_SYSCTL_AUTO_MSGMNI] = { .procname = "auto_msgmni", .data = NULL, .maxlen = sizeof(int), @@ -131,8 +203,9 @@ static struct ctl_table ipc_sysctls[] = { .proc_handler = proc_ipc_auto_msgmni, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, + .ctl_fops = &ipc_sys_fops, }, - { + [IPC_SYSCTL_MSGMNB] = { .procname = "msgmnb", .data = &init_ipc_ns.msg_ctlmnb, .maxlen = sizeof(init_ipc_ns.msg_ctlmnb), @@ -140,152 +213,85 @@ static struct ctl_table ipc_sysctls[] = { .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, + .ctl_fops = &ipc_sys_fops, }, - { + [IPC_SYSCTL_SEM] = { .procname = "sem", .data = &init_ipc_ns.sem_ctls, .maxlen = 4*sizeof(int), .mode = 0644, .proc_handler = proc_ipc_sem_dointvec, + .ctl_fops = &ipc_sys_fops, }, #ifdef CONFIG_CHECKPOINT_RESTORE - { + [IPC_SYSCTL_SEM_NEXT_ID] = { .procname = "sem_next_id", .data = &init_ipc_ns.ids[IPC_SEM_IDS].next_id, .maxlen = sizeof(init_ipc_ns.ids[IPC_SEM_IDS].next_id), - .mode = 0444, + .mode = 0666, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, + .ctl_fops = &ipc_sys_fops, }, - { + [IPC_SYSCTL_MSG_NEXT_ID] = { .procname = "msg_next_id", .data = &init_ipc_ns.ids[IPC_MSG_IDS].next_id, .maxlen = sizeof(init_ipc_ns.ids[IPC_MSG_IDS].next_id), - .mode = 0444, + .mode = 0666, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, + .ctl_fops = &ipc_sys_fops, }, - { + [IPC_SYSCTL_SHM_NEXT_ID] = { .procname = "shm_next_id", .data = &init_ipc_ns.ids[IPC_SHM_IDS].next_id, .maxlen = sizeof(init_ipc_ns.ids[IPC_SHM_IDS].next_id), - .mode = 0444, + .mode = 0666, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, + .ctl_fops = &ipc_sys_fops, }, #endif - {} + [IPC_SYSCTL_COUNTS] = {} }; -static struct ctl_table_set *set_lookup(struct ctl_table_root *root) +static inline void *data_from_ns(struct ctl_context *ctx, struct ctl_table *table) { - return ¤t->nsproxy->ipc_ns->ipc_set; -} - -static int set_is_seen(struct ctl_table_set *set) -{ - return ¤t->nsproxy->ipc_ns->ipc_set == set; -} - -static int ipc_permissions(struct ctl_table_header *head, struct ctl_table *table) -{ - int mode = table->mode; - + struct ipc_namespace *ns = ctx->ctl_data; + + switch (ctx->table - ipc_sysctls) { + case IPC_SYSCTL_SHMMAX: return &ns->shm_ctlmax; + case IPC_SYSCTL_SHMALL: return &ns->shm_ctlall; + case IPC_SYSCTL_SHMMNI: return &ns->shm_ctlmni; + case IPC_SYSCTL_SHM_RMID_FORCED: return &ns->shm_rmid_forced; + case IPC_SYSCTL_MSGMAX: return &ns->msg_ctlmax; + case IPC_SYSCTL_MSGMNI: return &ns->msg_ctlmni; + case IPC_SYSCTL_MSGMNB: return &ns->msg_ctlmnb; + case IPC_SYSCTL_SEM: return &ns->sem_ctls; #ifdef CONFIG_CHECKPOINT_RESTORE - struct ipc_namespace *ns = current->nsproxy->ipc_ns; - - if (((table->data == &ns->ids[IPC_SEM_IDS].next_id) || - (table->data == &ns->ids[IPC_MSG_IDS].next_id) || - (table->data == &ns->ids[IPC_SHM_IDS].next_id)) && - checkpoint_restore_ns_capable(ns->user_ns)) - mode = 0666; + case IPC_SYSCTL_SEM_NEXT_ID: return &ns->ids[IPC_SEM_IDS].next_id; + case IPC_SYSCTL_MSG_NEXT_ID: return &ns->ids[IPC_MSG_IDS].next_id; + case IPC_SYSCTL_SHM_NEXT_ID: return &ns->ids[IPC_SHM_IDS].next_id; #endif - return mode; -} - -static struct ctl_table_root set_root = { - .lookup = set_lookup, - .permissions = ipc_permissions, -}; - -bool setup_ipc_sysctls(struct ipc_namespace *ns) -{ - struct ctl_table *tbl; - - setup_sysctl_set(&ns->ipc_set, &set_root, set_is_seen); - - tbl = kmemdup(ipc_sysctls, sizeof(ipc_sysctls), GFP_KERNEL); - if (tbl) { - int i; - - for (i = 0; i < ARRAY_SIZE(ipc_sysctls); i++) { - if (tbl[i].data == &init_ipc_ns.shm_ctlmax) - tbl[i].data = &ns->shm_ctlmax; - - else if (tbl[i].data == &init_ipc_ns.shm_ctlall) - tbl[i].data = &ns->shm_ctlall; - - else if (tbl[i].data == &init_ipc_ns.shm_ctlmni) - tbl[i].data = &ns->shm_ctlmni; - - else if (tbl[i].data == &init_ipc_ns.shm_rmid_forced) - tbl[i].data = &ns->shm_rmid_forced; - - else if (tbl[i].data == &init_ipc_ns.msg_ctlmax) - tbl[i].data = &ns->msg_ctlmax; - - else if (tbl[i].data == &init_ipc_ns.msg_ctlmni) - tbl[i].data = &ns->msg_ctlmni; - - else if (tbl[i].data == &init_ipc_ns.msg_ctlmnb) - tbl[i].data = &ns->msg_ctlmnb; - - else if (tbl[i].data == &init_ipc_ns.sem_ctls) - tbl[i].data = &ns->sem_ctls; -#ifdef CONFIG_CHECKPOINT_RESTORE - else if (tbl[i].data == &init_ipc_ns.ids[IPC_SEM_IDS].next_id) - tbl[i].data = &ns->ids[IPC_SEM_IDS].next_id; - - else if (tbl[i].data == &init_ipc_ns.ids[IPC_MSG_IDS].next_id) - tbl[i].data = &ns->ids[IPC_MSG_IDS].next_id; - - else if (tbl[i].data == &init_ipc_ns.ids[IPC_SHM_IDS].next_id) - tbl[i].data = &ns->ids[IPC_SHM_IDS].next_id; -#endif - else - tbl[i].data = NULL; - } - - ns->ipc_sysctls = __register_sysctl_table(&ns->ipc_set, "kernel", tbl); - } - if (!ns->ipc_sysctls) { - kfree(tbl); - retire_sysctl_set(&ns->ipc_set); - return false; } - - return true; + return NULL; } -void retire_ipc_sysctls(struct ipc_namespace *ns) -{ - struct ctl_table *tbl; - - tbl = ns->ipc_sysctls->ctl_table_arg; - unregister_sysctl_table(ns->ipc_sysctls); - retire_sysctl_set(&ns->ipc_set); - kfree(tbl); -} +static struct ctl_table ipc_root_table[] = { + { + .procname = "kernel", + .mode = 0555, + .child = ipc_sysctls, + }, + {} +}; static int __init ipc_sysctl_init(void) { - if (!setup_ipc_sysctls(&init_ipc_ns)) { - pr_warn("ipc sysctl registration failed\n"); - return -ENOMEM; - } + register_sysctl_table(ipc_root_table); return 0; } diff --git a/ipc/namespace.c b/ipc/namespace.c index 754f3237194a..f760243ca685 100644 --- a/ipc/namespace.c +++ b/ipc/namespace.c @@ -63,9 +63,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns, if (!setup_mq_sysctls(ns)) goto fail_put; - if (!setup_ipc_sysctls(ns)) - goto fail_put; - sem_init_ns(ns); msg_init_ns(ns); shm_init_ns(ns); @@ -133,7 +130,6 @@ static void free_ipc_ns(struct ipc_namespace *ns) shm_exit_ns(ns); retire_mq_sysctls(ns); - retire_ipc_sysctls(ns); dec_ipc_namespaces(ns->ucounts); put_user_ns(ns->user_ns);
Dynamic memory allocation is needed to modify .data and specify the per namespace parameter. The new sysctl API is allowed to get rid of the need for such modification. Signed-off-by: Alexey Gladkov <legion@kernel.org> --- include/linux/ipc_namespace.h | 18 --- ipc/ipc_sysctl.c | 236 +++++++++++++++++----------------- ipc/namespace.c | 4 - 3 files changed, 121 insertions(+), 137 deletions(-)