Message ID | 20200828031146.43035-1-songmuchun@bytedance.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] mm/hugetlb: Fix a race between hugetlb sysctl handlers |
On 8/27/20 8:11 PM, Muchun Song wrote:
> There is a race between the assignment of `table->data` and the write
> through that pointer in __do_proc_doulongvec_minmax() on another
> thread.
>
> CPU0:                                CPU1:
>                                      proc_sys_write
> hugetlb_sysctl_handler                 proc_sys_call_handler
>   hugetlb_sysctl_handler_common         hugetlb_sysctl_handler
>     table->data = &tmp;                   hugetlb_sysctl_handler_common
>                                             table->data = &tmp;
>     proc_doulongvec_minmax
>       do_proc_doulongvec_minmax           sysctl_head_finish
>         __do_proc_doulongvec_minmax         unuse_table
>           i = table->data;
>           *i = val; // corrupt CPU1's stack
>
> Fix this by duplicating `table` and updating only the duplicate.
> Introduce a helper, proc_hugetlb_doulongvec_minmax(), to simplify the
> code.
>
> The following oops was seen:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor instruction fetch in kernel mode
> #PF: error_code(0x0010) - not-present page
> Code: Bad RIP value.
> ...
> Call Trace:
>  ? set_max_huge_pages+0x3da/0x4f0
>  ? alloc_pool_huge_page+0x150/0x150
>  ? proc_doulongvec_minmax+0x46/0x60
>  ? hugetlb_sysctl_handler_common+0x1c7/0x200
>  ? nr_hugepages_store+0x20/0x20
>  ? copy_fd_bitmaps+0x170/0x170
>  ? hugetlb_sysctl_handler+0x1e/0x20
>  ? proc_sys_call_handler+0x2f1/0x300
>  ? unregister_sysctl_table+0xb0/0xb0
>  ? __fd_install+0x78/0x100
>  ? proc_sys_write+0x14/0x20
>  ? __vfs_write+0x4d/0x90
>  ? vfs_write+0xef/0x240
>  ? ksys_write+0xc0/0x160
>  ? __ia32_sys_read+0x50/0x50
>  ? __close_fd+0x129/0x150
>  ? __x64_sys_write+0x43/0x50
>  ? do_syscall_64+0x6c/0x200
>  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Thank you!

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
I believe the Fixes line is still not correct. The original patch
didn't have that problem. Please identify which patch added
the problematic code.
-Andi
On 8/28/20 7:59 AM, Andi Kleen wrote:
>> Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
>
> I believe the Fixes line is still not correct. The original patch
> didn't have that problem. Please identify which patch added
> the problematic code.

Hi Andi,

I agree with Muchun's assessment that the issue was caused by e5ff215941d5.
Why?

Commit e5ff215941d5 changed initialization of the .data field in the
ctl_table structure for hugetlb_sysctl_handler. This was required because
the commit introduced multiple hstates. As a result, the value for .data
can not be known until runtime. This code was added to
hugetlb_sysctl_handler to set the value at runtime.

@@ -1037,8 +1109,19 @@ int hugetlb_sysctl_handler(struct ctl_table *table, int write,
 			   struct file *file, void __user *buffer,
 			   size_t *length, loff_t *ppos)
 {
+	struct hstate *h = &default_hstate;
+	unsigned long tmp;
+
+	if (!write)
+		tmp = h->max_huge_pages;
+
+	table->data = &tmp;
+	table->maxlen = sizeof(unsigned long);

The problem is that two threads running in parallel can modify table->data
in the global data structure without any synchronization. Worse yet, the
value each thread writes points into its own stack. That was the root
cause of the issue addressed by Muchun's patch.

Does that analysis make sense? Or are we missing something?
On Fri, Aug 28, 2020 at 10:14:16AM -0700, Mike Kravetz wrote:
> On 8/28/20 7:59 AM, Andi Kleen wrote:
> >> Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
> >
> > I believe the Fixes line is still not correct. The original patch
> > didn't have that problem. Please identify which patch added
> > the problematic code.
>
> Hi Andi,
>
> I agree with Muchun's assessment that the issue was caused by e5ff215941d5.
> Why?

Yes, when checking the code again I agree. It was introduced with that
patch. Patches look ok to me now.

-Andi
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a301c2d672bf..4c2a2620eeed 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3454,6 +3454,22 @@ static unsigned int allowed_mems_nr(struct hstate *h)
 }
 
 #ifdef CONFIG_SYSCTL
+static int proc_hugetlb_doulongvec_minmax(struct ctl_table *table, int write,
+					  void *buffer, size_t *length,
+					  loff_t *ppos, unsigned long *out)
+{
+	struct ctl_table dup_table;
+
+	/*
+	 * In order to avoid races with __do_proc_doulongvec_minmax(), we
+	 * can duplicate the @table and alter the duplicate of it.
+	 */
+	dup_table = *table;
+	dup_table.data = out;
+
+	return proc_doulongvec_minmax(&dup_table, write, buffer, length, ppos);
+}
+
 static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
 			 struct ctl_table *table, int write,
 			 void *buffer, size_t *length, loff_t *ppos)
@@ -3465,9 +3481,8 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
 	if (!hugepages_supported())
 		return -EOPNOTSUPP;
 
-	table->data = &tmp;
-	table->maxlen = sizeof(unsigned long);
-	ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
+	ret = proc_hugetlb_doulongvec_minmax(table, write, buffer, length, ppos,
+					     &tmp);
 	if (ret)
 		goto out;
 
@@ -3510,9 +3525,8 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write,
 	if (write && hstate_is_gigantic(h))
 		return -EINVAL;
 
-	table->data = &tmp;
-	table->maxlen = sizeof(unsigned long);
-	ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
+	ret = proc_hugetlb_doulongvec_minmax(table, write, buffer, length, ppos,
+					     &tmp);
 	if (ret)
 		goto out;
There is a race between the assignment of `table->data` and the write
through that pointer in __do_proc_doulongvec_minmax() on another thread.

CPU0:                                CPU1:
                                     proc_sys_write
hugetlb_sysctl_handler                 proc_sys_call_handler
  hugetlb_sysctl_handler_common         hugetlb_sysctl_handler
    table->data = &tmp;                   hugetlb_sysctl_handler_common
                                            table->data = &tmp;
    proc_doulongvec_minmax
      do_proc_doulongvec_minmax           sysctl_head_finish
        __do_proc_doulongvec_minmax         unuse_table
          i = table->data;
          *i = val; // corrupt CPU1's stack

Fix this by duplicating `table` and updating only the duplicate.
Introduce a helper, proc_hugetlb_doulongvec_minmax(), to simplify the
code.

The following oops was seen:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
Code: Bad RIP value.
...
Call Trace:
 ? set_max_huge_pages+0x3da/0x4f0
 ? alloc_pool_huge_page+0x150/0x150
 ? proc_doulongvec_minmax+0x46/0x60
 ? hugetlb_sysctl_handler_common+0x1c7/0x200
 ? nr_hugepages_store+0x20/0x20
 ? copy_fd_bitmaps+0x170/0x170
 ? hugetlb_sysctl_handler+0x1e/0x20
 ? proc_sys_call_handler+0x2f1/0x300
 ? unregister_sysctl_table+0xb0/0xb0
 ? __fd_install+0x78/0x100
 ? proc_sys_write+0x14/0x20
 ? __vfs_write+0x4d/0x90
 ? vfs_write+0xef/0x240
 ? ksys_write+0xc0/0x160
 ? __ia32_sys_read+0x50/0x50
 ? __close_fd+0x129/0x150
 ? __x64_sys_write+0x43/0x50
 ? do_syscall_64+0x6c/0x200
 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
Changelog in v2:
1. Add more details about how the race happened to the commit message.
2. Remove the unnecessary assignment of table->maxlen.

 mm/hugetlb.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)