Message ID | 20210701015441.snfkDnNcO%akpm@linux-foundation.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [001/192] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c | expand |
On Wed, Jun 30, 2021 at 6:54 PM Andrew Morton <akpm@linux-foundation.org> wrote: > When we do the kernel monitor, use the DRGN > (https://github.com/osandov/drgn) access to kernel data structures, found > that the system calls a lot. DRGN is implemented by reading /proc/kcore. > After looking at the kcore code, it is found that kcore does not implement > mmap, resulting in frequent context switching triggered by read. > Therefore, we want to add mmap interface to optimize performance. Ok, this is funky, but I'm going to drop this patch because I think it's buggy as is. Since > +static int mmap_kcore(struct file *file, struct vm_area_struct *vma) > +{ > + size_t size = vma->vm_end - vma->vm_start; Ok. But then: > + start = kc_offset_to_vaddr(((u64)vma->vm_pgoff << PAGE_SHIFT) - > + ((data_offset >> PAGE_SHIFT) << PAGE_SHIFT)); Not only is that ((data_offset >> PAGE_SHIFT) << PAGE_SHIFT) a very strange calculation (did you mean "data_offset & PAGE_MASK"?), but I don't see anything that protects against underflow in that calculation. pg_off can easily be arbitrarily small (eg zero), so that subtraction can underflow afaik. So that needs a test, and return -EINVAL or whatever. But even if that is fixed, this test is entirely broken: > + list_for_each_entry(m, &kclist_head, list) { > + if (start >= m->addr && size <= m->size) > + break; > + } No, that's wrong. You allow 'size' to be as big as 'm->size', but you do that even if 'start' isn't 'm->start'. The proper check would be something like u64 end = start + size; if (start >= m->addr && end <= m->addr+m->size) .. or similar (and that should check that "start+size" hasn't overflowed). So I see what appears to be multiple problems, and while I hand-waved some fixes for them, those are very much "maybe something like this", and I'm going to drop this patch. Not for 5.14. Linus
在 2021/7/1 上午11:32, Linus Torvalds 写道: > On Wed, Jun 30, 2021 at 6:54 PM Andrew Morton <akpm@linux-foundation.org> wrote: >> When we do the kernel monitor, use the DRGN >> (https://github.com/osandov/drgn) access to kernel data structures, found >> that the system calls a lot. DRGN is implemented by reading /proc/kcore. >> After looking at the kcore code, it is found that kcore does not implement >> mmap, resulting in frequent context switching triggered by read. >> Therefore, we want to add mmap interface to optimize performance. > > Ok, this is funky, but I'm going to drop this patch because I think > it's buggy as is. > > Since > >> +static int mmap_kcore(struct file *file, struct vm_area_struct *vma) >> +{ >> + size_t size = vma->vm_end - vma->vm_start; > > Ok. > > But then: > >> + start = kc_offset_to_vaddr(((u64)vma->vm_pgoff << PAGE_SHIFT) - >> + ((data_offset >> PAGE_SHIFT) << PAGE_SHIFT)); > > Not only is that > > ((data_offset >> PAGE_SHIFT) << PAGE_SHIFT) > > a very strange calculation (did you mean "data_offset & PAGE_MASK"?), > but I don't see anything that protects against underflow in that > calculation. pg_off can easily be arbitrarily small (eg zero), so that > subtraction can underflow afaik. Sorry, the calculations here are really confusing. The reason is that when DRGN read /proc/kcore for ELF file header: phdr->p_offset = kc_vaddr_to_offset(m->addr) + data_offset; and DRGN call mmap, use phdr->p_offset passed in, I need to subtract "data_offset". > > So that needs a test, and return -EINVAL or whatever. > There's a problem with not judging "start". I will fix it in a v3. > But even if that is fixed, this test is entirely broken: > >> + list_for_each_entry(m, &kclist_head, list) { >> + if (start >= m->addr && size <= m->size) >> + break; >> + } > > No, that's wrong. > Yes, this is indeed wrong, I will fix it in a v3. > You allow 'size' to be as big as 'm->size', but you do that even if > 'start' isn't 'm->start'. > > The proper check would be something like > > u64 end = start + size; > > if (start >= m->addr && end <= m->addr+m->size) .. > > or similar (and that should check that "start+size" hasn't overflowed). > > So I see what appears to be multiple problems, and while I hand-waved > some fixes for them, those are very much "maybe something like this", > and I'm going to drop this patch. Not for 5.14. > > Linus >
--- a/fs/proc/kcore.c~fs-proc-kcorec-add-mmap-interface +++ a/fs/proc/kcore.c @@ -614,11 +614,78 @@ static int release_kcore(struct inode *i return 0; } +static vm_fault_t mmap_kcore_fault(struct vm_fault *vmf) +{ + return VM_FAULT_SIGBUS; +} + +static const struct vm_operations_struct kcore_mmap_ops = { + .fault = mmap_kcore_fault, +}; + +static int mmap_kcore(struct file *file, struct vm_area_struct *vma) +{ + size_t size = vma->vm_end - vma->vm_start; + u64 start, pfn; + int nphdr; + size_t data_offset; + size_t phdrs_len, notes_len; + struct kcore_list *m = NULL; + int ret = 0; + + down_read(&kclist_lock); + + get_kcore_size(&nphdr, &phdrs_len, ¬es_len, &data_offset); + + start = kc_offset_to_vaddr(((u64)vma->vm_pgoff << PAGE_SHIFT) - + ((data_offset >> PAGE_SHIFT) << PAGE_SHIFT)); + + list_for_each_entry(m, &kclist_head, list) { + if (start >= m->addr && size <= m->size) + break; + } + + if (&m->list == &kclist_head) { + ret = -EINVAL; + goto out; + } + + if (vma->vm_flags & (VM_WRITE | VM_EXEC)) { + ret = -EPERM; + goto out; + } + + vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC); + vma->vm_flags |= VM_MIXEDMAP; + vma->vm_ops = &kcore_mmap_ops; + + if (kern_addr_valid(start)) { + if (m->type == KCORE_RAM) + pfn = __pa(start) >> PAGE_SHIFT; + else if (m->type == KCORE_TEXT) + pfn = __pa_symbol(start) >> PAGE_SHIFT; + else { + ret = -EFAULT; + goto out; + } + + ret = remap_pfn_range(vma, vma->vm_start, pfn, size, + vma->vm_page_prot); + } else { + ret = -EFAULT; + } + +out: + up_read(&kclist_lock); + return ret; +} + static const struct proc_ops kcore_proc_ops = { .proc_read = read_kcore, .proc_open = open_kcore, .proc_release = release_kcore, .proc_lseek = default_llseek, + .proc_mmap = mmap_kcore, }; /* just remember that we have to update kcore */