Message ID | 20210412080611.769864829@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: Unexport apply_to_page_range() | expand |
On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote: > +extern int verify_page_range(struct mm_struct *mm, No need for the extern here. > +int verify_page_range(struct mm_struct *mm, > + unsigned long addr, unsigned long size, > + int (*fn)(pte_t pte, unsigned long addr, void *data), > + void *data) A kerneldoc comment would be nice for this function. Otherwise this looks fine.
On Mon, Apr 12, 2021 at 10:28:05AM +0200, Christoph Hellwig wrote: > On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote: > > +extern int verify_page_range(struct mm_struct *mm, > > No need for the extern here. It's consistent with the rest of the functions there. Also, I personally like that extern. > > +int verify_page_range(struct mm_struct *mm, > > + unsigned long addr, unsigned long size, > > + int (*fn)(pte_t pte, unsigned long addr, void *data), > > + void *data) > > A kerneldoc comment would be nice for this function. > > Otherwise this looks fine. Something like so? /** * verify_page_range() - Scan (and fill) a range of virtual memory and validate PTEs * @mm: mm identifying the virtual memory map * @addr: starting virtual address of the range * @size: size of the range * @fn: function that verifies the PTEs * @data: opaque data passed to @fn * * Scan a region of virtual memory, filling in page tables as necessary and * calling a provided function on each leaf, providing a copy of the * page-table-entry. * * Similar apply_to_page_range(), but does not provide direct access to the * page-tables. * * NOTE! this function does not work correctly vs large pages. * * Return: the first !0 return of the provided function, or 0 on completion. */ int verify_page_range(struct mm_struct *mm, unsigned long addr, unsigned long size, int (*fn)(pte_t pte, unsigned long addr, void *data), void *data)
On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote: > +struct vpr_data { > + int (*fn)(pte_t pte, unsigned long addr, void *data); > + void *data; > +}; Eeerg. This is likely to become an attack target itself. Stored function pointer with stored (3rd) argument. This doesn't seem needed: only DRM uses it, and that's for error reporting. I'd rather plumb back errors in a way to not have to add another place in the kernel where we do func+arg stored calling.
On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote: > On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote: > > +struct vpr_data { > > + int (*fn)(pte_t pte, unsigned long addr, void *data); > > + void *data; > > +}; > > Eeerg. This is likely to become an attack target itself. Stored function > pointer with stored (3rd) argument. You got some further reading on that? How exactly are those exploited?
On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote: > On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote: > > +struct vpr_data { > > + int (*fn)(pte_t pte, unsigned long addr, void *data); > > + void *data; > > +}; > > Eeerg. This is likely to become an attack target itself. Stored function > pointer with stored (3rd) argument. > > This doesn't seem needed: only DRM uses it, and that's for error > reporting. I'd rather plumb back errors in a way to not have to add > another place in the kernel where we do func+arg stored calling. Is this any better? It does have the stored pointer, but not a stored argument, assuming you don't count returns as arguments I suppose. The alternative is refactoring apply_to_page_range() :-/ --- struct vpr_data { bool (*fn)(pte_t pte, unsigned long addr); unsigned long addr; }; static int vpr_fn(pte_t *pte, unsigned long addr, void *data) { struct vpr_data *vpr = data; if (!vpr->fn(*pte, addr)) { vpr->addr = addr; return -EINVAL; } return 0; } /** * verify_page_range() - Scan (and fill) a range of virtual memory and validate PTEs * @mm: mm identifying the virtual memory map * @addr: starting virtual address of the range * @size: size of the range * @fn: function that verifies the PTEs * * Scan a region of virtual memory, filling in page tables as necessary and * calling a provided function on each leaf, providing a copy of the * page-table-entry. * * Similar apply_to_page_range(), but does not provide direct access to the * page-tables. * * NOTE! this function does not work correctly vs large pages. * * Return: the address that failed verification or 0 on success. */ unsigned long verify_page_range(struct mm_struct *mm, unsigned long addr, unsigned long size, bool (*fn)(pte_t pte, unsigned long addr)) { struct vpr_data vpr = { .fn = fn, .addr = 0, }; apply_to_page_range(mm, addr, size, vpr_fn, &vpr); return vpr.addr; } EXPORT_SYMBOL_GPL(verify_page_range);
On Tue, Apr 13, 2021 at 09:36:32AM +0200, Peter Zijlstra wrote: > On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote: > > On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote: > > > +struct vpr_data { > > > + int (*fn)(pte_t pte, unsigned long addr, void *data); > > > + void *data; > > > +}; > > > > Eeerg. This is likely to become an attack target itself. Stored function > > pointer with stored (3rd) argument. > > > > This doesn't seem needed: only DRM uses it, and that's for error > > reporting. I'd rather plumb back errors in a way to not have to add > > another place in the kernel where we do func+arg stored calling. > > Is this any better? It does have the stored pointer, but not a stored > argument, assuming you don't count returns as arguments I suppose. It's better in the sense that it's not the func/arg pair that really bugs me, yes. :) > The alternative is refactoring apply_to_page_range() :-/ Yeah, I'm looking now, I see what you mean. > --- > > struct vpr_data { > bool (*fn)(pte_t pte, unsigned long addr); > unsigned long addr; > }; > > static int vpr_fn(pte_t *pte, unsigned long addr, void *data) > { > struct vpr_data *vpr = data; > if (!vpr->fn(*pte, addr)) { > vpr->addr = addr; > return -EINVAL; > } > return 0; > } My point about passing "addr" was that nothing in the callback actually needs it -- the top level can just as easily report the error. And that the helper is always vpr_fn(), so it doesn't need to be passed either. So the addr can just be encoded in "int", and no structure is needed at: typedef bool (*vpr_fn_t)(pte_t pte); static int vpr_fn(pte_t *pte, unsigned long addr, void *data) { vpr_fn_t callback = data; if (!callback(*pte)) return addr >> PAGE_SIZE; return 0; } unsigned long verify_page_range(struct mm_struct *mm, unsigned long addr, unsigned long size, vpr_fn_t callback) { return apply_to_page_range(mm, addr, size, vpr_fn, callback) << PAGE_SIZE; } But maybe I'm missing something?
On Tue, Apr 13, 2021 at 08:01:08PM -0700, Kees Cook wrote: > So the addr can just be encoded in "int", and no structure is needed at: > > typedef bool (*vpr_fn_t)(pte_t pte); > > static int vpr_fn(pte_t *pte, unsigned long addr, void *data) > { > vpr_fn_t callback = data; > > if (!callback(*pte)) > return addr >> PAGE_SIZE; > return 0; > } > > unsigned long verify_page_range(struct mm_struct *mm, > unsigned long addr, unsigned long size, > vpr_fn_t callback) > { > return apply_to_page_range(mm, addr, size, vpr_fn, callback) << PAGE_SIZE; > } > > But maybe I'm missing something? That covers only (32+12) bits of address space and will mostly work, but we definitely support architectures (very much including x86_64) with larger address spaces than that.
On Tue, Apr 13, 2021 at 08:54:06AM +0200, Peter Zijlstra wrote: > On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote: > > On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote: > > > +struct vpr_data { > > > + int (*fn)(pte_t pte, unsigned long addr, void *data); > > > + void *data; > > > +}; > > > > Eeerg. This is likely to become an attack target itself. Stored function > > pointer with stored (3rd) argument. > > You got some further reading on that? How exactly are those exploited? Sure, see "Executing code" in https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html I killed the entire primitive (for timer_list) https://outflux.net/blog/archives/2018/02/05/security-things-in-linux-v4-15/#v4.15-timer_list but that was a lot of work, so I'm trying to avoid seeing more things like it appear. :) (And I'm trying to get rid of similar APIs, like tasklet.) This new code is unlikely to ever be used as widely as timer_list, but I just cringe when I see the code pattern. I'll understand if there isn't a solution that doesn't require major refactoring, but I can dream. :)
--- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2876,6 +2876,10 @@ extern int apply_to_page_range(struct mm extern int apply_to_existing_page_range(struct mm_struct *mm, unsigned long address, unsigned long size, pte_fn_t fn, void *data); +extern int verify_page_range(struct mm_struct *mm, + unsigned long addr, unsigned long size, + int (*fn)(pte_t pte, unsigned long addr, void *data), + void *data); extern void init_mem_debugging_and_hardening(void); #ifdef CONFIG_PAGE_POISONING --- a/mm/memory.c +++ b/mm/memory.c @@ -2559,6 +2559,30 @@ int apply_to_existing_page_range(struct return __apply_to_page_range(mm, addr, size, fn, data, false); } +struct vpr_data { + int (*fn)(pte_t pte, unsigned long addr, void *data); + void *data; +}; + +static int vpr_fn(pte_t *pte, unsigned long addr, void *data) +{ + struct vpr_data *vpr = data; + return vpr->fn(*pte, addr, vpr->data); +} + +int verify_page_range(struct mm_struct *mm, + unsigned long addr, unsigned long size, + int (*fn)(pte_t pte, unsigned long addr, void *data), + void *data) +{ + struct vpr_data vpr = { + .fn = fn, + .data = data, + }; + return apply_to_page_range(mm, addr, size, vpr_fn, &vpr); +} +EXPORT_SYMBOL_GPL(verify_page_range); + /* * handle_pte_fault chooses page fault handler according to an entry which was * read non-atomically. Before making any commitment, on those architectures
Introduce and EXPORT a read-only counterpart to apply_to_page_range(). It only exposes the PTE value, not a pointer to the pagetables itself and is thus quite a bit safer to export. A number of apply_to_page_range() users can be converted to this primitive. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/mm.h | 4 ++++ mm/memory.c | 24 ++++++++++++++++++++++++ 2 files changed, 28 insertions(+)