Message ID | 20210405203711.1095940-1-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | Group pages on the direct map for permissioned vmallocs | expand |
On 4/5/21 1:37 PM, Rick Edgecombe wrote: > +static void __dispose_pages(struct list_head *head) > +{ > + struct list_head *cur, *next; > + > + list_for_each_safe(cur, next, head) { > + list_del(cur); > + > + /* The list head is stored at the start of the page */ > + free_page((unsigned long)cur); > + } > +} This is interesting. While the page is in the allocator, you're using the page contents themselves to store the list_head. It took me a minute to figure out what you were doing here because: "start of the page" is a bit ambiguous. It could mean: * the first 16 bytes in 'struct page' or * the first 16 bytes in the page itself, aka *page_address(page) The fact that this doesn't work on higmem systems makes this an OK thing to do, but it is a bit weird. It's also doubly susceptible to bugs where there's a page_to_virt() or virt_to_page() screwup. I was *hoping* there was still sufficient space in 'struct page' for this second list_head in addition to page->lru. I think there *should* be. That would at least make this allocator a bit more "normal" in not caring about page contents while the page is free in the allocator. If you were able to do that you could do things like kmemcheck or page alloc debugging while the page is in the allocator. Anyway, I think I'd prefer that you *try* to use 'struct page' alone. But, if that doesn't work out, please comment the snot out of this thing because it _is_ weird.
On Mon, Apr 05, 2021 at 02:01:58PM -0700, Dave Hansen wrote: > On 4/5/21 1:37 PM, Rick Edgecombe wrote: > > +static void __dispose_pages(struct list_head *head) > > +{ > > + struct list_head *cur, *next; > > + > > + list_for_each_safe(cur, next, head) { > > + list_del(cur); > > + > > + /* The list head is stored at the start of the page */ > > + free_page((unsigned long)cur); > > + } > > +} > > This is interesting. > > While the page is in the allocator, you're using the page contents > themselves to store the list_head. It took me a minute to figure out > what you were doing here because: "start of the page" is a bit > ambiguous. It could mean: > > * the first 16 bytes in 'struct page' > or > * the first 16 bytes in the page itself, aka *page_address(page) > > The fact that this doesn't work on higmem systems makes this an OK thing > to do, but it is a bit weird. It's also doubly susceptible to bugs > where there's a page_to_virt() or virt_to_page() screwup. > > I was *hoping* there was still sufficient space in 'struct page' for > this second list_head in addition to page->lru. I think there *should* > be. That would at least make this allocator a bit more "normal" in not > caring about page contents while the page is free in the allocator. If > you were able to do that you could do things like kmemcheck or page > alloc debugging while the page is in the allocator. > > Anyway, I think I'd prefer that you *try* to use 'struct page' alone. > But, if that doesn't work out, please comment the snot out of this thing > because it _is_ weird. Hi! Current closest-thing-we-have-to-an-expert-on-struct-page here! I haven't read over these patches yet. If these pages are in use by vmalloc, they can't use mapping+index because get_user_pages() will call page_mapping() and the list_head will confuse it. I think it could use index+private for a list_head. If the pages are in the buddy, I _think_ mapping+index are free. private is in use for buddy order. But I haven't read through the buddy code in a while. Does it need to be a doubly linked list? Can it be an hlist?
On Mon, 2021-04-05 at 14:01 -0700, Dave Hansen wrote: > On 4/5/21 1:37 PM, Rick Edgecombe wrote: > > +static void __dispose_pages(struct list_head *head) > > +{ > > + struct list_head *cur, *next; > > + > > + list_for_each_safe(cur, next, head) { > > + list_del(cur); > > + > > + /* The list head is stored at the start of the page > > */ > > + free_page((unsigned long)cur); > > + } > > +} > > This is interesting. > > While the page is in the allocator, you're using the page contents > themselves to store the list_head. It took me a minute to figure out > what you were doing here because: "start of the page" is a bit > ambiguous. It could mean: > > * the first 16 bytes in 'struct page' > or > * the first 16 bytes in the page itself, aka *page_address(page) > > The fact that this doesn't work on higmem systems makes this an OK > thing > to do, but it is a bit weird. It's also doubly susceptible to bugs > where there's a page_to_virt() or virt_to_page() screwup. > > I was *hoping* there was still sufficient space in 'struct page' for > this second list_head in addition to page->lru. I think there > *should* > be. That would at least make this allocator a bit more "normal" in > not > caring about page contents while the page is free in the allocator. > If > you were able to do that you could do things like kmemcheck or page > alloc debugging while the page is in the allocator. > > Anyway, I think I'd prefer that you *try* to use 'struct page' alone. > But, if that doesn't work out, please comment the snot out of this > thing > because it _is_ weird. Yes sorry, that deserved more explanation. I tried putting it in struct page actually. The problem was list_lru automatically determines the node id from the list_head provided to it via page_to_nid(virt_to_page(head)). I guess it assumes the list_head is on the actual item. I started adding another list_lru function that let the node id be passed in separately, but I remembered this trick from the deferred free list in vmalloc. If this ever expands to handle direct map unmapped pages (which would probably be the next step), the list_head will have to be moved out of the actual page anyway. But in the meantime it resulted in the smallest change. I can try the other way if it's still too weird.
On Mon, 2021-04-05 at 22:32 +0100, Matthew Wilcox wrote: > On Mon, Apr 05, 2021 at 02:01:58PM -0700, Dave Hansen wrote: > > On 4/5/21 1:37 PM, Rick Edgecombe wrote: > > > +static void __dispose_pages(struct list_head *head) > > > +{ > > > + struct list_head *cur, *next; > > > + > > > + list_for_each_safe(cur, next, head) { > > > + list_del(cur); > > > + > > > + /* The list head is stored at the start of the > > > page */ > > > + free_page((unsigned long)cur); > > > + } > > > +} > > > > This is interesting. > > > > While the page is in the allocator, you're using the page contents > > themselves to store the list_head. It took me a minute to figure > > out > > what you were doing here because: "start of the page" is a bit > > ambiguous. It could mean: > > > > * the first 16 bytes in 'struct page' > > or > > * the first 16 bytes in the page itself, aka *page_address(page) > > > > The fact that this doesn't work on higmem systems makes this an OK > > thing > > to do, but it is a bit weird. It's also doubly susceptible to bugs > > where there's a page_to_virt() or virt_to_page() screwup. > > > > I was *hoping* there was still sufficient space in 'struct page' > > for > > this second list_head in addition to page->lru. I think there > > *should* > > be. That would at least make this allocator a bit more "normal" in > > not > > caring about page contents while the page is free in the > > allocator. If > > you were able to do that you could do things like kmemcheck or page > > alloc debugging while the page is in the allocator. > > > > Anyway, I think I'd prefer that you *try* to use 'struct page' > > alone. > > But, if that doesn't work out, please comment the snot out of this > > thing > > because it _is_ weird. > > Hi! Current closest-thing-we-have-to-an-expert-on-struct-page here! > > I haven't read over these patches yet. If these pages are in use by > vmalloc, they can't use mapping+index because get_user_pages() will > call > page_mapping() and the list_head will confuse it. I think it could > use > index+private for a list_head. > > If the pages are in the buddy, I _think_ mapping+index are free. > private > is in use for buddy order. But I haven't read through the buddy code > in a while. > > Does it need to be a doubly linked list? Can it be an hlist? It does need to be a doubly linked list. I think they should never be mapped to userspace. As far as the page allocator is concerned these pages are not free. And they are not compound. Originally I was just using the lru member. Would it be ok in that case?