Message ID | 0c53589f34a6195938eeb58c3a88594fa30cc90a.1736352361.git.lorenzo.stoakes@oracle.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Series | expose mapping wrprotect, fix fb_defio use |
On Wed, Jan 08, 2025 at 04:18:40PM +0000, Lorenzo Stoakes wrote:
> +/*
> + * rmap_walk_file - do something to file page using the object-based rmap method
> + * @folio: the folio to be handled
> + * @rwc: control variable according to each walk type
> + * @locked: caller holds relevant rmap lock
> + *
> + * Find all the mappings of a folio using the mapping pointer and the vma chains
> + * contained in the address_space struct it points to.
> + */
> +static void rmap_walk_file(struct folio *folio,
> +		struct rmap_walk_control *rwc, bool locked)
> +{
> +	struct address_space *mapping = folio_mapping(folio);

I'm unconvinced this shouldn't be just folio->mapping. On the face of
it, we're saying that we're walking a file, and file folios just want
to use folio->mapping. But let's dig a little deeper.

The folio passed in is locked, so it can't be changed during this call.
In folio_mapping(), folio_test_slab() is guaranteed untrue.
folio_test_swapcache() doesn't seem likely to be true either; unless
it's shmem, it can't be in the swapcache, and if it's shmem and in the
swap cache, it can't be mapped to userspace (they're swizzled back from
the swapcache to the pagecache before being mapped). And then the
check for PAGE_MAPPING_FLAGS is guaranteed to be untrue (we know it's
not anon/ksm/movable). So I think this should just be folio->mapping.

> +	/*
> +	 * The page lock not only makes sure that page->mapping cannot
> +	 * suddenly be NULLified by truncation, it makes sure that the
> +	 * structure at mapping cannot be freed and reused yet,
> +	 * so we can safely take mapping->i_mmap_rwsem.
> +	 */

I know you only moved this comment, but please fix it to refer to
folios, not pages.

> +	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> +
> +	if (!mapping)
> +		return;

Maybe make this a WARN_ON_ONCE?

> +	__rmap_walk_file(folio, mapping, folio_pgoff(folio),
> +			 folio_nr_pages(folio), rwc, locked);

folio_pgoff() can go too. Just use folio->index.
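The reasoning in the review above can be illustrated with a small userspace C sketch. This is not kernel code: every `toy_` name, the tag values, and the struct layouts are hypothetical stand-ins chosen for illustration, and the swap cache branch is deliberately simplified. The sketch assumes a filtering helper that rejects slab, swap cache, and low-bit-tagged mappings, and shows that once those cases are ruled out for a locked file folio, a direct read of the mapping field gives the same answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy tag bits kept in the low bits of the mapping word (hypothetical names). */
#define TOY_MAPPING_ANON	0x1UL
#define TOY_MAPPING_MOVABLE	0x2UL
#define TOY_MAPPING_FLAGS	(TOY_MAPPING_ANON | TOY_MAPPING_MOVABLE)

struct toy_address_space {
	long id;	/* the member's alignment keeps the low two pointer bits clear */
};

struct toy_tagged_folio {
	uintptr_t mapping;	/* pointer value, possibly tagged in the low bits */
	bool locked;
	bool slab;
	bool swapcache;
};

/* Filtering helper: yields the page cache object only for a plain file folio. */
static struct toy_address_space *toy_folio_mapping(const struct toy_tagged_folio *f)
{
	if (f->slab)
		return NULL;
	if (f->swapcache)
		return NULL;	/* simplification: this toy has no swap address space */
	if (f->mapping & TOY_MAPPING_FLAGS)
		return NULL;	/* tagged (anon/ksm/movable): not a page cache pointer */
	return (struct toy_address_space *)f->mapping;
}

/* Direct read, valid only once every filtered case has been ruled out. */
static struct toy_address_space *toy_file_folio_mapping(const struct toy_tagged_folio *f)
{
	assert(f->locked && !f->slab && !f->swapcache);
	assert(!(f->mapping & TOY_MAPPING_FLAGS));
	return (struct toy_address_space *)f->mapping;
}

/* Build a locked toy folio pointing at @as, with optional low-bit @tag. */
static struct toy_tagged_folio toy_make_folio(struct toy_address_space *as, uintptr_t tag)
{
	struct toy_tagged_folio f = { (uintptr_t)as | tag, true, false, false };
	return f;
}
```

For a folio built with `toy_make_folio(&as, 0)` both helpers return `&as`, while a folio tagged with `TOY_MAPPING_ANON` is rejected by the filtering helper. The direct read is only safe because the caller has already established the preconditions, which is why the thread proposes documenting them in a comment.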
On Wed, Jan 08, 2025 at 04:38:57PM +0000, Matthew Wilcox wrote:
> On Wed, Jan 08, 2025 at 04:18:40PM +0000, Lorenzo Stoakes wrote:
> > +/*
> > + * rmap_walk_file - do something to file page using the object-based rmap method
> > + * @folio: the folio to be handled
> > + * @rwc: control variable according to each walk type
> > + * @locked: caller holds relevant rmap lock
> > + *
> > + * Find all the mappings of a folio using the mapping pointer and the vma chains
> > + * contained in the address_space struct it points to.
> > + */
> > +static void rmap_walk_file(struct folio *folio,
> > +		struct rmap_walk_control *rwc, bool locked)
> > +{
> > +	struct address_space *mapping = folio_mapping(folio);
>
> I'm unconvinced this shouldn't be just folio->mapping. On the face of
> it, we're saying that we're walking a file, and file folios just want
> to use folio->mapping. But let's dig a little deeper.
>
> The folio passed in is locked, so it can't be changed during this call.
> In folio_mapping(), folio_test_slab() is guaranteed untrue.
> folio_test_swapcache() doesn't seem likely to be true either; unless
> it's shmem, it can't be in the swapcache, and if it's shmem and in the
> swap cache, it can't be mapped to userspace (they're swizzled back from
> the swapcache to the pagecache before being mapped). And then the
> check for PAGE_MAPPING_FLAGS is guaranteed to be untrue (we know it's
> not anon/ksm/movable). So I think this should just be folio->mapping.

Ack, and we assert that it is indeed locked first.

We will have checked that this is not anon, and with the lock we shouldn't
see it disappear under us to be slab, we have also explicitly checked for
ksm so that's out.

Wasn't aware of that swizzling actually... good to know! But I guess that
makes sense since you'd hit a swap entry in the fault code and trigger all
that fun stuff (hm let me go read the swap chapter in my book again :P)

TL;DR - will change. But will add a comment saying we can do it safely.

>
> > +	/*
> > +	 * The page lock not only makes sure that page->mapping cannot
> > +	 * suddenly be NULLified by truncation, it makes sure that the
> > +	 * structure at mapping cannot be freed and reused yet,
> > +	 * so we can safely take mapping->i_mmap_rwsem.
> > +	 */
>
> I know you only moved this comment, but please fix it to refer to
> folios, not pages.

Ack will do.

>
> > +	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> > +
> > +	if (!mapping)
> > +		return;
>
> Maybe make this a WARN_ON_ONCE?

I'm not sure if this isn't actually a vaguely possible scenario? Though
hm. I'm not 100% certain it's not expected to happen _sometimes_.

Perhaps one to do as a follow up in case it turns out this is sometimes
expected due to timing issues with a truncate? But I may be wrong and this
should demonstrably not happen other than in case of programmatic error?

>
> > +	__rmap_walk_file(folio, mapping, folio_pgoff(folio),
> > +			 folio_nr_pages(folio), rwc, locked);
>
> folio_pgoff() can go too. Just use folio->index.
>

Ack. Will change.
diff --git a/mm/rmap.c b/mm/rmap.c
index 227c60e38261..effafdb44365 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2710,35 +2710,37 @@ static void rmap_walk_anon(struct folio *folio,
 		anon_vma_unlock_read(anon_vma);
 }
 
-/*
- * rmap_walk_file - do something to file page using the object-based rmap method
- * @folio: the folio to be handled
- * @rwc: control variable according to each walk type
- * @locked: caller holds relevant rmap lock
+/**
+ * __rmap_walk_file() - Traverse the reverse mapping for a file-backed mapping
+ * of a page mapped within a specified page cache object at a specified offset.
  *
- * Find all the mappings of a folio using the mapping pointer and the vma chains
- * contained in the address_space struct it points to.
+ * @folio:		Either the folio whose mappings to traverse, or if NULL,
+ *			the callbacks specified in @rwc will be configured such
+ *			as to be able to look up mappings correctly.
+ * @mapping:		The page cache object whose mapping VMAs we intend to
+ *			traverse. If @folio is non-NULL, this should be equal to
+ *			folio_mapping(folio).
+ * @pgoff_start:	The offset within @mapping of the page which we are
+ *			looking up. If @folio is non-NULL, this should be equal
+ *			to folio_pgoff(folio).
+ * @nr_pages:		The number of pages mapped by the mapping. If @folio is
+ *			non-NULL, this should be equal to folio_nr_pages(folio).
+ * @rwc:		The reverse mapping walk control object describing how
+ *			the traversal should proceed.
+ * @locked:		Is the @mapping already locked? If not, we acquire the
+ *			lock.
  */
-static void rmap_walk_file(struct folio *folio,
-		struct rmap_walk_control *rwc, bool locked)
+static void __rmap_walk_file(struct folio *folio, struct address_space *mapping,
+			     pgoff_t pgoff_start, unsigned long nr_pages,
+			     struct rmap_walk_control *rwc, bool locked)
 {
-	struct address_space *mapping = folio_mapping(folio);
-	pgoff_t pgoff_start, pgoff_end;
+	pgoff_t pgoff_end = pgoff_start + nr_pages - 1;
 	struct vm_area_struct *vma;
 
-	/*
-	 * The page lock not only makes sure that page->mapping cannot
-	 * suddenly be NULLified by truncation, it makes sure that the
-	 * structure at mapping cannot be freed and reused yet,
-	 * so we can safely take mapping->i_mmap_rwsem.
-	 */
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_WARN_ON_FOLIO(folio && mapping != folio_mapping(folio), folio);
+	VM_WARN_ON_FOLIO(folio && pgoff_start != folio_pgoff(folio), folio);
+	VM_WARN_ON_FOLIO(folio && nr_pages != folio_nr_pages(folio), folio);
 
-	if (!mapping)
-		return;
-
-	pgoff_start = folio_pgoff(folio);
-	pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
 	if (!locked) {
 		if (i_mmap_trylock_read(mapping))
 			goto lookup;
@@ -2753,8 +2755,7 @@ static void rmap_walk_file(struct folio *folio,
 lookup:
 	vma_interval_tree_foreach(vma, &mapping->i_mmap,
 			pgoff_start, pgoff_end) {
-		unsigned long address = vma_address(vma, pgoff_start,
-			       folio_nr_pages(folio));
+		unsigned long address = vma_address(vma, pgoff_start, nr_pages);
 
 		VM_BUG_ON_VMA(address == -EFAULT, vma);
 		cond_resched();
@@ -2767,12 +2768,40 @@ static void rmap_walk_file(struct folio *folio,
 		if (rwc->done && rwc->done(folio))
 			goto done;
 	}
-
 done:
 	if (!locked)
 		i_mmap_unlock_read(mapping);
 }
 
+/*
+ * rmap_walk_file - do something to file page using the object-based rmap method
+ * @folio: the folio to be handled
+ * @rwc: control variable according to each walk type
+ * @locked: caller holds relevant rmap lock
+ *
+ * Find all the mappings of a folio using the mapping pointer and the vma chains
+ * contained in the address_space struct it points to.
+ */
+static void rmap_walk_file(struct folio *folio,
+		struct rmap_walk_control *rwc, bool locked)
+{
+	struct address_space *mapping = folio_mapping(folio);
+
+	/*
+	 * The page lock not only makes sure that page->mapping cannot
+	 * suddenly be NULLified by truncation, it makes sure that the
+	 * structure at mapping cannot be freed and reused yet,
+	 * so we can safely take mapping->i_mmap_rwsem.
+	 */
+	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+
+	if (!mapping)
+		return;
+
+	__rmap_walk_file(folio, mapping, folio_pgoff(folio),
+			 folio_nr_pages(folio), rwc, locked);
+}
+
 void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc)
 {
 	if (unlikely(folio_test_ksm(folio)))
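The offset arithmetic in the diff (an inclusive `pgoff_end`, then a per-VMA address that may be the `-EFAULT` sentinel) can be sketched in userspace C. This is a toy model under stated assumptions, not the kernel's implementation: `struct toy_vma`, `TOY_PAGE_SHIFT` (4 KiB pages), `TOY_NO_ADDRESS` (standing in for `-EFAULT`), and both functions are hypothetical, and the address rule is only loosely modeled on what a `vma_address()`-style helper is expected to do.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12		/* assumption: 4 KiB pages */
#define TOY_NO_ADDRESS	UINT64_MAX	/* stands in for the -EFAULT sentinel */

struct toy_vma {
	uint64_t vm_start;	/* first user address, inclusive */
	uint64_t vm_end;	/* end user address, exclusive */
	uint64_t vm_pgoff;	/* file page offset mapped at vm_start */
};

/* Last file page offset covered by a walk of nr_pages pages (inclusive). */
static uint64_t toy_pgoff_end(uint64_t pgoff_start, uint64_t nr_pages)
{
	assert(nr_pages > 0);
	return pgoff_start + nr_pages - 1;
}

/*
 * User address within @vma at which the range starting at @pgoff first
 * appears, or TOY_NO_ADDRESS if the range does not overlap the VMA.
 */
static uint64_t toy_vma_address(const struct toy_vma *vma,
				uint64_t pgoff, uint64_t nr_pages)
{
	if (pgoff >= vma->vm_pgoff) {
		uint64_t address = vma->vm_start +
				   ((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT);

		/* The range starts beyond the end of the VMA (or wrapped). */
		if (address < vma->vm_start || address >= vma->vm_end)
			return TOY_NO_ADDRESS;
		return address;
	}

	/* The range starts before the VMA but its tail reaches into it. */
	if (toy_pgoff_end(pgoff, nr_pages) >= vma->vm_pgoff)
		return vma->vm_start;

	return TOY_NO_ADDRESS;
}
```

With a four-page VMA at `0x10000` mapping file offsets 8 to 11, a walk starting at offset 10 lands at `0x12000`, a walk of offsets 6 to 9 lands at `vm_start`, and a walk entirely outside the VMA yields the sentinel. An interval-tree lookup bounded by `pgoff_start` and `pgoff_end` should only return overlapping VMAs, which is consistent with the diff treating the sentinel as a bug.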
In order to permit the traversal of the reverse mapping at a specified
mapping and offset rather than those specified by an input folio, we need
to separate out the portion of the rmap file logic which deals with this
traversal from those parts of the logic which interact with the folio.

This patch achieves this by adding a new static __rmap_walk_file() function
which rmap_walk_file() invokes.

This function permits the ability to pass NULL folio, on the assumption
that the caller has provided for this correctly in the callbacks specified
in the rmap_walk_control object.

Though it provides for this, and adds debug asserts to ensure that, should
a folio be specified, these are equal to the mapping and offset specified
in the folio, there should be no functional change as a result of this
patch.

The reason for adding this is to enable for future changes to permit users
to be able to traverse mappings of userland-mapped kernel memory,
write-protecting those mappings to enable page_mkwrite() or pfn_mkwrite()
fault handlers to be retriggered on subsequent dirty.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/rmap.c | 81 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 55 insertions(+), 26 deletions(-)
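The split described in the commit message (an inner walker keyed on mapping and offset, plus a thin wrapper that derives both from a folio) can be sketched as a userspace toy. None of this is the kernel API: the `toy_` types, the linear scan used in place of an interval tree, and the convention that the callback returns false to stop are all assumptions made for illustration, and locking is omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_file_vma {
	unsigned long pgoff_first;	/* first file page mapped, inclusive */
	unsigned long pgoff_last;	/* last file page mapped, inclusive */
};

struct toy_mapping {
	const struct toy_file_vma *vmas;
	size_t nr_vmas;
};

struct toy_folio {
	struct toy_mapping *mapping;
	unsigned long index;
	unsigned long nr_pages;
};

struct toy_walk_control {
	void *arg;
	/* Returns false to stop the walk early; @folio may be NULL. */
	bool (*rmap_one)(struct toy_folio *folio,
			 const struct toy_file_vma *vma, void *arg);
};

/* Inner walker: needs only a mapping and a page range; the folio is optional. */
static void toy_walk_range(struct toy_folio *folio, struct toy_mapping *mapping,
			   unsigned long pgoff_start, unsigned long nr_pages,
			   struct toy_walk_control *ctl)
{
	unsigned long pgoff_end = pgoff_start + nr_pages - 1;

	/* If a folio is supplied, it must agree with the explicit arguments. */
	assert(!folio || folio->mapping == mapping);
	assert(!folio || folio->index == pgoff_start);
	assert(!folio || folio->nr_pages == nr_pages);

	for (size_t i = 0; i < mapping->nr_vmas; i++) {
		const struct toy_file_vma *vma = &mapping->vmas[i];

		/* Skip VMAs that do not overlap [pgoff_start, pgoff_end]. */
		if (vma->pgoff_last < pgoff_start || vma->pgoff_first > pgoff_end)
			continue;
		if (!ctl->rmap_one(folio, vma, ctl->arg))
			break;
	}
}

/* Folio-based wrapper: derives mapping, offset and length from the folio. */
static void toy_walk_folio(struct toy_folio *folio, struct toy_walk_control *ctl)
{
	if (!folio->mapping)
		return;
	toy_walk_range(folio, folio->mapping, folio->index, folio->nr_pages, ctl);
}

/* Example callback: counts visited VMAs and never dereferences @folio. */
static bool toy_count_one(struct toy_folio *folio,
			  const struct toy_file_vma *vma, void *arg)
{
	(void)folio;
	(void)vma;
	++*(unsigned int *)arg;
	return true;
}
```

Walking a folio at index 5 spanning two pages and calling `toy_walk_range(NULL, mapping, 5, 2, ...)` directly visit the same VMAs, provided the callback tolerates a NULL folio as `toy_count_one()` does. That mirrors the "no functional change" claim: the wrapper path behaves as before, and the folio-less path is only as safe as the callbacks the caller supplies.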