Message ID | 20230612151545.3317766-1-ryan.roberts@arm.com (mailing list archive) |
---|---|
Headers | show |
Series | Encapsulate PTE contents from non-arch code | expand |
On Mon, 12 Jun 2023 16:15:42 +0100 Ryan Roberts <ryan.roberts@arm.com> wrote: > Hi All, > > (Including wider audience this time since changes touch a fair few subsystems) > > This is the second half of v3 of a series to improve the encapsulation of pte > entries by disallowing non-arch code from directly dereferencing pte_t pointers. That's basically all we have here for [0/N] cover letter content. I stole some words from the [3/3] changelog, so we now have: : A series to improve the encapsulation of pte entries by disallowing : non-arch code from directly dereferencing pte_t pointers. : : This means that by default, the accesses change from a C dereference to a : READ_ONCE(). This is technically the correct thing to do since where : pgtables are modified by HW (for access/dirty) they are volatile and : therefore we should always ensure READ_ONCE() semantics. : : But more importantly, by always using the helper, it can be overridden by : the architecture to fully encapsulate the contents of the pte. Arch code : is deliberately not converted, as the arch code knows best. It is : intended that arch code (arm64) will override the default with its own : implementation that can (e.g.) hide certain bits from the core code, or : determine young/dirty status by mixing in state from another source. > Based on earlier feedback, I split the series in 2; the first part, fixes for > existing bugs, was already posted at [3] and merged into mm-stable. This second > part contains the conversion from direct dereferences to instead use > ptep_get()/ptep_get_lockless(). > > See the v1 cover letter at [1] for rationale for this work. > > Based on feedback at v2, I've removed the new ptep_deref() helper I originally > added, and am now using the existing ptep_get() and ptep_get_lockless() helpers. > Testing on Ampere Altra (arm64) showed no difference in performance when using > ptep_deref() (*pte) vs ptep_get() (READ_ONCE(*pte)). > > Patches are based on mm-unstable (49e038b1919e) and a branch is available at [4] > (Let me know if this is the wrong branch to target - I'm still not familiar with > the details of the mm- dev process!). Note that Hugh Dickins's "mm: allow > pte_offset_map[_lock]() to fail" (now in mm-unstable) patch set caused a number > of conflicts which I've resolved. But due to that, you won't be able to apply > these patches on top of Linus's tree. I have an alternate branch on top of > v6.4-rc6 at [5]. Yep, that's all great, thanks. Is there some clever trick we can do to prevent new open-coded derefs of pte_t* from being introduced? I suppose we could convert pte_t to a single-member struct to force a compile error. That struct will get passed by value to ptep_get() so that's OK. But this isn't viable unless/until all architectures are converted :( Or we rely upon Ryan to grep the tree occasionally ;)
> On Jun 12, 2023, at 23:15, Ryan Roberts <ryan.roberts@arm.com> wrote: > > Hi All, > > (Including wider audience this time since changes touch a fair few subsystems) > > This is the second half of v3 of a series to improve the encapsulation of pte > entries by disallowing non-arch code from directly dereferencing pte_t pointers. > Based on earlier feedback, I split the series in 2; the first part, fixes for > existing bugs, was already posted at [3] and merged into mm-stable. This second > part contains the conversion from direct dereferences to instead use > ptep_get()/ptep_get_lockless(). > > See the v1 cover letter at [1] for rationale for this work. > > Based on feedback at v2, I've removed the new ptep_deref() helper I originally > added, and am now using the existing ptep_get() and ptep_get_lockless() helpers. When I first saw the name of ptep_get()/ptep_get_lockless(), I thought the pte seems like to be protected by the refcount mechanism (Why I have this though? Because Qi Zheng has proposed a approach to free pte page tables by using the refcount mechanism [1]). And your proposed name of ptep_deref() is intuitive for me, so I have another thought, should we rename ptep_get() to ptep_deref()? Just a thought from me, I'd like to hear if others object. Thanks. [1] https://lore.kernel.org/lkml/20211110105428.32458-7-zhengqi.arch@bytedance.com/
Hi Andrew, Thanks for taking the patches! On 12/06/2023 21:16, Andrew Morton wrote: > On Mon, 12 Jun 2023 16:15:42 +0100 Ryan Roberts <ryan.roberts@arm.com> wrote: > >> Hi All, >> >> (Including wider audience this time since changes touch a fair few subsystems) >> >> This is the second half of v3 of a series to improve the encapsulation of pte >> entries by disallowing non-arch code from directly dereferencing pte_t pointers. > > That's basically all we have here for [0/N] cover letter content. I > stole some words from the [3/3] changelog, so we now have: > > : A series to improve the encapsulation of pte entries by disallowing > : non-arch code from directly dereferencing pte_t pointers. > : > : This means that by default, the accesses change from a C dereference to a > : READ_ONCE(). This is technically the correct thing to do since where > : pgtables are modified by HW (for access/dirty) they are volatile and > : therefore we should always ensure READ_ONCE() semantics. > : > : But more importantly, by always using the helper, it can be overridden by > : the architecture to fully encapsulate the contents of the pte. Arch code > : is deliberately not converted, as the arch code knows best. It is > : intended that arch code (arm64) will override the default with its own > : implementation that can (e.g.) hide certain bits from the core code, or > : determine young/dirty status by mixing in state from another source. I'm not sure I've understood what you are saying here? Is there some expectation for the content of the cover letter for mm patches, which I'm not aware of? If you can point me to some docs I'll make sure I'm conformant in future. > > >> Based on earlier feedback, I split the series in 2; the first part, fixes for >> existing bugs, was already posted at [3] and merged into mm-stable. This second >> part contains the conversion from direct dereferences to instead use >> ptep_get()/ptep_get_lockless(). >> >> See the v1 cover letter at [1] for rationale for this work. >> >> Based on feedback at v2, I've removed the new ptep_deref() helper I originally >> added, and am now using the existing ptep_get() and ptep_get_lockless() helpers. >> Testing on Ampere Altra (arm64) showed no difference in performance when using >> ptep_deref() (*pte) vs ptep_get() (READ_ONCE(*pte)). >> >> Patches are based on mm-unstable (49e038b1919e) and a branch is available at [4] >> (Let me know if this is the wrong branch to target - I'm still not familiar with >> the details of the mm- dev process!). Note that Hugh Dickins's "mm: allow >> pte_offset_map[_lock]() to fail" (now in mm-unstable) patch set caused a number >> of conflicts which I've resolved. But due to that, you won't be able to apply >> these patches on top of Linus's tree. I have an alternate branch on top of >> v6.4-rc6 at [5]. > > Yep, that's all great, thanks. > > > Is there some clever trick we can do to prevent new open-coded derefs > of pte_t* from being introduced? I've been thinking about this myself but don't have a good answer at the moment. I used 2 techniques to find them all, but neither are fullproof so required hand editing: 1) global find/replace of "pte_t *" to "pte_handle_t ", and define pte_handle_t as void *. Then the compiler will throw an error for an attempted dereference. This obbiously only checks code that is actually being compiled, and it doesn't prevent anyone from declaring a pte_t* and directly dereferencing in future. 2) Using the coccinelle script. This worked pretty well, but finds a few places where the pte_t pointer is not pointing to a page table, but a variable on the stack, and there it is legitimate to dereference (we don't want to use ptep_get()/ptep_set_at() there). And also flags up lots of arch code that needs to be ignored. There are coccinelle scripts kept in scripts/coccinelle that apprarently get run semi-regularly [1]. So could submit something there, but would need to add support for excluding the arch directory and for marking known false-positives, neither of which I think are currently available. Does anyone know better? > > I suppose we could convert pte_t to a single-member struct to force a > compile error. That struct will get passed by value to ptep_get() so > that's OK. But this isn't viable unless/until all architectures are > converted :( It already is defined that way for arm64 at least: typedef struct { pteval_t pte; } pte_t; But I'm not sure how that helps? You can still legally deref it in C. > > Or we rely upon Ryan to grep the tree occasionally ;) This is my least favourite option ;-) But it wouldn't be the end of the world in the short term. Once I achieve the full objective of using arm64's contpte capability, open coded derefs will manifest as bugs (at least for cases where the accuracy of the access/dirty bits matter), which will certainly make sure they get fixed. Thanks, Ryan
On 13/06/2023 03:16, Muchun Song wrote: > > >> On Jun 12, 2023, at 23:15, Ryan Roberts <ryan.roberts@arm.com> wrote: >> >> Hi All, >> >> (Including wider audience this time since changes touch a fair few subsystems) >> >> This is the second half of v3 of a series to improve the encapsulation of pte >> entries by disallowing non-arch code from directly dereferencing pte_t pointers. >> Based on earlier feedback, I split the series in 2; the first part, fixes for >> existing bugs, was already posted at [3] and merged into mm-stable. This second >> part contains the conversion from direct dereferences to instead use >> ptep_get()/ptep_get_lockless(). >> >> See the v1 cover letter at [1] for rationale for this work. >> >> Based on feedback at v2, I've removed the new ptep_deref() helper I originally >> added, and am now using the existing ptep_get() and ptep_get_lockless() helpers. > > When I first saw the name of ptep_get()/ptep_get_lockless(), I thought > the pte seems like to be protected by the refcount mechanism (Why I have > this though? Because Qi Zheng has proposed a approach to free pte page tables > by using the refcount mechanism [1]). And your proposed name of ptep_deref() > is intuitive for me, so I have another thought, should we rename ptep_get() > to ptep_deref()? Just a thought from me, I'd like to hear if others object. I see your point, but I think any renaming exercise should be discussed and applied in the context of a separate patch series, given that ptep_get() and ptep_get_lockless() already exist in the code base. This would be a much bigger job, since it would need to cover all the arch code too. > > Thanks. > > [1] https://lore.kernel.org/lkml/20211110105428.32458-7-zhengqi.arch@bytedance.com/