Message ID | 00dfc71569bc9971b53e29b36a80e9e020ac61ac.1737391102.git.oleksii.kurochko@gmail.com (mailing list archive) |
---|---|
State | New |
Series | Fixes for vmap_to_mfn() and pt_mapping_level |
On 20.01.2025 17:54, Oleksii Kurochko wrote:
> RISC-V doesn't have hardware feature to ask MMU to translate
> virtual address to physical address ( like Arm has, for example ),
> so software page table walking in implemented.
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> ---
>  xen/arch/riscv/include/asm/mm.h |  2 ++
>  xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)
>
> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
> index 292aa48fc1..d46018c132 100644
> --- a/xen/arch/riscv/include/asm/mm.h
> +++ b/xen/arch/riscv/include/asm/mm.h
> @@ -15,6 +15,8 @@
>
>  extern vaddr_t directmap_virt_start;
>
> +paddr_t pt_walk(vaddr_t va);

In the longer run, is returning just the PA really going to be sufficient?
If not, perhaps say a word on the limitation in the description.

> --- a/xen/arch/riscv/pt.c
> +++ b/xen/arch/riscv/pt.c
> @@ -274,6 +274,62 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>      return rc;
>  }
>
> +paddr_t pt_walk(vaddr_t va)
> +{
> +    const mfn_t root = get_root_page();
> +    /*
> +     * In pt_walk() only XEN_TALE_MAP_NONE and XEN_TABLE_SUPER_PAGE are

Nit: s/TALE/TABLE/ ?

> +     * handled ( as they are only possible for page table walking ), so

Nit: Blanks again inside parentheses in a comment.

> +     * initialize `ret` with "impossible" XEN_TABLE_MAP_NOMEM.
> +     */
> +    int ret = XEN_TABLE_MAP_NOMEM;
> +    unsigned int level = HYP_PT_ROOT_LEVEL;
> +    paddr_t pa = 0;

Seeing 0 as initializer here, just to double check: You do prevent MFN 0
to be handed to the page allocator, and you also don't use it "manually"
anywhere?

> +    pte_t *table;
> +
> +    DECLARE_OFFSETS(offsets, va);
> +
> +    table = map_table(root);
> +
> +    /*
> +     * Find `pa` of an entry which corresponds to `va` by iterating for each
> +     * page level and checking if the entry points to a next page table or
> +     * to a page.
> +     *
> +     * Two cases are possible:
> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was find;
> +     *   (Despite of the name) XEN_TABLE_SUPER_PAGE covers 4k mapping too.
> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
> +     *   mapped.
> +     */
> +    while ( (ret != XEN_TABLE_MAP_NONE) && (ret != XEN_TABLE_SUPER_PAGE) )
> +    {
> +        /*
> +         * This case shouldn't really occur as it will mean that for table
> +         * level 0 a pointer to next page table has been written, but at
> +         * level 0 it could be only a pointer to 4k page.
> +         */
> +        ASSERT(level <= HYP_PT_ROOT_LEVEL);
> +
> +        ret = pt_next_level(false, &table, offsets[level]);
> +        level--;
> +    }
> +
> +    if ( ret == XEN_TABLE_MAP_NONE )
> +        dprintk(XENLOG_WARNING, "Is va(%#lx) really mapped?\n", va);

Even if it's a dprintk(), I'd recommend against adding such.

> +    else if ( ret == XEN_TABLE_SUPER_PAGE )
> +        pa = pte_to_paddr(*(table + offsets[level + 1]));

Missing "else if ( ret == XEN_TABLE_NORMAL )" (or maybe simply "else")?

> +    /*
> +     * There is no need for unmap_table() after each pt_next_level() call as
> +     * pt_next_level() will do unmap_table() for the previous table before
> +     * returning next level table.
> +     */
> +    unmap_table(table);

I don't think the comment is needed. You map once before the loop, so it's
natural that you unmap once after.

> +    return pa;

Don't you want to OR in the low 12 bits of VA here (unless "pa" is still 0)?

Jan
On 1/27/25 11:06 AM, Jan Beulich wrote:
> On 20.01.2025 17:54, Oleksii Kurochko wrote:
>> RISC-V doesn't have hardware feature to ask MMU to translate
>> virtual address to physical address ( like Arm has, for example ),
>> so software page table walking in implemented.
>>
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>> ---
>>  xen/arch/riscv/include/asm/mm.h |  2 ++
>>  xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
>>  2 files changed, 58 insertions(+)
>>
>> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
>> index 292aa48fc1..d46018c132 100644
>> --- a/xen/arch/riscv/include/asm/mm.h
>> +++ b/xen/arch/riscv/include/asm/mm.h
>> @@ -15,6 +15,8 @@
>>
>>  extern vaddr_t directmap_virt_start;
>>
>> +paddr_t pt_walk(vaddr_t va);
> In the longer run, is returning just the PA really going to be sufficient?
> If not, perhaps say a word on the limitation in the description.

In the long run, this function's prototype looks like `paddr_t pt_walk(vaddr_t root, vaddr_t va, bool is_xen)` [1]. However, I'm not sure if it will stay that way,
as I think `is_xen` could be skipped, since using `map_table()` should be sufficient (as it now considers `system_state`), and I'm not really sure if I need the root argument,
as the initial goal was to use this function for debug-only purposes and I've never used it for guest page table (stage-1) walking.
Anyway, yes, it is still returning a physical address, and that seems enough to me.

Could you share your thoughts on what I should take into account for the return value? I am probably missing something really useful.

[1] https://gitlab.com/xen-project/people/olkur/xen/-/blob/latest/xen/arch/riscv/mm.c#L820

>> +     * initialize `ret` with "impossible" XEN_TABLE_MAP_NOMEM.
>> +     */
>> +    int ret = XEN_TABLE_MAP_NOMEM;
>> +    unsigned int level = HYP_PT_ROOT_LEVEL;
>> +    paddr_t pa = 0;
> Seeing 0 as initializer here, just to double check: You do prevent MFN 0
> to be handed to the page allocator, and you also don't use it "manually"
> anywhere?

MFN 0 could be used when removing the page:
https://gitlab.com/xen-project/xen/-/blob/staging/xen/arch/riscv/pt.c?ref_type=heads#L251

In that case, it would be better to initialize `pa` with `(paddr_t)(-1)`, as this value couldn't be mapped and is safe to use as an invalid value.

>
>> +    pte_t *table;
>> +
>> +    DECLARE_OFFSETS(offsets, va);
>> +
>> +    table = map_table(root);
>> +
>> +    /*
>> +     * Find `pa` of an entry which corresponds to `va` by iterating for each
>> +     * page level and checking if the entry points to a next page table or
>> +     * to a page.
>> +     *
>> +     * Two cases are possible:
>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was find;
>> +     *   (Despite of the name) XEN_TABLE_SUPER_PAGE covers 4k mapping too.
>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>> +     *   mapped.
>> +     */
>> +    while ( (ret != XEN_TABLE_MAP_NONE) && (ret != XEN_TABLE_SUPER_PAGE) )
>> +    {
>> +        /*
>> +         * This case shouldn't really occur as it will mean that for table
>> +         * level 0 a pointer to next page table has been written, but at
>> +         * level 0 it could be only a pointer to 4k page.
>> +         */
>> +        ASSERT(level <= HYP_PT_ROOT_LEVEL);
>> +
>> +        ret = pt_next_level(false, &table, offsets[level]);
>> +        level--;
>> +    }
>> +
>> +    if ( ret == XEN_TABLE_MAP_NONE )
>> +        dprintk(XENLOG_WARNING, "Is va(%#lx) really mapped?\n", va);
> Even if it's a dprintk(), I'd recommend against adding such.
>
>> +    else if ( ret == XEN_TABLE_SUPER_PAGE )
>> +        pa = pte_to_paddr(*(table + offsets[level + 1]));
> Missing "else if ( ret == XEN_TABLE_NORMAL )" (or maybe simply "else")?

If I am not missing something, we can't be here with ret == XEN_TABLE_NORMAL because we iterate
in the while loop above until we find a leaf or reach level = 0. If we find a leaf then
XEN_TABLE_SUPER_PAGE is returned; otherwise sooner or later we should face the case where the next table
(when `level`=0 and someone put at this level a pointer to the next level, which is a bug) should
be allocated in pt_next_level(), but that will fail because `alloc_tbl`=false is passed to pt_next_level()
and thereby ret=XEN_TABLE_MAP_NONE will be returned.

Based on your previous comment about dprintk, this code could be rewritten in the following way:
-    if ( ret == XEN_TABLE_MAP_NONE )
-        dprintk(XENLOG_WARNING, "Is va(%#lx) really mapped?\n", va);
-    else if ( ret == XEN_TABLE_SUPER_PAGE )
+    if ( ret != XEN_TABLE_MAP_NONE )
         pa = pte_to_paddr(*(table + offsets[level + 1]));

>> +    return pa;
> Don't you want to OR in the low 12 bits of VA here (unless "pa" is still 0)?

It is a bug, and IIUC if `pa` isn't 0 we still need to add the low bits of VA to `pa`:
    return pa | (va & ((1 << PAGE_SHIFT) - 1));
(I think I saw a macro for generating masks somewhere, but can't find where.)

Thanks.

~ Oleksii
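To make the low-bits point concrete, here is a minimal, self-contained C sketch of combining a leaf's base address with the offset bits of the VA. It is not the Xen code: it assumes Sv39-style geometry (4 KiB base pages, 9 index bits per level, three levels), and the names `SK_PAGE_SHIFT`, `SK_VPN_BITS`, `SK_MAX_LEVEL`, `sk_level_order()` and `sk_compose_pa()` are hypothetical. Under those assumptions, a leaf found above level 0 (a superpage) takes more than the low 12 bits from the VA, so the mask is derived from the level.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12u /* 4 KiB base pages (assumption) */
#define SK_VPN_BITS    9u /* index bits per level, as in Sv39 (assumption) */
#define SK_MAX_LEVEL   2u /* root level of a 3-level table (assumption) */

/* Number of VA bits that form the offset within a leaf found at `level`. */
static unsigned int sk_level_order(unsigned int level)
{
    assert(level <= SK_MAX_LEVEL);
    return SK_PAGE_SHIFT + level * SK_VPN_BITS;
}

/*
 * Combine the base address decoded from a leaf PTE with the offset bits
 * of `va`. The mask is built from a 64-bit constant so that the shift is
 * not performed in plain `int`.
 */
static uint64_t sk_compose_pa(uint64_t leaf_base, uint64_t va,
                              unsigned int level)
{
    uint64_t mask = (UINT64_C(1) << sk_level_order(level)) - 1;

    return (leaf_base & ~mask) | (va & mask);
}
```

For a level-0 leaf this reduces to the `pa | (va & (PAGE_SIZE - 1))` form proposed above; for higher levels the wider mask keeps the additional VA index bits that a superpage mapping does not translate.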
On 27.01.2025 13:29, Oleksii Kurochko wrote:
> On 1/27/25 11:06 AM, Jan Beulich wrote:
>> On 20.01.2025 17:54, Oleksii Kurochko wrote:
>>> RISC-V doesn't have hardware feature to ask MMU to translate
>>> virtual address to physical address ( like Arm has, for example ),
>>> so software page table walking in implemented.
>>>
>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>> ---
>>>  xen/arch/riscv/include/asm/mm.h |  2 ++
>>>  xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
>>>  2 files changed, 58 insertions(+)
>>>
>>> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
>>> index 292aa48fc1..d46018c132 100644
>>> --- a/xen/arch/riscv/include/asm/mm.h
>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>> @@ -15,6 +15,8 @@
>>>
>>>  extern vaddr_t directmap_virt_start;
>>>
>>> +paddr_t pt_walk(vaddr_t va);
>> In the longer run, is returning just the PA really going to be sufficient?
>> If not, perhaps say a word on the limitation in the description.
>
> In the long run, this function's prototype looks like `paddr_t pt_walk(vaddr_t root, vaddr_t va, bool is_xen)` [1]. However, I'm not sure if it will stay that way,
> as I think `is_xen` could be skipped, since using `map_table()` should be sufficient (as it now considers `system_state`), and I'm not really sure if I need the root argument,
> as the initial goal was to use this function for debug-only purposes and I've never used it for guest page table (stage-1) walking.
> Anyway, yes, it is still returning a physical address, and that seems enough to me.
>
> Could you share your thoughts on what I should take into account for the return value? I am probably missing something really useful.

Often you care about the permissions as well. Sometimes it may even be relevant
to know the (super-)page size of the mapping.

> [1] https://gitlab.com/xen-project/people/olkur/xen/-/blob/latest/xen/arch/riscv/mm.c#L820
>
>>> +     * initialize `ret` with "impossible" XEN_TABLE_MAP_NOMEM.
>>> +     */
>>> +    int ret = XEN_TABLE_MAP_NOMEM;
>>> +    unsigned int level = HYP_PT_ROOT_LEVEL;
>>> +    paddr_t pa = 0;
>> Seeing 0 as initializer here, just to double check: You do prevent MFN 0
>> to be handed to the page allocator, and you also don't use it "manually"
>> anywhere?
>
> MFN 0 could be used when removing the page:
> https://gitlab.com/xen-project/xen/-/blob/staging/xen/arch/riscv/pt.c?ref_type=heads#L251
>
> In that case, it would be better to initialize `pa` with `(paddr_t)(-1)`, as this value couldn't be mapped and is safe to use as an invalid value.
>
>>
>>> +    pte_t *table;
>>> +
>>> +    DECLARE_OFFSETS(offsets, va);
>>> +
>>> +    table = map_table(root);
>>> +
>>> +    /*
>>> +     * Find `pa` of an entry which corresponds to `va` by iterating for each
>>> +     * page level and checking if the entry points to a next page table or
>>> +     * to a page.
>>> +     *
>>> +     * Two cases are possible:
>>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was find;
>>> +     *   (Despite of the name) XEN_TABLE_SUPER_PAGE covers 4k mapping too.
>>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>>> +     *   mapped.
>>> +     */
>>> +    while ( (ret != XEN_TABLE_MAP_NONE) && (ret != XEN_TABLE_SUPER_PAGE) )
>>> +    {
>>> +        /*
>>> +         * This case shouldn't really occur as it will mean that for table
>>> +         * level 0 a pointer to next page table has been written, but at
>>> +         * level 0 it could be only a pointer to 4k page.
>>> +         */
>>> +        ASSERT(level <= HYP_PT_ROOT_LEVEL);
>>> +
>>> +        ret = pt_next_level(false, &table, offsets[level]);
>>> +        level--;
>>> +    }
>>> +
>>> +    if ( ret == XEN_TABLE_MAP_NONE )
>>> +        dprintk(XENLOG_WARNING, "Is va(%#lx) really mapped?\n", va);
>> Even if it's a dprintk(), I'd recommend against adding such.
>>
>>> +    else if ( ret == XEN_TABLE_SUPER_PAGE )
>>> +        pa = pte_to_paddr(*(table + offsets[level + 1]));
>> Missing "else if ( ret == XEN_TABLE_NORMAL )" (or maybe simply "else")?
>
> If I am not missing something, we can't be here with ret == XEN_TABLE_NORMAL because we iterate
> in the while loop above until we find a leaf or reach level = 0.

I'll admit that I didn't specifically check whether XEN_TABLE_NORMAL could be
observed here; my point was that non-super-page mappings aren't handled, as you ...

> If we find a leaf then
> XEN_TABLE_SUPER_PAGE is returned; otherwise sooner or later we should face the case where the next table
> (when `level`=0 and someone put at this level a pointer to the next level, which is a bug) should
> be allocated in pt_next_level(), but that will fail because `alloc_tbl`=false is passed to pt_next_level()
> and thereby ret=XEN_TABLE_MAP_NONE will be returned.
>
> Based on your previous comment about dprintk, this code could be rewritten in the following way:
> -    if ( ret == XEN_TABLE_MAP_NONE )
> -        dprintk(XENLOG_WARNING, "Is va(%#lx) really mapped?\n", va);
> -    else if ( ret == XEN_TABLE_SUPER_PAGE )
> +    if ( ret != XEN_TABLE_MAP_NONE )
>          pa = pte_to_paddr(*(table + offsets[level + 1]));

... appear to confirm here.

Jan
On 1/27/25 1:57 PM, Jan Beulich wrote:
> On 27.01.2025 13:29, Oleksii Kurochko wrote:
>> On 1/27/25 11:06 AM, Jan Beulich wrote:
>>> On 20.01.2025 17:54, Oleksii Kurochko wrote:
>>>> RISC-V doesn't have hardware feature to ask MMU to translate
>>>> virtual address to physical address ( like Arm has, for example ),
>>>> so software page table walking in implemented.
>>>>
>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>> ---
>>>>  xen/arch/riscv/include/asm/mm.h |  2 ++
>>>>  xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
>>>>  2 files changed, 58 insertions(+)
>>>>
>>>> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
>>>> index 292aa48fc1..d46018c132 100644
>>>> --- a/xen/arch/riscv/include/asm/mm.h
>>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>>> @@ -15,6 +15,8 @@
>>>>
>>>>  extern vaddr_t directmap_virt_start;
>>>>
>>>> +paddr_t pt_walk(vaddr_t va);
>>> In the longer run, is returning just the PA really going to be sufficient?
>>> If not, perhaps say a word on the limitation in the description.
>> In the long run, this function's prototype looks like `paddr_t pt_walk(vaddr_t root, vaddr_t va, bool is_xen)` [1]. However, I'm not sure if it will stay that way,
>> as I think `is_xen` could be skipped, since using `map_table()` should be sufficient (as it now considers `system_state`), and I'm not really sure if I need the root argument,
>> as the initial goal was to use this function for debug-only purposes and I've never used it for guest page table (stage-1) walking.
>> Anyway, yes, it is still returning a physical address, and that seems enough to me.
>>
>> Could you share your thoughts on what I should take into account for the return value? I am probably missing something really useful.
> Often you care about the permissions as well. Sometimes it may even be relevant
> to know the (super-)page size of the mapping.

Perhaps it would be better to change the prototype to:
  bool pt_walk(vaddr_t va, mfn_t *ret_pa);
or even
  void pt_walk(vaddr_t va, mfn_t *ret_pa);
In this case, `*ret_pa = INVALID_MFN` could serve as a signal that `pt_walk()` failed.
If there's a need to return permissions or (super-)page size in the future, another argument could be added.

What do you think? Would this approach be better?

I am also considering returning a structure containing the `mfn` (or `paddr_t`) and adding other properties (such as permissions or
page size) as needed in the future. Both solutions seem more or less equivalent.

~ Oleksii
On 1/27/25 6:22 PM, Oleksii Kurochko wrote:
>
>
> On 1/27/25 1:57 PM, Jan Beulich wrote:
>> On 27.01.2025 13:29, Oleksii Kurochko wrote:
>>> On 1/27/25 11:06 AM, Jan Beulich wrote:
>>>> On 20.01.2025 17:54, Oleksii Kurochko wrote:
>>>>> RISC-V doesn't have hardware feature to ask MMU to translate
>>>>> virtual address to physical address ( like Arm has, for example ),
>>>>> so software page table walking in implemented.
>>>>>
>>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>>> ---
>>>>>  xen/arch/riscv/include/asm/mm.h |  2 ++
>>>>>  xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
>>>>>  2 files changed, 58 insertions(+)
>>>>>
>>>>> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
>>>>> index 292aa48fc1..d46018c132 100644
>>>>> --- a/xen/arch/riscv/include/asm/mm.h
>>>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>>>> @@ -15,6 +15,8 @@
>>>>>
>>>>>  extern vaddr_t directmap_virt_start;
>>>>>
>>>>> +paddr_t pt_walk(vaddr_t va);
>>>> In the longer run, is returning just the PA really going to be sufficient?
>>>> If not, perhaps say a word on the limitation in the description.
>>> In the long run, this function's prototype looks like `paddr_t pt_walk(vaddr_t root, vaddr_t va, bool is_xen)` [1]. However, I'm not sure if it will stay that way,
>>> as I think `is_xen` could be skipped, since using `map_table()` should be sufficient (as it now considers `system_state`), and I'm not really sure if I need the root argument,
>>> as the initial goal was to use this function for debug-only purposes and I've never used it for guest page table (stage-1) walking.
>>> Anyway, yes, it is still returning a physical address, and that seems enough to me.
>>>
>>> Could you share your thoughts on what I should take into account for the return value? I am probably missing something really useful.
>> Often you care about the permissions as well. Sometimes it may even be relevant
>> to know the (super-)page size of the mapping.
> Perhaps it would be better to change the prototype to:
>   bool pt_walk(vaddr_t va, mfn_t *ret_pa);
> or even
>   void pt_walk(vaddr_t va, mfn_t *ret_pa);
> In this case, `*ret_pa = INVALID_MFN` could serve as a signal that `pt_walk()` failed.
> If there's a need to return permissions or (super-)page size in the future, another argument could be added.
> What do you think? Would this approach be better?

We have to return mfn_t or paddr_t as pt_walk() is used in vmap_to_mfn().

~ Oleksii

>
> I am also considering returning a structure containing the `mfn` (or `paddr_t`) and adding other properties (such as permissions or
> page size) as needed in the future. Both solutions seem more or less equivalent.
>
> ~ Oleksii
On 27.01.2025 18:41, Oleksii Kurochko wrote:
>
> On 1/27/25 6:22 PM, Oleksii Kurochko wrote:
>>
>>
>> On 1/27/25 1:57 PM, Jan Beulich wrote:
>>> On 27.01.2025 13:29, Oleksii Kurochko wrote:
>>>> On 1/27/25 11:06 AM, Jan Beulich wrote:
>>>>> On 20.01.2025 17:54, Oleksii Kurochko wrote:
>>>>>> RISC-V doesn't have hardware feature to ask MMU to translate
>>>>>> virtual address to physical address ( like Arm has, for example ),
>>>>>> so software page table walking in implemented.
>>>>>>
>>>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>>>> ---
>>>>>>  xen/arch/riscv/include/asm/mm.h |  2 ++
>>>>>>  xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
>>>>>>  2 files changed, 58 insertions(+)
>>>>>>
>>>>>> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
>>>>>> index 292aa48fc1..d46018c132 100644
>>>>>> --- a/xen/arch/riscv/include/asm/mm.h
>>>>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>>>>> @@ -15,6 +15,8 @@
>>>>>>
>>>>>>  extern vaddr_t directmap_virt_start;
>>>>>>
>>>>>> +paddr_t pt_walk(vaddr_t va);
>>>>> In the longer run, is returning just the PA really going to be sufficient?
>>>>> If not, perhaps say a word on the limitation in the description.
>>>> In the long run, this function's prototype looks like `paddr_t pt_walk(vaddr_t root, vaddr_t va, bool is_xen)` [1]. However, I'm not sure if it will stay that way,
>>>> as I think `is_xen` could be skipped, since using `map_table()` should be sufficient (as it now considers `system_state`), and I'm not really sure if I need the root argument,
>>>> as the initial goal was to use this function for debug-only purposes and I've never used it for guest page table (stage-1) walking.
>>>> Anyway, yes, it is still returning a physical address, and that seems enough to me.
>>>>
>>>> Could you share your thoughts on what I should take into account for the return value? I am probably missing something really useful.
>>> Often you care about the permissions as well. Sometimes it may even be relevant
>>> to know the (super-)page size of the mapping.
>> Perhaps it would be better to change the prototype to:
>>   bool pt_walk(vaddr_t va, mfn_t *ret_pa);
>> or even
>>   void pt_walk(vaddr_t va, mfn_t *ret_pa);
>> In this case, `*ret_pa = INVALID_MFN` could serve as a signal that `pt_walk()` failed.
>> If there's a need to return permissions or (super-)page size in the future, another argument could be added.
>> What do you think? Would this approach be better?
>
> We have to return mfn_t or paddr_t as pt_walk() is used in vmap_to_mfn().

That use doesn't really limit what the function needs to return. It merely
affects how simple (or complicated) the invocation there would be.

Jan
On 27.01.2025 18:22, Oleksii Kurochko wrote:
> On 1/27/25 1:57 PM, Jan Beulich wrote:
>> On 27.01.2025 13:29, Oleksii Kurochko wrote:
>>> On 1/27/25 11:06 AM, Jan Beulich wrote:
>>>> On 20.01.2025 17:54, Oleksii Kurochko wrote:
>>>>> RISC-V doesn't have hardware feature to ask MMU to translate
>>>>> virtual address to physical address ( like Arm has, for example ),
>>>>> so software page table walking in implemented.
>>>>>
>>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>>> ---
>>>>>  xen/arch/riscv/include/asm/mm.h |  2 ++
>>>>>  xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
>>>>>  2 files changed, 58 insertions(+)
>>>>>
>>>>> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
>>>>> index 292aa48fc1..d46018c132 100644
>>>>> --- a/xen/arch/riscv/include/asm/mm.h
>>>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>>>> @@ -15,6 +15,8 @@
>>>>>
>>>>>  extern vaddr_t directmap_virt_start;
>>>>>
>>>>> +paddr_t pt_walk(vaddr_t va);
>>>> In the longer run, is returning just the PA really going to be sufficient?
>>>> If not, perhaps say a word on the limitation in the description.
>>> In the long run, this function's prototype looks like `paddr_t pt_walk(vaddr_t root, vaddr_t va, bool is_xen)` [1]. However, I'm not sure if it will stay that way,
>>> as I think `is_xen` could be skipped, since using `map_table()` should be sufficient (as it now considers `system_state`), and I'm not really sure if I need the root argument,
>>> as the initial goal was to use this function for debug-only purposes and I've never used it for guest page table (stage-1) walking.
>>> Anyway, yes, it is still returning a physical address, and that seems enough to me.
>>>
>>> Could you share your thoughts on what I should take into account for the return value? I am probably missing something really useful.
>> Often you care about the permissions as well. Sometimes it may even be relevant
>> to know the (super-)page size of the mapping.
>
> Perhaps it would be better to change the prototype to:
>   bool pt_walk(vaddr_t va, mfn_t *ret_pa);
> or even
>   void pt_walk(vaddr_t va, mfn_t *ret_pa);
> In this case, `*ret_pa = INVALID_MFN` could serve as a signal that `pt_walk()` failed.
> If there's a need to return permissions or (super-)page size in the future, another argument could be added.
>
> What do you think? Would this approach be better?
>
> I am also considering returning a structure containing the `mfn` (or `paddr_t`) and adding other properties (such as permissions or
> page size) as needed in the future. Both solutions seem more or less equivalent.

Imo the most natural thing for a page walking function would be to return the
leaf PTE (or the leaf-most not-present [or otherwise "no-access"] one). That
would provide (almost) all possible information to the caller. "Almost"
because depending on how page walk works, permissions may combine across page
table levels. Yet then (see also the "no-access" above) this would also
require further input, to specify the context for which the translation is
being sought. For example, the intention to write may want to yield no valid
PTE when there are present ones down to the leaf, but effective permissions
say "read-only".

Jan
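What a caller gains from receiving the PTE itself can be illustrated with a small, self-contained C sketch. It is not Xen's implementation: the bit positions follow the RISC-V privileged specification (V = bit 0, R = bit 1, W = bit 2, X = bit 3, PPN starting at bit 10, with a 44-bit PPN under Sv39), while every `SK_`/`sk_` name is hypothetical. The access-intent check is a simplified stand-in for the "context" mentioned above and looks at the leaf entry only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PTE_V (UINT64_C(1) << 0) /* valid */
#define SK_PTE_R (UINT64_C(1) << 1) /* readable */
#define SK_PTE_W (UINT64_C(1) << 2) /* writable */
#define SK_PTE_X (UINT64_C(1) << 3) /* executable */

#define SK_PTE_PPN_SHIFT 10u
#define SK_PTE_PPN_MASK  ((UINT64_C(1) << 44) - 1) /* Sv39: PPN is bits 53:10 */
#define SK_PAGE_SHIFT    12u

/* Hypothetical description of what the caller intends to do. */
enum sk_access { SK_ACCESS_READ, SK_ACCESS_WRITE, SK_ACCESS_EXEC };

/* A valid PTE with any of R/W/X set is a leaf; otherwise it points to a table. */
static bool sk_pte_is_leaf(uint64_t pte)
{
    return (pte & SK_PTE_V) && (pte & (SK_PTE_R | SK_PTE_W | SK_PTE_X));
}

/* Base physical address encoded in the PTE's PPN field. */
static uint64_t sk_pte_to_paddr(uint64_t pte)
{
    return ((pte >> SK_PTE_PPN_SHIFT) & SK_PTE_PPN_MASK) << SK_PAGE_SHIFT;
}

/* Does this PTE, taken on its own, allow the intended access? */
static bool sk_pte_permits(uint64_t pte, enum sk_access access)
{
    if ( !sk_pte_is_leaf(pte) )
        return false;

    switch ( access )
    {
    case SK_ACCESS_READ:  return (pte & SK_PTE_R) != 0;
    case SK_ACCESS_WRITE: return (pte & SK_PTE_W) != 0;
    case SK_ACCESS_EXEC:  return (pte & SK_PTE_X) != 0;
    }

    return false;
}
```

With such helpers a caller that only needs the address calls `sk_pte_to_paddr()`, while one that intends to write can reject a read-only leaf without a second walk.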
On 1/28/25 9:14 AM, Jan Beulich wrote:
> On 27.01.2025 18:22, Oleksii Kurochko wrote:
>> On 1/27/25 1:57 PM, Jan Beulich wrote:
>>> On 27.01.2025 13:29, Oleksii Kurochko wrote:
>>>> On 1/27/25 11:06 AM, Jan Beulich wrote:
>>>>> On 20.01.2025 17:54, Oleksii Kurochko wrote:
>>>>>> RISC-V doesn't have hardware feature to ask MMU to translate
>>>>>> virtual address to physical address ( like Arm has, for example ),
>>>>>> so software page table walking in implemented.
>>>>>>
>>>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>>>> ---
>>>>>>  xen/arch/riscv/include/asm/mm.h |  2 ++
>>>>>>  xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
>>>>>>  2 files changed, 58 insertions(+)
>>>>>>
>>>>>> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
>>>>>> index 292aa48fc1..d46018c132 100644
>>>>>> --- a/xen/arch/riscv/include/asm/mm.h
>>>>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>>>>> @@ -15,6 +15,8 @@
>>>>>>
>>>>>>  extern vaddr_t directmap_virt_start;
>>>>>>
>>>>>> +paddr_t pt_walk(vaddr_t va);
>>>>> In the longer run, is returning just the PA really going to be sufficient?
>>>>> If not, perhaps say a word on the limitation in the description.
>>>> In the long run, this function's prototype looks like `paddr_t pt_walk(vaddr_t root, vaddr_t va, bool is_xen)` [1]. However, I'm not sure if it will stay that way,
>>>> as I think `is_xen` could be skipped, since using `map_table()` should be sufficient (as it now considers `system_state`), and I'm not really sure if I need the root argument,
>>>> as the initial goal was to use this function for debug-only purposes and I've never used it for guest page table (stage-1) walking.
>>>> Anyway, yes, it is still returning a physical address, and that seems enough to me.
>>>>
>>>> Could you share your thoughts on what I should take into account for the return value? I am probably missing something really useful.
>>> Often you care about the permissions as well. Sometimes it may even be relevant
>>> to know the (super-)page size of the mapping.
>> Perhaps it would be better to change the prototype to:
>>   bool pt_walk(vaddr_t va, mfn_t *ret_pa);
>> or even
>>   void pt_walk(vaddr_t va, mfn_t *ret_pa);
>> In this case, `*ret_pa = INVALID_MFN` could serve as a signal that `pt_walk()` failed.
>> If there's a need to return permissions or (super-)page size in the future, another argument could be added.
>>
>> What do you think? Would this approach be better?
>>
>> I am also considering returning a structure containing the `mfn` (or `paddr_t`) and adding other properties (such as permissions or
>> page size) as needed in the future. Both solutions seem more or less equivalent.
> Imo the most natural thing for a page walking function would be to return the
> leaf PTE (or the leaf-most not-present [or otherwise "no-access"] one). That
> would provide (almost) all possible information to the caller. "Almost"
> because depending on how page walk works, permissions may combine across page
> table levels. Yet then (see also the "no-access" above) this would also
> require further input, to specify the context for which the translation is
> being sought. For example, the intention to write may want to yield no valid
> PTE when there are present ones down to the leaf, but effective permissions
> say "read-only".

Perhaps returning the leaf PTE could be a really good option.

I'm not entirely sure I understand what you mean by "leaf-most not-present". Could you please try to explain this point one more time?
My expectation was that the function should return an existing leaf PTE (from which "access" rights could be determined)
or `NULL` to indicate that no leaf PTE was found.

Another thing I'm curious about is whether this would be sufficient for determining the level.
It seems clear that, given a PTE and a virtual address, we could compute:
  `mask = VA | paddr_from_pte(pte)`
Then, iterating through each level, we could apply the following to work out at which level it was mapped:
  `mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)`.

If I haven't overlooked any other way to calculate the page table level, would it be better to simply add another argument
to `pt_walk()` to return the level?

Thanks.

~ Oleksii

>
> Jan
On 29.01.2025 14:12, Oleksii Kurochko wrote:
>
> On 1/28/25 9:14 AM, Jan Beulich wrote:
>> On 27.01.2025 18:22, Oleksii Kurochko wrote:
>>> On 1/27/25 1:57 PM, Jan Beulich wrote:
>>>> On 27.01.2025 13:29, Oleksii Kurochko wrote:
>>>>> On 1/27/25 11:06 AM, Jan Beulich wrote:
>>>>>> On 20.01.2025 17:54, Oleksii Kurochko wrote:
>>>>>>> RISC-V doesn't have hardware feature to ask MMU to translate
>>>>>>> virtual address to physical address ( like Arm has, for example ),
>>>>>>> so software page table walking in implemented.
>>>>>>>
>>>>>>> Signed-off-by: Oleksii Kurochko<oleksii.kurochko@gmail.com>
>>>>>>> ---
>>>>>>>   xen/arch/riscv/include/asm/mm.h |  2 ++
>>>>>>>   xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
>>>>>>>   2 files changed, 58 insertions(+)
>>>>>>>
>>>>>>> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
>>>>>>> index 292aa48fc1..d46018c132 100644
>>>>>>> --- a/xen/arch/riscv/include/asm/mm.h
>>>>>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>>>>>> @@ -15,6 +15,8 @@
>>>>>>>
>>>>>>>   extern vaddr_t directmap_virt_start;
>>>>>>>
>>>>>>> +paddr_t pt_walk(vaddr_t va);
>>>>>> In the longer run, is returning just the PA really going to be sufficient?
>>>>>> If not, perhaps say a word on the limitation in the description.
>>>>> In the long run, this function's prototype looks like|paddr_t pt_walk(vaddr_t root, vaddr_t va, bool is_xen)| [1]. However, I'm not sure if it will stay that way,
>>>>> as I think|is_xen| could be skipped, since using|map_table()| should be sufficient (as it now considers|system_state|) and I'm not really sure if I need root argument
>>>>> as initial goal was to use this function for debug only purposes and I've never used it for guest page table (stage-1) walking.
>>>>> Anyway, yes, it is still returning a physical address, and that seems enough to me.
>>>>>
>>>>> Could you share your thoughts on what I should take into account for returning value, probably, I am missing something really useful?
>>>> Often you care about the permissions as well. Sometimes it may even be relevant
>>>> to know the (super-)page size of the mapping.
>>> Perhaps it would be better to change the prototype to:
>>>    bool pt_walk(vaddr_t va, mfn_t *ret_pa);
>>> or even
>>>    void pt_walk(vaddr_t va, mfn_t *ret_pa);
>>> In this case,|ret_pa = INVALID_MFN| could serve as a signal that|pt_walk()| failed.
>>> If there's a need to return permissions or (super-)page size in the future, another argument could be added.
>>>
>>> What do you think? Would this approach be better?
>>>
>>> I am also considering returning a structure containing the|mfn| (or|paddr_t|) and adding other properties (such as permissions or
>>> page size) as needed in the future. Both solutions seem more or less equivalent.
>> Imo the most natural thing for a page walking function would be to return the
>> leaf PTE (or the leaf-most not-present [or otherwise "no-access"] one). That
>> would provide (almost) all possible information to the caller. "Almost"
>> because depending on how page walk works, permissions may combine across page
>> table levels. Yet then (see also the "no-access" above) this would also
>> require further input, to specify the context for which the translation is
>> being seeked. For example, the intention to write may want to yield no valid
>> PTE when there are present ones down to the leaf, but effective permissions
>> say "read-only".
>
> Perhaps returning the leaf PTE could be a really good option.
>
> I'm not entirely sure I understand what you mean by "leaf-most not-present". Could you please try to explain this moment one more time?
> My expectation was that the function should return an existing leaf PTE (from which "access" rights could be determined)
> or|NULL| to indicate that no leaf PTE was found.

"no leaf PTE" may be for a variety of reasons. Hence why I think returning
the PTE at which the walk stopped (leaf or leaf-most not-present) is likely
best. Such a not-present PTE may, after all, still contain valuable
information; it's not like it has to be all zero.

> Another thing I'm curious about is whether this would be sufficient for determining the level.
> It seems clear that, given a PTE and a virtual address, we could compute:
> |mask = VA | paddr_from_pte(pte)|

What would this value represent? No, from holding a PTE in your hands you
can't determine the level it came from. So yes, ...

> Then, iterating through each level, we could apply and understand on which one level it was mapped:
> |mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)|.
>
> If I haven't overlooked any other way to calculate the page table level, would it be better to simply add another argument
> to|pt_walk()| to return the level.

... for callers who care doing this might then be necessary (this would be
a pointer parameter, and since I expect many callers wouldn't care about
the level, it likely wants to be permissible to pass in NULL).

Question then is whether it's better to hand back the level or the page
order of the mapping. On x86 we return the latter from P2M lookups, for
example.

Jan
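To make the suggestion above concrete, here is a minimal, self-contained sketch of a walk that hands back the PTE at which it stopped (a leaf, or the leaf-most not-present entry) and optionally reports the level through a pointer that may be NULL. This is a toy model, not Xen code: every `toy_*` name, the in-memory `toy_tables` array, and the use of a table index in the PPN field are assumptions made purely for illustration. Only the Sv39-style layout (three levels, 9 index bits per level, a valid PTE with any of R/W/X set being a leaf) is taken from the RISC-V privileged specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy Sv39-like layout: 3 levels, 9 index bits per level, 4 KiB pages. */
#define TOY_PAGE_SHIFT    12
#define TOY_LEVEL_BITS    9
#define TOY_ENTRIES       (1u << TOY_LEVEL_BITS)
#define TOY_ROOT_LEVEL    2u
#define TOY_PTE_PPN_SHIFT 10

#define TOY_PTE_VALID     (1u << 0)
#define TOY_PTE_R         (1u << 1)
#define TOY_PTE_W         (1u << 2)
#define TOY_PTE_X         (1u << 3)

typedef struct { uint64_t bits; } toy_pte_t;

/*
 * Hypothetical backing store: table 0 is the root. A non-leaf PTE stores
 * the index of the next table in its PPN field instead of a real MFN.
 */
static toy_pte_t toy_tables[3][TOY_ENTRIES];

static int toy_pte_is_valid(toy_pte_t pte)
{
    return (pte.bits & TOY_PTE_VALID) != 0;
}

/* A valid PTE with any of R/W/X set is a leaf; otherwise it is a pointer. */
static int toy_pte_is_leaf(toy_pte_t pte)
{
    return toy_pte_is_valid(pte) &&
           (pte.bits & (TOY_PTE_R | TOY_PTE_W | TOY_PTE_X)) != 0;
}

/* Index into the table at `level` for virtual address `va`. */
static unsigned int toy_index(uint64_t va, unsigned int level)
{
    return (va >> (TOY_PAGE_SHIFT + TOY_LEVEL_BITS * level)) &
           (TOY_ENTRIES - 1);
}

/*
 * Return the PTE at which the walk stopped. If `pte_level` is not NULL,
 * also report the level that PTE was read from.
 */
toy_pte_t toy_pt_walk(uint64_t va, unsigned int *pte_level)
{
    const toy_pte_t *table = toy_tables[0];
    unsigned int level = TOY_ROOT_LEVEL;
    toy_pte_t pte;

    for ( ; ; )
    {
        pte = table[toy_index(va, level)];

        /* Stop on not-present, on a leaf, or when no lower level exists. */
        if ( !toy_pte_is_valid(pte) || toy_pte_is_leaf(pte) || level == 0 )
            break;

        table = toy_tables[pte.bits >> TOY_PTE_PPN_SHIFT];
        level--;
    }

    if ( pte_level != NULL )
        *pte_level = level;

    return pte;
}

/* Build a tiny demo layout: one 2 MiB superpage and one 4 KiB page. */
void toy_setup_demo(void)
{
    toy_tables[0][1].bits = (1ULL << TOY_PTE_PPN_SHIFT) | TOY_PTE_VALID;
    toy_tables[1][0].bits = (0x80000ULL << TOY_PTE_PPN_SHIFT) |
                            TOY_PTE_VALID | TOY_PTE_R | TOY_PTE_W;
    toy_tables[1][1].bits = (2ULL << TOY_PTE_PPN_SHIFT) | TOY_PTE_VALID;
    toy_tables[2][3].bits = (0x12345ULL << TOY_PTE_PPN_SHIFT) |
                            TOY_PTE_VALID | TOY_PTE_R;
}
```

With this shape, a caller that only wants the translation checks `toy_pte_is_leaf()` on the result and passes NULL for the level, while a caller that needs the mapping size passes a pointer. Returning the PTE by value (rather than a pointer into a table) is a choice of this sketch, made so that nothing refers to a table after the walk is over.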
On 1/29/25 3:01 PM, Jan Beulich wrote:
> On 29.01.2025 14:12, Oleksii Kurochko wrote:
>> On 1/28/25 9:14 AM, Jan Beulich wrote:
>>> On 27.01.2025 18:22, Oleksii Kurochko wrote:
>>>> On 1/27/25 1:57 PM, Jan Beulich wrote:
>>>>> On 27.01.2025 13:29, Oleksii Kurochko wrote:
>>>>>> On 1/27/25 11:06 AM, Jan Beulich wrote:
>>>>>>> On 20.01.2025 17:54, Oleksii Kurochko wrote:
>>>>>>>> RISC-V doesn't have hardware feature to ask MMU to translate
>>>>>>>> virtual address to physical address ( like Arm has, for example ),
>>>>>>>> so software page table walking in implemented.
>>>>>>>>
>>>>>>>> Signed-off-by: Oleksii Kurochko<oleksii.kurochko@gmail.com>
>>>>>>>> ---
>>>>>>>>   xen/arch/riscv/include/asm/mm.h |  2 ++
>>>>>>>>   xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
>>>>>>>>   2 files changed, 58 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
>>>>>>>> index 292aa48fc1..d46018c132 100644
>>>>>>>> --- a/xen/arch/riscv/include/asm/mm.h
>>>>>>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>>>>>>> @@ -15,6 +15,8 @@
>>>>>>>>
>>>>>>>>   extern vaddr_t directmap_virt_start;
>>>>>>>>
>>>>>>>> +paddr_t pt_walk(vaddr_t va);
>>>>>>> In the longer run, is returning just the PA really going to be sufficient?
>>>>>>> If not, perhaps say a word on the limitation in the description.
>>>>>> In the long run, this function's prototype looks like|paddr_t pt_walk(vaddr_t root, vaddr_t va, bool is_xen)| [1]. However, I'm not sure if it will stay that way,
>>>>>> as I think|is_xen| could be skipped, since using|map_table()| should be sufficient (as it now considers|system_state|) and I'm not really sure if I need root argument
>>>>>> as initial goal was to use this function for debug only purposes and I've never used it for guest page table (stage-1) walking.
>>>>>> Anyway, yes, it is still returning a physical address, and that seems enough to me.
>>>>>>
>>>>>> Could you share your thoughts on what I should take into account for returning value, probably, I am missing something really useful?
>>>>> Often you care about the permissions as well. Sometimes it may even be relevant
>>>>> to know the (super-)page size of the mapping.
>>>> Perhaps it would be better to change the prototype to:
>>>>    bool pt_walk(vaddr_t va, mfn_t *ret_pa);
>>>> or even
>>>>    void pt_walk(vaddr_t va, mfn_t *ret_pa);
>>>> In this case,|ret_pa = INVALID_MFN| could serve as a signal that|pt_walk()| failed.
>>>> If there's a need to return permissions or (super-)page size in the future, another argument could be added.
>>>>
>>>> What do you think? Would this approach be better?
>>>>
>>>> I am also considering returning a structure containing the|mfn| (or|paddr_t|) and adding other properties (such as permissions or
>>>> page size) as needed in the future. Both solutions seem more or less equivalent.
>>> Imo the most natural thing for a page walking function would be to return the
>>> leaf PTE (or the leaf-most not-present [or otherwise "no-access"] one). That
>>> would provide (almost) all possible information to the caller. "Almost"
>>> because depending on how page walk works, permissions may combine across page
>>> table levels. Yet then (see also the "no-access" above) this would also
>>> require further input, to specify the context for which the translation is
>>> being seeked. For example, the intention to write may want to yield no valid
>>> PTE when there are present ones down to the leaf, but effective permissions
>>> say "read-only".
>> Perhaps returning the leaf PTE could be a really good option.
>>
>> I'm not entirely sure I understand what you mean by "leaf-most not-present". Could you please try to explain this moment one more time?
>> My expectation was that the function should return an existing leaf PTE (from which "access" rights could be determined)
>> or|NULL| to indicate that no leaf PTE was found.
> "no leaf PTE" may be for a variety of reasons. Hence why I think returning
> the PTE at which the walk stopped (leaf or leaf-most not-present) is likely
> best. Such a not-present PTE may, after all, still contain valuable
> information; it's not like it has to be all zero.

Thanks, it is clearer now.

It will complicate a little bit vmap_to_mfn() (as we should to check that
pt_walk() returns a leaf; otherwise something wrong happens), but I think
it is not really critical as you mentioned before, and for convenience it
would be better to implement it as a static inline function:

    static inline mfn_t vmap_to_mfn(vaddr_t va)
    {
        pte_t *entry = pt_walk(va, NULL);

        BUG_ON(!pte_is_mapping(*entry));

        return mfn_from_pte(*entry);
    }

>> Another thing I'm curious about is whether this would be sufficient for determining the level.
>> It seems clear that, given a PTE and a virtual address, we could compute:
>> |mask = VA | paddr_from_pte(pte)|
> What would this value represent? No, from holding a PTE in your hands you
> can't determine the level it came from. So yes, ...
>
>> Then, iterating through each level, we could apply and understand on which one level it was mapped:
>> |mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)|.
>>
>> If I haven't overlooked any other way to calculate the page table level, would it be better to simply add another argument
>> to|pt_walk()| to return the level.
> ... for callers who care doing this might then be necessary (this would be
> a pointer parameter, and since I expect many callers wouldn't care about
> the level, it likely wants to be permissible to pass in NULL).
>
> Question then is whether it's better to hand back the level or the page
> order of the mapping. On x86 we return the latter from P2M lookups, for
> example.

Actually, I think for proper calculation of order in pt_update().

Thanks.

~ Oleksii
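On the level-versus-order question, the two carry the same information once the paging geometry is fixed, so either can be derived from the other. The sketch below shows the conversion for an Sv39-style layout (9 index bits per level, 4 KiB base pages), and how the mapping size lets a caller combine a leaf's base address with the in-page bits of the VA. The `sv39_*` names are hypothetical helpers introduced here; the assumption that the order is `level * 9` mirrors what a macro like XEN_PT_LEVEL_ORDER() is expected to compute, but that is not verified against the tree.

```c
#include <stdint.h>

#define SV39_PAGE_SHIFT      12  /* 4 KiB base pages */
#define SV39_PAGETABLE_ORDER 9   /* 512 entries per table */

/* Page order (log2 of the number of 4 KiB pages) mapped by one PTE. */
static inline unsigned int sv39_level_to_order(unsigned int level)
{
    return level * SV39_PAGETABLE_ORDER;
}

/* Size in bytes of the region mapped by a leaf PTE at `level`. */
static inline uint64_t sv39_level_to_size(unsigned int level)
{
    return (uint64_t)1 << (SV39_PAGE_SHIFT + sv39_level_to_order(level));
}

/*
 * Combine the base address taken from a leaf PTE with the offset bits of
 * `va` that fall inside the (super-)page mapped at `level`.
 */
static inline uint64_t sv39_leaf_to_pa(uint64_t leaf_base_pa, uint64_t va,
                                       unsigned int level)
{
    return leaf_base_pa | (va & (sv39_level_to_size(level) - 1));
}
```

Level 0 thus corresponds to order 0 (4 KiB), level 1 to order 9 (2 MiB), and level 2 to order 18 (1 GiB). Handing back the order keeps callers independent of the number of levels, while handing back the level is what the walk loop has at hand; which is preferable is exactly the open question in the exchange above.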
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 292aa48fc1..d46018c132 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -15,6 +15,8 @@

 extern vaddr_t directmap_virt_start;

+paddr_t pt_walk(vaddr_t va);
+
 #define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT)
 #define paddr_to_pfn(pa)  ((unsigned long)((pa) >> PAGE_SHIFT))

diff --git a/xen/arch/riscv/pt.c b/xen/arch/riscv/pt.c
index a703e0f1bd..865d60d1af 100644
--- a/xen/arch/riscv/pt.c
+++ b/xen/arch/riscv/pt.c
@@ -274,6 +274,62 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
     return rc;
 }

+paddr_t pt_walk(vaddr_t va)
+{
+    const mfn_t root = get_root_page();
+    /*
+     * In pt_walk() only XEN_TALE_MAP_NONE and XEN_TABLE_SUPER_PAGE are
+     * handled ( as they are only possible for page table walking ), so
+     * initialize `ret` with "impossible" XEN_TABLE_MAP_NOMEM.
+     */
+    int ret = XEN_TABLE_MAP_NOMEM;
+    unsigned int level = HYP_PT_ROOT_LEVEL;
+    paddr_t pa = 0;
+    pte_t *table;
+
+    DECLARE_OFFSETS(offsets, va);
+
+    table = map_table(root);
+
+    /*
+     * Find `pa` of an entry which corresponds to `va` by iterating for each
+     * page level and checking if the entry points to a next page table or
+     * to a page.
+     *
+     * Two cases are possible:
+     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was find;
+     *   (Despite of the name) XEN_TABLE_SUPER_PAGE covers 4k mapping too.
+     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
+     *   mapped.
+     */
+    while ( (ret != XEN_TABLE_MAP_NONE) && (ret != XEN_TABLE_SUPER_PAGE) )
+    {
+        /*
+         * This case shouldn't really occur as it will mean that for table
+         * level 0 a pointer to next page table has been written, but at
+         * level 0 it could be only a pointer to 4k page.
+         */
+        ASSERT(level <= HYP_PT_ROOT_LEVEL);
+
+        ret = pt_next_level(false, &table, offsets[level]);
+        level--;
+    }
+
+    if ( ret == XEN_TABLE_MAP_NONE )
+        dprintk(XENLOG_WARNING, "Is va(%#lx) really mapped?\n", va);
+    else if ( ret == XEN_TABLE_SUPER_PAGE )
+        pa = pte_to_paddr(*(table + offsets[level + 1]));
+
+    /*
+     * There is no need for unmap_table() after each pt_next_level() call as
+     * pt_next_level() will do unmap_table() for the previous table before
+     * returning next level table.
+     */
+    unmap_table(table);
+
+    return pa;
+}
+
 /* Return the level where mapping should be done */
 static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
                             unsigned int flags)
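A note on the ASSERT(level <= HYP_PT_ROOT_LEVEL) in the loop above: because `level` is an unsigned int, decrementing it past 0 wraps to UINT_MAX rather than going negative, so the upper-bound check is what catches "walked below level 0". The sketch below isolates just that arithmetic. `DEMO_ROOT_LEVEL` and the `demo_*` helpers are hypothetical stand-ins, with a root level of 2 assumed for illustration.

```c
#include <limits.h>

#define DEMO_ROOT_LEVEL 2u  /* assumed stand-in for HYP_PT_ROOT_LEVEL */

/* One step down the hierarchy, as `level--` does in the loop. */
static unsigned int demo_step_down(unsigned int level)
{
    /* Unsigned arithmetic is modular, so 0 - 1 yields UINT_MAX. */
    return level - 1;
}

/* The condition the ASSERT() checks at the top of each iteration. */
static int demo_level_in_range(unsigned int level)
{
    return level <= DEMO_ROOT_LEVEL;
}
```

A check written as `level >= 0` would be always true for an unsigned type, which is presumably why the bound is expressed against the root level instead. The same post-decrement is also the reason the result is read from `offsets[level + 1]` once the loop has ended.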
RISC-V doesn't have hardware feature to ask MMU to translate
virtual address to physical address ( like Arm has, for example ),
so software page table walking in implemented.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
 xen/arch/riscv/include/asm/mm.h |  2 ++
 xen/arch/riscv/pt.c             | 56 +++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)