Message ID: 20210225080453.1314-3-alex@ghiti.fr (mailing list archive)
State: New, archived
Series: Move kernel mapping outside the linear mapping
> +   ffffffc000000000 | -256 GB | ffffffc7ffffffff |  32 GB | kasan
> +   ffffffcefee00000 | -196 GB | ffffffcefeffffff |   2 MB | fixmap
> +   ffffffceff000000 | -196 GB | ffffffceffffffff |  16 MB | PCI io
> +   ffffffcf00000000 | -196 GB | ffffffcfffffffff |   4 GB | vmemmap
> +   ffffffd000000000 | -192 GB | ffffffdfffffffff |  64 GB | vmalloc/ioremap space
> +   ffffffe000000000 | -128 GB | ffffffff7fffffff | 126 GB | direct mapping of all physical memory

^ So you could never ever have more than 126 GB, correct?

I assume that's nothing new.
On 2/25/21 5:34 AM, David Hildenbrand wrote:
>> +   ffffffc000000000 | -256 GB | ffffffc7ffffffff |  32 GB | kasan
>> +   ffffffcefee00000 | -196 GB | ffffffcefeffffff |   2 MB | fixmap
>> +   ffffffceff000000 | -196 GB | ffffffceffffffff |  16 MB | PCI io
>> +   ffffffcf00000000 | -196 GB | ffffffcfffffffff |   4 GB | vmemmap
>> +   ffffffd000000000 | -192 GB | ffffffdfffffffff |  64 GB | vmalloc/ioremap space
>> +   ffffffe000000000 | -128 GB | ffffffff7fffffff | 126 GB | direct mapping of all physical memory
>
> ^ So you could never ever have more than 126 GB, correct?
>
> I assume that's nothing new.

Before this patch, the limit was 128GB, so in my sense, there is nothing
new. If ever we want to increase that limit, we'll just have to lower
PAGE_OFFSET; there are still some unused virtual addresses after kasan,
for example.

Thanks,

Alex
On Thu, Feb 25, 2021 at 12:56 PM Alex Ghiti <alex@ghiti.fr> wrote:
>
> On 2/25/21 5:34 AM, David Hildenbrand wrote:
> >> +   ffffffc000000000 | -256 GB | ffffffc7ffffffff |  32 GB | kasan
> >> +   ffffffcefee00000 | -196 GB | ffffffcefeffffff |   2 MB | fixmap
> >> +   ffffffceff000000 | -196 GB | ffffffceffffffff |  16 MB | PCI io
> >> +   ffffffcf00000000 | -196 GB | ffffffcfffffffff |   4 GB | vmemmap
> >> +   ffffffd000000000 | -192 GB | ffffffdfffffffff |  64 GB | vmalloc/ioremap space
> >> +   ffffffe000000000 | -128 GB | ffffffff7fffffff | 126 GB | direct mapping of all physical memory
> >
> > ^ So you could never ever have more than 126 GB, correct?
> >
> > I assume that's nothing new.
>
> Before this patch, the limit was 128GB, so in my sense, there is nothing
> new. If ever we want to increase that limit, we'll just have to lower
> PAGE_OFFSET; there are still some unused virtual addresses after kasan,
> for example.

Linus Walleij is looking into changing the arm32 code to have the kernel
direct map inside of the vmalloc area, which would be another place
that you could use here. It would be nice not to have too many different
ways of doing this, but I'm not sure how hard it would be to rework your
code, or if there are any downsides to doing this.

        Arnd
Hi Arnd,

On 3/10/21 6:42 AM, Arnd Bergmann wrote:
> On Thu, Feb 25, 2021 at 12:56 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>
>> On 2/25/21 5:34 AM, David Hildenbrand wrote:
>>>> +   ffffffc000000000 | -256 GB | ffffffc7ffffffff |  32 GB | kasan
>>>> +   ffffffcefee00000 | -196 GB | ffffffcefeffffff |   2 MB | fixmap
>>>> +   ffffffceff000000 | -196 GB | ffffffceffffffff |  16 MB | PCI io
>>>> +   ffffffcf00000000 | -196 GB | ffffffcfffffffff |   4 GB | vmemmap
>>>> +   ffffffd000000000 | -192 GB | ffffffdfffffffff |  64 GB | vmalloc/ioremap space
>>>> +   ffffffe000000000 | -128 GB | ffffffff7fffffff | 126 GB | direct mapping of all physical memory
>>>
>>> ^ So you could never ever have more than 126 GB, correct?
>>>
>>> I assume that's nothing new.
>>>
>>
>> Before this patch, the limit was 128GB, so in my sense, there is nothing
>> new. If ever we want to increase that limit, we'll just have to lower
>> PAGE_OFFSET; there are still some unused virtual addresses after kasan,
>> for example.
>
> Linus Walleij is looking into changing the arm32 code to have the kernel
> direct map inside of the vmalloc area, which would be another place
> that you could use here. It would be nice not to have too many different
> ways of doing this, but I'm not sure how hard it would be to rework your
> code, or if there are any downsides to doing this.

This was what my previous version did: https://lkml.org/lkml/2020/6/7/28.

This approach was not welcomed very well and it fixed only the problem of
the relocatable kernel implementation. The second issue I'm trying to
resolve here is to support both 3 and 4 level page tables using the same
kernel without it being relocatable (which would introduce a performance
penalty). I can't do that when the kernel mapping is in the vmalloc
region, since the vmalloc region relies on PAGE_OFFSET, which is
different for 3 and 4 level page tables, and that would then require the
kernel to be relocatable.

Alex

> Arnd
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
On Wed, Mar 10, 2021 at 8:12 PM Alex Ghiti <alex@ghiti.fr> wrote:
> On 3/10/21 6:42 AM, Arnd Bergmann wrote:
> > On Thu, Feb 25, 2021 at 12:56 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>
> >> On 2/25/21 5:34 AM, David Hildenbrand wrote:
> >>>> +   ffffffc000000000 | -256 GB | ffffffc7ffffffff |  32 GB | kasan
> >>>> +   ffffffcefee00000 | -196 GB | ffffffcefeffffff |   2 MB | fixmap
> >>>> +   ffffffceff000000 | -196 GB | ffffffceffffffff |  16 MB | PCI io
> >>>> +   ffffffcf00000000 | -196 GB | ffffffcfffffffff |   4 GB | vmemmap
> >>>> +   ffffffd000000000 | -192 GB | ffffffdfffffffff |  64 GB | vmalloc/ioremap space
> >>>> +   ffffffe000000000 | -128 GB | ffffffff7fffffff | 126 GB | direct mapping of all physical memory
> >>>
> >>> ^ So you could never ever have more than 126 GB, correct?
> >>>
> >>> I assume that's nothing new.
> >>>
> >>
> >> Before this patch, the limit was 128GB, so in my sense, there is nothing
> >> new. If ever we want to increase that limit, we'll just have to lower
> >> PAGE_OFFSET; there are still some unused virtual addresses after kasan,
> >> for example.
> >
> > Linus Walleij is looking into changing the arm32 code to have the kernel
> > direct map inside of the vmalloc area, which would be another place
> > that you could use here. It would be nice not to have too many different
> > ways of doing this, but I'm not sure how hard it would be to rework your
> > code, or if there are any downsides to doing this.
>
> This was what my previous version did: https://lkml.org/lkml/2020/6/7/28.
>
> This approach was not welcomed very well and it fixed only the problem of
> the relocatable kernel implementation. The second issue I'm trying to
> resolve here is to support both 3 and 4 level page tables using the same
> kernel without it being relocatable (which would introduce a performance
> penalty). I can't do that when the kernel mapping is in the vmalloc
> region, since the vmalloc region relies on PAGE_OFFSET, which is
> different for 3 and 4 level page tables, and that would then require the
> kernel to be relocatable.

Ok, I see.

I suppose it might work if you moved the direct-map to the lowest
address and the vmalloc area (incorporating the kernel mapping,
modules, pio, and fixmap at fixed addresses) to the very top of the
address space, but you probably already considered and rejected
that for other reasons.

        Arnd
Hi Arnd,

On 3/11/21 3:42 AM, Arnd Bergmann wrote:
> On Wed, Mar 10, 2021 at 8:12 PM Alex Ghiti <alex@ghiti.fr> wrote:
>> On 3/10/21 6:42 AM, Arnd Bergmann wrote:
>>> On Thu, Feb 25, 2021 at 12:56 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>>>
>>>> On 2/25/21 5:34 AM, David Hildenbrand wrote:
>>>>>> +   ffffffc000000000 | -256 GB | ffffffc7ffffffff |  32 GB | kasan
>>>>>> +   ffffffcefee00000 | -196 GB | ffffffcefeffffff |   2 MB | fixmap
>>>>>> +   ffffffceff000000 | -196 GB | ffffffceffffffff |  16 MB | PCI io
>>>>>> +   ffffffcf00000000 | -196 GB | ffffffcfffffffff |   4 GB | vmemmap
>>>>>> +   ffffffd000000000 | -192 GB | ffffffdfffffffff |  64 GB | vmalloc/ioremap space
>>>>>> +   ffffffe000000000 | -128 GB | ffffffff7fffffff | 126 GB | direct mapping of all physical memory
>>>>>
>>>>> ^ So you could never ever have more than 126 GB, correct?
>>>>>
>>>>> I assume that's nothing new.
>>>>>
>>>>
>>>> Before this patch, the limit was 128GB, so in my sense, there is nothing
>>>> new. If ever we want to increase that limit, we'll just have to lower
>>>> PAGE_OFFSET; there are still some unused virtual addresses after kasan,
>>>> for example.
>>>
>>> Linus Walleij is looking into changing the arm32 code to have the kernel
>>> direct map inside of the vmalloc area, which would be another place
>>> that you could use here. It would be nice not to have too many different
>>> ways of doing this, but I'm not sure how hard it would be to rework your
>>> code, or if there are any downsides to doing this.
>>
>> This was what my previous version did: https://lkml.org/lkml/2020/6/7/28.
>>
>> This approach was not welcomed very well and it fixed only the problem of
>> the relocatable kernel implementation. The second issue I'm trying to
>> resolve here is to support both 3 and 4 level page tables using the same
>> kernel without it being relocatable (which would introduce a performance
>> penalty). I can't do that when the kernel mapping is in the vmalloc
>> region, since the vmalloc region relies on PAGE_OFFSET, which is
>> different for 3 and 4 level page tables, and that would then require the
>> kernel to be relocatable.
>
> Ok, I see.
>
> I suppose it might work if you moved the direct-map to the lowest
> address and the vmalloc area (incorporating the kernel mapping,
> modules, pio, and fixmap at fixed addresses) to the very top of the
> address space, but you probably already considered and rejected
> that for other reasons.
>

Yes, I considered it... when you re-proposed it :) I'm not opposed to your
solution in the vmalloc region, but I can't find any advantage over the
current solution; are there any? That would harmonize with Linus's work,
but then we'd be quite different from the x86 address space.

And by the way, thanks for having suggested the current solution in a
previous conversation :)

Thanks again,

Alex

> Arnd
>
On Sat, Mar 13, 2021 at 9:23 AM Alex Ghiti <alex@ghiti.fr> wrote:
>
> Yes, I considered it... when you re-proposed it :) I'm not opposed to your
> solution in the vmalloc region, but I can't find any advantage over the
> current solution; are there any? That would harmonize with Linus's work,
> but then we'd be quite different from the x86 address space.
>
> And by the way, thanks for having suggested the current solution in a
> previous conversation :)

Ah, I really need to keep better track of what I already commented on...

        Arnd
diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst
index 6e6e39482502..ea915c196048 100644
--- a/Documentation/riscv/index.rst
+++ b/Documentation/riscv/index.rst
@@ -6,6 +6,7 @@ RISC-V architecture
     :maxdepth: 1
 
     boot-image-header
+    vm-layout
     pmu
     patch-acceptance
 
diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
new file mode 100644
index 000000000000..e8e569e2686a
--- /dev/null
+++ b/Documentation/riscv/vm-layout.rst
@@ -0,0 +1,61 @@
+=====================================
+Virtual Memory Layout on RISC-V Linux
+=====================================
+
+:Author: Alexandre Ghiti <alex@ghiti.fr>
+:Date: 12 February 2021
+
+This document describes the virtual memory layout used by the RISC-V Linux
+Kernel.
+
+RISC-V Linux Kernel 32bit
+=========================
+
+RISC-V Linux Kernel SV32
+------------------------
+
+TODO
+
+RISC-V Linux Kernel 64bit
+=========================
+
+The RISC-V privileged architecture document states that the 64bit addresses
+"must have bits 63–48 all equal to bit 47, or else a page-fault exception will
+occur.": that splits the virtual address space into 2 halves separated by a very
+big hole, the lower half is where the userspace resides, the upper half is where
+the RISC-V Linux Kernel resides.
+
+RISC-V Linux Kernel SV39
+------------------------
+
+::
+
+  ========================================================================================================================
+      Start addr    |   Offset   |     End addr     |  Size   | VM area description
+  ========================================================================================================================
+                    |            |                  |         |
+   0000000000000000 |    0       | 0000003fffffffff |  256 GB | user-space virtual memory, different per mm
+  __________________|____________|__________________|_________|___________________________________________________________
+                    |            |                  |         |
+   0000004000000000 | +256    GB | ffffffbfffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
+                    |            |                  |         |     virtual memory addresses up to the -256 GB
+                    |            |                  |         |     starting offset of kernel mappings.
+  __________________|____________|__________________|_________|___________________________________________________________
+                                                              |
+                                                              | Kernel-space virtual memory, shared between all processes:
+  ____________________________________________________________|___________________________________________________________
+                    |            |                  |         |
+   ffffffc000000000 | -256    GB | ffffffc7ffffffff |   32 GB | kasan
+   ffffffcefee00000 | -196    GB | ffffffcefeffffff |    2 MB | fixmap
+   ffffffceff000000 | -196    GB | ffffffceffffffff |   16 MB | PCI io
+   ffffffcf00000000 | -196    GB | ffffffcfffffffff |    4 GB | vmemmap
+   ffffffd000000000 | -192    GB | ffffffdfffffffff |   64 GB | vmalloc/ioremap space
+   ffffffe000000000 | -128    GB | ffffffff7fffffff |  126 GB | direct mapping of all physical memory
+  __________________|____________|__________________|_________|____________________________________________________________
+                                                              |
+                                                              |
+  ____________________________________________________________|____________________________________________________________
+                    |            |                  |         |
+   ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | modules
+   ffffffff80000000 |   -2    GB | ffffffffffffffff |    2 GB | kernel, BPF
+  __________________|____________|__________________|_________|____________________________________________________________
This new document presents the RISC-V virtual memory layout and is based
on the x86 one: it describes the different limits of the different
regions of the virtual address space.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
 Documentation/riscv/index.rst     |  1 +
 Documentation/riscv/vm-layout.rst | 61 +++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)
 create mode 100644 Documentation/riscv/vm-layout.rst