Message ID: 20190528161026.13193-1-steve.capper@arm.com (mailing list archive)
Series: 52-bit kernel + user VAs
Hello Steve,

On 05/28/2019 09:40 PM, Steve Capper wrote:
> This patch series adds support for 52-bit kernel VAs using some of the
> machinery already introduced by the 52-bit userspace VA code in 5.0.
>
> As 52-bit virtual address support is an optional hardware feature,
> software support for 52-bit kernel VAs needs to be deduced at early boot
> time. If HW support is not available, the kernel falls back to 48-bit.

Just to summarize: if the kernel is configured for 52 bits, it sets up the
infrastructure for a 52-bit kernel VA space. Then at boot it either:

a. Detects the HW feature -> uses 52-bit VAs on the 52-bit infrastructure
b. Does not detect the feature -> uses 48-bit VAs on the 52-bit
   infrastructure (adjusted)

> A significant proportion of this series focuses on "de-constifying"
> VA_BITS related constants.

I assume this is required for situation (b), because of the adjustments
needed at boot time after detecting that 52-bit is not supported by the HW.

> In order to allow for a KASAN shadow that changes size at boot time, one

Ditto as above?

> must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
> start address. Also, it is highly desirable to maintain the same

Is there any particular reason why KASAN_SHADOW_START cannot be fixed and
KASAN_SHADOW_END "grow" instead? Is it because we are trying to make the
start address (which will be closer to VA_START) variable for all required
sections?

> function addresses in the kernel .text between VA sizes. Both of these

The kernel .text range should remain the same, as the kernel is already
loaded in memory and executing at boot while the effective VA_BITS is
still being fixed up after detecting (or not) the 52-bit HW feature.

> requirements necessitate us to flip the kernel address space halves s.t.
> the direct linear map occupies the lower addresses.

Still trying to understand all the reasons for this VA space flip here.

The current kernel 48-bit VA range is split into two halves:

1. Higher half - [UL(~0) ... PAGE_OFFSET] for the linear mapping
2. Lower half  - [PAGE_OFFSET ... VA_START] for everything else

The split in the middle is based on VA_BITS. When that becomes variable,
boot-time-computed lower-half sections like the kernel text, fixed
mappings etc. become problematic, as they are already running or in use
and cannot be relocated. This is caused by the fact that the 48-bit to
52-bit adjustment can only happen at the VA_START end, as the other end,
UL(~0), is fixed. Hence move those non-relocatable/fixed sections to the
higher half so they don't get impacted by the 48/52-bit adjustment. The
linear mapping (and so PAGE_OFFSET), on the other hand, will have to
grow/shrink (or not) during the 48/52-bit adjustment, hence it can be
aligned with the VA_START end instead. Is that correct, or am I missing
something?

> In V2 of this series (apologies for the long delay from V1), the major
> change is that PAGE_OFFSET is retained as a constant. This allows for
> much faster virt_to_page computations. This is achieved by expanding the

virt_to_page(), __va() and __pa() need to be based on just linear offset
calculations, else there will be a performance impact.

> size of the VMEMMAP region to accommodate a disjoint 52-bit/48-bit
> direct linear map. This has been found to work well in my testing, but I

I assume this means that we create a linear mapping for the entire 52-bit
VA space but back it with a vmemmap struct page mapping only for the
actual bits (48 or 52) in use.
On Fri, Jun 07, 2019 at 07:23:59PM +0530, Anshuman Khandual wrote:
> Hello Steve,

Hi Anshuman,

> On 05/28/2019 09:40 PM, Steve Capper wrote:
> > This patch series adds support for 52-bit kernel VAs using some of the
> > machinery already introduced by the 52-bit userspace VA code in 5.0.
> >
> > As 52-bit virtual address support is an optional hardware feature,
> > software support for 52-bit kernel VAs needs to be deduced at early boot
> > time. If HW support is not available, the kernel falls back to 48-bit.
>
> Just to summarize: if the kernel is configured for 52 bits, it sets up
> the infrastructure for a 52-bit kernel VA space. Then at boot it either:
>
> a. Detects the HW feature -> uses 52-bit VAs on the 52-bit infrastructure
> b. Does not detect the feature -> uses 48-bit VAs on the 52-bit
>    infrastructure (adjusted)
>
> > A significant proportion of this series focuses on "de-constifying"
> > VA_BITS related constants.
>
> I assume this is required for situation (b), because of the adjustments
> needed at boot time after detecting that 52-bit is not supported by the
> HW.
>
> > In order to allow for a KASAN shadow that changes size at boot time, one
>
> Ditto as above?
>
> > must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
> > start address. Also, it is highly desirable to maintain the same
>
> Is there any particular reason why KASAN_SHADOW_START cannot be fixed
> and KASAN_SHADOW_END "grow" instead? Is it because we are trying to make
> the start address (which will be closer to VA_START) variable for all
> required sections?

KASAN has a mode of operation whereby the shadow offset computation:

    shadowPtr = (ptr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET

is inlined into the executable with a constant scale and offset. As we are
dealing with TTBR1 style addresses (i.e. prefixed by 0xfff...), this
effectively means that the KASAN shadow end address becomes fixed (the
highest ptr is always ~0UL, which is invariant to VA space size changes).
The only way that I am aware of to fix the start address instead is to
somehow patch KASAN_SHADOW_OFFSET, or to prohibit the KASAN inline mode
(which would then hurt performance).

> > function addresses in the kernel .text between VA sizes. Both of these
>
> The kernel .text range should remain the same, as the kernel is already
> loaded in memory and executing at boot while the effective VA_BITS is
> still being fixed up after detecting (or not) the 52-bit HW feature.
>
> > requirements necessitate us to flip the kernel address space halves s.t.
> > the direct linear map occupies the lower addresses.
>
> Still trying to understand all the reasons for this VA space flip here.
>
> The current kernel 48-bit VA range is split into two halves:
>
> 1. Higher half - [UL(~0) ... PAGE_OFFSET] for the linear mapping
> 2. Lower half  - [PAGE_OFFSET ... VA_START] for everything else
>
> The split in the middle is based on VA_BITS. When that becomes variable,
> boot-time-computed lower-half sections like the kernel text, fixed
> mappings etc. become problematic, as they are already running or in use
> and cannot be relocated. This is caused by the fact that the 48-bit to
> 52-bit adjustment can only happen at the VA_START end, as the other end,
> UL(~0), is fixed. Hence move those non-relocatable/fixed sections to the
> higher half so they don't get impacted by the 48/52-bit adjustment. The
> linear mapping (and so PAGE_OFFSET), on the other hand, will have to
> grow/shrink (or not) during the 48/52-bit adjustment, hence it can be
> aligned with the VA_START end instead. Is that correct, or am I missing
> something?

Agreed on the .text addresses. For PAGE_OFFSET we don't strictly need it
to point to the start of the linear map if we grow the vmemmap and adjust
the (already variable) vmemmap offset (along with physvirt_offset). Also,
we need to flip the VA space to fit KASAN in, as it will grow from the
start.
> > In V2 of this series (apologies for the long delay from V1), the major
> > change is that PAGE_OFFSET is retained as a constant. This allows for
> > much faster virt_to_page computations. This is achieved by expanding the
>
> virt_to_page(), __va() and __pa() need to be based on just linear offset
> calculations, else there will be a performance impact.

IIUC I've maintained equal perf for these, but if I've missed something
please shout :-).

> > size of the VMEMMAP region to accommodate a disjoint 52-bit/48-bit
> > direct linear map. This has been found to work well in my testing, but I
>
> I assume this means that we create a linear mapping for the entire
> 52-bit VA space but back it with a vmemmap struct page mapping only for
> the actual bits (48 or 52) in use.

That is my understanding too.

A big thank you for looking at this!

Cheers,
Hi Steve,

Thanks for the v2. I still did not get much time to go through this in
depth and try it out on the LVA-supporting prototype platforms or older
CPUs (which don't support the ARMv8.2 LVA/LPA extensions) that I have.
Maybe I will give this a quick check on those in a day or two.

On 05/28/2019 09:40 PM, Steve Capper wrote:
> This patch series adds support for 52-bit kernel VAs using some of the
> machinery already introduced by the 52-bit userspace VA code in 5.0.
>
> As 52-bit virtual address support is an optional hardware feature,
> software support for 52-bit kernel VAs needs to be deduced at early boot
> time. If HW support is not available, the kernel falls back to 48-bit.
>
> A significant proportion of this series focuses on "de-constifying"
> VA_BITS related constants.
>
> In order to allow for a KASAN shadow that changes size at boot time, one
> must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
> start address. Also, it is highly desirable to maintain the same
> function addresses in the kernel .text between VA sizes. Both of these
> requirements necessitate us to flip the kernel address space halves s.t.
> the direct linear map occupies the lower addresses.
>
> In V2 of this series (apologies for the long delay from V1), the major
> change is that PAGE_OFFSET is retained as a constant. This allows for
> much faster virt_to_page computations. This is achieved by expanding the
> size of the VMEMMAP region to accommodate a disjoint 52-bit/48-bit
> direct linear map. This has been found to work well in my testing, but I
> would appreciate any feedback on this if it needs changing. To aid with
> git bisect, this logic is broken down into a few smaller patches.
>
> As far as I'm aware, there are two outstanding issues with this series
> that need to be resolved:
> 1) Is the code patching for ttbr1_offset safe? I need to analyse this
> a little more,
> 2) How can this memory map be advertised to kdump tools/documentation?
> I was planning on getting the kernel VA structure agreed on, then I
> would add the relevant exports/documentation.

Indeed, in the absence of corresponding changes to the Documentation
section, it is hard to visualize the changes being made to the memory
map.

Also, I would suggest that we note in the patchset itself (maybe in the
git log) that kdump tools (or even crash, for that matter) will be broken
by this patchset - to prevent kernel bugs being reported.

BTW, James and I are already discussing more coherent methods (see [0])
to manage this exporting of information to user-land (so that we can save
ourselves from having to export new variables in vmcoreinfo in case of
similar changes to the virtual/physical address spaces in the future). I
will work on and send a patchset addressing this shortly.

[0]. http://lists.infradead.org/pipermail/kexec/2019-June/023105.html

Thanks,
Bhupesh
On Mon, Jun 10, 2019 at 04:10:50PM +0530, Bhupesh Sharma wrote:
> On 05/28/2019 09:40 PM, Steve Capper wrote:
> > 2) How can this memory map be advertised to kdump tools/documentation?
> > I was planning on getting the kernel VA structure agreed on, then I
> > would add the relevant exports/documentation.
>
> Indeed, in the absence of corresponding changes to the Documentation
> section, it is hard to visualize the changes being made to the memory
> map.

We used to have some better documentation in the arm64 memory.txt until
commit 08375198b010 ("arm64: Determine the vmalloc/vmemmap space at build
time based on VA_BITS"), which removed it in favour of what the kernel
was printing. Subsequently, the kernel VA layout printing was also
removed. It would be nice to bring back memory.txt, even if only for a
single configuration as per defconfig.
Hi Catalin,

On Mon, Jun 10, 2019 at 4:24 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Jun 10, 2019 at 04:10:50PM +0530, Bhupesh Sharma wrote:
> > On 05/28/2019 09:40 PM, Steve Capper wrote:
> > > 2) How can this memory map be advertised to kdump tools/documentation?
> > > I was planning on getting the kernel VA structure agreed on, then I
> > > would add the relevant exports/documentation.
> >
> > Indeed, in the absence of corresponding changes to the Documentation
> > section, it is hard to visualize the changes being made to the memory
> > map.
>
> We used to have some better documentation in the arm64 memory.txt until
> commit 08375198b010 ("arm64: Determine the vmalloc/vmemmap space at
> build time based on VA_BITS"), which removed it in favour of what the
> kernel was printing. Subsequently, the kernel VA layout printing was
> also removed. It would be nice to bring back memory.txt, even if only
> for a single configuration as per defconfig.

Indeed, that's what I suggested during the v1 review as well. See
<https://www.spinics.net/lists/arm-kernel/msg718096.html> for details.

Also, we may want to have a doc dedicated to the 52-bit address space
details on arm64, similar to what we currently have for x86 (see [1a]
and [1b]).

[1a]. https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/5level-paging.txt
[1b]. https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/mm.txt

Thanks,
Bhupesh