Message ID | 1540593788-28181-1-git-send-email-bhsharma@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86_64, vmcoreinfo: Append 'page_offset_base' to vmcoreinfo | expand |
Hi Bhupesh, Sorry for top posting. Because I don't know which line at below I should add comment into. So could you plese tell what problem you have met in user space tools? Which user space tool is broken so that we need export page_offset_base to vmcoreinfo? Sorry I didn't get what problem this patch is trying to fix from the patch log. About this, I have replied to you in lkml.kernel.org/r/20181025063446.GD2120@MiWiFi-R3L-srv You might miss it. About this exporting, I ever posted patch to upstream and we have had discussion, please check https://lore.kernel.org/patchwork/patch/723472/ In makedumpfile and crash, we have had a clear method to analyze and deduce it from kcore or vmcore. Thanks Baoquan On 10/27/18 at 04:13am, Bhupesh Sharma wrote: > Since commit 23c85094fe1895caefdd > ["proc/kcore: add vmcoreinfo note to /proc/kcore"]), '/proc/kcore' > contains a new PT_NOTE which carries the VMCOREINFO information. > > If the same is available, one can use it in user-land to > retrieve machine specific symbols or strings being appended to the > vmcoreinfo even for live-debugging of the primary kernel as a > standard interface exposed by kernel for sharing machine specific > details with the user-land. > > In the past I had a discussion with James, where he suggested this > approach (please see [0]) and I really liked the idea. Since then I > have been working on unifying the implementations of > (atleast) the commonly used user-space utilities that provide > live-debugging capabilities (tools like 'makedumpfile' and > 'crash-utility', see [1] for details of these tools). > > For the same, when live debugging on x86_64 machines, user-space > tools currently rely on different mechanisms to determine > the 'page_offset_base' value (i.e. start of direct mapping of all > physical memory). One of the approach used by 'makedumpfile' > user-space tool for e.g. is to calculate the same from the last > PT_LOAD available in '/proc/kcore', which can be flaky as and when > new sections (for e.g. KCORE_REMAP which was added > to recent kernels) are added to kcore. > > For other architectures like arm64, I have already proposed using > the vmcoreinfo note (in '/proc/kcore') in the user-space utilities to > determine machine specific details like VA_BITS, PAGE_OFFSET, > kasrl_offset() (see [2] for details), for which different user-space > tools earlier used different (and at times flaky) approaches like: > > - Reading kernel CONFIGs from user-space and determining CONFIG values > like VA_BITS from there. > - Reading symbols from '/proc/kallsyms' and determining their values > via '/dev/mem' interface. > - Reading symbols from 'vmlinux' and determing their values from > reading memory. > > This patch allows appending 'page_offset_base' for x86_64 platforms > to vmcoreinfo, so that user-space tools can use the same as a standard > interface to determine the start of direct mapping of all physical > memory. > > Testing: > ------- > - I tested this patch (rebased on 'linux-next') on a x86_64 machine > using the modified 'makedumpfile' user-space code (see [3] for my > github tree which contains the same) for determining how many pages > are dumpable when different dump_level is specified (which is > one use-case of live-debugging via 'makedumpfile'). > - I tested both the KASLR and non-KASLR boot cases with this patch. > - Here is one sample log (for KASLR boot case) on my x86_64 machine: > > < snip..> > The kernel doesn't support mmap(),read() will be used instead. > > TYPE PAGES EXCLUDABLE DESCRIPTION > ---------------------------------------------------------------------- > ZERO 21299 yes Pages filled > with zero > NON_PRI_CACHE 91785 yes Cache > pages without private flag > PRI_CACHE 1 yes Cache pages with > private flag > USER 14057 yes User process > pages > FREE 740346 yes Free pages > KERN_DATA 58152 no Dumpable kernel > data > > page size: 4096 > Total pages on system: 925640 > Total size on system: 3791421440 Byte > > I understand that there might be some reservations about exporting > such machine-specific details in the vmcoreinfo, but to unify > the implementations across user-land and archs, perhaps this would be > good starting point to start a discussion. > > [0]. https://www.mail-archive.com/kexec@lists.infradead.org/msg20300.html > [1]. MAN pages -> MAKEDUMPFILE(8) and CRASH(8) > [2]. https://www.spinics.net/lists/kexec/msg21608.html > http://lists.infradead.org/pipermail/kexec/2018-October/021725.html > [3]. https://github.com/bhupesh-sharma/makedumpfile/tree/add-page-offset-base-to-vmcore-v1 > > Cc: Boris Petkov <bp@alien8.de> > Cc: Baoquan He <bhe@redhat.com> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> > Cc: Dave Anderson <anderson@redhat.com> > Cc: James Morse <james.morse@arm.com> > Cc: Omar Sandoval <osandov@fb.com> > Cc: x86@kernel.org > Cc: kexec@lists.infradead.org > Cc: linux-arm-kernel@lists.infradead.org > Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> > --- > arch/x86/kernel/machine_kexec_64.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c > index 4c8acdfdc5a7..834ccefef867 100644 > --- a/arch/x86/kernel/machine_kexec_64.c > +++ b/arch/x86/kernel/machine_kexec_64.c > @@ -356,6 +356,7 @@ void arch_crash_save_vmcoreinfo(void) > VMCOREINFO_SYMBOL(init_top_pgt); > vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", > pgtable_l5_enabled()); > + VMCOREINFO_NUMBER(page_offset_base); > > #ifdef CONFIG_NUMA > VMCOREINFO_SYMBOL(node_data); > -- > 2.7.4 >
Hi Baoquan, Thanks a lot for your review. Please see my comments in-line: On Sat, Oct 27, 2018 at 3:32 PM Baoquan He <bhe@redhat.com> wrote: > > Hi Bhupesh, > > Sorry for top posting. Because I don't know which line at below I should > add comment into. > > So could you plese tell what problem you have met in user space tools? > Which user space tool is broken so that we need export 'page_offset_base' > to vmcoreinfo? I am sorry, I understand that the commit log is a bit long and probably this part is not easy to infer. Currently, I see that the 'makedumpfile' utility is broken with newer kernels (I tested on 4.19-rc8+) as we KCORE_REMAP was added to recent kernels thus leading to an additional section in kcore. [see <http://lists.infradead.org/pipermail/kexec/2018-October/021769.html> for details]. The details of the makedumpfile utility can be seen via the man page [MAKEDUMPFILE(8)], but in short it tries to make a small DUMPFILE by compressing dump data or by excluding unnecessary pages for analysis, or both. However the bigger problem is how we export machine specific details from kernel-space to user-land in a standardized way. As I mentioned in brief in the git log, I was seeing issues when I upgrade kernels or try to bring up user-space utilities on newer hardware, as currently we use different (and often flaky approaches) to calculate machine specific details in user-space code as there used to be lack of a clear ABI between the kernel and user-space on how machine specific details would be shared. Later on, kernel commit 23c85094fe1895caefdd came, which adds vmcoreinfo to 'kcore', as an arch agnostic approach to unify the differences existing in exporting kernel space information to the user-space code and James suggested that I use the same for user-space purposes to fix the issues I was observing. > Sorry I didn't get what problem this patch is trying to fix from the > patch log. So, here since the 'page_offset_base' variable (which holds the start of direct mapping of all physical memory) is not exported by the x86_64 kernel to the user-space via a standard interface, we resort to calculating the same via reading PT_LOADs in user-space (as an example from the makedumpfile implementation ). Now this implementation is usually different across user-space utilities. Also, if the PT_LOAD ordering changes (as we saw with the newer kernels), this approach will need fixing to calculate the addresses. In addition, we normally need 'page_offset_base' value in user-space (and retrieve it via vmlinux file in another user case from the same makedumpfile code) for calculating the start of direct mapping of all physical memory specifically for KASLR boot cases. Instead, if we can export 'page_offset_base' via vmcoreinfo, we can easily use the same for live-debugging a running kernel via user-space utilities, which can benefit by reading this value from the vmcoreinfo note inside kcore directly without relying on other methods. The x86_64 kernel code ('arch/x86/kernel/head64.c'), already sets the same as: unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4; and also uses the same to indicate the base of KASLR regions on x86_64: static __initdata struct kaslr_memory_region { unsigned long *base; unsigned long size_tb; } kaslr_regions[] = { { &page_offset_base, 0 }, so it can be used for both the above purposes across user-space utilities. Hope this explains the intention behind this patch. Thanks, Bhupesh > About this, I have replied to you in > lkml.kernel.org/r/20181025063446.GD2120@MiWiFi-R3L-srv > You might miss it. > > About this exporting, I ever posted patch to upstream and we have had > discussion, please check > https://lore.kernel.org/patchwork/patch/723472/ > > In makedumpfile and crash, we have had a clear method to analyze and > deduce it from kcore or vmcore. > > Thanks > Baoquan > > On 10/27/18 at 04:13am, Bhupesh Sharma wrote: > > Since commit 23c85094fe1895caefdd > > ["proc/kcore: add vmcoreinfo note to /proc/kcore"]), '/proc/kcore' > > contains a new PT_NOTE which carries the VMCOREINFO information. > > > > If the same is available, one can use it in user-land to > > retrieve machine specific symbols or strings being appended to the > > vmcoreinfo even for live-debugging of the primary kernel as a > > standard interface exposed by kernel for sharing machine specific > > details with the user-land. > > > > In the past I had a discussion with James, where he suggested this > > approach (please see [0]) and I really liked the idea. Since then I > > have been working on unifying the implementations of > > (atleast) the commonly used user-space utilities that provide > > live-debugging capabilities (tools like 'makedumpfile' and > > 'crash-utility', see [1] for details of these tools). > > > > For the same, when live debugging on x86_64 machines, user-space > > tools currently rely on different mechanisms to determine > > the 'page_offset_base' value (i.e. start of direct mapping of all > > physical memory). One of the approach used by 'makedumpfile' > > user-space tool for e.g. is to calculate the same from the last > > PT_LOAD available in '/proc/kcore', which can be flaky as and when > > new sections (for e.g. KCORE_REMAP which was added > > to recent kernels) are added to kcore. > > > > For other architectures like arm64, I have already proposed using > > the vmcoreinfo note (in '/proc/kcore') in the user-space utilities to > > determine machine specific details like VA_BITS, PAGE_OFFSET, > > kasrl_offset() (see [2] for details), for which different user-space > > tools earlier used different (and at times flaky) approaches like: > > > > - Reading kernel CONFIGs from user-space and determining CONFIG values > > like VA_BITS from there. > > - Reading symbols from '/proc/kallsyms' and determining their values > > via '/dev/mem' interface. > > - Reading symbols from 'vmlinux' and determing their values from > > reading memory. > > > > This patch allows appending 'page_offset_base' for x86_64 platforms > > to vmcoreinfo, so that user-space tools can use the same as a standard > > interface to determine the start of direct mapping of all physical > > memory. > > > > Testing: > > ------- > > - I tested this patch (rebased on 'linux-next') on a x86_64 machine > > using the modified 'makedumpfile' user-space code (see [3] for my > > github tree which contains the same) for determining how many pages > > are dumpable when different dump_level is specified (which is > > one use-case of live-debugging via 'makedumpfile'). > > - I tested both the KASLR and non-KASLR boot cases with this patch. > > - Here is one sample log (for KASLR boot case) on my x86_64 machine: > > > > < snip..> > > The kernel doesn't support mmap(),read() will be used instead. > > > > TYPE PAGES EXCLUDABLE DESCRIPTION > > ---------------------------------------------------------------------- > > ZERO 21299 yes Pages filled > > with zero > > NON_PRI_CACHE 91785 yes Cache > > pages without private flag > > PRI_CACHE 1 yes Cache pages with > > private flag > > USER 14057 yes User process > > pages > > FREE 740346 yes Free pages > > KERN_DATA 58152 no Dumpable kernel > > data > > > > page size: 4096 > > Total pages on system: 925640 > > Total size on system: 3791421440 Byte > > > > I understand that there might be some reservations about exporting > > such machine-specific details in the vmcoreinfo, but to unify > > the implementations across user-land and archs, perhaps this would be > > good starting point to start a discussion. > > > > [0]. https://www.mail-archive.com/kexec@lists.infradead.org/msg20300.html > > [1]. MAN pages -> MAKEDUMPFILE(8) and CRASH(8) > > [2]. https://www.spinics.net/lists/kexec/msg21608.html > > http://lists.infradead.org/pipermail/kexec/2018-October/021725.html > > [3]. https://github.com/bhupesh-sharma/makedumpfile/tree/add-page-offset-base-to-vmcore-v1 > > > > Cc: Boris Petkov <bp@alien8.de> > > Cc: Baoquan He <bhe@redhat.com> > > Cc: Ingo Molnar <mingo@kernel.org> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> > > Cc: Dave Anderson <anderson@redhat.com> > > Cc: James Morse <james.morse@arm.com> > > Cc: Omar Sandoval <osandov@fb.com> > > Cc: x86@kernel.org > > Cc: kexec@lists.infradead.org > > Cc: linux-arm-kernel@lists.infradead.org > > Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> > > --- > > arch/x86/kernel/machine_kexec_64.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c > > index 4c8acdfdc5a7..834ccefef867 100644 > > --- a/arch/x86/kernel/machine_kexec_64.c > > +++ b/arch/x86/kernel/machine_kexec_64.c > > @@ -356,6 +356,7 @@ void arch_crash_save_vmcoreinfo(void) > > VMCOREINFO_SYMBOL(init_top_pgt); > > vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", > > pgtable_l5_enabled()); > > + VMCOREINFO_NUMBER(page_offset_base); > > > > #ifdef CONFIG_NUMA > > VMCOREINFO_SYMBOL(node_data); > > -- > > 2.7.4 > >
On 10/29/18 at 04:07pm, Bhupesh Sharma wrote: > I am sorry, I understand that the commit log is a bit long and Yes, it's too long. Please summarize well so that it can save reviewers' time. > probably this part > is not easy to infer. Currently, I see that the 'makedumpfile' utility > is broken with newer kernels > (I tested on 4.19-rc8+) as we KCORE_REMAP was added to recent kernels > thus leading to an additional section in kcore. > [see <http://lists.infradead.org/pipermail/kexec/2018-October/021769.html> > for details]. Why it's broken? Have you investigated and figured out why it's broken? If fix, what patch will it look like? Does the patch prove it's not worth using the current way? Have you thought about this in advance? Or still like before, you said on arm64 you found different boards have different behaviour, then makedumpfile maintainer Kazu said he investigated and found it may be caused by KALSR. This time, for this KCORE_REMAP adding, can you help to investigate further and give an answer to the issue you found and raised? > > The details of the makedumpfile utility can be seen via the man page > [MAKEDUMPFILE(8)], > but in short it tries to make a small DUMPFILE by compressing dump > data or by excluding > unnecessary pages for analysis, or both. > > However the bigger problem is how we export machine specific details > from kernel-space > to user-land in a standardized way. As I mentioned in brief in the git > log, I was seeing > issues when I upgrade kernels or try to bring up user-space utilities > on newer hardware, > as currently we use different (and often flaky approaches) to > calculate machine specific details in > user-space code as there used to be lack of a clear ABI between the > kernel and user-space on how > machine specific details would be shared. > > Later on, kernel commit 23c85094fe1895caefdd came, which adds > vmcoreinfo to 'kcore', > as an arch agnostic approach to unify the differences existing in > exporting kernel space information > to the user-space code and James suggested that I use the same for > user-space purposes to fix > the issues I was observing. > > > Sorry I didn't get what problem this patch is trying to fix from the > > patch log. > > So, here since the 'page_offset_base' variable (which holds the start > of direct mapping of all physical > memory) is not exported by the x86_64 kernel to the user-space via a > standard interface, we resort > to calculating the same via reading PT_LOADs in user-space (as an > example from the makedumpfile > implementation ). Now this implementation is usually different across > user-space utilities. > > Also, if the PT_LOAD ordering changes (as we saw with the newer > kernels), this approach will need > fixing to calculate the addresses. In addition, we normally need > 'page_offset_base' value in user-space (and retrieve it via > vmlinux file in another user case from the same makedumpfile code) for > calculating the start of direct mapping of all physical > memory specifically for KASLR boot cases. > > Instead, if we can export 'page_offset_base' via vmcoreinfo, we can > easily use the same > for live-debugging a running kernel via user-space utilities, which > can benefit by reading this value > from the vmcoreinfo note inside kcore directly without relying on other methods. We have got a method, what's wrong with that? Only KCORE_REMAP adding, again? if fix, what is the defect? Where's patch, analysis, only one sentence to say KCORE_REMAP caused that? > > The x86_64 kernel code ('arch/x86/kernel/head64.c'), already sets the same as: > unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4; > > and also uses the same to indicate the base of KASLR regions on x86_64: > static __initdata struct kaslr_memory_region { > unsigned long *base; > unsigned long size_tb; > } kaslr_regions[] = { > { &page_offset_base, 0 }, > > so it can be used for both the above purposes across user-space utilities. > > Hope this explains the intention behind this patch. > > Thanks, > Bhupesh > > > About this, I have replied to you in > > lkml.kernel.org/r/20181025063446.GD2120@MiWiFi-R3L-srv > > You might miss it. > > > > About this exporting, I ever posted patch to upstream and we have had > > discussion, please check > > https://lore.kernel.org/patchwork/patch/723472/ > > > > In makedumpfile and crash, we have had a clear method to analyze and > > deduce it from kcore or vmcore. > > > > Thanks > > Baoquan > > > > On 10/27/18 at 04:13am, Bhupesh Sharma wrote: > > > Since commit 23c85094fe1895caefdd > > > ["proc/kcore: add vmcoreinfo note to /proc/kcore"]), '/proc/kcore' > > > contains a new PT_NOTE which carries the VMCOREINFO information. > > > > > > If the same is available, one can use it in user-land to > > > retrieve machine specific symbols or strings being appended to the > > > vmcoreinfo even for live-debugging of the primary kernel as a > > > standard interface exposed by kernel for sharing machine specific > > > details with the user-land. > > > > > > In the past I had a discussion with James, where he suggested this > > > approach (please see [0]) and I really liked the idea. Since then I > > > have been working on unifying the implementations of > > > (atleast) the commonly used user-space utilities that provide > > > live-debugging capabilities (tools like 'makedumpfile' and > > > 'crash-utility', see [1] for details of these tools). > > > > > > For the same, when live debugging on x86_64 machines, user-space > > > tools currently rely on different mechanisms to determine > > > the 'page_offset_base' value (i.e. start of direct mapping of all > > > physical memory). One of the approach used by 'makedumpfile' > > > user-space tool for e.g. is to calculate the same from the last > > > PT_LOAD available in '/proc/kcore', which can be flaky as and when > > > new sections (for e.g. KCORE_REMAP which was added > > > to recent kernels) are added to kcore. > > > > > > For other architectures like arm64, I have already proposed using > > > the vmcoreinfo note (in '/proc/kcore') in the user-space utilities to > > > determine machine specific details like VA_BITS, PAGE_OFFSET, > > > kasrl_offset() (see [2] for details), for which different user-space > > > tools earlier used different (and at times flaky) approaches like: > > > > > > - Reading kernel CONFIGs from user-space and determining CONFIG values > > > like VA_BITS from there. > > > - Reading symbols from '/proc/kallsyms' and determining their values > > > via '/dev/mem' interface. > > > - Reading symbols from 'vmlinux' and determing their values from > > > reading memory. > > > > > > This patch allows appending 'page_offset_base' for x86_64 platforms > > > to vmcoreinfo, so that user-space tools can use the same as a standard > > > interface to determine the start of direct mapping of all physical > > > memory. > > > > > > Testing: > > > ------- > > > - I tested this patch (rebased on 'linux-next') on a x86_64 machine > > > using the modified 'makedumpfile' user-space code (see [3] for my > > > github tree which contains the same) for determining how many pages > > > are dumpable when different dump_level is specified (which is > > > one use-case of live-debugging via 'makedumpfile'). > > > - I tested both the KASLR and non-KASLR boot cases with this patch. > > > - Here is one sample log (for KASLR boot case) on my x86_64 machine: > > > > > > < snip..> > > > The kernel doesn't support mmap(),read() will be used instead. > > > > > > TYPE PAGES EXCLUDABLE DESCRIPTION > > > ---------------------------------------------------------------------- > > > ZERO 21299 yes Pages filled > > > with zero > > > NON_PRI_CACHE 91785 yes Cache > > > pages without private flag > > > PRI_CACHE 1 yes Cache pages with > > > private flag > > > USER 14057 yes User process > > > pages > > > FREE 740346 yes Free pages > > > KERN_DATA 58152 no Dumpable kernel > > > data > > > > > > page size: 4096 > > > Total pages on system: 925640 > > > Total size on system: 3791421440 Byte > > > > > > I understand that there might be some reservations about exporting > > > such machine-specific details in the vmcoreinfo, but to unify > > > the implementations across user-land and archs, perhaps this would be > > > good starting point to start a discussion. > > > > > > [0]. https://www.mail-archive.com/kexec@lists.infradead.org/msg20300.html > > > [1]. MAN pages -> MAKEDUMPFILE(8) and CRASH(8) > > > [2]. https://www.spinics.net/lists/kexec/msg21608.html > > > http://lists.infradead.org/pipermail/kexec/2018-October/021725.html > > > [3]. https://github.com/bhupesh-sharma/makedumpfile/tree/add-page-offset-base-to-vmcore-v1 > > > > > > Cc: Boris Petkov <bp@alien8.de> > > > Cc: Baoquan He <bhe@redhat.com> > > > Cc: Ingo Molnar <mingo@kernel.org> > > > Cc: Thomas Gleixner <tglx@linutronix.de> > > > Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> > > > Cc: Dave Anderson <anderson@redhat.com> > > > Cc: James Morse <james.morse@arm.com> > > > Cc: Omar Sandoval <osandov@fb.com> > > > Cc: x86@kernel.org > > > Cc: kexec@lists.infradead.org > > > Cc: linux-arm-kernel@lists.infradead.org > > > Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> > > > --- > > > arch/x86/kernel/machine_kexec_64.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c > > > index 4c8acdfdc5a7..834ccefef867 100644 > > > --- a/arch/x86/kernel/machine_kexec_64.c > > > +++ b/arch/x86/kernel/machine_kexec_64.c > > > @@ -356,6 +356,7 @@ void arch_crash_save_vmcoreinfo(void) > > > VMCOREINFO_SYMBOL(init_top_pgt); > > > vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", > > > pgtable_l5_enabled()); > > > + VMCOREINFO_NUMBER(page_offset_base); > > > > > > #ifdef CONFIG_NUMA > > > VMCOREINFO_SYMBOL(node_data); > > > -- > > > 2.7.4 > > >
Hi Baoquan, On Tue, Oct 30, 2018 at 7:37 AM Baoquan He <bhe@redhat.com> wrote: > > On 10/29/18 at 04:07pm, Bhupesh Sharma wrote: > > I am sorry, I understand that the commit log is a bit long and > > Yes, it's too long. Please summarize well so that it can save reviewers' > time. I have tried to include all the points in the log which might be relevant to folks (there are user-space utility maintainers in Cc as well) who might not be aware of complete background on this (especially the kernel side of the things), while I am discussing the user-space changes with them :) > > probably this part > > is not easy to infer. Currently, I see that the 'makedumpfile' utility > > is broken with newer kernels > > (I tested on 4.19-rc8+) as we KCORE_REMAP was added to recent kernels > > thus leading to an additional section in kcore. > > [see <http://lists.infradead.org/pipermail/kexec/2018-October/021769.html> > > for details]. > > Why it's broken? Have you investigated and figured out why it's broken? > If fix, what patch will it look like? Does the patch prove it's not > worth using the current way? > > Have you thought about this in advance? Or still like before, you said > on arm64 you found different boards have different behaviour, then > makedumpfile maintainer Kazu said he investigated and found it may be > caused by KALSR. This time, for this KCORE_REMAP adding, can you help to > investigate further and give an answer to the issue you found and > raised? Ofcourse, the patchset which added vmcoreinfo into kcore was discussed and it was agreed that this was a better approach to move forward and hence accepted in mainline. Regarding the makedumpfile issue, I have already provided a detailed reply to Kazu (you are Cc'ed on the thread) and also proposed a makedumpfile approach which reads the 'page_offset_base' value from kcore (using the kernel interface provided by this patch), [on which you are Cc'ed as well]: [1]. https://www.spinics.net/lists/kexec/msg21717.html [2]. https://www.spinics.net/lists/kexec/msg21722.html Again I think we are discussing on a wrong tangent here. The idea is not limited to only makedumpfile - it affects other user-space utilities (like crash) which are used for live debugging a running kernel. > > The details of the makedumpfile utility can be seen via the man page > > [MAKEDUMPFILE(8)], > > but in short it tries to make a small DUMPFILE by compressing dump > > data or by excluding > > unnecessary pages for analysis, or both. > > > > However the bigger problem is how we export machine specific details > > from kernel-space > > to user-land in a standardized way. As I mentioned in brief in the git > > log, I was seeing > > issues when I upgrade kernels or try to bring up user-space utilities > > on newer hardware, > > as currently we use different (and often flaky approaches) to > > calculate machine specific details in > > user-space code as there used to be lack of a clear ABI between the > > kernel and user-space on how > > machine specific details would be shared. > > > > Later on, kernel commit 23c85094fe1895caefdd came, which adds > > vmcoreinfo to 'kcore', > > as an arch agnostic approach to unify the differences existing in > > exporting kernel space information > > to the user-space code and James suggested that I use the same for > > user-space purposes to fix > > the issues I was observing. > > > > > Sorry I didn't get what problem this patch is trying to fix from the > > > patch log. > > > > So, here since the 'page_offset_base' variable (which holds the start > > of direct mapping of all physical > > memory) is not exported by the x86_64 kernel to the user-space via a > > standard interface, we resort > > to calculating the same via reading PT_LOADs in user-space (as an > > example from the makedumpfile > > implementation ). Now this implementation is usually different across > > user-space utilities. > > > > Also, if the PT_LOAD ordering changes (as we saw with the newer > > kernels), this approach will need > > fixing to calculate the addresses. In addition, we normally need > > 'page_offset_base' value in user-space (and retrieve it via > > vmlinux file in another user case from the same makedumpfile code) for > > calculating the start of direct mapping of all physical > > memory specifically for KASLR boot cases. > > > > Instead, if we can export 'page_offset_base' via vmcoreinfo, we can > > easily use the same > > for live-debugging a running kernel via user-space utilities, which > > can benefit by reading this value > > from the vmcoreinfo note inside kcore directly without relying on other methods. > > We have got a method, what's wrong with that? Only KCORE_REMAP adding, > again? if fix, what is the defect? Where's patch, analysis, only one > sentence to say KCORE_REMAP caused that? Please see above, this is not limited to one use-case of makedumpfile tool. Also, it extends to other user-space utilities (like crash) as well. See, why would we want to have every user-space utility implement a different mechanism for solving the same problem, right? If a standard interface is available from the kernel side we better use the same across all user-land. Plus, if this interface (vmcoreinfo in kcore) is available from the kernel side for all archs (like it is currently), it solves another set of problem in the user-space tools, i.e. we don't need to keep different arch specific implementations across different user-space tool (see [1] above for e.g., both x86_64 and arm64 makedumpfile code bases use an almost similar approach now). I understand that such changes are always obstructive, but hopefully it saves us the effort to manage differences across user-land in the longer run. So, via this patch I propose that the kernel export 'page_offset_base' variable via vmcoreinfo and user-space uses its uniformly to determine the start of direct mapping of all physical memory. Hope this clarifies the background. Regads, Bhupesh > > > > The x86_64 kernel code ('arch/x86/kernel/head64.c'), already sets the same as: > > unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4; > > > > and also uses the same to indicate the base of KASLR regions on x86_64: > > static __initdata struct kaslr_memory_region { > > unsigned long *base; > > unsigned long size_tb; > > } kaslr_regions[] = { > > { &page_offset_base, 0 }, > > > > so it can be used for both the above purposes across user-space utilities. > > > > Hope this explains the intention behind this patch. > > > > Thanks, > > Bhupesh > > > > > About this, I have replied to you in > > > lkml.kernel.org/r/20181025063446.GD2120@MiWiFi-R3L-srv > > > You might miss it. > > > > > > About this exporting, I ever posted patch to upstream and we have had > > > discussion, please check > > > https://lore.kernel.org/patchwork/patch/723472/ > > > > > > In makedumpfile and crash, we have had a clear method to analyze and > > > deduce it from kcore or vmcore. > > > > > > Thanks > > > Baoquan > > > > > > On 10/27/18 at 04:13am, Bhupesh Sharma wrote: > > > > Since commit 23c85094fe1895caefdd > > > > ["proc/kcore: add vmcoreinfo note to /proc/kcore"]), '/proc/kcore' > > > > contains a new PT_NOTE which carries the VMCOREINFO information. > > > > > > > > If the same is available, one can use it in user-land to > > > > retrieve machine specific symbols or strings being appended to the > > > > vmcoreinfo even for live-debugging of the primary kernel as a > > > > standard interface exposed by kernel for sharing machine specific > > > > details with the user-land. > > > > > > > > In the past I had a discussion with James, where he suggested this > > > > approach (please see [0]) and I really liked the idea. Since then I > > > > have been working on unifying the implementations of > > > > (atleast) the commonly used user-space utilities that provide > > > > live-debugging capabilities (tools like 'makedumpfile' and > > > > 'crash-utility', see [1] for details of these tools). > > > > > > > > For the same, when live debugging on x86_64 machines, user-space > > > > tools currently rely on different mechanisms to determine > > > > the 'page_offset_base' value (i.e. start of direct mapping of all > > > > physical memory). One of the approach used by 'makedumpfile' > > > > user-space tool for e.g. is to calculate the same from the last > > > > PT_LOAD available in '/proc/kcore', which can be flaky as and when > > > > new sections (for e.g. KCORE_REMAP which was added > > > > to recent kernels) are added to kcore. > > > > > > > > For other architectures like arm64, I have already proposed using > > > > the vmcoreinfo note (in '/proc/kcore') in the user-space utilities to > > > > determine machine specific details like VA_BITS, PAGE_OFFSET, > > > > kasrl_offset() (see [2] for details), for which different user-space > > > > tools earlier used different (and at times flaky) approaches like: > > > > > > > > - Reading kernel CONFIGs from user-space and determining CONFIG values > > > > like VA_BITS from there. > > > > - Reading symbols from '/proc/kallsyms' and determining their values > > > > via '/dev/mem' interface. > > > > - Reading symbols from 'vmlinux' and determing their values from > > > > reading memory. > > > > > > > > This patch allows appending 'page_offset_base' for x86_64 platforms > > > > to vmcoreinfo, so that user-space tools can use the same as a standard > > > > interface to determine the start of direct mapping of all physical > > > > memory. > > > > > > > > Testing: > > > > ------- > > > > - I tested this patch (rebased on 'linux-next') on a x86_64 machine > > > > using the modified 'makedumpfile' user-space code (see [3] for my > > > > github tree which contains the same) for determining how many pages > > > > are dumpable when different dump_level is specified (which is > > > > one use-case of live-debugging via 'makedumpfile'). > > > > - I tested both the KASLR and non-KASLR boot cases with this patch. > > > > - Here is one sample log (for KASLR boot case) on my x86_64 machine: > > > > > > > > < snip..> > > > > The kernel doesn't support mmap(),read() will be used instead. > > > > > > > > TYPE PAGES EXCLUDABLE DESCRIPTION > > > > ---------------------------------------------------------------------- > > > > ZERO 21299 yes Pages filled > > > > with zero > > > > NON_PRI_CACHE 91785 yes Cache > > > > pages without private flag > > > > PRI_CACHE 1 yes Cache pages with > > > > private flag > > > > USER 14057 yes User process > > > > pages > > > > FREE 740346 yes Free pages > > > > KERN_DATA 58152 no Dumpable kernel > > > > data > > > > > > > > page size: 4096 > > > > Total pages on system: 925640 > > > > Total size on system: 3791421440 Byte > > > > > > > > I understand that there might be some reservations about exporting > > > > such machine-specific details in the vmcoreinfo, but to unify > > > > the implementations across user-land and archs, perhaps this would be > > > > good starting point to start a discussion. > > > > > > > > [0]. https://www.mail-archive.com/kexec@lists.infradead.org/msg20300.html > > > > [1]. MAN pages -> MAKEDUMPFILE(8) and CRASH(8) > > > > [2]. https://www.spinics.net/lists/kexec/msg21608.html > > > > http://lists.infradead.org/pipermail/kexec/2018-October/021725.html > > > > [3]. https://github.com/bhupesh-sharma/makedumpfile/tree/add-page-offset-base-to-vmcore-v1 > > > > > > > > Cc: Boris Petkov <bp@alien8.de> > > > > Cc: Baoquan He <bhe@redhat.com> > > > > Cc: Ingo Molnar <mingo@kernel.org> > > > > Cc: Thomas Gleixner <tglx@linutronix.de> > > > > Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> > > > > Cc: Dave Anderson <anderson@redhat.com> > > > > Cc: James Morse <james.morse@arm.com> > > > > Cc: Omar Sandoval <osandov@fb.com> > > > > Cc: x86@kernel.org > > > > Cc: kexec@lists.infradead.org > > > > Cc: linux-arm-kernel@lists.infradead.org > > > > Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> > > > > --- > > > > arch/x86/kernel/machine_kexec_64.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c > > > > index 4c8acdfdc5a7..834ccefef867 100644 > > > > --- a/arch/x86/kernel/machine_kexec_64.c > > > > +++ b/arch/x86/kernel/machine_kexec_64.c > > > > @@ -356,6 +356,7 @@ void arch_crash_save_vmcoreinfo(void) > > > > VMCOREINFO_SYMBOL(init_top_pgt); > > > > vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", > > > > pgtable_l5_enabled()); > > > > + VMCOREINFO_NUMBER(page_offset_base); > > > > > > > > #ifdef CONFIG_NUMA > > > > VMCOREINFO_SYMBOL(node_data); > > > > -- > > > > 2.7.4 > > > >
Hi Bhupesh, On 10/30/18 at 12:33pm, Bhupesh Sharma wrote: > > Why it's broken? Have you investigated and figured out why it's broken? > > If fix, what patch will it look like? Does the patch prove it's not > > worth using the current way? > > > > Have you thought about this in advance? Or still like before, you said > > on arm64 you found different boards have different behaviour, then > > makedumpfile maintainer Kazu said he investigated and found it may be > > caused by KALSR. This time, for this KCORE_REMAP adding, can you help to > > investigate further and give an answer to the issue you found and > > raised? > > Ofcourse, the patchset which added vmcoreinfo into kcore was discussed > and it was agreed that this was a better approach to move forward and hence > accepted in mainline. Currently I am wondering why x86_64 need add page_offset_base to vmcoreinfo. Is it because any feature or userspace tool is broken if page_offset_base is not added into vmcoreinfo? Why KCORE_REMAP adding broke makedumpfile, do you find out the root cause and what it looks like if you fix it in the current way? Can you list the reasons one by one as below with short sentence? 1) 2) 3) > > Regarding the makedumpfile issue, I have already provided a detailed > reply to Kazu (you are Cc'ed on the thread) and also proposed a > makedumpfile approach which > reads the 'page_offset_base' value from kcore (using the kernel > interface provided by this patch), > [on which you are Cc'ed as well]: This is your replying mail link: https://www.spinics.net/lists/kexec/msg21616.html Then what on earth do you want to fix in this patch? So Kazu's patch which decuding page_offset_base like x86 64 have done works. Yes, and your way using vmcoreinfo in kcore also works, but this is not the reason which supports you to discard the old way Kazu suggested. Now we are talking about why you want to discard the old way, and adding page_offset_base to vmcoreinfo. Please elaborate and reply with simple and clear logic. Thanks Baoquan
Hi Bhupesh, Thank you for the patch! Yet something to improve: [auto build test ERROR on tip/x86/core] [also build test ERROR on v4.19 next-20181030] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Bhupesh-Sharma/x86_64-vmcoreinfo-Append-page_offset_base-to-vmcoreinfo/20181027-075816 config: x86_64-randconfig-s4-10301649 (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): arch/x86/kernel/machine_kexec_64.o: In function `arch_crash_save_vmcoreinfo': >> arch/x86/kernel/machine_kexec_64.c:359: undefined reference to `page_offset_base' >> arch/x86/kernel/machine_kexec_64.c:359: undefined reference to `page_offset_base' vim +359 arch/x86/kernel/machine_kexec_64.c 352 353 void arch_crash_save_vmcoreinfo(void) 354 { 355 VMCOREINFO_NUMBER(phys_base); 356 VMCOREINFO_SYMBOL(init_top_pgt); 357 vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", 358 pgtable_l5_enabled()); > 359 VMCOREINFO_NUMBER(page_offset_base); 360 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi, On Tue, Oct 30, 2018 at 5:02 PM kbuild test robot <lkp@intel.com> wrote: > > Hi Bhupesh, > > Thank you for the patch! Yet something to improve: > > [auto build test ERROR on tip/x86/core] > [also build test ERROR on v4.19 next-20181030] > [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] > > url: https://github.com/0day-ci/linux/commits/Bhupesh-Sharma/x86_64-vmcoreinfo-Append-page_offset_base-to-vmcoreinfo/20181027-075816 > config: x86_64-randconfig-s4-10301649 (attached as .config) > compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 > reproduce: > # save the attached .config to linux build tree > make ARCH=x86_64 > > All errors (new ones prefixed by >>): > > arch/x86/kernel/machine_kexec_64.o: In function `arch_crash_save_vmcoreinfo': > >> arch/x86/kernel/machine_kexec_64.c:359: undefined reference to `page_offset_base' > >> arch/x86/kernel/machine_kexec_64.c:359: undefined reference to `page_offset_base' > > vim +359 arch/x86/kernel/machine_kexec_64.c > > 352 > 353 void arch_crash_save_vmcoreinfo(void) > 354 { > 355 VMCOREINFO_NUMBER(phys_base); > 356 VMCOREINFO_SYMBOL(init_top_pgt); > 357 vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", > 358 pgtable_l5_enabled()); > > 359 VMCOREINFO_NUMBER(page_offset_base); > 360 > Thanks I can reproduce the issue, I will send a v2 shortly to fix the build issue. Regards, Bhupesh
Hi Baoquan, On Tue, Oct 30, 2018 at 2:29 PM Baoquan He <bhe@redhat.com> wrote: > > Hi Bhupesh, > > On 10/30/18 at 12:33pm, Bhupesh Sharma wrote: > > > Why it's broken? Have you investigated and figured out why it's broken? > > > If fix, what patch will it look like? Does the patch prove it's not > > > worth using the current way? > > > > > > Have you thought about this in advance? Or still like before, you said > > > on arm64 you found different boards have different behaviour, then > > > makedumpfile maintainer Kazu said he investigated and found it may be > > > caused by KALSR. This time, for this KCORE_REMAP adding, can you help to > > > investigate further and give an answer to the issue you found and > > > raised? > > > > Ofcourse, the patchset which added vmcoreinfo into kcore was discussed > > and it was agreed that this was a better approach to move forward and hence > > accepted in mainline. > > Currently I am wondering why x86_64 need add page_offset_base to > vmcoreinfo. Is it because any feature or userspace tool is broken if > page_offset_base is not added into vmcoreinfo? > > Why KCORE_REMAP adding broke makedumpfile, do you find out the root > cause and what it looks like if you fix it in the current way? > > Can you list the reasons one by one as below with short sentence? > 1) > 2) > 3) > > > > > Regarding the makedumpfile issue, I have already provided a detailed > > reply to Kazu (you are Cc'ed on the thread) and also proposed a > > makedumpfile approach which > > reads the 'page_offset_base' value from kcore (using the kernel > > interface provided by this patch), > > [on which you are Cc'ed as well]: > > This is your replying mail link: > https://www.spinics.net/lists/kexec/msg21616.html > > Then what on earth do you want to fix in this patch? > > So Kazu's patch which decuding page_offset_base like x86 64 have done works. > Yes, and your way using vmcoreinfo in kcore also works, but this is not > the reason which supports you to discard the old way Kazu suggested. Now > we are talking about why you want to discard the old way, and adding > page_offset_base to vmcoreinfo. > > Please elaborate and reply with simple and clear logic. I have sent a v2 patch with a much simpler git log message. Hopefully that should clarify the intent behind the patch. Also lets see what views the x86 maintainers have on the v2 patch. Regards, Bhupesh
On 11/16/18 at 03:20am, Bhupesh Sharma wrote: > I have sent a v2 patch with a much simpler git log message. Hopefully > that should clarify the intent behind the patch. > Also lets see what views the x86 maintainers have on the v2 patch. Thanks, Bhupesh. That's great. Believe other people will help review and give advice or ACK. I am busying on other issue and have no time to review it for now. Thanks for the effort. Thanks Baoquan
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 4c8acdfdc5a7..834ccefef867 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -356,6 +356,7 @@ void arch_crash_save_vmcoreinfo(void) VMCOREINFO_SYMBOL(init_top_pgt); vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", pgtable_l5_enabled()); + VMCOREINFO_NUMBER(page_offset_base); #ifdef CONFIG_NUMA VMCOREINFO_SYMBOL(node_data);
Since commit 23c85094fe1895caefdd ["proc/kcore: add vmcoreinfo note to /proc/kcore"]), '/proc/kcore' contains a new PT_NOTE which carries the VMCOREINFO information. If the same is available, one can use it in user-land to retrieve machine specific symbols or strings being appended to the vmcoreinfo even for live-debugging of the primary kernel as a standard interface exposed by kernel for sharing machine specific details with the user-land. In the past I had a discussion with James, where he suggested this approach (please see [0]) and I really liked the idea. Since then I have been working on unifying the implementations of (atleast) the commonly used user-space utilities that provide live-debugging capabilities (tools like 'makedumpfile' and 'crash-utility', see [1] for details of these tools). For the same, when live debugging on x86_64 machines, user-space tools currently rely on different mechanisms to determine the 'page_offset_base' value (i.e. start of direct mapping of all physical memory). One of the approach used by 'makedumpfile' user-space tool for e.g. is to calculate the same from the last PT_LOAD available in '/proc/kcore', which can be flaky as and when new sections (for e.g. KCORE_REMAP which was added to recent kernels) are added to kcore. For other architectures like arm64, I have already proposed using the vmcoreinfo note (in '/proc/kcore') in the user-space utilities to determine machine specific details like VA_BITS, PAGE_OFFSET, kasrl_offset() (see [2] for details), for which different user-space tools earlier used different (and at times flaky) approaches like: - Reading kernel CONFIGs from user-space and determining CONFIG values like VA_BITS from there. - Reading symbols from '/proc/kallsyms' and determining their values via '/dev/mem' interface. - Reading symbols from 'vmlinux' and determing their values from reading memory. This patch allows appending 'page_offset_base' for x86_64 platforms to vmcoreinfo, so that user-space tools can use the same as a standard interface to determine the start of direct mapping of all physical memory. Testing: ------- - I tested this patch (rebased on 'linux-next') on a x86_64 machine using the modified 'makedumpfile' user-space code (see [3] for my github tree which contains the same) for determining how many pages are dumpable when different dump_level is specified (which is one use-case of live-debugging via 'makedumpfile'). - I tested both the KASLR and non-KASLR boot cases with this patch. - Here is one sample log (for KASLR boot case) on my x86_64 machine: < snip..> The kernel doesn't support mmap(),read() will be used instead. TYPE PAGES EXCLUDABLE DESCRIPTION ---------------------------------------------------------------------- ZERO 21299 yes Pages filled with zero NON_PRI_CACHE 91785 yes Cache pages without private flag PRI_CACHE 1 yes Cache pages with private flag USER 14057 yes User process pages FREE 740346 yes Free pages KERN_DATA 58152 no Dumpable kernel data page size: 4096 Total pages on system: 925640 Total size on system: 3791421440 Byte I understand that there might be some reservations about exporting such machine-specific details in the vmcoreinfo, but to unify the implementations across user-land and archs, perhaps this would be good starting point to start a discussion. [0]. https://www.mail-archive.com/kexec@lists.infradead.org/msg20300.html [1]. MAN pages -> MAKEDUMPFILE(8) and CRASH(8) [2]. https://www.spinics.net/lists/kexec/msg21608.html http://lists.infradead.org/pipermail/kexec/2018-October/021725.html [3]. https://github.com/bhupesh-sharma/makedumpfile/tree/add-page-offset-base-to-vmcore-v1 Cc: Boris Petkov <bp@alien8.de> Cc: Baoquan He <bhe@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: Dave Anderson <anderson@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Omar Sandoval <osandov@fb.com> Cc: x86@kernel.org Cc: kexec@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> --- arch/x86/kernel/machine_kexec_64.c | 1 + 1 file changed, 1 insertion(+)