
[RFC,00/10] pstore: directly mapped regions

Message ID 20250217101706.2104498-1-eugen.hristev@linaro.org (mailing list archive)

Message

Eugen Hristev Feb. 17, 2025, 10:16 a.m. UTC
Hello,

This series comes as an RFC proposed solution to enhance pstore and
devcoredump with the following functionality:
The idea is to mark specific memory regions in the kernel as core
areas, such that in the event of a cataclysmic event like a panic,
deadlock, or hardware fault, those areas can be retrieved by an
external agent, which can be a JTAG probe, firmware, an additional
debug processor, etc.
Even though pstore panic handlers attempt to copy data into
persistent storage, this does not always work: the kernel could be
in a deadlock, have IRQs disabled, the panic handler could crash
itself, or some memory areas could be corrupted badly enough that
the recovery mechanism fails.
This patch series attempts to solve this by reusing existing
infrastructure in pstore and devcoredump, and provide a copy-free
way to expose core kernel data for creating debug information such
as a coredump or a list of binary elements with data.

How this works:
Drivers and key kernel infrastructure, for example the dmesg buffer
(the ftrace buffer is still work in progress and not part of this
series), register themselves into pstore with a simple function call
taking a name, a pointer to the area, and a size.
That is all, and the patch labeled "EXAMPLE" shows a driver doing
exactly that: registering its own dev struct, with string markers
that help identify the data later, once it is retrieved. A rough
sketch of what such a registration could look like is shown below.
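
The snippet below only illustrates the driver-side call; the function
name pstore_register_core_area() and its exact signature are my
assumptions for this sketch, the real API is the one introduced in
the "pstore: implement core area registration" patch:

#include <linux/platform_device.h>
#include <linux/pstore.h>

struct my_drv_state {
	char marker[8];		/* e.g. "MYDRV>>", to spot the blob in a dump */
	u32 last_irq_status;
	u32 error_count;
};

static struct my_drv_state my_state = { .marker = "MYDRV>>" };

static int my_drv_probe(struct platform_device *pdev)
{
	/*
	 * Mark the live state struct as a core area: no copy is made,
	 * pstore only records the name, the address and the size.
	 * (Function name is hypothetical, see the note above.)
	 */
	pstore_register_core_area("my_drv_state", &my_state,
				  sizeof(my_state));
	return 0;
}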
Further, pstore and pstore/zone keep track of all registered areas
inside the intermediary back-end. There is no copy of the data.
Pstore requires a new type of back-end, named directly mapped
(because there is no copy, and each zone directly maps the incoming
buffer). If such a back-end is available, pstore then calls the
back-end to register the area.
The back-end itself is free to do whatever it wishes with the
received area. It can (possibly) make a copy to store the data, it
can use a timer to check the area periodically, or it can store the
pointers inside a table of contents (which is what the qcom smem
back-end does); one possible layout for such a table of contents is
sketched after this paragraph.
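
Purely illustrative, not the actual format used by this series: one
way a directly mapped back-end could record registered areas in a
shared table of contents instead of copying the data. The structure
layout, field names and the helper below are assumptions:

#include <linux/io.h>
#include <linux/string.h>
#include <linux/types.h>

struct dm_toc_entry {
	char	name[16];	/* name passed at registration time */
	u64	phys_addr;	/* physical address of the live buffer */
	u64	size;		/* length in bytes */
	u32	valid;		/* set once the slot is populated */
} __packed;

/* Registering an area then amounts to filling the next free slot. */
static void dm_backend_register(struct dm_toc_entry *toc, int slot,
				const char *name, void *area, size_t len)
{
	strscpy(toc[slot].name, name, sizeof(toc[slot].name));
	toc[slot].phys_addr = virt_to_phys(area);
	toc[slot].size = len;
	toc[slot].valid = 1;
}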
The qcom smem driver registers itself into pstore as a new type of
back-end called shared memory. It is very similar to pstore/blk,
but much simplified, since it only has to deal with shared memory
storage.
The area metadata is stored in shared memory managed by the
qcom_smem driver. It is called shared memory because it is shared
between the main processor running the kernel and additional
secondary remote processors running firmware.
In this qcom smem back-end case, the firmware runs when there is a
panic event or a watchdog timeout, and uses the table of contents to
retrieve useful kernel debug data; a rough sketch of that
firmware-side walk is shown below.
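
Again only an illustration of the agent-side view, not code from this
series: on a panic or watchdog event the external agent could walk
the (hypothetical) dm_toc_entry table sketched above and save every
registered area; save_region() is a stand-in for whatever transport
the agent actually uses:

/* Illustrative agent-side code, outside the kernel. */
extern void save_region(const char *name, u64 phys_addr, u64 size);

static void collect_core_areas(const struct dm_toc_entry *toc,
			       int nentries)
{
	int i;

	for (i = 0; i < nentries; i++) {
		if (!toc[i].valid)
			continue;
		/*
		 * phys_addr/size point at the live kernel buffer; the
		 * agent copies it out to persistent storage or a host
		 * debugger.
		 */
		save_region(toc[i].name, toc[i].phys_addr, toc[i].size);
	}
}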

The advantages of this proposed idea: no kernel panic handler
intervention is needed. Everything is marked and prepared ahead of
time, before there is an actual crash. There is no associated
overhead: the areas are simply marked and metadata created, with no
extra buffers and no memory reserved for them. Because the live
areas themselves are marked, there is no risk of retrieving stale
data that was copied before the events leading to the crash; the
data retrieved is the actual data the kernel was working on.
Of course, this does not work if there is no external agent to parse
the areas and extract the useful data. However, where such
increasingly advanced hardware is available, kernel debugging has so
far had no way of taking advantage of it. As the hardware evolves,
so should the kernel.

There is another aspect to underline: the directly mapped zones
change the semantics of pstore when they are used. Pstore relies on
the user being able to read data from a previous kernel run once the
system has rebooted. This does not hold for directly mapped zones
unless the JTAG/firmware agent copies the data into a persistent
area and makes it available after reboot (which is not impossible).

I would like to thank everyone reviewing and chipping in on this series.

The back-end implementation for qcom_smem is based on the minidump
patch series and driver written by Mukesh Ojha, thanks:
https://lore.kernel.org/lkml/20240131110837.14218-1-quic_mojha@quicinc.com/

Eugen

Eugen Hristev (10):
  pstore/zone: move pstore_device_info into zone header
  pstore/smem: add new pstore/smem type of pstore
  pstore/zone: introduce directly mapped zones
  qcom: smem: add pstore smem backend
  pstore: implement core area registration
  qcom: smem: enable smem pstore backend
  printk: export symbols for buffer address and length functions
  pstore: register kmsg into directly mapped zones if available
  devcoredump: add devcd_{un}register_core_area API
  rng: qcom_rng: EXAMPLE: registering dev structure

 drivers/base/devcoredump.c     |  13 ++
 drivers/crypto/qcom-rng.c      |  11 ++
 drivers/soc/qcom/Kconfig       |   9 +
 drivers/soc/qcom/Makefile      |   6 +-
 drivers/soc/qcom/smem.c        |  10 ++
 drivers/soc/qcom/smem_md.c     | 306 +++++++++++++++++++++++++++++++++
 drivers/soc/qcom/smem_pstore.c | 112 ++++++++++++
 fs/pstore/Kconfig              |  13 ++
 fs/pstore/Makefile             |   3 +
 fs/pstore/platform.c           |  84 +++++++++
 fs/pstore/smem.c               | 115 +++++++++++++
 fs/pstore/zone.c               | 122 +++++++++++--
 include/linux/devcoredump.h    |   3 +
 include/linux/pstore.h         |  20 +++
 include/linux/pstore_blk.h     |  14 --
 include/linux/pstore_smem.h    |   9 +
 include/linux/pstore_zone.h    |  17 ++
 include/linux/soc/qcom/smem.h  |  43 +++++
 kernel/printk/printk.c         |   2 +
 19 files changed, 885 insertions(+), 27 deletions(-)
 create mode 100644 drivers/soc/qcom/smem_md.c
 create mode 100644 drivers/soc/qcom/smem_pstore.c
 create mode 100644 fs/pstore/smem.c
 create mode 100644 include/linux/pstore_smem.h

Comments

Johannes Berg Feb. 17, 2025, 10:23 a.m. UTC | #1
On Mon, 2025-02-17 at 12:16 +0200, Eugen Hristev wrote:

> This series comes as an RFC proposed solution to enhance pstore and
> devcoredump with the following functionality:

...

> This patch series attempts to solve this by reusing existing
> infrastructure in pstore and devcoredump, and provide a copy-free

...

You mention devcoredump multiple times, but it almost seems like you
don't even know what devcoredump does? I don't see how there's any
relation at all, and the code added to it seems to have no relation to
the actual functionality of devcoredump either?

johannes
Eugen Hristev Feb. 17, 2025, 10:44 a.m. UTC | #2
On 2/17/25 12:23, Johannes Berg wrote:
> On Mon, 2025-02-17 at 12:16 +0200, Eugen Hristev wrote:
> 
>> This series comes as an RFC proposed solution to enhance pstore and
>> devcoredump with the following functionality:
> 
> ...
> 
>> This patch series attempts to solve this by reusing existing
>> infrastructure in pstore and devcoredump, and provide a copy-free
> 
> ...
> 
> You mention devcoredump multiple times, but it almost seems like you
> don't even know what devcoredump does? I don't see how there's any
> relation at all, and the code added to it seems to have no relation to
> the actual functionality of devcoredump either?

At this moment going through devcoredump is not something that impacts
the idea of the implementation.
The whole reason of going through it (because things work without it as
well), is to see whether this has any kind of impact or not, and if
there is any kind of fit/reason of going through it.
Devcoredump is involved because the whole core registration is similar
to a core area that devcoredump could use.
For example, would it be interesting to have a handler going through all
devices, and have the dump areas already registered ?
Meaning, when there is a request to generate a core dump, one could
directly dump this area instead of calling back the driver, and provide
that to the userspace instead of the driver calling the dev_coredumpv by
its own.

> 
> johannes
Johannes Berg Feb. 17, 2025, 11:19 a.m. UTC | #3
On Mon, 2025-02-17 at 12:44 +0200, Eugen Hristev wrote:
> 
> At this moment going through devcoredump is not something that impacts
> the idea of the implementation.

Yeah. I don't think it _should_ go through it at all.

> The whole reason of going through it (because things work without it as
> well), is to see whether this has any kind of impact or not, and if
> there is any kind of fit/reason of going through it.

So it's just a trial balloon?

> Devcoredump is involved because the whole core registration is similar
> to a core area that devcoredump could use.

Yeah but ... 

> For example, would it be interesting to have a handler going through all
> devices, and have the dump areas already registered ?
> Meaning, when there is a request to generate a core dump, one could
> directly dump this area instead of calling back the driver, and provide
> that to the userspace instead of the driver calling the dev_coredumpv by
> its own.

I'll be blunt ... so you _really_ haven't understood devcoredump then?

It's really not doing this. It's not meant to do this... It's intended
to dump data from inside the device when the device crashes.

Please remove devcoredump involvement from this series.

johannes
Eugen Hristev Feb. 17, 2025, 11:39 a.m. UTC | #4
On 2/17/25 13:19, Johannes Berg wrote:
> On Mon, 2025-02-17 at 12:44 +0200, Eugen Hristev wrote:
>>
>> At this moment going through devcoredump is not something that impacts
>> the idea of the implementation.
> 
> Yeah. I don't think it _should_ go through it at all.
> 
>> The whole reason of going through it (because things work without it as
>> well), is to see whether this has any kind of impact or not, and if
>> there is any kind of fit/reason of going through it.
> 
> So it's just a trial balloon?

Yes, that's why it's marked as an RFC.
> 
>> Devcoredump is involved because the whole core registration is similar
>> to a core area that devcoredump could use.
> 
> Yeah but ... 
> 
>> For example, would it be interesting to have a handler going through all
>> devices, and have the dump areas already registered ?
>> Meaning, when there is a request to generate a core dump, one could
>> directly dump this area instead of calling back the driver, and provide
>> that to the userspace instead of the driver calling the dev_coredumpv by
>> its own.
> 
> I'll be blunt ... so you _really_ haven't understood devcoredump then?
> 
> It's really not doing this. It's not meant to do this... It's intended
> to dump data from inside the device when the device crashes.
> 
> Please remove devcoredump involvement from this series.

Thank you for your feedback.

> 
> johannes
Johannes Berg Feb. 17, 2025, 11:43 a.m. UTC | #5
On Mon, 2025-02-17 at 13:39 +0200, Eugen Hristev wrote:
> 
> On 2/17/25 13:19, Johannes Berg wrote:
> > On Mon, 2025-02-17 at 12:44 +0200, Eugen Hristev wrote:
> > > 
> > > At this moment going through devcoredump is not something that impacts
> > > the idea of the implementation.
> > 
> > Yeah. I don't think it _should_ go through it at all.
> > 
> > > The whole reason of going through it (because things work without it as
> > > well), is to see whether this has any kind of impact or not, and if
> > > there is any kind of fit/reason of going through it.
> > 
> > So it's just a trial balloon?
> 
> Yes, that's why it's marked as an RFC.
> 

(IMHO) that doesn't make it acceptable to just randomly go modify code
you don't understand the purpose of, certainly not without actually
stating such. But anyway, I'll leave it at that.

johannes