diff mbox

[v2,0/6] Add basic Minidump kernel driver support

Message ID 1679491817-2498-1-git-send-email-quic_mojha@quicinc.com (mailing list archive)
State Changes Requested
Headers show

Commit Message

Mukesh Ojha March 22, 2023, 1:30 p.m. UTC
Minidump is a best effort mechanism to collect useful and predefined data
for first level of debugging on end user devices running on Qualcomm SoCs.
It is built on the premise that System on Chip (SoC) or subsystem part of
SoC crashes, due to a range of hardware and software bugs. Hence, the
ability to collect accurate data is only a best-effort. The data collected
could be invalid or corrupted, data collection itself could fail, and so on.

Qualcomm devices in engineering mode provides a mechanism for generating
full system ramdumps for post mortem debugging. But in some cases it's
however not feasible to capture the entire content of RAM. The minidump
mechanism provides the means for selecting which snippets should be
included in the ramdump.

The core of minidump feature is part of Qualcomm's boot firmware code.
It initializes shared memory (SMEM), which is a part of DDR and
allocates a small section of SMEM to minidump table i.e also called
global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
their own table of segments to be included in the minidump and all get
their reference from G-ToC. Each segment/region has some details like
name, physical address and it's size etc. and it could be anywhere
scattered in the DDR.

Existing upstream Qualcomm remoteproc driver[1] already supports minidump
feature for remoteproc instances like ADSP, MODEM, ... where predefined
selective segments of subsystem region can be dumped as part of
coredump collection which generates smaller size artifacts compared to
complete coredump of subsystem on crash.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142

In addition to managing and querying the APSS minidump description,
the Linux driver maintains a ELF header in a segment. This segment
gets updated with section/program header whenever a new entry gets
registered.

Patch 1/6 is very trivial change.
Patch 2/6 moves the minidump specific data structure and macro to
 qcom_minidump.h so that (4/6) minidump driver can use.
Patch 3/6 documents qualcomm minidump guide for users.
Patch 4/6 implements qualcomm minidump kernel driver and exports
 symbol which other minidump kernel client can use.
Patch 5/6 enables the qualcomm minidump driver.
Patch 6/6 Use the exported symbol from minidump driver in qcom_common
 for querying minidump descriptor for a subsystem.

Testing of the patches has been done on sm8450 target with the help
of out of tree patch which helps to set the download mode and storage
type and to warm reset the device.

Download mode setting patches are floating here,
https://lore.kernel.org/lkml/1679070482-8391-1-git-send-email-quic_mojha@quicinc.com/

Default storage type is set to via USB, so minidump would be
downloaded with the help of x86_64 machine running PCAT attached
to Qualcomm device which has backed minidump boot firmware
support(more can be found patch 3/6)

Below patch [1] is to warm reset Qualcomm device which has upstream qcom
watchdog driver support.

After applying all patches, we can boot the device and can execute
following command.

echo mini > /sys/module/qcom_scm/parameters/download_mode
echo c > /proc/sysrq-trigger

This will make the device go to download mode and collect the
minidump on to the attached x86 machine running the Qualcomm
PCAT tool.

We will see a bunch of predefined registered region as binary
blobs starts with md_*. A sample client example to dump a linux
region has been given in 3/6.

[1]
--------------------------->8-------------------------------------

commit f1124ccebd47550b4c9627aa162d9cdceba2b76f
Author: Mukesh Ojha <quic_mojha@quicinc.com>
Date:   Thu Mar 16 14:08:35 2023 +0530

    do not merge: watchdog bite on panic

    Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>

--------------------------------------------------------------------------

Changes in v2:
 - Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
 - Addressed comments made by [srinivas.kandagatla]
 - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
   region in minidump.
 - Fixed issue reported by kernel test robot.


Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/

Mukesh Ojha (6):
  remoteproc: qcom: Expand MD_* as MINIDUMP_*
  remoteproc: qcom: Move minidump specific data to qcom_minidump.h
  docs: qcom: Add qualcomm minidump guide
  soc: qcom: Add Qualcomm minidump kernel driver
  arm64: defconfig: Enable Qualcomm minidump driver
  remoterproc: qcom: refactor to leverage exported minidump symbol

 Documentation/admin-guide/qcom_minidump.rst | 240 +++++++++++++
 arch/arm64/configs/defconfig                |   1 +
 drivers/remoteproc/qcom_common.c            |  75 +---
 drivers/soc/qcom/Kconfig                    |  14 +
 drivers/soc/qcom/Makefile                   |   1 +
 drivers/soc/qcom/qcom_minidump.c            | 537 ++++++++++++++++++++++++++++
 include/soc/qcom/minidump.h                 |  40 +++
 include/soc/qcom/qcom_minidump.h            |  88 +++++
 8 files changed, 927 insertions(+), 69 deletions(-)
 create mode 100644 Documentation/admin-guide/qcom_minidump.rst
 create mode 100644 drivers/soc/qcom/qcom_minidump.c
 create mode 100644 include/soc/qcom/minidump.h
 create mode 100644 include/soc/qcom/qcom_minidump.h

Comments

Mukesh Ojha April 3, 2023, 4:25 p.m. UTC | #1
Gentle ping;

-Mukesh

On 3/22/2023 7:00 PM, Mukesh Ojha wrote:
> Minidump is a best effort mechanism to collect useful and predefined data
> for first level of debugging on end user devices running on Qualcomm SoCs.
> It is built on the premise that System on Chip (SoC) or subsystem part of
> SoC crashes, due to a range of hardware and software bugs. Hence, the
> ability to collect accurate data is only a best-effort. The data collected
> could be invalid or corrupted, data collection itself could fail, and so on.
> 
> Qualcomm devices in engineering mode provides a mechanism for generating
> full system ramdumps for post mortem debugging. But in some cases it's
> however not feasible to capture the entire content of RAM. The minidump
> mechanism provides the means for selecting which snippets should be
> included in the ramdump.
> 
> The core of minidump feature is part of Qualcomm's boot firmware code.
> It initializes shared memory (SMEM), which is a part of DDR and
> allocates a small section of SMEM to minidump table i.e also called
> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
> their own table of segments to be included in the minidump and all get
> their reference from G-ToC. Each segment/region has some details like
> name, physical address and it's size etc. and it could be anywhere
> scattered in the DDR.
> 
> Existing upstream Qualcomm remoteproc driver[1] already supports minidump
> feature for remoteproc instances like ADSP, MODEM, ... where predefined
> selective segments of subsystem region can be dumped as part of
> coredump collection which generates smaller size artifacts compared to
> complete coredump of subsystem on crash.
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142
> 
> In addition to managing and querying the APSS minidump description,
> the Linux driver maintains a ELF header in a segment. This segment
> gets updated with section/program header whenever a new entry gets
> registered.
> 
> Patch 1/6 is very trivial change.
> Patch 2/6 moves the minidump specific data structure and macro to
>   qcom_minidump.h so that (4/6) minidump driver can use.
> Patch 3/6 documents qualcomm minidump guide for users.
> Patch 4/6 implements qualcomm minidump kernel driver and exports
>   symbol which other minidump kernel client can use.
> Patch 5/6 enables the qualcomm minidump driver.
> Patch 6/6 Use the exported symbol from minidump driver in qcom_common
>   for querying minidump descriptor for a subsystem.
> 
> Testing of the patches has been done on sm8450 target with the help
> of out of tree patch which helps to set the download mode and storage
> type and to warm reset the device.
> 
> Download mode setting patches are floating here,
> https://lore.kernel.org/lkml/1679070482-8391-1-git-send-email-quic_mojha@quicinc.com/
> 
> Default storage type is set to via USB, so minidump would be
> downloaded with the help of x86_64 machine running PCAT attached
> to Qualcomm device which has backed minidump boot firmware
> support(more can be found patch 3/6)
> 
> Below patch [1] is to warm reset Qualcomm device which has upstream qcom
> watchdog driver support.
> 
> After applying all patches, we can boot the device and can execute
> following command.
> 
> echo mini > /sys/module/qcom_scm/parameters/download_mode
> echo c > /proc/sysrq-trigger
> 
> This will make the device go to download mode and collect the
> minidump on to the attached x86 machine running the Qualcomm
> PCAT tool.
> 
> We will see a bunch of predefined registered region as binary
> blobs starts with md_*. A sample client example to dump a linux
> region has been given in 3/6.
> 
> [1]
> --------------------------->8-------------------------------------
> 
> commit f1124ccebd47550b4c9627aa162d9cdceba2b76f
> Author: Mukesh Ojha <quic_mojha@quicinc.com>
> Date:   Thu Mar 16 14:08:35 2023 +0530
> 
>      do not merge: watchdog bite on panic
> 
>      Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
> 
> diff --git a/drivers/watchdog/qcom-wdt.c b/drivers/watchdog/qcom-wdt.c
> index 0d2209c..767e84a 100644
> --- a/drivers/watchdog/qcom-wdt.c
> +++ b/drivers/watchdog/qcom-wdt.c
> @@ -12,6 +12,7 @@
>   #include <linux/platform_device.h>
>   #include <linux/watchdog.h>
>   #include <linux/of_device.h>
> +#include <linux/panic.h>
> 
>   enum wdt_reg {
>          WDT_RST,
> @@ -114,12 +115,28 @@ static int qcom_wdt_set_pretimeout(struct watchdog_device *wdd,
>          return qcom_wdt_start(wdd);
>   }
> 
> +static void qcom_wdt_bite_on_panic(struct qcom_wdt *wdt)
> +{
> +       writel(0, wdt_addr(wdt, WDT_EN));
> +       writel(1, wdt_addr(wdt, WDT_BITE_TIME));
> +       writel(1, wdt_addr(wdt, WDT_RST));
> +       writel(QCOM_WDT_ENABLE, wdt_addr(wdt, WDT_EN));
> +
> +       wmb();
> +
> +       while(1)
> +               udelay(1);
> +}
> +
>   static int qcom_wdt_restart(struct watchdog_device *wdd, unsigned long action,
>                              void *data)
>   {
>          struct qcom_wdt *wdt = to_qcom_wdt(wdd);
>          u32 timeout;
> 
> +       if (in_panic)
> +               qcom_wdt_bite_on_panic(wdt);
> +
>          /*
>           * Trigger watchdog bite:
>           *    Setup BITE_TIME to be 128ms, and enable WDT.
> diff --git a/include/linux/panic.h b/include/linux/panic.h
> index 979b776..f913629 100644
> --- a/include/linux/panic.h
> +++ b/include/linux/panic.h
> @@ -22,6 +22,7 @@ extern int panic_on_oops;
>   extern int panic_on_unrecovered_nmi;
>   extern int panic_on_io_nmi;
>   extern int panic_on_warn;
> +extern bool in_panic;
> 
>   extern unsigned long panic_on_taint;
>   extern bool panic_on_taint_nousertaint;
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 487f5b0..714f7f4 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -65,6 +65,8 @@ static unsigned int warn_limit __read_mostly;
> 
>   int panic_timeout = CONFIG_PANIC_TIMEOUT;
>   EXPORT_SYMBOL_GPL(panic_timeout);
> +bool in_panic = false;
> +EXPORT_SYMBOL_GPL(in_panic);
> 
>   #define PANIC_PRINT_TASK_INFO          0x00000001
>   #define PANIC_PRINT_MEM_INFO           0x00000002
> @@ -261,6 +263,7 @@ void panic(const char *fmt, ...)
>          int old_cpu, this_cpu;
>          bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
> 
> +       in_panic = true;
>          if (panic_on_warn) {
>                  /*
>                   * This thread may hit another WARN() in the panic path.
> --------------------------------------------------------------------------
> 
> Changes in v2:
>   - Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
>   - Addressed comments made by [srinivas.kandagatla]
>   - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
>     region in minidump.
>   - Fixed issue reported by kernel test robot.
> 
> 
> Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/
> 
> Mukesh Ojha (6):
>    remoteproc: qcom: Expand MD_* as MINIDUMP_*
>    remoteproc: qcom: Move minidump specific data to qcom_minidump.h
>    docs: qcom: Add qualcomm minidump guide
>    soc: qcom: Add Qualcomm minidump kernel driver
>    arm64: defconfig: Enable Qualcomm minidump driver
>    remoterproc: qcom: refactor to leverage exported minidump symbol
> 
>   Documentation/admin-guide/qcom_minidump.rst | 240 +++++++++++++
>   arch/arm64/configs/defconfig                |   1 +
>   drivers/remoteproc/qcom_common.c            |  75 +---
>   drivers/soc/qcom/Kconfig                    |  14 +
>   drivers/soc/qcom/Makefile                   |   1 +
>   drivers/soc/qcom/qcom_minidump.c            | 537 ++++++++++++++++++++++++++++
>   include/soc/qcom/minidump.h                 |  40 +++
>   include/soc/qcom/qcom_minidump.h            |  88 +++++
>   8 files changed, 927 insertions(+), 69 deletions(-)
>   create mode 100644 Documentation/admin-guide/qcom_minidump.rst
>   create mode 100644 drivers/soc/qcom/qcom_minidump.c
>   create mode 100644 include/soc/qcom/minidump.h
>   create mode 100644 include/soc/qcom/qcom_minidump.h
>
diff mbox

Patch

diff --git a/drivers/watchdog/qcom-wdt.c b/drivers/watchdog/qcom-wdt.c
index 0d2209c..767e84a 100644
--- a/drivers/watchdog/qcom-wdt.c
+++ b/drivers/watchdog/qcom-wdt.c
@@ -12,6 +12,7 @@ 
 #include <linux/platform_device.h>
 #include <linux/watchdog.h>
 #include <linux/of_device.h>
+#include <linux/panic.h>

 enum wdt_reg {
        WDT_RST,
@@ -114,12 +115,28 @@  static int qcom_wdt_set_pretimeout(struct watchdog_device *wdd,
        return qcom_wdt_start(wdd);
 }

+static void qcom_wdt_bite_on_panic(struct qcom_wdt *wdt)
+{
+       writel(0, wdt_addr(wdt, WDT_EN));
+       writel(1, wdt_addr(wdt, WDT_BITE_TIME));
+       writel(1, wdt_addr(wdt, WDT_RST));
+       writel(QCOM_WDT_ENABLE, wdt_addr(wdt, WDT_EN));
+
+       wmb();
+
+       while(1)
+               udelay(1);
+}
+
 static int qcom_wdt_restart(struct watchdog_device *wdd, unsigned long action,
                            void *data)
 {
        struct qcom_wdt *wdt = to_qcom_wdt(wdd);
        u32 timeout;

+       if (in_panic)
+               qcom_wdt_bite_on_panic(wdt);
+
        /*
         * Trigger watchdog bite:
         *    Setup BITE_TIME to be 128ms, and enable WDT.
diff --git a/include/linux/panic.h b/include/linux/panic.h
index 979b776..f913629 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -22,6 +22,7 @@  extern int panic_on_oops;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
 extern int panic_on_warn;
+extern bool in_panic;

 extern unsigned long panic_on_taint;
 extern bool panic_on_taint_nousertaint;
diff --git a/kernel/panic.c b/kernel/panic.c
index 487f5b0..714f7f4 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -65,6 +65,8 @@  static unsigned int warn_limit __read_mostly;

 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
+bool in_panic = false;
+EXPORT_SYMBOL_GPL(in_panic);

 #define PANIC_PRINT_TASK_INFO          0x00000001
 #define PANIC_PRINT_MEM_INFO           0x00000002
@@ -261,6 +263,7 @@  void panic(const char *fmt, ...)
        int old_cpu, this_cpu;
        bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;

+       in_panic = true;
        if (panic_on_warn) {
                /*
                 * This thread may hit another WARN() in the panic path.