Message ID | d4556268-8274-4089-949f-3b97d67793c7@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | remoteproc: zynqmp: Add coredump support |
Hi Leonard,

I have queued patches for this driver that will break this patch. Please
re-submit when v6.9-rc1 is out and rproc-next has been updated, which should be
around the middle of next week.

Thanks,
Mathieu

On Sat, Mar 16, 2024 at 08:16:42PM +0200, Leonard Crestez wrote:
> Supporting remoteproc coredump requires the platform-specific driver to
> register coredump segments to be dumped. Do this by calling
> rproc_coredump_add_segment for every carveout.
>
> Also call rproc_coredump_set_elf_info when the rproc is created. If the
> ELFCLASS parameter is not provided then coredump fails with an error.
> Other drivers seem to pass EM_NONE for the machine argument but for me
> this shows a warning in gdb. Pass EM_ARM because this is an ARM R5.
>
> Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
> ---
>
> Tests were done by triggering a deliberate crash using remoteproc
> debugfs: echo 2 > /sys/kernel/debug/remoteproc/remoteproc0/crash
>
> This was tested using RPU apps which use RAM for everything so TCM dump
> was not verified. The freertos-gdb script package showed credible data:
>
> https://github.com/espressif/freertos-gdb
>
> The R5 cache is not flushed so RAM might be out of date which is
> actually very bad because information most relevant to determining the
> cause of a crash is lost. Possible workaround would be to flush caches
> in some sort of R5 crash handler? I don't think Linux can do anything
> about this limitation.
>
> The generated coredump doesn't contain registers, this seems to be a
> limitation shared with other rproc coredumps. It's not clear how the apu
> could access rpu registers on zynqmp, my only idea would be to use the
> coresight dap but that sounds difficult.
>
> ---
>  drivers/remoteproc/xlnx_r5_remoteproc.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
> index 4395edea9a64..cfbd97b89c26 100644
> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> @@ -484,10 +484,11 @@ static int add_mem_regions_carveout(struct rproc *rproc)
>  			of_node_put(it.node);
>  			return -ENOMEM;
>  		}
>
>  		rproc_add_carveout(rproc, rproc_mem);
> +		rproc_coredump_add_segment(rproc, rmem->base, rmem->size);
>
>  		dev_dbg(&rproc->dev, "reserved mem carveout %s addr=%llx, size=0x%llx",
>  			it.node->name, rmem->base, rmem->size);
>  		i++;
>  	}
> @@ -595,10 +596,11 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
>  			zynqmp_pm_release_node(pm_domain_id);
>  			goto release_tcm_split;
>  		}
>
>  		rproc_add_carveout(rproc, rproc_mem);
> +		rproc_coredump_add_segment(rproc, da, bank_size);
>  	}
>
>  	return 0;
>
>  release_tcm_split:
> @@ -674,10 +676,11 @@ static int add_tcm_carveout_lockstep_mode(struct rproc *rproc)
>  			goto release_tcm_lockstep;
>  		}
>
>  		/* If registration is success, add carveouts */
>  		rproc_add_carveout(rproc, rproc_mem);
> +		rproc_coredump_add_segment(rproc, da, bank_size);
>
>  		dev_dbg(dev, "TCM carveout lockstep mode %s addr=0x%llx, da=0x%x, size=0x%lx",
>  			bank_name, bank_addr, da, bank_size);
>  	}
>
> @@ -851,10 +854,12 @@ static struct zynqmp_r5_core *zynqmp_r5_add_rproc_core(struct device *cdev)
>  	if (!r5_rproc) {
>  		dev_err(cdev, "failed to allocate memory for rproc instance\n");
>  		return ERR_PTR(-ENOMEM);
>  	}
>
> +	rproc_coredump_set_elf_info(r5_rproc, ELFCLASS32, EM_ARM);
> +
>  	r5_rproc->auto_boot = false;
>  	r5_core = r5_rproc->priv;
>  	r5_core->dev = cdev;
>  	r5_core->np = dev_of_node(cdev);
>  	if (!r5_core->np) {
> --
> 2.34.1
On 3/18/24 18:52, Mathieu Poirier wrote:
> Hi Leonard,
>
> I have queued patches for this driver that will break this patch. Please
> re-submit when v6.9-rc1 is out and rproc-next has been updated, which should be
> around the middle of next week.

Hello,

It's been a while - v6.9-rc1 is out and rproc-next has been rebased on top of
it. But the coredump patch still applies? I expected some unrelated
xlnx_r5_remoteproc patches to cause conflicts but there's nothing there.

It seems to me that the patch can be applied as-is and no resend is required.
Am I missing something?

--
Regards,
Leonard

> On Sat, Mar 16, 2024 at 08:16:42PM +0200, Leonard Crestez wrote:
>> Supporting remoteproc coredump requires the platform-specific driver to
>> register coredump segments to be dumped. Do this by calling
>> rproc_coredump_add_segment for every carveout.
>>
>> Also call rproc_coredump_set_elf_info when the rproc is created. If the
>> ELFCLASS parameter is not provided then coredump fails with an error.
>> Other drivers seem to pass EM_NONE for the machine argument but for me
>> this shows a warning in gdb. Pass EM_ARM because this is an ARM R5.
>>
>> Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
>> ---
>>
>> Tests were done by triggering a deliberate crash using remoteproc
>> debugfs: echo 2 > /sys/kernel/debug/remoteproc/remoteproc0/crash
>>
>> This was tested using RPU apps which use RAM for everything so TCM dump
>> was not verified. The freertos-gdb script package showed credible data:
>>
>> https://github.com/espressif/freertos-gdb
>>
>> The R5 cache is not flushed so RAM might be out of date which is
>> actually very bad because information most relevant to determining the
>> cause of a crash is lost. Possible workaround would be to flush caches
>> in some sort of R5 crash handler? I don't think Linux can do anything
>> about this limitation.
>>
>> The generated coredump doesn't contain registers, this seems to be a
>> limitation shared with other rproc coredumps. It's not clear how the apu
>> could access rpu registers on zynqmp, my only idea would be to use the
>> coresight dap but that sounds difficult.
On Thu, Mar 28, 2024 at 10:17:13AM +0200, Leonard Crestez wrote:
> On 3/18/24 18:52, Mathieu Poirier wrote:
> > Hi Leonard,
> >
> > I have queued patches for this driver that will break this patch. Please
> > re-submit when v6.9-rc1 is out and rproc-next has been updated, which should be
> > around the middle of next week.
>
> Hello,
>
> It's been a while - v6.9-rc1 is out and rproc-next has been rebased on top of
> it. But the coredump patch still applies? I expected some unrelated
> xlnx_r5_remoteproc patches to cause conflicts but there's nothing there.
>
> It seems to me that the patch can be applied as-is and no resend is required.
> Am I missing something?
>

You're not missing anything. Back when I wrote my initial comment, Tanmay had
submitted patches to fix the way TCMs are initialized, which conflicted with
your patch. There were some last-minute modifications to Tanmay's patchset and
I ended up not applying it, leading us to where we are today.

Tanmay - please review and test this patch.

Thanks,
Mathieu

> --
> Regards,
> Leonard
>
> > On Sat, Mar 16, 2024 at 08:16:42PM +0200, Leonard Crestez wrote:
> >> Supporting remoteproc coredump requires the platform-specific driver to
> >> register coredump segments to be dumped. Do this by calling
> >> rproc_coredump_add_segment for every carveout.
> >>
> >> Also call rproc_coredump_set_elf_info when the rproc is created. If the
> >> ELFCLASS parameter is not provided then coredump fails with an error.
> >> Other drivers seem to pass EM_NONE for the machine argument but for me
> >> this shows a warning in gdb. Pass EM_ARM because this is an ARM R5.
> >>
> >> Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
> >> ---
> >>
> >> Tests were done by triggering a deliberate crash using remoteproc
> >> debugfs: echo 2 > /sys/kernel/debug/remoteproc/remoteproc0/crash
> >>
> >> This was tested using RPU apps which use RAM for everything so TCM dump
> >> was not verified. The freertos-gdb script package showed credible data:
> >>
> >> https://github.com/espressif/freertos-gdb
> >>
> >> The R5 cache is not flushed so RAM might be out of date which is
> >> actually very bad because information most relevant to determining the
> >> cause of a crash is lost. Possible workaround would be to flush caches
> >> in some sort of R5 crash handler? I don't think Linux can do anything
> >> about this limitation.
> >>
> >> The generated coredump doesn't contain registers, this seems to be a
> >> limitation shared with other rproc coredumps. It's not clear how the apu
> >> could access rpu registers on zynqmp, my only idea would be to use the
> >> coresight dap but that sounds difficult.
Hello,

Thanks for your patch. Patch looks good to me.
Please find some comments below.

On 3/16/24 1:16 PM, Leonard Crestez wrote:
> Supporting remoteproc coredump requires the platform-specific driver to
> register coredump segments to be dumped. Do this by calling
> rproc_coredump_add_segment for every carveout.
>
> Also call rproc_coredump_set_elf_info when the rproc is created. If the
> ELFCLASS parameter is not provided then coredump fails with an error.
> Other drivers seem to pass EM_NONE for the machine argument but for me
> this shows a warning in gdb. Pass EM_ARM because this is an ARM R5.
>
> Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
> ---
>
> Tests were done by triggering a deliberate crash using remoteproc
> debugfs: echo 2 > /sys/kernel/debug/remoteproc/remoteproc0/crash
>
> This was tested using RPU apps which use RAM for everything so TCM dump
> was not verified. The freertos-gdb script package showed credible data:
>
> https://github.com/espressif/freertos-gdb

Thanks for this testing.

>
> The R5 cache is not flushed so RAM might be out of date which is
> actually very bad because information most relevant to determining the
> cause of a crash is lost. Possible workaround would be to flush caches
> in some sort of R5 crash handler? I don't think Linux can do anything
> about this limitation.
>

Correct, Linux can't. One workaround is that R5 firmware can mark
required memory regions as non-cacheable in the MPU settings. This way
information loss can be avoided.

> The generated coredump doesn't contain registers, this seems to be a
> limitation shared with other rproc coredumps. It's not clear how the apu
> could access rpu registers on zynqmp, my only idea would be to use the
> coresight dap but that sounds difficult.

Linux doesn't really have access to R5 control registers due to security.
Instead, EEMI calls to the platform management controller are used to control
the R5. So an R5 control register dump shouldn't be needed.

Mathieu,
I am okay to merge this patch.

>
> ---
>  drivers/remoteproc/xlnx_r5_remoteproc.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
> index 4395edea9a64..cfbd97b89c26 100644
> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> @@ -484,10 +484,11 @@ static int add_mem_regions_carveout(struct rproc *rproc)
>  			of_node_put(it.node);
>  			return -ENOMEM;
>  		}
>
>  		rproc_add_carveout(rproc, rproc_mem);
> +		rproc_coredump_add_segment(rproc, rmem->base, rmem->size);
>
>  		dev_dbg(&rproc->dev, "reserved mem carveout %s addr=%llx, size=0x%llx",
>  			it.node->name, rmem->base, rmem->size);
>  		i++;
>  	}
> @@ -595,10 +596,11 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
>  			zynqmp_pm_release_node(pm_domain_id);
>  			goto release_tcm_split;
>  		}
>
>  		rproc_add_carveout(rproc, rproc_mem);
> +		rproc_coredump_add_segment(rproc, da, bank_size);
>  	}
>
>  	return 0;
>
>  release_tcm_split:
> @@ -674,10 +676,11 @@ static int add_tcm_carveout_lockstep_mode(struct rproc *rproc)
>  			goto release_tcm_lockstep;
>  		}
>
>  		/* If registration is success, add carveouts */
>  		rproc_add_carveout(rproc, rproc_mem);
> +		rproc_coredump_add_segment(rproc, da, bank_size);
>
>  		dev_dbg(dev, "TCM carveout lockstep mode %s addr=0x%llx, da=0x%x, size=0x%lx",
>  			bank_name, bank_addr, da, bank_size);
>  	}
>
> @@ -851,10 +854,12 @@ static struct zynqmp_r5_core *zynqmp_r5_add_rproc_core(struct device *cdev)
>  	if (!r5_rproc) {
>  		dev_err(cdev, "failed to allocate memory for rproc instance\n");
>  		return ERR_PTR(-ENOMEM);
>  	}
>
> +	rproc_coredump_set_elf_info(r5_rproc, ELFCLASS32, EM_ARM);
> +
>  	r5_rproc->auto_boot = false;
>  	r5_core = r5_rproc->priv;
>  	r5_core->dev = cdev;
>  	r5_core->np = dev_of_node(cdev);
>  	if (!r5_core->np) {
On 4/4/24 23:14, Tanmay Shah wrote:
> Hello,
>
> Thanks for your patch. Patch looks good to me.
> Please find some comments below.
>
> On 3/16/24 1:16 PM, Leonard Crestez wrote:
>> Supporting remoteproc coredump requires the platform-specific driver to
>> register coredump segments to be dumped. Do this by calling
>> rproc_coredump_add_segment for every carveout.
>>
>> Also call rproc_coredump_set_elf_info when the rproc is created. If the
>> ELFCLASS parameter is not provided then coredump fails with an error.
>> Other drivers seem to pass EM_NONE for the machine argument but for me
>> this shows a warning in gdb. Pass EM_ARM because this is an ARM R5.
>>
>> Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
>> ---
>>
>> Tests were done by triggering a deliberate crash using remoteproc
>> debugfs: echo 2 > /sys/kernel/debug/remoteproc/remoteproc0/crash
>>
>> The R5 cache is not flushed so RAM might be out of date which is
>> actually very bad because information most relevant to determining the
>> cause of a crash is lost. Possible workaround would be to flush caches
>> in some sort of R5 crash handler? I don't think Linux can do anything
>> about this limitation.
>>
>
> Correct, Linux can't. One workaround is that R5 firmware can mark
> required memory regions as non-cacheable in the MPU settings. This way
> information loss can be avoided.

The solution I ended up with is to add cache flushing in an R5-side
crash handler.

>> The generated coredump doesn't contain registers, this seems to be a
>> limitation shared with other rproc coredumps. It's not clear how the apu
>> could access rpu registers on zynqmp, my only idea would be to use the
>> coresight dap but that sounds difficult.
>
> Linux doesn't really have access to R5 control registers due to security.
> Instead, EEMI calls to the platform management controller are used to control
> the R5. So an R5 control register dump shouldn't be needed.
>
> Mathieu,
> I am okay to merge this patch.

Thanks for the review.
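The R5-side crash handler itself is not shown in this thread. A minimal sketch of the idea, under stated assumptions: the firmware stores a small crash record in RAM covered by one of the registered coredump segments, then writes the data cache back so Linux sees current contents when it collects the dump. `struct crash_record`, `CRASH_MAGIC`, `record_crash` and the `dcache_flush_fn` hook are all hypothetical names. On a Xilinx standalone BSP the hook would presumably wrap something like `Xil_DCacheFlush()`, but that mapping is an assumption to check against your BSP. `counting_flush` exists only so the sketch can be built and exercised on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical crash record, placed in RAM that a coredump segment covers. */
struct crash_record {
	uint32_t magic;   /* written last, marks the record as complete */
	uint32_t reason;  /* firmware-defined fault code */
	uint32_t pc;      /* faulting program counter, if known */
};

#define CRASH_MAGIC 0x43525348u /* ASCII "CRSH" */

/* Hook that writes dirty data cache lines back to RAM (assumption). */
typedef void (*dcache_flush_fn)(void);

/* Fill in the record, then flush so the data actually reaches RAM. */
static void record_crash(volatile struct crash_record *rec, uint32_t reason,
			 uint32_t pc, dcache_flush_fn flush)
{
	/* Programming errors only; build with NDEBUG on the target. */
	assert(rec != NULL && flush != NULL);

	rec->reason = reason;
	rec->pc = pc;
	rec->magic = CRASH_MAGIC;

	/* Without this, the record may still sit in the R5 data cache. */
	flush();
}

/* Host-side stand-in for the real cache flush, used for off-target builds. */
static unsigned int flush_calls;

static void counting_flush(void)
{
	flush_calls++;
}
```

On the target, `record_crash()` would be called from the abort or fault handler with the real flush routine in place of `counting_flush`. Flushing the whole data cache is the blunt option; it is chosen here because, after a crash, it is hard to know which lines matter.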
On Thu, Apr 04, 2024 at 03:14:48PM -0500, Tanmay Shah wrote:
> Hello,
>
> Thanks for your patch. Patch looks good to me.
> Please find some comments below.
>
> On 3/16/24 1:16 PM, Leonard Crestez wrote:
> > Supporting remoteproc coredump requires the platform-specific driver to
> > register coredump segments to be dumped. Do this by calling
> > rproc_coredump_add_segment for every carveout.
> >
> > Also call rproc_coredump_set_elf_info when the rproc is created. If the
> > ELFCLASS parameter is not provided then coredump fails with an error.
> > Other drivers seem to pass EM_NONE for the machine argument but for me
> > this shows a warning in gdb. Pass EM_ARM because this is an ARM R5.
> >
> > Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
> > ---
> >
> > Tests were done by triggering a deliberate crash using remoteproc
> > debugfs: echo 2 > /sys/kernel/debug/remoteproc/remoteproc0/crash
> >
> > This was tested using RPU apps which use RAM for everything so TCM dump
> > was not verified. The freertos-gdb script package showed credible data:
> >
> > https://github.com/espressif/freertos-gdb
>
> Thanks for this testing.
>
> >
> > The R5 cache is not flushed so RAM might be out of date which is
> > actually very bad because information most relevant to determining the
> > cause of a crash is lost. Possible workaround would be to flush caches
> > in some sort of R5 crash handler? I don't think Linux can do anything
> > about this limitation.
> >
>
> Correct, Linux can't. One workaround is that R5 firmware can mark
> required memory regions as non-cacheable in the MPU settings. This way
> information loss can be avoided.
>
> > The generated coredump doesn't contain registers, this seems to be a
> > limitation shared with other rproc coredumps. It's not clear how the apu
> > could access rpu registers on zynqmp, my only idea would be to use the
> > coresight dap but that sounds difficult.
>
> Linux doesn't really have access to R5 control registers due to security.
> Instead, EEMI calls to the platform management controller are used to control
> the R5. So an R5 control register dump shouldn't be needed.
>
> Mathieu,
> I am okay to merge this patch.
>

Applied.

Thanks,
Mathieu

> >
> > ---
> >  drivers/remoteproc/xlnx_r5_remoteproc.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
> > index 4395edea9a64..cfbd97b89c26 100644
> > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> > @@ -484,10 +484,11 @@ static int add_mem_regions_carveout(struct rproc *rproc)
> >  			of_node_put(it.node);
> >  			return -ENOMEM;
> >  		}
> >
> >  		rproc_add_carveout(rproc, rproc_mem);
> > +		rproc_coredump_add_segment(rproc, rmem->base, rmem->size);
> >
> >  		dev_dbg(&rproc->dev, "reserved mem carveout %s addr=%llx, size=0x%llx",
> >  			it.node->name, rmem->base, rmem->size);
> >  		i++;
> >  	}
> > @@ -595,10 +596,11 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
> >  			zynqmp_pm_release_node(pm_domain_id);
> >  			goto release_tcm_split;
> >  		}
> >
> >  		rproc_add_carveout(rproc, rproc_mem);
> > +		rproc_coredump_add_segment(rproc, da, bank_size);
> >  	}
> >
> >  	return 0;
> >
> >  release_tcm_split:
> > @@ -674,10 +676,11 @@ static int add_tcm_carveout_lockstep_mode(struct rproc *rproc)
> >  			goto release_tcm_lockstep;
> >  		}
> >
> >  		/* If registration is success, add carveouts */
> >  		rproc_add_carveout(rproc, rproc_mem);
> > +		rproc_coredump_add_segment(rproc, da, bank_size);
> >
> >  		dev_dbg(dev, "TCM carveout lockstep mode %s addr=0x%llx, da=0x%x, size=0x%lx",
> >  			bank_name, bank_addr, da, bank_size);
> >  	}
> >
> > @@ -851,10 +854,12 @@ static struct zynqmp_r5_core *zynqmp_r5_add_rproc_core(struct device *cdev)
> >  	if (!r5_rproc) {
> >  		dev_err(cdev, "failed to allocate memory for rproc instance\n");
> >  		return ERR_PTR(-ENOMEM);
> >  	}
> >
> > +	rproc_coredump_set_elf_info(r5_rproc, ELFCLASS32, EM_ARM);
> > +
> >  	r5_rproc->auto_boot = false;
> >  	r5_core = r5_rproc->priv;
> >  	r5_core->dev = cdev;
> >  	r5_core->np = dev_of_node(cdev);
> >  	if (!r5_core->np) {
>
diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 4395edea9a64..cfbd97b89c26 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -484,10 +484,11 @@ static int add_mem_regions_carveout(struct rproc *rproc)
 			of_node_put(it.node);
 			return -ENOMEM;
 		}
 
 		rproc_add_carveout(rproc, rproc_mem);
+		rproc_coredump_add_segment(rproc, rmem->base, rmem->size);
 
 		dev_dbg(&rproc->dev, "reserved mem carveout %s addr=%llx, size=0x%llx",
 			it.node->name, rmem->base, rmem->size);
 		i++;
 	}
@@ -595,10 +596,11 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
 			zynqmp_pm_release_node(pm_domain_id);
 			goto release_tcm_split;
 		}
 
 		rproc_add_carveout(rproc, rproc_mem);
+		rproc_coredump_add_segment(rproc, da, bank_size);
 	}
 
 	return 0;
 
 release_tcm_split:
@@ -674,10 +676,11 @@ static int add_tcm_carveout_lockstep_mode(struct rproc *rproc)
 			goto release_tcm_lockstep;
 		}
 
 		/* If registration is success, add carveouts */
 		rproc_add_carveout(rproc, rproc_mem);
+		rproc_coredump_add_segment(rproc, da, bank_size);
 
 		dev_dbg(dev, "TCM carveout lockstep mode %s addr=0x%llx, da=0x%x, size=0x%lx",
 			bank_name, bank_addr, da, bank_size);
 	}
 
@@ -851,10 +854,12 @@ static struct zynqmp_r5_core *zynqmp_r5_add_rproc_core(struct device *cdev)
 	if (!r5_rproc) {
 		dev_err(cdev, "failed to allocate memory for rproc instance\n");
 		return ERR_PTR(-ENOMEM);
 	}
 
+	rproc_coredump_set_elf_info(r5_rproc, ELFCLASS32, EM_ARM);
+
 	r5_rproc->auto_boot = false;
 	r5_core = r5_rproc->priv;
 	r5_core->dev = cdev;
 	r5_core->np = dev_of_node(cdev);
 	if (!r5_core->np) {
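To see what the added rproc_coredump_add_segment() calls change from the consumer's side: as far as I can tell from the remoteproc core, each registered segment ends up as one PT_LOAD program header in the ELF coredump. A rough userspace sketch that counts those headers in a dump already read into memory follows. `count_load_segments` is a hypothetical helper, not part of any kernel or libc API, and it assumes a 32-bit dump whose byte order matches the host.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the PT_LOAD program headers in a 32-bit ELF image held in memory.
 * Returns the count, or -1 if the buffer is too small or malformed.
 * Assumes the image's byte order matches the host's.
 */
static int count_load_segments(const unsigned char *buf, size_t len)
{
	Elf32_Ehdr ehdr;
	int count = 0;

	assert(buf != NULL);

	if (len < sizeof(ehdr))
		return -1;
	memcpy(&ehdr, buf, sizeof(ehdr));

	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
		return -1;
	if (ehdr.e_phentsize != sizeof(Elf32_Phdr))
		return -1;

	for (unsigned int i = 0; i < ehdr.e_phnum; i++) {
		Elf32_Phdr phdr;
		size_t off = (size_t)ehdr.e_phoff + (size_t)i * sizeof(phdr);

		/* Reject program headers that run past the end of the buffer. */
		if (off > len || len - off < sizeof(phdr))
			return -1;
		memcpy(&phdr, buf + off, sizeof(phdr));

		if (phdr.p_type == PT_LOAD)
			count++;
	}

	return count;
}
```

With this patch applied, the count would be expected to match the number of carveouts the driver registered (reserved-memory regions plus TCM banks), though that expectation is not verified here.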
Supporting remoteproc coredump requires the platform-specific driver to
register coredump segments to be dumped. Do this by calling
rproc_coredump_add_segment for every carveout.

Also call rproc_coredump_set_elf_info when the rproc is created. If the
ELFCLASS parameter is not provided then coredump fails with an error.
Other drivers seem to pass EM_NONE for the machine argument but for me
this shows a warning in gdb. Pass EM_ARM because this is an ARM R5.

Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---

Tests were done by triggering a deliberate crash using remoteproc
debugfs: echo 2 > /sys/kernel/debug/remoteproc/remoteproc0/crash

This was tested using RPU apps which use RAM for everything so TCM dump
was not verified. The freertos-gdb script package showed credible data:

https://github.com/espressif/freertos-gdb

The R5 cache is not flushed so RAM might be out of date which is
actually very bad because information most relevant to determining the
cause of a crash is lost. Possible workaround would be to flush caches
in some sort of R5 crash handler? I don't think Linux can do anything
about this limitation.

The generated coredump doesn't contain registers, this seems to be a
limitation shared with other rproc coredumps. It's not clear how the apu
could access rpu registers on zynqmp, my only idea would be to use the
coresight dap but that sounds difficult.

---
 drivers/remoteproc/xlnx_r5_remoteproc.c | 5 +++++
 1 file changed, 5 insertions(+)
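
The ELFCLASS32 / EM_ARM choice described above can be checked on a collected dump by reading its ELF header. A small userspace sketch, assuming the first bytes of the dump are already in a buffer and that its byte order matches the host; `coredump_header_is_arm32` is a hypothetical helper name, not an existing API.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if buf starts with an ELF header whose class is ELFCLASS32 and
 * whose machine is EM_ARM, 0 otherwise (including short or non-ELF input).
 * Assumes the header's byte order matches the host's.
 */
static int coredump_header_is_arm32(const unsigned char *buf, size_t len)
{
	Elf32_Ehdr ehdr;

	assert(buf != NULL);

	if (len < sizeof(ehdr))
		return 0;
	memcpy(&ehdr, buf, sizeof(ehdr));

	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
		return 0;

	return ehdr.e_ident[EI_CLASS] == ELFCLASS32 &&
	       ehdr.e_machine == EM_ARM;
}
```

A header carrying EM_NONE, which the commit message says other drivers pass, would make this return 0. That is the same field gdb appears to be complaining about in the warning mentioned above.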