Message ID | 20200424200135.28825-10-mathieu.poirier@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | remoteproc: Add support for synchronisaton with rproc | expand |
Hi Mathieu, On 4/24/20 10:01 PM, Mathieu Poirier wrote: > Refactor function rproc_trigger_recovery() in order to avoid > reloading the firmware image when synchronising with a remote > processor rather than booting it. Also part of the process, > properly set the synchronisation flag in order to properly > recover the system. > > Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> > --- > drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------ > drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++ > 2 files changed, 43 insertions(+), 7 deletions(-) > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > index ef88d3e84bfb..3a84a38ba37b 100644 > --- a/drivers/remoteproc/remoteproc_core.c > +++ b/drivers/remoteproc/remoteproc_core.c > @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc) > */ > int rproc_trigger_recovery(struct rproc *rproc) > { > - const struct firmware *firmware_p; > + const struct firmware *firmware_p = NULL; > struct device *dev = &rproc->dev; > int ret; > > @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc) > /* generate coredump */ > rproc_coredump(rproc); > > - /* load firmware */ > - ret = request_firmware(&firmware_p, rproc->firmware, dev); > - if (ret < 0) { > - dev_err(dev, "request_firmware failed: %d\n", ret); > - goto unlock_mutex; > + /* load firmware if need be */ > + if (!rproc_needs_syncing(rproc)) { > + ret = request_firmware(&firmware_p, rproc->firmware, dev); > + if (ret < 0) { > + dev_err(dev, "request_firmware failed: %d\n", ret); > + goto unlock_mutex; > + } If we started in syncing mode then rpoc->firmware is null rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED) can make rproc_needs_syncing(rproc) false. In this case here we fail the recovery an leave in RPROC_STOP state. As you proposed in Loic RFC[1], what about adding a more explicit message to inform that the recovery failed. [1]https://lkml.org/lkml/2020/3/11/402 Regards, Arnaud > } > > - /* boot the remote processor up again */ > + /* boot up or synchronise with the remote processor again */ > ret = rproc_start(rproc, firmware_p); > > release_firmware(firmware_p); > @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work) > dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt, > rproc->name); > > + /* > + * The remote processor has crashed - tell the core what operation > + * to use from hereon, i.e whether an external entity will reboot > + * the MCU or it is now the remoteproc core's responsability. > + */ > + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED); > + > mutex_unlock(&rproc->lock); > > if (!rproc->recovery_disabled) > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h > index 3985c084b184..61500981155c 100644 > --- a/drivers/remoteproc/remoteproc_internal.h > +++ b/drivers/remoteproc/remoteproc_internal.h > @@ -24,6 +24,33 @@ struct rproc_debug_trace { > struct rproc_mem_entry trace_mem; > }; > > +/* > + * enum rproc_sync_states - remote processsor sync states > + * > + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor > + * has crashed but has not been recovered by > + * the remoteproc core yet. > + * > + * Keeping these separate from the enum rproc_state in order to avoid > + * introducing coupling between the state of the MCU and the synchronisation > + * operation to use. > + */ > +enum rproc_sync_states { > + RPROC_SYNC_STATE_CRASHED, > +}; > + > +static inline void rproc_set_sync_flag(struct rproc *rproc, > + enum rproc_sync_states state) > +{ > + switch (state) { > + case RPROC_SYNC_STATE_CRASHED: > + rproc->sync_with_rproc = rproc->sync_flags.after_crash; > + break; > + default: > + break; > + } > +} > + > /* from remoteproc_core.c */ > void rproc_release(struct kref *kref); > irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id); >
On Wed, Apr 29, 2020 at 09:44:02AM +0200, Arnaud POULIQUEN wrote: > Hi Mathieu, > > On 4/24/20 10:01 PM, Mathieu Poirier wrote: > > Refactor function rproc_trigger_recovery() in order to avoid > > reloading the firmware image when synchronising with a remote > > processor rather than booting it. Also part of the process, > > properly set the synchronisation flag in order to properly > > recover the system. > > > > Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> > > --- > > drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------ > > drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++ > > 2 files changed, 43 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > > index ef88d3e84bfb..3a84a38ba37b 100644 > > --- a/drivers/remoteproc/remoteproc_core.c > > +++ b/drivers/remoteproc/remoteproc_core.c > > @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc) > > */ > > int rproc_trigger_recovery(struct rproc *rproc) > > { > > - const struct firmware *firmware_p; > > + const struct firmware *firmware_p = NULL; > > struct device *dev = &rproc->dev; > > int ret; > > > > @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc) > > /* generate coredump */ > > rproc_coredump(rproc); > > > > - /* load firmware */ > > - ret = request_firmware(&firmware_p, rproc->firmware, dev); > > - if (ret < 0) { > > - dev_err(dev, "request_firmware failed: %d\n", ret); > > - goto unlock_mutex; > > + /* load firmware if need be */ > > + if (!rproc_needs_syncing(rproc)) { > > + ret = request_firmware(&firmware_p, rproc->firmware, dev); > > + if (ret < 0) { > > + dev_err(dev, "request_firmware failed: %d\n", ret); > > + goto unlock_mutex; > > + } > > If we started in syncing mode then rpoc->firmware is null > rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED) can make rproc_needs_syncing(rproc) > false. You are correct, I will add an additional check in rproc_set_machine() to prevent a situation where rproc_alloc() has been called without an ops and any of the synchronisation flags are set to false. It is also possible that someone would call proc_alloc() without an ops and doesn't call rproc_set_state_machine(), in which case both ops and sync_ops would be NULL. Adding a check in rproc_add() is probably the best location to catch such a condition. > In this case here we fail the recovery an leave in RPROC_STOP state. > As you proposed in Loic RFC[1], what about adding a more explicit message to inform that the recovery > failed. Right, that's a different problem. > > [1]https://lkml.org/lkml/2020/3/11/402 > > Regards, > Arnaud > > } > > > > - /* boot the remote processor up again */ > > + /* boot up or synchronise with the remote processor again */ > > ret = rproc_start(rproc, firmware_p); > > > > release_firmware(firmware_p); > > @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work) > > dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt, > > rproc->name); > > > > + /* > > + * The remote processor has crashed - tell the core what operation > > + * to use from hereon, i.e whether an external entity will reboot > > + * the MCU or it is now the remoteproc core's responsability. > > + */ > > + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED); > > + > > mutex_unlock(&rproc->lock); > > > > if (!rproc->recovery_disabled) > > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h > > index 3985c084b184..61500981155c 100644 > > --- a/drivers/remoteproc/remoteproc_internal.h > > +++ b/drivers/remoteproc/remoteproc_internal.h > > @@ -24,6 +24,33 @@ struct rproc_debug_trace { > > struct rproc_mem_entry trace_mem; > > }; > > > > +/* > > + * enum rproc_sync_states - remote processsor sync states > > + * > > + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor > > + * has crashed but has not been recovered by > > + * the remoteproc core yet. > > + * > > + * Keeping these separate from the enum rproc_state in order to avoid > > + * introducing coupling between the state of the MCU and the synchronisation > > + * operation to use. > > + */ > > +enum rproc_sync_states { > > + RPROC_SYNC_STATE_CRASHED, > > +}; > > + > > +static inline void rproc_set_sync_flag(struct rproc *rproc, > > + enum rproc_sync_states state) > > +{ > > + switch (state) { > > + case RPROC_SYNC_STATE_CRASHED: > > + rproc->sync_with_rproc = rproc->sync_flags.after_crash; > > + break; > > + default: > > + break; > > + } > > +} > > + > > /* from remoteproc_core.c */ > > void rproc_release(struct kref *kref); > > irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id); > >
On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote: > Refactor function rproc_trigger_recovery() in order to avoid > reloading the firmware image when synchronising with a remote > processor rather than booting it. Also part of the process, > properly set the synchronisation flag in order to properly > recover the system. > > Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> > --- > drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------ > drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++ > 2 files changed, 43 insertions(+), 7 deletions(-) > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > index ef88d3e84bfb..3a84a38ba37b 100644 > --- a/drivers/remoteproc/remoteproc_core.c > +++ b/drivers/remoteproc/remoteproc_core.c > @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc) > */ > int rproc_trigger_recovery(struct rproc *rproc) > { > - const struct firmware *firmware_p; > + const struct firmware *firmware_p = NULL; > struct device *dev = &rproc->dev; > int ret; > > @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc) > /* generate coredump */ > rproc_coredump(rproc); > > - /* load firmware */ > - ret = request_firmware(&firmware_p, rproc->firmware, dev); > - if (ret < 0) { > - dev_err(dev, "request_firmware failed: %d\n", ret); > - goto unlock_mutex; > + /* load firmware if need be */ > + if (!rproc_needs_syncing(rproc)) { > + ret = request_firmware(&firmware_p, rproc->firmware, dev); > + if (ret < 0) { > + dev_err(dev, "request_firmware failed: %d\n", ret); > + goto unlock_mutex; > + } > } > > - /* boot the remote processor up again */ > + /* boot up or synchronise with the remote processor again */ > ret = rproc_start(rproc, firmware_p); > > release_firmware(firmware_p); > @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work) > dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt, > rproc->name); > > + /* > + * The remote processor has crashed - tell the core what operation > + * to use from hereon, i.e whether an external entity will reboot > + * the MCU or it is now the remoteproc core's responsability. > + */ > + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED); If I follow the logic correctly, you're essentially using rproc->sync_with_rproc to pass an additional parameter down through rproc_trigger_recovery() to tell everyone below to "load firmware and boot the core or not". And given that the comment alludes to some unknown logic determining the continuation I think it would be much preferable to essentially just pass rproc->sync_flags.after_crash down through these functions. And per my comment on a previous patch, is there any synchronization with the remote controller when this happens? Regards, Bjorn > + > mutex_unlock(&rproc->lock); > > if (!rproc->recovery_disabled) > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h > index 3985c084b184..61500981155c 100644 > --- a/drivers/remoteproc/remoteproc_internal.h > +++ b/drivers/remoteproc/remoteproc_internal.h > @@ -24,6 +24,33 @@ struct rproc_debug_trace { > struct rproc_mem_entry trace_mem; > }; > > +/* > + * enum rproc_sync_states - remote processsor sync states > + * > + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor > + * has crashed but has not been recovered by > + * the remoteproc core yet. > + * > + * Keeping these separate from the enum rproc_state in order to avoid > + * introducing coupling between the state of the MCU and the synchronisation > + * operation to use. > + */ > +enum rproc_sync_states { > + RPROC_SYNC_STATE_CRASHED, > +}; > + > +static inline void rproc_set_sync_flag(struct rproc *rproc, > + enum rproc_sync_states state) > +{ > + switch (state) { > + case RPROC_SYNC_STATE_CRASHED: > + rproc->sync_with_rproc = rproc->sync_flags.after_crash; > + break; > + default: > + break; > + } > +} > + > /* from remoteproc_core.c */ > void rproc_release(struct kref *kref); > irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id); > -- > 2.20.1 >
On Tue, May 05, 2020 at 06:01:56PM -0700, Bjorn Andersson wrote: > On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote: > > > Refactor function rproc_trigger_recovery() in order to avoid > > reloading the firmware image when synchronising with a remote > > processor rather than booting it. Also part of the process, > > properly set the synchronisation flag in order to properly > > recover the system. > > > > Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> > > --- > > drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------ > > drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++ > > 2 files changed, 43 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > > index ef88d3e84bfb..3a84a38ba37b 100644 > > --- a/drivers/remoteproc/remoteproc_core.c > > +++ b/drivers/remoteproc/remoteproc_core.c > > @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc) > > */ > > int rproc_trigger_recovery(struct rproc *rproc) > > { > > - const struct firmware *firmware_p; > > + const struct firmware *firmware_p = NULL; > > struct device *dev = &rproc->dev; > > int ret; > > > > @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc) > > /* generate coredump */ > > rproc_coredump(rproc); > > > > - /* load firmware */ > > - ret = request_firmware(&firmware_p, rproc->firmware, dev); > > - if (ret < 0) { > > - dev_err(dev, "request_firmware failed: %d\n", ret); > > - goto unlock_mutex; > > + /* load firmware if need be */ > > + if (!rproc_needs_syncing(rproc)) { > > + ret = request_firmware(&firmware_p, rproc->firmware, dev); > > + if (ret < 0) { > > + dev_err(dev, "request_firmware failed: %d\n", ret); > > + goto unlock_mutex; > > + } > > } > > > > - /* boot the remote processor up again */ > > + /* boot up or synchronise with the remote processor again */ > > ret = rproc_start(rproc, firmware_p); > > > > release_firmware(firmware_p); > > @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work) > > dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt, > > rproc->name); > > > > + /* > > + * The remote processor has crashed - tell the core what operation > > + * to use from hereon, i.e whether an external entity will reboot > > + * the MCU or it is now the remoteproc core's responsability. > > + */ > > + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED); > > If I follow the logic correctly, you're essentially using > rproc->sync_with_rproc to pass an additional parameter down through > rproc_trigger_recovery() to tell everyone below to "load firmware and > boot the core or not". I am using the value of rproc::sync_flags::after_crash to set rproc->sync_with_rproc. That way the core can know whether it should boot the remote processor or synchronise with it. > > And given that the comment alludes to some unknown logic determining the > continuation I think it would be much preferable to essentially just > pass rproc->sync_flags.after_crash down through these functions. > The only thing we need to do is set the value of rproc->sync_with_rproc properly, which rproc_set_sync_flag() does. I have decided to use a wrapper function to allow us to change how the rproc->sync_with_rproc is handled without touching anything else in the code. > > And per my comment on a previous patch, is there any synchronization > with the remote controller when this happens? I can't seem to find that comment - can you indicate which patch that was? As it is today the core doesn't provide synchronisation, it is up to the platform driver to probe the remote processor to make sure it is up. I suppose sync_ops::start() would be a perfect candidate for that. > > Regards, > Bjorn > > > + > > mutex_unlock(&rproc->lock); > > > > if (!rproc->recovery_disabled) > > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h > > index 3985c084b184..61500981155c 100644 > > --- a/drivers/remoteproc/remoteproc_internal.h > > +++ b/drivers/remoteproc/remoteproc_internal.h > > @@ -24,6 +24,33 @@ struct rproc_debug_trace { > > struct rproc_mem_entry trace_mem; > > }; > > > > +/* > > + * enum rproc_sync_states - remote processsor sync states > > + * > > + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor > > + * has crashed but has not been recovered by > > + * the remoteproc core yet. > > + * > > + * Keeping these separate from the enum rproc_state in order to avoid > > + * introducing coupling between the state of the MCU and the synchronisation > > + * operation to use. > > + */ > > +enum rproc_sync_states { > > + RPROC_SYNC_STATE_CRASHED, > > +}; > > + > > +static inline void rproc_set_sync_flag(struct rproc *rproc, > > + enum rproc_sync_states state) > > +{ > > + switch (state) { > > + case RPROC_SYNC_STATE_CRASHED: > > + rproc->sync_with_rproc = rproc->sync_flags.after_crash; > > + break; > > + default: > > + break; > > + } > > +} > > + > > /* from remoteproc_core.c */ > > void rproc_release(struct kref *kref); > > irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id); > > -- > > 2.20.1 > >
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c index ef88d3e84bfb..3a84a38ba37b 100644 --- a/drivers/remoteproc/remoteproc_core.c +++ b/drivers/remoteproc/remoteproc_core.c @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc) */ int rproc_trigger_recovery(struct rproc *rproc) { - const struct firmware *firmware_p; + const struct firmware *firmware_p = NULL; struct device *dev = &rproc->dev; int ret; @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc) /* generate coredump */ rproc_coredump(rproc); - /* load firmware */ - ret = request_firmware(&firmware_p, rproc->firmware, dev); - if (ret < 0) { - dev_err(dev, "request_firmware failed: %d\n", ret); - goto unlock_mutex; + /* load firmware if need be */ + if (!rproc_needs_syncing(rproc)) { + ret = request_firmware(&firmware_p, rproc->firmware, dev); + if (ret < 0) { + dev_err(dev, "request_firmware failed: %d\n", ret); + goto unlock_mutex; + } } - /* boot the remote processor up again */ + /* boot up or synchronise with the remote processor again */ ret = rproc_start(rproc, firmware_p); release_firmware(firmware_p); @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work) dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt, rproc->name); + /* + * The remote processor has crashed - tell the core what operation + * to use from hereon, i.e whether an external entity will reboot + * the MCU or it is now the remoteproc core's responsability. + */ + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED); + mutex_unlock(&rproc->lock); if (!rproc->recovery_disabled) diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h index 3985c084b184..61500981155c 100644 --- a/drivers/remoteproc/remoteproc_internal.h +++ b/drivers/remoteproc/remoteproc_internal.h @@ -24,6 +24,33 @@ struct rproc_debug_trace { struct rproc_mem_entry trace_mem; }; +/* + * enum rproc_sync_states - remote processsor sync states + * + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor + * has crashed but has not been recovered by + * the remoteproc core yet. + * + * Keeping these separate from the enum rproc_state in order to avoid + * introducing coupling between the state of the MCU and the synchronisation + * operation to use. + */ +enum rproc_sync_states { + RPROC_SYNC_STATE_CRASHED, +}; + +static inline void rproc_set_sync_flag(struct rproc *rproc, + enum rproc_sync_states state) +{ + switch (state) { + case RPROC_SYNC_STATE_CRASHED: + rproc->sync_with_rproc = rproc->sync_flags.after_crash; + break; + default: + break; + } +} + /* from remoteproc_core.c */ void rproc_release(struct kref *kref); irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id);
Refactor function rproc_trigger_recovery() in order to avoid reloading the firmware image when synchronising with a remote processor rather than booting it. Also part of the process, properly set the synchronisation flag in order to properly recover the system. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> --- drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------ drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++ 2 files changed, 43 insertions(+), 7 deletions(-)