Message ID | 20230921141935.3973069-1-robbarnes@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drivers/cros_ec: Reduce log polling period to 2s | expand |
On Thu, Sep 21, 2023 at 02:19:35PM +0000, Rob Barnes wrote: > 2. Yields more recent logs prior to a crash, facilitating easier debugging. The approach depends on multiple parties before a system reboot: EC reports on panic, AP sends EC commands for reading the logs, userland programs (e.g. timberslide) write the logs to filesystem, and filesystem synchronization. If anyone in the path didn't work in time, the logs disappear. An random idea: could we save the EC console logs to pstore when AP gets notification on EC panic? > --- > drivers/platform/chrome/cros_ec_debugfs.c | 2 +- If you have chance to send next version, please refer `git log` on the file to see a good candidate of title prefixes.
On Sat, Sep 23, 2023 at 2:42 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > On Thu, Sep 21, 2023 at 02:19:35PM +0000, Rob Barnes wrote: > > 2. Yields more recent logs prior to a crash, facilitating easier debugging. > > The approach depends on multiple parties before a system reboot: EC reports > on panic, AP sends EC commands for reading the logs, userland programs > (e.g. timberslide) write the logs to filesystem, and filesystem > synchronization. If anyone in the path didn't work in time, the logs > disappear. This change is just affecting the continuous polling period of the cros_ec driver. Syncing the log immediately after a panic is a separate path and will only work when EC supports system safe mode recovery, most don't. The hope with shortening the period is that cros_ec.previous will contain more relevant logs after a crash, even when safe mode isn't enabled. > > An random idea: could we save the EC console logs to pstore when AP gets > notification on EC panic? This could work. It would be a separate effort. > > > --- > > drivers/platform/chrome/cros_ec_debugfs.c | 2 +- > > If you have chance to send next version, please refer `git log` on the file > to see a good candidate of title prefixes. Ack.
On Tue, Sep 26, 2023 at 12:57:22PM -0600, Rob Barnes wrote: > On Sat, Sep 23, 2023 at 2:42 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > > > On Thu, Sep 21, 2023 at 02:19:35PM +0000, Rob Barnes wrote: > > > 2. Yields more recent logs prior to a crash, facilitating easier debugging. > > > > The approach depends on multiple parties before a system reboot: EC reports > > on panic, AP sends EC commands for reading the logs, userland programs > > (e.g. timberslide) write the logs to filesystem, and filesystem > > synchronization. If anyone in the path didn't work in time, the logs > > disappear. > > This change is just affecting the continuous polling period of the > cros_ec driver. > Syncing the log immediately after a panic is a separate path and will only work > when EC supports system safe mode recovery, most don't. I guess I misunderstood. Is the understanding "EC can't respond to further host commands from AP after EC crashed unless some more special supports (e.g. system safe mode)" correct? If yes, how about after EC reported a panic but before EC crashed? Can EC respond to host commands during the period? > The hope with shortening the period is that cros_ec.previous will contain more > relevant logs after a crash, even when safe mode isn't enabled. What is current approach for getting EC crash dumps/logs from AP? If the system didn't save them before system reset, are they available after next boot?
Sorry for dropping this conversation. I'm pursuing a different approach for setting the ec log period. See https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/5509884 for the draft. I will send the patch to this list after internal review. Also answering your questions below. On Thu, Sep 28, 2023 at 1:07 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > On Tue, Sep 26, 2023 at 12:57:22PM -0600, Rob Barnes wrote: > > On Sat, Sep 23, 2023 at 2:42 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > > > > > On Thu, Sep 21, 2023 at 02:19:35PM +0000, Rob Barnes wrote: > > > > 2. Yields more recent logs prior to a crash, facilitating easier debugging. > > > > > > The approach depends on multiple parties before a system reboot: EC reports > > > on panic, AP sends EC commands for reading the logs, userland programs > > > (e.g. timberslide) write the logs to filesystem, and filesystem > > > synchronization. If anyone in the path didn't work in time, the logs > > > disappear. > > > > This change is just affecting the continuous polling period of the > > cros_ec driver. > > Syncing the log immediately after a panic is a separate path and will only work > > when EC supports system safe mode recovery, most don't. > > I guess I misunderstood. Is the understanding "EC can't respond to further > host commands from AP after EC crashed unless some more special supports > (e.g. system safe mode)" correct? > > If yes, how about after EC reported a panic but before EC crashed? Can EC > respond to host commands during the period? System safe mode allows the EC to temporarily recover from some types of panics. The EC is able to respond to host commands in this scenario. > > > > The hope with shortening the period is that cros_ec.previous will contain more > > relevant logs after a crash, even when safe mode isn't enabled. > > What is current approach for getting EC crash dumps/logs from AP? If the > system didn't save them before system reset, are they available after next > > boot? The EC does save a small panicinfo struct in uninitialized RAM that is preserved across reset. This struct does not include logs, so any EC logs that are not sync'd to the OS before reset are lost.
diff --git a/drivers/platform/chrome/cros_ec_debugfs.c b/drivers/platform/chrome/cros_ec_debugfs.c index 9044c6bab770c..c96493504127d 100644 --- a/drivers/platform/chrome/cros_ec_debugfs.c +++ b/drivers/platform/chrome/cros_ec_debugfs.c @@ -21,7 +21,7 @@ #define LOG_SHIFT 14 #define LOG_SIZE (1 << LOG_SHIFT) -#define LOG_POLL_SEC 10 +#define LOG_POLL_SEC 2 #define CIRC_ADD(idx, size, value) (((idx) + (value)) & ((size) - 1))
Lowering the log polling interval to 2 seconds provides two advantages: 1. Minimizes the risk of EC internal log buffer overfilling and truncating. 2. Yields more recent logs prior to a crash, facilitating easier debugging. Signed-off-by: Rob Barnes <robbarnes@google.com> --- drivers/platform/chrome/cros_ec_debugfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)