Message ID | 20200721114326.305790-1-vaibhav@linux.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 8a5b27ff18a561106926d108834b15249a1b102c |
Headers | show |
Series | [ndctl] papr: Check for command type in papr_xlat_firmware_status() | expand |
On Tue, 2020-07-21 at 17:13 +0530, Vaibhav Jain wrote: > We recently discovered intermittent failures while reading label-area > of PAPR-NVDimms and the command 'read-labels' would in such a case > generated empty output like below: > > $ sudo ndctl read-labels -j nmem0 > [ > ] > read 0 nmem > > Upon investigation we found that this was caused by a spurious error > code returned from ndctl_cmd_submit_xlat() when its called from > ndctl_dimm_read_label_extent() while trying to read the label-area > contents of the NVDIMM. > > Digging further it was relieved that ndctl_cmd_submit_xlat() would > always call papr_xlat_firmware_status() via pointer > 'papr_dimm_ops->xlat_firmware_status' to retrieve translated firmware > status for all ndctl_cmds even though they arent really PAPR PDSM > commands. > > In this case ndctl_cmd->type == ND_CMD_GET_CONFIG_DATA and was > represented by type 'struct nd_cmd_get_config_data_hdr' and > papr_xlat_firmware_status() incorrectly assumed it to be of type > 'struct nd_pkg_pdsm' and wrongly dereferenced it returning an invalid > value. > > A proper fix for this would probably need introducing a new ndctl_cmd > callback like 'ndctl_cmd.get_xlat_firmware_status' similar to one > introduced in [1]. However such a change could be disruptive, hence > the patch introduces a small workaround in papr_xlat_firmware_status() > that checks if the 'struct ndctl_cmd *' provided if of correct type > CMD_CALL and if not then it ignores it and return '0' > > References: > [1]: commit fa754dd8acdb ("ndctl/dimm: Rework dimm command status > reporting") > > Fixes: 151d2876c49e ("papr: Add scaffolding to issue and handle PDSM requests") > Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> > --- > ndctl/lib/papr.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/ndctl/lib/papr.c b/ndctl/lib/papr.c > index d9ce253369b3..63561f8f9797 100644 > --- a/ndctl/lib/papr.c > +++ b/ndctl/lib/papr.c > @@ -56,9 +56,7 @@ static u32 papr_get_firmware_status(struct ndctl_cmd *cmd) > > static int papr_xlat_firmware_status(struct ndctl_cmd *cmd) > { > - const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd); > - > - return pcmd->cmd_status; > + return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status : 0; Is this correct? -- for non ND_CMD_CALL commands this will always return a 0, and it seems like you will lose any error state for commands like ND_CMD_SET_CONFIG_DATA. > } > > /* Verify if the given command is supported and valid */
Vishal Verma <vishal.l.verma@intel.com> writes: <snip> >> static int papr_xlat_firmware_status(struct ndctl_cmd *cmd) >> { >> - const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd); >> - >> - return pcmd->cmd_status; >> + return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status : 0; > > Is this correct? -- for non ND_CMD_CALL commands this will always return a 0, > and it seems like you will lose any error state for commands > like ND_CMD_SET_CONFIG_DATA. This behaviour is similar to what ndctl_cmd_xlat_firmware_status() does when corresponding dimm-ops are missing the 'xlat_firmware_status' callback. Also ndctl_cmd_submit_xlat() returns the rc from ndctl_cmd_submit() in case ndctl_cmd_xlat_firmware_status() returns '0', which corresponds to 'ndctl_cmd.status' field. So any error codes returned from ndctl_cmd_submit() are still returned back to the caller even though papr_xlat_firmware_status() returned '0'. > >> } >> >> /* Verify if the given command is supported and valid */ > >
On Wed, 2020-07-22 at 07:29 +0530, Vaibhav Jain wrote: > Vishal Verma <vishal.l.verma@intel.com> writes: > > <snip> > > > static int papr_xlat_firmware_status(struct ndctl_cmd *cmd) > > > { > > > - const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd); > > > - > > > - return pcmd->cmd_status; > > > + return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status : > > > 0; > > > > Is this correct? -- for non ND_CMD_CALL commands this will always > > return a 0, > > and it seems like you will lose any error state for commands > > like ND_CMD_SET_CONFIG_DATA. > This behaviour is similar to what ndctl_cmd_xlat_firmware_status() > does > when corresponding dimm-ops are missing the 'xlat_firmware_status' > callback. > > Also ndctl_cmd_submit_xlat() returns the rc from ndctl_cmd_submit() > in case ndctl_cmd_xlat_firmware_status() returns '0', which > corresponds > to 'ndctl_cmd.status' field. So any error codes > returned from ndctl_cmd_submit() are still returned back to the caller > even though papr_xlat_firmware_status() returned '0'. Ah yes you're right. I'll queue this up for v69 and we can do the more involved fix in the next cycle. Thanks! -Vishal
diff --git a/ndctl/lib/papr.c b/ndctl/lib/papr.c index d9ce253369b3..63561f8f9797 100644 --- a/ndctl/lib/papr.c +++ b/ndctl/lib/papr.c @@ -56,9 +56,7 @@ static u32 papr_get_firmware_status(struct ndctl_cmd *cmd) static int papr_xlat_firmware_status(struct ndctl_cmd *cmd) { - const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd); - - return pcmd->cmd_status; + return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status : 0; } /* Verify if the given command is supported and valid */
We recently discovered intermittent failures while reading label-area of PAPR-NVDimms and the command 'read-labels' would in such a case generated empty output like below: $ sudo ndctl read-labels -j nmem0 [ ] read 0 nmem Upon investigation we found that this was caused by a spurious error code returned from ndctl_cmd_submit_xlat() when its called from ndctl_dimm_read_label_extent() while trying to read the label-area contents of the NVDIMM. Digging further it was relieved that ndctl_cmd_submit_xlat() would always call papr_xlat_firmware_status() via pointer 'papr_dimm_ops->xlat_firmware_status' to retrieve translated firmware status for all ndctl_cmds even though they arent really PAPR PDSM commands. In this case ndctl_cmd->type == ND_CMD_GET_CONFIG_DATA and was represented by type 'struct nd_cmd_get_config_data_hdr' and papr_xlat_firmware_status() incorrectly assumed it to be of type 'struct nd_pkg_pdsm' and wrongly dereferenced it returning an invalid value. A proper fix for this would probably need introducing a new ndctl_cmd callback like 'ndctl_cmd.get_xlat_firmware_status' similar to one introduced in [1]. However such a change could be disruptive, hence the patch introduces a small workaround in papr_xlat_firmware_status() that checks if the 'struct ndctl_cmd *' provided if of correct type CMD_CALL and if not then it ignores it and return '0' References: [1]: commit fa754dd8acdb ("ndctl/dimm: Rework dimm command status reporting") Fixes: 151d2876c49e ("papr: Add scaffolding to issue and handle PDSM requests") Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> --- ndctl/lib/papr.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)