Message ID | 1631300645-27662-1-git-send-email-wenxiong@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [V2,1/1] scsi/ses: Saw "Failed to get diagnostic page 0x1" | expand |
Hi, Thank you for the patch! Yet something to improve: [auto build test ERROR on scsi/for-next] [also build test ERROR on mkp-scsi/for-next v5.14 next-20210910] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/wenxiong-linux-vnet-ibm-com/scsi-ses-Saw-Failed-to-get-diagnostic-page-0x1/20210911-043434 base: https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next config: microblaze-buildonly-randconfig-r001-20210910 (attached as .config) compiler: microblaze-linux-gcc (GCC) 11.2.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/a117eeeef2a13989a97ac0e10d86ffa6314f481e git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review wenxiong-linux-vnet-ibm-com/scsi-ses-Saw-Failed-to-get-diagnostic-page-0x1/20210911-043434 git checkout a117eeeef2a13989a97ac0e10d86ffa6314f481e # save the attached .config to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=microblaze SHELL=/bin/bash drivers/scsi/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All error/warnings (new ones prefixed by >>): In file included from include/uapi/linux/posix_types.h:5, from include/uapi/linux/types.h:14, from include/linux/types.h:6, from include/linux/kasan-checks.h:5, from include/asm-generic/rwonce.h:26, from ./arch/microblaze/include/generated/asm/rwonce.h:1, from include/linux/compiler.h:264, from include/asm-generic/bug.h:5, from ./arch/microblaze/include/generated/asm/bug.h:1, from include/linux/bug.h:5, from include/linux/mmdebug.h:5, from include/linux/gfp.h:5, from include/linux/slab.h:15, from drivers/scsi/ses.c:8: drivers/scsi/ses.c: In function 'ses_recv_diag': >> include/linux/stddef.h:8:14: warning: passing argument 7 of 'scsi_execute_req' makes integer from pointer without a cast [-Wint-conversion] 8 | #define NULL ((void *)0) | ^~~~~~~~~~~ | | | void * drivers/scsi/ses.c:95:42: note: in expansion of macro 'NULL' 95 | bufflen, &sshdr, NULL, SES_TIMEOUT, SES_RETRIES, NULL); | ^~~~ In file included from include/scsi/scsi_cmnd.h:12, from drivers/scsi/ses.c:15: include/scsi/scsi_device.h:467:61: note: expected 'int' but argument is of type 'void *' 467 | unsigned bufflen, struct scsi_sense_hdr *sshdr, int timeout, | ~~~~^~~~~~~ >> drivers/scsi/ses.c:61:21: warning: passing argument 9 of 'scsi_execute_req' makes pointer from integer without a cast [-Wint-conversion] 61 | #define SES_RETRIES 3 | ^ | | | int drivers/scsi/ses.c:95:61: note: in expansion of macro 'SES_RETRIES' 95 | bufflen, &sshdr, NULL, SES_TIMEOUT, SES_RETRIES, NULL); | ^~~~~~~~~~~ In file included from include/scsi/scsi_cmnd.h:12, from drivers/scsi/ses.c:15: include/scsi/scsi_device.h:468:27: note: expected 'int *' but argument is of type 'int' 468 | int retries, int *resid) | ~~~~~^~~~~ >> drivers/scsi/ses.c:94:24: error: too many arguments to function 'scsi_execute_req' 94 | ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, | ^~~~~~~~~~~~~~~~ In file included from include/scsi/scsi_cmnd.h:12, from drivers/scsi/ses.c:15: include/scsi/scsi_device.h:465:19: note: declared here 465 | static inline int scsi_execute_req(struct scsi_device *sdev, | ^~~~~~~~~~~~~~~~ vim +/scsi_execute_req +94 drivers/scsi/ses.c 59 60 #define SES_TIMEOUT (30 * HZ) > 61 #define SES_RETRIES 3 62 63 static void init_device_slot_control(unsigned char *dest_desc, 64 struct enclosure_component *ecomp, 65 unsigned char *status) 66 { 67 memcpy(dest_desc, status, 4); 68 dest_desc[0] = 0; 69 /* only clear byte 1 for ENCLOSURE_COMPONENT_DEVICE */ 70 if (ecomp->type == ENCLOSURE_COMPONENT_DEVICE) 71 dest_desc[1] = 0; 72 dest_desc[2] &= 0xde; 73 dest_desc[3] &= 0x3c; 74 } 75 76 77 static int ses_recv_diag(struct scsi_device *sdev, int page_code, 78 void *buf, int bufflen) 79 { 80 int ret; 81 unsigned char cmd[] = { 82 RECEIVE_DIAGNOSTIC, 83 1, /* Set PCV bit */ 84 page_code, 85 bufflen >> 8, 86 bufflen & 0xff, 87 0 88 }; 89 unsigned char recv_page_code; 90 struct scsi_sense_hdr sshdr; 91 int retries = SES_RETRIES; 92 93 do { > 94 ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, 95 bufflen, &sshdr, NULL, SES_TIMEOUT, SES_RETRIES, NULL); 96 97 } while (scsi_sense_valid(&sshdr) && 98 sshdr.sense_key == UNIT_ATTENTION && --retries); 99 100 if (unlikely(ret)) 101 return ret; 102 103 recv_page_code = ((unsigned char *)buf)[0]; 104 105 if (likely(recv_page_code == page_code)) 106 return ret; 107 108 /* successful diagnostic but wrong page code. This happens to some 109 * USB devices, just print a message and pretend there was an error */ 110 111 sdev_printk(KERN_ERR, sdev, 112 "Wrong diagnostic page; asked for %d got %u\n", 113 page_code, recv_page_code); 114 115 return -EINVAL; 116 } 117 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Fri, 2021-09-10 at 14:04 -0500, wenxiong@linux.vnet.ibm.com wrote: > From: Wen Xiong <root@ltczz405-lp2.aus.stglabs.ibm.com> > > We saw two errors with Slider drawer: > - Failed to get diagnostic page 0x1 during booting up > - /sys/class/enclosure is empty with ipr adapter + Slider drawer > > From scsi logging level with error=3, looks ses_recv_diag not try on > a UA. The patch addes retrying on a UA in ses_recv_diag(); Do we know why the device is returning a UA? Presumably it's a check condition UA meaning the device is trying to tell us something and wants us to request sense? > Signed-Off-by: Wen Xiong<wenxiong@linux.vnet.ibm.com> > Reviewed-by: Brian King <brking@linux.vnet.ibm.com> > --- > drivers/scsi/ses.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c > index c2afba2a5414..93f6a8ce1bea 100644 > --- a/drivers/scsi/ses.c > +++ b/drivers/scsi/ses.c > @@ -87,9 +87,16 @@ static int ses_recv_diag(struct scsi_device *sdev, > int page_code, > 0 > }; > unsigned char recv_page_code; > + struct scsi_sense_hdr sshdr; > + int retries = SES_RETRIES; > + > + do { > + ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, > buf, > + bufflen, &sshdr, NULL, SES_TIMEOUT, This grew an additional argument: you want to replace the NULL with the sshdr, I think ... and compile test it next time. > SES_RETRIES, NULL); I think you want a 1 here instead of SES_RETRIES because you're retrying for SES_RETRIES in an outer loop now, so if you maxed out both sets of retries, you'd retry for SES_RETRIES^2. If this is a CC/UA, you can simplify all this by setting cmd->expecting_cc_ua = 1 and avoiding the loop. James
Hi Wendy!
> I am going to re-do V2 patch with retries. Is it ok?
We probably do not want to blindly retry on all errors. We should only
retry in cases where the command has the potential to subsequently
succeed. Hence the request to retry "transient errors" in my previous
mail.
Classifying which errors should result in a retry is the important
piece. And for that we need to know exactly what error your tray is
returning.
I also suggest you have a look at scsi_check_sense().
Thanks!
Wendy, > Below is error with enabling scsi_log_level; > > [108017.427791] ses 0:0:9:0: tag#641 Sense Key : Unit Attention [current] > [108017.427793] ses 0:0:9:0: tag#641 Add. Sense: Bus device reset function occurred OK. I propose you refine your check to retry on NOT_READY as well as UNIT_ATTENTION with ASC 0x29. I think that would be a good start. Do the same for ses_send_diag(). Thanks!
On Tue, 2021-09-14 at 11:27 -0400, Martin K. Petersen wrote: > Wendy, > > > Below is error with enabling scsi_log_level; > > > > [108017.427791] ses 0:0:9:0: tag#641 Sense Key : Unit Attention > > [current] > > [108017.427793] ses 0:0:9:0: tag#641 Add. Sense: Bus device reset > > function occurred > > OK. I propose you refine your check to retry on NOT_READY as well as > UNIT_ATTENTION with ASC 0x29. I think that would be a good start. Do > the same for ses_send_diag(). Something is wrong with the outbound email: the list isn't getting copies of this: https://lore.kernel.org/all/yq1v934yysg.fsf@ca-mkp.ca.oracle.com/ because you're sending HTML email ... please don't. I still think setting expecting_cc_ua is the best thing to do because it's designed for exactly the problem you're seeing, but open coding it in the loop would be fine as well. James
Wendy, A few small additional nits... > From: Wen Xiong <root@ltczz405-lp2.aus.stglabs.ibm.com> Please make sure your email address is correct. > Signed-Off-by: Wen Xiong<wenxiong@linux.vnet.ibm.com> This should be Signed-off-by: and you need a space before your email address. > + ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, > + bufflen, &sshdr, NULL, SES_TIMEOUT, SES_RETRIES, NULL); The sense header goes in the field before SES_TIMEOUT: int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, int data_direction, void *buffer, unsigned bufflen, unsigned char *sense, struct scsi_sense_hdr *sshdr, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ int timeout, int retries, u64 flags, req_flags_t rq_flags, ^^^^^^^^^^^ int *resid) Thanks! PS. Bonus points for whoever fixes up the scsi_execute calls to have less than 10,000 arguments. One would be good. And some sensible input validation/type checking.
James, > I still think setting expecting_cc_ua is the best thing to do because > it's designed for exactly the problem you're seeing, but open coding > it in the loop would be fine as well. The problem with that approach is that (as far as I can tell) we didn't issue the reset in question. We could explicitly set the flag in ses_recv_diag(). However, even though scsi_check_sense() returns RETRY, scsi_decide_disposition() bypasses it because a check condition is considered a "no retry" command by scsi_noretry_cmd(). So despite expecting_cc_ua being set, the retry is not performed. Fun can of worms...
diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index c2afba2a5414..93f6a8ce1bea 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -87,9 +87,16 @@ static int ses_recv_diag(struct scsi_device *sdev, int page_code, 0 }; unsigned char recv_page_code; + struct scsi_sense_hdr sshdr; + int retries = SES_RETRIES; + + do { + ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, + bufflen, &sshdr, NULL, SES_TIMEOUT, SES_RETRIES, NULL); + + } while (scsi_sense_valid(&sshdr) && + sshdr.sense_key == UNIT_ATTENTION && --retries); - ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, bufflen, - NULL, SES_TIMEOUT, SES_RETRIES, NULL); if (unlikely(ret)) return ret;