Message ID | 20220214021747.4976-1-damien.lemoal@opensource.wdc.com (mailing list archive) |
---|---|
Headers | show |
Series | libsas and pm8001 fixes | expand |
On 2022/02/14 11:17, Damien Le Moal wrote: > This first part of this series (patches 1 to 24) fixes handling of NCQ > NON DATA commands in libsas and many bugs in the pm8001 driver. > > The fixes for the pm8001 driver include: > * Suppression of all sparse warnings, fixing along the way many le32 > handling bugs for big-endian architectures > * Fix handling of NCQ NON DATA commands > * Fix of tag values handling (0 *is* a valid tag value) > * Fix many tag iand memory leaks in error path > * Fix NCQ error recovery (abort all task execution) that was causing a > crash > > The second part of the series (patches 25 to 31) iadd a small cleanup of > libsas code and many simplifications of the pm8001 driver code. > > With these fixes, libzbc test suite passes all test case. This test > suite was used with an SMR drive for testing because it generates many > NCQ NON DATA commands (for zone management commands) and also generates > many NCQ command errors to check ASC/ASCQ returned by the device. With > the test suite, the error recovery path was extensively exercised. The > same tests were also executed with a SAS SMR drives to exercise the > error path. > > The patches are based on the 5.18/scsi-staging tree. Martin, Note that there is a conflict between 5.18/scsi-staging and 5.17-rc3/4 in the pm8001 driver. And I need to touch rc3/rc4 code too. Could you rebase scsi-staging ? Best regards. > > Changes from v2: > * Reorganized the series: fixes first, cleanups second. > * Added more bug/leaks fix patches > * Addressed Gary's comment for the ccb alloc helper (patch 28) > * Rebased (and tested) all patches on 5.18/scsi-staging > > Changes from v1: > * Added reviewed-by tags > * Addressed Christoph's comments on patch 4 and 8 > * Added patches 21 and 22 to fix 2 additional problems found while > preparing this v2 series > * Added patch 23 and 24 to cleanup the code further. > > Damien Le Moal (31): > scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands > scsi: pm8001: Fix __iomem pointer use in pm8001_phy_control() > scsi: pm8001: Fix pm8001_update_flash() local variable type > scsi: pm8001: Fix command initialization in pm80XX_send_read_log() > scsi: pm8001: Fix pm80xx_pci_mem_copy() interface > scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req() > scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config() > scsi: pm8001: Fix le32 values handling in pm80xx_set_sas_protocol_timer_config() > scsi: pm8001: Fix payload initialization in pm80xx_encrypt_update() > scsi: pm8001: Fix le32 values handling in pm80xx_chip_ssp_io_req() > scsi: pm8001: Fix le32 values handling in pm80xx_chip_sata_req() > scsi: pm8001: Fix use of struct set_phy_profile_req fields > scsi: pm8001: Remove local variable in pm8001_pci_resume() > scsi: pm8001: Fix NCQ NON DATA command task initialization > scsi: pm8001: Fix NCQ NON DATA command completion handling > scsi: pm8001: Fix abort all task initialization > scsi: pm8001: Fix pm8001_tag_alloc() failures handling > scsi: pm8001: Fix pm80xx_chip_phy_ctl_req() > scsi: pm8001: Fix pm8001_mpi_task_abort_resp() > scsi: pm8001: Fix tag values handling > scsi: pm8001: Fix task leak in pm8001_send_abort_all() > scsi: pm8001: Fix tag leaks on error > scsi: pm8001: fix memory leak in pm8001_chip_fw_flash_update_req() > scsi: pm8001: Fix process_one_iomb() kdoc comment > scsi: libsas: Simplify sas_ata_qc_issue() detection of NCQ commands > scsi: pm8001: Simplify pm8001_get_ncq_tag() > scsi: pm8001: Cleanup pm8001_queue_command() > scsi: pm8001: Introduce ccb alloc/free helpers > scsi: pm8001: Simplify pm8001_mpi_build_cmd() interface > scsi: pm8001: Simplify pm8001_task_exec() > scsi: pm8001: Simplify pm8001_ccb_task_free() > > drivers/scsi/libsas/sas_ata.c | 11 +- > drivers/scsi/pm8001/pm8001_ctl.c | 5 +- > drivers/scsi/pm8001/pm8001_hwi.c | 450 ++++++++++++----------------- > drivers/scsi/pm8001/pm8001_init.c | 11 +- > drivers/scsi/pm8001/pm8001_sas.c | 297 +++++++++---------- > drivers/scsi/pm8001/pm8001_sas.h | 64 ++++- > drivers/scsi/pm8001/pm80xx_hwi.c | 458 ++++++++++++++---------------- > drivers/scsi/pm8001/pm80xx_hwi.h | 2 +- > 8 files changed, 605 insertions(+), 693 deletions(-) >
On 14/02/2022 02:17, Damien Le Moal wrote: > This first part of this series (patches 1 to 24) fixes handling of NCQ > NON DATA commands in libsas and many bugs in the pm8001 driver. > > The fixes for the pm8001 driver include: > * Suppression of all sparse warnings, fixing along the way many le32 > handling bugs for big-endian architectures > * Fix handling of NCQ NON DATA commands > * Fix of tag values handling (0*is* a valid tag value) > * Fix many tag iand memory leaks in error path > * Fix NCQ error recovery (abort all task execution) that was causing a > crash > > The second part of the series (patches 25 to 31) iadd a small cleanup of > libsas code and many simplifications of the pm8001 driver code. > > With these fixes, libzbc test suite passes all test case. This test > suite was used with an SMR drive for testing because it generates many > NCQ NON DATA commands (for zone management commands) and also generates > many NCQ command errors to check ASC/ASCQ returned by the device. With > the test suite, the error recovery path was extensively exercised. The > same tests were also executed with a SAS SMR drives to exercise the > error path. > > The patches are based on the 5.18/scsi-staging tree. Hi Damien, jfyi, I still see the hang with this series. I don't think that the tag fixes were relevant unfortunately. btw, how about add guys from get_maintainers.pl to lighten the review workload (and we should have the official maintainer anyway)? There are quite a few patches now... Thanks, John
On 2/15/22 03:06, John Garry wrote: > On 14/02/2022 02:17, Damien Le Moal wrote: >> This first part of this series (patches 1 to 24) fixes handling of NCQ >> NON DATA commands in libsas and many bugs in the pm8001 driver. >> >> The fixes for the pm8001 driver include: >> * Suppression of all sparse warnings, fixing along the way many le32 >> handling bugs for big-endian architectures >> * Fix handling of NCQ NON DATA commands >> * Fix of tag values handling (0*is* a valid tag value) >> * Fix many tag iand memory leaks in error path >> * Fix NCQ error recovery (abort all task execution) that was causing a >> crash >> >> The second part of the series (patches 25 to 31) iadd a small cleanup of >> libsas code and many simplifications of the pm8001 driver code. >> >> With these fixes, libzbc test suite passes all test case. This test >> suite was used with an SMR drive for testing because it generates many >> NCQ NON DATA commands (for zone management commands) and also generates >> many NCQ command errors to check ASC/ASCQ returned by the device. With >> the test suite, the error recovery path was extensively exercised. The >> same tests were also executed with a SAS SMR drives to exercise the >> error path. >> >> The patches are based on the 5.18/scsi-staging tree. > > Hi Damien, > > jfyi, I still see the hang with this series. I don't think that the tag > fixes were relevant unfortunately. As mentioned above, I did test with a SAS drive too (an SMR one to heavily test the error path) and it worked perfectly. Note that using Martin's rc1 based scsi-staging tree, I did see a lot of KASAN complaints on boot regarding MSI/PCI setup. These warnings are gone with rc3/4. What kernel version base are you using ? I could not find the ARM board I have in the lab yesterday. Will try again to find it and test with it. > > btw, how about add guys from get_maintainers.pl to lighten the review > workload (and we should have the official maintainer anyway)? There are > quite a few patches now... I did that... I will check again. I may have made mistake creating the distribution list. > > Thanks, > John
Hi Damien! > Note that there is a conflict between 5.18/scsi-staging and 5.17-rc3/4 > in the pm8001 driver. And I need to touch rc3/rc4 code too. Could you > rebase scsi-staging ? I pulled in 5.17/scsi-fixes and fixed up the conflicts. I suggest you merge your -rc of choice when testing. That's what I do if the baseline kernel has problems on my lab systems.
On 2/15/22 12:18, Martin K. Petersen wrote: > > Hi Damien! > >> Note that there is a conflict between 5.18/scsi-staging and 5.17-rc3/4 >> in the pm8001 driver. And I need to touch rc3/rc4 code too. Could you >> rebase scsi-staging ? > > I pulled in 5.17/scsi-fixes and fixed up the conflicts. Thanks ! > I suggest you merge your -rc of choice when testing. That's what I do if > the baseline kernel has problems on my lab systems. That is what I normally do, but for some reason, this time, I kept ending up with a non working kernel... I must have done something very wrong :)
On 14/02/2022 22:29, Damien Le Moal wrote: > On 2/15/22 03:06, John Garry wrote: >> On 14/02/2022 02:17, Damien Le Moal wrote: >>> This first part of this series (patches 1 to 24) fixes handling of NCQ >>> NON DATA commands in libsas and many bugs in the pm8001 driver. >>> >>> The fixes for the pm8001 driver include: >>> * Suppression of all sparse warnings, fixing along the way many le32 >>> handling bugs for big-endian architectures >>> * Fix handling of NCQ NON DATA commands >>> * Fix of tag values handling (0*is* a valid tag value) >>> * Fix many tag iand memory leaks in error path >>> * Fix NCQ error recovery (abort all task execution) that was causing a >>> crash >>> >>> The second part of the series (patches 25 to 31) iadd a small cleanup of >>> libsas code and many simplifications of the pm8001 driver code. >>> >>> With these fixes, libzbc test suite passes all test case. This test >>> suite was used with an SMR drive for testing because it generates many >>> NCQ NON DATA commands (for zone management commands) and also generates >>> many NCQ command errors to check ASC/ASCQ returned by the device. With >>> the test suite, the error recovery path was extensively exercised. The >>> same tests were also executed with a SAS SMR drives to exercise the >>> error path. >>> >>> The patches are based on the 5.18/scsi-staging tree. >> >> Hi Damien, >> >> jfyi, I still see the hang with this series. I don't think that the tag >> fixes were relevant unfortunately. > > As mentioned above, I did test with a SAS drive too (an SMR one to > heavily test the error path) and it worked perfectly. > Note that using Martin's rc1 based scsi-staging tree, I did see a lot of > KASAN complaints on boot regarding MSI/PCI setup. These warnings are > gone with rc3/4. What kernel version base are you using ? I was using mkp 5.18 staging tree from yesterday as baseline: https://github.com/hisilicon/kernel-dev/commits/private-topic-sas-5.17-damien-pm8001-v3 If I start using rc4 then I would get conflicts (which obviously would be resolvable easily enough), but I doubt it is an issue (in using rc1). > > I could not find the ARM board I have in the lab yesterday. Will try > again to find it and test with it. That would be brilliant if you could. This problem could even be down to defconfig differences (between arm64 and x86_64). Thanks, John