Message ID | 20230612181038.14421-1-dave@stgolabs.net |
---|---|
Headers | show |
Series | cxl: Support device sanitation | expand |
Hi Davidlohr, > Testing. > ======== > > o There are the mock device tests for Sanitize and Secure Erase. > > o The latest (v2) qemu bg/sanitize support series is posted here: > https://lore.kernel.org/linux-cxl/20230418172337.19207-1-dave@stgolabs.net/ That doesn't seem to have support for reading back the security state so the stuff this set adds fails before it gets going. Am I missing another series? Jonathan
On Tue, 13 Jun 2023 16:26:11 +0100 Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote: > Hi Davidlohr, > > > Testing. > > ======== > > > > o There are the mock device tests for Sanitize and Secure Erase. > > > > o The latest (v2) qemu bg/sanitize support series is posted here: > > https://lore.kernel.org/linux-cxl/20230418172337.19207-1-dave@stgolabs.net/ > > That doesn't seem to have support for reading back the security state so > the stuff this set adds fails before it gets going. > Am I missing another series? > I hacked in enough to get things to carry on... Following might be something I've broken locally but on basis it might not... Note my QEMU is odd right now as I'm in middle of a big refactor of the CCI handling, but this should still not happen even if I happen to have broken QEMU side of things in some weird way. My base is more or less mainline + background set as queued on CXL tree plus this set. However similar traces to below happen when I poke a 1 into sanitize This appears to be: WARN_ON_ONCE(timer->function != delayed_work_timer_fn); then WARN_ON_ONCE(!list_empty(&work->entry)); triggering. ------------[ cut here ]------------ WARNING: CPU: 3 PID: 617 at kernel/workqueue.c:1663 __queue_delayed_work+0xb8/0xe8 Modules linked in: cxl_pmem cxl_mem cxl_port cxl_pmu cxl_acpi cxl_pci cxl_core CPU: 3 PID: 617 Comm: bash Not tainted 6.4.0-rc6+ #790 Hardware name: QEMU QEMU Virtual Machine, BIOS unknown unknown pstate: 014000c5 (nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : __queue_delayed_work+0xb8/0xe8 lr : queue_delayed_work_on+0x70/0x98 sp : ffff800008a23b10 x29: ffff800008a23b10 x28: ffff0000f6de5880 x27: ffff0000c14d9c80 x26: ffff8000085510a8 x25: ffff0000c0a490d0 x24: ffff0000c14d9cd8 x23: ffff800008a23da8 x22: fffffffffffffff2 x21: 0000000000000000 x20: 0000000000000001 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: ffffba73a9885008 x15: 0000aaaad5a0b6b0 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : ffffba73a9885078 x8 : ffff800008a23c98 x7 : 0000000000000000 x6 : 0000000000000000 x5 : ffffba73a9884ea8 x4 : 0000000000000100 x3 : 00000000000000fa x2 : ffff0000c14d9e98 x1 : ffff0000c0028600 x0 : ffff0000c14d9eb8 Call trace: __queue_delayed_work+0xb8/0xe8 queue_delayed_work_on+0x70/0x98 cxl_pci_mbox_send+0x404/0x580 [cxl_pci] cxl_internal_send_cmd+0x48/0x110 [cxl_core] cxl_mem_sanitize+0xbc/0x140 [cxl_core] security_sanitize_store+0x98/0xf0 [cxl_core] dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 kernfs_fop_write_iter+0x128/0x200 vfs_write+0x1ac/0x2e8 ksys_write+0x74/0x110 __arm64_sys_write+0x24/0x38 invoke_syscall.constprop.0+0x58/0xf8 do_el0_svc+0x60/0x168 el0_svc+0x38/0xf0 el0t_64_sync_handler+0xf4/0x120 el0t_64_sync+0x190/0x198 ---[ end trace 0000000000000000 ]---
On Tue, 13 Jun 2023, Jonathan Cameron wrote: >On Tue, 13 Jun 2023 16:26:11 +0100 >Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote: > >> Hi Davidlohr, >> >> > Testing. >> > ======== >> > >> > o There are the mock device tests for Sanitize and Secure Erase. >> > >> > o The latest (v2) qemu bg/sanitize support series is posted here: >> > https://lore.kernel.org/linux-cxl/20230418172337.19207-1-dave@stgolabs.net/ >> >> That doesn't seem to have support for reading back the security state so >> the stuff this set adds fails before it gets going. >> Am I missing another series? Correct, when testing I simply comment out the security state call in cxl_mem_sanitize(). >> >I hacked in enough to get things to carry on... > >Following might be something I've broken locally but on basis >it might not... Note my QEMU is odd right now as I'm in middle >of a big refactor of the CCI handling, but this should still not happen >even if I happen to have broken QEMU side of things in some weird way. > >My base is more or less mainline + background set as queued on CXL >tree plus this set. > >However similar traces to below happen when I poke a 1 into >sanitize > > This appears to be: > > WARN_ON_ONCE(timer->function != delayed_work_timer_fn); >then > WARN_ON_ONCE(!list_empty(&work->entry)); > >triggering. > Oh apologies, my bad. I actually should have tested this last iteration :/ But with the below I replicated the testing without anything falling out (also with polling). I'll wait a bit to see if Dan has any other comments regarding the series before sending in another iteration - otherwise maybe this could be folded in to avoid more email overhead? Thanks, Davidlohr ----8<------- diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index d1df23c19245..008e1c267ce2 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -288,12 +288,12 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds, int i, timeout; /* -+ * Sanitation is a special case which monopolizes the device + * Sanitation is a special case which monopolizes the device * and cannot be timesliced. Handle asynchronously instead, * and allow userspace to poll(2) for completion. */ if (mbox_cmd->opcode == CXL_MBOX_OP_SANITIZE) { - if (cxlds->security.poll_tmo_secs != -1) { + if (cxlds->security.poll) { /* hold the device throughout */ get_device(cxlds->dev);
Davidlohr Bueso wrote: > Hi, > > Changes from v5 (https://lore.kernel.org/linux-cxl/20230526033344.17167-1-dave@stgolabs.net/): > o Added patch 1 which fixes bogus irq handled scenarios when it's not our interrupt. > This should be picked up regardless of the rest of the series (Jonathan) > o Added security.poll boolean instead of using the timeout member (Dave, Jonathan). > o Do not explicitly init security.state (Dave). > o Misc cleanups (Jonathan). > o Updated changelog in patch 4. > o Picked up tags. > > This adds the sanitation part of the background command handling. Some noteworthy items: > > o Treating Sanitation as such a special beast can make the code a bit invasive, > but couldn't find a decent alternative. For example I realize that this is really > ad-hoc code in __cxl_pci_mbox_send_cmd(). A lot of this also comes from the fact > that polling for sanitize is supported, so sw still needs to keep up and serialize. > > o Nothing depends explicitly on CPU cacheline management > > o All sysfs files/attributes in the security directory are visible. > > o Continue to use __ATTR() macros for sysfs attributes instead of the requested > DEVICE_ATTR_*() ones because of the naming the security directory, otherwise > names don't match. > > Patch 1 fixes mbox isr. > Patch 2: adds a new security/state file. > Patch 3 paves the required sanitation handling code before actually using it. > Patch 4,5 wires up sanitation + unit test. > Patch 6,7 wires up secure erase + unit test. > > Testing. > ======== > > o There are the mock device tests for Sanitize and Secure Erase. > > o The latest (v2) qemu bg/sanitize support series is posted here: > https://lore.kernel.org/linux-cxl/20230418172337.19207-1-dave@stgolabs.net/ > > (1) Window where driver is out of sync with hw (Sanitation async polling). > > [root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize > [ 159.297482] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:37:00.0: Sending command: 0x4400 > [ 159.298648] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:37:00.0: Doorbell wait took 0ms > [ 159.299908] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:37:00.0: Sanitation operation started > >>>> qemu informs sanitation is done <<<<< > [root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize > [ 165.897345] cxl_pci 0000:37:00.0: Failed to sanitize device : -16 > [ 171.692050] cxl_pci:cxl_mbox_sanitize_work:147: cxl_pci 0000:37:00.0: Sanitation operation ended > [root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize > [ 173.373337] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:37:00.0: Sending command: 0x4400 > [ 173.374498] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:37:00.0: Doorbell wait took 0ms > [ 173.375727] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:37:00.0: Sanitation operation started > > (2) Perform sanitation of more than one memdev at a time (Sanitation async polling). > > [root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize > [ 351.287129] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400 > [ 351.288403] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms > [ 351.289706] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:36:00.0: Sanitation operation started > [root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize > [ 353.058614] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:37:00.0: Sending command: 0x4400 > [ 353.059854] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:37:00.0: Doorbell wait took 0ms > [ 353.061126] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:37:00.0: Sanitation operation started > >>>> qemu informs sanitation is done <<<<< > >>>> qemu informs sanitation is done <<<<< > [ 363.692138] cxl_pci:cxl_mbox_sanitize_work:147: cxl_pci 0000:36:00.0: Sanitation operation ended > [ 365.227416] cxl_pci:cxl_mbox_sanitize_work:147: cxl_pci 0000:37:00.0: Sanitation operation ended > > (3) Perform sanitation of more than one memdev at a time (Sanitation async irq). > > [root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem1/security/sanitize > [ 193.729821] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:c1:00.0: Sending command: 0x4400 > [ 193.731071] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:c1:00.0: Doorbell wait took 0ms > [ 193.732360] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:c1:00.0: Sanitation operation started > [root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize > [ 197.001466] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400 > [ 197.002694] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms > [ 197.003956] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:36:00.0: Sanitation operation started > >>>> qemu says sanitation is done <<<< > [ 197.731473] cxl_pci:cxl_pci_mbox_irq:119: cxl_pci 0000:c1:00.0: Sanitation operation ended > >>>> qemu says sanitation is done <<<< > [ 201.003258] cxl_pci:cxl_pci_mbox_irq:119: cxl_pci 0000:36:00.0: Sanitation operation ended > > (4) Forbid new sanitation while one is in progress (Sanitation async irq). > > [root@fedora ~]# cat /sys/bus/cxl/devices/mem0/security/state > disabled > [root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize > [ 39.284258] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400 > [ 39.285459] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms > [ 39.286723] cxl_pci:__cxl_pci_mbox_send_cmd:295: cxl_pci 0000:36:00.0: Sanitation operation started > [root@fedora ~]# cat /sys/bus/cxl/devices/mem0/security/state > sanitize > [root@fedora ~]# echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize > [ 42.697129] cxl_pci:__cxl_pci_mbox_send_cmd:243: cxl_pci 0000:36:00.0: Sending command: 0x4400 > [ 42.698323] cxl_pci:cxl_pci_mbox_wait_for_doorbell:73: cxl_pci 0000:36:00.0: Doorbell wait took 0ms > [ 42.699525] cxl_pci:__cxl_pci_mbox_send_cmd:335: cxl_pci 0000:36:00.0: Mailbox operation had an error: ongoing background operation > [ 42.701119] cxl_pci 0000:36:00.0: Failed to sanitize device : -6 > >>>> qemu says sanitation is done <<<< > [ 43.285334] cxl_pci:cxl_pci_mbox_irq:119: cxl_pci 0000:36:00.0: Sanitation operation ended > > > Applies against 'for-6.5/cxl-background' from cxl.git. > > Please consider for v6.5. This looks good to me after s/([sS])anitation/\1anitization/ throughout and some othe minor fixups that I mentioned. Merged for v6.5.
On Sun, 25 Jun 2023, Dan Williams wrote: >This looks good to me after s/([sS])anitation/\1anitization/ throughout >and some othe minor fixups that I mentioned. > >Merged for v6.5. fyi per https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/commit/?h=pending&id=0c36b6ad436a38b167af16e6c690c890b8b2df62 this seems to be missing this fix: https://lore.kernel.org/linux-cxl/ispowzuk5fg2u7zbpenocvyihh4t3kywtszcbg245unypzduex@jenknfy62rjl/ Thanks, Davidlohr
Davidlohr Bueso wrote: > On Sun, 25 Jun 2023, Dan Williams wrote: > > >This looks good to me after s/([sS])anitation/\1anitization/ throughout > >and some othe minor fixups that I mentioned. > > > >Merged for v6.5. > > fyi per https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/commit/?h=pending&id=0c36b6ad436a38b167af16e6c690c890b8b2df62 > this seems to be missing this fix: > > https://lore.kernel.org/linux-cxl/ispowzuk5fg2u7zbpenocvyihh4t3kywtszcbg245unypzduex@jenknfy62rjl/ Oh, I missed that. Can you send it as a follow-on? Also make sure you have "core.abbrev = 12" in your git config if you include a Fixes: tag.