[v6,00/12] dmaengine: qcom: bam_dma: add cmd descriptor support

Message ID 20250115103004.3350561-1-quic_mdalam@quicinc.com (mailing list archive)

Md Sadre Alam Jan. 15, 2025, 10:29 a.m. UTC
Requirements:
  In the QCE crypto driver, the crypto engine registers are accessed
  directly via CPU reads and writes. If Trust Zone performs crypto
  operations at the same time, the two accesses race against each other,
  which can result in undefined behavior.

  To avoid this, we need to use the BAM HW LOCK/UNLOCK feature on BAM
  pipes. LOCK/UNLOCK is set by sending a command descriptor: the HLOS/TZ
  QCE crypto driver prepares a command descriptor with a dummy write to
  one of the QCE crypto engine registers and passes the LOCK/UNLOCK flag
  along with it.

  This feature was tested with tcrypt.ko and "libkcapi" with all the AES
  algorithms supported by the QCE crypto engine, on the IPQ9574 and
  qcm6490.LE chipsets.

  insmod tcrypt.ko mode=101
  insmod tcrypt.ko mode=102
  insmod tcrypt.ko mode=155
  insmod tcrypt.ko mode=180
  insmod tcrypt.ko mode=181
  insmod tcrypt.ko mode=182
  insmod tcrypt.ko mode=185
  insmod tcrypt.ko mode=186
  insmod tcrypt.ko mode=212
  insmod tcrypt.ko mode=216
  insmod tcrypt.ko mode=403
  insmod tcrypt.ko mode=404
  insmod tcrypt.ko mode=500
  insmod tcrypt.ko mode=501
  insmod tcrypt.ko mode=502
  insmod tcrypt.ko mode=600
  insmod tcrypt.ko mode=601
  insmod tcrypt.ko mode=602

  Encryption command line:
  $ ./kcapi -x 1 -e -c "cbc(aes)" \
      -k 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 \
      -i 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
  8b19050f66582cb7f7e4b6c873819b71

  Decryption command line:
  $ ./kcapi -x 1 -c "cbc(aes)" \
      -k 3023b2418ea59a841757dcf07881b3a8def1c97b659a4dad \
      -i 95aa5b68130be6fcf5cabe7d9f898a41 -q c313c6b50145b69a77b33404cb422598
  836de0065f9d6f6a3dd2c53cd17e33a

  $ ./kcapi -x 3 -c sha256 -p 38f86d
  cc42f645c5aa76ac3154b023359b665375fc3ae42f025fe961fb0f65205ad70e
  $ ./kcapi -x 3 -c sha256 -p bbb300ac5eda9d
  61f7b48577a613fbdfe0d6d90b49985e07a42c99e7a439b6efb76d5ec71b3d30

  $ ./kcapi -x 12 -c "hmac(sha256)" \
      -k 0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b \
      -i 000102030405060708090a0b0c -p f0f1f2f3f4f5f6f7f8f9 -b 42
  3cb25f25faacd57a90434f64d0362f2a2d2d0a90cf1a5a4c5db02d56ecc4c5bf34007208d5b887185865
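
  The inputs and output of the last kcapi invocation above match test case 1
  of RFC 5869 (HKDF with SHA-256), so the result can be cross-checked in
  software, independently of the crypto engine. The sketch below is only a
  reference check under that assumption; the helper name hkdf_sha256() is
  made up for illustration and is not part of libkcapi or of this series.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract, then expand."""
    # Extract: PRK = HMAC(salt, IKM)
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) | info | i), concatenated and truncated
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Values taken from the kcapi command line above (-k, -i, -p, -b).
IKM = bytes.fromhex("0b" * 22)  # 22 octets of 0x0b, as in RFC 5869 test case 1
SALT = bytes.fromhex("000102030405060708090a0b0c")
INFO = bytes.fromhex("f0f1f2f3f4f5f6f7f8f9")
EXPECTED = ("3cb25f25faacd57a90434f64d0362f2a2d2d0a90cf1a5a4c5db02d56ecc4c5bf"
            "34007208d5b887185865")

okm = hkdf_sha256(IKM, SALT, INFO, 42)
print("match" if okm.hex() == EXPECTED else "MISMATCH", okm.hex())
```

  If the hardware path returns a different value for the same inputs, the
  problem is likely in the driver or DMA path rather than in the test vector.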

 Parallel test with two different EEs (Execution Environments)

 EE1 (Trust Zone)                    EE2 (HLOS)

 A TZ application continuously       "libkcapi" or "tcrypt.ko" runs in a
 runs enc/dec with the different     continuous loop doing enc/dec with
 AES algorithms supported by the     the different algorithms supported
 QCE crypto engine.                  by the QCE crypto engine.

 1) Dummy write with LOCK bit set    1) Dummy write with LOCK bit set
 2) BAM locks all pipes that do      2) BAM locks all pipes that do not
    not belong to the current EE,       belong to the current EE, i.e.
    i.e. the HLOS pipe, and keeps       the TZ pipe, and keeps handling
    handling the current pipe only      the current pipe only
 3) TZ prepares a data descriptor    3) HLOS prepares a data descriptor
    and submits it to CE5               and submits it to CE5
 4) Dummy write with UNLOCK bit      4) Dummy write with UNLOCK bit set
    set
 5) BAM releases all the locked      5) BAM releases all the locked
    pipes                               pipes

 Upon encountering a descriptor with the Lock bit set, the BAM locks all
 other pipes not related to the current pipe group and keeps handling the
 current pipe only, until it sees the Un-Lock bit set (it then releases all
 locked pipes). The actual locking applies to fetching new descriptors
 for publishing, i.e. a locked pipe will not fetch new descriptors even if
 it receives events adding more descriptors for this pipe.
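
 The locking behavior described above can be illustrated with a small
 software model. This is a toy sketch, not driver code: the names Pipe,
 ToyBam, submit() and simulate() are hypothetical, the round-robin fetch
 order is an assumption made only so that interleaving is visible, and a
 "pipe group" is modeled simply as the owning EE.

```python
from dataclasses import dataclass, field


@dataclass
class Pipe:
    name: str
    ee: str                                    # owning execution environment
    queue: list = field(default_factory=list)  # pending (desc, lock, unlock)


class ToyBam:
    """Toy arbiter: round-robin descriptor fetch honoring LOCK/UNLOCK."""

    def __init__(self, pipes):
        self.pipes = pipes
        self.lock_owner = None  # EE whose pipe group currently holds the lock
        self.next_idx = 0       # round-robin start position (assumption)
        self.processed = []     # (ee, desc) in fetch order

    def submit(self, pipe, desc, lock=False, unlock=False):
        pipe.queue.append((desc, lock, unlock))

    def step(self):
        """Fetch one descriptor; return False if nothing can be fetched."""
        n = len(self.pipes)
        for i in range(n):
            idx = (self.next_idx + i) % n
            pipe = self.pipes[idx]
            if not pipe.queue:
                continue
            if self.lock_owner is not None and pipe.ee != self.lock_owner:
                continue  # locked pipe: does not fetch new descriptors
            desc, lock, unlock = pipe.queue.pop(0)
            if lock:
                self.lock_owner = pipe.ee
            self.processed.append((pipe.ee, desc))
            if unlock:
                self.lock_owner = None  # release all locked pipes
            self.next_idx = (idx + 1) % n
            return True
        return False

    def run(self):
        while self.step():
            pass
        return self.processed


def simulate(use_lock):
    """Queue dummy-write / data / dummy-write on a TZ and an HLOS pipe."""
    tz, hlos = Pipe("tz-pipe", "TZ"), Pipe("hlos-pipe", "HLOS")
    bam = ToyBam([tz, hlos])
    for pipe in (tz, hlos):
        bam.submit(pipe, "dummy-write", lock=use_lock)
        bam.submit(pipe, "data")
        bam.submit(pipe, "dummy-write", unlock=use_lock)
    return [ee for ee, _ in bam.run()]


print("with LOCK/UNLOCK:   ", simulate(use_lock=True))
print("without LOCK/UNLOCK:", simulate(use_lock=False))
```

 With the flags set, each EE's three descriptors are fetched back to back;
 without them, the model interleaves descriptors from the two EEs, which
 is the situation the LOCK/UNLOCK feature is meant to prevent.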

v6:
 * Changed "BAM" to "DMA"
 * Ensured this series compiles against the current linux-next tip of
   the tree (TOT).

v5:
 * Added DMA_PREP_LOCK and DMA_PREP_UNLOCK flag support in separate patch
 * Removed DMA_PREP_LOCK & DMA_PREP_UNLOCK flag
 * Added FIELD_GET and GENMASK macro to extract major and minor version

v4:
  * Added feature description and test hardware
    with test command
  * Fixed patch version numbering
  * Dropped dt-binding patch
  * Dropped device tree changes
  * Added BAM_SW_VERSION register read
  * Handled the error path for the api dma_map_resource()
    in probe
  * Updated the commit messages for better readability
  * Squashed the change where the qce_bam_acquire_lock() and
    qce_bam_release_lock() APIs were introduced into the change where
    the lock/unlock flags were introduced
  * Changed cover letter subject heading to
    "dmaengine: qcom: bam_dma: add cmd descriptor support"
  * Added a link to the very first post of the BAM lock/unlock patch
    as v1 to track this feature

v3:
  * https://lore.kernel.org/lkml/183d4f5e-e00a-8ef6-a589-f5704bc83d4a@quicinc.com/
  * Addressed all the comments from v2
  * Added the dt-binding
  * Fix alignment issue
  * Removed type casting from qce_write_reg_dma()
    and qce_read_reg_dma()
  * Removed qce_bam_txn = dma->qce_bam_txn; line from
    qce_alloc_bam_txn() api and directly returning
    dma->qce_bam_txn

v2:
  * https://lore.kernel.org/lkml/20231214114239.2635325-1-quic_mdalam@quicinc.com/
  * Initial set of patches for cmd descriptor support
  * Add client driver to use BAM lock/unlock feature
  * Added register read/write via BAM in QCE Crypto driver
    to use BAM lock/unlock feature

v1:
  * https://lore.kernel.org/all/1608215842-15381-1-git-send-email-mdalam@codeaurora.org/
  * Initial support for LOCK/UNLOCK in bam_dma driver


Md Sadre Alam (12):
  dmaengine: qcom: bam_dma: Add bam_sw_version register read
  dmaengine: add DMA_PREP_LOCK and DMA_PREP_UNLOCK flag
  dmaengine: qcom: bam_dma: add bam_pipe_lock flag support
  crypto: qce - Add support for crypto address read
  crypto: qce - Add bam dma support for crypto register r/w
  crypto: qce - Convert register r/w for skcipher via BAM/DMA
  crypto: qce - Convert register r/w for sha via BAM/DMA
  crypto: qce - Convert register r/w for aead via BAM/DMA
  crypto: qce - Add LOCK and UNLOCK flag support
  crypto: qce - Add support for lock/unlock in skcipher
  crypto: qce - Add support for lock/unlock in sha
  crypto: qce - Add support for lock/unlock in aead

 .../driver-api/dmaengine/provider.rst         |  15 ++
 drivers/crypto/qce/aead.c                     |   4 +
 drivers/crypto/qce/common.c                   | 141 +++++++----
 drivers/crypto/qce/core.c                     |  16 +-
 drivers/crypto/qce/core.h                     |  12 +
 drivers/crypto/qce/dma.c                      | 231 ++++++++++++++++++
 drivers/crypto/qce/dma.h                      |  26 ++
 drivers/crypto/qce/sha.c                      |   4 +
 drivers/crypto/qce/skcipher.c                 |   4 +
 drivers/dma/qcom/bam_dma.c                    |  29 ++-
 include/linux/dmaengine.h                     |   6 +
 11 files changed, 444 insertions(+), 44 deletions(-)

Bartosz Golaszewski Feb. 20, 2025, 2:27 p.m. UTC | #1
On Wed, 15 Jan 2025 11:29:52 +0100, Md Sadre Alam
<quic_mdalam@quicinc.com> said:
> Requirements:
>   In QCE crypto driver we are accessing the crypto engine registers
>   directly via CPU read/write. Trust Zone could possibly to perform some
>   crypto operations simultaneously, a race condition will be created and
>   this could result in undefined behavior.
>
>   To avoid this behavior we need to use BAM HW LOCK/UNLOCK feature on BAM
>   pipes, and this LOCK/UNLOCK will be set via sending a command descriptor,
>   where the HLOS/TZ QCE crypto driver prepares a command descriptor with a
>   dummy write operation on one of the QCE crypto engine register and pass
>   the LOCK/UNLOCK flag along with it.
>

On rb3gen2 I'm seeing the following when running cryptsetup benchmark:

# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      1452321 iterations per second for 256-bit key
PBKDF2-sha256    2641249 iterations per second for 256-bit key
PBKDF2-sha512    1278751 iterations per second for 256-bit key
PBKDF2-ripemd160  760940 iterations per second for 256-bit key
PBKDF2-whirlpool     N/A
argon2i       4 iterations, 1008918 memory, 4 parallel threads (CPUs)
for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 1048576 memory, 4 parallel threads (CPUs)
for 256-bit key (requested 2000 ms time)
[   43.558496] NET: Registered PF_ALG protocol family
[   43.570034] arm-smmu 15000000.iommu: Unhandled context fault:
fsr=0x402, iova=0xfffdf000, fsynr=0x7b0003, cbfrsynra=0x4e4, cb=0
[   43.582069] arm-smmu 15000000.iommu: FSR    = 00000402 [Format=2
TF], SID=0x4e4
[   43.592758] arm-smmu 15000000.iommu: FSYNR0 = 007b0003 [S1CBNDX=123 PLVL=3]
[   43.608107] Internal error: synchronous external abort:
0000000096000010 [#1] PREEMPT SMP
[   43.616509] Modules linked in: algif_skcipher af_alg bluetooth
ecdh_generic ecc ipv6 snd_soc_hdmi_codec phy_qcom_edp venus_dec
venus_enc videobuf2_dma_contig videobuf2_memops nb7vpq904m
lontium_lt9611uxc msm leds_qcom_lpg qcom_battmgr pmic_glink_altmode
aux_hpd_bridge ocmem qcom_pbs venus_core ucsi_glink drm_exec
typec_ucsi qcom_pon qcom_spmi_adc5 led_class_multicolor
qcom_spmi_temp_alarm rtc_pm8xxx gpu_sched v4l2_mem2mem ath11k_ahb
qcom_vadc_common nvmem_qcom_spmi_sdam drm_dp_aux_bus videobuf2_v4l2
qcom_stats dispcc_sc7280 drm_display_helper videodev ath11k
videobuf2_common coresight_stm drm_client_lib camcc_sc7280
videocc_sc7280 mac80211 mc i2c_qcom_geni phy_qcom_qmp_combo stm_core
coresight_replicator aux_bridge coresight_tmc coresight_funnel
llcc_qcom libarc4 gpi icc_bwmon typec phy_qcom_snps_femto_v2 coresight
qcrypto qcom_q6v5_pas authenc qcom_pil_info qcom_q6v5 gpucc_sc7280
ufs_qcom libdes qcom_sysmon qcom_common pinctrl_sc7280_lpass_lpi
qcom_glink_smem mdt_loader phy_qcom_qmp_ufs lpassaudiocc_sc7280
[   43.616763]  pinctrl_lpass_lpi cfg80211 phy_qcom_qmp_pcie
icc_osm_l3 rfkill qcom_rng qrtr nvmem_reboot_mode display_connector
socinfo drm_kms_helper pmic_glink pdr_interface qcom_pdr_msg
qmi_helpers drm backlight
[   43.727571] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted
6.14.0-rc3-next-20250220-00012-g2a8d60663e03-dirty #53
[   43.738291] Hardware name: Qualcomm Technologies, Inc. Robotics RB3gen2 (DT)
[   43.745535] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   43.752685] pc : bam_dma_irq+0x374/0x3b0
[   43.756736] lr : bam_dma_irq+0x2c8/0x3b0
[   43.760781] sp : ffff800080003e90
[   43.764200] x29: ffff800080003e90 x28: ffffd03eaae84880 x27: 000000009edf8000
[   43.771543] x26: ffff45a746642c80 x25: 0000000a24f8499b x24: ffff45a742dca080
[   43.778886] x23: ffff45a742df7600 x22: 000000000000006e x21: ffff8000811c3000
[   43.786228] x20: ffff45a742df7630 x19: ffff45a742eaab80 x18: 0000000000000001
[   43.793568] x17: ffff75698e7b4000 x16: ffff800080000000 x15: 0000000000000034
[   43.800902] x14: 0000000000000038 x13: 0000000000010008 x12: 071c71c71c71c71c
[   43.808244] x11: 0000000000000040 x10: ffff45a74000a230 x9 : ffff45a74000a228
[   43.815587] x8 : ffff45a7407a1dd0 x7 : 0000000000000000 x6 : 0000000000000000
[   43.822920] x5 : ffff45a7407a1da8 x4 : 0000000000000000 x3 : 0000000000000018
[   43.830253] x2 : ffff8000811c0000 x1 : ffff8000811c0018 x0 : 0000000000000002
[   43.837594] Call trace:
[   43.840115]  bam_dma_irq+0x374/0x3b0 (P)
[   43.844163]  __handle_irq_event_percpu+0x48/0x140
[   43.849006]  handle_irq_event+0x4c/0xb0
[   43.852961]  handle_fasteoi_irq+0xa0/0x1bc
[   43.857186]  handle_irq_desc+0x34/0x58
[   43.861054]  generic_handle_domain_irq+0x1c/0x28
[   43.865812]  gic_handle_irq+0x4c/0x120
[   43.869680]  call_on_irq_stack+0x24/0x64
[   43.873728]  do_interrupt_handler+0x80/0x84
[   43.878039]  el1_interrupt+0x34/0x68
[   43.881732]  el1h_64_irq_handler+0x18/0x24
[   43.885955]  el1h_64_irq+0x6c/0x70
[   43.889465]  cpuidle_enter_state+0xac/0x320 (P)
[   43.894133]  cpuidle_enter+0x38/0x50
[   43.897826]  do_idle+0x1e4/0x260
[   43.901151]  cpu_startup_entry+0x38/0x3c
[   43.905195]  rest_init+0xdc/0xe0
[   43.908531]  console_on_rootfs+0x0/0x6c
[   43.912490]  __primary_switched+0x88/0x90
[   43.916621] Code: b9409063 1b047c21 8b030021 8b010041 (b9000020)
[   43.922881] ---[ end trace 0000000000000000 ]---
[   43.927633] Kernel panic - not syncing: synchronous external abort:
Fatal exception in interrupt
[   43.936653] SMP: stopping secondary CPUs
[   43.941042] Kernel Offset: 0x503e28e00000 from 0xffff800080000000
[   43.947306] PHYS_OFFSET: 0xfff0ba59c0000000
[   43.951615] CPU features: 0x300,00000170,00801250,8200720b
[   43.957257] Memory Limit: none
[   43.960405] ---[ end Kernel panic - not syncing: synchronous
external abort: Fatal exception in interrupt ]---

Bartosz