mbox series

[V5,00/10] AMD XDNA driver

Message ID 20241021161931.3701754-1-lizhi.hou@amd.com (mailing list archive)
Headers show
Series AMD XDNA driver | expand

Message

Lizhi Hou Oct. 21, 2024, 4:19 p.m. UTC
This patchset introduces a new Linux Kernel Driver, amdxdna for AMD NPUs.
The driver is based on Linux accel subsystem.

NPU (Neural Processing Unit) is an AI inference accelerator integrated
into AMD client CPUs. NPU enables efficient execution of Machine Learning
applications like CNNs, LLMs, etc.  NPU is based on AMD XDNA
architecture [1].

AMD NPU consists of the following components:

  - Tiled array of AMD AI Engine processors.
  - Micro Controller which runs the NPU Firmware responsible for
    command processing, AIE array configuration, and execution management.
  - PCI EP for host control of the NPU device.
  - Interconnect for connecting the NPU components together.
  - SRAM for use by the NPU Firmware.
  - Address translation hardware for protected host memory access by the
    NPU.

NPU supports multiple concurrent fully isolated contexts. Concurrent
contexts may be bound to AI Engine array spatially and or temporarily.

The driver is licensed under GPL-2.0 except for UAPI header which is
licensed GPL-2.0 WITH Linux-syscall-note.

User mode driver stack consists of XRT [2] and AMD AIE Plugin for IREE [3].

The firmware for the NPU is distributed as a closed source binary, and has
already been pushed to the DRM firmware repository [4].

[1] https://www.amd.com/en/technologies/xdna.html
[2] https://github.com/Xilinx/XRT
[3] https://github.com/nod-ai/iree-amd-aie
[4] https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu

Changes since v4:
- Fix lockdep errors
- Use __u* structure for struct aie_error

Changes since v3:
- Remove debug BO patch
- Changes based on code review comments

Changes since v2:
- Add document amdnpu.rst
- Change AIE2_DEVM_SIZE to 64M due to firmware change
- Changes based on code review comments

Changes since v1:
- Remove some inline defines
- Minor changes based on code review comments

Lizhi Hou (10):
  accel/amdxdna: Add documentation for AMD NPU accelerator driver
  accel/amdxdna: Add a new driver for AMD AI Engine
  accel/amdxdna: Support hardware mailbox
  accel/amdxdna: Add hardware resource solver
  accel/amdxdna: Add hardware context
  accel/amdxdna: Add GEM buffer object management
  accel/amdxdna: Add command execution
  accel/amdxdna: Add suspend and resume
  accel/amdxdna: Add error handling
  accel/amdxdna: Add query functions

 Documentation/accel/amdxdna/amdnpu.rst        | 281 ++++++
 Documentation/accel/amdxdna/index.rst         |  11 +
 Documentation/accel/index.rst                 |   1 +
 MAINTAINERS                                   |  10 +
 drivers/accel/Kconfig                         |   1 +
 drivers/accel/Makefile                        |   1 +
 drivers/accel/amdxdna/Kconfig                 |  15 +
 drivers/accel/amdxdna/Makefile                |  21 +
 drivers/accel/amdxdna/TODO                    |   5 +
 drivers/accel/amdxdna/aie2_ctx.c              | 919 ++++++++++++++++++
 drivers/accel/amdxdna/aie2_error.c            | 360 +++++++
 drivers/accel/amdxdna/aie2_message.c          | 790 +++++++++++++++
 drivers/accel/amdxdna/aie2_msg_priv.h         | 370 +++++++
 drivers/accel/amdxdna/aie2_pci.c              | 766 +++++++++++++++
 drivers/accel/amdxdna/aie2_pci.h              | 255 +++++
 drivers/accel/amdxdna/aie2_psp.c              | 145 +++
 drivers/accel/amdxdna/aie2_smu.c              | 119 +++
 drivers/accel/amdxdna/aie2_solver.c           | 330 +++++++
 drivers/accel/amdxdna/aie2_solver.h           | 154 +++
 drivers/accel/amdxdna/amdxdna_ctx.c           | 607 ++++++++++++
 drivers/accel/amdxdna/amdxdna_ctx.h           | 160 +++
 drivers/accel/amdxdna/amdxdna_gem.c           | 631 ++++++++++++
 drivers/accel/amdxdna/amdxdna_gem.h           |  66 ++
 drivers/accel/amdxdna/amdxdna_mailbox.c       | 575 +++++++++++
 drivers/accel/amdxdna/amdxdna_mailbox.h       | 124 +++
 .../accel/amdxdna/amdxdna_mailbox_helper.c    |  61 ++
 .../accel/amdxdna/amdxdna_mailbox_helper.h    |  42 +
 drivers/accel/amdxdna/amdxdna_pci_drv.c       | 402 ++++++++
 drivers/accel/amdxdna/amdxdna_pci_drv.h       | 123 +++
 drivers/accel/amdxdna/amdxdna_sysfs.c         |  67 ++
 drivers/accel/amdxdna/npu1_regs.c             | 101 ++
 drivers/accel/amdxdna/npu2_regs.c             | 118 +++
 drivers/accel/amdxdna/npu4_regs.c             | 118 +++
 drivers/accel/amdxdna/npu5_regs.c             | 118 +++
 include/trace/events/amdxdna.h                | 101 ++
 include/uapi/drm/amdxdna_accel.h              | 457 +++++++++
 36 files changed, 8425 insertions(+)
 create mode 100644 Documentation/accel/amdxdna/amdnpu.rst
 create mode 100644 Documentation/accel/amdxdna/index.rst
 create mode 100644 drivers/accel/amdxdna/Kconfig
 create mode 100644 drivers/accel/amdxdna/Makefile
 create mode 100644 drivers/accel/amdxdna/TODO
 create mode 100644 drivers/accel/amdxdna/aie2_ctx.c
 create mode 100644 drivers/accel/amdxdna/aie2_error.c
 create mode 100644 drivers/accel/amdxdna/aie2_message.c
 create mode 100644 drivers/accel/amdxdna/aie2_msg_priv.h
 create mode 100644 drivers/accel/amdxdna/aie2_pci.c
 create mode 100644 drivers/accel/amdxdna/aie2_pci.h
 create mode 100644 drivers/accel/amdxdna/aie2_psp.c
 create mode 100644 drivers/accel/amdxdna/aie2_smu.c
 create mode 100644 drivers/accel/amdxdna/aie2_solver.c
 create mode 100644 drivers/accel/amdxdna/aie2_solver.h
 create mode 100644 drivers/accel/amdxdna/amdxdna_ctx.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_ctx.h
 create mode 100644 drivers/accel/amdxdna/amdxdna_gem.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_gem.h
 create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox.h
 create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox_helper.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox_helper.h
 create mode 100644 drivers/accel/amdxdna/amdxdna_pci_drv.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_pci_drv.h
 create mode 100644 drivers/accel/amdxdna/amdxdna_sysfs.c
 create mode 100644 drivers/accel/amdxdna/npu1_regs.c
 create mode 100644 drivers/accel/amdxdna/npu2_regs.c
 create mode 100644 drivers/accel/amdxdna/npu4_regs.c
 create mode 100644 drivers/accel/amdxdna/npu5_regs.c
 create mode 100644 include/trace/events/amdxdna.h
 create mode 100644 include/uapi/drm/amdxdna_accel.h

Comments

Jeffrey Hugo Oct. 25, 2024, 5:55 p.m. UTC | #1
On 10/21/2024 10:19 AM, Lizhi Hou wrote:
> This patchset introduces a new Linux Kernel Driver, amdxdna for AMD NPUs.
> The driver is based on Linux accel subsystem.
> 
> NPU (Neural Processing Unit) is an AI inference accelerator integrated
> into AMD client CPUs. NPU enables efficient execution of Machine Learning
> applications like CNNs, LLMs, etc.  NPU is based on AMD XDNA
> architecture [1].
> 
> AMD NPU consists of the following components:
> 
>    - Tiled array of AMD AI Engine processors.
>    - Micro Controller which runs the NPU Firmware responsible for
>      command processing, AIE array configuration, and execution management.
>    - PCI EP for host control of the NPU device.
>    - Interconnect for connecting the NPU components together.
>    - SRAM for use by the NPU Firmware.
>    - Address translation hardware for protected host memory access by the
>      NPU.
> 
> NPU supports multiple concurrent fully isolated contexts. Concurrent
> contexts may be bound to AI Engine array spatially and or temporarily.
> 
> The driver is licensed under GPL-2.0 except for UAPI header which is
> licensed GPL-2.0 WITH Linux-syscall-note.
> 
> User mode driver stack consists of XRT [2] and AMD AIE Plugin for IREE [3].
> 
> The firmware for the NPU is distributed as a closed source binary, and has
> already been pushed to the DRM firmware repository [4].
> 
> [1]https://www.amd.com/en/technologies/xdna.html
> [2]https://github.com/Xilinx/XRT
> [3]https://github.com/nod-ai/iree-amd-aie
> [4]https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu
> 
> Changes since v4:
> - Fix lockdep errors
> - Use __u* structure for struct aie_error

One nit, when you send the next version would you please either To: or 
Cc: me on the entire series?  I only get pieces in my inbox which is 
mildly annoying on my end.

Looks like we are getting close here.  One procedural question I have, 
do you have commit permissions to drm-misc?

I applied the series to drm-misc-next and tried to build.  Got the 
following errors -

   CC [M]  drivers/accel/amdxdna/aie2_ctx.o
   CC [M]  drivers/accel/amdxdna/aie2_error.o
   CC [M]  drivers/accel/amdxdna/aie2_message.o
   CC [M]  drivers/accel/amdxdna/aie2_pci.o
   CC [M]  drivers/accel/amdxdna/aie2_psp.o
   CC [M]  drivers/accel/amdxdna/aie2_smu.o
   CC [M]  drivers/accel/amdxdna/aie2_solver.o
   CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
   CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
   CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
   CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
   CC [M]  drivers/accel/amdxdna/npu1_regs.o
   CC [M]  drivers/accel/amdxdna/npu2_regs.o
   CC [M]  drivers/accel/amdxdna/npu4_regs.o
   CC [M]  drivers/accel/amdxdna/npu5_regs.o
   AR      drivers/base/firmware_loader/built-in.a
   AR      drivers/base/built-in.a
In file included from drivers/accel/amdxdna/aie2_message.c:19:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit declaration 
of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
       |                ^~~~~~~~~
In file included from drivers/accel/amdxdna/amdxdna_gem.c:15:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit declaration 
of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
       |                ^~~~~~~~~
In file included from drivers/accel/amdxdna/aie2_psp.c:11:
drivers/accel/amdxdna/aie2_psp.c: In function ‘psp_exec’:
drivers/accel/amdxdna/aie2_psp.c:62:34: error: implicit declaration of 
function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
    62 |                                  FIELD_GET(PSP_STATUS_READY, 
ready),
       |                                  ^~~~~~~~~
./include/linux/iopoll.h:47:21: note: in definition of macro 
‘read_poll_timeout’
    47 |                 if (cond) \
       |                     ^~~~
drivers/accel/amdxdna/aie2_psp.c:61:15: note: in expansion of macro 
‘readx_poll_timeout’
    61 |         ret = readx_poll_timeout(readl, PSP_REG(psp, 
PSP_STATUS_REG), ready,
       |               ^~~~~~~~~~~~~~~~~~
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit declaration 
of function ‘FIELD_PREP’ [-Werror=implicit-function-declaration]
   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
       |                        ^~~~~~~~~~
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit declaration 
of function ‘FIELD_PREP’ [-Werror=implicit-function-declaration]
   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
       |                        ^~~~~~~~~~
In file included from drivers/accel/amdxdna/aie2_pci.c:22:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit declaration 
of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
       |                ^~~~~~~~~
In file included from drivers/accel/amdxdna/aie2_ctx.c:18:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit declaration 
of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
       |                ^~~~~~~~~
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit declaration 
of function ‘FIELD_PREP’ [-Werror=implicit-function-declaration]
   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
       |                        ^~~~~~~~~~
In file included from drivers/accel/amdxdna/amdxdna_ctx.c:16:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit declaration 
of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
       |                ^~~~~~~~~
cc1: all warnings being treated as errors
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit declaration 
of function ‘FIELD_PREP’ [-Werror=implicit-function-declaration]
   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
       |                        ^~~~~~~~~~
drivers/accel/amdxdna/aie2_ctx.c: In function ‘aie2_hwctx_restart’:
drivers/accel/amdxdna/aie2_ctx.c:114:9: error: too few arguments to 
function ‘drm_sched_start’
   114 |         drm_sched_start(&hwctx->priv->sched);
       |         ^~~~~~~~~~~~~~~
In file included from ./include/trace/events/amdxdna.h:12,
                  from drivers/accel/amdxdna/aie2_ctx.c:13:
./include/drm/gpu_scheduler.h:593:6: note: declared here
   593 | void drm_sched_start(struct drm_gpu_scheduler *sched, int errno);
       |      ^~~~~~~~~~~~~~~
make[5]: *** [scripts/Makefile.build:229: 
drivers/accel/amdxdna/aie2_psp.o] Error 1
make[5]: *** Waiting for unfinished jobs....
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit declaration 
of function ‘FIELD_PREP’ [-Werror=implicit-function-declaration]
   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
       |                        ^~~~~~~~~~
In file included from drivers/accel/amdxdna/amdxdna_pci_drv.c:18:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit declaration 
of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
       |                ^~~~~~~~~
cc1: all warnings being treated as errors
make[5]: *** [scripts/Makefile.build:229: 
drivers/accel/amdxdna/aie2_ctx.o] Error 1
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit declaration 
of function ‘FIELD_PREP’ [-Werror=implicit-function-declaration]
   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
       |                        ^~~~~~~~~~
drivers/accel/amdxdna/amdxdna_mailbox.c: In function 
‘xdna_mailbox_send_msg’:
drivers/accel/amdxdna/amdxdna_mailbox.c:444:26: error: implicit 
declaration of function ‘FIELD_PREP’ [-Werror=implicit-function-declaration]
   444 |         header->sz_ver = FIELD_PREP(MSG_BODY_SZ, msg->send_size) |
       |                          ^~~~~~~~~~


You also have the following checkpatch issues -


WARNING: 'Disalbe' may be misspelled - perhaps 'Disable'?
#1646: FILE: drivers/accel/amdxdna/amdxdna_mailbox.c:553:
+       /* Disalbe an irq and wait. This might sleep. */
            ^^^^^^^

WARNING: 'splite' may be misspelled - perhaps 'split'?
#1695: FILE: drivers/accel/amdxdna/amdxdna_mailbox.h:21:
+ * The mailbox will splite the sending data in to multiple firmware 
message if
                      ^^^^^^

WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
#1875: FILE: drivers/accel/amdxdna/amdxdna_mailbox_helper.h:9:
+#define TX_TIMEOUT 2000 /* miliseconds */
                             ^^^^^^^^^^^

WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
#1876: FILE: drivers/accel/amdxdna/amdxdna_mailbox_helper.h:10:
+#define RX_TIMEOUT 5000 /* miliseconds */
                             ^^^^^^^^^^^

total: 0 errors, 4 warnings, 0 checks, 1916 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
       mechanically convert to the typical style using --fix or 
--fix-inplace.

0003-accel-amdxdna-Support-hardware-mailbox.patch has style problems, 
please review.



0007-accel-amdxdna-Add-command-execution.patch
----------------------------------------------
WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
#59: FILE: drivers/accel/amdxdna/aie2_ctx.c:27:
+#define HWCTX_MAX_TIMEOUT      60000 /* miliseconds */
                                          ^^^^^^^^^^^

WARNING: 'reverve' may be misspelled - perhaps 'reserve'?
#612: FILE: drivers/accel/amdxdna/aie2_ctx.c:779:
+               XDNA_WARN(xdna, "Failed to reverve fence, ret %d", ret);
                                            ^^^^^^^

WARNING: 'Exectuion' may be misspelled - perhaps 'Execution'?
#1899: FILE: drivers/accel/amdxdna/amdxdna_pci_drv.c:139:
+       /* Exectuion */
            ^^^^^^^^^

WARNING: 'exectuion' may be misspelled - perhaps 'execution'?
#2113: FILE: include/uapi/drm/amdxdna_accel.h:239:
+ * struct amdxdna_drm_wait_cmd - Wait exectuion command.
                                        ^^^^^^^^^

total: 0 errors, 10 warnings, 0 checks, 1983 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
       mechanically convert to the typical style using --fix or 
--fix-inplace.

0007-accel-amdxdna-Add-command-execution.patch has style problems, 
please review.


0008-accel-amdxdna-Add-suspend-and-resume.patch
-----------------------------------------------
WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
#163: FILE: drivers/accel/amdxdna/amdxdna_pci_drv.c:22:
+#define AMDXDNA_AUTOSUSPEND_DELAY      5000 /* miliseconds */
                                                 ^^^^^^^^^^^

total: 0 errors, 1 warnings, 0 checks, 302 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
       mechanically convert to the typical style using --fix or 
--fix-inplace.

0008-accel-amdxdna-Add-suspend-and-resume.patch has style problems, 
please review.


-Jeff
Lizhi Hou Oct. 25, 2024, 9:28 p.m. UTC | #2
On 10/25/24 10:55, Jeffrey Hugo wrote:
> On 10/21/2024 10:19 AM, Lizhi Hou wrote:
>> This patchset introduces a new Linux Kernel Driver, amdxdna for AMD 
>> NPUs.
>> The driver is based on Linux accel subsystem.
>>
>> NPU (Neural Processing Unit) is an AI inference accelerator integrated
>> into AMD client CPUs. NPU enables efficient execution of Machine 
>> Learning
>> applications like CNNs, LLMs, etc.  NPU is based on AMD XDNA
>> architecture [1].
>>
>> AMD NPU consists of the following components:
>>
>>    - Tiled array of AMD AI Engine processors.
>>    - Micro Controller which runs the NPU Firmware responsible for
>>      command processing, AIE array configuration, and execution 
>> management.
>>    - PCI EP for host control of the NPU device.
>>    - Interconnect for connecting the NPU components together.
>>    - SRAM for use by the NPU Firmware.
>>    - Address translation hardware for protected host memory access by 
>> the
>>      NPU.
>>
>> NPU supports multiple concurrent fully isolated contexts. Concurrent
>> contexts may be bound to AI Engine array spatially and or temporarily.
>>
>> The driver is licensed under GPL-2.0 except for UAPI header which is
>> licensed GPL-2.0 WITH Linux-syscall-note.
>>
>> User mode driver stack consists of XRT [2] and AMD AIE Plugin for 
>> IREE [3].
>>
>> The firmware for the NPU is distributed as a closed source binary, 
>> and has
>> already been pushed to the DRM firmware repository [4].
>>
>> [1]https://www.amd.com/en/technologies/xdna.html
>> [2]https://github.com/Xilinx/XRT
>> [3]https://github.com/nod-ai/iree-amd-aie
>> [4]https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu 
>>
>>
>> Changes since v4:
>> - Fix lockdep errors
>> - Use __u* structure for struct aie_error
>
> One nit, when you send the next version would you please either To: or 
> Cc: me on the entire series?  I only get pieces in my inbox which is 
> mildly annoying on my end.
Sure.
>
> Looks like we are getting close here.  One procedural question I have, 
> do you have commit permissions to drm-misc?
No, I do not have commit permissions yet.
>
> I applied the series to drm-misc-next and tried to build.  Got the 
> following errors -

Could you share the build command line? So I can reproduce and verify my 
fix.

I used "make M=drivers/accel/amdxdna" and did not reproduce the error 
with drm-misc-next. It looks build robot did not complain with the patch 
neither.

$ git branch
* drm-misc-next
$ make M=drivers/accel/amdxdna
   CC [M]  drivers/accel/amdxdna/aie2_ctx.o
   CC [M]  drivers/accel/amdxdna/aie2_error.o
   CC [M]  drivers/accel/amdxdna/aie2_message.o
   CC [M]  drivers/accel/amdxdna/aie2_pci.o
   CC [M]  drivers/accel/amdxdna/aie2_psp.o
   CC [M]  drivers/accel/amdxdna/aie2_smu.o
   CC [M]  drivers/accel/amdxdna/aie2_solver.o
   CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
   CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
   CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
   CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
   CC [M]  drivers/accel/amdxdna/npu1_regs.o
   CC [M]  drivers/accel/amdxdna/npu2_regs.o
   CC [M]  drivers/accel/amdxdna/npu4_regs.o
   CC [M]  drivers/accel/amdxdna/npu5_regs.o
   LD [M]  drivers/accel/amdxdna/amdxdna.o
   MODPOST drivers/accel/amdxdna/Module.symvers
   CC [M]  drivers/accel/amdxdna/amdxdna.mod.o
   CC [M]  drivers/accel/amdxdna/.module-common.o
   LD [M]  drivers/accel/amdxdna/amdxdna.ko
$

>
>   CC [M]  drivers/accel/amdxdna/aie2_ctx.o
>   CC [M]  drivers/accel/amdxdna/aie2_error.o
>   CC [M]  drivers/accel/amdxdna/aie2_message.o
>   CC [M]  drivers/accel/amdxdna/aie2_pci.o
>   CC [M]  drivers/accel/amdxdna/aie2_psp.o
>   CC [M]  drivers/accel/amdxdna/aie2_smu.o
>   CC [M]  drivers/accel/amdxdna/aie2_solver.o
>   CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
>   CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
>   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
>   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
>   CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
>   CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
>   CC [M]  drivers/accel/amdxdna/npu1_regs.o
>   CC [M]  drivers/accel/amdxdna/npu2_regs.o
>   CC [M]  drivers/accel/amdxdna/npu4_regs.o
>   CC [M]  drivers/accel/amdxdna/npu5_regs.o
>   AR      drivers/base/firmware_loader/built-in.a
>   AR      drivers/base/built-in.a
> In file included from drivers/accel/amdxdna/aie2_message.c:19:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
> declaration of function ‘FIELD_GET’ 
> [-Werror=implicit-function-declaration]
>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>       |                ^~~~~~~~~
> In file included from drivers/accel/amdxdna/amdxdna_gem.c:15:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
> declaration of function ‘FIELD_GET’ 
> [-Werror=implicit-function-declaration]
>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>       |                ^~~~~~~~~
> In file included from drivers/accel/amdxdna/aie2_psp.c:11:
> drivers/accel/amdxdna/aie2_psp.c: In function ‘psp_exec’:
> drivers/accel/amdxdna/aie2_psp.c:62:34: error: implicit declaration of 
> function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
>    62 | FIELD_GET(PSP_STATUS_READY, ready),
>       |                                  ^~~~~~~~~
> ./include/linux/iopoll.h:47:21: note: in definition of macro 
> ‘read_poll_timeout’
>    47 |                 if (cond) \
>       |                     ^~~~
> drivers/accel/amdxdna/aie2_psp.c:61:15: note: in expansion of macro 
> ‘readx_poll_timeout’
>    61 |         ret = readx_poll_timeout(readl, PSP_REG(psp, 
> PSP_STATUS_REG), ready,
>       |               ^~~~~~~~~~~~~~~~~~
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
> declaration of function ‘FIELD_PREP’ 
> [-Werror=implicit-function-declaration]
>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>       |                        ^~~~~~~~~~
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
> declaration of function ‘FIELD_PREP’ 
> [-Werror=implicit-function-declaration]
>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>       |                        ^~~~~~~~~~
> In file included from drivers/accel/amdxdna/aie2_pci.c:22:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
> declaration of function ‘FIELD_GET’ 
> [-Werror=implicit-function-declaration]
>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>       |                ^~~~~~~~~
> In file included from drivers/accel/amdxdna/aie2_ctx.c:18:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
> declaration of function ‘FIELD_GET’ 
> [-Werror=implicit-function-declaration]
>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>       |                ^~~~~~~~~
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
> declaration of function ‘FIELD_PREP’ 
> [-Werror=implicit-function-declaration]
>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>       |                        ^~~~~~~~~~
> In file included from drivers/accel/amdxdna/amdxdna_ctx.c:16:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
> declaration of function ‘FIELD_GET’ 
> [-Werror=implicit-function-declaration]
>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>       |                ^~~~~~~~~
> cc1: all warnings being treated as errors
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
> declaration of function ‘FIELD_PREP’ 
> [-Werror=implicit-function-declaration]
>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>       |                        ^~~~~~~~~~
> drivers/accel/amdxdna/aie2_ctx.c: In function ‘aie2_hwctx_restart’:
> drivers/accel/amdxdna/aie2_ctx.c:114:9: error: too few arguments to 
> function ‘drm_sched_start’
>   114 |         drm_sched_start(&hwctx->priv->sched);
>       |         ^~~~~~~~~~~~~~~
> In file included from ./include/trace/events/amdxdna.h:12,
>                  from drivers/accel/amdxdna/aie2_ctx.c:13:
> ./include/drm/gpu_scheduler.h:593:6: note: declared here
>   593 | void drm_sched_start(struct drm_gpu_scheduler *sched, int errno);
>       |      ^~~~~~~~~~~~~~~
> make[5]: *** [scripts/Makefile.build:229: 
> drivers/accel/amdxdna/aie2_psp.o] Error 1
> make[5]: *** Waiting for unfinished jobs....
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
> declaration of function ‘FIELD_PREP’ 
> [-Werror=implicit-function-declaration]
>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>       |                        ^~~~~~~~~~
> In file included from drivers/accel/amdxdna/amdxdna_pci_drv.c:18:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
> declaration of function ‘FIELD_GET’ 
> [-Werror=implicit-function-declaration]
>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>       |                ^~~~~~~~~
> cc1: all warnings being treated as errors
> make[5]: *** [scripts/Makefile.build:229: 
> drivers/accel/amdxdna/aie2_ctx.o] Error 1
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
> declaration of function ‘FIELD_PREP’ 
> [-Werror=implicit-function-declaration]
>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>       |                        ^~~~~~~~~~
> drivers/accel/amdxdna/amdxdna_mailbox.c: In function 
> ‘xdna_mailbox_send_msg’:
> drivers/accel/amdxdna/amdxdna_mailbox.c:444:26: error: implicit 
> declaration of function ‘FIELD_PREP’ 
> [-Werror=implicit-function-declaration]
>   444 |         header->sz_ver = FIELD_PREP(MSG_BODY_SZ, 
> msg->send_size) |
>       |                          ^~~~~~~~~~
>
>
> You also have the following checkpatch issues -

Could you share the command you used?  I tried to use 'dim checkpatch' 
and it did not find out the misspelling issue.


Thanks,

Lizhi

>
>
> WARNING: 'Disalbe' may be misspelled - perhaps 'Disable'?
> #1646: FILE: drivers/accel/amdxdna/amdxdna_mailbox.c:553:
> +       /* Disalbe an irq and wait. This might sleep. */
>            ^^^^^^^
>
> WARNING: 'splite' may be misspelled - perhaps 'split'?
> #1695: FILE: drivers/accel/amdxdna/amdxdna_mailbox.h:21:
> + * The mailbox will splite the sending data in to multiple firmware 
> message if
>                      ^^^^^^
>
> WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
> #1875: FILE: drivers/accel/amdxdna/amdxdna_mailbox_helper.h:9:
> +#define TX_TIMEOUT 2000 /* miliseconds */
>                             ^^^^^^^^^^^
>
> WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
> #1876: FILE: drivers/accel/amdxdna/amdxdna_mailbox_helper.h:10:
> +#define RX_TIMEOUT 5000 /* miliseconds */
>                             ^^^^^^^^^^^
>
> total: 0 errors, 4 warnings, 0 checks, 1916 lines checked
>
> NOTE: For some of the reported defects, checkpatch may be able to
>       mechanically convert to the typical style using --fix or 
> --fix-inplace.
>
> 0003-accel-amdxdna-Support-hardware-mailbox.patch has style problems, 
> please review.
>
>
>
> 0007-accel-amdxdna-Add-command-execution.patch
> ----------------------------------------------
> WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
> #59: FILE: drivers/accel/amdxdna/aie2_ctx.c:27:
> +#define HWCTX_MAX_TIMEOUT      60000 /* miliseconds */
>                                          ^^^^^^^^^^^
>
> WARNING: 'reverve' may be misspelled - perhaps 'reserve'?
> #612: FILE: drivers/accel/amdxdna/aie2_ctx.c:779:
> +               XDNA_WARN(xdna, "Failed to reverve fence, ret %d", ret);
>                                            ^^^^^^^
>
> WARNING: 'Exectuion' may be misspelled - perhaps 'Execution'?
> #1899: FILE: drivers/accel/amdxdna/amdxdna_pci_drv.c:139:
> +       /* Exectuion */
>            ^^^^^^^^^
>
> WARNING: 'exectuion' may be misspelled - perhaps 'execution'?
> #2113: FILE: include/uapi/drm/amdxdna_accel.h:239:
> + * struct amdxdna_drm_wait_cmd - Wait exectuion command.
>                                        ^^^^^^^^^
>
> total: 0 errors, 10 warnings, 0 checks, 1983 lines checked
>
> NOTE: For some of the reported defects, checkpatch may be able to
>       mechanically convert to the typical style using --fix or 
> --fix-inplace.
>
> 0007-accel-amdxdna-Add-command-execution.patch has style problems, 
> please review.
>
>
> 0008-accel-amdxdna-Add-suspend-and-resume.patch
> -----------------------------------------------
> WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
> #163: FILE: drivers/accel/amdxdna/amdxdna_pci_drv.c:22:
> +#define AMDXDNA_AUTOSUSPEND_DELAY      5000 /* miliseconds */
>                                                 ^^^^^^^^^^^
>
> total: 0 errors, 1 warnings, 0 checks, 302 lines checked
>
> NOTE: For some of the reported defects, checkpatch may be able to
>       mechanically convert to the typical style using --fix or 
> --fix-inplace.
>
> 0008-accel-amdxdna-Add-suspend-and-resume.patch has style problems, 
> please review.
>
>
> -Jeff
Jeffrey Hugo Oct. 25, 2024, 10:02 p.m. UTC | #3
On 10/25/2024 3:28 PM, Lizhi Hou wrote:
> 
> On 10/25/24 10:55, Jeffrey Hugo wrote:
>> On 10/21/2024 10:19 AM, Lizhi Hou wrote:
>>> This patchset introduces a new Linux Kernel Driver, amdxdna for AMD 
>>> NPUs.
>>> The driver is based on Linux accel subsystem.
>>>
>>> NPU (Neural Processing Unit) is an AI inference accelerator integrated
>>> into AMD client CPUs. NPU enables efficient execution of Machine 
>>> Learning
>>> applications like CNNs, LLMs, etc.  NPU is based on AMD XDNA
>>> architecture [1].
>>>
>>> AMD NPU consists of the following components:
>>>
>>>    - Tiled array of AMD AI Engine processors.
>>>    - Micro Controller which runs the NPU Firmware responsible for
>>>      command processing, AIE array configuration, and execution 
>>> management.
>>>    - PCI EP for host control of the NPU device.
>>>    - Interconnect for connecting the NPU components together.
>>>    - SRAM for use by the NPU Firmware.
>>>    - Address translation hardware for protected host memory access by 
>>> the
>>>      NPU.
>>>
>>> NPU supports multiple concurrent fully isolated contexts. Concurrent
>>> contexts may be bound to AI Engine array spatially and or temporarily.
>>>
>>> The driver is licensed under GPL-2.0 except for UAPI header which is
>>> licensed GPL-2.0 WITH Linux-syscall-note.
>>>
>>> User mode driver stack consists of XRT [2] and AMD AIE Plugin for 
>>> IREE [3].
>>>
>>> The firmware for the NPU is distributed as a closed source binary, 
>>> and has
>>> already been pushed to the DRM firmware repository [4].
>>>
>>> [1]https://www.amd.com/en/technologies/xdna.html
>>> [2]https://github.com/Xilinx/XRT
>>> [3]https://github.com/nod-ai/iree-amd-aie
>>> [4]https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu 
>>>
>>>
>>> Changes since v4:
>>> - Fix lockdep errors
>>> - Use __u* structure for struct aie_error
>>
>> One nit, when you send the next version would you please either To: or 
>> Cc: me on the entire series?  I only get pieces in my inbox which is 
>> mildly annoying on my end.
> Sure.
>>
>> Looks like we are getting close here.  One procedural question I have, 
>> do you have commit permissions to drm-misc?
> No, I do not have commit permissions yet.

You should apply for access.  Assuming this series is ready before that 
goes through, I'll apply it.

>> I applied the series to drm-misc-next and tried to build.  Got the 
>> following errors -
> 
> Could you share the build command line? So I can reproduce and verify my 
> fix.

The command is simple:
make -j20

The system details, incase it somehow matters:
Ubuntu 22.04 w/ 5.15 kernel

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

$ uname -a
Linux jhugo-lnx 5.15.0-89-generic #99-Ubuntu SMP Mon Oct 30 20:42:41 UTC 
2023 x86_64 x86_64 x86_64 GNU/Linux

The kernel config is probably the relevant piece.  When I first built 
after applying the series, I was asked to choose what to do with the new 
config item.  I selected =m.
.config can be found at 
https://gist.github.com/quic-jhugo/4cc249b1e3ba127039fbc709a513a432

> 
> I used "make M=drivers/accel/amdxdna" and did not reproduce the error 
> with drm-misc-next. It looks build robot did not complain with the patch 
> neither.
> 
> $ git branch
> * drm-misc-next
> $ make M=drivers/accel/amdxdna
>    CC [M]  drivers/accel/amdxdna/aie2_ctx.o
>    CC [M]  drivers/accel/amdxdna/aie2_error.o
>    CC [M]  drivers/accel/amdxdna/aie2_message.o
>    CC [M]  drivers/accel/amdxdna/aie2_pci.o
>    CC [M]  drivers/accel/amdxdna/aie2_psp.o
>    CC [M]  drivers/accel/amdxdna/aie2_smu.o
>    CC [M]  drivers/accel/amdxdna/aie2_solver.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
>    CC [M]  drivers/accel/amdxdna/npu1_regs.o
>    CC [M]  drivers/accel/amdxdna/npu2_regs.o
>    CC [M]  drivers/accel/amdxdna/npu4_regs.o
>    CC [M]  drivers/accel/amdxdna/npu5_regs.o
>    LD [M]  drivers/accel/amdxdna/amdxdna.o
>    MODPOST drivers/accel/amdxdna/Module.symvers
>    CC [M]  drivers/accel/amdxdna/amdxdna.mod.o
>    CC [M]  drivers/accel/amdxdna/.module-common.o
>    LD [M]  drivers/accel/amdxdna/amdxdna.ko
> $
> 
>>
>>   CC [M]  drivers/accel/amdxdna/aie2_ctx.o
>>   CC [M]  drivers/accel/amdxdna/aie2_error.o
>>   CC [M]  drivers/accel/amdxdna/aie2_message.o
>>   CC [M]  drivers/accel/amdxdna/aie2_pci.o
>>   CC [M]  drivers/accel/amdxdna/aie2_psp.o
>>   CC [M]  drivers/accel/amdxdna/aie2_smu.o
>>   CC [M]  drivers/accel/amdxdna/aie2_solver.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
>>   CC [M]  drivers/accel/amdxdna/npu1_regs.o
>>   CC [M]  drivers/accel/amdxdna/npu2_regs.o
>>   CC [M]  drivers/accel/amdxdna/npu4_regs.o
>>   CC [M]  drivers/accel/amdxdna/npu5_regs.o
>>   AR      drivers/base/firmware_loader/built-in.a
>>   AR      drivers/base/built-in.a
>> In file included from drivers/accel/amdxdna/aie2_message.c:19:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> In file included from drivers/accel/amdxdna/amdxdna_gem.c:15:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> In file included from drivers/accel/amdxdna/aie2_psp.c:11:
>> drivers/accel/amdxdna/aie2_psp.c: In function ‘psp_exec’:
>> drivers/accel/amdxdna/aie2_psp.c:62:34: error: implicit declaration of 
>> function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
>>    62 | FIELD_GET(PSP_STATUS_READY, ready),
>>       |                                  ^~~~~~~~~
>> ./include/linux/iopoll.h:47:21: note: in definition of macro 
>> ‘read_poll_timeout’
>>    47 |                 if (cond) \
>>       |                     ^~~~
>> drivers/accel/amdxdna/aie2_psp.c:61:15: note: in expansion of macro 
>> ‘readx_poll_timeout’
>>    61 |         ret = readx_poll_timeout(readl, PSP_REG(psp, 
>> PSP_STATUS_REG), ready,
>>       |               ^~~~~~~~~~~~~~~~~~
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> In file included from drivers/accel/amdxdna/aie2_pci.c:22:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> In file included from drivers/accel/amdxdna/aie2_ctx.c:18:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> In file included from drivers/accel/amdxdna/amdxdna_ctx.c:16:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> cc1: all warnings being treated as errors
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> drivers/accel/amdxdna/aie2_ctx.c: In function ‘aie2_hwctx_restart’:
>> drivers/accel/amdxdna/aie2_ctx.c:114:9: error: too few arguments to 
>> function ‘drm_sched_start’
>>   114 |         drm_sched_start(&hwctx->priv->sched);
>>       |         ^~~~~~~~~~~~~~~
>> In file included from ./include/trace/events/amdxdna.h:12,
>>                  from drivers/accel/amdxdna/aie2_ctx.c:13:
>> ./include/drm/gpu_scheduler.h:593:6: note: declared here
>>   593 | void drm_sched_start(struct drm_gpu_scheduler *sched, int errno);
>>       |      ^~~~~~~~~~~~~~~
>> make[5]: *** [scripts/Makefile.build:229: 
>> drivers/accel/amdxdna/aie2_psp.o] Error 1
>> make[5]: *** Waiting for unfinished jobs....
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> In file included from drivers/accel/amdxdna/amdxdna_pci_drv.c:18:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> cc1: all warnings being treated as errors
>> make[5]: *** [scripts/Makefile.build:229: 
>> drivers/accel/amdxdna/aie2_ctx.o] Error 1
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> drivers/accel/amdxdna/amdxdna_mailbox.c: In function 
>> ‘xdna_mailbox_send_msg’:
>> drivers/accel/amdxdna/amdxdna_mailbox.c:444:26: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   444 |         header->sz_ver = FIELD_PREP(MSG_BODY_SZ, 
>> msg->send_size) |
>>       |                          ^~~~~~~~~~
>>
>>
>> You also have the following checkpatch issues -
> 
> Could you share the command you used?  I tried to use 'dim checkpatch' 
> and it did not find out the misspelling issue.

./scripts/checkpatch.pl --strict --codespell *.patch

Note, --codespell requires some local setup.  I beleive the comments in 
the checkpatch.pl script are fairly straightforward.  I use a copy of 
the database from the github that is rather recent.  The Ubuntu distro 
package is really out of date and I don't think I looked to see if there 
is a pythong pip version.  Grabbing the one file from the github repo 
seemed simple emough.

-Jeff
Lizhi Hou Oct. 29, 2024, 3:24 p.m. UTC | #4
On 10/25/24 15:02, Jeffrey Hugo wrote:
> On 10/25/2024 3:28 PM, Lizhi Hou wrote:
>>
>> On 10/25/24 10:55, Jeffrey Hugo wrote:
>>> On 10/21/2024 10:19 AM, Lizhi Hou wrote:
>>>> This patchset introduces a new Linux Kernel Driver, amdxdna for AMD 
>>>> NPUs.
>>>> The driver is based on Linux accel subsystem.
>>>>
>>>> NPU (Neural Processing Unit) is an AI inference accelerator integrated
>>>> into AMD client CPUs. NPU enables efficient execution of Machine 
>>>> Learning
>>>> applications like CNNs, LLMs, etc.  NPU is based on AMD XDNA
>>>> architecture [1].
>>>>
>>>> AMD NPU consists of the following components:
>>>>
>>>>    - Tiled array of AMD AI Engine processors.
>>>>    - Micro Controller which runs the NPU Firmware responsible for
>>>>      command processing, AIE array configuration, and execution 
>>>> management.
>>>>    - PCI EP for host control of the NPU device.
>>>>    - Interconnect for connecting the NPU components together.
>>>>    - SRAM for use by the NPU Firmware.
>>>>    - Address translation hardware for protected host memory access 
>>>> by the
>>>>      NPU.
>>>>
>>>> NPU supports multiple concurrent fully isolated contexts. Concurrent
>>>> contexts may be bound to AI Engine array spatially and or temporarily.
>>>>
>>>> The driver is licensed under GPL-2.0 except for UAPI header which is
>>>> licensed GPL-2.0 WITH Linux-syscall-note.
>>>>
>>>> User mode driver stack consists of XRT [2] and AMD AIE Plugin for 
>>>> IREE [3].
>>>>
>>>> The firmware for the NPU is distributed as a closed source binary, 
>>>> and has
>>>> already been pushed to the DRM firmware repository [4].
>>>>
>>>> [1]https://www.amd.com/en/technologies/xdna.html
>>>> [2]https://github.com/Xilinx/XRT
>>>> [3]https://github.com/nod-ai/iree-amd-aie
>>>> [4]https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu 
>>>>
>>>>
>>>> Changes since v4:
>>>> - Fix lockdep errors
>>>> - Use __u* structure for struct aie_error
>>>
>>> One nit, when you send the next version would you please either To: 
>>> or Cc: me on the entire series?  I only get pieces in my inbox which 
>>> is mildly annoying on my end.
>> Sure.
>>>
>>> Looks like we are getting close here.  One procedural question I 
>>> have, do you have commit permissions to drm-misc?
>> No, I do not have commit permissions yet.
>
> You should apply for access.  Assuming this series is ready before 
> that goes through, I'll apply it.
>
>>> I applied the series to drm-misc-next and tried to build.  Got the 
>>> following errors -
>>
>> Could you share the build command line? So I can reproduce and verify 
>> my fix.
>
> The command is simple:
> make -j20
>
> The system details, incase it somehow matters:
> Ubuntu 22.04 w/ 5.15 kernel
>
> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:    Ubuntu 22.04.3 LTS
> Release:        22.04
> Codename:       jammy
>
> $ uname -a
> Linux jhugo-lnx 5.15.0-89-generic #99-Ubuntu SMP Mon Oct 30 20:42:41 
> UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
>
> The kernel config is probably the relevant piece.  When I first built 
> after applying the series, I was asked to choose what to do with the 
> new config item.  I selected =m.
> .config can be found at 
> https://gist.github.com/quic-jhugo/4cc249b1e3ba127039fbc709a513a432
>
>>
>> I used "make M=drivers/accel/amdxdna" and did not reproduce the error 
>> with drm-misc-next. It looks build robot did not complain with the 
>> patch neither.
>>
>> $ git branch
>> * drm-misc-next
>> $ make M=drivers/accel/amdxdna
>>    CC [M]  drivers/accel/amdxdna/aie2_ctx.o
>>    CC [M]  drivers/accel/amdxdna/aie2_error.o
>>    CC [M]  drivers/accel/amdxdna/aie2_message.o
>>    CC [M]  drivers/accel/amdxdna/aie2_pci.o
>>    CC [M]  drivers/accel/amdxdna/aie2_psp.o
>>    CC [M]  drivers/accel/amdxdna/aie2_smu.o
>>    CC [M]  drivers/accel/amdxdna/aie2_solver.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
>>    CC [M]  drivers/accel/amdxdna/npu1_regs.o
>>    CC [M]  drivers/accel/amdxdna/npu2_regs.o
>>    CC [M]  drivers/accel/amdxdna/npu4_regs.o
>>    CC [M]  drivers/accel/amdxdna/npu5_regs.o
>>    LD [M]  drivers/accel/amdxdna/amdxdna.o
>>    MODPOST drivers/accel/amdxdna/Module.symvers
>>    CC [M]  drivers/accel/amdxdna/amdxdna.mod.o
>>    CC [M]  drivers/accel/amdxdna/.module-common.o
>>    LD [M]  drivers/accel/amdxdna/amdxdna.ko
>> $
>>
>>>
>>>   CC [M]  drivers/accel/amdxdna/aie2_ctx.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_error.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_message.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_pci.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_psp.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_smu.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_solver.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
>>>   CC [M]  drivers/accel/amdxdna/npu1_regs.o
>>>   CC [M]  drivers/accel/amdxdna/npu2_regs.o
>>>   CC [M]  drivers/accel/amdxdna/npu4_regs.o
>>>   CC [M]  drivers/accel/amdxdna/npu5_regs.o
>>>   AR      drivers/base/firmware_loader/built-in.a
>>>   AR      drivers/base/built-in.a
>>> In file included from drivers/accel/amdxdna/aie2_message.c:19:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> In file included from drivers/accel/amdxdna/amdxdna_gem.c:15:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> In file included from drivers/accel/amdxdna/aie2_psp.c:11:
>>> drivers/accel/amdxdna/aie2_psp.c: In function ‘psp_exec’:
>>> drivers/accel/amdxdna/aie2_psp.c:62:34: error: implicit declaration 
>>> of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
>>>    62 | FIELD_GET(PSP_STATUS_READY, ready),
>>>       |                                  ^~~~~~~~~
>>> ./include/linux/iopoll.h:47:21: note: in definition of macro 
>>> ‘read_poll_timeout’
>>>    47 |                 if (cond) \
>>>       |                     ^~~~
>>> drivers/accel/amdxdna/aie2_psp.c:61:15: note: in expansion of macro 
>>> ‘readx_poll_timeout’
>>>    61 |         ret = readx_poll_timeout(readl, PSP_REG(psp, 
>>> PSP_STATUS_REG), ready,
>>>       |               ^~~~~~~~~~~~~~~~~~
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> In file included from drivers/accel/amdxdna/aie2_pci.c:22:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> In file included from drivers/accel/amdxdna/aie2_ctx.c:18:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> In file included from drivers/accel/amdxdna/amdxdna_ctx.c:16:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> cc1: all warnings being treated as errors
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> drivers/accel/amdxdna/aie2_ctx.c: In function ‘aie2_hwctx_restart’:
>>> drivers/accel/amdxdna/aie2_ctx.c:114:9: error: too few arguments to 
>>> function ‘drm_sched_start’
>>>   114 | drm_sched_start(&hwctx->priv->sched);
>>>       |         ^~~~~~~~~~~~~~~
>>> In file included from ./include/trace/events/amdxdna.h:12,
>>>                  from drivers/accel/amdxdna/aie2_ctx.c:13:
>>> ./include/drm/gpu_scheduler.h:593:6: note: declared here
>>>   593 | void drm_sched_start(struct drm_gpu_scheduler *sched, int 
>>> errno);
>>>       |      ^~~~~~~~~~~~~~~
>>> make[5]: *** [scripts/Makefile.build:229: 
>>> drivers/accel/amdxdna/aie2_psp.o] Error 1
>>> make[5]: *** Waiting for unfinished jobs....
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> In file included from drivers/accel/amdxdna/amdxdna_pci_drv.c:18:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> cc1: all warnings being treated as errors
>>> make[5]: *** [scripts/Makefile.build:229: 
>>> drivers/accel/amdxdna/aie2_ctx.o] Error 1
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> drivers/accel/amdxdna/amdxdna_mailbox.c: In function 
>>> ‘xdna_mailbox_send_msg’:
>>> drivers/accel/amdxdna/amdxdna_mailbox.c:444:26: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   444 |         header->sz_ver = FIELD_PREP(MSG_BODY_SZ, 
>>> msg->send_size) |
>>>       |                          ^~~~~~~~~~
>>>
>>>
>>> You also have the following checkpatch issues -
>>
>> Could you share the command you used?  I tried to use 'dim 
>> checkpatch' and it did not find out the misspelling issue.
>
> ./scripts/checkpatch.pl --strict --codespell *.patch
>
> Note, --codespell requires some local setup.  I beleive the comments 
> in the checkpatch.pl script are fairly straightforward. I use a copy 
> of the database from the github that is rather recent.  The Ubuntu 
> distro package is really out of date and I don't think I looked to see 
> if there is a pythong pip version. Grabbing the one file from the 
> github repo seemed simple emough.

I was able to reproduce with your suggestions. Thanks a lot.


Lizhi

>
> -Jeff