[v2,4/4] drm/nvdla/uapi: Add UAPI of NVDLA driver

Message ID	20220426060808.78225-5-cai.huoqing@linux.dev (mailing list archive)
State	New, archived
Headers
Return-Path: <dri-devel-bounces@lists.freedesktop.org>
From: Cai Huoqing <cai.huoqing@linux.dev>
To: cai.huoqing@linux.dev
Subject: [PATCH v2 4/4] drm/nvdla/uapi: Add UAPI of NVDLA driver
Date: Tue, 26 Apr 2022 14:08:01 +0800
Message-Id: <20220426060808.78225-5-cai.huoqing@linux.dev>
In-Reply-To: <20220426060808.78225-1-cai.huoqing@linux.dev>
References: <20220426060808.78225-1-cai.huoqing@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
Cc: Thomas Zimmermann <tzimmermann@suse.de>, David Airlie <airlied@linux.ie>, linux-kernel@vger.kernel.org, Christian König <christian.koenig@amd.com>, linaro-mm-sig@lists.linaro.org, dri-devel@lists.freedesktop.org, Sumit Semwal <sumit.semwal@linaro.org>, linux-media@vger.kernel.org
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm/nvdla: Add driver support for NVDLA
[v2,0/4] drm/nvdla: Add driver support for NVDLA
[v2,1/4] MAINTAINERS: Add the driver info of the NVDLA
[v2,2/4] drm/nvdla: Add driver support for NVDLA
[v2,3/4] drm/nvdla: Add register head file of NVDLA
[v2,4/4] drm/nvdla/uapi: Add UAPI of NVDLA driver

Cai Huoqing April 26, 2022, 6:08 a.m. UTC

The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
which is integrated into NVIDIA Jetson AGX Xavier,
so add UAPI of this driver.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
---
v1->v2:
*Rename nvdla_drm.[ch] to nvdla_drv.[ch] and rename nvdla_ioctl.h to nvdla_drm.h,
 move it to uapi.
 comments link: https://lore.kernel.org/lkml/20bac605-97e6-e5cd-c4e4-83a8121645d8@amd.com/

 include/uapi/drm/nvdla_drm.h | 99 ++++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)
 create mode 100644 include/uapi/drm/nvdla_drm.h

Christian König April 26, 2022, 6:31 a.m. UTC | #1

On 26.04.22 at 08:08, Cai Huoqing wrote:
> The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> which is integrated into NVIDIA Jetson AGX Xavier,
> so add UAPI of this driver.
>
> Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> ---
> v1->v2:
> *Rename nvdla_drm.[ch] to nvdla_drv.[ch] and rename nvdla_ioctl.h to nvdla_drm.h,
>   move it to uapi.
>   comments link: https://lore.kernel.org/lkml/20bac605-97e6-e5cd-c4e4-83a8121645d8@amd.com/
>
>   include/uapi/drm/nvdla_drm.h | 99 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 99 insertions(+)
>   create mode 100644 include/uapi/drm/nvdla_drm.h
>
> diff --git a/include/uapi/drm/nvdla_drm.h b/include/uapi/drm/nvdla_drm.h
> new file mode 100644
> index 000000000000..984635285525
> --- /dev/null
> +++ b/include/uapi/drm/nvdla_drm.h
> @@ -0,0 +1,99 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
> +/*
> + * Copyright (C) 2017-2018 NVIDIA CORPORATION.
> + * Copyright (C) 2022 Cai Huoqing
> + */
> +
> +#ifndef __LINUX_NVDLA_IOCTL_H
> +#define __LINUX_NVDLA_IOCTL_H
> +
> +#include <linux/ioctl.h>
> +#include <linux/types.h>
> +
> +#if !defined(__KERNEL__)
> +#define __user
> +#endif
> +
> +/**
> + * struct nvdla_mem_handle structure for memory handles
> + *
> + * @handle		handle to DMA buffer allocated in userspace
> + * @reserved		Reserved for padding
> + * @offset		offset in bytes from start address of buffer
> + *
> + */
> +struct nvdla_mem_handle {
> +	__u32 handle;
> +	__u32 reserved;
> +	__u64 offset;
> +};
> +
> +/**
> + * struct nvdla_ioctl_submit_task structure for single task information
> + *
> + * @num_addresses		total number of entries in address_list
> + * @reserved			Reserved for padding
> + * @address_list		pointer to array of struct nvdla_mem_handle
> + *
> + */
> +struct nvdla_ioctl_submit_task {
> +#define NVDLA_MAX_BUFFERS_PER_TASK (6144)
> +	__u32 num_addresses;
> +#define NVDLA_NO_TIMEOUT    (0xffffffff)
> +	__u32 timeout;

What format does that timeout value have?

In general it is best practice to have absolute 64bit nanosecond 
timeouts (to be used with ktime inside the kernel) so that restarting 
interrupted IOCTLs works smoothly.
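
To illustrate why an absolute value helps with restarts, here is a minimal userspace sketch. It assumes a hypothetical 64-bit nanosecond deadline on CLOCK_MONOTONIC (the patch itself carries a 32-bit `timeout` of unspecified format), and the helper names are made up for this example. The deadline is computed once, so a restarted call only needs to recompute the remaining time instead of starting the full wait again.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Nanoseconds per second, used for the timespec conversion. */
#define NSEC_PER_SEC 1000000000ULL

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static inline uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * NSEC_PER_SEC + (uint64_t)ts.tv_nsec;
}

/* Turn a relative timeout into an absolute deadline, computed only once. */
static inline uint64_t deadline_from_relative(uint64_t start_ns, uint64_t rel_ns)
{
	/* Saturate instead of wrapping around on overflow. */
	if (rel_ns > UINT64_MAX - start_ns)
		return UINT64_MAX;
	return start_ns + rel_ns;
}

/* Time left until the deadline; 0 once the deadline has passed. */
static inline uint64_t remaining_ns(uint64_t deadline_ns, uint64_t current_ns)
{
	return deadline_ns > current_ns ? deadline_ns - current_ns : 0;
}
```

With a relative timeout, every restart after a signal would wait the full period again; with the deadline above, `remaining_ns(deadline, now_ns())` shrinks across restarts.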

> +	__u64 address_list;

Maybe make the comments inline, because I just wanted to write that you 
should note that this is pointing to an nvdla_mem_handle array until I 
saw the comment above.

> +};
> +
> +/**
> + * struct nvdla_submit_args structure for task submit
> + *
> + * @tasks		pointer to array of struct nvdla_ioctl_submit_task
> + * @num_tasks		number of entries in tasks
> + * @flags		flags for task submit, no flags defined yet
> + * @version		version of task structure
> + *
> + */
> +struct nvdla_submit_args {
> +	__u64 tasks;
> +	__u16 num_tasks;
> +#define NVDLA_MAX_TASKS_PER_SUBMIT	24
> +#define NVDLA_SUBMIT_FLAGS_ATOMIC	(1 << 0)

Well that "no flags defined yet" from the comment above is probably 
outdated :)

A comment explaining what this flag means would also be nice to have.

Apart from all those nit picks that looks pretty solid to me. Just one 
core functionality we usually have seems to be missing here: How is 
completion signaling implemented?

Regards,
Christian.

> +	__u16 flags;
> +	__u32 version;
> +};
> +
> +/**
> + * struct nvdla_gem_create_args for allocating DMA buffer through GEM
> + *
> + * @handle		handle updated by kernel after allocation
> + * @flags		implementation specific flags
> + * @size		size of buffer to allocate
> + */
> +struct nvdla_gem_create_args {
> +	__u32 handle;
> +	__u32 flags;
> +	__u64 size;
> +};
> +
> +/**
> + * struct nvdla_gem_map_offset_args for mapping DMA buffer
> + *
> + * @handle		handle of the buffer
> + * @reserved		reserved for padding
> + * @offset		offset updated by kernel after mapping
> + */
> +struct nvdla_gem_map_offset_args {
> +	__u32 handle;
> +	__u32 reserved;
> +	__u64 offset;
> +};
> +
> +#define DRM_NVDLA_SUBMIT		0x00
> +#define DRM_NVDLA_GEM_CREATE	0x01
> +#define DRM_NVDLA_GEM_MMAP		0x02
> +
> +#define DRM_IOCTL_NVDLA_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_SUBMIT, struct nvdla_submit_args)
> +#define DRM_IOCTL_NVDLA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_GEM_CREATE, struct nvdla_gem_create_args)
> +#define DRM_IOCTL_NVDLA_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_GEM_MMAP, struct nvdla_gem_map_offset_args)
> +
> +#endif

Cai Huoqing April 26, 2022, 8:23 a.m. UTC | #2

On 26 Apr 22 08:31:05, Christian König wrote:
> On 26.04.22 at 08:08, Cai Huoqing wrote:
> > The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> > which is integrated into NVIDIA Jetson AGX Xavier,
> > so add UAPI of this driver.
> > 
> > Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> > ---
> > v1->v2:
> > *Rename nvdla_drm.[ch] to nvdla_drv.[ch] and rename nvdla_ioctl.h to nvdla_drm.h,
> >   move it to uapi.
> >   comments link: https://lore.kernel.org/lkml/20bac605-97e6-e5cd-c4e4-83a8121645d8@amd.com/
> > 
> >   include/uapi/drm/nvdla_drm.h | 99 ++++++++++++++++++++++++++++++++++++
> >   1 file changed, 99 insertions(+)
> >   create mode 100644 include/uapi/drm/nvdla_drm.h
> > 
> > diff --git a/include/uapi/drm/nvdla_drm.h b/include/uapi/drm/nvdla_drm.h
> > new file mode 100644
> > index 000000000000..984635285525
> > --- /dev/null
> > +++ b/include/uapi/drm/nvdla_drm.h
> > @@ -0,0 +1,99 @@
> > +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
> > +/*
> > + * Copyright (C) 2017-2018 NVIDIA CORPORATION.
> > + * Copyright (C) 2022 Cai Huoqing
> > + */
> > +
> > +#ifndef __LINUX_NVDLA_IOCTL_H
> > +#define __LINUX_NVDLA_IOCTL_H
> > +
> > +#include <linux/ioctl.h>
> > +#include <linux/types.h>
> > +
> > +#if !defined(__KERNEL__)
> > +#define __user
> > +#endif
> > +
> > +/**
> > + * struct nvdla_mem_handle structure for memory handles
> > + *
> > + * @handle		handle to DMA buffer allocated in userspace
> > + * @reserved		Reserved for padding
> > + * @offset		offset in bytes from start address of buffer
> > + *
> > + */
> > +struct nvdla_mem_handle {
> > +	__u32 handle;
> > +	__u32 reserved;
> > +	__u64 offset;
> > +};
> > +
> > +/**
> > + * struct nvdla_ioctl_submit_task structure for single task information
> > + *
> > + * @num_addresses		total number of entries in address_list
> > + * @reserved			Reserved for padding
> > + * @address_list		pointer to array of struct nvdla_mem_handle
> > + *
> > + */
> > +struct nvdla_ioctl_submit_task {
> > +#define NVDLA_MAX_BUFFERS_PER_TASK (6144)
> > +	__u32 num_addresses;
> > +#define NVDLA_NO_TIMEOUT    (0xffffffff)
> > +	__u32 timeout;
> 
> What format does that timeout value have?
> 
> In general it is best practice to have absolute 64bit nanosecond timeouts
> (to be used with ktime inside the kernel) so that restarting interrupted
> IOCTLs works smoothly.
> 
> > +	__u64 address_list;
> 
> Maybe make the comments inline, because I just wanted to write that you should
> note that this is pointing to an nvdla_mem_handle array until I saw the
> comment above.
> 
> > +};
> > +
> > +/**
> > + * struct nvdla_submit_args structure for task submit
> > + *
> > + * @tasks		pointer to array of struct nvdla_ioctl_submit_task
> > + * @num_tasks		number of entries in tasks
> > + * @flags		flags for task submit, no flags defined yet
> > + * @version		version of task structure
> > + *
> > + */
> > +struct nvdla_submit_args {
> > +	__u64 tasks;
> > +	__u16 num_tasks;
> > +#define NVDLA_MAX_TASKS_PER_SUBMIT	24
> > +#define NVDLA_SUBMIT_FLAGS_ATOMIC	(1 << 0)
> 
> Well that "no flags defined yet" from the comment above is probably outdated
> :)
> 
> A comment explaining what this flag means would also be nice to have.
> 
> Apart from all those nit picks that looks pretty solid to me. Just one core
> functionality we usually have seems to be missing here: How is completion
> signaling implemented?
Hi, thanks for your reply.

Do you mean fence signaling? In this driver, IOCTL_SUBMIT is a synchronous call
that submits the task and waits for its completion. This accelerator handles
large compute operators (Pooling, Conv...), which is different from a GPU.
Exposing a fence API to the UMD would save too little time to be worthwhile.
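
From the userspace side, a synchronous submit of this kind could be wrapped as in the sketch below. The struct and command number are copied from the patch (with fixed-width C types in place of `__u64` and friends); the wrapper name `nvdla_submit_sync` is hypothetical, and the retry loop follows the pattern used by libdrm's `drmIoctl()` rather than anything stated in this series.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Mirrors struct nvdla_submit_args from the patch, with fixed-width types. */
struct nvdla_submit_args {
	uint64_t tasks;     /* userspace pointer to the task array */
	uint16_t num_tasks; /* number of entries in tasks */
	uint16_t flags;
	uint32_t version;
};

/* DRM ioctls use type 'd'; driver-private commands start at 0x40. */
#define NVDLA_DRM_IOCTL_BASE	'd'
#define NVDLA_DRM_COMMAND_BASE	0x40
#define DRM_NVDLA_SUBMIT	0x00
#define DRM_IOCTL_NVDLA_SUBMIT \
	_IOWR(NVDLA_DRM_IOCTL_BASE, NVDLA_DRM_COMMAND_BASE + DRM_NVDLA_SUBMIT, \
	      struct nvdla_submit_args)

/*
 * Submit tasks and block until the ioctl returns.
 * Returns 0 on success or a negative errno value.
 */
static inline int nvdla_submit_sync(int fd, struct nvdla_submit_args *args)
{
	int ret;

	/* Restart the call if a signal interrupts the wait. */
	do {
		ret = ioctl(fd, DRM_IOCTL_NVDLA_SUBMIT, args);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret == -1 ? -errno : 0;
}
```

Because the call only returns once the work is done (or has failed), the caller needs no separate fence or wait ioctl; the cost is that the calling thread is blocked for the whole duration of the task.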

Thanks,
Cai
> 
> Regards,
> Christian.
> 
> > +	__u16 flags;
> > +	__u32 version;
> > +};
> > +
> > +/**
> > + * struct nvdla_gem_create_args for allocating DMA buffer through GEM
> > + *
> > + * @handle		handle updated by kernel after allocation
> > + * @flags		implementation specific flags
> > + * @size		size of buffer to allocate
> > + */
> > +struct nvdla_gem_create_args {
> > +	__u32 handle;
> > +	__u32 flags;
> > +	__u64 size;
> > +};
> > +
> > +/**
> > + * struct nvdla_gem_map_offset_args for mapping DMA buffer
> > + *
> > + * @handle		handle of the buffer
> > + * @reserved		reserved for padding
> > + * @offset		offset updated by kernel after mapping
> > + */
> > +struct nvdla_gem_map_offset_args {
> > +	__u32 handle;
> > +	__u32 reserved;
> > +	__u64 offset;
> > +};
> > +
> > +#define DRM_NVDLA_SUBMIT		0x00
> > +#define DRM_NVDLA_GEM_CREATE	0x01
> > +#define DRM_NVDLA_GEM_MMAP		0x02
> > +
> > +#define DRM_IOCTL_NVDLA_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_SUBMIT, struct nvdla_submit_args)
> > +#define DRM_IOCTL_NVDLA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_GEM_CREATE, struct nvdla_gem_create_args)
> > +#define DRM_IOCTL_NVDLA_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_GEM_MMAP, struct nvdla_gem_map_offset_args)
> > +
> > +#endif
>

Christian König April 26, 2022, 8:29 a.m. UTC | #3

On 26.04.22 at 10:23, Cai Huoqing wrote:
> On 26 Apr 22 08:31:05, Christian König wrote:
>> On 26.04.22 at 08:08, Cai Huoqing wrote:
>>> The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
>>> which is integrated into NVIDIA Jetson AGX Xavier,
>>> so add UAPI of this driver.
>>>
>>> Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
>>> ---
>>> v1->v2:
>>> *Rename nvdla_drm.[ch] to nvdla_drv.[ch] and rename nvdla_ioctl.h to nvdla_drm.h,
>>>    move it to uapi.
>>>    comments link: https://lore.kernel.org/lkml/20bac605-97e6-e5cd-c4e4-83a8121645d8@amd.com/
>>>
>>>    include/uapi/drm/nvdla_drm.h | 99 ++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 99 insertions(+)
>>>    create mode 100644 include/uapi/drm/nvdla_drm.h
>>>
>>> diff --git a/include/uapi/drm/nvdla_drm.h b/include/uapi/drm/nvdla_drm.h
>>> new file mode 100644
>>> index 000000000000..984635285525
>>> --- /dev/null
>>> +++ b/include/uapi/drm/nvdla_drm.h
>>> @@ -0,0 +1,99 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
>>> +/*
>>> + * Copyright (C) 2017-2018 NVIDIA CORPORATION.
>>> + * Copyright (C) 2022 Cai Huoqing
>>> + */
>>> +
>>> +#ifndef __LINUX_NVDLA_IOCTL_H
>>> +#define __LINUX_NVDLA_IOCTL_H
>>> +
>>> +#include <linux/ioctl.h>
>>> +#include <linux/types.h>
>>> +
>>> +#if !defined(__KERNEL__)
>>> +#define __user
>>> +#endif
>>> +
>>> +/**
>>> + * struct nvdla_mem_handle structure for memory handles
>>> + *
>>> + * @handle		handle to DMA buffer allocated in userspace
>>> + * @reserved		Reserved for padding
>>> + * @offset		offset in bytes from start address of buffer
>>> + *
>>> + */
>>> +struct nvdla_mem_handle {
>>> +	__u32 handle;
>>> +	__u32 reserved;
>>> +	__u64 offset;
>>> +};
>>> +
>>> +/**
>>> + * struct nvdla_ioctl_submit_task structure for single task information
>>> + *
>>> + * @num_addresses		total number of entries in address_list
>>> + * @reserved			Reserved for padding
>>> + * @address_list		pointer to array of struct nvdla_mem_handle
>>> + *
>>> + */
>>> +struct nvdla_ioctl_submit_task {
>>> +#define NVDLA_MAX_BUFFERS_PER_TASK (6144)
>>> +	__u32 num_addresses;
>>> +#define NVDLA_NO_TIMEOUT    (0xffffffff)
>>> +	__u32 timeout;
>> What format does that timeout value have?
>>
>> In general it is best practice to have absolute 64bit nanosecond timeouts
>> (to be used with ktime inside the kernel) so that restarting interrupted
>> IOCTLs works smoothly.
>>
>>> +	__u64 address_list;
>> Maybe make the comments inline, because I just wanted to write that you should
>> note that this is pointing to an nvdla_mem_handle array until I saw the
>> comment above.
>>
>>> +};
>>> +
>>> +/**
>>> + * struct nvdla_submit_args structure for task submit
>>> + *
>>> + * @tasks		pointer to array of struct nvdla_ioctl_submit_task
>>> + * @num_tasks		number of entries in tasks
>>> + * @flags		flags for task submit, no flags defined yet
>>> + * @version		version of task structure
>>> + *
>>> + */
>>> +struct nvdla_submit_args {
>>> +	__u64 tasks;
>>> +	__u16 num_tasks;
>>> +#define NVDLA_MAX_TASKS_PER_SUBMIT	24
>>> +#define NVDLA_SUBMIT_FLAGS_ATOMIC	(1 << 0)
>> Well that "no flags defined yet" from the comment above is probably outdated
>> :)
>>
>> A comment explaining what this flag means would also be nice to have.
>>
>> Apart from all those nit picks that looks pretty solid to me. Just one core
>> functionality we usually have seems to be missing here: How is completion
>> signaling implemented?
> Hi, thanks for your reply.
>
> Do you mean fence signaling? In this driver, IOCTL_SUBMIT is a synchronous call
> that submits the task and waits for its completion. This accelerator handles
> large compute operators (Pooling, Conv...), which is different from a GPU.
> Exposing a fence API to the UMD would save too little time to be worthwhile.

You should probably add that as a comment somewhere here.

Thanks,
Christian.

>
> Thanks,
> Cai
>> Regards,
>> Christian.
>>
>>> +	__u16 flags;
>>> +	__u32 version;
>>> +};
>>> +
>>> +/**
>>> + * struct nvdla_gem_create_args for allocating DMA buffer through GEM
>>> + *
>>> + * @handle		handle updated by kernel after allocation
>>> + * @flags		implementation specific flags
>>> + * @size		size of buffer to allocate
>>> + */
>>> +struct nvdla_gem_create_args {
>>> +	__u32 handle;
>>> +	__u32 flags;
>>> +	__u64 size;
>>> +};
>>> +
>>> +/**
>>> + * struct nvdla_gem_map_offset_args for mapping DMA buffer
>>> + *
>>> + * @handle		handle of the buffer
>>> + * @reserved		reserved for padding
>>> + * @offset		offset updated by kernel after mapping
>>> + */
>>> +struct nvdla_gem_map_offset_args {
>>> +	__u32 handle;
>>> +	__u32 reserved;
>>> +	__u64 offset;
>>> +};
>>> +
>>> +#define DRM_NVDLA_SUBMIT		0x00
>>> +#define DRM_NVDLA_GEM_CREATE	0x01
>>> +#define DRM_NVDLA_GEM_MMAP		0x02
>>> +
>>> +#define DRM_IOCTL_NVDLA_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_SUBMIT, struct nvdla_submit_args)
>>> +#define DRM_IOCTL_NVDLA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_GEM_CREATE, struct nvdla_gem_create_args)
>>> +#define DRM_IOCTL_NVDLA_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_GEM_MMAP, struct nvdla_gem_map_offset_args)
>>> +
>>> +#endif

kernel test robot April 26, 2022, 10:12 a.m. UTC | #4

Hi Cai,

I love your patch! Yet something to improve:

[auto build test ERROR on drm/drm-next]
[also build test ERROR on linus/master linux/master v5.18-rc4 next-20220422]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Cai-Huoqing/drm-nvdla-Add-driver-support-for-NVDLA/20220426-141148
base:   git://anongit.freedesktop.org/drm/drm drm-next
config: i386-randconfig-a003-20220425 (https://download.01.org/0day-ci/archive/20220426/202204261827.CMHZCsOI-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1cddcfdc3c683b393df1a5c9063252eb60e52818)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a54587f7637b8ee11ad624794af3b409e6306e07
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Cai-Huoqing/drm-nvdla-Add-driver-support-for-NVDLA/20220426-141148
        git checkout a54587f7637b8ee11ad624794af3b409e6306e07
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 prepare

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> error: include/uapi/drm/nvdla_drm.h: missing "WITH Linux-syscall-note" for SPDX-License-Identifier
   make[2]: *** [scripts/Makefile.headersinst:63: usr/include/drm/nvdla_drm.h] Error 1
   make[2]: Target '__headers' not remade because of errors.
   make[1]: *** [Makefile:1280: headers] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:219: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.
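
The error comes from the header installation check, which expects the GPL part of a UAPI header's license expression to carry the syscall exception. A likely fix, following the form used by other dual-licensed UAPI headers, is sketched below; the exact expression is an assumption until the author confirms the intended licensing.

```c
/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
```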

Arnd Bergmann April 26, 2022, 10:50 a.m. UTC | #5

On Tue, Apr 26, 2022 at 8:31 AM Christian König
<christian.koenig@amd.com> wrote:
> On 26.04.22 at 08:08, Cai Huoqing wrote:
> > The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> > which is integrated into NVIDIA Jetson AGX Xavier,
> > so add UAPI of this driver.
> >
> > Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>

I saw the reply but not the original mail, so I'll comment here.

> > +
> > +#if !defined(__KERNEL__)
> > +#define __user
> > +#endif

This is done in the 'make headers_install' step, no need to define it
separately.

> > +#define NVDLA_NO_TIMEOUT    (0xffffffff)
> > +     __u32 timeout;
>
> What format does that timeout value have?
>
> In general it is best practice to have absolute 64bit nanosecond
> timeouts (to be used with ktime inside the kernel) so that restarting
> interrupted IOCTLs works smoothly.

When using absolute values, one also needs to decide whether this should be
realtime, monotonic or boottime and document the decision.


> > + * struct nvdla_submit_args structure for task submit
> > + *
> > + * @tasks            pointer to array of struct nvdla_ioctl_submit_task
> > + * @num_tasks                number of entries in tasks
> > + * @flags            flags for task submit, no flags defined yet
> > + * @version          version of task structure
> > + *
> > + */
> > +struct nvdla_submit_args {
> > +     __u64 tasks;
> > +     __u16 num_tasks;
> > +#define NVDLA_MAX_TASKS_PER_SUBMIT   24
> > +#define NVDLA_SUBMIT_FLAGS_ATOMIC    (1 << 0)
>
> Well that "no flags defined yet" from the comment above is probably
> outdated :)

> > +     __u16 flags;
> > +     __u32 version;
> > +};

Versioned interfaces are usually a bad idea. If you introduce an ioctl command,
it should generally keep working. If you ever need to change the interface, just
use a new command number for the new version.
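
As a rough illustration of that approach, the sketch below keeps the original command untouched and adds a second command with its own argument struct. Everything marked hypothetical (`DRM_NVDLA_SUBMIT2`, `struct nvdla_submit2_args`, the `pad` and `timeout_ns` fields) is invented for this example and is not part of the patch; the ioctl encoding uses the DRM type `'d'` and the driver command base `0x40`.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Original arguments, frozen once released; no version field needed. */
struct nvdla_submit_args {
	uint64_t tasks;
	uint16_t num_tasks;
	uint16_t flags;
	uint32_t pad;        /* hypothetical: reserved, must be zero */
};

/* Hypothetical later revision that needs an extra field. */
struct nvdla_submit2_args {
	uint64_t tasks;
	uint16_t num_tasks;
	uint16_t flags;
	uint32_t pad;        /* hypothetical: reserved, must be zero */
	uint64_t timeout_ns; /* hypothetical addition */
};

#define DRM_NVDLA_SUBMIT	0x00
#define DRM_NVDLA_SUBMIT2	0x03 /* hypothetical new command number */

#define DRM_IOCTL_NVDLA_SUBMIT \
	_IOWR('d', 0x40 + DRM_NVDLA_SUBMIT, struct nvdla_submit_args)
#define DRM_IOCTL_NVDLA_SUBMIT2 \
	_IOWR('d', 0x40 + DRM_NVDLA_SUBMIT2, struct nvdla_submit2_args)
```

Old binaries keep issuing the first command and see unchanged behaviour, while new userspace can probe for the second command and fall back if the kernel rejects it.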

> > +/**
> > + * struct nvdla_gem_create_args for allocating DMA buffer through GEM
> > + *
> > + * @handle           handle updated by kernel after allocation
> > + * @flags            implementation specific flags
> > + * @size             size of buffer to allocate
> > + */
> > +struct nvdla_gem_create_args {
> > +     __u32 handle;
> > +     __u32 flags;
> > +     __u64 size;
> > +};
> > +
> > +/**
> > + * struct nvdla_gem_map_offset_args for mapping DMA buffer
> > + *
> > + * @handle           handle of the buffer
> > + * @reserved         reserved for padding
> > + * @offset           offset updated by kernel after mapping
> > + */
> > +struct nvdla_gem_map_offset_args {
> > +     __u32 handle;
> > +     __u32 reserved;
> > +     __u64 offset;
> > +};
> > +
> > +#define DRM_NVDLA_SUBMIT             0x00
> > +#define DRM_NVDLA_GEM_CREATE 0x01
> > +#define DRM_NVDLA_GEM_MMAP           0x02

Is this an actual mmap() call, or something that needs to be done before the
mmap()? Is the 'handle' a file descriptor or some internal number?

      Arnd

kernel test robot April 26, 2022, 11:23 a.m. UTC | #6

Hi Cai,

I love your patch! Yet something to improve:

[auto build test ERROR on drm/drm-next]
[also build test ERROR on linus/master linux/master v5.18-rc4 next-20220422]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Cai-Huoqing/drm-nvdla-Add-driver-support-for-NVDLA/20220426-141148
base:   git://anongit.freedesktop.org/drm/drm drm-next
config: ia64-randconfig-r021-20220425 (https://download.01.org/0day-ci/archive/20220426/202204261945.UCAr92eM-lkp@intel.com/config)
compiler: ia64-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a54587f7637b8ee11ad624794af3b409e6306e07
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Cai-Huoqing/drm-nvdla-Add-driver-support-for-NVDLA/20220426-141148
        git checkout a54587f7637b8ee11ad624794af3b409e6306e07
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=ia64 prepare

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> error: include/uapi/drm/nvdla_drm.h: missing "WITH Linux-syscall-note" for SPDX-License-Identifier
   make[2]: *** [scripts/Makefile.headersinst:63: usr/include/drm/nvdla_drm.h] Error 1
   make[2]: Target '__headers' not remade because of errors.
   make[1]: *** [Makefile:1280: headers] Error 2
   arch/ia64/kernel/asm-offsets.c:23:6: warning: no previous prototype for 'foo' [-Wmissing-prototypes]
      23 | void foo(void)
         |      ^~~
   <stdin>:1517:2: warning: #warning syscall clone3 not implemented [-Wcpp]
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:219: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.

Cai Huoqing April 26, 2022, 12:24 p.m. UTC | #7

On 26 Apr 22 12:50:50, Arnd Bergmann wrote:
> On Tue, Apr 26, 2022 at 8:31 AM Christian König
> <christian.koenig@amd.com> wrote:
> > On 26.04.22 at 08:08, Cai Huoqing wrote:
> > > The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> > > which is integrated into NVIDIA Jetson AGX Xavier,
> > > so add UAPI of this driver.
> > >
> > > Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> 
> I saw the reply but not the original mail, so I'll comment here.
Hi, thanks for your reply
The patches here:
https://lore.kernel.org/lkml/20220426060808.78225-3-cai.huoqing@linux.dev/
> 
> > > +
> > > +#if !defined(__KERNEL__)
> > > +#define __user
> > > +#endif
> 
> This is done in the 'make headers_install' step, no need to define it
> separately.
> 
> > > +#define NVDLA_NO_TIMEOUT    (0xffffffff)
> > > +     __u32 timeout;
> >
> > What format does that timeout value have?
> >
> > In general it is best practice to have absolute 64bit nanosecond
> > timeouts (to be used with ktime inside the kernel) so that restarting
> > > interrupted IOCTLs works smoothly.
> 
> When using absolute values, one also needs to decide whether this should be
> realtime, monotonic or boottime and document the decision.
> 
> 
> > > + * struct nvdla_submit_args structure for task submit
> > > + *
> > > + * @tasks            pointer to array of struct nvdla_ioctl_submit_task
> > > + * @num_tasks                number of entries in tasks
> > > + * @flags            flags for task submit, no flags defined yet
> > > + * @version          version of task structure
> > > + *
> > > + */
> > > +struct nvdla_submit_args {
> > > +     __u64 tasks;
> > > +     __u16 num_tasks;
> > > +#define NVDLA_MAX_TASKS_PER_SUBMIT   24
> > > +#define NVDLA_SUBMIT_FLAGS_ATOMIC    (1 << 0)
> >
> > Well that "no flags defined yet" from the comment above is probably
> > outdated :)
> 
> > > +     __u16 flags;
> > > +     __u32 version;
> > > +};
> 
> Versioned interfaces are usually a bad idea. If you introduce an ioctl command,
> it should generally keep working. If you ever need to change the interface, just
> use a new command number for the new version.
> 
> > > +/**
> > > + * struct nvdla_gem_create_args for allocating DMA buffer through GEM
> > > + *
> > > + * @handle           handle updated by kernel after allocation
> > > + * @flags            implementation specific flags
> > > + * @size             size of buffer to allocate
> > > + */
> > > +struct nvdla_gem_create_args {
> > > +     __u32 handle;
> > > +     __u32 flags;
> > > +     __u64 size;
> > > +};
> > > +
> > > +/**
> > > + * struct nvdla_gem_map_offset_args for mapping DMA buffer
> > > + *
> > > + * @handle           handle of the buffer
> > > + * @reserved         reserved for padding
> > > + * @offset           offset updated by kernel after mapping
> > > + */
> > > +struct nvdla_gem_map_offset_args {
> > > +     __u32 handle;
> > > +     __u32 reserved;
> > > +     __u64 offset;
> > > +};
> > > +
> > > +#define DRM_NVDLA_SUBMIT             0x00
> > > +#define DRM_NVDLA_GEM_CREATE 0x01
> > > +#define DRM_NVDLA_GEM_MMAP           0x02
> 
> Is this an actual mmap() call, or something that needs to be done before the
> mmap()? Is the 'handle' a file descriptor or some internal number?
It's a GEM object mmap that calls drm_gem_dumb_map_offset() internally, and
the handle is a GEM object handle.
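
In userspace terms this is the usual two-step DRM pattern: the ioctl returns a fake offset for the handle, and the actual mapping is then made with a regular mmap() on the DRM file descriptor. A minimal sketch, with the struct and command number taken from the patch and a hypothetical helper name `nvdla_gem_map`:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Mirrors struct nvdla_gem_map_offset_args from the patch. */
struct nvdla_gem_map_offset_args {
	uint32_t handle;   /* GEM object handle */
	uint32_t reserved;
	uint64_t offset;   /* filled in by the kernel */
};

#define DRM_NVDLA_GEM_MMAP 0x02
#define DRM_IOCTL_NVDLA_GEM_MMAP \
	_IOWR('d', 0x40 + DRM_NVDLA_GEM_MMAP, struct nvdla_gem_map_offset_args)

/* Map a GEM buffer into the caller's address space; MAP_FAILED on error. */
static inline void *nvdla_gem_map(int fd, uint32_t handle, size_t size)
{
	struct nvdla_gem_map_offset_args args = { .handle = handle };

	/* Step 1: ask the driver for the fake mmap offset of this handle. */
	if (ioctl(fd, DRM_IOCTL_NVDLA_GEM_MMAP, &args) == -1)
		return MAP_FAILED;

	/* Step 2: map the buffer through the DRM file descriptor. */
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
		    (off_t)args.offset);
}
```

On 32-bit builds the offset only fits in `off_t` when large file support is enabled, so such code is normally compiled with `_FILE_OFFSET_BITS=64`.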

Thanks,
Cai
> 
>       Arnd

Arnd Bergmann April 26, 2022, 12:38 p.m. UTC | #8

On Tue, Apr 26, 2022 at 2:24 PM Cai Huoqing <cai.huoqing@linux.dev> wrote:
> On 26 Apr 22 12:50:50, Arnd Bergmann wrote:

> > > > +#define DRM_NVDLA_SUBMIT             0x00
> > > > +#define DRM_NVDLA_GEM_CREATE 0x01
> > > > +#define DRM_NVDLA_GEM_MMAP           0x02
> >
> > Is this an actual mmap() call, or something that needs to be done before the
> > mmap()? Is the 'handle' a file descriptor or some internal number?
> It's a GEM object mmap that calls drm_gem_dumb_map_offset() internally, and
> the handle is a GEM object handle.

Ok, thanks for the clarification. I see that other drivers have the
exact same thing,
so I assume it's fine for drivers/gpu/ then, even if it would be a bit odd for
other subsystems.

       Arnd

Thierry Reding April 28, 2022, 2:40 p.m. UTC | #9

On Tue, Apr 26, 2022 at 02:08:01PM +0800, Cai Huoqing wrote:
> The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> which is integrated into NVIDIA Jetson AGX Xavier,
> so add UAPI of this driver.
> 
> Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> ---
> v1->v2:
> *Rename nvdla_drm.[ch] to nvdla_drv.[ch] and rename nvdla_ioctl.h to nvdla_drm.h,
>  move it to uapi.
>  comments link: https://lore.kernel.org/lkml/20bac605-97e6-e5cd-c4e4-83a8121645d8@amd.com/
> 
>  include/uapi/drm/nvdla_drm.h | 99 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 99 insertions(+)
>  create mode 100644 include/uapi/drm/nvdla_drm.h
> 
> diff --git a/include/uapi/drm/nvdla_drm.h b/include/uapi/drm/nvdla_drm.h
> new file mode 100644
> index 000000000000..984635285525
> --- /dev/null
> +++ b/include/uapi/drm/nvdla_drm.h
> @@ -0,0 +1,99 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
> +/*
> + * Copyright (C) 2017-2018 NVIDIA CORPORATION.
> + * Copyright (C) 2022 Cai Huoqing
> + */
> +
> +#ifndef __LINUX_NVDLA_IOCTL_H
> +#define __LINUX_NVDLA_IOCTL_H
> +
> +#include <linux/ioctl.h>
> +#include <linux/types.h>
> +
> +#if !defined(__KERNEL__)
> +#define __user
> +#endif
> +
> +/**
> + * struct nvdla_mem_handle structure for memory handles
> + *
> + * @handle		handle to DMA buffer allocated in userspace
> + * @reserved		Reserved for padding
> + * @offset		offset in bytes from start address of buffer
> + *
> + */
> +struct nvdla_mem_handle {
> +	__u32 handle;
> +	__u32 reserved;
> +	__u64 offset;
> +};
> +
> +/**
> + * struct nvdla_ioctl_submit_task structure for single task information
> + *
> + * @num_addresses		total number of entries in address_list
> + * @reserved			Reserved for padding
> + * @address_list		pointer to array of struct nvdla_mem_handle
> + *
> + */
> +struct nvdla_ioctl_submit_task {
> +#define NVDLA_MAX_BUFFERS_PER_TASK (6144)

This is an odd number. Can you clarify where this limitation comes from?
I say "limitation" here because, again, I'm no expert on DLA and I don't
know what a typical workload would look like. 6144 is a lot of buffers,
but are these tasks typically using a few large buffers or many small
buffers?

> +	__u32 num_addresses;
> +#define NVDLA_NO_TIMEOUT    (0xffffffff)
> +	__u32 timeout;
> +	__u64 address_list;
> +};

So if a task is basically just a collection of DMA buffers, is the
userspace supposed to fill some of those buffers with metadata to
determine what the task is about? If so, is this something that the
DLA firmware/hardware knows how to parse?

> +/**
> + * struct nvdla_submit_args structure for task submit
> + *
> + * @tasks		pointer to array of struct nvdla_ioctl_submit_task
> + * @num_tasks		number of entries in tasks
> + * @flags		flags for task submit, no flags defined yet
> + * @version		version of task structure
> + *
> + */
> +struct nvdla_submit_args {
> +	__u64 tasks;
> +	__u16 num_tasks;
> +#define NVDLA_MAX_TASKS_PER_SUBMIT	24

Perhaps worth clarifying if this is a hardware restriction or an
arbitrary software limit. Is this perhaps worth parameterizing somehow
if this can potentially change in newer versions of DLA?
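
Whatever the origin of the two limits, a submission path would presumably reject out-of-range counts before touching any buffers. The sketch below shows such a check using the constants from the patch; the helper name `nvdla_check_limits` and the rejection of an empty submission are assumptions made for illustration, not behaviour taken from the driver.

```c
#include <errno.h>
#include <stdint.h>

/* Limits as defined in the patch. */
#define NVDLA_MAX_BUFFERS_PER_TASK 6144
#define NVDLA_MAX_TASKS_PER_SUBMIT 24

/*
 * Check the task count and each task's buffer count against the limits.
 * Returns 0 if everything is in range, -EINVAL otherwise.
 */
static inline int nvdla_check_limits(uint16_t num_tasks,
				     const uint32_t *num_addresses)
{
	/* Assumption: an empty submission is treated as invalid. */
	if (num_tasks == 0 || num_tasks > NVDLA_MAX_TASKS_PER_SUBMIT)
		return -EINVAL;

	for (uint16_t i = 0; i < num_tasks; i++) {
		if (num_addresses[i] > NVDLA_MAX_BUFFERS_PER_TASK)
			return -EINVAL;
	}

	return 0;
}
```

If the limits can differ between DLA versions, the two constants are the part that would need to become a queryable parameter rather than a compile-time define.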

> +#define NVDLA_SUBMIT_FLAGS_ATOMIC	(1 << 0)

What exactly does atomicity imply here? Should this be described in a
comment?

Thierry

> +	__u16 flags;
> +	__u32 version;
> +};
> +
> +/**
> + * struct nvdla_gem_create_args for allocating DMA buffer through GEM
> + *
> + * @handle		handle updated by kernel after allocation
> + * @flags		implementation specific flags
> + * @size		size of buffer to allocate
> + */
> +struct nvdla_gem_create_args {
> +	__u32 handle;
> +	__u32 flags;
> +	__u64 size;
> +};
> +
> +/**
> + * struct nvdla_gem_map_offset_args for mapping DMA buffer
> + *
> + * @handle		handle of the buffer
> + * @reserved		reserved for padding
> + * @offset		offset updated by kernel after mapping
> + */
> +struct nvdla_gem_map_offset_args {
> +	__u32 handle;
> +	__u32 reserved;
> +	__u64 offset;
> +};
> +
> +#define DRM_NVDLA_SUBMIT		0x00
> +#define DRM_NVDLA_GEM_CREATE	0x01
> +#define DRM_NVDLA_GEM_MMAP		0x02
> +
> +#define DRM_IOCTL_NVDLA_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_SUBMIT, struct nvdla_submit_args)
> +#define DRM_IOCTL_NVDLA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_GEM_CREATE, struct nvdla_gem_create_args)
> +#define DRM_IOCTL_NVDLA_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_GEM_MMAP, struct nvdla_gem_map_offset_args)
> +
> +#endif
> -- 
> 2.25.1
>

Thierry Reding April 28, 2022, 2:45 p.m. UTC | #10

On Tue, Apr 26, 2022 at 04:23:41PM +0800, Cai Huoqing wrote:
> On 26 Apr 22 08:31:05, Christian König wrote:
> > On 26.04.22 at 08:08, Cai Huoqing wrote:
> > > The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> > > which is integrated into NVIDIA Jetson AGX Xavier,
> > > so add UAPI of this driver.
> > > 
> > > Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> > > ---
> > > v1->v2:
> > > *Rename nvdla_drm.[ch] to nvdla_drv.[ch] and rename nvdla_ioctl.h to nvdla_drm.h,
> > >   move it to uapi.
> > >   comments link: https://lore.kernel.org/lkml/20bac605-97e6-e5cd-c4e4-83a8121645d8@amd.com/
> > > 
> > >   include/uapi/drm/nvdla_drm.h | 99 ++++++++++++++++++++++++++++++++++++
> > >   1 file changed, 99 insertions(+)
> > >   create mode 100644 include/uapi/drm/nvdla_drm.h
> > > 
> > > diff --git a/include/uapi/drm/nvdla_drm.h b/include/uapi/drm/nvdla_drm.h
> > > new file mode 100644
> > > index 000000000000..984635285525
> > > --- /dev/null
> > > +++ b/include/uapi/drm/nvdla_drm.h
> > > @@ -0,0 +1,99 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
> > > +/*
> > > + * Copyright (C) 2017-2018 NVIDIA CORPORATION.
> > > + * Copyright (C) 2022 Cai Huoqing
> > > + */
> > > +
> > > +#ifndef __LINUX_NVDLA_IOCTL_H
> > > +#define __LINUX_NVDLA_IOCTL_H
> > > +
> > > +#include <linux/ioctl.h>
> > > +#include <linux/types.h>
> > > +
> > > +#if !defined(__KERNEL__)
> > > +#define __user
> > > +#endif
> > > +
> > > +/**
> > > + * struct nvdla_mem_handle structure for memory handles
> > > + *
> > > + * @handle		handle to DMA buffer allocated in userspace
> > > + * @reserved		Reserved for padding
> > > + * @offset		offset in bytes from start address of buffer
> > > + *
> > > + */
> > > +struct nvdla_mem_handle {
> > > +	__u32 handle;
> > > +	__u32 reserved;
> > > +	__u64 offset;
> > > +};
> > > +
> > > +/**
> > > + * struct nvdla_ioctl_submit_task structure for single task information
> > > + *
> > > + * @num_addresses		total number of entries in address_list
> > > + * @reserved			Reserved for padding
> > > + * @address_list		pointer to array of struct nvdla_mem_handle
> > > + *
> > > + */
> > > +struct nvdla_ioctl_submit_task {
> > > +#define NVDLA_MAX_BUFFERS_PER_TASK (6144)
> > > +	__u32 num_addresses;
> > > +#define NVDLA_NO_TIMEOUT    (0xffffffff)
> > > +	__u32 timeout;
> > 
> > What format does that timeout value have?
> > 
> > In general it is best practice to have absolute 64bit nanosecond timeouts
> > (to be used with ktime inside the kernel) so that restarting interrupted
> > > IOCTLs works smoothly.
> > 
> > > +	__u64 address_list;
> > 
> > > Maybe make the comments inline, because I just wanted to write that you should
> > note that this is pointing to an nvdla_mem_handle array until I saw the
> > comment above.
> > 
> > > +};
> > > +
> > > +/**
> > > + * struct nvdla_submit_args structure for task submit
> > > + *
> > > + * @tasks		pointer to array of struct nvdla_ioctl_submit_task
> > > + * @num_tasks		number of entries in tasks
> > > + * @flags		flags for task submit, no flags defined yet
> > > + * @version		version of task structure
> > > + *
> > > + */
> > > +struct nvdla_submit_args {
> > > +	__u64 tasks;
> > > +	__u16 num_tasks;
> > > +#define NVDLA_MAX_TASKS_PER_SUBMIT	24
> > > +#define NVDLA_SUBMIT_FLAGS_ATOMIC	(1 << 0)
> > 
> > Well that "no flags defined yet" from the comment above is probably outdated
> > :)
> > 
> > > A comment on what this flag means would also be nice to have.
> > 
> > Apart from all those nit picks that looks pretty solid to me. Just one core
> > functionality we usually have seems to be missing here: How is completion
> > signaling implemented?
> Hi, thanks for your reply.
> 
> Do you mean fence signaling? In this driver, IOCTL_SUBMIT is a synchronous call
> which does task submission and waits for completion. This accelerator deals
> with massive compute operators (Pooling, Conv...), which is different from a
> GPU. It's unnecessary to expose a fence API to the UMD to save so little time.

Are you saying that using fences won't be a big benefit because the DLA
can't effectively process tasks from multiple sources in parallel? That
is only part of where some sort of signalling would be useful. Another
reason why it would be good to have is to make it easier to write user-
space that can hand off a set of tasks to the DLA, then go off and do
something else and get notified about the completion somehow. If not a
full-blown fence API, then perhaps FD polling would be a simple
mechanism to allow some degree of asynchronicity.
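
As a rough illustration of the FD polling idea, userspace could look something like the sketch below. This assumes a driver that marks a file descriptor readable once a submitted batch has finished, which the current patch does not do. The names wait_readable() and wait_readable_demo() are made up for this example, and the demo uses a pipe as a stand-in for such a descriptor.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until fd becomes readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * Note: on EINTR the relative timeout simply starts over, which is one
 * reason absolute timeouts are easier to restart correctly.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}

/*
 * Stand-in demo: a pipe plays the role of the (hypothetical) completion fd.
 * Returns 0 if the fd was not ready before the write and ready after it.
 */
static int wait_readable_demo(void)
{
	int fds[2];
	char byte = 1;
	int before, after;

	if (pipe(fds) != 0)
		return -1;

	before = wait_readable(fds[0], 0);	/* nothing written yet */
	if (write(fds[1], &byte, 1) != 1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	after = wait_readable(fds[0], 0);	/* "completion" has arrived */

	close(fds[0]);
	close(fds[1]);
	return (before == 0 && after == 1) ? 0 : -1;
}
```

The point is only that the submitting thread would not have to block inside the submit IOCTL; it could fold the descriptor into whatever event loop it already runs.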

Thierry

Cai Huoqing April 29, 2022, 3:58 a.m. UTC | #11

On 28 Apr 22 16:45:06, Thierry Reding wrote:
> On Tue, Apr 26, 2022 at 04:23:41PM +0800, Cai Huoqing wrote:
> > On 26 Apr 22 08:31:05, Christian König wrote:
> > > On 26.04.22 at 08:08, Cai Huoqing wrote:
> > > > The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> > > > which is integrated into NVIDIA Jetson AGX Xavier,
> > > > so add UAPI of this driver.
> > > > 
> > > > Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> > > > ---
> > > > v1->v2:
> > > > *Rename nvdla_drm.[ch] to nvdla_drv.[ch] and rename nvdla_ioctl.h to nvdla_drm.h,
> > > >   move it to uapi.
> > > >   comments link: https://lore.kernel.org/lkml/20bac605-97e6-e5cd-c4e4-83a8121645d8@amd.com/
> > > > 
> > > >   include/uapi/drm/nvdla_drm.h | 99 ++++++++++++++++++++++++++++++++++++
> > > >   1 file changed, 99 insertions(+)
> > > >   create mode 100644 include/uapi/drm/nvdla_drm.h
> > > > 
> > > > diff --git a/include/uapi/drm/nvdla_drm.h b/include/uapi/drm/nvdla_drm.h
> > > > new file mode 100644
> > > > index 000000000000..984635285525
> > > > --- /dev/null
> > > > +++ b/include/uapi/drm/nvdla_drm.h
> > > > @@ -0,0 +1,99 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
> > > > +/*
> > > > + * Copyright (C) 2017-2018 NVIDIA CORPORATION.
> > > > + * Copyright (C) 2022 Cai Huoqing
> > > > + */
> > > > +
> > > > +#ifndef __LINUX_NVDLA_IOCTL_H
> > > > +#define __LINUX_NVDLA_IOCTL_H
> > > > +
> > > > +#include <linux/ioctl.h>
> > > > +#include <linux/types.h>
> > > > +
> > > > +#if !defined(__KERNEL__)
> > > > +#define __user
> > > > +#endif
> > > > +
> > > > +/**
> > > > + * struct nvdla_mem_handle structure for memory handles
> > > > + *
> > > > + * @handle		handle to DMA buffer allocated in userspace
> > > > + * @reserved		Reserved for padding
> > > > + * @offset		offset in bytes from start address of buffer
> > > > + *
> > > > + */
> > > > +struct nvdla_mem_handle {
> > > > +	__u32 handle;
> > > > +	__u32 reserved;
> > > > +	__u64 offset;
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct nvdla_ioctl_submit_task structure for single task information
> > > > + *
> > > > + * @num_addresses		total number of entries in address_list
> > > > + * @reserved			Reserved for padding
> > > > + * @address_list		pointer to array of struct nvdla_mem_handle
> > > > + *
> > > > + */
> > > > +struct nvdla_ioctl_submit_task {
> > > > +#define NVDLA_MAX_BUFFERS_PER_TASK (6144)
> > > > +	__u32 num_addresses;
> > > > +#define NVDLA_NO_TIMEOUT    (0xffffffff)
> > > > +	__u32 timeout;
> > > 
> > > What format does that timeout value have?
> > > 
> > > In general it is best practice to have absolute 64bit nanosecond timeouts
> > > (to be used with ktime inside the kernel) so that restarting interrupted
> > > > IOCTLs works smoothly.
> > > 
> > > > +	__u64 address_list;
> > > 
> > > > Maybe make the comments inline, because I just wanted to write that you should
> > > note that this is pointing to an nvdla_mem_handle array until I saw the
> > > comment above.
> > > 
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct nvdla_submit_args structure for task submit
> > > > + *
> > > > + * @tasks		pointer to array of struct nvdla_ioctl_submit_task
> > > > + * @num_tasks		number of entries in tasks
> > > > + * @flags		flags for task submit, no flags defined yet
> > > > + * @version		version of task structure
> > > > + *
> > > > + */
> > > > +struct nvdla_submit_args {
> > > > +	__u64 tasks;
> > > > +	__u16 num_tasks;
> > > > +#define NVDLA_MAX_TASKS_PER_SUBMIT	24
> > > > +#define NVDLA_SUBMIT_FLAGS_ATOMIC	(1 << 0)
> > > 
> > > Well that "no flags defined yet" from the comment above is probably outdated
> > > :)
> > > 
> > > > A comment on what this flag means would also be nice to have.
> > > 
> > > Apart from all those nit picks that looks pretty solid to me. Just one core
> > > functionality we usually have seems to be missing here: How is completion
> > > signaling implemented?
> > Hi, thanks for your reply.
> > 
> > Do you mean fence signaling? In this driver, IOCTL_SUBMIT is a synchronous call
> > which does task submission and waits for completion. This accelerator deals
> > with massive compute operators (Pooling, Conv...), which is different from a
> > GPU. It's unnecessary to expose a fence API to the UMD to save so little time.
> 
> Are you saying that using fences won't be a big benefit because the DLA
> can't effectively process tasks from multiple sources in parallel? That
> is only part of where some sort of signalling would be useful. Another
> reason why it would be good to have is to make it easier to write user-
> space that can hand off a set of tasks to the DLA, then go off and do
> something else and get notified about the completion somehow. If not a
> full-blown fence API, then perhaps FD polling would be a simple
> mechanism to allow some degree of asynchronicity.
Agreed, I will add a fence IOCTL when I resend the patch.

Thanks
Cai
> 
> Thierry
