diff mbox series

[rdma-core] libibverbs/man/ibv_reg_mr.3: Document errno on failure

Message ID 20231017053738.226069-1-lizhijian@fujitsu.com (mailing list archive)
State Not Applicable
Headers show
Series [rdma-core] libibverbs/man/ibv_reg_mr.3: Document errno on failure | expand

Commit Message

Li Zhijian Oct. 17, 2023, 5:37 a.m. UTC
'errno' is being widely used by applications when ibv_reg_mr returns NULL.
They all believe errno indicates the error on failure, so let's document
it explicitly.

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 libibverbs/man/ibv_reg_mr.3 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Markus Armbruster Oct. 17, 2023, 8:01 a.m. UTC | #1
Li Zhijian <lizhijian@fujitsu.com> writes:

> 'errno' is being widely used by applications when ibv_reg_mr returns NULL.
> They all believe errno indicates the error on failure, so let's document
> it explicitly.

Similar issue with ibv_open_device() .  Possibly more.

> Reported-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>  libibverbs/man/ibv_reg_mr.3 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libibverbs/man/ibv_reg_mr.3 b/libibverbs/man/ibv_reg_mr.3
> index 8f323891..d43799c5 100644
> --- a/libibverbs/man/ibv_reg_mr.3
> +++ b/libibverbs/man/ibv_reg_mr.3
> @@ -103,7 +103,7 @@ deregisters the MR
>  .I mr\fR.
>  .SH "RETURN VALUE"
>  .B ibv_reg_mr() / ibv_reg_mr_iova() / ibv_reg_dmabuf_mr()
> -returns a pointer to the registered MR, or NULL if the request fails.
> +returns a pointer to the registered MR, or NULL if the request fails (and sets errno to indicate the failure reason).
>  The local key (\fBL_Key\fR) field
>  .B lkey
>  is used as the lkey field of struct ibv_sge when posting buffers with

We should double-check errno is set on all failures.  I doubt it is.

ibv_reg_mr() is a macro:

    #define ibv_reg_mr(pd, addr, length, access)                                   \
            __ibv_reg_mr(pd, addr, length, access,                                 \
                         __builtin_constant_p(				       \
                                 ((int)(access) & IBV_ACCESS_OPTIONAL_RANGE) == 0))

__ibv_reg_mr() may call ibv_reg_mr_iova2():

    __attribute__((__always_inline__)) static inline struct ibv_mr *
    __ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, unsigned int access,
                 int is_access_const)
    {
            if (is_access_const && (access & IBV_ACCESS_OPTIONAL_RANGE) == 0)
                    return ibv_reg_mr(pd, addr, length, (int)access);
            else
                    return ibv_reg_mr_iova2(pd, addr, length, (uintptr_t)addr,
                                            access);
    }

ibv_reg_mr_iova2() doesn't seem to set errno at --->:

    struct ibv_mr *ibv_reg_mr_iova2(struct ibv_pd *pd, void *addr, size_t length,
                                    uint64_t iova, unsigned int access)
    {
            struct verbs_device *device = verbs_get_device(pd->context->device);
            bool odp_mr = access & IBV_ACCESS_ON_DEMAND;
            struct ibv_mr *mr;

            if (!(device->core_support & IB_UVERBS_CORE_SUPPORT_OPTIONAL_MR_ACCESS))
                    access &= ~IBV_ACCESS_OPTIONAL_RANGE;

            if (!odp_mr && ibv_dontfork_range(addr, length))
--->                return NULL;

            mr = get_ops(pd->context)->reg_mr(pd, addr, length, iova, access);
            if (mr) {
                    mr->context = pd->context;
                    mr->pd      = pd;
                    mr->addr    = addr;
                    mr->length  = length;
            } else {
                    if (!odp_mr)
                            ibv_dofork_range(addr, length);
            }

            return mr;
    }


Thanks!
Li Zhijian Oct. 18, 2023, 1:11 a.m. UTC | #2
On 17/10/2023 16:01, Markus Armbruster wrote:
> Li Zhijian <lizhijian@fujitsu.com> writes:
> 
>> 'errno' is being widely used by applications when ibv_reg_mr returns NULL.
>> They all believe errno indicates the error on failure, so let's document
>> it explicitly.
> 
> Similar issue with ibv_open_device() .  Possibly more.

You are right, ibv_open_device()'s call chains are more complicated,
I have not figured out if it ought to set errno though QEMU relies on it.



> 
>> Reported-by: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>>   libibverbs/man/ibv_reg_mr.3 | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/libibverbs/man/ibv_reg_mr.3 b/libibverbs/man/ibv_reg_mr.3
>> index 8f323891..d43799c5 100644
>> --- a/libibverbs/man/ibv_reg_mr.3
>> +++ b/libibverbs/man/ibv_reg_mr.3
>> @@ -103,7 +103,7 @@ deregisters the MR
>>   .I mr\fR.
>>   .SH "RETURN VALUE"
>>   .B ibv_reg_mr() / ibv_reg_mr_iova() / ibv_reg_dmabuf_mr()
>> -returns a pointer to the registered MR, or NULL if the request fails.
>> +returns a pointer to the registered MR, or NULL if the request fails (and sets errno to indicate the failure reason).
>>   The local key (\fBL_Key\fR) field
>>   .B lkey
>>   is used as the lkey field of struct ibv_sge when posting buffers with
> 
> We should double-check errno is set on all failures.  I doubt it is.
> 
> ibv_reg_mr() is a macro:
> 
>      #define ibv_reg_mr(pd, addr, length, access)                                   \
>              __ibv_reg_mr(pd, addr, length, access,                                 \
>                           __builtin_constant_p(				       \
>                                   ((int)(access) & IBV_ACCESS_OPTIONAL_RANGE) == 0))
> 
> __ibv_reg_mr() may call ibv_reg_mr_iova2():
> 
>      __attribute__((__always_inline__)) static inline struct ibv_mr *
>      __ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, unsigned int access,
>                   int is_access_const)
>      {
>              if (is_access_const && (access & IBV_ACCESS_OPTIONAL_RANGE) == 0)
>                      return ibv_reg_mr(pd, addr, length, (int)access);
>              else
>                      return ibv_reg_mr_iova2(pd, addr, length, (uintptr_t)addr,
>                                              access);
>      }
> 
> ibv_reg_mr_iova2() doesn't seem to set errno at --->:
> 
>      struct ibv_mr *ibv_reg_mr_iova2(struct ibv_pd *pd, void *addr, size_t length,
>                                      uint64_t iova, unsigned int access)
>      {
>              struct verbs_device *device = verbs_get_device(pd->context->device);
>              bool odp_mr = access & IBV_ACCESS_ON_DEMAND;
>              struct ibv_mr *mr;
> 
>              if (!(device->core_support & IB_UVERBS_CORE_SUPPORT_OPTIONAL_MR_ACCESS))
>                      access &= ~IBV_ACCESS_OPTIONAL_RANGE;
> 
>              if (!odp_mr && ibv_dontfork_range(addr, length))
> --->                return NULL;


It seems that
ibv_dontfork_range() are missing to set errno when
- get_start_node() fails // ENOMEN only
- do_madvise() fails and 'rolling_back = 1', since this condition will 'goto again'
   we may need to save this errno before 'goto again', and assign it to errno before return.



			if (ret) {
				node = undo_node(node, start, inc);

				if (rolling_back || !node)
					goto out;

				/* madvise failed, roll back previous changes */
				rolling_back = 1;
				advice = advice == MADV_DONTFORK ?
					MADV_DOFORK : MADV_DONTFORK;
				end = node->end;
				goto again;
			}

Thanks
  
> 
>              mr = get_ops(pd->context)->reg_mr(pd, addr, length, iova, access);
>              if (mr) {
>                      mr->context = pd->context;
>                      mr->pd      = pd;
>                      mr->addr    = addr;
>                      mr->length  = length;
>              } else {
>                      if (!odp_mr)
>                              ibv_dofork_range(addr, length);
>              }
> 
>              return mr;
>      }
> 
> 
> Thanks!
>
Markus Armbruster Oct. 18, 2023, 4:45 a.m. UTC | #3
"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 17/10/2023 16:01, Markus Armbruster wrote:
>> Li Zhijian <lizhijian@fujitsu.com> writes:
>> 
>>> 'errno' is being widely used by applications when ibv_reg_mr returns NULL.
>>> They all believe errno indicates the error on failure, so let's document
>>> it explicitly.
>> 
>> Similar issue with ibv_open_device() .  Possibly more.
>
> You are right, ibv_open_device()'s call chains are more complicated,
> I have not figured out if it ought to set errno though QEMU relies on it.

I think a question to answer is for what purposes callers need errno.

The only callers I know are in QEMU.  There are three:

* qemu_rdma_reg_whole_ram_blocks() and qemu_rdma_register_and_get_keys()

  When ibv_reg_mr() fails, maybe try again with IBV_ACCESS_ON_DEMAND
  added to the protection attributes.

  "Maybe": if errno is ENOTSUP, and ibv_query_device_ex() reports
  IBV_ODP_SUPPORT.

* qemu_rdma_broken_ipv6_kernel()

  This function appears to probe the devices returned by
  ibv_get_device_list().

  For each device in the list, in order: try to ibv_open_device().  If
  it fails: ignore the device if errno is EPERM, else return failure.

I'm not familiar with RDMA, and I can't say whether any of this makes
sense.

If it doesn't, we need to talk about what problem the QEMU code is
trying to solve, and how to solve it properly.

If it does, we have legitimate uses of errno, and we need to talk how to
make errno usable safely, or else how to replace its use in QEMU.
Li Zhijian Oct. 18, 2023, 6:14 a.m. UTC | #4
On 18/10/2023 12:45, Markus Armbruster wrote:
> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:
> 
>> On 17/10/2023 16:01, Markus Armbruster wrote:
>>> Li Zhijian <lizhijian@fujitsu.com> writes:
>>>
>>>> 'errno' is being widely used by applications when ibv_reg_mr returns NULL.
>>>> They all believe errno indicates the error on failure, so let's document
>>>> it explicitly.
>>>
>>> Similar issue with ibv_open_device() .  Possibly more.
>>
>> You are right, ibv_open_device()'s call chains are more complicated,
>> I have not figured out if it ought to set errno though QEMU relies on it.
> 
> I think a question to answer is for what purposes callers need errno.
> 
> The only callers I know are in QEMU.  There are three:
> 
> * qemu_rdma_reg_whole_ram_blocks() and qemu_rdma_register_and_get_keys()
> 
>    When ibv_reg_mr() fails, maybe try again with IBV_ACCESS_ON_DEMAND
>    added to the protection attributes.
> 
>    "Maybe": if errno is ENOTSUP, and ibv_query_device_ex() reports
>    IBV_ODP_SUPPORT.

librpma[1] is another project that registers ODP MR like this.
https://github.com/pmem/rpma/blob/f52c00d18821ac573a71e9f23a6d2e8695086e95/src/peer.c#L277
ibv_reg_mr() will evolve to kernel via ioctl() generally, the when the libc wrapper will set the errno.


> 
> * qemu_rdma_broken_ipv6_kernel()
> 
>    This function appears to probe the devices returned by
>    ibv_get_device_list().
> 
>    For each device in the list, in order: try to ibv_open_device().  If
>    it fails: ignore the device if errno is EPERM, else return failure.

DPDK read the errno after calling ibv_open_device()[2] and ibv_get_device_list()[3]

[2] https://github.com/DPDK/dpdk/blob/5f9426b0618b7c2899f4d1444768f62739da1bce/drivers/net/mlx4/mlx4.c#L829
[3] https://github.com/DPDK/dpdk/blob/5f9426b0618b7c2899f4d1444768f62739da1bce/drivers/net/mlx4/mlx4.c#L802

I also think these APIs' intention are going to use the errno to indicate error reason, but they haven't been done yet?


> 
> I'm not familiar with RDMA, and I can't say whether any of this makes
> sense.
> 
> If it doesn't, we need to talk about what problem the QEMU code is
> trying to solve, and how to solve it properly.
> 
> If it does, we have legitimate uses of errno, and we need to talk how to
> make errno usable safely, or else how to replace its use in QEMU.
>
diff mbox series

Patch

diff --git a/libibverbs/man/ibv_reg_mr.3 b/libibverbs/man/ibv_reg_mr.3
index 8f323891..d43799c5 100644
--- a/libibverbs/man/ibv_reg_mr.3
+++ b/libibverbs/man/ibv_reg_mr.3
@@ -103,7 +103,7 @@  deregisters the MR
 .I mr\fR.
 .SH "RETURN VALUE"
 .B ibv_reg_mr() / ibv_reg_mr_iova() / ibv_reg_dmabuf_mr()
-returns a pointer to the registered MR, or NULL if the request fails.
+returns a pointer to the registered MR, or NULL if the request fails (and sets errno to indicate the failure reason).
 The local key (\fBL_Key\fR) field
 .B lkey
 is used as the lkey field of struct ibv_sge when posting buffers with