diff mbox series

hw/rdma: Add support for GID state changes for non-qmp frameworks

Message ID 20190505105518.22793-1-yuval.shaia@oracle.com (mailing list archive)
State New, archived
Headers show
Series hw/rdma: Add support for GID state changes for non-qmp frameworks | expand

Commit Message

Yuval Shaia May 5, 2019, 10:55 a.m. UTC
Any GID change in guest must be propogate to host. This is already done
by firing QMP event to managment system such as libvirt which in turn
will update the host with the relevant change.

When qemu is executed on non-qmp framework (ex from command-line) we
need to update the host instead.
Fix it by adding support to update the RoCE device's Ethernet function
IP list from qemu via netlink.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 configure              |  6 ++++
 hw/rdma/rdma_backend.c | 74 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 79 insertions(+), 1 deletion(-)

Comments

Eric Blake May 6, 2019, 3:09 p.m. UTC | #1
On 5/5/19 5:55 AM, Yuval Shaia wrote:
> Any GID change in guest must be propogate to host. This is already done

s/propogate to/propagated to the/

> by firing QMP event to managment system such as libvirt which in turn

s/managment/management/

> will update the host with the relevant change.
> 
> When qemu is executed on non-qmp framework (ex from command-line) we
> need to update the host instead.
> Fix it by adding support to update the RoCE device's Ethernet function
> IP list from qemu via netlink.
> 
> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> ---
>  configure              |  6 ++++
>  hw/rdma/rdma_backend.c | 74 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 79 insertions(+), 1 deletion(-)
> 

> @@ -1123,7 +1183,13 @@ int rdma_backend_add_gid(RdmaBackendDev *backend_dev, const char *ifname,
>                                              gid->global.subnet_prefix,
>                                              gid->global.interface_id);
>  
> -    return ret;
> +    /*
> +     * We ignore return value since operation might completed sucessfully

s/completed sucessfully/have completed successfully/

> +     * by the QMP consumer
> +     */
> +    netlink_route_update(ifname, gid, RTM_NEWADDR);
> +
> +    return 0;
>  }
>  
>  int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
> @@ -1149,6 +1215,12 @@ int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
>                                              gid->global.subnet_prefix,
>                                              gid->global.interface_id);
>  
> +    /*
> +     * We ignore return value since operation might completed sucessfully

and again

> +     * by the QMP consumer
> +     */
> +    netlink_route_update(ifname, gid, RTM_DELADDR);
> +
>      return 0;
>  }
>  
>
Yuval Shaia May 6, 2019, 4:34 p.m. UTC | #2
On Mon, May 06, 2019 at 10:09:29AM -0500, Eric Blake wrote:
> On 5/5/19 5:55 AM, Yuval Shaia wrote:
> > Any GID change in guest must be propogate to host. This is already done
> 
> s/propogate to/propagated to the/
> 
> > by firing QMP event to managment system such as libvirt which in turn
> 
> s/managment/management/
> 
> > will update the host with the relevant change.
> > 
> > When qemu is executed on non-qmp framework (ex from command-line) we
> > need to update the host instead.
> > Fix it by adding support to update the RoCE device's Ethernet function
> > IP list from qemu via netlink.
> > 
> > Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> > ---
> >  configure              |  6 ++++
> >  hw/rdma/rdma_backend.c | 74 +++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 79 insertions(+), 1 deletion(-)
> > 
> 
> > @@ -1123,7 +1183,13 @@ int rdma_backend_add_gid(RdmaBackendDev *backend_dev, const char *ifname,
> >                                              gid->global.subnet_prefix,
> >                                              gid->global.interface_id);
> >  
> > -    return ret;
> > +    /*
> > +     * We ignore return value since operation might completed sucessfully
> 
> s/completed sucessfully/have completed successfully/
> 
> > +     * by the QMP consumer
> > +     */
> > +    netlink_route_update(ifname, gid, RTM_NEWADDR);
> > +
> > +    return 0;
> >  }
> >  
> >  int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
> > @@ -1149,6 +1215,12 @@ int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
> >                                              gid->global.subnet_prefix,
> >                                              gid->global.interface_id);
> >  
> > +    /*
> > +     * We ignore return value since operation might completed sucessfully
> 
> and again

Thanks for the spelling mistake corrections.
Will fix and respin.

> 
> > +     * by the QMP consumer
> > +     */
> > +    netlink_route_update(ifname, gid, RTM_DELADDR);
> > +
> >      return 0;
> >  }
> >  
> > 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
Marcel Apfelbaum May 24, 2019, 5:24 a.m. UTC | #3
Hi Yuval,

On 5/5/19 1:55 PM, Yuval Shaia wrote:
> Any GID change in guest must be propogate to host. This is already done
> by firing QMP event to managment system such as libvirt which in turn
> will update the host with the relevant change.

Agreed, *any* management software can do that.

>
> When qemu is executed on non-qmp framework (ex from command-line) we
> need to update the host instead.
> Fix it by adding support to update the RoCE device's Ethernet function
> IP list from qemu via netlink.

I am not sure this is the right approach. I don't think QEMU should 
actively change
the host network configuration.
As you pointed out yourself, the management software should make such 
changes.

I agree you cannot always assume the QEMU instance is managed by libvirt,
what about adding this functionality to rdma-multiplexer? The 
multiplexer may
register to the same QMP event.

Even if you think the multiplexer is not the right way to do it, even a 
simple bash script
disguised  as a systemd service can subscribe to the QMP event and make the
change on the host.

What do you think?

Thanks,
Marcel

>
> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> ---
>   configure              |  6 ++++
>   hw/rdma/rdma_backend.c | 74 +++++++++++++++++++++++++++++++++++++++++-
>   2 files changed, 79 insertions(+), 1 deletion(-)
>
> diff --git a/configure b/configure
> index 5b183c2e39..1f707b1a62 100755
> --- a/configure
> +++ b/configure
> @@ -3132,6 +3132,8 @@ fi
>   
>   cat > $TMPC <<EOF &&
>   #include <sys/mman.h>
> +#include <libmnl/libmnl.h>
> +#include <linux/rtnetlink.h>
>   
>   int
>   main(void)
> @@ -3144,10 +3146,13 @@ main(void)
>   }
>   EOF
>   
> +pvrdma_libs="-lmnl"
> +
>   if test "$rdma" = "yes" ; then
>       case "$pvrdma" in
>       "")
>           if compile_prog "" ""; then
> +            libs_softmmu="$libs_softmmu $pvrdma_libs"
>               pvrdma="yes"
>           else
>               pvrdma="no"
> @@ -3156,6 +3161,7 @@ if test "$rdma" = "yes" ; then
>       "yes")
>           if ! compile_prog "" ""; then
>               error_exit "PVRDMA is not supported since mremap is not implemented"
> +                        " or libmnl-devel is not installed"
>           fi
>           pvrdma="yes"
>           ;;
> diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
> index 05f6b03221..bc57b1a624 100644
> --- a/hw/rdma/rdma_backend.c
> +++ b/hw/rdma/rdma_backend.c
> @@ -16,6 +16,11 @@
>   #include "qemu/osdep.h"
>   #include "qapi/qapi-events-rdma.h"
>   
> +#include "linux/if_addr.h"
> +#include "libmnl/libmnl.h"
> +#include "linux/rtnetlink.h"
> +#include "net/if.h"
> +
>   #include <infiniband/verbs.h>
>   
>   #include "contrib/rdmacm-mux/rdmacm-mux.h"
> @@ -47,6 +52,61 @@ static void dummy_comp_handler(void *ctx, struct ibv_wc *wc)
>       rdma_error_report("No completion handler is registered");
>   }
>   
> +static int netlink_route_update(const char *ifname, union ibv_gid *gid,
> +                                __u16 type)
> +{
> +    char buf[MNL_SOCKET_BUFFER_SIZE];
> +    struct nlmsghdr *nlh;
> +    struct ifaddrmsg *ifm;
> +    struct mnl_socket *nl;
> +    int ret;
> +    uint32_t ipv4;
> +
> +    nl = mnl_socket_open(NETLINK_ROUTE);
> +    if (!nl) {
> +        rdma_error_report("Fail to connect to netlink\n");
> +        return -EIO;
> +    }
> +
> +    ret = mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID);
> +    if (ret < 0) {
> +        rdma_error_report("Fail to bind to netlink\n");
> +        goto out;
> +    }
> +
> +    nlh = mnl_nlmsg_put_header(buf);
> +    nlh->nlmsg_type = type;
> +    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
> +    nlh->nlmsg_seq = 1;
> +
> +    ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
> +    ifm->ifa_index = if_nametoindex(ifname);
> +    if (gid->global.subnet_prefix) {
> +        ifm->ifa_family = AF_INET6;
> +        ifm->ifa_prefixlen = 64;
> +        ifm->ifa_flags = IFA_F_PERMANENT;
> +        ifm->ifa_scope = RT_SCOPE_UNIVERSE;
> +        mnl_attr_put(nlh, IFA_ADDRESS, sizeof(*gid), gid);
> +    } else {
> +        ifm->ifa_family = AF_INET;
> +        ifm->ifa_prefixlen = 24;
> +        memcpy(&ipv4, (char *)&gid->global.interface_id + 4, sizeof(ipv4));
> +        mnl_attr_put(nlh, IFA_LOCAL, 4, &ipv4);
> +    }
> +
> +    ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
> +    if (ret < 0) {
> +        rdma_error_report("Fail to send msg to to netlink\n");
> +        goto out;
> +    }
> +
> +    ret = 0;
> +
> +out:
> +    mnl_socket_close(nl);
> +    return ret;
> +}
> +
>   static inline void complete_work(enum ibv_wc_status status, uint32_t vendor_err,
>                                    void *ctx)
>   {
> @@ -1123,7 +1183,13 @@ int rdma_backend_add_gid(RdmaBackendDev *backend_dev, const char *ifname,
>                                               gid->global.subnet_prefix,
>                                               gid->global.interface_id);
>   
> -    return ret;
> +    /*
> +     * We ignore return value since operation might completed sucessfully
> +     * by the QMP consumer
> +     */
> +    netlink_route_update(ifname, gid, RTM_NEWADDR);
> +
> +    return 0;
>   }
>   
>   int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
> @@ -1149,6 +1215,12 @@ int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
>                                               gid->global.subnet_prefix,
>                                               gid->global.interface_id);
>   
> +    /*
> +     * We ignore return value since operation might completed sucessfully
> +     * by the QMP consumer
> +     */
> +    netlink_route_update(ifname, gid, RTM_DELADDR);
> +
>       return 0;
>   }
>
Yuval Shaia May 26, 2019, 6:41 a.m. UTC | #4
On Fri, May 24, 2019 at 08:24:30AM +0300, Marcel Apfelbaum wrote:
> 
> Hi Yuval,
> 
> On 5/5/19 1:55 PM, Yuval Shaia wrote:
> > Any GID change in guest must be propogate to host. This is already done
> > by firing QMP event to managment system such as libvirt which in turn
> > will update the host with the relevant change.
> 
> Agreed, *any* management software can do that.
> 
> > 
> > When qemu is executed on non-qmp framework (ex from command-line) we
> > need to update the host instead.
> > Fix it by adding support to update the RoCE device's Ethernet function
> > IP list from qemu via netlink.
> 
> I am not sure this is the right approach. I don't think QEMU should actively
> change
> the host network configuration.
> As you pointed out yourself, the management software should make such
> changes.

I know about few deployments that are not using any management software,
they fires their VMs right from command-line.

Currently those deployments cannot use pvrdma.

> 
> I agree you cannot always assume the QEMU instance is managed by libvirt,
> what about adding this functionality to rdma-multiplexer? The multiplexer
> may
> register to the same QMP event.

Two reasons prevent us from doing this:
- rdmacm-mux is a specific MAD multiplexer for CM packets, lets do not add
  management function to it.
- rdmacm-mux might be redundant when MAD multiplexer will be implemented in
  kernel. So what then?

> 
> Even if you think the multiplexer is not the right way to do it, even a
> simple bash script
> disguised  as a systemd service can subscribe to the QMP event and make the
> change on the host.
> 
> What do you think?

Another contrib app? A lightweight management software??


See, i do not have an argument if qemu policy is not allowing qemu process
to do external configuration (ex network). I'm just looking from a narrow
perspective of easy deployment - people sometimes runs qemu without libvirt
(or any other management software for that matter), if they want to use
pvrdma they are forced to install libvirt just for that.

> 
> Thanks,
> Marcel
> 
> > 
> > Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> > ---
> >   configure              |  6 ++++
> >   hw/rdma/rdma_backend.c | 74 +++++++++++++++++++++++++++++++++++++++++-
> >   2 files changed, 79 insertions(+), 1 deletion(-)
> > 
> > diff --git a/configure b/configure
> > index 5b183c2e39..1f707b1a62 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3132,6 +3132,8 @@ fi
> >   cat > $TMPC <<EOF &&
> >   #include <sys/mman.h>
> > +#include <libmnl/libmnl.h>
> > +#include <linux/rtnetlink.h>
> >   int
> >   main(void)
> > @@ -3144,10 +3146,13 @@ main(void)
> >   }
> >   EOF
> > +pvrdma_libs="-lmnl"
> > +
> >   if test "$rdma" = "yes" ; then
> >       case "$pvrdma" in
> >       "")
> >           if compile_prog "" ""; then
> > +            libs_softmmu="$libs_softmmu $pvrdma_libs"
> >               pvrdma="yes"
> >           else
> >               pvrdma="no"
> > @@ -3156,6 +3161,7 @@ if test "$rdma" = "yes" ; then
> >       "yes")
> >           if ! compile_prog "" ""; then
> >               error_exit "PVRDMA is not supported since mremap is not implemented"
> > +                        " or libmnl-devel is not installed"
> >           fi
> >           pvrdma="yes"
> >           ;;
> > diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
> > index 05f6b03221..bc57b1a624 100644
> > --- a/hw/rdma/rdma_backend.c
> > +++ b/hw/rdma/rdma_backend.c
> > @@ -16,6 +16,11 @@
> >   #include "qemu/osdep.h"
> >   #include "qapi/qapi-events-rdma.h"
> > +#include "linux/if_addr.h"
> > +#include "libmnl/libmnl.h"
> > +#include "linux/rtnetlink.h"
> > +#include "net/if.h"
> > +
> >   #include <infiniband/verbs.h>
> >   #include "contrib/rdmacm-mux/rdmacm-mux.h"
> > @@ -47,6 +52,61 @@ static void dummy_comp_handler(void *ctx, struct ibv_wc *wc)
> >       rdma_error_report("No completion handler is registered");
> >   }
> > +static int netlink_route_update(const char *ifname, union ibv_gid *gid,
> > +                                __u16 type)
> > +{
> > +    char buf[MNL_SOCKET_BUFFER_SIZE];
> > +    struct nlmsghdr *nlh;
> > +    struct ifaddrmsg *ifm;
> > +    struct mnl_socket *nl;
> > +    int ret;
> > +    uint32_t ipv4;
> > +
> > +    nl = mnl_socket_open(NETLINK_ROUTE);
> > +    if (!nl) {
> > +        rdma_error_report("Fail to connect to netlink\n");
> > +        return -EIO;
> > +    }
> > +
> > +    ret = mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID);
> > +    if (ret < 0) {
> > +        rdma_error_report("Fail to bind to netlink\n");
> > +        goto out;
> > +    }
> > +
> > +    nlh = mnl_nlmsg_put_header(buf);
> > +    nlh->nlmsg_type = type;
> > +    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
> > +    nlh->nlmsg_seq = 1;
> > +
> > +    ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
> > +    ifm->ifa_index = if_nametoindex(ifname);
> > +    if (gid->global.subnet_prefix) {
> > +        ifm->ifa_family = AF_INET6;
> > +        ifm->ifa_prefixlen = 64;
> > +        ifm->ifa_flags = IFA_F_PERMANENT;
> > +        ifm->ifa_scope = RT_SCOPE_UNIVERSE;
> > +        mnl_attr_put(nlh, IFA_ADDRESS, sizeof(*gid), gid);
> > +    } else {
> > +        ifm->ifa_family = AF_INET;
> > +        ifm->ifa_prefixlen = 24;
> > +        memcpy(&ipv4, (char *)&gid->global.interface_id + 4, sizeof(ipv4));
> > +        mnl_attr_put(nlh, IFA_LOCAL, 4, &ipv4);
> > +    }
> > +
> > +    ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
> > +    if (ret < 0) {
> > +        rdma_error_report("Fail to send msg to to netlink\n");
> > +        goto out;
> > +    }
> > +
> > +    ret = 0;
> > +
> > +out:
> > +    mnl_socket_close(nl);
> > +    return ret;
> > +}
> > +
> >   static inline void complete_work(enum ibv_wc_status status, uint32_t vendor_err,
> >                                    void *ctx)
> >   {
> > @@ -1123,7 +1183,13 @@ int rdma_backend_add_gid(RdmaBackendDev *backend_dev, const char *ifname,
> >                                               gid->global.subnet_prefix,
> >                                               gid->global.interface_id);
> > -    return ret;
> > +    /*
> > +     * We ignore return value since operation might completed sucessfully
> > +     * by the QMP consumer
> > +     */
> > +    netlink_route_update(ifname, gid, RTM_NEWADDR);
> > +
> > +    return 0;
> >   }
> >   int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
> > @@ -1149,6 +1215,12 @@ int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
> >                                               gid->global.subnet_prefix,
> >                                               gid->global.interface_id);
> > +    /*
> > +     * We ignore return value since operation might completed sucessfully
> > +     * by the QMP consumer
> > +     */
> > +    netlink_route_update(ifname, gid, RTM_DELADDR);
> > +
> >       return 0;
> >   }
>
Marcel Apfelbaum May 26, 2019, 10:09 a.m. UTC | #5
On 5/26/19 9:41 AM, Yuval Shaia wrote:
> On Fri, May 24, 2019 at 08:24:30AM +0300, Marcel Apfelbaum wrote:
>> Hi Yuval,
>>
>> On 5/5/19 1:55 PM, Yuval Shaia wrote:
>>> Any GID change in guest must be propogate to host. This is already done
>>> by firing QMP event to managment system such as libvirt which in turn
>>> will update the host with the relevant change.
>> Agreed, *any* management software can do that.
>>
>>> When qemu is executed on non-qmp framework (ex from command-line) we
>>> need to update the host instead.
>>> Fix it by adding support to update the RoCE device's Ethernet function
>>> IP list from qemu via netlink.
>> I am not sure this is the right approach. I don't think QEMU should actively
>> change
>> the host network configuration.
>> As you pointed out yourself, the management software should make such
>> changes.
> I know about few deployments that are not using any management software,
> they fires their VMs right from command-line.
>
> Currently those deployments cannot use pvrdma.
>
>> I agree you cannot always assume the QEMU instance is managed by libvirt,
>> what about adding this functionality to rdma-multiplexer? The multiplexer
>> may
>> register to the same QMP event.
> Two reasons prevent us from doing this:
> - rdmacm-mux is a specific MAD multiplexer for CM packets, lets do not add
>    management function to it.
> - rdmacm-mux might be redundant when MAD multiplexer will be implemented in
>    kernel. So what then?
>
>> Even if you think the multiplexer is not the right way to do it, even a
>> simple bash script
>> disguised  as a systemd service can subscribe to the QMP event and make the
>> change on the host.
>>
>> What do you think?
> Another contrib app? A lightweight management software??

I would not call a simple bash script "management software".
You can even add a how-to in the pvrdma.txt doc
or add a script to scripts/qmp (I saw a fuse script there, maybe is ok 
to add another for pvrdma)
If I understand correctly we just need to listen to a qmp event
and issue a bash command, right?

>
> See, i do not have an argument if qemu policy is not allowing qemu process
> to do external configuration (ex network). I'm just looking from a narrow
> perspective of easy deployment - people sometimes runs qemu without libvirt
> (or any other management software for that matter), if they want to use
> pvrdma they are forced to install libvirt just for that.


Adding Markus ad Paolo for their take on this. Would be OK if QEMU
would change the networking configuration of the host?


Thanks,
Marcel

>> Thanks,
>> Marcel
>>
>>> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
>>> ---
>>>    configure              |  6 ++++
>>>    hw/rdma/rdma_backend.c | 74 +++++++++++++++++++++++++++++++++++++++++-
>>>    2 files changed, 79 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/configure b/configure
>>> index 5b183c2e39..1f707b1a62 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -3132,6 +3132,8 @@ fi
>>>    cat > $TMPC <<EOF &&
>>>    #include <sys/mman.h>
>>> +#include <libmnl/libmnl.h>
>>> +#include <linux/rtnetlink.h>
>>>    int
>>>    main(void)
>>> @@ -3144,10 +3146,13 @@ main(void)
>>>    }
>>>    EOF
>>> +pvrdma_libs="-lmnl"
>>> +
>>>    if test "$rdma" = "yes" ; then
>>>        case "$pvrdma" in
>>>        "")
>>>            if compile_prog "" ""; then
>>> +            libs_softmmu="$libs_softmmu $pvrdma_libs"
>>>                pvrdma="yes"
>>>            else
>>>                pvrdma="no"
>>> @@ -3156,6 +3161,7 @@ if test "$rdma" = "yes" ; then
>>>        "yes")
>>>            if ! compile_prog "" ""; then
>>>                error_exit "PVRDMA is not supported since mremap is not implemented"
>>> +                        " or libmnl-devel is not installed"
>>>            fi
>>>            pvrdma="yes"
>>>            ;;
>>> diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
>>> index 05f6b03221..bc57b1a624 100644
>>> --- a/hw/rdma/rdma_backend.c
>>> +++ b/hw/rdma/rdma_backend.c
>>> @@ -16,6 +16,11 @@
>>>    #include "qemu/osdep.h"
>>>    #include "qapi/qapi-events-rdma.h"
>>> +#include "linux/if_addr.h"
>>> +#include "libmnl/libmnl.h"
>>> +#include "linux/rtnetlink.h"
>>> +#include "net/if.h"
>>> +
>>>    #include <infiniband/verbs.h>
>>>    #include "contrib/rdmacm-mux/rdmacm-mux.h"
>>> @@ -47,6 +52,61 @@ static void dummy_comp_handler(void *ctx, struct ibv_wc *wc)
>>>        rdma_error_report("No completion handler is registered");
>>>    }
>>> +static int netlink_route_update(const char *ifname, union ibv_gid *gid,
>>> +                                __u16 type)
>>> +{
>>> +    char buf[MNL_SOCKET_BUFFER_SIZE];
>>> +    struct nlmsghdr *nlh;
>>> +    struct ifaddrmsg *ifm;
>>> +    struct mnl_socket *nl;
>>> +    int ret;
>>> +    uint32_t ipv4;
>>> +
>>> +    nl = mnl_socket_open(NETLINK_ROUTE);
>>> +    if (!nl) {
>>> +        rdma_error_report("Fail to connect to netlink\n");
>>> +        return -EIO;
>>> +    }
>>> +
>>> +    ret = mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID);
>>> +    if (ret < 0) {
>>> +        rdma_error_report("Fail to bind to netlink\n");
>>> +        goto out;
>>> +    }
>>> +
>>> +    nlh = mnl_nlmsg_put_header(buf);
>>> +    nlh->nlmsg_type = type;
>>> +    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
>>> +    nlh->nlmsg_seq = 1;
>>> +
>>> +    ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
>>> +    ifm->ifa_index = if_nametoindex(ifname);
>>> +    if (gid->global.subnet_prefix) {
>>> +        ifm->ifa_family = AF_INET6;
>>> +        ifm->ifa_prefixlen = 64;
>>> +        ifm->ifa_flags = IFA_F_PERMANENT;
>>> +        ifm->ifa_scope = RT_SCOPE_UNIVERSE;
>>> +        mnl_attr_put(nlh, IFA_ADDRESS, sizeof(*gid), gid);
>>> +    } else {
>>> +        ifm->ifa_family = AF_INET;
>>> +        ifm->ifa_prefixlen = 24;
>>> +        memcpy(&ipv4, (char *)&gid->global.interface_id + 4, sizeof(ipv4));
>>> +        mnl_attr_put(nlh, IFA_LOCAL, 4, &ipv4);
>>> +    }
>>> +
>>> +    ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
>>> +    if (ret < 0) {
>>> +        rdma_error_report("Fail to send msg to to netlink\n");
>>> +        goto out;
>>> +    }
>>> +
>>> +    ret = 0;
>>> +
>>> +out:
>>> +    mnl_socket_close(nl);
>>> +    return ret;
>>> +}
>>> +
>>>    static inline void complete_work(enum ibv_wc_status status, uint32_t vendor_err,
>>>                                     void *ctx)
>>>    {
>>> @@ -1123,7 +1183,13 @@ int rdma_backend_add_gid(RdmaBackendDev *backend_dev, const char *ifname,
>>>                                                gid->global.subnet_prefix,
>>>                                                gid->global.interface_id);
>>> -    return ret;
>>> +    /*
>>> +     * We ignore return value since operation might completed sucessfully
>>> +     * by the QMP consumer
>>> +     */
>>> +    netlink_route_update(ifname, gid, RTM_NEWADDR);
>>> +
>>> +    return 0;
>>>    }
>>>    int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
>>> @@ -1149,6 +1215,12 @@ int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
>>>                                                gid->global.subnet_prefix,
>>>                                                gid->global.interface_id);
>>> +    /*
>>> +     * We ignore return value since operation might completed sucessfully
>>> +     * by the QMP consumer
>>> +     */
>>> +    netlink_route_update(ifname, gid, RTM_DELADDR);
>>> +
>>>        return 0;
>>>    }
diff mbox series

Patch

diff --git a/configure b/configure
index 5b183c2e39..1f707b1a62 100755
--- a/configure
+++ b/configure
@@ -3132,6 +3132,8 @@  fi
 
 cat > $TMPC <<EOF &&
 #include <sys/mman.h>
+#include <libmnl/libmnl.h>
+#include <linux/rtnetlink.h>
 
 int
 main(void)
@@ -3144,10 +3146,13 @@  main(void)
 }
 EOF
 
+pvrdma_libs="-lmnl"
+
 if test "$rdma" = "yes" ; then
     case "$pvrdma" in
     "")
         if compile_prog "" ""; then
+            libs_softmmu="$libs_softmmu $pvrdma_libs"
             pvrdma="yes"
         else
             pvrdma="no"
@@ -3156,6 +3161,7 @@  if test "$rdma" = "yes" ; then
     "yes")
         if ! compile_prog "" ""; then
             error_exit "PVRDMA is not supported since mremap is not implemented"
+                        " or libmnl-devel is not installed"
         fi
         pvrdma="yes"
         ;;
diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 05f6b03221..bc57b1a624 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -16,6 +16,11 @@ 
 #include "qemu/osdep.h"
 #include "qapi/qapi-events-rdma.h"
 
+#include "linux/if_addr.h"
+#include "libmnl/libmnl.h"
+#include "linux/rtnetlink.h"
+#include "net/if.h"
+
 #include <infiniband/verbs.h>
 
 #include "contrib/rdmacm-mux/rdmacm-mux.h"
@@ -47,6 +52,61 @@  static void dummy_comp_handler(void *ctx, struct ibv_wc *wc)
     rdma_error_report("No completion handler is registered");
 }
 
+static int netlink_route_update(const char *ifname, union ibv_gid *gid,
+                                __u16 type)
+{
+    char buf[MNL_SOCKET_BUFFER_SIZE];
+    struct nlmsghdr *nlh;
+    struct ifaddrmsg *ifm;
+    struct mnl_socket *nl;
+    int ret;
+    uint32_t ipv4;
+
+    nl = mnl_socket_open(NETLINK_ROUTE);
+    if (!nl) {
+        rdma_error_report("Fail to connect to netlink\n");
+        return -EIO;
+    }
+
+    ret = mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID);
+    if (ret < 0) {
+        rdma_error_report("Fail to bind to netlink\n");
+        goto out;
+    }
+
+    nlh = mnl_nlmsg_put_header(buf);
+    nlh->nlmsg_type = type;
+    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
+    nlh->nlmsg_seq = 1;
+
+    ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
+    ifm->ifa_index = if_nametoindex(ifname);
+    if (gid->global.subnet_prefix) {
+        ifm->ifa_family = AF_INET6;
+        ifm->ifa_prefixlen = 64;
+        ifm->ifa_flags = IFA_F_PERMANENT;
+        ifm->ifa_scope = RT_SCOPE_UNIVERSE;
+        mnl_attr_put(nlh, IFA_ADDRESS, sizeof(*gid), gid);
+    } else {
+        ifm->ifa_family = AF_INET;
+        ifm->ifa_prefixlen = 24;
+        memcpy(&ipv4, (char *)&gid->global.interface_id + 4, sizeof(ipv4));
+        mnl_attr_put(nlh, IFA_LOCAL, 4, &ipv4);
+    }
+
+    ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
+    if (ret < 0) {
+        rdma_error_report("Fail to send msg to to netlink\n");
+        goto out;
+    }
+
+    ret = 0;
+
+out:
+    mnl_socket_close(nl);
+    return ret;
+}
+
 static inline void complete_work(enum ibv_wc_status status, uint32_t vendor_err,
                                  void *ctx)
 {
@@ -1123,7 +1183,13 @@  int rdma_backend_add_gid(RdmaBackendDev *backend_dev, const char *ifname,
                                             gid->global.subnet_prefix,
                                             gid->global.interface_id);
 
-    return ret;
+    /*
+     * We ignore return value since operation might completed sucessfully
+     * by the QMP consumer
+     */
+    netlink_route_update(ifname, gid, RTM_NEWADDR);
+
+    return 0;
 }
 
 int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
@@ -1149,6 +1215,12 @@  int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
                                             gid->global.subnet_prefix,
                                             gid->global.interface_id);
 
+    /*
+     * We ignore return value since operation might completed sucessfully
+     * by the QMP consumer
+     */
+    netlink_route_update(ifname, gid, RTM_DELADDR);
+
     return 0;
 }