diff mbox series

migration/rdma: Fix cm_event used before being initialized

Message ID 20210513113750.22970-1-lizhijian@cn.fujitsu.com (mailing list archive)
State New, archived
Series migration/rdma: Fix cm_event used before being initialized

Commit Message

Li Zhijian May 13, 2021, 11:37 a.m. UTC
A segmentation fault was triggered when I tried to abort a postcopy + rdma
migration.

In this case rdma_get_cm_event() fails, and rdma_ack_cm_event() then releases
an uninitialized cm_event, as shown below:
    ret = rdma_get_cm_event(rdma->channel, &cm_event);
    if (ret) {
        perror("rdma_get_cm_event after rdma_connect");
        ERROR(errp, "connecting to destination!");
        rdma_ack_cm_event(cm_event); <<<< cause segmentation fault
        goto err_rdma_source_connect;
    }

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Philippe Mathieu-Daudé May 13, 2021, 4:13 p.m. UTC | #1
On 5/13/21 1:37 PM, Li Zhijian wrote:
> A segmentation fault was triggered when I tried to abort a postcopy + rdma
> migration.
> 
> In this case rdma_get_cm_event() fails, and rdma_ack_cm_event() then releases
> an uninitialized cm_event, as shown below:
> 2496     ret = rdma_get_cm_event(rdma->channel, &cm_event);
> 2497     if (ret) {
> 2498         perror("rdma_get_cm_event after rdma_connect");
> 2499         ERROR(errp, "connecting to destination!");
> 2500         rdma_ack_cm_event(cm_event); <<<< cause segmentation fault
> 2501         goto err_rdma_source_connect;
> 2502     }
> 
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  migration/rdma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 00eac34232..2dadb62aed 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp)
>                                            .private_data = &cap,
>                                            .private_data_len = sizeof(cap),
>                                          };
> -    struct rdma_cm_event *cm_event;
> +    struct rdma_cm_event *cm_event = NULL;
>      int ret;
>  
>      /*
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Philippe Mathieu-Daudé May 13, 2021, 4:45 p.m. UTC | #2
On 5/13/21 6:13 PM, Philippe Mathieu-Daudé wrote:
> On 5/13/21 1:37 PM, Li Zhijian wrote:
>> A segmentation fault was triggered when I tried to abort a postcopy + rdma
>> migration.
>>
>> In this case rdma_get_cm_event() fails, and rdma_ack_cm_event() then releases
>> an uninitialized cm_event, as shown below:
>> 2496     ret = rdma_get_cm_event(rdma->channel, &cm_event);
>> 2497     if (ret) {
>> 2498         perror("rdma_get_cm_event after rdma_connect");
>> 2499         ERROR(errp, "connecting to destination!");
>> 2500         rdma_ack_cm_event(cm_event); <<<< cause segmentation fault
>> 2501         goto err_rdma_source_connect;
>> 2502     }
>>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>  migration/rdma.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 00eac34232..2dadb62aed 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp)
>>                                            .private_data = &cap,
>>                                            .private_data_len = sizeof(cap),
>>                                          };
>> -    struct rdma_cm_event *cm_event;
>> +    struct rdma_cm_event *cm_event = NULL;
>>      int ret;
>>  
>>      /*
>>
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> 

Cc: qemu-stable@nongnu.org
Dr. David Alan Gilbert May 13, 2021, 5:15 p.m. UTC | #3
* Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
> A segmentation fault was triggered when I tried to abort a postcopy + rdma
> migration.
> 
> In this case rdma_get_cm_event() fails, and rdma_ack_cm_event() then releases
> an uninitialized cm_event, as shown below:
> 2496     ret = rdma_get_cm_event(rdma->channel, &cm_event);
> 2497     if (ret) {
> 2498         perror("rdma_get_cm_event after rdma_connect");
> 2499         ERROR(errp, "connecting to destination!");
> 2500         rdma_ack_cm_event(cm_event); <<<< cause segmentation fault
> 2501         goto err_rdma_source_connect;
> 2502     }
> 
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

OK, that's an easy fix then; but I wonder if we should perhaps remove
that rdma_ack_cm_event, if it's the get_cm_event that's failed?

Still,


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/rdma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 00eac34232..2dadb62aed 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp)
>                                            .private_data = &cap,
>                                            .private_data_len = sizeof(cap),
>                                          };
> -    struct rdma_cm_event *cm_event;
> +    struct rdma_cm_event *cm_event = NULL;
>      int ret;
>  
>      /*
> -- 
> 2.30.2
> 
> 
>
Zhijian Li (Fujitsu) May 14, 2021, 1:29 a.m. UTC | #4
On 14/05/2021 01.15, Dr. David Alan Gilbert wrote:
> * Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
>> A segmentation fault was triggered when I tried to abort a postcopy + rdma
>> migration.
>>
>> In this case rdma_get_cm_event() fails, and rdma_ack_cm_event() then releases
>> an uninitialized cm_event, as shown below:
>> 2496     ret = rdma_get_cm_event(rdma->channel, &cm_event);
>> 2497     if (ret) {
>> 2498         perror("rdma_get_cm_event after rdma_connect");
>> 2499         ERROR(errp, "connecting to destination!");
>> 2500         rdma_ack_cm_event(cm_event); <<<< cause segmentation fault
>> 2501         goto err_rdma_source_connect;
>> 2502     }
>>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> OK, that's an easy fix then; but I wonder if we should perhaps remove
> that rdma_ack_cm_event, if it's the get_cm_event that's failed?

I wondered about that too. I checked the man page rdma_get_cm_event(3), which
does not document this case, and I checked some rdma examples: some of them
try to ack it [1], but some do not [2].

[1]: https://github.com/linux-rdma/rdma-core/blob/e381334c2915a5290565694947790d4aebaf2222/librdmacm/examples/mckey.c#L451
[2]: https://github.com/linux-rdma/rdma-core/blob/e381334c2915a5290565694947790d4aebaf2222/librdmacm/examples/mckey.c#L342

Thanks

>
> Still,
>
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
>> ---
>>   migration/rdma.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 00eac34232..2dadb62aed 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp)
>>                                             .private_data = &cap,
>>                                             .private_data_len = sizeof(cap),
>>                                           };
>> -    struct rdma_cm_event *cm_event;
>> +    struct rdma_cm_event *cm_event = NULL;
>>       int ret;
>>   
>>       /*
>> -- 
>> 2.30.2
>>
>>
>>
Dr. David Alan Gilbert May 17, 2021, 10 a.m. UTC | #5
* lizhijian@fujitsu.com (lizhijian@fujitsu.com) wrote:
> 
> 
> On 14/05/2021 01.15, Dr. David Alan Gilbert wrote:
> > * Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
> >> A segmentation fault was triggered when I tried to abort a postcopy + rdma
> >> migration.
> >>
> >> In this case rdma_get_cm_event() fails, and rdma_ack_cm_event() then releases
> >> an uninitialized cm_event, as shown below:
> >> 2496     ret = rdma_get_cm_event(rdma->channel, &cm_event);
> >> 2497     if (ret) {
> >> 2498         perror("rdma_get_cm_event after rdma_connect");
> >> 2499         ERROR(errp, "connecting to destination!");
> >> 2500         rdma_ack_cm_event(cm_event); <<<< cause segmentation fault
> >> 2501         goto err_rdma_source_connect;
> >> 2502     }
> >>
> >> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> > OK, that's an easy fix then; but I wonder if we should perhaps remove
> > that rdma_ack_cm_event, if it's the get_cm_event that's failed?
> 
> I wondered about that too. I checked the man page rdma_get_cm_event(3), which
> does not document this case, and I checked some rdma examples: some of them
> try to ack it [1], but some do not [2].

I think they're actually consistent:

> [1]: https://github.com/linux-rdma/rdma-core/blob/e381334c2915a5290565694947790d4aebaf2222/librdmacm/examples/mckey.c#L451

                ret = rdma_get_cm_event(test.channel, &event);
                if (!ret) {
                        ret = cma_handler(event->id, event);
                        rdma_ack_cm_event(event);
                }
Note it's '!ret' - so it's only doing the ack if the get_cm_event
succeeded.

> [2]: https://github.com/linux-rdma/rdma-core/blob/e381334c2915a5290565694947790d4aebaf2222/librdmacm/examples/mckey.c#L342

                ret = rdma_get_cm_event(test.channel, &event);
                if (ret) {
                        perror("rdma_get_cm_event");
                        break;
                }

That exits the loop (and skips the ack) in the 'ret' case, i.e. only on
error; note there is no '!' here.

Dave
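
To make the point above concrete, here is a minimal sketch showing that the two
loop shapes from mckey.c acknowledge exactly the same events. It does not link
against librdmacm: `fake_event`, `fake_get_event()` and `fake_ack_event()` are
hypothetical stand-ins, and the only thing they model is the assumed contract
that the output pointer is written on success and left untouched on failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rdma_cm_event. */
struct fake_event { int id; };

static struct fake_event pool[8];  /* callers must request at most 8 events */
static int remaining;              /* events left before the source fails */
static int acked;                  /* events acknowledged so far */

static void fake_reset(int n_events) { remaining = n_events; acked = 0; }

/* Returns 0 and sets *ev on success; returns -1 and leaves *ev untouched. */
static int fake_get_event(struct fake_event **ev)
{
    if (remaining <= 0) {
        return -1;
    }
    *ev = &pool[--remaining];
    return 0;
}

static void fake_ack_event(struct fake_event *ev)
{
    assert(ev != NULL);  /* only a successfully fetched event may be acked */
    acked++;
}

/* Shape [1]: ack inside "if (!ret)", so only after a successful get. */
static int drain_if_success(int n_events)
{
    struct fake_event *ev = NULL;
    int ret = 0;

    fake_reset(n_events);
    while (!ret) {
        ret = fake_get_event(&ev);
        if (!ret) {
            fake_ack_event(ev);
        }
    }
    return acked;
}

/* Shape [2]: leave the loop on error, so the ack below is skipped. */
static int drain_break_on_error(int n_events)
{
    struct fake_event *ev = NULL;
    int ret;

    fake_reset(n_events);
    for (;;) {
        ret = fake_get_event(&ev);
        if (ret) {
            break;
        }
        fake_ack_event(ev);
    }
    return acked;
}
```

Calling either function with the same event count returns the same number of
acks, and neither ever passes a pointer from a failed get to the ack stub.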


> Thanks
> 
> >
> > Still,
> >
> >
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >
> >> ---
> >>   migration/rdma.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/migration/rdma.c b/migration/rdma.c
> >> index 00eac34232..2dadb62aed 100644
> >> --- a/migration/rdma.c
> >> +++ b/migration/rdma.c
> >> @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp)
> >>                                             .private_data = &cap,
> >>                                             .private_data_len = sizeof(cap),
> >>                                           };
> >> -    struct rdma_cm_event *cm_event;
> >> +    struct rdma_cm_event *cm_event = NULL;
> >>       int ret;
> >>   
> >>       /*
> >> -- 
> >> 2.30.2
> >>
> >>
> >>
Zhijian Li (Fujitsu) May 19, 2021, 6:40 a.m. UTC | #6
On 17/05/2021 18.00, Dr. David Alan Gilbert wrote:
> * lizhijian@fujitsu.com (lizhijian@fujitsu.com) wrote:
>>
>> On 14/05/2021 01.15, Dr. David Alan Gilbert wrote:
>>> * Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
>>>> A segmentation fault was triggered when I tried to abort a postcopy + rdma
>>>> migration.
>>>>
>>>> In this case rdma_get_cm_event() fails, and rdma_ack_cm_event() then releases
>>>> an uninitialized cm_event, as shown below:
>>>> 2496     ret = rdma_get_cm_event(rdma->channel, &cm_event);
>>>> 2497     if (ret) {
>>>> 2498         perror("rdma_get_cm_event after rdma_connect");
>>>> 2499         ERROR(errp, "connecting to destination!");
>>>> 2500         rdma_ack_cm_event(cm_event); <<<< cause segmentation fault
>>>> 2501         goto err_rdma_source_connect;
>>>> 2502     }
>>>>
>>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>> OK, that's an easy fix then; but I wonder if we should perhaps remove
>>> that rdma_ack_cm_event, if it's the get_cm_event that's failed?
>> I wondered about that too. I checked the man page rdma_get_cm_event(3), which
>> does not document this case, and I checked some rdma examples: some of them
>> try to ack it [1], but some do not [2].
> I think they're actually consistent:
You are right.
I also checked the rdma_get_cm_event() code: the event pointer is indeed
written only if rdma_get_cm_event() returns 0.
So I agree to remove rdma_ack_cm_event(cm_event) from the error path. I will
update the patch soon.

Thanks
Zhijian
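
For readers following the thread, the error path agreed on above could look
roughly like the sketch below. It is not the updated patch: `connect_sketch()`,
`fake_get_event()` and `fake_ack_event()` are hypothetical stand-ins that only
model the assumption that the event pointer is written on success alone, so the
failure branch has nothing to acknowledge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rdma_cm_event. */
struct fake_event { int status; };

static int ack_calls;  /* how many times the ack stub has run */

/* Models the assumed contract: *ev is written only when 0 is returned. */
static int fake_get_event(int should_fail, struct fake_event **ev)
{
    static struct fake_event established = { 0 };

    if (should_fail) {
        return -1;
    }
    *ev = &established;
    return 0;
}

static void fake_ack_event(struct fake_event *ev)
{
    assert(ev != NULL);
    ack_calls++;
}

/* Sketch of a connect routine whose error path no longer acks. */
static int connect_sketch(int should_fail)
{
    struct fake_event *ev;  /* never read on the failure path */
    int ret;

    ret = fake_get_event(should_fail, &ev);
    if (ret) {
        /* Nothing was fetched, so there is nothing to release. */
        goto err_connect;
    }

    fake_ack_event(ev);     /* success: release the fetched event */
    return 0;

err_connect:
    /* Teardown of connection state would go here. */
    return -1;
}
```

With the ack gone from the failure branch, the pointer is never read before
being set, so the sketch does not depend on the `= NULL` initializer.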

Patch

diff --git a/migration/rdma.c b/migration/rdma.c
index 00eac34232..2dadb62aed 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2466,7 +2466,7 @@  static int qemu_rdma_connect(RDMAContext *rdma, Error **errp)
                                           .private_data = &cap,
                                           .private_data_len = sizeof(cap),
                                         };
-    struct rdma_cm_event *cm_event;
+    struct rdma_cm_event *cm_event = NULL;
     int ret;
 
     /*