Message ID | 20210513113750.22970-1-lizhijian@cn.fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | migration/rdma: Fix cm_event used before being initialized | expand |
On 5/13/21 1:37 PM, Li Zhijian wrote: > A segmentation fault was triggered when i try to abort a postcopy + rdma > migration. > > since rdma_ack_cm_event releases a uninitialized cm_event in thise case. > > like below: > 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); > 2497 if (ret) { > 2498 perror("rdma_get_cm_event after rdma_connect"); > 2499 ERROR(errp, "connecting to destination!"); > 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault > 2501 goto err_rdma_source_connect; > 2502 } > > Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> > --- > migration/rdma.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index 00eac34232..2dadb62aed 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp) > .private_data = &cap, > .private_data_len = sizeof(cap), > }; > - struct rdma_cm_event *cm_event; > + struct rdma_cm_event *cm_event = NULL; > int ret; > > /* > Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
On 5/13/21 6:13 PM, Philippe Mathieu-Daudé wrote: > On 5/13/21 1:37 PM, Li Zhijian wrote: >> A segmentation fault was triggered when i try to abort a postcopy + rdma >> migration. >> >> since rdma_ack_cm_event releases a uninitialized cm_event in thise case. >> >> like below: >> 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); >> 2497 if (ret) { >> 2498 perror("rdma_get_cm_event after rdma_connect"); >> 2499 ERROR(errp, "connecting to destination!"); >> 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault >> 2501 goto err_rdma_source_connect; >> 2502 } >> >> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> >> --- >> migration/rdma.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/migration/rdma.c b/migration/rdma.c >> index 00eac34232..2dadb62aed 100644 >> --- a/migration/rdma.c >> +++ b/migration/rdma.c >> @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp) >> .private_data = &cap, >> .private_data_len = sizeof(cap), >> }; >> - struct rdma_cm_event *cm_event; >> + struct rdma_cm_event *cm_event = NULL; >> int ret; >> >> /* >> > > Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> > Cc: qemu-stable@nongnu.org
* Li Zhijian (lizhijian@cn.fujitsu.com) wrote: > A segmentation fault was triggered when i try to abort a postcopy + rdma > migration. > > since rdma_ack_cm_event releases a uninitialized cm_event in thise case. > > like below: > 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); > 2497 if (ret) { > 2498 perror("rdma_get_cm_event after rdma_connect"); > 2499 ERROR(errp, "connecting to destination!"); > 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault > 2501 goto err_rdma_source_connect; > 2502 } > > Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> OK, that's an easy fix then; but I wonder if we should perhaps remove that rdma_ack_cm_event, if it's the get_cm_event that's failed? Still, Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > migration/rdma.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index 00eac34232..2dadb62aed 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp) > .private_data = &cap, > .private_data_len = sizeof(cap), > }; > - struct rdma_cm_event *cm_event; > + struct rdma_cm_event *cm_event = NULL; > int ret; > > /* > -- > 2.30.2 > > >
On 14/05/2021 01.15, Dr. David Alan Gilbert wrote: > * Li Zhijian (lizhijian@cn.fujitsu.com) wrote: >> A segmentation fault was triggered when i try to abort a postcopy + rdma >> migration. >> >> since rdma_ack_cm_event releases a uninitialized cm_event in thise case. >> >> like below: >> 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); >> 2497 if (ret) { >> 2498 perror("rdma_get_cm_event after rdma_connect"); >> 2499 ERROR(errp, "connecting to destination!"); >> 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault >> 2501 goto err_rdma_source_connect; >> 2502 } >> >> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> > OK, that's an easy fix then; but I wonder if we should perhaps remove > that rdma_ack_cm_event, if it's the get_cm_event that's failed? I also wondered, i checked the man page get_cm_event(3) which has not documented and checked some rdma examples, some of them try to ack it[1], but some not[2]. [1]: https://github.com/linux-rdma/rdma-core/blob/e381334c2915a5290565694947790d4aebaf2222/librdmacm/examples/mckey.c#L451 [2]: https://github.com/linux-rdma/rdma-core/blob/e381334c2915a5290565694947790d4aebaf2222/librdmacm/examples/mckey.c#L342 Thanks > > Still, > > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > >> --- >> migration/rdma.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/migration/rdma.c b/migration/rdma.c >> index 00eac34232..2dadb62aed 100644 >> --- a/migration/rdma.c >> +++ b/migration/rdma.c >> @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp) >> .private_data = &cap, >> .private_data_len = sizeof(cap), >> }; >> - struct rdma_cm_event *cm_event; >> + struct rdma_cm_event *cm_event = NULL; >> int ret; >> >> /* >> -- >> 2.30.2 >> >> >>
* lizhijian@fujitsu.com (lizhijian@fujitsu.com) wrote: > > > On 14/05/2021 01.15, Dr. David Alan Gilbert wrote: > > * Li Zhijian (lizhijian@cn.fujitsu.com) wrote: > >> A segmentation fault was triggered when i try to abort a postcopy + rdma > >> migration. > >> > >> since rdma_ack_cm_event releases a uninitialized cm_event in thise case. > >> > >> like below: > >> 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); > >> 2497 if (ret) { > >> 2498 perror("rdma_get_cm_event after rdma_connect"); > >> 2499 ERROR(errp, "connecting to destination!"); > >> 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault > >> 2501 goto err_rdma_source_connect; > >> 2502 } > >> > >> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> > > OK, that's an easy fix then; but I wonder if we should perhaps remove > > that rdma_ack_cm_event, if it's the get_cm_event that's failed? > > I also wondered, i checked the man page get_cm_event(3) which has not documented > > and checked some rdma examples, some of them try to ack it[1], but some not[2]. I think they're actually consistent: > [1]: https://github.com/linux-rdma/rdma-core/blob/e381334c2915a5290565694947790d4aebaf2222/librdmacm/examples/mckey.c#L451 ret = rdma_get_cm_event(test.channel, &event); if (!ret) { ret = cma_handler(event->id, event); rdma_ack_cm_event(event); } Note it's '!ret' - so it's only doing the ack if the get_cm_event succeeded. > [2]: https://github.com/linux-rdma/rdma-core/blob/e381334c2915a5290565694947790d4aebaf2222/librdmacm/examples/mckey.c#L342 ret = rdma_get_cm_event(test.channel, &event); if (ret) { perror("rdma_get_cm_event"); break; } that exits the loop (and skips the ack) in the (ret) - i.e. only on error - no ! Dave > Thanks > > > > > Still, > > > > > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > > >> --- > >> migration/rdma.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/migration/rdma.c b/migration/rdma.c > >> index 00eac34232..2dadb62aed 100644 > >> --- a/migration/rdma.c > >> +++ b/migration/rdma.c > >> @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp) > >> .private_data = &cap, > >> .private_data_len = sizeof(cap), > >> }; > >> - struct rdma_cm_event *cm_event; > >> + struct rdma_cm_event *cm_event = NULL; > >> int ret; > >> > >> /* > >> -- > >> 2.30.2 > >> > >> > >>
On 17/05/2021 18.00, Dr. David Alan Gilbert wrote: > * lizhijian@fujitsu.com (lizhijian@fujitsu.com) wrote: >> >> On 14/05/2021 01.15, Dr. David Alan Gilbert wrote: >>> * Li Zhijian (lizhijian@cn.fujitsu.com) wrote: >>>> A segmentation fault was triggered when i try to abort a postcopy + rdma >>>> migration. >>>> >>>> since rdma_ack_cm_event releases a uninitialized cm_event in thise case. >>>> >>>> like below: >>>> 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); >>>> 2497 if (ret) { >>>> 2498 perror("rdma_get_cm_event after rdma_connect"); >>>> 2499 ERROR(errp, "connecting to destination!"); >>>> 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault >>>> 2501 goto err_rdma_source_connect; >>>> 2502 } >>>> >>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> >>> OK, that's an easy fix then; but I wonder if we should perhaps remove >>> that rdma_ack_cm_event, if it's the get_cm_event that's failed? >> I also wondered, i checked the man page get_cm_event(3) which has not documented >> >> and checked some rdma examples, some of them try to ack it[1], but some not[2]. > I think they're actually consistent: You are right. I also checked rdma_get_cm_even() code, indeed, event will be changed only if rdma_get_cm_even() returns 0. So i agree to remove rdma_ack_cm_event(event) in error path. i will update the patch soon. Thanks Zhijian
diff --git a/migration/rdma.c b/migration/rdma.c index 00eac34232..2dadb62aed 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2466,7 +2466,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error **errp) .private_data = &cap, .private_data_len = sizeof(cap), }; - struct rdma_cm_event *cm_event; + struct rdma_cm_event *cm_event = NULL; int ret; /*
A segmentation fault was triggered when i try to abort a postcopy + rdma migration. since rdma_ack_cm_event releases a uninitialized cm_event in thise case. like below: 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); 2497 if (ret) { 2498 perror("rdma_get_cm_event after rdma_connect"); 2499 ERROR(errp, "connecting to destination!"); 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault 2501 goto err_rdma_source_connect; 2502 } Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> --- migration/rdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)