Message ID | 20180807091209.13531-9-xiaoguangrong@tencent.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | migration: compression optimization | expand |
On Tue, Aug 07, 2018 at 05:12:07PM +0800, guangrong.xiao@gmail.com wrote: > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > ram_find_and_save_block() can return negative if any error hanppens, > however, it is completely ignored in current code Could you hint me where we'll return an error? (Anyway I agree that the error handling is not that good, mostly because the QEMUFile APIs does not provide proper return code, e.g., qemu_put_be64 returns void) > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> > --- > migration/ram.c | 18 +++++++++++++++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/migration/ram.c b/migration/ram.c > index 55966bc2c1..09be01dca2 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -2367,7 +2367,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss, > * > * Called within an RCU critical section. > * > - * Returns the number of pages written where zero means no dirty pages > + * Returns the number of pages written where zero means no dirty pages, > + * or negative on error > * > * @rs: current RAM state > * @last_stage: if we are at the completion stage > @@ -3202,6 +3203,12 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) > done = 1; > break; > } > + > + if (pages < 0) { > + qemu_file_set_error(f, pages); > + break; > + } > + > rs->iterations++; > > /* we want to check in the 1st loop, just in case it was the 1st time > @@ -3243,7 +3250,7 @@ out: > /** > * ram_save_complete: function called to send the remaining amount of ram > * > - * Returns zero to indicate success > + * Returns zero to indicate success or negative on error > * > * Called with iothread lock > * > @@ -3254,6 +3261,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) > { > RAMState **temp = opaque; > RAMState *rs = *temp; > + int ret = 0; > > rcu_read_lock(); > > @@ -3274,6 +3282,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque) > if (pages == 0) { > break; > } > + if (pages < 0) { > + ret = pages; > + break; > + } > } > > flush_compressed_data(rs); > @@ -3285,7 +3297,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) > qemu_put_be64(f, RAM_SAVE_FLAG_EOS); > qemu_fflush(f); > > - return 0; > + return ret; > } > > static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, > -- > 2.14.4 > Regards,
On 08/08/2018 01:08 PM, Peter Xu wrote: > On Tue, Aug 07, 2018 at 05:12:07PM +0800, guangrong.xiao@gmail.com wrote: >> From: Xiao Guangrong <xiaoguangrong@tencent.com> >> >> ram_find_and_save_block() can return negative if any error hanppens, >> however, it is completely ignored in current code > > Could you hint me where we'll return an error? > I think control_save_page() may return a error condition but i am not good at it ... Other places look safe _currently_. These functions were designed to have error returned anyway. > (Anyway I agree that the error handling is not that good, mostly > because the QEMUFile APIs does not provide proper return code, e.g., > qemu_put_be64 returns void) > Yes, it is, the returned error condition is mixed in file's API and function's return value... :(
On Wed, Aug 08, 2018 at 02:29:52PM +0800, Xiao Guangrong wrote: > > > On 08/08/2018 01:08 PM, Peter Xu wrote: > > On Tue, Aug 07, 2018 at 05:12:07PM +0800, guangrong.xiao@gmail.com wrote: > > > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > > > > > ram_find_and_save_block() can return negative if any error hanppens, > > > however, it is completely ignored in current code > > > > Could you hint me where we'll return an error? > > > > I think control_save_page() may return a error condition but i am not > good at it ... Other places look safe _currently_. These functions were > designed to have error returned anyway. Ah, the RDMA codes... Then I feel like this patch would be more suitable to be put into some of the RDMA series - at least we'd better be clear about what errors we're going to capture. For non-RDMA, it seems a bit helpless after all - AFAIU we're depending on the few qemu_file_get_error() calls to detect output errors. > > > (Anyway I agree that the error handling is not that good, mostly > > because the QEMUFile APIs does not provide proper return code, e.g., > > qemu_put_be64 returns void) > > > > Yes, it is, the returned error condition is mixed in file's API and > function's return value... :( > Regards,
On 08/08/2018 02:56 PM, Peter Xu wrote: > On Wed, Aug 08, 2018 at 02:29:52PM +0800, Xiao Guangrong wrote: >> >> >> On 08/08/2018 01:08 PM, Peter Xu wrote: >>> On Tue, Aug 07, 2018 at 05:12:07PM +0800, guangrong.xiao@gmail.com wrote: >>>> From: Xiao Guangrong <xiaoguangrong@tencent.com> >>>> >>>> ram_find_and_save_block() can return negative if any error hanppens, >>>> however, it is completely ignored in current code >>> >>> Could you hint me where we'll return an error? >>> >> >> I think control_save_page() may return a error condition but i am not >> good at it ... Other places look safe _currently_. These functions were >> designed to have error returned anyway. > > Ah, the RDMA codes... > > Then I feel like this patch would be more suitable to be put into some > of the RDMA series - at least we'd better be clear about what errors > we're going to capture. For non-RDMA, it seems a bit helpless after > all - AFAIU we're depending on the few qemu_file_get_error() calls to > detect output errors. So, are you talking about to modify the semantic of these functions, ram_save_host_page(), ram_save_target_page(), etc, and make them be: "Returns the number of pages written where zero means no dirty pages, error conditions are indicated in the QEMUFile @rs->file if it happened." If it's what you want, i will update the comments and make the implementation more clear to reflect this fact for these functions
On Wed, Aug 08, 2018 at 03:23:22PM +0800, Xiao Guangrong wrote: > > > On 08/08/2018 02:56 PM, Peter Xu wrote: > > On Wed, Aug 08, 2018 at 02:29:52PM +0800, Xiao Guangrong wrote: > > > > > > > > > On 08/08/2018 01:08 PM, Peter Xu wrote: > > > > On Tue, Aug 07, 2018 at 05:12:07PM +0800, guangrong.xiao@gmail.com wrote: > > > > > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > > > > > > > > > ram_find_and_save_block() can return negative if any error hanppens, > > > > > however, it is completely ignored in current code > > > > > > > > Could you hint me where we'll return an error? > > > > > > > > > > I think control_save_page() may return a error condition but i am not > > > good at it ... Other places look safe _currently_. These functions were > > > designed to have error returned anyway. > > > > Ah, the RDMA codes... > > > > Then I feel like this patch would be more suitable to be put into some > > of the RDMA series - at least we'd better be clear about what errors > > we're going to capture. For non-RDMA, it seems a bit helpless after > > all - AFAIU we're depending on the few qemu_file_get_error() calls to > > detect output errors. > > So, are you talking about to modify the semantic of these functions, > ram_save_host_page(), ram_save_target_page(), etc, and make them > be: > "Returns the number of pages written where zero means no dirty pages, > error conditions are indicated in the QEMUFile @rs->file if it > happened." > > If it's what you want, i will update the comments and make the > implementation more clear to reflect this fact for these > functions Not really; I am just unclear about how this patch could help current code, however I have no objection on the content. Let's see whether Dave or Juan would like it. Thanks,
* Xiao Guangrong (guangrong.xiao@gmail.com) wrote: > > > On 08/08/2018 01:08 PM, Peter Xu wrote: > > On Tue, Aug 07, 2018 at 05:12:07PM +0800, guangrong.xiao@gmail.com wrote: > > > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > > > > > ram_find_and_save_block() can return negative if any error hanppens, > > > however, it is completely ignored in current code > > > > Could you hint me where we'll return an error? > > > > I think control_save_page() may return a error condition but i am not > good at it ... Other places look safe _currently_. These functions were > designed to have error returned anyway. ram_control_save_page's return is checked by control_save_page which returns true/false but sets *pages to a return value. What I'd need to follow closely is the case where ram_control_save_page returns RAM_SAVE_CONTROL_DELAYED, in that case control_save_page I think returns with *pages=-1 and returns true. And I think in that case ram_save_target_page can leak that -1 - hmm. Now, ram_save_host_page already checks for <0 and will return that, but I think that would potentially loop in ram_find_and_save_block; I'm not sure we want to change that or not! Dave > > > (Anyway I agree that the error handling is not that good, mostly > > because the QEMUFile APIs does not provide proper return code, e.g., > > qemu_put_be64 returns void) > > > > Yes, it is, the returned error condition is mixed in file's API and > function's return value... :( > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 08/08/2018 10:11 PM, Dr. David Alan Gilbert wrote: > * Xiao Guangrong (guangrong.xiao@gmail.com) wrote: >> >> >> On 08/08/2018 01:08 PM, Peter Xu wrote: >>> On Tue, Aug 07, 2018 at 05:12:07PM +0800, guangrong.xiao@gmail.com wrote: >>>> From: Xiao Guangrong <xiaoguangrong@tencent.com> >>>> >>>> ram_find_and_save_block() can return negative if any error hanppens, >>>> however, it is completely ignored in current code >>> >>> Could you hint me where we'll return an error? >>> >> >> I think control_save_page() may return a error condition but i am not >> good at it ... Other places look safe _currently_. These functions were >> designed to have error returned anyway. > > ram_control_save_page's return is checked by control_save_page which > returns true/false but sets *pages to a return value. > > What I'd need to follow closely is the case where ram_control_save_page > returns RAM_SAVE_CONTROL_DELAYED, in that case control_save_page I think > returns with *pages=-1 and returns true. > And I think in that case ram_save_target_page can leak that -1 - hmm. > > Now, ram_save_host_page already checks for <0 and will return that, > but I think that would potentially loop in ram_find_and_save_block; I'm > not sure we want to change that or not! ram_find_and_save_block() will continue the look only if ram_save_host_page returns zero: ...... if (found) { pages = ram_save_host_page(rs, &pss, last_stage); } } while (!pages && again); IMHO, how to change the code really depends on the semantic of these functions, based on the comments of them, returning error conditions is the current semantic. Another choice would be the one squashes error conditions to QEMUFile and adapt comments and code of these functions to reflect the new semantic clearly. Which one do you guys prefer to? :)
diff --git a/migration/ram.c b/migration/ram.c index 55966bc2c1..09be01dca2 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2367,7 +2367,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss, * * Called within an RCU critical section. * - * Returns the number of pages written where zero means no dirty pages + * Returns the number of pages written where zero means no dirty pages, + * or negative on error * * @rs: current RAM state * @last_stage: if we are at the completion stage @@ -3202,6 +3203,12 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) done = 1; break; } + + if (pages < 0) { + qemu_file_set_error(f, pages); + break; + } + rs->iterations++; /* we want to check in the 1st loop, just in case it was the 1st time @@ -3243,7 +3250,7 @@ out: /** * ram_save_complete: function called to send the remaining amount of ram * - * Returns zero to indicate success + * Returns zero to indicate success or negative on error * * Called with iothread lock * @@ -3254,6 +3261,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) { RAMState **temp = opaque; RAMState *rs = *temp; + int ret = 0; rcu_read_lock(); @@ -3274,6 +3282,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque) if (pages == 0) { break; } + if (pages < 0) { + ret = pages; + break; + } } flush_compressed_data(rs); @@ -3285,7 +3297,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) qemu_put_be64(f, RAM_SAVE_FLAG_EOS); qemu_fflush(f); - return 0; + return ret; } static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,