Message ID | 20180426091519.26934-1-xiaoguangrong@tencent.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Dave, On 04/26/2018 05:15 PM, guangrong.xiao@gmail.com wrote: > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > QEMU 2.13 enables strict check for compression & decompression to > make the migration more robuster, that depends on the source to fix > the internal design which triggers the unexpected error conditions > > To make it work for migrating old version QEMU to 2.13 QEMU, we > introduce this parameter to disable the error check on the > destination We noticed this issue when evaluated your pull request, could you please review this fix and pull it to 2.13 release? Sorry, i did not realize it before. :(
* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote: > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > QEMU 2.13 enables strict check for compression & decompression to > make the migration more robuster, that depends on the source to fix > the internal design which triggers the unexpected error conditions Hmm that's painful! > To make it work for migrating old version QEMU to 2.13 QEMU, we > introduce this parameter to disable the error check on the > destination > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> > --- > hmp.c | 8 ++++++++ > migration/migration.c | 19 +++++++++++++++++++ > migration/migration.h | 1 + > migration/ram.c | 2 +- > qapi/migration.json | 43 +++++++++++++++++++++++++++++++++++++++---- > 5 files changed, 68 insertions(+), 5 deletions(-) > > diff --git a/hmp.c b/hmp.c > index 898e25f3e1..f0b934368b 100644 > --- a/hmp.c > +++ b/hmp.c > @@ -329,6 +329,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict) > monitor_printf(mon, "%s: %u\n", > MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_THREADS), > params->decompress_threads); > + assert(params->has_decompress_error_check); > + monitor_printf(mon, "%s: %s\n", > + MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_ERROR_CHECK), > + params->decompress_error_check ? "on" : "off"); Since it's a bool, this should be a migration-capability rather than a parameter I think. Also, we need to make sure it defaults to off for compatibility. Other than that, I think it's OK. Dave > assert(params->has_cpu_throttle_initial); > monitor_printf(mon, "%s: %u\n", > MigrationParameter_str(MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL), > @@ -1593,6 +1597,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict) > p->has_decompress_threads = true; > visit_type_int(v, param, &p->decompress_threads, &err); > break; > + case MIGRATION_PARAMETER_DECOMPRESS_ERROR_CHECK: > + p->has_decompress_error_check = true; > + visit_type_bool(v, param, &p->decompress_error_check, &err); > + break; > case MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL: > p->has_cpu_throttle_initial = true; > visit_type_int(v, param, &p->cpu_throttle_initial, &err); > diff --git a/migration/migration.c b/migration/migration.c > index 0bdb28e144..1eef084763 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -534,6 +534,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp) > params->compress_threads = s->parameters.compress_threads; > params->has_decompress_threads = true; > params->decompress_threads = s->parameters.decompress_threads; > + params->has_decompress_error_check = true; > + params->decompress_error_check = s->parameters.decompress_error_check; > params->has_cpu_throttle_initial = true; > params->cpu_throttle_initial = s->parameters.cpu_throttle_initial; > params->has_cpu_throttle_increment = true; > @@ -917,6 +919,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params, > dest->decompress_threads = params->decompress_threads; > } > > + if (params->has_decompress_error_check) { > + dest->decompress_error_check = params->decompress_error_check; > + } > + > if (params->has_cpu_throttle_initial) { > dest->cpu_throttle_initial = params->cpu_throttle_initial; > } > @@ -979,6 +985,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) > s->parameters.decompress_threads = params->decompress_threads; > } > > + if (params->has_decompress_error_check) { > + s->parameters.decompress_error_check = params->decompress_error_check; > + } > + > if (params->has_cpu_throttle_initial) { > s->parameters.cpu_throttle_initial = params->cpu_throttle_initial; > } > @@ -1620,6 +1630,15 @@ int migrate_decompress_threads(void) > return s->parameters.decompress_threads; > } > > +bool migrate_decompress_error_check(void) > +{ > + MigrationState *s; > + > + s = migrate_get_current(); > + > + return s->parameters.decompress_error_check; > +} > + > bool migrate_dirty_bitmaps(void) > { > MigrationState *s; > diff --git a/migration/migration.h b/migration/migration.h > index 7c69598c54..5efbbaf9d8 100644 > --- a/migration/migration.h > +++ b/migration/migration.h > @@ -241,6 +241,7 @@ bool migrate_use_compression(void); > int migrate_compress_level(void); > int migrate_compress_threads(void); > int migrate_decompress_threads(void); > +bool migrate_decompress_error_check(void); > bool migrate_use_events(void); > bool migrate_postcopy_blocktime(void); > > diff --git a/migration/ram.c b/migration/ram.c > index 912810c18e..01cc815410 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -2581,7 +2581,7 @@ static void *do_data_decompress(void *opaque) > > ret = qemu_uncompress_data(¶m->stream, des, pagesize, > param->compbuf, len); > - if (ret < 0) { > + if (ret < 0 && migrate_decompress_error_check()) { > error_report("decompress data failed"); > qemu_file_set_error(decomp_file, ret); > } > diff --git a/qapi/migration.json b/qapi/migration.json > index f3974c6807..68222358e1 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -455,6 +455,17 @@ > # compression, so set the decompress-threads to the number about 1/4 > # of compress-threads is adequate. > # > +# @decompress-error-check: check decompression errors. When false, the errors > +# triggered by memory decompression are ignored. > +# When true, migration is aborted if the errors are > +# detected. For the old QEMU versions (< 2.13) the > +# internal design will cause decompression to fail > +# so the destination should completely ignore the > +# error conditions, i.e, make it be false if these > +# QEMUs are going to be migrated. Since 2.13, this > +# design is fixed, make it be true to avoid corrupting > +# the VM silently (Since 2.13) > +# > # @cpu-throttle-initial: Initial percentage of time guest cpus are throttled > # when migration auto-converge is activated. The > # default value is 20. (Since 2.7) > @@ -511,10 +522,10 @@ > ## > { 'enum': 'MigrationParameter', > 'data': ['compress-level', 'compress-threads', 'decompress-threads', > - 'cpu-throttle-initial', 'cpu-throttle-increment', > - 'tls-creds', 'tls-hostname', 'max-bandwidth', > - 'downtime-limit', 'x-checkpoint-delay', 'block-incremental', > - 'x-multifd-channels', 'x-multifd-page-count', > + 'decompress-error-check', 'cpu-throttle-initial', > + 'cpu-throttle-increment', 'tls-creds', 'tls-hostname', > + 'max-bandwidth', 'downtime-limit', 'x-checkpoint-delay', > + 'block-incremental', 'x-multifd-channels', 'x-multifd-page-count', > 'xbzrle-cache-size' ] } > > ## > @@ -526,6 +537,17 @@ > # > # @decompress-threads: decompression thread count > # > +# @decompress-error-check: check decompression errors. When false, the errors > +# triggered by memory decompression are ignored. > +# When true, migration is aborted if the errors are > +# detected. For the old QEMU versions (< 2.13) the > +# internal design will cause decompression to fail > +# so the destination should completely ignore the > +# error conditions, i.e, make it be false if these > +# QEMUs are going to be migrated. Since 2.13, this > +# design is fixed, make it be true to avoid corrupting > +# the VM silently (Since 2.13) > +# > # @cpu-throttle-initial: Initial percentage of time guest cpus are > # throttled when migration auto-converge is activated. > # The default value is 20. (Since 2.7) > @@ -591,6 +613,7 @@ > 'data': { '*compress-level': 'int', > '*compress-threads': 'int', > '*decompress-threads': 'int', > + '*decompress-error-check': 'bool', > '*cpu-throttle-initial': 'int', > '*cpu-throttle-increment': 'int', > '*tls-creds': 'StrOrNull', > @@ -630,6 +653,17 @@ > # > # @decompress-threads: decompression thread count > # > +# @decompress-error-check: check decompression errors. When false, the errors > +# triggered by memory decompression are ignored. > +# When true, migration is aborted if the errors are > +# detected. For the old QEMU versions (< 2.13) the > +# internal design will cause decompression to fail > +# so the destination should completely ignore the > +# error conditions, i.e, make it be false if these > +# QEMUs are going to be migrated. Since 2.13, this > +# design is fixed, make it be true to avoid corrupting > +# the VM silently (Since 2.13) > +# > # @cpu-throttle-initial: Initial percentage of time guest cpus are > # throttled when migration auto-converge is activated. > # (Since 2.7) > @@ -690,6 +724,7 @@ > 'data': { '*compress-level': 'uint8', > '*compress-threads': 'uint8', > '*decompress-threads': 'uint8', > + '*decompress-error-check': 'bool', > '*cpu-throttle-initial': 'uint8', > '*cpu-throttle-increment': 'uint8', > '*tls-creds': 'str', > -- > 2.14.3 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 04/26/2018 05:34 PM, Dr. David Alan Gilbert wrote: > * guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote: >> From: Xiao Guangrong <xiaoguangrong@tencent.com> >> >> QEMU 2.13 enables strict check for compression & decompression to >> make the migration more robuster, that depends on the source to fix >> the internal design which triggers the unexpected error conditions > > Hmm that's painful! Yup, i will think it more carefully next time. :( > >> To make it work for migrating old version QEMU to 2.13 QEMU, we >> introduce this parameter to disable the error check on the >> destination >> >> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> >> --- >> hmp.c | 8 ++++++++ >> migration/migration.c | 19 +++++++++++++++++++ >> migration/migration.h | 1 + >> migration/ram.c | 2 +- >> qapi/migration.json | 43 +++++++++++++++++++++++++++++++++++++++---- >> 5 files changed, 68 insertions(+), 5 deletions(-) >> >> diff --git a/hmp.c b/hmp.c >> index 898e25f3e1..f0b934368b 100644 >> --- a/hmp.c >> +++ b/hmp.c >> @@ -329,6 +329,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict) >> monitor_printf(mon, "%s: %u\n", >> MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_THREADS), >> params->decompress_threads); >> + assert(params->has_decompress_error_check); >> + monitor_printf(mon, "%s: %s\n", >> + MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_ERROR_CHECK), >> + params->decompress_error_check ? "on" : "off"); > > Since it's a bool, this should be a migration-capability rather than a > parameter I think. > The parameter, @block-incremental, it is also a bool, i think it is okay to decompress-error-check as well, and it is one of the configurations of decompression, it is better to group them. Right? :) > Also, we need to make sure it defaults to off for compatibility. Yes, the parameter is allocated by zalloc, it has already been false on default, do you think we should make it be false explicitly? > > Other than that, I think it's OK. Thank you, Dave!
On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote: > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > QEMU 2.13 enables strict check for compression & decompression to > make the migration more robuster, that depends on the source to fix s/robuster/robust/ > the internal design which triggers the unexpected error conditions 2.13 hasn't been released yet. Why do we need a knob to explicitly turn off strict checking? Can we not instead make 2.13 automatically smart enough to tell if the incoming stream is coming from an older qemu (which might fail if the strict checks are enabled) vs. a newer qemu (the sender gave us what we need to ensure the strict checks are worthwhile)? > > To make it work for migrating old version QEMU to 2.13 QEMU, we > introduce this parameter to disable the error check on the > destination > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> > --- > +++ b/qapi/migration.json > @@ -455,6 +455,17 @@ > # compression, so set the decompress-threads to the number about 1/4 > # of compress-threads is adequate. > # > +# @decompress-error-check: check decompression errors. When false, the errors > +# triggered by memory decompression are ignored. What are the consequences of such an error? Is it a security hole to leave this at false, when a malicious migration stream can cause us to misbehave by ignoring the errors? > +# When true, migration is aborted if the errors are > +# detected. For the old QEMU versions (< 2.13) the > +# internal design will cause decompression to fail > +# so the destination should completely ignore the > +# error conditions, i.e, make it be false if these > +# QEMUs are going to be migrated. Since 2.13, this > +# design is fixed, make it be true to avoid corrupting > +# the VM silently (Since 2.13) Rather wordy; I'd suggest: @decompress-error-check: Set to true to abort the migration if decompression errors are detected at the destination. Should be left at false (default) for qemu older than 2.13, since only newer qemu sends streams that do not trigger spurious decompression errors. (Since 2.13) But that's if we even need it (it SHOULD be possible to design something into the migration stream so that you can detect this property automatically instead of relying on the user to set the property).
On 04/26/2018 10:01 PM, Eric Blake wrote: > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote: >> From: Xiao Guangrong <xiaoguangrong@tencent.com> >> >> QEMU 2.13 enables strict check for compression & decompression to >> make the migration more robuster, that depends on the source to fix > > s/robuster/robust/ > Will fix, thank you for pointing it out. >> the internal design which triggers the unexpected error conditions > > 2.13 hasn't been released yet. Why do we need a knob to explicitly turn > off strict checking? Can we not instead make 2.13 automatically smart > enough to tell if the incoming stream is coming from an older qemu > (which might fail if the strict checks are enabled) vs. a newer qemu > (the sender gave us what we need to ensure the strict checks are > worthwhile)? > Really smart. How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK, the destination will do strict check if got this command (i.e, new QEMU is running on the source), otherwise, turn the check off. >> >> To make it work for migrating old version QEMU to 2.13 QEMU, we >> introduce this parameter to disable the error check on the >> destination >> >> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> >> --- > >> +++ b/qapi/migration.json >> @@ -455,6 +455,17 @@ >> # compression, so set the decompress-threads to the number about 1/4 >> # of compress-threads is adequate. >> # >> +# @decompress-error-check: check decompression errors. When false, the errors >> +# triggered by memory decompression are ignored. > > What are the consequences of such an error? Is it a security hole to > leave this at false, when a malicious migration stream can cause us to > misbehave by ignoring the errors? The issue fixed by strict error check is avoiding VM corruption if zlib failed to compress & decompress the memory, i.e, catch error conditions returned by zlib API. > >> +# When true, migration is aborted if the errors are >> +# detected. For the old QEMU versions (< 2.13) the >> +# internal design will cause decompression to fail >> +# so the destination should completely ignore the >> +# error conditions, i.e, make it be false if these >> +# QEMUs are going to be migrated. Since 2.13, this >> +# design is fixed, make it be true to avoid corrupting >> +# the VM silently (Since 2.13) > > Rather wordy; I'd suggest: > > @decompress-error-check: Set to true to abort the migration if > decompression errors are detected at the destination. Should be > left at false (default) for qemu older than 2.13, since only > newer qemu sends streams that do not trigger spurious > decompression errors. (Since 2.13) > Yup, much better. > But that's if we even need it (it SHOULD be possible to design something > into the migration stream so that you can detect this property > automatically instead of relying on the user to set the property). > Yes, can not agree with you more. :)
On Fri, Apr 27, 2018 at 11:15:37AM +0800, Xiao Guangrong wrote: > > > On 04/26/2018 10:01 PM, Eric Blake wrote: > > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote: > > > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > > > > > QEMU 2.13 enables strict check for compression & decompression to > > > make the migration more robuster, that depends on the source to fix > > > > s/robuster/robust/ > > > > Will fix, thank you for pointing it out. > > > > the internal design which triggers the unexpected error conditions > > > > 2.13 hasn't been released yet. Why do we need a knob to explicitly turn > > off strict checking? Can we not instead make 2.13 automatically smart > > enough to tell if the incoming stream is coming from an older qemu > > (which might fail if the strict checks are enabled) vs. a newer qemu > > (the sender gave us what we need to ensure the strict checks are > > worthwhile)? > > > > Really smart. > > How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK, > the destination will do strict check if got this command (i.e, new > QEMU is running on the source), otherwise, turn the check off. Why not we just introduce a compat bit for that? I mean something like: 15c3850325 ("migration: move skip_section_footers", 2017-06-28). Then we turn that check bit off for <=2.12. Would that work? (I would suspect that's what Eric mean too) Regards,
On 04/27/2018 05:31 PM, Peter Xu wrote: > On Fri, Apr 27, 2018 at 11:15:37AM +0800, Xiao Guangrong wrote: >> >> >> On 04/26/2018 10:01 PM, Eric Blake wrote: >>> On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote: >>>> From: Xiao Guangrong <xiaoguangrong@tencent.com> >>>> >>>> QEMU 2.13 enables strict check for compression & decompression to >>>> make the migration more robuster, that depends on the source to fix >>> >>> s/robuster/robust/ >>> >> >> Will fix, thank you for pointing it out. >> >>>> the internal design which triggers the unexpected error conditions >>> >>> 2.13 hasn't been released yet. Why do we need a knob to explicitly turn >>> off strict checking? Can we not instead make 2.13 automatically smart >>> enough to tell if the incoming stream is coming from an older qemu >>> (which might fail if the strict checks are enabled) vs. a newer qemu >>> (the sender gave us what we need to ensure the strict checks are >>> worthwhile)? >>> >> >> Really smart. >> >> How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK, >> the destination will do strict check if got this command (i.e, new >> QEMU is running on the source), otherwise, turn the check off. > > Why not we just introduce a compat bit for that? I mean something > like: 15c3850325 ("migration: move skip_section_footers", > 2017-06-28). Then we turn that check bit off for <=2.12. > > Would that work? I am afraid it can not. :( The compat bit only impacts local behavior, however, in this case, we need the source QEMU to tell the destination if it supports strict error check.
* Xiao Guangrong (guangrong.xiao@gmail.com) wrote: > > > On 04/26/2018 10:01 PM, Eric Blake wrote: > > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote: > > > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > > > > > QEMU 2.13 enables strict check for compression & decompression to > > > make the migration more robuster, that depends on the source to fix > > > > s/robuster/robust/ > > > > Will fix, thank you for pointing it out. > > > > the internal design which triggers the unexpected error conditions > > > > 2.13 hasn't been released yet. Why do we need a knob to explicitly turn > > off strict checking? Can we not instead make 2.13 automatically smart > > enough to tell if the incoming stream is coming from an older qemu > > (which might fail if the strict checks are enabled) vs. a newer qemu > > (the sender gave us what we need to ensure the strict checks are > > worthwhile)? > > > > Really smart. > > How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK, > the destination will do strict check if got this command (i.e, new > QEMU is running on the source), otherwise, turn the check off. The problem is that if it's received by an old qemu it will fail, so that breaks backwards migration which is unfortunate. > > > > > > To make it work for migrating old version QEMU to 2.13 QEMU, we > > > introduce this parameter to disable the error check on the > > > destination > > > > > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> > > > --- > > > > > +++ b/qapi/migration.json > > > @@ -455,6 +455,17 @@ > > > # compression, so set the decompress-threads to the number about 1/4 > > > # of compress-threads is adequate. > > > # > > > +# @decompress-error-check: check decompression errors. When false, the errors > > > +# triggered by memory decompression are ignored. > > > > What are the consequences of such an error? Is it a security hole to > > leave this at false, when a malicious migration stream can cause us to > > misbehave by ignoring the errors? > > The issue fixed by strict error check is avoiding VM corruption if zlib failed > to compress & decompress the memory, i.e, catch error conditions returned by > zlib API. > > > > > > +# When true, migration is aborted if the errors are > > > +# detected. For the old QEMU versions (< 2.13) the > > > +# internal design will cause decompression to fail > > > +# so the destination should completely ignore the > > > +# error conditions, i.e, make it be false if these > > > +# QEMUs are going to be migrated. Since 2.13, this > > > +# design is fixed, make it be true to avoid corrupting > > > +# the VM silently (Since 2.13) > > > > Rather wordy; I'd suggest: > > > > @decompress-error-check: Set to true to abort the migration if > > decompression errors are detected at the destination. Should be > > left at false (default) for qemu older than 2.13, since only > > newer qemu sends streams that do not trigger spurious > > decompression errors. (Since 2.13) > > > > Yup, much better. > > > But that's if we even need it (it SHOULD be possible to design something > > into the migration stream so that you can detect this property > > automatically instead of relying on the user to set the property). > > > > Yes, can not agree with you more. :) The challenge is how to put something into the stream without breaking an old version of QEMU that's receiving the stream. Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 04/27/2018 07:29 PM, Dr. David Alan Gilbert wrote: >> >> Yes, can not agree with you more. :) > > The challenge is how to put something into the stream without breaking > an old version of QEMU that's receiving the stream. > Er, i did not think this case :(. The new parameter as this patch did is the only idea now in my mind... not sure if Eric and Peter have another solution.
On Fri, Apr 27, 2018 at 06:40:09PM +0800, Xiao Guangrong wrote: > > > On 04/27/2018 05:31 PM, Peter Xu wrote: > > On Fri, Apr 27, 2018 at 11:15:37AM +0800, Xiao Guangrong wrote: > > > > > > > > > On 04/26/2018 10:01 PM, Eric Blake wrote: > > > > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote: > > > > > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > > > > > > > > > QEMU 2.13 enables strict check for compression & decompression to > > > > > make the migration more robuster, that depends on the source to fix > > > > > > > > s/robuster/robust/ > > > > > > > > > > Will fix, thank you for pointing it out. > > > > > > > > the internal design which triggers the unexpected error conditions > > > > > > > > 2.13 hasn't been released yet. Why do we need a knob to explicitly turn > > > > off strict checking? Can we not instead make 2.13 automatically smart > > > > enough to tell if the incoming stream is coming from an older qemu > > > > (which might fail if the strict checks are enabled) vs. a newer qemu > > > > (the sender gave us what we need to ensure the strict checks are > > > > worthwhile)? > > > > > > > > > > Really smart. > > > > > > How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK, > > > the destination will do strict check if got this command (i.e, new > > > QEMU is running on the source), otherwise, turn the check off. > > > > Why not we just introduce a compat bit for that? I mean something > > like: 15c3850325 ("migration: move skip_section_footers", > > 2017-06-28). Then we turn that check bit off for <=2.12. > > > > Would that work? > > I am afraid it can not. :( > > The compat bit only impacts local behavior, however, in this case, we > need the source QEMU to tell the destination if it supports strict > error check. My understanding is that the new compat bit will only take effect when at destination. I'm not sure I'm thinking that correctly. I'll give some examples. When we migrate from <2.12 to 2.13, on 2.13 QEMU we'll possibly with (using q35 as example, always) "-M pc-q35-2.12" to make the migration work, so this will let the destination QEMU stop checking decompressing errors. IMHO that's what we want so it's fine (forward migration). When we migrate from 2.13 to <2.12, on 2.12 it'll always skip checking decompression errors, so it's fine too even if we don't send some compress-errored pages. Then, would this mean that the compat bit could work too just like this patch? AFAIU the compat bit idea is very similar to current patch, however we don't really need a new parameter to make things complicated, we just let old QEMUs behave differently and automatically, then user won't need to worry about manually specify that parameter. Thanks,
* Peter Xu (peterx@redhat.com) wrote: > On Fri, Apr 27, 2018 at 06:40:09PM +0800, Xiao Guangrong wrote: > > > > > > On 04/27/2018 05:31 PM, Peter Xu wrote: > > > On Fri, Apr 27, 2018 at 11:15:37AM +0800, Xiao Guangrong wrote: > > > > > > > > > > > > On 04/26/2018 10:01 PM, Eric Blake wrote: > > > > > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote: > > > > > > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > > > > > > > > > > > QEMU 2.13 enables strict check for compression & decompression to > > > > > > make the migration more robuster, that depends on the source to fix > > > > > > > > > > s/robuster/robust/ > > > > > > > > > > > > > Will fix, thank you for pointing it out. > > > > > > > > > > the internal design which triggers the unexpected error conditions > > > > > > > > > > 2.13 hasn't been released yet. Why do we need a knob to explicitly turn > > > > > off strict checking? Can we not instead make 2.13 automatically smart > > > > > enough to tell if the incoming stream is coming from an older qemu > > > > > (which might fail if the strict checks are enabled) vs. a newer qemu > > > > > (the sender gave us what we need to ensure the strict checks are > > > > > worthwhile)? > > > > > > > > > > > > > Really smart. > > > > > > > > How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK, > > > > the destination will do strict check if got this command (i.e, new > > > > QEMU is running on the source), otherwise, turn the check off. > > > > > > Why not we just introduce a compat bit for that? I mean something > > > like: 15c3850325 ("migration: move skip_section_footers", > > > 2017-06-28). Then we turn that check bit off for <=2.12. > > > > > > Would that work? > > > > I am afraid it can not. :( > > > > The compat bit only impacts local behavior, however, in this case, we > > need the source QEMU to tell the destination if it supports strict > > error check. > > My understanding is that the new compat bit will only take effect when > at destination. > > I'm not sure I'm thinking that correctly. I'll give some examples. > > When we migrate from <2.12 to 2.13, on 2.13 QEMU we'll possibly with > (using q35 as example, always) "-M pc-q35-2.12" to make the migration > work, so this will let the destination QEMU stop checking > decompressing errors. IMHO that's what we want so it's fine (forward > migration). > > When we migrate from 2.13 to <2.12, on 2.12 it'll always skip checking > decompression errors, so it's fine too even if we don't send some > compress-errored pages. > > Then, would this mean that the compat bit could work too just like > this patch? AFAIU the compat bit idea is very similar to current > patch, however we don't really need a new parameter to make things > complicated, we just let old QEMUs behave differently and > automatically, then user won't need to worry about manually specify > that parameter. I think you're saying just to wire it to the machine type for receive; that would work and would be fairly simple, although wouldn't provide the protection when going from new->new using an old machine type. Dave > Thanks, > > -- > Peter Xu -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Wed, May 02, 2018 at 03:57:13PM +0100, Dr. David Alan Gilbert wrote: > * Peter Xu (peterx@redhat.com) wrote: > > On Fri, Apr 27, 2018 at 06:40:09PM +0800, Xiao Guangrong wrote: > > > > > > > > > On 04/27/2018 05:31 PM, Peter Xu wrote: > > > > On Fri, Apr 27, 2018 at 11:15:37AM +0800, Xiao Guangrong wrote: > > > > > > > > > > > > > > > On 04/26/2018 10:01 PM, Eric Blake wrote: > > > > > > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote: > > > > > > > From: Xiao Guangrong <xiaoguangrong@tencent.com> > > > > > > > > > > > > > > QEMU 2.13 enables strict check for compression & decompression to > > > > > > > make the migration more robuster, that depends on the source to fix > > > > > > > > > > > > s/robuster/robust/ > > > > > > > > > > > > > > > > Will fix, thank you for pointing it out. > > > > > > > > > > > > the internal design which triggers the unexpected error conditions > > > > > > > > > > > > 2.13 hasn't been released yet. Why do we need a knob to explicitly turn > > > > > > off strict checking? Can we not instead make 2.13 automatically smart > > > > > > enough to tell if the incoming stream is coming from an older qemu > > > > > > (which might fail if the strict checks are enabled) vs. a newer qemu > > > > > > (the sender gave us what we need to ensure the strict checks are > > > > > > worthwhile)? > > > > > > > > > > > > > > > > Really smart. > > > > > > > > > > How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK, > > > > > the destination will do strict check if got this command (i.e, new > > > > > QEMU is running on the source), otherwise, turn the check off. > > > > > > > > Why not we just introduce a compat bit for that? I mean something > > > > like: 15c3850325 ("migration: move skip_section_footers", > > > > 2017-06-28). Then we turn that check bit off for <=2.12. > > > > > > > > Would that work? > > > > > > I am afraid it can not. :( > > > > > > The compat bit only impacts local behavior, however, in this case, we > > > need the source QEMU to tell the destination if it supports strict > > > error check. > > > > My understanding is that the new compat bit will only take effect when > > at destination. > > > > I'm not sure I'm thinking that correctly. I'll give some examples. > > > > When we migrate from <2.12 to 2.13, on 2.13 QEMU we'll possibly with > > (using q35 as example, always) "-M pc-q35-2.12" to make the migration > > work, so this will let the destination QEMU stop checking > > decompressing errors. IMHO that's what we want so it's fine (forward > > migration). > > > > When we migrate from 2.13 to <2.12, on 2.12 it'll always skip checking > > decompression errors, so it's fine too even if we don't send some > > compress-errored pages. > > > > Then, would this mean that the compat bit could work too just like > > this patch? AFAIU the compat bit idea is very similar to current > > patch, however we don't really need a new parameter to make things > > complicated, we just let old QEMUs behave differently and > > automatically, then user won't need to worry about manually specify > > that parameter. > > I think you're saying just to wire it to the machine type for receive; > that would work and would be fairly simple, although wouldn't provide > the protection when going from new->new using an old machine type. Yes. But actually we can still leverage the protection even with new->new and old machine types - we just need to explicitly override that parameter on both sides (instead of explicitly disalbe that on old ones): -M pc-q35-2.12 -global migration.x-error-decompress-check=true After all the user already specified "-M pc-q35-2.12" explicitly rather than using the default 2.13 one, I would consider he/she an advanced user. Then IMHO it would be acceptable to make this explicit too when the user really wants that. (Will that happen a lot when people still use old machine types even if they are creating new VMs?)
diff --git a/hmp.c b/hmp.c index 898e25f3e1..f0b934368b 100644 --- a/hmp.c +++ b/hmp.c @@ -329,6 +329,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict) monitor_printf(mon, "%s: %u\n", MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_THREADS), params->decompress_threads); + assert(params->has_decompress_error_check); + monitor_printf(mon, "%s: %s\n", + MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_ERROR_CHECK), + params->decompress_error_check ? "on" : "off"); assert(params->has_cpu_throttle_initial); monitor_printf(mon, "%s: %u\n", MigrationParameter_str(MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL), @@ -1593,6 +1597,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict) p->has_decompress_threads = true; visit_type_int(v, param, &p->decompress_threads, &err); break; + case MIGRATION_PARAMETER_DECOMPRESS_ERROR_CHECK: + p->has_decompress_error_check = true; + visit_type_bool(v, param, &p->decompress_error_check, &err); + break; case MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL: p->has_cpu_throttle_initial = true; visit_type_int(v, param, &p->cpu_throttle_initial, &err); diff --git a/migration/migration.c b/migration/migration.c index 0bdb28e144..1eef084763 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -534,6 +534,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp) params->compress_threads = s->parameters.compress_threads; params->has_decompress_threads = true; params->decompress_threads = s->parameters.decompress_threads; + params->has_decompress_error_check = true; + params->decompress_error_check = s->parameters.decompress_error_check; params->has_cpu_throttle_initial = true; params->cpu_throttle_initial = s->parameters.cpu_throttle_initial; params->has_cpu_throttle_increment = true; @@ -917,6 +919,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params, dest->decompress_threads = params->decompress_threads; } + if (params->has_decompress_error_check) { + dest->decompress_error_check = params->decompress_error_check; + } + if (params->has_cpu_throttle_initial) { dest->cpu_throttle_initial = params->cpu_throttle_initial; } @@ -979,6 +985,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) s->parameters.decompress_threads = params->decompress_threads; } + if (params->has_decompress_error_check) { + s->parameters.decompress_error_check = params->decompress_error_check; + } + if (params->has_cpu_throttle_initial) { s->parameters.cpu_throttle_initial = params->cpu_throttle_initial; } @@ -1620,6 +1630,15 @@ int migrate_decompress_threads(void) return s->parameters.decompress_threads; } +bool migrate_decompress_error_check(void) +{ + MigrationState *s; + + s = migrate_get_current(); + + return s->parameters.decompress_error_check; +} + bool migrate_dirty_bitmaps(void) { MigrationState *s; diff --git a/migration/migration.h b/migration/migration.h index 7c69598c54..5efbbaf9d8 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -241,6 +241,7 @@ bool migrate_use_compression(void); int migrate_compress_level(void); int migrate_compress_threads(void); int migrate_decompress_threads(void); +bool migrate_decompress_error_check(void); bool migrate_use_events(void); bool migrate_postcopy_blocktime(void); diff --git a/migration/ram.c b/migration/ram.c index 912810c18e..01cc815410 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2581,7 +2581,7 @@ static void *do_data_decompress(void *opaque) ret = qemu_uncompress_data(¶m->stream, des, pagesize, param->compbuf, len); - if (ret < 0) { + if (ret < 0 && migrate_decompress_error_check()) { error_report("decompress data failed"); qemu_file_set_error(decomp_file, ret); } diff --git a/qapi/migration.json b/qapi/migration.json index f3974c6807..68222358e1 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -455,6 +455,17 @@ # compression, so set the decompress-threads to the number about 1/4 # of compress-threads is adequate. # +# @decompress-error-check: check decompression errors. When false, the errors +# triggered by memory decompression are ignored. +# When true, migration is aborted if the errors are +# detected. For the old QEMU versions (< 2.13) the +# internal design will cause decompression to fail +# so the destination should completely ignore the +# error conditions, i.e, make it be false if these +# QEMUs are going to be migrated. Since 2.13, this +# design is fixed, make it be true to avoid corrupting +# the VM silently (Since 2.13) +# # @cpu-throttle-initial: Initial percentage of time guest cpus are throttled # when migration auto-converge is activated. The # default value is 20. (Since 2.7) @@ -511,10 +522,10 @@ ## { 'enum': 'MigrationParameter', 'data': ['compress-level', 'compress-threads', 'decompress-threads', - 'cpu-throttle-initial', 'cpu-throttle-increment', - 'tls-creds', 'tls-hostname', 'max-bandwidth', - 'downtime-limit', 'x-checkpoint-delay', 'block-incremental', - 'x-multifd-channels', 'x-multifd-page-count', + 'decompress-error-check', 'cpu-throttle-initial', + 'cpu-throttle-increment', 'tls-creds', 'tls-hostname', + 'max-bandwidth', 'downtime-limit', 'x-checkpoint-delay', + 'block-incremental', 'x-multifd-channels', 'x-multifd-page-count', 'xbzrle-cache-size' ] } ## @@ -526,6 +537,17 @@ # # @decompress-threads: decompression thread count # +# @decompress-error-check: check decompression errors. When false, the errors +# triggered by memory decompression are ignored. +# When true, migration is aborted if the errors are +# detected. For the old QEMU versions (< 2.13) the +# internal design will cause decompression to fail +# so the destination should completely ignore the +# error conditions, i.e, make it be false if these +# QEMUs are going to be migrated. Since 2.13, this +# design is fixed, make it be true to avoid corrupting +# the VM silently (Since 2.13) +# # @cpu-throttle-initial: Initial percentage of time guest cpus are # throttled when migration auto-converge is activated. # The default value is 20. (Since 2.7) @@ -591,6 +613,7 @@ 'data': { '*compress-level': 'int', '*compress-threads': 'int', '*decompress-threads': 'int', + '*decompress-error-check': 'bool', '*cpu-throttle-initial': 'int', '*cpu-throttle-increment': 'int', '*tls-creds': 'StrOrNull', @@ -630,6 +653,17 @@ # # @decompress-threads: decompression thread count # +# @decompress-error-check: check decompression errors. When false, the errors +# triggered by memory decompression are ignored. +# When true, migration is aborted if the errors are +# detected. For the old QEMU versions (< 2.13) the +# internal design will cause decompression to fail +# so the destination should completely ignore the +# error conditions, i.e, make it be false if these +# QEMUs are going to be migrated. Since 2.13, this +# design is fixed, make it be true to avoid corrupting +# the VM silently (Since 2.13) +# # @cpu-throttle-initial: Initial percentage of time guest cpus are # throttled when migration auto-converge is activated. # (Since 2.7) @@ -690,6 +724,7 @@ 'data': { '*compress-level': 'uint8', '*compress-threads': 'uint8', '*decompress-threads': 'uint8', + '*decompress-error-check': 'bool', '*cpu-throttle-initial': 'uint8', '*cpu-throttle-increment': 'uint8', '*tls-creds': 'str',