Message ID | 20210317044441.112313-1-gshan@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | hw/char/pl011: Fix clock migration failure | expand |
On Wed, 17 Mar 2021 at 04:44, Gavin Shan <gshan@redhat.com> wrote: > > There is a added clock to trace buad rate change since v5.2.0 by > commit aac63e0e6ea3 ("hw/char/pl011: add a clock input"). The added > clock causes migration failure. For example, migration from v5.2.0 > to v5.1.0 can fail with the following error messages: > > qemu-system-aarch64: error while loading state for instance 0x0 \ > of device 'pl011' > qemu-system-aarch64: load of migration failed: No such file or \ > directory > > This fixes the issue by reporting the baud rate change at post load > time so that the clock won't be migrated by sub-section to avoid the > migration failure. > > Cc: qemu-stable@nongnu.org > Fixes: aac63e0e6ea3 ("hw/char/pl011: add a clock input") > Signed-off-by: Gavin Shan <gshan@redhat.com> > --- > hw/char/pl011.c | 22 +++++++++------------- > 1 file changed, 9 insertions(+), 13 deletions(-) > > diff --git a/hw/char/pl011.c b/hw/char/pl011.c > index c5621a195f..401bd28536 100644 > --- a/hw/char/pl011.c > +++ b/hw/char/pl011.c > @@ -322,20 +322,20 @@ static const MemoryRegionOps pl011_ops = { > .endianness = DEVICE_NATIVE_ENDIAN, > }; > > -static const VMStateDescription vmstate_pl011_clock = { > - .name = "pl011/clock", > - .version_id = 1, > - .minimum_version_id = 1, > - .fields = (VMStateField[]) { > - VMSTATE_CLOCK(clk, PL011State), > - VMSTATE_END_OF_LIST() > - } > -}; > +static int pl011_post_load(void *opaque, int version_id) > +{ > + PL011State *s = PL011(opaque); > + > + pl011_trace_baudrate_change(s); > + > + return 0; > +} > > static const VMStateDescription vmstate_pl011 = { > .name = "pl011", > .version_id = 2, > .minimum_version_id = 2, > + .post_load = pl011_post_load, > .fields = (VMStateField[]) { > VMSTATE_UINT32(readbuff, PL011State), > VMSTATE_UINT32(flags, PL011State), > @@ -355,10 +355,6 @@ static const VMStateDescription vmstate_pl011 = { > VMSTATE_INT32(read_trigger, PL011State), > VMSTATE_END_OF_LIST() > }, > - .subsections = (const VMStateDescription * []) { > - &vmstate_pl011_clock, > - NULL > - } > }; Doesn't dropping the subsection break migration compat ? thanks -- PMM
Hi Peter, On 3/17/21 8:09 PM, Peter Maydell wrote: > On Wed, 17 Mar 2021 at 04:44, Gavin Shan <gshan@redhat.com> wrote: >> >> There is a added clock to trace buad rate change since v5.2.0 by >> commit aac63e0e6ea3 ("hw/char/pl011: add a clock input"). The added >> clock causes migration failure. For example, migration from v5.2.0 >> to v5.1.0 can fail with the following error messages: >> >> qemu-system-aarch64: error while loading state for instance 0x0 \ >> of device 'pl011' >> qemu-system-aarch64: load of migration failed: No such file or \ >> directory >> >> This fixes the issue by reporting the baud rate change at post load >> time so that the clock won't be migrated by sub-section to avoid the >> migration failure. >> >> Cc: qemu-stable@nongnu.org >> Fixes: aac63e0e6ea3 ("hw/char/pl011: add a clock input") >> Signed-off-by: Gavin Shan <gshan@redhat.com> >> --- >> hw/char/pl011.c | 22 +++++++++------------- >> 1 file changed, 9 insertions(+), 13 deletions(-) >> >> diff --git a/hw/char/pl011.c b/hw/char/pl011.c >> index c5621a195f..401bd28536 100644 >> --- a/hw/char/pl011.c >> +++ b/hw/char/pl011.c >> @@ -322,20 +322,20 @@ static const MemoryRegionOps pl011_ops = { >> .endianness = DEVICE_NATIVE_ENDIAN, >> }; >> >> -static const VMStateDescription vmstate_pl011_clock = { >> - .name = "pl011/clock", >> - .version_id = 1, >> - .minimum_version_id = 1, >> - .fields = (VMStateField[]) { >> - VMSTATE_CLOCK(clk, PL011State), >> - VMSTATE_END_OF_LIST() >> - } >> -}; >> +static int pl011_post_load(void *opaque, int version_id) >> +{ >> + PL011State *s = PL011(opaque); >> + >> + pl011_trace_baudrate_change(s); >> + >> + return 0; >> +} >> >> static const VMStateDescription vmstate_pl011 = { >> .name = "pl011", >> .version_id = 2, >> .minimum_version_id = 2, >> + .post_load = pl011_post_load, >> .fields = (VMStateField[]) { >> VMSTATE_UINT32(readbuff, PL011State), >> VMSTATE_UINT32(flags, PL011State), >> @@ -355,10 +355,6 @@ static const VMStateDescription vmstate_pl011 = { >> VMSTATE_INT32(read_trigger, PL011State), >> VMSTATE_END_OF_LIST() >> }, >> - .subsections = (const VMStateDescription * []) { >> - &vmstate_pl011_clock, >> - NULL >> - } >> }; > > Doesn't dropping the subsection break migration compat ? > It's why this patch needs to be backported to stable branches. In that way, we won't have migration compatible issue. Thanks, Gavin
On Wed, 17 Mar 2021 at 10:37, Gavin Shan <gshan@redhat.com> wrote: > > Hi Peter, > > On 3/17/21 8:09 PM, Peter Maydell wrote: > > On Wed, 17 Mar 2021 at 04:44, Gavin Shan <gshan@redhat.com> wrote: > >> > >> static const VMStateDescription vmstate_pl011 = { > >> .name = "pl011", > >> .version_id = 2, > >> .minimum_version_id = 2, > >> + .post_load = pl011_post_load, > >> .fields = (VMStateField[]) { > >> VMSTATE_UINT32(readbuff, PL011State), > >> VMSTATE_UINT32(flags, PL011State), > >> @@ -355,10 +355,6 @@ static const VMStateDescription vmstate_pl011 = { > >> VMSTATE_INT32(read_trigger, PL011State), > >> VMSTATE_END_OF_LIST() > >> }, > >> - .subsections = (const VMStateDescription * []) { > >> - &vmstate_pl011_clock, > >> - NULL > >> - } > >> }; > > > > Doesn't dropping the subsection break migration compat ? > > > > It's why this patch needs to be backported to stable branches. > In that way, we won't have migration compatible issue. No, migration has to work from the existing already shipped 5.1, 5.2, etc releases to 6.0 (assuming you use the correct "virt-5.2" &c versioned machine type.) thanks -- PMM
Hi Peter, On 3/17/21 9:40 PM, Peter Maydell wrote: > On Wed, 17 Mar 2021 at 10:37, Gavin Shan <gshan@redhat.com> wrote: >> On 3/17/21 8:09 PM, Peter Maydell wrote: >>> On Wed, 17 Mar 2021 at 04:44, Gavin Shan <gshan@redhat.com> wrote: >>>> >>>> static const VMStateDescription vmstate_pl011 = { >>>> .name = "pl011", >>>> .version_id = 2, >>>> .minimum_version_id = 2, >>>> + .post_load = pl011_post_load, >>>> .fields = (VMStateField[]) { >>>> VMSTATE_UINT32(readbuff, PL011State), >>>> VMSTATE_UINT32(flags, PL011State), >>>> @@ -355,10 +355,6 @@ static const VMStateDescription vmstate_pl011 = { >>>> VMSTATE_INT32(read_trigger, PL011State), >>>> VMSTATE_END_OF_LIST() >>>> }, >>>> - .subsections = (const VMStateDescription * []) { >>>> - &vmstate_pl011_clock, >>>> - NULL >>>> - } >>>> }; >>> >>> Doesn't dropping the subsection break migration compat ? >>> >> >> It's why this patch needs to be backported to stable branches. >> In that way, we won't have migration compatible issue. > > No, migration has to work from the existing already > shipped 5.1, 5.2, etc releases to 6.0 (assuming you use > the correct "virt-5.2" &c versioned machine type.) > Commit aac63e0e6ea3 ("hw/char/pl011: add a clock input") is merged to v5.2.0. The migration failure happens during migration from v6.0 to v5.1 with machine type as "virt-5.1", instead of migrating from v5.1 to v6.0. One question is if we need support backwards migration? If we do support backwards migration, I think there are two options: (1) merge this patch and backport it to v5.2+; (2) Backport commit aac63e0e6ea3 to v5.2-. I guess (1) would be right way to go because it's less effort than (2). Besides, the clock needn't to be migrated if I'm correct. Thanks, Gavin
On Wed, 17 Mar 2021 at 10:59, Gavin Shan <gshan@redhat.com> wrote: > > Hi Peter, > > On 3/17/21 9:40 PM, Peter Maydell wrote: > > On Wed, 17 Mar 2021 at 10:37, Gavin Shan <gshan@redhat.com> wrote: > >> On 3/17/21 8:09 PM, Peter Maydell wrote: > >>> On Wed, 17 Mar 2021 at 04:44, Gavin Shan <gshan@redhat.com> wrote: > >>>> > >>>> static const VMStateDescription vmstate_pl011 = { > >>>> .name = "pl011", > >>>> .version_id = 2, > >>>> .minimum_version_id = 2, > >>>> + .post_load = pl011_post_load, > >>>> .fields = (VMStateField[]) { > >>>> VMSTATE_UINT32(readbuff, PL011State), > >>>> VMSTATE_UINT32(flags, PL011State), > >>>> @@ -355,10 +355,6 @@ static const VMStateDescription vmstate_pl011 = { > >>>> VMSTATE_INT32(read_trigger, PL011State), > >>>> VMSTATE_END_OF_LIST() > >>>> }, > >>>> - .subsections = (const VMStateDescription * []) { > >>>> - &vmstate_pl011_clock, > >>>> - NULL > >>>> - } > >>>> }; > >>> > >>> Doesn't dropping the subsection break migration compat ? > >>> > >> > >> It's why this patch needs to be backported to stable branches. > >> In that way, we won't have migration compatible issue. > > > > No, migration has to work from the existing already > > shipped 5.1, 5.2, etc releases to 6.0 (assuming you use > > the correct "virt-5.2" &c versioned machine type.) > > > > Commit aac63e0e6ea3 ("hw/char/pl011: add a clock input") is merged > to v5.2.0. The migration failure happens during migration from v6.0 > to v5.1 with machine type as "virt-5.1", instead of migrating from > v5.1 to v6.0. One question is if we need support backwards migration? Upstream doesn't care about backwards migration. AIUI RedHat as a downstream care about the backwards-migration case in some specific situations, but I don't know if that would include this one. I misread the commit message of this patch and hadn't realised that it was talking about a 5.2 -> 5.1 migration. From upstream's point of view, commit aac63e0e6ea3 is fine because it preserves forwards migration (5.1 -> 5.2). If there's a change that makes RH-as-a-downstream's life easier without breaking upstream's forward-compat requirements, we can look at it: but unless I'm misreading this patch, it would break 5.2 -> 6.0 migration, because 5.2 sends the subsection and 6.0 with this patch would say "I don't know how to deal with this subsection I've been sent". > If we do support backwards migration, I think there are two options: > (1) merge this patch and backport it to v5.2+; (2) Backport commit > aac63e0e6ea3 to v5.2-. I guess (1) would be right way to go because > it's less effort than (2). Besides, the clock needn't to be migrated > if I'm correct. You can't fix a backwards migration issue (if you care about it) by backporting any patch to anywhere -- you need to deal with what has already shipped. thanks -- PMM
On Wed, Mar 17, 2021 at 11:14:56AM +0000, Peter Maydell wrote: > On Wed, 17 Mar 2021 at 10:59, Gavin Shan <gshan@redhat.com> wrote: > > > > Hi Peter, > > > > On 3/17/21 9:40 PM, Peter Maydell wrote: > > > On Wed, 17 Mar 2021 at 10:37, Gavin Shan <gshan@redhat.com> wrote: > > >> On 3/17/21 8:09 PM, Peter Maydell wrote: > > >>> On Wed, 17 Mar 2021 at 04:44, Gavin Shan <gshan@redhat.com> wrote: > > >>>> > > >>>> static const VMStateDescription vmstate_pl011 = { > > >>>> .name = "pl011", > > >>>> .version_id = 2, > > >>>> .minimum_version_id = 2, > > >>>> + .post_load = pl011_post_load, > > >>>> .fields = (VMStateField[]) { > > >>>> VMSTATE_UINT32(readbuff, PL011State), > > >>>> VMSTATE_UINT32(flags, PL011State), > > >>>> @@ -355,10 +355,6 @@ static const VMStateDescription vmstate_pl011 = { > > >>>> VMSTATE_INT32(read_trigger, PL011State), > > >>>> VMSTATE_END_OF_LIST() > > >>>> }, > > >>>> - .subsections = (const VMStateDescription * []) { > > >>>> - &vmstate_pl011_clock, > > >>>> - NULL > > >>>> - } > > >>>> }; > > >>> > > >>> Doesn't dropping the subsection break migration compat ? > > >>> > > >> > > >> It's why this patch needs to be backported to stable branches. > > >> In that way, we won't have migration compatible issue. > > > > > > No, migration has to work from the existing already > > > shipped 5.1, 5.2, etc releases to 6.0 (assuming you use > > > the correct "virt-5.2" &c versioned machine type.) > > > > > > > Commit aac63e0e6ea3 ("hw/char/pl011: add a clock input") is merged > > to v5.2.0. The migration failure happens during migration from v6.0 > > to v5.1 with machine type as "virt-5.1", instead of migrating from > > v5.1 to v6.0. One question is if we need support backwards migration? > > Upstream doesn't care about backwards migration. AIUI > RedHat as a downstream care about the backwards-migration > case in some specific situations, but I don't know if that > would include this one. Right, we do prefer to be able to support "ping-pong" migrations. For example, if we start a virt-5.1 machine on a 5.1 build of QEMU, and then migrate it to a 5.2 build of QEMU, we'd like to also be able to go back to the 5.1 build. I agree this patch is not the right approach. I think the right approach is to introduce a compat property and make the "new" section dependent on it. And then update the hw_compat_* arrays. Gavin, please take a look at "Connecting subsections to properties" of docs/devel/migration.rst. I'm also curious what the state of mach-virt's machine types are for migration. It'd be nice to exhaustively test both forward migration of all machine types and ping-pong migrations of all machine types. We can then consider each issue we find (the pessimist in me suggests we'll find more than this pl011 issue) and how/if we want to resolve them. Thanks, drew
+Beraldo On 3/17/21 1:54 PM, Andrew Jones wrote: > On Wed, Mar 17, 2021 at 11:14:56AM +0000, Peter Maydell wrote: >> On Wed, 17 Mar 2021 at 10:59, Gavin Shan <gshan@redhat.com> wrote: >>> >>> Hi Peter, >>> >>> On 3/17/21 9:40 PM, Peter Maydell wrote: >>>> On Wed, 17 Mar 2021 at 10:37, Gavin Shan <gshan@redhat.com> wrote: >>>>> On 3/17/21 8:09 PM, Peter Maydell wrote: >>>>>> On Wed, 17 Mar 2021 at 04:44, Gavin Shan <gshan@redhat.com> wrote: >>>>>>> >>>>>>> static const VMStateDescription vmstate_pl011 = { >>>>>>> .name = "pl011", >>>>>>> .version_id = 2, >>>>>>> .minimum_version_id = 2, >>>>>>> + .post_load = pl011_post_load, >>>>>>> .fields = (VMStateField[]) { >>>>>>> VMSTATE_UINT32(readbuff, PL011State), >>>>>>> VMSTATE_UINT32(flags, PL011State), >>>>>>> @@ -355,10 +355,6 @@ static const VMStateDescription vmstate_pl011 = { >>>>>>> VMSTATE_INT32(read_trigger, PL011State), >>>>>>> VMSTATE_END_OF_LIST() >>>>>>> }, >>>>>>> - .subsections = (const VMStateDescription * []) { >>>>>>> - &vmstate_pl011_clock, >>>>>>> - NULL >>>>>>> - } >>>>>>> }; >>>>>> >>>>>> Doesn't dropping the subsection break migration compat ? >>>>>> >>>>> >>>>> It's why this patch needs to be backported to stable branches. >>>>> In that way, we won't have migration compatible issue. >>>> >>>> No, migration has to work from the existing already >>>> shipped 5.1, 5.2, etc releases to 6.0 (assuming you use >>>> the correct "virt-5.2" &c versioned machine type.) >>>> >>> >>> Commit aac63e0e6ea3 ("hw/char/pl011: add a clock input") is merged >>> to v5.2.0. The migration failure happens during migration from v6.0 >>> to v5.1 with machine type as "virt-5.1", instead of migrating from >>> v5.1 to v6.0. One question is if we need support backwards migration? >> >> Upstream doesn't care about backwards migration. AIUI >> RedHat as a downstream care about the backwards-migration >> case in some specific situations, but I don't know if that >> would include this one. > > Right, we do prefer to be able to support "ping-pong" migrations. For > example, if we start a virt-5.1 machine on a 5.1 build of QEMU, and then > migrate it to a 5.2 build of QEMU, we'd like to also be able to go back > to the 5.1 build. > > I agree this patch is not the right approach. I think the right approach > is to introduce a compat property and make the "new" section dependent > on it. And then update the hw_compat_* arrays. Gavin, please take a look > at "Connecting subsections to properties" of docs/devel/migration.rst. > > I'm also curious what the state of mach-virt's machine types are for > migration. It'd be nice to exhaustively test both forward migration of > all machine types and ping-pong migrations of all machine types. FYI this test has been suggested to the Avocado team few times. They might already have a ticket to track any progress. > We can > then consider each issue we find (the pessimist in me suggests we'll find > more than this pl011 issue) and how/if we want to resolve them.
On Wed, 17 Mar 2021 at 12:55, Andrew Jones <drjones@redhat.com> wrote: > I'm also curious what the state of mach-virt's machine types are for > migration. Probably not great -- I don't think anybody is really testing cross-version migration, and I don't think there's a great deal of in-practice use of it for Arm either. (See also the issue with accidentally having env->features in the CPU migration data, which broke cross-version migration: that was around a while and we only got one user complaint about it.) Unless we have a serious test suite for this kind of thing upstream it's just going to continue to be broken, because at the moment all we have is "people make best-efforts to think about migration compat when coding and reviewing, but don't actually test". thanks -- PMM
Hi Drew, On 3/17/21 11:54 PM, Andrew Jones wrote: > On Wed, Mar 17, 2021 at 11:14:56AM +0000, Peter Maydell wrote: >> On Wed, 17 Mar 2021 at 10:59, Gavin Shan <gshan@redhat.com> wrote: >>> On 3/17/21 9:40 PM, Peter Maydell wrote: >>>> On Wed, 17 Mar 2021 at 10:37, Gavin Shan <gshan@redhat.com> wrote: >>>>> On 3/17/21 8:09 PM, Peter Maydell wrote: >>>>>> On Wed, 17 Mar 2021 at 04:44, Gavin Shan <gshan@redhat.com> wrote: >>>>>>> >>>>>>> static const VMStateDescription vmstate_pl011 = { >>>>>>> .name = "pl011", >>>>>>> .version_id = 2, >>>>>>> .minimum_version_id = 2, >>>>>>> + .post_load = pl011_post_load, >>>>>>> .fields = (VMStateField[]) { >>>>>>> VMSTATE_UINT32(readbuff, PL011State), >>>>>>> VMSTATE_UINT32(flags, PL011State), >>>>>>> @@ -355,10 +355,6 @@ static const VMStateDescription vmstate_pl011 = { >>>>>>> VMSTATE_INT32(read_trigger, PL011State), >>>>>>> VMSTATE_END_OF_LIST() >>>>>>> }, >>>>>>> - .subsections = (const VMStateDescription * []) { >>>>>>> - &vmstate_pl011_clock, >>>>>>> - NULL >>>>>>> - } >>>>>>> }; >>>>>> >>>>>> Doesn't dropping the subsection break migration compat ? >>>>>> >>>>> >>>>> It's why this patch needs to be backported to stable branches. >>>>> In that way, we won't have migration compatible issue. >>>> >>>> No, migration has to work from the existing already >>>> shipped 5.1, 5.2, etc releases to 6.0 (assuming you use >>>> the correct "virt-5.2" &c versioned machine type.) >>>> >>> >>> Commit aac63e0e6ea3 ("hw/char/pl011: add a clock input") is merged >>> to v5.2.0. The migration failure happens during migration from v6.0 >>> to v5.1 with machine type as "virt-5.1", instead of migrating from >>> v5.1 to v6.0. One question is if we need support backwards migration? >> >> Upstream doesn't care about backwards migration. AIUI >> RedHat as a downstream care about the backwards-migration >> case in some specific situations, but I don't know if that >> would include this one. > > Right, we do prefer to be able to support "ping-pong" migrations. For > example, if we start a virt-5.1 machine on a 5.1 build of QEMU, and then > migrate it to a 5.2 build of QEMU, we'd like to also be able to go back > to the 5.1 build. > > I agree this patch is not the right approach. I think the right approach > is to introduce a compat property and make the "new" section dependent > on it. And then update the hw_compat_* arrays. Gavin, please take a look > at "Connecting subsections to properties" of docs/devel/migration.rst. > Agree and thanks for the pointer. I will post another patch to have something in hw_compat_5_1 to address this particular issue. > I'm also curious what the state of mach-virt's machine types are for > migration. It'd be nice to exhaustively test both forward migration of > all machine types and ping-pong migrations of all machine types. We can > then consider each issue we find (the pessimist in me suggests we'll find > more than this pl011 issue) and how/if we want to resolve them. > Yeah, I will think about it and do the testing to see if there are more issues. Also, it'd better to be integrated to existing testing framework as you suggested. Thanks, Gavin
diff --git a/hw/char/pl011.c b/hw/char/pl011.c index c5621a195f..401bd28536 100644 --- a/hw/char/pl011.c +++ b/hw/char/pl011.c @@ -322,20 +322,20 @@ static const MemoryRegionOps pl011_ops = { .endianness = DEVICE_NATIVE_ENDIAN, }; -static const VMStateDescription vmstate_pl011_clock = { - .name = "pl011/clock", - .version_id = 1, - .minimum_version_id = 1, - .fields = (VMStateField[]) { - VMSTATE_CLOCK(clk, PL011State), - VMSTATE_END_OF_LIST() - } -}; +static int pl011_post_load(void *opaque, int version_id) +{ + PL011State *s = PL011(opaque); + + pl011_trace_baudrate_change(s); + + return 0; +} static const VMStateDescription vmstate_pl011 = { .name = "pl011", .version_id = 2, .minimum_version_id = 2, + .post_load = pl011_post_load, .fields = (VMStateField[]) { VMSTATE_UINT32(readbuff, PL011State), VMSTATE_UINT32(flags, PL011State), @@ -355,10 +355,6 @@ static const VMStateDescription vmstate_pl011 = { VMSTATE_INT32(read_trigger, PL011State), VMSTATE_END_OF_LIST() }, - .subsections = (const VMStateDescription * []) { - &vmstate_pl011_clock, - NULL - } }; static Property pl011_properties[] = {
There is a added clock to trace buad rate change since v5.2.0 by commit aac63e0e6ea3 ("hw/char/pl011: add a clock input"). The added clock causes migration failure. For example, migration from v5.2.0 to v5.1.0 can fail with the following error messages: qemu-system-aarch64: error while loading state for instance 0x0 \ of device 'pl011' qemu-system-aarch64: load of migration failed: No such file or \ directory This fixes the issue by reporting the baud rate change at post load time so that the clock won't be migrated by sub-section to avoid the migration failure. Cc: qemu-stable@nongnu.org Fixes: aac63e0e6ea3 ("hw/char/pl011: add a clock input") Signed-off-by: Gavin Shan <gshan@redhat.com> --- hw/char/pl011.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)