Message ID | 1532959766-53343-1-git-send-email-borntraeger@de.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/1] s390x/sclp: fix maxram calculation |
On 30.07.2018 16:09, Christian Borntraeger wrote:
> We clamp down ram_size to match the sclp increment size. We do
> not do the same for maxram_size, which means for large guests
> with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
> This can break other code (e.g. CMMA migration) which uses maxram_size
> to calculate the number of pages and then throws some errors.

So the only problem is that the buffer size between source and target
differ?

>
> Fixes: 82fab5c5b90e468f3e9d54c ("s390x/sclp: remove memory hotplug support")
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> CC: qemu-stable@nongnu.org
> CC: David Hildenbrand <david@redhat.com>
> ---
>  hw/s390x/sclp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
> index bd2a024..4510a80 100644
> --- a/hw/s390x/sclp.c
> +++ b/hw/s390x/sclp.c
> @@ -320,6 +320,7 @@ static void sclp_memory_init(SCLPDevice *sclp)
>      initial_mem = initial_mem >> increment_size << increment_size;
>
>      machine->ram_size = initial_mem;
> +    machine->maxram_size = initial_mem;
>      /* let's propagate the changed ram size into the global variable. */
>      ram_size = initial_mem;
>  }
>

I even have a private patch for that already :)

On 07/30/2018 04:34 PM, David Hildenbrand wrote:
> On 30.07.2018 16:09, Christian Borntraeger wrote:
>> We clamp down ram_size to match the sclp increment size. We do
>> not do the same for maxram_size, which means for large guests
>> with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
>> This can break other code (e.g. CMMA migration) which uses maxram_size
>> to calculate the number of pages and then throws some errors.
>
> So the only problem is that the buffer size between source and target
> differ?

The problem is that the target tries to access a non-existing buffer when
committing all cmma values, so the kernel returns with EFAULT.

>
>>
>> Fixes: 82fab5c5b90e468f3e9d54c ("s390x/sclp: remove memory hotplug support")
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> CC: qemu-stable@nongnu.org
>> CC: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/s390x/sclp.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
>> index bd2a024..4510a80 100644
>> --- a/hw/s390x/sclp.c
>> +++ b/hw/s390x/sclp.c
>> @@ -320,6 +320,7 @@ static void sclp_memory_init(SCLPDevice *sclp)
>>      initial_mem = initial_mem >> increment_size << increment_size;
>>
>>      machine->ram_size = initial_mem;
>> +    machine->maxram_size = initial_mem;
>>      /* let's propagate the changed ram size into the global variable. */
>>      ram_size = initial_mem;
>>  }
>>
>
> I even have a private patch for that already :)

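To make the failure mode described above concrete, here is a minimal sketch of how a page count derived from maxram_size can exceed the memory that actually exists once ram_size has been rounded down. The helper names (pages_for, unbacked_pages) and the PAGE_SIZE_BYTES macro are hypothetical and exist only for this illustration; they are not taken from the QEMU CMMA migration code. The 4 KiB page size is the s390x base page size.

```c
#include <stdint.h>

/* s390x uses 4 KiB pages; PAGE_SIZE_BYTES is a name local to this sketch. */
#define PAGE_SIZE_BYTES 4096ULL

/* Hypothetical helper: number of pages a per-page attribute buffer
 * would have to cover for a given memory size in bytes. */
static uint64_t pages_for(uint64_t bytes)
{
    return bytes / PAGE_SIZE_BYTES;
}

/* Hypothetical helper: pages that are counted when sizing by maxram_size
 * but have no backing guest memory because ram_size is smaller. */
static uint64_t unbacked_pages(uint64_t ram_size, uint64_t maxram_size)
{
    if (maxram_size <= ram_size) {
        return 0;
    }
    return pages_for(maxram_size) - pages_for(ram_size);
}
```

As an illustration only: if -m 50000 were rounded down to 49984 MiB while maxram_size stayed at 50000 MiB, unbacked_pages(49984ULL << 20, 50000ULL << 20) would report 4096 pages beyond the end of guest memory. Once both sizes are equal the function returns 0.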
On 30.07.2018 17:00, Christian Borntraeger wrote:
>
>
> On 07/30/2018 04:34 PM, David Hildenbrand wrote:
>> On 30.07.2018 16:09, Christian Borntraeger wrote:
>>> We clamp down ram_size to match the sclp increment size. We do
>>> not do the same for maxram_size, which means for large guests
>>> with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
>>> This can break other code (e.g. CMMA migration) which uses maxram_size
>>> to calculate the number of pages and then throws some errors.
>>
>> So the only problem is that the buffer size between source and target
>> differ?
>
> The problem is that the target tries to access a non-existing buffer when
> committing all cmma values, so the kernel returns with EFAULT.
>>

Am I wrong or does CMMA migration code really not care about which parts
of maxram are actually used (== which memory regions are actually defined)?

If so, this looks broken to me and the right fix is to use ramsize for
now, because it simply does not support maxram.

(I assume using some -m X,maxmem=X+Y would make it fail in the same way)

(this patch still makes sense and should be done)

On 07/30/2018 05:17 PM, David Hildenbrand wrote:
> On 30.07.2018 17:00, Christian Borntraeger wrote:
>>
>>
>> On 07/30/2018 04:34 PM, David Hildenbrand wrote:
>>> On 30.07.2018 16:09, Christian Borntraeger wrote:
>>>> We clamp down ram_size to match the sclp increment size. We do
>>>> not do the same for maxram_size, which means for large guests
>>>> with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
>>>> This can break other code (e.g. CMMA migration) which uses maxram_size
>>>> to calculate the number of pages and then throws some errors.
>>>
>>> So the only problem is that the buffer size between source and target
>>> differ?
>>
>> The problem is that the target tries to access a non-existing buffer when
>> committing all cmma values, so the kernel returns with EFAULT.
>>>
>
> Am I wrong or does CMMA migration code really not care about which parts
> of maxram are actually used (== which memory regions are actually defined)?
>
> If so, this looks broken to me and the right fix is to use ramsize for
> now, because it simply does not support maxram.
>
> (I assume using some -m X,maxmem=X+Y would make it fail in the same way)
>
> (this patch still makes sense and should be done)

I am looking for the minimal fix for 2.13 and ideally even for 2.12.1.

Can we agree on this fix and do the remaining thing later?

On 30.07.2018 17:20, Christian Borntraeger wrote:
>
>
> On 07/30/2018 05:17 PM, David Hildenbrand wrote:
>> On 30.07.2018 17:00, Christian Borntraeger wrote:
>>>
>>>
>>> On 07/30/2018 04:34 PM, David Hildenbrand wrote:
>>>> On 30.07.2018 16:09, Christian Borntraeger wrote:
>>>>> We clamp down ram_size to match the sclp increment size. We do
>>>>> not do the same for maxram_size, which means for large guests
>>>>> with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
>>>>> This can break other code (e.g. CMMA migration) which uses maxram_size
>>>>> to calculate the number of pages and then throws some errors.
>>>>
>>>> So the only problem is that the buffer size between source and target
>>>> differ?
>>>
>>> The problem is that the target tries to access a non-existing buffer when
>>> committing all cmma values, so the kernel returns with EFAULT.
>>>>
>>
>> Am I wrong or does CMMA migration code really not care about which parts
>> of maxram are actually used (== which memory regions are actually defined)?
>>
>> If so, this looks broken to me and the right fix is to use ramsize for
>> now, because it simply does not support maxram.
>>
>> (I assume using some -m X,maxmem=X+Y would make it fail in the same way)
>>
>> (this patch still makes sense and should be done)
>
> I am looking for the minimal fix for 2.13 and ideally even for 2.12.1.
>
> Can we agree on this fix and do the remaining thing later?
>

Yes. The clean fix should then really only consider mapped memory
regions (so the sending side should somehow iterate over them and also
only access that memory).

Reviewed-by: David Hildenbrand <david@redhat.com>

Are we still able to get things into 2.12.1 or are we too late?


On 07/30/2018 04:09 PM, Christian Borntraeger wrote:
> We clamp down ram_size to match the sclp increment size. We do
> not do the same for maxram_size, which means for large guests
> with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
> This can break other code (e.g. CMMA migration) which uses maxram_size
> to calculate the number of pages and then throws some errors.
>
> Fixes: 82fab5c5b90e468f3e9d54c ("s390x/sclp: remove memory hotplug support")
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> CC: qemu-stable@nongnu.org
> CC: David Hildenbrand <david@redhat.com>
> ---
>  hw/s390x/sclp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
> index bd2a024..4510a80 100644
> --- a/hw/s390x/sclp.c
> +++ b/hw/s390x/sclp.c
> @@ -320,6 +320,7 @@ static void sclp_memory_init(SCLPDevice *sclp)
>      initial_mem = initial_mem >> increment_size << increment_size;
>
>      machine->ram_size = initial_mem;
> +    machine->maxram_size = initial_mem;
>      /* let's propagate the changed ram size into the global variable. */
>      ram_size = initial_mem;
>  }
>

On Mon, 30 Jul 2018 17:20:25 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 07/30/2018 05:17 PM, David Hildenbrand wrote:
> > On 30.07.2018 17:00, Christian Borntraeger wrote:
> >>
> >>
> >> On 07/30/2018 04:34 PM, David Hildenbrand wrote:
> >>> On 30.07.2018 16:09, Christian Borntraeger wrote:
> >>>> We clamp down ram_size to match the sclp increment size. We do
> >>>> not do the same for maxram_size, which means for large guests
> >>>> with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
> >>>> This can break other code (e.g. CMMA migration) which uses maxram_size
> >>>> to calculate the number of pages and then throws some errors.
> >>>
> >>> So the only problem is that the buffer size between source and target
> >>> differ?
> >>
> >> The problem is that the target tries to access a non-existing buffer when
> >> committing all cmma values, so the kernel returns with EFAULT.
> >>>
> >
> > Am I wrong or does CMMA migration code really not care about which parts
> > of maxram are actually used (== which memory regions are actually defined)?
> >
> > If so, this looks broken to me and the right fix is to use ramsize for
> > now, because it simply does not support maxram.
> >
> > (I assume using some -m X,maxmem=X+Y would make it fail in the same way)
> >
> > (this patch still makes sense and should be done)
>
> I am looking for the minimal fix for 2.13 and ideally even for 2.12.1.

2.13 is not the QEMU version you are looking for :)

> Can we agree on this fix and do the remaining thing later?

If you guys agree, I'm happy to queue it to s390-fixes for 3.0-rc3. Just
be quick, as I'd need to send it tomorrow morning the latest :)

(FWIW, patch looks sane and small enough to me.)

On 30.07.2018 16:09, Christian Borntraeger wrote:
> We clamp down ram_size to match the sclp increment size. We do
> not do the same for maxram_size, which means for large guests
> with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
> This can break other code (e.g. CMMA migration) which uses maxram_size
> to calculate the number of pages and then throws some errors.
>
> Fixes: 82fab5c5b90e468f3e9d54c ("s390x/sclp: remove memory hotplug support")
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> CC: qemu-stable@nongnu.org
> CC: David Hildenbrand <david@redhat.com>
> ---
>  hw/s390x/sclp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
> index bd2a024..4510a80 100644
> --- a/hw/s390x/sclp.c
> +++ b/hw/s390x/sclp.c
> @@ -320,6 +320,7 @@ static void sclp_memory_init(SCLPDevice *sclp)
>      initial_mem = initial_mem >> increment_size << increment_size;
>
>      machine->ram_size = initial_mem;
> +    machine->maxram_size = initial_mem;
>      /* let's propagate the changed ram size into the global variable. */
>      ram_size = initial_mem;
>  }
>

BTW, I handle it in my private patch like this

 static inline SCLPDevice *get_sclp_device(void)
 {
@@ -319,9 +321,12 @@ static void sclp_memory_init(SCLPDevice *sclp)
      * down to align with the nearest increment boundary. */
     initial_mem = initial_mem >> increment_size << increment_size;

-    machine->ram_size = initial_mem;
-    /* let's propagate the changed ram size into the global variable. */
-    ram_size = initial_mem;
+    /* propagate the changed ram size into the different places */
+    if (initial_mem != machine->ram_size) {
+        machine->maxram_size -= machine->ram_size - initial_mem;
+        machine->ram_size = initial_mem;
+        ram_size = initial_mem;
+    }
 }

You would right now overwrite any maxmem setting (which might be ok as
we don't support it yet).

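The difference between the two approaches discussed in this message can be sketched outside of QEMU as follows. This is an illustration under stated assumptions, not QEMU code: struct mem_sizes and both function names are hypothetical stand-ins for the ram_size and maxram_size fields of the machine state, and initial_mem is assumed to be the already rounded-down size.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two machine fields touched by the patches. */
struct mem_sizes {
    uint64_t ram_size;
    uint64_t maxram_size;
};

/* Behaviour of the posted one-line fix: both sizes become the clamped
 * value, so any larger maxmem setting is overwritten. */
static struct mem_sizes clamp_overwrite(struct mem_sizes m, uint64_t initial_mem)
{
    m.ram_size = initial_mem;
    m.maxram_size = initial_mem;
    return m;
}

/* Behaviour of the alternative: shrink maxram_size by the same amount that
 * ram_size lost, which keeps the gap between the two unchanged. */
static struct mem_sizes clamp_keep_headroom(struct mem_sizes m, uint64_t initial_mem)
{
    if (initial_mem != m.ram_size) {
        m.maxram_size -= m.ram_size - initial_mem;
        m.ram_size = initial_mem;
    }
    return m;
}
```

When maxram_size equals ram_size on entry, both variants produce the same result. They only differ when a larger maxmem was requested, in which case the second variant preserves the requested headroom.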
On Mon, 30 Jul 2018 17:43:42 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 30.07.2018 16:09, Christian Borntraeger wrote:
> > We clamp down ram_size to match the sclp increment size. We do
> > not do the same for maxram_size, which means for large guests
> > with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
> > This can break other code (e.g. CMMA migration) which uses maxram_size
> > to calculate the number of pages and then throws some errors.
> >
> > Fixes: 82fab5c5b90e468f3e9d54c ("s390x/sclp: remove memory hotplug support")
> > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > CC: qemu-stable@nongnu.org
> > CC: David Hildenbrand <david@redhat.com>
> > ---
> >  hw/s390x/sclp.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
> > index bd2a024..4510a80 100644
> > --- a/hw/s390x/sclp.c
> > +++ b/hw/s390x/sclp.c
> > @@ -320,6 +320,7 @@ static void sclp_memory_init(SCLPDevice *sclp)
> >      initial_mem = initial_mem >> increment_size << increment_size;
> >
> >      machine->ram_size = initial_mem;
> > +    machine->maxram_size = initial_mem;
> >      /* let's propagate the changed ram size into the global variable. */
> >      ram_size = initial_mem;
> >  }
> >
>
> BTW, I handle it in my private patch like this
>
>  static inline SCLPDevice *get_sclp_device(void)
>  {
> @@ -319,9 +321,12 @@ static void sclp_memory_init(SCLPDevice *sclp)
>       * down to align with the nearest increment boundary. */
>      initial_mem = initial_mem >> increment_size << increment_size;
>
> -    machine->ram_size = initial_mem;
> -    /* let's propagate the changed ram size into the global variable. */
> -    ram_size = initial_mem;
> +    /* propagate the changed ram size into the different places */
> +    if (initial_mem != machine->ram_size) {
> +        machine->maxram_size -= machine->ram_size - initial_mem;
> +        machine->ram_size = initial_mem;
> +        ram_size = initial_mem;
> +    }
>  }
>
> You would right now overwrite any maxmem setting (which might be ok as
> we don't support it yet).
>

So, will you (for whatever value of 'you') submit more patches for 3.1?

On Mon, 30 Jul 2018 16:09:26 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> We clamp down ram_size to match the sclp increment size. We do
> not do the same for maxram_size, which means for large guests
> with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
> This can break other code (e.g. CMMA migration) which uses maxram_size
> to calculate the number of pages and then throws some errors.
>
> Fixes: 82fab5c5b90e468f3e9d54c ("s390x/sclp: remove memory hotplug support")
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> CC: qemu-stable@nongnu.org
> CC: David Hildenbrand <david@redhat.com>
> ---
>  hw/s390x/sclp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
> index bd2a024..4510a80 100644
> --- a/hw/s390x/sclp.c
> +++ b/hw/s390x/sclp.c
> @@ -320,6 +320,7 @@ static void sclp_memory_init(SCLPDevice *sclp)
>      initial_mem = initial_mem >> increment_size << increment_size;
>
>      machine->ram_size = initial_mem;
> +    machine->maxram_size = initial_mem;
>      /* let's propagate the changed ram size into the global variable. */
>      ram_size = initial_mem;
>  }

Thanks, queued to s390-fixes.

Quoting Christian Borntraeger (2018-07-30 10:31:12)
> Are we still able to get things into 2.12.1 or are we too late?

Freeze is EOD today, but I can grab them if they hit master/rc3 tomorrow.

>
>
> On 07/30/2018 04:09 PM, Christian Borntraeger wrote:
> > We clamp down ram_size to match the sclp increment size. We do
> > not do the same for maxram_size, which means for large guests
> > with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
> > This can break other code (e.g. CMMA migration) which uses maxram_size
> > to calculate the number of pages and then throws some errors.
> >
> > Fixes: 82fab5c5b90e468f3e9d54c ("s390x/sclp: remove memory hotplug support")
> > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > CC: qemu-stable@nongnu.org
> > CC: David Hildenbrand <david@redhat.com>
> > ---
> >  hw/s390x/sclp.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
> > index bd2a024..4510a80 100644
> > --- a/hw/s390x/sclp.c
> > +++ b/hw/s390x/sclp.c
> > @@ -320,6 +320,7 @@ static void sclp_memory_init(SCLPDevice *sclp)
> >      initial_mem = initial_mem >> increment_size << increment_size;
> >
> >      machine->ram_size = initial_mem;
> > +    machine->maxram_size = initial_mem;
> >      /* let's propagate the changed ram size into the global variable. */
> >      ram_size = initial_mem;
> >  }
> >
>

On Mon, 30 Jul 2018 11:58:13 -0500
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> Quoting Christian Borntraeger (2018-07-30 10:31:12)
> > Are we still able to get things into 2.12.1 or are we too late?
>
> Freeze is EOD today, but I can grab them if they hit master/rc3 tomorrow.

OK, I just sent a pull request with this patch.

On 30.07.2018 17:47, Cornelia Huck wrote:
> On Mon, 30 Jul 2018 17:43:42 +0200
> David Hildenbrand <david@redhat.com> wrote:
>
>> On 30.07.2018 16:09, Christian Borntraeger wrote:
>>> We clamp down ram_size to match the sclp increment size. We do
>>> not do the same for maxram_size, which means for large guests
>>> with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
>>> This can break other code (e.g. CMMA migration) which uses maxram_size
>>> to calculate the number of pages and then throws some errors.
>>>
>>> Fixes: 82fab5c5b90e468f3e9d54c ("s390x/sclp: remove memory hotplug support")
>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>> CC: qemu-stable@nongnu.org
>>> CC: David Hildenbrand <david@redhat.com>
>>> ---
>>>  hw/s390x/sclp.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
>>> index bd2a024..4510a80 100644
>>> --- a/hw/s390x/sclp.c
>>> +++ b/hw/s390x/sclp.c
>>> @@ -320,6 +320,7 @@ static void sclp_memory_init(SCLPDevice *sclp)
>>>      initial_mem = initial_mem >> increment_size << increment_size;
>>>
>>>      machine->ram_size = initial_mem;
>>> +    machine->maxram_size = initial_mem;
>>>      /* let's propagate the changed ram size into the global variable. */
>>>      ram_size = initial_mem;
>>>  }
>>>
>>
>> BTW, I handle it in my private patch like this
>>
>>  static inline SCLPDevice *get_sclp_device(void)
>>  {
>> @@ -319,9 +321,12 @@ static void sclp_memory_init(SCLPDevice *sclp)
>>       * down to align with the nearest increment boundary. */
>>      initial_mem = initial_mem >> increment_size << increment_size;
>>
>> -    machine->ram_size = initial_mem;
>> -    /* let's propagate the changed ram size into the global variable. */
>> -    ram_size = initial_mem;
>> +    /* propagate the changed ram size into the different places */
>> +    if (initial_mem != machine->ram_size) {
>> +        machine->maxram_size -= machine->ram_size - initial_mem;
>> +        machine->ram_size = initial_mem;
>> +        ram_size = initial_mem;
>> +    }
>>  }
>>
>> You would right now overwrite any maxmem setting (which might be ok as
>> we don't support it yet).
>>
>
> So, will you (for whatever value of 'you') submit more patches for 3.1?
>

Once we have memory device support for s390x that will be needed.

But the person that implemented cmma migration should fix the handling
and only try to migrate memory that is actually there. This was broken
before my patch, just never happened as Linux always onlines all memory
it sees via SCLP (thereby creating the memory regions).

On Tue, 31 Jul 2018 08:52:05 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 30 Jul 2018 11:58:13 -0500
> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>
> > Quoting Christian Borntraeger (2018-07-30 10:31:12)
> > > Are we still able to get things into 2.12.1 or are we too late?
> >
> > Freeze is EOD today, but I can grab them if they hit master/rc3 tomorrow.
>
> OK, I just sent a pull request with this patch.

...and it is in master now.

diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
index bd2a024..4510a80 100644
--- a/hw/s390x/sclp.c
+++ b/hw/s390x/sclp.c
@@ -320,6 +320,7 @@ static void sclp_memory_init(SCLPDevice *sclp)
     initial_mem = initial_mem >> increment_size << increment_size;
 
     machine->ram_size = initial_mem;
+    machine->maxram_size = initial_mem;
     /* let's propagate the changed ram size into the global variable. */
     ram_size = initial_mem;
 }

We clamp down ram_size to match the sclp increment size. We do
not do the same for maxram_size, which means for large guests
with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
This can break other code (e.g. CMMA migration) which uses maxram_size
to calculate the number of pages and then throws some errors.

Fixes: 82fab5c5b90e468f3e9d54c ("s390x/sclp: remove memory hotplug support")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: qemu-stable@nongnu.org
CC: David Hildenbrand <david@redhat.com>
---
 hw/s390x/sclp.c | 1 +
 1 file changed, 1 insertion(+)
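
For readers who want to see why a value such as -m 50000 is affected, the rounding in the commit message can be worked through with a small standalone sketch. The expression mem >> shift << shift is taken from the diff. Everything else is an assumption made for illustration: the function names are hypothetical, and the two constants (a minimum increment of 1 MiB and at most 1020 increments) reflect my recollection of how sclp_memory_init picks increment_size and should be checked against the QEMU source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limits, to be verified against QEMU: increments are a power of
 * two of at least 1 MiB (shift 20), and at most 1020 of them are used. */
#define ASSUMED_MIN_INCREMENT_SHIFT 20
#define ASSUMED_MAX_INCREMENTS 1020

/* Hypothetical helper: smallest shift such that the memory size fits into
 * the assumed maximum number of increments. */
static unsigned increment_shift_for(uint64_t mem_bytes)
{
    unsigned shift = ASSUMED_MIN_INCREMENT_SHIFT;

    while ((mem_bytes >> shift) > ASSUMED_MAX_INCREMENTS) {
        shift++;
    }
    return shift;
}

/* Round a size down to a multiple of the increment, using the same
 * shift-right-then-left expression as the diff. */
static uint64_t clamp_to_increment(uint64_t mem_bytes, unsigned shift)
{
    assert(shift < 64);
    return mem_bytes >> shift << shift;
}
```

Under these assumptions, 50000 MiB yields a shift of 26 (64 MiB increments), and 50000 is not a multiple of 64, so the size is rounded down to 781 * 64 = 49984 MiB. Sizes that are already a multiple of the chosen increment pass through unchanged.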