Message ID | 20221107130217.2243815-1-sunilvl@ventanamicro.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [V2] hw/riscv: virt: Remove size restriction for pflash | expand |
On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: > > The pflash implementation currently assumes fixed size of the > backend storage. Due to this, the backend storage file needs to be > exactly of size 32M. Otherwise, there will be an error like below. > > "device requires 33554432 bytes, block backend provides 4194304 bytes" > > Fix this issue by using the actual size of the backing store. > > Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> > --- Do you really want the flash device size presented to the guest to be variable depending on what the user passed as a block backend? I don't think this is how we handle flash devices on other boards... thanks -- PMM
On 7/11/22 14:06, Peter Maydell wrote: > On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: >> >> The pflash implementation currently assumes fixed size of the >> backend storage. Due to this, the backend storage file needs to be >> exactly of size 32M. Otherwise, there will be an error like below. >> >> "device requires 33554432 bytes, block backend provides 4194304 bytes" >> >> Fix this issue by using the actual size of the backing store. >> >> Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> >> --- > > Do you really want the flash device size presented to the guest > to be variable depending on what the user passed as a block backend? > I don't think this is how we handle flash devices on other boards... Ideally handling smaller/bigger backend size should be transparent for machine frontend, but we never agreed on what are user expectations and how to deal with such cases. Long term I'd go for: - if flash is read-only a/ bigger backend: display a warning and ignore extra backend data. b/ smaller backend: assume flash block is in erased state and fill missing gap with -1 (the default erase value), displaying a warning on startup. - if flash is read-write a/ bigger backend: display a warning and ignore extra backend data. b/ smaller backend: add a property to pflash device to handle missing gap as erased data. If this flag is not set, display a hint and exit with an error. In Sunil particular case, I suppose the issue comes from commit 334c388f25 ("hw/block/pflash_cfi0{1, 2}: Error out if device length isn't a power of two") which I'm going to revert because the code base is not ready for such check: https://lore.kernel.org/qemu-devel/78b914c5-ce7e-1d4a-0a67-450f286eb869@linaro.org/ Regards, Phil.
On Mon, Nov 07, 2022 at 01:06:38PM +0000, Peter Maydell wrote: > On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: > > > > The pflash implementation currently assumes fixed size of the > > backend storage. Due to this, the backend storage file needs to be > > exactly of size 32M. Otherwise, there will be an error like below. > > > > "device requires 33554432 bytes, block backend provides 4194304 bytes" > > > > Fix this issue by using the actual size of the backing store. > > > > Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> > > --- > > Do you really want the flash device size presented to the guest > to be variable depending on what the user passed as a block backend? > I don't think this is how we handle flash devices on other boards... > Hi Peter, x86 appears to support variable flash but arm doesn't. What is the reason for not supporting variable size flash in arm? Thanks Sunil
Sunil V L <sunilvl@ventanamicro.com> writes: > On Mon, Nov 07, 2022 at 01:06:38PM +0000, Peter Maydell wrote: >> On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: >> > >> > The pflash implementation currently assumes fixed size of the >> > backend storage. Due to this, the backend storage file needs to be >> > exactly of size 32M. Otherwise, there will be an error like below. >> > >> > "device requires 33554432 bytes, block backend provides 4194304 bytes" >> > >> > Fix this issue by using the actual size of the backing store. >> > >> > Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> >> > --- >> >> Do you really want the flash device size presented to the guest >> to be variable depending on what the user passed as a block backend? >> I don't think this is how we handle flash devices on other boards... >> > > Hi Peter, > > x86 appears to support variable flash but arm doesn't. What is > the reason for not supporting variable size flash in arm? If I recall from the last time we went around this is was the question of what you should pad it with. https://patchew.org/QEMU/20190307093723.655-1-armbru@redhat.com/20190307093723.655-3-armbru@redhat.com/ > > Thanks > Sunil
Philippe Mathieu-Daudé <philmd@linaro.org> writes: > On 7/11/22 14:06, Peter Maydell wrote: >> On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: >>> >>> The pflash implementation currently assumes fixed size of the >>> backend storage. Intentional. commit 06f1521795207359a395996c253c306f4ab7586e Author: Markus Armbruster <armbru@redhat.com> Date: Tue Mar 19 17:35:50 2019 +0100 pflash: Require backend size to match device, improve errors We reject undersized backends with a rather enigmatic "failed to read the initial flash content" error. For instance: $ qemu-system-ppc64 -S -display none -M sam460ex -drive if=pflash,format=raw,file=eins.img qemu-system-ppc64: Initialization of device cfi.pflash02 failed: failed to read the initial flash content We happily accept oversized images, ignoring their tail. Throwing away parts of firmware that way is pretty much certain to end in an even more enigmatic failure to boot. Require the backend's size to match the device's size exactly. Report mismatch like this: qemu-system-ppc64: Initialization of device cfi.pflash01 failed: device requires 1048576 bytes, block backend provides 512 bytes Improve the error for actual read failures to "can't read block backend". To avoid duplicating even more code between the two pflash device models, do all that in new helper blk_check_size_and_read_all(). The error reporting can still be confusing. For instance: qemu-system-ppc64 -S -display none -M taihu -drive if=pflash,format=raw,file=eins.img -drive if=pflash,unit=1,format=raw,file=zwei.img qemu-system-ppc64: Initialization of device cfi.pflash02 failed: device requires 2097152 bytes, block backend provides 512 bytes Leaves the user guessing which of the two -drive is wrong. Mention the issue in a TODO comment. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190319163551.32499-2-armbru@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> >>> Due to this, the backend storage file needs to be >>> exactly of size 32M. Otherwise, there will be an error like below. >>> >>> "device requires 33554432 bytes, block backend provides 4194304 bytes" Why is that a problem? Genuine question! >>> Fix this issue by using the actual size of the backing store. >>> >>> Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> >>> --- >> Do you really want the flash device size presented to the guest >> to be variable depending on what the user passed as a block backend? >> I don't think this is how we handle flash devices on other boards... Flash device is generally a property of the machine type. Similar to physical machines. Not an accident. > Ideally handling smaller/bigger backend size should be transparent for > machine frontend, but we never agreed on what are user expectations and > how to deal with such cases. > > Long term I'd go for: > > - if flash is read-only > > a/ bigger backend: display a warning and ignore extra backend data. Truncating images seems unlikely to be useful. > b/ smaller backend: assume flash block is in erased state and fill > missing gap with -1 (the default erase value), displaying a warning > on startup. Padding has a better chance to work. But is it worth the trouble? > > - if flash is read-write > > a/ bigger backend: display a warning and ignore extra backend data. > > b/ smaller backend: add a property to pflash device to handle missing > gap as erased data. If this flag is not set, display a hint and > exit with an error. What happens when the guest writes to the part that isn't backed by the backend? Is this worth the trouble? > In Sunil particular case, I suppose the issue comes from commit > 334c388f25 ("hw/block/pflash_cfi0{1, 2}: Error out if device length > isn't a power of two") which I'm going to revert because the code > base is not ready for such check: > > https://lore.kernel.org/qemu-devel/78b914c5-ce7e-1d4a-0a67-450f286eb869@linaro.org/ > > Regards, > > Phil.
On Mon, 7 Nov 2022 at 14:08, Sunil V L <sunilvl@ventanamicro.com> wrote: > > On Mon, Nov 07, 2022 at 01:06:38PM +0000, Peter Maydell wrote: > > On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: > > > > > > The pflash implementation currently assumes fixed size of the > > > backend storage. Due to this, the backend storage file needs to be > > > exactly of size 32M. Otherwise, there will be an error like below. > > > > > > "device requires 33554432 bytes, block backend provides 4194304 bytes" > > > > > > Fix this issue by using the actual size of the backing store. > > > > > > Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> > > > --- > > > > Do you really want the flash device size presented to the guest > > to be variable depending on what the user passed as a block backend? > > I don't think this is how we handle flash devices on other boards... > > > x86 appears to support variable flash but arm doesn't. What is > the reason for not supporting variable size flash in arm? Mostly, that that's the straightforward thing to code. Partly, that it avoids weird cases (eg you can have a backing file that's an odd number of bytes but you can't have a flash that size). If x86 has a standard way of handling this then I'm all in favour of being consistent across platforms. What's the x86 board doing there? thanks -- PMM
On Mon, Nov 07, 2022 at 03:50:44PM +0000, Alex Bennée wrote: > > Sunil V L <sunilvl@ventanamicro.com> writes: > > > On Mon, Nov 07, 2022 at 01:06:38PM +0000, Peter Maydell wrote: > >> On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: > >> > > >> > The pflash implementation currently assumes fixed size of the > >> > backend storage. Due to this, the backend storage file needs to be > >> > exactly of size 32M. Otherwise, there will be an error like below. > >> > > >> > "device requires 33554432 bytes, block backend provides 4194304 bytes" > >> > > >> > Fix this issue by using the actual size of the backing store. > >> > > >> > Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> > >> > --- > >> > >> Do you really want the flash device size presented to the guest > >> to be variable depending on what the user passed as a block backend? > >> I don't think this is how we handle flash devices on other boards... > >> > > > > Hi Peter, > > > > x86 appears to support variable flash but arm doesn't. What is > > the reason for not supporting variable size flash in arm? > > If I recall from the last time we went around this is was the question > of what you should pad it with. Padding is a very good thing from the POV of upgrades. Firmware has shown a tendancy to change (grow) over time, and the size has an impact of the guest ABI/live migration state. To be able to live migrate, or save/restore to/from files, then the machine firmware size needs to be sufficient to cope with future size changes of the firmware. The best way to deal with this is to not use the firmware binaries' minimum compiled size, but instead to pad it upto a higher boundary. Enforcing such padding is a decent way to prevent users from inadvertantly painting themselves into a corner with a very specific firmware binary size at initial boot. With regards, Daniel
On Mon, Nov 07, 2022 at 04:19:10PM +0000, Daniel P. Berrangé wrote: > On Mon, Nov 07, 2022 at 03:50:44PM +0000, Alex Bennée wrote: > > > > Sunil V L <sunilvl@ventanamicro.com> writes: > > > > > On Mon, Nov 07, 2022 at 01:06:38PM +0000, Peter Maydell wrote: > > >> On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: > > >> > > > >> > The pflash implementation currently assumes fixed size of the > > >> > backend storage. Due to this, the backend storage file needs to be > > >> > exactly of size 32M. Otherwise, there will be an error like below. > > >> > > > >> > "device requires 33554432 bytes, block backend provides 4194304 bytes" > > >> > > > >> > Fix this issue by using the actual size of the backing store. > > >> > > > >> > Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> > > >> > --- > > >> > > >> Do you really want the flash device size presented to the guest > > >> to be variable depending on what the user passed as a block backend? > > >> I don't think this is how we handle flash devices on other boards... > > >> > > > > > > Hi Peter, > > > > > > x86 appears to support variable flash but arm doesn't. What is > > > the reason for not supporting variable size flash in arm? > > > > If I recall from the last time we went around this is was the question > > of what you should pad it with. > > Padding is a very good thing from the POV of upgrades. Firmware has shown > a tendancy to change (grow) over time, and the size has an impact of the > guest ABI/live migration state. > > To be able to live migrate, or save/restore to/from files, then the machine > firmware size needs to be sufficient to cope with future size changes of > the firmware. The best way to deal with this is to not use the firmware > binaries' minimum compiled size, but instead to pad it upto a higher > boundary. > > Enforcing such padding is a decent way to prevent users from inadvertantly > painting themselves into a corner with a very specific firmware binary > size at initial boot. Padding is a good idea, but too much causes other problems. When building lightweight VMs which may pull the firmware image from a network, AArch64 VMs require 64MB of mostly zeros to be transferred first, which can become a substantial amount of the overall boot time[*]. Being able to create images smaller than the total flash device size, but still add some pad for later growth, seems like the happy-medium to shoot for. [*] My web searching skills are failing me at the moment, but I recall seeing a BZ or gitlab/github issue specifically pointing out the AArch64 64MB firmware size as a pain point. Thanks, drew
On Mon, Nov 07, 2022 at 06:32:01PM +0100, Andrew Jones wrote: > On Mon, Nov 07, 2022 at 04:19:10PM +0000, Daniel P. Berrangé wrote: > > On Mon, Nov 07, 2022 at 03:50:44PM +0000, Alex Bennée wrote: > > > > > > Sunil V L <sunilvl@ventanamicro.com> writes: > > > > > > > On Mon, Nov 07, 2022 at 01:06:38PM +0000, Peter Maydell wrote: > > > >> On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: > > > >> > > > > >> > The pflash implementation currently assumes fixed size of the > > > >> > backend storage. Due to this, the backend storage file needs to be > > > >> > exactly of size 32M. Otherwise, there will be an error like below. > > > >> > > > > >> > "device requires 33554432 bytes, block backend provides 4194304 bytes" > > > >> > > > > >> > Fix this issue by using the actual size of the backing store. > > > >> > > > > >> > Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> > > > >> > --- > > > >> > > > >> Do you really want the flash device size presented to the guest > > > >> to be variable depending on what the user passed as a block backend? > > > >> I don't think this is how we handle flash devices on other boards... > > > >> > > > > > > > > Hi Peter, > > > > > > > > x86 appears to support variable flash but arm doesn't. What is > > > > the reason for not supporting variable size flash in arm? > > > > > > If I recall from the last time we went around this is was the question > > > of what you should pad it with. > > > > Padding is a very good thing from the POV of upgrades. Firmware has shown > > a tendancy to change (grow) over time, and the size has an impact of the > > guest ABI/live migration state. > > > > To be able to live migrate, or save/restore to/from files, then the machine > > firmware size needs to be sufficient to cope with future size changes of > > the firmware. The best way to deal with this is to not use the firmware > > binaries' minimum compiled size, but instead to pad it upto a higher > > boundary. > > > > Enforcing such padding is a decent way to prevent users from inadvertantly > > painting themselves into a corner with a very specific firmware binary > > size at initial boot. > > Padding is a good idea, but too much causes other problems. When building > lightweight VMs which may pull the firmware image from a network, > AArch64 VMs require 64MB of mostly zeros to be transferred first, which > can become a substantial amount of the overall boot time[*]. Being able to > create images smaller than the total flash device size, but still add some > pad for later growth, seems like the happy-medium to shoot for. QEMU configures the firmware using -blockdev, so can use any file format that QEMU supports at the block layer. IOW, you can store the firmware in a qcow2 file and thus you will never fetch any of the padding zeros to be transferred. That said I'm not sure that libvirt supports anything other than a raw file today. With regards, Daniel
On 7/11/22 18:34, Daniel P. Berrangé wrote: > On Mon, Nov 07, 2022 at 06:32:01PM +0100, Andrew Jones wrote: >> On Mon, Nov 07, 2022 at 04:19:10PM +0000, Daniel P. Berrangé wrote: >>> On Mon, Nov 07, 2022 at 03:50:44PM +0000, Alex Bennée wrote: >>>> >>>> Sunil V L <sunilvl@ventanamicro.com> writes: >>>> >>>>> On Mon, Nov 07, 2022 at 01:06:38PM +0000, Peter Maydell wrote: >>>>>> On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: >>>>>>> >>>>>>> The pflash implementation currently assumes fixed size of the >>>>>>> backend storage. Due to this, the backend storage file needs to be >>>>>>> exactly of size 32M. Otherwise, there will be an error like below. >>>>>>> >>>>>>> "device requires 33554432 bytes, block backend provides 4194304 bytes" >>>>>>> >>>>>>> Fix this issue by using the actual size of the backing store. >>>>>>> >>>>>>> Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> >>>>>>> --- >>>>>> >>>>>> Do you really want the flash device size presented to the guest >>>>>> to be variable depending on what the user passed as a block backend? >>>>>> I don't think this is how we handle flash devices on other boards... >>>>>> >>>>> >>>>> Hi Peter, >>>>> >>>>> x86 appears to support variable flash but arm doesn't. What is >>>>> the reason for not supporting variable size flash in arm? >>>> >>>> If I recall from the last time we went around this is was the question >>>> of what you should pad it with. >>> >>> Padding is a very good thing from the POV of upgrades. Firmware has shown >>> a tendancy to change (grow) over time, and the size has an impact of the >>> guest ABI/live migration state. >>> >>> To be able to live migrate, or save/restore to/from files, then the machine >>> firmware size needs to be sufficient to cope with future size changes of >>> the firmware. The best way to deal with this is to not use the firmware >>> binaries' minimum compiled size, but instead to pad it upto a higher >>> boundary. >>> >>> Enforcing such padding is a decent way to prevent users from inadvertantly >>> painting themselves into a corner with a very specific firmware binary >>> size at initial boot. >> >> Padding is a good idea, but too much causes other problems. When building >> lightweight VMs which may pull the firmware image from a network, >> AArch64 VMs require 64MB of mostly zeros to be transferred first, which >> can become a substantial amount of the overall boot time[*]. Being able to >> create images smaller than the total flash device size, but still add some >> pad for later growth, seems like the happy-medium to shoot for. > > QEMU configures the firmware using -blockdev, so can use any file > format that QEMU supports at the block layer. IOW, you can store > the firmware in a qcow2 file and thus you will never fetch any > of the padding zeros to be transferred. That said I'm not sure > that libvirt supports anything other than a raw file today. Drew might be referring to: https://lore.kernel.org/qemu-devel/20210810134050.396747-1-david.edmondson@oracle.com/ > Currently ARM UEFI images are typically built as 2MB/768kB flash > images for code and variables respectively. These images are both > then padded out to 64MB before being loaded by QEMU. > > Because the images are 64MB each, QEMU allocates 128MB of memory to > read them, and then proceeds to read all 128MB from disk (dirtying > the memory). Of this 128MB less than 3MB is useful - the rest is > zero padding. > > On a machine with 100 VMs this wastes over 12GB of memory. See previous attempts: - Huawei https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html - Tencent https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html - Oracle https://www.mail-archive.com/qemu-devel@nongnu.org/msg760065.html - Red Hat https://www.mail-archive.com/qemu-block@nongnu.org/msg81714.html
On Tue, Nov 08, 2022 at 03:12:42PM +0100, Philippe Mathieu-Daudé wrote: > On 7/11/22 18:34, Daniel P. Berrangé wrote: > > On Mon, Nov 07, 2022 at 06:32:01PM +0100, Andrew Jones wrote: > > > On Mon, Nov 07, 2022 at 04:19:10PM +0000, Daniel P. Berrangé wrote: > > > > On Mon, Nov 07, 2022 at 03:50:44PM +0000, Alex Bennée wrote: > > > > > > > > > > Sunil V L <sunilvl@ventanamicro.com> writes: > > > > > > > > > > > On Mon, Nov 07, 2022 at 01:06:38PM +0000, Peter Maydell wrote: > > > > > > > On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: > > > > > > > > > > > > > > > > The pflash implementation currently assumes fixed size of the > > > > > > > > backend storage. Due to this, the backend storage file needs to be > > > > > > > > exactly of size 32M. Otherwise, there will be an error like below. > > > > > > > > > > > > > > > > "device requires 33554432 bytes, block backend provides 4194304 bytes" > > > > > > > > > > > > > > > > Fix this issue by using the actual size of the backing store. > > > > > > > > > > > > > > > > Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> > > > > > > > > --- > > > > > > > > > > > > > > Do you really want the flash device size presented to the guest > > > > > > > to be variable depending on what the user passed as a block backend? > > > > > > > I don't think this is how we handle flash devices on other boards... > > > > > > > > > > > > > > > > > > > Hi Peter, > > > > > > > > > > > > x86 appears to support variable flash but arm doesn't. What is > > > > > > the reason for not supporting variable size flash in arm? > > > > > > > > > > If I recall from the last time we went around this is was the question > > > > > of what you should pad it with. > > > > > > > > Padding is a very good thing from the POV of upgrades. Firmware has shown > > > > a tendancy to change (grow) over time, and the size has an impact of the > > > > guest ABI/live migration state. > > > > > > > > To be able to live migrate, or save/restore to/from files, then the machine > > > > firmware size needs to be sufficient to cope with future size changes of > > > > the firmware. The best way to deal with this is to not use the firmware > > > > binaries' minimum compiled size, but instead to pad it upto a higher > > > > boundary. > > > > > > > > Enforcing such padding is a decent way to prevent users from inadvertantly > > > > painting themselves into a corner with a very specific firmware binary > > > > size at initial boot. > > > > > > Padding is a good idea, but too much causes other problems. When building > > > lightweight VMs which may pull the firmware image from a network, > > > AArch64 VMs require 64MB of mostly zeros to be transferred first, which > > > can become a substantial amount of the overall boot time[*]. Being able to > > > create images smaller than the total flash device size, but still add some > > > pad for later growth, seems like the happy-medium to shoot for. > > > > QEMU configures the firmware using -blockdev, so can use any file > > format that QEMU supports at the block layer. IOW, you can store > > the firmware in a qcow2 file and thus you will never fetch any > > of the padding zeros to be transferred. That said I'm not sure > > that libvirt supports anything other than a raw file today. > > Drew might be referring to: > > https://lore.kernel.org/qemu-devel/20210810134050.396747-1-david.edmondson@oracle.com/ I was thinking of a different one, but thanks for the reference to this one. Daniel's qcow2 proposal sounds like something worth investigating, since some padding makes sense, but the guess for how much will likely be wrong one direction or the other. Thanks, drew > > > Currently ARM UEFI images are typically built as 2MB/768kB flash > > images for code and variables respectively. These images are both > > then padded out to 64MB before being loaded by QEMU. > > > > Because the images are 64MB each, QEMU allocates 128MB of memory to > > read them, and then proceeds to read all 128MB from disk (dirtying > > the memory). Of this 128MB less than 3MB is useful - the rest is > > zero padding. > > > > On a machine with 100 VMs this wastes over 12GB of memory. > > See previous attempts: > - Huawei > https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html > - Tencent > https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html > - Oracle > https://www.mail-archive.com/qemu-devel@nongnu.org/msg760065.html > - Red Hat > https://www.mail-archive.com/qemu-block@nongnu.org/msg81714.html >
On Tue, Nov 08, 2022 at 03:12:42PM +0100, Philippe Mathieu-Daudé wrote: > On 7/11/22 18:34, Daniel P. Berrangé wrote: > > On Mon, Nov 07, 2022 at 06:32:01PM +0100, Andrew Jones wrote: > > > On Mon, Nov 07, 2022 at 04:19:10PM +0000, Daniel P. Berrangé wrote: > > > > On Mon, Nov 07, 2022 at 03:50:44PM +0000, Alex Bennée wrote: > > > > > > > > > > Sunil V L <sunilvl@ventanamicro.com> writes: > > > > > > > > > > > On Mon, Nov 07, 2022 at 01:06:38PM +0000, Peter Maydell wrote: > > > > > > > On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: > > > > > > > > > > > > > > > > The pflash implementation currently assumes fixed size of the > > > > > > > > backend storage. Due to this, the backend storage file needs to be > > > > > > > > exactly of size 32M. Otherwise, there will be an error like below. > > > > > > > > > > > > > > > > "device requires 33554432 bytes, block backend provides 4194304 bytes" > > > > > > > > > > > > > > > > Fix this issue by using the actual size of the backing store. > > > > > > > > > > > > > > > > Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> > > > > > > > > --- > > > > > > > > > > > > > > Do you really want the flash device size presented to the guest > > > > > > > to be variable depending on what the user passed as a block backend? > > > > > > > I don't think this is how we handle flash devices on other boards... > > > > > > > > > > > > > > > > > > > Hi Peter, > > > > > > > > > > > > x86 appears to support variable flash but arm doesn't. What is > > > > > > the reason for not supporting variable size flash in arm? > > > > > > > > > > If I recall from the last time we went around this is was the question > > > > > of what you should pad it with. > > > > > > > > Padding is a very good thing from the POV of upgrades. Firmware has shown > > > > a tendancy to change (grow) over time, and the size has an impact of the > > > > guest ABI/live migration state. > > > > > > > > To be able to live migrate, or save/restore to/from files, then the machine > > > > firmware size needs to be sufficient to cope with future size changes of > > > > the firmware. The best way to deal with this is to not use the firmware > > > > binaries' minimum compiled size, but instead to pad it upto a higher > > > > boundary. > > > > > > > > Enforcing such padding is a decent way to prevent users from inadvertantly > > > > painting themselves into a corner with a very specific firmware binary > > > > size at initial boot. > > > > > > Padding is a good idea, but too much causes other problems. When building > > > lightweight VMs which may pull the firmware image from a network, > > > AArch64 VMs require 64MB of mostly zeros to be transferred first, which > > > can become a substantial amount of the overall boot time[*]. Being able to > > > create images smaller than the total flash device size, but still add some > > > pad for later growth, seems like the happy-medium to shoot for. > > > > QEMU configures the firmware using -blockdev, so can use any file > > format that QEMU supports at the block layer. IOW, you can store > > the firmware in a qcow2 file and thus you will never fetch any > > of the padding zeros to be transferred. That said I'm not sure > > that libvirt supports anything other than a raw file today. > > Drew might be referring to: > > https://lore.kernel.org/qemu-devel/20210810134050.396747-1-david.edmondson@oracle.com/ > > > Currently ARM UEFI images are typically built as 2MB/768kB flash > > images for code and variables respectively. These images are both > > then padded out to 64MB before being loaded by QEMU. > > > > Because the images are 64MB each, QEMU allocates 128MB of memory to > > read them, and then proceeds to read all 128MB from disk (dirtying > > the memory). Of this 128MB less than 3MB is useful - the rest is > > zero padding. > > > > On a machine with 100 VMs this wastes over 12GB of memory. > > See previous attempts: > - Huawei > https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html > - Tencent > https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html > - Oracle > https://www.mail-archive.com/qemu-devel@nongnu.org/msg760065.html > - Red Hat > https://www.mail-archive.com/qemu-block@nongnu.org/msg81714.html I've not looked at the patch series above in detail, but I'm thinking that even if the file is 64 MB in size, it should never read all 64 MB of data. The block layer APIs have ability to detect and report holes, so it ought to be possible to only read the data which is non-zero, and thus avoid dirtying more than 3 MB of useful With regards, Daniel
Peter Maydell <peter.maydell@linaro.org> writes: > On Mon, 7 Nov 2022 at 14:08, Sunil V L <sunilvl@ventanamicro.com> wrote: >> >> On Mon, Nov 07, 2022 at 01:06:38PM +0000, Peter Maydell wrote: >> > On Mon, 7 Nov 2022 at 13:03, Sunil V L <sunilvl@ventanamicro.com> wrote: >> > > >> > > The pflash implementation currently assumes fixed size of the >> > > backend storage. Due to this, the backend storage file needs to be >> > > exactly of size 32M. Otherwise, there will be an error like below. >> > > >> > > "device requires 33554432 bytes, block backend provides 4194304 bytes" >> > > >> > > Fix this issue by using the actual size of the backing store. >> > > >> > > Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> >> > > --- >> > >> > Do you really want the flash device size presented to the guest >> > to be variable depending on what the user passed as a block backend? >> > I don't think this is how we handle flash devices on other boards... >> > > >> x86 appears to support variable flash but arm doesn't. What is >> the reason for not supporting variable size flash in arm? > > Mostly, that that's the straightforward thing to code. > Partly, that it avoids weird cases (eg you can have a backing > file that's an odd number of bytes but you can't have a > flash that size). > > If x86 has a standard way of handling this then I'm all > in favour of being consistent across platforms. What's > the x86 board doing there? I'm hardly the expert here, but I messed with it at some time... I guess I can try to answer. It's complicated. More often than not, long development history + need for backward compatibility = complicated. Which makes it (in my opinion) not a useful example to follow. We use either ROM or flash device(s) to hold the BIOS. The flash option exists since v1.1. The user interface for picking one or the other, and configure contents has a long and complicated history. I'll spare you the details. ROM contents comes from an image file. Configurable with -bios. Default depends on the machine type. ROM size is the image file size. It's mapped so it ends right before address 2^32. It's mapped a second time so it ends right before 2^20. If the image file is larger than 128KiB, only the last 128KiB are mapped there. Flash contents comes from a block backend. Configurable with machine properties "pflash0" and "pflash1" (or legacy -drive if=pflash). There is no default. If you don't configure flash contents, you get ROM instead. Flash device size is the block backend size. Must be a multiple of 4KiB. If "pflash0" is configured, we create a flash device so it ends right before address 2^32. If "pflash1" is also configured, we create a second flash device so it ends right before the first one (no gap). Combined size must not exceed a limit that depends on the machine type. Aside: why two flash devices? A physical machine has just one... Well, we need a read-only part for the code, and a read-write part for persistent configuration ("varstore" in UEFI parlance). Physical flash devices let you write-protect partially, but our device models don't implement that, it's all or nothing. So we use two. Kludge. Again, there's a second mapping that ends right before 2^20, limited to 128KiB. In my opinion, setting flash or ROM size to the size of the block backend or image file is a bad idea. It tangles up configuration of frontend and backend. We used to do this a lot (e.g. -drive). Untangling (e.g. -device and -blockdev) was a lot of work. With the tangle kept around for compatibility. Doug McIlroy's quip applies: "KISS became MICAHI: make it complicated and hide it." A physical machine has a fixed flash memory size. A virtual machine models a physical machine. It has a fixed flash memory size, too. If we want a machine type to model multiple variations of a physical machine, say multiple flash sizes, the configuration knob really belongs to the machine type. Hiding it somewhere on the file system instead is a bad idea.
Thanks a lot for the feedback and history. I understand now there are good reasons to mandate fixed size flash. Will drop this patch. Thanks Sunil
Daniel P. Berrangé <berrange@redhat.com> writes: > On Mon, Nov 07, 2022 at 06:32:01PM +0100, Andrew Jones wrote: [...] >> Padding is a good idea, but too much causes other problems. When building >> lightweight VMs which may pull the firmware image from a network, >> AArch64 VMs require 64MB of mostly zeros to be transferred first, which >> can become a substantial amount of the overall boot time[*]. Being able to >> create images smaller than the total flash device size, but still add some >> pad for later growth, seems like the happy-medium to shoot for. > > QEMU configures the firmware using -blockdev, Yes, even though the devices in question are not block devices. > so can use any file > format that QEMU supports at the block layer. IOW, you can store > the firmware in a qcow2 file and thus you will never fetch any > of the padding zeros to be transferred. That said I'm not sure > that libvirt supports anything other than a raw file today. Here's another idea. The "raw" format supports exposing a slice of the underlying block node (options @offset and @size). It could support padding. Writing to the padding should then grow the underlying node. Taking a step back to look at the bigger picture... there are three issues, I think: (A) Storing padding on disk is wasteful. Use a file system that supports sparse files, or an image format that can represent the padding efficiently. (B) Reading padding into memory is wasteful. Matters mostly when a network is involved. Use an image format that can represent the padding efficiently. (C) Dirtying memory for padding is wasteful. I figure KSM could turn zero-padding into holes. We could play with mmap() & friends. Other ideas? Any solution needs to work both for read-only and read/write padding. Throwing away data written to the padding on cold restart is not what I'd regard as "works".
On Wed, Nov 09, 2022 at 04:26:53PM +0100, Markus Armbruster wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Mon, Nov 07, 2022 at 06:32:01PM +0100, Andrew Jones wrote: > > [...] > > >> Padding is a good idea, but too much causes other problems. When building > >> lightweight VMs which may pull the firmware image from a network, > >> AArch64 VMs require 64MB of mostly zeros to be transferred first, which > >> can become a substantial amount of the overall boot time[*]. Being able to > >> create images smaller than the total flash device size, but still add some > >> pad for later growth, seems like the happy-medium to shoot for. > > > > QEMU configures the firmware using -blockdev, > > Yes, even though the devices in question are not block devices. > > > so can use any file > > format that QEMU supports at the block layer. IOW, you can store > > the firmware in a qcow2 file and thus you will never fetch any > > of the padding zeros to be transferred. That said I'm not sure > > that libvirt supports anything other than a raw file today. > > Here's another idea. The "raw" format supports exposing a slice of the > underlying block node (options @offset and @size). It could support > padding. Writing to the padding should then grow the underlying node. > > Taking a step back to look at the bigger picture... there are three > issues, I think: > > (A) Storing padding on disk is wasteful. > > Use a file system that supports sparse files, or an image format > that can represent the padding efficiently. > > (B) Reading padding into memory is wasteful. > > Matters mostly when a network is involved. Use an image format that > can represent the padding efficiently. > > (C) Dirtying memory for padding is wasteful. > > I figure KSM could turn zero-padding into holes. > > We could play with mmap() & friends. > > Other ideas? Is (C) actually a separate issue ? I thought it was simply the result of (B) ? ie if we skip reading the zero padding, we won't be dirtying the memory with lots of zeros. we'll have mmap'd the full 64 MB, but most won't be paged in since we wouldn't write the zeros to it. Only if the guest writes to those areas do we need to then flush it back out. With regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes: > On Wed, Nov 09, 2022 at 04:26:53PM +0100, Markus Armbruster wrote: >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >> > On Mon, Nov 07, 2022 at 06:32:01PM +0100, Andrew Jones wrote: >> >> [...] >> >> >> Padding is a good idea, but too much causes other problems. When building >> >> lightweight VMs which may pull the firmware image from a network, >> >> AArch64 VMs require 64MB of mostly zeros to be transferred first, which >> >> can become a substantial amount of the overall boot time[*]. Being able to >> >> create images smaller than the total flash device size, but still add some >> >> pad for later growth, seems like the happy-medium to shoot for. >> > >> > QEMU configures the firmware using -blockdev, >> >> Yes, even though the devices in question are not block devices. >> >> > so can use any file >> > format that QEMU supports at the block layer. IOW, you can store >> > the firmware in a qcow2 file and thus you will never fetch any >> > of the padding zeros to be transferred. That said I'm not sure >> > that libvirt supports anything other than a raw file today. >> >> Here's another idea. The "raw" format supports exposing a slice of the >> underlying block node (options @offset and @size). It could support >> padding. Writing to the padding should then grow the underlying node. >> >> Taking a step back to look at the bigger picture... there are three >> issues, I think: >> >> (A) Storing padding on disk is wasteful. >> >> Use a file system that supports sparse files, or an image format >> that can represent the padding efficiently. >> >> (B) Reading padding into memory is wasteful. >> >> Matters mostly when a network is involved. Use an image format that >> can represent the padding efficiently. >> >> (C) Dirtying memory for padding is wasteful. >> >> I figure KSM could turn zero-padding into holes. >> >> We could play with mmap() & friends. >> >> Other ideas? > > Is (C) actually a separate issue ? I thought it was simply the > result of (B) ? ie if we skip reading the zero padding, we won't > be dirtying the memory with lots of zeros. we'll have mmap'd the > full 64 MB, but most won't be paged in since we wouldn't write > the zeros to it. Only if the guest writes to those areas do we > need to then flush it back out. I expressed myself poorly. All three are related, but there's still a distinction between each of them in my thinking. Say we use an image format that compresses data. Represents the padding efficiently. Storage on disk is efficient (A), and so is reading it (B). Trouble is decompressing it to memory dirties the memory unless we take care not to write all-zero pages (C). Clearer now?
On Wed, Nov 09, 2022 at 04:45:18PM +0100, Markus Armbruster wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Wed, Nov 09, 2022 at 04:26:53PM +0100, Markus Armbruster wrote: > >> Daniel P. Berrangé <berrange@redhat.com> writes: > >> > >> > On Mon, Nov 07, 2022 at 06:32:01PM +0100, Andrew Jones wrote: > >> > >> [...] > >> > >> >> Padding is a good idea, but too much causes other problems. When building > >> >> lightweight VMs which may pull the firmware image from a network, > >> >> AArch64 VMs require 64MB of mostly zeros to be transferred first, which > >> >> can become a substantial amount of the overall boot time[*]. Being able to > >> >> create images smaller than the total flash device size, but still add some > >> >> pad for later growth, seems like the happy-medium to shoot for. > >> > > >> > QEMU configures the firmware using -blockdev, > >> > >> Yes, even though the devices in question are not block devices. > >> > >> > so can use any file > >> > format that QEMU supports at the block layer. IOW, you can store > >> > the firmware in a qcow2 file and thus you will never fetch any > >> > of the padding zeros to be transferred. That said I'm not sure > >> > that libvirt supports anything other than a raw file today. > >> > >> Here's another idea. The "raw" format supports exposing a slice of the > >> underlying block node (options @offset and @size). It could support > >> padding. Writing to the padding should then grow the underlying node. > >> > >> Taking a step back to look at the bigger picture... there are three > >> issues, I think: > >> > >> (A) Storing padding on disk is wasteful. > >> > >> Use a file system that supports sparse files, or an image format > >> that can represent the padding efficiently. > >> > >> (B) Reading padding into memory is wasteful. > >> > >> Matters mostly when a network is involved. Use an image format that > >> can represent the padding efficiently. > >> > >> (C) Dirtying memory for padding is wasteful. > >> > >> I figure KSM could turn zero-padding into holes. > >> > >> We could play with mmap() & friends. > >> > >> Other ideas? > > > > Is (C) actually a separate issue ? I thought it was simply the > > result of (B) ? ie if we skip reading the zero padding, we won't > > be dirtying the memory with lots of zeros. we'll have mmap'd the > > full 64 MB, but most won't be paged in since we wouldn't write > > the zeros to it. Only if the guest writes to those areas do we > > need to then flush it back out. > > I expressed myself poorly. All three are related, but there's still a > distinction between each of them in my thinking. > > Say we use an image format that compresses data. Represents the padding > efficiently. Storage on disk is efficient (A), and so is reading it > (B). Trouble is decompressing it to memory dirties the memory unless we > take care not to write all-zero pages (C). > > Clearer now? Ok yeah, so reading can be efficient, but if the reader doesn't pay attention to where the holes are, it'll dirty memory anyway. With regards, Daniel
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c index a5bc7353b4..b8ab1fd358 100644 --- a/hw/riscv/virt.c +++ b/hw/riscv/virt.c @@ -49,6 +49,7 @@ #include "hw/pci/pci.h" #include "hw/pci-host/gpex.h" #include "hw/display/ramfb.h" +#include "sysemu/block-backend.h" /* * The virt machine physical address space used by some of the devices @@ -140,14 +141,32 @@ static void virt_flash_create(RISCVVirtState *s) } static void virt_flash_map1(PFlashCFI01 *flash, - hwaddr base, hwaddr size, + hwaddr base, hwaddr max_size, MemoryRegion *sysmem) { DeviceState *dev = DEVICE(flash); + BlockBackend *blk; + int64_t flash_size; - assert(QEMU_IS_ALIGNED(size, VIRT_FLASH_SECTOR_SIZE)); - assert(size / VIRT_FLASH_SECTOR_SIZE <= UINT32_MAX); - qdev_prop_set_uint32(dev, "num-blocks", size / VIRT_FLASH_SECTOR_SIZE); + blk = pflash_cfi01_get_blk(flash); + + if (blk) { + flash_size = blk_getlength(blk); + if (flash_size < 0) { + error_report("can't get size of block device %s: %s", + blk_name(blk), strerror(-flash_size)); + exit(1); + } + } else { + flash_size = max_size; + } + + assert(flash_size > 0); + assert(flash_size <= max_size); + assert(QEMU_IS_ALIGNED(flash_size, VIRT_FLASH_SECTOR_SIZE)); + assert(flash_size / VIRT_FLASH_SECTOR_SIZE <= UINT32_MAX); + qdev_prop_set_uint32(dev, "num-blocks", + flash_size / VIRT_FLASH_SECTOR_SIZE); sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal); memory_region_add_subregion(sysmem, base, @@ -161,6 +180,14 @@ static void virt_flash_map(RISCVVirtState *s, hwaddr flashsize = virt_memmap[VIRT_FLASH].size / 2; hwaddr flashbase = virt_memmap[VIRT_FLASH].base; + /* + * The flash devices are created at fixed base addresses passed + * to virt_flash_map1(). + * However, the flashsize passed here to virt_flash_map1() + * is the maximum size of the flash possible. The actual size + * is determined inside virt_flash_map1() and can be smaller + * than this maximum size based on the backend file size. + */ virt_flash_map1(s->flash[0], flashbase, flashsize, sysmem); virt_flash_map1(s->flash[1], flashbase + flashsize, flashsize, @@ -971,15 +998,24 @@ static void create_fdt_flash(RISCVVirtState *s, const MemMapEntry *memmap) { char *name; MachineState *mc = MACHINE(s); - hwaddr flashsize = virt_memmap[VIRT_FLASH].size / 2; - hwaddr flashbase = virt_memmap[VIRT_FLASH].base; + MemoryRegion *flash_mem; + hwaddr flashsize[2]; + hwaddr flashbase[2]; + + flash_mem = pflash_cfi01_get_memory(s->flash[0]); + flashbase[0] = flash_mem->addr; + flashsize[0] = flash_mem->size; + + flash_mem = pflash_cfi01_get_memory(s->flash[1]); + flashbase[1] = flash_mem->addr; + flashsize[1] = flash_mem->size; - name = g_strdup_printf("/flash@%" PRIx64, flashbase); + name = g_strdup_printf("/flash@%" PRIx64, flashbase[0]); qemu_fdt_add_subnode(mc->fdt, name); qemu_fdt_setprop_string(mc->fdt, name, "compatible", "cfi-flash"); qemu_fdt_setprop_sized_cells(mc->fdt, name, "reg", - 2, flashbase, 2, flashsize, - 2, flashbase + flashsize, 2, flashsize); + 2, flashbase[0], 2, flashsize[0], + 2, flashbase[1], 2, flashsize[1]); qemu_fdt_setprop_cell(mc->fdt, name, "bank-width", 4); g_free(name); } @@ -1242,6 +1278,7 @@ static void virt_machine_done(Notifier *notifier, void *data) target_ulong firmware_end_addr, kernel_start_addr; uint32_t fdt_load_addr; uint64_t kernel_entry; + MemoryRegion *flash_mem; /* * Only direct boot kernel is currently supported for KVM VM, @@ -1288,8 +1325,8 @@ static void virt_machine_done(Notifier *notifier, void *data) * In either case, the next_addr for opensbi will be the flash address. */ riscv_setup_firmware_boot(machine); - kernel_entry = virt_memmap[VIRT_FLASH].base + - virt_memmap[VIRT_FLASH].size / 2; + flash_mem = pflash_cfi01_get_memory(s->flash[1]); + kernel_entry = flash_mem->addr; } else if (machine->kernel_filename) { kernel_start_addr = riscv_calc_kernel_start_addr(&s->soc[0], firmware_end_addr);
The pflash implementation currently assumes fixed size of the backend storage. Due to this, the backend storage file needs to be exactly of size 32M. Otherwise, there will be an error like below. "device requires 33554432 bytes, block backend provides 4194304 bytes" Fix this issue by using the actual size of the backing store. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> --- Changes since V1: 1) Handle the case when blk_getlength() returns failure. 2) Added assertion to check actual size doesn't exceed the limit 3) Updated virt_machine_done() to find the flash base dynamically 4) Added code comments as suggested by Drew hw/riscv/virt.c | 59 ++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 48 insertions(+), 11 deletions(-)