diff mbox series

hostmem: Round up memory size for qemu_madvise() in host_memory_backend_memory_complete()

Message ID f77d641d500324525ac036fe1827b3070de75fc1.1701088320.git.mprivozn@redhat.com (mailing list archive)
State New, archived
Headers show
Series hostmem: Round up memory size for qemu_madvise() in host_memory_backend_memory_complete() | expand

Commit Message

Michal Privoznik Nov. 27, 2023, 12:32 p.m. UTC
Simple reproducer:
qemu.git $ ./build/qemu-system-x86_64 \
-m size=8389632k,slots=16,maxmem=25600000k \
-object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/hugepages2M/","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}' \
-numa node,nodeid=0,cpus=0,memdev=ram-node0

With current master I get:

qemu-system-x86_64: cannot bind memory to host NUMA nodes: Invalid argument

The problem is that memory size (8193MiB) is not an integer
multiple of underlying pagesize (2MiB) which triggers a check
inside of madvise(), since we can't really set a madvise() policy
just to a fraction of a page.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 backends/hostmem.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

David Hildenbrand Nov. 27, 2023, 1:37 p.m. UTC | #1
On 27.11.23 13:32, Michal Privoznik wrote:
> Simple reproducer:
> qemu.git $ ./build/qemu-system-x86_64 \
> -m size=8389632k,slots=16,maxmem=25600000k \
> -object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/hugepages2M/","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}' \
> -numa node,nodeid=0,cpus=0,memdev=ram-node0
> 
> With current master I get:
> 
> qemu-system-x86_64: cannot bind memory to host NUMA nodes: Invalid argument
> 
> The problem is that memory size (8193MiB) is not an integer
> multiple of underlying pagesize (2MiB) which triggers a check
> inside of madvise(), since we can't really set a madvise() policy
> just to a fraction of a page.

I thought we would just always fail create something that doesn't really 
make any sense.

Why would we want to support that case?

Let me dig, I thought we would have had some check there at some point 
that would make that fail (especially: RAM block not aligned to the 
pagesize).

> 
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>   backends/hostmem.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 747e7838c0..4e88d048de 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -326,9 +326,10 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>       HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
>       Error *local_err = NULL;
>       void *ptr;
> -    uint64_t sz;
>   
>       if (bc->alloc) {
> +        uint64_t sz;
> +
>           bc->alloc(backend, &local_err);
>           if (local_err) {
>               goto out;
> @@ -337,6 +338,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>           ptr = memory_region_get_ram_ptr(&backend->mr);
>           sz = memory_region_size(&backend->mr);
>   
> +        /* Round up size to be an integer multiple of pagesize, because
> +         * madvise() does not really like setting advices on a fraction of a
> +         * page. */
> +        sz = ROUND_UP(sz, qemu_ram_pagesize(backend->mr.ram_block));
> +
>           if (backend->merge) {
>               qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
>           }
David Hildenbrand Nov. 27, 2023, 1:55 p.m. UTC | #2
On 27.11.23 14:37, David Hildenbrand wrote:
> On 27.11.23 13:32, Michal Privoznik wrote:
>> Simple reproducer:
>> qemu.git $ ./build/qemu-system-x86_64 \
>> -m size=8389632k,slots=16,maxmem=25600000k \
>> -object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/hugepages2M/","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}' \
>> -numa node,nodeid=0,cpus=0,memdev=ram-node0
>>
>> With current master I get:
>>
>> qemu-system-x86_64: cannot bind memory to host NUMA nodes: Invalid argument
>>
>> The problem is that memory size (8193MiB) is not an integer
>> multiple of underlying pagesize (2MiB) which triggers a check
>> inside of madvise(), since we can't really set a madvise() policy
>> just to a fraction of a page.
> 
> I thought we would just always fail create something that doesn't really
> make any sense.
> 
> Why would we want to support that case?
> 
> Let me dig, I thought we would have had some check there at some point
> that would make that fail (especially: RAM block not aligned to the
> pagesize).


At least memory-backend-memfd properly fails for that case:

$ ./build/qemu-system-x86_64 -object memory-backend-memfd,hugetlb=on,size=3m,id=tmp
qemu-system-x86_64: failed to resize memfd to 3145728: Invalid argument

memory-backend-file ends up creating a new file:

  $ ./build/qemu-system-x86_64 -object memory-backend-file,share=on,mem-path=/dev/hugepages/tmp,size=3m,id=tmp

$ stat /dev/hugepages/tmp
   File: /dev/hugepages/tmp
   Size: 4194304         Blocks: 0          IO Block: 2097152 regular file

... and ends up sizing it properly aligned to the huge page size.


Seems to be due to:

     if (memory < block->page_size) {
         error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
                    "or larger than page size 0x%zx",
                    memory, block->page_size);
         return NULL;
     }

     memory = ROUND_UP(memory, block->page_size);

     /*
      * ftruncate is not supported by hugetlbfs in older
      * hosts, so don't bother bailing out on errors.
      * If anything goes wrong with it under other filesystems,
      * mmap will fail.
      *
      * Do not truncate the non-empty backend file to avoid corrupting
      * the existing data in the file. Disabling shrinking is not
      * enough. For example, the current vNVDIMM implementation stores
      * the guest NVDIMM labels at the end of the backend file. If the
      * backend file is later extended, QEMU will not be able to find
      * those labels. Therefore, extending the non-empty backend file
      * is disabled as well.
      */
     if (truncate && ftruncate(fd, offset + memory)) {
         perror("ftruncate");
     }

So we create a bigger file and map the bigger file and also have a
RAMBlock that is bigger. So we'll also consume more memory.

... but the memory region is smaller and we tell the VM that it has
less memory. Lot of work with no obvious benefit, and only some
memory waste :)


We better should have just rejected such memory backends right from
the start. But now it's likely too late.

I suspect other things like
  * qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
  * qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);

fail, but we don't care for hugetlb at least regarding merging
and don't even log an error.

But QEMU_MADV_DONTDUMP might also be broken, because that
qemu_madvise() call will just fail.

Your fix would be correct. But I do wonder if we want to just let that
case fail and warn users that they are doing something that doesn't
make too much sense.
Michal Privoznik Dec. 1, 2023, 9:07 a.m. UTC | #3
On 11/27/23 14:55, David Hildenbrand wrote:
> On 27.11.23 14:37, David Hildenbrand wrote:
>> On 27.11.23 13:32, Michal Privoznik wrote:
>>> Simple reproducer:
>>> qemu.git $ ./build/qemu-system-x86_64 \
>>> -m size=8389632k,slots=16,maxmem=25600000k \
>>> -object
>>> '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/hugepages2M/","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}' \
>>> -numa node,nodeid=0,cpus=0,memdev=ram-node0
>>>
>>> With current master I get:
>>>
>>> qemu-system-x86_64: cannot bind memory to host NUMA nodes: Invalid
>>> argument
>>>
>>> The problem is that memory size (8193MiB) is not an integer
>>> multiple of underlying pagesize (2MiB) which triggers a check
>>> inside of madvise(), since we can't really set a madvise() policy
>>> just to a fraction of a page.
>>
>> I thought we would just always fail create something that doesn't really
>> make any sense.
>>
>> Why would we want to support that case?
>>
>> Let me dig, I thought we would have had some check there at some point
>> that would make that fail (especially: RAM block not aligned to the
>> pagesize).
> 
> 
> At least memory-backend-memfd properly fails for that case:
> 
> $ ./build/qemu-system-x86_64 -object
> memory-backend-memfd,hugetlb=on,size=3m,id=tmp
> qemu-system-x86_64: failed to resize memfd to 3145728: Invalid argument
> 
> memory-backend-file ends up creating a new file:
> 
>  $ ./build/qemu-system-x86_64 -object
> memory-backend-file,share=on,mem-path=/dev/hugepages/tmp,size=3m,id=tmp
> 
> $ stat /dev/hugepages/tmp
>   File: /dev/hugepages/tmp
>   Size: 4194304         Blocks: 0          IO Block: 2097152 regular file
> 
> ... and ends up sizing it properly aligned to the huge page size.
> 
> 
> Seems to be due to:
> 
>     if (memory < block->page_size) {
>         error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
>                    "or larger than page size 0x%zx",
>                    memory, block->page_size);
>         return NULL;
>     }
> 
>     memory = ROUND_UP(memory, block->page_size);
> 
>     /*
>      * ftruncate is not supported by hugetlbfs in older
>      * hosts, so don't bother bailing out on errors.
>      * If anything goes wrong with it under other filesystems,
>      * mmap will fail.
>      *
>      * Do not truncate the non-empty backend file to avoid corrupting
>      * the existing data in the file. Disabling shrinking is not
>      * enough. For example, the current vNVDIMM implementation stores
>      * the guest NVDIMM labels at the end of the backend file. If the
>      * backend file is later extended, QEMU will not be able to find
>      * those labels. Therefore, extending the non-empty backend file
>      * is disabled as well.
>      */
>     if (truncate && ftruncate(fd, offset + memory)) {
>         perror("ftruncate");
>     }
> 
> So we create a bigger file and map the bigger file and also have a
> RAMBlock that is bigger. So we'll also consume more memory.
> 
> ... but the memory region is smaller and we tell the VM that it has
> less memory. Lot of work with no obvious benefit, and only some
> memory waste :)
> 
> 
> We better should have just rejected such memory backends right from
> the start. But now it's likely too late.
> 
> I suspect other things like
>  * qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
>  * qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
> 
> fail, but we don't care for hugetlb at least regarding merging
> and don't even log an error.
> 
> But QEMU_MADV_DONTDUMP might also be broken, because that
> qemu_madvise() call will just fail.
> 
> Your fix would be correct. But I do wonder if we want to just let that
> case fail and warn users that they are doing something that doesn't
> make too much sense.
> 

Yeah, what's suspicious is: if the size is smaller than page size we
error out, but if it's larger (but still not aligned) we accept that.
I'm failing to see reasoning there. Looks like the ROUND_UP() was added
in v0.13.0-rc0~1201^2~4 (though it's done with some bit magic) and the
check itself was added in v2.8.0-rc0~30^2~23. So it's a bit late, yes.

OTOH - if users want to waste resources, should we stop them? For
instance, when user requests more vCPUs than physical CPUs a warning is
printed:

$ ./build/qemu-system-x86_64 -accel kvm -smp cpus=128
qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested
(128) exceeds the recommended cpus supported by KVM (8)

but that's about it. So maybe the error can be demoted to just a warning?

Michal
David Hildenbrand Dec. 4, 2023, 9:21 a.m. UTC | #4
On 01.12.23 10:07, Michal Prívozník wrote:
> On 11/27/23 14:55, David Hildenbrand wrote:
>> On 27.11.23 14:37, David Hildenbrand wrote:
>>> On 27.11.23 13:32, Michal Privoznik wrote:
>>>> Simple reproducer:
>>>> qemu.git $ ./build/qemu-system-x86_64 \
>>>> -m size=8389632k,slots=16,maxmem=25600000k \
>>>> -object
>>>> '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/hugepages2M/","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}' \
>>>> -numa node,nodeid=0,cpus=0,memdev=ram-node0
>>>>
>>>> With current master I get:
>>>>
>>>> qemu-system-x86_64: cannot bind memory to host NUMA nodes: Invalid
>>>> argument
>>>>
>>>> The problem is that memory size (8193MiB) is not an integer
>>>> multiple of underlying pagesize (2MiB) which triggers a check
>>>> inside of madvise(), since we can't really set a madvise() policy
>>>> just to a fraction of a page.
>>>
>>> I thought we would just always fail create something that doesn't really
>>> make any sense.
>>>
>>> Why would we want to support that case?
>>>
>>> Let me dig, I thought we would have had some check there at some point
>>> that would make that fail (especially: RAM block not aligned to the
>>> pagesize).
>>
>>
>> At least memory-backend-memfd properly fails for that case:
>>
>> $ ./build/qemu-system-x86_64 -object
>> memory-backend-memfd,hugetlb=on,size=3m,id=tmp
>> qemu-system-x86_64: failed to resize memfd to 3145728: Invalid argument
>>
>> memory-backend-file ends up creating a new file:
>>
>>   $ ./build/qemu-system-x86_64 -object
>> memory-backend-file,share=on,mem-path=/dev/hugepages/tmp,size=3m,id=tmp
>>
>> $ stat /dev/hugepages/tmp
>>    File: /dev/hugepages/tmp
>>    Size: 4194304         Blocks: 0          IO Block: 2097152 regular file
>>
>> ... and ends up sizing it properly aligned to the huge page size.
>>
>>
>> Seems to be due to:
>>
>>      if (memory < block->page_size) {
>>          error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
>>                     "or larger than page size 0x%zx",
>>                     memory, block->page_size);
>>          return NULL;
>>      }
>>
>>      memory = ROUND_UP(memory, block->page_size);
>>
>>      /*
>>       * ftruncate is not supported by hugetlbfs in older
>>       * hosts, so don't bother bailing out on errors.
>>       * If anything goes wrong with it under other filesystems,
>>       * mmap will fail.
>>       *
>>       * Do not truncate the non-empty backend file to avoid corrupting
>>       * the existing data in the file. Disabling shrinking is not
>>       * enough. For example, the current vNVDIMM implementation stores
>>       * the guest NVDIMM labels at the end of the backend file. If the
>>       * backend file is later extended, QEMU will not be able to find
>>       * those labels. Therefore, extending the non-empty backend file
>>       * is disabled as well.
>>       */
>>      if (truncate && ftruncate(fd, offset + memory)) {
>>          perror("ftruncate");
>>      }
>>
>> So we create a bigger file and map the bigger file and also have a
>> RAMBlock that is bigger. So we'll also consume more memory.
>>
>> ... but the memory region is smaller and we tell the VM that it has
>> less memory. Lot of work with no obvious benefit, and only some
>> memory waste :)
>>
>>
>> We better should have just rejected such memory backends right from
>> the start. But now it's likely too late.
>>
>> I suspect other things like
>>   * qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
>>   * qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
>>
>> fail, but we don't care for hugetlb at least regarding merging
>> and don't even log an error.
>>
>> But QEMU_MADV_DONTDUMP might also be broken, because that
>> qemu_madvise() call will just fail.
>>
>> Your fix would be correct. But I do wonder if we want to just let that
>> case fail and warn users that they are doing something that doesn't
>> make too much sense.
>>
> 
> Yeah, what's suspicious is: if the size is smaller than page size we
> error out, but if it's larger (but still not aligned) we accept that.
> I'm failing to see reasoning there. Looks like the ROUND_UP() was added
> in v0.13.0-rc0~1201^2~4 (though it's done with some bit magic) and the
> check itself was added in v2.8.0-rc0~30^2~23. So it's a bit late, yes.

Yeah.

> 
> OTOH - if users want to waste resources, should we stop them? For

It's all inconsistent, including memfd handling or what you noted above.

For example, Having a 1025 MiB guest on gigantic pages, consuming 2 GiB 
really is just absolutely stupid.

Likely the user wants to know about such mistakes instead of making QEMU 
silence all side effects of that. :)


> instance, when user requests more vCPUs than physical CPUs a warning is
> printed:
> 
> $ ./build/qemu-system-x86_64 -accel kvm -smp cpus=128
> qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested
> (128) exceeds the recommended cpus supported by KVM (8)

But that case is still reasonable for testing guest behavior with many 
vCPUs, or migrating from a machine with more vCPUs.

Here, the guest will actually see all vCPUs. In comparison, the memory 
waste here will never ever be consumable by the VM.

> 
> but that's about it. So maybe the error can be demoted to just a warning?

The question is what we want to do, for example, with the 
qemu_madvise(QEMU_MADV_DONTDUMP). It will similarly simply fail.

I'm curious, are there real customers running into that?


We could fix it all that but always warn when something like that is 
being done.
Michal Privoznik Dec. 4, 2023, 2:12 p.m. UTC | #5
On 12/4/23 10:21, David Hildenbrand wrote:
> On 01.12.23 10:07, Michal Prívozník wrote:
>> On 11/27/23 14:55, David Hildenbrand wrote:
>>> On 27.11.23 14:37, David Hildenbrand wrote:
>>>> On 27.11.23 13:32, Michal Privoznik wrote:
>>>>> Simple reproducer:
>>>>> qemu.git $ ./build/qemu-system-x86_64 \
>>>>> -m size=8389632k,slots=16,maxmem=25600000k \
>>>>> -object
>>>>> '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/hugepages2M/","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}' \
>>>>> -numa node,nodeid=0,cpus=0,memdev=ram-node0
>>>>>
>>>>> With current master I get:
>>>>>
>>>>> qemu-system-x86_64: cannot bind memory to host NUMA nodes: Invalid
>>>>> argument
>>>>>
>>>>> The problem is that memory size (8193MiB) is not an integer
>>>>> multiple of underlying pagesize (2MiB) which triggers a check
>>>>> inside of madvise(), since we can't really set a madvise() policy
>>>>> just to a fraction of a page.
>>>>
>>>> I thought we would just always fail create something that doesn't
>>>> really
>>>> make any sense.
>>>>
>>>> Why would we want to support that case?
>>>>
>>>> Let me dig, I thought we would have had some check there at some point
>>>> that would make that fail (especially: RAM block not aligned to the
>>>> pagesize).
>>>
>>>
>>> At least memory-backend-memfd properly fails for that case:
>>>
>>> $ ./build/qemu-system-x86_64 -object
>>> memory-backend-memfd,hugetlb=on,size=3m,id=tmp
>>> qemu-system-x86_64: failed to resize memfd to 3145728: Invalid argument
>>>
>>> memory-backend-file ends up creating a new file:
>>>
>>>   $ ./build/qemu-system-x86_64 -object
>>> memory-backend-file,share=on,mem-path=/dev/hugepages/tmp,size=3m,id=tmp
>>>
>>> $ stat /dev/hugepages/tmp
>>>    File: /dev/hugepages/tmp
>>>    Size: 4194304         Blocks: 0          IO Block: 2097152 regular
>>> file
>>>
>>> ... and ends up sizing it properly aligned to the huge page size.
>>>
>>>
>>> Seems to be due to:
>>>
>>>      if (memory < block->page_size) {
>>>          error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be
>>> equal to "
>>>                     "or larger than page size 0x%zx",
>>>                     memory, block->page_size);
>>>          return NULL;
>>>      }
>>>
>>>      memory = ROUND_UP(memory, block->page_size);
>>>
>>>      /*
>>>       * ftruncate is not supported by hugetlbfs in older
>>>       * hosts, so don't bother bailing out on errors.
>>>       * If anything goes wrong with it under other filesystems,
>>>       * mmap will fail.
>>>       *
>>>       * Do not truncate the non-empty backend file to avoid corrupting
>>>       * the existing data in the file. Disabling shrinking is not
>>>       * enough. For example, the current vNVDIMM implementation stores
>>>       * the guest NVDIMM labels at the end of the backend file. If the
>>>       * backend file is later extended, QEMU will not be able to find
>>>       * those labels. Therefore, extending the non-empty backend file
>>>       * is disabled as well.
>>>       */
>>>      if (truncate && ftruncate(fd, offset + memory)) {
>>>          perror("ftruncate");
>>>      }
>>>
>>> So we create a bigger file and map the bigger file and also have a
>>> RAMBlock that is bigger. So we'll also consume more memory.
>>>
>>> ... but the memory region is smaller and we tell the VM that it has
>>> less memory. Lot of work with no obvious benefit, and only some
>>> memory waste :)
>>>
>>>
>>> We better should have just rejected such memory backends right from
>>> the start. But now it's likely too late.
>>>
>>> I suspect other things like
>>>   * qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
>>>   * qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
>>>
>>> fail, but we don't care for hugetlb at least regarding merging
>>> and don't even log an error.
>>>
>>> But QEMU_MADV_DONTDUMP might also be broken, because that
>>> qemu_madvise() call will just fail.
>>>
>>> Your fix would be correct. But I do wonder if we want to just let that
>>> case fail and warn users that they are doing something that doesn't
>>> make too much sense.
>>>
>>
>> Yeah, what's suspicious is: if the size is smaller than page size we
>> error out, but if it's larger (but still not aligned) we accept that.
>> I'm failing to see reasoning there. Looks like the ROUND_UP() was added
>> in v0.13.0-rc0~1201^2~4 (though it's done with some bit magic) and the
>> check itself was added in v2.8.0-rc0~30^2~23. So it's a bit late, yes.
> 
> Yeah.
> 
>>
>> OTOH - if users want to waste resources, should we stop them? For
> 
> It's all inconsistent, including memfd handling or what you noted above.
> 
> For example, Having a 1025 MiB guest on gigantic pages, consuming 2 GiB
> really is just absolutely stupid.
> 
> Likely the user wants to know about such mistakes instead of making QEMU
> silence all side effects of that. :)

Agreed. As I said, consistency should win here.

> 
> 
>> instance, when user requests more vCPUs than physical CPUs a warning is
>> printed:
>>
>> $ ./build/qemu-system-x86_64 -accel kvm -smp cpus=128
>> qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested
>> (128) exceeds the recommended cpus supported by KVM (8)
> 
> But that case is still reasonable for testing guest behavior with many
> vCPUs, or migrating from a machine with more vCPUs.
> 
> Here, the guest will actually see all vCPUs. In comparison, the memory
> waste here will never ever be consumable by the VM.

Good point.

> 
>>
>> but that's about it. So maybe the error can be demoted to just a warning?
> 
> The question is what we want to do, for example, with the
> qemu_madvise(QEMU_MADV_DONTDUMP). It will similarly simply fail.

It will. But the retval of qemu_madvise() is not checked here, and in
qemu_ram_setup_dump() it's just a warning.

> 
> I'm curious, are there real customers running into that?
> 
>

No, I haven't seen any bugreport from a customer, just our QE ran into
this issue: https://issues.redhat.com/browse/RHEL-1127 (I've asked to
make this issue public).


> We could fix it all that but always warn when something like that is
> being done.
> 

Fair enough. After all of this, I'm inclined to turn this into a proper
error and deny not page aligned sizes. There's no real benefit in having
them and furthermore, the original bug report is about cryptic error
message.

Michal
David Hildenbrand Dec. 4, 2023, 6:38 p.m. UTC | #6
>>
> 
> Fair enough. After all of this, I'm inclined to turn this into a proper
> error and deny not page aligned sizes. There's no real benefit in having
> them and furthermore, the original bug report is about cryptic error
> message.
> 

I guess if we glue this to compat machines we should be definitely fine.
diff mbox series

Patch

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 747e7838c0..4e88d048de 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -326,9 +326,10 @@  host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
     HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
     Error *local_err = NULL;
     void *ptr;
-    uint64_t sz;
 
     if (bc->alloc) {
+        uint64_t sz;
+
         bc->alloc(backend, &local_err);
         if (local_err) {
             goto out;
@@ -337,6 +338,11 @@  host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         ptr = memory_region_get_ram_ptr(&backend->mr);
         sz = memory_region_size(&backend->mr);
 
+        /* Round up size to be an integer multiple of pagesize, because
+         * madvise() does not really like setting advices on a fraction of a
+         * page. */
+        sz = ROUND_UP(sz, qemu_ram_pagesize(backend->mr.ram_block));
+
         if (backend->merge) {
             qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
         }