
[v2,2/2] accel: kvm: Add alignment assert for kvm_log_clear_one_slot

Message ID 20201217014941.22872-3-zhukeqian1@huawei.com (mailing list archive)
State New, archived
Series accel: kvm: Some bugfixes for kvm dirty log

Commit Message

zhukeqian Dec. 17, 2020, 1:49 a.m. UTC
The parameters start and size are passed in from the QEMU memory
emulation layer, which guarantees that they are TARGET_PAGE_SIZE
aligned. However, KVM needs them to be qemu_real_host_page_size aligned.

Although no caller currently violates this alignment requirement, we'd
better add an explicit assert to guard against future breakage.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 accel/kvm/kvm-all.c | 7 +++++++
 1 file changed, 7 insertions(+)

---
v2
 - Address Andrew's comment (Use assert instead of return err).
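
For readers without the QEMU tree at hand, the check this patch adds can be
sketched in isolation. This is only an illustration, not the patch itself:
IS_ALIGNED is a local stand-in for QEMU's QEMU_IS_ALIGNED(), and
range_is_page_aligned() is a hypothetical helper name. ORing start and size
tests both values in one go, which works because the page size is assumed to
be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for QEMU's QEMU_IS_ALIGNED(); m must be non-zero. */
#define IS_ALIGNED(n, m) (((n) % (m)) == 0)

/*
 * Hypothetical helper: true when both start and size are multiples of
 * psize.  For a power-of-two psize, (start | size) has its low bits
 * clear only if both operands have them clear, so one test covers both.
 */
static bool range_is_page_aligned(uint64_t start, uint64_t size,
                                  uint64_t psize)
{
    /* The OR trick is only valid for power-of-two page sizes. */
    assert(psize != 0 && (psize & (psize - 1)) == 0);
    return IS_ALIGNED(start | size, psize);
}
```

Note the case the commit message is worried about: a range that is aligned to
a 4 KiB target page can still fail this check when the host page is 64 KiB.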

Comments

Andrew Jones Dec. 17, 2020, 12:18 p.m. UTC | #1
On Thu, Dec 17, 2020 at 09:49:41AM +0800, Keqian Zhu wrote:
> The parameters start and size are transfered from QEMU memory
> emulation layer. It can promise that they are TARGET_PAGE_SIZE
> aligned. However, KVM needs they are qemu_real_page_size aligned.
> 
> Though no caller breaks this aligned requirement currently, we'd
> better add an explicit assert to avoid future breaking.
> 
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> ---
>  accel/kvm/kvm-all.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> ---
> v2
>  - Address Andrew's commment (Use assert instead of return err).
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index f6b16a8df8..73b195cc41 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -692,6 +692,10 @@ out:
>  #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
>  #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
>  
> +/*
> + * As the granule of kvm dirty log is qemu_real_host_page_size,
> + * @start and @size are expected and restricted to align to it.
> + */
>  static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>                                    uint64_t size)
>  {
> @@ -701,6 +705,9 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>      unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
>      int ret;
>  
> +    /* Make sure start and size are qemu_real_host_page_size aligned */
> +    assert(QEMU_IS_ALIGNED(start | size, psize));
> +
>      /*
>       * We need to extend either the start or the size or both to
>       * satisfy the KVM interface requirement.  Firstly, do the start
> -- 
> 2.23.0
> 
>

Reviewed-by: Andrew Jones <drjones@redhat.com>
Peter Xu Dec. 17, 2020, 2:36 p.m. UTC | #2
On Thu, Dec 17, 2020 at 09:49:41AM +0800, Keqian Zhu wrote:
> The parameters start and size are transfered from QEMU memory
> emulation layer. It can promise that they are TARGET_PAGE_SIZE
> aligned. However, KVM needs they are qemu_real_page_size aligned.
> 
> Though no caller breaks this aligned requirement currently, we'd
> better add an explicit assert to avoid future breaking.
> 
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>

Acked-by: Peter Xu <peterx@redhat.com>
Philippe Mathieu-Daudé Feb. 1, 2021, 3:14 p.m. UTC | #3
Hi,

On 12/17/20 2:49 AM, Keqian Zhu wrote:
> The parameters start and size are transfered from QEMU memory
> emulation layer. It can promise that they are TARGET_PAGE_SIZE
> aligned. However, KVM needs they are qemu_real_page_size aligned.
> 
> Though no caller breaks this aligned requirement currently, we'd
> better add an explicit assert to avoid future breaking.
> 
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> ---
>  accel/kvm/kvm-all.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> ---
> v2
>  - Address Andrew's commment (Use assert instead of return err).
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index f6b16a8df8..73b195cc41 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -692,6 +692,10 @@ out:
>  #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
>  #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
>  
> +/*
> + * As the granule of kvm dirty log is qemu_real_host_page_size,
> + * @start and @size are expected and restricted to align to it.
> + */
>  static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>                                    uint64_t size)
>  {
> @@ -701,6 +705,9 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>      unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
>      int ret;
>  
> +    /* Make sure start and size are qemu_real_host_page_size aligned */
> +    assert(QEMU_IS_ALIGNED(start | size, psize));

Why not return an error instead of aborting the VM?

>      /*
>       * We need to extend either the start or the size or both to
>       * satisfy the KVM interface requirement.  Firstly, do the start
>
zhukeqian Feb. 2, 2021, 1:17 a.m. UTC | #4
Hi Philippe,

On 2021/2/1 23:14, Philippe Mathieu-Daudé wrote:
> Hi,
> 
> On 12/17/20 2:49 AM, Keqian Zhu wrote:
>> The parameters start and size are transfered from QEMU memory
>> emulation layer. It can promise that they are TARGET_PAGE_SIZE
>> aligned. However, KVM needs they are qemu_real_page_size aligned.
>>
>> Though no caller breaks this aligned requirement currently, we'd
>> better add an explicit assert to avoid future breaking.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>  accel/kvm/kvm-all.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> ---
>> v2
>>  - Address Andrew's commment (Use assert instead of return err).
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index f6b16a8df8..73b195cc41 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -692,6 +692,10 @@ out:
>>  #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
>>  #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
>>  
>> +/*
>> + * As the granule of kvm dirty log is qemu_real_host_page_size,
>> + * @start and @size are expected and restricted to align to it.
>> + */
>>  static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>                                    uint64_t size)
>>  {
>> @@ -701,6 +705,9 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>      unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
>>      int ret;
>>  
>> +    /* Make sure start and size are qemu_real_host_page_size aligned */
>> +    assert(QEMU_IS_ALIGNED(start | size, psize));
> 
> Why not return an error instead of aborting the VM?
Yep, I returned an error in v1. As Peter Xu pointed out: "Returning -EINVAL is the same as abort() currently - it'll just abort() at
kvm_log_clear() instead."

> 
>>      /*
>>       * We need to extend either the start or the size or both to
>>       * satisfy the KVM interface requirement.  Firstly, do the start
>>
> 
> .
> 
Thanks for the review.

Keqian.
Thomas Huth March 9, 2021, 1:48 p.m. UTC | #5
On 17/12/2020 02.49, Keqian Zhu wrote:
> The parameters start and size are transfered from QEMU memory
> emulation layer. It can promise that they are TARGET_PAGE_SIZE
> aligned. However, KVM needs they are qemu_real_page_size aligned.
> 
> Though no caller breaks this aligned requirement currently, we'd
> better add an explicit assert to avoid future breaking.
> 
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> ---
>   accel/kvm/kvm-all.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> ---
> v2
>   - Address Andrew's commment (Use assert instead of return err).
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index f6b16a8df8..73b195cc41 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -692,6 +692,10 @@ out:
>   #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
>   #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
>   
> +/*
> + * As the granule of kvm dirty log is qemu_real_host_page_size,
> + * @start and @size are expected and restricted to align to it.
> + */
>   static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>                                     uint64_t size)
>   {
> @@ -701,6 +705,9 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>       unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
>       int ret;
>   
> +    /* Make sure start and size are qemu_real_host_page_size aligned */
> +    assert(QEMU_IS_ALIGNED(start | size, psize));

Sorry, but that was a bad idea: It triggers and kills my Centos 6 VM:

$ qemu-system-x86_64 -accel kvm -hda ~/virt/images/centos6.qcow2 -m 1G
qemu-system-x86_64: ../../devel/qemu/accel/kvm/kvm-all.c:690: 
kvm_log_clear_one_slot: Assertion `QEMU_IS_ALIGNED(start | size, psize)' failed.
Aborted (core dumped)

Can we please revert this patch?

  Thomas
zhukeqian March 9, 2021, 2:05 p.m. UTC | #6
On 2021/3/9 21:48, Thomas Huth wrote:
> On 17/12/2020 02.49, Keqian Zhu wrote:
>> The parameters start and size are transfered from QEMU memory
>> emulation layer. It can promise that they are TARGET_PAGE_SIZE
>> aligned. However, KVM needs they are qemu_real_page_size aligned.
>>
>> Though no caller breaks this aligned requirement currently, we'd
>> better add an explicit assert to avoid future breaking.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>   accel/kvm/kvm-all.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> ---
>> v2
>>   - Address Andrew's commment (Use assert instead of return err).
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index f6b16a8df8..73b195cc41 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -692,6 +692,10 @@ out:
>>   #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
>>   #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
>>   +/*
>> + * As the granule of kvm dirty log is qemu_real_host_page_size,
>> + * @start and @size are expected and restricted to align to it.
>> + */
>>   static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>                                     uint64_t size)
>>   {
>> @@ -701,6 +705,9 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>       unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
>>       int ret;
>>   +    /* Make sure start and size are qemu_real_host_page_size aligned */
>> +    assert(QEMU_IS_ALIGNED(start | size, psize));
> 
> Sorry, but that was a bad idea: It triggers and kills my Centos 6 VM:
> 
> $ qemu-system-x86_64 -accel kvm -hda ~/virt/images/centos6.qcow2 -m 1G
> qemu-system-x86_64: ../../devel/qemu/accel/kvm/kvm-all.c:690: kvm_log_clear_one_slot: Assertion `QEMU_IS_ALIGNED(start | size, psize)' failed.
> Aborted (core dumped)
Hi Thomas,

I think this patch is OK; maybe it has exposed a latent bug?

Thanks,
Keqian

> 
> Can we please revert this patch?
> 
>  Thomas
> 
> .
>
Thomas Huth March 9, 2021, 2:45 p.m. UTC | #7
On 09/03/2021 15.05, Keqian Zhu wrote:
> 
> 
> On 2021/3/9 21:48, Thomas Huth wrote:
>> On 17/12/2020 02.49, Keqian Zhu wrote:
>>> The parameters start and size are transfered from QEMU memory
>>> emulation layer. It can promise that they are TARGET_PAGE_SIZE
>>> aligned. However, KVM needs they are qemu_real_page_size aligned.
>>>
>>> Though no caller breaks this aligned requirement currently, we'd
>>> better add an explicit assert to avoid future breaking.
>>>
>>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>>> ---
>>>    accel/kvm/kvm-all.c | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> ---
>>> v2
>>>    - Address Andrew's commment (Use assert instead of return err).
>>>
>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>> index f6b16a8df8..73b195cc41 100644
>>> --- a/accel/kvm/kvm-all.c
>>> +++ b/accel/kvm/kvm-all.c
>>> @@ -692,6 +692,10 @@ out:
>>>    #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
>>>    #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
>>>    +/*
>>> + * As the granule of kvm dirty log is qemu_real_host_page_size,
>>> + * @start and @size are expected and restricted to align to it.
>>> + */
>>>    static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>>                                      uint64_t size)
>>>    {
>>> @@ -701,6 +705,9 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>>        unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
>>>        int ret;
>>>    +    /* Make sure start and size are qemu_real_host_page_size aligned */
>>> +    assert(QEMU_IS_ALIGNED(start | size, psize));
>>
>> Sorry, but that was a bad idea: It triggers and kills my Centos 6 VM:
>>
>> $ qemu-system-x86_64 -accel kvm -hda ~/virt/images/centos6.qcow2 -m 1G
>> qemu-system-x86_64: ../../devel/qemu/accel/kvm/kvm-all.c:690: kvm_log_clear_one_slot: Assertion `QEMU_IS_ALIGNED(start | size, psize)' failed.
>> Aborted (core dumped)
> Hi Thomas,
> 
> I think this patch is ok, maybe it trigger a potential bug?

Well, sure, there is either a bug somewhere else or in this new code. But it's certainly not normal that the assert() triggers, is it?

FWIW, here's a backtrace:

#0  0x00007ffff2c1584f in raise () at /lib64/libc.so.6
#1  0x00007ffff2bffc45 in abort () at /lib64/libc.so.6
#2  0x00007ffff2bffb19 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007ffff2c0de36 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x0000555555ba25f3 in kvm_log_clear_one_slot
     (size=6910080, start=0, as_id=0, mem=0x555556e1ee00)
     at ../../devel/qemu/accel/kvm/kvm-all.c:691
#5  0x0000555555ba25f3 in kvm_physical_log_clear
     (section=0x7fffffffd0b0, section=0x7fffffffd0b0, kml=0x555556dbaac0)
     at ../../devel/qemu/accel/kvm/kvm-all.c:843
#6  0x0000555555ba25f3 in kvm_log_clear (listener=0x555556dbaac0, section=0x7fffffffd0b0)
     at ../../devel/qemu/accel/kvm/kvm-all.c:1253
#7  0x0000555555b023d8 in memory_region_clear_dirty_bitmap
     (mr=mr@entry=0x5555573394c0, start=start@entry=0, len=len@entry=6910080)
     at ../../devel/qemu/softmmu/memory.c:2132
#8  0x0000555555b313d9 in cpu_physical_memory_snapshot_and_clear_dirty
     (mr=mr@entry=0x5555573394c0, offset=offset@entry=0, length=length@entry=6910080, client=client@entry=0) at ../../devel/qemu/softmmu/physmem.c:1109
#9  0x0000555555b02483 in memory_region_snapshot_and_clear_dirty
     (mr=mr@entry=0x5555573394c0, addr=addr@entry=0, size=size@entry=6910080, client=client@entry=0)
     at ../../devel/qemu/softmmu/memory.c:2146
#10 0x0000555555babe99 in vga_draw_graphic (full_update=0, s=0x5555573394b0)
     at ../../devel/qemu/hw/display/vga.c:1661
#11 0x0000555555babe99 in vga_update_display (opaque=0x5555573394b0)
     at ../../devel/qemu/hw/display/vga.c:1784
#12 0x0000555555babe99 in vga_update_display (opaque=0x5555573394b0)
     at ../../devel/qemu/hw/display/vga.c:1757
#13 0x00005555558ddd32 in graphic_hw_update (con=0x555556a11800)
     at ../../devel/qemu/ui/console.c:279
#14 0x00005555558dccd2 in dpy_refresh (s=0x555556c17da0) at ../../devel/qemu/ui/console.c:1742
#15 0x00005555558dccd2 in gui_update (opaque=opaque@entry=0x555556c17da0)
     at ../../devel/qemu/ui/console.c:209
#16 0x0000555555dbd520 in timerlist_run_timers (timer_list=0x555556937c50)
     at ../../devel/qemu/util/qemu-timer.c:574
#17 0x0000555555dbd520 in timerlist_run_timers (timer_list=0x555556937c50)
     at ../../devel/qemu/util/qemu-timer.c:499
#18 0x0000555555dbd74a in qemu_clock_run_timers (type=<optimized out>)
     at ../../devel/qemu/util/qemu-timer.c:670
#19 0x0000555555dbd74a in qemu_clock_run_all_timers () at ../../devel/qemu/util/qemu-timer.c:670

Looks like something in the vga code calls this with size=6910080
and thus triggers the alignment assertion?
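
For reference, the arithmetic behind that size, assuming a 4 KiB host page
(the usual x86_64 value; the backtrace itself does not show psize):
6910080 = 0x697080 = 1687 * 4096 + 128, so the length overshoots a page
boundary by 128 bytes. A small sketch, with misalignment() as a hypothetical
helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes by which n overshoots the
 * previous multiple of align.  align must be a power of two.
 */
static uint64_t misalignment(uint64_t n, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return n & (align - 1);
}
```

With the values from frame #4, misalignment(6910080, 4096) is 128, which is
non-zero and therefore enough to fail the new assert even though start is 0.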

  Thomas
Dr. David Alan Gilbert March 9, 2021, 2:57 p.m. UTC | #8
* Thomas Huth (thuth@redhat.com) wrote:
> On 09/03/2021 15.05, Keqian Zhu wrote:
> > 
> > 
> > On 2021/3/9 21:48, Thomas Huth wrote:
> > > On 17/12/2020 02.49, Keqian Zhu wrote:
> > > > The parameters start and size are transfered from QEMU memory
> > > > emulation layer. It can promise that they are TARGET_PAGE_SIZE
> > > > aligned. However, KVM needs they are qemu_real_page_size aligned.
> > > > 
> > > > Though no caller breaks this aligned requirement currently, we'd
> > > > better add an explicit assert to avoid future breaking.
> > > > 
> > > > Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> > > > ---
> > > >    accel/kvm/kvm-all.c | 7 +++++++
> > > >    1 file changed, 7 insertions(+)
> > > > 
> > > > ---
> > > > v2
> > > >    - Address Andrew's commment (Use assert instead of return err).
> > > > 
> > > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > > > index f6b16a8df8..73b195cc41 100644
> > > > --- a/accel/kvm/kvm-all.c
> > > > +++ b/accel/kvm/kvm-all.c
> > > > @@ -692,6 +692,10 @@ out:
> > > >    #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
> > > >    #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
> > > >    +/*
> > > > + * As the granule of kvm dirty log is qemu_real_host_page_size,
> > > > + * @start and @size are expected and restricted to align to it.
> > > > + */
> > > >    static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
> > > >                                      uint64_t size)
> > > >    {
> > > > @@ -701,6 +705,9 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
> > > >        unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
> > > >        int ret;
> > > >    +    /* Make sure start and size are qemu_real_host_page_size aligned */
> > > > +    assert(QEMU_IS_ALIGNED(start | size, psize));
> > > 
> > > Sorry, but that was a bad idea: It triggers and kills my Centos 6 VM:
> > > 
> > > $ qemu-system-x86_64 -accel kvm -hda ~/virt/images/centos6.qcow2 -m 1G
> > > qemu-system-x86_64: ../../devel/qemu/accel/kvm/kvm-all.c:690: kvm_log_clear_one_slot: Assertion `QEMU_IS_ALIGNED(start | size, psize)' failed.
> > > Aborted (core dumped)
> > Hi Thomas,
> > 
> > I think this patch is ok, maybe it trigger a potential bug?
> 
> Well, sure, there is either a bug somewhere else or in this new code. But it's certainly not normal that the assert() triggers, is it?
> 
> FWIW, here's a backtrace:
> 
> #0  0x00007ffff2c1584f in raise () at /lib64/libc.so.6
> #1  0x00007ffff2bffc45 in abort () at /lib64/libc.so.6
> #2  0x00007ffff2bffb19 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
> #3  0x00007ffff2c0de36 in .annobin_assert.c_end () at /lib64/libc.so.6
> #4  0x0000555555ba25f3 in kvm_log_clear_one_slot
>     (size=6910080, start=0, as_id=0, mem=0x555556e1ee00)
>     at ../../devel/qemu/accel/kvm/kvm-all.c:691
> #5  0x0000555555ba25f3 in kvm_physical_log_clear
>     (section=0x7fffffffd0b0, section=0x7fffffffd0b0, kml=0x555556dbaac0)
>     at ../../devel/qemu/accel/kvm/kvm-all.c:843
> #6  0x0000555555ba25f3 in kvm_log_clear (listener=0x555556dbaac0, section=0x7fffffffd0b0)
>     at ../../devel/qemu/accel/kvm/kvm-all.c:1253
> #7  0x0000555555b023d8 in memory_region_clear_dirty_bitmap
>     (mr=mr@entry=0x5555573394c0, start=start@entry=0, len=len@entry=6910080)
>     at ../../devel/qemu/softmmu/memory.c:2132
> #8  0x0000555555b313d9 in cpu_physical_memory_snapshot_and_clear_dirty
>     (mr=mr@entry=0x5555573394c0, offset=offset@entry=0, length=length@entry=6910080, client=client@entry=0) at ../../devel/qemu/softmmu/physmem.c:1109
> #9  0x0000555555b02483 in memory_region_snapshot_and_clear_dirty
>     (mr=mr@entry=0x5555573394c0, addr=addr@entry=0, size=size@entry=6910080, client=client@entry=0)
>     at ../../devel/qemu/softmmu/memory.c:2146

Could you please figure out which memory region this is?
WTH is that size? Is that really the problem that the size is just
crazy?

Dave

> #10 0x0000555555babe99 in vga_draw_graphic (full_update=0, s=0x5555573394b0)
>     at ../../devel/qemu/hw/display/vga.c:1661
> #11 0x0000555555babe99 in vga_update_display (opaque=0x5555573394b0)
>     at ../../devel/qemu/hw/display/vga.c:1784
> #12 0x0000555555babe99 in vga_update_display (opaque=0x5555573394b0)
>     at ../../devel/qemu/hw/display/vga.c:1757
> #13 0x00005555558ddd32 in graphic_hw_update (con=0x555556a11800)
>     at ../../devel/qemu/ui/console.c:279
> #14 0x00005555558dccd2 in dpy_refresh (s=0x555556c17da0) at ../../devel/qemu/ui/console.c:1742
> #15 0x00005555558dccd2 in gui_update (opaque=opaque@entry=0x555556c17da0)
>     at ../../devel/qemu/ui/console.c:209
> #16 0x0000555555dbd520 in timerlist_run_timers (timer_list=0x555556937c50)
>     at ../../devel/qemu/util/qemu-timer.c:574
> #17 0x0000555555dbd520 in timerlist_run_timers (timer_list=0x555556937c50)
>     at ../../devel/qemu/util/qemu-timer.c:499
> #18 0x0000555555dbd74a in qemu_clock_run_timers (type=<optimized out>)
>     at ../../devel/qemu/util/qemu-timer.c:670
> #19 0x0000555555dbd74a in qemu_clock_run_all_timers () at ../../devel/qemu/util/qemu-timer.c:670
> 
> Looks like something in the vga code calls this with size=6910080
> and thus triggers the alignment assertion?
> 
>  Thomas
zhukeqian March 9, 2021, 3:11 p.m. UTC | #9
Thanks for the bug report. I've just left work for the day; I'll dig into it tomorrow. Thanks :)

Keqian

Peter Xu March 9, 2021, 4:08 p.m. UTC | #10
On Tue, Mar 09, 2021 at 02:57:53PM +0000, Dr. David Alan Gilbert wrote:
> * Thomas Huth (thuth@redhat.com) wrote:
> > On 09/03/2021 15.05, Keqian Zhu wrote:
> > > 
> > > 
> > > On 2021/3/9 21:48, Thomas Huth wrote:
> > > > On 17/12/2020 02.49, Keqian Zhu wrote:
> > > > > The parameters start and size are transfered from QEMU memory
> > > > > emulation layer. It can promise that they are TARGET_PAGE_SIZE
> > > > > aligned. However, KVM needs they are qemu_real_page_size aligned.
> > > > > 
> > > > > Though no caller breaks this aligned requirement currently, we'd
> > > > > better add an explicit assert to avoid future breaking.
> > > > > 
> > > > > Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> > > > > ---
> > > > >    accel/kvm/kvm-all.c | 7 +++++++
> > > > >    1 file changed, 7 insertions(+)
> > > > > 
> > > > > ---
> > > > > v2
> > > > >    - Address Andrew's commment (Use assert instead of return err).
> > > > > 
> > > > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > > > > index f6b16a8df8..73b195cc41 100644
> > > > > --- a/accel/kvm/kvm-all.c
> > > > > +++ b/accel/kvm/kvm-all.c
> > > > > @@ -692,6 +692,10 @@ out:
> > > > >    #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
> > > > >    #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
> > > > >    +/*
> > > > > + * As the granule of kvm dirty log is qemu_real_host_page_size,
> > > > > + * @start and @size are expected and restricted to align to it.
> > > > > + */
> > > > >    static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
> > > > >                                      uint64_t size)
> > > > >    {
> > > > > @@ -701,6 +705,9 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
> > > > >        unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
> > > > >        int ret;
> > > > >    +    /* Make sure start and size are qemu_real_host_page_size aligned */
> > > > > +    assert(QEMU_IS_ALIGNED(start | size, psize));
> > > > 
> > > > Sorry, but that was a bad idea: It triggers and kills my Centos 6 VM:
> > > > 
> > > > $ qemu-system-x86_64 -accel kvm -hda ~/virt/images/centos6.qcow2 -m 1G
> > > > qemu-system-x86_64: ../../devel/qemu/accel/kvm/kvm-all.c:690: kvm_log_clear_one_slot: Assertion `QEMU_IS_ALIGNED(start | size, psize)' failed.
> > > > Aborted (core dumped)
> > > Hi Thomas,
> > > 
> > > I think this patch is ok, maybe it trigger a potential bug?
> > 
> > Well, sure, there is either a bug somewhere else or in this new code. But it's certainly not normal that the assert() triggers, is it?
> > 
> > FWIW, here's a backtrace:
> > 
> > #0  0x00007ffff2c1584f in raise () at /lib64/libc.so.6
> > #1  0x00007ffff2bffc45 in abort () at /lib64/libc.so.6
> > #2  0x00007ffff2bffb19 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
> > #3  0x00007ffff2c0de36 in .annobin_assert.c_end () at /lib64/libc.so.6
> > #4  0x0000555555ba25f3 in kvm_log_clear_one_slot
> >     (size=6910080, start=0, as_id=0, mem=0x555556e1ee00)
> >     at ../../devel/qemu/accel/kvm/kvm-all.c:691
> > #5  0x0000555555ba25f3 in kvm_physical_log_clear
> >     (section=0x7fffffffd0b0, section=0x7fffffffd0b0, kml=0x555556dbaac0)
> >     at ../../devel/qemu/accel/kvm/kvm-all.c:843
> > #6  0x0000555555ba25f3 in kvm_log_clear (listener=0x555556dbaac0, section=0x7fffffffd0b0)
> >     at ../../devel/qemu/accel/kvm/kvm-all.c:1253
> > #7  0x0000555555b023d8 in memory_region_clear_dirty_bitmap
> >     (mr=mr@entry=0x5555573394c0, start=start@entry=0, len=len@entry=6910080)
> >     at ../../devel/qemu/softmmu/memory.c:2132
> > #8  0x0000555555b313d9 in cpu_physical_memory_snapshot_and_clear_dirty
> >     (mr=mr@entry=0x5555573394c0, offset=offset@entry=0, length=length@entry=6910080, client=client@entry=0) at ../../devel/qemu/softmmu/physmem.c:1109
> > #9  0x0000555555b02483 in memory_region_snapshot_and_clear_dirty
> >     (mr=mr@entry=0x5555573394c0, addr=addr@entry=0, size=size@entry=6910080, client=client@entry=0)
> >     at ../../devel/qemu/softmmu/memory.c:2146
> 
> Could you please figure out which memory region this is?
> WTH is that size? Is that really the problem that the size is just
> crazy?

It seems vga_draw_graphic() can call memory_region_snapshot_and_clear_dirty()
with a size that is not page-aligned.  cpu_physical_memory_snapshot_and_clear_dirty()
takes care of most of the alignment, but the "length" parameter is still
passed on without any alignment check.

Cc Gerd too.

I'm not sure how many use cases like this there are. If there are a lot, maybe
we should indeed drop this assert patch and instead, in kvm_log_clear_one_slot(),
ALIGN_DOWN the size to the smallest host page size. Say, if we need to
clear the dirty bits for range (0, 0x1020), we should only clear (0, 0x1000), since
there can still be dirty data in range (0x1020, 0x2000).
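
A minimal sketch of that rounding, not a proposed patch: align_down() is a
local stand-in for QEMU's ALIGN_DOWN(), clearable_size() is a hypothetical
helper name, and start is assumed to be host-page aligned already (as it is
in the backtrace, where start=0).

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for QEMU's ALIGN_DOWN(); align must be a power of two. */
static uint64_t align_down(uint64_t n, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return n & ~(align - 1);
}

/*
 * Hypothetical helper: how many bytes of a request that starts on a
 * host-page boundary may have their dirty bits cleared.  A trailing
 * partial page is left out, because the rest of that page may still
 * hold dirty data that the caller did not ask about.
 */
static uint64_t clearable_size(uint64_t size, uint64_t psize)
{
    return align_down(size, psize);
}
```

With a 4 KiB page, clearable_size(0x1020, 0x1000) gives 0x1000, matching the
example above, and the VGA length 6910080 shrinks to 6909952.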

Thanks,

> 
> Dave
> 
> > #10 0x0000555555babe99 in vga_draw_graphic (full_update=0, s=0x5555573394b0)
> >     at ../../devel/qemu/hw/display/vga.c:1661
> > #11 0x0000555555babe99 in vga_update_display (opaque=0x5555573394b0)
> >     at ../../devel/qemu/hw/display/vga.c:1784
> > #12 0x0000555555babe99 in vga_update_display (opaque=0x5555573394b0)
> >     at ../../devel/qemu/hw/display/vga.c:1757
> > #13 0x00005555558ddd32 in graphic_hw_update (con=0x555556a11800)
> >     at ../../devel/qemu/ui/console.c:279
> > #14 0x00005555558dccd2 in dpy_refresh (s=0x555556c17da0) at ../../devel/qemu/ui/console.c:1742
> > #15 0x00005555558dccd2 in gui_update (opaque=opaque@entry=0x555556c17da0)
> >     at ../../devel/qemu/ui/console.c:209
> > #16 0x0000555555dbd520 in timerlist_run_timers (timer_list=0x555556937c50)
> >     at ../../devel/qemu/util/qemu-timer.c:574
> > #17 0x0000555555dbd520 in timerlist_run_timers (timer_list=0x555556937c50)
> >     at ../../devel/qemu/util/qemu-timer.c:499
> > #18 0x0000555555dbd74a in qemu_clock_run_timers (type=<optimized out>)
> >     at ../../devel/qemu/util/qemu-timer.c:670
> > #19 0x0000555555dbd74a in qemu_clock_run_all_timers () at ../../devel/qemu/util/qemu-timer.c:670
> > 
> > Looks like something in the vga code calls this with size=6910080
> > and thus triggers the alignment assertion?
> > 
> >  Thomas
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
Thomas Huth March 9, 2021, 4:20 p.m. UTC | #11
On 09/03/2021 15.57, Dr. David Alan Gilbert wrote:
> * Thomas Huth (thuth@redhat.com) wrote:
>> On 09/03/2021 15.05, Keqian Zhu wrote:
>>>
>>>
>>> On 2021/3/9 21:48, Thomas Huth wrote:
>>>> On 17/12/2020 02.49, Keqian Zhu wrote:
>>>>> The parameters start and size are transferred from the QEMU memory
>>>>> emulation layer, which can promise that they are TARGET_PAGE_SIZE
>>>>> aligned. However, KVM needs them to be qemu_real_host_page_size aligned.
>>>>>
>>>>> Though no caller breaks this alignment requirement currently, we'd
>>>>> better add an explicit assert to avoid future breakage.
>>>>>
>>>>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>>>>> ---
>>>>>     accel/kvm/kvm-all.c | 7 +++++++
>>>>>     1 file changed, 7 insertions(+)
>>>>>
>>>>> ---
>>>>> v2
>>>>>     - Address Andrew's comment (Use assert instead of return err).
>>>>>
>>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>>>> index f6b16a8df8..73b195cc41 100644
>>>>> --- a/accel/kvm/kvm-all.c
>>>>> +++ b/accel/kvm/kvm-all.c
>>>>> @@ -692,6 +692,10 @@ out:
>>>>>     #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
>>>>>     #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
>>>>>     +/*
>>>>> + * As the granule of kvm dirty log is qemu_real_host_page_size,
>>>>> + * @start and @size are expected and restricted to align to it.
>>>>> + */
>>>>>     static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>>>>                                       uint64_t size)
>>>>>     {
>>>>> @@ -701,6 +705,9 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>>>>         unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
>>>>>         int ret;
>>>>>     +    /* Make sure start and size are qemu_real_host_page_size aligned */
>>>>> +    assert(QEMU_IS_ALIGNED(start | size, psize));
>>>>
>>>> Sorry, but that was a bad idea: It triggers and kills my Centos 6 VM:
>>>>
>>>> $ qemu-system-x86_64 -accel kvm -hda ~/virt/images/centos6.qcow2 -m 1G
>>>> qemu-system-x86_64: ../../devel/qemu/accel/kvm/kvm-all.c:690: kvm_log_clear_one_slot: Assertion `QEMU_IS_ALIGNED(start | size, psize)' failed.
>>>> Aborted (core dumped)
>>> Hi Thomas,
>>>
>>> I think this patch is ok, maybe it trigger a potential bug?
>>
>> Well, sure, there is either a bug somewhere else or in this new code. But it's certainly not normal that the assert() triggers, is it?
>>
>> FWIW, here's a backtrace:
>>
>> #0  0x00007ffff2c1584f in raise () at /lib64/libc.so.6
>> #1  0x00007ffff2bffc45 in abort () at /lib64/libc.so.6
>> #2  0x00007ffff2bffb19 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
>> #3  0x00007ffff2c0de36 in .annobin_assert.c_end () at /lib64/libc.so.6
>> #4  0x0000555555ba25f3 in kvm_log_clear_one_slot
>>      (size=6910080, start=0, as_id=0, mem=0x555556e1ee00)
>>      at ../../devel/qemu/accel/kvm/kvm-all.c:691
>> #5  0x0000555555ba25f3 in kvm_physical_log_clear
>>      (section=0x7fffffffd0b0, section=0x7fffffffd0b0, kml=0x555556dbaac0)
>>      at ../../devel/qemu/accel/kvm/kvm-all.c:843
>> #6  0x0000555555ba25f3 in kvm_log_clear (listener=0x555556dbaac0, section=0x7fffffffd0b0)
>>      at ../../devel/qemu/accel/kvm/kvm-all.c:1253
>> #7  0x0000555555b023d8 in memory_region_clear_dirty_bitmap
>>      (mr=mr@entry=0x5555573394c0, start=start@entry=0, len=len@entry=6910080)
>>      at ../../devel/qemu/softmmu/memory.c:2132
>> #8  0x0000555555b313d9 in cpu_physical_memory_snapshot_and_clear_dirty
>>      (mr=mr@entry=0x5555573394c0, offset=offset@entry=0, length=length@entry=6910080, client=client@entry=0) at ../../devel/qemu/softmmu/physmem.c:1109
>> #9  0x0000555555b02483 in memory_region_snapshot_and_clear_dirty
>>      (mr=mr@entry=0x5555573394c0, addr=addr@entry=0, size=size@entry=6910080, client=client@entry=0)
>>      at ../../devel/qemu/softmmu/memory.c:2146
> 
> Could you please figure out which memory region this is?
> WTH is that size? Is that really the problem that the size is just
> crazy?

The answer is one stack frame below...

>> #10 0x0000555555babe99 in vga_draw_graphic (full_update=0, s=0x5555573394b0)
>>      at ../../devel/qemu/hw/display/vga.c:1661

The vga code basically does this:

     region_start = (s->start_addr * 4);
     region_end = region_start + (ram_addr_t)s->line_offset * height;
     region_end += width * depth / 8; /* scanline length */
     region_end -= s->line_offset;
     ...
     memory_region_snapshot_and_clear_dirty(... region_end - region_start...);

Thus it uses a size that is nowhere guaranteed to be a multiple
of the page size.
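
To make the arithmetic concrete, here is a small standalone sketch of the same
computation. The function vga_region_size and the helper page_remainder are
illustrative names only, and the sample values in the note below are
hypothetical: the thread does not say which video mode the guest was using.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors the size computation quoted above.  The function and its
 * parameter list are illustrative only; the real code reads these
 * values from the VGA state.
 */
static uint64_t vga_region_size(uint64_t start_addr, uint64_t line_offset,
                                uint64_t height, uint64_t width,
                                uint64_t depth)
{
    uint64_t region_start = start_addr * 4;
    uint64_t region_end = region_start + line_offset * height;

    region_end += width * depth / 8;   /* scanline length */
    region_end -= line_offset;
    return region_end - region_start;
}

/* Bytes by which size overshoots the last page boundary (0 if aligned). */
static uint64_t page_remainder(uint64_t size, uint64_t psize)
{
    return size % psize;
}
```

For example, the assumed values line_offset=7680, height=900, width=1920 and
depth=24 happen to give 6910080, the size seen in the backtrace, and
page_remainder(6910080, 4096) is 128, so the range ends 128 bytes into a
4 KiB page.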

A similar usage can be seen in other devices, too (e.g. sm501.c),
so either there is a bug in the assert() statement, or we have
a problem with many devices...

  Thomas
Peter Maydell March 9, 2021, 4:26 p.m. UTC | #12
On Tue, 9 Mar 2021 at 16:20, Thomas Huth <thuth@redhat.com> wrote:
> The vga code basically does this:
>
>      region_start = (s->start_addr * 4);
>      region_end = region_start + (ram_addr_t)s->line_offset * height;
>      region_end += width * depth / 8; /* scanline length */
>      region_end -= s->line_offset;
>      ...
>      memory_region_snapshot_and_clear_dirty(... region_end - region_start...);
>
> Thus it uses a size that is nowhere guaranteed to be a multiple
> of the page size.

The documentation comment for memory_region_snapshot_and_clear_dirty()
says:
 * The dirty bitmap region which gets copyed into the snapshot (and
 * cleared afterwards) can be larger than requested.  The boundaries
 * are rounded up/down

That is, it is the job of memory_region_snapshot_and_clear_dirty()
to round the boundaries up/down to whatever extent it requires
internally.
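
A minimal sketch of what such outward rounding could look like, assuming a
power-of-two page size. struct page_range and expand_to_pages are hypothetical
names for illustration, not the QEMU implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a page-aligned range. */
struct page_range {
    uint64_t start;
    uint64_t size;
};

/*
 * Hypothetical helper: widen [start, start + size) so that both ends
 * fall on page boundaries.  psize must be a power of two.
 */
static struct page_range expand_to_pages(uint64_t start, uint64_t size,
                                         uint64_t psize)
{
    uint64_t mask = psize - 1;
    uint64_t first = start & ~mask;                 /* round start down */
    uint64_t last = (start + size + mask) & ~mask;  /* round end up */
    struct page_range r = { first, last - first };

    return r;
}
```

With 4 KiB pages, expand_to_pages(0, 6910080, 4096) gives a size of 6914048,
i.e. the request is widened to cover the final, partially used page. Whether
widening is also safe for the clear step is exactly what the rest of the
thread discusses.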

thanks
-- PMM
Paolo Bonzini March 9, 2021, 7:03 p.m. UTC | #13
On 09/03/21 17:26, Peter Maydell wrote:
> The documentation comment for memory_region_snapshot_and_clear_dirty()
> says:
>   * The dirty bitmap region which gets copyed into the snapshot (and
>   * cleared afterwards) can be larger than requested.  The boundaries
>   * are rounded up/down
> 
> That is, it is the job of memory_region_snapshot_and_clear_dirty()
> to round the boundaries up/down to whatever extent it requires
> internally.

Or alternatively of the log_sync/log_clear callbacks.  I'll queue a 
revert for now, anyway.

Paolo
zhukeqian March 10, 2021, 1:57 a.m. UTC | #14
On 2021/3/10 0:08, Peter Xu wrote:
> On Tue, Mar 09, 2021 at 02:57:53PM +0000, Dr. David Alan Gilbert wrote:
>> * Thomas Huth (thuth@redhat.com) wrote:
>>> On 09/03/2021 15.05, Keqian Zhu wrote:
>>>>
>>>>
>>>> On 2021/3/9 21:48, Thomas Huth wrote:
>>>>> On 17/12/2020 02.49, Keqian Zhu wrote:
[...]

>>>
>>> #0  0x00007ffff2c1584f in raise () at /lib64/libc.so.6
>>> #1  0x00007ffff2bffc45 in abort () at /lib64/libc.so.6
>>> #2  0x00007ffff2bffb19 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
>>> #3  0x00007ffff2c0de36 in .annobin_assert.c_end () at /lib64/libc.so.6
>>> #4  0x0000555555ba25f3 in kvm_log_clear_one_slot
>>>     (size=6910080, start=0, as_id=0, mem=0x555556e1ee00)
>>>     at ../../devel/qemu/accel/kvm/kvm-all.c:691
>>> #5  0x0000555555ba25f3 in kvm_physical_log_clear
>>>     (section=0x7fffffffd0b0, section=0x7fffffffd0b0, kml=0x555556dbaac0)
>>>     at ../../devel/qemu/accel/kvm/kvm-all.c:843
>>> #6  0x0000555555ba25f3 in kvm_log_clear (listener=0x555556dbaac0, section=0x7fffffffd0b0)
>>>     at ../../devel/qemu/accel/kvm/kvm-all.c:1253
>>> #7  0x0000555555b023d8 in memory_region_clear_dirty_bitmap
>>>     (mr=mr@entry=0x5555573394c0, start=start@entry=0, len=len@entry=6910080)
>>>     at ../../devel/qemu/softmmu/memory.c:2132
>>> #8  0x0000555555b313d9 in cpu_physical_memory_snapshot_and_clear_dirty
>>>     (mr=mr@entry=0x5555573394c0, offset=offset@entry=0, length=length@entry=6910080, client=client@entry=0) at ../../devel/qemu/softmmu/physmem.c:1109
>>> #9  0x0000555555b02483 in memory_region_snapshot_and_clear_dirty
>>>     (mr=mr@entry=0x5555573394c0, addr=addr@entry=0, size=size@entry=6910080, client=client@entry=0)
>>>     at ../../devel/qemu/softmmu/memory.c:2146
>>
>> Could you please figure out which memory region this is?
>> WTH is that size? Is that really the problem that the size is just
>> crazy?
> 
> It seems vga_draw_graphic() can call memory_region_snapshot_and_clear_dirty()
> with a size that is not page-aligned.  cpu_physical_memory_snapshot_and_clear_dirty()
> takes care of most of the alignment, but the "length" parameter is still
> passed on without any alignment check.
> 
> Cc Gerd too.
> 
> I'm not sure how many use cases like this there are.  If there are a lot, maybe
> we can indeed drop this assert patch and instead, in kvm_log_clear_one_slot(),
> ALIGN_DOWN the size to the smallest host page size.  Say, if we need to clear
> the dirty bits for range (0, 0x1020), we should only clear (0, 0x1000), since
> there can still be dirty data in range (0x1020, 0x2000).
Right, @start and @size should be properly aligned by kvm_log_clear_one_slot()
itself. We shouldn't clear areas beyond what the caller expected.

An assert is not appropriate here.

Thanks,
Keqian
> 
> Thanks,
> 
>>
>> Dave
>>
>>> #10 0x0000555555babe99 in vga_draw_graphic (full_update=0, s=0x5555573394b0)
>>>     at ../../devel/qemu/hw/display/vga.c:1661
>>> #11 0x0000555555babe99 in vga_update_display (opaque=0x5555573394b0)
>>>     at ../../devel/qemu/hw/display/vga.c:1784
>>> #12 0x0000555555babe99 in vga_update_display (opaque=0x5555573394b0)
>>>     at ../../devel/qemu/hw/display/vga.c:1757
>>> #13 0x00005555558ddd32 in graphic_hw_update (con=0x555556a11800)
>>>     at ../../devel/qemu/ui/console.c:279
>>> #14 0x00005555558dccd2 in dpy_refresh (s=0x555556c17da0) at ../../devel/qemu/ui/console.c:1742
>>> #15 0x00005555558dccd2 in gui_update (opaque=opaque@entry=0x555556c17da0)
>>>     at ../../devel/qemu/ui/console.c:209
>>> #16 0x0000555555dbd520 in timerlist_run_timers (timer_list=0x555556937c50)
>>>     at ../../devel/qemu/util/qemu-timer.c:574
>>> #17 0x0000555555dbd520 in timerlist_run_timers (timer_list=0x555556937c50)
>>>     at ../../devel/qemu/util/qemu-timer.c:499
>>> #18 0x0000555555dbd74a in qemu_clock_run_timers (type=<optimized out>)
>>>     at ../../devel/qemu/util/qemu-timer.c:670
>>> #19 0x0000555555dbd74a in qemu_clock_run_all_timers () at ../../devel/qemu/util/qemu-timer.c:670
>>>
>>> Looks like something in the vga code calls this with size=6910080
>>> and thus triggers the alignment assertion?
>>>
>>>  Thomas
>> -- 
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>
>

Patch

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f6b16a8df8..73b195cc41 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -692,6 +692,10 @@  out:
 #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
 #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
 
+/*
+ * As the granule of kvm dirty log is qemu_real_host_page_size,
+ * @start and @size are expected and restricted to align to it.
+ */
 static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
                                   uint64_t size)
 {
@@ -701,6 +705,9 @@  static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
     unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
     int ret;
 
+    /* Make sure start and size are qemu_real_host_page_size aligned */
+    assert(QEMU_IS_ALIGNED(start | size, psize));
+
     /*
      * We need to extend either the start or the size or both to
      * satisfy the KVM interface requirement.  Firstly, do the start