Message ID: 20201208114013.875-1-yuzenghui@huawei.com (mailing list archive)
State: New, archived
Series: kvm: Take into account the unaligned section size when preparing bitmap
Hi, Zenghui,

On Tue, Dec 08, 2020 at 07:40:13PM +0800, Zenghui Yu wrote:
> The kernel KVM_CLEAR_DIRTY_LOG interface has alignment requirements on
> both the start and the size of the given range of pages. We have been
> careful to handle the unaligned cases when performing CLEAR on one slot.
> But it seems that we forgot to take the unaligned *size* case into
> account when preparing the bitmap for the interface, and we may end up
> clearing dirty status for pages outside of [start, start + size).

Thanks for the patch, though my understanding is that this is not a bug.

Please have a look at kvm_memslot_init_dirty_bitmap() where we'll allocate
the dirty bitmap to be aligned to 8 bytes (assuming that's the possible max
of the value sizeof(unsigned long)). That exactly covers 64 pages.

So here as long as start_delta==0 (so the value of "bmap_npages - size /
psize" won't really matter a lot, imho), then we'll definitely have
KVMSlot.dirty_bmap long enough to cover the range we'd like to clear.

Note that the size of KVMSlot.dirty_bmap can be bigger than the actual size
of the kvm memslot; however, since kvm_memslot_init_dirty_bitmap()
initialized it to all zero, the extra bits will always be zero for the
whole lifecycle of the vm/bitmap.

Thanks,

> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
> ---
>  accel/kvm/kvm-all.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index bed2455ca5..05d323ba1f 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -747,7 +747,7 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>      assert(bmap_start % BITS_PER_LONG == 0);
>      /* We should never do log_clear before log_sync */
>      assert(mem->dirty_bmap);
> -    if (start_delta) {
> +    if (start_delta || bmap_npages - size / psize) {
>          /* Slow path - we need to manipulate a temp bitmap */
>          bmap_clear = bitmap_new(bmap_npages);
>          bitmap_copy_with_src_offset(bmap_clear, mem->dirty_bmap,
> @@ -760,7 +760,10 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>          bitmap_clear(bmap_clear, 0, start_delta);
>          d.dirty_bitmap = bmap_clear;
>      } else {
> -        /* Fast path - start address aligns well with BITS_PER_LONG */
> +        /*
> +         * Fast path - both start and size align well with BITS_PER_LONG
> +         * (or the end of memory slot)
> +         */
>          d.dirty_bitmap = mem->dirty_bmap + BIT_WORD(bmap_start);
>      }
>
> --
> 2.19.1
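[Editor's note] To make the allocation argument above concrete, here is a
minimal standalone sketch of the rounding math; the page size, slot size,
and variable names are illustrative assumptions, not QEMU's actual code:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t psize = 4096;               /* host page size (assumed) */
        uint64_t memory_size = 100 * psize;  /* hypothetical 100-page slot */

        uint64_t npages = memory_size / psize;
        /* Round up to an 8-byte (64-bit) granule, as the allocation does */
        uint64_t bmap_bytes = ((npages + 63) / 64) * 8;

        /* 100 pages -> 16 bytes -> 128 pages covered; bits 100..127 are 0 */
        printf("bitmap: %" PRIu64 " bytes, covers %" PRIu64 " pages\n",
               bmap_bytes, bmap_bytes * 8);
        return 0;
    }

So the bitmap always spans a whole multiple of 64 pages, and the trailing
bits beyond the slot start out (and stay) zero, which is the basis of
Peter's "not a bug" argument.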
Hi Peter,

Thanks for having a look at it.

On 2020/12/8 23:16, Peter Xu wrote:
> Hi, Zenghui,
>
> On Tue, Dec 08, 2020 at 07:40:13PM +0800, Zenghui Yu wrote:
>> The kernel KVM_CLEAR_DIRTY_LOG interface has alignment requirements on
>> both the start and the size of the given range of pages. We have been
>> careful to handle the unaligned cases when performing CLEAR on one slot.
>> But it seems that we forgot to take the unaligned *size* case into
>> account when preparing the bitmap for the interface, and we may end up
>> clearing dirty status for pages outside of [start, start + size).
>
> Thanks for the patch, though my understanding is that this is not a bug.
>
> Please have a look at kvm_memslot_init_dirty_bitmap() where we'll allocate
> the dirty bitmap to be aligned to 8 bytes (assuming that's the possible
> max of the value sizeof(unsigned long)). That exactly covers 64 pages.
>
> So here as long as start_delta==0 (so the value of "bmap_npages - size /
> psize" won't really matter a lot, imho), then we'll definitely have
> KVMSlot.dirty_bmap long enough to cover the range we'd like to clear.

I agree. But actually I'm not saying that KVMSlot.dirty_bmap is not long
enough. What I have in mind is something like:

    // psize = qemu_real_host_page_size;
    // slot.start_addr = 0;
    // slot.memory_size = 64 * psize;

    kvm_log_clear_one_slot(slot, as, 0 * psize, 32 * psize);   --> [1]
    kvm_log_clear_one_slot(slot, as, 32 * psize, 32 * psize);  --> [2]

So the @size is not aligned with 64 pages. Before this patch, we'll clear
dirty status for all pages (0-63) through [1]. It looks to me that this
violates the caller's expectation, since we only want to clear pages (0-31).

As I said, I don't think this will happen in practice -- the migration code
should always provide us with a 64-page aligned section (right?). I'm just
thinking about the correctness of the specific algorithm used by
kvm_log_clear_one_slot().

Or maybe I've missed some other obvious points ;-) ?


Thanks,
Zenghui

> Note that the size of KVMSlot.dirty_bmap can be bigger than the actual
> size of the kvm memslot; however, since kvm_memslot_init_dirty_bitmap()
> initialized it to all zero, the extra bits will always be zero for the
> whole lifecycle of the vm/bitmap.

[...]
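[Editor's note] The arithmetic of Zenghui's example can be traced with a
short standalone program. The constants mirror kvm-all.c's 64-page clear
granule (KVM_CLEAR_LOG_SHIFT == 6); everything else is an illustrative
stand-in:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t psize = 4096;        /* qemu_real_host_page_size */
        const uint64_t align = psize << 6;  /* 64-page clear granule */
        const uint64_t start = 0;           /* case [1] in the example */
        const uint64_t size  = 32 * psize;

        uint64_t bmap_start  = start & ~(align - 1);
        uint64_t start_delta = (start - bmap_start) / psize;        /* 0  */
        uint64_t bmap_npages = ((size + start_delta * psize + align - 1)
                                / align) << 6;                      /* 64 */

        /*
         * The pre-patch check was "if (start_delta)": it is 0 here, so the
         * fast path hands num_pages == 64 to KVM_CLEAR_DIRTY_LOG and pages
         * 32..63 get cleared too, beyond the caller's [start, start+size).
         */
        printf("start_delta=%" PRIu64 ", bmap_npages=%" PRIu64
               ", requested pages=%" PRIu64 "\n",
               start_delta, bmap_npages, size / psize);
        return 0;
    }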
On Wed, Dec 09, 2020 at 10:33:41AM +0800, Zenghui Yu wrote:
> Hi Peter,
>
> Thanks for having a look at it.
>
[...]
>
> I agree. But actually I'm not saying that KVMSlot.dirty_bmap is not long
> enough. What I have in mind is something like:
>
>     // psize = qemu_real_host_page_size;
>     // slot.start_addr = 0;
>     // slot.memory_size = 64 * psize;
>
>     kvm_log_clear_one_slot(slot, as, 0 * psize, 32 * psize);   --> [1]
>     kvm_log_clear_one_slot(slot, as, 32 * psize, 32 * psize);  --> [2]
>
> So the @size is not aligned with 64 pages. Before this patch, we'll clear
> dirty status for all pages (0-63) through [1]. It looks to me that this
> violates the caller's expectation, since we only want to clear pages
> (0-31).

Now I see; I think you're right. :)

> As I said, I don't think this will happen in practice -- the migration
> code should always provide us with a 64-page aligned section (right?).

Yes, migration is the major consumer, and that should be guaranteed
indeed; see CLEAR_BITMAP_SHIFT_MIN.

Not sure about VGA - that should try to do log clear even without
migration, but I guess that satisfies the 64-page alignment too, since
it's not a huge number (256KB). The VGA buffer size could be related to
screen resolution, then N*1024*768 could still guarantee a safe use of the
fast path.

> I'm just thinking about the correctness of the specific algorithm used
> by kvm_log_clear_one_slot().

Yeah, then I think it's okay to have this, just in case someday we'll hit
it.

Acked-by: Peter Xu <peterx@redhat.com>

(It would be nicer if the above example could be squashed into the commit
message)

Thanks,
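[Editor's note] A rough illustration of the guarantee Peter points at.
Treat the constant as an assumption: CLEAR_BITMAP_SHIFT_MIN lives in
migration/ram.c, the clear-bitmap chunk is counted in target pages, and
the floor appears to be 6, i.e. 64 pages:

    #include <stdio.h>

    int main(void)
    {
        const unsigned shift_min = 6;        /* CLEAR_BITMAP_SHIFT_MIN? */
        const unsigned long target_psize = 4096;
        const unsigned long chunk_pages = 1UL << shift_min;      /* 64 */
        const unsigned long chunk_bytes = chunk_pages * target_psize;

        /* And the VGA case: a 256KB buffer is also 64 4KiB pages. */
        printf("min clear chunk: %lu pages (%lu bytes); 256KB = %lu pages\n",
               chunk_pages, chunk_bytes, (256UL * 1024) / target_psize);
        return 0;
    }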
Hi,

I see that if start or size is not PAGE aligned, it also clears areas
beyond the caller's expectation, so do we also need to consider this?

Thanks,
Keqian

On 2020/12/9 10:33, Zenghui Yu wrote:
> Hi Peter,
>
> Thanks for having a look at it.
>
[...]
Keqian,

On Thu, Dec 10, 2020 at 09:46:06AM +0800, zhukeqian wrote:
> Hi,
>
> I see that if start or size is not PAGE aligned, it also clears areas
> beyond the caller's expectation, so do we also need to consider this?

Could you elaborate?

If start_delta != 0, kvm_log_clear_one_slot() should already go the slow
path.

Thanks,
On 2020/12/10 10:08, Peter Xu wrote:
> Keqian,
>
> On Thu, Dec 10, 2020 at 09:46:06AM +0800, zhukeqian wrote:
>> Hi,
>>
>> I see that if start or size is not PAGE aligned, it also clears areas
>> beyond the caller's expectation, so do we also need to consider this?
>
> Could you elaborate?
>
> If start_delta != 0, kvm_log_clear_one_slot() should already go the slow
> path.
>
> Thanks,
>

Hi Peter,

    start_delta /= psize;

If start is not PAGE aligned, then start_delta is not PAGE aligned, so I
think the above code will implicitly extend our start to be PAGE aligned.

I suggest that we shrink the start and (start + size) to be PAGE aligned
at the beginning of this function. Maybe I missed something?

Keqian.
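[Editor's note] One possible shape of the shrink being suggested, sketched
as a hypothetical standalone helper rather than an actual patch (QEMU has
its own alignment macros for this; psize must be a power of two here):

    #include <stdint.h>
    #include <stdio.h>

    static inline uint64_t align_down(uint64_t v, uint64_t a)
    {
        return v & ~(a - 1);
    }

    static inline uint64_t align_up(uint64_t v, uint64_t a)
    {
        return (v + a - 1) & ~(a - 1);
    }

    /* Keep only whole pages inside [start, start + size). */
    static void shrink_to_page_boundaries(uint64_t *start, uint64_t *size,
                                          uint64_t psize)
    {
        uint64_t first = align_up(*start, psize);           /* first page */
        uint64_t end   = align_down(*start + *size, psize); /* last end   */

        *start = first;
        *size  = end > first ? end - first : 0;
    }

    int main(void)
    {
        uint64_t start = 4097, size = 2 * 4096;  /* deliberately unaligned */

        shrink_to_page_boundaries(&start, &size, 4096);
        /* start=8192 size=4096: only the one fully covered page remains */
        printf("start=%lu size=%lu\n", (unsigned long)start,
               (unsigned long)size);
        return 0;
    }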
On 2020/12/10 10:53, zhukeqian wrote:
>
> On 2020/12/10 10:08, Peter Xu wrote:
>> Keqian,
>>
>> On Thu, Dec 10, 2020 at 09:46:06AM +0800, zhukeqian wrote:
>>> Hi,
>>>
>>> I see that if start or size is not PAGE aligned, it also clears areas
>>> beyond the caller's expectation, so do we also need to consider this?

Ah, I really don't anticipate this to happen, but ... The question is what
the caller's expectation is if a non-PAGE-aligned @start is given. Should
we clear dirty state for the page which covers the unaligned @start? Or
just skip it?

>>
>> Could you elaborate?
>>
>> If start_delta != 0, kvm_log_clear_one_slot() should already go the slow
>> path.
>>
>> Thanks,
>>
>
> Hi Peter,
>
>     start_delta /= psize;
>
> If start is not PAGE aligned, then start_delta is not PAGE aligned, so I
> think the above code will implicitly extend our start to be PAGE aligned.
>
> I suggest that we shrink the start and (start + size) to be PAGE aligned
> at the beginning of this function.

I don't object to doing so (though I haven't checked the code carefully).
I also don't think there is much we can do, given that the caller doesn't
care about which *exact pages* to clear.

Thanks,
Zenghui
On 2020/12/10 5:09, Peter Xu wrote:
> On Wed, Dec 09, 2020 at 10:33:41AM +0800, Zenghui Yu wrote:
>> Hi Peter,
>>
>> Thanks for having a look at it.
>>
[...]
>>
>> As I said, I don't think this will happen in practice -- the migration
>> code should always provide us with a 64-page aligned section (right?).
>
> Yes, migration is the major consumer, and that should be guaranteed
> indeed; see CLEAR_BITMAP_SHIFT_MIN.
>
> Not sure about VGA - that should try to do log clear even without
> migration, but I guess that satisfies the 64-page alignment too, since
> it's not a huge number (256KB). The VGA buffer size could be related to
> screen resolution, then N*1024*768 could still guarantee a safe use of
> the fast path.
>
>> I'm just thinking about the correctness of the specific algorithm used
>> by kvm_log_clear_one_slot().
>
> Yeah, then I think it's okay to have this, just in case someday we'll
> hit it.
>
> Acked-by: Peter Xu <peterx@redhat.com>

Thanks for that.

> (It would be nicer if the above example could be squashed into the commit
> message)

I'll squash it if v2 is needed.

Thanks,
Zenghui
On Thu, Dec 10, 2020 at 10:53:23AM +0800, zhukeqian wrote:
>
> On 2020/12/10 10:08, Peter Xu wrote:
[...]
>
> Hi Peter,
>
>     start_delta /= psize;
>
> If start is not PAGE aligned, then start_delta is not PAGE aligned, so I
> think the above code will implicitly extend our start to be PAGE aligned.
>
> I suggest that we shrink the start and (start + size) to be PAGE aligned
> at the beginning of this function.

Callers should pass in a TARGET_PAGE_SIZE-aligned size, so at least x86_64
should be pretty safe, since host/guest page sizes match.

Though indeed I must confess I don't know how it works in general when
host page size != target page size, at least for migration. For example, I
believe kvm dirty logging is host-page-size based, though migration should
be migrating pages at guest page size granularity when it spots a dirty
bit set.
On 2020/12/10 22:50, Peter Xu wrote:
> On Thu, Dec 10, 2020 at 10:53:23AM +0800, zhukeqian wrote:
[...]
>>
>> I suggest that we shrink the start and (start + size) to be PAGE aligned
>> at the beginning of this function.
>
> Callers should pass in a TARGET_PAGE_SIZE-aligned size, so at least
> x86_64 should be pretty safe, since host/guest page sizes match.
>
> Though indeed I must confess I don't know how it works in general when
> host page size != target page size, at least for migration. For example,
> I believe kvm dirty logging is host-page-size based, though migration
> should be migrating pages at guest page size granularity when it spots a
> dirty bit set.
>

Hi,

Indeed, we handle target_page_size aligned @start and @size in general.
Maybe we'd better add explicit function comments about the alignment
requirement, and explicit alignment asserts on @start and @size?

Keqian.
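[Editor's note] For illustration only, the explicit asserts mentioned
above might look like the following sketch. The real function works on
qemu_real_host_page_size; it is passed in as a parameter here to keep the
snippet self-contained, and the helper name is hypothetical:

    #include <assert.h>
    #include <stdint.h>

    static void assert_clear_args_aligned(uint64_t start, uint64_t size,
                                          uint64_t psize)
    {
        /* Equivalent in spirit to QEMU_IS_ALIGNED(start, psize) etc. */
        assert(start % psize == 0);
        assert(size % psize == 0);
    }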
On Fri, Dec 11, 2020 at 09:13:10AM +0800, zhukeqian wrote:
>
> On 2020/12/10 22:50, Peter Xu wrote:
[...]
>
> Hi,
>
> Indeed, we handle target_page_size aligned @start and @size in general.
> Maybe we'd better add explicit function comments about the alignment
> requirement, and explicit alignment asserts on @start and @size?

Yes we can, but I think it's not strongly necessary. As Zenghui pointed
out, the callers of memory_region_clear_dirty_bitmap() should always be
aware of the fact that the dirty bitmap is always page-size based.

OTOH I'm more worried about the other question, on how we handle the guest
psize != host psize case for migration now...

Thanks,
On 2020/12/11 23:25, Peter Xu wrote:
> On Fri, Dec 11, 2020 at 09:13:10AM +0800, zhukeqian wrote:
[...]
>
> Yes we can, but I think it's not strongly necessary. As Zenghui pointed
> out, the callers of memory_region_clear_dirty_bitmap() should always be
> aware of the fact that the dirty bitmap is always page-size based.

Hi Peter,

Agree.

> OTOH I'm more worried about the other question, on how we handle the
> guest psize != host psize case for migration now...

I think it does not matter when guest_psize != host_psize, as we only need
to interact with stage 2 page tables during migration. Stage 2 is enough
to track guest dirty memory, and even if the guest disables stage 1, we
can still do a successful migration.

Please point out if I misunderstood what you meant.

Thanks,
Keqian
On Mon, Dec 14, 2020 at 10:14:11AM +0800, zhukeqian wrote:

[...]

> >>> Though indeed I must confess I don't know how it works in general
> >>> when host page size != target page size, at least for migration.
> >>> For example, I believe kvm dirty logging is host-page-size based,
> >>> though migration should be migrating pages at guest page size
> >>> granularity when it spots a dirty bit set.

[1]

> Hi Peter,

Keqian,

> > OTOH I'm more worried about the other question, on how we handle the
> > guest psize != host psize case for migration now...
> I think it does not matter when guest_psize != host_psize, as we only
> need to interact with stage 2 page tables during migration. Stage 2 is
> enough to track guest dirty memory, and even if the guest disables
> stage 1, we can still do a successful migration.

I don't know why 2-stage matters here, since I believe KVM can track dirty
pages either using two-dimensional paging or shadowing; however, it's
always done in host small page size. The question I'm confused about is
that there seems to be a size mismatch between qemu migration and what kvm
does [1]. For example, how does migration work on ARM64 where the host has
psize==4K while the guest has psize==64K?

Thanks,
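[Editor's note] A small standalone sketch of the size mismatch in
question, under the assumption of a 4KiB host page and a 64KiB guest page
(all numbers are illustrative):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t host_psize  = 4  * 1024; /* KVM dirty-log granule */
        const uint64_t guest_psize = 64 * 1024; /* guest page size */
        uint64_t dirty_host_page = 37;          /* an arbitrary dirty page */

        /* One guest page spans several host pages, so a single dirty
         * host page implies the whole covering guest page is dirty. */
        uint64_t ratio = guest_psize / host_psize;       /* 16 */
        uint64_t guest_page = dirty_host_page / ratio;   /* 2  */

        printf("host page %" PRIu64 " -> guest page %" PRIu64
               " (%" PRIu64 " host pages per guest page)\n",
               dirty_host_page, guest_page, ratio);
        return 0;
    }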
On 2020/12/14 23:36, Peter Xu wrote:
> On Mon, Dec 14, 2020 at 10:14:11AM +0800, zhukeqian wrote:
>
> [...]
>
>> I think it does not matter when guest_psize != host_psize, as we only
>> need to interact with stage 2 page tables during migration. Stage 2 is
>> enough to track guest dirty memory, and even if the guest disables
>> stage 1, we can still do a successful migration.
>
> I don't know why 2-stage matters here, since I believe KVM can track
> dirty pages either using two-dimensional paging or shadowing; however,
> it's always done in host small page size. The question I'm confused
> about is that there seems to be a size mismatch between qemu migration
> and what kvm does [1]. For example, how does migration work on ARM64
> where the host has psize==4K while the guest has psize==64K?
>

Hi Peter,

OK, I got it ;-) Do you mean qemu_real_host_page_size != TARGET_PAGE_SIZE?
After my analysis, I see that when qemu_real_host_page_size !=
TARGET_PAGE_SIZE, there are some problems indeed. I have sent out some
patches; please check whether they solve this problem, thanks!

Keqian.
Hi Keqian, Peter,

On 2020/12/15 15:23, zhukeqian wrote:
>
> On 2020/12/14 23:36, Peter Xu wrote:
[...]
>
> Hi Peter,
>
> OK, I got it ;-) Do you mean qemu_real_host_page_size !=
> TARGET_PAGE_SIZE? After my analysis, I see that when
> qemu_real_host_page_size != TARGET_PAGE_SIZE, there are some problems
> indeed. I have sent out some patches; please check whether they solve
> this problem, thanks!

Now I see what your concern is :) Thanks both for the explanation and the
further fix!

Zenghui
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index bed2455ca5..05d323ba1f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -747,7 +747,7 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
     assert(bmap_start % BITS_PER_LONG == 0);
     /* We should never do log_clear before log_sync */
     assert(mem->dirty_bmap);
-    if (start_delta) {
+    if (start_delta || bmap_npages - size / psize) {
         /* Slow path - we need to manipulate a temp bitmap */
         bmap_clear = bitmap_new(bmap_npages);
         bitmap_copy_with_src_offset(bmap_clear, mem->dirty_bmap,
@@ -760,7 +760,10 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
         bitmap_clear(bmap_clear, 0, start_delta);
         d.dirty_bitmap = bmap_clear;
     } else {
-        /* Fast path - start address aligns well with BITS_PER_LONG */
+        /*
+         * Fast path - both start and size align well with BITS_PER_LONG
+         * (or the end of memory slot)
+         */
         d.dirty_bitmap = mem->dirty_bmap + BIT_WORD(bmap_start);
     }
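[Editor's note] As a sketch of what the widened slow-path condition buys
(not part of the patch itself): only the bits covering [start, start +
size) are copied into the temp bitmap, so the trailing bits of the 64-page
granule stay zero and the kernel leaves those pages alone. A toy model
with one 64-page slot, reusing the numbers from the thread's example:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t slot_bmap = ~0ULL; /* one 64-page slot, all pages dirty */
        uint64_t tmp = 0;           /* temp bitmap, starts all zero */
        unsigned start_delta = 0;   /* start is 64-page aligned */
        unsigned npages = 32;       /* caller wants pages 0..31 cleared */

        /* Mimic bitmap_copy_with_src_offset(tmp, slot_bmap, 0,
         * start_delta + npages): copy only the covered bits. */
        for (unsigned i = 0; i < start_delta + npages; i++) {
            tmp |= slot_bmap & (1ULL << i);
        }

        /* Bits 32..63 remain zero, so KVM_CLEAR_DIRTY_LOG won't touch
         * the dirty state of pages 32..63. */
        printf("mask handed to the kernel: %#llx\n",
               (unsigned long long)tmp);   /* 0xffffffff */
        return 0;
    }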
The kernel KVM_CLEAR_DIRTY_LOG interface has alignment requirements on
both the start and the size of the given range of pages. We have been
careful to handle the unaligned cases when performing CLEAR on one slot.
But it seems that we forgot to take the unaligned *size* case into account
when preparing the bitmap for the interface, and we may end up clearing
dirty status for pages outside of [start, start + size).

If the size is unaligned, let's go through the slow path to manipulate a
temp bitmap for the interface, so that we won't bother with those
unaligned bits at the end of the bitmap.

I don't think this can happen in practice, since the upper layer would
provide us with the alignment guarantee. But I'm not sure if kvm-all can
rely on it. This patch is mainly intended to address the correctness of
the specific algorithm used inside kvm_log_clear_one_slot().

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
---
 accel/kvm/kvm-all.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)