diff mbox

[v12,2/6] KVM: Add generic support for dirty page logging

Message ID 1414017251-5772-3-git-send-email-m.smarduch@samsung.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mario Smarduch Oct. 22, 2014, 10:34 p.m. UTC
This patch defines KVM_GENERIC_DIRTYLOG, and moves dirty log read function
to it's own file virt/kvm/dirtylog.c. x86 is updated to use the generic
dirty log interface, selecting KVM_GENERIC_DIRTYLOG in its Kconfig and 
makefile. No other architectures are affected, each uses it's own version.
This changed from previous patch revision where non-generic architectures 
were modified.

In subsequent patch armv7 does samething. All other architectures continue
use architecture defined version.


Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/x86/include/asm/kvm_host.h |    3 --
 arch/x86/kvm/Kconfig            |    1 +
 arch/x86/kvm/Makefile           |    1 +
 arch/x86/kvm/x86.c              |   86 ------------------------------
 include/linux/kvm_host.h        |    4 ++
 virt/kvm/Kconfig                |    3 ++
 virt/kvm/dirtylog.c             |  112 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 121 insertions(+), 89 deletions(-)
 create mode 100644 virt/kvm/dirtylog.c

Comments

Cornelia Huck Oct. 30, 2014, 12:14 p.m. UTC | #1
On Wed, 22 Oct 2014 15:34:07 -0700
Mario Smarduch <m.smarduch@samsung.com> wrote:

> This patch defines KVM_GENERIC_DIRTYLOG, and moves dirty log read function
> to it's own file virt/kvm/dirtylog.c. x86 is updated to use the generic
> dirty log interface, selecting KVM_GENERIC_DIRTYLOG in its Kconfig and 
> makefile. No other architectures are affected, each uses it's own version.
> This changed from previous patch revision where non-generic architectures 
> were modified.
> 
> In subsequent patch armv7 does samething. All other architectures continue
> use architecture defined version.
> 

Hm.

"The x86 specific version of dirty page logging is generic enough to be
used by other architectures, noteably ARMv7. So let's move the x86 code
under virt/kvm/ and make it depend on KVM_GENERIC_DIRTYLOG. Other
architectures continue to use their own implementations."

?

> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    3 --
>  arch/x86/kvm/Kconfig            |    1 +
>  arch/x86/kvm/Makefile           |    1 +
>  arch/x86/kvm/x86.c              |   86 ------------------------------
>  include/linux/kvm_host.h        |    4 ++
>  virt/kvm/Kconfig                |    3 ++
>  virt/kvm/dirtylog.c             |  112 +++++++++++++++++++++++++++++++++++++++
>  7 files changed, 121 insertions(+), 89 deletions(-)
>  create mode 100644 virt/kvm/dirtylog.c
> 

> diff --git a/virt/kvm/dirtylog.c b/virt/kvm/dirtylog.c
> new file mode 100644
> index 0000000..67ffffa
> --- /dev/null
> +++ b/virt/kvm/dirtylog.c
> @@ -0,0 +1,112 @@
> +/*
> + * kvm generic dirty logging support, used by architectures that share
> + * comman dirty page logging implementation.

s/comman/common/

The approach looks sane to me, especially as it does not change other
architectures needlessly.
Mario Smarduch Oct. 30, 2014, 7:19 p.m. UTC | #2
On 10/30/2014 05:14 AM, Cornelia Huck wrote:
> On Wed, 22 Oct 2014 15:34:07 -0700
> Mario Smarduch <m.smarduch@samsung.com> wrote:
> 
>> This patch defines KVM_GENERIC_DIRTYLOG, and moves dirty log read function
>> to it's own file virt/kvm/dirtylog.c. x86 is updated to use the generic
>> dirty log interface, selecting KVM_GENERIC_DIRTYLOG in its Kconfig and 
>> makefile. No other architectures are affected, each uses it's own version.
>> This changed from previous patch revision where non-generic architectures 
>> were modified.
>>
>> In subsequent patch armv7 does samething. All other architectures continue
>> use architecture defined version.
>>
> 
> Hm.
> 
> "The x86 specific version of dirty page logging is generic enough to be
> used by other architectures, noteably ARMv7. So let's move the x86 code
> under virt/kvm/ and make it depend on KVM_GENERIC_DIRTYLOG. Other
> architectures continue to use their own implementations."
> 
> ?

I'll update descriptions for both patches, with the more concise
descriptions.

Thanks.

> 
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |    3 --
>>  arch/x86/kvm/Kconfig            |    1 +
>>  arch/x86/kvm/Makefile           |    1 +
>>  arch/x86/kvm/x86.c              |   86 ------------------------------
>>  include/linux/kvm_host.h        |    4 ++
>>  virt/kvm/Kconfig                |    3 ++
>>  virt/kvm/dirtylog.c             |  112 +++++++++++++++++++++++++++++++++++++++
>>  7 files changed, 121 insertions(+), 89 deletions(-)
>>  create mode 100644 virt/kvm/dirtylog.c
>>
> 
>> diff --git a/virt/kvm/dirtylog.c b/virt/kvm/dirtylog.c
>> new file mode 100644
>> index 0000000..67ffffa
>> --- /dev/null
>> +++ b/virt/kvm/dirtylog.c
>> @@ -0,0 +1,112 @@
>> +/*
>> + * kvm generic dirty logging support, used by architectures that share
>> + * comman dirty page logging implementation.
> 
> s/comman/common/
> 
> The approach looks sane to me, especially as it does not change other
> architectures needlessly.
>
James Hogan Nov. 1, 2014, 10:12 a.m. UTC | #3
Hi Mario,

On Wed, Oct 22, 2014 at 03:34:07PM -0700, Mario Smarduch wrote:
> +/**
> + * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
> + * @kvm: kvm instance
> + * @log: slot id and address to which we copy the log
> + *
> + * We need to keep it in mind that VCPU threads can write to the bitmap
> + * concurrently.  So, to avoid losing data, we keep the following order for
> + * each bit:
> + *
> + *   1. Take a snapshot of the bit and clear it if needed.
> + *   2. Write protect the corresponding page.
> + *   3. Flush TLB's if needed.
> + *   4. Copy the snapshot to the userspace.
> + *
> + * Between 2 and 3, the guest may write to the page using the remaining TLB
> + * entry.  This is not a problem because the page will be reported dirty at
> + * step 4 using the snapshot taken before and step 3 ensures that successive
> + * writes will be logged for the next call.
> + */
> +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +{
> +	int r;
> +	struct kvm_memory_slot *memslot;
> +	unsigned long n, i;
> +	unsigned long *dirty_bitmap;
> +	unsigned long *dirty_bitmap_buffer;
> +	bool is_dirty = false;
> +
> +	mutex_lock(&kvm->slots_lock);
> +
> +	r = -EINVAL;
> +	if (log->slot >= KVM_USER_MEM_SLOTS)
> +		goto out;
> +
> +	memslot = id_to_memslot(kvm->memslots, log->slot);
> +
> +	dirty_bitmap = memslot->dirty_bitmap;
> +	r = -ENOENT;
> +	if (!dirty_bitmap)
> +		goto out;
> +
> +	n = kvm_dirty_bitmap_bytes(memslot);
> +
> +	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
> +	memset(dirty_bitmap_buffer, 0, n);
> +
> +	spin_lock(&kvm->mmu_lock);
> +
> +	for (i = 0; i < n / sizeof(long); i++) {
> +		unsigned long mask;
> +		gfn_t offset;
> +
> +		if (!dirty_bitmap[i])
> +			continue;
> +
> +		is_dirty = true;
> +
> +		mask = xchg(&dirty_bitmap[i], 0);
> +		dirty_bitmap_buffer[i] = mask;
> +
> +		offset = i * BITS_PER_LONG;
> +		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> +	}
> +
> +	spin_unlock(&kvm->mmu_lock);
> +
> +	/* See the comments in kvm_mmu_slot_remove_write_access(). */
> +	lockdep_assert_held(&kvm->slots_lock);
> +
> +	/*
> +	 * All the TLBs can be flushed out of mmu lock, see the comments in
> +	 * kvm_mmu_slot_remove_write_access().
> +	 */
> +	if (is_dirty)
> +		kvm_flush_remote_tlbs(kvm);
> +
> +	r = -EFAULT;
> +	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> +		goto out;

AFAICT all of the arch implementations of kvm_vm_ioctl_get_dirty_log()
except x86 and ppc hv (i.e. ia60, mips, ppc pv, s390) already make use
of the existing generic function kvm_get_dirty_log() to help implement
their kvm_vm_ioctl_get_dirty_log functions, which all look pretty
similar now except for TLB flushing.

Would they not be a better base for a generic
kvm_vm_ioctl_get_dirty_log()?

It feels a bit wrong to add a generic higher level function which
doesn't make use of the existing generic lower level abstraction.

(Appologies if this has already been brought up in previous versions of
the patchset, I haven't been tracking them).

Cheers
James

> +
> +	r = 0;
> +out:
> +	mutex_unlock(&kvm->slots_lock);
> +	return r;
> +}
Takuya Yoshikawa Nov. 3, 2014, 3:01 a.m. UTC | #4
On Thu, 30 Oct 2014 12:19:00 -0700
Mario Smarduch <m.smarduch@samsung.com> wrote:

> On 10/30/2014 05:14 AM, Cornelia Huck wrote:
> > On Wed, 22 Oct 2014 15:34:07 -0700
> > Mario Smarduch <m.smarduch@samsung.com> wrote:
> > 
> >> This patch defines KVM_GENERIC_DIRTYLOG, and moves dirty log read function
> >> to it's own file virt/kvm/dirtylog.c. x86 is updated to use the generic
> >> dirty log interface, selecting KVM_GENERIC_DIRTYLOG in its Kconfig and 
> >> makefile. No other architectures are affected, each uses it's own version.
> >> This changed from previous patch revision where non-generic architectures 
> >> were modified.
> >>
> >> In subsequent patch armv7 does samething. All other architectures continue
> >> use architecture defined version.
> >>
> > 
> > Hm.
> > 
> > "The x86 specific version of dirty page logging is generic enough to be
> > used by other architectures, noteably ARMv7. So let's move the x86 code
> > under virt/kvm/ and make it depend on KVM_GENERIC_DIRTYLOG. Other
> > architectures continue to use their own implementations."
> > 
> > ?
> 
> I'll update descriptions for both patches, with the more concise
> descriptions.

I don't think it's so generic.

Especially, the relationship between mmu_lock and TLB flushing has
been changed a few times for optimizing x86 mmu code, and it may change
in the future again.

Since how mmu_lock is used is depends on each arch, it's not so
simple to make the function generic IMO.

Thanks,
	Takuya

> 
> Thanks.
> 
> > 
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/x86/include/asm/kvm_host.h |    3 --
> >>  arch/x86/kvm/Kconfig            |    1 +
> >>  arch/x86/kvm/Makefile           |    1 +
> >>  arch/x86/kvm/x86.c              |   86 ------------------------------
> >>  include/linux/kvm_host.h        |    4 ++
> >>  virt/kvm/Kconfig                |    3 ++
> >>  virt/kvm/dirtylog.c             |  112 +++++++++++++++++++++++++++++++++++++++
> >>  7 files changed, 121 insertions(+), 89 deletions(-)
> >>  create mode 100644 virt/kvm/dirtylog.c
> >>
> > 
> >> diff --git a/virt/kvm/dirtylog.c b/virt/kvm/dirtylog.c
> >> new file mode 100644
> >> index 0000000..67ffffa
> >> --- /dev/null
> >> +++ b/virt/kvm/dirtylog.c
> >> @@ -0,0 +1,112 @@
> >> +/*
> >> + * kvm generic dirty logging support, used by architectures that share
> >> + * comman dirty page logging implementation.
> > 
> > s/comman/common/
> > 
> > The approach looks sane to me, especially as it does not change other
> > architectures needlessly.
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mario Smarduch Nov. 3, 2014, 6:44 p.m. UTC | #5
On 11/01/2014 03:12 AM, James Hogan wrote:
> Hi Mario,
> 
> On Wed, Oct 22, 2014 at 03:34:07PM -0700, Mario Smarduch wrote:
>> +/**
>> + * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
>> + * @kvm: kvm instance
>> + * @log: slot id and address to which we copy the log
>> + *
>> + * We need to keep it in mind that VCPU threads can write to the bitmap
>> + * concurrently.  So, to avoid losing data, we keep the following order for
>> + * each bit:
>> + *
>> + *   1. Take a snapshot of the bit and clear it if needed.
>> + *   2. Write protect the corresponding page.
>> + *   3. Flush TLB's if needed.
>> + *   4. Copy the snapshot to the userspace.
>> + *
>> + * Between 2 and 3, the guest may write to the page using the remaining TLB
>> + * entry.  This is not a problem because the page will be reported dirty at
>> + * step 4 using the snapshot taken before and step 3 ensures that successive
>> + * writes will be logged for the next call.
>> + */
>> +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> +{
>> +	int r;
>> +	struct kvm_memory_slot *memslot;
>> +	unsigned long n, i;
>> +	unsigned long *dirty_bitmap;
>> +	unsigned long *dirty_bitmap_buffer;
>> +	bool is_dirty = false;
>> +
>> +	mutex_lock(&kvm->slots_lock);
>> +
>> +	r = -EINVAL;
>> +	if (log->slot >= KVM_USER_MEM_SLOTS)
>> +		goto out;
>> +
>> +	memslot = id_to_memslot(kvm->memslots, log->slot);
>> +
>> +	dirty_bitmap = memslot->dirty_bitmap;
>> +	r = -ENOENT;
>> +	if (!dirty_bitmap)
>> +		goto out;
>> +
>> +	n = kvm_dirty_bitmap_bytes(memslot);
>> +
>> +	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
>> +	memset(dirty_bitmap_buffer, 0, n);
>> +
>> +	spin_lock(&kvm->mmu_lock);
>> +
>> +	for (i = 0; i < n / sizeof(long); i++) {
>> +		unsigned long mask;
>> +		gfn_t offset;
>> +
>> +		if (!dirty_bitmap[i])
>> +			continue;
>> +
>> +		is_dirty = true;
>> +
>> +		mask = xchg(&dirty_bitmap[i], 0);
>> +		dirty_bitmap_buffer[i] = mask;
>> +
>> +		offset = i * BITS_PER_LONG;
>> +		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
>> +	}
>> +
>> +	spin_unlock(&kvm->mmu_lock);
>> +
>> +	/* See the comments in kvm_mmu_slot_remove_write_access(). */
>> +	lockdep_assert_held(&kvm->slots_lock);
>> +
>> +	/*
>> +	 * All the TLBs can be flushed out of mmu lock, see the comments in
>> +	 * kvm_mmu_slot_remove_write_access().
>> +	 */
>> +	if (is_dirty)
>> +		kvm_flush_remote_tlbs(kvm);
>> +
>> +	r = -EFAULT;
>> +	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
>> +		goto out;
> 
> AFAICT all of the arch implementations of kvm_vm_ioctl_get_dirty_log()
> except x86 and ppc hv (i.e. ia60, mips, ppc pv, s390) already make use
> of the existing generic function kvm_get_dirty_log() to help implement
> their kvm_vm_ioctl_get_dirty_log functions, which all look pretty
> similar now except for TLB flushing.
> 
> Would they not be a better base for a generic
> kvm_vm_ioctl_get_dirty_log()?

But kvm_get_dirty_log() is just a helper for handling dirty bitmap, after
dirty page logging updates based on arch. implementation. There is not
much to reuse here other then add arm sync version and then call
kvm_get_dirty_log.  But the sync would pretty much be identical to our current
kvm_vm_ioctl_get_dirty_log. The purpose of these patches is to reuse some of
dirty page logging logic.
> 
> It feels a bit wrong to add a generic higher level function which
> doesn't make use of the existing generic lower level abstraction.
> 
> (Appologies if this has already been brought up in previous versions of
> the patchset, I haven't been tracking them).

Yeah there is no MAINTAINER entry for mips kvm, per my earlier email.
> 
> Cheers
> James
> 
>> +
>> +	r = 0;
>> +out:
>> +	mutex_unlock(&kvm->slots_lock);
>> +	return r;
>> +}
Mario Smarduch Nov. 3, 2014, 7:34 p.m. UTC | #6
On 11/02/2014 07:01 PM, Takuya Yoshikawa wrote:
> On Thu, 30 Oct 2014 12:19:00 -0700
> Mario Smarduch <m.smarduch@samsung.com> wrote:
> 
>> On 10/30/2014 05:14 AM, Cornelia Huck wrote:
>>> On Wed, 22 Oct 2014 15:34:07 -0700
>>> Mario Smarduch <m.smarduch@samsung.com> wrote:
>>>
>>>> This patch defines KVM_GENERIC_DIRTYLOG, and moves dirty log read function
>>>> to it's own file virt/kvm/dirtylog.c. x86 is updated to use the generic
>>>> dirty log interface, selecting KVM_GENERIC_DIRTYLOG in its Kconfig and 
>>>> makefile. No other architectures are affected, each uses it's own version.
>>>> This changed from previous patch revision where non-generic architectures 
>>>> were modified.
>>>>
>>>> In subsequent patch armv7 does samething. All other architectures continue
>>>> use architecture defined version.
>>>>
>>>
>>> Hm.
>>>
>>> "The x86 specific version of dirty page logging is generic enough to be
>>> used by other architectures, noteably ARMv7. So let's move the x86 code
>>> under virt/kvm/ and make it depend on KVM_GENERIC_DIRTYLOG. Other
>>> architectures continue to use their own implementations."
>>>
>>> ?
>>
>> I'll update descriptions for both patches, with the more concise
>> descriptions.
> 
> I don't think it's so generic.
> 
> Especially, the relationship between mmu_lock and TLB flushing has
> been changed a few times for optimizing x86 mmu code, and it may change
> in the future again.
> 
> Since how mmu_lock is used is depends on each arch, it's not so
> simple to make the function generic IMO.
> 
> Thanks,
> 	Takuya

Hi Takuya,

If the bitmap can be processed locklessly then it's not reusable for sure.

The interface is not "to generic", but "generic enough" for arm/armv8 to
reuse.
For TLB flushing I would think optimizations would wind up in arch subtree.
I'm not convinced that location of couple lines doesn't make the
interface re-usable, going back to 3.10 it hardly changed.

- Mario

> 
>>
>> Thanks.
>>
>>>
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>>>> ---
>>>>  arch/x86/include/asm/kvm_host.h |    3 --
>>>>  arch/x86/kvm/Kconfig            |    1 +
>>>>  arch/x86/kvm/Makefile           |    1 +
>>>>  arch/x86/kvm/x86.c              |   86 ------------------------------
>>>>  include/linux/kvm_host.h        |    4 ++
>>>>  virt/kvm/Kconfig                |    3 ++
>>>>  virt/kvm/dirtylog.c             |  112 +++++++++++++++++++++++++++++++++++++++
>>>>  7 files changed, 121 insertions(+), 89 deletions(-)
>>>>  create mode 100644 virt/kvm/dirtylog.c
>>>>
>>>
>>>> diff --git a/virt/kvm/dirtylog.c b/virt/kvm/dirtylog.c
>>>> new file mode 100644
>>>> index 0000000..67ffffa
>>>> --- /dev/null
>>>> +++ b/virt/kvm/dirtylog.c
>>>> @@ -0,0 +1,112 @@
>>>> +/*
>>>> + * kvm generic dirty logging support, used by architectures that share
>>>> + * comman dirty page logging implementation.
>>>
>>> s/comman/common/
>>>
>>> The approach looks sane to me, especially as it does not change other
>>> architectures needlessly.
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
>
Paolo Bonzini Nov. 5, 2014, 4:09 p.m. UTC | #7
On 01/11/2014 11:12, James Hogan wrote:
> AFAICT all of the arch implementations of kvm_vm_ioctl_get_dirty_log()
> except x86 and ppc hv (i.e. ia60, mips, ppc pv, s390) already make use
> of the existing generic function kvm_get_dirty_log() to help implement
> their kvm_vm_ioctl_get_dirty_log functions, which all look pretty
> similar now except for TLB flushing.
> 
> Would they not be a better base for a generic
> kvm_vm_ioctl_get_dirty_log()?
> 
> It feels a bit wrong to add a generic higher level function which
> doesn't make use of the existing generic lower level abstraction.
> 
> (Appologies if this has already been brought up in previous versions of
> the patchset, I haven't been tracking them).

I agree that we should make the interface look more like
kvm_get_dirty_log().  Here the steps are:

+ *   1. Take a snapshot of the bit and clear it if needed.
+ *   2. Write protect the corresponding page.
+ *   3. Flush TLB's if needed.
+ *   4. Copy the snapshot to the userspace.

and I believe we can swap 3 and 4, since (4) is just a copy of data and
it has no ordering relationship with the action of the guest.  Once we
do that, we can rewrite code to look a lot like kvm_get_dirty_log(),
let's call it kvm_get_dirty_log_protect():

	r = -EINVAL;
	if (log->slot >= KVM_USER_MEM_SLOTS)
		goto out;

	memslot = id_to_memslot(kvm->memslots, log->slot);

	dirty_bitmap = memslot->dirty_bitmap;
	r = -ENOENT;
	if (!dirty_bitmap)
		goto out;

	n = kvm_dirty_bitmap_bytes(memslot);

	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
	memset(dirty_bitmap_buffer, 0, n);

	spin_lock(&kvm->mmu_lock);

	*is_dirty = false;
	for (i = 0; i < n / sizeof(long); i++) {
		unsigned long mask;
		gfn_t offset;

		if (!dirty_bitmap[i])
			continue;

		*is_dirty = true;

		mask = xchg(&dirty_bitmap[i], 0);
		dirty_bitmap_buffer[i] = mask;

		offset = i * BITS_PER_LONG;
		kvm_arch_write_protect_pt_masked(kvm, memslot, offset,
						 mask);
	}

	spin_unlock(&kvm->mmu_lock);

	r = -EFAULT;
	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
		goto out;

	r = 0;
out:
	return r;

where the TLB flushing is moved to the caller as in kvm_get_dirty_log
callers.  Taking the slots lock would also be kept in the per-arch
kvm_vm_ioctl_get_dirty_log, again similar to PPC/MIPS/S390.

You can add a new Kconfig symbol, or define an implementation of
kvm_arch_write_protect_pt_masked that BUG()s for ia64/PPC/MIPS/S390.

BTW, you can leave the function in kvm_main.c.

Paolo
Mario Smarduch Nov. 5, 2014, 11:05 p.m. UTC | #8
On 11/05/2014 08:09 AM, Paolo Bonzini wrote:
> 
> 
> On 01/11/2014 11:12, James Hogan wrote:
>> AFAICT all of the arch implementations of kvm_vm_ioctl_get_dirty_log()
>> except x86 and ppc hv (i.e. ia60, mips, ppc pv, s390) already make use
>> of the existing generic function kvm_get_dirty_log() to help implement
>> their kvm_vm_ioctl_get_dirty_log functions, which all look pretty
>> similar now except for TLB flushing.
>>
>> Would they not be a better base for a generic
>> kvm_vm_ioctl_get_dirty_log()?
>>
>> It feels a bit wrong to add a generic higher level function which
>> doesn't make use of the existing generic lower level abstraction.
>>
>> (Appologies if this has already been brought up in previous versions of
>> the patchset, I haven't been tracking them).
> 
> I agree that we should make the interface look more like
> kvm_get_dirty_log().  Here the steps are:
> 
> + *   1. Take a snapshot of the bit and clear it if needed.
> + *   2. Write protect the corresponding page.
> + *   3. Flush TLB's if needed.
> + *   4. Copy the snapshot to the userspace.

Hi Paolo,
  thanks for breaking it down between generic/architecture layers,
helps a lot. Initially I thought we could get TLB flushing to
generic layer, previous x86 version worked for ARM. But looking
deeper other architectures either use non-generic flush or none
at all. Right now we would have x86, ARM, IA64 using generic TLB flush.
I'll restructure for another version.

> 
> and I believe we can swap 3 and 4, since (4) is just a copy of data and
> it has no ordering relationship with the action of the guest.  Once we
> do that, we can rewrite code to look a lot like kvm_get_dirty_log(),
> let's call it kvm_get_dirty_log_protect():
> 
> 	r = -EINVAL;
> 	if (log->slot >= KVM_USER_MEM_SLOTS)
> 		goto out;
> 
> 	memslot = id_to_memslot(kvm->memslots, log->slot);
> 
> 	dirty_bitmap = memslot->dirty_bitmap;
> 	r = -ENOENT;
> 	if (!dirty_bitmap)
> 		goto out;
> 
> 	n = kvm_dirty_bitmap_bytes(memslot);
> 
> 	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
> 	memset(dirty_bitmap_buffer, 0, n);
> 
> 	spin_lock(&kvm->mmu_lock);
> 
> 	*is_dirty = false;
> 	for (i = 0; i < n / sizeof(long); i++) {
> 		unsigned long mask;
> 		gfn_t offset;
> 
> 		if (!dirty_bitmap[i])
> 			continue;
> 
> 		*is_dirty = true;
> 
> 		mask = xchg(&dirty_bitmap[i], 0);
> 		dirty_bitmap_buffer[i] = mask;
> 
> 		offset = i * BITS_PER_LONG;
> 		kvm_arch_write_protect_pt_masked(kvm, memslot, offset,
> 						 mask);
> 	}
> 
> 	spin_unlock(&kvm->mmu_lock);
> 
> 	r = -EFAULT;
> 	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> 		goto out;
> 
> 	r = 0;
> out:
> 	return r;
> 
> where the TLB flushing is moved to the caller as in kvm_get_dirty_log
> callers.  Taking the slots lock would also be kept in the per-arch
> kvm_vm_ioctl_get_dirty_log, again similar to PPC/MIPS/S390.
Ok.
> 
> You can add a new Kconfig symbol, or define an implementation of
> kvm_arch_write_protect_pt_masked that BUG()s for ia64/PPC/MIPS/S390.
Ok will do.
> 
> BTW, you can leave the function in kvm_main.c.
Ok.
> 
> Paolo
> 

Thanks.
Paolo Bonzini Nov. 6, 2014, 10:14 a.m. UTC | #9
On 06/11/2014 00:05, Mario Smarduch wrote:
> On 11/05/2014 08:09 AM, Paolo Bonzini wrote:
>>
>>
>> On 01/11/2014 11:12, James Hogan wrote:
>>> AFAICT all of the arch implementations of kvm_vm_ioctl_get_dirty_log()
>>> except x86 and ppc hv (i.e. ia60, mips, ppc pv, s390) already make use
>>> of the existing generic function kvm_get_dirty_log() to help implement
>>> their kvm_vm_ioctl_get_dirty_log functions, which all look pretty
>>> similar now except for TLB flushing.
>>>
>>> Would they not be a better base for a generic
>>> kvm_vm_ioctl_get_dirty_log()?
>>>
>>> It feels a bit wrong to add a generic higher level function which
>>> doesn't make use of the existing generic lower level abstraction.
>>>
>>> (Appologies if this has already been brought up in previous versions of
>>> the patchset, I haven't been tracking them).
>>
>> I agree that we should make the interface look more like
>> kvm_get_dirty_log().  Here the steps are:
>>
>> + *   1. Take a snapshot of the bit and clear it if needed.
>> + *   2. Write protect the corresponding page.
>> + *   3. Flush TLB's if needed.
>> + *   4. Copy the snapshot to the userspace.
> 
> Hi Paolo,
>   thanks for breaking it down between generic/architecture layers,
> helps a lot. Initially I thought we could get TLB flushing to
> generic layer, previous x86 version worked for ARM. But looking
> deeper other architectures either use non-generic flush or none
> at all. Right now we would have x86, ARM, IA64 using generic TLB flush.
> I'll restructure for another version.

I'll test the swap between 3 and 4 above, and send it to the list.  Feel
free to include it in v13, so that it gets back to me via Christoffer
and Marc.

Paolo
Mario Smarduch Nov. 6, 2014, 6:07 p.m. UTC | #10
On 11/06/2014 02:14 AM, Paolo Bonzini wrote:
> 
> 
> On 06/11/2014 00:05, Mario Smarduch wrote:
>> On 11/05/2014 08:09 AM, Paolo Bonzini wrote:
>>>
>>>
>>> On 01/11/2014 11:12, James Hogan wrote:
>>>> AFAICT all of the arch implementations of kvm_vm_ioctl_get_dirty_log()
>>>> except x86 and ppc hv (i.e. ia60, mips, ppc pv, s390) already make use
>>>> of the existing generic function kvm_get_dirty_log() to help implement
>>>> their kvm_vm_ioctl_get_dirty_log functions, which all look pretty
>>>> similar now except for TLB flushing.
>>>>
>>>> Would they not be a better base for a generic
>>>> kvm_vm_ioctl_get_dirty_log()?
>>>>
>>>> It feels a bit wrong to add a generic higher level function which
>>>> doesn't make use of the existing generic lower level abstraction.
>>>>
>>>> (Appologies if this has already been brought up in previous versions of
>>>> the patchset, I haven't been tracking them).
>>>
>>> I agree that we should make the interface look more like
>>> kvm_get_dirty_log().  Here the steps are:
>>>
>>> + *   1. Take a snapshot of the bit and clear it if needed.
>>> + *   2. Write protect the corresponding page.
>>> + *   3. Flush TLB's if needed.
>>> + *   4. Copy the snapshot to the userspace.
>>
>> Hi Paolo,
>>   thanks for breaking it down between generic/architecture layers,
>> helps a lot. Initially I thought we could get TLB flushing to
>> generic layer, previous x86 version worked for ARM. But looking
>> deeper other architectures either use non-generic flush or none
>> at all. Right now we would have x86, ARM, IA64 using generic TLB flush.
>> I'll restructure for another version.
> 
> I'll test the swap between 3 and 4 above, and send it to the list.  Feel
> free to include it in v13, so that it gets back to me via Christoffer
> and Marc.
> 
> Paolo
> 

Ok, will do. So it would be kvm/kvm/x86 and 4 armv7 patches.

Mario
diff mbox

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..934dc24 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,9 +805,6 @@  void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask);
 void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index f9d16ff..dca6fc7 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -40,6 +40,7 @@  config KVM
 	select HAVE_KVM_MSI
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select KVM_VFIO
+	select KVM_GENERIC_DIRTYLOG
 	---help---
 	  Support hosting fully virtualized guest machines using hardware
 	  virtualization extensions.  You will need a fairly recent
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 25d22b2..2536195 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -12,6 +12,7 @@  kvm-y			+= $(KVM)/kvm_main.o $(KVM)/ioapic.o \
 				$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)	+= $(KVM)/assigned-dev.o $(KVM)/iommu.o
 kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
+kvm-$(CONFIG_KVM_GENERIC_DIRTYLOG) 	+=$(KVM)/dirtylog.o
 
 kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o cpuid.o pmu.o
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..1467fa4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3595,92 +3595,6 @@  static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 	return 0;
 }
 
-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * We need to keep it in mind that VCPU threads can write to the bitmap
- * concurrently.  So, to avoid losing data, we keep the following order for
- * each bit:
- *
- *   1. Take a snapshot of the bit and clear it if needed.
- *   2. Write protect the corresponding page.
- *   3. Flush TLB's if needed.
- *   4. Copy the snapshot to the userspace.
- *
- * Between 2 and 3, the guest may write to the page using the remaining TLB
- * entry.  This is not a problem because the page will be reported dirty at
- * step 4 using the snapshot taken before and step 3 ensures that successive
- * writes will be logged for the next call.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-	int r;
-	struct kvm_memory_slot *memslot;
-	unsigned long n, i;
-	unsigned long *dirty_bitmap;
-	unsigned long *dirty_bitmap_buffer;
-	bool is_dirty = false;
-
-	mutex_lock(&kvm->slots_lock);
-
-	r = -EINVAL;
-	if (log->slot >= KVM_USER_MEM_SLOTS)
-		goto out;
-
-	memslot = id_to_memslot(kvm->memslots, log->slot);
-
-	dirty_bitmap = memslot->dirty_bitmap;
-	r = -ENOENT;
-	if (!dirty_bitmap)
-		goto out;
-
-	n = kvm_dirty_bitmap_bytes(memslot);
-
-	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
-	memset(dirty_bitmap_buffer, 0, n);
-
-	spin_lock(&kvm->mmu_lock);
-
-	for (i = 0; i < n / sizeof(long); i++) {
-		unsigned long mask;
-		gfn_t offset;
-
-		if (!dirty_bitmap[i])
-			continue;
-
-		is_dirty = true;
-
-		mask = xchg(&dirty_bitmap[i], 0);
-		dirty_bitmap_buffer[i] = mask;
-
-		offset = i * BITS_PER_LONG;
-		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
-	}
-
-	spin_unlock(&kvm->mmu_lock);
-
-	/* See the comments in kvm_mmu_slot_remove_write_access(). */
-	lockdep_assert_held(&kvm->slots_lock);
-
-	/*
-	 * All the TLBs can be flushed out of mmu lock, see the comments in
-	 * kvm_mmu_slot_remove_write_access().
-	 */
-	if (is_dirty)
-		kvm_flush_remote_tlbs(kvm);
-
-	r = -EFAULT;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
-		goto out;
-
-	r = 0;
-out:
-	mutex_unlock(&kvm->slots_lock);
-	return r;
-}
-
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
 			bool line_status)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..05ebcbe 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -592,6 +592,10 @@  int kvm_get_dirty_log(struct kvm *kvm,
 			struct kvm_dirty_log *log, int *is_dirty);
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 				struct kvm_dirty_log *log);
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+					struct kvm_memory_slot *slot,
+					gfn_t gfn_offset,
+					unsigned long mask);
 
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
 			bool line_status);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 3796a21..368fc4a 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -40,3 +40,6 @@  config KVM_VFIO
 
 config HAVE_KVM_ARCH_TLB_FLUSH_ALL
        bool
+
+config KVM_GENERIC_DIRTYLOG
+       bool
diff --git a/virt/kvm/dirtylog.c b/virt/kvm/dirtylog.c
new file mode 100644
index 0000000..67ffffa
--- /dev/null
+++ b/virt/kvm/dirtylog.c
@@ -0,0 +1,112 @@ 
+/*
+ * kvm generic dirty logging support, used by architectures that share
+ * comman dirty page logging implementation.
+ *
+ * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
+ *
+ * Authors:
+ *   Avi Kivity   <avi@qumranet.com>
+ *   Yaniv Kamay  <yaniv@qumranet.com>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/processor.h>
+#include <linux/io.h>
+#include <linux/uaccess.h>
+
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * We need to keep it in mind that VCPU threads can write to the bitmap
+ * concurrently.  So, to avoid losing data, we keep the following order for
+ * each bit:
+ *
+ *   1. Take a snapshot of the bit and clear it if needed.
+ *   2. Write protect the corresponding page.
+ *   3. Flush TLB's if needed.
+ *   4. Copy the snapshot to the userspace.
+ *
+ * Between 2 and 3, the guest may write to the page using the remaining TLB
+ * entry.  This is not a problem because the page will be reported dirty at
+ * step 4 using the snapshot taken before and step 3 ensures that successive
+ * writes will be logged for the next call.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	int r;
+	struct kvm_memory_slot *memslot;
+	unsigned long n, i;
+	unsigned long *dirty_bitmap;
+	unsigned long *dirty_bitmap_buffer;
+	bool is_dirty = false;
+
+	mutex_lock(&kvm->slots_lock);
+
+	r = -EINVAL;
+	if (log->slot >= KVM_USER_MEM_SLOTS)
+		goto out;
+
+	memslot = id_to_memslot(kvm->memslots, log->slot);
+
+	dirty_bitmap = memslot->dirty_bitmap;
+	r = -ENOENT;
+	if (!dirty_bitmap)
+		goto out;
+
+	n = kvm_dirty_bitmap_bytes(memslot);
+
+	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
+	memset(dirty_bitmap_buffer, 0, n);
+
+	spin_lock(&kvm->mmu_lock);
+
+	for (i = 0; i < n / sizeof(long); i++) {
+		unsigned long mask;
+		gfn_t offset;
+
+		if (!dirty_bitmap[i])
+			continue;
+
+		is_dirty = true;
+
+		mask = xchg(&dirty_bitmap[i], 0);
+		dirty_bitmap_buffer[i] = mask;
+
+		offset = i * BITS_PER_LONG;
+		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+	}
+
+	spin_unlock(&kvm->mmu_lock);
+
+	/* See the comments in kvm_mmu_slot_remove_write_access(). */
+	lockdep_assert_held(&kvm->slots_lock);
+
+	/*
+	 * All the TLBs can be flushed out of mmu lock, see the comments in
+	 * kvm_mmu_slot_remove_write_access().
+	 */
+	if (is_dirty)
+		kvm_flush_remote_tlbs(kvm);
+
+	r = -EFAULT;
+	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
+		goto out;
+
+	r = 0;
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}