Message ID | facda6e2-3655-4f2c-9013-ebb18d0e6972@gmail.com (mailing list archive)
---|---
State | New, archived
Series | pvclock time drifting backward
+David

On Wed, Mar 19, 2025, Ming Lin wrote:
> Hi,
>
> After performing a live migration on a QEMU guest OS that had been running for over 30 days,
> we noticed that the guest OS time was more than 2 seconds behind the actual time.
>
> After extensive debugging, we found that this issue is related to master_kernel_ns and master_cycle_now.
>
> When the guest OS starts, the host initializes a pair of master_kernel_ns and master_cycle_now values.
> After live migration, the host updates these values.
>
> Our debugging showed that if the host does not update master_kernel_ns/master_cycle_now,
> the guest OS time remains correct.
>
> To illustrate how updating master_kernel_ns/master_cycle_now leads to the guest OS time drifting backward,
> we applied the following debug patch.
>
> The patch adds a KVM debugfs entry to trigger time calculations and print the results.
> The patch runs on the host side, but we use __pvclock_read_cycles() to simulate the guest OS updating its time.
>
> Example Output:
>
> # cat /sys/kernel/debug/kvm/946-13/pvclock
> old: master_kernel_ns: 15119778316
> old: master_cycle_now: 37225912658
> old: ns: 1893199569691
> new: master_kernel_ns: 1908210098649
> new: master_cycle_now: 4391329912268
> new: ns: 1893199548401
>
> tsc 4391329912368
> kvmclock_offset -15010550291
> diff: ns: 21290
>
> Explanation of Parameters:
>
> Input:
> "old: master_kernel_ns:" The master_kernel_ns value recorded when the guest OS started (remains unchanged during testing).
> "old: master_cycle_now:" The master_cycle_now value recorded when the guest OS started (remains unchanged during testing).
> "new: master_kernel_ns:" The latest master_kernel_ns value at the time of reading.
> "new: master_cycle_now:" The latest master_cycle_now value at the time of reading.
> tsc: The rdtsc() value at the time of reading.
> kvmclock_offset: The offset recorded by KVM_SET_CLOCK when the guest OS started (remains unchanged during testing).
>
> Output:
> "old: ns:" Time in nanoseconds calculated using the old master_kernel_ns/master_cycle_now.
> "new: ns:" Time in nanoseconds calculated using the new master_kernel_ns/master_cycle_now.
> "diff: ns:" (old ns - new ns), representing the time drift relative to the guest OS start time.
>
> Test Script:
>
> #!/bin/bash
>
> qemu_pid=$(pidof qemu-system-x86_64)
>
> while [ 1 ] ; do
>     echo "====================================="
>     echo "Guest OS running time: $(ps -p $qemu_pid -o etime= | awk '{print $1}')"
>     cat /sys/kernel/debug/kvm/*/pvclock
>     echo
>     sleep 10
> done
>
> Test Results:
> Below are the first and last parts of a >2-hour test run.
> As time progresses, the time drift calculated using the latest master_kernel_ns/master_cycle_now increases monotonically.
>
> After 2 hours and 18 minutes, the guest OS time had drifted by approximately 93 microseconds.
>
> I have uploaded an image for a more intuitive visualization of the time drift:
> https://postimg.cc/crCDWtD7
>
> Is this a real problem?

David can confirm, but I'm pretty sure the drift you are observing is addressed
by David's series to fix a plethora of kvmclock warts.

https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org

> If there is any fix patch, I'd be happy to test it. Thanks!
>
> 1 =====================================
> 2 guest os running time: 00:50
> 3 old: master_kernel_ns: 15119778316
> 4 old: master_cycle_now: 37225912658
> 5 old: ns: 48092694964
> 6 new: master_kernel_ns: 63103244699
> 7 new: master_cycle_now: 147587790614
> 8 new: ns: 48092694425
> 9
> 10 tsc 147587790654
> 11 kvmclock_offset -15010550291
> 12 diff: ns: 539
> 13
> 14 =====================================
> 15 guest os running time: 01:00
> 16 old: master_kernel_ns: 15119778316
> 17 old: master_cycle_now: 37225912658
> 18 old: ns: 58139026532
> 19 new: master_kernel_ns: 73149576143
> 20 new: master_cycle_now: 170694333104
> 21 new: ns: 58139025879
> 22
> 23 tsc 170694333168
> 24 kvmclock_offset -15010550291
> 25 diff: ns: 653
> 26
> 27 =====================================
> 28 guest os running time: 01:10
> 29 old: master_kernel_ns: 15119778316
> 30 old: master_cycle_now: 37225912658
> 31 old: ns: 68183772122
> 32 new: master_kernel_ns: 83194321616
> 33 new: master_cycle_now: 193797227862
> 34 new: ns: 68183771357
> 35
> 36 tsc 193797227936
> 37 kvmclock_offset -15010550291
> 38 diff: ns: 765
> 39
> 40 =====================================
> 41 guest os running time: 01:20
> 42 old: master_kernel_ns: 15119778316
> 43 old: master_cycle_now: 37225912658
> 44 old: ns: 78225289157
> 45 new: master_kernel_ns: 93235838545
> 46 new: master_cycle_now: 216892696976
> 47 new: ns: 78225288279
> 48
> 49 tsc 216892697034
> 50 kvmclock_offset -15010550291
> 51 diff: ns: 878
> 52
> 53 =====================================
> 54 guest os running time: 01:30
> 55 old: master_kernel_ns: 15119778316
> 56 old: master_cycle_now: 37225912658
> 57 old: ns: 88268955340
> 58 new: master_kernel_ns: 103279504612
> 59 new: master_cycle_now: 239993109102
> 60 new: ns: 88268954349
> 61
> 62 tsc 239993109168
> 63 kvmclock_offset -15010550291
> 64 diff: ns: 991
> 65
> 66 =====================================
> 67 guest os running time: 01:40
> 68 old: master_kernel_ns: 15119778316
> 69 old: master_cycle_now: 37225912658
> 70 old: ns: 98313212581
> 71 new: master_kernel_ns: 113323761740
> 72 new: master_cycle_now: 263094880668
> 73 new: ns: 98313211476
> 74
> 75 tsc 263094880732
> 76 kvmclock_offset -15010550291
> 77 diff: ns: 1105
> .....
> .....
> 10160 =====================================
> 10161 guest os running time: 02:17:11
> 10162 old: master_kernel_ns: 15119778316
> 10163 old: master_cycle_now: 37225912658
> 10164 old: ns: 8229817213297
> 10165 new: master_kernel_ns: 8244827670997
> 10166 new: master_cycle_now: 18965537819524
> 10167 new: ns: 8229817120748
> 10168
> 10169 tsc 18965537819622
> 10170 kvmclock_offset -15010550291
> 10171 diff: ns: 92549
> 10172 =====================================
> 10173 guest os running time: 02:17:21
> 10174 old: master_kernel_ns: 15119778316
> 10175 old: master_cycle_now: 37225912658
> 10176 old: ns: 8239861074959
> 10177 new: master_kernel_ns: 8254871532564
> 10178 new: master_cycle_now: 18988638681302
> 10179 new: ns: 8239860982297
> 10180
> 10181 tsc 18988638681358
> 10182 kvmclock_offset -15010550291
> 10183 diff: ns: 92662
> 10184 =====================================
> 10185 guest os running time: 02:17:31
> 10186 old: master_kernel_ns: 15119778316
> 10187 old: master_cycle_now: 37225912658
> 10188 old: ns: 8249904622988
> 10189 new: master_kernel_ns: 8264915080459
> 10190 new: master_cycle_now: 19011738821632
> 10191 new: ns: 8249904530213
> 10192
> 10193 tsc 19011738821736
> 10194 kvmclock_offset -15010550291
> 10195 diff: ns: 92775
> 10196 =====================================
> 10197 guest os running time: 02:17:41
> 10198 old: master_kernel_ns: 15119778316
> 10199 old: master_cycle_now: 37225912658
> 10200 old: ns: 8259949369203
> 10201 new: master_kernel_ns: 8274959826576
> 10202 new: master_cycle_now: 19034841717872
> 10203 new: ns: 8259949276315
> 10204
> 10205 tsc 19034841717942
> 10206 kvmclock_offset -15010550291
> 10207 diff: ns: 92888
> 10208 =====================================
> 10209 guest os running time: 02:17:51
> 10210 old: master_kernel_ns: 15119778316
> 10211 old: master_cycle_now: 37225912658
> 10212 old: ns: 8269996849598
> 10213 new: master_kernel_ns: 8285007306846
> 10214 new: master_cycle_now: 19057950902658
> 10215 new: ns: 8269996756597
> 10216
> 10217 tsc 19057950902756
> 10218 kvmclock_offset -15010550291
> 10219 diff: ns: 93001
> 10220 =====================================
> 10221 guest os running time: 02:18:02
> 10222 old: master_kernel_ns: 15119778316
> 10223 old: master_cycle_now: 37225912658
> 10224 old: ns: 8280039094317
> 10225 new: master_kernel_ns: 8295049551453
> 10226 new: master_cycle_now: 19081048045430
> 10227 new: ns: 8280039001203
> 10228
> 10229 tsc 19081048045526
> 10230 kvmclock_offset -15010550291
> 10231 diff: ns: 93114
>
>
> pvclock debugfs patch
> ---
>  arch/x86/include/asm/kvm_host.h |  4 +++
>  arch/x86/kvm/x86.c | 29 +++++++++++++++-
>  b.sh | 1 +
>  virt/kvm/kvm_main.c | 75 +++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 108 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 32ae3aa50c7e..5a82a69bfe7a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1411,6 +1411,10 @@ struct kvm_arch {
>          struct delayed_work kvmclock_update_work;
>          struct delayed_work kvmclock_sync_work;
> +        u64 old_master_kernel_ns;
> +        u64 old_master_cycle_now;
> +        s64 old_kvmclock_offset;
> +
>          struct kvm_xen_hvm_config xen_hvm_config;
>          /* reads protected by irq_srcu, writes by irq_lock */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4b64ab350bcd..a56511ed8c5b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2819,7 +2819,7 @@ static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp,
>   * As with get_kvmclock_base_ns(), this counts from boot time, at the
>   * frequency of CLOCK_MONOTONIC_RAW (hence adding gtos->offs_boot).
>   */
> -static int do_kvmclock_base(s64 *t, u64 *tsc_timestamp)
> +int do_kvmclock_base(s64 *t, u64 *tsc_timestamp)
>  {
>          struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
>          unsigned long seq;
> @@ -2861,6 +2861,27 @@ static int do_monotonic(s64 *t, u64 *tsc_timestamp)
>          return mode;
>  }
> +u64 mydebug_get_kvmclock_ns(u64 master_kernel_ns, u64 master_cycle_now, s64 kvmclock_offset, u64 tsc)
> +{
> +        struct pvclock_vcpu_time_info hv_clock;
> +        u64 ret;
> +
> +        hv_clock.tsc_timestamp = master_cycle_now;
> +        hv_clock.system_time = master_kernel_ns + kvmclock_offset;
> +
> +        /* both __this_cpu_read() and rdtsc() should be on the same cpu */
> +        get_cpu();
> +
> +        kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
> +                           &hv_clock.tsc_shift,
> +                           &hv_clock.tsc_to_system_mul);
> +        ret = __pvclock_read_cycles(&hv_clock, tsc);
> +
> +        put_cpu();
> +
> +        return ret;
> +}
> +
>  static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
>  {
>          struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
> @@ -2988,6 +3009,10 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
>                  host_tsc_clocksource = kvm_get_time_and_clockread(
>                                          &ka->master_kernel_ns,
>                                          &ka->master_cycle_now);
> +                ka->old_master_kernel_ns = ka->master_kernel_ns;
> +                ka->old_master_cycle_now = ka->master_cycle_now;
> +                printk("MYDEBUG: old_master_kernel_ns = %llu, old_master_cycle_now = %llu\n",
> +                        ka->old_master_kernel_ns, ka->old_master_cycle_now);
>          ka->use_master_clock = host_tsc_clocksource && vcpus_matched
>                                  && !ka->backwards_tsc_observed
> @@ -6989,6 +7014,8 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
>          else
>                  now_raw_ns = get_kvmclock_base_ns();
>          ka->kvmclock_offset = data.clock - now_raw_ns;
> +        ka->old_kvmclock_offset = ka->kvmclock_offset;
> +        printk("MYDEBUG: old_kvmclock_offset = %lld\n", ka->old_kvmclock_offset);
>          kvm_end_pvclock_update(kvm);
>          return 0;
>  }
> diff --git a/b.sh b/b.sh
> new file mode 120000
> index 000000000000..0ff9a93fd53f
> --- /dev/null
> +++ b/b.sh
> @@ -0,0 +1 @@
> +/home/mlin/build.upstream/b.sh
> \ No newline at end of file
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ba0327e2d0d3..d6b9a6e7275e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -399,6 +399,7 @@ int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity,
>                          return mc->nobjs >= min ? 0 : -ENOMEM;
>                  mc->objects[mc->nobjs++] = obj;
>          }
> +
>          return 0;
>  }
> @@ -998,6 +999,78 @@ static void kvm_destroy_vm_debugfs(struct kvm *kvm)
>          }
>  }
> +extern int do_kvmclock_base(s64 *t, u64 *tsc_timestamp);
> +extern u64 mydebug_get_kvmclock_ns(u64 master_kernel_ns, u64 master_cycle_now, s64 kvmclock_offset, u64 tsc);
> +
> +static ssize_t kvm_mydebug_pvclock_read(struct file *file, char __user *buf,
> +                                        size_t len, loff_t *ppos)
> +{
> +        struct kvm *kvm = file->private_data;
> +        struct kvm_arch *ka;
> +        char buffer[256];
> +        ssize_t ret, copied;
> +        u64 new_master_kernel_ns;
> +        u64 new_master_cycle_now;
> +        u64 old_ns, new_ns;
> +        u64 tsc;
> +
> +        if (!kvm) {
> +                pr_err("file->private_data is NULL\n");
> +                return -EINVAL;
> +        }
> +
> +        ka = &kvm->arch;
> +
> +        do_kvmclock_base(&new_master_kernel_ns, &new_master_cycle_now);
> +
> +        tsc = rdtsc();
> +
> +        old_ns = mydebug_get_kvmclock_ns(ka->old_master_kernel_ns, ka->old_master_cycle_now, ka->old_kvmclock_offset, tsc);
> +        new_ns = mydebug_get_kvmclock_ns(new_master_kernel_ns, new_master_cycle_now, ka->old_kvmclock_offset, tsc);
> +
> +        ret = snprintf(buffer, sizeof(buffer),
> +                       "old: master_kernel_ns: %llu\n"
> +                       "old: master_cycle_now: %llu\n"
> +                       "old: ns: %llu\n"
> +                       "new: master_kernel_ns: %llu\n"
> +                       "new: master_cycle_now: %llu\n"
> +                       "new: ns: %llu\n\n"
> +                       "tsc %llu\n"
> +                       "kvmclock_offset %lld\n"
> +                       "diff: ns: %lld\n",
> +                       ka->old_master_kernel_ns, ka->old_master_cycle_now, old_ns,
> +                       new_master_kernel_ns, new_master_cycle_now, new_ns,
> +                       tsc, ka->old_kvmclock_offset,
> +                       old_ns - new_ns
> +                       );
> +
> +        if (ret < 0)
> +                return ret;
> +
> +        if ((size_t)ret > sizeof(buffer))
> +                ret = sizeof(buffer);
> +
> +        if (*ppos >= ret)
> +                return 0; /* EOF */
> +
> +        copied = min(len, (size_t)(ret - *ppos));
> +
> +        if (copy_to_user(buf, buffer + *ppos, copied)) {
> +                pr_err("copy_to_user failed\n");
> +                return -EFAULT;
> +        }
> +
> +        *ppos += copied;
> +
> +        return copied;
> +}
> +
> +static const struct file_operations kvm_pvclock_fops = {
> +        .owner = THIS_MODULE,
> +        .read = kvm_mydebug_pvclock_read,
> +        .open = simple_open,
> +};
> +
>  static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
>  {
>          static DEFINE_MUTEX(kvm_debugfs_lock);
> @@ -1063,6 +1136,8 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
>                                              &stat_fops_per_vm);
>          }
> +        debugfs_create_file("pvclock", 0444, kvm->debugfs_dentry, kvm, &kvm_pvclock_fops);
> +
>          kvm_arch_create_vm_debugfs(kvm);
>          return 0;
>  out_err:
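[Editorial note: for readers following the arithmetic the tool exercises, __pvclock_read_cycles() in effect returns system_time plus the TSC delta scaled by the per-VM tsc_to_system_mul/tsc_shift pair, which is exactly what the debug patch feeds it. A minimal user-space sketch of that computation is below; the mul/shift values are invented for a hypothetical 2 GHz TSC, whereas the tool uses whatever kvm_get_time_scale() derives from the host's cpu_tsc_khz.]

#include <stdint.h>
#include <stdio.h>

/*
 * pvclock-style 32.32 fixed-point scaling: apply the binary shift to the
 * TSC delta, then multiply by mul_frac and keep bits 32..95 of the product.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int8_t shift)
{
        if (shift < 0)
                delta >>= -shift;
        else
                delta <<= shift;
        return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}

/* The "old: ns" / "new: ns" values in the output above are computed this way. */
static uint64_t pvclock_ns(uint64_t master_kernel_ns, uint64_t master_cycle_now,
                           int64_t kvmclock_offset, uint64_t tsc,
                           uint32_t mul, int8_t shift)
{
        uint64_t system_time = master_kernel_ns + kvmclock_offset;

        return system_time + scale_delta(tsc - master_cycle_now, mul, shift);
}

int main(void)
{
        /* Hypothetical 2 GHz TSC: 0.5 ns per cycle, i.e. mul = 2^31, shift = 0. */
        uint32_t mul = 0x80000000u;
        int8_t shift = 0;

        /* Inputs taken from the first example output in this thread; the result
         * only matches the log if the real host's mul/shift are used instead. */
        uint64_t ns = pvclock_ns(15119778316ULL, 37225912658ULL,
                                 -15010550291LL, 4391329912368ULL, mul, shift);

        printf("ns = %llu\n", (unsigned long long)ns);
        return 0;
}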
On Mon, 2025-03-24 at 15:54 -0700, Sean Christopherson wrote:
>
> David can confirm, but I'm pretty sure the drift you are observing is addressed
> by David's series to fix a plethora of kvmclock warts.
>
> https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org

Yes, that looks like exactly the problem my series is addressing. We
shouldn't update the pvclock so often. And if/when we do, we shouldn't
clamp it to some *other* clock which progresses at a different rate to
the pvclock, because that causes steps in the pvclock.
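[Editorial note: to put a number on those steps, the "diff: ns" column in the log above grows almost linearly, from 539 ns at 50 s of uptime to 93114 ns at 02:18:02 (8282 s), i.e. roughly 11 ns per second (~0.011 ppm) of divergence between the old and new epochs. A small sketch extrapolating that rate follows; it assumes the rate stays constant.]

#include <stdio.h>

int main(void)
{
        /* Two (uptime seconds, diff ns) samples from the log in this thread. */
        double t0 = 50.0,   d0 = 539.0;
        double t1 = 8282.0, d1 = 93114.0;       /* 02:18:02 */

        double rate_ns_per_s = (d1 - d0) / (t1 - t0);

        printf("drift rate: %.2f ns/s (%.4f ppm)\n",
               rate_ns_per_s, rate_ns_per_s / 1000.0);

        /* The step a guest would observe if the pvclock epoch is re-derived
         * from CLOCK_MONOTONIC_RAW after 30 days of accumulated divergence. */
        printf("after 30 days: %.1f ms\n",
               rate_ns_per_s * 30 * 24 * 3600 / 1e6);
        return 0;
}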
On Tue, Mar 25, 2025 at 4:07 AM David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Mon, 2025-03-24 at 15:54 -0700, Sean Christopherson wrote:
> >
> > David can confirm, but I'm pretty sure the drift you are observing is addressed
> > by David's series to fix a plethora of kvmclock warts.
> >
> > https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org
>
> Yes, that looks like exactly the problem my series is addressing. We
> shouldn't update the pvclock so often. And if/when we do, we shouldn't
> clamp it to some *other* clock which progresses at a different rate to
> the pvclock, because that causes steps in the pvclock.

I applied the patch series on top of 6.9 cleanly and tested it with my
debug tool patch. But it seems the time drift still increased monotonically.

Would you help take a look if the tool patch makes sense?
https://github.com/minggr/linux/commit/5284a211b6bdc9f9041b669539558a6a858e88d0

The tool patch adds a KVM debugfs entry to trigger time calculations and
print the results. See my first email for more detail.

Test Script:

#!/bin/bash

qemu_pid=$(pidof qemu-system-x86_64)

while [ 1 ] ; do
    echo "====================================="
    echo "Guest OS running time: $(ps -p $qemu_pid -o etime= | awk '{print $1}')"
    cat /sys/kernel/debug/kvm/*/pvclock
    echo
    sleep 10
done

Thanks,
Ming
On Wed, 2025-03-26 at 08:54 -0700, Ming Lin wrote:
> I applied the patch series on top of 6.9 cleanly and tested it with my
> debug tool patch. But it seems the time drift still increased monotonically.
>
> Would you help take a look if the tool patch makes sense?
> https://github.com/minggr/linux/commit/5284a211b6bdc9f9041b669539558a6a858e88d0
>
> The tool patch adds a KVM debugfs entry to trigger time calculations
> and print the results. See my first email for more detail.

Your first message seemed to say that the problem occurred with live
migration. This message says "the time drift still increased monotonically".

Trying to make sure I fully understand... the time drift between the
host's CLOCK_MONOTONIC_RAW and the guest's kvmclock increases
monotonically *but* the guest only observes the change when its
master_kernel_ns/master_cycle_now are updated (e.g. on live migration)
and its kvmclock is reset back to the host's CLOCK_MONOTONIC_RAW?

Is this live migration from one VMM to another on the same host, so we
don't have to worry about the accuracy of the TSC itself? The guest TSC
remains consistent? And presumably your host does *have* a stable TSC,
and the guest's test case really ought to be checking the
PVCLOCK_TSC_STABLE_BIT to make sure of that?

If all the above assumptions/interpretations of mine are true, I still
think it's expected that your clock will jump on live migration *unless*
you also taught your VMM to use the new KVM_[GS]ET_CLOCK_GUEST ioctls
which were added in my patch series, specifically to preserve the
mathematical relationship between guest TSC and kvmclock across a
migration.
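[Editorial note: "preserving the mathematical relationship" boils down to re-anchoring the pvclock epoch at the guest TSC value captured during the hand-over, instead of re-deriving system_time from the destination's CLOCK_MONOTONIC_RAW. The sketch below shows that rebase in plain C; the struct and function names are illustrative only, and the actual ABI of the KVM_[GS]ET_CLOCK_GUEST ioctls is defined by the patch series and not reproduced here.]

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct pvclock_params {
        uint64_t tsc_timestamp;         /* guest TSC value at the epoch */
        uint64_t system_time;           /* kvmclock nanoseconds at the epoch */
        uint32_t tsc_to_system_mul;     /* 32.32 fixed-point ns per cycle */
        int8_t   tsc_shift;
};

static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
        if (shift < 0)
                delta >>= -shift;
        else
                delta <<= shift;
        return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

static uint64_t read_clock(struct pvclock_params p, uint64_t tsc)
{
        return p.system_time +
               scale_delta(tsc - p.tsc_timestamp, p.tsc_to_system_mul, p.tsc_shift);
}

/*
 * Re-anchor the epoch at the guest TSC value captured while the vCPUs are
 * paused, so the kvmclock reading stays continuous even if the destination
 * advertises a slightly different mul/shift for its own TSC calibration.
 */
static struct pvclock_params rebase_pvclock(struct pvclock_params src,
                                            uint64_t handover_tsc,
                                            uint32_t dst_mul, int8_t dst_shift)
{
        struct pvclock_params dst;

        dst.tsc_timestamp = handover_tsc;
        dst.system_time = read_clock(src, handover_tsc);
        dst.tsc_to_system_mul = dst_mul;
        dst.tsc_shift = dst_shift;
        return dst;
}

int main(void)
{
        /* All numbers are invented for illustration. */
        struct pvclock_params src = { 1000000000ULL, 500000000ULL, 0x80000000u, 0 };
        uint64_t handover_tsc = 5000000000ULL;

        struct pvclock_params dst = rebase_pvclock(src, handover_tsc, 0x7fff8000u, 0);

        /* Identical at the hand-over point; only the forward rate differs. */
        assert(read_clock(src, handover_tsc) == read_clock(dst, handover_tsc));
        printf("reading at hand-over: %llu ns\n",
               (unsigned long long)read_clock(dst, handover_tsc));
        return 0;
}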
On Thu, Mar 27, 2025 at 1:10 AM David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Wed, 2025-03-26 at 08:54 -0700, Ming Lin wrote:
> > I applied the patch series on top of 6.9 cleanly and tested it with my
> > debug tool patch. But it seems the time drift still increased monotonically.
> >
> > Would you help take a look if the tool patch makes sense?
> > https://github.com/minggr/linux/commit/5284a211b6bdc9f9041b669539558a6a858e88d0
> >
> > The tool patch adds a KVM debugfs entry to trigger time calculations
> > and print the results. See my first email for more detail.
>
> Your first message seemed to say that the problem occurred with live
> migration. This message says "the time drift still increased
> monotonically".

Yes, we discovered this issue in our production environment, where the
time inside the guest OS slowed down by more than 2 seconds. This problem
occurred both during live upgrades locally and live migrations remotely.
However, the issue is only noticeable after the guest OS has been running
for a long time, typically over 30 days.

Since 30 days is too long to wait, I wrote a debugfs tool to quickly
reproduce the original issue, but now I'm not sure if the tool is
working correctly.

>
> Trying to make sure I fully understand... the time drift between the
> host's CLOCK_MONOTONIC_RAW and the guest's kvmclock increases
> monotonically *but* the guest only observes the change when its
> master_kernel_ns/master_cycle_now are updated (e.g. on live migration)
> and its kvmclock is reset back to the host's CLOCK_MONOTONIC_RAW?

Yes, we are using the 5.4 kernel and have verified that the guest OS time
remains correct after live upgrades/migrations, as long as
master_kernel_ns/master_cycle_now are not updated (i.e., if the old
master_kernel_ns/master_cycle_now values are retained).

>
> Is this live migration from one VMM to another on the same host, so we
> don't have to worry about the accuracy of the TSC itself? The guest TSC
> remains consistent? And presumably your host does *have* a stable TSC,
> and the guest's test case really ought to be checking the
> PVCLOCK_TSC_STABLE_BIT to make sure of that?

The live migration is from one VMM to another on a remote host, and we
have also observed the same issue during live upgrades on the same host.

>
> If all the above assumptions/interpretations of mine are true, I still
> think it's expected that your clock will jump on live migration
> *unless* you also taught your VMM to use the new KVM_[GS]ET_CLOCK_GUEST
> ioctls which were added in my patch series, specifically to preserve
> the mathematical relationship between guest TSC and kvmclock across a
> migration.
>

We are planning to test the patches on a 6.9 kernel (where they can be
applied cleanly) and modify the live upgrade/migration code to use the
new KVM_[GS]ET_CLOCK_GUEST ioctls.

BTW, what is the plan for upstreaming these patches?

Thanks,
Ming
On Fri, 2025-03-28 at 11:30 -0700, Ming Lin wrote:
> >
> > Is this live migration from one VMM to another on the same host, so we
> > don't have to worry about the accuracy of the TSC itself? The guest TSC
> > remains consistent? And presumably your host does *have* a stable TSC,
> > and the guest's test case really ought to be checking the
> > PVCLOCK_TSC_STABLE_BIT to make sure of that?
>
> The live migration is from one VMM to another on a remote host, and we
> have also observed the same issue during live upgrades on the same host.

Moving to a remote host also requires that you get the guest TSC to be
reasonably synchronised on the destination. Which is another litany of
sadness, especially if you have TSC scaling in the mix. Or even if your
two "identical" hosts calculated a slightly different TSC frequency when
they measured it at first boot.

In that latter case, the mul/scale factors advertised to the guest in the
pvclock will be *slightly* different on the new host. Your test in the
guest would then fail if it requires that the pvclock be *identical*. The
actual criterion is that the result should be identical at the time of
the live migration (when the TSC frequency effectively changes). The code
currently requires that the old and new TSC frequencies are within ±1kHz
of each other.

> >
> > If all the above assumptions/interpretations of mine are true, I still
> > think it's expected that your clock will jump on live migration
> > *unless* you also taught your VMM to use the new KVM_[GS]ET_CLOCK_GUEST
> > ioctls which were added in my patch series, specifically to preserve
> > the mathematical relationship between guest TSC and kvmclock across a
> > migration.
> >
>
> We are planning to test the patches on a 6.9 kernel (where they can be
> applied cleanly) and modify the live upgrade/migration code to use the new
> KVM_[GS]ET_CLOCK_GUEST ioctls.
>
> BTW, what is the plan for upstreaming these patches?

I need to find the time to rebase, rework and retest them. You're doing
some of the testing and increasing the motivation... I'll see if I can
get it done in the next week or three.
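[Editorial note: for a feel for what "slightly different" means here, a 1 kHz difference in the measured frequency of a 3 GHz TSC is about 0.33 ppm, so pvclocks scaled with the two calibrations drift apart by roughly 29 ms per day. A quick sketch follows; both frequencies are hypothetical, and the kernel may also choose a non-zero tsc_shift, which is ignored here for simplicity.]

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* Hypothetical TSC frequencies measured at boot on two "identical" hosts. */
        uint64_t src_khz = 3000000;     /* 3.000000 GHz */
        uint64_t dst_khz = 2999999;     /* 1 kHz lower */

        /* 32.32 fixed-point ns-per-cycle, the quantity tsc_to_system_mul
         * encodes when tsc_shift == 0: (1e6 << 32) / tsc_khz. */
        uint64_t src_mul = (1000000ULL << 32) / src_khz;
        uint64_t dst_mul = (1000000ULL << 32) / dst_khz;

        double ppm = (double)(src_khz - dst_khz) * 1e6 / (double)src_khz;

        printf("mul: %llu vs %llu\n",
               (unsigned long long)src_mul, (unsigned long long)dst_mul);
        printf("frequency difference: %.3f ppm -> ~%.1f ms/day divergence\n",
               ppm, ppm * 86400.0 * 1000.0 / 1e6);
        return 0;
}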