Message ID | 20230324145424.293889-2-nrb@linux.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 285cff4c0454340a4dc53f46e67f2cb1c293bd74 |
Headers | show |
Series | KVM: s390: CMMA migration selftest and small bugfix | expand |
On 3/24/23 15:54, Nico Boehr wrote: > The KVM_S390_GET_CMMA_BITS ioctl may return incorrect values when userspace > specifies a start_gfn outside of memslots. > > This can occur when a VM has multiple memslots with a hole in between: > > +-----+----------+--------+--------+ > | ... | Slot N-1 | <hole> | Slot N | > +-----+----------+--------+--------+ > ^ ^ ^ ^ > | | | | > GFN A A+B | | > A+B+C | > A+B+C+D > > When userspace specifies a GFN in [A+B, A+B+C), it would expect to get the > CMMA values of the first dirty page in Slot N. However, userspace may get a > start_gfn of A+B+C+D with a count of 0, hence completely skipping over any > dirty pages in slot N. > > The error is in kvm_s390_next_dirty_cmma(), which assumes > gfn_to_memslot_approx() will return the memslot _below_ the specified GFN > when the specified GFN lies outside a memslot. In reality it may return > either the memslot below or above the specified GFN. > > When a memslot above the specified GFN is returned this happens: > > - ofs is calculated, but since the memslot's base_gfn is larger than the > specified cur_gfn, ofs will underflow to a huge number. > - ofs is passed to find_next_bit(). Since ofs will exceed the memslot's > number of pages, the number of pages in the memslot is returned, > completely skipping over all bits in the memslot userspace would be > interested in. > > Fix this by resetting ofs to zero when a memslot _above_ cur_gfn is > returned (cur_gfn < ms->base_gfn). > > Signed-off-by: Nico Boehr <nrb@linux.ibm.com> > Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Does this need a fix tag?
Quoting Nico Boehr (2023-03-24 15:54:23) > The KVM_S390_GET_CMMA_BITS ioctl may return incorrect values when userspace > specifies a start_gfn outside of memslots. > > This can occur when a VM has multiple memslots with a hole in between: > > +-----+----------+--------+--------+ > | ... | Slot N-1 | <hole> | Slot N | > +-----+----------+--------+--------+ > ^ ^ ^ ^ > | | | | > GFN A A+B | | > A+B+C | > A+B+C+D > > When userspace specifies a GFN in [A+B, A+B+C), it would expect to get the > CMMA values of the first dirty page in Slot N. However, userspace may get a > start_gfn of A+B+C+D with a count of 0, hence completely skipping over any > dirty pages in slot N. > > The error is in kvm_s390_next_dirty_cmma(), which assumes > gfn_to_memslot_approx() will return the memslot _below_ the specified GFN > when the specified GFN lies outside a memslot. In reality it may return > either the memslot below or above the specified GFN. > > When a memslot above the specified GFN is returned this happens: > > - ofs is calculated, but since the memslot's base_gfn is larger than the > specified cur_gfn, ofs will underflow to a huge number. > - ofs is passed to find_next_bit(). Since ofs will exceed the memslot's > number of pages, the number of pages in the memslot is returned, > completely skipping over all bits in the memslot userspace would be > interested in. > > Fix this by resetting ofs to zero when a memslot _above_ cur_gfn is > returned (cur_gfn < ms->base_gfn). > > Signed-off-by: Nico Boehr <nrb@linux.ibm.com> > Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Since Janosch asked for it, I would suggest the following Fixes tag: Fixes: afdad61615cc ("KVM: s390: Fix storage attributes migration with memory slots")
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 39b36562c043..db2de76bb0ef 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -2155,6 +2155,10 @@ static unsigned long kvm_s390_next_dirty_cmma(struct kvm_memslots *slots, ms = container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_idx]); ofs = 0; } + + if (cur_gfn < ms->base_gfn) + ofs = 0; + ofs = find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, ofs); while (ofs >= ms->npages && (mnode = rb_next(mnode))) { ms = container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_idx]);