Message ID | 3e1150560a41bd567049627d684cd4e754530869.1710342968.git-series.marmarek@invisiblethingslab.com (mailing list archive) |
---|---|
State | Superseded |
Series | MSI-X support with qemu in stubdomain, and other related changes |
On 13.03.2024 16:16, Marek Marczykowski-Górecki wrote:
> QEMU needs to know whether clearing maskbit of a vector is really
> clearing, or was already cleared before. Currently Xen sends only
> clearing that bit to the device model, but not setting it, so QEMU
> cannot detect it. Because of that, QEMU is working this around by
> checking via /dev/mem, but that isn't the proper approach.
>
> Give all necessary information to QEMU by passing all ctrl writes,
> including masking a vector. Advertise the new behavior via
> XENVER_get_features, so QEMU can know it doesn't need to access /dev/mem
> anymore.
>
> While this commit doesn't move the whole maskbit handling to QEMU (as
> discussed on xen-devel as one of the possibilities), it is a necessary
> first step anyway. Including telling QEMU it will get all the required
> information to do so. The actual implementation would need to include:
> - a hypercall for QEMU to control just maskbit (without (re)binding the
> interrupt again
> - a methor for QEMU to tell Xen it will actually do the work
> Those are not part of this series.
>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> ---
> I did not added any control to enable/disable this new behavior (as
> Roger have suggested for possible non-QEMU ioreqs). I don't see how the
> new behavior could be problematic for some existing ioreq server (they
> already received writes to those addresses, just not all of them),
> but if that's really necessary, I can probably add a command line option
> to restore previous behavior system-wide.

Roger, please indicate whether you consider things to be okay to go in
as is, or whether you demand this earlier concern of yours to be
addressed by adding a command line option (or even finer-grained
control).

Jan
On Wed, Mar 13, 2024 at 04:16:06PM +0100, Marek Marczykowski-Górecki wrote:
> QEMU needs to know whether clearing maskbit of a vector is really
> clearing, or was already cleared before. Currently Xen sends only
> clearing that bit to the device model, but not setting it, so QEMU
> cannot detect it. Because of that, QEMU is working this around by
> checking via /dev/mem, but that isn't the proper approach.
>
> Give all necessary information to QEMU by passing all ctrl writes,
> including masking a vector. Advertise the new behavior via
> XENVER_get_features, so QEMU can know it doesn't need to access /dev/mem
> anymore.
>
> While this commit doesn't move the whole maskbit handling to QEMU (as
> discussed on xen-devel as one of the possibilities), it is a necessary
> first step anyway. Including telling QEMU it will get all the required
> information to do so. The actual implementation would need to include:
> - a hypercall for QEMU to control just maskbit (without (re)binding the
> interrupt again
> - a methor for QEMU to tell Xen it will actually do the work
      ^ method
> Those are not part of this series.
>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> I did not added any control to enable/disable this new behavior (as
> Roger have suggested for possible non-QEMU ioreqs). I don't see how the
> new behavior could be problematic for some existing ioreq server (they
> already received writes to those addresses, just not all of them),
> but if that's really necessary, I can probably add a command line option
> to restore previous behavior system-wide.

That's fine I guess, as you say such ioreq servers should already know
how to handle the ranges, and if anything the current behavior of
device models not receiving all accesses is likely the bogus (or
unexpected at least).

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index adbac965f9f7..999917983789 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -283,8 +283,8 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
     unsigned long flags;
     struct irq_desc *desc;
 
-    if ( (len != 4 && len != 8) || (address & (len - 1)) )
-        return r;
+    if ( !IS_ALIGNED(address, len) )
+        return X86EMUL_OKAY;
 
     rcu_read_lock(&msixtbl_rcu_lock);
 
@@ -345,8 +345,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
 
 unlock:
     spin_unlock_irqrestore(&desc->lock, flags);
-    if ( len == 4 )
-        r = X86EMUL_OKAY;
+    r = X86EMUL_OKAY;
 
 out:
     rcu_read_unlock(&msixtbl_rcu_lock);
@@ -357,7 +356,17 @@ static int cf_check _msixtbl_write(
     const struct hvm_io_handler *handler, uint64_t address, uint32_t len,
     uint64_t val)
 {
-    return msixtbl_write(current, address, len, val);
+    /* Ignore invalid length or unaligned writes. */
+    if ( (len != 4 && len != 8) || !IS_ALIGNED(address, len) )
+        return X86EMUL_OKAY;
+
+    /*
+     * This function returns X86EMUL_UNHANDLEABLE even if write is properly
+     * handled, to propagate it to the device model (so it can keep its
+     * internal state in sync).
+     */
+    msixtbl_write(current, address, len, val);
+    return X86EMUL_UNHANDLEABLE;
 }
 
 static bool cf_check msixtbl_range(
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 08dbaa2a054c..b44b2439ca8e 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -637,6 +637,7 @@ long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
                          (1U << XENFEAT_hvm_callback_vector) |
                          (has_pirq(d) ? (1U << XENFEAT_hvm_pirqs) : 0);
+        fi.submap |= (1U << XENFEAT_dm_msix_all_writes);
 #endif
         if ( !paging_mode_translate(d) || is_domain_direct_mapped(d) )
             fi.submap |= (1U << XENFEAT_direct_mapped);
diff --git a/xen/include/public/features.h b/xen/include/public/features.h
index 4437f25d2532..880193094713 100644
--- a/xen/include/public/features.h
+++ b/xen/include/public/features.h
@@ -120,6 +120,14 @@
 #define XENFEAT_runstate_phys_area 18
 #define XENFEAT_vcpu_time_phys_area 19
 
+/*
+ * If set, Xen will passthrough all MSI-X vector ctrl writes to device model,
+ * not only those unmasking an entry. This allows device model to properly keep
+ * track of the MSI-X table without having to read it from the device behind
+ * Xen's backs. This information is relevant only for device models.
+ */
+#define XENFEAT_dm_msix_all_writes 20
+
 #define XENFEAT_NR_SUBMAPS 1
 
 #endif /* __XEN_PUBLIC_FEATURES_H__ */
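
For readers who want to poke at the dispatch rule outside of Xen, here is a
minimal standalone sketch of what the new _msixtbl_write() wrapper decides,
plus the feature-bit test a device model might do. It is not Xen code:
SKETCH_IS_ALIGNED, the sketch_result enum and every sketch_* function are
hypothetical stand-ins made up for illustration (the real code uses
IS_ALIGNED() and the X86EMUL_* codes). Only the bit number 20 is taken from
the features.h hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's IS_ALIGNED(); assumes a power-of-two size. */
#define SKETCH_IS_ALIGNED(val, align) (((val) & ((align) - 1)) == 0)

/* Illustrative result codes; these are NOT the real X86EMUL_* values. */
enum sketch_result { SKETCH_OKAY, SKETCH_UNHANDLEABLE };

/* Bit number as defined in the features.h hunk of this patch. */
#define XENFEAT_dm_msix_all_writes 20

/*
 * Mirror of the wrapper's decision: writes with a bad length or alignment
 * are dropped (reported as handled), everything else is reported as
 * unhandleable so that it also reaches the device model.
 */
static enum sketch_result sketch_classify_write(uint64_t address, uint32_t len)
{
    if ((len != 4 && len != 8) || !SKETCH_IS_ALIGNED(address, len))
        return SKETCH_OKAY;
    return SKETCH_UNHANDLEABLE;
}

/* Test the feature bit in a submap value obtained via XENVER_get_features. */
static bool sketch_has_all_writes(uint32_t submap)
{
    return (submap & (1U << XENFEAT_dm_msix_all_writes)) != 0;
}

/* Tiny usage example. */
void sketch_self_check(void)
{
    assert(sketch_classify_write(0x1000, 4) == SKETCH_UNHANDLEABLE);
    assert(sketch_classify_write(0x1004, 8) == SKETCH_OKAY);
    assert(sketch_has_all_writes(1U << XENFEAT_dm_msix_all_writes));
}
```

Note that the length check runs before the alignment check, so a zero length
never reaches the mask computation.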
QEMU needs to know whether clearing the maskbit of a vector really clears
it, or whether it was already clear before. Currently Xen forwards only
writes clearing that bit to the device model, but not writes setting it,
so QEMU cannot tell the two cases apart. Because of that, QEMU works
around this by checking via /dev/mem, but that isn't the proper approach.

Give all necessary information to QEMU by passing all ctrl writes,
including masking a vector. Advertise the new behavior via
XENVER_get_features, so QEMU can know it doesn't need to access /dev/mem
anymore.

While this commit doesn't move the whole maskbit handling to QEMU (as
discussed on xen-devel as one of the possibilities), it is a necessary
first step anyway, including telling QEMU it will get all the required
information to do so. The actual implementation would need to include:
- a hypercall for QEMU to control just the maskbit (without (re)binding
  the interrupt again)
- a method for QEMU to tell Xen it will actually do the work
Those are not part of this series.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
I did not add any control to enable/disable this new behavior (as
Roger has suggested for possible non-QEMU ioreqs). I don't see how the
new behavior could be problematic for some existing ioreq server (they
already received writes to those addresses, just not all of them),
but if that's really necessary, I can probably add a command line option
to restore previous behavior system-wide.

Changes in v5:
- announce the feature only on x86
- style fixes
Changes in v4:
- ignore unaligned writes with X86EMUL_OKAY
- restructure the code to forward all writes in _msixtbl_write() instead
  of manipulating return value of msixtbl_write() - this makes
  WRITE_LEN4_COMPLETION special case unnecessary
- advertise the changed behavior via XENVER_get_features instead of DMOP
v3:
- advertise changed behavior in XEN_DMOP_get_ioreq_server_info - make
  "flags" parameter IN/OUT
- move len check back to msixtbl_write() - will be needed there anyway
  in a later patch
v2:
- passthrough quad writes to emulator too (Jan)
- (ab)use len==0 for write len=4 completion (Jan), but add descriptive
  #define for this magic value
---
 xen/arch/x86/hvm/vmsi.c       | 19 ++++++++++++++-----
 xen/common/kernel.c           |  1 +
 xen/include/public/features.h |  8 ++++++++
 3 files changed, 23 insertions(+), 5 deletions(-)
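
To illustrate the motivation given in the commit message (telling a real
unmask apart from a clear of an already-clear bit), below is a minimal
sketch of the bookkeeping a device model could do once it sees every vector
ctrl write. It is not QEMU's implementation. The struct, the table size and
all sketch_* names are hypothetical. It assumes only that the mask bit is
bit 0 of the MSI-X Vector Control word and that vectors start out masked
after reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_VECTORS   8     /* hypothetical table size */
#define SKETCH_CTRL_MASKBIT 0x1u  /* bit 0 of the MSI-X Vector Control word */

/* Hypothetical shadow of the per-vector mask state kept by a device model. */
struct sketch_msix_state {
    bool masked[SKETCH_NR_VECTORS];
};

/* MSI-X vectors come out of reset with the mask bit set. */
static void sketch_reset(struct sketch_msix_state *s)
{
    for (unsigned int i = 0; i < SKETCH_NR_VECTORS; i++)
        s->masked[i] = true;
}

/*
 * Record a Vector Control write and report whether it is a genuine
 * masked -> unmasked transition. This only works if writes that set the
 * mask bit are seen as well, which is what the patch provides.
 */
static bool sketch_ctrl_write(struct sketch_msix_state *s,
                              unsigned int vector, uint32_t val)
{
    assert(vector < SKETCH_NR_VECTORS);

    bool was_masked = s->masked[vector];
    bool now_masked = (val & SKETCH_CTRL_MASKBIT) != 0;

    s->masked[vector] = now_masked;
    return was_masked && !now_masked;
}
```

With the old behavior the shadow state would never see the write that sets
the mask bit, so a later clear could not be classified without reading the
real table.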