[v5,2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

From: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

On 2017-01-27 14:14:49 Fri, Paul Mackerras wrote:
> On Wed, Jan 18, 2017 at 11:19:26AM +0530, Mahesh Jagannath Salgaonkar wrote:
> > On 01/16/2017 10:05 AM, Paul Mackerras wrote:
> > > On Fri, Jan 13, 2017 at 04:51:45PM +0530, Aravinda Prasad wrote:
> 
> [snip]
> 
> > >>  	case BOOK3S_INTERRUPT_MACHINE_CHECK:
> > >> +		/* Exit to guest with KVM_EXIT_NMI as exit reason */
> > >> +		run->exit_reason = KVM_EXIT_NMI;
> > >> +		r = RESUME_HOST;
> > >>  		/*
> > >> -		 * Deliver a machine check interrupt to the guest.
> > >> -		 * We have to do this, even if the host has handled the
> > >> -		 * machine check, because machine checks use SRR0/1 and
> > >> -		 * the interrupt might have trashed guest state in them.
> > >> +		 * Invoke host-kernel handler to perform any host-side
> > >> +		 * handling before exiting the guest.
> > >>  		 */
> > >> -		kvmppc_book3s_queue_irqprio(vcpu,
> > >> -					    BOOK3S_INTERRUPT_MACHINE_CHECK);
> > >> -		r = RESUME_GUEST;
> > >> +		kvmppc_machine_check_hook();
> > > 
> > > Note that this won't necessarily be called on the same CPU that
> > > received the machine check.  This will be called on thread 0 of the
> > > core (or subcore), whereas the machine check could have occurred on
> > > some other thread.  Are you sure that the machine check handling code
> > > will be OK with that?
> > 
> > That will have only one problem. get_mce_event() from
> > opal_machine_check() may not be able to pull mce event for error on
> > non-zero thread. We should hook the mce event into vcpu structure during
> > kvmppc_realmode_machine_check() and then pass it to
> > ppc_md.machine_check_exception() as an additional argument.
> 
> To move things along...
> 
> Mahesh, how would we get hold of the mce event from real-mode assembly
> code?  What function would we need to call to get the event?  Could
> you write some code (or at least some pseudo-code) to illustrate how
> it would be done?

I am thinking of passing all the handled/unhandled mce errors to guest
if fwnmi capability is supported by QEMU. Otherwise fall back to the old
behaviour. I have done modifications to this patch which is almost
ready. Planning to pass additional info to QEMU about whether MCE
error was recovered or not, so that QEMU can correctly set the RTAS
DISPOSITION.

Below is the modified patch. It uses one bit from kvm_run->flags to pass
disposition information (0 = unrecovred, 1 = recovred). But RTAS
disposition can have 3 values as per PAPR. The 3rd value indicates
limited recovery. For PAPR compliance, I am thiniking of using
two bits from kvm_run->flags (b00 = Fully recovered, b01 = Limited
recovery, b10= Not recovered.).

Let me know what you think about this approach.

------------------------------------------

Enhance KVM to cause a guest exit with KVM_EXIT_NMI

From: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

exit reason upon a machine check exception (MCE) in
the guest address space if the KVM_CAP_PPC_FWNMI
capability is enabled (instead of delivering a 0x200
interrupt to guest). This enables QEMU to build error
log and deliver machine check exception to guest via
guest registered machine check handler.

This approach simplifies the delivery of machine
check exception to guest OS compared to the earlier
approach of KVM directly invoking 0x200 guest interrupt
vector.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Note:

This patch introduces a hook which is invoked at the time
of guest exit to facilitate the host-side handling of
machine check exception before the exception is passed
on to the guest. Hence, the host-side handling which was
performed earlier via machine_check_fwnmi is removed.

The reasons for this approach is (i) it is not possible
to distinguish whether the exception occurred in the
guest or the host from the pt_regs passed on the
machine_check_exception(). Hence machine_check_exception()
calls panic, instead of passing on the exception to
the guest, if the machine check exception is not
recoverable. (ii) the approach introduced in this
patch gives opportunity to the host kernel to perform
actions in virtual mode before passing on the exception
to the guest. This approach does not require complex
tweaks to machine_check_fwnmi and friends.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h     |    2 +
 arch/powerpc/include/asm/machdep.h      |    7 +++++
 arch/powerpc/include/asm/opal.h         |    4 +++
 arch/powerpc/include/uapi/asm/kvm.h     |    3 ++
 arch/powerpc/kvm/book3s_hv.c            |   21 +++++++++-----
 arch/powerpc/kvm/book3s_hv_ras.c        |   14 +++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   47 ++++++++++++++++---------------
 arch/powerpc/platforms/powernv/opal.c   |   26 +++++++++++++++++
 arch/powerpc/platforms/powernv/setup.c  |    3 ++
 9 files changed, 96 insertions(+), 31 deletions(-)

[v5,2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

Commit Message

Patch