
[v7,3/7] LoongArch: KVM: Add cpucfg area for kvm hypervisor

Message ID 20240315080710.2812974-4-maobibo@loongson.cn (mailing list archive)
State New, archived
Series LoongArch: Add pv ipi support on LoongArch VM

Commit Message

bibo mao March 15, 2024, 8:07 a.m. UTC
The cpucfg instruction can be used to query processor features. It
traps when executed in VM mode, which allows the hypervisor to
provide CPU features to the VM. On real hardware, cpucfg area 0 - 20
is used. Here a dedicated area, 0x40000000 -- 0x400000ff, is used by
the KVM hypervisor to provide PV features, and the area can be extended
for other hypervisors in the future. This area will never be used by
real HW; it is only used by software.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 arch/loongarch/include/asm/inst.h      |  1 +
 arch/loongarch/include/asm/loongarch.h | 10 +++++
 arch/loongarch/kvm/exit.c              | 59 +++++++++++++++++++-------
 3 files changed, 54 insertions(+), 16 deletions(-)
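
The index split described in the commit message can be sketched in plain
C as follows. This is only an illustration under stated assumptions, not
the code from the patch: CPUCFG_KVM_BASE and CPUCFG_KVM_SIZE are taken
from the patch, while HW_CPUCFG_LAST, enum cpucfg_area and
classify_cpucfg_index() are hypothetical names made up for this sketch,
and the hardware range 0 - 20 is simply the figure quoted in the commit
message.

```c
#include <stdint.h>

/* Values taken from the patch. */
#define CPUCFG_KVM_BASE 0x40000000u
#define CPUCFG_KVM_SIZE 0x100u

/* Hypothetical: "cpucfg area 0 - 20" as quoted in the commit message. */
#define HW_CPUCFG_LAST 20u

/* Hypothetical classification used only in this sketch. */
enum cpucfg_area {
	CPUCFG_AREA_HW,        /* index backed by real hardware */
	CPUCFG_AREA_KVM,       /* software-only area reserved for KVM */
	CPUCFG_AREA_UNDEFINED  /* anything else reads as 0 */
};

static enum cpucfg_area classify_cpucfg_index(uint32_t index)
{
	if (index <= HW_CPUCFG_LAST)
		return CPUCFG_AREA_HW;
	/* The subtraction cannot wrap because index >= CPUCFG_KVM_BASE here. */
	if (index >= CPUCFG_KVM_BASE &&
	    index - CPUCFG_KVM_BASE < CPUCFG_KVM_SIZE)
		return CPUCFG_AREA_KVM;
	return CPUCFG_AREA_UNDEFINED;
}
```

With these definitions, 0x400000ff is the last index inside the KVM area
and 0x40000100 already falls outside it.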

Comments

WANG Xuerui March 23, 2024, 7:02 p.m. UTC | #1
On 3/15/24 16:07, Bibo Mao wrote:
> Instruction cpucfg can be used to get processor features. And there
> is trap exception when it is executed in VM mode, and also it is
> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
> is used.  Here one specified area 0x40000000 -- 0x400000ff is used
> for KVM hypervisor to provide PV features, and the area can be extended
> for other hypervisors in future. This area will never be used for
> real HW, it is only used by software.
> 
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>   arch/loongarch/include/asm/inst.h      |  1 +
>   arch/loongarch/include/asm/loongarch.h | 10 +++++
>   arch/loongarch/kvm/exit.c              | 59 +++++++++++++++++++-------
>   3 files changed, 54 insertions(+), 16 deletions(-)
> 

Sorry for the late reply, but I think it may be a bit non-constructive 
to repeatedly submit the same code without due explanation in our 
previous review threads. Let me try to recollect some of the details 
though...

If I remember correctly, during the previous reviews, it was mentioned 
that the only upsides of using CPUCFG were:

- it was exactly identical to the x86 approach,
- it would not require access to the LoongArch Reference Manual Volume 3 
to use, and
- it was plain old data.

But, for the first point, we don't have to follow x86 convention after 
all. The second reason might be compelling, but on the one hand that's 
another problem orthogonal to the current one, and on the other hand 
HVCL is:

- already effectively public because of the fact that this very patchset 
is public,
- its semantics is trivial to implement even without access to the LVZ 
manual, because of its striking similarity with SYSCALL, and
- by being a function call, we reserve the possibility for hypervisors 
to invoke logic for self-identification purposes, even if this is likely 
overkill from today's perspective.

And, even if we decide that using HVCL for self-identification is 
overkill after all, we still have another choice that's IOCSR. We 
already read LOONGARCH_IOCSR_FEATURES (0x8) for its bit 11 (IOCSRF_VM) 
to populate the CPU_FEATURE_HYPERVISOR bit, and it's only natural that 
we put the identification word in the IOCSR space. As far as I can see, 
the IOCSR space is plenty and equally available for making reservations; 
it can only be even easier when it's done by a Loongson team.
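
For reference, the IOCSR feature check mentioned above boils down to
testing one bit. A minimal sketch, assuming the features word has already
been read from LOONGARCH_IOCSR_FEATURES (0x8) by some other means; the
helper name features_word_reports_vm() and the macro IOCSRF_VM_BIT are
hypothetical, and only the bit position (11) comes from the text above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 11 (IOCSRF_VM) of the features word, as described above. */
#define IOCSRF_VM_BIT 11u

/*
 * Hypothetical helper: takes the already-read features word, so the
 * sketch does not depend on any IOCSR access primitive.
 */
static bool features_word_reports_vm(uint32_t features)
{
	return ((features >> IOCSRF_VM_BIT) & 1u) != 0;
}
```

An identification word placed elsewhere in the IOCSR space would be read
and compared in the same fashion.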

Finally, I've mentioned multiple times that varying CPUCFG behavior 
based on PLV is not well documented in the manuals, and hence not 
friendly to low-level developers. Devs of third-party firmware and/or 
kernels do exist; I've personally spoken to some of them at the 
2023-11-18 3A6000 release event. For the varying CPUCFG behavior 
approach to pass for me, at the very least, the LoongArch reference 
manual must be amended to explicitly include an explanation of it, and 
a reference to potential use cases.
bibo mao April 2, 2024, 1:43 a.m. UTC | #2
On 2024/3/24 3:02 AM, WANG Xuerui wrote:
> On 3/15/24 16:07, Bibo Mao wrote:
>> Instruction cpucfg can be used to get processor features. And there
>> is trap exception when it is executed in VM mode, and also it is
>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>> is used.  Here one specified area 0x40000000 -- 0x400000ff is used
>> for KVM hypervisor to provide PV features, and the area can be extended
>> for other hypervisors in future. This area will never be used for
>> real HW, it is only used by software.
>>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>>   arch/loongarch/include/asm/inst.h      |  1 +
>>   arch/loongarch/include/asm/loongarch.h | 10 +++++
>>   arch/loongarch/kvm/exit.c              | 59 +++++++++++++++++++-------
>>   3 files changed, 54 insertions(+), 16 deletions(-)
>>
> 
> Sorry for the late reply, but I think it may be a bit non-constructive 
> to repeatedly submit the same code without due explanation in our 
> previous review threads. Let me try to recollect some of the details 
> though...
Because your review comments about the hypercall method are wrong, I need
not adopt them.
> 
> If I remember correctly, during the previous reviews, it was mentioned 
> that the only upsides of using CPUCFG were:
> 
> - it was exactly identical to the x86 approach,
> - it would not require access to the LoongArch Reference Manual Volume 3 
> to use, and
> - it was plain old data.
> 
> But, for the first point, we don't have to follow x86 convention after 
x86 virtualization is successfully and widely used in our lives and
products. It is normal to follow it if there are no obvious issues.

> all. The second reason might be compelling, but on the one hand that's 
> another problem orthogonal to the current one, and on the other hand 
> HVCL is:
> 
> - already effectively public because of the fact that this very patchset 
> is public,
> - its semantics is trivial to implement even without access to the LVZ 
> manual, because of its striking similarity with SYSCALL, and
> - by being a function call, we reserve the possibility for hypervisors 
> to invoke logic for self-identification purposes, even if this is likely 
> overkill from today's perspective.
> 
> And, even if we decide that using HVCL for self-identification is 
> overkill after all, we still have another choice that's IOCSR. We 
> already read LOONGARCH_IOCSR_FEATURES (0x8) for its bit 11 (IOCSRF_VM) 
> to populate the CPU_FEATURE_HYPERVISOR bit, and it's only natural that 
> we put the identification word in the IOCSR space. As far as I can see, 
> the IOCSR space is plenty and equally available for making reservations; 
> it can only be even easier when it's done by a Loongson team.
The IOCSR method is also possible. In terms of chip design, CPUCFG is used
for CPU features and IOCSR is for device features. Here the CPUCFG method
is selected. I am the KVM LoongArch maintainer, and I can decide which
method to select as long as it works well. Is that right?

If you are interested in KVM LoongArch, you can submit more patches and 
become a maintainer, or write support for a new hypervisor such as
xen/xvisor etc., and use your method.

Also, since you are interested in the Linux kernel, there are some issues.
Can you help to improve them?

1. T0-T7 are scratch registers in the SYSCALL ABI, which is what you
suggested. Does any information leak to user space through the T0-T7
registers?

2. LoongArch KVM depends on AS_HAS_LVZ_EXTENSION, which requires the 
latest binutils. That is also what you suggested. Some kernel developers
do not have the latest binutils, so when common KVM code is modified and
LoongArch KVM fails to compile, they cannot find out, since their
LoongArch cross-compiler is old and LoongArch KVM is disabled. This issue
can be found at https://lkml.org/lkml/2023/11/15/828.

Regards
Bibo Mao
> 
> Finally, I've mentioned multiple times, that varying CPUCFG behavior 
> based on PLV is not something well documented on the manuals, hence not 
> friendly to low-level developers. Devs of third-party firmware and/or 
> kernels do exist, I've personally spoken to some of them on the 
> 2023-11-18 3A6000 release event; in order for the varying CPUCFG 
> behavior approach to pass for me, at the very least, the LoongArch 
> reference manual must be amended to explicitly include an explanation of 
> it, and a reference to potential use cases.
>
Xi Ruoyao April 2, 2024, 2:49 a.m. UTC | #3
On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
> > Sorry for the late reply, but I think it may be a bit non-constructive 
> > to repeatedly submit the same code without due explanation in our 
> > previous review threads. Let me try to recollect some of the details
> > though...
> Because your review comments about hypercall method is wrong, I need not 
> adopt it.

Again it's unfair to say so considering the lack of LVZ documentation.

/* snip */

> 
> 1. T0-T7 are scratch registers during SYSCALL ABI, this is what you 
> suggest, does there exist information leaking to user space from T0-T7
> registers?

It's not a problem.  When a syscall returns, RESTORE_ALL_AND_RET is
invoked even though T0-T7 were not saved.  So a "junk" value will be read
from the leading PT_SIZE bytes of the kernel stack for this thread.

The leading PT_SIZE bytes of the kernel stack are dedicated to storing
the struct pt_regs representing the reg file of the thread in
userspace.

Thus we may only read out the userspace T0-T7 values stored when the same
thread was last interrupted or trapped, or 0 (if the thread was
never interrupted or trapped before).

And it's impossible to read data used by the kernel internally, or
data of another thread.

But indeed there is some room for improvement here.  Zeroing these
registers seems cleaner than reading out the junk values, and also faster
(move $t0, $r0 is faster than ld.d $t0, $sp, PT_R12).  Not sure if it's
worth violating Huacai's "keep things simple" aspiration though.
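
The argument above can be illustrated with a small userspace model. This
is a toy sketch of the reasoning in this mail only, not kernel code: all
of the model_* names are hypothetical, the per-thread save area stands in
for the leading PT_SIZE bytes of the kernel stack, and the only facts
assumed are that syscall entry skips T0-T7 ($r12-$r19) while the return
path reloads every register.

```c
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 32
#define T0 12 /* $t0 is $r12, matching PT_R12 above */
#define T7 19 /* $t7 is $r19 */

/* Hypothetical stand-in for struct pt_regs. */
struct model_pt_regs {
	uint64_t regs[NUM_GPRS];
};

/* One thread: its private save area plus its live user registers. */
struct model_thread {
	struct model_pt_regs saved; /* zero until the first save */
	uint64_t cpu[NUM_GPRS];
};

/* Interrupt or trap entry: every register is saved. */
static void model_trap_entry(struct model_thread *t)
{
	memcpy(t->saved.regs, t->cpu, sizeof(t->cpu));
}

/* Syscall entry: everything except T0-T7 is saved. */
static void model_syscall_entry(struct model_thread *t)
{
	for (int i = 0; i < NUM_GPRS; i++)
		if (i < T0 || i > T7)
			t->saved.regs[i] = t->cpu[i];
}

/* Stand-in for RESTORE_ALL_AND_RET: every register is reloaded. */
static void model_restore_all(struct model_thread *t)
{
	memcpy(t->cpu, t->saved.regs, sizeof(t->cpu));
}
```

In this model, a syscall on a fresh thread returns 0 in T0, and a syscall
after a trap returns whatever T0 held at that trap. Nothing outside the
thread's own save area is ever read.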
bibo mao April 2, 2024, 3:04 a.m. UTC | #4
On 2024/4/2 10:49 AM, Xi Ruoyao wrote:
> On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
>>> Sorry for the late reply, but I think it may be a bit non-constructive
>>> to repeatedly submit the same code without due explanation in our
>>> previous review threads. Let me try to recollect some of the details
>>> though...
>> Because your review comments about hypercall method is wrong, I need not
>> adopt it.
> 
> Again it's unfair to say so considering the lack of LVZ documentation.
> 
> /* snip */
> 
>>
>> 1. T0-T7 are scratch registers during SYSCALL ABI, this is what you
>> suggest, does there exist information leaking to user space from T0-T7
>> registers?
> 
> It's not a problem.  When syscall returns RESTORE_ALL_AND_RET is invoked
> despite T0-T7 are not saved.  So a "junk" value will be read from the
> leading PT_SIZE bytes of the kernel stack for this thread.
For you it is a "junk" value, but some people may think it is useful.

There is another issue: the kernel restores the T0-T7 registers and user 
space saves T0-T7. Why are T0-T7 scratch registers rather than preserved 
registers as on other architectures? What is the advantage of making them 
scratch registers?

Regards
Bibo Mao
> 
> The leading PT_SIZE bytes of the kernel stack is dedicated for storing
> the struct pt_regs representing the reg file of the thread in the
> userspace.
> 
> Thus we may only read out the userspace T0-T7 value stored when the same
> thread was interrupted or trapped last time, or 0 (if the thread was
> never interrupted or trapped before).
> 
> And it's impossible to read some data used by the kernel internally, or
> some data of another thread.
> 
> But indeed there is some improvement here.  Zeroing these registers
> seems cleaner than reading out the junk values, and also faster (move
> $t0, $r0 is faster than ld.d $t0, $sp, PT_R12).  Not sure if it's worthy
> to violate Huacai's "keep things simple" aspiration though.
>
bibo mao April 2, 2024, 3:34 a.m. UTC | #5
On 2024/4/2 10:49 AM, Xi Ruoyao wrote:
> On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
>>> Sorry for the late reply, but I think it may be a bit non-constructive
>>> to repeatedly submit the same code without due explanation in our
>>> previous review threads. Let me try to recollect some of the details
>>> though...
>> Because your review comments about hypercall method is wrong, I need not
>> adopt it.
> 
> Again it's unfair to say so considering the lack of LVZ documentation.
> 
> /* snip */
> 
>>
>> 1. T0-T7 are scratch registers during SYSCALL ABI, this is what you
>> suggest, does there exist information leaking to user space from T0-T7
>> registers?
> 
> It's not a problem.  When syscall returns RESTORE_ALL_AND_RET is invoked
> despite T0-T7 are not saved.  So a "junk" value will be read from the
> leading PT_SIZE bytes of the kernel stack for this thread.
> 
> The leading PT_SIZE bytes of the kernel stack is dedicated for storing
> the struct pt_regs representing the reg file of the thread in the
> userspace.
Not all syscalls use the leading PT_SIZE bytes of the kernel stack. It
gets complicated when syscalls are combined with interrupts and signals.

> 
> Thus we may only read out the userspace T0-T7 value stored when the same
> thread was interrupted or trapped last time, or 0 (if the thread was
> never interrupted or trapped before).
> 
> And it's impossible to read some data used by the kernel internally, or
> some data of another thread.
Are you sure that it's impossible to read some data used by the kernel 
internally?

Regards
Bibo Mao
> 
> But indeed there is some improvement here.  Zeroing these registers
> seems cleaner than reading out the junk values, and also faster (move
> $t0, $r0 is faster than ld.d $t0, $sp, PT_R12).  Not sure if it's worthy
> to violate Huacai's "keep things simple" aspiration though.
>
Xi Ruoyao April 2, 2024, 5:34 a.m. UTC | #6
On Tue, 2024-04-02 at 11:34 +0800, maobibo wrote:


> Are you sure that it's impossible to read some data used by the kernel
> internally?

Yes.

> There is another issue, since kernel restore T0-T7 registers and user
> space save T0-T7. Why T0-T7 is scratch registers rather than preserve
> registers like other architecture? What is the advantage if it is
> scratch registers?

I'd say "MIPS legacy."  Note that MIPS also does not preserve temp
registers, and MIPS does not have the "info leak" issue either (otherwise
it would have been assigned a CVE in all these years).

I do agree that maybe it's time to move away from the MIPS legacy and be
more similar to RISC-V etc. now...

In Glibc we can condition __SYSCALL_CLOBBERS with #if
__LINUX_KERNEL_VERSION > xxxxxxx to take advantage of this.

Huacai, Xuerui, what do you think?

Patch

diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
index d8f637f9e400..ad120f924905 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -67,6 +67,7 @@  enum reg2_op {
 	revhd_op	= 0x11,
 	extwh_op	= 0x16,
 	extwb_op	= 0x17,
+	cpucfg_op	= 0x1b,
 	iocsrrdb_op     = 0x19200,
 	iocsrrdh_op     = 0x19201,
 	iocsrrdw_op     = 0x19202,
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index 46366e783c84..a1d22e8b6f94 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -158,6 +158,16 @@ 
 #define  CPUCFG48_VFPU_CG		BIT(2)
 #define  CPUCFG48_RAM_CG		BIT(3)
 
+/*
+ * cpucfg index area: 0x40000000 -- 0x400000ff
+ * SW emulation for KVM hypervisor
+ */
+#define CPUCFG_KVM_BASE			0x40000000UL
+#define CPUCFG_KVM_SIZE			0x100
+#define CPUCFG_KVM_SIG			CPUCFG_KVM_BASE
+#define  KVM_SIGNATURE			"KVM\0"
+#define CPUCFG_KVM_FEATURE		(CPUCFG_KVM_BASE + 4)
+
 #ifndef __ASSEMBLY__
 
 /* CSR */
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index 923bbca9bd22..a8d3b652d3ea 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -206,10 +206,50 @@  int kvm_emu_idle(struct kvm_vcpu *vcpu)
 	return EMULATE_DONE;
 }
 
-static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
 {
 	int rd, rj;
 	unsigned int index;
+	unsigned long plv;
+
+	rd = inst.reg2_format.rd;
+	rj = inst.reg2_format.rj;
+	++vcpu->stat.cpucfg_exits;
+	index = vcpu->arch.gprs[rj];
+
+	/*
+	 * By LoongArch Reference Manual 2.2.10.5
+	 * Return value is 0 for undefined cpucfg index
+	 *
+	 * Disable preemption since hw gcsr is accessed
+	 */
+	preempt_disable();
+	plv = kvm_read_hw_gcsr(LOONGARCH_CSR_CRMD) >> CSR_CRMD_PLV_SHIFT;
+	switch (index) {
+	case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
+		vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
+		break;
+	case CPUCFG_KVM_SIG:
+		/*
+		 * Cpucfg emulation between 0x40000000 -- 0x400000ff
+		 * Return value with 0 if executed in user mode
+		 */
+		if ((plv & CSR_CRMD_PLV) == PLV_KERN)
+			vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
+		else
+			vcpu->arch.gprs[rd] = 0;
+		break;
+	default:
+		vcpu->arch.gprs[rd] = 0;
+		break;
+	}
+
+	preempt_enable();
+	return EMULATE_DONE;
+}
+
+static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+{
 	unsigned long curr_pc;
 	larch_inst inst;
 	enum emulation_result er = EMULATE_DONE;
@@ -224,21 +264,8 @@  static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
 	er = EMULATE_FAIL;
 	switch (((inst.word >> 24) & 0xff)) {
 	case 0x0: /* CPUCFG GSPR */
-		if (inst.reg2_format.opcode == 0x1B) {
-			rd = inst.reg2_format.rd;
-			rj = inst.reg2_format.rj;
-			++vcpu->stat.cpucfg_exits;
-			index = vcpu->arch.gprs[rj];
-			er = EMULATE_DONE;
-			/*
-			 * By LoongArch Reference Manual 2.2.10.5
-			 * return value is 0 for undefined cpucfg index
-			 */
-			if (index < KVM_MAX_CPUCFG_REGS)
-				vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
-			else
-				vcpu->arch.gprs[rd] = 0;
-		}
+		if (inst.reg2_format.opcode == cpucfg_op)
+			er = kvm_emu_cpucfg(vcpu, inst);
 		break;
 	case 0x4: /* CSR{RD,WR,XCHG} GSPR */
 		er = kvm_handle_csr(vcpu, inst);
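
On the guest side, the word returned for CPUCFG_KVM_SIG could be compared
against the signature roughly as follows. This is a hedged sketch, not
part of the patch: KVM_SIGNATURE is taken from the patch, while
cpucfg_word_is_kvm_signature() and kvm_signature_word() are hypothetical
helper names. It uses memcpy()/memcmp() rather than the pointer cast in
kvm_emu_cpucfg() so that it stays valid on any host; on a little-endian
machine such as LoongArch the word works out to 0x004d564b.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Taken from the patch: 'K', 'V', 'M', then a NUL byte. */
#define KVM_SIGNATURE "KVM\0"

/* Hypothetical helper: the 32-bit word the hypervisor would return. */
static uint32_t kvm_signature_word(void)
{
	uint32_t word;

	memcpy(&word, KVM_SIGNATURE, sizeof(word));
	return word;
}

/* Hypothetical helper: does a cpucfg result carry the KVM signature? */
static bool cpucfg_word_is_kvm_signature(uint32_t word)
{
	return memcmp(&word, KVM_SIGNATURE, sizeof(word)) == 0;
}
```

A guest would pass the value it read at index CPUCFG_KVM_SIG to
cpucfg_word_is_kvm_signature(); a result of 0, which is what the patch
returns to user mode and for undefined indexes, does not match.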