[V5] drm/i915: Disable stolen memory when i915 runs on qemu

Message ID	1491999600-4406-1-git-send-email-xiong.y.zhang@intel.com (mailing list archive)
State	New, archived
Headers:
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
From: Xiong Zhang <xiong.y.zhang@intel.com>
To: joonas.lahtinen@linux.intel.com, daniel@ffwll.ch, zhenyuw@linux.intel.com, jani.nikula@linux.intel.com
Date: Wed, 12 Apr 2017 20:20:00 +0800
Message-Id: <1491999600-4406-1-git-send-email-xiong.y.zhang@intel.com>
In-Reply-To: <1491358106-26329-1-git-send-email-xiong.y.zhang@intel.com>
References: <1491358106-26329-1-git-send-email-xiong.y.zhang@intel.com>
Cc: intel-gfx@lists.freedesktop.org, intel-gvt-dev@lists.freedesktop.org, stable@vger.kernel.org
Subject: [Intel-gfx] [PATCH V5] drm/i915: Disable stolen memory when i915 runs on qemu
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Zhang, Xiong Y April 12, 2017, 12:20 p.m. UTC

Stolen memory isn't a standard PCI resource; it lives in an RMRR, which is
identity-mapped in the IOMMU table, so the IGD can access stolen memory in
the host OS. However, according to commit c875d2c1b808 ("iommu/vt-d: Exclude
devices using RMRRs from IOMMU API domains"), RMRRs aren't supported by KVM,
so both the EPT and the guest IOMMU domain table lack a mapping for stolen
memory in a KVM IGD passthrough environment. If the IGD accesses stolen
memory in such an environment, many IOMMU faults show up in the host dmesg
and the GPU hangs as well:
DMAR: [DMA Read] Request device [00:02.0] fault addr da012000
[fault reason 05] PTE Write access is not set
DMAR: [DMA Read] Request device [00:02.0] fault addr da2df000
[fault reason 06] PTE Read access is not set

So stolen memory should be disabled in a KVM IGD passthrough environment.
This patch detects such an environment through the presence of the
QEMU-emulated ISA bridge.

When the real ISA bridge is also passed through to the guest, the guest has
two ISA bridges: emulated and real. QEMU guarantees that the
busnum:devnum.funcnum of the emulated ISA bridge is always lower than that
of the real one, so the emulated ISA bridge is always detected first by
pci_get_class(ISA), and stolen memory is disabled in this case as well.

Stolen memory has existed in the kernel for a long time, but this patch
depends on INTEL_PCH_QEMU_DEVICE_ID_TYPE, which was introduced in the v4.5
kernel, so this patch should be backported to v4.5 and later kernels.

v2:GVT-g may run in non qemu (Zhenyu)
v3:Make commit message clear (Daniel)
v4:Fix typo
v5:Exclude P2X as it is used for VMware (Joonas)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99028

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_drv.c        | 5 +++++
 drivers/gpu/drm/i915/i915_drv.h        | 1 +
 drivers/gpu/drm/i915/i915_gem_stolen.c | 4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

Joonas Lahtinen April 12, 2017, 1:21 p.m. UTC | #1

+ Kevin and David

On ke, 2017-04-12 at 20:20 +0800, Xiong Zhang wrote:
> [...]

The commit message still fails to address the fact that the Bugzilla
entry has a completely bogus bisect, the fact that there is a later
commit that allows RMRRs on graphics devices;

commit 18436afdc11a00ac881990b454cfb2eae81d6003
Author: David Woodhouse <David.Woodhouse@intel.com>
Date:   Wed Mar 25 15:05:47 2015 +0000

    iommu/vt-d: Allow RMRR on graphics devices too

And the fact that the GuC status question is still not answered, even
though I explicitly asked about it.

By my limited understanding of VT-d details: The stolen memory is never
directly accessed by the i915 driver (because CPU access doesn't work even
in DOM0). It is only used through the aperture, which just requires the
GT device to have access to the RMRR. Further, the GT device needs
to have access to stolen memory, because that's what GuC uses as
backing storage for WOPCM.

And even after all of the above is addressed, shouldn't we rather
try to detect the lack of RMRR than the presence of the QEMU ISA bridge?

What comes to my mind is exporting a function like device_has_rmrr() from
intel-iommu.c and consuming that, if we end up doing this. That way,
if somebody, some day, goes and writes the RMRR pass-through code that is
currently missing, it'll start working, just like it should.

Regards, Joonas
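
To make the suggestion above concrete, here is a minimal userspace sketch of an "is this device covered by an RMRR?" lookup. It is only a model: `struct rmrr_unit`, `BDF()`, and `model_device_has_rmrr()` are hypothetical names invented for illustration, the address range in the usage note is made up, and nothing here is the actual interface of the kernel's IOMMU code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one RMRR unit: a reserved address range plus the
 * PCI functions it is scoped to. Not a kernel structure.
 */
struct rmrr_unit {
	uint64_t base;
	uint64_t end;
	const uint16_t *devices;	/* encoded with BDF() below */
	size_t devices_cnt;
};

/* Pack bus:dev.fn into 16 bits (bus in the high byte). */
#define BDF(bus, dev, fn) ((uint16_t)(((bus) << 8) | ((dev) << 3) | (fn)))

/* Return true if any RMRR unit lists the given device in its scope. */
static bool model_device_has_rmrr(const struct rmrr_unit *units, size_t n,
				  uint16_t bdf)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < units[i].devices_cnt; j++)
			if (units[i].devices[j] == bdf)
				return true;
	return false;
}
```

A driver following this idea would enable stolen memory only when the lookup returns true for the IGD at 00:02.0. For example, a table with one unit scoped to `BDF(0, 2, 0)` yields true for the IGD, while an empty table (no RMRR reported to the guest) yields false.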

Alex Williamson April 12, 2017, 6:01 p.m. UTC | #2

On Wed, 12 Apr 2017 20:20:00 +0800
Xiong Zhang <xiong.y.zhang@intel.com> wrote:

> [...]
>
> When the real ISA bridge is also passed through to guest, guest will have
> two isa bridges: emulated and real. Qemu guarantees the busnum:devnum.
> funcnum of emulated isa bridge is always less than the real one. Then
> emulated isa bridge is always detected first by pci_get_class(ISA). So
> stolen memory will be disabled in this case also.

Where does QEMU make this guarantee or any sort of guarantee wrt the
ISA bridge?  Thanks,

Alex
 
> [...]
> ---
>  drivers/gpu/drm/i915/i915_drv.c        | 5 +++++
>  drivers/gpu/drm/i915/i915_drv.h        | 1 +
>  drivers/gpu/drm/i915/i915_gem_stolen.c | 4 ++--
>  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 6d9944a..0d3c395 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -223,6 +223,11 @@ static void intel_detect_pch(struct drm_i915_private *dev_priv)
>  					    PCI_SUBVENDOR_ID_REDHAT_QUMRANET &&
>  				    pch->subsystem_device ==
>  					    PCI_SUBDEVICE_ID_QEMU)) {
> +				/*
> +				 * P2X is used for VMware, exclude it
> +				 */
> +				if (id != INTEL_PCH_P2X_DEVICE_ID_TYPE)
> +					dev_priv->run_on_qemu = true;
>  				dev_priv->pch_type =
>  					intel_virt_detect_pch(dev_priv);
>  			} else
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2911c49..c87150e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2152,6 +2152,7 @@ struct drm_i915_private {
>  	struct intel_uncore uncore;
>  
>  	struct i915_virtual_gpu vgpu;
> +	bool run_on_qemu;
>  
>  	struct intel_gvt *gvt;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index f3abdc2..6a011b0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -409,8 +409,8 @@ int i915_gem_init_stolen(struct drm_i915_private *dev_priv)
>  
>  	mutex_init(&dev_priv->mm.stolen_lock);
>  
> -	if (intel_vgpu_active(dev_priv)) {
> -		DRM_INFO("iGVT-g active, disabling use of stolen memory\n");
> +	if (dev_priv->run_on_qemu || intel_vgpu_active(dev_priv)) {
> +		DRM_INFO("Running in guest, disabling use of stolen memory\n");
>  		return 0;
>  	}
>
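
The combined effect of the three hunks can be modeled outside the kernel as a small sketch. The function names `detect_run_on_qemu()` and `stolen_memory_enabled()` are hypothetical, the surrounding branch condition of intel_detect_pch() is reduced to a boolean input because only part of it is visible in the diff, and the numeric ID values are recalled from the i915 headers of that era, so treat them as assumptions to verify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; check them against i915_drv.h before relying on them. */
#define INTEL_PCH_P2X_DEVICE_ID_TYPE	0x7100
#define INTEL_PCH_QEMU_DEVICE_ID_TYPE	0x2900

/*
 * Models the i915_drv.c hunk: inside the branch that handles virtual
 * bridges, every ID type except P2X marks the driver as running on QEMU.
 */
static bool detect_run_on_qemu(bool virt_branch_taken, uint16_t id)
{
	return virt_branch_taken && id != INTEL_PCH_P2X_DEVICE_ID_TYPE;
}

/*
 * Models the i915_gem_stolen.c hunk: stolen memory setup is skipped when
 * either the QEMU flag or an active vGPU is detected.
 */
static bool stolen_memory_enabled(bool run_on_qemu, bool vgpu_active)
{
	return !(run_on_qemu || vgpu_active);
}
```

With these helpers, a bridge of the QEMU ID type disables stolen memory, while a P2X bridge (the VMware case from the v5 changelog) leaves it enabled unless a vGPU is active.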

Zhang, Xiong Y April 13, 2017, 4:15 a.m. UTC | #3

> + Kevin and David
>
> On ke, 2017-04-12 at 20:20 +0800, Xiong Zhang wrote:
> > [...]
>
> The commit message still fails to address the fact that the Bugzilla
> entry has a completely bogus bisect, the fact that there is a later
> commit that allows RMRRs on graphics devices;

[Zhang, Xiong Y] Indeed, when I boot kernel 3.18, the GPU hang doesn't
happen during the boot process, but IOMMU DMA read/write faults on stolen
memory still show up in the host dmesg.
When I boot kernel 3.19 and above, I see both the DMA read/write faults
in the host dmesg and the GPU hang. I lack the knowledge to analyze the
GPU hang error. I have added the error to Bugzilla; could you
help check whether this hang is caused by the GT accessing stolen memory
or not?

https://bugs.freedesktop.org/show_bug.cgi?id=99028
and
https://bugs.freedesktop.org/show_bug.cgi?id=99025
are the same issue, which can be fixed by disabling stolen memory.
But the bisect results differ, so I think the first bad commit reported by
git bisect isn't accurate.

>
> commit 18436afdc11a00ac881990b454cfb2eae81d6003
> Author: David Woodhouse <David.Woodhouse@intel.com>
> Date:   Wed Mar 25 15:05:47 2015 +0000
>
>     iommu/vt-d: Allow RMRR on graphics devices too
>

[Zhang, Xiong Y] Commit c875d2c1b808 ("iommu/vt-d: Exclude devices
using RMRRs from IOMMU API domains") prevents devices associated with an
RMRR from being passed through to a guest.
Commit 18436afdc11a ("iommu/vt-d: Allow RMRR on graphics devices too")
adds an exception to that commit for graphics devices, so that the
IGD can be assigned (passed through) to a guest.

Hi, David:
   The following text appears in the commit message of your commit 18436afdc11a:
    "Add an exclusion for graphics devices too, so that 'iommu=pt' works
    there. We should be able to successfully assign graphics devices to
    guests too, as long as the initial handling of stolen memory is
    reconfigured appropriately. This has certainly worked in the past."
What does "initial handling of stolen memory is reconfigured
appropriately" mean? We are hitting an issue with the guest IGD accessing
stolen memory.
 
> And the fact that GuC status is still not answered even I explicitly
> asked for it.

[Zhang, Xiong Y] GuC access to stolen memory bypasses VT-d; Kevin has
confirmed this with VPG.

I added the i915.enable_guc_loading=1 and i915.enable_guc_submission=1 options, and the dmesg shows that GuC works when stolen memory is disabled:
[    5.265653] [drm:intel_uc_prepare_fw [i915]] fetch uC fw from i915/skl_guc_ver6_1.bin succeeded, fw ffff9167bc93c1e0
[    5.265668] [drm:intel_uc_prepare_fw [i915]] firmware version 6.1 OK (minimum 6.1)
[    5.265709] [drm:intel_uc_prepare_fw [i915]] uC fw fetch status SUCCESS, obj ffff91675ce88600
[    5.267493] [drm:intel_guc_init_hw [i915]] GuC fw status: path i915/skl_guc_ver6_1.bin, fetch SUCCESS, load NONE
[    5.267508] [drm:intel_guc_init_hw [i915]] GuC fw status: fetch SUCCESS, load PENDING
[    5.271571] [drm:guc_ucode_xfer_dma [i915]] DMA status 0x10, GuC status 0x8002f0ec
[    5.271586] [drm:guc_ucode_xfer_dma [i915]] returning 0
[    5.271587] [drm] GuC submission enabled (firmware i915/skl_guc_ver6_1.bin [version 6.1])
[    5.272232] [drm:i915_guc_submission_enable [i915]] reserved cacheline 0x0, next 0x40, linesize 64
[    5.272248] [drm:i915_guc_submission_enable [i915]] Host engines 0x17 => GuC engines used 0xf
[    5.272263] [drm:__reserve_doorbell [i915]] client 0 (high prio=no) reserved doorbell: 0
[    5.274352] [drm:i915_guc_submission_enable [i915]] new priority 2 client ffff91675cc56d80 for engine(s) 0x17: stage_id 0
[    5.274389] [drm:i915_guc_submission_enable [i915]] doorbell id 0, cacheline offset 0x0

> By my limited understanding of VT-d details: The stolen memory is never
> directly accessed by i915 driver (because CPU access doesn't work even
> in DOM0). It is only used through the aperture, which just requires for
> the GT device to have access to the RMRR. Further, the GT device needs
> to have access to stolen memory, because that's what GuC uses for
> backing storage for for WOPCM.
>
> And even if after all of the above is addressed, shouldn't we rather
> try to detect the lack of RMRR, than presence of QEMU ISA?

[Zhang, Xiong Y] Good idea. A device knows that it needs an RMRR, but on a
native machine the RMRR needs BIOS support, which allocates and reserves the
memory range for the RMRR. So in an emulated environment the RMRR needs the
hypervisor's help, and only the hypervisor knows whether it supports RMRR or
not. In order to detect the lack of RMRR in a guest, the i915 driver needs
to detect the hypervisor. So in my last mail I tried to use
cpuid(0x40000001) to detect the hypervisor. Zhenyu thinks it is unacceptable
to use cpuid(0x40000001) in a UPT (universal pass-through) driver.
>
> What comes to my mind is exporting function like device_has_rmrr() from
> intel-iommu.com and consuming that, if we end up doing this. That way,
> if somebody, some day, goes and write RMRR pass-through code currently
> missing, it'll start working, just like it should.

[Zhang, Xiong Y] I also wanted to implement the RMRR pass-through code at
first, but this solution was rejected in my team's meeting, because the
KVM/QEMU community discussed this before and came to a conclusion:
https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf
>
> Regards, Joonas
> --
> Joonas Lahtinen
> Open Source Technology Center
> Intel Corporation

Zhang, Xiong Y April 13, 2017, 5:44 a.m. UTC | #4

> On Wed, 12 Apr 2017 20:20:00 +0800
> Xiong Zhang <xiong.y.zhang@intel.com> wrote:
> 
> > [...]
> >
> > When the real ISA bridge is also passed through to guest, guest will have
> > two isa bridges: emulated and real. Qemu guarantees the busnum:devnum.
> > funcnum of emulated isa bridge is always less than the real one. Then
> > emulated isa bridge is always detected first by pci_get_class(ISA). So
> > stolen memory will be disabled in this case also.
> 
> Where does QEMU make this guarantee or any sort of guarantee wrt the
> ISA bridge?  Thanks,
> 
> Alex
> 
[Zhang, Xiong Y] In my guest environment I always see the emulated devices
at the head of the PCI device list and the passed-through devices at the
tail. Even if I want to assign the passed-through IGD to 00:02.0, QEMU tells
me that 00:02.0 is already occupied by the emulated graphics card.
If I pass the real ISA bridge through to the guest, the emulated ISA bridge
is at 00:01.0, while the real ISA bridge is at 00:04.0.
Then I checked the code: emulated devices are created in the pc_init1()
function, which creates the host bridge first, the ISA bridge second, and
all other devices after that.
So I thought QEMU could guarantee this. Now I doubt it and need your guidance.

thanks

Tian, Kevin April 13, 2017, 7:23 a.m. UTC | #5

> From: Joonas Lahtinen [mailto:joonas.lahtinen@linux.intel.com]
> Sent: Wednesday, April 12, 2017 9:22 PM
>
[...]
> By my limited understanding of VT-d details: The stolen memory is never
> directly accessed by i915 driver (because CPU access doesn't work even
> in DOM0). It is only used through the aperture, which just requires for
> the GT device to have access to the RMRR. Further, the GT device needs
> to have access to stolen memory, because that's what GuC uses for
> backing storage for for WOPCM.
>
> And even if after all of the above is addressed, shouldn't we rather
> try to detect the lack of RMRR, than presence of QEMU ISA?
>
> What comes to my mind is exporting function like device_has_rmrr() from
> intel-iommu.com and consuming that, if we end up doing this. That way,
> if somebody, some day, goes and write RMRR pass-through code currently
> missing, it'll start working, just like it should.

I like what you proposed in the long run, e.g. in a nested virtualization
environment where the L0-VMM assigns the device to an L1-VMM, which in turn
wants to assign the device to an L2-VM. In such a case the RMRR information
must be propagated along the path to the L1-VMM.

However, I can see one limitation in your proposal. There is no
RMRR if VT-d is disabled in the BIOS, so you could not use stolen memory
even on bare metal in such a configuration, which is probably not desired.

Also, the long-term direction is to move away from RMRR for Intel
integrated devices. People have realized its limitations (especially given
the objection from the KVM community; I don't think RMRR passthrough
would be an option there). So I'd go with Xiong's simple workaround
here. :-)

Thanks
Kevin

Alex Williamson April 13, 2017, 2:53 p.m. UTC | #6

On Thu, 13 Apr 2017 05:44:18 +0000
"Zhang, Xiong Y" <xiong.y.zhang@intel.com> wrote:

> > On Wed, 12 Apr 2017 20:20:00 +0800
> > Xiong Zhang <xiong.y.zhang@intel.com> wrote:
> >   
> > > [...]
> > >
> > > When the real ISA bridge is also passed through to guest, guest will have
> > > two isa bridges: emulated and real. Qemu guarantees the busnum:devnum.
> > > funcnum of emulated isa bridge is always less than the real one. Then
> > > emulated isa bridge is always detected first by pci_get_class(ISA). So
> > > stolen memory will be disabled in this case also.  
> > 
> > Where does QEMU make this guarantee or any sort of guarantee wrt the
> > ISA bridge?  Thanks,
> > 
> > Alex
> >   
> [Zhang, Xiong Y] In my guest environment I always see emulated devices 
> are at head of pci device list, the passed through devices are at tail. Even if
> I want to assign the passed IGD to 00:02.0, the qemu tell me 00:02.0 has already
> occupied by emulated graphic card.
> If I pass through real ISA bridge to guest, the emulated ISA bridge is at 00:01.0,
> While real ISA bridge is at 00:04.0.
> Then I checked the code: emulated devices are created in pc_init1() function, it
> creates host_bridge firstly, create isa_bridge secondly, create all other devices following.
> So I think Qemu could guarantee. Now I'm suspect it, and need your coach. 

So you're calling the current default behavior a guarantee.  That's not
valid, it ignores that we might have future chipsets that do things
differently and it ignores that the user can override some of those
defaults and specify addresses for devices that may not match your
expectations.  There is no agreement with the QEMU community to make
this a stable feature of the VM.  What about using smbios information
or detecting kvm (or any hypervisor) via the hypervisor MSRs?  You could
maybe use the VM PCI host bridge and figure out that this version of
IGD never shipped on 440fx or q35, but then you'll have the maintenance
headache of updating the code for any new chipset QEMU decides to
implement. Thanks,

Alex
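
The point about user-overridable addresses can be illustrated with a small userspace model. It assumes, as the commit message does, that the first ISA bridge found is the one with the lowest bus:dev.fn; `struct model_pci_dev`, its `emulated` flag, and `first_of_class()` are hypothetical names for this sketch only, and the slot numbers in the usage note are examples.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CLASS_BRIDGE_ISA	0x0601	/* base class 0x06, subclass 0x01 */

/* Hypothetical model of a guest-visible PCI function. */
struct model_pci_dev {
	uint8_t bus, dev, fn;
	uint16_t class_code;
	bool emulated;	/* model-only flag: true for a QEMU-emulated device */
};

static unsigned int model_addr(const struct model_pci_dev *d)
{
	return ((unsigned int)d->bus << 8) | ((unsigned int)d->dev << 3) | d->fn;
}

/* Return the device of the given class with the lowest address, or NULL. */
static const struct model_pci_dev *
first_of_class(const struct model_pci_dev *devs, size_t n, uint16_t class_code)
{
	const struct model_pci_dev *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].class_code != class_code)
			continue;
		if (!best || model_addr(&devs[i]) < model_addr(best))
			best = &devs[i];
	}
	return best;
}
```

With the default layout described earlier in the thread (emulated bridge at 00:01.0, real bridge at 00:04.0) the emulated bridge wins. If a user instead places the passed-through bridge at 00:01.0 and the emulated one at a higher slot such as 00:1f.0, the same lookup returns the real bridge, and a check keyed on "first ISA bridge is emulated" no longer fires.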

Zhang, Xiong Y April 14, 2017, 6:39 a.m. UTC | #7

> On Thu, 13 Apr 2017 05:44:18 +0000
> "Zhang, Xiong Y" <xiong.y.zhang@intel.com> wrote:
> 
> > [...]
> 
> So you're calling the current default behavior a guarantee.  That's not
> valid, it ignores that we might have future chipsets that do things
> differently and it ignores that the user can override some of those
> defaults and specify addresses for devices that may not match your
> expectations.  There is no agreement with the QEMU community to make
> this a stable feature of the VM.  What about using smbios information
> or detecting kvm (or any hypervisor) via the hypervisor MSRs?  You could
> maybe use the VM PCI host bridge and figure out that this version of
> IGD never shipped on 440fx or q35, but then you'll have the maintenance
> headache of updating the code for any new chipset QEMU decides to
> implement. Thanks,
> 
[Zhang, Xiong Y] Thanks for your guidance and suggestions.
For SMBIOS, could you tell me which type and field could be used?
For the hypervisor MSRs, according to https://www.kernel.org/doc/Documentation/virtual/kvm/msr.txt,
we should use cpuid(0x40000001) first and then rdmsr(); in this case we could use
cpuid(0x40000000) directly to detect KVM. But I don't know whether the community would accept
it.

> Alex
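
As a rough illustration of the cpuid(0x40000000) approach mentioned above, the sketch below checks the 12-byte signature that KVM reports in EBX, ECX and EDX for that leaf ("KVMKVMKVM" padded with NULs). The helper names are hypothetical, the CPUID wrapper assumes a GCC or Clang toolchain on x86 with `<cpuid.h>`, and this is userspace code, not what the i915 driver would do.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HYPERVISOR_CPUID_BASE	0x40000000u	/* hypervisor signature leaf */

/* Unpack one register into four signature bytes, lowest byte first. */
static void unpack_reg(uint32_t reg, char *out)
{
	for (int i = 0; i < 4; i++)
		out[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* True if EBX:ECX:EDX spell out the KVM signature. */
static bool is_kvm_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[12];

	unpack_reg(ebx, sig + 0);
	unpack_reg(ecx, sig + 4);
	unpack_reg(edx, sig + 8);
	return memcmp(sig, "KVMKVMKVM\0\0\0", 12) == 0;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Check the "hypervisor present" bit (CPUID.1:ECX[31]) before the leaf. */
static inline bool running_on_kvm(void)
{
	unsigned int eax, ebx, ecx, edx;

	__cpuid(1, eax, ebx, ecx, edx);
	if (!(ecx & (1u << 31)))
		return false;
	__cpuid(HYPERVISOR_CPUID_BASE, eax, ebx, ecx, edx);
	return is_kvm_signature(ebx, ecx, edx);
}
#endif
```

Keeping the signature comparison separate from the CPUID instruction makes the logic checkable on any machine. Note that this only identifies KVM; it says nothing about whether a particular device was passed through or whether an RMRR mapping exists.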

Gerd Hoffmann April 18, 2017, 11:26 a.m. UTC | #8

Hi,

> [Zhang, Xiong Y] Thanks for your teach and propose.
> For smbios, could you teach me which type and field could be used ?

qemu adds a specific subsystem id to all virtual devices, so you can use
that to figure out that you are running on qemu.  One good candidate to check is
the host bridge (easy to find due to its fixed pci address), another one is
the isa bridge aka lpc (igd already searches for that one for other
reasons).  In fact there already is a check for qemu in
intel_detect_pch() ...

cheers,
  Gerd
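
The subsystem-ID check described above can be sketched as follows. The two constants match the names used in the quoted patch context, with values as I recall them from the kernel's PCI ID header (treat them as assumptions to verify); `struct model_pci_ids` and `is_qemu_virtual_device()` are hypothetical names for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against include/linux/pci_ids.h. */
#define PCI_SUBVENDOR_ID_REDHAT_QUMRANET	0x1af4
#define PCI_SUBDEVICE_ID_QEMU			0x1100

/* Hypothetical stand-in for the subsystem fields of a PCI device. */
struct model_pci_ids {
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
};

/* True if the device carries the subsystem IDs QEMU gives its virtual devices. */
static bool is_qemu_virtual_device(const struct model_pci_ids *ids)
{
	return ids->subsystem_vendor == PCI_SUBVENDOR_ID_REDHAT_QUMRANET &&
	       ids->subsystem_device == PCI_SUBDEVICE_ID_QEMU;
}
```

In a Linux guest the two values can be read from the `subsystem_vendor` and `subsystem_device` sysfs attributes of the host bridge or ISA bridge, which makes it easy to confirm what a given QEMU configuration actually exposes before relying on the check.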
