| Message ID | 1440056724-26976-5-git-send-email-zhiyuan.lv@intel.com (mailing list archive) |
|---|---|
| State | New, archived |
On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> Intel GVT-g will perform EXECLIST context shadowing and ring buffer
> shadowing. The shadow copy is created when the guest creates a context.
> If a context changes its LRCA address, it is hard for the hypervisor
> to know whether it is a new context or not. We always pin context
> objects to the global GTT to make life easier.

Nak. Please explain why we need to work around a bug in the host. We
cannot pin the context as that breaks userspace (e.g. synmark), which
can and will try to use more contexts than we have room for.
-Chris
Hi Chris,

On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> [commit message snipped]
>
> Nak. Please explain why we need to work around a bug in the host. We
> cannot pin the context as that breaks userspace (e.g. synmark), which
> can and will try to use more contexts than we have room for.

This change is only for i915 running inside a VM. The host i915
driver's behavior will not be affected.

The reason we want to pin the context is that our hypervisor identifies
guest contexts by their LRCA, and needs the LRCA to stay unchanged over
a context's life cycle so that the shadow copy of a context can easily
find its counterpart. If we cannot pin them, we have to consider a more
complicated shadow implementation ...

BTW Chris, are there many apps like synmark that may use up the GGTT
with contexts if they are all pinned? Thanks!

Regards,
-Zhiyuan
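To make the LRCA-as-identifier point concrete, here is a minimal C
sketch of how a hypervisor could key shadow contexts by guest LRCA.
Everything in it (struct shadow_ctx, the table, find_shadow_ctx) is a
hypothetical illustration, not actual XenGT code:

#include <linux/hashtable.h>
#include <linux/types.h>

/*
 * Hypothetical host-side table of shadow contexts, keyed by the guest
 * context's LRCA (its GGTT address).  If the guest repins a context at
 * a different GGTT offset, the LRCA changes and the lookup below can
 * no longer tell a moved context from a brand-new one.
 */
struct shadow_ctx {
	u32 guest_lrca;			/* guest context GGTT address */
	void *image;			/* host-private shadow copy */
	struct hlist_node node;
};

static DEFINE_HASHTABLE(shadow_ctx_table, 8);

static struct shadow_ctx *find_shadow_ctx(u32 guest_lrca)
{
	struct shadow_ctx *s;

	hash_for_each_possible(shadow_ctx_table, s, node, guest_lrca)
		if (s->guest_lrca == guest_lrca)
			return s;

	return NULL;	/* unknown LRCA: new context, or one that moved? */
}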
Hi Chris,

If we cannot always pin the LR context into the GGTT, the LRCA cannot
be used as a context identifier for us, and we have to consider a
proper interface for i915 in the VM to notify the GVT-g device model.

Context creation might be OK: we can pin the context first, notify the
context creation, and unpin the context after that. In GVT-g we can get
the context's guest physical addresses from the GTT table and use them
to identify a guest context. But destroy can be a problem: it does not
make sense to pin the context and then unpin it just for that
notification, right?

Do you have any suggestions/comments? Thanks in advance!

Regards,
-Zhiyuan

On Thu, Aug 20, 2015 at 05:16:54PM +0800, Zhiyuan Lv wrote:
> [previous mail quoted in full -- snipped]
Hi Chris,

On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> Nak. Please explain why we need to work around a bug in the host. We
> cannot pin the context as that breaks userspace (e.g. synmark), which
> can and will try to use more contexts than we have room for.

Could you have a look at the reasons below and kindly give us your
input?

1. Due to the GGTT partitioning, the global graphics memory available
   inside a virtual machine is much smaller than in the native case. We
   cannot support some graphics-memory-intensive workloads anyway, so
   it looks affordable to just pin contexts, which do not take much
   GGTT.

2. Our hypervisor needs to modify the i915 guest context in the shadow
   context implementation. That part will be tricky if the context is
   not always pinned. One scenario is that when a context finishes
   running, we need to copy the shadow context, which has been updated
   by hardware, back to the guest context. The hypervisor learns that
   the context has finished via the context interrupt, but by that time
   the shrinker may have unpinned the context and its backing storage
   may have been swapped out, so the copy may fail.

Looking forward to your comments. Thanks!

Regards,
-Zhiyuan
On Mon, Aug 24, 2015 at 06:04:28PM +0800, Zhiyuan Lv wrote:
> 1. Due to the GGTT partitioning, the global graphics memory available
>    inside a virtual machine is much smaller than in the native case.
>    We cannot support some graphics-memory-intensive workloads anyway,
>    so it looks affordable to just pin contexts, which do not take much
>    GGTT.

Wrong. It exposes the guest to a trivial denial-of-service attack. A
smaller GGTT does not actually limit clients (there is greater aperture
pressure and some paths are less likely, but an individual client will
function just fine).

> 2. Our hypervisor needs to modify the i915 guest context in the shadow
>    context implementation. [...] by that time the shrinker may have
>    unpinned the context and its backing storage may have been swapped
>    out, so the copy may fail.

That is just a bug in your code: firstly allowing swapout on an object
you are still using, secondly not being able to swap it back in.
-Chris
Attaching the big picture to help the discussion:

[ASCII diagram snipped: Windows and Linux guests (i915 in guest mode)
manage the lifecycle of their execlist contexts
(CREATE/DESTROY/PIN/UNPIN) and notify XenGT running in DOM0 Linux
(i915 in host mode w/ XenGT), which maintains a shadow context for each
guest execlist context; everything runs on top of the hypervisor.]

SHADOW CONTEXT SUBMISSION

As the picture shows, each guest execlist context has a related shadow
context in the host, and XenGT is responsible for:

a. updating the SHADOW context from the GUEST context when the guest
   wants to submit a workload;
b. submitting the shadow context to hardware;
c. updating the guest context from the shadow context when the shadow
   context has finished;
d. injecting a virtual context-switch interrupt into the guest. The
   guest then sees "hey, my job was retired from hardware, so I can do
   something with my context."

NOTIFICATION BETWEEN HOST AND GUEST

In our design we have built a notification channel for the guest to
notify XenGT, for performance reasons. With this channel the guest can
say: "Hey XenGT, I created a new context, please shadow it! Oh, I have
destroyed the context, please stop tracking it." But when this trick
meets guest pin/unpin, it has problems.

PROBLEMS

First, guest context pin/unpin changes the LRCA, which breaks the
relationship between the guest context and the shadow context. In our
design XenGT needs a way to find the related shadow context from
something in the guest context. For now we find it via the guest
context ID (the LRCA, in fact), but whatever it is, there should be
something unique.

Second, XenGT has no knowledge of guest context pin/unpin. The guest
may swap out a context once it sees that the seqno has passed. When
XenGT wants to access the guest context, it may no longer be there
(already swapped out). It is hard and complicated for XenGT to track
guest contexts without a unique context ID and a stable execlist
context backing store.

For now these tricks apply only in a virtualization environment and do
not affect native i915.

Welcome to discussions! :)

Thanks,
Zhi.
-----Original Message-----
From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
Sent: Monday, August 24, 2015 6:23 PM
Subject: Re: About the iGVT-g's requirement to pin guest contexts in VM

[-- Chris Wilson's mail of Mon, Aug 24, 2015 quoted in full above -- snipped --]
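For reference, the guest-side end of the notification channel described
above could look roughly like the sketch below. The function name
follows the i915_{execlist,ppgtt}_notify_vgt() naming that comes up
later in the thread; the trapped-register offsets and the event
encoding are invented for illustration:

/*
 * Hypothetical guest-side notification.  intel_vgpu_active() is the
 * real i915 check for running under GVT-g; the VGT_NOTIFY_* offsets
 * are made up -- the real channel would be whatever MMIO range the
 * hypervisor traps.
 */
#define VGT_NOTIFY_LRCA  0x78000	/* hypothetical trapped register */
#define VGT_NOTIFY_EVENT 0x78004	/* hypothetical trapped register */

enum vgt_ctx_event {
	VGT_CTX_CREATE,
	VGT_CTX_DESTROY,
};

static void i915_execlist_notify_vgt(struct drm_device *dev,
				     enum vgt_ctx_event event, u32 lrca)
{
	struct drm_i915_private *dev_priv = dev->dev_private;

	if (!intel_vgpu_active(dev))
		return;

	I915_WRITE(VGT_NOTIFY_LRCA, lrca);	/* trapped by hypervisor */
	I915_WRITE(VGT_NOTIFY_EVENT, event);	/* triggers the shadow op */
}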
Hi Chris,

On Mon, Aug 24, 2015 at 11:23:13AM +0100, Chris Wilson wrote:
> Wrong. It exposes the guest to a trivial denial-of-service attack.

Inside a VM, indeed.

> A smaller GGTT does not actually limit clients (there is greater
> aperture pressure and some paths are less likely, but an individual
> client will function just fine).
>
> That is just a bug in your code: firstly allowing swapout on an object
> you are still using, secondly not being able to swap it back in.

As Zhi replied in another email, we do not have knowledge of the guest
driver's swap operations. If we cannot pin the context, we may have to
ask the guest driver not to swap out context pages. Do you think that
would be the right way to go? Thanks!

Regards,
-Zhiyuan
On Tue, Aug 25, 2015 at 08:17:05AM +0800, Zhiyuan Lv wrote:
> As Zhi replied in another email, we do not have knowledge of the guest
> driver's swap operations. If we cannot pin the context, we may have to
> ask the guest driver not to swap out context pages. Do you think that
> would be the right way to go? Thanks!

It doesn't matter at all - if the guest unpins the ctx and puts
something else in there before the host tells it that the ctx is
completed, that's a bug in the guest. Same with real hw: we guarantee
that the context stays around for long enough.

Also you obviously have to complete the copying from shadow->guest ctx
before you send the irq to the guest to signal ctx completion. Which
means there's really no overall problem here from a design pov; the
only thing you have to do is fix up bugs in the host code (probably you
should just write through the ggtt).
-Daniel
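Daniel's ordering requirement can be condensed into a short host-side
sketch: the shadow-to-guest copy and the virtual CSB update must both
complete before the interrupt is injected. All helper names here are
hypothetical placeholders, not real XenGT functions:

/* Hypothetical host-side handling of a shadow context completion. */
struct shadow_ctx;

void copy_shadow_to_guest_ctx(struct shadow_ctx *s);
void emulate_guest_csb(struct shadow_ctx *s);
void inject_virtual_ctx_switch_irq(struct shadow_ctx *s);

static void shadow_ctx_schedule_out(struct shadow_ctx *s)
{
	/* 1. Copy the hardware-updated shadow image back into the
	 *    guest's context object (e.g. through the GGTT).
	 */
	copy_shadow_to_guest_ctx(s);

	/* 2. Write a COMPLETE entry into the guest's virtual context
	 *    status buffer (CSB).
	 */
	emulate_guest_csb(s);

	/* 3. Only now inject the virtual context-switch interrupt;
	 *    the guest is free to unpin/evict its context after this.
	 */
	inject_virtual_ctx_switch_irq(s);
}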
Attaching some introduction to help the discussion:

[ Current Implementation - Performance Oriented! ]

[ASCII timeline snipped: the guest notifies XenGT when it creates a
context; XenGT creates the shadow context and starts tracking the guest
context from that point; when the guest submits the context, XenGT
submits the shadow context and updates the guest context afterwards.
XenGT tracks the whole life cycle of the guest context via the
notification channel between guest and host.]

ADVANTAGE

* Avoids a CPU usage peak at guest context submission time due to
  shadow context translation:
  - build the shadow context as early as possible;
  - no need to postpone the shadow context translation to guest context
    submission time.

* Avoids the cost of frequently re-creating the shadow context:
  - XenGT just creates/destroys the shadow context on messages from the
    notification channel;
  - no need to re-create/destroy the shadow context on every guest
    context submission (PERFORMANCE!).

DISADVANTAGE

* Needs a unique identifier to locate the shadow context from
  information in the guest context; currently this is the LRCA.
  - Broken by dynamic context GGTT pin/unpin in i915.

* The guest needs extra modifications to leverage the notification
  channel so XenGT can manage the shadow context's lifecycle.
  - Introduces extra functions into the guest driver:
    i915_{execlist, ppgtt}_notify_vgt().

* The backing store of the GEM object has to be pinned, as the
  hypervisor tracks guest memory via pfn.
  - Broken by shmem (tmpfs) swap-in/out, which changes the context's
    pfns when there are no available pages in the page cache and swap
    cache.

SOLUTION

* Pin the guest context to get a unique LRCA and a stable backing
  store.
  - The Windows driver already behaves like this without extra
    modification.

[ Simple Implementation - Architecture-Clean Oriented! ]

[ASCII timeline snipped: the guest creates and submits a context; XenGT
creates the shadow context from the guest context at submission time,
submits it to hardware, updates the guest context from the shadow
context when it finishes, destroys the shadow context, and injects a
virtual context-switch interrupt so the guest sees its updated context.
Shadow contexts are created/destroyed in submission time.]

ADVANTAGE

* No need for the notification channel:
  - avoids introducing extra modifications into i915.

* No need to pin the guest GEM object:
  - the guest backing store is pinned by i915 at guest context
    submission time;
  - the shadow context only lives for the duration of the submission,
    so a unique identifier linking guest context and shadow context is
    not needed.

DISADVANTAGE

* Frequent shadow context re-create/destroy at guest context submission
  time (PERFORMANCE! MEMORY FOOTPRINT!):
  - even a trivial screen refresh triggers a guest context submission,
    like moving the mouse over an event-sensing button in X;
  - may cause a heavy performance drop and a heavy memory footprint due
    to the frequent shadow context re-create/destroy.

* High CPU usage peaks due to frequent shadow context re-create/destroy
  at guest context submission time.

Hope this helps the discussion! :)

Thanks,
Zhi.

-----Original Message-----
From: Wang, Zhi A
Sent: Tuesday, August 25, 2015 1:19 AM
Subject: RE: About the iGVT-g's requirement to pin guest contexts in VM

[-- Zhi's "big picture" mail of Aug 24 quoted in full above -- snipped --]
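The "simple implementation" above reduces to doing all shadow work
inside the submission path. Condensed into a C sketch (every type and
helper name is a placeholder, not actual XenGT code):

/*
 * Hypothetical submission-time shadowing: no notification channel and
 * no permanent pin.  The guest context is stable for the duration of
 * the submission because the guest itself keeps it pinned until it
 * sees the (virtual) completion interrupt.
 */
struct vgpu;
struct shadow_ctx;

struct shadow_ctx *create_shadow_ctx(struct vgpu *vgpu, u32 guest_lrca);
void copy_guest_to_shadow_ctx(struct shadow_ctx *s);
void submit_shadow_ctx(struct shadow_ctx *s);
void wait_for_shadow_complete(struct shadow_ctx *s);
void copy_shadow_to_guest_ctx(struct shadow_ctx *s);
void emulate_guest_csb(struct shadow_ctx *s);
void inject_virtual_ctx_switch_irq(struct shadow_ctx *s);
void destroy_shadow_ctx(struct shadow_ctx *s);

static int shadow_submit(struct vgpu *vgpu, u32 guest_lrca)
{
	struct shadow_ctx *s;

	s = create_shadow_ctx(vgpu, guest_lrca); /* once per submission! */
	if (!s)
		return -ENOMEM;

	copy_guest_to_shadow_ctx(s);	/* translate guest GGTT addresses */
	submit_shadow_ctx(s);		/* the real ELSP write */
	wait_for_shadow_complete(s);
	copy_shadow_to_guest_ctx(s);
	emulate_guest_csb(s);
	inject_virtual_ctx_switch_irq(s);
	destroy_shadow_ctx(s);		/* the per-submission cost above */

	return 0;
}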
Hi Daniel,

On Wed, Aug 26, 2015 at 10:56:00AM +0200, Daniel Vetter wrote:
> [earlier quotes snipped]
>
> It doesn't matter at all - if the guest unpins the ctx and puts
> something else in there before the host tells it that the ctx is
> completed, that's a bug in the guest. Same with real hw: we guarantee
> that the context stays around for long enough.

You are right. Previously I did not realize that the shrinker checks
not only the seqno but also the "ACTIVE_TO_IDLE" context interrupt
before unpinning a context, hence the concern above. Thanks for the
explanation!

> Also you obviously have to complete the copying from shadow->guest ctx
> before you send the irq to the guest to signal ctx completion.

Right. We cannot control when the guest driver sees the seqno change,
but we can control when the guest sees context interrupts. The guest
CSB update and interrupt injection will happen only after we finish
writing the guest context.

So right now we have two options for context shadowing: one is to track
the whole life cycle of the guest context, and the other is to do the
shadow work at context schedule-in/schedule-out time. Zhi drew a nice
picture of them. Currently we do not have a concrete performance
comparison of the two approaches; we will have a try and see. As for
this patch set, I will remove the "context notification" part and send
out an updated version. Thanks!

> the only thing you have to do is fix up bugs in the host code
> (probably you should just write through the ggtt).

Sorry, could you elaborate a little more on this? The guest context may
not always be in the aperture, right?

Regards,
-Zhiyuan
On Thu, Aug 27, 2015 at 09:50:03AM +0800, Zhiyuan Lv wrote:
> [earlier quotes snipped]
>
> > the only thing you have to do is fix up bugs in the host code
> > (probably you should just write through the ggtt).
>
> Sorry, could you elaborate a little more on this? The guest context
> may not always be in the aperture, right?

Yeah, the high-level problem is that the global GTT is contended (we
already have trouble with that on XenGT, and there's the ongoing but
unfinished partial mmap support for that), and permanently pinning
execlist contexts will cause lots of trouble.

Windows can do this because it segments the global GTT into different
parts (at least the last time I looked at their memory manager), which
means an execlist context will never sit in the middle of the range
used for mmaps. But Linux has a unified memory manager, which means an
execlist context can sit anywhere and therefore badly fragment the
global GTT. If we pin them, that will cause trouble after long system
uptime. And afaiui XenGT is mostly aimed at servers, where the uptime
assumption should be "runs forever".

A compounding factor is that, despite my raising this in the original
review, execlist is still not using the active list upstream and
instead does short-time pinning. That is better than pinning forever
but still breaks the eviction logic.

What Chris Wilson and I have talked about forever is adding an
object-specific global_gtt_unbind hook. The idea is that this would be
called when unbinding/evicting a special object (e.g. a hw context),
and you could use that to do the host signalling. That would be the
perfect combination of both approaches:

- Fast: host signalling (and therefore shadow context recreation) would
  only be done when the execlist context has actually moved around.
  That almost never happens, so per-execbuf overhead would be as low as
  with your pinning solution.

- Flexible: the i915 memory manager stays in full control since we
  don't pin any objects unnecessarily.

Cheers, Daniel
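A rough sketch of what that hook could look like on the guest side. The
global_gtt_unbind field and its wiring are assumptions (no such hook
existed in i915 at the time), and the notify call reuses the
hypothetical i915_execlist_notify_vgt() from earlier in this thread;
only i915_gem_obj_ggtt_offset() is a real helper of that era:

/*
 * Hypothetical per-object unbind hook: the eviction path would call
 * obj->ops->global_gtt_unbind() just before removing the object from
 * the global GTT, so a vGPU guest can warn the host that this LRCA is
 * about to become invalid.
 */
struct drm_i915_gem_object_ops {
	/* ...existing get_pages/put_pages/release ops... */
	void (*global_gtt_unbind)(struct drm_i915_gem_object *obj);
};

static void vgpu_ctx_global_gtt_unbind(struct drm_i915_gem_object *obj)
{
	struct drm_device *dev = obj->base.dev;

	/* hypothetical notification: the host drops or refreshes its
	 * shadow for this LRCA before the address changes */
	i915_execlist_notify_vgt(dev, VGT_CTX_DESTROY,
				 i915_gem_obj_ggtt_offset(obj));
}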
Hi Daniel,

Thanks for the comments! My reply is inline:

On Wed, Sep 02, 2015 at 10:19:03AM +0200, Daniel Vetter wrote:
> Yeah, the high-level problem is that the global GTT is contended (we
> already have trouble with that on XenGT, and there's the ongoing but
> unfinished partial mmap support for that), and permanently pinning
> execlist contexts will cause lots of trouble.
>
> [...] And afaiui XenGT is mostly aimed at servers, where the uptime
> assumption should be "runs forever".

On the server side we do not expect the host to run many graphics
workloads (everything should be in virtual machines, with shorter
uptime). But yeah, GM fragmentation is still an issue. Thanks for the
explanation!

> What Chris Wilson and I have talked about forever is adding an
> object-specific global_gtt_unbind hook. [...]

Thanks for the suggestion! That sounds great! Right now on the XenGT
side we still plan to try doing the shadow context work at context
schedule-in/out time. That way we need neither the context pinning nor
the host notification. We will collect some performance data to see how
it works.

I sent out a v2 patch set which removes the pin/unpin of execlist
contexts. The patch set addresses review comments from you, Chris and
Joonas (sorry, I forgot to CC the related people). Is that patch set OK
to be merged? Thanks!

Regards,
-Zhiyuan
On Wed, Sep 02, 2015 at 05:20:34PM +0800, Zhiyuan Lv wrote:
> [earlier quotes snipped]
>
> On the server side we do not expect the host to run many graphics
> workloads (everything should be in virtual machines, with shorter
> uptime). But yeah, GM fragmentation is still an issue. Thanks for the
> explanation!

Fragmentation is a lot worse on the guest side (due to the reduced
global GTT space from ballooning). The host isn't really any different.

> I sent out a v2 patch set which removes the pin/unpin of execlist
> contexts. The patch set addresses review comments from you, Chris and
> Joonas (sorry, I forgot to CC the related people). Is that patch set
> OK to be merged? Thanks!

I'll defer to the detailed reviewers, but if the static pinning is gone
then I'm ok. We can add the optimization I described here later on.
-Daniel
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 39df304..4b2ac37 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2282,7 +2282,8 @@ void intel_lr_context_free(struct intel_context *ctx)
 				ctx->engine[i].ringbuf;
 		struct intel_engine_cs *ring = ringbuf->ring;
 
-		if (ctx == ring->default_context) {
+		if ((ctx == ring->default_context) ||
+		    (intel_vgpu_active(ring->dev))) {
 			intel_unpin_ringbuffer_obj(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 		}
@@ -2353,6 +2354,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
 	const bool is_global_default_ctx = (ctx == ring->default_context);
+	const bool need_to_pin_ctx = (is_global_default_ctx ||
+				      (intel_vgpu_active(ring->dev)));
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *ctx_obj;
@@ -2374,7 +2377,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return -ENOMEM;
 	}
 
-	if (is_global_default_ctx) {
+	if (need_to_pin_ctx) {
 		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
 				PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
 		if (ret) {
@@ -2415,7 +2418,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		goto error_free_rbuf;
 	}
 
-	if (is_global_default_ctx) {
+	if (need_to_pin_ctx) {
 		ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
 		if (ret) {
 			DRM_ERROR(
@@ -2464,14 +2467,14 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	return 0;
 
 error:
-	if (is_global_default_ctx)
+	if (need_to_pin_ctx)
 		intel_unpin_ringbuffer_obj(ringbuf);
 error_destroy_rbuf:
 	intel_destroy_ringbuffer_obj(ringbuf);
 error_free_rbuf:
 	kfree(ringbuf);
 error_unpin_ctx:
-	if (is_global_default_ctx)
+	if (need_to_pin_ctx)
 		i915_gem_object_ggtt_unpin(ctx_obj);
 	drm_gem_object_unreference(&ctx_obj->base);
 	return ret;
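For context, intel_vgpu_active() gating the new behavior is the
standard guest-side check for running on top of GVT-g; at the time it
was defined in i915_drv.h roughly as:

static inline bool intel_vgpu_active(struct drm_device *dev)
{
	return to_i915(dev)->vgpu.active;
}

Gating the extra pin on it keeps native (non-virtualized) behavior
completely untouched, which is why the patch only changes life for i915
running inside a VM.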