diff mbox series

[v6,12/12] drm/xe: Increase the XE_PL_TT watermark

Message ID 20240703153813.182001-13-thomas.hellstrom@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series TTM shrinker helpers and xe buffer object shrinker | expand

Commit Message

Thomas Hellstrom July 3, 2024, 3:38 p.m. UTC
The XE_PL_TT watermark was set to 50% of system memory.
The idea behind that was unclear since the net effect is that
TT memory will be evicted to TTM_PL_SYSTEM memory if that
watermark is exceeded, requiring PPGTT rebinds and dma
remapping. But there is no similar watermark for TTM_PL_SYSTEM
memory.

The TTM functionality that tries to swap out system memory to
shmem objects if a 50% limit of total system memory is reached
is orthogonal to this, and with the shrinker added, it's no
longer in effect.

Replace the 50% TTM_PL_TT limit with a 100% limit, in effect
allowing all graphics memory to be bound to the device unless it
has been swapped out by the shrinker.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_ttm_sys_mgr.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Souza, Jose Aug. 5, 2024, 6:35 p.m. UTC | #1
On Wed, 2024-07-03 at 17:38 +0200, Thomas Hellström wrote:
> The XE_PL_TT watermark was set to 50% of system memory.
> The idea behind that was unclear since the net effect is that
> TT memory will be evicted to TTM_PL_SYSTEM memory if that
> watermark is exceeded, requiring PPGTT rebinds and dma
> remapping. But there is no similar watermark for TTM_PL_SYSTEM
> memory.
> 
> The TTM functionality that tries to swap out system memory to
> shmem objects if a 50% limit of total system memory is reached
> is orthogonal to this, and with the shrinker added, it's no
> longer in effect.
> 
> Replace the 50% TTM_PL_TT limit with a 100% limit, in effect
> allowing all graphics memory to be bound to the device unless it
> has been swapped out by the shrinker.

Sorry if I missed some patch changing it but I did not found in this series anything changing the 50% limit in ttm_global_init().
When I debugged some Vulkan tests allocate a lot of memory, the reason that KMD was not allocating memory wash this ttm_global limit that is shared
with all devices using TTM.

> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_ttm_sys_mgr.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> index 9844a8edbfe1..d38b91872da3 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> @@ -108,9 +108,8 @@ int xe_ttm_sys_mgr_init(struct xe_device *xe)
>  	u64 gtt_size;
>  
>  	si_meminfo(&si);
> +	/* Potentially restrict amount of TT memory here. */
>  	gtt_size = (u64)si.totalram * si.mem_unit;
> -	/* TTM limits allocation of all TTM devices by 50% of system memory */
> -	gtt_size /= 2;
>  
>  	man->use_tt = true;
>  	man->func = &xe_ttm_sys_mgr_func;
Matthew Brost Aug. 7, 2024, 11:13 p.m. UTC | #2
On Mon, Aug 05, 2024 at 12:35:34PM -0600, Souza, Jose wrote:
> On Wed, 2024-07-03 at 17:38 +0200, Thomas Hellström wrote:
> > The XE_PL_TT watermark was set to 50% of system memory.
> > The idea behind that was unclear since the net effect is that
> > TT memory will be evicted to TTM_PL_SYSTEM memory if that
> > watermark is exceeded, requiring PPGTT rebinds and dma
> > remapping. But there is no similar watermark for TTM_PL_SYSTEM
> > memory.
> > 
> > The TTM functionality that tries to swap out system memory to
> > shmem objects if a 50% limit of total system memory is reached
> > is orthogonal to this, and with the shrinker added, it's no
> > longer in effect.
> > 
> > Replace the 50% TTM_PL_TT limit with a 100% limit, in effect
> > allowing all graphics memory to be bound to the device unless it
> > has been swapped out by the shrinker.
> 
> Sorry if I missed some patch changing it but I did not found in this series anything changing the 50% limit in ttm_global_init().
> When I debugged some Vulkan tests allocate a lot of memory, the reason that KMD was not allocating memory wash this ttm_global limit that is shared
> with all devices using TTM.
> 

I'm reviewing this series and starting make sense of all this.

Thomas please correct me if I'm wrong here...

The limit set in ttm_global_init is the watermark for the TTM pool where
if exceeded upon freeing a BO's pages the pages are actually freed
rather than just returning to the TTM pool cache. The global watermark
is reason why in issue #2438 it observed a bunch of memory is still
consumed when nothing is running or any BOs exist - pages are being
cached in the TTM pool. The global watermark doesn't actually limit the
amount system memory TTM can allocate. A shrinker also exists which can
free cached pages in the TTM pool if memory pressure exists or 'echo 3 >
/proc/sys/vm/drop_caches' is done.

The watermark changed in this patch, is the actual limit for the number
of pages we can allocate for BOs. With a shrinker hooked into BOs, we
now can freely allocate all of the system pages for BOs and if memory
pressure exists idle BOs pages are swapped to shmem via the shrinker and
restored upon next GPU use.

Matt

> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_ttm_sys_mgr.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > index 9844a8edbfe1..d38b91872da3 100644
> > --- a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > +++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > @@ -108,9 +108,8 @@ int xe_ttm_sys_mgr_init(struct xe_device *xe)
> >  	u64 gtt_size;
> >  
> >  	si_meminfo(&si);
> > +	/* Potentially restrict amount of TT memory here. */
> >  	gtt_size = (u64)si.totalram * si.mem_unit;
> > -	/* TTM limits allocation of all TTM devices by 50% of system memory */
> > -	gtt_size /= 2;
> >  
> >  	man->use_tt = true;
> >  	man->func = &xe_ttm_sys_mgr_func;
>
Matthew Brost Aug. 7, 2024, 11:44 p.m. UTC | #3
On Wed, Jul 03, 2024 at 05:38:13PM +0200, Thomas Hellström wrote:
> The XE_PL_TT watermark was set to 50% of system memory.
> The idea behind that was unclear since the net effect is that
> TT memory will be evicted to TTM_PL_SYSTEM memory if that
> watermark is exceeded, requiring PPGTT rebinds and dma
> remapping. But there is no similar watermark for TTM_PL_SYSTEM
> memory.
> 
> The TTM functionality that tries to swap out system memory to
> shmem objects if a 50% limit of total system memory is reached
> is orthogonal to this, and with the shrinker added, it's no
> longer in effect.
> 
> Replace the 50% TTM_PL_TT limit with a 100% limit, in effect
> allowing all graphics memory to be bound to the device unless it
> has been swapped out by the shrinker.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_ttm_sys_mgr.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> index 9844a8edbfe1..d38b91872da3 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> @@ -108,9 +108,8 @@ int xe_ttm_sys_mgr_init(struct xe_device *xe)
>  	u64 gtt_size;
>  
>  	si_meminfo(&si);
> +	/* Potentially restrict amount of TT memory here. */
>  	gtt_size = (u64)si.totalram * si.mem_unit;
> -	/* TTM limits allocation of all TTM devices by 50% of system memory */
> -	gtt_size /= 2;
>  
>  	man->use_tt = true;
>  	man->func = &xe_ttm_sys_mgr_func;
> -- 
> 2.44.0
>
Thomas Hellstrom Aug. 9, 2024, 12:22 p.m. UTC | #4
On Wed, 2024-08-07 at 23:13 +0000, Matthew Brost wrote:
> On Mon, Aug 05, 2024 at 12:35:34PM -0600, Souza, Jose wrote:
> > On Wed, 2024-07-03 at 17:38 +0200, Thomas Hellström wrote:
> > > The XE_PL_TT watermark was set to 50% of system memory.
> > > The idea behind that was unclear since the net effect is that
> > > TT memory will be evicted to TTM_PL_SYSTEM memory if that
> > > watermark is exceeded, requiring PPGTT rebinds and dma
> > > remapping. But there is no similar watermark for TTM_PL_SYSTEM
> > > memory.
> > > 
> > > The TTM functionality that tries to swap out system memory to
> > > shmem objects if a 50% limit of total system memory is reached
> > > is orthogonal to this, and with the shrinker added, it's no
> > > longer in effect.
> > > 
> > > Replace the 50% TTM_PL_TT limit with a 100% limit, in effect
> > > allowing all graphics memory to be bound to the device unless it
> > > has been swapped out by the shrinker.
> > 
> > Sorry if I missed some patch changing it but I did not found in
> > this series anything changing the 50% limit in ttm_global_init().
> > When I debugged some Vulkan tests allocate a lot of memory, the
> > reason that KMD was not allocating memory wash this ttm_global
> > limit that is shared
> > with all devices using TTM.
> > 
> 
> I'm reviewing this series and starting make sense of all this.
> 
> Thomas please correct me if I'm wrong here...
> 
> The limit set in ttm_global_init is the watermark for the TTM pool
> where
> if exceeded upon freeing a BO's pages the pages are actually freed
> rather than just returning to the TTM pool cache. The global
> watermark
> is reason why in issue #2438 it observed a bunch of memory is still
> consumed when nothing is running or any BOs exist - pages are being
> cached in the TTM pool. 

This is correct.



> The global watermark doesn't actually limit the
> amount system memory TTM can allocate. A shrinker also exists which
> can
> free cached pages in the TTM pool if memory pressure exists or 'echo
> 3 >
> /proc/sys/vm/drop_caches' is done.

Yes, this is also true except the global watermark should be called the
pool watermark.

There is another global watermark that, if the amount of pages used for
graphics (PL_SYSTEM or PL_TT) exceeds 50% of system, bo swapping
starts. That means bos are pulled of the various LRU lists in a device
round-robin fashion and moved to shmem objects, in the anticipation
that theses shmem objects can then be paged out to disc by the core.
However, this is done even if there is no disc swap-space attached.
Also if this shmem swapping fails, nothing happens but the allocation
is free to proceed anyway.

This is what I typically refer to as the global watermark. It used to
be implemented by an opportunistic swapper process (similar to kswapd)
and a direct swapper (similar to direct reclaim) and the 50% limit was
configurable, but much of that functionality was ripped out. I didn't
follow the discussion preceding that change, though.


> 
> The watermark changed in this patch, is the actual limit for the
> number
> of pages we can allocate for BOs.

What is changed in this patch is actually the amount to memory we can
have in the TT placement and therefore also bound to the device. If
this limit is exceeded, eviction from TT to SYSTEM will take place in
addition to the above global swapping. This limit is per device. So now
we can in theory have all of system bound to a device.

>  With a shrinker hooked into BOs, we
> now can freely allocate all of the system pages for BOs and if memory
> pressure exists idle BOs pages are swapped to shmem via the shrinker
> and
> restored upon next GPU use.

Exactly.

> 
> Matt

/Thomas

> 
> > > 
> > > Signed-off-by: Thomas Hellström
> > > <thomas.hellstrom@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_ttm_sys_mgr.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > > b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > > index 9844a8edbfe1..d38b91872da3 100644
> > > --- a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > > +++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > > @@ -108,9 +108,8 @@ int xe_ttm_sys_mgr_init(struct xe_device *xe)
> > >  	u64 gtt_size;
> > >  
> > >  	si_meminfo(&si);
> > > +	/* Potentially restrict amount of TT memory here. */
> > >  	gtt_size = (u64)si.totalram * si.mem_unit;
> > > -	/* TTM limits allocation of all TTM devices by 50% of
> > > system memory */
> > > -	gtt_size /= 2;
> > >  
> > >  	man->use_tt = true;
> > >  	man->func = &xe_ttm_sys_mgr_func;
> >
Thomas Hellstrom Aug. 9, 2024, 1:53 p.m. UTC | #5
On Wed, 2024-08-07 at 23:44 +0000, Matthew Brost wrote:
> On Wed, Jul 03, 2024 at 05:38:13PM +0200, Thomas Hellström wrote:
> > The XE_PL_TT watermark was set to 50% of system memory.
> > The idea behind that was unclear since the net effect is that
> > TT memory will be evicted to TTM_PL_SYSTEM memory if that
> > watermark is exceeded, requiring PPGTT rebinds and dma
> > remapping. But there is no similar watermark for TTM_PL_SYSTEM
> > memory.
> > 
> > The TTM functionality that tries to swap out system memory to
> > shmem objects if a 50% limit of total system memory is reached
> > is orthogonal to this, and with the shrinker added, it's no
> > longer in effect.
> > 
> > Replace the 50% TTM_PL_TT limit with a 100% limit, in effect
> > allowing all graphics memory to be bound to the device unless it
> > has been swapped out by the shrinker.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>

Thanks for reviewing!
/Thomas


> 
> > ---
> >  drivers/gpu/drm/xe/xe_ttm_sys_mgr.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > index 9844a8edbfe1..d38b91872da3 100644
> > --- a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > +++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > @@ -108,9 +108,8 @@ int xe_ttm_sys_mgr_init(struct xe_device *xe)
> >  	u64 gtt_size;
> >  
> >  	si_meminfo(&si);
> > +	/* Potentially restrict amount of TT memory here. */
> >  	gtt_size = (u64)si.totalram * si.mem_unit;
> > -	/* TTM limits allocation of all TTM devices by 50% of
> > system memory */
> > -	gtt_size /= 2;
> >  
> >  	man->use_tt = true;
> >  	man->func = &xe_ttm_sys_mgr_func;
> > -- 
> > 2.44.0
> >
diff mbox series

Patch

diff --git a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
index 9844a8edbfe1..d38b91872da3 100644
--- a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
@@ -108,9 +108,8 @@  int xe_ttm_sys_mgr_init(struct xe_device *xe)
 	u64 gtt_size;
 
 	si_meminfo(&si);
+	/* Potentially restrict amount of TT memory here. */
 	gtt_size = (u64)si.totalram * si.mem_unit;
-	/* TTM limits allocation of all TTM devices by 50% of system memory */
-	gtt_size /= 2;
 
 	man->use_tt = true;
 	man->func = &xe_ttm_sys_mgr_func;