From patchwork Wed Feb 15 16:13:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141905 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 15435C636CC for ; Wed, 15 Feb 2023 16:14:39 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7FCCC10E580; Wed, 15 Feb 2023 16:14:38 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id EC41E10EB15; Wed, 15 Feb 2023 16:14:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477676; x=1708013676; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=f+jziJqtg2tHO1sm8YICg4x70AohpDQ+/WN+2ra+cxA=; b=hGzxEyCoRV9XSb+cgq05kFe2dsTNMKFXnyxk5+fNKTWL7L7gVDAE+BPX DTv5Sq3rv9SSFfBdueRHl9sQ1+jHaTmoLSe8jqO4JJ2Ow+EvzqIK1T2Z2 uLTho31N3bEPMaG0yd+MD62kQpKI4vIasrE46dmYa9pDSGpu4S4e4GQKQ XLb26qMmpVSfR2yhQuzACjLjzwwQCwEcDgvV5S0diYxjLe4BRr4ZGOufO QajVgroHb/uRBHaRwJq/Vreu/LJXzzcJctbXUW2eWP/21QECq6zMeUsK1 aVy3EpRuxOd+55iRAAMuZxG439Jeadazqht+U8fJn+f2nKutZ9vfO1uMy g==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393870669" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393870669" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:34 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758471994" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758471994" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:27 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:13:50 +0100 Message-Id: <20230215161405.187368-2-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 01/16] drm/ttm: Fix a NULL pointer dereference X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , Philip Yang , NeilBrown , Daniel Vetter , Peter Xu , linux-mm@kvack.org, Dave Hansen , Huang Rui , David Hildenbrand , "Matthew Wilcox \(Oracle\)" , linux-graphics-maintainer@vmware.com, Matthew Auld , Ramalingam C , Dave Airlie , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , Arunpravin Paneer Selvam , intel-gfx@lists.freedesktop.org, Qiang Yu , Felix Kuehling , Johannes Weiner , Alex Deucher , Andrew Morton , =?utf-8?q?Christian_K=C3=B6nig?= , Nirmoy Das Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" The LRU mechanism may look up a resource in the process of being removed from an object. 
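A rough sketch of the failure mode, pieced together from the description below (illustration only, not part of the patch):

    /* swapout path, holding bdev->lru_lock */
    res = <next resource on the LRU>;
    bo  = res->bo;                   /* non-NULL, so the old "!bo" check passes */
    /* ...but res is in the process of being unlinked from the object, so: */
    /* bo->resource may already be NULL or point to a different resource   */
    ttm_bo_swapout(bo, ...);         /* dereferences bo->resource -> Oops */

Since clearing bo->resource is also done under the LRU lock, testing bo->resource == res while that lock is still held is enough to skip such half-removed resources, which is what the one-line hunk below does.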
The locking rules here are a bit unclear but it looks currently like res->bo assignment is protected by the LRU lock, whereas bo->resource is protected by the object lock, while *clearing* of bo->resource is also protected by the LRU lock. This means that if we check that bo->resource points to the LRU resource under the LRU lock we should be safe. So perform that check before deciding to swap out a bo. That avoids dereferencing a NULL bo->resource in ttm_bo_swapout(). Fixes: 6a9b02899402 ("drm/ttm: move the LRU into resource handling v4") Cc: Christian König Cc: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Alex Deucher Cc: Felix Kuehling Cc: Philip Yang Cc: Qiang Yu Cc: Matthew Auld Cc: Nirmoy Das Cc: Tvrtko Ursulin Cc: "Thomas Hellström" Cc: Anshuman Gupta Cc: Ramalingam C Cc: Arunpravin Paneer Selvam Cc: dri-devel@lists.freedesktop.org Signed-off-by: Thomas Hellström Reviewed-by: Christian König --- drivers/gpu/drm/ttm/ttm_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index c7a1862f322a..ae2f19dc9f81 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -158,7 +158,7 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo = res->bo; uint32_t num_pages; - if (!bo) + if (!bo || bo->resource != res) continue; num_pages = PFN_UP(bo->base.size); From patchwork Wed Feb 15 16:13:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141906 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 46A13C636CC for ; Wed, 15 Feb 2023 16:14:44 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id BA51910EACF; Wed, 15 Feb 2023 16:14:43 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 38C0D10EACF; Wed, 15 Feb 2023 16:14:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477681; x=1708013681; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fBC7jBzo37+O/nCkHNJ6G49Az5rw8kNvRLn7CLfn/ts=; b=T3/OqiP3MbeVN3h2OFWdtEtsuvHjJQqxTyYSrDBdPbbPyzOO8w//c4Bj t9R1XCpdrNjw1twsgeuul9GWrWGWsMJMcdP/gwSoNKQHniSP+RNtJCgG1 iP57yEAHpxh4K0nxsa91hIl7IwOZ3yXxCNSNSMAnT48YkafGKiennfgab OfuiGdgDPpEATKjkFGOkvoE6oKCQAfLjWbhRiT6YflhfHcAfywihO6aaa Vg4pq15areggrhhlLbMXvRPZQPyAuEdQvZYLEocH9ibY1NMKriBNyJbYi bNIux8c5BJhKV7MjWhTIbnIdy1RRrYaXZ0afSd4axpUDEm9NX8sZl6RDq g==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393870710" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393870710" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:40 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472124" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472124" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by 
fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:34 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:13:51 +0100 Message-Id: <20230215161405.187368-3-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , Huang Rui , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Madhav Chauhan , Dave Airlie , Andrew Morton , =?utf-8?q?Christian_K=C3=B6nig?= , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" When hitting an error, the error path forgot to unmap dma mappings and could call set_pages_wb() on already uncached pages. Fix this by introducing a common __ttm_pool_free() function that does the right thing. Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3") Cc: Christian König Cc: Dave Airlie Cc: Madhav Chauhan Cc: Christian Koenig Cc: Huang Rui Cc: dri-devel@lists.freedesktop.org Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_pool.c | 74 +++++++++++++++++++++------------- 1 file changed, 45 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index aa116a7bbae3..1cc7591a9542 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order, return 0; } +static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt, + struct page **caching_divide, + enum ttm_caching initial_caching, + enum ttm_caching subseq_caching, + pgoff_t num_pages) +{ + enum ttm_caching caching = subseq_caching; + struct page **pages = tt->pages; + unsigned int order; + pgoff_t i, nr; + + if (pool && caching_divide) + caching = initial_caching; + + for (i = 0; i < num_pages; i += nr, pages += nr) { + struct ttm_pool_type *pt = NULL; + + if (unlikely(caching_divide == pages)) + caching = subseq_caching; + + order = ttm_pool_page_order(pool, *pages); + nr = (1UL << order); + if (tt->dma_address) + ttm_pool_unmap(pool, tt->dma_address[i], nr); + + pt = ttm_pool_select_type(pool, caching, order); + if (pt) + ttm_pool_type_give(pt, *pages); + else + ttm_pool_free_page(pool, caching, order, *pages); + } +} + /** * ttm_pool_alloc - Fill a ttm_tt object * @@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, dma_addr_t *dma_addr = tt->dma_address; struct page **caching = tt->pages; struct page **pages = tt->pages; + enum ttm_caching page_caching; gfp_t gfp_flags = GFP_USER; - unsigned int i, order; + unsigned int order; struct page *p; int r; @@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, order = min_t(unsigned int, order, __fls(num_pages))) { struct ttm_pool_type *pt; + page_caching = tt->caching; pt = ttm_pool_select_type(pool, tt->caching, order); p = pt 
? ttm_pool_type_take(pt) : NULL; if (p) { @@ -418,6 +453,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, if (r) goto error_free_page; + caching = pages; do { r = ttm_pool_page_allocated(pool, order, p, &dma_addr, @@ -426,14 +462,15 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, if (r) goto error_free_page; + caching = pages; if (num_pages < (1 << order)) break; p = ttm_pool_type_take(pt); } while (p); - caching = pages; } + page_caching = ttm_cached; while (num_pages >= (1 << order) && (p = ttm_pool_alloc_page(pool, gfp_flags, order))) { @@ -442,6 +479,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, tt->caching); if (r) goto error_free_page; + caching = pages; } r = ttm_pool_page_allocated(pool, order, p, &dma_addr, &num_pages, &pages); @@ -468,15 +506,12 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, return 0; error_free_page: - ttm_pool_free_page(pool, tt->caching, order, p); + ttm_pool_free_page(pool, page_caching, order, p); error_free_all: num_pages = tt->num_pages - num_pages; - for (i = 0; i < num_pages; ) { - order = ttm_pool_page_order(pool, tt->pages[i]); - ttm_pool_free_page(pool, tt->caching, order, tt->pages[i]); - i += 1 << order; - } + __ttm_pool_free(pool, tt, caching, tt->caching, ttm_cached, + num_pages); return r; } @@ -492,27 +527,8 @@ EXPORT_SYMBOL(ttm_pool_alloc); */ void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt) { - unsigned int i; - - for (i = 0; i < tt->num_pages; ) { - struct page *p = tt->pages[i]; - unsigned int order, num_pages; - struct ttm_pool_type *pt; - - order = ttm_pool_page_order(pool, p); - num_pages = 1ULL << order; - if (tt->dma_address) - ttm_pool_unmap(pool, tt->dma_address[i], num_pages); - - pt = ttm_pool_select_type(pool, tt->caching, order); - if (pt) - ttm_pool_type_give(pt, tt->pages[i]); - else - ttm_pool_free_page(pool, tt->caching, order, - tt->pages[i]); - - i += num_pages; - } + __ttm_pool_free(pool, tt, NULL, tt->caching, tt->caching, + tt->num_pages); while (atomic_long_read(&allocated_pages) > page_pool_size) ttm_pool_shrink(); From patchwork Wed Feb 15 16:13:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141907 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 50BC2C636CC for ; Wed, 15 Feb 2023 16:14:50 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 184BA10EADF; Wed, 15 Feb 2023 16:14:48 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id B9BA710E272; Wed, 15 Feb 2023 16:14:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477685; x=1708013685; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=McHyT4De9u46Zn7/STJhV0OybtGzOXa9c4egudSyqSQ=; b=kDbC1WmZESkMs0WvsLrCD/JBSHWB+XjMUP+5SEr1/j0oIv+9dhFQ504C luejq1XOloXNSVwSuhC6jiNc2JaPariDtKsi5RQWjs6fYOqpueiwN9xI6 Duysx4uwrNW4eqGtfo/9QyhJp2nDlx31K9Sw9pyKQf8Mx4GUAXXk8pp48 5OIY7kds6UD5ITPYbdjX1qKJp27u7yGvwn6et4yC/8GcyOqTYFizGpopx 
2C9ByqRFi+EgDrHyV0M/TvzKwf+HpTU8+9EtOhc5RnXzlmes/1kawnckt VfK4JZwhXt8N0crDm4vDYevXtxlIJqiA72xzEG9ELugxQzUpUp7KDFhi2 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393870738" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393870738" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:45 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472167" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472167" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:40 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:13:52 +0100 Message-Id: <20230215161405.187368-4-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 03/16] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" New code is recommended to use the BIT macro instead of the explicit shifts. Change the older defines so that we can keep the style consistent with upcoming changes. Signed-off-by: Thomas Hellström --- include/drm/ttm/ttm_tt.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index b7d3f3843f1e..cc54be1912e1 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -83,12 +83,12 @@ struct ttm_tt { * set by TTM after ttm_tt_populate() has successfully returned, and is * then unset when TTM calls ttm_tt_unpopulate(). */ -#define TTM_TT_FLAG_SWAPPED (1 << 0) -#define TTM_TT_FLAG_ZERO_ALLOC (1 << 1) -#define TTM_TT_FLAG_EXTERNAL (1 << 2) -#define TTM_TT_FLAG_EXTERNAL_MAPPABLE (1 << 3) +#define TTM_TT_FLAG_SWAPPED BIT(0) +#define TTM_TT_FLAG_ZERO_ALLOC BIT(1) +#define TTM_TT_FLAG_EXTERNAL BIT(2) +#define TTM_TT_FLAG_EXTERNAL_MAPPABLE BIT(3) -#define TTM_TT_FLAG_PRIV_POPULATED (1U << 31) +#define TTM_TT_FLAG_PRIV_POPULATED BIT(31) uint32_t page_flags; /** @num_pages: Number of pages in the page array. 
*/ uint32_t num_pages; From patchwork Wed Feb 15 16:13:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141908 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E2FFBC636D4 for ; Wed, 15 Feb 2023 16:14:53 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 402BE10EAF1; Wed, 15 Feb 2023 16:14:53 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id E3EDC10EAE2; Wed, 15 Feb 2023 16:14:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477691; x=1708013691; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fjuWLn2itNr8O5y3d+HDbNx0L325SuwvFLJeYvIOXZg=; b=hW2adgtxefBUG0X+rKTXfWdkCet0HnNxi0gR6YlrNnBmgNXl7f8cPaC6 7NtGr79hvgW3h0WJ1LuTKo75u63zal5LUExNTWtV7sU+sW98IhIbCWnFl BpWfiAT//KLPJ/amMFbAhWDwm4EKRUdWfJqB2pI0CweoLUmXxbILDwqU3 7O821QKhP7srBdPU+frsC9l54qRn7suoJsCajiGpFH1hkz7i7kBSZnAFc i7OsOT4K3osQdBQihh67tmTiJAbyT80rIXCfQJOqz0HIX/c7geSIiJMz9 K4rvkNi4OYjvO784jP5uQQtnuYj4C7s/wmYD4WnrBfESZOxPfuWFrqlJg w==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393870774" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393870774" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:50 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472206" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472206" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:45 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:13:53 +0100 Message-Id: <20230215161405.187368-5-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Update the TTM swapout interfaces for better compatibility with a shrinker. - Replace number-of-pages int return with a long to better match the kernel's shrinker interface. 
- The gfp_flags parameter to ttm_xx_swapout() currently only takes the GFP_KERNEL value and shouldn't really be needed since the shrinker we hook up in upcoming patches sets a allocation context to match reclaim. - Introduce a shrink reason enumeration and a driver callback to shrink buffer objects. The TTM_SHRINK_WATERMARK reason is going to still be handled using the existing shmem copy, and will be used by pool types that don't lend themselves well to shinking (dma_alloc pool) and when drivers explicitly requests swapout. The TTM_SHRINK_SWAP and TTM_SHRINK_PURGE reasons originate from a shrinker and is to be handled by a new driver callback, bo_shrink(). Helpers for the new driver callback are provided in upcoming patches. Cc: linux-graphics-maintainer@vmware.com Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_bo.c | 38 ++++++++++++++++---- drivers/gpu/drm/ttm/ttm_device.c | 55 +++++++++++++++++++++-------- drivers/gpu/drm/ttm/ttm_tt.c | 23 ++++++------ drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 3 +- include/drm/ttm/ttm_bo.h | 4 +-- include/drm/ttm/ttm_device.h | 36 +++++++++++++++++-- include/drm/ttm/ttm_tt.h | 17 +++++++-- 7 files changed, 136 insertions(+), 40 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 882c2fa346f3..e5c0970564c0 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1114,13 +1114,29 @@ int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx) } EXPORT_SYMBOL(ttm_bo_wait_ctx); -int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, - gfp_t gfp_flags) +/** + * ttm_bo_swapout() - Swap out or purge a buffer object + * @bo: The buffer object. + * @ctx: The ttm operation context. + * @reason: The swapout reason. + * + * Try to swap out or purge the contents of a system memory backed buffer + * object. The function needs to be called with the device's LRU lock held. + * + * Return: -EBUSY if the bo lock could not be grabbed or the object was + * otherwise busy. Otherwise the number of pages swapped out or negative + * error code on error. Iff the function didn't return -EBUSY, the + * LRU lock was dropped, and LRU traversal needs to restart. + */ +long ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, + enum ttm_shrink_reason reason) { struct ttm_place place; bool locked; long ret; + lockdep_assert_held(&bo->bdev->lru_lock); + /* * While the bo may already reside in SYSTEM placement, set * SYSTEM as new placement to cover also the move further below. @@ -1142,8 +1158,12 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, } if (bo->deleted) { + long num_pages = bo->ttm->num_pages; + ret = ttm_bo_cleanup_refs(bo, false, false, locked); ttm_bo_put(bo); + if (!ret) + return num_pages; return ret == -EBUSY ? -ENOSPC : ret; } @@ -1184,13 +1204,17 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, * Swap out. Buffer will be swapped in again as soon as * anyone tries to access a ttm page. 
*/ - if (bo->bdev->funcs->swap_notify) - bo->bdev->funcs->swap_notify(bo); + if (bo->bdev->funcs->bo_shrink && reason != TTM_SHRINK_WATERMARK) { + ret = bo->bdev->funcs->bo_shrink(bo, ctx); + } else { + if (bo->bdev->funcs->swap_notify) + bo->bdev->funcs->swap_notify(bo); + ret = ttm_tt_swapout(bo->bdev, bo->ttm); + if (!ret) + ret = bo->ttm->num_pages; + } - if (ttm_tt_is_populated(bo->ttm)) - ret = ttm_tt_swapout(bo->bdev, bo->ttm, gfp_flags); out: - /* * Unreserve without putting on LRU to avoid swapping out an * already swapped buffer. diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index ae2f19dc9f81..7eadea07027f 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -116,19 +116,28 @@ static int ttm_global_init(void) return ret; } -/* - * A buffer object shrink method that tries to swap out the first - * buffer object on the global::swap_lru list. +/** + * ttm_global_swapout() - Select and swap out a system-memory-backed bo. + * @ctx: The operation context. + * @reason: The reason for swapout. + * + * Select, based on round-robin a TTM device and traverse the LRUs of + * that specific device until a suitable bo backed by system memory is found + * and swapped-out or purged. + * + * Return: Positive value or zero indicating the size in pages of the + * bo swapped out. Negative error code on error. */ -int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags) +long ttm_global_swapout(struct ttm_operation_ctx *ctx, + enum ttm_shrink_reason reason) { struct ttm_global *glob = &ttm_glob; struct ttm_device *bdev; - int ret = 0; + long ret = 0; mutex_lock(&ttm_global_mutex); list_for_each_entry(bdev, &glob->device_list, device_list) { - ret = ttm_device_swapout(bdev, ctx, gfp_flags); + ret = ttm_device_swapout(bdev, ctx, reason); if (ret > 0) { list_move_tail(&bdev->device_list, &glob->device_list); break; @@ -139,14 +148,29 @@ int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags) } EXPORT_SYMBOL(ttm_global_swapout); -int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, - gfp_t gfp_flags) +/** + * ttm_device_swapout() - Select and swap out a system-memory-backed bo. + * @bdev: The device whos bos are considered for swapout. + * @ctx: The operation context. + * @reason: The reason for swapout. + * + * Traverse the LRUs of a specific device until a suitable bo backed by + * system memory is found and swapped-out or purged. + * + * Return: Positive value or zero indicating the size in pages of the + * bo swapped out. Negative error code on error. 
+ */ +long ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, + enum ttm_shrink_reason reason) { struct ttm_resource_cursor cursor; struct ttm_resource_manager *man; struct ttm_resource *res; unsigned i; - int ret; + long ret; + + if (reason != TTM_SHRINK_WATERMARK && !bdev->funcs->bo_shrink) + return 0; spin_lock(&bdev->lru_lock); for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; ++i) { @@ -156,16 +180,19 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, ttm_resource_manager_for_each_res(man, &cursor, res) { struct ttm_buffer_object *bo = res->bo; - uint32_t num_pages; + struct ttm_tt *tt; if (!bo || bo->resource != res) continue; - num_pages = PFN_UP(bo->base.size); - ret = ttm_bo_swapout(bo, ctx, gfp_flags); + tt = bo->ttm; + if (!tt || (reason == TTM_SHRINK_PURGE && + !ttm_tt_purgeable(tt))) + continue; + ret = ttm_bo_swapout(bo, ctx, reason); /* ttm_bo_swapout has dropped the lru_lock */ - if (!ret) - return num_pages; + if (ret >= 0) + return ret; if (ret != -EBUSY) return ret; } diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index ab725d9d14a6..a68c14de0161 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -239,22 +239,21 @@ int ttm_tt_swapin(struct ttm_tt *ttm) /** * ttm_tt_swapout - swap out tt object - * * @bdev: TTM device structure. * @ttm: The struct ttm_tt. - * @gfp_flags: Flags to use for memory allocation. * - * Swapout a TT object to a shmem_file, return number of pages swapped out or - * negative error code. + * Swapout a TT object to a shmem_file. + * + * Return: number of pages swapped out or negative error code on error. */ -int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm, - gfp_t gfp_flags) +int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm) { loff_t size = (loff_t)ttm->num_pages << PAGE_SHIFT; struct address_space *swap_space; struct file *swap_storage; struct page *from_page; struct page *to_page; + gfp_t gfp_flags; int i, ret; swap_storage = shmem_file_setup("ttm swap", size, 0); @@ -264,7 +263,7 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm, } swap_space = swap_storage->f_mapping; - gfp_flags &= mapping_gfp_mask(swap_space); + gfp_flags = GFP_KERNEL & mapping_gfp_mask(swap_space); for (i = 0; i < ttm->num_pages; ++i) { from_page = ttm->pages[i]; @@ -315,12 +314,14 @@ int ttm_tt_populate(struct ttm_device *bdev, while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit || atomic_long_read(&ttm_dma32_pages_allocated) > ttm_dma32_pages_limit) { + long r = ttm_global_swapout(ctx, TTM_SHRINK_WATERMARK); - ret = ttm_global_swapout(ctx, GFP_KERNEL); - if (ret == 0) + if (!r) break; - if (ret < 0) + if (r < 0) { + ret = r; goto error; + } } if (bdev->funcs->ttm_tt_populate) @@ -379,7 +380,7 @@ static int ttm_tt_debugfs_shrink_show(struct seq_file *m, void *data) { struct ttm_operation_ctx ctx = { false, false }; - seq_printf(m, "%d\n", ttm_global_swapout(&ctx, GFP_KERNEL)); + seq_printf(m, "%ld\n", ttm_global_swapout(&ctx, TTM_SHRINK_SWAP)); return 0; } DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 2588615a2a38..292c5199d2cc 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -1514,7 +1514,8 @@ static int vmw_pm_freeze(struct device *kdev) vmw_execbuf_release_pinned_bo(dev_priv); vmw_resource_evict_all(dev_priv); vmw_release_device_early(dev_priv); - while 
(ttm_device_swapout(&dev_priv->bdev, &ctx, GFP_KERNEL) > 0); + while (ttm_device_swapout(&dev_priv->bdev, &ctx, TTM_SHRINK_WATERMARK) > 0) + ; vmw_fifo_resource_dec(dev_priv); if (atomic_read(&dev_priv->num_fifo_resources) != 0) { DRM_ERROR("Can't hibernate while 3D resources are active.\n"); diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h index 8b113c384236..6b45e0b639e0 100644 --- a/include/drm/ttm/ttm_bo.h +++ b/include/drm/ttm/ttm_bo.h @@ -375,8 +375,8 @@ void ttm_bo_kunmap(struct ttm_bo_kmap_obj *map); int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map); void ttm_bo_vunmap(struct ttm_buffer_object *bo, struct iosys_map *map); int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo); -int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, - gfp_t gfp_flags); +long ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, + enum ttm_shrink_reason reason); void ttm_bo_pin(struct ttm_buffer_object *bo); void ttm_bo_unpin(struct ttm_buffer_object *bo); int ttm_mem_evict_first(struct ttm_device *bdev, diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h index 4f3e81eac6f3..6bd2abf712ab 100644 --- a/include/drm/ttm/ttm_device.h +++ b/include/drm/ttm/ttm_device.h @@ -35,6 +35,21 @@ struct ttm_placement; struct ttm_buffer_object; struct ttm_operation_ctx; +/** + * enum ttm_shrink_reason - Reason for shrinking system memory + * @TTM_SHRINK_WATERMARK - A watermark limit was reached. Not from reclaim. + * @TTM_SHRINK_PURGE - A request for shrinking only purged objects. + * @TTM_SHRINK_SWAP - A request for shrinking any object. + * + * This enum is intended for the buffer object- and shrink method selection + * algorithms. It's not intended to leak to or be used by TTM drivers. + */ +enum ttm_shrink_reason { + TTM_SHRINK_WATERMARK, + TTM_SHRINK_PURGE, + TTM_SHRINK_SWAP, +}; + /** * struct ttm_global - Buffer object driver global data. */ @@ -207,6 +222,19 @@ struct ttm_device_funcs { * adding fences that may force a delayed delete */ void (*release_notify)(struct ttm_buffer_object *bo); + + /** + * Shrink the bo's system pages, Either by swapping or by purging. + * @bo: Bo the system pages of which are to be shrunken. + * @ctx: Operation ctx. In particular the driver callback should + * adhere to the no_wait_gpu and interruptible fields. + * + * This is also notifying the driver that the bo is about to be + * shrunken and the driver should take care to unbind any GPU bindings + * and to note that the content is purged if @bo->ttm is purgeable. 
+ */ + long (*bo_shrink)(struct ttm_buffer_object *bo, + struct ttm_operation_ctx *ctx); }; /** @@ -268,9 +296,11 @@ struct ttm_device { struct workqueue_struct *wq; }; -int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags); -int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, - gfp_t gfp_flags); +long ttm_global_swapout(struct ttm_operation_ctx *ctx, + enum ttm_shrink_reason reason); + +long ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, + enum ttm_shrink_reason reason); static inline struct ttm_resource_manager * ttm_manager_type(struct ttm_device *bdev, int mem_type) diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index cc54be1912e1..627168eba8f6 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -87,6 +87,7 @@ struct ttm_tt { #define TTM_TT_FLAG_ZERO_ALLOC BIT(1) #define TTM_TT_FLAG_EXTERNAL BIT(2) #define TTM_TT_FLAG_EXTERNAL_MAPPABLE BIT(3) +#define TTM_TT_FLAG_DONTNEED BIT(4) #define TTM_TT_FLAG_PRIV_POPULATED BIT(31) uint32_t page_flags; @@ -180,8 +181,8 @@ void ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm); * Swap in a previously swap out ttm_tt. */ int ttm_tt_swapin(struct ttm_tt *ttm); -int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm, - gfp_t gfp_flags); + +int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm); /** * ttm_tt_populate - allocate pages for a ttm @@ -223,6 +224,18 @@ void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages); struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt, struct ttm_tt *tt); +/** + * ttm_tt_purgeable() - Whether a struct ttm_tt's contents is purgeable + * @tt: The struct ttm_tt to consider. + * + * Return: Whether the contents is purgeable in the sence that the owner + * doesn't mind losing it as long as it gets notified. 
+ */ +static inline bool ttm_tt_purgeable(struct ttm_tt *tt) +{ + return tt->page_flags & TTM_TT_FLAG_DONTNEED; +} + #if IS_ENABLED(CONFIG_AGP) #include From patchwork Wed Feb 15 16:13:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141909 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7910EC636D7 for ; Wed, 15 Feb 2023 16:15:00 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id CB42110EAE5; Wed, 15 Feb 2023 16:14:59 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id E87C110EAE0; Wed, 15 Feb 2023 16:14:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477697; x=1708013697; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Pe00gkUs+MCnr90AkFGX3dvjDjmFzOer/NWfRABWSrI=; b=JvfZCL+iGATfk0iYFCgcFfWFVLgEkmHz3ybguLTM2oxcZsudLdvqgf6+ NKeOvBeCTLVq91VqC/eERddwRHc6QTotbKYn7KIdKaP3hm9LbhWsZ2imF DNIDzZXWD3NJ65upmU3bxOeYiFF5AbMVEETA1BLOS+zR4Se78HoIXidk3 25qSoYhOscvNIaPltOLNZMggmwlJRlXA+k1z6XghLlpW3lVbpwlRA+fBu /V/P7bB447rqYXHacsKI/AqGdHGAmUzKnvN4B0Fl8fq6fjcfovB+f46w1 F/g4OpRJEqjxNvGh9vPvrIYAyl5YaPhUEk5qU+mxN4DOwA5FPgubQQyb9 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393870822" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393870822" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:56 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472248" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472248" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:50 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:13:54 +0100 Message-Id: <20230215161405.187368-6-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 05/16] drm/ttm: Unexport ttm_global_swapout() X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Unexport ttm_global_swapout() since it is not used outside of TTM. 
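Putting together the pieces introduced in patch 04 above (the bo_shrink() callback, ttm_tt_purgeable() and the TTM_TT_FLAG_DONTNEED flag), a driver-side callback could look roughly like the following. This is a hypothetical sketch only; my_unbind(), my_purge_pages() and my_backup_pages() are made-up placeholders, and the real helpers for this callback arrive in later patches of the series:

    static long my_bo_shrink(struct ttm_buffer_object *bo,
                             struct ttm_operation_ctx *ctx)
    {
            long num_pages = bo->ttm->num_pages;
            int err;

            /* Unbind from the GPU first, honouring ctx->no_wait_gpu. */
            err = my_unbind(bo, ctx);                /* placeholder */
            if (err)
                    return err;

            /*
             * Content marked purgeable (TTM_TT_FLAG_DONTNEED) may simply be
             * dropped; anything else must be copied out before the pages are
             * released.
             */
            if (ttm_tt_purgeable(bo->ttm))
                    err = my_purge_pages(bo);        /* placeholder */
            else
                    err = my_backup_pages(bo);       /* placeholder */

            return err ? err : num_pages;
    }

The return value presumably follows the same convention as ttm_bo_swapout(): the number of pages made available on success, or a negative error code.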
Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_device.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index 7eadea07027f..a3cac42bb456 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -146,7 +146,6 @@ long ttm_global_swapout(struct ttm_operation_ctx *ctx, mutex_unlock(&ttm_global_mutex); return ret; } -EXPORT_SYMBOL(ttm_global_swapout); /** * ttm_device_swapout() - Select and swap out a system-memory-backed bo. From patchwork Wed Feb 15 16:13:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141910 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 12A69C636CC for ; Wed, 15 Feb 2023 16:15:03 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7C06D10EAE2; Wed, 15 Feb 2023 16:15:02 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 651C310EAE2; Wed, 15 Feb 2023 16:15:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477701; x=1708013701; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6/IPmi7YTji8TNgTR+ciw3L6Y9HiGiLfBZFPL4xWuE4=; b=KxTfYbT7QsDFIR/gxuxbkzjvu9PV6liTvVQHNxsmfs/naowOziL61ZHC RY60VhzWtzC00w475gzgcHArNzIomgXhzzANqf8rmT9epWrtfZzkvlhBY c/ozcIdeDClepWbo3IDL4VXy73NEPa5BfXnZk7e+vT2mC6sHNKNoEfrrV Yzr2qhSyIJ/QML9vwzUaDkPBWnhVB8+maffWArDROhRMsW2Kd9xuy8sUY KK+PSD7i0qS6MjQ/g2QZMnQgyvdP6IUutba/KB01icXEFAIaerSyIGahd THtuVIlPLRYyNuNTHEbGHe1Sc/tV+nNCpel/NAPmB9Lwge3y85oqwkQXe w==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393870850" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393870850" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:00 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472313" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472313" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:14:55 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:13:55 +0100 Message-Id: <20230215161405.187368-7-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 06/16] drm/ttm: Don't use watermark accounting on shrinkable pools X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, 
"Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Clarify the meaning of the ttm_tt pages_limit watermarks as the max number of pages not accessible by shrinkers, and update accordingly so that memory allocated by TTM devices that support shrinking is not accounted against those limits. In particular this means that devices using the dma_alloc pool will still be using the watermark method. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_device.c | 3 ++- drivers/gpu/drm/ttm/ttm_tt.c | 43 +++++++++++++++++++------------- include/drm/ttm/ttm_pool.h | 15 +++++++++++ 3 files changed, 42 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index a3cac42bb456..e0a2be3ed13d 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -168,7 +168,8 @@ long ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, unsigned i; long ret; - if (reason != TTM_SHRINK_WATERMARK && !bdev->funcs->bo_shrink) + if (reason != TTM_SHRINK_WATERMARK && + (!bdev->funcs->bo_shrink || !ttm_pool_can_shrink(&bdev->pool))) return 0; spin_lock(&bdev->lru_lock); diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index a68c14de0161..771e5f3c2fee 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -54,6 +54,21 @@ module_param_named(dma32_pages_limit, ttm_dma32_pages_limit, ulong, 0644); static atomic_long_t ttm_pages_allocated; static atomic_long_t ttm_dma32_pages_allocated; +static bool ttm_tt_shrinkable(const struct ttm_device *bdev, + const struct ttm_tt *tt) +{ + return !!bdev->funcs->bo_shrink && + ttm_pool_can_shrink(&bdev->pool) && + !(tt->page_flags & TTM_TT_FLAG_EXTERNAL); +} + +static void ttm_tt_mod_allocated(bool dma32, long value) +{ + atomic_long_add(value, &ttm_pages_allocated); + if (dma32) + atomic_long_add(value, &ttm_dma32_pages_allocated); +} + /* * Allocates a ttm structure for the given BO. 
*/ @@ -304,12 +319,9 @@ int ttm_tt_populate(struct ttm_device *bdev, if (ttm_tt_is_populated(ttm)) return 0; - if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { - atomic_long_add(ttm->num_pages, &ttm_pages_allocated); - if (bdev->pool.use_dma32) - atomic_long_add(ttm->num_pages, - &ttm_dma32_pages_allocated); - } + if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL) && + !ttm_tt_shrinkable(bdev, ttm)) + ttm_tt_mod_allocated(bdev->pool.use_dma32, ttm->num_pages); while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit || atomic_long_read(&ttm_dma32_pages_allocated) > @@ -343,12 +355,10 @@ int ttm_tt_populate(struct ttm_device *bdev, return 0; error: - if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { - atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); - if (bdev->pool.use_dma32) - atomic_long_sub(ttm->num_pages, - &ttm_dma32_pages_allocated); - } + if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL) && + !ttm_tt_shrinkable(bdev, ttm)) + ttm_tt_mod_allocated(bdev->pool.use_dma32, -(long)ttm->num_pages); + return ret; } EXPORT_SYMBOL(ttm_tt_populate); @@ -363,12 +373,9 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) else ttm_pool_free(&bdev->pool, ttm); - if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { - atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); - if (bdev->pool.use_dma32) - atomic_long_sub(ttm->num_pages, - &ttm_dma32_pages_allocated); - } + if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL) && + !ttm_tt_shrinkable(bdev, ttm)) + ttm_tt_mod_allocated(bdev->pool.use_dma32, -(long)ttm->num_pages); ttm->page_flags &= ~TTM_TT_FLAG_PRIV_POPULATED; } diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h index ef09b23d29e3..c1200552892e 100644 --- a/include/drm/ttm/ttm_pool.h +++ b/include/drm/ttm/ttm_pool.h @@ -89,4 +89,19 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m); int ttm_pool_mgr_init(unsigned long num_pages); void ttm_pool_mgr_fini(void); +/** + * ttm_pool_can_shrink - Whether page allocations from this pool are shrinkable + * @pool: The pool. + * + * Return: true if shrinkable, false if not. + */ +static inline bool ttm_pool_can_shrink(const struct ttm_pool *pool) +{ + /* + * The dma_alloc pool pages can't be inserted into the + * swap cache. Nor can they be split. 
+ */ + return !pool->use_dma_alloc; +} + #endif From patchwork Wed Feb 15 16:13:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141911 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B39B2C636D4 for ; Wed, 15 Feb 2023 16:15:10 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E79B510EB02; Wed, 15 Feb 2023 16:15:09 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1A9F410E236; Wed, 15 Feb 2023 16:15:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477707; x=1708013707; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=o5ahpK5T7/o9yYCKobDBT3y231WpHzebkC3ym7Te37s=; b=bY3xYJDkkGiTCLjg66SSIAXDc/cy9Q7X2ZS5TtSExUb2Q0PFnPD2KT0U DJqt0YLPiES0R38GcHnWaXDS8Lu0xgSZXIq+Nd3QZ0FstIMkKFrmOuIOn LUg7u+8+nQBQuTIWX2SK28PP+7SCmQ9w+ipmb+zVsE9l+z5Q7jEfvfk32 Hdym+XuGrb/go3iVItTWYLEz2HHepPJxBgbXOUtC6T3YnUQO4ZX0spcOm yCD5xFM9PSvKZYZDUbfQRrIi/CCWZov2XLoAP4jdKwXtlyLqctwOttByA LdLTOfUVn1+d+DlHCKRUkvd2WuCc9DN+AXR+MESX+mXtzZwcmgqX6A6eZ Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393870897" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393870897" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:05 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472414" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472414" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:00 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:13:56 +0100 Message-Id: <20230215161405.187368-8-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" When swapping out, we will split multi-order pages both in order to move them to the swap-cache and to be able to return memory to the swap cache as soon as possible on a page-by-page basis. 
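As a concrete example (assuming 4 KiB base pages), a single order-9, 2 MiB pool allocation is split into 512 order-0 pages, so each page can enter the swap cache and be written back and freed individually rather than having to wait for the whole 2 MiB chunk.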
By reducing the page max order to the system PMD size, we can be nicer to the system and avoid splitting gigantic pages. On top of this we also include the 64K page size in the page sizes tried, since that appears to be a common size for GPU applications. Looking forward to when we might be able to swap out PMD size folios without splitting, this will also be a benefit. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_pool.c | 58 ++++++++++++++++++++++++++-------- 1 file changed, 45 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index 1cc7591a9542..8787fb6a218b 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -31,6 +31,8 @@ * cause they are rather slow compared to alloc_pages+map. */ +#define pr_fmt(fmt) "[TTM POOL] " fmt + #include #include #include @@ -47,6 +49,18 @@ #include "ttm_module.h" +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT) +#define TTM_64K_ORDER (16 - PAGE_SHIFT) +#if (TTM_MAX_ORDER < TTM_64K_ORDER) +#undef TTM_MAX_ORDER +#define TTM_MAX_ORDER TTM_64K_ORDER +#endif +#if ((MAX_ORDER - 1) < TTM_MAX_ORDER) +#undef TTM_MAX_ORDER +#define TTM_MAX_ORDER (MAX_ORDER - 1) +#endif +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1) + /** * struct ttm_pool_dma - Helper object for coherent DMA mappings * @@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644); static atomic_long_t allocated_pages; -static struct ttm_pool_type global_write_combined[MAX_ORDER]; -static struct ttm_pool_type global_uncached[MAX_ORDER]; +static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER]; +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER]; -static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER]; -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER]; +static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER]; +static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER]; static spinlock_t shrinker_lock; static struct list_head shrinker_list; static struct shrinker mm_shrinker; +static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0}; + /* Allocate pages of size 1 << order with the given gfp_flags */ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags, unsigned int order) @@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt, } } +static unsigned int ttm_pool_select_order(unsigned int order, pgoff_t num_pages) +{ + unsigned int *cur_order = ttm_pool_orders; + + order = min_t(unsigned int, __fls(num_pages), order); + while (order < *cur_order) + ++cur_order; + + return *cur_order; +} + /** * ttm_pool_alloc - Fill a ttm_tt object * @@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, else gfp_flags |= GFP_HIGHUSER; - for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages)); - num_pages; - order = min_t(unsigned int, order, __fls(num_pages))) { + order = ttm_pool_select_order(ttm_pool_orders[0], num_pages); + for (; num_pages; order = ttm_pool_select_order(order, num_pages)) { struct ttm_pool_type *pt; page_caching = tt->caching; @@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev, if (use_dma_alloc) { for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) - for (j = 0; j < MAX_ORDER; ++j) + for (j = 0; j < TTM_DIM_ORDER; ++j) ttm_pool_type_init(&pool->caching[i].orders[j], pool, i, j); } @@ -578,7 +604,7 @@ void ttm_pool_fini(struct ttm_pool *pool) if (pool->use_dma_alloc) { for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) - for (j 
= 0; j < MAX_ORDER; ++j) + for (j = 0; j < TTM_DIM_ORDER; ++j) ttm_pool_type_fini(&pool->caching[i].orders[j]); } @@ -632,7 +658,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m) unsigned int i; seq_puts(m, "\t "); - for (i = 0; i < MAX_ORDER; ++i) + for (i = 0; i < TTM_DIM_ORDER; ++i) seq_printf(m, " ---%2u---", i); seq_puts(m, "\n"); } @@ -643,7 +669,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt, { unsigned int i; - for (i = 0; i < MAX_ORDER; ++i) + for (i = 0; i < TTM_DIM_ORDER; ++i) seq_printf(m, " %8u", ttm_pool_type_count(&pt[i])); seq_puts(m, "\n"); } @@ -749,10 +775,16 @@ int ttm_pool_mgr_init(unsigned long num_pages) if (!page_pool_size) page_pool_size = num_pages; + if (TTM_64K_ORDER < TTM_MAX_ORDER) + ttm_pool_orders[1] = TTM_64K_ORDER; + + pr_debug("Used orders are %u %u %u\n", ttm_pool_orders[0], + ttm_pool_orders[1], ttm_pool_orders[2]); + spin_lock_init(&shrinker_lock); INIT_LIST_HEAD(&shrinker_list); - for (i = 0; i < MAX_ORDER; ++i) { + for (i = 0; i < TTM_DIM_ORDER; ++i) { ttm_pool_type_init(&global_write_combined[i], NULL, ttm_write_combined, i); ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i); @@ -785,7 +817,7 @@ void ttm_pool_mgr_fini(void) { unsigned int i; - for (i = 0; i < MAX_ORDER; ++i) { + for (i = 0; i < TTM_DIM_ORDER; ++i) { ttm_pool_type_fini(&global_write_combined[i]); ttm_pool_type_fini(&global_uncached[i]); From patchwork Wed Feb 15 16:13:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141912 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CBD51C64EC4 for ; Wed, 15 Feb 2023 16:15:15 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 41FEF10E236; Wed, 15 Feb 2023 16:15:15 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id B657410EAFB; Wed, 15 Feb 2023 16:15:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477712; x=1708013712; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=n4ovsEqlT6l3tqJByLmbZrFyXkrQBTKpHU8ibleASdU=; b=QwX0o0xa5h930OFKF6BmWF0uNc+EHDRASzlTBRvQLvmimmCr5CcdRJ5x RgbnN27FBD5IienO8nVDVEv9TGqPHMnglT9GXRUDX3m3JMccgeAHQDk36 +R3YDa+eYvaXEmKcKcTJ5vJtMMz/zK6CH+Lelcb5yjutT0o5hg+HtWKYs pl3w79GV7KE9K5/H4QIf32eWez4hlQne0elczF7AzdXnss9JfRPuOIiYN rXrz5YJEUwTUttuHO/t4w342kOprzIChYIQnyc134WdiP4rLoixaBcgrv r/AVUtC7Apnp7AiurA1pGIcL1iEL6B1cRG8zCvBpl5c1k/1mxlK4whie6 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393870932" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393870932" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:10 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472471" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472471" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:06 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:13:57 +0100 Message-Id: <20230215161405.187368-9-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 08/16] drm/ttm: Add a shrinker and shrinker accounting X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Register a TTM system memory-backed object shrinker and add accounting for shrinkable and purgeable pages. For the shrinker to work, the driver needs to register the bo_shrink callback which is responsible for unbinding from GPU and the dma layer if needed. Helpers for that callback to actually perform shrinking will be introduced in upcoming patches. Note that we can't lock the ttm_global_mutex from within the shrinker scan() function as that might cause a deadlock issue. To fix that, add and use a mutex which is used for global device list manipulation only and make sure it isn't held when registering the shrinker. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_device.c | 26 ++++--- drivers/gpu/drm/ttm/ttm_tt.c | 112 +++++++++++++++++++++++++++++-- include/drm/ttm/ttm_tt.h | 2 + 3 files changed, 125 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index e0a2be3ed13d..ce98752d2d32 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -36,10 +36,10 @@ #include "ttm_module.h" -/* - * ttm_global_mutex - protecting the global state - */ +/* ttm_global_mutex - protects the global state init and fini. */ static DEFINE_MUTEX(ttm_global_mutex); +/* ttm_global_list_mutex - protects the device list. 
*/ +static DEFINE_MUTEX(ttm_global_list_mutex); static unsigned ttm_glob_use_count; struct ttm_global ttm_glob; EXPORT_SYMBOL(ttm_glob); @@ -54,6 +54,7 @@ static void ttm_global_release(void) if (--ttm_glob_use_count > 0) goto out; + ttm_tt_mgr_fini(); ttm_pool_mgr_fini(); debugfs_remove(ttm_debugfs_root); @@ -102,7 +103,10 @@ static int ttm_global_init(void) goto out; } + mutex_lock(&ttm_global_list_mutex); INIT_LIST_HEAD(&glob->device_list); + mutex_unlock(&ttm_global_list_mutex); + atomic_set(&glob->bo_count, 0); debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root, @@ -135,7 +139,7 @@ long ttm_global_swapout(struct ttm_operation_ctx *ctx, struct ttm_device *bdev; long ret = 0; - mutex_lock(&ttm_global_mutex); + mutex_lock(&ttm_global_list_mutex); list_for_each_entry(bdev, &glob->device_list, device_list) { ret = ttm_device_swapout(bdev, ctx, reason); if (ret > 0) { @@ -143,7 +147,7 @@ long ttm_global_swapout(struct ttm_operation_ctx *ctx, break; } } - mutex_unlock(&ttm_global_mutex); + mutex_unlock(&ttm_global_list_mutex); return ret; } @@ -247,9 +251,9 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs, spin_lock_init(&bdev->lru_lock); INIT_LIST_HEAD(&bdev->pinned); bdev->dev_mapping = mapping; - mutex_lock(&ttm_global_mutex); + mutex_lock(&ttm_global_list_mutex); list_add_tail(&bdev->device_list, &glob->device_list); - mutex_unlock(&ttm_global_mutex); + mutex_unlock(&ttm_global_list_mutex); return 0; } @@ -260,14 +264,14 @@ void ttm_device_fini(struct ttm_device *bdev) struct ttm_resource_manager *man; unsigned i; + mutex_lock(&ttm_global_list_mutex); + list_del(&bdev->device_list); + mutex_unlock(&ttm_global_list_mutex); + man = ttm_manager_type(bdev, TTM_PL_SYSTEM); ttm_resource_manager_set_used(man, false); ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL); - mutex_lock(&ttm_global_mutex); - list_del(&bdev->device_list); - mutex_unlock(&ttm_global_mutex); - drain_workqueue(bdev->wq); destroy_workqueue(bdev->wq); diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 771e5f3c2fee..5a57117c21ec 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include "ttm_module.h" @@ -54,6 +55,11 @@ module_param_named(dma32_pages_limit, ttm_dma32_pages_limit, ulong, 0644); static atomic_long_t ttm_pages_allocated; static atomic_long_t ttm_dma32_pages_allocated; +static long shrinkable_pages; +static long purgeable_pages; +static DEFINE_RWLOCK(shrinkable_lock); +static struct shrinker mm_shrinker; + static bool ttm_tt_shrinkable(const struct ttm_device *bdev, const struct ttm_tt *tt) { @@ -69,6 +75,14 @@ static void ttm_tt_mod_allocated(bool dma32, long value) atomic_long_add(value, &ttm_dma32_pages_allocated); } +static void ttm_tt_mod_shrinkable_pages(long shrinkable, long purgeable) +{ + write_lock(&shrinkable_lock); + shrinkable_pages += shrinkable; + purgeable_pages += purgeable; + write_unlock(&shrinkable_lock); +} + /* * Allocates a ttm structure for the given BO. 
*/ @@ -352,6 +366,9 @@ int ttm_tt_populate(struct ttm_device *bdev, } } + if (ttm_tt_shrinkable(bdev, ttm)) + ttm_tt_mod_shrinkable_pages(ttm->num_pages, 0); + return 0; error: @@ -368,6 +385,13 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) if (!ttm_tt_is_populated(ttm)) return; + if (ttm_tt_shrinkable(bdev, ttm)) { + if (ttm_tt_purgeable(ttm)) + ttm_tt_mod_shrinkable_pages(0, -(long)ttm->num_pages); + else + ttm_tt_mod_shrinkable_pages(-(long)ttm->num_pages, 0); + } + if (bdev->funcs->ttm_tt_unpopulate) bdev->funcs->ttm_tt_unpopulate(bdev, ttm); else @@ -394,11 +418,86 @@ DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink); #endif +static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink, + struct shrink_control *sc) +{ + unsigned long num_pages; -/* - * ttm_tt_mgr_init - register with the MM shrinker - * - * Register with the MM shrinker for swapping out BOs. + num_pages = get_nr_swap_pages(); + read_lock(&shrinkable_lock); + num_pages = min_t(unsigned long, num_pages, shrinkable_pages); + num_pages += purgeable_pages; + read_unlock(&shrinkable_lock); + + return num_pages ? num_pages : SHRINK_EMPTY; +} + +static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink, + struct shrink_control *sc) +{ + bool is_kswapd = current_is_kswapd(); + struct ttm_operation_ctx ctx = { + .interruptible = false, + .no_wait_gpu = !is_kswapd, + }; + unsigned long nr_to_scan, freed = 0; + long ret; + + sc->nr_scanned = 0; + nr_to_scan = sc->nr_to_scan; + + while (freed < nr_to_scan) { + ret = ttm_global_swapout(&ctx, TTM_SHRINK_PURGE); + if (ret <= 0) + break; + + freed += ret; + } + + sc->nr_scanned = freed; + if (freed < nr_to_scan) + nr_to_scan -= freed; + else + nr_to_scan = 0; + if (!nr_to_scan) + return freed ? freed : SHRINK_STOP; + + while (freed < nr_to_scan) { + ret = ttm_global_swapout(&ctx, TTM_SHRINK_SWAP); + if (ret <= 0) + break; + + freed += ret; + } + + sc->nr_scanned = freed; + + return freed ? freed : SHRINK_STOP; +} + +/** + * ttm_tt_mgr_fini() - Check shrinkable accounting consistensy and remove + * the shrinker. + */ +void ttm_tt_mgr_fini(void) +{ + if (WARN_ON_ONCE(atomic_long_read(&ttm_pages_allocated) || + atomic_long_read(&ttm_dma32_pages_allocated) || + shrinkable_pages || purgeable_pages)) { + pr_warn("Inconsistent ttm_tt accounting:\n"); + pr_warn("pages %ld dma32 %ld shrinkable %ld purgeable %ld\n", + atomic_long_read(&ttm_pages_allocated), + atomic_long_read(&ttm_dma32_pages_allocated), + shrinkable_pages, purgeable_pages); + } + + unregister_shrinker(&mm_shrinker); +} + +/** + * ttm_tt_mgr_init() - Provide watermark limits and register the shrinker. + * @num_pages - Number of pages TTM is allowed to pin. + * @num_dma32_pages - Number of dma32 pages TTM is allowed to pin. 
*/ void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages) { @@ -412,6 +511,11 @@ void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages) if (!ttm_dma32_pages_limit) ttm_dma32_pages_limit = num_dma32_pages; + + mm_shrinker.count_objects = ttm_tt_shrinker_count; + mm_shrinker.scan_objects = ttm_tt_shrinker_scan; + mm_shrinker.seeks = DEFAULT_SEEKS; + (void)register_shrinker(&mm_shrinker, "ttm-objects"); } static void ttm_kmap_iter_tt_map_local(struct ttm_kmap_iter *iter, diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index 627168eba8f6..3f99787e2b93 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -221,6 +221,8 @@ static inline void ttm_tt_mark_for_clear(struct ttm_tt *ttm) void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages); +void ttm_tt_mgr_fini(void); + struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt, struct ttm_tt *tt); From patchwork Wed Feb 15 16:13:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141913 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6545EC64EC4 for ; Wed, 15 Feb 2023 16:15:22 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id CB9AF10EB07; Wed, 15 Feb 2023 16:15:21 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 271B410E269; Wed, 15 Feb 2023 16:15:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477718; x=1708013718; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CA6Gd5JNe0QIjwqEaLurGUUyFsS+XKhHwV/GCIHG1zw=; b=AakKCsX99plUdfGH8SbAL292jriIjFAAB/B8NEXd3K8m9/Tprp06fDRS gpIRR1Bz91mGVJgZWspbGoGd7mB086ejEfJuwjzvkp0cn7tG1OsAIpXMQ uoQshIlHSbVCqBe/R+ks9P5A0zCu4OHYzK516CmgZbht6Y4/2+PT1zx6Z GTDsVnd1VjNl4TNfFcMMKM62h3YJ2etqvJkGaUQEtFNYWIfiPazu/Zdel KXo+CPaTQduRuSvwO4E2Z2GdvCtkEtoIhqHjAEt7kUSrEm0INu9KBB7g6 idi0YaUBLvGBz+m09MxDWAw1GlIavmFAGm4okEth8HxKI0QuR6Yk5dfGX Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393870966" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393870966" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:15 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472509" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472509" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:10 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:13:58 +0100 Message-Id: <20230215161405.187368-10-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] 
[RFC PATCH 09/16] drm/ttm: Introduce shrink throttling. X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since pages are not immediately freed by the TTM shrinker but rather inserted into the swap cache, the system will keep on calling the shrinker rapidly filling the swap cache which has a negative impact on system performance. When shrinking, throttle on the number of pages present in the swap cache. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_tt.c | 40 ++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 5a57117c21ec..848adf2a623e 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -432,6 +432,42 @@ static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink, return num_pages ? num_pages : SHRINK_EMPTY; } +#define TTM_SWAP_MIN_SWAP_PAGES (SZ_128M >> PAGE_SHIFT) +#define TTM_SWAP_MAX_SWAPCACHE_PAGES (SZ_1G >> PAGE_SHIFT) +static unsigned long ttm_tt_shrinker_throttle(unsigned long pages) +{ + unsigned long + tmp = get_nr_swap_pages(); + + /* + * Draining available swap space too far will trigger + * systemd-oomd even if there are a huge number of dirty pages + * available for laundry and free in the swap cache. Don't drain + * the available swap-space too far. + */ + if (tmp > TTM_SWAP_MIN_SWAP_PAGES) + tmp -= TTM_SWAP_MIN_SWAP_PAGES; + else + tmp = 0; + + pages = min(tmp, pages); + + /* + * Our shrinker doesn't immediately free pages unless they belong + * to purgeable objects. Rather they are inserted into the swap-cache. + * But the system doesn't really get this and continues to call our + * shrinker thinking it's still out of memory, when it could just + * laundry pages in the swap cache and free them. So throttle on the + * number of pages in the swap cache. + */ + + tmp = total_swapcache_pages(); + if (tmp > TTM_SWAP_MAX_SWAPCACHE_PAGES) + pages = 0; + + return pages; +} + static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink, struct shrink_control *sc) { @@ -459,6 +495,10 @@ static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink, nr_to_scan -= freed; else nr_to_scan = 0; + + if (nr_to_scan) + nr_to_scan = ttm_tt_shrinker_throttle(nr_to_scan); + if (!nr_to_scan) return freed ? 
freed : SHRINK_STOP; From patchwork Wed Feb 15 16:13:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141914 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2C67AC636D7 for ; Wed, 15 Feb 2023 16:15:25 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 71C5F10EB08; Wed, 15 Feb 2023 16:15:24 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id D7D2210EB07; Wed, 15 Feb 2023 16:15:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477720; x=1708013720; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+J9Q8IYoZ1+1On4d1sDlco0YPYtSi+y9zkG53x3D21Y=; b=jBbsypqT1kYJ93s/K0e2PAYMWNW5wQ+AFUt73L84PxBN2FPb7S8B3VXw iYSBcwE7lPjE6xBuDsqUtWuIFVFx/jbxpyOw3efgDn3ExOvXtiFTkQhow 7w6VyCDr7e9KZb7t+pd3FAjs+fZXjUEn2x7rDzZ8j7lQ0mm2YpT2y8fLk F1V9PyDGWVLGMJg0UaI3/+EB9Tz5w5RZYpTOBCH+ax/jyUTP9dXNhmN8Z VjUKb7BzJOgbvnNIjRiLJou2e2D8na2dJaHia5pbW0wi3YWVgNilkJc9u XGaYGb/Kg+YwvMrvNuKivYBqqED+eXAzQ3m3suEISgGCKt5fP5JvDaiKY A==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393871016" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393871016" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:20 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472532" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472532" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:15 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:13:59 +0100 Message-Id: <20230215161405.187368-11-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 10/16] drm/ttm: Remove pinned bos from shrinkable accounting X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Pinned bos aren't shinkable and needs to be removed from the shrinkable accounting. Do that, and in the process constify the tt argument to ttm_tt_is_populated. 
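For illustration only, a minimal sketch of the intended accounting transitions. It assumes just the ttm_tt_set_pinned()/ttm_tt_set_unpinned() helpers introduced by this patch; the sketch function names are hypothetical, and the locking plus the kref/pin_count warnings of the real ttm_bo_pin()/ttm_bo_unpin() changes below are left out:

static void pin_accounting_sketch(struct ttm_buffer_object *bo)
{
	/* 0 -> 1 pin transition: the tt pages stop being shrinkable. */
	if (!bo->pin_count && bo->ttm)
		/* if shrinkable && populated: shrinkable_pages -= tt->num_pages */
		ttm_tt_set_pinned(bo->bdev, bo->ttm);
	++bo->pin_count;
}

static void unpin_accounting_sketch(struct ttm_buffer_object *bo)
{
	/* 1 -> 0 pin transition: the tt pages become shrinkable again. */
	if (bo->pin_count == 1 && bo->ttm)
		/* if shrinkable && populated: shrinkable_pages += tt->num_pages */
		ttm_tt_set_unpinned(bo->bdev, bo->ttm);
	--bo->pin_count;
}

The point of keying the helpers to the outermost pin and the final unpin is that nested pins neither double-subtract nor double-add to the shrinkable page count.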
Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_bo.c | 7 +++++++ drivers/gpu/drm/ttm/ttm_tt.c | 22 ++++++++++++++++++++++ include/drm/ttm/ttm_tt.h | 6 +++++- 3 files changed, 34 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index e5c0970564c0..e59e2a4605d0 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -650,6 +650,10 @@ void ttm_bo_pin(struct ttm_buffer_object *bo) { dma_resv_assert_held(bo->base.resv); WARN_ON_ONCE(!kref_read(&bo->kref)); + + if (!bo->pin_count && bo->ttm) + ttm_tt_set_pinned(bo->bdev, bo->ttm); + spin_lock(&bo->bdev->lru_lock); if (bo->resource) ttm_resource_del_bulk_move(bo->resource, bo); @@ -671,6 +675,9 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo) if (WARN_ON_ONCE(!bo->pin_count)) return; + if (bo->pin_count == 1 && bo->ttm) + ttm_tt_set_unpinned(bo->bdev, bo->ttm); + spin_lock(&bo->bdev->lru_lock); --bo->pin_count; if (bo->resource) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 848adf2a623e..a39c617c7a8e 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -83,6 +83,28 @@ static void ttm_tt_mod_shrinkable_pages(long shrinkable, long purgeable) write_unlock(&shrinkable_lock); } +/** + * ttm_tt_set_pinned() - Modify the shinkable accounting when pinning a bo. + * @bdev: The TTM device. + * @tt: The struct tt_tt used by the pinned bo. + */ +void ttm_tt_set_pinned(const struct ttm_device *bdev, const struct ttm_tt *tt) +{ + if (ttm_tt_shrinkable(bdev, tt) && ttm_tt_is_populated(tt)) + ttm_tt_mod_shrinkable_pages(-(long)tt->num_pages, 0); +} + +/** + * ttm_tt_set_unpinned() - Modify the shinkable accounting when unpinning a bo. + * @bdev: The TTM device. + * @tt: The struct tt_tt used by the no longer pinned bo. + */ +void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt *tt) +{ + if (ttm_tt_shrinkable(bdev, tt) && ttm_tt_is_populated(tt)) + ttm_tt_mod_shrinkable_pages(tt->num_pages, 0); +} + /* * Allocates a ttm structure for the given BO. 
*/ diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index 3f99787e2b93..69467671c2dd 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -118,7 +118,7 @@ struct ttm_kmap_iter_tt { pgprot_t prot; }; -static inline bool ttm_tt_is_populated(struct ttm_tt *tt) +static inline bool ttm_tt_is_populated(const struct ttm_tt *tt) { return tt->page_flags & TTM_TT_FLAG_PRIV_POPULATED; } @@ -238,6 +238,10 @@ static inline bool ttm_tt_purgeable(struct ttm_tt *tt) return tt->page_flags & TTM_TT_FLAG_DONTNEED; } +void ttm_tt_set_pinned(const struct ttm_device *bdev, const struct ttm_tt *tt); + +void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt *tt); + #if IS_ENABLED(CONFIG_AGP) #include From patchwork Wed Feb 15 16:14:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141915 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B6556C64EC4 for ; Wed, 15 Feb 2023 16:15:42 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 942D410EB18; Wed, 15 Feb 2023 16:15:30 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0F26810E580; Wed, 15 Feb 2023 16:15:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477726; x=1708013726; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xm8ETaxZeJqXVpmDZaBdgzwHZhDfpl++98/mmYFZhMY=; b=MdTDt4G+Cts9G6KE9KowH6y9E/0q8fH4Xnow0VqszvAWal9rZKgHOzNo XTpi8rCOHvEziQAswHTXjAMcA7NJJtKWsiUV2Trn2xzl8JKmpVLSt0sBM q9jjDrja+z46SFncSAXzYNA/CultyxRseIVy1Wa2+r3XI+FQuTLJAjchX RxguWk/4q2qatmrcNT+w777bNmfDB/rjgeunGEwrqoPAiqvzsH0fhQ+oX 3L4ydiPBnAzFNfFMCrWnOi8aYDFNe9+aDfrQnzXLolz+Y+JBoFV0EutFw E34gkBZAMUcmAlNRYeXiwtzyWCmVro1XFqxfwiJDWcCjlerHEXVjKJTlp Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393871053" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393871053" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:25 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472576" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472576" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:20 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:14:00 +0100 Message-Id: <20230215161405.187368-12-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 11/16] drm/ttm: Add a simple api to set / clear purgeable ttm_tt content X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & 
development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" In the absence of free swap space, a shrinker could still efficiently free memory the content of which is no longer needed, and graphics drivers typically has an interface to mark buffer object content as no longer needed. Add a possibility to propagate this to TTM, so that the shrinker accounting and shrinker actions can be updated accordingly. Moving forward, we will probably want this interface on the bo level and have bo move support for it, but for now we strictly only need it for the shrinker. Another option would be to have the drivers do the purgeable vs shrinkable accounting. This still leaves the responsibility to the driver to assign proper LRU priority to purgeable buffer object so that the shrinker finds those objects early during LRU traversal. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_tt.c | 59 ++++++++++++++++++++++++++++++++++++ include/drm/ttm/ttm_tt.h | 3 ++ 2 files changed, 62 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index a39c617c7a8e..c63be8f5ed2a 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -105,6 +105,65 @@ void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt *tt) ttm_tt_mod_shrinkable_pages(tt->num_pages, 0); } +/** + * ttm_tt_set_dontneed() - Mark ttm_tt content as not needed. + * @bdev: The ttm device. + * @tt: The struct ttm_tt. + * + * Mark the ttm_tt content as not needed for the shrinker accounting. + * This also means that the content will not be backed up on shrinking, + * but rather freed immediately. + * + * Return: 0 if successful, -EALREADY if content was never present or + * already backed up and was purged by this call. + */ +int ttm_tt_set_dontneed(const struct ttm_device *bdev, struct ttm_tt *tt) +{ + if (ttm_tt_is_populated(tt)) { + if (!ttm_tt_purgeable(tt)) { + tt->page_flags |= TTM_TT_FLAG_DONTNEED; + if (ttm_tt_shrinkable(bdev, tt)) + ttm_tt_mod_shrinkable_pages(-(long)tt->num_pages, + tt->num_pages); + } + return 0; + } + + if (tt->swap_storage) + fput(tt->swap_storage); + tt->swap_storage = NULL; + + return -EALREADY; +} +EXPORT_SYMBOL(ttm_tt_set_dontneed); + +/** + * ttm_tt_set_willneed() - Mark tt_tt content as needed. + * @bdev: The ttm device. + * @tt: The struct ttm_tt. + * + * Mark the ttm_tt content as needed and update the shrinker accounting + * accordingly. + * + * Return: 0 if successful, -EALREADY if content was never present or + * was already purged. + */ +int ttm_tt_set_willneed(const struct ttm_device *bdev, struct ttm_tt *tt) +{ + if (ttm_tt_is_populated(tt)) { + if (ttm_tt_purgeable(tt)) { + tt->page_flags &= ~TTM_TT_FLAG_DONTNEED; + if (ttm_tt_shrinkable(bdev, tt)) + ttm_tt_mod_shrinkable_pages(tt->num_pages, + -(long)tt->num_pages); + } + return 0; + } + + return -EALREADY; +} +EXPORT_SYMBOL(ttm_tt_set_willneed); + /* * Allocates a ttm structure for the given BO. 
*/ diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index 69467671c2dd..abb17527f76c 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -241,6 +241,9 @@ static inline bool ttm_tt_purgeable(struct ttm_tt *tt) void ttm_tt_set_pinned(const struct ttm_device *bdev, const struct ttm_tt *tt); void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt *tt); +int ttm_tt_set_dontneed(const struct ttm_device *bdev, struct ttm_tt *tt); + +int ttm_tt_set_willneed(const struct ttm_device *bdev, struct ttm_tt *tt); #if IS_ENABLED(CONFIG_AGP) #include From patchwork Wed Feb 15 16:14:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141916 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0B69CC636CC for ; Wed, 15 Feb 2023 16:15:45 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 751D610EB0B; Wed, 15 Feb 2023 16:15:34 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 34B5B10EB0C; Wed, 15 Feb 2023 16:15:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477731; x=1708013731; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=0GOXh0gS6sWajQ1gHnS/5gVSVEN8A7soG6TR0c6oZTI=; b=gYVRHTJWoISOyS3PvWQZ/noq5kGrXANFEWovJbLKRLziGKsKsm6Cwknh q6gB14PCYW3qGLPvcbMMOclrv00aZqmyu8VSHEyZQi4UmQanc4c3J1HDM Q5HurouLk2KrbgsZWisms4/acyxCBKe8Y2hsig/pckMTlY8bs/op9zqwv pdkeeaQSu3vERhObT9uUdHTj9a+xG2OaptKajJt8GEo8Rjl3lEKXqSKW2 hTz15TonmVwSUK5f3BPY6rgEk3mHs/Ijbn8GbidXh7xBhsa/H9aLA4WQ3 A8LIcJYyMqgf/ZCTxnVSYAABohC4A4oFReFdcTWqvoKGu5ons3tERZUaE A==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393871091" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393871091" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:30 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472640" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472640" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:25 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:14:01 +0100 Message-Id: <20230215161405.187368-13-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 12/16] mm: Add interfaces to back up and recover folio contents using swap X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David 
Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, Dave Hansen , "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" GPU drivers have traditionally used shmem to back up GPU buffer contents for swap on physical memory shortage. Some integrated GPU drivers use shmem files as the backing storage for their GPU buffers, other drivers, in particular drivers that need a Write-Combining caching strategy on system pages, (but also drivers for discrete gpus in general) need to copy to shmem on anticipated memory shortage. The latter strategy does not lend itself very well to shrinker usage, since shmem memory needs to be allocated and page trylocking of pagecache pages need to be performed from reclaim context and both are prone to failures. That makes the approach very fragile at best. Add interfaces for GPU drivers to directly insert pages into the swap-cache, thereby bypassing shmem and avoiding the shmem page allocation and locking at shrink time completely, as well as the content copy. Also add a kunit test for experimenting with the interface functionality, currently it seems PMD size folios doesn't work properly. Needs further investigation if this is a viable approach. Cc: Andrew Morton Cc: "Matthew Wilcox (Oracle)" Cc: Miaohe Lin Cc: David Hildenbrand Cc: Johannes Weiner Cc: Peter Xu Cc: NeilBrown Cc: linux-mm@kvack.org Signed-off-by: Thomas Hellström --- include/linux/swap.h | 10 ++ mm/Kconfig | 18 ++++ mm/Makefile | 2 + mm/swap_backup_folio.c | 178 ++++++++++++++++++++++++++++++++++++ mm/swap_backup_folio_test.c | 111 ++++++++++++++++++++++ 5 files changed, 319 insertions(+) create mode 100644 mm/swap_backup_folio.c create mode 100644 mm/swap_backup_folio_test.c diff --git a/include/linux/swap.h b/include/linux/swap.h index 0ceed49516ad..fc38c72fe9ab 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -706,5 +706,15 @@ static inline bool mem_cgroup_swap_full(struct folio *folio) } #endif +#ifdef CONFIG_SWAP_BACKUP_FOLIO +swp_entry_t swap_backup_folio(struct folio *folio, bool writeback, + gfp_t folio_gfp, gfp_t alloc_gfp); + +int swap_copy_folio(swp_entry_t swap, struct page *page, unsigned long index, + bool killable); + +void swap_drop_folio(swp_entry_t swap); +#endif + #endif /* __KERNEL__*/ #endif /* _LINUX_SWAP_H */ diff --git a/mm/Kconfig b/mm/Kconfig index ff7b209dec05..b9e0a40e9e1a 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -191,6 +191,10 @@ config ZSMALLOC_STAT information to userspace via debugfs. If unsure, say N. +config SWAP_BACKUP_FOLIO + bool + default n + menu "SLAB allocator options" choice @@ -1183,6 +1187,20 @@ config LRU_GEN_STATS This option has a per-memcg and per-node memory overhead. # } +config SWAP_BACKUP_FOLIO_KUNIT_TEST + tristate "KUnit tests for swap_backup_folio() functionality" if !KUNIT_ALL_TESTS + depends on SWAP && KUNIT && SWAP_BACKUP_FOLIO + help + This builds unit tests for the swap_backup_folio_functionality(). + This option is not useful for distributions or general kernels, + but only for kernel developers working on MM swap functionality. + + For more information on KUnit and unit tests in general, + please refer to the KUnit documentation in + Documentation/dev-tools/kunit/. + + If in doubt, say "N". 
+ source "mm/damon/Kconfig" endmenu diff --git a/mm/Makefile b/mm/Makefile index 8e105e5b3e29..91cb9c73e16e 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -138,3 +138,5 @@ obj-$(CONFIG_IO_MAPPING) += io-mapping.o obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o +obj-$(CONFIG_SWAP_BACKUP_FOLIO) += swap_backup_folio.o +obj-$(CONFIG_SWAP_BACKUP_FOLIO_KUNIT_TEST) += swap_backup_folio_test.o diff --git a/mm/swap_backup_folio.c b/mm/swap_backup_folio.c new file mode 100644 index 000000000000..f77ca478e625 --- /dev/null +++ b/mm/swap_backup_folio.c @@ -0,0 +1,178 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include + +#include +#include "swap.h" + +/** + * swap_backup_folio() - Insert an isolated folio into the swap-cache. + * @folio: The folio to insert. + * @writeback: Whether to perform immediate writeback. + * @folio_gfp: The gfp value used when the folio was allocated. Used for + * cgroup charging only. + * @alloc_fgp: The gfp value used for swap cache radix tree memory allocations. + * + * Insert a folio into the swap cache and get a swp_entry_t back as a reference. + * If the swap cache folio should be subject of immediate writeback to + * a swap device, @writeback should be set to true. + * After a call to swap_backup_folio() the caller can + * drop its folio reference and use swap_copy_folio() to get the folio + * content back, or swap_drop_folio() to drop it completely. + * Currently only PAGE_SIZE folios work, or if CONFIG_THP_SWAP is + * enabled, HPAGE_PMD_NR*PAGE_SIZE may work as well, although that + * needs further testing. + * + * Return: A swp_entry_t. If its .val field is zero, an error occurred. + */ +swp_entry_t swap_backup_folio(struct folio *folio, bool writeback, + gfp_t folio_gfp, gfp_t alloc_gfp) +{ + swp_entry_t swap = {}; + + if (VM_WARN_ON_ONCE_FOLIO(folio_nr_pages(folio) != 1 && + !(IS_ENABLED(CONFIG_THP_SWAP) && + folio_nr_pages(folio) == HPAGE_PMD_NR), + folio)) + return swap; + + if (VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio) != 1 || + folio_test_lru(folio) || + folio_test_locked(folio), folio)) + return swap; + + /* + * Typically called from reclaim so use folio_trylock. If the folio + * is isolated with refcount == 1, then this trylock should always + * succeed. + */ + if (!folio_trylock(folio)) + return swap; + + __folio_mark_uptodate(folio); + __folio_set_swapbacked(folio); + + mem_cgroup_charge(folio, NULL, folio_gfp); + + swap = folio_alloc_swap(folio); + if (!swap.val) + goto out; + + if (add_to_swap_cache(folio, swap, alloc_gfp, NULL) == 0) { + int ret = -EINVAL; + + swap_shmem_alloc(swap); + folio_add_lru(folio); + lru_add_drain(); + + /* Stolen from pageout(). */ + if (writeback && folio_clear_dirty_for_io(folio)) { + struct writeback_control wbc = { + .sync_mode = WB_SYNC_NONE, + .nr_to_write = SWAP_CLUSTER_MAX, + .range_start = 0, + .range_end = LLONG_MAX, + .for_reclaim = 1, + }; + + folio_set_reclaim(folio); + ret = swap_writepage(folio_page(folio, 0), &wbc); + if (!folio_test_writeback(folio)) + folio_clear_reclaim(folio); + } + + if (ret) + folio_unlock(folio); + return swap; + } + + put_swap_folio(folio, swap); +out: + folio_clear_swapbacked(folio); + folio_mark_dirty(folio); + folio_unlock(folio); + mem_cgroup_uncharge(folio); + + return swap; +} +EXPORT_SYMBOL(swap_backup_folio); + +/** + * swap_copy_folio() - Copy folio content that was previously backed up + * @swap: The swp_entry_t returned from swap_backup_folio(). 
+ * @to_page: The page to copy to. + * @index: The index to the source page in the folio represented by @swap. + * @killable: Whether to perform sleeping operations killable. + * + * Copies content that was previously backed up using swap_backup_folio(), + * to the destination page to_page. The swp_entry_t @swap is not freed, and + * copying can thus be done multiple times using @swap. + * + * Return: Zero on success, negative error code on error. In particular, + * -EINTR may be returned if a fatal signal is pending during wait for + * page-lock or wait for writeback and @killable is set to true. + */ +int swap_copy_folio(swp_entry_t swap, struct page *to_page, + unsigned long index, bool killable) +{ + struct folio *folio = swap_cache_get_folio(swap, NULL, 0); + int ret; + + if (!folio) { + struct vm_fault vmf = {}; + struct page *page; + + page = swap_cluster_readahead(swap, GFP_HIGHUSER_MOVABLE, &vmf); + if (page) + folio = page_folio(page); + } + + if (!folio) + return -ENOMEM; + + if (killable) { + ret = __folio_lock_killable(folio); + if (ret) + goto out_err; + } else { + folio_lock(folio); + } + + VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio) || + folio_swap_entry(folio).val != swap.val || + !folio_test_uptodate(folio), folio); + + if (killable) { + ret = folio_wait_writeback_killable(folio); + if (ret) + goto out_err; + } else { + folio_wait_writeback(folio); + } + + arch_swap_restore(swap, folio); + folio_unlock(folio); + + copy_highpage(to_page, folio_page(folio, index)); +out_err: + folio_put(folio); + return ret; +} +EXPORT_SYMBOL(swap_copy_folio); + +/** + * swap_drop_folio - Drop a swap entry and its associated swap cache folio + * if any. + * @swap: The swap entry. + * + * Releases resources associated with a swap entry returned from + * swap_backup_folio(). + */ +void swap_drop_folio(swp_entry_t swap) +{ + free_swap_and_cache(swap); +} +EXPORT_SYMBOL(swap_drop_folio); diff --git a/mm/swap_backup_folio_test.c b/mm/swap_backup_folio_test.c new file mode 100644 index 000000000000..34cde56d2a57 --- /dev/null +++ b/mm/swap_backup_folio_test.c @@ -0,0 +1,111 @@ +// SPDX-License-Identifier: MIT or GPL-2.0 +/* + * Copyright © 2022 Intel Corporation + */ + +#include +#include +#include +#include + +struct gpu_swapped_page { + struct list_head link; + swp_entry_t swap; +}; + +static void swap_backup_test(struct kunit *test) +{ + gfp_t gfp = GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL | __GFP_NOWARN; + struct gpu_swapped_page *gsp, *next; + struct folio *folio; + LIST_HEAD(list); + long i = 0L; + long num_folios; + unsigned long avail_ram; + + avail_ram = si_mem_available() << PAGE_SHIFT; + kunit_info(test, "Available RAM is %lu MiB.\n", avail_ram / SZ_1M); + num_folios = get_nr_swap_pages(); + num_folios = min_t(long, num_folios, avail_ram >> PAGE_SHIFT); + + kunit_info(test, "Trying %ld swap pages\n", num_folios); + + do { + /* + * Expect folio_alloc() (out-of-physical-memory) or + * swap_backup_folio() (out-of-swap-space) to fail before + * this kzalloc(). + */ + gsp = kzalloc(sizeof(*gsp), GFP_KERNEL); + if (!gsp) { + KUNIT_FAIL(test, "alloc gsp failed.\n"); + break; + } + + folio = vma_alloc_folio(gfp, 0, NULL, 0, false); + if (!folio) { + kunit_info(test, "folio_alloc failed.\n"); + kfree(gsp); + break; + } + + folio_mark_dirty(folio); + + /* Use true instead of false here to trigger immediate writeback. 
*/ + gsp->swap = swap_backup_folio(folio, false, gfp, + GFP_KERNEL | __GFP_HIGH | + __GFP_NOWARN); + if (gsp->swap.val == 0) { + kunit_info(test, "swap_backup_folio() failed.\n"); + folio_put(folio); + kfree(gsp); + break; + } + + list_add_tail(&gsp->link, &list); + folio_put(folio); + cond_resched(); + if (i % 1000 == 0) + kunit_info(test, "Backed up %ld\n", i); + } while (i++ < num_folios); + + i = 0; + list_for_each_entry_safe(gsp, next, &list, link) { + int ret; + + folio = folio_alloc(GFP_HIGHUSER, 0); + if (!folio) { + KUNIT_FAIL(test, "Allocation of readback folio failed.\n"); + } else { + ret = swap_copy_folio(gsp->swap, folio_page(folio, 0), + 0, false); + if (ret) + KUNIT_FAIL(test, "swap_copy_folio() failed.\n"); + } + folio_put(folio); + swap_drop_folio(gsp->swap); + list_del(&gsp->link); + kfree(gsp); + i++; + cond_resched(); + if (i % 1000 == 0) + kunit_info(test, "Recovered %ld\n", i); + } + + kunit_info(test, "Recover_total: %ld\n", i); +} + +static struct kunit_case swap_backup_tests[] = { + KUNIT_CASE(swap_backup_test), + {} +}; + +static struct kunit_suite swap_backup_test_suite = { + .name = "swap_backup_folio", + .test_cases = swap_backup_tests, +}; + +kunit_test_suite(swap_backup_test_suite); + +MODULE_AUTHOR("Intel Corporation"); +MODULE_LICENSE("Dual MIT/GPL"); From patchwork Wed Feb 15 16:14:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141918 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 97AEDC636CC for ; Wed, 15 Feb 2023 16:16:08 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0628D10EB2F; Wed, 15 Feb 2023 16:15:53 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3AE8010EB14; Wed, 15 Feb 2023 16:15:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477746; x=1708013746; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zt1DKEeSn7xRuS1yXMlaTraoBCdZSI/zSbCQL3fb6rc=; b=e5MetwQttMSnG+a0Xv3t5z1jCwN1J4LnMZeLADkWaxj2j81ftRy8hbMC hXOif9VfO94wbcALNqyCuQ5Byo92CRwl0/ym0v5ERSyrtx3CzSLV2r1JM PDgXvXZD0qVS+xyu91idqxYfj4vvfjiyUtrePXfYBIztv8nMFyKv8B7zx Xp4XQ24mHq1kccbOkDVI9KCrvB8N4H77kVr0YlWE/2Za1d/jXUXI6JbMU mkMwmuT+4SB5OjtF5m0gdMWLe97EbZ2qDwwwQxGq1wfTH9bv73IuYaHuZ /+Qpw4lj8WhOrKiu7CtlGJmS58KOa9jsMd7x61WcMzrxXkDgt0yrSDvCz w==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393871107" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393871107" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:38 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472708" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472708" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:30 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: 
dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:14:02 +0100 Message-Id: <20230215161405.187368-14-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 13/16] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" When swapping in, or under memory pressure ttm_tt_populate() may sleep for a substantiable amount of time. Allow interrupts during the sleep. This will also allow us to inject -EINTR errors during swapin in upcoming patches. Also avoid returning VM_FAULT_OOM, since that will confuse the core mm, making it print out a confused message and retrying the fault. Return VM_FAULT_SIGBUS also under OOM conditions. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 3ecda6db24b8..80f106bfe385 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -218,14 +218,21 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf, prot = ttm_io_prot(bo, bo->resource, prot); if (!bo->resource->bus.is_iomem) { struct ttm_operation_ctx ctx = { - .interruptible = false, + .interruptible = true, .no_wait_gpu = false, .force_alloc = true }; ttm = bo->ttm; - if (ttm_tt_populate(bdev, bo->ttm, &ctx)) - return VM_FAULT_OOM; + err = ttm_tt_populate(bdev, bo->ttm, &ctx); + if (err) { + if (err == -EINTR || err == -ERESTARTSYS || + err == -EAGAIN) + return VM_FAULT_NOPAGE; + + pr_debug("TTM fault hit %pe.\n", ERR_PTR(err)); + return VM_FAULT_SIGBUS; + } } else { /* Iomem should not be marked encrypted */ prot = pgprot_decrypted(prot); From patchwork Wed Feb 15 16:14:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 13141917 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6C450C636CC for ; Wed, 15 Feb 2023 16:15:54 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id F166710EB1A; Wed, 15 Feb 2023 16:15:49 +0000 (UTC) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3D47910EB14; Wed, 15 Feb 2023 16:15:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676477747; x=1708013747; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TcHhg0q7WVCkFTZEWmcu3BUhdxit+IJqyVFdQl4YAf0=; b=RAcbzupHx6khcVcKtGVLn6a4rRlQTn1e6sXl8plRaNaj7/n1E0YNzddk p1lQno4kxf5rShkd3KRlyvQy05hJ5fjf82Jh7/SLXcuhYk5edMUQ9nADL z2YszlB1Y/A3YR1UZRLPo7+ybJ14nI3rEbzsIwmussS4K2ZZIOLBXHM04 txh6RpVk+4Swr87gC9EN4hd/4zcWRzrkOCQekxbFBBCoaInjInD+5kSL3 FE+IhH0JsR/irck4kd+hVystVuI3aHkjjf1t3nwB4NQjjbKUb6J8bdXYi g379P/dbeeewJSOWnyCZ6UT885jjb2j4XH1BXSU6YB9XDA+f5Q/4Ic4jV Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="393871127" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="393871127" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:45 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10622"; a="758472782" X-IronPort-AV: E=Sophos;i="5.97,300,1669104000"; d="scan'208";a="758472782" Received: from auliel-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.14]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Feb 2023 08:15:35 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: dri-devel@lists.freedesktop.org Date: Wed, 15 Feb 2023 17:14:03 +0100 Message-Id: <20230215161405.187368-15-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC PATCH 14/16] drm/ttm: Provide helpers for shrinking X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Provide a helper to be used by the driver bo_shrink() callback to either insert the pages of a struct ttm_tt into the swap-cache or to purge them if the struct ttm_tt is purgeable. For pages with write-combined or uncached linear kernel map, that linear kernel map is first changed to cached. Release pages with as little intermediate memory allocation as possible, however some memory might be allocated during swapout for the swap space radix tree. Due to swapout- or swapin errors, allow partially swapped out struct ttm_tt's, although mark them as swapped out stopping them from being swapped out a second time. More details in the ttm_pool.c DOC section. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/Kconfig | 1 + drivers/gpu/drm/ttm/ttm_pool.c | 403 +++++++++++++++++++++++++++++++-- drivers/gpu/drm/ttm/ttm_tt.c | 34 +++ include/drm/ttm/ttm_pool.h | 4 + include/drm/ttm/ttm_tt.h | 10 + 5 files changed, 437 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index dc0f94f02a82..1efd33411a92 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -196,6 +196,7 @@ source "drivers/gpu/drm/display/Kconfig" config DRM_TTM tristate depends on DRM && MMU + select SWAP_BACKUP_FOLIO help GPU memory management subsystem for devices with multiple GPU memory types. 
Will be enabled automatically if a device driver diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index 8787fb6a218b..319998b4a325 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -38,6 +38,7 @@ #include #include #include +#include #ifdef CONFIG_X86 #include @@ -72,6 +73,32 @@ struct ttm_pool_dma { unsigned long vaddr; }; +/** + * struct ttm_pool_tt_restore - State representing restore from swap. + * @alloced_pages: Total number of already allocated pages for the ttm_tt. + * @restored_pages: Number of (sub) pages restored from swap for this + * chunk of 1 << @order pages. + * @first_page: The ttm page ptr representing for @old_pages[0]. + * @caching_divide: Page pointer where subsequent pages are cached. + * @old_pages: Backup copy of page pointers that were replaced by the new + * page allocation. + * @pool: The pool used for page allocation while restoring. + * @order: The order of the last page allocated while restoring. + * + * Recovery from swap space might fail when we've recovered less than the + * full ttm_tt. In order not to loose any data (yet), keep information + * around that allows us to restart a failed ttm swap-space recovery. + */ +struct ttm_pool_tt_restore { + pgoff_t alloced_pages; + pgoff_t restored_pages; + struct page **first_page; + struct page **caching_divide; + struct page *old_pages[1 << TTM_MAX_ORDER]; + struct ttm_pool *pool; + unsigned int order; +}; + static unsigned long page_pool_size; MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool"); @@ -91,6 +118,23 @@ static struct shrinker mm_shrinker; static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0}; +static struct page *ttm_pool_swap_to_page_ptr(swp_entry_t swap) +{ + return (struct page *)(swap.val << 1 | 1); +} + +static swp_entry_t ttm_pool_page_ptr_to_swap(const struct page *p) +{ + swp_entry_t swap = {.val = ((unsigned long)p) >> 1}; + + return swap; +} + +static bool ttm_pool_page_ptr_is_swap(const struct page *p) +{ + return ((unsigned long)p) & 1; +} + /* Allocate pages of size 1 << order with the given gfp_flags */ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags, unsigned int order) @@ -361,11 +405,99 @@ static unsigned int ttm_pool_page_order(struct ttm_pool *pool, struct page *p) return p->private; } +/* + * To be able to insert single pages into the swap cache directly, + * we need to split multi-order page allocations and make them look + * like single page-allocations. + */ +static void ttm_pool_split_for_swap(struct ttm_pool *pool, struct page *p) +{ + unsigned int order = ttm_pool_page_order(pool, p); + pgoff_t nr; + + if (!order) + return; + + split_page(p, order); + nr = 1UL << order; + while (nr--) + (p++)->private = 0; +} + +/** + * DOC: Partial shrinking and restoration of a struct ttm_tt. + * + * Swapout using swap_backup_folio() and swapin using swap_copy_folio() may fail. + * The former most likely due to lack of swap-space or memory, the latter due + * to lack of memory or because of signal interruption during waits. + * + * Swapout failure is easily handled by using a ttm_tt pages vector that holds + * both swap entries and page pointers. This has to be taken into account when + * restoring such a ttm_tt from swap, and when freeing it while swapped out. + * When restoring, for simplicity, new pages are actually allocated from the + * pool and the contents of any old pages are copied in and then the old pages + * are released. 
+ * + * For swapin failures, the struct ttm_pool_tt_restore holds sufficient state + * to be able to resume an interrupted restore, and that structure is freed once + * the restoration is complete. If the struct ttm_tt is destroyed while there + * is a valid struct ttm_pool_tt_restore attached, that is also properly taken + * care of. + */ + +static bool ttm_pool_restore_valid(const struct ttm_pool_tt_restore *restore) +{ + return restore && restore->restored_pages < (1 << restore->order); +} + +static int ttm_pool_swapin(struct ttm_pool_tt_restore *restore, + struct ttm_operation_ctx *ctx) +{ + unsigned int i, nr = 1 << restore->order; + int ret = 0; + + if (!ttm_pool_restore_valid(restore)) + return 0; + + for (i = restore->restored_pages; i < nr; ++i) { + struct page *p = restore->old_pages[i]; + + if (ttm_pool_page_ptr_is_swap(p)) { + swp_entry_t swap = ttm_pool_page_ptr_to_swap(p); + + if (swap.val == 0) + continue; + + ret = swap_copy_folio(swap, restore->first_page[i], 0, + ctx->interruptible); + if (ret) + break; + + swap_drop_folio(swap); + } else if (p) { + /* + * We could probably avoid splitting the old page + * using clever logic, but ATM we don't care. + */ + ttm_pool_split_for_swap(restore->pool, p); + copy_highpage(restore->first_page[i], p); + __free_pages(p, 0); + } + + restore->restored_pages++; + restore->old_pages[i] = NULL; + cond_resched(); + } + + return ret; +} + /* Called when we got a page, either from a pool or newly allocated */ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order, struct page *p, dma_addr_t **dma_addr, unsigned long *num_pages, - struct page ***pages) + struct page ***pages, + struct ttm_pool_tt_restore *restore) { unsigned int i; int r; @@ -376,6 +508,16 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order, return r; } + if (restore) { + memcpy(restore->old_pages, *pages, + (1 << order) * sizeof(*restore->old_pages)); + memset(*pages, 0, (1 << order) * sizeof(**pages)); + restore->order = order; + restore->restored_pages = 0; + restore->first_page = *pages; + restore->alloced_pages += 1UL << order; + } + *num_pages -= 1 << order; for (i = 1 << order; i; --i, ++(*pages), ++p) **pages = p; @@ -387,32 +529,48 @@ static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt, struct page **caching_divide, enum ttm_caching initial_caching, enum ttm_caching subseq_caching, - pgoff_t num_pages) + pgoff_t start_page, pgoff_t end_page) { enum ttm_caching caching = subseq_caching; - struct page **pages = tt->pages; + struct page **pages = tt->pages + start_page; unsigned int order; pgoff_t i, nr; if (pool && caching_divide) caching = initial_caching; - for (i = 0; i < num_pages; i += nr, pages += nr) { + for (i = start_page; i < end_page; i += nr, pages += nr) { struct ttm_pool_type *pt = NULL; + struct page *p = *pages; if (unlikely(caching_divide == pages)) caching = subseq_caching; - order = ttm_pool_page_order(pool, *pages); - nr = (1UL << order); - if (tt->dma_address) - ttm_pool_unmap(pool, tt->dma_address[i], nr); + if (ttm_pool_page_ptr_is_swap(p)) { + swp_entry_t swap = ttm_pool_page_ptr_to_swap(p); + + nr = 1; + if (swap.val != 0) + swap_drop_folio(swap); + continue; + } + + if (pool) { + order = ttm_pool_page_order(pool, p); + nr = (1UL << order); + if (tt->dma_address) + ttm_pool_unmap(pool, tt->dma_address[i], nr); + + pt = ttm_pool_select_type(pool, caching, order); + } else { + order = p->private; + nr = (1UL << order); + } - pt = ttm_pool_select_type(pool, caching, order); if (pt) 
- ttm_pool_type_give(pt, *pages); + ttm_pool_type_give(pt, p); else - ttm_pool_free_page(pool, caching, order, *pages); + ttm_pool_free_page(pool, caching, order, p); } } @@ -467,6 +625,28 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, gfp_flags |= GFP_HIGHUSER; order = ttm_pool_select_order(ttm_pool_orders[0], num_pages); + + if (tt->page_flags & TTM_TT_FLAG_PRIV_SHRUNKEN) { + if (!tt->restore) { + tt->restore = kvzalloc(sizeof(*tt->restore), + GFP_KERNEL); + if (!tt->restore) + return -ENOMEM; + } else if (ttm_pool_restore_valid(tt->restore)) { + struct ttm_pool_tt_restore *restore = tt->restore; + + num_pages -= restore->alloced_pages; + order = ttm_pool_select_order(restore->order, num_pages); + pages += restore->alloced_pages; + r = ttm_pool_swapin(restore, ctx); + if (r) + return r; + caching = restore->caching_divide; + } + + tt->restore->pool = pool; + } + for (; num_pages; order = ttm_pool_select_order(order, num_pages)) { struct ttm_pool_type *pt; @@ -484,11 +664,18 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, r = ttm_pool_page_allocated(pool, order, p, &dma_addr, &num_pages, - &pages); + &pages, + tt->restore); if (r) goto error_free_page; caching = pages; + if (ttm_pool_restore_valid(tt->restore)) { + r = ttm_pool_swapin(tt->restore, ctx); + if (r) + goto error_free_all; + } + if (num_pages < (1 << order)) break; @@ -508,9 +695,17 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, caching = pages; } r = ttm_pool_page_allocated(pool, order, p, &dma_addr, - &num_pages, &pages); + &num_pages, &pages, + tt->restore); if (r) goto error_free_page; + + if (ttm_pool_restore_valid(tt->restore)) { + r = ttm_pool_swapin(tt->restore, ctx); + if (r) + goto error_free_all; + } + if (PageHighMem(p)) caching = pages; } @@ -529,15 +724,29 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, if (r) goto error_free_all; + if (tt->restore) { + kvfree(tt->restore); + tt->restore = NULL; + } + + if (tt->page_flags & TTM_TT_FLAG_PRIV_SHRUNKEN) + tt->page_flags &= ~(TTM_TT_FLAG_PRIV_SHRUNKEN | + TTM_TT_FLAG_SWAPPED); + return 0; error_free_page: ttm_pool_free_page(pool, page_caching, order, p); error_free_all: + if (tt->page_flags & TTM_TT_FLAG_PRIV_SHRUNKEN) { + tt->restore->caching_divide = caching; + return r; + } + num_pages = tt->num_pages - num_pages; __ttm_pool_free(pool, tt, caching, tt->caching, ttm_cached, - num_pages); + 0, num_pages); return r; } @@ -554,13 +763,177 @@ EXPORT_SYMBOL(ttm_pool_alloc); void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt) { __ttm_pool_free(pool, tt, NULL, tt->caching, tt->caching, - tt->num_pages); + 0, tt->num_pages); while (atomic_long_read(&allocated_pages) > page_pool_size) ttm_pool_shrink(); } EXPORT_SYMBOL(ttm_pool_free); +/** + * ttm_pool_release_shrunken() - Release content of a swapped-out struct ttm_tt + * @tt: The struct ttm_tt. + * + * Release swap entries with associated content or any remaining pages of + * a swapped-out struct ttm_tt. 
+ */ +void ttm_pool_release_shrunken(struct ttm_tt *tt) +{ + struct ttm_pool_tt_restore *restore; + struct page **caching_divide = NULL; + struct ttm_pool *pool = NULL; + pgoff_t i, start_page = 0; + swp_entry_t swap; + + if (!(tt->page_flags & TTM_TT_FLAG_PRIV_SHRUNKEN)) + return; + + restore = tt->restore; + + if (ttm_pool_restore_valid(restore)) { + pgoff_t nr = 1UL << restore->order; + + for (i = restore->restored_pages; i < nr; ++i) { + struct page *p = restore->old_pages[i]; + + if (ttm_pool_page_ptr_is_swap(p)) { + swap = ttm_pool_page_ptr_to_swap(p); + if (swap.val == 0) + continue; + + swap_drop_folio(swap); + } else if (p) { + ttm_pool_split_for_swap(restore->pool, p); + __free_pages(p, 0); + } + } + } + + if (restore) { + pool = restore->pool; + caching_divide = restore->caching_divide; + start_page = restore->alloced_pages; + /* Pages that might be dma-mapped and non-cached */ + __ttm_pool_free(pool, tt, caching_divide, tt->caching, + ttm_cached, 0, start_page); + } + + /* Shrunken pages. Cached and not dma-mapped. */ + __ttm_pool_free(NULL, tt, NULL, ttm_cached, ttm_cached, start_page, + tt->num_pages); + + if (restore) { + kvfree(restore); + tt->restore = NULL; + } + + tt->page_flags &= ~(TTM_TT_FLAG_PRIV_SHRUNKEN | TTM_TT_FLAG_SWAPPED); +} + +/** + * ttm_pool_shrink_tt() - Swap out or purge a struct ttm_tt + * @pool: The pool used when allocating the struct ttm_tt. + * @ttm: The struct ttm_tt. + * + * Swap out or purge a struct ttm_tt. If @ttm is marked purgeable, then + * all pages will be freed directly to the system rather than to the pool + * they were allocated from, making the function behave similarly to + * ttm_pool_free(). If @ttm is not marked purgeable, the pages will be + * inserted into the swap cache instead, exchanged for a swap entry. + * A subsequent call to ttm_pool_alloc() will then read back the content and + * a subsequent call to ttm_pool_release_shrunken() will drop it. + * If swapout of a page fails for whatever reason, @ttm will still be + * partially swapped out, retaining those pages for which swapout fails. + * + * @Return: Number of pages actually swapped out or freed, or negative + * error code on error. + */ +long ttm_pool_shrink_tt(struct ttm_pool *pool, struct ttm_tt *ttm) +{ + struct page *page; + struct folio *folio; + swp_entry_t swap; + gfp_t alloc_gfp; + gfp_t gfp; + int ret = 0; + pgoff_t shrunken = 0; + pgoff_t i, num_pages; + bool purge = ttm_tt_purgeable(ttm); + + if ((!get_nr_swap_pages() && purge) || + pool->use_dma_alloc || + (ttm->page_flags & TTM_TT_FLAG_PRIV_SHRUNKEN)) + return -EBUSY; + +#ifdef CONFIG_X86 + /* Anything returned to the system needs to be cached. 
*/ + if (ttm->caching != ttm_cached) + set_pages_array_wb(ttm->pages, ttm->num_pages); +#endif + + if (ttm->dma_address || purge) { + for (i = 0; i < ttm->num_pages; i += num_pages) { + unsigned int order; + + page = ttm->pages[i]; + if (unlikely(!page)) + continue; + + order = 1UL << ttm_pool_page_order(pool, page); + num_pages = 1UL << order; + if (ttm->dma_address) + ttm_pool_unmap(pool, ttm->dma_address[i], + num_pages); + if (purge) { + shrunken += num_pages; + __free_pages(page, order); + memset(ttm->pages + i, 0, + num_pages * sizeof(*ttm->pages)); + } + } + } + + if (purge) + return shrunken; + + if (pool->use_dma32) + gfp = GFP_DMA32; + else + gfp = GFP_HIGHUSER; + + alloc_gfp = GFP_KERNEL | __GFP_HIGH | __GFP_NOWARN; + if (current_is_kswapd()) + alloc_gfp |= __GFP_NOMEMALLOC; + + for (i = 0; i < ttm->num_pages; ++i) { + page = ttm->pages[i]; + if (unlikely(!page)) + continue; + + ttm_pool_split_for_swap(pool, page); + + folio = page_folio(page); + folio_mark_dirty(folio); + swap = swap_backup_folio(folio, false, gfp, alloc_gfp); + if (swap.val) { + ttm->pages[i] = ttm_pool_swap_to_page_ptr(swap); + folio_put(folio); + shrunken++; + } else { + /* We allow partially shrunken tts */ + ret = -ENOMEM; + break; + } + cond_resched(); + } + + if (shrunken) + ttm->page_flags |= (TTM_TT_FLAG_PRIV_SHRUNKEN | + TTM_TT_FLAG_SWAPPED); + + return shrunken ? shrunken : ret; +} + /** * ttm_pool_init - Initialize a pool * diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index c63be8f5ed2a..8ac4a9cba34d 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -133,6 +133,8 @@ int ttm_tt_set_dontneed(const struct ttm_device *bdev, struct ttm_tt *tt) fput(tt->swap_storage); tt->swap_storage = NULL; + ttm_pool_release_shrunken(tt); + return -EALREADY; } EXPORT_SYMBOL(ttm_tt_set_dontneed); @@ -253,6 +255,7 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm, ttm->swap_storage = NULL; ttm->sg = bo->sg; ttm->caching = caching; + ttm->restore = NULL; } int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo, @@ -277,6 +280,8 @@ void ttm_tt_fini(struct ttm_tt *ttm) fput(ttm->swap_storage); ttm->swap_storage = NULL; + ttm_pool_release_shrunken(ttm); + if (ttm->pages) kvfree(ttm->pages); else @@ -347,6 +352,35 @@ int ttm_tt_swapin(struct ttm_tt *ttm) return ret; } +/** + * ttm_tt_shrink() - Helper for the driver bo_shrink() method. + * @bdev: The TTM device. + * @tt: The struct ttm_tt. + * + * Helper for a TTM driver to use from the bo_shrink() method to shrink + * a struct ttm_tt, after it has done the necessary unbinding. This function + * will update the page accounting and call ttm_pool_shrink_tt to free pages + * or move them to the swap cache. + * + * Return: Number of pages freed or swapped out, or negative error code on + * error. + */ +long ttm_tt_shrink(struct ttm_device *bdev, struct ttm_tt *tt) +{ + long ret = ttm_pool_shrink_tt(&bdev->pool, tt); + + if (ret > 0) { + tt->page_flags &= ~TTM_TT_FLAG_PRIV_POPULATED; + if (ttm_tt_purgeable(tt)) + ttm_tt_mod_shrinkable_pages(0, -(long)tt->num_pages); + else + ttm_tt_mod_shrinkable_pages(-(long)tt->num_pages, 0); + } + + return ret; +} +EXPORT_SYMBOL(ttm_tt_shrink); + /** * ttm_tt_swapout - swap out tt object * @bdev: TTM device structure. 
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h index c1200552892e..bfe14138a992 100644 --- a/include/drm/ttm/ttm_pool.h +++ b/include/drm/ttm/ttm_pool.h @@ -86,6 +86,10 @@ void ttm_pool_fini(struct ttm_pool *pool); int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m); +void ttm_pool_release_shrunken(struct ttm_tt *tt); + +long ttm_pool_shrink_tt(struct ttm_pool *pool, struct ttm_tt *ttm); + int ttm_pool_mgr_init(unsigned long num_pages); void ttm_pool_mgr_fini(void); diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index abb17527f76c..0fa71292b676 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -37,6 +37,7 @@ struct ttm_tt; struct ttm_resource; struct ttm_buffer_object; struct ttm_operation_ctx; +struct ttm_pool_tt_restore; /** * struct ttm_tt - This is a structure holding the pages, caching- and aperture @@ -79,6 +80,10 @@ struct ttm_tt { * page_flags = TTM_TT_FLAG_EXTERNAL | * TTM_TT_FLAG_EXTERNAL_MAPPABLE; * + * TTM_TT_FLAG_PRIV_SHRUNKEN: TTM internal only. This is set if the + * struct ttm_tt has been (possibly partially) swapped out to the + * swap cache. + * * TTM_TT_FLAG_PRIV_POPULATED: TTM internal only. DO NOT USE. This is * set by TTM after ttm_tt_populate() has successfully returned, and is * then unset when TTM calls ttm_tt_unpopulate(). @@ -89,6 +94,7 @@ struct ttm_tt { #define TTM_TT_FLAG_EXTERNAL_MAPPABLE BIT(3) #define TTM_TT_FLAG_DONTNEED BIT(4) +#define TTM_TT_FLAG_PRIV_SHRUNKEN BIT(30) #define TTM_TT_FLAG_PRIV_POPULATED BIT(31) uint32_t page_flags; /** @num_pages: Number of pages in the page array. */ @@ -104,6 +110,8 @@ struct ttm_tt { * ttm_caching. */ enum ttm_caching caching; + /** @restore: Swap restore state. Drivers keep off. */ + struct ttm_pool_tt_restore *restore; }; /** @@ -226,6 +234,8 @@ void ttm_tt_mgr_fini(void); struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt, struct ttm_tt *tt); +long ttm_tt_shrink(struct ttm_device *bdev, struct ttm_tt *tt); + /** * ttm_tt_purgeable() - Whether a struct ttm_tt's contents is purgeable * @tt: The struct ttm_tt to consider. 
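For clarity, the intended use of the new ttm_tt_shrink() helper from a driver's bo_shrink() device callback (the callback this series adds to struct ttm_device_funcs and that i915 hooks up in patch 16) would look roughly like the sketch below. This is an illustrative sketch only, not part of the patches; my_driver_unbind_tt() is a hypothetical stand-in for whatever unbinding and dma-unmapping the driver must do before the pages can be shrunk.

/* Illustrative sketch only -- not part of this series. */
static long my_driver_bo_shrink(struct ttm_buffer_object *bo,
				struct ttm_operation_ctx *ctx)
{
	struct ttm_tt *tt = bo->ttm;
	long ret;

	/* Hypothetical helper: tear down GPU bindings and dma mappings first. */
	ret = my_driver_unbind_tt(bo, tt, ctx);
	if (ret)
		return ret;

	/*
	 * ttm_tt_shrink() clears TTM_TT_FLAG_PRIV_POPULATED, updates the
	 * shrinkable-page accounting and hands the pages to
	 * ttm_pool_shrink_tt(), which either frees them (purgeable tt) or
	 * backs them up in the swap cache.
	 */
	return ttm_tt_shrink(bo->bdev, tt);
}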
From patchwork Wed Feb 15 16:14:04 2023
X-Patchwork-Submitter: Thomas Hellstrom
X-Patchwork-Id: 13141919
From: =?utf-8?q?Thomas_Hellstr=C3=B6m?=
To: dri-devel@lists.freedesktop.org
Date: Wed, 15 Feb 2023 17:14:04 +0100
Message-Id: <20230215161405.187368-16-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com>
References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [RFC PATCH 15/16] drm/ttm: Use fault-injection to test error paths
Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld

Use fault-injection to test partial TTM swapout and interrupted swapin. Return -EINTR from swapin to test the caller's ability to handle and restart the swapin, and on swapout perform only a partial swapout to exercise the swapin and release_shrunken functionality.
Signed-off-by: Thomas Hellström --- drivers/gpu/drm/Kconfig | 10 ++++++++++ drivers/gpu/drm/ttm/ttm_pool.c | 17 ++++++++++++++++- 2 files changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 1efd33411a92..a78eed9af2c1 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -202,6 +202,16 @@ config DRM_TTM GPU memory types. Will be enabled automatically if a device driver uses it. +config DRM_TTM_SHRINK_FAULT_INJECT + bool "Enable fault injection during TTM shrinking" + depends on DRM_TTM + default n + help + Inject recoverable failures during TTM shrinking and recovery of + shrunken objects. For DRM driver developers only. + + If in doubt, choose N. + config DRM_BUDDY tristate depends on DRM diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index 319998b4a325..d7c604593689 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -453,6 +453,7 @@ static bool ttm_pool_restore_valid(const struct ttm_pool_tt_restore *restore) static int ttm_pool_swapin(struct ttm_pool_tt_restore *restore, struct ttm_operation_ctx *ctx) { + static unsigned long __maybe_unused swappedin; unsigned int i, nr = 1 << restore->order; int ret = 0; @@ -468,6 +469,13 @@ static int ttm_pool_swapin(struct ttm_pool_tt_restore *restore, if (swap.val == 0) continue; + if (IS_ENABLED(CONFIG_DRM_TTM_SHRINK_FAULT_INJECT) && + ctx->interruptible && + ++swappedin % 100 == 0) { + ret = -EINTR; + break; + } + ret = swap_copy_folio(swap, restore->first_page[i], 0, ctx->interruptible); if (ret) @@ -905,7 +913,14 @@ long ttm_pool_shrink_tt(struct ttm_pool *pool, struct ttm_tt *ttm) if (current_is_kswapd()) alloc_gfp |= __GFP_NOMEMALLOC; - for (i = 0; i < ttm->num_pages; ++i) { + num_pages = ttm->num_pages; + + /* Pretend doing fault injection by shrinking only half of the pages. 
*/ + + if (IS_ENABLED(CONFIG_DRM_TTM_SHRINK_FAULT_INJECT)) + num_pages = DIV_ROUND_UP(num_pages, 2); + + for (i = 0; i < num_pages; ++i) { page = ttm->pages[i]; if (unlikely(!page)) continue;

From patchwork Wed Feb 15 16:14:05 2023
X-Patchwork-Submitter: Thomas Hellstrom
X-Patchwork-Id: 13141920
From: =?utf-8?q?Thomas_Hellstr=C3=B6m?=
To: dri-devel@lists.freedesktop.org
Date: Wed, 15 Feb 2023 17:14:05 +0100
Message-Id: <20230215161405.187368-17-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com>
References: <20230215161405.187368-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [RFC PATCH 16/16] drm/i915, drm/ttm: Use the TTM shrinker rather than the external shmem pool
Cc: Miaohe Lin , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Hildenbrand , NeilBrown , Daniel Vetter , intel-gfx@lists.freedesktop.org, "Matthew Wilcox \(Oracle\)" , linux-mm@kvack.org, Dave Hansen , linux-graphics-maintainer@vmware.com, Peter Xu , Johannes Weiner , Dave Airlie , Andrew Morton , Christian Koenig , Matthew Auld

Remove the external i915 TTM shmem pool and replace it with the
normal TTM page allocation. Also provide a callback for the TTM shrinker functionality. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/gem/i915_gem_object.h | 6 - .../gpu/drm/i915/gem/i915_gem_object_types.h | 6 - drivers/gpu/drm/i915/gem/i915_gem_pages.c | 5 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 273 +++--------------- drivers/gpu/drm/i915/i915_gem.c | 3 +- drivers/gpu/drm/ttm/ttm_bo_vm.c | 6 +- drivers/gpu/drm/ttm/ttm_tt.c | 3 - include/drm/ttm/ttm_tt.h | 15 +- 8 files changed, 53 insertions(+), 264 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index f9a8acbba715..f694b5d479e5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -282,12 +282,6 @@ i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj) return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE); } -static inline bool -i915_gem_object_has_self_managed_shrink_list(const struct drm_i915_gem_object *obj) -{ - return i915_gem_object_type_has(obj, I915_GEM_OBJECT_SELF_MANAGED_SHRINK_LIST); -} - static inline bool i915_gem_object_is_proxy(const struct drm_i915_gem_object *obj) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 19c9bdd8f905..511dc1384a9c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -544,12 +544,6 @@ struct drm_i915_gem_object { */ atomic_t shrink_pin; - /** - * @ttm_shrinkable: True when the object is using shmem pages - * underneath. Protected by the object lock. - */ - bool ttm_shrinkable; - /** * @unknown_state: Indicate that the object is effectively * borked. This is write-once and set if we somehow encounter a diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index ecd86130b74f..c39d45661b84 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -73,7 +73,7 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, shrinkable = false; } - if (shrinkable && !i915_gem_object_has_self_managed_shrink_list(obj)) { + if (shrinkable) { struct list_head *list; unsigned long flags; @@ -216,8 +216,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) if (i915_gem_object_is_volatile(obj)) obj->mm.madv = I915_MADV_WILLNEED; - if (!i915_gem_object_has_self_managed_shrink_list(obj)) - i915_gem_object_make_unshrinkable(obj); + i915_gem_object_make_unshrinkable(obj); if (obj->mm.mapping) { unmap_object(obj, page_mask_bits(obj->mm.mapping)); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 341b94672abc..f9bd4f50d495 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -3,8 +3,6 @@ * Copyright © 2021 Intel Corporation */ -#include - #include #include #include @@ -37,8 +35,6 @@ * @ttm: The base TTM page vector. * @dev: The struct device used for dma mapping and unmapping. * @cached_rsgt: The cached scatter-gather table. - * @is_shmem: Set if using shmem. - * @filp: The shmem file, if using shmem backend. * * Note that DMA may be going on right up to the point where the page- * vector is unpopulated in delayed destroy. 
Hence keep the @@ -50,9 +46,6 @@ struct i915_ttm_tt { struct ttm_tt ttm; struct device *dev; struct i915_refct_sgt cached_rsgt; - - bool is_shmem; - struct file *filp; }; static const struct ttm_place sys_placement_flags = { @@ -185,75 +178,6 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj, placement->busy_placement = busy; } -static int i915_ttm_tt_shmem_populate(struct ttm_device *bdev, - struct ttm_tt *ttm, - struct ttm_operation_ctx *ctx) -{ - struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev); - struct intel_memory_region *mr = i915->mm.regions[INTEL_MEMORY_SYSTEM]; - struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm); - const unsigned int max_segment = i915_sg_segment_size(i915->drm.dev); - const size_t size = (size_t)ttm->num_pages << PAGE_SHIFT; - struct file *filp = i915_tt->filp; - struct sgt_iter sgt_iter; - struct sg_table *st; - struct page *page; - unsigned long i; - int err; - - if (!filp) { - struct address_space *mapping; - gfp_t mask; - - filp = shmem_file_setup("i915-shmem-tt", size, VM_NORESERVE); - if (IS_ERR(filp)) - return PTR_ERR(filp); - - mask = GFP_HIGHUSER | __GFP_RECLAIMABLE; - - mapping = filp->f_mapping; - mapping_set_gfp_mask(mapping, mask); - GEM_BUG_ON(!(mapping_gfp_mask(mapping) & __GFP_RECLAIM)); - - i915_tt->filp = filp; - } - - st = &i915_tt->cached_rsgt.table; - err = shmem_sg_alloc_table(i915, st, size, mr, filp->f_mapping, - max_segment); - if (err) - return err; - - err = dma_map_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, - DMA_ATTR_SKIP_CPU_SYNC); - if (err) - goto err_free_st; - - i = 0; - for_each_sgt_page(page, sgt_iter, st) - ttm->pages[i++] = page; - - if (ttm->page_flags & TTM_TT_FLAG_SWAPPED) - ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED; - - return 0; - -err_free_st: - shmem_sg_free_table(st, filp->f_mapping, false, false); - - return err; -} - -static void i915_ttm_tt_shmem_unpopulate(struct ttm_tt *ttm) -{ - struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm); - bool backup = ttm->page_flags & TTM_TT_FLAG_SWAPPED; - struct sg_table *st = &i915_tt->cached_rsgt.table; - - shmem_sg_free_table(st, file_inode(i915_tt->filp)->i_mapping, - backup, backup); -} - static void i915_ttm_tt_release(struct kref *ref) { struct i915_ttm_tt *i915_tt = @@ -292,11 +216,6 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, page_flags |= TTM_TT_FLAG_ZERO_ALLOC; caching = i915_ttm_select_tt_caching(obj); - if (i915_gem_object_is_shrinkable(obj) && caching == ttm_cached) { - page_flags |= TTM_TT_FLAG_EXTERNAL | - TTM_TT_FLAG_EXTERNAL_MAPPABLE; - i915_tt->is_shmem = true; - } if (i915_gem_object_needs_ccs_pages(obj)) ccs_pages = DIV_ROUND_UP(DIV_ROUND_UP(bo->base.size, @@ -325,9 +244,6 @@ static int i915_ttm_tt_populate(struct ttm_device *bdev, { struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm); - if (i915_tt->is_shmem) - return i915_ttm_tt_shmem_populate(bdev, ttm, ctx); - return ttm_pool_alloc(&bdev->pool, ttm, ctx); } @@ -339,21 +255,46 @@ static void i915_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) if (st->sgl) dma_unmap_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, 0); - if (i915_tt->is_shmem) { - i915_ttm_tt_shmem_unpopulate(ttm); - } else { - sg_free_table(st); - ttm_pool_free(&bdev->pool, ttm); + sg_free_table(st); + ttm_pool_free(&bdev->pool, ttm); +} + +static long i915_ttm_bo_shrink(struct ttm_buffer_object *bo, + struct ttm_operation_ctx *ctx) + +{ + struct ttm_tt *tt = bo->ttm; + struct i915_ttm_tt *i915_tt = 
container_of(tt, typeof(*i915_tt), ttm); + struct sg_table *st = &i915_tt->cached_rsgt.table; + long ret; + + if (!i915_ttm_is_ghost_object(bo)) { + struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); + long ret = i915_ttm_move_notify(bo); + + if (ret) + return ret; + + if (obj->mm.madv == I915_MADV_DONTNEED) { + GEM_WARN_ON(!(tt->page_flags & TTM_TT_FLAG_DONTNEED)); + obj->mm.madv = __I915_MADV_PURGED; + } } + + if (st->sgl) + dma_unmap_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, 0); + + sg_free_table(st); + + ret = ttm_tt_shrink(bo->bdev, tt); + + return ret; } static void i915_ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm) { struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm); - if (i915_tt->filp) - fput(i915_tt->filp); - ttm_tt_fini(ttm); i915_refct_sgt_put(&i915_tt->cached_rsgt); } @@ -366,14 +307,6 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo, if (i915_ttm_is_ghost_object(bo)) return false; - /* - * EXTERNAL objects should never be swapped out by TTM, instead we need - * to handle that ourselves. TTM will already skip such objects for us, - * but we would like to avoid grabbing locks for no good reason. - */ - if (bo->ttm && bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL) - return false; - /* Will do for now. Our pinned objects are still on TTM's LRU lists */ if (!i915_gem_object_evictable(obj)) return false; @@ -439,18 +372,6 @@ int i915_ttm_purge(struct drm_i915_gem_object *obj) if (ret) return ret; - if (bo->ttm && i915_tt->filp) { - /* - * The below fput(which eventually calls shmem_truncate) might - * be delayed by worker, so when directly called to purge the - * pages(like by the shrinker) we should try to be more - * aggressive and release the pages immediately. - */ - shmem_truncate_range(file_inode(i915_tt->filp), - 0, (loff_t)-1); - fput(fetch_and_zero(&i915_tt->filp)); - } - obj->write_domain = 0; obj->read_domains = 0; i915_ttm_adjust_gem_after_move(obj); @@ -460,53 +381,6 @@ int i915_ttm_purge(struct drm_i915_gem_object *obj) return 0; } -static int i915_ttm_shrink(struct drm_i915_gem_object *obj, unsigned int flags) -{ - struct ttm_buffer_object *bo = i915_gem_to_ttm(obj); - struct i915_ttm_tt *i915_tt = - container_of(bo->ttm, typeof(*i915_tt), ttm); - struct ttm_operation_ctx ctx = { - .interruptible = true, - .no_wait_gpu = flags & I915_GEM_OBJECT_SHRINK_NO_GPU_WAIT, - }; - struct ttm_placement place = {}; - int ret; - - if (!bo->ttm || i915_ttm_cpu_maps_iomem(bo->resource)) - return 0; - - GEM_BUG_ON(!i915_tt->is_shmem); - - if (!i915_tt->filp) - return 0; - - ret = ttm_bo_wait_ctx(bo, &ctx); - if (ret) - return ret; - - switch (obj->mm.madv) { - case I915_MADV_DONTNEED: - return i915_ttm_purge(obj); - case __I915_MADV_PURGED: - return 0; - } - - if (bo->ttm->page_flags & TTM_TT_FLAG_SWAPPED) - return 0; - - bo->ttm->page_flags |= TTM_TT_FLAG_SWAPPED; - ret = ttm_bo_validate(bo, &place, &ctx); - if (ret) { - bo->ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED; - return ret; - } - - if (flags & I915_GEM_OBJECT_SHRINK_WRITEBACK) - __shmem_writeback(obj->base.size, i915_tt->filp->f_mapping); - - return 0; -} - static void i915_ttm_delete_mem_notify(struct ttm_buffer_object *bo) { struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); @@ -765,6 +639,7 @@ static struct ttm_device_funcs i915_ttm_bo_driver = { .io_mem_reserve = i915_ttm_io_mem_reserve, .io_mem_pfn = i915_ttm_io_mem_pfn, .access_memory = i915_ttm_access_memory, + .bo_shrink = i915_ttm_bo_shrink, }; /** @@ -931,8 +806,6 @@ void i915_ttm_adjust_lru(struct 
drm_i915_gem_object *obj) struct ttm_buffer_object *bo = i915_gem_to_ttm(obj); struct i915_ttm_tt *i915_tt = container_of(bo->ttm, typeof(*i915_tt), ttm); - bool shrinkable = - bo->ttm && i915_tt->filp && ttm_tt_is_populated(bo->ttm); /* * Don't manipulate the TTM LRUs while in TTM bo destruction. @@ -941,54 +814,25 @@ void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj) if (!kref_read(&bo->kref)) return; - /* - * We skip managing the shrinker LRU in set_pages() and just manage - * everything here. This does at least solve the issue with having - * temporary shmem mappings(like with evicted lmem) not being visible to - * the shrinker. Only our shmem objects are shrinkable, everything else - * we keep as unshrinkable. - * - * To make sure everything plays nice we keep an extra shrink pin in TTM - * if the underlying pages are not currently shrinkable. Once we release - * our pin, like when the pages are moved to shmem, the pages will then - * be added to the shrinker LRU, assuming the caller isn't also holding - * a pin. - * - * TODO: consider maybe also bumping the shrinker list here when we have - * already unpinned it, which should give us something more like an LRU. - * - * TODO: There is a small window of opportunity for this function to - * get called from eviction after we've dropped the last GEM refcount, - * but before the TTM deleted flag is set on the object. Avoid - * adjusting the shrinker list in such cases, since the object is - * not available to the shrinker anyway due to its zero refcount. - * To fix this properly we should move to a TTM shrinker LRU list for - * these objects. - */ - if (kref_get_unless_zero(&obj->base.refcount)) { - if (shrinkable != obj->mm.ttm_shrinkable) { - if (shrinkable) { - if (obj->mm.madv == I915_MADV_WILLNEED) - __i915_gem_object_make_shrinkable(obj); - else - __i915_gem_object_make_purgeable(obj); - } else { - i915_gem_object_make_unshrinkable(obj); - } - - obj->mm.ttm_shrinkable = shrinkable; - } - i915_gem_object_put(obj); + if (bo->ttm) { + int ret = 0; + + if (obj->mm.madv == I915_MADV_DONTNEED && + !ttm_tt_purgeable(bo->ttm)) + ret = ttm_tt_set_dontneed(bo->bdev, bo->ttm); + else if (obj->mm.madv == I915_MADV_WILLNEED && + ttm_tt_purgeable(bo->ttm)) + ret = ttm_tt_set_willneed(bo->bdev, bo->ttm); + + if (ret == -EALREADY) + obj->mm.madv = __I915_MADV_PURGED; } /* * Put on the correct LRU list depending on the MADV status */ spin_lock(&bo->bdev->lru_lock); - if (shrinkable) { - /* Try to keep shmem_tt from being considered for shrinking. */ - bo->priority = TTM_MAX_BO_PRIORITY - 1; - } else if (obj->mm.madv != I915_MADV_WILLNEED) { + if (obj->mm.madv != I915_MADV_WILLNEED) { bo->priority = I915_TTM_PRIO_PURGE; } else if (!i915_gem_object_has_pages(obj)) { bo->priority = I915_TTM_PRIO_NO_PAGES; @@ -1226,13 +1070,10 @@ static void i915_ttm_unmap_virtual(struct drm_i915_gem_object *obj) static const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = { .name = "i915_gem_object_ttm", - .flags = I915_GEM_OBJECT_IS_SHRINKABLE | - I915_GEM_OBJECT_SELF_MANAGED_SHRINK_LIST, .get_pages = i915_ttm_get_pages, .put_pages = i915_ttm_put_pages, .truncate = i915_ttm_truncate, - .shrink = i915_ttm_shrink, .adjust_lru = i915_ttm_adjust_lru, .delayed_free = i915_ttm_delayed_free, @@ -1251,18 +1092,6 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo) mutex_destroy(&obj->ttm.get_io_page.lock); if (obj->ttm.created) { - /* - * We freely manage the shrinker LRU outide of the mm.pages life - * cycle. 
As a result when destroying the object we should be - * extra paranoid and ensure we remove it from the LRU, before - * we free the object. - * - * Touching the ttm_shrinkable outside of the object lock here - * should be safe now that the last GEM object ref was dropped. - */ - if (obj->mm.ttm_shrinkable) - i915_gem_object_make_unshrinkable(obj); - i915_ttm_backup_free(obj); /* This releases all gem object bindings to the backend. */ @@ -1318,14 +1147,6 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem, /* Forcing the page size is kernel internal only */ GEM_BUG_ON(page_size && obj->mm.n_placements); - /* - * Keep an extra shrink pin to prevent the object from being made - * shrinkable too early. If the ttm_tt is ever allocated in shmem, we - * drop the pin. The TTM backend manages the shrinker LRU itself, - * outside of the normal mm.pages life cycle. - */ - i915_gem_object_make_unshrinkable(obj); - /* * If this function fails, it will call the destructor, but * our caller still owns the object. So no freeing in the diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 35950fa91406..4dff76614347 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1068,8 +1068,7 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data, obj->ops->adjust_lru(obj); } - if (i915_gem_object_has_pages(obj) || - i915_gem_object_has_self_managed_shrink_list(obj)) { + if (i915_gem_object_has_pages(obj)) { unsigned long flags; spin_lock_irqsave(&i915->mm.obj_lock, flags); diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 80f106bfe385..7537bc300e34 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -150,10 +150,8 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo, * (if at all) by redirecting mmap to the exporter. */ if (bo->ttm && (bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { - if (!(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE)) { - dma_resv_unlock(bo->base.resv); - return VM_FAULT_SIGBUS; - } + dma_resv_unlock(bo->base.resv); + return VM_FAULT_SIGBUS; } return 0; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 8ac4a9cba34d..b0533833d581 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -198,9 +198,6 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc) if (unlikely(bo->ttm == NULL)) return -ENOMEM; - WARN_ON(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE && - !(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)); - return 0; } diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index 0fa71292b676..0d1d377903e0 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -68,18 +68,6 @@ struct ttm_tt { * Note that enum ttm_bo_type.ttm_bo_type_sg objects will always enable * this flag. * - * TTM_TT_FLAG_EXTERNAL_MAPPABLE: Same behaviour as - * TTM_TT_FLAG_EXTERNAL, but with the reduced restriction that it is - * still valid to use TTM to map the pages directly. This is useful when - * implementing a ttm_tt backend which still allocates driver owned - * pages underneath(say with shmem). - * - * Note that since this also implies TTM_TT_FLAG_EXTERNAL, the usage - * here should always be: - * - * page_flags = TTM_TT_FLAG_EXTERNAL | - * TTM_TT_FLAG_EXTERNAL_MAPPABLE; - * * TTM_TT_FLAG_PRIV_SHRUNKEN: TTM internal only. This is set if the * struct ttm_tt has been (possibly partially) swapped out to the * swap cache. 
@@ -91,8 +79,7 @@ struct ttm_tt { #define TTM_TT_FLAG_SWAPPED BIT(0) #define TTM_TT_FLAG_ZERO_ALLOC BIT(1) #define TTM_TT_FLAG_EXTERNAL BIT(2) -#define TTM_TT_FLAG_EXTERNAL_MAPPABLE BIT(3) -#define TTM_TT_FLAG_DONTNEED BIT(4) +#define TTM_TT_FLAG_DONTNEED BIT(3) #define TTM_TT_FLAG_PRIV_SHRUNKEN BIT(30) #define TTM_TT_FLAG_PRIV_POPULATED BIT(31)
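As a rough illustration of how the remaining flag bits are meant to be driven after this cleanup, the sketch below shows a driver reacting to a madvise-style hint using ttm_tt_set_dontneed(), ttm_tt_set_willneed() and ttm_tt_purgeable(), mirroring the i915_ttm_adjust_lru() rework above. The sketch is not part of the series; my_driver_mark_content_lost() is a hypothetical placeholder for driver-side handling of already-purged content.

/* Illustrative sketch only -- not part of this series. */
static void my_driver_apply_madvise(struct ttm_buffer_object *bo, bool dontneed)
{
	struct ttm_tt *tt = bo->ttm;
	int ret = 0;

	if (!tt)
		return;

	if (dontneed && !ttm_tt_purgeable(tt))
		ret = ttm_tt_set_dontneed(bo->bdev, tt);
	else if (!dontneed && ttm_tt_purgeable(tt))
		ret = ttm_tt_set_willneed(bo->bdev, tt);

	/*
	 * -EALREADY indicates the pages were already dropped, so the content
	 * is gone; i915 handles this by marking the object __I915_MADV_PURGED.
	 */
	if (ret == -EALREADY)
		my_driver_mark_content_lost(bo);
}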