From patchwork Thu May 20 15:09:43 2021
From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, Christian König
Date: Thu, 20 May 2021 17:09:43 +0200
Message-Id: <20210520150947.803891-2-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20210520150947.803891-1-thomas.hellstrom@linux.intel.com>
References: <20210520150947.803891-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [RFC PATCH 1/5] drm/ttm: Add a generic TTM memcpy move for page-based iomem

The internal ttm_bo_util memcpy uses ioremap functionality, and while it
might be possible to use it for copying into and out of sglist-represented
io memory using the io_mem_reserve() / io_mem_free() callbacks, that would
cause problems with fault().
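For illustration only (not part of the patch): the page-by-page approach
described next boils down to a loop of the following shape for the
system-memory case, with the io-memory variants, caching-aware pgprots and
error handling omitted, and the helper name copy_pagewise being hypothetical:

	/* Hypothetical sketch: map, copy and unmap one page at a time. */
	static void copy_pagewise(struct page **dst_pages,
				  struct page **src_pages,
				  pgoff_t num_pages)
	{
		pgoff_t i;

		for (i = 0; i < num_pages; ++i) {
			void *dst = kmap_local_page(dst_pages[i]);
			void *src = kmap_local_page(src_pages[i]);

			memcpy(dst, src, PAGE_SIZE);
			/* kmap_local mappings are released in reverse order. */
			kunmap_local(src);
			kunmap_local(dst);
		}
	}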
Instead, implement a method that maps page-by-page using kmap_local()
semantics. As an additional benefit, we then avoid the occasional global TLB
flushes of ioremap() and consume no ioremap space, eliminating a critical
point of failure. With a slight change of semantics we could also push the
memcpy out async for testing and async driver development purposes.

A special linear iomem iterator is introduced internally to mimic the old
ioremap behaviour for code paths that can't immediately be ported over.
This adds to the code size and should be considered a temporary solution.

Looking at the code, we have a lot of checks for iomap-tagged pointers.
Ideally we should extend the core memremap functions to also accept uncached
memory and kmap_local functionality. Then we could strip a lot of code.

Cc: Christian König
Signed-off-by: Thomas Hellström
Reported-by: kernel test robot
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 468 ++++++++++++++++++++----------
 include/drm/ttm/ttm_bo_driver.h   |  94 ++++++
 2 files changed, 407 insertions(+), 155 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index ae8b61460724..bad9b16e96ba 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -35,11 +35,13 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
+#include
 
 struct ttm_transfer_obj {
 	struct ttm_buffer_object base;
@@ -72,190 +74,366 @@ void ttm_mem_io_free(struct ttm_device *bdev,
 	mem->bus.addr = NULL;
 }
 
-static int ttm_resource_ioremap(struct ttm_device *bdev,
-				struct ttm_resource *mem,
-				void **virtual)
+static pgprot_t ttm_prot_from_caching(enum ttm_caching caching, pgprot_t tmp)
 {
-	int ret;
-	void *addr;
+	/* Cached mappings need no adjustment */
+	if (caching == ttm_cached)
+		return tmp;
 
-	*virtual = NULL;
-	ret = ttm_mem_io_reserve(bdev, mem);
-	if (ret || !mem->bus.is_iomem)
-		return ret;
+#if defined(__i386__) || defined(__x86_64__)
+	if (caching == ttm_write_combined)
+		tmp = pgprot_writecombine(tmp);
+	else if (boot_cpu_data.x86 > 3)
+		tmp = pgprot_noncached(tmp);
+#endif
+#if defined(__ia64__) || defined(__arm__) || defined(__aarch64__) || \
+	defined(__powerpc__) || defined(__mips__)
+	if (caching == ttm_write_combined)
+		tmp = pgprot_writecombine(tmp);
+	else
+		tmp = pgprot_noncached(tmp);
+#endif
+#if defined(__sparc__)
+	tmp = pgprot_noncached(tmp);
+#endif
+	return tmp;
+}
 
-	if (mem->bus.addr) {
-		addr = mem->bus.addr;
-	} else {
-		size_t bus_size = (size_t)mem->num_pages << PAGE_SHIFT;
+static void ttm_kmap_iter_tt_kmap_local(struct ttm_kmap_iter *iter,
+					struct dma_buf_map *dmap,
+					pgoff_t i)
+{
+	struct ttm_kmap_iter_tt *iter_tt =
+		container_of(iter, typeof(*iter_tt), base);
 
-		if (mem->bus.caching == ttm_write_combined)
-			addr = ioremap_wc(mem->bus.offset, bus_size);
-#ifdef CONFIG_X86
-		else if (mem->bus.caching == ttm_cached)
-			addr = ioremap_cache(mem->bus.offset, bus_size);
-#endif
-		else
-			addr = ioremap(mem->bus.offset, bus_size);
-		if (!addr) {
-			ttm_mem_io_free(bdev, mem);
-			return -ENOMEM;
-		}
+	dma_buf_map_set_vaddr(dmap, kmap_local_page_prot(iter_tt->tt->pages[i],
+							 iter_tt->prot));
+}
+
+static void ttm_kmap_iter_iomap_kmap_local(struct ttm_kmap_iter *iter,
+					   struct dma_buf_map *dmap,
+					   pgoff_t i)
+{
+	struct ttm_kmap_iter_iomap *iter_io =
+		container_of(iter, typeof(*iter_io), base);
+	void __iomem *addr;
+
+retry:
+	while (i >= iter_io->cache.end) {
+		iter_io->cache.sg = iter_io->cache.sg ?
+ sg_next(iter_io->cache.sg) : iter_io->st->sgl; + iter_io->cache.i = iter_io->cache.end; + iter_io->cache.end += sg_dma_len(iter_io->cache.sg) >> + PAGE_SHIFT; + iter_io->cache.offs = sg_dma_address(iter_io->cache.sg) - + iter_io->start; } - *virtual = addr; - return 0; + + if (i < iter_io->cache.i) { + iter_io->cache.end = 0; + iter_io->cache.sg = NULL; + goto retry; + } + + addr = io_mapping_map_local_wc(iter_io->iomap, iter_io->cache.offs + + (((resource_size_t)i - iter_io->cache.i) + << PAGE_SHIFT)); + dma_buf_map_set_vaddr_iomem(dmap, addr); } -static void ttm_resource_iounmap(struct ttm_device *bdev, - struct ttm_resource *mem, - void *virtual) +static const struct ttm_kmap_iter_ops ttm_kmap_iter_tt_ops = { + .kmap_local = ttm_kmap_iter_tt_kmap_local, + .needs_unmap = true +}; + +static const struct ttm_kmap_iter_ops ttm_kmap_iter_io_ops = { + .kmap_local = ttm_kmap_iter_iomap_kmap_local, + .needs_unmap = true +}; + +/* If needed, make unmap functionality part of ttm_kmap_iter_ops */ +static void kunmap_local_iter(struct ttm_kmap_iter *iter, + struct dma_buf_map *map) { - if (virtual && mem->bus.addr == NULL) - iounmap(virtual); - ttm_mem_io_free(bdev, mem); + if (!iter->ops->needs_unmap) + return; + + if (map->is_iomem) + io_mapping_unmap_local(map->vaddr_iomem); + else + kunmap_local(map->vaddr); } -static int ttm_copy_io_page(void *dst, void *src, unsigned long page) +/** + * ttm_move_memcpy - Helper to perform a memcpy ttm move operation. + * @bo: The struct ttm_buffer_object. + * @new_mem: The struct ttm_resource we're moving to (copy destination). + * @new_iter: A struct ttm_kmap_iter representing the destination resource. + * @old_iter: A struct ttm_kmap_iter representing the source resource. + * + * This function is intended to be able to move out async under a + * dma-fence if desired. + */ +void ttm_move_memcpy(struct ttm_buffer_object *bo, + struct ttm_resource *new_mem, + struct ttm_kmap_iter *new_iter, + struct ttm_kmap_iter *old_iter) { - uint32_t *dstP = - (uint32_t *) ((unsigned long)dst + (page << PAGE_SHIFT)); - uint32_t *srcP = - (uint32_t *) ((unsigned long)src + (page << PAGE_SHIFT)); - - int i; - for (i = 0; i < PAGE_SIZE / sizeof(uint32_t); ++i) - iowrite32(ioread32(srcP++), dstP++); - return 0; + struct ttm_device *bdev = bo->bdev; + struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type); + struct ttm_tt *ttm = bo->ttm; + struct ttm_resource *old_mem = &bo->mem; + struct ttm_resource_manager *old_man = ttm_manager_type(bdev, old_mem->mem_type); + struct dma_buf_map old_map, new_map; + pgoff_t i; + + /* Single TTM move. NOP */ + if (old_man->use_tt && man->use_tt) + return; + + /* Don't move nonexistent data. Clear destination instead. 
*/ + if (old_man->use_tt && !man->use_tt && + (!ttm || !ttm_tt_is_populated(ttm))) { + if (ttm && !(ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC)) + return; + + for (i = 0; i < new_mem->num_pages; ++i) { + new_iter->ops->kmap_local(new_iter, &new_map, i); + if (new_map.is_iomem) + memset_io(new_map.vaddr_iomem, 0, PAGE_SIZE); + else + memset(new_map.vaddr, 0, PAGE_SIZE); + kunmap_local_iter(new_iter, &new_map); + } + return; + } + + for (i = 0; i < new_mem->num_pages; ++i) { + new_iter->ops->kmap_local(new_iter, &new_map, i); + old_iter->ops->kmap_local(old_iter, &old_map, i); + + if (!old_map.is_iomem && !new_map.is_iomem) { + memcpy(new_map.vaddr, old_map.vaddr, PAGE_SIZE); + } else if (!old_map.is_iomem) { + dma_buf_map_memcpy_to(&new_map, old_map.vaddr, + PAGE_SIZE); + } else if (!new_map.is_iomem) { + memcpy_fromio(new_map.vaddr, old_map.vaddr_iomem, + PAGE_SIZE); + } else { + int j; + u32 __iomem *src = old_map.vaddr_iomem; + u32 __iomem *dst = new_map.vaddr_iomem; + + for (j = 0; j < (PAGE_SIZE >> 2); ++j) + iowrite32(ioread32(src++), dst++); + } + kunmap_local_iter(old_iter, &old_map); + kunmap_local_iter(new_iter, &new_map); + } } +EXPORT_SYMBOL(ttm_move_memcpy); -static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src, - unsigned long page, - pgprot_t prot) +/** + * ttm_kmap_iter_iomap_init - Initialize a struct ttm_kmap_iter_iomap + * @iter_io: The struct ttm_kmap_iter_iomap to initialize. + * @iomap: The struct io_mapping representing the underlying linear io_memory. + * @st: sg_table into @iomap, representing the memory of the struct + * ttm_resource. + * @start: Offset that needs to be subtracted from @st to make + * sg_dma_address(st->sgl) - @start == 0 for @iomap start. + * + * Return: Pointer to the embedded struct ttm_kmap_iter. + */ +struct ttm_kmap_iter * +ttm_kmap_iter_iomap_init(struct ttm_kmap_iter_iomap *iter_io, + struct io_mapping *iomap, + struct sg_table *st, + resource_size_t start) { - struct page *d = ttm->pages[page]; - void *dst; + iter_io->base.ops = &ttm_kmap_iter_io_ops; + iter_io->iomap = iomap; + iter_io->st = st; + iter_io->start = start; + memset(&iter_io->cache, 0, sizeof(iter_io->cache)); - if (!d) - return -ENOMEM; + return &iter_io->base; +} +EXPORT_SYMBOL(ttm_kmap_iter_iomap_init); - src = (void *)((unsigned long)src + (page << PAGE_SHIFT)); - dst = kmap_atomic_prot(d, prot); - if (!dst) - return -ENOMEM; +/** + * ttm_kmap_iter_tt_init - Initialize a struct ttm_kmap_iter_tt + * @iter_tt: The struct ttm_kmap_iter_tt to initialize. + * @tt: Struct ttm_tt holding page pointers of the struct ttm_resource. + * + * Return: Pointer to the embedded struct ttm_kmap_iter. + */ +struct ttm_kmap_iter * +ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt, + struct ttm_tt *tt) +{ + iter_tt->base.ops = &ttm_kmap_iter_tt_ops; + iter_tt->tt = tt; + iter_tt->prot = ttm_prot_from_caching(tt->caching, PAGE_KERNEL); - memcpy_fromio(dst, src, PAGE_SIZE); + return &iter_tt->base; +} +EXPORT_SYMBOL(ttm_kmap_iter_tt_init); - kunmap_atomic(dst); +/** + * DOC: Linear io iterator + * + * This code should die in the not too near future. Best would be if we could + * make io-mapping use memremap for all io memory, and have memremap + * implement a kmap_local functionality. We could then strip a huge amount of + * code. These linear io iterators are implemented to mimic old functionality, + * and they don't use kmap_local semantics at all internally. Rather ioremap or + * friends, and at least on 32-bit they add global TLB flushes and points + * of failure. 
+ */ - return 0; +/** + * struct ttm_kmap_iter_linear_io - Iterator specialization for linear io + * @base: The base iterator + * @dmap: Points to the starting address of the region + * @needs_unmap: Whether we need to unmap on fini + */ +struct ttm_kmap_iter_linear_io { + struct ttm_kmap_iter base; + struct dma_buf_map dmap; + bool needs_unmap; +}; + +static void ttm_kmap_iter_linear_io_kmap_local(struct ttm_kmap_iter *iter, + struct dma_buf_map *dmap, + pgoff_t i) +{ + struct ttm_kmap_iter_linear_io *iter_io = + container_of(iter, typeof(*iter_io), base); + + *dmap = iter_io->dmap; + dma_buf_map_incr(dmap, i * PAGE_SIZE); } -static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void *dst, - unsigned long page, - pgprot_t prot) +static const struct ttm_kmap_iter_ops ttm_kmap_iter_linear_io_ops = { + .kmap_local = ttm_kmap_iter_linear_io_kmap_local +}; + +static struct ttm_kmap_iter * +ttm_kmap_iter_linear_io_init(struct ttm_kmap_iter_linear_io *iter_io, + struct ttm_device *bdev, + struct ttm_resource *mem) { - struct page *s = ttm->pages[page]; - void *src; + int ret; - if (!s) - return -ENOMEM; + ret = ttm_mem_io_reserve(bdev, mem); + if (ret) + goto out_err; + if (!mem->bus.is_iomem) { + ret = -EINVAL; + goto out_io_free; + } - dst = (void *)((unsigned long)dst + (page << PAGE_SHIFT)); - src = kmap_atomic_prot(s, prot); - if (!src) - return -ENOMEM; + if (mem->bus.addr) { + dma_buf_map_set_vaddr(&iter_io->dmap, mem->bus.addr); + iter_io->needs_unmap = false; + } else { + size_t bus_size = (size_t)mem->num_pages << PAGE_SHIFT; - memcpy_toio(dst, src, PAGE_SIZE); + iter_io->needs_unmap = true; + if (mem->bus.caching == ttm_write_combined) + dma_buf_map_set_vaddr_iomem(&iter_io->dmap, + ioremap_wc(mem->bus.offset, + bus_size)); + else if (mem->bus.caching == ttm_cached) + dma_buf_map_set_vaddr(&iter_io->dmap, + memremap(mem->bus.offset, bus_size, + MEMREMAP_WB)); + else + dma_buf_map_set_vaddr_iomem(&iter_io->dmap, + ioremap(mem->bus.offset, + bus_size)); + if (dma_buf_map_is_null(&iter_io->dmap)) { + ret = -ENOMEM; + goto out_io_free; + } + } - kunmap_atomic(src); + iter_io->base.ops = &ttm_kmap_iter_linear_io_ops; + return &iter_io->base; - return 0; +out_io_free: + ttm_mem_io_free(bdev, mem); +out_err: + return ERR_PTR(ret); +} + +static void +ttm_kmap_iter_linear_io_fini(struct ttm_kmap_iter_linear_io *iter_io, + struct ttm_device *bdev, + struct ttm_resource *mem) +{ + if (iter_io->needs_unmap && dma_buf_map_is_set(&iter_io->dmap)) { + if (iter_io->dmap.is_iomem) + iounmap(iter_io->dmap.vaddr_iomem); + else + memunmap(iter_io->dmap.vaddr); + } + + ttm_mem_io_free(bdev, mem); } +static int ttm_bo_wait_free_node(struct ttm_buffer_object *bo, + bool dst_use_tt); + int ttm_bo_move_memcpy(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, struct ttm_resource *new_mem) { struct ttm_device *bdev = bo->bdev; struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type); + struct ttm_resource_manager *new_man = + ttm_manager_type(bo->bdev, new_mem->mem_type); struct ttm_tt *ttm = bo->ttm; struct ttm_resource *old_mem = &bo->mem; struct ttm_resource old_copy = *old_mem; - void *old_iomap; - void *new_iomap; + union { + struct ttm_kmap_iter_tt tt; + struct ttm_kmap_iter_linear_io io; + } _new_iter, _old_iter; + struct ttm_kmap_iter *new_iter, *old_iter; int ret; - unsigned long i; - - ret = ttm_bo_wait_ctx(bo, ctx); - if (ret) - return ret; - - ret = ttm_resource_ioremap(bdev, old_mem, &old_iomap); - if (ret) - return ret; - ret = ttm_resource_ioremap(bdev, new_mem, 
&new_iomap); - if (ret) - goto out; - - /* - * Single TTM move. NOP. - */ - if (old_iomap == NULL && new_iomap == NULL) - goto out2; - /* - * Don't move nonexistent data. Clear destination instead. - */ - if (old_iomap == NULL && - (ttm == NULL || (!ttm_tt_is_populated(ttm) && - !(ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)))) { - memset_io(new_iomap, 0, new_mem->num_pages*PAGE_SIZE); - goto out2; - } - - /* - * TTM might be null for moves within the same region. - */ if (ttm) { ret = ttm_tt_populate(bdev, ttm, ctx); if (ret) - goto out1; + return ret; } - for (i = 0; i < new_mem->num_pages; ++i) { - if (old_iomap == NULL) { - pgprot_t prot = ttm_io_prot(bo, old_mem, PAGE_KERNEL); - ret = ttm_copy_ttm_io_page(ttm, new_iomap, i, - prot); - } else if (new_iomap == NULL) { - pgprot_t prot = ttm_io_prot(bo, new_mem, PAGE_KERNEL); - ret = ttm_copy_io_ttm_page(ttm, old_iomap, i, - prot); - } else { - ret = ttm_copy_io_page(new_iomap, old_iomap, i); - } - if (ret) - goto out1; + new_iter = new_man->use_tt ? + ttm_kmap_iter_tt_init(&_new_iter.tt, bo->ttm) : + ttm_kmap_iter_linear_io_init(&_new_iter.io, bdev, new_mem); + if (IS_ERR(new_iter)) + return PTR_ERR(new_iter); + + old_iter = man->use_tt ? + ttm_kmap_iter_tt_init(&_old_iter.tt, bo->ttm) : + ttm_kmap_iter_linear_io_init(&_old_iter.io, bdev, old_mem); + if (IS_ERR(old_iter)) { + ret = PTR_ERR(old_iter); + goto out_old_iter; } - mb(); -out2: - old_copy = *old_mem; - ttm_bo_assign_mem(bo, new_mem); + ttm_move_memcpy(bo, new_mem, new_iter, old_iter); + old_copy = *old_mem; + ret = ttm_bo_wait_free_node(bo, new_man->use_tt); if (!man->use_tt) - ttm_bo_tt_destroy(bo); + ttm_kmap_iter_linear_io_fini(&_old_iter.io, bdev, &old_copy); +out_old_iter: + if (!new_man->use_tt) + ttm_kmap_iter_linear_io_fini(&_new_iter.io, bdev, new_mem); -out1: - ttm_resource_iounmap(bdev, old_mem, new_iomap); -out: - ttm_resource_iounmap(bdev, &old_copy, old_iomap); - - /* - * On error, keep the mm node! - */ - if (!ret) - ttm_resource_free(bo, &old_copy); return ret; } EXPORT_SYMBOL(ttm_bo_move_memcpy); @@ -336,27 +514,7 @@ pgprot_t ttm_io_prot(struct ttm_buffer_object *bo, struct ttm_resource *res, man = ttm_manager_type(bo->bdev, res->mem_type); caching = man->use_tt ? bo->ttm->caching : res->bus.caching; - /* Cached mappings need no adjustment */ - if (caching == ttm_cached) - return tmp; - -#if defined(__i386__) || defined(__x86_64__) - if (caching == ttm_write_combined) - tmp = pgprot_writecombine(tmp); - else if (boot_cpu_data.x86 > 3) - tmp = pgprot_noncached(tmp); -#endif -#if defined(__ia64__) || defined(__arm__) || defined(__aarch64__) || \ - defined(__powerpc__) || defined(__mips__) - if (caching == ttm_write_combined) - tmp = pgprot_writecombine(tmp); - else - tmp = pgprot_noncached(tmp); -#endif -#if defined(__sparc__) - tmp = pgprot_noncached(tmp); -#endif - return tmp; + return ttm_prot_from_caching(caching, tmp); } EXPORT_SYMBOL(ttm_io_prot); diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index dbccac957f8f..30f1dce987b2 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -332,4 +332,98 @@ int ttm_range_man_init(struct ttm_device *bdev, int ttm_range_man_fini(struct ttm_device *bdev, unsigned type); +struct dma_buf_map; +struct io_mapping; +struct sg_table; +struct scatterlist; +struct ttm_kmap_iter; + +/** + * struct ttm_kmap_iter_ops - Ops structure for a struct + * ttm_kmap_iter. + * @needs_unmap - Whether a kunmap_local is needed to balance @kmap_local. 
+ */ +struct ttm_kmap_iter_ops { + /** + * kmap_local - Map a PAGE_SIZE part of the resource using + * kmap_local semantics. + * @res_kmap: Pointer to the struct ttm_kmap_iter representing + * the resource. + * @dmap: The struct dma_buf_map holding the virtual address after + * the operation. + * @i: The location within the resource to map. PAGE_SIZE granularity. + */ + void (*kmap_local)(struct ttm_kmap_iter *res_kmap, + struct dma_buf_map *dmap, pgoff_t i); + bool needs_unmap; +}; + +/** + * struct ttm_kmap_iter - Iterator for kmap_local type operations on a + * resource. + * @ops: Pointer to the operations struct. + * + * This struct is intended to be embedded in a resource-specific specialization + * implementing operations for the resource. + * + * Nothing stops us from extending the operations to vmap, vmap_pfn etc, + * replacing some or parts of the ttm_bo_util. cpu-map functionality. + */ +struct ttm_kmap_iter { + const struct ttm_kmap_iter_ops *ops; +}; + +/** + * struct ttm_kmap_iter_tt - Specialization for a tt (page) backed struct + * ttm_resource. + * @base: Embedded struct ttm_kmap_iter providing the usage interface + * @tt: Cached struct ttm_tt. + */ +struct ttm_kmap_iter_tt { + struct ttm_kmap_iter base; + struct ttm_tt *tt; + pgprot_t prot; +}; + +/** + * struct ttm_kmap_iter_iomap - Specialization for a struct io_mapping + + * struct sg_table backed struct ttm_resource. + * @base: Embedded struct ttm_kmap_iter providing the usage interface. + * @iomap: struct io_mapping representing the underlying linear io_memory. + * @st: sg_table into @iomap, representing the memory of the struct ttm_resource. + * @start: Offset that needs to be subtracted from @st to make + * sg_dma_address(st->sgl) - @start == 0 for @iomap start. + * @cache: Scatterlist traversal cache for fast lookups. + * @cache.sg: Pointer to the currently cached scatterlist segment. + * @cache.i: First index of @sg. PAGE_SIZE granularity. + * @cache.end: Last index + 1 of @sg. PAGE_SIZE granularity. + * @cache.offs: First offset into @iomap of @sg. PAGE_SIZE granularity. 
+ */
+struct ttm_kmap_iter_iomap {
+	struct ttm_kmap_iter base;
+	struct io_mapping *iomap;
+	struct sg_table *st;
+	resource_size_t start;
+	struct {
+		struct scatterlist *sg;
+		pgoff_t i;
+		pgoff_t end;
+		pgoff_t offs;
+	} cache;
+};
+
+void ttm_move_memcpy(struct ttm_buffer_object *bo,
+		     struct ttm_resource *new_mem,
+		     struct ttm_kmap_iter *new_iter,
+		     struct ttm_kmap_iter *old_iter);
+
+struct ttm_kmap_iter *
+ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt,
+		      struct ttm_tt *tt);
+
+struct ttm_kmap_iter *
+ttm_kmap_iter_iomap_init(struct ttm_kmap_iter_iomap *iter_io,
+			 struct io_mapping *iomap,
+			 struct sg_table *st,
+			 resource_size_t start);
 #endif

From patchwork Thu May 20 15:09:44 2021
From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, Christian König, Daniel Vetter
Date: Thu, 20 May 2021 17:09:44 +0200
Message-Id: <20210520150947.803891-3-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20210520150947.803891-1-thomas.hellstrom@linux.intel.com>
References: <20210520150947.803891-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [RFC PATCH 2/5] drm, drm/i915: Move the memcpy_from_wc functionality to core drm

Memcpy from wc will also be used by the TTM memcpy code. Move it to core
drm, and make the interface do the right thing even on !X86.

Cc: Christian König
Cc: Daniel Vetter
Cc: Dave Airlie
Signed-off-by: Thomas Hellström
---
 drivers/gpu/drm/Makefile                      |  2 +-
 drivers/gpu/drm/drm_drv.c                     |  2 +
 .../drm/{i915/i915_memcpy.c => drm_memcpy.c}  | 31 +++++++-------
 drivers/gpu/drm/i915/Makefile                 |  1 -
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  5 ++-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  7 ++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 11 ++---
 drivers/gpu/drm/i915/i915_cmd_parser.c        |  4 +-
 drivers/gpu/drm/i915/i915_drv.c               |  2 -
 drivers/gpu/drm/i915/i915_gpu_error.c         |  8 ++--
 drivers/gpu/drm/i915/i915_memcpy.h            | 34 ---------------
 .../drm/i915/selftests/intel_memory_region.c  |  7 ++--
 include/drm/drm_memcpy.h                      | 41 +++++++++++++++++++
 14 files changed, 83 insertions(+), 76 deletions(-)
 rename drivers/gpu/drm/{i915/i915_memcpy.c => drm_memcpy.c} (84%)
 delete mode 100644 drivers/gpu/drm/i915/i915_memcpy.h
 create mode 100644 include/drm/drm_memcpy.h

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index a91cc7684904..f3ab8586c3d7 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -18,7 +18,7 @@ drm-y := drm_aperture.o drm_auth.o drm_cache.o \
 	drm_dumb_buffers.o drm_mode_config.o drm_vblank.o \
 	drm_syncobj.o drm_lease.o drm_writeback.o drm_client.o \
 	drm_client_modeset.o drm_atomic_uapi.o drm_hdcp.o \
-	drm_managed.o drm_vblank_work.o
+	drm_managed.o drm_vblank_work.o drm_memcpy.o \
 
 drm-$(CONFIG_DRM_LEGACY) += drm_agpsupport.o drm_bufs.o drm_context.o drm_dma.o \
 	drm_legacy_misc.o drm_lock.o drm_memory.o drm_scatter.o \
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 3d8d68a98b95..351cc2900cf1 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1041,6 +1042,7 @@ static int __init drm_core_init(void)
 	drm_connector_ida_init();
 	idr_init(&drm_minors_idr);
+	drm_memcpy_init_early();
 
 	ret = drm_sysfs_init();
 	if (ret < 0) {
diff --git a/drivers/gpu/drm/i915/i915_memcpy.c b/drivers/gpu/drm/drm_memcpy.c
similarity index 84%
rename from drivers/gpu/drm/i915/i915_memcpy.c
rename to drivers/gpu/drm/drm_memcpy.c
index 1b021a4902de..03688425a096 100644
--- a/drivers/gpu/drm/i915/i915_memcpy.c
+++ b/drivers/gpu/drm/drm_memcpy.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: MIT
 /*
  * Copyright © 2016 Intel Corporation
  *
@@ -22,16 +23,11 @@
  *
  */
 
+#ifdef CONFIG_X86
 #include
 #include
 
-#include "i915_memcpy.h"
-
-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
-#define CI_BUG_ON(expr) BUG_ON(expr)
-#else
-#define CI_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
-#endif
+#include "drm/drm_memcpy.h"
 
 static DEFINE_STATIC_KEY_FALSE(has_movntdqa);
 
@@ -94,23 +90,23 @@ static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
 }
 
 /**
- * i915_memcpy_from_wc: perform an accelerated *aligned* read from WC
+ * drm_memcpy_from_wc: perform an accelerated *aligned* read from WC
 * @dst: destination pointer
 * @src: source pointer
 * @len: how many bytes to copy
 *
- *
i915_memcpy_from_wc copies @len bytes from @src to @dst using + * drm_memcpy_from_wc copies @len bytes from @src to @dst using * non-temporal instructions where available. Note that all arguments * (@src, @dst) must be aligned to 16 bytes and @len must be a multiple * of 16. * * To test whether accelerated reads from WC are supported, use - * i915_memcpy_from_wc(NULL, NULL, 0); + * drm_memcpy_from_wc(NULL, NULL, 0); * * Returns true if the copy was successful, false if the preconditions * are not met. */ -bool i915_memcpy_from_wc(void *dst, const void *src, unsigned long len) +bool drm_memcpy_from_wc(void *dst, const void *src, unsigned long len) { if (unlikely(((unsigned long)dst | (unsigned long)src | len) & 15)) return false; @@ -123,24 +119,23 @@ bool i915_memcpy_from_wc(void *dst, const void *src, unsigned long len) return false; } +EXPORT_SYMBOL(drm_memcpy_from_wc); /** - * i915_unaligned_memcpy_from_wc: perform a mostly accelerated read from WC + * drm_unaligned_memcpy_from_wc: perform a mostly accelerated read from WC * @dst: destination pointer * @src: source pointer * @len: how many bytes to copy * - * Like i915_memcpy_from_wc(), the unaligned variant copies @len bytes from + * Like drm_memcpy_from_wc(), the unaligned variant copies @len bytes from * @src to @dst using * non-temporal instructions where available, but * accepts that its arguments may not be aligned, but are valid for the * potential 16-byte read past the end. */ -void i915_unaligned_memcpy_from_wc(void *dst, const void *src, unsigned long len) +void drm_unaligned_memcpy_from_wc(void *dst, const void *src, unsigned long len) { unsigned long addr; - CI_BUG_ON(!i915_has_memcpy_from_wc()); - addr = (unsigned long)src; if (!IS_ALIGNED(addr, 16)) { unsigned long x = min(ALIGN(addr, 16) - addr, len); @@ -155,8 +150,9 @@ void i915_unaligned_memcpy_from_wc(void *dst, const void *src, unsigned long len if (likely(len)) __memcpy_ntdqu(dst, src, DIV_ROUND_UP(len, 16)); } +EXPORT_SYMBOL(drm_unaligned_memcpy_from_wc); -void i915_memcpy_init_early(struct drm_i915_private *dev_priv) +void drm_memcpy_init_early(void) { /* * Some hypervisors (e.g. 
KVM) don't support VEX-prefix instructions @@ -166,3 +162,4 @@ void i915_memcpy_init_early(struct drm_i915_private *dev_priv) !boot_cpu_has(X86_FEATURE_HYPERVISOR)) static_branch_enable(&has_movntdqa); } +#endif diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index cb8823570996..998606b7f49f 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -61,7 +61,6 @@ i915-y += i915_drv.o \ # core library code i915-y += \ dma_resv_utils.o \ - i915_memcpy.o \ i915_mm.o \ i915_sw_fence.o \ i915_sw_fence_work.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 297143511f99..77285e421fb8 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -10,6 +10,7 @@ #include #include +#include #include "display/intel_frontbuffer.h" @@ -28,7 +29,6 @@ #include "i915_sw_fence_work.h" #include "i915_trace.h" #include "i915_user_extensions.h" -#include "i915_memcpy.h" struct eb_vma { struct i915_vma *vma; @@ -2503,7 +2503,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb, !(batch->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ); pw->batch_map = ERR_PTR(-ENODEV); - if (needs_clflush && i915_has_memcpy_from_wc()) + if (needs_clflush && drm_has_memcpy_from_wc()) pw->batch_map = i915_gem_object_pin_map(batch, I915_MAP_WC); if (IS_ERR(pw->batch_map)) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index c8953e3f5c70..f1eb857fa172 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -24,6 +24,8 @@ #include +#include + #include "display/intel_frontbuffer.h" #include "i915_drv.h" #include "i915_gem_clflush.h" @@ -31,7 +33,6 @@ #include "i915_gem_mman.h" #include "i915_gem_object.h" #include "i915_globals.h" -#include "i915_memcpy.h" #include "i915_trace.h" static struct i915_global_object { @@ -374,7 +375,7 @@ i915_gem_object_read_from_page_iomap(struct drm_i915_gem_object *obj, u64 offset PAGE_SIZE); src_ptr = src_map + offset_in_page(offset); - if (!i915_memcpy_from_wc(dst, (void __force *)src_ptr, size)) + if (!drm_memcpy_from_wc(dst, (void __force *)src_ptr, size)) memcpy_fromio(dst, src_ptr, size); io_mapping_unmap(src_map); diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c index 8784257ec808..92ada67a3835 100644 --- a/drivers/gpu/drm/i915/gt/selftest_reset.c +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c @@ -5,9 +5,10 @@ #include +#include + #include "gem/i915_gem_stolen.h" -#include "i915_memcpy.h" #include "i915_selftest.h" #include "intel_gpu_commands.h" #include "selftests/igt_reset.h" @@ -99,7 +100,7 @@ __igt_reset_stolen(struct intel_gt *gt, memset_io(s, STACK_MAGIC, PAGE_SIZE); in = (void __force *)s; - if (i915_memcpy_from_wc(tmp, in, PAGE_SIZE)) + if (drm_memcpy_from_wc(tmp, in, PAGE_SIZE)) in = tmp; crc[page] = crc32_le(0, in, PAGE_SIZE); @@ -135,7 +136,7 @@ __igt_reset_stolen(struct intel_gt *gt, PAGE_SIZE); in = (void __force *)s; - if (i915_memcpy_from_wc(tmp, in, PAGE_SIZE)) + if (drm_memcpy_from_wc(tmp, in, PAGE_SIZE)) in = tmp; x = crc32_le(0, in, PAGE_SIZE); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c index c36d5eb5bbb9..f045e42be6ca 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c @@ -5,9 +5,10 @@ #include +#include + #include "gt/intel_gt.h" 
#include "i915_drv.h" -#include "i915_memcpy.h" #include "intel_guc_log.h" static void guc_log_capture_logs(struct intel_guc_log *log); @@ -295,13 +296,13 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log) /* Just copy the newly written data */ if (read_offset > write_offset) { - i915_memcpy_from_wc(dst_data, src_data, write_offset); + drm_memcpy_from_wc(dst_data, src_data, write_offset); bytes_to_copy = buffer_size - read_offset; } else { bytes_to_copy = write_offset - read_offset; } - i915_memcpy_from_wc(dst_data + read_offset, - src_data + read_offset, bytes_to_copy); + drm_memcpy_from_wc(dst_data + read_offset, + src_data + read_offset, bytes_to_copy); src_data += buffer_size; dst_data += buffer_size; @@ -569,7 +570,7 @@ int intel_guc_log_relay_open(struct intel_guc_log *log) * it should be present on the chipsets supporting GuC based * submisssions. */ - if (!i915_has_memcpy_from_wc()) { + if (!drm_has_memcpy_from_wc()) { ret = -ENXIO; goto out_unlock; } diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index 5b4b2bd46e7c..98653f1a2b1d 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -24,12 +24,12 @@ * Brad Volkin * */ +#include #include "gt/intel_engine.h" #include "gt/intel_gpu_commands.h" #include "i915_drv.h" -#include "i915_memcpy.h" /** * DOC: batch buffer command parser @@ -1152,7 +1152,7 @@ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj, if (src) { GEM_BUG_ON(!needs_clflush); - i915_unaligned_memcpy_from_wc(dst, src + offset, length); + drm_unaligned_memcpy_from_wc(dst, src + offset, length); } else { struct scatterlist *sg; void *ptr; diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 122dd297b6af..0df9dd62c717 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -72,7 +72,6 @@ #include "i915_drv.h" #include "i915_ioc32.h" #include "i915_irq.h" -#include "i915_memcpy.h" #include "i915_perf.h" #include "i915_query.h" #include "i915_suspend.h" @@ -325,7 +324,6 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv) mutex_init(&dev_priv->pps_mutex); mutex_init(&dev_priv->hdcp_comp_mutex); - i915_memcpy_init_early(dev_priv); intel_runtime_pm_init_early(&dev_priv->runtime_pm); ret = i915_workqueues_init(dev_priv); diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 99ca242ec13b..ee11920fbea5 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -34,6 +34,7 @@ #include #include +#include #include #include "display/intel_csr.h" @@ -46,7 +47,6 @@ #include "i915_drv.h" #include "i915_gpu_error.h" -#include "i915_memcpy.h" #include "i915_scatterlist.h" #define ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN) @@ -255,7 +255,7 @@ static bool compress_init(struct i915_vma_compress *c) } c->tmp = NULL; - if (i915_has_memcpy_from_wc()) + if (drm_has_memcpy_from_wc()) c->tmp = pool_alloc(&c->pool, ALLOW_FAIL); return true; @@ -295,7 +295,7 @@ static int compress_page(struct i915_vma_compress *c, struct z_stream_s *zstream = &c->zstream; zstream->next_in = src; - if (wc && c->tmp && i915_memcpy_from_wc(c->tmp, src, PAGE_SIZE)) + if (wc && c->tmp && drm_memcpy_from_wc(c->tmp, src, PAGE_SIZE)) zstream->next_in = c->tmp; zstream->avail_in = PAGE_SIZE; @@ -395,7 +395,7 @@ static int compress_page(struct i915_vma_compress *c, if (!ptr) return -ENOMEM; - if (!(wc && i915_memcpy_from_wc(ptr, src, 
PAGE_SIZE))) + if (!(wc && drm_memcpy_from_wc(ptr, src, PAGE_SIZE))) memcpy(ptr, src, PAGE_SIZE); dst->pages[dst->page_count++] = ptr; cond_resched(); diff --git a/drivers/gpu/drm/i915/i915_memcpy.h b/drivers/gpu/drm/i915/i915_memcpy.h deleted file mode 100644 index 3df063a3293b..000000000000 --- a/drivers/gpu/drm/i915/i915_memcpy.h +++ /dev/null @@ -1,34 +0,0 @@ -/* SPDX-License-Identifier: MIT */ -/* - * Copyright © 2019 Intel Corporation - */ - -#ifndef __I915_MEMCPY_H__ -#define __I915_MEMCPY_H__ - -#include - -struct drm_i915_private; - -void i915_memcpy_init_early(struct drm_i915_private *i915); - -bool i915_memcpy_from_wc(void *dst, const void *src, unsigned long len); -void i915_unaligned_memcpy_from_wc(void *dst, const void *src, unsigned long len); - -/* The movntdqa instructions used for memcpy-from-wc require 16-byte alignment, - * as well as SSE4.1 support. i915_memcpy_from_wc() will report if it cannot - * perform the operation. To check beforehand, pass in the parameters to - * to i915_can_memcpy_from_wc() - since we only care about the low 4 bits, - * you only need to pass in the minor offsets, page-aligned pointers are - * always valid. - * - * For just checking for SSE4.1, in the foreknowledge that the future use - * will be correctly aligned, just use i915_has_memcpy_from_wc(). - */ -#define i915_can_memcpy_from_wc(dst, src, len) \ - i915_memcpy_from_wc((void *)((unsigned long)(dst) | (unsigned long)(src) | (len)), NULL, 0) - -#define i915_has_memcpy_from_wc() \ - i915_memcpy_from_wc(NULL, NULL, 0) - -#endif /* __I915_MEMCPY_H__ */ diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index c85d516b85cd..6bb399e9be78 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -6,6 +6,8 @@ #include #include +#include + #include "../i915_selftest.h" #include "mock_drm.h" @@ -20,7 +22,6 @@ #include "gem/selftests/mock_context.h" #include "gt/intel_engine_user.h" #include "gt/intel_gt.h" -#include "i915_memcpy.h" #include "selftests/igt_flush_test.h" #include "selftests/i915_random.h" @@ -901,7 +902,7 @@ static inline void igt_memcpy(void *dst, const void *src, size_t size) static inline void igt_memcpy_from_wc(void *dst, const void *src, size_t size) { - i915_memcpy_from_wc(dst, src, size); + drm_memcpy_from_wc(dst, src, size); } static int _perf_memcpy(struct intel_memory_region *src_mr, @@ -925,7 +926,7 @@ static int _perf_memcpy(struct intel_memory_region *src_mr, { "memcpy_from_wc", igt_memcpy_from_wc, - !i915_has_memcpy_from_wc(), + !drm_has_memcpy_from_wc(), }, }; struct drm_i915_gem_object *src, *dst; diff --git a/include/drm/drm_memcpy.h b/include/drm/drm_memcpy.h new file mode 100644 index 000000000000..dbf52cd43382 --- /dev/null +++ b/include/drm/drm_memcpy.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2019 Intel Corporation + */ + +#ifndef __DRM_MEMCPY_H__ +#define __DRM_MEMCPY_H__ + +#include + +#ifdef CONFIG_X86 +bool drm_memcpy_from_wc(void *dst, const void *src, unsigned long len); +void drm_unaligned_memcpy_from_wc(void *dst, const void *src, unsigned long len); + +/* The movntdqa instructions used for memcpy-from-wc require 16-byte alignment, + * as well as SSE4.1 support. drm_memcpy_from_wc() will report if it cannot + * perform the operation. 
To check beforehand, pass in the parameters to + * drm_can_memcpy_from_wc() - since we only care about the low 4 bits, + * you only need to pass in the minor offsets, page-aligned pointers are + * always valid. + * + * For just checking for SSE4.1, in the foreknowledge that the future use + * will be correctly aligned, just use drm_has_memcpy_from_wc(). + */ +#define drm_can_memcpy_from_wc(dst, src, len) \ + drm_memcpy_from_wc((void *)((unsigned long)(dst) | (unsigned long)(src) | (len)), NULL, 0) + +#define drm_has_memcpy_from_wc() \ + drm_memcpy_from_wc(NULL, NULL, 0) + +void drm_memcpy_init_early(void); + +#else + +#define drm_memcpy_from_wc(_dst, _src, _len) (false) +#define drm_can_memcpy_from_wc(_dst, _src, _len) (false) +#define drm_has_memcpy_from_wc() (false) +#define drm_unaligned_memcpy_from_wc(_dst, _src, _len) WARN_ON(1) +#define drm_memcpy_init_early() do {} while (0) +#endif /* CONFIG_X86 */ +#endif /* __DRM_MEMCPY_H__ */ From patchwork Thu May 20 15:09:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 12270783 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F26FC433B4 for ; Thu, 20 May 2021 15:10:46 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 44F6360C3D for ; Thu, 20 May 2021 15:10:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 44F6360C3D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id CA94D6E222; Thu, 20 May 2021 15:10:40 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id F3F7A6F476; Thu, 20 May 2021 15:10:38 +0000 (UTC) IronPort-SDR: soodJrzJrGB2H3JHvLrO4VoE/mbQFtgYRJRpAjbeiI+VmQu6BU5RTlnwvKmO3vD43qE/TxHePD 3xsePCamKJDw== X-IronPort-AV: E=McAfee;i="6200,9189,9989"; a="222341204" X-IronPort-AV: E=Sophos;i="5.82,313,1613462400"; d="scan'208";a="222341204" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 May 2021 08:10:04 -0700 IronPort-SDR: sWL1XLZ6gJ82jJQCMa9OAbhu+4UAbec+tsXQVHEYidVGiZ7eko4VIQbkixvId+o+XhOP7mnqS2 Vvw0HCh3IkiQ== X-IronPort-AV: E=Sophos;i="5.82,313,1613462400"; d="scan'208";a="395728192" Received: from cbjoerns-mobl1.ger.corp.intel.com (HELO thellst-mobl1.intel.com) ([10.249.254.247]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 May 2021 08:10:03 -0700 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Date: Thu, 20 May 2021 17:09:45 +0200 Message-Id: <20210520150947.803891-4-thomas.hellstrom@linux.intel.com> X-Mailer: 
From patchwork Thu May 20 15:09:45 2021
From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, Christian König, Daniel Vetter
Date: Thu, 20 May 2021 17:09:45 +0200
Message-Id: <20210520150947.803891-4-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20210520150947.803891-1-thomas.hellstrom@linux.intel.com>
References: <20210520150947.803891-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [RFC PATCH 3/5] drm/ttm: Use drm_memcpy_from_wc for TTM bo moves

Use the fast wc memcpy for reading out of wc memory for TTM bo moves.

Cc: Dave Airlie
Cc: Christian König
Cc: Daniel Vetter
Signed-off-by: Thomas Hellström
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index bad9b16e96ba..919ee03f7eb3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -31,6 +31,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -185,6 +186,7 @@ void ttm_move_memcpy(struct ttm_buffer_object *bo,
 	struct ttm_resource *old_mem = &bo->mem;
 	struct ttm_resource_manager *old_man = ttm_manager_type(bdev, old_mem->mem_type);
 	struct dma_buf_map old_map, new_map;
+	bool wc_memcpy;
 	pgoff_t i;
 
 	/* Single TTM move. NOP */
@@ -208,11 +210,25 @@ void ttm_move_memcpy(struct ttm_buffer_object *bo,
 		return;
 	}
 
+	wc_memcpy = ((!old_man->use_tt || bo->ttm->caching != ttm_cached) &&
+		     drm_has_memcpy_from_wc());
+
+	/*
+	 * We use some nasty aliasing for drm_memcpy_from_wc, but assuming
+	 * that we can move to memremapping in the not too distant future,
+	 * reduce the fragility for now with a build assert.
+	 */
+	BUILD_BUG_ON(offsetof(typeof(old_map), vaddr) !=
+		     offsetof(typeof(old_map), vaddr_iomem));
+
 	for (i = 0; i < new_mem->num_pages; ++i) {
 		new_iter->ops->kmap_local(new_iter, &new_map, i);
 		old_iter->ops->kmap_local(old_iter, &old_map, i);
 
-		if (!old_map.is_iomem && !new_map.is_iomem) {
+		if (wc_memcpy) {
+			drm_memcpy_from_wc(new_map.vaddr, old_map.vaddr,
+					   PAGE_SIZE);
+		} else if (!old_map.is_iomem && !new_map.is_iomem) {
 			memcpy(new_map.vaddr, old_map.vaddr, PAGE_SIZE);
 		} else if (!old_map.is_iomem) {
 			dma_buf_map_memcpy_to(&new_map, old_map.vaddr,

From patchwork Thu May 20 15:09:46 2021
From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, Christian König
Date: Thu, 20 May 2021 17:09:46 +0200
Message-Id: <20210520150947.803891-5-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20210520150947.803891-1-thomas.hellstrom@linux.intel.com>
References: <20210520150947.803891-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [RFC PATCH 4/5] drm/ttm: Document and optimize ttm_bo_pipeline_gutting()

If the bo is idle when calling ttm_bo_pipeline_gutting(), we unnecessarily
create a ghost object and push it out to delayed destroy. Fix this by adding
a path for the idle case, and document the function.

Also avoid leaving the bo in a bad state, vulnerable to user-space-triggered
kernel BUGs, if the call to ttm_tt_create() fails.

Finally, reuse ttm_bo_pipeline_gutting() in ttm_bo_evict().

Cc: Christian König
Signed-off-by: Thomas Hellström
---
 drivers/gpu/drm/ttm/ttm_bo.c      | 20 +++++-----
 drivers/gpu/drm/ttm/ttm_bo_util.c | 63 ++++++++++++++++++++++++-------
 drivers/gpu/drm/ttm/ttm_tt.c      |  5 +++
 include/drm/ttm/ttm_tt.h          | 10 +++++
 4 files changed, 75 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ca1b098b6a56..a8fa3375b8aa 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -501,10 +501,15 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 	bdev->funcs->evict_flags(bo, &placement);
 
 	if (!placement.num_placement && !placement.num_busy_placement) {
-		ttm_bo_wait(bo, false, false);
+		ret = ttm_bo_wait(bo, true, false);
+		if (ret)
+			return ret;
 
-		ttm_bo_cleanup_memtype_use(bo);
-		return ttm_tt_create(bo, false);
+		/*
+		 * Since we've already synced, this frees backing store
+		 * immediately.
+		 */
+		return ttm_bo_pipeline_gutting(bo);
 	}
 
 	ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx);
@@ -974,13 +979,8 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
 	/*
 	 * Remove the backing store if no placement is given.
 	 */
-	if (!placement->num_placement && !placement->num_busy_placement) {
-		ret = ttm_bo_pipeline_gutting(bo);
-		if (ret)
-			return ret;
-
-		return ttm_tt_create(bo, false);
-	}
+	if (!placement->num_placement && !placement->num_busy_placement)
+		return ttm_bo_pipeline_gutting(bo);
 
 	/*
 	 * Check whether we need to move buffer.
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 919ee03f7eb3..1860e2e7563f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -479,7 +479,8 @@ static void ttm_transfered_destroy(struct ttm_buffer_object *bo)
  */
 
 static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
-				      struct ttm_buffer_object **new_obj)
+				      struct ttm_buffer_object **new_obj,
+				      bool realloc_tt)
 {
 	struct ttm_transfer_obj *fbo;
 	int ret;
@@ -493,6 +494,17 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	ttm_bo_get(bo);
 	fbo->bo = bo;
 
+	if (realloc_tt) {
+		bo->ttm = NULL;
+		ret = ttm_tt_create(bo, true);
+		if (ret) {
+			bo->ttm = fbo->base.ttm;
+			kfree(fbo);
+			ttm_bo_put(bo);
+			return ret;
+		}
+	}
+
 	/**
 	 * Fix up members that we shouldn't copy directly:
 	 * TODO: Explicit member copy would probably be better here.
@@ -763,7 +775,7 @@ static int ttm_bo_move_to_ghost(struct ttm_buffer_object *bo,
 	dma_fence_put(bo->moving);
 	bo->moving = dma_fence_get(fence);
 
-	ret = ttm_buffer_object_transfer(bo, &ghost_obj);
+	ret = ttm_buffer_object_transfer(bo, &ghost_obj, false);
 	if (ret)
 		return ret;
 
@@ -836,26 +848,51 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
 }
 EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
 
+/**
+ * ttm_bo_pipeline_gutting - purge the contents of a bo
+ * @bo: The buffer object
+ *
+ * Purge the contents of a bo, async if the bo is not idle.
+ * After a successful call, the bo is left unpopulated in + * system placement. The function may wait uninterruptible + * for idle on OOM. + * + * Return: 0 if successful, negative error code on failure. + */ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo) { static const struct ttm_place sys_mem = { .mem_type = TTM_PL_SYSTEM }; struct ttm_buffer_object *ghost; int ret; - ret = ttm_buffer_object_transfer(bo, &ghost); - if (ret) - return ret; + /* If already idle, no need for ghost object dance. */ + ret = ttm_bo_wait(bo, false, true); + if (ret == -EBUSY) { + ret = ttm_buffer_object_transfer(bo, &ghost, true); + if (ret) + return ret; - ret = dma_resv_copy_fences(&ghost->base._resv, bo->base.resv); - /* Last resort, wait for the BO to be idle when we are OOM */ - if (ret) - ttm_bo_wait(bo, false, false); + ret = dma_resv_copy_fences(&ghost->base._resv, bo->base.resv); + /* Last resort, wait for the BO to be idle when we are OOM */ + if (ret) + ttm_bo_wait(bo, false, false); - ttm_resource_alloc(bo, &sys_mem, &bo->mem); - bo->ttm = NULL; + dma_resv_unlock(&ghost->base._resv); + ttm_bo_put(ghost); + } else { + if (!bo->ttm) { + ret = ttm_tt_create(bo, true); + if (ret) + return ret; + } else { + ttm_tt_unpopulate(bo->bdev, bo->ttm); + if (bo->type == ttm_bo_type_device) + ttm_tt_mark_for_clear(bo->ttm); + } + ttm_resource_free(bo, &bo->mem); + } - dma_resv_unlock(&ghost->base._resv); - ttm_bo_put(ghost); + ttm_resource_alloc(bo, &sys_mem, &bo->mem); return 0; } diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 539e0232cb3b..0b1053e93db2 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -134,6 +134,11 @@ void ttm_tt_destroy_common(struct ttm_device *bdev, struct ttm_tt *ttm) } EXPORT_SYMBOL(ttm_tt_destroy_common); +void ttm_tt_mark_for_clear(struct ttm_tt *ttm) +{ + ttm->page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC; +} + void ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm) { bdev->funcs->ttm_tt_destroy(bdev, ttm); diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index 134d09ef7766..91552c83ac79 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -157,6 +157,16 @@ int ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm, struct ttm_oper */ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm); +/** + * ttm_tt_mark_for_clear - Mark pages for clearing on populate. + * + * @ttm: Pointer to the ttm_tt structure + * + * Marks pages for clearing so that the next time the page vector is + * populated, the pages will be cleared. 
From patchwork Thu May 20 15:09:47 2021
X-Patchwork-Submitter: Thomas Hellstrom
X-Patchwork-Id: 12270787
From: Thomas Hellström
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Date: Thu, 20 May 2021 17:09:47 +0200
Message-Id: <20210520150947.803891-6-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20210520150947.803891-1-thomas.hellstrom@linux.intel.com>
References: <20210520150947.803891-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [RFC PATCH 5/5] drm/ttm, drm/amdgpu: Allow the driver some control over swapping
Cc: Thomas Hellström, Christian König
Sender: "Intel-gfx"

We are calling the eviction_valuable driver callback at eviction time
to determine whether we actually can evict a buffer object.
The upcoming i915 TTM backend needs the same functionality for swapout,
and that might actually be beneficial to other drivers as well.

Add an eviction_valuable call also in the swapout path. Try to keep the
current behaviour for all drivers by returning true if the buffer object
is already in the TTM_PL_SYSTEM placement. We do change behaviour for
the case where a buffer object is in a TT backed placement when swapped
out, in which case the driver's normal eviction_valuable path is run.

Finally, make sure we don't try to swap out a bo that was recently
purged and is therefore unpopulated.

Cc: Christian König
Signed-off-by: Thomas Hellström
---
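A sketch of the resulting driver-side contract (modeled on the amdgpu
hunk below; my_driver_eviction_valuable is an invented name): the
callback is now consulted for swapout as well, and a bo already in
TTM_PL_SYSTEM keeps the legacy always-swappable behaviour.

static bool my_driver_eviction_valuable(struct ttm_buffer_object *bo,
					const struct ttm_place *place)
{
	/*
	 * Swapout: a bo that already sits in TTM_PL_SYSTEM keeps the
	 * legacy behaviour and is always considered swappable.
	 */
	if (bo->mem.mem_type == TTM_PL_SYSTEM)
		return true;

	/* Driver-specific veto logic would go here. */

	/* Fall back to the core placement-range check. */
	return ttm_bo_eviction_valuable(bo, place);
}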
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  4 +++
 drivers/gpu/drm/ttm/ttm_bo.c            | 41 +++++++++++++++----------
 drivers/gpu/drm/ttm/ttm_tt.c            |  4 +++
 3 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8c7ec09eb1a4..d5a9d7a88315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1399,6 +1399,10 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 	struct dma_fence *f;
 	int i;
 
+	/* Swapout? */
+	if (bo->mem.mem_type == TTM_PL_SYSTEM)
+		return true;
+
 	if (bo->type == ttm_bo_type_kernel &&
 	    !amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo)))
 		return false;
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index a8fa3375b8aa..3f7c64a1cda1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -536,6 +536,10 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 			      const struct ttm_place *place)
 {
+	dma_resv_assert_held(bo->base.resv);
+	if (bo->mem.mem_type == TTM_PL_SYSTEM)
+		return true;
+
 	/* Don't evict this BO if it's outside of the
 	 * requested placement range
 	 */
@@ -558,7 +562,9 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
  * b. Otherwise, trylock it.
  */
 static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
-					   struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
+					   struct ttm_operation_ctx *ctx,
+					   const struct ttm_place *place,
+					   bool *locked, bool *busy)
 {
 	bool ret = false;
 
@@ -576,6 +582,12 @@ static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
 		*busy = !ret;
 	}
 
+	if (ret && place && !bo->bdev->funcs->eviction_valuable(bo, place)) {
+		ret = false;
+		if (locked)
+			dma_resv_unlock(bo->base.resv);
+	}
+
 	return ret;
 }
 
@@ -630,20 +642,14 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
 		list_for_each_entry(bo, &man->lru[i], lru) {
 			bool busy;
 
-			if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
-							    &busy)) {
+			if (!ttm_bo_evict_swapout_allowable(bo, ctx, place,
+							    &locked, &busy)) {
 				if (busy && !busy_bo && ticket !=
 				    dma_resv_locking_ctx(bo->base.resv))
 					busy_bo = bo;
 				continue;
 			}
 
-			if (place && !bdev->funcs->eviction_valuable(bo,
-								     place)) {
-				if (locked)
-					dma_resv_unlock(bo->base.resv);
-				continue;
-			}
 			if (!ttm_bo_get_unless_zero(bo)) {
 				if (locked)
 					dma_resv_unlock(bo->base.resv);
@@ -1138,10 +1144,18 @@ EXPORT_SYMBOL(ttm_bo_wait);
 int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 		   gfp_t gfp_flags)
 {
+	struct ttm_place place = {};
 	bool locked;
 	int ret;
 
-	if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL))
+	/*
+	 * While the bo may already reside in SYSTEM placement, set
+	 * SYSTEM as new placement to cover also the move further below.
+	 * The driver may use the fact that we're moving from SYSTEM
+	 * as an indication that we're about to swap out.
+	 */
+	place.mem_type = TTM_PL_SYSTEM;
+	if (!ttm_bo_evict_swapout_allowable(bo, ctx, &place, &locked, NULL))
 		return -EBUSY;
 
 	if (!ttm_bo_get_unless_zero(bo)) {
@@ -1166,12 +1180,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 	if (bo->mem.mem_type != TTM_PL_SYSTEM) {
 		struct ttm_operation_ctx ctx = { false, false };
 		struct ttm_resource evict_mem;
-		struct ttm_place place, hop;
-
-		memset(&place, 0, sizeof(place));
-		memset(&hop, 0, sizeof(hop));
-
-		place.mem_type = TTM_PL_SYSTEM;
+		struct ttm_place hop = {};
 
 		ret = ttm_resource_alloc(bo, &place, &evict_mem);
 		if (unlikely(ret))
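For reference, the core default with this patch applied, in plain form.
The added lines come from the first ttm_bo.c hunk above; the
placement-range check is the pre-existing function body from the
contemporaneous tree, quoted here for completeness rather than taken
from this diff:

bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
			      const struct ttm_place *place)
{
	dma_resv_assert_held(bo->base.resv);
	/* A bo already in SYSTEM is always valuable to swap out. */
	if (bo->mem.mem_type == TTM_PL_SYSTEM)
		return true;

	/* Don't evict this BO if it's outside of the
	 * requested placement range
	 */
	if (place->fpfn >= (bo->mem.start + bo->mem.num_pages) ||
	    (place->lpfn && place->lpfn <= bo->mem.start))
		return false;

	return true;
}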
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 0b1053e93db2..d46c702ce0ca 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -263,6 +263,9 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
 	struct page *to_page;
 	int i, ret;
 
+	if (!ttm_tt_is_populated(ttm))
+		return 0;
+
 	swap_storage = shmem_file_setup("ttm swap", size, 0);
 	if (IS_ERR(swap_storage)) {
 		pr_err("Failed allocating swap storage\n");
@@ -404,6 +407,7 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
 
 	ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED;
 }
+EXPORT_SYMBOL(ttm_tt_unpopulate);
 
 #ifdef CONFIG_DEBUG_FS