Message ID | 20200220122719.4302-1-thomas_os@shipmail.org (mailing list archive)
---|---
Series | Huge page-table entries for TTM
Andrew, Michal,

I'm wondering what's the best way here to get the patches touching mm reviewed and accepted? While drm people and VMware internal people have looked at them, I think the huge_fault() fallback splitting and the introduction of vma_is_special_huge() need looking at more thoroughly.

Apart from that, if possible, I think the best way to merge this series is also through a DRM tree.

Thanks,
Thomas

On 2/20/20 1:27 PM, Thomas Hellström (VMware) wrote:
> In order to reduce TLB misses and CPU usage, this patchset enables huge and giant page-table entries for TTM and TTM-enabled graphics drivers.
>
> Patches 1 and 2 introduce a vma_is_special_huge() function to make the mm code take the same path as DAX when splitting huge and giant page-table entries (which currently means zapping the page-table entry and relying on re-faulting).
>
> Patch 3 makes the mm code split existing huge page-table entries on huge_fault fallbacks, typically on COW or on buffer objects that want write-notify. COW and write-notification are always done on the lowest page-table level. See the patch log message for additional considerations.
>
> Patch 4 introduces functions to allow the graphics drivers to manipulate the caching and encryption flags of huge page-table entries without ugly hacks.
>
> Patch 5 implements the huge_fault handler in TTM. This enables huge page-table entries, provided that the kernel is configured to support transhuge pages, either by default or using madvise(). However, they are unlikely to be inserted unless the kernel buffer-object pfns and user-space addresses align perfectly. There are various options here, but since buffer objects that reside in system pages typically start at huge-page boundaries if they are backed by huge pages, we try to enforce buffer-object starting pfns and user-space addresses to be huge-page-size aligned if their size exceeds a huge-page size. If pud-size transhuge ("giant") pages are enabled by the arch, the same holds for those.
>
> Patch 6 implements a specialized huge_fault handler for vmwgfx. The vmwgfx driver may perform dirty-tracking and needs some special code to handle that correctly.
>
> Patch 7 implements a drm helper to align user-space addresses according to the above scheme, if possible.
>
> Patch 8 implements a TTM range manager for vmwgfx that does the same for graphics IO memory. This may later be reused by other graphics drivers if necessary.
>
> Patch 9 finally hooks up the helpers of patches 7 and 8 to the vmwgfx driver. A similar change is needed for graphics drivers that want a reasonable likelihood of actually using huge page-table entries.
>
> If a buffer-object size is not huge-page or giant-page aligned, its size will NOT be inflated by this patchset. This means that the buffer-object tail will use smaller-size page-table entries, and thus no memory overhead occurs. Drivers that want to pay the memory-overhead price need to implement their own scheme to inflate buffer-object sizes.
>
> PMD-size huge page-table entries have been tested with vmwgfx and found to work well with both system-memory-backed and IO-memory-backed buffer objects.
>
> PUD-size giant page-table entries have seen limited (fault and COW) testing using a modified kernel (to support 1GB page allocations) and a fake vmwgfx TTM memory type. The vmwgfx driver does not otherwise support 1GB-size IO memory resources.
>
> Comments and suggestions welcome.
> Thomas
>
> Changes since RFC:
> * Check for buffer objects present in contiguous IO memory (Christian König)
> * Rebased on the vmwgfx emulated coherent memory functionality. That rebase adds patch 5.
>
> Changes since v1:
> * Make the new TTM range manager vmwgfx-specific. (Christian König)
> * Minor fixes for configs that don't support, or only partially support, transhuge pages.
>
> Changes since v2:
> * Minor coding-style and doc fixes in patch 5/9 (Christian König)
> * Patch 5/9 doesn't touch mm. Remove from the patch title.
>
> Changes since v3:
> * Added reviews and acks
> * Implemented an ugly but generic ttm_pgprot_is_wrprotecting() instead of arch-specific code.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
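For readers who don't have the patches in front of them, the idea behind the vma_is_special_huge() helper from patches 1 and 2 can be sketched as follows. This is an illustrative reconstruction from the description above, not necessarily the exact code that was merged:

```c
#include <linux/mm.h>

/*
 * Sketch: a VMA is "special huge" if its huge entries map raw pfns
 * without backing struct pages (VM_PFNMAP/VM_MIXEDMAP on a file
 * mapping), or if it is a DAX mapping. For such VMAs the mm cannot
 * materialize a page table of ordinary pages when splitting a huge
 * entry; instead it zaps the entry and relies on re-faulting, the
 * same path DAX takes today.
 */
static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
{
	return vma_is_dax(vma) || (vma->vm_file &&
				   (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
}
```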
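The user-space address alignment of patch 7 follows a well-known pattern (compare the kernel's thp_get_unmapped_area()): ask for a slightly larger free area and round the result up. The helper below is hypothetical, a minimal sketch of the scheme under the assumption that CONFIG_TRANSPARENT_HUGEPAGE is enabled; the name and details are illustrative, and it is not the drm helper from the series:

```c
#include <linux/err.h>
#include <linux/huge_mm.h>
#include <linux/mm.h>

/*
 * Hypothetical sketch: for mappings of at least PMD size, search for a
 * free area large enough to contain a huge-page-aligned sub-range and
 * return the aligned start, so that aligned buffer-object pfns can be
 * mapped with PMD entries. Assumes CONFIG_TRANSPARENT_HUGEPAGE.
 */
static unsigned long
sketch_get_unmapped_area(struct file *file, unsigned long uaddr,
			 unsigned long len, unsigned long pgoff,
			 unsigned long flags)
{
	unsigned long align = 0, addr;

	if (len >= HPAGE_PMD_SIZE)
		align = HPAGE_PMD_SIZE;

	if (align) {
		/* Over-ask so an aligned sub-range always fits. */
		addr = current->mm->get_unmapped_area(file, uaddr,
						      len + align - PAGE_SIZE,
						      pgoff, flags);
		if (!IS_ERR_VALUE(addr))
			return ALIGN(addr, align);
	}

	return current->mm->get_unmapped_area(file, uaddr, len, pgoff, flags);
}
```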
On Fri, 28 Feb 2020 14:08:04 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:

> I'm wondering what's the best way here to get the patches touching mm reviewed and accepted? While drm people and VMware internal people have looked at them, I think the huge_fault() fallback splitting and the introduction of vma_is_special_huge() need looking at more thoroughly.
>
> Apart from that, if possible, I think the best way to merge this series is also through a DRM tree.

Patches 1-3 look OK to me. I just had a few commenting/changelogging niggles.
On Thu, 20 Feb 2020 13:27:15 +0100 Thomas Hellstrom wrote:

> +static vm_fault_t ttm_bo_vm_huge_fault(struct vm_fault *vmf,
> +				       enum page_entry_size pe_size)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	pgprot_t prot;
> +	struct ttm_buffer_object *bo = vma->vm_private_data;
> +	vm_fault_t ret;
> +	pgoff_t fault_page_size = 0;
> +	bool write = vmf->flags & FAULT_FLAG_WRITE;
> +
> +	switch (pe_size) {
> +	case PE_SIZE_PMD:
> +		fault_page_size = HPAGE_PMD_SIZE >> PAGE_SHIFT;
> +		break;
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +	case PE_SIZE_PUD:
> +		fault_page_size = HPAGE_PUD_SIZE >> PAGE_SHIFT;
> +		break;
> +#endif
> +	default:
> +		WARN_ON_ONCE(1);
> +		return VM_FAULT_FALLBACK;
> +	}
> +
> +	/* Fallback on write dirty-tracking or COW */
> +	if (write && ttm_pgprot_is_wrprotecting(vma->vm_page_prot))
> +		return VM_FAULT_FALLBACK;
> +
> +	ret = ttm_bo_vm_reserve(bo, vmf);
> +	if (ret)
> +		return ret;
> +
> +	prot = vm_get_page_prot(vma->vm_flags);
> +	ret = ttm_bo_vm_fault_reserved(vmf, prot, 1, fault_page_size);
> +	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> +		return ret;

Checking for VM_FAULT_RETRY and returning here does not seem to make much sense, as at least the resv lock is left behind without being released.

> +
> +	dma_resv_unlock(bo->base.resv);
> +
> +	return ret;
> +}
On Sun, 2020-03-01 at 21:49 +0800, Hillf Danton wrote:
> On Thu, 20 Feb 2020 13:27:15 +0100 Thomas Hellstrom wrote:
> > [...]
> > +	prot = vm_get_page_prot(vma->vm_flags);
> > +	ret = ttm_bo_vm_fault_reserved(vmf, prot, 1, fault_page_size);
> > +	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> > +		return ret;
>
> Checking for VM_FAULT_RETRY and returning here does not seem to make much sense, as at least the resv lock is left behind without being released.

With this particular flag combination, both the mm_sem and the dma_resv lock have already been released by TTM. It's a special case allowing for drivers to release the mmap_sem when waiting for IO. That should probably be documented better in TTM.

/Thomas
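To make the special case concrete: the pattern Thomas describes looks roughly like the sketch below, modeled on TTM's reservation path of that era (simplified, not a verbatim copy). When the reservation lock cannot be taken immediately and the fault flags allow a retry without NOWAIT, the handler drops mmap_sem itself, waits for the lock, releases everything, and returns VM_FAULT_RETRY, so the caller must not unlock again on that return value:

```c
#include <drm/ttm/ttm_bo_api.h>
#include <linux/dma-resv.h>
#include <linux/mm.h>

/* Simplified sketch of the TTM reserve path (not verbatim). */
static vm_fault_t sketch_bo_vm_reserve(struct ttm_buffer_object *bo,
				       struct vm_fault *vmf)
{
	if (!dma_resv_trylock(bo->base.resv)) {
		if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
			if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
				/* Keep the bo alive across the unlock. */
				ttm_bo_get(bo);
				/* TTM drops mmap_sem on behalf of the mm. */
				up_read(&vmf->vma->vm_mm->mmap_sem);
				/* Block until the lock is free, then drop it. */
				if (!dma_resv_lock_interruptible(bo->base.resv,
								 NULL))
					dma_resv_unlock(bo->base.resv);
				ttm_bo_put(bo);
			}
			/* Both mmap_sem and the resv lock are released here. */
			return VM_FAULT_RETRY;
		}
		if (dma_resv_lock_interruptible(bo->base.resv, NULL))
			return VM_FAULT_NOPAGE;
	}
	return 0;
}
```

This is why ttm_bo_vm_huge_fault() returns directly on VM_FAULT_RETRY without FAULT_FLAG_RETRY_NOWAIT: by that point there is nothing left to unlock.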
On Fri 28-02-20 14:08:04, Thomas Hellström (VMware) wrote:
> Andrew, Michal
>
> I'm wondering what's the best way here to get the patches touching mm reviewed and accepted?

I am sorry, but I am busy with other stuff and unlikely to find time to review this series.
On 3/1/20 5:04 AM, Andrew Morton wrote:
> On Fri, 28 Feb 2020 14:08:04 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:
>
>> I'm wondering what's the best way here to get the patches touching mm reviewed and accepted? While drm people and VMware internal people have looked at them, I think the huge_fault() fallback splitting and the introduction of vma_is_special_huge() need looking at more thoroughly.
>>
>> Apart from that, if possible, I think the best way to merge this series is also through a DRM tree.
>
> Patches 1-3 look OK to me. I just had a few commenting/changelogging niggles.

Thanks for reviewing, Andrew. I just updated the series following your comments.

Thanks,
Thomas