| Message ID | 1439363150-8661-30-git-send-email-hch@lst.de (mailing list archive) |
|---|---|
| State | Not Applicable |
| Delegated to: | Dan Williams |
On Wed, Aug 12, 2015 at 12:05 AM, Christoph Hellwig <hch@lst.de> wrote:
> Make all cache invalidation conditional on sg_has_page() and use
> sg_phys to get the physical address directly.

So this worries me a bit (I'm just reacting to one random patch in the series).

The reason? I think this wants a big honking comment somewhere saying "non-sg_page accesses are not necessarily cache coherent".

Now, I don't think that's _wrong_, but it's an important distinction: if you look up pages in the page tables directly, there's a very subtle difference between saving just the pfn and saving the "struct page" of the result.

On sane architectures, this whole cache flushing thing doesn't matter. Which just means that it's going to be even more subtle on the odd broken ones..

I'm assuming that anybody who wants to use the page-less scatter-gather lists always does so on memory that isn't actually virtually mapped at all, or only does so on sane architectures that are cache coherent at a physical level, but I'd like that assumption *documented* somewhere.

(And maybe it is, and I just didn't get to that patch yet)

Linus
On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
> I'm assuming that anybody who wants to use the page-less
> scatter-gather lists always does so on memory that isn't actually
> virtually mapped at all, or only does so on sane architectures that
> are cache coherent at a physical level, but I'd like that assumption
> *documented* somewhere.

It's temporarily mapped by kmap-like helpers. That code isn't in this series. The most recent version of it is here:

https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0

note that it's not doing the cache flushing it would have to do yet, but it's also only enabled for x86 at the moment.
On Thu, Aug 13, 2015 at 7:31 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
>> I'm assuming that anybody who wants to use the page-less
>> scatter-gather lists always does so on memory that isn't actually
>> virtually mapped at all, or only does so on sane architectures that
>> are cache coherent at a physical level, but I'd like that assumption
>> *documented* somewhere.
>
> It's temporarily mapped by kmap-like helpers. That code isn't in
> this series. The most recent version of it is here:
>
> https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0
>
> note that it's not doing the cache flushing it would have to do yet, but
> it's also only enabled for x86 at the moment.

For virtually tagged caches I assume we would temporarily map with kmap_atomic_pfn_t(), similar to how drm_clflush_pages() implements powerpc support. However with DAX we could end up with multiple virtual aliases for a page-less pfn.
On Thu, 2015-08-13 at 20:30 -0700, Dan Williams wrote:
> On Thu, Aug 13, 2015 at 7:31 AM, Christoph Hellwig <hch@lst.de> wrote:
> > On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
> >> I'm assuming that anybody who wants to use the page-less
> >> scatter-gather lists always does so on memory that isn't actually
> >> virtually mapped at all, or only does so on sane architectures that
> >> are cache coherent at a physical level, but I'd like that assumption
> >> *documented* somewhere.
> >
> > It's temporarily mapped by kmap-like helpers. That code isn't in
> > this series. The most recent version of it is here:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0
> >
> > note that it's not doing the cache flushing it would have to do yet, but
> > it's also only enabled for x86 at the moment.
>
> For virtually tagged caches I assume we would temporarily map with
> kmap_atomic_pfn_t(), similar to how drm_clflush_pages() implements
> powerpc support. However with DAX we could end up with multiple
> virtual aliases for a page-less pfn.

At least on some PA architectures, you have to be very careful. Improperly managed, multiple aliases will cause the system to crash (actually a machine check in the cache chequerboard). For the most temperamental systems, we need the cache line flushed and the alias mapping ejected from the TLB cache before we access the same page at an inequivalent alias.

James
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Thu, 13 Aug 2015 20:59:20 -0700

> On Thu, 2015-08-13 at 20:30 -0700, Dan Williams wrote:
>> On Thu, Aug 13, 2015 at 7:31 AM, Christoph Hellwig <hch@lst.de> wrote:
>> > On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
>> >> I'm assuming that anybody who wants to use the page-less
>> >> scatter-gather lists always does so on memory that isn't actually
>> >> virtually mapped at all, or only does so on sane architectures that
>> >> are cache coherent at a physical level, but I'd like that assumption
>> >> *documented* somewhere.
>> >
>> > It's temporarily mapped by kmap-like helpers. That code isn't in
>> > this series. The most recent version of it is here:
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0
>> >
>> > note that it's not doing the cache flushing it would have to do yet, but
>> > it's also only enabled for x86 at the moment.
>>
>> For virtually tagged caches I assume we would temporarily map with
>> kmap_atomic_pfn_t(), similar to how drm_clflush_pages() implements
>> powerpc support. However with DAX we could end up with multiple
>> virtual aliases for a page-less pfn.
>
> At least on some PA architectures, you have to be very careful.
> Improperly managed, multiple aliases will cause the system to crash
> (actually a machine check in the cache chequerboard). For the most
> temperamental systems, we need the cache line flushed and the alias
> mapping ejected from the TLB cache before we access the same page at an
> inequivalent alias.

Also, I want to mention that on sparc64 we manage the cache aliasing state in the page struct.

Until a page is mapped into userspace, we just record the most recent cpu to store into that page with kernel side mappings. Once the page ends up being mapped or the cpu doing kernel side stores changes, we actually perform the cache flush.

Generally speaking, I think that all actual physical memory the kernel operates on should have a struct page backing it. So this whole discussion of operating on physical memory in scatter lists without backing page structs feels really foreign to me.
On Thu, Aug 13, 2015 at 9:11 PM, David Miller <davem@davemloft.net> wrote:
> From: James Bottomley <James.Bottomley@HansenPartnership.com>
>> At least on some PA architectures, you have to be very careful.
>> Improperly managed, multiple aliases will cause the system to crash
>> (actually a machine check in the cache chequerboard). For the most
>> temperamental systems, we need the cache line flushed and the alias
>> mapping ejected from the TLB cache before we access the same page at an
>> inequivalent alias.
>
> Also, I want to mention that on sparc64 we manage the cache aliasing
> state in the page struct.
>
> Until a page is mapped into userspace, we just record the most recent
> cpu to store into that page with kernel side mappings. Once the page
> ends up being mapped or the cpu doing kernel side stores changes, we
> actually perform the cache flush.
>
> Generally speaking, I think that all actual physical memory the kernel
> operates on should have a struct page backing it. So this whole
> discussion of operating on physical memory in scatter lists without
> backing page structs feels really foreign to me.

So the only way for page-less pfns to enter the system is through the ->direct_access() method provided by a pmem device's struct block_device_operations. Architectures that require struct page for cache management must disable ->direct_access() in this case.

If an arch still wants to support pmem+DAX then it needs something like this patchset (feedback welcome) to map pmem pfns:

https://lkml.org/lkml/2015/8/12/970

Effectively this would disable ->direct_access() on /dev/pmem0, but permit ->direct_access() on /dev/pmem0m.
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index b9402c9..6cad0e0 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -483,11 +483,13 @@ static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist, int n
 	BUG_ON(direction == DMA_NONE);
 
 	for_each_sg(sglist, sg, nents, i) {
-		unsigned long vaddr = (unsigned long)sg_virt(sg);
-
-		sg_dma_address(sg) = (dma_addr_t) virt_to_phys(vaddr);
+		sg_dma_address(sg) = sg_phys(sg);
 		sg_dma_len(sg) = sg->length;
-		flush_kernel_dcache_range(vaddr, sg->length);
+
+		if (sg_has_page(sg)) {
+			flush_kernel_dcache_range((unsigned long)sg_virt(sg),
+					sg->length);
+		}
 	}
 	return nents;
 }
@@ -504,9 +506,10 @@ static void pa11_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, in
 
 	/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
 
-	for_each_sg(sglist, sg, nents, i)
-		flush_kernel_vmap_range(sg_virt(sg), sg->length);
-	return;
+	for_each_sg(sglist, sg, nents, i) {
+		if (sg_has_page(sg))
+			flush_kernel_vmap_range(sg_virt(sg), sg->length);
+	}
 }
 
 static void pa11_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, unsigned long offset, size_t size, enum dma_data_direction direction)
@@ -530,8 +533,10 @@ static void pa11_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl
 
 	/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
 
-	for_each_sg(sglist, sg, nents, i)
-		flush_kernel_vmap_range(sg_virt(sg), sg->length);
+	for_each_sg(sglist, sg, nents, i) {
+		if (sg_has_page(sg))
+			flush_kernel_vmap_range(sg_virt(sg), sg->length);
+	}
 }
 
 static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction)
@@ -541,8 +546,10 @@ static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *
 
 	/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
 
-	for_each_sg(sglist, sg, nents, i)
-		flush_kernel_vmap_range(sg_virt(sg), sg->length);
+	for_each_sg(sglist, sg, nents, i) {
+		if (sg_has_page(sg))
+			flush_kernel_vmap_range(sg_virt(sg), sg->length);
+	}
 }
 
 struct hppa_dma_ops pcxl_dma_ops = {
Make all cache invalidation conditional on sg_has_page() and use
sg_phys to get the physical address directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/parisc/kernel/pci-dma.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)