diff mbox

DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

Message ID be3bb850-a9f2-61fa-e378-eb44489256e0@chelsio.com (mailing list archive)
State Changes Requested
Delegated to: Herbert Xu
Headers show

Commit Message

Harsh Jain Sept. 16, 2017, 6:11 a.m. UTC
Hi,

While debugging DMA mapping error in chelsio crypto driver we observed that when scatter/gather list received by driver has some entry with page->offset > 4096 (PAGE_SIZE). It starts giving DMA error.  Without IOMMU it works fine.

Before reaching to chelsio crypto driver(driver/crypto/chelsio) following entities change the sg'

1) IN esp_output() "__skb_to_sgvec()" convert skb frags to scatter gather list. At that moment sg->offset was 4094.
2) From esp_output control reaches to "crypto_authenc_encrypt()". Here in "scatterwalk_ffwd()" sg->offset become 4110.
3) Same sg list received by chelsio crypto driver(chcr). When chcr try to do DMA mapping it starts giving DMA errors.

Following error observed. first two prints are added for debugging in chcr. Kernel version used to reproduce is 4.9.28 on x86_64.

Sep 15 12:40:52 heptagon kernel: process_cipher req src ffff8803cb41f0a8
Sep 15 12:40:52 heptagon kernel: ========= issue    hit offset:4110 ======= dma_addr f24b000e ==> DMA mapped address returned by dma_map_sg()

Sep 15 12:40:52 heptagon kernel: DMAR: DRHD: handling fault status reg 2
Sep 15 12:40:52 heptagon kernel: DMAR: [DMA Write] Request device [02:00.4] fault addr f24b0000 [fault reason 05] PTE Write access is not set

 By applying following hack in kernel. Things start working.

1) We are not expecting issue in "scatterwalk_ffwd" because it is not the only place where kernel 
updates src->offset without checking page boundary. similar logic used in "__skb_to_sgvec".
 
2) It cannot be driver's responsibilty to update received sg entries to adjust offset and page 
because we are not the only one who directly uses received sg list.

3) Since Without IOMMU every thing works fine. We are expecting IOMMU bugs.

Regards

Harsh Jain

Comments

Herbert Xu Sept. 20, 2017, 8:01 a.m. UTC | #1
Harsh Jain <Harsh@chelsio.com> wrote:
> 
> While debugging DMA mapping error in chelsio crypto driver we observed that when scatter/gather list received by driver has some entry with page->offset > 4096 (PAGE_SIZE). It starts giving DMA error.  Without IOMMU it works fine.

This is not a bug.  The network stack can and will feed us such
SG lists.

> 2) It cannot be driver's responsibilty to update received sg entries to adjust offset and page 
> because we are not the only one who directly uses received sg list.

No the driver must deal with this.  Having said that, if we can
improve our driver helper interface to make this easier then we
should do that too.  What we certainly shouldn't do is to take a
whack-a-mole approach like this patch does.

Cheers,
Harsh Jain Sept. 20, 2017, 11:30 a.m. UTC | #2
On 20-09-2017 13:31, Herbert Xu wrote:
> Harsh Jain <Harsh@chelsio.com> wrote:
>> While debugging DMA mapping error in chelsio crypto driver we observed that when scatter/gather list received by driver has some entry with page->offset > 4096 (PAGE_SIZE). It starts giving DMA error.  Without IOMMU it works fine.
> This is not a bug.  The network stack can and will feed us such
> SG lists.
>
>> 2) It cannot be driver's responsibilty to update received sg entries to adjust offset and page 
>> because we are not the only one who directly uses received sg list.
> No the driver must deal with this.  Having said that, if we can
> improve our driver helper interface to make this easier then we
> should do that too.  What we certainly shouldn't do is to take a
> whack-a-mole approach like this patch does.
Agreed,I added that patch for understanding purpose only. Today I referred other crypto driver for DMA related code. Most of them are using dma_map_sg except QAT. In QAT, They are first updating the Page address using offset then mapping each page in for loop with dma_map_single(0. I will try the same in chelsio driver will see the behavior.
>
> Cheers,
David Woodhouse Sept. 25, 2017, 6:45 p.m. UTC | #3
On Wed, 2017-09-20 at 16:01 +0800, Herbert Xu wrote:
> Harsh Jain <Harsh@chelsio.com> wrote:
> > 
> > While debugging DMA mapping error in chelsio crypto driver we
> observed that when scatter/gather list received by driver has some
> entry with page->offset > 4096 (PAGE_SIZE). It starts giving DMA
> error.  Without IOMMU it works fine.
> 
> This is not a bug.  The network stack can and will feed us such
> SG lists.

Hm? Under what circumstances is the offset permitted to be >
PAGE_SIZE?
Casey Leedom Sept. 25, 2017, 8:19 p.m. UTC | #4
| From: David Woodhouse <dwmw2@infradead.org>
| Sent: Monday, September 25, 2017 11:45 AM
|
| On Wed, 2017-09-20 at 16:01 +0800, Herbert Xu wrote:
| > Harsh Jain <Harsh@chelsio.com> wrote:
| > >
| > > While debugging DMA mapping error in chelsio crypto driver we
| > observed that when scatter/gather list received by driver has some
| > entry with page->offset > 4096 (PAGE_SIZE). It starts giving DMA
| > error.  Without IOMMU it works fine.
| >
| > This is not a bug.  The network stack can and will feed us such
| > SG lists.
|
| Hm? Under what circumstances is the offset permitted to be >
| PAGE_SIZE?

  As I noted earlier, this is an area of the kernel with which I'm not super
familiar.  Both Herbert Xu and our local VM Expert have said that having
Scatter/Gather Lists with Offsets greater than Page Size is not a bug ...

  I'm mostly trying to help out keeping focus on this because Harsh is in
India (presumable enjoying a good night's sleep while we look at this.
Hopefully we'll have a present of a bug fix for him when he wakes up ... :-)

Casey
Harsh Jain Sept. 26, 2017, 11:17 a.m. UTC | #5
On 26-09-2017 00:15, David Woodhouse wrote:
> On Wed, 2017-09-20 at 16:01 +0800, Herbert Xu wrote:
>> Harsh Jain <Harsh@chelsio.com> wrote:
>>>  
>>> While debugging DMA mapping error in chelsio crypto driver we
>> observed that when scatter/gather list received by driver has some
>> entry with page->offset > 4096 (PAGE_SIZE). It starts giving DMA
>> error.  Without IOMMU it works fine.
>>
>> This is not a bug.  The network stack can and will feed us such
>> SG lists.
> Hm? Under what circumstances is the offset permitted to be >
> PAGE_SIZE? 
Its random, Kernel API's don't check offset value after arithmetic operations like in "__skb_to_sgvec()", "scatterwalk_ffwd()".
diff mbox

Patch

diff --git a/crypto/scatterwalk.c b/crypto/scatterwalk.c
index c16c94f8..1d75a3a 100644
--- a/crypto/scatterwalk.c
+++ b/crypto/scatterwalk.c
@@ -78,6 +78,8 @@  struct scatterlist *scatterwalk_ffwd(struct scatterlist dst[2]
                                     struct scatterlist *src,
                                     unsigned int len)
 {
+       unsigned int mod_page_offset;
+
        for (;;) {
                if (!len)
                        return src;
@@ -90,7 +92,9 @@  struct scatterlist *scatterwalk_ffwd(struct scatterlist dst[2]
        }

        sg_init_table(dst, 2);
-       sg_set_page(dst, sg_page(src), src->length - len, src->offset + len);
+        mod_page_offset = (src->offset + len) / PAGE_SIZE;
+       sg_set_page(dst, sg_page(src) + mod_page_offset, src->length - len,
+                   (src->offset + len) - (mod_page_offset * PAGE_SIZE));
        scatterwalk_crypto_chain(dst, sg_next(src), 0, 2);