tpm: fix cacheline alignment for DMA-able buffers

Message-Id: <1469761153-85576-1-git-send-email-apronin@chromium.org>
State: New, archived
From: Andrey Pronin <apronin@chromium.org>
To: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Christophe Ricard <christophe.ricard@gmail.com>, linux-kernel@vger.kernel.org, tpmdd-devel@lists.sourceforge.net, dtor@chromium.org
Date: Thu, 28 Jul 2016 19:59:13 -0700
Subject: [tpmdd-devel] [PATCH] tpm: fix cacheline alignment for DMA-able buffers

Andrey Pronin July 29, 2016, 2:59 a.m. UTC

Annotate buffers used in SPI transactions as ____cacheline_aligned
so that they can be used for DMA transfers.

Signed-off-by: Andrey Pronin <apronin@chromium.org>
---
 drivers/char/tpm/st33zp24/spi.c | 4 ++--
 drivers/char/tpm/tpm_tis_spi.c  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Jason Gunthorpe July 29, 2016, 5:27 p.m. UTC | #1

On Thu, Jul 28, 2016 at 07:59:13PM -0700, Andrey Pronin wrote:
> Annotate buffers used in spi transactions as ____cacheline_aligned
> to use in DMA transfers.
> 
> Signed-off-by: Andrey Pronin <apronin@chromium.org>
>  drivers/char/tpm/st33zp24/spi.c | 4 ++--
>  drivers/char/tpm/tpm_tis_spi.c  | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/char/tpm/st33zp24/spi.c b/drivers/char/tpm/st33zp24/spi.c
> index 9f5a011..0e9aad9 100644
> --- a/drivers/char/tpm/st33zp24/spi.c
> +++ b/drivers/char/tpm/st33zp24/spi.c
> @@ -70,8 +70,8 @@
>  struct st33zp24_spi_phy {
>  	struct spi_device *spi_device;
>  
> -	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE];
> -	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE];
> +	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
> +	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
>  
>  	int io_lpcpd;
>  	int latency;

Hurm, this still looks wrong to me. Aligning the start of the buffers is
not enough; the DMA'able space must also end on a cache line boundary.

So, the buffers must also always be placed at the end of the struct.

IMHO it would be cleaner and safer to always kmalloc the DMA buffer
alone than to try and optimize like this.

Jason
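
Jason's objection can be checked with a small userspace model of the patched layout. This is a sketch under stated assumptions, not driver code: it assumes 64-byte cache lines, uses a hypothetical 100-byte BUF_SIZE in place of the real ST33ZP24_SPI_BUFFER_SIZE, and approximates the kernel's ____cacheline_aligned with GCC's aligned attribute (the helpers line_of and tail_shares_line are made up for illustration). It shows that the start of rx_buf is aligned, yet io_lpcpd lands in the same cache line as the last bytes of rx_buf.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. The kernel derives
 * ____cacheline_aligned from the per-arch L1 cache line size. */
#define CACHELINE 64
#define cacheline_aligned __attribute__((__aligned__(CACHELINE)))

/* Hypothetical stand-in for ST33ZP24_SPI_BUFFER_SIZE. */
#define BUF_SIZE 100

/* Layout as proposed in the patch: the buffer starts are aligned,
 * but ordinary fields still follow rx_buf. */
struct phy_patched {
	void *spi_device;
	unsigned char tx_buf[BUF_SIZE] cacheline_aligned;
	unsigned char rx_buf[BUF_SIZE] cacheline_aligned;
	int io_lpcpd;
	int latency;
};

/* Index of the cache line containing byte offset `off`. */
static size_t line_of(size_t off)
{
	return off / CACHELINE;
}

/* Nonzero if the last byte of the buffer [start, start + len)
 * lies in the same cache line as the byte at offset `other`. */
static int tail_shares_line(size_t start, size_t len, size_t other)
{
	return line_of(start + len - 1) == line_of(other);
}
```

Unless the buffer size happens to be a multiple of the line size, whatever follows rx_buf shares its last line. On non-cache-coherent platforms, cache maintenance for that line around a DMA transfer can then corrupt either the received data or the neighbouring field.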

------------------------------------------------------------------------------

Dmitry Torokhov July 29, 2016, 5:30 p.m. UTC | #2

On Fri, Jul 29, 2016 at 10:27 AM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:

> Hurm, this still looks wrong to me. Aligning the start of the buffers is
> not enough; the DMA'able space must also end on a cache line boundary.
>
> So, the buffers must also always be placed at the end of the struct.
>
> IMHO it would be cleaner and safer to always kmalloc the DMA buffer
> alone than to try and optimize like this.
>

In this case, moving them to the end of the structure and commenting why
they have to be at the end might be a less invasive change. It is also more
performance-efficient and more resilient in low-memory situations.

Thanks,
Dmitry
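
Dmitry's alternative, keeping the buffers embedded but moving them to the end of the structure with an explanatory comment, might look like the following sketch. It keeps the same assumptions as above (64-byte lines, a hypothetical BUF_SIZE, GCC's aligned attribute standing in for ____cacheline_aligned, and a hypothetical struct name) and adds C11 _Static_assert checks for part of the invariant the comment describes. It also assumes the structure itself is allocated from memory the platform treats as DMA-safe.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. */
#define CACHELINE 64
#define cacheline_aligned __attribute__((__aligned__(CACHELINE)))

/* Hypothetical stand-in for ST33ZP24_SPI_BUFFER_SIZE. */
#define BUF_SIZE 100

struct phy_buffers_last {
	void *spi_device;
	int io_lpcpd;
	int latency;

	/*
	 * The DMA buffers must stay at the end of the structure. Each one
	 * starts on a cache line, and the structure's alignment pads its
	 * size to a whole number of lines, so no other field shares a
	 * cache line with either buffer.
	 */
	unsigned char tx_buf[BUF_SIZE] cacheline_aligned;
	unsigned char rx_buf[BUF_SIZE] cacheline_aligned;
};

/* The padded size is a whole number of cache lines. */
_Static_assert(sizeof(struct phy_buffers_last) % CACHELINE == 0,
	       "struct size must be a whole number of cache lines");

/* The ordinary fields all end before the first DMA buffer. */
_Static_assert(offsetof(struct phy_buffers_last, latency) + sizeof(int) <=
	       offsetof(struct phy_buffers_last, tx_buf),
	       "ordinary fields must precede the DMA buffers");
```

The trade-off is that the invariant lives in a comment and a layout convention: adding a new field after rx_buf would silently reintroduce the sharing, and the assertions above only check that the existing fields precede the buffers.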
------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity 
planning reports. http://sdm.link/zohodev2dev

Jarkko Sakkinen Aug. 9, 2016, 9:46 a.m. UTC | #3

On Fri, Jul 29, 2016 at 10:30:22AM -0700, Dmitry Torokhov wrote:
>    In this case moving them to the end of the structure and commenting why
>    they have to be at the end might be less invasive change. More
>    performance-efficient and resilient in low memory situations too.

kmallocs would be done at driver initialization:

* you are rarely in a low-memory situation
* the performance gain/loss is insignificant

I really don't see your point.

>    Thanks,
>    Dmitry

/Jarkko

------------------------------------------------------------------------------

Jarkko Sakkinen Aug. 9, 2016, 3:01 p.m. UTC | #4

On Tue, Aug 09, 2016 at 12:46:10PM +0300, Jarkko Sakkinen wrote:
> On Fri, Jul 29, 2016 at 10:30:22AM -0700, Dmitry Torokhov wrote:
> >    In this case moving them to the end of the structure and commenting why
> >    they have to be at the end might be less invasive change. More
> >    performance-efficient and resilient in low memory situations too.
> 
> kmallocs would be done in the driver initialization:
> 
> * you rarely are in low memory situation
> * performance gain/loss is insignificant
> 
> I really don't see your point.

I'm fine with having them at the end of the structure, mainly for simplicity
reasons, but those arguments just didn't hold at all.

/Jarkko

------------------------------------------------------------------------------

Dmitry Torokhov Aug. 9, 2016, 3:18 p.m. UTC | #5

On Tue, Aug 9, 2016 at 8:01 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:

> On Tue, Aug 09, 2016 at 12:46:10PM +0300, Jarkko Sakkinen wrote:
> > kmallocs would be done in the driver initialization:
> >
> > * you rarely are in low memory situation
> > * performance gain/loss is insignificant
> >
> > I really don't see your point.
>
> I'm fine having them at the end of the structure mainly for simplicity
> reasons but those arguments just didn't hold at all.
>

Well, the main reason was the simplicity and low invasiveness of the change.

But I still maintain that doing 3 memory allocations instead of 1 is less
performant and puts more pressure on the kernel. Yes, it only happens at bind
time, but there is no need to do 3 times the work when one allocation will
suffice. Also, driver binding does not necessarily happen at boot time: I can
always unbind and rebind the driver or reload the module.

Thanks,
Dmitry
------------------------------------------------------------------------------

Jason Gunthorpe Aug. 9, 2016, 10:08 p.m. UTC | #6

On Tue, Aug 09, 2016 at 08:18:00AM -0700, Dmitry Torokhov wrote:

>    Well, the main reason was simplicity and invasiveness of the
>    change.

Well, it isn't simple, because the proposed patches have had subtle
problems with DMA. Simple is to use a guaranteed DMA-able allocation
for DMA memory and stop trying to over-optimize.

Jason
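
Jason's preferred approach, a dedicated allocation for each DMA buffer, can be modelled in userspace as below. Assumptions: 64-byte cache lines, a hypothetical BUF_SIZE, C11 aligned_alloc standing in for kmalloc, and made-up names (struct phy_separate, round_up_line, alloc_dma_buf, phy_alloc_bufs, phy_free_bufs). Rounding each request up to whole cache lines means each buffer owns every line it touches, regardless of how the driver structure is laid out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines. */
#define CACHELINE 64

/* Hypothetical stand-in for ST33ZP24_SPI_BUFFER_SIZE. */
#define BUF_SIZE 100

/* Round n up to a whole number of cache lines. */
static size_t round_up_line(size_t n)
{
	return (n + CACHELINE - 1) / CACHELINE * CACHELINE;
}

/* Hypothetical driver state: pointers to separately allocated DMA
 * buffers instead of embedded arrays, so field order no longer matters. */
struct phy_separate {
	void *spi_device;
	unsigned char *tx_buf;
	unsigned char *rx_buf;
	int io_lpcpd;
	int latency;
};

/* Userspace stand-in for kmalloc(): the buffer starts on a cache line
 * and its size is padded to whole lines. C11 aligned_alloc requires
 * the size to be a multiple of the alignment. */
static unsigned char *alloc_dma_buf(size_t len)
{
	return aligned_alloc(CACHELINE, round_up_line(len));
}

/* Allocate both buffers; on failure release whatever succeeded. */
static int phy_alloc_bufs(struct phy_separate *phy)
{
	phy->tx_buf = alloc_dma_buf(BUF_SIZE);
	phy->rx_buf = alloc_dma_buf(BUF_SIZE);
	if (!phy->tx_buf || !phy->rx_buf) {
		free(phy->tx_buf);
		free(phy->rx_buf);
		phy->tx_buf = NULL;
		phy->rx_buf = NULL;
		return -1;
	}
	return 0;
}

static void phy_free_bufs(struct phy_separate *phy)
{
	free(phy->tx_buf);
	free(phy->rx_buf);
	phy->tx_buf = NULL;
	phy->rx_buf = NULL;
}
```

Counting the driver structure itself, this is the three-allocation variant Dmitry objected to, traded for not having to reason about field order. In the kernel, the DMA API documentation treats kmalloc() memory as DMA-safe, so a driver would use kmalloc()/kfree() or similar rather than an explicit rounding helper like the one sketched here.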

------------------------------------------------------------------------------

Jarkko Sakkinen Aug. 10, 2016, 10:36 a.m. UTC | #7

On Tue, Aug 09, 2016 at 08:18:00AM -0700, Dmitry Torokhov wrote:
>    Well, the main reason was simplicity and invasiveness of the change.
>    But I still maintain that doing 3 memory allocations instead of 1 is less
>    performant and puts more pressure on the kernel. Yes, it is at bind time,
>    but you do not have to do 3 times work when one allocation will suffice.
>    Also, driver binding does not necessarily happen at boot time. I can
>    always unbind and rebind the driver or reload the module.

I'm fine with either approach.

>    Thanks,
>    Dmitry

/Jarkko
