drm: panel: fix excessive stack usage in td028ttec1_prepare

Message ID 20200107212747.4182515-1-arnd@arndb.de (mailing list archive)
State New, archived

Commit Message

Arnd Bergmann Jan. 7, 2020, 9:27 p.m. UTC
With gcc -O3, the compiler can inline very aggressively,
leading to rather large stack usage:

drivers/gpu/drm/panel/panel-tpo-td028ttec1.c: In function 'td028ttec1_prepare':
drivers/gpu/drm/panel/panel-tpo-td028ttec1.c:233:1: error: the frame size of 2768 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
 }

Marking jbt_reg_write_1() as noinline avoids the case where
multiple instances of this function get inlined into the same
stack frame and each one adds a copy of 'tx_buf'.

Fixes: mmtom ("init/Kconfig: enable -O3 for all arches")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/gpu/drm/panel/panel-tpo-td028ttec1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
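
The effect described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: `fake_spi_write()`, `reg_write_1()` and `prepare_sketch()` are hypothetical stand-ins, and `noinline_for_stack` is defined locally as the GCC/Clang noinline attribute so the example builds outside the kernel. Because the helper cannot be inlined, its `tx_buf` lives in the helper's own short-lived frame rather than being duplicated once per call site in the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: userspace stand-in for the kernel's noinline_for_stack. */
#define noinline_for_stack __attribute__((__noinline__))

/* Hypothetical sink standing in for the SPI write; records the last bytes. */
static unsigned char last_tx[2];
static int writes;

static int fake_spi_write(const void *buf, size_t len)
{
	assert(len <= sizeof(last_tx));
	memcpy(last_tx, buf, len);
	writes++;
	return 0;
}

/* Never inlined, so tx_buf occupies this frame only while a call runs. */
static noinline_for_stack int reg_write_1(unsigned char reg,
					  unsigned char data, int *err)
{
	unsigned char tx_buf[2];
	int ret;

	/* Skip the write if an earlier call already failed. */
	if (err && *err)
		return *err;

	tx_buf[0] = reg;
	tx_buf[1] = data;
	ret = fake_spi_write(tx_buf, sizeof(tx_buf));
	if (err)
		*err = ret;
	return ret;
}

/* Caller with several call sites; returns the number of writes done. */
static int prepare_sketch(void)
{
	int err = 0;

	reg_write_1(0x01, 0xaa, &err);
	reg_write_1(0x02, 0xbb, &err);
	reg_write_1(0x03, 0xcc, &err);
	return err ? -1 : writes;
}
```

Calling `prepare_sketch()` from a test harness performs three writes through the single out-of-line helper. Whether a given compiler would otherwise merge or duplicate the buffers depends on its version and options, so frame sizes should be checked with `-Wframe-larger-than=` rather than assumed.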

Comments

Laurent Pinchart Jan. 7, 2020, 10 p.m. UTC | #1
Hi Arnd,

Thank you for the patch.

On Tue, Jan 07, 2020 at 10:27:33PM +0100, Arnd Bergmann wrote:
> With gcc -O3, the compiler can inline very aggressively,
> leading to rather large stack usage:
> 
> drivers/gpu/drm/panel/panel-tpo-td028ttec1.c: In function 'td028ttec1_prepare':
> drivers/gpu/drm/panel/panel-tpo-td028ttec1.c:233:1: error: the frame size of 2768 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
>  }
> 
> Marking jbt_reg_write_1() as noinline avoids the case where
> multiple instances of this function get inlined into the same
> stack frame and each one adds a copy of 'tx_buf'.
> 
> Fixes: mmtom ("init/Kconfig: enable -O3 for all arches")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Isn't this something that should be fixed at the compiler level ?

> ---
>  drivers/gpu/drm/panel/panel-tpo-td028ttec1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c b/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c
> index cf29405a2dbe..17ee5e87141f 100644
> --- a/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c
> +++ b/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c
> @@ -105,7 +105,7 @@ static int jbt_ret_write_0(struct td028ttec1_panel *lcd, u8 reg, int *err)
>  	return ret;
>  }
>  
> -static int jbt_reg_write_1(struct td028ttec1_panel *lcd,
> +static int noinline_for_stack jbt_reg_write_1(struct td028ttec1_panel *lcd,
>  			   u8 reg, u8 data, int *err)
>  {
>  	struct spi_device *spi = lcd->spi;
Arnd Bergmann Jan. 7, 2020, 10:09 p.m. UTC | #2
On Tue, Jan 7, 2020 at 11:00 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Arnd,
>
> Thank you for the patch.
>
> On Tue, Jan 07, 2020 at 10:27:33PM +0100, Arnd Bergmann wrote:
> > With gcc -O3, the compiler can inline very aggressively,
> > leading to rather large stack usage:
> >
> > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c: In function 'td028ttec1_prepare':
> > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c:233:1: error: the frame size of 2768 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
> >  }
> >
> > Marking jbt_reg_write_1() as noinline avoids the case where
> > multiple instances of this function get inlined into the same
> > stack frame and each one adds a copy of 'tx_buf'.
> >
> > Fixes: mmtom ("init/Kconfig: enable -O3 for all arches")
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> Isn't this something that should be fixed at the compiler level ?

I suspect but have not verified that structleak gcc plugin is partly at
fault here as well, it has caused similar problems elsewhere.

If you like I can try to dig deeper before that patch gets merged,
and explain more in the changelog or open a gcc bug if necessary.

      Arnd
Laurent Pinchart Jan. 7, 2020, 10:12 p.m. UTC | #3
Hi Arnd,

On Tue, Jan 07, 2020 at 11:09:13PM +0100, Arnd Bergmann wrote:
> On Tue, Jan 7, 2020 at 11:00 PM Laurent Pinchart wrote:
> > On Tue, Jan 07, 2020 at 10:27:33PM +0100, Arnd Bergmann wrote:
> > > With gcc -O3, the compiler can inline very aggressively,
> > > leading to rather large stack usage:
> > >
> > > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c: In function 'td028ttec1_prepare':
> > > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c:233:1: error: the frame size of 2768 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
> > >  }
> > >
> > > Marking jbt_reg_write_1() as noinline avoids the case where
> > > multiple instances of this function get inlined into the same
> > > stack frame and each one adds a copy of 'tx_buf'.
> > >
> > > Fixes: mmtom ("init/Kconfig: enable -O3 for all arches")
> > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> >
> > Isn't this something that should be fixed at the compiler level ?
> 
> I suspect but have not verified that structleak gcc plugin is partly at
> fault here as well, it has caused similar problems elsewhere.
> 
> If you like I can try to dig deeper before that patch gets merged,
> and explain more in the changelog or open a gcc bug if necessary.

I think we'll need to merge this in the meantime, but if gcc is able to
detect too large frame sizes, I think it should have the ability to take
a frame size limit into account when optimizing. I haven't checked if
this is already possible and just not honoured here (possibly due to a
bug) or if the feature is entirely missing. In any case we'll likely
have to live with this compiler issue for quite some time.
Arnd Bergmann Jan. 7, 2020, 10:26 p.m. UTC | #4
On Tue, Jan 7, 2020 at 11:12 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
> On Tue, Jan 07, 2020 at 11:09:13PM +0100, Arnd Bergmann wrote:
> > On Tue, Jan 7, 2020 at 11:00 PM Laurent Pinchart wrote:

> > > Isn't this something that should be fixed at the compiler level ?
> >
> > I suspect but have not verified that structleak gcc plugin is partly at
> > fault here as well, it has caused similar problems elsewhere.

I checked that now, and it's indeed the structleak plugin.

Interestingly, the problem goes away without the -fconserve-stack
option, which is meant to reduce the stack usage but has the
opposite effect here (!).

I'll do some more tests tomorrow.

> > If you like I can try to dig deeper before that patch gets merged,
> > and explain more in the changelog or open a gcc bug if necessary.
>
> I think we'll need to merge this in the meantime, but if gcc is able to
> detect too large frame sizes, I think it should have the ability to take
> a frame size limit into account when optimizing. I haven't checked if
> this is already possible and just not honoured here (possibly due to a
> bug) or if the feature is entirely missing. In any case we'll likely
> have to live with this compiler issue for quite some time.

When talking to gcc developers about other files that use excessive
amounts of stack space, it was pointed out to me that this is a
fundamentally hard problem to solve in general: what usually happens
is that one optimization step uses a heuristic for inlining, but the
register allocator much later runs out of registers and spills them to
the stack at a point when it's too late to undo the earlier optimizations.

        Arnd
Arnd Bergmann Jan. 8, 2020, 10:31 a.m. UTC | #5
On Tue, Jan 7, 2020 at 11:26 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jan 7, 2020 at 11:12 PM Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
> > On Tue, Jan 07, 2020 at 11:09:13PM +0100, Arnd Bergmann wrote:
> > > On Tue, Jan 7, 2020 at 11:00 PM Laurent Pinchart wrote:
>
> > > > Isn't this something that should be fixed at the compiler level ?
> > >
> > > I suspect but have not verified that structleak gcc plugin is partly at
> > > fault here as well, it has caused similar problems elsewhere.
>
> I checked that now, and it's indeed the structleak plugin.
>
> Interestingly, the problem goes away without the -fconserve-stack
> option, which is meant to reduce the stack usage but has the
> opposite effect here (!).
>
> I'll do some more tests tomorrow.

Here's a reduced test case:

struct list_head {
  struct list_head *next, *prev;
} typedef initcall_t;
struct sg_table {
  int sgl;
  int nents;
  int orig_nents;
};
struct spi_transfer {
  void *tx_buf;
  void *rx_buf;
  unsigned len;
  int tx_dma;
  int rx_dma;
  struct sg_table tx_sg;
  struct sg_table rx_sg;
  short delay_usecs;
  int delay;
  int cs_change_delay;
  int word_delay;
  int speed_hz;
  int effective_speed_hz;
  int ptp_sts_word_pre;
  int ptp_sts_word_post;
  int ptp_sts;
  _Bool timestamped_pre;
  struct list_head transfer_list;
};
void spi_sync_transfer(struct spi_transfer *, int);
void spi_write(void) {
  struct spi_transfer t;
  spi_sync_transfer(&t, 0);
}
int jbt_ret_write_0_err;
void jbt_ret_write_0(void) {
  if (jbt_ret_write_0_err)
    spi_write();
}
void jbt_reg_write_1(int *err) {
  if (*err) {
    struct spi_transfer t;
    spi_sync_transfer(&t, 1);
  }
}
void jbt_reg_write_2(int *err) {
  short tx_buf[3];
  if (err) {
    struct spi_transfer t = {tx_buf};
    spi_sync_transfer(&t, 0);
  }
}
int td028ttec1_prepare_i;
void td028ttec1_prepare() {
  int ret;
  for (; td028ttec1_prepare_i; ++td028ttec1_prepare_i) {
    jbt_ret_write_0();
  }
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_2(&ret);
  jbt_ret_write_0();
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
  jbt_reg_write_1(&ret);
}

$ arm-linux-gnueabi/bin/arm-linux-gnueabi-gcc panel-tpo-td028ttec1.c \
    -fplugin=scripts/gcc-plugins/structleak_plugin.so \
    -fplugin-arg-structleak_plugin-byref-all -S -O3 \
    -Wframe-larger-than=128
panel-tpo-td028ttec1.i: In function 'td028ttec1_prepare':
panel-tpo-td028ttec1.i:80:1: warning: the frame size of 192 bytes is larger than 128 bytes [-Wframe-larger-than=]

$ arm-linux-gnueabi/bin/arm-linux-gnueabi-gcc panel-tpo-td028ttec1.c \
    -fplugin=scripts/gcc-plugins/structleak_plugin.so \
    -fplugin-arg-structleak_plugin-byref-all -S -O3 \
    -Wframe-larger-than=128 -fconserve-stack
panel-tpo-td028ttec1.i: In function 'td028ttec1_prepare':
panel-tpo-td028ttec1.i:80:1: warning: the frame size of 2032 bytes is larger than 128 bytes [-Wframe-larger-than=]

I'm still not entirely sure what to make of this. The -fconserve-stack
option is supposed to prevent inlining when the frames get too large, but
it appears that inlining less has the opposite effect here, as it leaves
larger structures on the stack of the caller. structleak_plugin-byref-all
causes each copy of the 'struct spi_transfer' to be initialized
(intentionally) and left on the stack (as a side-effect of a somewhat
suboptimal implementation).

         Arnd
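
As a rough source-level picture of what the byref-all mode does to the test case above: a local that is passed by reference gets zero-initialized before use. The sketch below is an approximation under stated assumptions, not the plugin's implementation (the plugin works on the compiler's internal representation, not by adding a memset() call). `struct xfer`, `sync_transfer()` and `reg_write_as_instrumented()` are hypothetical names, and the struct is far smaller than the real 'struct spi_transfer'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much smaller stand-in for struct spi_transfer. */
struct xfer {
	const void *tx_buf;
	void *rx_buf;
	unsigned len;
	int speed_hz;
};

/* Counts how many times the callee saw a fully zeroed structure. */
static int seen_zeroed;

static void sync_transfer(struct xfer *t, int n)
{
	(void)n;
	if (t->tx_buf == NULL && t->rx_buf == NULL &&
	    t->len == 0 && t->speed_hz == 0)
		seen_zeroed++;
}

static void reg_write_as_instrumented(void)
{
	struct xfer t;            /* the source leaves this uninitialized */

	memset(&t, 0, sizeof(t)); /* roughly what the instrumentation adds */
	sync_transfer(&t, 1);
}
```

Each inlined copy of such a function brings its own initialized structure with it, which is consistent with the frame growth reported above when the copies are not overlapped on the stack.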

Patch

diff --git a/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c b/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c
index cf29405a2dbe..17ee5e87141f 100644
--- a/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c
+++ b/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c
@@ -105,7 +105,7 @@  static int jbt_ret_write_0(struct td028ttec1_panel *lcd, u8 reg, int *err)
 	return ret;
 }
 
-static int jbt_reg_write_1(struct td028ttec1_panel *lcd,
+static int noinline_for_stack jbt_reg_write_1(struct td028ttec1_panel *lcd,
 			   u8 reg, u8 data, int *err)
 {
 	struct spi_device *spi = lcd->spi;