Message ID | 20200107212747.4182515-1-arnd@arndb.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm: panel: fix excessive stack usage in td028ttec1_prepare | expand |
Hi Arnd, Thank you for the patch. On Tue, Jan 07, 2020 at 10:27:33PM +0100, Arnd Bergmann wrote: > With gcc -O3, the compiler can inline very aggressively, > leading to rather large stack usage: > > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c: In function 'td028ttec1_prepare': > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c:233:1: error: the frame size of 2768 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] > } > > Marking jbt_reg_write_1() as noinline avoids the case where > multiple instances of this function get inlined into the same > stack frame and each one adds a copy of 'tx_buf'. > > Fixes: mmtom ("init/Kconfig: enable -O3 for all arches") > Signed-off-by: Arnd Bergmann <arnd@arndb.de> Isn't this something that should be fixed at the compiler level ? > --- > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c b/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c > index cf29405a2dbe..17ee5e87141f 100644 > --- a/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c > +++ b/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c > @@ -105,7 +105,7 @@ static int jbt_ret_write_0(struct td028ttec1_panel *lcd, u8 reg, int *err) > return ret; > } > > -static int jbt_reg_write_1(struct td028ttec1_panel *lcd, > +static int noinline_for_stack jbt_reg_write_1(struct td028ttec1_panel *lcd, > u8 reg, u8 data, int *err) > { > struct spi_device *spi = lcd->spi;
On Tue, Jan 7, 2020 at 11:00 PM Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > Hi Arnd, > > Thank you for the patch. > > On Tue, Jan 07, 2020 at 10:27:33PM +0100, Arnd Bergmann wrote: > > With gcc -O3, the compiler can inline very aggressively, > > leading to rather large stack usage: > > > > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c: In function 'td028ttec1_prepare': > > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c:233:1: error: the frame size of 2768 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] > > } > > > > Marking jbt_reg_write_1() as noinline avoids the case where > > multiple instances of this function get inlined into the same > > stack frame and each one adds a copy of 'tx_buf'. > > > > Fixes: mmtom ("init/Kconfig: enable -O3 for all arches") > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > Isn't this something that should be fixed at the compiler level ? I suspect but have not verified that structleak gcc plugin is partly at fault here as well, it has caused similar problems elsewhere. If you like I can try to dig deeper before that patch gets merged, and explain more in the changelog or open a gcc bug if necessary. Arnd
Hi Arnd, On Tue, Jan 07, 2020 at 11:09:13PM +0100, Arnd Bergmann wrote: > On Tue, Jan 7, 2020 at 11:00 PM Laurent Pinchart wrote: > > On Tue, Jan 07, 2020 at 10:27:33PM +0100, Arnd Bergmann wrote: > > > With gcc -O3, the compiler can inline very aggressively, > > > leading to rather large stack usage: > > > > > > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c: In function 'td028ttec1_prepare': > > > drivers/gpu/drm/panel/panel-tpo-td028ttec1.c:233:1: error: the frame size of 2768 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] > > > } > > > > > > Marking jbt_reg_write_1() as noinline avoids the case where > > > multiple instances of this function get inlined into the same > > > stack frame and each one adds a copy of 'tx_buf'. > > > > > > Fixes: mmtom ("init/Kconfig: enable -O3 for all arches") > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > > > Isn't this something that should be fixed at the compiler level ? > > I suspect but have not verified that structleak gcc plugin is partly at > fault here as well, it has caused similar problems elsewhere. > > If you like I can try to dig deeper before that patch gets merged, > and explain more in the changelog or open a gcc bug if necessary. I think we'll need to merge this in the meantime, but if gcc is able to detect too large frame sizes, I think it should have the ability to take a frame size limit into account when optimizing. I haven't checked if this is already possible and just not honoured here (possibly due to a bug) or if the feature is entirely missing. In any case we'll likely have to live with this compiler issue for quite some time.
On Tue, Jan 7, 2020 at 11:12 PM Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > On Tue, Jan 07, 2020 at 11:09:13PM +0100, Arnd Bergmann wrote: > > On Tue, Jan 7, 2020 at 11:00 PM Laurent Pinchart wrote: > > > Isn't this something that should be fixed at the compiler level ? > > > > I suspect but have not verified that structleak gcc plugin is partly at > > fault here as well, it has caused similar problems elsewhere. I checked that now, and it's indeed the structleak plugin. Interestingly the problem goes away without the -fconserve-stack option, which is meant to reduce the stack usage bug has the opposite effect here (!). I'll do some more tests tomorrow. > > If you like I can try to dig deeper before that patch gets merged, > > and explain more in the changelog or open a gcc bug if necessary. > > I think we'll need to merge this in the meantime, but if gcc is able to > detect too large frame sizes, I think it should have the ability to take > a frame size limit into account when optimizing. I haven't checked if > this is already possible and just not honoured here (possibly due to a > bug) or if the feature is entirely missing. In any case we'll likely > have to live with this compiler issue for quite some time. When talking to gcc developers about other files that use excessive amounts of stack space, it was pointed out to me that this is a fundamentally hard problem to solve in general: what usually happens is that one optimization step uses a heuristic for inlining, but the register allocator much later runs out of registers and spills them to the stack at a point when it's too late to undo the earlier optimizations. Arnd
On Tue, Jan 7, 2020 at 11:26 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Tue, Jan 7, 2020 at 11:12 PM Laurent Pinchart > <laurent.pinchart@ideasonboard.com> wrote: > > On Tue, Jan 07, 2020 at 11:09:13PM +0100, Arnd Bergmann wrote: > > > On Tue, Jan 7, 2020 at 11:00 PM Laurent Pinchart wrote: > > > > > Isn't this something that should be fixed at the compiler level ? > > > > > > I suspect but have not verified that structleak gcc plugin is partly at > > > fault here as well, it has caused similar problems elsewhere. > > I checked that now, and it's indeed the structleak plugin. > > Interestingly the problem goes away without the -fconserve-stack > option, which is meant to reduce the stack usage bug has the > opposite effect here (!). > > I'll do some more tests tomorrow. Here's a reduced test case: struct list_head { struct list_head *next, *prev; } typedef initcall_t; struct sg_table { int sgl; int nents; int orig_nents; }; struct spi_transfer { void *tx_buf; void *rx_buf; unsigned len; int tx_dma; int rx_dma; struct sg_table tx_sg; struct sg_table rx_sg; short delay_usecs; int delay; int cs_change_delay; int word_delay; int speed_hz; int effective_speed_hz; int ptp_sts_word_pre; int ptp_sts_word_post; int ptp_sts; _Bool timestamped_pre; struct list_head transfer_list; }; void spi_sync_transfer(struct spi_transfer *, int); void spi_write(void) { struct spi_transfer t; spi_sync_transfer(&t, 0); } int jbt_ret_write_0_err; void jbt_ret_write_0(void) { if (jbt_ret_write_0_err) spi_write(); } void jbt_reg_write_1(int *err) { if (*err) { struct spi_transfer t; spi_sync_transfer(&t, 1); } } void jbt_reg_write_2(int *err) { short tx_buf[3]; if (err) { struct spi_transfer t = {tx_buf}; spi_sync_transfer(&t, 0); } } int td028ttec1_prepare_i; void td028ttec1_prepare() { int ret; for (; td028ttec1_prepare_i; ++td028ttec1_prepare_i) { jbt_ret_write_0(); } jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_2(&ret); jbt_ret_write_0(); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); jbt_reg_write_1(&ret); } $ arm-linux-gnueabi/bin/arm-linux-gnueabi-gcc panel-tpo-td028ttec1.c -fplugin=scripts/gcc-plugins/structleak_plugin.so -fplugin-arg-structleak_plugin-byref-all -S -O3 -Wframe-larger-than=128 panel-tpo-td028ttec1.i: In function 'td028ttec1_prepare': panel-tpo-td028ttec1.i:80:1: warning: the frame size of 192 bytes is larger than 128 bytes [-Wframe-larger-than=] $ arm-linux-gnueabi/bin/arm-linux-gnueabi-gcc panel-tpo-td028ttec1.c -fplugin=scripts/gcc-plugins/structleak_plugin.so -fplugin-arg-structleak_plugin-byref-all -S -O3 -Wframe-larger-than=128 -fconserve-stack panel-tpo-td028ttec1.i: In function 'td028ttec1_prepare': panel-tpo-td028ttec1.i:80:1: warning: the frame size of 2032 bytes is larger than 128 bytes [-Wframe-larger-than=] I'm still not entirely sure what to make of this. The -fconserve-stack is supposed to prevent inlining when the frames get too large, but it appears that inlining less has the opposite effect here, as it leaves larger structures on the stack of the caller. structleak_plugin-byref-all causes each copy of the 'struct spi_transfer' to be initialized (intentionally) and left on the stack (as a side-effect of a somewhat suboptimal implementation). Arnd
diff --git a/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c b/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c index cf29405a2dbe..17ee5e87141f 100644 --- a/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c +++ b/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c @@ -105,7 +105,7 @@ static int jbt_ret_write_0(struct td028ttec1_panel *lcd, u8 reg, int *err) return ret; } -static int jbt_reg_write_1(struct td028ttec1_panel *lcd, +static int noinline_for_stack jbt_reg_write_1(struct td028ttec1_panel *lcd, u8 reg, u8 data, int *err) { struct spi_device *spi = lcd->spi;
With gcc -O3, the compiler can inline very aggressively, leading to rather large stack usage: drivers/gpu/drm/panel/panel-tpo-td028ttec1.c: In function 'td028ttec1_prepare': drivers/gpu/drm/panel/panel-tpo-td028ttec1.c:233:1: error: the frame size of 2768 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] } Marking jbt_reg_write_1() as noinline avoids the case where multiple instances of this function get inlined into the same stack frame and each one adds a copy of 'tx_buf'. Fixes: mmtom ("init/Kconfig: enable -O3 for all arches") Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- drivers/gpu/drm/panel/panel-tpo-td028ttec1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)