Message ID | 20190219170209.4180739-2-arnd@arndb.de (mailing list archive) |
---|---|
State | New, archived |
Series | [1/3] media: saa7146: avoid high stack usage with clang |
On Tue, Feb 19, 2019 at 9:02 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> Clang-9 makes some different inlining decisions compared to gcc, which
> leads to a warning about a possible stack overflow problem when building
> with CONFIG_KASAN, including when setting asan-stack=0, which avoids
> most other frame overflow warnings:
>
> drivers/media/platform/vicodec/codec-fwht.c:673:12: error: stack frame size of 2224 bytes in function 'encode_plane'
>
> Manually adding noinline_for_stack annotations in those functions

Thanks for the fix! In general, for -Wstack-frame-larger-than=
warnings, is it possible that these sets of stack frames are already
too large if entered? Sure, inlining was a little aggressive, causing
more stack space use than maybe otherwise necessary at runtime, but
isn't it also possible that "no inlining" a stack frame can still be a
problem should the stack frame be entered? Doesn't the kernel have a
way of estimating the stack depth for any given frame? I guess I was
always curious if the best fix for these kind of warnings was to
non-stack allocate (kmalloc) certain locally allocated structs, or
no-inline the function. Surely there's cases where no-inlining is
safe, but I was curious if it's still maybe dangerous to enter the
problematic child most stack frame?

> called by encode_plane() or decode_plane() that require a significant
> amount of kernel stack makes this impossible to happen with any
> compiler.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/media/platform/vicodec/codec-fwht.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/media/platform/vicodec/codec-fwht.c b/drivers/media/platform/vicodec/codec-fwht.c
> index d1d6085da9f1..135d56bcc2c5 100644
> --- a/drivers/media/platform/vicodec/codec-fwht.c
> +++ b/drivers/media/platform/vicodec/codec-fwht.c
> @@ -47,7 +47,7 @@ static const uint8_t zigzag[64] = {
>  };
>
>
> -static int rlc(const s16 *in, __be16 *output, int blocktype)
> +static int noinline_for_stack rlc(const s16 *in, __be16 *output, int blocktype)
>  {
>  	s16 block[8 * 8];
>  	s16 *wp = block;
> @@ -106,8 +106,8 @@ static int rlc(const s16 *in, __be16 *output, int blocktype)
>   * This function will worst-case increase rlc_in by 65*2 bytes:
>   * one s16 value for the header and 8 * 8 coefficients of type s16.
>   */
> -static u16 derlc(const __be16 **rlc_in, s16 *dwht_out,
> -		 const __be16 *end_of_input)
> +static noinline_for_stack u16
> +derlc(const __be16 **rlc_in, s16 *dwht_out, const __be16 *end_of_input)
>  {
>  	/* header */
>  	const __be16 *input = *rlc_in;
> @@ -373,7 +373,8 @@ static void fwht(const u8 *block, s16 *output_block, unsigned int stride,
>   * Furthermore values can be negative... This is just a version that
>   * works with 16 signed data
>   */
> -static void fwht16(const s16 *block, s16 *output_block, int stride, int intra)
> +static void noinline_for_stack
> +fwht16(const s16 *block, s16 *output_block, int stride, int intra)
>  {
>  	/* we'll need more than 8 bits for the transformed coefficients */
>  	s32 workspace1[8], workspace2[8];
> @@ -456,7 +457,8 @@ static void fwht16(const s16 *block, s16 *output_block, int stride, int intra)
>  	}
>  }
>
> -static void ifwht(const s16 *block, s16 *output_block, int intra)
> +static noinline_for_stack void
> +ifwht(const s16 *block, s16 *output_block, int intra)
>  {
>  	/*
>  	 * we'll need more than 8 bits for the transformed coefficients
> @@ -604,9 +606,9 @@ static int var_inter(const s16 *old, const s16 *new)
>  	return ret;
>  }
>
> -static int decide_blocktype(const u8 *cur, const u8 *reference,
> -			    s16 *deltablock, unsigned int stride,
> -			    unsigned int input_step)
> +static noinline_for_stack int
> +decide_blocktype(const u8 *cur, const u8 *reference, s16 *deltablock,
> +		 unsigned int stride, unsigned int input_step)
>  {
>  	s16 tmp[64];
>  	s16 old[64];
> --
> 2.20.0
>
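
For readers following the kmalloc-versus-noinline question raised in the reply above, here is a minimal userspace sketch of the heap-allocation alternative. It is not something the patch does. The names `struct scratch` and `count_changed()` are hypothetical and do not appear in codec-fwht.c; `malloc()`/`free()` stand in for `kmalloc()`/`kfree()`, and the buffer sizes simply mirror the `tmp[64]` and `old[64]` arrays visible in the `decide_blocktype()` hunk.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scratch state: two 64-entry buffers like tmp[] and old[]. */
struct scratch {
	int16_t tmp[64];
	int16_t old[64];
};

/*
 * Hypothetical helper: counts differing samples between two 8x8 blocks.
 * The 256 bytes of scratch space live on the heap, so they never count
 * towards the caller's stack frame, whether or not this gets inlined.
 */
static int count_changed(const uint8_t *cur, const uint8_t *ref)
{
	/* In kernel code this would be kmalloc(sizeof(*s), GFP_KERNEL). */
	struct scratch *s = malloc(sizeof(*s));
	int changed = 0;

	if (!s)
		return -1;	/* allocation can fail, unlike a stack array */

	for (int i = 0; i < 64; i++) {
		s->tmp[i] = cur[i];
		s->old[i] = ref[i];
		if (s->tmp[i] != s->old[i])
			changed++;
	}

	free(s);
	return changed;
}
```

The trade-off is an allocation and a new failure path on every call, which is one reason an annotation can be preferable for small, hot helpers.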
On Tue, Feb 19, 2019 at 8:02 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> On Tue, Feb 19, 2019 at 9:02 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > Clang-9 makes some different inlining decisions compared to gcc, which
> > leads to a warning about a possible stack overflow problem when building
> > with CONFIG_KASAN, including when setting asan-stack=0, which avoids
> > most other frame overflow warnings:
> >
> > drivers/media/platform/vicodec/codec-fwht.c:673:12: error: stack frame size of 2224 bytes in function 'encode_plane'
> >
> > Manually adding noinline_for_stack annotations in those functions
>
> Thanks for the fix! In general, for -Wstack-frame-larger-than=
> warnings, is it possible that these sets of stack frames are already
> too large if entered? Sure, inlining was a little aggressive, causing
> more stack space use than maybe otherwise necessary at runtime, but
> isn't it also possible that "no inlining" a stack frame can still be a
> problem should the stack frame be entered? Doesn't the kernel have a
> way of estimating the stack depth for any given frame? I guess I was
> always curious if the best fix for these kind of warnings was to
> non-stack allocate (kmalloc) certain locally allocated structs, or
> no-inline the function. Surely there's cases where no-inlining is
> safe, but I was curious if it's still maybe dangerous to enter the
> problematic child most stack frame?

What I think is happening here is that llvm fails to combine the
stack allocations for the inlined functions in certain conditions,
while gcc can reuse it here. We had similar issues in gcc a few
years ago, and they got fixed there, but I have not looked at
this one in more detail. My guess is that it's related to
the bug I mentioned in patch 3.

       Arnd
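
To make the "fails to combine the stack allocations" point concrete, the sketch below tallies only the local buffers that are visible in the patch context. It assumes, for illustration, that the annotated helpers run one after another rather than nested, and it ignores KASAN redzones, spills, and any locals not shown in the hunks, so it does not try to reproduce the 2224-byte figure. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes of the local arrays visible in the patch hunks. */
enum {
	RLC_BLOCK      = sizeof(int16_t[8 * 8]),	/* rlc(): s16 block[8 * 8] */
	FWHT16_WORK    = 2 * sizeof(int32_t[8]),	/* fwht16(): workspace1, workspace2 */
	DECIDE_BUFFERS = 2 * sizeof(int16_t[64]),	/* decide_blocktype(): tmp, old */
};

/* Everything inlined into one frame, with no slot reuse: sizes add up. */
static size_t stacked_bytes(void)
{
	return RLC_BLOCK + FWHT16_WORK + DECIDE_BUFFERS;
}

/* Helpers kept out of line and called in sequence: only the largest counts. */
static size_t largest_bytes(void)
{
	size_t max = RLC_BLOCK;

	if (FWHT16_WORK > max)
		max = FWHT16_WORK;
	if (DECIDE_BUFFERS > max)
		max = DECIDE_BUFFERS;
	return max;
}
```

Under those assumptions the visible buffers come to 448 bytes when stacked and 256 bytes for the largest single helper, which is the gap a compiler closes when it overlaps the slots of inlined callees.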
On 2/19/19 6:01 PM, Arnd Bergmann wrote:
> Clang-9 makes some different inlining decisions compared to gcc, which
> leads to a warning about a possible stack overflow problem when building
> with CONFIG_KASAN, including when setting asan-stack=0, which avoids
> most other frame overflow warnings:
>
> drivers/media/platform/vicodec/codec-fwht.c:673:12: error: stack frame size of 2224 bytes in function 'encode_plane'
>
> Manually adding noinline_for_stack annotations in those functions
> called by encode_plane() or decode_plane() that require a significant
> amount of kernel stack makes this impossible to happen with any
> compiler.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/media/platform/vicodec/codec-fwht.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/media/platform/vicodec/codec-fwht.c b/drivers/media/platform/vicodec/codec-fwht.c
> index d1d6085da9f1..135d56bcc2c5 100644
> --- a/drivers/media/platform/vicodec/codec-fwht.c
> +++ b/drivers/media/platform/vicodec/codec-fwht.c
> @@ -47,7 +47,7 @@ static const uint8_t zigzag[64] = {
>  };
>
>
> -static int rlc(const s16 *in, __be16 *output, int blocktype)
> +static int noinline_for_stack rlc(const s16 *in, __be16 *output, int blocktype)
>  {
>  	s16 block[8 * 8];
>  	s16 *wp = block;
> @@ -106,8 +106,8 @@ static int rlc(const s16 *in, __be16 *output, int blocktype)
>   * This function will worst-case increase rlc_in by 65*2 bytes:
>   * one s16 value for the header and 8 * 8 coefficients of type s16.
>   */
> -static u16 derlc(const __be16 **rlc_in, s16 *dwht_out,
> -		 const __be16 *end_of_input)
> +static noinline_for_stack u16
> +derlc(const __be16 **rlc_in, s16 *dwht_out, const __be16 *end_of_input)
>  {
>  	/* header */
>  	const __be16 *input = *rlc_in;
> @@ -373,7 +373,8 @@ static void fwht(const u8 *block, s16 *output_block, unsigned int stride,
>   * Furthermore values can be negative... This is just a version that
>   * works with 16 signed data
>   */
> -static void fwht16(const s16 *block, s16 *output_block, int stride, int intra)
> +static void noinline_for_stack
> +fwht16(const s16 *block, s16 *output_block, int stride, int intra)
>  {
>  	/* we'll need more than 8 bits for the transformed coefficients */
>  	s32 workspace1[8], workspace2[8];
> @@ -456,7 +457,8 @@ static void fwht16(const s16 *block, s16 *output_block, int stride, int intra)
>  	}
>  }
>
> -static void ifwht(const s16 *block, s16 *output_block, int intra)
> +static noinline_for_stack void
> +ifwht(const s16 *block, s16 *output_block, int intra)
>  {

Please add it for fwht as well. It makes no sense to have it for fwht16, ifwht
but not the fwht function.

Got to say this is all very magic... I think it would be good to perhaps have
a comment at the start of the source that explains why noinline_for_stack is
added to selected functions.

Patches 1 & 3 are fine, BTW.

Regards,

	Hans

>  	/*
>  	 * we'll need more than 8 bits for the transformed coefficients
> @@ -604,9 +606,9 @@ static int var_inter(const s16 *old, const s16 *new)
>  	return ret;
>  }
>
> -static int decide_blocktype(const u8 *cur, const u8 *reference,
> -			    s16 *deltablock, unsigned int stride,
> -			    unsigned int input_step)
> +static noinline_for_stack int
> +decide_blocktype(const u8 *cur, const u8 *reference, s16 *deltablock,
> +		 unsigned int stride, unsigned int input_step)
>  {
>  	s16 tmp[64];
>  	s16 old[64];
>
diff --git a/drivers/media/platform/vicodec/codec-fwht.c b/drivers/media/platform/vicodec/codec-fwht.c
index d1d6085da9f1..135d56bcc2c5 100644
--- a/drivers/media/platform/vicodec/codec-fwht.c
+++ b/drivers/media/platform/vicodec/codec-fwht.c
@@ -47,7 +47,7 @@ static const uint8_t zigzag[64] = {
 };
 
 
-static int rlc(const s16 *in, __be16 *output, int blocktype)
+static int noinline_for_stack rlc(const s16 *in, __be16 *output, int blocktype)
 {
 	s16 block[8 * 8];
 	s16 *wp = block;
@@ -106,8 +106,8 @@ static int rlc(const s16 *in, __be16 *output, int blocktype)
  * This function will worst-case increase rlc_in by 65*2 bytes:
  * one s16 value for the header and 8 * 8 coefficients of type s16.
  */
-static u16 derlc(const __be16 **rlc_in, s16 *dwht_out,
-		 const __be16 *end_of_input)
+static noinline_for_stack u16
+derlc(const __be16 **rlc_in, s16 *dwht_out, const __be16 *end_of_input)
 {
 	/* header */
 	const __be16 *input = *rlc_in;
@@ -373,7 +373,8 @@ static void fwht(const u8 *block, s16 *output_block, unsigned int stride,
  * Furthermore values can be negative... This is just a version that
  * works with 16 signed data
  */
-static void fwht16(const s16 *block, s16 *output_block, int stride, int intra)
+static void noinline_for_stack
+fwht16(const s16 *block, s16 *output_block, int stride, int intra)
 {
 	/* we'll need more than 8 bits for the transformed coefficients */
 	s32 workspace1[8], workspace2[8];
@@ -456,7 +457,8 @@ static void fwht16(const s16 *block, s16 *output_block, int stride, int intra)
 	}
 }
 
-static void ifwht(const s16 *block, s16 *output_block, int intra)
+static noinline_for_stack void
+ifwht(const s16 *block, s16 *output_block, int intra)
 {
 	/*
 	 * we'll need more than 8 bits for the transformed coefficients
@@ -604,9 +606,9 @@ static int var_inter(const s16 *old, const s16 *new)
 	return ret;
 }
 
-static int decide_blocktype(const u8 *cur, const u8 *reference,
-			    s16 *deltablock, unsigned int stride,
-			    unsigned int input_step)
+static noinline_for_stack int
+decide_blocktype(const u8 *cur, const u8 *reference, s16 *deltablock,
+		 unsigned int stride, unsigned int input_step)
 {
 	s16 tmp[64];
 	s16 old[64];
Clang-9 makes some different inlining decisions compared to gcc, which
leads to a warning about a possible stack overflow problem when building
with CONFIG_KASAN, including when setting asan-stack=0, which avoids
most other frame overflow warnings:

drivers/media/platform/vicodec/codec-fwht.c:673:12: error: stack frame size of 2224 bytes in function 'encode_plane'

Manually adding noinline_for_stack annotations in those functions
called by encode_plane() or decode_plane() that require a significant
amount of kernel stack makes this impossible to happen with any
compiler.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/media/platform/vicodec/codec-fwht.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
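
The annotation pattern described in the commit message can be tried outside the kernel with a small stand-in. In the sketch below, `noinline_for_stack` is defined locally as the GCC/Clang `noinline` attribute, which is an assumption made so the example builds in userspace. The functions `sum_block()` and `sum_plane()` are hypothetical and only mimic the shape of a helper with an 8x8 scratch block being called from a per-plane loop.

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the kernel macro (assumption for this sketch). */
#define noinline_for_stack __attribute__((noinline))

/*
 * Hypothetical helper with a 128-byte scratch block, similar in shape to
 * rlc()'s "s16 block[8 * 8]". Being noinline, the block stays in this
 * function's own frame and is released when the function returns.
 */
static noinline_for_stack int sum_block(const int16_t *in)
{
	int16_t block[8 * 8];
	int sum = 0;

	memcpy(block, in, sizeof(block));
	for (int i = 0; i < 8 * 8; i++)
		sum += block[i];
	return sum;
}

/* Hypothetical caller: its own frame holds no copy of the scratch block. */
static int sum_plane(const int16_t *plane, int nblocks)
{
	int total = 0;

	for (int i = 0; i < nblocks; i++)
		total += sum_block(plane + 64 * i);
	return total;
}
```

Building a file like this with and without the attribute, using the compiler's frame-size warning option, is one way to observe how inlining decisions move stack usage between the caller and its helpers.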