diff mbox series

[v2,5.4,regression,fix] x86/boot: Provide memzero_explicit

Message ID 20191007134724.4019-1-hdegoede@redhat.com (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Headers show
Series [v2,5.4,regression,fix] x86/boot: Provide memzero_explicit | expand

Commit Message

Hans de Goede Oct. 7, 2019, 1:47 p.m. UTC
The purgatory code now uses the shared lib/crypto/sha256.c sha256
implementation. This needs memzero_explicit, implement this.

Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
Changes in v2:
- Add barrier_data() call after the memset, making the function really
  explicit. Using barrier_data() works fine in the purgatory (build)
  environment.
---
 arch/x86/boot/compressed/string.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Ingo Molnar Oct. 7, 2019, 2 p.m. UTC | #1
* Hans de Goede <hdegoede@redhat.com> wrote:

> The purgatory code now uses the shared lib/crypto/sha256.c sha256
> implementation. This needs memzero_explicit, implement this.
> 
> Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
> Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit")
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
> Changes in v2:
> - Add barrier_data() call after the memset, making the function really
>   explicit. Using barrier_data() works fine in the purgatory (build)
>   environment.
> ---
>  arch/x86/boot/compressed/string.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
> index 81fc1eaa3229..654a7164a702 100644
> --- a/arch/x86/boot/compressed/string.c
> +++ b/arch/x86/boot/compressed/string.c
> @@ -50,6 +50,12 @@ void *memset(void *s, int c, size_t n)
>  	return s;
>  }
>  
> +void memzero_explicit(void *s, size_t count)
> +{
> +	memset(s, 0, count);
> +	barrier_data(s);
> +}

So the barrier_data() is only there to keep LTO from optimizing out the 
seemingly unused function?

Is there no canonical way to do that, instead of a seemingly obscure 
barrier_data() call - which would require a comment at minimum.

We do have $(DISABLE_LTO), not sure whether it's applicable here though.

Thanks,

	Ingo
Hans de Goede Oct. 7, 2019, 2:11 p.m. UTC | #2
Hi,

On 07-10-2019 16:00, Ingo Molnar wrote:
> 
> * Hans de Goede <hdegoede@redhat.com> wrote:
> 
>> The purgatory code now uses the shared lib/crypto/sha256.c sha256
>> implementation. This needs memzero_explicit, implement this.
>>
>> Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
>> Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit")
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>> Changes in v2:
>> - Add barrier_data() call after the memset, making the function really
>>    explicit. Using barrier_data() works fine in the purgatory (build)
>>    environment.
>> ---
>>   arch/x86/boot/compressed/string.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
>> index 81fc1eaa3229..654a7164a702 100644
>> --- a/arch/x86/boot/compressed/string.c
>> +++ b/arch/x86/boot/compressed/string.c
>> @@ -50,6 +50,12 @@ void *memset(void *s, int c, size_t n)
>>   	return s;
>>   }
>>   
>> +void memzero_explicit(void *s, size_t count)
>> +{
>> +	memset(s, 0, count);
>> +	barrier_data(s);
>> +}
> 
> So the barrier_data() is only there to keep LTO from optimizing out the
> seemingly unused function?

I believe that Stephan Mueller (who suggested adding the barrier)
was also worried about people using this as an example for other
"explicit" functions which actually might get inlined.

This is not so much about protecting against LTO as it is against
protecting against inlining, which in this case boils down to the
same thing. Also this change makes the arch/x86/boot/compressed/string.c
and lib/string.c versions identical which seems like a good thing to me
(except for the code duplication part of it).

But I agree a comment would be good, how about:

void memzero_explicit(void *s, size_t count)
{
	memset(s, 0, count);
	/* Avoid the memset getting optimized away if we ever get inlined */
	barrier_data(s);
}

?

Regards,

Hans
Ingo Molnar Oct. 7, 2019, 2:22 p.m. UTC | #3
* Hans de Goede <hdegoede@redhat.com> wrote:

> Hi,
> 
> On 07-10-2019 16:00, Ingo Molnar wrote:
> > 
> > * Hans de Goede <hdegoede@redhat.com> wrote:
> > 
> > > The purgatory code now uses the shared lib/crypto/sha256.c sha256
> > > implementation. This needs memzero_explicit, implement this.
> > > 
> > > Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
> > > Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit")
> > > Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> > > ---
> > > Changes in v2:
> > > - Add barrier_data() call after the memset, making the function really
> > >    explicit. Using barrier_data() works fine in the purgatory (build)
> > >    environment.
> > > ---
> > >   arch/x86/boot/compressed/string.c | 6 ++++++
> > >   1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
> > > index 81fc1eaa3229..654a7164a702 100644
> > > --- a/arch/x86/boot/compressed/string.c
> > > +++ b/arch/x86/boot/compressed/string.c
> > > @@ -50,6 +50,12 @@ void *memset(void *s, int c, size_t n)
> > >   	return s;
> > >   }
> > > +void memzero_explicit(void *s, size_t count)
> > > +{
> > > +	memset(s, 0, count);
> > > +	barrier_data(s);
> > > +}
> > 
> > So the barrier_data() is only there to keep LTO from optimizing out the
> > seemingly unused function?
> 
> I believe that Stephan Mueller (who suggested adding the barrier)
> was also worried about people using this as an example for other
> "explicit" functions which actually might get inlined.
> 
> This is not so much about protecting against LTO as it is against
> protecting against inlining, which in this case boils down to the
> same thing. Also this change makes the arch/x86/boot/compressed/string.c
> and lib/string.c versions identical which seems like a good thing to me
> (except for the code duplication part of it).
> 
> But I agree a comment would be good, how about:
> 
> void memzero_explicit(void *s, size_t count)
> {
> 	memset(s, 0, count);
> 	/* Avoid the memset getting optimized away if we ever get inlined */
> 	barrier_data(s);
> }

Well, the standard construct for preventing inlining would be 'noinline', 
right? Any reason that wouldn't work?

Thanks,

	Ingo
Hans de Goede Oct. 7, 2019, 2:29 p.m. UTC | #4
Hi,

On 07-10-2019 16:22, Ingo Molnar wrote:
> 
> * Hans de Goede <hdegoede@redhat.com> wrote:
> 
>> Hi,
>>
>> On 07-10-2019 16:00, Ingo Molnar wrote:
>>>
>>> * Hans de Goede <hdegoede@redhat.com> wrote:
>>>
>>>> The purgatory code now uses the shared lib/crypto/sha256.c sha256
>>>> implementation. This needs memzero_explicit, implement this.
>>>>
>>>> Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
>>>> Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit")
>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>> ---
>>>> Changes in v2:
>>>> - Add barrier_data() call after the memset, making the function really
>>>>     explicit. Using barrier_data() works fine in the purgatory (build)
>>>>     environment.
>>>> ---
>>>>    arch/x86/boot/compressed/string.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
>>>> index 81fc1eaa3229..654a7164a702 100644
>>>> --- a/arch/x86/boot/compressed/string.c
>>>> +++ b/arch/x86/boot/compressed/string.c
>>>> @@ -50,6 +50,12 @@ void *memset(void *s, int c, size_t n)
>>>>    	return s;
>>>>    }
>>>> +void memzero_explicit(void *s, size_t count)
>>>> +{
>>>> +	memset(s, 0, count);
>>>> +	barrier_data(s);
>>>> +}
>>>
>>> So the barrier_data() is only there to keep LTO from optimizing out the
>>> seemingly unused function?
>>
>> I believe that Stephan Mueller (who suggested adding the barrier)
>> was also worried about people using this as an example for other
>> "explicit" functions which actually might get inlined.
>>
>> This is not so much about protecting against LTO as it is against
>> protecting against inlining, which in this case boils down to the
>> same thing. Also this change makes the arch/x86/boot/compressed/string.c
>> and lib/string.c versions identical which seems like a good thing to me
>> (except for the code duplication part of it).
>>
>> But I agree a comment would be good, how about:
>>
>> void memzero_explicit(void *s, size_t count)
>> {
>> 	memset(s, 0, count);
>> 	/* Avoid the memset getting optimized away if we ever get inlined */
>> 	barrier_data(s);
>> }
> 
> Well, the standard construct for preventing inlining would be 'noinline',
> right? Any reason that wouldn't work?

Good question. I guess the worry is that modern compilers are getting
more aggressive with optimizing and then even if not inlined if the
function gets compiled in the same scope, then the compiler might
still notice it is only every writing to the memory passed in; and
then optimize it away of the write happens to memory which lifetime
ends immediately afterwards. I mean removing the call is not inlining,
so compiler developers might decide that that is still fine to do.

IMHO with trickycode like this is is best to just use the proven
version from lib/string.c

I guess I made the comment to specific though, so how about:

void memzero_explicit(void *s, size_t count)
{
	memset(s, 0, count);
	/* Tell the compiler to never remove / optimize away the memset */
	barrier_data(s);
}

Regards,

Hans
Ingo Molnar Oct. 7, 2019, 2:46 p.m. UTC | #5
* Hans de Goede <hdegoede@redhat.com> wrote:

> Hi,
> 
> On 07-10-2019 16:22, Ingo Molnar wrote:
> > 
> > * Hans de Goede <hdegoede@redhat.com> wrote:
> > 
> > > Hi,
> > > 
> > > On 07-10-2019 16:00, Ingo Molnar wrote:
> > > > 
> > > > * Hans de Goede <hdegoede@redhat.com> wrote:
> > > > 
> > > > > The purgatory code now uses the shared lib/crypto/sha256.c sha256
> > > > > implementation. This needs memzero_explicit, implement this.
> > > > > 
> > > > > Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
> > > > > Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit")
> > > > > Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> > > > > ---
> > > > > Changes in v2:
> > > > > - Add barrier_data() call after the memset, making the function really
> > > > >     explicit. Using barrier_data() works fine in the purgatory (build)
> > > > >     environment.
> > > > > ---
> > > > >    arch/x86/boot/compressed/string.c | 6 ++++++
> > > > >    1 file changed, 6 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
> > > > > index 81fc1eaa3229..654a7164a702 100644
> > > > > --- a/arch/x86/boot/compressed/string.c
> > > > > +++ b/arch/x86/boot/compressed/string.c
> > > > > @@ -50,6 +50,12 @@ void *memset(void *s, int c, size_t n)
> > > > >    	return s;
> > > > >    }
> > > > > +void memzero_explicit(void *s, size_t count)
> > > > > +{
> > > > > +	memset(s, 0, count);
> > > > > +	barrier_data(s);
> > > > > +}
> > > > 
> > > > So the barrier_data() is only there to keep LTO from optimizing out the
> > > > seemingly unused function?
> > > 
> > > I believe that Stephan Mueller (who suggested adding the barrier)
> > > was also worried about people using this as an example for other
> > > "explicit" functions which actually might get inlined.
> > > 
> > > This is not so much about protecting against LTO as it is against
> > > protecting against inlining, which in this case boils down to the
> > > same thing. Also this change makes the arch/x86/boot/compressed/string.c
> > > and lib/string.c versions identical which seems like a good thing to me
> > > (except for the code duplication part of it).
> > > 
> > > But I agree a comment would be good, how about:
> > > 
> > > void memzero_explicit(void *s, size_t count)
> > > {
> > > 	memset(s, 0, count);
> > > 	/* Avoid the memset getting optimized away if we ever get inlined */
> > > 	barrier_data(s);
> > > }
> > 
> > Well, the standard construct for preventing inlining would be 'noinline',
> > right? Any reason that wouldn't work?
> 
> Good question. I guess the worry is that modern compilers are getting
> more aggressive with optimizing and then even if not inlined if the
> function gets compiled in the same scope, then the compiler might
> still notice it is only every writing to the memory passed in; and
> then optimize it away of the write happens to memory which lifetime
> ends immediately afterwards. I mean removing the call is not inlining,
> so compiler developers might decide that that is still fine to do.
> 
> IMHO with trickycode like this is is best to just use the proven
> version from lib/string.c
> 
> I guess I made the comment to specific though, so how about:
> 
> void memzero_explicit(void *s, size_t count)
> {
> 	memset(s, 0, count);
> 	/* Tell the compiler to never remove / optimize away the memset */
> 	barrier_data(s);
> }

Ok, I guess this will work.

Thanks,

	Ingo
Arvind Sankar Oct. 7, 2019, 3:20 p.m. UTC | #6
On Mon, Oct 07, 2019 at 04:46:00PM +0200, Ingo Molnar wrote:
> 
> * Hans de Goede <hdegoede@redhat.com> wrote:
> 
> > Hi,
> > 
> > On 07-10-2019 16:22, Ingo Molnar wrote:
> > > 
> > > * Hans de Goede <hdegoede@redhat.com> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > On 07-10-2019 16:00, Ingo Molnar wrote:
> > > > > 
> > > > > * Hans de Goede <hdegoede@redhat.com> wrote:
> > > > > 
> > > > > > The purgatory code now uses the shared lib/crypto/sha256.c sha256
> > > > > > implementation. This needs memzero_explicit, implement this.
> > > > > > 
> > > > > > Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
> > > > > > Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit")
> > > > > > Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> > > > > > ---
> > > > > > Changes in v2:
> > > > > > - Add barrier_data() call after the memset, making the function really
> > > > > >     explicit. Using barrier_data() works fine in the purgatory (build)
> > > > > >     environment.
> > > > > > ---
> > > > > >    arch/x86/boot/compressed/string.c | 6 ++++++
> > > > > >    1 file changed, 6 insertions(+)
> > > > > > 
> > > > > > diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
> > > > > > index 81fc1eaa3229..654a7164a702 100644
> > > > > > --- a/arch/x86/boot/compressed/string.c
> > > > > > +++ b/arch/x86/boot/compressed/string.c
> > > > > > @@ -50,6 +50,12 @@ void *memset(void *s, int c, size_t n)
> > > > > >    	return s;
> > > > > >    }
> > > > > > +void memzero_explicit(void *s, size_t count)
> > > > > > +{
> > > > > > +	memset(s, 0, count);
> > > > > > +	barrier_data(s);
> > > > > > +}
> > > > > 
> > > > > So the barrier_data() is only there to keep LTO from optimizing out the
> > > > > seemingly unused function?
> > > > 
> > > > I believe that Stephan Mueller (who suggested adding the barrier)
> > > > was also worried about people using this as an example for other
> > > > "explicit" functions which actually might get inlined.
> > > > 
> > > > This is not so much about protecting against LTO as it is against
> > > > protecting against inlining, which in this case boils down to the
> > > > same thing. Also this change makes the arch/x86/boot/compressed/string.c
> > > > and lib/string.c versions identical which seems like a good thing to me
> > > > (except for the code duplication part of it).
> > > > 
> > > > But I agree a comment would be good, how about:
> > > > 
> > > > void memzero_explicit(void *s, size_t count)
> > > > {
> > > > 	memset(s, 0, count);
> > > > 	/* Avoid the memset getting optimized away if we ever get inlined */
> > > > 	barrier_data(s);
> > > > }
> > > 
> > > Well, the standard construct for preventing inlining would be 'noinline',
> > > right? Any reason that wouldn't work?
> > 
> > Good question. I guess the worry is that modern compilers are getting
> > more aggressive with optimizing and then even if not inlined if the
> > function gets compiled in the same scope, then the compiler might
> > still notice it is only every writing to the memory passed in; and
> > then optimize it away of the write happens to memory which lifetime
> > ends immediately afterwards. I mean removing the call is not inlining,
> > so compiler developers might decide that that is still fine to do.
> > 
> > IMHO with trickycode like this is is best to just use the proven
> > version from lib/string.c
> > 
> > I guess I made the comment to specific though, so how about:
> > 
> > void memzero_explicit(void *s, size_t count)
> > {
> > 	memset(s, 0, count);
> > 	/* Tell the compiler to never remove / optimize away the memset */
> > 	barrier_data(s);
> > }
> 
> Ok, I guess this will work.
> 
> Thanks,
> 
> 	Ingo

With the barrier in there, is there any reason to *not* inline the
function? barrier_data() is an asm statement that tells the compiler
that the asm uses the memory that was set to zero, thus preventing it
from removing the memset even if nothing else uses that memory later. A
more detailed comment is there in compiler-gcc.h. I can't see why it
wouldn't work even if it were inlined.

If the function can indeed be inlined, we could just make the common
implementation a macro and avoid duplicating it? As mentioned in another
mail, we otherwise will likely need another duplicate implementation for
arch/s390/purgatory as well.
Ingo Molnar Oct. 7, 2019, 3:40 p.m. UTC | #7
* Arvind Sankar <nivedita@alum.mit.edu> wrote:

> With the barrier in there, is there any reason to *not* inline the
> function? barrier_data() is an asm statement that tells the compiler
> that the asm uses the memory that was set to zero, thus preventing it
> from removing the memset even if nothing else uses that memory later. A
> more detailed comment is there in compiler-gcc.h. I can't see why it
> wouldn't work even if it were inlined.
> 
> If the function can indeed be inlined, we could just make the common
> implementation a macro and avoid duplicating it? As mentioned in another
> mail, we otherwise will likely need another duplicate implementation for
> arch/s390/purgatory as well.

I suspect macro would be justified in this case. Mind sending a v3 patch 
to demonstrate how it would all look like?

I'll zap v2 if the macro solution looks better.

Thanks,

	Ingo
Arvind Sankar Oct. 7, 2019, 6:42 p.m. UTC | #8
On Mon, Oct 07, 2019 at 05:40:07PM +0200, Ingo Molnar wrote:
> 
> * Arvind Sankar <nivedita@alum.mit.edu> wrote:
> 
> > With the barrier in there, is there any reason to *not* inline the
> > function? barrier_data() is an asm statement that tells the compiler
> > that the asm uses the memory that was set to zero, thus preventing it
> > from removing the memset even if nothing else uses that memory later. A
> > more detailed comment is there in compiler-gcc.h. I can't see why it
> > wouldn't work even if it were inlined.
> > 
> > If the function can indeed be inlined, we could just make the common
> > implementation a macro and avoid duplicating it? As mentioned in another
> > mail, we otherwise will likely need another duplicate implementation for
> > arch/s390/purgatory as well.
> 
> I suspect macro would be justified in this case. Mind sending a v3 patch 
> to demonstrate how it would all look like?
> 
> I'll zap v2 if the macro solution looks better.
> 
> Thanks,
> 
> 	Ingo

Patch attached to turn memzero_explicit into inline function.
Hans de Goede Oct. 7, 2019, 7:36 p.m. UTC | #9
Hi,

On 07-10-2019 20:42, Arvind Sankar wrote:
> On Mon, Oct 07, 2019 at 05:40:07PM +0200, Ingo Molnar wrote:
>>
>> * Arvind Sankar <nivedita@alum.mit.edu> wrote:
>>
>>> With the barrier in there, is there any reason to *not* inline the
>>> function? barrier_data() is an asm statement that tells the compiler
>>> that the asm uses the memory that was set to zero, thus preventing it
>>> from removing the memset even if nothing else uses that memory later. A
>>> more detailed comment is there in compiler-gcc.h. I can't see why it
>>> wouldn't work even if it were inlined.
>>>
>>> If the function can indeed be inlined, we could just make the common
>>> implementation a macro and avoid duplicating it? As mentioned in another
>>> mail, we otherwise will likely need another duplicate implementation for
>>> arch/s390/purgatory as well.
>>
>> I suspect macro would be justified in this case. Mind sending a v3 patch
>> to demonstrate how it would all look like?
>>
>> I'll zap v2 if the macro solution looks better.
>>
>> Thanks,
>>
>> 	Ingo
> 
> Patch attached to turn memzero_explicit into inline function.

Hehe, I had prepared and have just tested the exact same patch
(only the commit msg was different).

I've just booted a kernel build with that patch and that works
fine (as expected).

So your patch is:

Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>

Since this is a bit of a core change though, I think it is
best if you send it to the linux-kernel list (with my tags from above
added) as is normally done for kernel patches. Then others, who may
not be following this thread, will get a chance to give feedback on it.

Regards,

Hans
diff mbox series

Patch

diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
index 81fc1eaa3229..654a7164a702 100644
--- a/arch/x86/boot/compressed/string.c
+++ b/arch/x86/boot/compressed/string.c
@@ -50,6 +50,12 @@  void *memset(void *s, int c, size_t n)
 	return s;
 }
 
+void memzero_explicit(void *s, size_t count)
+{
+	memset(s, 0, count);
+	barrier_data(s);
+}
+
 void *memmove(void *dest, const void *src, size_t n)
 {
 	unsigned char *d = dest;