[for-4.9] livepatch: Declare live patching as a supported feature

Message ID 20170626153650.23017-1-ross.lagerwall@citrix.com (mailing list archive)
State New, archived
Commit Message

Ross Lagerwall June 26, 2017, 3:36 p.m. UTC
Xen Live Patching has been available as a tech preview feature since Xen
4.7 and has now had a couple of releases to stabilize. Xen Live Patching
has been used by multiple vendors to fix several real-world security
issues without any severe bugs encountered. Additionally, there are now
tests in OSSTest that test live patching to ensure that no regressions
are introduced.

Based on the amount of testing and usage it has had, we are ready to
declare live patching as a 'Supported' feature.

Live patching is slightly peculiar when it comes to support because it
allows the host administrator to break their system rather easily
depending on the content of the live patch.
Because of this, it is worth detailing out the scope of security
support:

* Unprivileged access to live patching operations:
    Live patching operations should only be accessible to privileged
    guests and it shall be treated as a security issue if this is not
    the case.

* Bugs in the patch-application code such that vulnerabilities exist
  after application:
    If a correct live patch is loaded but it is not applied correctly
    such that it might result in an insecure system (e.g. not all
    functions are patched), it shall be treated as a security issue.

* Bugs in livepatch-build-tools creating an incorrect live patch that
  results in an insecure host:
    If livepatch-build-tools creates an incorrect live patch that
    results in an insecure host, this shall not be considered a security
    issue. There are too many OSes and toolchains to consider supporting
    this. A live patch should be checked to verify that it is valid
    before loading.

* Loading an incorrect live patch that results in an insecure host or
  host crash:
    If a live patch (whether created using livepatch-build-tools or some
    alternative) is loaded and it results in an insecure host or host
    crash due to the content of the live patch being incorrect or the
    issue being inappropriate to live patch, this is not considered a
    security issue.

* Bugs in the live patch parsing code (the ELF loader):
    Bugs in the live patch parsing code such as out-of-bounds reads
    caused by invalid ELF files are not considered to be security issues
    because they can only be triggered by a privileged domain.

* Bugs which allow a guest to prevent the application of a livepatch:
    A guest should not be able to prevent the application of a live
    patch. If an unprivileged guest can prevent the application of a
    live patch, it shall be treated as a security issue.

There are also some generic security questions which it is worth asking:

1) Is guest->host privilege escalation possible?

The new live patching sysctl subops are only accessible to privileged
domains and this is tested by OSSTest with an XTF test.
There is a caveat -- an incorrect live patch can introduce a guest->host
privilege escalation.

2) Is guest user->guest kernel escalation possible?

No, although an incorrect live patch can introduce a guest user->guest
kernel privilege escalation.

3) Is there any information leakage?

The new live patching sysctl subops are only accessible to privileged
domains so it is not possible for an unprivileged guest to access the
list of loaded live patches. This is tested by OSSTest with an XTF test.
There is a caveat -- an incorrect live patch can introduce an
information leakage.

4) Can a Denial-of-Service be triggered?

There are no known ways that an unprivileged guest can prevent a live
patch from being loaded.
Once again, there is a caveat that an incorrect live patch can introduce
an arbitrary denial of service.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/common/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andrew Cooper June 26, 2017, 4:39 p.m. UTC | #1
On 26/06/17 16:36, Ross Lagerwall wrote:
> Xen Live Patching has been available as tech preview feature since Xen
> 4.7 and has now had a couple of releases to stabilize. Xen Live patching
> has been used by multiple vendors to fix several real-world security
> issues without any severe bugs encountered. Additionally, there are now
> tests in OSSTest that test live patching to ensure that no regressions
> are introduced.
>
> Based on the amount of testing and usage it has had, we are ready to
> declare live patching as a 'Supported' feature.
>
> Live patching is slightly peculiar when it comes to support because it
> allows the host administrator to break their system rather easily
> depending on the content of the live patch.
> Because of this, it is worth detailing out the scope of security
> support:
>
> * Unprivileged access to live patching operations:
>     Live patching operations should only be accessible to privileged
>     guests and it shall be treated as a security issue if this is not
>     the case.
>
> * Bugs in the patch-application code such that vulnerabilities exist
>   after application:
>     If a correct live patch is loaded but it is not applied correctly
>     such that it might result in an insecure system (e.g. not all
>     functions are patched), it shall be treated as a security issue.
>
> * Bugs in livepatch-build-tools creating incorrect live patch that
>   results in an insecure host:
>     If livepatch-build-tools creates an incorrect live patch that
>     results in an insecure host, this shall not be considered a security
>     issue. There are too many OSes and toolchains to consider supporting
>     this. A live patch should be checked to verify that it is valid
>     before loading.
>
> * Loading an incorrect live patch that results in an insecure host or
>   host crash:
>     If a live patch (whether created using livepatch-build-tools or some
>     alternative) is loaded and it results in an insecure host or host
>     crash due to the content of the live patch being incorrect or the
>     issue being inappropriate to live patch, this is not considered as a
>     security issue.
>
> * Bugs in the live patch parsing code (the ELF loader):
>     Bugs in the live patch parsing code such as out-of-bounds reads
>     caused by invalid ELF files are not considered to be security issues
>     because the it can only be triggered by a privileged domain.

For these last points, I think it is worth stating that people using
livepatching are expected to test their patches in a test environment first.

>
> * Bugs which allow a guest to prevent the application of a livepatch:
>     A guest should not be able to prevent the application of a live
>     patch. If an unprivileged guest can prevent the application of a
>     live patch, it shall be treated as a security issue.

This one is harder to say.  We know that enough concurrent live
migrations can, which extends to "lots of activity in the guest".  It's
perhaps worth noting the potential workaround of `xl pause $DOM;
xen-livepatch ...; xl unpause`.

I'd prefer that we excluded situations like this from being within
security support.  "guest having heavy workloads" is normal for end
users, so shouldn't constitute a security vulnerability, as there is
nothing we can do about it.
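
As a rough illustration of that workaround, here is a minimal shell sketch.
It assumes the live patch has already been uploaded; the `apply` subcommand
and the patch name stand in for the arguments elided above, and `RUN` and
`apply_while_paused` are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch of the pause / apply / unpause workaround.
# RUN is a hypothetical knob: set RUN=echo to print the commands
# instead of executing them.
RUN=${RUN:-}

apply_while_paused() {
    dom=$1      # domain name or ID to quiesce
    patch=$2    # name of an already-uploaded live patch (assumption)

    $RUN xl pause "$dom" || return 1
    # Apply the patch while the guest cannot generate any load.
    $RUN xen-livepatch apply "$patch"
    rc=$?
    # Always unpause, even if applying the patch failed.
    $RUN xl unpause "$dom"
    return $rc
}
```

Something like `apply_while_paused "$DOM" some-patch` would then be run from
the control domain. Unpausing unconditionally is a deliberate choice here, so
that a failed apply does not leave the guest frozen.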

>
> There are also some generic security questions which it is worth asking:
>
> 1) Is guest->host privilege escalation possible?
>
> The new live patching sysctl subops are only accessible to privileged
> domains and this is tested by OSSTest with an XTF test.
> There is a caveat -- an incorrect live patch can introduce a guest->host
> privilege escalation.
>
> 2) Is guest user->guest kernel escalation possible?
>
> No, although an incorrect live patch can introduce a guest user->guest
> kernel privilege escalation.
>
> 3) Is there any information leakage?
>
> The new live patching sysctl subops are only accessible to privileged
> domains so it is not possible for an unprivileged guest to access the
> list of loaded live patches. This is tested by OSSTest with an XTF test.
> There is a caveat -- an incorrect live patch can introduce an
> information leakage.
>
> 4) Can a Denial-of-Service be triggered?
>
> There are no known ways that an unprivileged guest can prevent a live
> patch from being loaded.
> Once again, there is a caveat that an incorrect live patch can introduce
> an arbitrary denial of service.
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

This is all good, but this information needs to be in a file in
docs/features/, most probably livepatching.pandoc

> ---
>  xen/common/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index dc8e876..876086c 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -226,7 +226,7 @@ config CRYPTO
>  	bool
>  
>  config LIVEPATCH
> -	bool "Live patching support (TECH PREVIEW)"
> +	bool "Live patching support"
>  	default n

This default should flip as well.

~Andrew

>  	depends on HAS_BUILD_ID = "y"
>  	---help---
George Dunlap June 26, 2017, 4:50 p.m. UTC | #2
On 26/06/17 17:39, Andrew Cooper wrote:
>> * Bugs which allow a guest to prevent the application of a livepatch:
>>     A guest should not be able to prevent the application of a live
>>     patch. If an unprivileged guest can prevent the application of a
>>     live patch, it shall be treated as a security issue.
> 
> This one is harder to say.  We know that enough concurrent live
> migrations can, which extends to "lots of activity in the guest".  Its
> perhaps worth noting the potential workaround of `xl pause $DOM;
> xen-livepatch ...; xl unpause`.

And what if the guest can prevent itself from being paused?

Or, what if the guest can trigger some other persistent state change
such that livepatching will fail even if the domain is paused (or
destroyed)?

I agree that as long as the patch can be applied after "xl pause", then
the domain cannot be said to be preventing the application of the
livepatch.  But if either 'xl pause' doesn't work, or if livepatching
fails due to a malicious domain's actions after 'xl pause' (or 'xl
destroy'), then it should be treated as a security issue.

> This is all good, but this information needs to be in a file in
> docs/features/, most probably livepatching.pandoc

+1

 -George
Ross Lagerwall June 26, 2017, 4:50 p.m. UTC | #3
On 06/26/2017 05:39 PM, Andrew Cooper wrote:
> On 26/06/17 16:36, Ross Lagerwall wrote:
snip
>> * Unprivileged access to live patching operations:
>>      Live patching operations should only be accessible to privileged
>>      guests and it shall be treated as a security issue if this is not
>>      the case.
>>
>> * Bugs in the patch-application code such that vulnerabilities exist
>>    after application:
>>      If a correct live patch is loaded but it is not applied correctly
>>      such that it might result in an insecure system (e.g. not all
>>      functions are patched), it shall be treated as a security issue.
>>
>> * Bugs in livepatch-build-tools creating incorrect live patch that
>>    results in an insecure host:
>>      If livepatch-build-tools creates an incorrect live patch that
>>      results in an insecure host, this shall not be considered a security
>>      issue. There are too many OSes and toolchains to consider supporting
>>      this. A live patch should be checked to verify that it is valid
>>      before loading.
>>
>> * Loading an incorrect live patch that results in an insecure host or
>>    host crash:
>>      If a live patch (whether created using livepatch-build-tools or some
>>      alternative) is loaded and it results in an insecure host or host
>>      crash due to the content of the live patch being incorrect or the
>>      issue being inappropriate to live patch, this is not considered as a
>>      security issue.
>>
>> * Bugs in the live patch parsing code (the ELF loader):
>>      Bugs in the live patch parsing code such as out-of-bounds reads
>>      caused by invalid ELF files are not considered to be security issues
>>      because the it can only be triggered by a privileged domain.
> 
> For these last points, I think it is worth stating that people using
> livepatching are expected to test their patches in a test environment first.

OK.

> 
>>
>> * Bugs which allow a guest to prevent the application of a livepatch:
>>      A guest should not be able to prevent the application of a live
>>      patch. If an unprivileged guest can prevent the application of a
>>      live patch, it shall be treated as a security issue.
> 
> This one is harder to say.  We know that enough concurrent live
> migrations can, which extends to "lots of activity in the guest".  Its
> perhaps worth noting the potential workaround of `xl pause $DOM;
> xen-livepatch ...; xl unpause`.
> 
> I'd prefer that we excluded situations like this from being within
> security support.  "guest having heavy workloads" is normal for end
> users, so shouldn't constitute a security vulnerability, as there is
> nothing we can do about it.

But surely live migrations cannot be triggered by the guest, only the 
host administrator? I don't know of any way of triggering the timeout 
from within an unprivileged guest.

> 
>>
>> There are also some generic security questions which it is worth asking:
>>
>> 1) Is guest->host privilege escalation possible?
>>
>> The new live patching sysctl subops are only accessible to privileged
>> domains and this is tested by OSSTest with an XTF test.
>> There is a caveat -- an incorrect live patch can introduce a guest->host
>> privilege escalation.
>>
>> 2) Is guest user->guest kernel escalation possible?
>>
>> No, although an incorrect live patch can introduce a guest user->guest
>> kernel privilege escalation.
>>
>> 3) Is there any information leakage?
>>
>> The new live patching sysctl subops are only accessible to privileged
>> domains so it is not possible for an unprivileged guest to access the
>> list of loaded live patches. This is tested by OSSTest with an XTF test.
>> There is a caveat -- an incorrect live patch can introduce an
>> information leakage.
>>
>> 4) Can a Denial-of-Service be triggered?
>>
>> There are no known ways that an unprivileged guest can prevent a live
>> patch from being loaded.
>> Once again, there is a caveat that an incorrect live patch can introduce
>> an arbitrary denial of service.
>>
>> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> This is all good, but this information needs to be in a file in
> docs/features/, most probably livepatching.pandoc

OK.

> 
>> ---
>>   xen/common/Kconfig | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
>> index dc8e876..876086c 100644
>> --- a/xen/common/Kconfig
>> +++ b/xen/common/Kconfig
>> @@ -226,7 +226,7 @@ config CRYPTO
>>   	bool
>>   
>>   config LIVEPATCH
>> -	bool "Live patching support (TECH PREVIEW)"
>> +	bool "Live patching support"
>>   	default n
> 
> This default should flip as well.
> 

OK.
Ian Jackson June 26, 2017, 4:53 p.m. UTC | #4
George Dunlap writes ("Re: [PATCH for-4.9] livepatch: Declare live patching as a supported feature"):
> I agree that as long as the patch can be applied after "xl pause", then
> the domain cannot be said to be preventing the application of the
> livepatch.  But if either 'xl pause' doesn't work, or if livepatching
> fails due to a malicious domain's actions after 'xl pause' (or 'xl
> destroy'), then it should be treated as a security issue.

+1

Ian.
George Dunlap June 26, 2017, 5 p.m. UTC | #5
On 26/06/17 16:36, Ross Lagerwall wrote:
> Xen Live Patching has been available as tech preview feature since Xen
> 4.7 and has now had a couple of releases to stabilize. Xen Live patching
> has been used by multiple vendors to fix several real-world security
> issues without any severe bugs encountered. Additionally, there are now
> tests in OSSTest that test live patching to ensure that no regressions
> are introduced.
> 
> Based on the amount of testing and usage it has had, we are ready to
> declare live patching as a 'Supported' feature.

Great write-up, Ross, thanks.  I more or less agree with everything
except...

> * Bugs in livepatch-build-tools creating incorrect live patch that
>   results in an insecure host:
>     If livepatch-build-tools creates an incorrect live patch that
>     results in an insecure host, this shall not be considered a security
>     issue. There are too many OSes and toolchains to consider supporting
>     this. A live patch should be checked to verify that it is valid
>     before loading.

I'm not sure I follow the argument here.  Suppose in one month's time it
is discovered that livepatch-build-tools, under some circumstances,
creates patches that open up a side vulnerability.  Do you really think
we should just post a fix to the mailing list, without alerting anybody
who may be affected by it?

Remember, "security support" doesn't mean, "We promise there are no
bugs".  It means, "If bugs are discovered, we will notify people
according to the XenProject Security Response Process"; and this is not
only for people on the pre-disclosure list, but for everyone *not* on
the list as well, to have one place to find all security-related issues
relevant to Xen.

 -George
Andrew Cooper June 26, 2017, 5:04 p.m. UTC | #6
On 26/06/17 17:50, Ross Lagerwall wrote:
> On 06/26/2017 05:39 PM, Andrew Cooper wrote:
>> On 26/06/17 16:36, Ross Lagerwall wrote:
>>
>>>
>>> * Bugs which allow a guest to prevent the application of a livepatch:
>>>      A guest should not be able to prevent the application of a live
>>>      patch. If an unprivileged guest can prevent the application of a
>>>      live patch, it shall be treated as a security issue.
>>
>> This one is harder to say.  We know that enough concurrent live
>> migrations can, which extends to "lots of activity in the guest".  Its
>> perhaps worth noting the potential workaround of `xl pause $DOM;
>> xen-livepatch ...; xl unpause`.
>>
>> I'd prefer that we excluded situations like this from being within
>> security support.  "guest having heavy workloads" is normal for end
>> users, so shouldn't constitute a security vulnerability, as there is
>> nothing we can do about it.
>
> But surely live migrations cannot be triggered by the guest, only the
> host administrator? I don't know of any way of triggering the timeout
> from within an unprivileged guest.

Every VCPU issuing a loop of decrease/increase reservation on a single
gfn will cause a similar quantity of p2m lock contention.

On older AMD hardware, we have to hold the p2m read lock to service
hypercalls, which is why XSA-114 was issued.

~Andrew
Andrew Cooper June 26, 2017, 5:18 p.m. UTC | #7
On 26/06/17 17:50, George Dunlap wrote:
> On 26/06/17 17:39, Andrew Cooper wrote:
>>> * Bugs which allow a guest to prevent the application of a livepatch:
>>>     A guest should not be able to prevent the application of a live
>>>     patch. If an unprivileged guest can prevent the application of a
>>>     live patch, it shall be treated as a security issue.
>> This one is harder to say.  We know that enough concurrent live
>> migrations can, which extends to "lots of activity in the guest".  Its
>> perhaps worth noting the potential workaround of `xl pause $DOM;
>> xen-livepatch ...; xl unpause`.
> And what if the guest can prevent itself from being paused?

In which case, that is an XSA in its own right.

The underlying implementation uses XEN_DOMCTL_{,un}pausedomain which
call straight into domain_{un,}pause().  We have very big problems if
the guest has any influence in this...

>
> Or, what if the guest can trigger some other persistent state change
> such that livepatching will fail even if the domain is paused (or
> destroyed)?

Such as?

The guest being able to cause damaging mutative state change in Xen is
clearly a security issue, irrespective of any livepatch involvement.

However, livepatch content (hook function for example) which trips over
state as found in the hypervisor at the point of application is a bad
livepatch, not a vulnerability in livepatching.

> I agree that as long as the patch can be applied after "xl pause", then
> the domain cannot be said to be preventing the application of the
> livepatch.  But if either 'xl pause' doesn't work, or if livepatching
> fails due to a malicious domain's actions after 'xl pause' (or 'xl
> destroy'), then it should be treated as a security issue.

I broadly agree, but these bugs feel like they would be self-standing,
perhaps with an impact to applying a livepatch, rather than XSAs in
livepatching itself.

~Andrew
Andrew Cooper June 26, 2017, 5:30 p.m. UTC | #8
On 26/06/17 18:00, George Dunlap wrote:
> On 26/06/17 16:36, Ross Lagerwall wrote:
>> Xen Live Patching has been available as tech preview feature since Xen
>> 4.7 and has now had a couple of releases to stabilize. Xen Live patching
>> has been used by multiple vendors to fix several real-world security
>> issues without any severe bugs encountered. Additionally, there are now
>> tests in OSSTest that test live patching to ensure that no regressions
>> are introduced.
>>
>> Based on the amount of testing and usage it has had, we are ready to
>> declare live patching as a 'Supported' feature.
> Great write-up, Ross, thanks.  I more or less agree with everything
> except...
>
>> * Bugs in livepatch-build-tools creating incorrect live patch that
>>   results in an insecure host:
>>     If livepatch-build-tools creates an incorrect live patch that
>>     results in an insecure host, this shall not be considered a security
>>     issue. There are too many OSes and toolchains to consider supporting
>>     this. A live patch should be checked to verify that it is valid
>>     before loading.
> I'm not sure I follow the argument here.  Suppose in one months' time it
> is discovered that livepatch-build-tools, under some circumstances,
> creates patches that open up a side vulnerability.  Do you really think
> we should just post a fix to the mailing list, without alerting anybody
> who may be affected by it?

There are a million ways this could happen, starting from the simple
cases of accidentally building the livepatch from a non-clean working
tree, or accidentally using a compiler other than the one used to build
the running hypervisor.

We absolutely cannot be in the position of issuing XSAs for situations
like this, because there are too many ways where it definitely will go
wrong, and we'd end up issuing XSAs saying "remember to clean your
working tree before building a livepatch".  This is of course absurd.

IMO, The only viable option is to exclude livepatch-build-tools entirely
from security scope.  It is already the case that people producing
livepatches need to check the resulting livepatch binary for sanity, and
test it suitably in a development environment before use in production.
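
As an illustration of the kind of check meant here, a minimal shell sketch
follows. It is not part of livepatch-build-tools; the function names are
hypothetical, and it assumes the hypervisor's compiler string is read from
the `cc_compiler` field of `xl info`. How to obtain a comparable string for
the local compiler is left to the caller.

```shell
#!/bin/sh
# Extract the cc_compiler field from `xl info`-style output on stdin.
hyp_compiler() {
    sed -n 's/^cc_compiler[[:space:]]*:[[:space:]]*//p'
}

# Succeed only if the two compiler strings are identical.
check_compiler() {
    running=$1    # compiler that built the running hypervisor
    local_cc=$2   # compiler about to be used for the live patch
    if [ "$running" != "$local_cc" ]; then
        echo "compiler mismatch: hypervisor='$running' local='$local_cc'" >&2
        return 1
    fi
}

# Succeed only if the git working tree at $1 has no uncommitted changes.
check_clean_tree() {
    status=$(git -C "$1" status --porcelain) || return 1
    [ -z "$status" ]
}
```

A build wrapper could call `check_clean_tree` on the Xen source directory and
`check_compiler "$(xl info | hyp_compiler)" "$LOCAL_CC_STRING"` before
invoking the build, and refuse to continue if either fails.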

~Andrew
Julien Grall June 26, 2017, 6:29 p.m. UTC | #9
Hi,

On 06/26/2017 04:36 PM, Ross Lagerwall wrote:
> Xen Live Patching has been available as tech preview feature since Xen
> 4.7 and has now had a couple of releases to stabilize. Xen Live patching
> has been used by multiple vendors to fix several real-world security
> issues without any severe bugs encountered. Additionally, there are now
> tests in OSSTest that test live patching to ensure that no regressions
> are introduced.
> 
> Based on the amount of testing and usage it has had, we are ready to
> declare live patching as a 'Supported' feature.

There are only tests for x86 and amd64. We likely want to have those tests 
enabled for all architectures by default.

Also, I am not aware of anyone using livepatch in production on ARM64 
and ARM32. So did anyone give a good kick at the ARM implementation?

If not, then we should do it before even considering it as a supported 
feature for ARM.

Cheers,
Konrad Rzeszutek Wilk June 26, 2017, 9:07 p.m. UTC | #10
On Mon, Jun 26, 2017 at 07:29:22PM +0100, Julien Grall wrote:
> Hi,
> 
> On 06/26/2017 04:36 PM, Ross Lagerwall wrote:
> > Xen Live Patching has been available as tech preview feature since Xen
> > 4.7 and has now had a couple of releases to stabilize. Xen Live patching
> > has been used by multiple vendors to fix several real-world security
> > issues without any severe bugs encountered. Additionally, there are now
> > tests in OSSTest that test live patching to ensure that no regressions
> > are introduced.
> > 
> > Based on the amount of testing and usage it has had, we are ready to
> > declare live patching as a 'Supported' feature.
> 
> There are only test for x86 and amd64. We likely want to have those test

The test-cases are also for ARM32.

> enabled for all architectures by default.

And the OSSTest can test all of those.
> 
> Also, I am not aware of anyone using in production livepatch on ARM64 and
> ARM32. So did anyone give a good kick at the ARM implementation?

I am not aware of anybody using it in production on ARM32 or ARM64.

The test-cases are there, the code is there, but yes nobody has kicked
the tires on ARM32/ARM64 extensively with it. I would be excited to
see vendors that use it and their reports but I am not aware of any.

> 
> If not, then we should  do it before even considering as a supported feature
> for ARM.

OK. Perhaps then only for x86 until ARM operational users pipe up?

> 
> Cheers,
> 
> -- 
> Julien Grall
Jan Beulich June 27, 2017, 6:04 a.m. UTC | #11
>>> Andrew Cooper <andrew.cooper3@citrix.com> 06/26/17 6:40 PM >>>
>> --- a/xen/common/Kconfig
>> +++ b/xen/common/Kconfig
>> @@ -226,7 +226,7 @@ config CRYPTO
>>  	bool
>>  
>>  config LIVEPATCH
>> -	bool "Live patching support (TECH PREVIEW)"
>> +	bool "Live patching support"
>>  	default n
>
>This default should flip as well.

For unstable maybe. But please not for 4.9 - people shouldn't be caught by
surprise after 9 RCs that a config option's default changes. Furthermore, if
we default it to y, will there be much of a difference to simply removing the
config option?

Jan
Julien Grall June 27, 2017, 7:19 a.m. UTC | #12
Hi Jan,

On 06/27/2017 07:04 AM, Jan Beulich wrote:
>>>> Andrew Cooper <andrew.cooper3@citrix.com> 06/26/17 6:40 PM >>>
>>> --- a/xen/common/Kconfig
>>> +++ b/xen/common/Kconfig
>>> @@ -226,7 +226,7 @@ config CRYPTO
>>>   	bool
>>>   
>>>   config LIVEPATCH
>>> -	bool "Live patching support (TECH PREVIEW)"
>>> +	bool "Live patching support"
>>>   	default n
>>
>> This default should flip as well.
> 
> For unstable maybe. But please not for 4.9 - people shouldn't be caught by
> surprise after 9 RCs that a config option's default changes. Furthermore, if
> we default it to y, will there be much of a difference to simply removing the
> config option?

I think we should allow users to disable livepatch at build time. I have 
in mind people looking at using Xen in constrained environments that 
require some certifications. They will likely want to disable some parts 
of Xen.
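
For anyone wanting to confirm what a given build did, a small shell sketch is
below. It assumes the usual Kconfig convention that the LIVEPATCH option
appears as CONFIG_LIVEPATCH in the generated .config; the function name and
the path in the usage note are hypothetical.

```shell
#!/bin/sh
# Succeed if live patching is enabled in the given Kconfig-generated
# .config file. A disabled option normally appears as the comment
# "# CONFIG_LIVEPATCH is not set", which this pattern does not match.
livepatch_enabled() {
    grep -q '^CONFIG_LIVEPATCH=y$' "$1"
}
```

For example, `livepatch_enabled xen/.config && echo "livepatch built in"`.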

Cheers,
Julien Grall June 27, 2017, 7:24 a.m. UTC | #13
On 06/26/2017 10:07 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Jun 26, 2017 at 07:29:22PM +0100, Julien Grall wrote:
>> Hi,
>>
>> On 06/26/2017 04:36 PM, Ross Lagerwall wrote:
>>> Xen Live Patching has been available as tech preview feature since Xen
>>> 4.7 and has now had a couple of releases to stabilize. Xen Live patching
>>> has been used by multiple vendors to fix several real-world security
>>> issues without any severe bugs encountered. Additionally, there are now
>>> tests in OSSTest that test live patching to ensure that no regressions
>>> are introduced.
>>>
>>> Based on the amount of testing and usage it has had, we are ready to
>>> declare live patching as a 'Supported' feature.
>>
>> There are only test for x86 and amd64. We likely want to have those test
> 
> The test-cases are also for ARM32.
> 
>> enabled for all architectures by default.
> 
> And the OSSTest can test all of those.

Can we enable them by default? I know that we limited the number of 
tests for ARM64 due to limited bandwidth. But I don't think we have 
anything preventing it on ARM32.

>>
>> Also, I am not aware of anyone using in production livepatch on ARM64 and
>> ARM32. So did anyone give a good kick at the ARM implementaton?
> 
> I am not aware of anybody using it on production on ARM32 or ARM64.
> 
> The test-cases are there, the code is there, but yes nobody has kicked
> the tires on ARM32/ARM64 extensively with it. I would be excited to
> see vendors that use it and their reports but I am not aware of any.
> 
>>
>> If not, then we should  do it before even considering as a supported feature
>> for ARM.
> 
> OK. Perhaps then only for x86 until ARM operational users pipe up?

That would be my preference. My main concern is having to handle security 
issues afterwards because nobody gave the code a good kick.

Cheers,
Lars Kurth June 27, 2017, 8:09 a.m. UTC | #14
Hi all,

There was also a discussion on IRC, which Ian said we should formally
summarise in e-mail, just so there is no doubt. So here is my go at it. As
far as I can tell - besides the technical discussion in this thread, there
are several issues which need to be clarified:

* For Xen 4.9 we can declare live patching supported, without spinning
another RC to update the in-tree documentation: in other words, we would
apply the documentation/policy changes to the 4.9 tree sometime after
this discussion has been concluded. In effect this means that
docs/features/livepatching.pandoc (or similar) and associated changes to
KCONFIG options would not show up until Xen 4.9.1 is spun, but the
security team would treat live patching as supported for 4.9. In other
words for now, we can update the table in the wiki
(https://wiki.xenproject.org/wiki/Xen_Project_Release_Features) and live
with in-tree artefacts being out-of-sync with the support status for a few
months. We need to fix this anyway in-tree and there is a concrete
proposal which should be discussed at the summit.

* There was a proposal to declare live patching supported for older
releases (aka "back port" docs/features/livepatching.pandoc), but Royger
pointed out that the toolstack in question needs to support buildid. If
so, we should include back-porting requests and d

* Julien pointed out that maybe we shouldn't declare live patching as
supported for ARM32/64. I don't see an issue to declare it supported for
x86/amd64 only for now. But it is obviously up to committers to make that
call.

I think that covers the gist of the IRC discussion.

Regards
Lars

On 27/06/2017, 08:24, "Julien Grall" <julien.grall@arm.com> wrote:

>
>On 06/26/2017 10:07 PM, Konrad Rzeszutek Wilk wrote:
>> On Mon, Jun 26, 2017 at 07:29:22PM +0100, Julien Grall wrote:
>>> Hi,
>>>
>>> On 06/26/2017 04:36 PM, Ross Lagerwall wrote:
>>>> Xen Live Patching has been available as tech preview feature since Xen
>>>> 4.7 and has now had a couple of releases to stabilize. Xen Live patching
>>>> has been used by multiple vendors to fix several real-world security
>>>> issues without any severe bugs encountered. Additionally, there are now
>>>> tests in OSSTest that test live patching to ensure that no regressions
>>>> are introduced.
>>>>
>>>> Based on the amount of testing and usage it has had, we are ready to
>>>> declare live patching as a 'Supported' feature.
>>>
>>> There are only test for x86 and amd64. We likely want to have those test
>>
>> The test-cases are also for ARM32.
>>
>>> enabled for all architectures by default.
>>
>> And the OSSTest can test all of those.
>
>Can we enable them by default? I know that we limited the number of
>tests for ARM64 due to limited bandwidth. But I don't think we have
>anything preventing it on ARM32.
>
>>>
>>> Also, I am not aware of anyone using in production livepatch on ARM64 and
>>> ARM32. So did anyone give a good kick at the ARM implementaton?
>>
>> I am not aware of anybody using it on production on ARM32 or ARM64.
>>
>> The test-cases are there, the code is there, but yes nobody has kicked
>> the tires on ARM32/ARM64 extensively with it. I would be excited to
>> see vendors that use it and their reports but I am not aware of any.
>>
>>>
>>> If not, then we should  do it before even considering as a supported feature
>>> for ARM.
>>
>> OK. Perhaps then only for x86 until ARM operational users pipe up?
>
>That would be my preference. My main concern is to handle security issue
>afterwards because we didn't give any kick at the code.
>
>Cheers,
>
>-- 
>Julien Grall
George Dunlap June 27, 2017, 8:37 a.m. UTC | #15
On 26/06/17 18:18, Andrew Cooper wrote:
> On 26/06/17 17:50, George Dunlap wrote:
>> On 26/06/17 17:39, Andrew Cooper wrote:
>>>> * Bugs which allow a guest to prevent the application of a livepatch:
>>>>     A guest should not be able to prevent the application of a live
>>>>     patch. If an unprivileged guest can prevent the application of a
>>>>     live patch, it shall be treated as a security issue.
>>> This one is harder to say.  We know that enough concurrent live
>>> migrations can, which extends to "lots of activity in the guest".  It's
>>> perhaps worth noting the potential workaround of `xl pause $DOM;
>>> xen-livepatch ...; xl unpause`.
>> And what if the guest can prevent itself from being paused?
> 
> In which case, that is an XSA in its own right.
> 
> The underlying implementation uses XEN_DOMCTL_{,un}pausedomain which
> call straight into domain_{un,}pause().  We have very big problems if
> the guest has any influence in this...
> 
>>
>> Or, what if the guest can trigger some other persistent state change
>> such that livepatching will fail even if the domain is paused (or
>> destroyed)?
> 
> Such as?
> 
> The guest being able to cause damaging mutative state change in Xen is
> clearly a security issue, irrespective of any livepatch involvement.
> 
> However, livepatch content (hook function for example) which trips over
> state as found in the hypervisor at the point of application is a bad
> livepatch, not a vulnerability in livepatching.
> 
>> I agree that as long as the patch can be applied after "xl pause", then
>> the domain cannot be said to be preventing the application of the
>> livepatch.  But if either 'xl pause' doesn't work, or if livepatching
>> fails due to a malicious domain's actions after 'xl pause' (or 'xl
>> destroy'), then it should be treated as a security issue.
> 
> I broadly agree, but these bugs feel like they would be self-standing,
> perhaps with an impact to applying a livepatch, rather than XSAs in
> livepatching itself.

So let me get this right.

You think that all possible cases in which a guest can persistently
prevent a livepatch from being applied would already be a security issue
for other reasons.

Therefore, you think we should include a paragraph in our security
support statement specifically stating that we do not provide security
support if the guest can prevent a livepatch.

Is that correct?

 -George
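
The `xl pause $DOM; xen-livepatch ...; xl unpause` workaround quoted above can be sketched as a small wrapper. This is a minimal sketch, not something proposed in the thread: the function name `apply_while_paused` and the `XL` / `XEN_LIVEPATCH` override variables are hypothetical, and it assumes the payload has already been uploaded under the given name so that `xen-livepatch apply` can be used.

```shell
#!/bin/sh
# Sketch only: pause a domain, apply an already-uploaded livepatch, then
# unpause. XL and XEN_LIVEPATCH are hypothetical override points so the
# sequence can be dry-run (e.g. XL="echo xl") on a machine without Xen.
XL="${XL:-xl}"
XEN_LIVEPATCH="${XEN_LIVEPATCH:-xen-livepatch}"

apply_while_paused() {
    dom="$1"      # domain name or ID
    payload="$2"  # name of a payload previously uploaded with xen-livepatch

    $XL pause "$dom" || return 1

    $XEN_LIVEPATCH apply "$payload"
    rc=$?

    # Unpause regardless of the outcome so the guest is never left paused.
    $XL unpause "$dom" || return 1
    return "$rc"
}
```

The unpause step runs even when the apply step fails, and the apply step's exit status is what the caller sees. Whether a failure after pausing counts as a security issue is exactly the question debated in this sub-thread; the wrapper only makes the sequence explicit.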
George Dunlap June 27, 2017, 9:17 a.m. UTC | #16
On 26/06/17 18:30, Andrew Cooper wrote:
> On 26/06/17 18:00, George Dunlap wrote:
>> On 26/06/17 16:36, Ross Lagerwall wrote:
>>> Xen Live Patching has been available as tech preview feature since Xen
>>> 4.7 and has now had a couple of releases to stabilize. Xen Live patching
>>> has been used by multiple vendors to fix several real-world security
>>> issues without any severe bugs encountered. Additionally, there are now
>>> tests in OSSTest that test live patching to ensure that no regressions
>>> are introduced.
>>>
>>> Based on the amount of testing and usage it has had, we are ready to
>>> declare live patching as a 'Supported' feature.
>> Great write-up, Ross, thanks.  I more or less agree with everything
>> except...
>>
>>> * Bugs in livepatch-build-tools creating incorrect live patch that
>>>   results in an insecure host:
>>>     If livepatch-build-tools creates an incorrect live patch that
>>>     results in an insecure host, this shall not be considered a security
>>>     issue. There are too many OSes and toolchains to consider supporting
>>>     this. A live patch should be checked to verify that it is valid
>>>     before loading.
>> I'm not sure I follow the argument here.  Suppose in one month's time it
>> is discovered that livepatch-build-tools, under some circumstances,
>> creates patches that open up a side vulnerability.  Do you really think
>> we should just post a fix to the mailing list, without alerting anybody
>> who may be affected by it?
> 
> There are a million ways this could happen, starting from the simple
> cases of accidentally building the livepatch from a non-clean working
> tree, or accidentally using a compiler other than the one used to build
> the running hypervisor.
> 
> We absolutely cannot be in the position of issuing XSAs for situations
> like this, because there are too many ways where it definitely will go
> wrong, and we'd end up issuing XSAs saying "remember to clean your
> working tree before building a livepatch".  This is of course absurd.

Your argument is that because we do not issue XSAs for *user mistakes*,
that therefore we should not issue XSAs for *bugs in the tool*.

That is of course absurd.  We do not issue XSAs for user mistakes in
building the hypervisor either (for instance, switching gcc versions
without cleaning the hypervisor tree), and yet we still issue XSAs for
bugs in the hypervisor itself.

> IMO, The only viable option is to exclude livepatch-build-tools entirely
> from security scope.  It is already the case that people producing
> livepatches need to check the resulting livepatch binary for sanity, and
> test it suitably in a development environment before use in production.

Look, it sounds like right now you are going through all the livepatches
with a fine-tooth comb *because* the tools are (or recently have been)
unreliable.  But at some point in the future, the patch generation
mechanism will become more reliable.  After 20 XSAs over six months in
which the livepatch tool created the correct patch, you will become more
complacent.  You won't look as closely; it's human nature.

You seem to be simply refusing to use your imagination.  Step back.
Imagine yourself in one year.  You come to the office and find an e-mail
on security@ which says, "Livepatch tools open a security hole when
compiling with gcc x.yy".  You realize that XenVersion ${LATEST-2} uses
gcc x.yy, so you take a closer look at that livepatch, only to discover
that the livepatches generated actually do contain the bug, but you
missed it because ${LATEST-[0,1]} were perfectly fine (since they used
newer versions of gcc), the difference was subtle, and it passed all the
functional tests.

Now all of the customers that have applied those patches are vulnerable.

Do you:

1. Tell the reporter to post it publicly to xen-devel immediately, since
livepatch tools are not security supported -- thus "zero-day"-ing all
your customers (as well as anyone else who happens to have used x.yy to
build a hypervisor)?

2. Secretly take advantage of Citrix' privileged position on the
security list, and try to get an update out to your customers before it
gets announced (but allowing everyone *else* using gcc x.yy to
experience a zero-day)?

3. Issue an XSA so that everyone has the opportunity to fix things up
before making a public announcement, and so that anyone not on the
embargo list gets an alert, so they know to either update their own
livepatches, or look for updates from their software provider?

I think #3 is the only possible choice.

 -George
Ian Jackson June 27, 2017, 10:46 a.m. UTC | #17
Julien Grall writes ("Re: [PATCH for-4.9] livepatch: Declare live patching as a supported feature"):
> On 06/26/2017 10:07 PM, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jun 26, 2017 at 07:29:22PM +0100, Julien Grall wrote:
> >> enabled for all architectures by default.
> > 
> > And the OSSTest can test all of those.
> 
> Can we enable them by default? I know that we limited the number of 
> tests for ARM64 due to limited bandwidth. But I don't think we have 
> anything preventing it on ARM32.

We're slightly short of armhf bandwidth too and are about to become
more so, with the expansion of the facility (and our difficulty
getting more armhf hardware).

> > OK. Perhaps then only for x86 until ARM operational users pipe up?
> 
> That would be my preference. My main concern is to handle security issue 
> afterwards because we didn't give any kick at the code.

SGTM

Ian.
Andrew Cooper June 27, 2017, 10:47 a.m. UTC | #18
On 27/06/17 09:37, George Dunlap wrote:
> On 26/06/17 18:18, Andrew Cooper wrote:
>> On 26/06/17 17:50, George Dunlap wrote:
>>> On 26/06/17 17:39, Andrew Cooper wrote:
>>>>> * Bugs which allow a guest to prevent the application of a livepatch:
>>>>>     A guest should not be able to prevent the application of a live
>>>>>     patch. If an unprivileged guest can prevent the application of a
>>>>>     live patch, it shall be treated as a security issue.
>>>> This one is harder to say.  We know that enough concurrent live
>>>> migrations can, which extends to "lots of activity in the guest".  It's
>>>> perhaps worth noting the potential workaround of `xl pause $DOM;
>>>> xen-livepatch ...; xl unpause`.
>>> And what if the guest can prevent itself from being paused?
>> In which case, that is an XSA in its own right.
>>
>> The underlying implementation uses XEN_DOMCTL_{,un}pausedomain which
>> call straight into domain_{un,}pause().  We have very big problems if
>> the guest has any influence in this...
>>
>>> Or, what if the guest can trigger some other persistent state change
>>> such that livepatching will fail even if the domain is paused (or
>>> destroyed)?
>> Such as?
>>
>> The guest being able to cause damaging mutative state change in Xen is
>> clearly a security issue, irrespective of any livepatch involvement.
>>
>> However, livepatch content (hook function for example) which trips over
>> state as found in the hypervisor at the point of application is a bad
>> livepatch, not a vulnerability in livepatching.
>>
>>> I agree that as long as the patch can be applied after "xl pause", then
>>> the domain cannot be said to be preventing the application of the
>>> livepatch.  But if either 'xl pause' doesn't work, or if livepatching
>>> fails due to a malicious domain's actions after 'xl pause' (or 'xl
>>> destroy'), then it should be treated as a security issue.
>> I broadly agree, but these bugs feel like they would be self-standing,
>> perhaps with an impact to applying a livepatch, rather than XSAs in
>> livepatching itself.
> So let me get this right.
>
> You think that all possible cases in which a guest can persistently
> prevent a livepatch from being applied would already be a security issue
> for other reasons.

Yes.  The only possible ways a guest (potentially) has of preventing
livepatching from functioning (in the case that it is paused) is by
mechanisms such as causing memory corruption, mis-refcounting or by
having already achieved code injection.

> Therefore, you think we should include a paragraph in our security
> support statement specifically stating that we do not provide security
> support if the guest can prevent a livepatch.
>
> Is that correct?

Incorrect.  I don't think it is worth mentioning at all.

By calling it out, you are adding confusion to the area, as it is
redundant with the rest of our policy.

~Andrew
Ian Jackson June 27, 2017, 10:49 a.m. UTC | #19
Lars Kurth writes ("Re: [PATCH for-4.9] livepatch: Declare live patching as a supported feature"):
> * For Xen 4.9 we can declare live patching supported, without spinning
> another RC to update the in-tree documentation: in other words, we would
> apply the documentation/policy changes + to the 4.9 tree sometimes after
> this discussion has been concluded. In effect this means that
> docs/features/livepatching.pandoc (or similar) and associated changes to
> KCONFIG options would not show up until Xen 4.9.1 is spun,

I think that the effective version of these documents is not the one
in the most recent release, but the one in (one of the) git branches.
In this case, I think xenbits:xen.git#stable-4.9.

(This is an essential way to look at it because we have releases which
are still in security support, and might need updates to their support
status, but which are out of functional support and do not get
releases of updates.)

> * There was a proposal to declare live patching supported for older
> releases (aka "back port" docs/features/livepatching.pandoc), but Royger
> pointed out that the toolstack in question needs to support buildid. If
> so, we should include back-porting requests and d

("and do it then" or something?)  Yes.

Ian.
George Dunlap June 27, 2017, 10:49 a.m. UTC | #20
On 27/06/17 11:47, Andrew Cooper wrote:
> On 27/06/17 09:37, George Dunlap wrote:
>> On 26/06/17 18:18, Andrew Cooper wrote:
>>> On 26/06/17 17:50, George Dunlap wrote:
>>>> On 26/06/17 17:39, Andrew Cooper wrote:
>>>>>> * Bugs which allow a guest to prevent the application of a livepatch:
>>>>>>     A guest should not be able to prevent the application of a live
>>>>>>     patch. If an unprivileged guest can prevent the application of a
>>>>>>     live patch, it shall be treated as a security issue.
>>>>> This one is harder to say.  We know that enough concurrent live
>>>>> migrations can, which extends to "lots of activity in the guest".  It's
>>>>> perhaps worth noting the potential workaround of `xl pause $DOM;
>>>>> xen-livepatch ...; xl unpause`.
>>>> And what if the guest can prevent itself from being paused?
>>> In which case, that is an XSA in its own right.
>>>
>>> The underlying implementation uses XEN_DOMCTL_{,un}pausedomain which
>>> call straight into domain_{un,}pause().  We have very big problems if
>>> the guest has any influence in this...
>>>
>>>> Or, what if the guest can trigger some other persistent state change
>>>> such that livepatching will fail even if the domain is paused (or
>>>> destroyed)?
>>> Such as?
>>>
>>> The guest being able to cause damaging mutative state change in Xen is
>>> clearly a security issue, irrespective of any livepatch involvement.
>>>
>>> However, livepatch content (hook function for example) which trips over
>>> state as found in the hypervisor at the point of application is a bad
>>> livepatch, not a vulnerability in livepatching.
>>>
>>>> I agree that as long as the patch can be applied after "xl pause", then
>>>> the domain cannot be said to be preventing the application of the
>>>> livepatch.  But if either 'xl pause' doesn't work, or if livepatching
>>>> fails due to a malicious domain's actions after 'xl pause' (or 'xl
>>>> destroy'), then it should be treated as a security issue.
>>> I broadly agree, but these bugs feel like they would be self-standing,
>>> perhaps with an impact to applying a livepatch, rather than XSAs in
>>> livepatching itself.
>> So let me get this right.
>>
>> You think that all possible cases in which a guest can persistently
>> prevent a livepatch from being applied would already be a security issue
>> for other reasons.
> 
> Yes.  The only possible ways a guest (potentially) has of preventing
> livepatching from functioning (in the case that it is paused) is by
> mechanisms such as causing memory corruption, mis-refcounting or by
> having already achieved code injection.
> 
>> Therefore, you think we should include a paragraph in our security
>> support statement specifically stating that we do not provide security
>> support if the guest can prevent a livepatch.
>>
>> Is that correct?
> 
> Incorrect.  I don't think it is worth mentioning at all.
> 
> By calling it out, you are adding confusion to the area, as it is
> redundant with the rest of our policy.

So we shouldn't tell people we'll issue an XSA, because they should
already be able to infer from other things in our policy that an XSA
will be issued?  And if we clearly state that we'll issue an XSA in
those circumstances, they'll be confused?

 -George
Lars Kurth June 27, 2017, 10:59 a.m. UTC | #21
On 27/06/2017, 11:49, "Ian Jackson" <ian.jackson@eu.citrix.com> wrote:

>> * There was a proposal to declare live patching supported for older
>> releases (aka "back port" docs/features/livepatching.pandoc), but Royger
>> pointed out that the toolstack in question needs to support buildid. If
>> so, we should include back-porting requests and d
>
>("and do it then" or something?)  Yes.

Correct: "and do it then" ... Must have deleted the text when hovering
over it.

Lars
Jan Beulich June 27, 2017, 11:23 a.m. UTC | #22
>>> Julien Grall <julien.grall@arm.com> 06/27/17 9:20 AM >>>
>On 06/27/2017 07:04 AM, Jan Beulich wrote:
>>>>> Andrew Cooper <andrew.cooper3@citrix.com> 06/26/17 6:40 PM >>>
>>>> --- a/xen/common/Kconfig
>>>> +++ b/xen/common/Kconfig
>>>> @@ -226,7 +226,7 @@ config CRYPTO
>>>>   	bool
>>>>   
>>>>   config LIVEPATCH
>>>> -	bool "Live patching support (TECH PREVIEW)"
>>>> +	bool "Live patching support"
>>>>   	default n
>>>
>>> This default should flip as well.
>> 
>> For unstable maybe. But please not for 4.9 - people shouldn't be caught by
>> surprise after 9 RCs that a config option's default changes. Furthermore, if
>> we default it to y, will there be much of a difference to simply removing the
>> config option?
>
>I think we should allow users to disable livepatch at build time. I have
>in mind people looking at using Xen in constrained environments that
>require some certifications. They will likely want to disable some parts
>of Xen.

Then at least I'd like to see it gain an EXPERT dependency. For now our
overall goal still is to limit the number of possible different configurations.

Jan
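
Since the default stays `n` for 4.9, whether a given hypervisor build includes live patching depends on the generated configuration. As a minimal sketch (the helper name `livepatch_enabled` and the default path `xen/.config` are assumptions, not something stated in the thread), the Kconfig output could be checked like this:

```shell
#!/bin/sh
# Sketch only: report whether a Kconfig-generated .config enables LIVEPATCH.
# The default path is an assumption about where the hypervisor build keeps it.
livepatch_enabled() {
    config_file="${1:-xen/.config}"
    # Kconfig writes "CONFIG_LIVEPATCH=y" when the option is on and
    # "# CONFIG_LIVEPATCH is not set" when it is off.
    grep -q '^CONFIG_LIVEPATCH=y$' "$config_file"
}
```

For example, `livepatch_enabled xen/.config && echo "live patching built in"` prints the message only for builds where the option was switched on.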
George Dunlap June 27, 2017, 11:34 a.m. UTC | #23
On Tue, Jun 27, 2017 at 7:04 AM, Jan Beulich <jbeulich@suse.com> wrote:
>>>> Andrew Cooper <andrew.cooper3@citrix.com> 06/26/17 6:40 PM >>>
>>> --- a/xen/common/Kconfig
>>> +++ b/xen/common/Kconfig
>>> @@ -226,7 +226,7 @@ config CRYPTO
>>>      bool
>>>
>>>  config LIVEPATCH
>>> -    bool "Live patching support (TECH PREVIEW)"
>>> +    bool "Live patching support"
>>>      default n
>>
>>This default should flip as well.
>
> For unstable maybe. But please not for 4.9 - people shouldn't be caught by
> surprise after 9 RCs that a config option's default changes.

Leaving it the same as the RCs makes sense to me.

 -George
Ross Lagerwall June 28, 2017, 4:18 p.m. UTC | #24
On 06/27/2017 10:17 AM, George Dunlap wrote:
> On 26/06/17 18:30, Andrew Cooper wrote:
>> On 26/06/17 18:00, George Dunlap wrote:
>>> On 26/06/17 16:36, Ross Lagerwall wrote:
...
>>
>> We absolutely cannot be in the position of issuing XSAs for situations
>> like this, because there are too many ways where it definitely will go
>> wrong, and we'd end up issuing XSAs saying "remember to clean your
>> working tree before building a livepatch".  This is of course absurd.
> 
> Your argument is that because we do not issue XSAs for *user mistakes*,
> that therefore we should not issue XSAs for *bugs in the tool*.
> 
> That is of course absurd.  We do not issue XSAs for user mistakes in
> building the hypervisor either (for instance, switching gcc versions
> without cleaning the hypervisor tree), and yet we still issue XSAs for
> bugs in the hypervisor itself.
> 
>> IMO, The only viable option is to exclude livepatch-build-tools entirely
>> from security scope.  It is already the case that people producing
>> livepatches need to check the resulting livepatch binary for sanity, and
>> test it suitably in a development environment before use in production.
> 
> Look, it sounds like right now you are going through all the livepatches
> with a fine-tooth comb *because* the tools are (or recently have been)
> unreliable.  But at some point in the future, the patch generation
> mechanism will become more reliable.  After 20 XSAs over six months in
> which the livepatch tool created the correct patch, you will become more
> complacent.  You won't look as closely; it's human nature.
> 
> You seem to be simply refusing to use your imagination.  Step back.
> Imagine yourself in one year.  You come to the office and find an e-mail
> on security@ which says, "Livepatch tools open a security hole when
> compiling with gcc x.yy".  You realize that XenVersion ${LATEST-2} uses
> gcc x.yy, so you take a closer look at that livepatch, only to discover
> that the livepatches generated actually do contain the bug, but you
> missed it because ${LATEST-[0,1]} were perfectly fine (since they used
> newer versions of gcc), the difference was subtle, and it passed all the
> functional tests.
> 
> Now all of the customers that have applied those patches are vulnerable.
> 
> Do you:
> 
> 1. Tell the reporter to post it publicly to xen-devel immediately, since
> livepatch tools are not security supported -- thus "zero-day"-ing all
> your customers (as well as anyone else who happens to have used x.yy to
> build a hypervisor)?
> 
> 2. Secretly take advantage of Citrix' privileged position on the
> security list, and try to get an update out to your customers before it
> gets announced (but allowing everyone *else* using gcc x.yy to
> experience a zero-day)?
> 
> 3. Issue an XSA so that everyone has the opportunity to fix things up
> before making a public announcement, and so that anyone not on the
> embargo list gets an alert, so they know to either update their own
> livepatches, or look for updates from their software provider?
> 
> I think #3 is the only possible choice.
> 
>   -George
> 

The issue here is that any bug in livepatch-build-tools which still 
results in output being generated would be a security issue, because 
someone might have used it to patch a security issue. 
livepatch-build-tools is certainly not stable enough yet (ever?) to be 
treated in this fashion.
Konrad Rzeszutek Wilk June 28, 2017, 4:41 p.m. UTC | #25
..snip..
> > Now all of the customers that have applied those patches are vulnerable.
> > 
> > Do you:
> > 
> > 1. Tell the reporter to post it publicly to xen-devel immediately, since
> > livepatch tools are not security supported -- thus "zero-day"-ing all
> > your customers (as well as anyone else who happens to have used x.yy to
> > build a hypervisor)?
> > 
> > 2. Secretly take advantage of Citrix' privileged position on the
> > security list, and try to get an update out to your customers before it
> > gets announced (but allowing everyone *else* using gcc x.yy to
> > experience a zero-day)?
> > 
> > 3. Issue an XSA so that everyone has the opportunity to fix things up
> > before making a public announcement, and so that anyone not on the
> > embargo list gets an alert, so they know to either update their own
> > livepatches, or look for updates from their software provider?
> > 
> > I think #3 is the only possible choice.
> > 
> >   -George
> > 
> 
> The issue here is that any bug in livepatch-build-tools which still results
> in output being generated would be a security issue, because someone might
> have used it to patch a security issue. livepatch-build-tools is certainly
> not stable enough yet (ever?) to be treated in this fashion.
> 

To add a bit. The livepatch-build-tools does not have to be used to 
create the livepatches. One can use other tools to create the livepatches
(like for example the test-cases).

And there is a nice design http://xenbits.xen.org/docs/unstable/misc/livepatch.html
(see "Design of payload format") which describes what the format of this livepatch
MUST be.

> -- 
> Ross Lagerwall
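
Because the payload format is specified independently of livepatch-build-tools, a payload from any tool can be given a rough structural check before loading. The following is a minimal sketch under stated assumptions: the helper name `has_livepatch_sections` is hypothetical, and the list of section names (`.livepatch.funcs`, `.livepatch.depends`) reflects my reading of the design document rather than anything stated in this thread. It only checks that the sections are present; it does not validate their contents.

```shell
#!/bin/sh
# Sketch only: read `readelf -S -W` output on stdin and succeed if every
# section named in REQUIRED appears as a section name. This is a presence
# check, not validation of the payload contents.
REQUIRED=".livepatch.funcs .livepatch.depends"

has_livepatch_sections() {
    listing=$(cat)
    for name in $REQUIRED; do
        # readelf prints "[Nr] name  type ...", so anchor on "] name ".
        printf '%s\n' "$listing" | grep -qF -- "] $name " || return 1
    done
    return 0
}
```

A possible invocation is `readelf -S -W payload.livepatch | has_livepatch_sections`, where `payload.livepatch` is a placeholder file name. Passing this check says nothing about whether the patch is correct, which is the point of contention in the surrounding discussion.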
George Dunlap June 30, 2017, 1:42 p.m. UTC | #26
On 06/28/2017 05:18 PM, Ross Lagerwall wrote:
> On 06/27/2017 10:17 AM, George Dunlap wrote:
>> On 26/06/17 18:30, Andrew Cooper wrote:
>>> On 26/06/17 18:00, George Dunlap wrote:
>>>> On 26/06/17 16:36, Ross Lagerwall wrote:
> ...
>>>
>>> We absolutely cannot be in the position of issuing XSAs for situations
>>> like this, because there are too many ways where it definitely will go
>>> wrong, and we'd end up issuing XSAs saying "remember to clean your
>>> working tree before building a livepatch".  This is of course absurd.
>>
>> Your argument is that because we do not issue XSAs for *user mistakes*,
>> that therefore we should not issue XSAs for *bugs in the tool*.
>>
>> That is of course absurd.  We do not issue XSAs for user mistakes in
>> building the hypervisor either (for instance, switching gcc versions
>> without cleaning the hypervisor tree), and yet we still issue XSAs for
>> bugs in the hypervisor itself.
>>
>>> IMO, The only viable option is to exclude livepatch-build-tools entirely
>>> from security scope.  It is already the case that people producing
>>> livepatches need to check the resulting livepatch binary for sanity, and
>>> test it suitably in a development environment before use in production.
>>
>> Look, it sounds like right now you are going through all the livepatches
>> with a fine-tooth comb *because* the tools are (or recently have been)
>> unreliable.  But at some point in the future, the patch generation
>> mechanism will become more reliable.  After 20 XSAs over six months in
>> which the livepatch tool created the correct patch, you will become more
>> complacent.  You won't look as closely; it's human nature.
>>
>> You seem to be simply refusing to use your imagination.  Step back.
>> Imagine yourself in one year.  You come to the office and find an e-mail
>> on security@ which says, "Livepatch tools open a security hole when
>> compiling with gcc x.yy".  You realize that XenVersion ${LATEST-2} uses
>> gcc x.yy, so you take a closer look at that livepatch, only to discover
>> that the livepatches generated actually do contain the bug, but you
>> missed it because ${LATEST-[0,1]} were perfectly fine (since they used
>> newer versions of gcc), the difference was subtle, and it passed all the
>> functional tests.
>>
>> Now all of the customers that have applied those patches are vulnerable.
>>
>> Do you:
>>
>> 1. Tell the reporter to post it publicly to xen-devel immediately, since
>> livepatch tools are not security supported -- thus "zero-day"-ing all
>> your customers (as well as anyone else who happens to have used x.yy to
>> build a hypervisor)?
>>
>> 2. Secretly take advantage of Citrix' privileged position on the
>> security list, and try to get an update out to your customers before it
>> gets announced (but allowing everyone *else* using gcc x.yy to
>> experience a zero-day)?
>>
>> 3. Issue an XSA so that everyone has the opportunity to fix things up
>> before making a public announcement, and so that anyone not on the
>> embargo list gets an alert, so they know to either update their own
>> livepatches, or look for updates from their software provider?
>>
>> I think #3 is the only possible choice.
>>
>>   -George
>>
> 
> The issue here is that any bug in livepatch-build-tools which still
> results in output being generated would be a security issue, because
> someone might have used it to patch a security issue.
> livepatch-build-tools is certainly not stable enough yet (ever?) to be
> treated in this fashion.

You didn't answer my question.  If the situation described happens, what
position do you want Andrew to be put in?  (If I missed a potential
action, let me know.)

 -George
Ross Lagerwall July 3, 2017, 2:53 p.m. UTC | #27
On 06/30/2017 02:42 PM, George Dunlap wrote:
> On 06/28/2017 05:18 PM, Ross Lagerwall wrote:
>> On 06/27/2017 10:17 AM, George Dunlap wrote:
>>> On 26/06/17 18:30, Andrew Cooper wrote:
>>>> On 26/06/17 18:00, George Dunlap wrote:
>>>>> On 26/06/17 16:36, Ross Lagerwall wrote:
>> ...
>>>>
>>>> We absolutely cannot be in the position of issuing XSAs for situations
>>>> like this, because there are too many ways where it definitely will go
>>>> wrong, and we'd end up issuing XSAs saying "remember to clean your
>>>> working tree before building a livepatch".  This is of course absurd.
>>>
>>> Your argument is that because we do not issue XSAs for *user mistakes*,
>>> that therefore we should not issue XSAs for *bugs in the tool*.
>>>
>>> That is of course absurd.  We do not issue XSAs for user mistakes in
>>> building the hypervisor either (for instance, switching gcc versions
>>> without cleaning the hypervisor tree), and yet we still issue XSAs for
>>> bugs in the hypervisor itself.
>>>
>>>> IMO, The only viable option is to exclude livepatch-build-tools entirely
>>>> from security scope.  It is already the case that people producing
>>>> livepatches need to check the resulting livepatch binary for sanity, and
>>>> test it suitably in a development environment before use in production.
>>>
>>> Look, it sounds like right now you are going through all the livepatches
>>> with a fine-tooth comb *because* the tools are (or recently have been)
>>> unreliable.  But at some point in the future, the patch generation
>>> mechanism will become more reliable.  After 20 XSAs over six months in
>>> which the livepatch tool created the correct patch, you will become more
>>> complacent.  You won't look as closely; it's human nature.
>>>
>>> You seem to be simply refusing to use your imagination.  Step back.
>>> Imagine yourself in one year.  You come to the office and find an e-mail
>>> on security@ which says, "Livepatch tools open a security hole when
>>> compiling with gcc x.yy".  You realize that XenVersion ${LATEST-2} uses
>>> gcc x.yy, so you take a closer look at that livepatch, only to discover
>>> that the livepatches generated actually do contain the bug, but you
>>> missed it because ${LATEST-[0,1]} were perfectly fine (since they used
>>> newer versions of gcc), the difference was subtle, and it passed all the
>>> functional tests.
>>>
>>> Now all of the customers that have applied those patches are vulnerable.
>>>
>>> Do you:
>>>
>>> 1. Tell the reporter to post it publicly to xen-devel immediately, since
>>> livepatch tools are not security supported -- thus "zero-day"-ing all
>>> your customers (as well as anyone else who happens to have used x.yy to
>>> build a hypervisor)?
>>>
>>> 2. Secretly take advantage of Citrix' privileged position on the
>>> security list, and try to get an update out to your customers before it
>>> gets announced (but allowing everyone *else* using gcc x.yy to
>>> experience a zero-day)?
>>>
>>> 3. Issue an XSA so that everyone has the opportunity to fix things up
>>> before making a public announcement, and so that anyone not on the
>>> embargo list gets an alert, so they know to either update their own
>>> livepatches, or look for updates from their software provider?
>>>
>>> I think #3 is the only possible choice.
>>>
>>>    -George
>>>
>>
>> The issue here is that any bug in livepatch-build-tools which still
>> results in output being generated would be a security issue, because
>> someone might have used it to patch a security issue.
>> livepatch-build-tools is certainly not stable enough yet (ever?) to be
>> treated in this fashion.
> 
> You didn't answer my question.  If the situation described happens, what
> position do you want Andrew to be put in?  (If I missed a potential
> action, let me know.)
> 

I would choose #3 as it is the obvious choice. But I still don't think 
it is a sensible idea to have security support for the build tools, at 
least at this point. The same scenario could be posed for a nasty bug 
that affects Xen 4.4 only, but it is now just out of security support. 
IMO something being not supported doesn't preclude it from having an XSA 
released if there is a particularly nasty vulnerability found.
Roger Pau Monné July 4, 2017, 8:36 a.m. UTC | #28
On Mon, Jul 03, 2017 at 03:53:42PM +0100, Ross Lagerwall wrote:
> On 06/30/2017 02:42 PM, George Dunlap wrote:
> > On 06/28/2017 05:18 PM, Ross Lagerwall wrote:
> > > On 06/27/2017 10:17 AM, George Dunlap wrote:
> > > > On 26/06/17 18:30, Andrew Cooper wrote:
> > > > > On 26/06/17 18:00, George Dunlap wrote:
> > > > > > On 26/06/17 16:36, Ross Lagerwall wrote:
> > > ...
> > > > > 
> > > > > We absolutely cannot be in the position of issuing XSAs for situations
> > > > > like this, because there are too many ways where it definitely will go
> > > > > wrong, and we'd end up issuing XSAs saying "remember to clean your
> > > > > working tree before building a livepatch".  This is of course absurd.
> > > > 
> > > > Your argument is that because we do not issue XSAs for *user mistakes*,
> > > > that therefore we should not issue XSAs for *bugs in the tool*.
> > > > 
> > > > That is of course absurd.  We do not issue XSAs for user mistakes in
> > > > building the hypervisor either (for instance, switching gcc versions
> > > > without cleaning the hypervisor tree), and yet we still issue XSAs for
> > > > bugs in the hypervisor itself.
> > > > 
> > > > > IMO, The only viable option is to exclude livepatch-build-tools entirely
> > > > > from security scope.  It is already the case that people producing
> > > > > livepatches need to check the resulting livepatch binary for sanity, and
> > > > > test it suitably in a development environment before use in production.
> > > > 
> > > > Look, it sounds like right now you are going through all the livepatches
> > > > with a fine-tooth comb *because* the tools are (or recently have been)
> > > > unreliable.  But at some point in the future, the patch generation
> > > > mechanism will become more reliable.  After 20 XSAs over six months in
> > > > which the livepatch tool created the correct patch, you will become more
> > > > complacent.  You won't look as closely; it's human nature.
> > > > 
> > > > You seem to be simply refusing to use your imagination.  Step back.
> > > > Imagine yourself in one year.  You come to the office and find an e-mail
> > > > on security@ which says, "Livepatch tools open a security hole when
> > > > compiling with gcc x.yy".  You realize that XenVerson ${LATEST-2} uses
> > > > gcc x.yy, so you take a closer look at that livepatch, only to discover
> > > > that the livepatches generated actually do contain the bug, but you
> > > > missed it because ${LATEST-[0,1]} were perfectly fine (since they used
> > > > newer versions of gcc), the difference was subtle, and it passed all the
> > > > functional tests.
> > > > 
> > > > Now all of the customers that have applied those patches are vulnerable.
> > > > 
> > > > Do you:
> > > > 
> > > > 1. Tell the reporter to post it publicly to xen-devel immediately, since
> > > > livepatch tools are not security supported -- thus "zero-day"-ing all
> > > > your customers (as well as anyone else who happens to have used x.yy to
> > > > build a hypervisor)?
> > > > 
> > > > 2. Secretly take advantage of Citrix' privileged position on the
> > > > security list, and try to get an update out to your customers before it
> > > > gets announced (but allowing everyone *else* using gcc x.yy to
> > > > experience a zero-day)?
> > > > 
> > > > 3. Issue an XSA so that everyone has the opportunity to fix things up
> > > > before making a public announcement, and so that anyone not on the
> > > > embargo list gets an alert, so they know to either update their own
> > > > livepatches, or look for updates from their software provider?
> > > > 
> > > > I think #3 is the only possible choice.
> > > > 
> > > >    -George
> > > > 
> > > 
> > > The issue here is that any bug in livepatch-build-tools which still
> > > results in output being generated would be a security issue, because
> > > someone might have used it to patch a security issue.
> > > livepatch-build-tools is certainly not stable enough yet (ever?) to be
> > > treated in this fashion.
> > 
> > You didn't answer my question.  If the situation described happens, what
> > position do you want Andrew to be put in?  (If I missed a potential
> > action, let me know.)
> > 
> 
> I would choose #3 as it is the obvious choice. But I still don't think it is
> a sensible idea to have security support for the build tools, at least at
> this point. The same scenario could be posed for a nasty bug that affects
> Xen 4.4 only, but it is now just out of security support. IMO something
> being not supported doesn't preclude it from having an XSA released if there
> is a particularly nasty vulnerability found.

I think this is a grey area that we should try to avoid as much as
possible. What constitutes an XSA should be clearly defined, so that
there's no room for speculation or subjective decisions.

Following on from the example above (and I'm really not doubting Andrew's
objectivity here), what might not seem like a relevant issue to Andrew
(or the security team) in general might be severe to other parties IMHO,
and since not everyone has a say in what constitutes an XSA, the process
would become unfair to them.

Roger.
George Dunlap Aug. 3, 2017, 5:20 p.m. UTC | #29
On 07/03/2017 03:53 PM, Ross Lagerwall wrote:
> On 06/30/2017 02:42 PM, George Dunlap wrote:
>> On 06/28/2017 05:18 PM, Ross Lagerwall wrote:
>>> On 06/27/2017 10:17 AM, George Dunlap wrote:
>>>> On 26/06/17 18:30, Andrew Cooper wrote:
>>>>> On 26/06/17 18:00, George Dunlap wrote:
>>>>>> On 26/06/17 16:36, Ross Lagerwall wrote:
>>> ...
>>>> You seem to be simply refusing to use your imagination.  Step back.
>>>> Imagine yourself in one year.  You come to the office and find an
>>>> e-mail
>>>> on security@ which says, "Livepatch tools open a security hole when
>>>> compiling with gcc x.yy".  You realize that XenVerson ${LATEST-2} uses
>>>> gcc x.yy, so you take a closer look at that livepatch, only to discover
>>>> that the livepatches generated actually do contain the bug, but you
>>>> missed it because ${LATEST-[0,1]} were perfectly fine (since they used
>>>> newer versions of gcc), the difference was subtle, and it passed all
>>>> the
>>>> functional tests.
>>>>
>>>> Now all of the customers that have applied those patches are
>>>> vulnerable.
>>>>
>>>> Do you:
>>>>
>>>> 1. Tell the reporter to post it publicly to xen-devel immediately,
>>>> since
>>>> livepatch tools are not security supported -- thus "zero-day"-ing all
>>>> your customers (as well as anyone else who happens to have used x.yy to
>>>> build a hypervisor)?
>>>>
>>>> 2. Secretly take advantage of Citrix' privileged position on the
>>>> security list, and try to get an update out to your customers before it
>>>> gets announced (but allowing everyone *else* using gcc x.yy to
>>>> experience a zero-day)?
>>>>
>>>> 3. Issue an XSA so that everyone has the opportunity to fix things up
>>>> before making a public announcement, and so that anyone not on the
>>>> embargo list gets an alert, so they know to either update their own
>>>> livepatches, or look for updates from their software provider?
>>>>
>>>> I think #3 is the only possible choice.
>>>>
>>>>    -George
>>>>
>>>
>>> The issue here is that any bug in livepatch-build-tools which still
>>> results in output being generated would be a security issue, because
>>> someone might have used it to patch a security issue.
>>> livepatch-build-tools is certainly not stable enough yet (ever?) to be
>>> treated in this fashion.
>>
>> You didn't answer my question.  If the situation described happens, what
>> position do you want Andrew to be put in?  (If I missed a potential
>> action, let me know.)
>>
> 
> I would choose #3 as it is the obvious choice. But I still don't think
> it is a sensible idea to have security support for the build tools, at
> least at this point. The same scenario could be posed for a nasty bug
> that affects Xen 4.4 only, but it is now just out of security support.
> IMO something being not supported doesn't preclude it from having an XSA
> released if there is a particularly nasty vulnerability found.

Well basically I think we agree, but we're using different terms.  You
want to say, "This isn't security supported, but if an important bug is
actually found then we'll issue an XSA".  I want to say, "This is
security supported, because if an important bug is actually found we'll
issue an XSA."

So it seems to me there are likely two things that make you resistant to
calling it "security supported":

1. The fear that we'll be issuing XSAs over trivial things that don't matter

2. The fear that people will not do due diligence when creating patches
with the tools.

I think #1 is just a misconception.  For *every* bug reported to us about
any part of the code, we go through the process of trying to determine
its impact and whether we need to issue an XSA or not.  All of the
examples put forward of things we don't want to issue an XSA for are
things that I'm sure we would not issue an XSA for.

For #2, that is a reasonable fear, but we can deal with that in a
different way than calling the tools "unsupported".  We can, for
instance, mention that in the documents.  We can also have the build
tools output a warning message saying that the result should be manually
inspected for correctness.

 -George
George Dunlap Aug. 3, 2017, 5:21 p.m. UTC | #30
On 08/03/2017 06:20 PM, George Dunlap wrote:
> On 07/03/2017 03:53 PM, Ross Lagerwall wrote:
>> On 06/30/2017 02:42 PM, George Dunlap wrote:
>>> On 06/28/2017 05:18 PM, Ross Lagerwall wrote:
>>>> On 06/27/2017 10:17 AM, George Dunlap wrote:
>>>>> On 26/06/17 18:30, Andrew Cooper wrote:
>>>>>> On 26/06/17 18:00, George Dunlap wrote:
>>>>>>> On 26/06/17 16:36, Ross Lagerwall wrote:
>>>> ...
>>>>> You seem to be simply refusing to use your imagination.  Step back.
>>>>> Imagine yourself in one year.  You come to the office and find an
>>>>> e-mail
>>>>> on security@ which says, "Livepatch tools open a security hole when
>>>>> compiling with gcc x.yy".  You realize that XenVerson ${LATEST-2} uses
>>>>> gcc x.yy, so you take a closer look at that livepatch, only to discover
>>>>> that the livepatches generated actually do contain the bug, but you
>>>>> missed it because ${LATEST-[0,1]} were perfectly fine (since they used
>>>>> newer versions of gcc), the difference was subtle, and it passed all
>>>>> the
>>>>> functional tests.
>>>>>
>>>>> Now all of the customers that have applied those patches are
>>>>> vulnerable.
>>>>>
>>>>> Do you:
>>>>>
>>>>> 1. Tell the reporter to post it publicly to xen-devel immediately,
>>>>> since
>>>>> livepatch tools are not security supported -- thus "zero-day"-ing all
>>>>> your customers (as well as anyone else who happens to have used x.yy to
>>>>> build a hypervisor)?
>>>>>
>>>>> 2. Secretly take advantage of Citrix' privileged position on the
>>>>> security list, and try to get an update out to your customers before it
>>>>> gets announced (but allowing everyone *else* using gcc x.yy to
>>>>> experience a zero-day)?
>>>>>
>>>>> 3. Issue an XSA so that everyone has the opportunity to fix things up
>>>>> before making a public announcement, and so that anyone not on the
>>>>> embargo list gets an alert, so they know to either update their own
>>>>> livepatches, or look for updates from their software provider?
>>>>>
>>>>> I think #3 is the only possible choice.
>>>>>
>>>>>    -George
>>>>>
>>>>
>>>> The issue here is that any bug in livepatch-build-tools which still
>>>> results in output being generated would be a security issue, because
>>>> someone might have used it to patch a security issue.
>>>> livepatch-build-tools is certainly not stable enough yet (ever?) to be
>>>> treated in this fashion.
>>>
>>> You didn't answer my question.  If the situation described happens, what
>>> position do you want Andrew to be put in?  (If I missed a potential
>>> action, let me know.)
>>>
>>
>> I would choose #3 as it is the obvious choice. But I still don't think
>> it is a sensible idea to have security support for the build tools, at
>> least at this point. The same scenario could be posed for a nasty bug
>> that affects Xen 4.4 only, but it is now just out of security support.
>> IMO something being not supported doesn't preclude it from having an XSA
>> released if there is a particularly nasty vulnerability found.
> 
> Well basically I think we agree, but we're using different terms.  You
> want to say, "This isn't security supported, but if important bug is
> actually found then we'll issue an XSA".  I want to say, "This is
> security supported, because if an important bug is actually found we'll
> issue an XSA."
> 
> So it seems to me there are likely two things that make you resistant to
> calling it "security supported":
> 
> 1. The fear that we'll be issuing XSAs over trivial things that don't matter
> 
> 2. The fear that people will not do due diligence when creating patches
> with the tools.
> 
> I think #1 is just a misconception.  *Every* bug reported to us about
> any part of the code we go through the process of trying to determine
> its impact and whether we need to issue an XSA or not.  All of the
> examples put forward of things we don't want to issue an XSA for are
> things that I'm sure we would not issue an XSA for.
> 
> For #2, that is a reasonable fear, but we can deal with that in a
> different way than calling the tools "unsupported".  We can, for
> instance, mention that in the documents.  We can add a warning message
> that the build tools output saying that the result should be manually
> inspected for correctness.

We need to get a resolution on this.  Anyone else (particularly
committers) want to give their opinion?

 -George
Konrad Rzeszutek Wilk Aug. 6, 2017, 12:07 a.m. UTC | #31
On Thu, Aug 03, 2017 at 06:21:30PM +0100, George Dunlap wrote:
> On 08/03/2017 06:20 PM, George Dunlap wrote:
> > On 07/03/2017 03:53 PM, Ross Lagerwall wrote:
> >> On 06/30/2017 02:42 PM, George Dunlap wrote:
> >>> On 06/28/2017 05:18 PM, Ross Lagerwall wrote:
> >>>> On 06/27/2017 10:17 AM, George Dunlap wrote:
> >>>>> On 26/06/17 18:30, Andrew Cooper wrote:
> >>>>>> On 26/06/17 18:00, George Dunlap wrote:
> >>>>>>> On 26/06/17 16:36, Ross Lagerwall wrote:
> >>>> ...
> >>>>> You seem to be simply refusing to use your imagination.  Step back.
> >>>>> Imagine yourself in one year.  You come to the office and find an
> >>>>> e-mail
> >>>>> on security@ which says, "Livepatch tools open a security hole when
> >>>>> compiling with gcc x.yy".  You realize that XenVerson ${LATEST-2} uses
> >>>>> gcc x.yy, so you take a closer look at that livepatch, only to discover
> >>>>> that the livepatches generated actually do contain the bug, but you
> >>>>> missed it because ${LATEST-[0,1]} were perfectly fine (since they used
> >>>>> newer versions of gcc), the difference was subtle, and it passed all
> >>>>> the
> >>>>> functional tests.
> >>>>>
> >>>>> Now all of the customers that have applied those patches are
> >>>>> vulnerable.
> >>>>>
> >>>>> Do you:
> >>>>>
> >>>>> 1. Tell the reporter to post it publicly to xen-devel immediately,
> >>>>> since
> >>>>> livepatch tools are not security supported -- thus "zero-day"-ing all
> >>>>> your customers (as well as anyone else who happens to have used x.yy to
> >>>>> build a hypervisor)?
> >>>>>
> >>>>> 2. Secretly take advantage of Citrix' privileged position on the
> >>>>> security list, and try to get an update out to your customers before it
> >>>>> gets announced (but allowing everyone *else* using gcc x.yy to
> >>>>> experience a zero-day)?
> >>>>>
> >>>>> 3. Issue an XSA so that everyone has the opportunity to fix things up
> >>>>> before making a public announcement, and so that anyone not on the
> >>>>> embargo list gets an alert, so they know to either update their own
> >>>>> livepatches, or look for updates from their software provider?
> >>>>>
> >>>>> I think #3 is the only possible choice.
> >>>>>
> >>>>>    -George
> >>>>>
> >>>>
> >>>> The issue here is that any bug in livepatch-build-tools which still
> >>>> results in output being generated would be a security issue, because
> >>>> someone might have used it to patch a security issue.
> >>>> livepatch-build-tools is certainly not stable enough yet (ever?) to be
> >>>> treated in this fashion.
> >>>
> >>> You didn't answer my question.  If the situation described happens, what
> >>> position do you want Andrew to be put in?  (If I missed a potential
> >>> action, let me know.)
> >>>
> >>
> >> I would choose #3 as it is the obvious choice. But I still don't think
> >> it is a sensible idea to have security support for the build tools, at
> >> least at this point. The same scenario could be posed for a nasty bug
> >> that affects Xen 4.4 only, but it is now just out of security support.
> >> IMO something being not supported doesn't preclude it from having an XSA
> >> released if there is a particularly nasty vulnerability found.
> > 
> > Well basically I think we agree, but we're using different terms.  You
> > want to say, "This isn't security supported, but if important bug is
> > actually found then we'll issue an XSA".  I want to say, "This is
> > security supported, because if an important bug is actually found we'll
> > issue an XSA."
> > 
> > So it seems to me there are likely two things that make you resistant to
> > calling it "security supported":
> > 
> > 1. The fear that we'll be issuing XSAs over trivial things that don't matter
> > 
> > 2. The fear that people will not do due diligence when creating patches
> > with the tools.
> > 
> > I think #1 is just a misconception.  *Every* bug reported to us about
> > any part of the code we go through the process of trying to determine
> > its impact and whether we need to issue an XSA or not.  All of the
> > examples put forward of things we don't want to issue an XSA for are
> > things that I'm sure we would not issue an XSA for.
> > 
> > For #2, that is a reasonable fear, but we can deal with that in a
> > different way than calling the tools "unsupported".  We can, for
> > instance, mention that in the documents.  We can add a warning message
> > that the build tools output saying that the result should be manually
> > inspected for correctness.
> 
> We need to get a resolution on this.  Anyone else (particarly
> committers) want to give their opinion?

Changing title as this is now all about livepatch-build-tools.

The livepatch-build-tools get a lot of usage around XSA times.
And that is when the corner cases are being found. These three fixes:
0c10457 Remove section alignment requirement
b30d34c Ignore .discard sections
6327ab9 create-diff-object: Update fixup offsets in .rela.ex_table

were found thanks to generating XSAs. Now the folks who use these tools
are also the ones that do pre-disclosures. And the folks who
work on these tools are also the ones who have to get the livepatches out.

It is a stressful time and in the past the issues were of the form:
'oh, livepatch-build-tools won't generate the livepatch'

which I don't even know how to classify - is it an XSA that it could not
create a livepatch?

And if the livepatch-build-tools does generate something mighty wrong
then the folks on the XSA pre-disclosure list should be told
(and that has been happening).

But I am not really a fan of 'Oh, and one more XSA'


The second argument is that livepatch-build-tools is like the GCC compiler.
In fact it takes the binary blob of what the compiler has produced
and checks it against the other one. If the compiler adds extra instructions
or changes the instructions slightly we will classify that as
code needing to be patched (and yes that has come up).
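
As a rough illustration of that comparison (a sketch only: the real tool
works on ELF objects, and the function name and the dict-of-bytes
representation here are hypothetical), any function whose compiled bytes
differ between the two builds is flagged, whether or not its source
changed:

```python
def changed_functions(original, patched):
    """Return the sorted names of functions whose compiled bytes differ.

    Both arguments map a function name to its machine code as bytes.
    A function present in only one build also counts as changed.
    """
    names = set(original) | set(patched)
    return sorted(n for n in names if original.get(n) != patched.get(n))


# Hypothetical builds: the compiler emitted one extra instruction in "f".
before = {"f": b"\x90\xc3", "g": b"\xc3"}
after = {"f": b"\x90\x90\xc3", "g": b"\xc3"}
print(changed_functions(before, after))
```

A byte-level comparison like this cannot tell a deliberate source change
from a compiler quirk, which is why such differences end up being treated
as code needing to be patched.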

This is very similar to what XSA-155 was - the GCC compiler optimizations
added a nice jump table that was accessed twice. And the offset was
retrieved from the shared ring.

But we didn't do an XSA-155 for the GCC compiler. That is, we didn't
file a ticket with GCC saying 'Hey, your compiler can create a race
on shared memory. Could you make your compiler smarter in these cases?'
We instead wrote code with this optimization in mind, with more
barriers.

I think livepatch-build-tools is in the same category as GCC or linkers.
> 
>  -George
George Dunlap Aug. 7, 2017, 10:26 a.m. UTC | #32
On 08/06/2017 01:07 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 03, 2017 at 06:21:30PM +0100, George Dunlap wrote:
>> On 08/03/2017 06:20 PM, George Dunlap wrote:
>>> On 07/03/2017 03:53 PM, Ross Lagerwall wrote:
>>>> On 06/30/2017 02:42 PM, George Dunlap wrote:
>>>>> On 06/28/2017 05:18 PM, Ross Lagerwall wrote:
>>>>>> On 06/27/2017 10:17 AM, George Dunlap wrote:
>>>>>>> On 26/06/17 18:30, Andrew Cooper wrote:
>>>>>>>> On 26/06/17 18:00, George Dunlap wrote:
>>>>>>>>> On 26/06/17 16:36, Ross Lagerwall wrote:
>>>>>> ...
>>>>>>> You seem to be simply refusing to use your imagination.  Step back.
>>>>>>> Imagine yourself in one year.  You come to the office and find an
>>>>>>> e-mail
>>>>>>> on security@ which says, "Livepatch tools open a security hole when
>>>>>>> compiling with gcc x.yy".  You realize that XenVerson ${LATEST-2} uses
>>>>>>> gcc x.yy, so you take a closer look at that livepatch, only to discover
>>>>>>> that the livepatches generated actually do contain the bug, but you
>>>>>>> missed it because ${LATEST-[0,1]} were perfectly fine (since they used
>>>>>>> newer versions of gcc), the difference was subtle, and it passed all
>>>>>>> the
>>>>>>> functional tests.
>>>>>>>
>>>>>>> Now all of the customers that have applied those patches are
>>>>>>> vulnerable.
>>>>>>>
>>>>>>> Do you:
>>>>>>>
>>>>>>> 1. Tell the reporter to post it publicly to xen-devel immediately,
>>>>>>> since
>>>>>>> livepatch tools are not security supported -- thus "zero-day"-ing all
>>>>>>> your customers (as well as anyone else who happens to have used x.yy to
>>>>>>> build a hypervisor)?
>>>>>>>
>>>>>>> 2. Secretly take advantage of Citrix' privileged position on the
>>>>>>> security list, and try to get an update out to your customers before it
>>>>>>> gets announced (but allowing everyone *else* using gcc x.yy to
>>>>>>> experience a zero-day)?
>>>>>>>
>>>>>>> 3. Issue an XSA so that everyone has the opportunity to fix things up
>>>>>>> before making a public announcement, and so that anyone not on the
>>>>>>> embargo list gets an alert, so they know to either update their own
>>>>>>> livepatches, or look for updates from their software provider?
>>>>>>>
>>>>>>> I think #3 is the only possible choice.
>>>>>>>
>>>>>>>    -George
>>>>>>>
>>>>>>
>>>>>> The issue here is that any bug in livepatch-build-tools which still
>>>>>> results in output being generated would be a security issue, because
>>>>>> someone might have used it to patch a security issue.
>>>>>> livepatch-build-tools is certainly not stable enough yet (ever?) to be
>>>>>> treated in this fashion.
>>>>>
>>>>> You didn't answer my question.  If the situation described happens, what
>>>>> position do you want Andrew to be put in?  (If I missed a potential
>>>>> action, let me know.)
>>>>>
>>>>
>>>> I would choose #3 as it is the obvious choice. But I still don't think
>>>> it is a sensible idea to have security support for the build tools, at
>>>> least at this point. The same scenario could be posed for a nasty bug
>>>> that affects Xen 4.4 only, but it is now just out of security support.
>>>> IMO something being not supported doesn't preclude it from having an XSA
>>>> released if there is a particularly nasty vulnerability found.
>>>
>>> Well basically I think we agree, but we're using different terms.  You
>>> want to say, "This isn't security supported, but if important bug is
>>> actually found then we'll issue an XSA".  I want to say, "This is
>>> security supported, because if an important bug is actually found we'll
>>> issue an XSA."
>>>
>>> So it seems to me there are likely two things that make you resistant to
>>> calling it "security supported":
>>>
>>> 1. The fear that we'll be issuing XSAs over trivial things that don't matter
>>>
>>> 2. The fear that people will not do due diligence when creating patches
>>> with the tools.
>>>
>>> I think #1 is just a misconception.  *Every* bug reported to us about
>>> any part of the code we go through the process of trying to determine
>>> its impact and whether we need to issue an XSA or not.  All of the
>>> examples put forward of things we don't want to issue an XSA for are
>>> things that I'm sure we would not issue an XSA for.
>>>
>>> For #2, that is a reasonable fear, but we can deal with that in a
>>> different way than calling the tools "unsupported".  We can, for
>>> instance, mention that in the documents.  We can add a warning message
>>> that the build tools output saying that the result should be manually
>>> inspected for correctness.
>>
>> We need to get a resolution on this.  Anyone else (particarly
>> committers) want to give their opinion?
> 
> Changing title as this is all about now livepatch-build-tools.
> 
> The livepatch-build-tools get a lot of usage around XSA times.
> And that is when the corner cases are being found. The three of them:
> 0c10457 Remove section alignment requirement
> b30d34c Ignore .discard sections
> 6327ab9 create-diff-object: Update fixup offsets in .rela.ex_table
> 
> where thanks to generating XSAs. Now the folks who use these tools
> are also the ones that do pre-disclosures. And the folks who
> work on these tools also are the ones who have to get the livepatches out.
> 
> It is a stressful time and in the past the issues were off:
> 'oh, livepatch-build-tools won't generate the livepatch' 
> 
> which I don't even know how to classify - is it an XSA that it could not
> create an livepatch?
> 
> And if the livepatch-build-tools does generate something mighty wrong
> then the folks on the XSA pre-disclosure list should be let know
> (and that has been happening).
> 
> But I am not really a fan of 'Oh, and one more XSA'

Thanks for weighing in, Konrad.

So it seems that people are still not quite clear about what I'm proposing.

In general, a bug is a security issue if a human told the computer to do
safe thing X, and the computer appeared to do safe thing X, but in fact
did unsafe thing Y.

Consider the question: "Is it an XSA that domain creation will fail for
a kernel with feature A enabled?"  No, that's not a security issue: The
human told the computer to do safe thing X ("boot with kernel A"), and
it did safe thing Z ("not boot").

Consider this question: "Is it an XSA that if you detach a block device
from one domain, and then attach it to another domain, that the
second user can read potentially sensitive data from the first domain?"
No, that's not a security issue: The human told the computer to do
unsafe thing Y ("expose block device from domain B to domain C"), and
the computer did unsafe thing Y.

So.  Suppose a developer / package maintainer / sysadmin tries to build
a livepatch, and the livepatch throws an error.  Is this a security
issue?  No -- the human told it to do safe thing X ("build me a
livepatch"), and it did safe thing Z ("print an error").

Suppose someone builds a livepatch that, when loaded, immediately
crashes the hypervisor in all cases.  Is this a security issue?  No --
the human told it to do safe thing X ("patch this vulnerability") and it
did safe thing Z ("crash the hypervisor immediately").  (This is safe
because we assume you do a reasonable amount of testing before deploying
a livepatch.)

According to my understanding, the above three fixes that happened as
part of developing XSAs were like one of the above two examples: The tool
either just didn't make a livepatch at all, or it made one that clearly
didn't work.  As such, they would not be considered XSAs.

Suppose someone builds the livepatch with two different versions of the
compiler; or against stale versions of the binaries; and the livepatch
tool doesn't notice, and generates a livepatch which opens up a security
hole.  Is this a security issue?  No -- the human told it to do unsafe
thing Y, and it did unsafe thing Y.

Suppose someone builds a livepatch with buggy fix-up code that
inadvertently gives PV guests access to hypervisor memory.  Is this a
security issue?  No -- the human told it to do unsafe thing Y ("give PV
guest access to hypervisor memory") and it did unsafe thing Y.

Suppose someone builds a livepatch with the correct compiler, with a
correct patch (that would fix the bug if rebooted into a new
hypervisor), with correct fix-up code.  Suppose that the livepatch passes
all reasonable testing; but that, *due to a bug in the tools*, the patch
also gives PV guests access to hypervisor memory.  Is this a security
issue?  Yes -- the human told it to do safe thing X ("build a livepatch
based on correct inputs to fix this bug") and it did unsafe thing Y
("build a livepatch that opens up a new security hole").
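
The rule applied in the examples above can be written down as a minimal
sketch.  The names below are hypothetical and not part of any Xen tooling;
"result_safe" assumes a failure that reasonable testing would catch before
deployment (an error, an immediate crash) counts as safe:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    description: str
    requested_safe: bool  # did the human ask for a safe thing X?
    result_safe: bool     # was what the computer actually did safe?


def is_security_issue(case: Case) -> bool:
    # Only "asked for safe X, silently got unsafe Y" qualifies.
    return case.requested_safe and not case.result_safe


cases = [
    Case("tool prints an error instead of a livepatch", True, True),
    Case("livepatch crashes the hypervisor immediately", True, True),
    Case("built with mismatched compilers or stale binaries", False, False),
    Case("buggy fix-up code written by the user", False, False),
    Case("correct inputs, tool bug opens a new hole", True, False),
]

for case in cases:
    print(case.description, "->", is_security_issue(case))
```

Only the last case satisfies both conditions, which matches the one "Yes"
in the examples.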

We could even place more restrictions on the scope if we wanted to.  We
could say that we only support the livepatch tools generating patches
for XSAs.

> This is very similar to what XSA-155 was - the GCC compiler optimizations
> added a nice jump table that was accessed twice. And the offset was
> retrieved from the shared ring.
>
> But we didn't do an XSA-155 for the GCC compiler. That is we didn't
> file a ticket with GCC saying 'Hey, your compiler can create an race
> on shared memory. Could you make your compiler be smarter in these cases'
> We instead wrote code with this optimization in mind with more
> barriers.

Right -- so the gcc compiler guys are using a specification that allows
that behavior.  So from their perspective, we told the compiler to do
unsafe thing Y (or at least, said that we were OK with it doing unsafe
thing Y), and it did unsafe thing Y -- a security issue for Xen, but not
for gcc.  If gcc had *violated* the spec when causing the security
issue, then we certainly would have called that a security issue in gcc.

> I think livepatch-build-tools is in the same category as GCC or linkers.

Indeed; gcc is "security-supported" in the way I am proposing
livepatching be security supported.

I don't think this is a hill worth dying on; if everyone agrees we
should call them "security un-supported", we can go with that.  I'm only
still talking because people seem to be confused about what I'm proposing.

 -George
Jan Beulich Aug. 7, 2017, 3:59 p.m. UTC | #33
>>> George Dunlap <george.dunlap@citrix.com> 08/07/17 12:27 PM >>>
>So it seems that people are still not quite clear about what I'm proposing.

And indeed your examples helped me understand better what you mean
(or at least I hope they did).

>Suppose someone builds a livepatch with the correct compiler, with a
>correct patch (that would fix the bug if rebooted into a new
>hypervisor), with correct fix-up code.  Suppose that the bug passes all
>reasonable testing; but that, *due to a bug in the tools*, the patch
>also gives PV guests access to hypervisor memory.  Is this a security
>issue?  Yes -- the human told it to do safe thing X ("build a livepatch
>based on correct inputs to fix this bug") and it did unsafe thing Y
>("build a livepatch that opens up a new security hole").

There's one more factor here: The livepatch tools may behave properly
with one version of the compiler, and improperly with another. Or they
may behave properly with the Oracle patched hypervisor sources, but
improperly with the Citrix ones (assuming every distro carries a different
set of patches on top of an upstream stable release or branch).

>We could even place more restrictions on the scope if we wanted to.  We
>could say that we only support the livepatch tools generating patches
>for XSAs.

For me, much depends on how tight such restrictions would be. I.e.
with the examples given above, how would we determine a canonical
livepatch-tools / hypervisor pair (or set of pairs)? After all tools
mis-behavior may be a result of some custom patch in someone's
derived tree.

>> This is very similar to what XSA-155 was - the GCC compiler optimizations
>> added a nice jump table that was accessed twice. And the offset was
>> retrieved from the shared ring.
>>
>> But we didn't do an XSA-155 for the GCC compiler. That is we didn't
>> file a ticket with GCC saying 'Hey, your compiler can create a race
>> on shared memory. Could you make your compiler be smarter in these cases'
>> We instead wrote code with this optimization in mind with more
>> barriers.
>
>Right -- so the gcc compiler guys are using a specification that allows
>that behavior.  So from their perspective, we told the compiler to do
>unsafe thing Y (or at least, said that we were OK with it doing unsafe
>thing Y), and it did unsafe thing Y -- a security issue for Xen, but not
>for gcc.  If gcc had *violated* the spec when causing the security
>issue, then we certainly would have called that a security issue in gcc.

But would we have issued an XSA? Wouldn't that rather be a CVE
against gcc then?

Jan
George Dunlap Aug. 8, 2017, 11:16 a.m. UTC | #34
On 08/07/2017 04:59 PM, Jan Beulich wrote:
>>>> George Dunlap <george.dunlap@citrix.com> 08/07/17 12:27 PM >>>
>> So it seems that people are still not quite clear about what I'm proposing.
> 
> And indeed your examples helped me understand better what you mean
> (or at least I hope they did).
> 
>> Suppose someone builds a livepatch with the correct compiler, with a
>> correct patch (that would fix the bug if rebooted into a new
>> hypervisor), with correct fix-up code.  Suppose that the bug passes all
>> reasonable testing; but that, *due to a bug in the tools*, the patch
>> also gives PV guests access to hypervisor memory.  Is this a security
>> issue?  Yes -- the human told it to do safe thing X ("build a livepatch
>> based on correct inputs to fix this bug") and it did unsafe thing Y
>> ("build a livepatch that opens up a new security hole").
> 
> There's one more factor here: The livepatch tools may behave properly
> with one version of the compiler, and improperly with another. 

I don't really understand the reasoning here.  Is this your argument:
"One can imagine a security-critical livepatch bug that only affects
say, gcc 6.x and not gcc 5.x or 7.x.  Therefore, we should never issue
XSAs for any security-critical livepatch bugs."

If we found that livepatching tools make an incorrect patch only when
using gcc 5.x, and we have reason to believe that some people may be
using gcc 5.x, then I think we should issue an XSA and say that it only
affects people compiling xen with gcc 5.x.

It probably would make sense to specify some range of compiler versions
for which we will issue XSAs for the livepatch tools.  A good baseline
would be what versions of gcc Xen uses, and then we can restrict it
further if we need to (for instance, if some versions of gcc are missing
requisite features, or if they are just known to be buggy).

And remember, this is not "We have tested all compiler versions and
promise you there are no bugs."  It's, "If someone finds a bug for this
set of compilers, we will tell you about it so you can do something
about it."

>> We could even place more restrictions on the scope if we wanted to.  We
>> could say that we only support the livepatch tools generating patches
>> for XSAs.
> 
> For me, much depends on how tight such restrictions would be. I.e.
> with the examples given above, how would we determine a canonical
> livepatch-tools / hypervisor pair (or set of pairs)? After all tools
> mis-behavior may be a result of some custom patch in someone's
> derived tree.

Well, suppose that we issued an XSA with a patch, and suppose it was
later discovered that the patch opened up a different security hole when
applied on the upstream tree.  Would we issue another XSA and/or an
update to the existing XSA?  I think obviously yes we would.

Suppose instead we issued an XSA with a patch, and that it was later
discovered that the patch opened up a different security hole when
applied on top of XenServer's patchqueue, but not on the baseline
XenProject.  Would we issue another XSA and/or an update to an existing XSA?

The obvious *default* answer to that is "No; it's not practical for us
to deal with software that is not inside the XenProject's control."  One
could imagine circumstances in which we issue statements or an XSA
anyway, but that would be the exception and not the rule.

I think the same kind of thing would apply to the livepatch tools: *by
default*, we only issue XSAs for the livepatch tools if they create
security issues when generating blobs based on security patches issued
by the XenProject, and on top of XenProject-released software.  As
always, if there's some unforeseen circumstance then someone could argue
for an exception.

>>> This is very similar to what XSA-155 was - the GCC compiler optimizations
>>> added a nice jump table that was accessed twice. And the offset was
>>> retrieved from the shared ring.
>>>
>>> But we didn't do an XSA-155 for the GCC compiler. That is we didn't
>>> file a ticket with GCC saying 'Hey, your compiler can create a race
>>> on shared memory. Could you make your compiler be smarter in these cases'
>>> We instead wrote code with this optimization in mind with more
>>> barriers.
>>
>> Right -- so the gcc compiler guys are using a specification that allows
>> that behavior.  So from their perspective, we told the compiler to do
>> unsafe thing Y (or at least, said that we were OK with it doing unsafe
>> thing Y), and it did unsafe thing Y -- a security issue for Xen, but not
>> for gcc.  If gcc had *violated* the spec when causing the security
>> issue, then we certainly would have called that a security issue in gcc.
> 
> But would we have issued an XSA? Wouldn't that rather be a CVE
> against gcc then?

This is changing the question slightly, from "Should X have security
support", to "If X is to be security supported, what organization and
process should be used to support them?"

Obviously in the case of gcc, we would primarily handle the security
issue the way the gcc project handles security issues (which may be
nothing at all for all I know).  (Although depending on the bug and the
circumstances, we might still issue an advisory to raise awareness for
downstreams who might have compiled Xen with a particular version of gcc.)

If livepatch-tools were an external project run by somebody else, with
their own security process, then we would report the issue to them and
let them handle it.  Since livepatch-tools is developed by Xen
developers and for Xen downstreams and users, if it is to be security
supported, then it seems to me that the obvious thing to do is to
support it within the XenProject security response process.

I mean, if someone *wants* to set up an independent organization with an
independent security team and security process to handle the livepatch
project, then I guess that would be OK with me -- I don't care so much
*who* does the security support, as long as it gets done.

 -George
Jan Beulich Aug. 9, 2017, 7:36 a.m. UTC | #35
>>> On 08.08.17 at 13:16, <george.dunlap@citrix.com> wrote:
> On 08/07/2017 04:59 PM, Jan Beulich wrote:
>>>>> George Dunlap <george.dunlap@citrix.com> 08/07/17 12:27 PM >>>
>>> So it seems that people are still not quite clear about what I'm proposing.
>> 
>> And indeed your examples helped me understand better what you mean
>> (or at least I hope they did).
>> 
>>> Suppose someone builds a livepatch with the correct compiler, with a
>>> correct patch (that would fix the bug if rebooted into a new
>>> hypervisor), with correct fix-up code.  Suppose that the bug passes all
>>> reasonable testing; but that, *due to a bug in the tools*, the patch
>>> also gives PV guests access to hypervisor memory.  Is this a security
>>> issue?  Yes -- the human told it to do safe thing X ("build a livepatch
>>> based on correct inputs to fix this bug") and it did unsafe thing Y
>>> ("build a livepatch that opens up a new security hole").
>> 
>> There's one more factor here: The livepatch tools may behave properly
>> with one version of the compiler, and improperly with another. 
> 
> I don't really understand the reasoning here.  Is this your argument:
> "One can imagine a security-critical livepatch bug that only affects
> say, gcc 6.x and not gcc 5.x or 7.x.  Therefore, we should never issue
> XSAs for any security-critical livepatch bugs."
> 
> If we found that livepatching tools make an incorrect patch only when
> using gcc 5.x, and we have reason to believe that some people may be
> using gcc 5.x, then I think we should issue an XSA and say that it only
> affects people compiling xen with gcc 5.x.
> 
> It probably would make sense to specify some range of compiler versions
> for which we will issue XSAs for the livepatch tools.  A good baseline
> would be what versions of gcc Xen uses, and then we can restrict it
> further if we need to (for instance, if some versions of gcc are missing
> requisite features, or if they are just known to be buggy).
> 
> And remember, this is not "We have tested all compiler versions and
> promise you there are no bugs."  It's, "If someone finds a bug for this
> set of compilers, we will tell you about it so you can do something
> about it."

I can see and understand all of what you say; my argument,
however was more towards the matrix of what needs supporting
possibly becoming unreasonably large (no matter whether we
specify a range of compilers, as once again distros tend to not
ship plain unpatched upstream compiler versions).

>>> We could even place more restrictions on the scope if we wanted to.  We
>>> could say that we only support the livepatch tools generating patches
>>> for XSAs.
>> 
>> For me, much depends on how tight such restrictions would be. I.e.
>> with the examples given above, how would we determine a canonical
>> livepatch-tools / hypervisor pair (or set of pairs)? After all tools
>> mis-behavior may be a result of some custom patch in someone's
>> derived tree.
> 
> Well, suppose that we issued an XSA with a patch, and suppose it was
> later discovered that the patch opened up a different security hole when
> applied on the upstream tree.  Would we issue another XSA and/or an
> update to the existing XSA?  I think obviously yes we would.

Yes (this has happened in the past already).

> Suppose instead we issued an XSA with a patch, and that it was later
> discovered that the patch opened up a different security hole when
> applied on top of XenServer's patchqueue, but not on the baseline
> XenProject.  Would we issue another XSA and/or an update to an existing XSA?
> 
> The obvious *default* answer to that is "No; it's not practical for us
> to deal with software that is not inside the XenProject's control."  One
> could imagine circumstances in which we issue statements or an XSA
> anyway, but that would be the exception and not the rule.
> 
> I think the same kind of thing would apply to the livepatch tools: *by
> default*, we only issue XSAs for the livepatch tools if they create
> security issues when generating blobs based on security patches issued
> by the XenProject, and on top of XenProject-released software.  As
> always, if there's some unforeseen circumstance then someone could argue
> for an exception.

Not sure here - if analysis showed that the same issue could happen
elsewhere, and others were just lucky so far, I think we'd have to
alter the default (and I'm hesitant to call this an exception). Plus
analysis may, the more different components are involved
(specifically the compiler, which perhaps none of us has deep enough
knowledge about), become more and more difficult.

Bottom line - while technically I agree it would be good for the tools
to be security supported, from a practical perspective I see too
much complexity for this to be reasonably manageable.

Jan
George Dunlap Aug. 21, 2017, 10:59 a.m. UTC | #36
On Wed, Aug 9, 2017 at 8:36 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 08.08.17 at 13:16, <george.dunlap@citrix.com> wrote:
>> On 08/07/2017 04:59 PM, Jan Beulich wrote:
>>>>>> George Dunlap <george.dunlap@citrix.com> 08/07/17 12:27 PM >>>
>>>> So it seems that people are still not quite clear about what I'm proposing.
>>>
>>> And indeed your examples helped me understand better what you mean
>>> (or at least I hope they did).
>>>
>>>> Suppose someone builds a livepatch with the correct compiler, with a
>>>> correct patch (that would fix the bug if rebooted into a new
>>>> hypervisor), with correct fix-up code.  Suppose that the bug passes all
>>>> reasonable testing; but that, *due to a bug in the tools*, the patch
>>>> also gives PV guests access to hypervisor memory.  Is this a security
>>>> issue?  Yes -- the human told it to do safe thing X ("build a livepatch
>>>> based on correct inputs to fix this bug") and it did unsafe thing Y
>>>> ("build a livepatch that opens up a new security hole").
>>>
>>> There's one more factor here: The livepatch tools may behave properly
>>> with one version of the compiler, and improperly with another.
>>
>> I don't really understand the reasoning here.  Is this your argument:
>> "One can imagine a security-critical livepatch bug that only affects
>> say, gcc 6.x and not gcc 5.x or 7.x.  Therefore, we should never issue
>> XSAs for any security-critical livepatch bugs."
>>
>> If we found that livepatching tools make an incorrect patch only when
>> using gcc 5.x, and we have reason to believe that some people may be
>> using gcc 5.x, then I think we should issue an XSA and say that it only
>> affects people compiling xen with gcc 5.x.
>>
>> It probably would make sense to specify some range of compiler versions
>> for which we will issue XSAs for the livepatch tools.  A good baseline
>> would be what versions of gcc Xen uses, and then we can restrict it
>> further if we need to (for instance, if some versions of gcc are missing
>> requisite features, or if they are just known to be buggy).
>>
>> And remember, this is not "We have tested all compiler versions and
>> promise you there are no bugs."  It's, "If someone finds a bug for this
>> set of compilers, we will tell you about it so you can do something
>> about it."
>
> I can see and understand all of what you say; my argument,
> however was more towards the matrix of what needs supporting
> possibly becoming unreasonably large (no matter whether we
> specify a range of compilers, as once again distros tend to not
> ship plain unpatched upstream compiler versions).

What do you mean, "The matrix of what needs supporting [may possibly
become] increasingly large"?   What is the problem with having a large
(implicit) "supported" matrix?  How is supporting a "large matrix" for
livepatch tools different than the current "large matrix" we support
for just building Xen at all?


>> Suppose instead we issued an XSA with a patch, and that it was later
>> discovered that the patch opened up a different security hole when
>> applied on top of XenServer's patchqueue, but not on the baseline
>> XenProject.  Would we issue another XSA and/or an update to an existing XSA?
>>
>> The obvious *default* answer to that is "No; it's not practical for us
>> to deal with software that is not inside the XenProject's control."  One
>> could imagine circumstances in which we issue statements or an XSA
>> anyway, but that would be the exception and not the rule.
>>
>> I think the same kind of thing would apply to the livepatch tools: *by
>> default*, we only issue XSAs for the livepatch tools if they create
>> security issues when generating blobs based on security patches issued
>> by the XenProject, and on top of XenProject-released software.  As
>> always, if there's some unforeseen circumstance then someone could argue
>> for an exception.
>
> Not sure here - if analysis showed that the same issue could happen
> elsewhere, and others were just lucky so far, I think we'd have to
> alter the default (and I'm hesitant to call this an exception). Plus
> analysis may, the more different components are involved
> (specifically the compiler, which perhaps none of us has deep enough
> knowledge about), become more and more difficult.
>
> Bottom line - while technically I agree it would be good for the tools
> to be security supported, from a practical perspective I see too
> much complexity for this to be reasonably manageable.

But I still don't understand why you think so.  Every single objection
or question about what would or would not be supported that has been
raised so far has analogs in what we already support.  It is no more
complex to support livepatch-tools than to support HVM USB
passthrough, or credit2.

I have elsewhere described a hypothetical scenario where I think we
should issue an XSA for livepatch-tools.  Are you really seriously
suggesting that in that scenario we should simply publish the
vulnerability onto xen-devel with no predisclosure?

 -George
Jan Beulich Aug. 21, 2017, 12:07 p.m. UTC | #37
>>> On 21.08.17 at 12:59, <george.dunlap@citrix.com> wrote:
> On Wed, Aug 9, 2017 at 8:36 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 08.08.17 at 13:16, <george.dunlap@citrix.com> wrote:
>>> On 08/07/2017 04:59 PM, Jan Beulich wrote:
>>>>>>> George Dunlap <george.dunlap@citrix.com> 08/07/17 12:27 PM >>>
>>>>> So it seems that people are still not quite clear about what I'm proposing.
>>>>
>>>> And indeed your examples helped me understand better what you mean
>>>> (or at least I hope they did).
>>>>
>>>>> Suppose someone builds a livepatch with the correct compiler, with a
>>>>> correct patch (that would fix the bug if rebooted into a new
>>>>> hypervisor), with correct fix-up code.  Suppose that the bug passes all
>>>>> reasonable testing; but that, *due to a bug in the tools*, the patch
>>>>> also gives PV guests access to hypervisor memory.  Is this a security
>>>>> issue?  Yes -- the human told it to do safe thing X ("build a livepatch
>>>>> based on correct inputs to fix this bug") and it did unsafe thing Y
>>>>> ("build a livepatch that opens up a new security hole").
>>>>
>>>> There's one more factor here: The livepatch tools may behave properly
>>>> with one version of the compiler, and improperly with another.
>>>
>>> I don't really understand the reasoning here.  Is this your argument:
>>> "One can imagine a security-critical livepatch bug that only affects
>>> say, gcc 6.x and not gcc 5.x or 7.x.  Therefore, we should never issue
>>> XSAs for any security-critical livepatch bugs."
>>>
>>> If we found that livepatching tools make an incorrect patch only when
>>> using gcc 5.x, and we have reason to believe that some people may be
>>> using gcc 5.x, then I think we should issue an XSA and say that it only
>>> affects people compiling xen with gcc 5.x.
>>>
>>> It probably would make sense to specify some range of compiler versions
>>> for which we will issue XSAs for the livepatch tools.  A good baseline
>>> would be what versions of gcc Xen uses, and then we can restrict it
>>> further if we need to (for instance, if some versions of gcc are missing
>>> requisite features, or if they are just known to be buggy).
>>>
>>> And remember, this is not "We have tested all compiler versions and
>>> promise you there are no bugs."  It's, "If someone finds a bug for this
>>> set of compilers, we will tell you about it so you can do something
>>> about it."
>>
>> I can see and understand all of what you say; my argument,
>> however was more towards the matrix of what needs supporting
>> possibly becoming unreasonably large (no matter whether we
>> specify a range of compilers, as once again distros tend to not
>> ship plain unpatched upstream compiler versions).
> 
> What do you mean, "The matrix of what needs supporting [may possibly
> become] increasingly large"?   What is the problem with having a large
> (implicit) "supported" matrix?  How is supporting a "large matrix" for
> livepatch tools different than the current "large matrix" we support
> for just building Xen at all?

The matrix of Xen has just a single dimension. Since livepatch
tools and Xen are independent, any pair of them would need
building/testing in order to be sure things work in all supported
combinations.

>>> Suppose instead we issued an XSA with a patch, and that it was later
>>> discovered that the patch opened up a different security hole when
>>> applied on top of XenServer's patchqueue, but not on the baseline
>>> XenProject.  Would we issue another XSA and/or an update to an existing XSA?
>>>
>>> The obvious *default* answer to that is "No; it's not practical for us
>>> to deal with software that is not inside the XenProject's control."  One
>>> could imagine circumstances in which we issue statements or an XSA
>>> anyway, but that would be the exception and not the rule.
>>>
>>> I think the same kind of thing would apply to the livepatch tools: *by
>>> default*, we only issue XSAs for the livepatch tools if they create
>>> security issues when generating blobs based on security patches issued
>>> by the XenProject, and on top of XenProject-released software.  As
>>> always, if there's some unforeseen circumstance then someone could argue
>>> for an exception.
>>
>> Not sure here - if analysis showed that the same issue could happen
>> elsewhere, and others were just lucky so far, I think we'd have to
>> alter the default (and I'm hesitant to call this an exception). Plus
>> analysis may, the more different components are involved
>> (specifically the compiler, which perhaps none of us has deep enough
>> knowledge about), become more and more difficult.
>>
>> Bottom line - while technically I agree it would be good for the tools
>> to be security supported, from a practical perspective I see too
>> much complexity for this to be reasonably manageable.
> 
> But I still don't understand why you think so.  Every single objection
> or question about what would or would not be supported that has been
> raised so far has analogs in what we already support.  It is no more
> complex to support livepatch-tools than to support HVM USB
> passthrough, or credit2.

I don't think we currently have any scenario where it might be
required (rather than just being optional) to do in-depth analysis
of compiler behavior. If we ran into a compiler induced vulnerability
in Xen, I'm sure we'd forward this to the compiler folks. If, however,
there's an issue with a livepatch, and it's unclear whether the root
cause is the tool chain or the livepatch tools, both would need to
be analyzed to find the culprit. I don't think forwarding this to the
compiler folks would be appropriate until we're sure it's in their code
rather than ours.

> I have elsewhere described a hypothetical scenario where I think we
> should issue an XSA for livepatch-tools.  Are you really seriously
> suggesting that in that scenario we should simply publish the
> vulnerability onto xen-devel with no predisclosure?

Well, at least I'm not 100% convinced issuing an XSA in this case
would be appropriate.

Anyway - since it feels like we're moving in circles (which in part
may be because I can't express well enough the reasons for my
hesitation to go to the full XSA extent with the livepatch tools)
I'd like to just conclude my part here with saying that I'm not
going to stand in the way whichever decision is taken. I've
voiced my reservations, and that will have to do. I'd therefore
prefer to leave the discussion to those more familiar with those
tools (and their possible limitations and issues).

Jan
George Dunlap Aug. 21, 2017, 3:28 p.m. UTC | #38
On 08/21/2017 01:07 PM, Jan Beulich wrote:
>>>> And remember, this is not "We have tested all compiler versions and
>>>> promise you there are no bugs."  It's, "If someone finds a bug for this
>>>> set of compilers, we will tell you about it so you can do something
>>>> about it."
>>>
>>> I can see and understand all of what you say; my argument,
>>> however was more towards the matrix of what needs supporting
>>> possibly becoming unreasonably large (no matter whether we
>>> specify a range of compilers, as once again distros tend to not
>>> ship plain unpatched upstream compiler versions).
>>
>> What do you mean, "The matrix of what needs supporting [may possibly
>> become] increasingly large"?   What is the problem with having a large
>> (implicit) "supported" matrix?  How is supporting a "large matrix" for
>> livepatch tools different than the current "large matrix" we support
>> for just building Xen at all?
> 
> The matrix of Xen has just a single dimension. Since livepatch
> tools and Xen are independent, any pair of them would need
> building/testing in order to be sure things work in all supported
> combinations.

So your argument seems to be:

1. We can only provide security support in situations where we can test
all possible combinations in the support matrix.

2. We cannot test the entire matrix of combinations for Xen x livepatch
tools x compilers

3. Therefore, we cannot provide security support for livepatching tools.

Put this way, I hope you can see what the flaw in the argument is: #1 is
false.  Xen has {Xen version} x {Linux version} x {Compiler} x
{Hardware}.  Hardware of course includes not only the chip itself, but
the BIOS / firmware, and the particular devices (and device firmware).
If we wanted we could add in {Python version} for people using pygrub,
and {Ocaml compiler version} for people running Ocaml, versions of
systemd -- I'm sure with effort I could find more dimensions to add to
the matrix.

We do not, and never have, *tested* the entire matrix of possible
combinations considered "security supported" to make sure they work.
Such a matrix is completely impossible to even consider, and even if we
did some sort of testing, that could not guarantee that they are bug free.

What we do for security support is:

1. Test a *representative sample* of combinations (via osstest, product
testing, user testing, &c)

2. Promise to issue XSAs if anyone *happens to discover* a combination
in the rest of the support matrix that has a security issue

That is the requirement for normal Xen, and it would be the same
requirement for livepatch-tools: That between osstest, product, and the
community, we get regular testing of *a representative sample* of {Xen,
livepatch-tools, compiler}, and (what primarily concerns me) issue an
XSA if anyone discovers a security issue somewhere in that matrix.
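
To put rough numbers on the "representative sample" point, here is a
minimal sketch in Python. The version lists are made-up placeholders
(assumptions for illustration only, not an actual support statement or
anything osstest runs), and the helper names are hypothetical. It only
shows that the set of supported combinations is a cross product, while
the tested set is a small subset of it.

```python
from itertools import product
import random

# Hypothetical axis values, chosen only to illustrate the arithmetic.
xen_versions = ["4.7", "4.8", "4.9"]
livepatch_tools = ["tools-a", "tools-b"]
compilers = ["gcc-5", "gcc-6", "gcc-7"]


def support_matrix(*axes):
    """Return every combination across the given axes (the cross product)."""
    return list(product(*axes))


def representative_sample(matrix, k, seed=0):
    """Pick k combinations to test; a fixed seed keeps the pick repeatable."""
    rng = random.Random(seed)
    return rng.sample(matrix, min(k, len(matrix)))


matrix = support_matrix(xen_versions, livepatch_tools, compilers)
tested = representative_sample(matrix, 4)
untested = [combo for combo in matrix if combo not in tested]

# Supported combinations, tested combinations, and combinations that are
# supported only in the sense that a reported issue would be handled.
print(len(matrix), len(tested), len(untested))  # 18 4 14
```

Adding a further axis (hardware, Linux version, and so on) multiplies
the first number, while the tested count stays whatever the test
infrastructure can afford.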

I'm not frustrated, but I am baffled by the fact that this "support
matrix" objection is so persistent.  Nearly everyone has brought it up,
as though "test every combination" was a necessary requirement, in spite
of the fact that 1) there is *no* piece of software for which we test
the entire matrix of possible combinations 2) I have said over and over
again (in fact, I specifically said a few replies ago -- it's there at
the top of this email) that we do not test all possible combinations.

>> I have elsewhere described a hypothetical scenario where I think we
>> should issue an XSA for livepatch-tools.  Are you really seriously
>> suggesting that in that scenario we should simply publish the
>> vulnerability onto xen-devel with no predisclosure?
> 
> Well, at least I'm not 100% convinced issuing an XSA in this case
> would be appropriate.
> 
> Anyway - since it feels like we're moving in circles (which in part
> may be because I can't express well enough the reasons for my
> hesitation to go to the full XSA extent with the livepatch tools)
> I'd like to just conclude my part here with saying that I'm not
> going to stand in the way whichever decision is taken. I've
> voiced my reservations, and that will have to do. I'd therefore
> prefer to leave the discussion to those more familiar with those
> tools (and their possible limitations and issues).

Indeed; and as I think I said before, I think we need to move forward
with getting a statement on livepatching in, and since most of the
voices involved in this conversation seem to be in favor of saying
livepatch-tools are *not* supported, I won't object. I'm only still
continuing this thread because people seem to be confused about what I
am asking people to do.

I think the likelihood of an XSA-worthy bug being found in the livepatch
tools is very low.  I'm happy to defer the argument about whether we
should issue an XSA for such a bug until such time as one becomes known.

 -George
Jan Beulich Aug. 22, 2017, 6:37 a.m. UTC | #39
>>> On 21.08.17 at 17:28, <george.dunlap@citrix.com> wrote:
> So your argument seems to be:
> 
> 1. We can only provide security support in situations where we can test
> all possible combinations in the support matrix.
> 
> 2. We cannot test the entire matrix of combinations for Xen x livepatch
> tools x compilers
> 
> 3. Therefore, we cannot provide security support for livepatching tools.
> 
> Put this way, I hope you can see what the flaw in the argument is: #1 is
> false.  Xen has {Xen version} x {Linux version} x {Compiler} x
> {Hardware}.  Hardware of course includes not only the chip itself, but
> the BIOS / firmware, and the particular devices (and device firmware).
> If we wanted we could add in {Python version} for people using pygrub,
> and {Ocaml compiler version} for people running Ocaml, versions of
> systemd -- I'm sure with effort I could find more dimensions to add to
> the matrix.
> 
> We do not, and never have, *tested* the entire matrix of possible
> combinations considered "security supported" to make sure they work.
> Such a matrix is completely impossible to even consider, and even if we
> did some sort of testing, that could not guarantee that they are bug free.
> 
> What we do for security support is:
> 
> 1. Test a *representative sample* of combinations (via osstest, product
> testing, user testing, &c)
> 
> 2. Promise to issue XSAs if anyone *happens to discover* a combination
> in the rest of the support matrix that has a security issue
> 
> That is the requirement for normal Xen, and it would be the same
> requirement for livepatch-tools: That between osstest, product, and the
> community, we get regular testing of *a representative sample* of {Xen,
> livepatch-tools, compiler}, and (what primarily concerns me) issue an
> XSA if anyone discovers a security issue somewhere in that matrix.
> 
> I'm not frustrated, but I am baffled by the fact that this "support
> matrix" objection is so persistent.  Nearly everyone has brought it up,
> as though "test every combination" was a necessary requirement, in spite
> of the fact that 1) there is *no* piece of software for which we test
> the entire matrix of possible combinations 2) I have said over and over
> again (in fact, I specifically said a few replies ago -- it's there at
> the top of this email) that we do not test all possible combinations.

Well, part of it may be that the other components involved in the
test matrix you suggest are external, i.e. we're just their consumers.
If we consider just our own portions, the matrix is - as said - one
dimensional. With the livepatching tools, a dimension is being added.
Even us issuing Linux XSAs is, with the current upstream status of
the Xen pieces in there, questionable imo. This is also considering
the fact that iirc we've never issued an XSA for another Xen guest
OS, yet it is hard to believe that only Linux would ever have had any
vulnerability.

Jan
George Dunlap Aug. 22, 2017, 10:58 a.m. UTC | #40
On Tue, Aug 22, 2017 at 7:37 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 21.08.17 at 17:28, <george.dunlap@citrix.com> wrote:
>> So your argument seems to be:
>>
>> 1. We can only provide security support in situations where we can test
>> all possible combinations in the support matrix.
>>
>> 2. We cannot test the entire matrix of combinations for Xen x livepatch
>> tools x compilers
>>
>> 3. Therefore, we cannot provide security support for livepatching tools.
>>
>> Put this way, I hope you can see what the flaw in the argument is: #1 is
>> false.  Xen has {Xen version} x {Linux version} x {Compiler} x
>> {Hardware}.  Hardware of course includes not only the chip itself, but
>> the BIOS / firmware, and the particular devices (and device firmware).
>> If we wanted we could add in {Python version} for people using pygrub,
>> and {Ocaml compiler version} for people running Ocaml, versions of
>> systemd -- I'm sure with effort I could find more dimensions to add to
>> the matrix.
>>
>> We do not, and never have, *tested* the entire matrix of possible
>> combinations considered "security supported" to make sure they work.
>> Such a matrix is completely impossible to even consider, and even if we
>> did some sort of testing, that could not guarantee that they are bug free.
>>
>> What we do for security support is:
>>
>> 1. Test a *representative sample* of combinations (via osstest, product
>> testing, user testing, &c)
>>
>> 2. Promise to issue XSAs if anyone *happens to discover* a combination
>> in the rest of the support matrix that has a security issue
>>
>> That is the requirement for normal Xen, and it would be the same
>> requirement for livepatch-tools: That between osstest, product, and the
>> community, we get regular testing of *a representative sample* of {Xen,
>> livepatch-tools, compiler}, and (what primarily concerns me) issue an
>> XSA if anyone discovers a security issue somewhere in that matrix.
>>
>> I'm not frustrated, but I am baffled by the fact that this "support
>> matrix" objection is so persistent.  Nearly everyone has brought it up,
>> as though "test every combination" was a necessary requirement, in spite
>> of the fact that 1) there is *no* piece of software for which we test
>> the entire matrix of possible combinations 2) I have said over and over
>> again (in fact, I specifically said a few replies ago -- it's there at
>> the top of this email) that we do not test all possible combinations.
>
> Well, part of it may be that the other components involved in the
> test matrix you suggest are external, i.e. we're just their consumers.

But in your response to me, you mentioned distros having patched compilers
as a reason that the matrix is untenably large.

> If we consider just our own portions, the matrix is - as said - one
> dimensional. With the livepatching tools, a dimension is being added.
> Even us issuing Linux XSAs is, with the current upstream status of
> the Xen pieces in there, questionable imo. This is also considering
> the fact that iirc we've never issued an XSA for another Xen guest
> OS, yet it is hard to believe that only Linux would ever have had any
> vulnerability.

Well, no -- we have at very least {Linux} x {Xen}, for which we end up
testing *many* different configurations, but certainly not all.

I think guest OS support is actually a pretty good analog.  I can't
imagine not issuing XSAs for bugs in Linux, just as I can't imagine
not issuing XSAs for actual security issues that get found in the
livepatch tools.  If you think we shouldn't give security support for
Linux, it makes sense that you would feel the same way for
livepatch-tools (although I don't really understand why you think that
way about either).

We issue more XSAs for Linux than for other guests, in part because of
the complexity of the code inside Linux compared to other OSes; but
also in part due to the fact that that is the most tested and
looked-at.  There probably *are* more bugs in Linux than in NetBSD or
FreeBSD; but also more of them are found because more people are
testing and looking.

But in any case, in my mind, the promise never was "we test all
versions of Linux with all versions of Xen", much less "We test all
versions of all operating systems with all versions of Xen".  The
promise was, "We test a representative sample of Linux and Xen
combinations, and we promise to report issues if they are found."
That's what I think is the right thing to do for livepatch-tools as
well.

 -George
Roger Pau Monné Aug. 22, 2017, 11:16 a.m. UTC | #41
On Tue, Aug 22, 2017 at 11:58:57AM +0100, George Dunlap wrote:
> I think guest OS support is actually a pretty good analog.  I can't
> imagine not issuing XSAs for bugs in Linux, just as I can't imagine
> not issuing XSAs for actual security issues that get found in the
> livepatch tools.  If you think we shouldn't give security support for
> Linux, it makes sense that you would feel the same way for
> livepatch-tools (although I don't really understand why you think that
> way about either).
> 
> We issue more XSAs for Linux than for other guests, in part because of
> the complexity of the code inside Linux compared to other OSes; but
> also in part due to the fact that that is the most tested and
> looked-at.  There probably *are* more bugs in Linux than in NetBSD or
> FreeBSD; but also more of them are found because more people are
> testing and looking.

IMHO, we issue XSAs for Linux because Linux lacks a security process.
If a bug was found in the BSDs, it should be handled using the normal
security process that each BSD has, and an SA would be issued by the
security officer:

https://www.freebsd.org/security/advisories.html

For example NetBSD has recently released a SA for a Xen-specific
PV vulnerability in their implementation:

ftp://ftp.nl.netbsd.org/pub/NetBSD/security/advisories/NetBSD-SA2017-003.txt.asc

Roger.

Patch

diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index dc8e876..876086c 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -226,7 +226,7 @@  config CRYPTO
 	bool
 
 config LIVEPATCH
-	bool "Live patching support (TECH PREVIEW)"
+	bool "Live patching support"
 	default n
 	depends on HAS_BUILD_ID = "y"
 	---help---