[v2] CODING_STYLE: Document how to handle unexpected conditions

Message ID 20191206114843.4028617-1-george.dunlap@citrix.com (mailing list archive)
State Superseded
Series [v2] CODING_STYLE: Document how to handle unexpected conditions

Commit Message

George Dunlap Dec. 6, 2019, 11:48 a.m. UTC
It's not always clear what the best way is to handle unexpected
conditions: whether with ASSERT(), domain_crash(), BUG_ON(), or some
other method.  All methods have a risk of introducing security
vulnerabilities and unnecessary instabilities to production systems.

Provide guidelines for different options and when to use them.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v2:
- Clarify meaning of "or" clause
- Add domain_crash as an option
- Make it clear that ASSERT() is not an error handling mechanism.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Konrad Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
 CODING_STYLE | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

Comments

Jan Beulich Dec. 6, 2019, 12:17 p.m. UTC | #1
On 06.12.2019 12:48, George Dunlap wrote:
> --- a/CODING_STYLE
> +++ b/CODING_STYLE
> @@ -133,3 +133,86 @@ the end of files.  It should be:
>   * indent-tabs-mode: nil
>   * End:
>   */
> +
> +Handling unexpected conditions
> +------------------------------
> +
> +GUIDELINES:
> +
> +Passing errors up the stack should be used when the caller is already
> +expecting to handle errors, and the state when the error was
> +discovered isn’t broken, or too isn't hard to fix.

Was the "too" meant to come later in the sentence?

> +domain_crash() should be used when passing errors up the stack is too
> +difficult, and/or when fixing up state of a guest is impractical, but
> +where fixing up the state of Xen will allow Xen to continue running.
> +
> +BUG_ON() should be used when you can’t pass errors up the stack, and
> +either continuing or crashing the guest would likely cause an
> +information leak or privilege escalation vulnerability.

Strictly speaking BUG_ON() isn't an error handling mechanism either.
Further down it becomes more clear (it is rather to be used for
detecting broken assumptions), but I guess it wouldn't hurt to say
so here as well.

> +ASSERT() IT IS NOT AN ERROR HANDLING MECHANISM.  ASSERT is a way to

Nit: Stray "IT"?

> +move detection of a bug earlier in the programming cycle.  It should
> +only added after one of the other three error-handling mechanisms has

Nit: "only be added ..."?

> +been evaluated for reliability and security.
> +
> +RATIONALE:
> +
> +It's frequently the case that code is writen with the assumption that

Nit: written

> +certain conditions can never happen.  There are several possible
> +actions programmers can take in these situations:
> +
> +* Programmers can simply not handle those cases in any way, other than
> +perhaps to write a comment documenting what the assumption is.
> +
> +* Programmers can try to handle the case gracefully -- fixing up
> +in-progress state and returning an error to the user.
> +
> +* Programmers can crash the guest.
> +
> +* Programmers can use ASSERT(), which will cause the check to be
> +executed in DEBUG builds, and cause the hypervisor to crash if it's
> +violated
> +
> +* Programmers can use BUG_ON(), which will cause the check to be
> +executed in both DEBUG and non-DEBUG builds, and cause the hypervisor
> +to crash if it's violated.
> +
> +In selecting which response to use, we want to achieve several goals:
> +
> +- To minimize risk of introducing security vulnerabilities,
> +  particularly as the code evolves over time
> +
> +- To efficiently spend programmer time
> +
> +- To detect violations of assumptions as early as possible
> +
> +- To minimize the impact of bugs on production use cases
> +
> +The guidelines above attempt to balance these:
> +
> +- When the caller is expecting to handle errors, and there are no

Nit: s/ are / is / ?

> +broken state at the time the unexpected condition is discovered, or
> +when fixing the state is straightforward, then fixing up the state and
> +returning an error is the most robust thing to do.  However, if the
> +caller isn't expecting to handle errors, or if the state is difficult
> +to fix, then returning an error may require extensive refactoring,
> +which is not a good use of programmer time when they're certain that
> +this condition cannot occur.
> +
> +- BUG_ON() will stop all hypervisor action immediately.  In situations
> +where continuing might allow an attacker to escalate privilege, a
> +BUG_ON() can change a privilege escalation or information leak into a
> +denial-of-service (an improvement).  But in situations where
> +continuing (say, returning an error) might be safe, then BUG_ON() can
> +change a benign failure into denial-of-service (a degradation)

Nit: Full stop?

Jan

> +- ASSERT() will stop the hypervisor during development, but allow
> +hypervisor action to continue during production.  In situations where
> +continuing will at worst result in a denial-of-service, and at best
> +may have little effect other than perhaps quirky behavior, using an
> +ASSERT() will allow violation of assumptions to be detected as soon as
> +possible, while not causing undue degradation in production
> +hypervisors.  However, in situations where continuing could cause
> +privilege escalation or information leaks, using an ASSERT() can
> +introduce security vulnerabilities.
>
George Dunlap Dec. 6, 2019, 2:12 p.m. UTC | #2
On 12/6/19 12:17 PM, Jan Beulich wrote:
> On 06.12.2019 12:48, George Dunlap wrote:
>> --- a/CODING_STYLE
>> +++ b/CODING_STYLE
>> @@ -133,3 +133,86 @@ the end of files.  It should be:
>>   * indent-tabs-mode: nil
>>   * End:
>>   */
>> +
>> +Handling unexpected conditions
>> +------------------------------
>> +
>> +GUIDELINES:
>> +
>> +Passing errors up the stack should be used when the caller is already
>> +expecting to handle errors, and the state when the error was
>> +discovered isn’t broken, or too isn't hard to fix.
> 
> Was the "too" meant to come later in the sentence?

I did actually go through this several times; I don't have any idea how
I managed to miss all these editing mistakes!  All editing comments are
"ack" unless otherwise mentioned.

>> +domain_crash() should be used when passing errors up the stack is too
>> +difficult, and/or when fixing up state of a guest is impractical, but
>> +where fixing up the state of Xen will allow Xen to continue running.
>> +
>> +BUG_ON() should be used when you can’t pass errors up the stack, and
>> +either continuing or crashing the guest would likely cause an
>> +information leak or privilege escalation vulnerability.
> 
> Strictly speaking BUG_ON() isn't an error handling mechanism either.
> Further down it becomes more clear (it is rather to be used for
> detecting broken assumptions), but I guess it wouldn't hurt to say
> so here as well.

I guess it depends on what you mean by "error handling mechanism".  The
BUG_ON() in page_alloc.c has reliably changed potential privilege
escalations into "mere" DoSes over the years.

The distinction I'm trying to draw between BUG_ON() and ASSERT() is that
BUG_ON() actually handles the situation (albeit with a very heavy
hammer).  ASSERT() is essentially a more noticeable printk.

 -George
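
The BUG_ON()/ASSERT() distinction discussed above can be modelled in a few
lines of userspace C. This is only a sketch under stated assumptions: the
names model_assert(), model_bug_on(), check_model() and the outcome enum are
hypothetical and are not Xen's real macros. It shows only that a violated
ASSERT() goes unnoticed in a non-DEBUG build, while BUG_ON() halts in every
build.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are NOT Xen's real definitions. */
enum outcome { CONTINUED, HOST_HALTED };

/* Models ASSERT(cond): the check exists only in DEBUG builds. */
enum outcome model_assert(bool cond, bool debug_build)
{
    return (debug_build && !cond) ? HOST_HALTED : CONTINUED;
}

/*
 * Models BUG_ON(bad): the check exists in every build.  Note the
 * inverted sense: BUG_ON() fires when its argument is true.
 */
enum outcome model_bug_on(bool bad)
{
    return bad ? HOST_HALTED : CONTINUED;
}

/* Usage: a violated assumption under each build type. */
void check_model(void)
{
    assert(model_assert(false, true) == HOST_HALTED);  /* DEBUG: caught */
    assert(model_assert(false, false) == CONTINUED);   /* non-DEBUG: runs on */
    assert(model_bug_on(true) == HOST_HALTED);         /* any build: halted */
    assert(model_bug_on(false) == CONTINUED);          /* no violation */
}
```

In the non-DEBUG case the code after a violated ASSERT() still runs, which is
why the patch asks that one of the other mechanisms be evaluated first.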

Patch

diff --git a/CODING_STYLE b/CODING_STYLE
index 810b71c16d..a205e4f5f5 100644
--- a/CODING_STYLE
+++ b/CODING_STYLE
@@ -133,3 +133,86 @@  the end of files.  It should be:
  * indent-tabs-mode: nil
  * End:
  */
+
+Handling unexpected conditions
+------------------------------
+
+GUIDELINES:
+
+Passing errors up the stack should be used when the caller is already
+expecting to handle errors, and the state when the error was
+discovered isn’t broken, or too isn't hard to fix.
+
+domain_crash() should be used when passing errors up the stack is too
+difficult, and/or when fixing up state of a guest is impractical, but
+where fixing up the state of Xen will allow Xen to continue running.
+
+BUG_ON() should be used when you can’t pass errors up the stack, and
+either continuing or crashing the guest would likely cause an
+information leak or privilege escalation vulnerability.
+
+ASSERT() IT IS NOT AN ERROR HANDLING MECHANISM.  ASSERT is a way to
+move detection of a bug earlier in the programming cycle.  It should
+only added after one of the other three error-handling mechanisms has
+been evaluated for reliability and security.
+
+RATIONALE:
+
+It's frequently the case that code is writen with the assumption that
+certain conditions can never happen.  There are several possible
+actions programmers can take in these situations:
+
+* Programmers can simply not handle those cases in any way, other than
+perhaps to write a comment documenting what the assumption is.
+
+* Programmers can try to handle the case gracefully -- fixing up
+in-progress state and returning an error to the user.
+
+* Programmers can crash the guest.
+
+* Programmers can use ASSERT(), which will cause the check to be
+executed in DEBUG builds, and cause the hypervisor to crash if it's
+violated
+
+* Programmers can use BUG_ON(), which will cause the check to be
+executed in both DEBUG and non-DEBUG builds, and cause the hypervisor
+to crash if it's violated.
+
+In selecting which response to use, we want to achieve several goals:
+
+- To minimize risk of introducing security vulnerabilities,
+  particularly as the code evolves over time
+
+- To efficiently spend programmer time
+
+- To detect violations of assumptions as early as possible
+
+- To minimize the impact of bugs on production use cases
+
+The guidelines above attempt to balance these:
+
+- When the caller is expecting to handle errors, and there are no
+broken state at the time the unexpected condition is discovered, or
+when fixing the state is straightforward, then fixing up the state and
+returning an error is the most robust thing to do.  However, if the
+caller isn't expecting to handle errors, or if the state is difficult
+to fix, then returning an error may require extensive refactoring,
+which is not a good use of programmer time when they're certain that
+this condition cannot occur.
+
+- BUG_ON() will stop all hypervisor action immediately.  In situations
+where continuing might allow an attacker to escalate privilege, a
+BUG_ON() can change a privilege escalation or information leak into a
+denial-of-service (an improvement).  But in situations where
+continuing (say, returning an error) might be safe, then BUG_ON() can
+change a benign failure into denial-of-service (a degradation)
+
+- ASSERT() will stop the hypervisor during development, but allow
+hypervisor action to continue during production.  In situations where
+continuing will at worst result in a denial-of-service, and at best
+may have little effect other than perhaps quirky behavior, using an
+ASSERT() will allow violation of assumptions to be detected as soon as
+possible, while not causing undue degradation in production
+hypervisors.  However, in situations where continuing could cause
+privilege escalation or information leaks, using an ASSERT() can
+introduce security vulnerabilities.
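
Read as a decision procedure, the GUIDELINES section of the patch might be
sketched as below. This is one possible reading, not part of the patch: the
struct fields, the enum and choose_response() are hypothetical names, and the
fall-through to BUG_ON() assumes that whenever crashing the guest is not safe,
continuing or crashing the guest risks an information leak or privilege
escalation. ASSERT() is deliberately absent, since the patch treats it as an
addition on top of one of these three, not as a response in its own right.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; this only encodes one reading of the guidelines. */
enum response { RETURN_ERROR, DOMAIN_CRASH, USE_BUG_ON };

struct situation {
    bool caller_handles_errors; /* caller already expects an error return */
    bool state_fixable;         /* state is not broken, or not too hard to fix */
    bool guest_crash_is_safe;   /* fixing up Xen's state lets Xen keep running */
};

enum response choose_response(const struct situation *s)
{
    /* Preferred: fix up state and pass the error up the stack. */
    if (s->caller_handles_errors && s->state_fixable)
        return RETURN_ERROR;

    /* Next: sacrifice the guest, keep the hypervisor running. */
    if (s->guest_crash_is_safe)
        return DOMAIN_CRASH;

    /* Last resort: turn a possible leak or escalation into a DoS. */
    return USE_BUG_ON;
}

/* Usage: one situation per guideline. */
void check_choices(void)
{
    struct situation easy  = { true,  true,  true  };
    struct situation stuck = { false, false, true  };
    struct situation risky = { false, false, false };

    assert(choose_response(&easy)  == RETURN_ERROR);
    assert(choose_response(&stuck) == DOMAIN_CRASH);
    assert(choose_response(&risky) == USE_BUG_ON);
}
```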