Message ID | 20191206114843.4028617-1-george.dunlap@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Series | [v2] CODING_STYLE: Document how to handle unexpected conditions |
On 06.12.2019 12:48, George Dunlap wrote:
> --- a/CODING_STYLE
> +++ b/CODING_STYLE
> @@ -133,3 +133,86 @@ the end of files. It should be:
>   * indent-tabs-mode: nil
>   * End:
>   */
> +
> +Handling unexpected conditions
> +------------------------------
> +
> +GUIDELINES:
> +
> +Passing errors up the stack should be used when the caller is already
> +expecting to handle errors, and the state when the error was
> +discovered isn’t broken, or too isn't hard to fix.

Was the "too" meant to come later in the sentence?

> +domain_crash() should be used when passing errors up the stack is too
> +difficult, and/or when fixing up state of a guest is impractical, but
> +where fixing up the state of Xen will allow Xen to continue running.
> +
> +BUG_ON() should be used when you can’t pass errors up the stack, and
> +either continuing or crashing the guest would likely cause an
> +information leak or privilege escalation vulnerability.

Strictly speaking BUG_ON() isn't an error handling mechanism either.
Further down it becomes more clear (it rather to be used for
detecting broken assumptions), but I guess it wouldn't hurt to say
so here as well.

> +ASSERT() IT IS NOT AN ERROR HANDLING MECHANISM. ASSERT is a way to

Nit: Stray "IT"?

> +move detection of a bug earlier in the programming cycle. It should
> +only added after one of the other three error-handling mechanisms has

Nit: "only be added ..."?

> +been evaluated for reliability and security.
> +
> +RATIONALE:
> +
> +It's frequently the case that code is writen with the assumption that

Nit: written

> +certain conditions can never happen. There are several possible
> +actions programmers can take in these situations:
> +
> +* Programmers can simply not handle those cases in any way, other than
> +perhaps to write a comment documenting what the assumption is.
> +
> +* Programmers can try to handle the case gracefully -- fixing up
> +in-progress state and returning an error to the user.
> +
> +* Programmers can crash the guest.
> +
> +* Programmers can use ASSERT(), which will cause the check to be
> +executed in DEBUG builds, and cause the hypervisor to crash if it's
> +violated
> +
> +* Programmers can use BUG_ON(), which will cause the check to be
> +executed in both DEBUG and non-DEBUG builds, and cause the hypervisor
> +to crash if it's violated.
> +
> +In selecting which response to use, we want to achieve several goals:
> +
> +- To minimize risk of introducing security vulnerabilities,
> +  particularly as the code evolves over time
> +
> +- To efficiently spend programmer time
> +
> +- To detect violations of assumptions as early as possible
> +
> +- To minimize the impact of bugs on production use cases
> +
> +The guidelines above attempt to balance these:
> +
> +- When the caller is expecting to handle errors, and there are no

Nit: s/ are / is / ?

> +broken state at the time the unexpected condition is discovered, or
> +when fixing the state is straightforward, then fixing up the state and
> +returning an error is the most robust thing to do. However, if the
> +caller isn't expecting to handle errors, or if the state is difficult
> +to fix, then returning an error may require extensive refactoring,
> +which is not a good use of programmer time when they're certain that
> +this condition cannot occur.
> +
> +- BUG_ON() will stop all hypervisor action immediately. In situations
> +where continuing might allow an attacker to escalate privilege, a
> +BUG_ON() can change a privilege escalation or information leak into a
> +denial-of-service (an improvement). But in situations where
> +continuing (say, returning an error) might be safe, then BUG_ON() can
> +change a benign failure into denial-of-service (a degradation)

Nit: Full stop?

Jan

> +- ASSERT() will stop the hypervisor during development, but allow
> +hypervisor action to continue during production. In situations where
> +continuing will at worst result in a denial-of-service, and at best
> +may have little effect other than perhaps quirky behavior, using an
> +ASSERT() will allow violation of assumptions to be detected as soon as
> +possible, while not causing undue degradation in production
> +hypervisors. However, in situations where continuing could cause
> +privilege escalation or information leaks, using an ASSERT() can
> +introduce security vulnerabilities.
>
On 12/6/19 12:17 PM, Jan Beulich wrote:
> On 06.12.2019 12:48, George Dunlap wrote:
>> --- a/CODING_STYLE
>> +++ b/CODING_STYLE
>> @@ -133,3 +133,86 @@ the end of files. It should be:
>>   * indent-tabs-mode: nil
>>   * End:
>>   */
>> +
>> +Handling unexpected conditions
>> +------------------------------
>> +
>> +GUIDELINES:
>> +
>> +Passing errors up the stack should be used when the caller is already
>> +expecting to handle errors, and the state when the error was
>> +discovered isn’t broken, or too isn't hard to fix.
>
> Was the "too" meant to come later in the sentence?

I did actually go through this several times; I don't have any idea how I managed to miss all these editing mistakes! All editing comments are "ack" unless otherwise mentioned.

>> +domain_crash() should be used when passing errors up the stack is too
>> +difficult, and/or when fixing up state of a guest is impractical, but
>> +where fixing up the state of Xen will allow Xen to continue running.
>> +
>> +BUG_ON() should be used when you can’t pass errors up the stack, and
>> +either continuing or crashing the guest would likely cause an
>> +information leak or privilege escalation vulnerability.
>
> Strictly speaking BUG_ON() isn't an error handling mechanism either.
> Further down it becomes more clear (it rather to be used for
> detecting broken assumptions), but I guess it wouldn't hurt to say
> so here as well.

I guess it depends on what you mean by "error handling mechanism". The BUG_ON() in page_alloc.c has reliably changed potential privilege escalations into "mere" DoSes over the years.

The distinction I'm trying to draw between BUG_ON() and ASSERT() is that BUG_ON() actually handles the situation (albeit with a very heavy hammer). ASSERT() is essentially a more noticeable printk.

-George
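The distinction drawn in this reply (BUG_ON() acts in every build, ASSERT() only in DEBUG builds) can be sketched in standalone C. This is a minimal model and not Xen code: `MODEL_BUG_ON`, `MODEL_ASSERT`, `MODEL_DEBUG`, `model_crash()` and the helper functions are hypothetical names invented for the illustration, and a "hypervisor crash" is modelled as a `longjmp` back to a small harness so that the outcome can be observed.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: a "crash" jumps back to the harness below. */
static jmp_buf crash_point;

static void model_crash(const char *what)
{
    fprintf(stderr, "model crash: %s\n", what);
    longjmp(crash_point, 1);
}

/* Checked in every build, like the BUG_ON() described in the patch. */
#define MODEL_BUG_ON(cond) \
    do { if ( cond ) model_crash("BUG_ON(" #cond ")"); } while ( 0 )

/* Checked only when MODEL_DEBUG is defined, like ASSERT() in DEBUG builds. */
#ifdef MODEL_DEBUG
#define MODEL_ASSERT(cond) \
    do { if ( !(cond) ) model_crash("ASSERT(" #cond ")"); } while ( 0 )
#else
#define MODEL_ASSERT(cond) ((void)sizeof(cond)) /* condition is not evaluated */
#endif

/* Returns true if execution continued past the check. */
static bool reached_after_assert(unsigned int idx)
{
    MODEL_ASSERT(idx < 4);
    return true; /* without MODEL_DEBUG this is reached even for idx >= 4 */
}

static bool reached_after_bug_on(unsigned int idx)
{
    MODEL_BUG_ON(idx >= 4);
    return true; /* never reached for idx >= 4, in any build */
}

/* Harness: returns true if fn(idx) ended in a model crash. */
static bool crashes(bool (*fn)(unsigned int), unsigned int idx)
{
    if ( setjmp(crash_point) )
        return true;

    fn(idx);
    return false;
}
```

Built without `MODEL_DEBUG`, `crashes(reached_after_assert, 7)` is false: the violated assumption goes unnoticed and execution continues, whereas `crashes(reached_after_bug_on, 7)` is true. Compiling with `-DMODEL_DEBUG` makes both calls report a crash.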
diff --git a/CODING_STYLE b/CODING_STYLE
index 810b71c16d..a205e4f5f5 100644
--- a/CODING_STYLE
+++ b/CODING_STYLE
@@ -133,3 +133,86 @@ the end of files. It should be:
  * indent-tabs-mode: nil
  * End:
  */
+
+Handling unexpected conditions
+------------------------------
+
+GUIDELINES:
+
+Passing errors up the stack should be used when the caller is already
+expecting to handle errors, and the state when the error was
+discovered isn’t broken, or too isn't hard to fix.
+
+domain_crash() should be used when passing errors up the stack is too
+difficult, and/or when fixing up state of a guest is impractical, but
+where fixing up the state of Xen will allow Xen to continue running.
+
+BUG_ON() should be used when you can’t pass errors up the stack, and
+either continuing or crashing the guest would likely cause an
+information leak or privilege escalation vulnerability.
+
+ASSERT() IT IS NOT AN ERROR HANDLING MECHANISM. ASSERT is a way to
+move detection of a bug earlier in the programming cycle. It should
+only added after one of the other three error-handling mechanisms has
+been evaluated for reliability and security.
+
+RATIONALE:
+
+It's frequently the case that code is writen with the assumption that
+certain conditions can never happen. There are several possible
+actions programmers can take in these situations:
+
+* Programmers can simply not handle those cases in any way, other than
+perhaps to write a comment documenting what the assumption is.
+
+* Programmers can try to handle the case gracefully -- fixing up
+in-progress state and returning an error to the user.
+
+* Programmers can crash the guest.
+
+* Programmers can use ASSERT(), which will cause the check to be
+executed in DEBUG builds, and cause the hypervisor to crash if it's
+violated
+
+* Programmers can use BUG_ON(), which will cause the check to be
+executed in both DEBUG and non-DEBUG builds, and cause the hypervisor
+to crash if it's violated.
+
+In selecting which response to use, we want to achieve several goals:
+
+- To minimize risk of introducing security vulnerabilities,
+  particularly as the code evolves over time
+
+- To efficiently spend programmer time
+
+- To detect violations of assumptions as early as possible
+
+- To minimize the impact of bugs on production use cases
+
+The guidelines above attempt to balance these:
+
+- When the caller is expecting to handle errors, and there are no
+broken state at the time the unexpected condition is discovered, or
+when fixing the state is straightforward, then fixing up the state and
+returning an error is the most robust thing to do. However, if the
+caller isn't expecting to handle errors, or if the state is difficult
+to fix, then returning an error may require extensive refactoring,
+which is not a good use of programmer time when they're certain that
+this condition cannot occur.
+
+- BUG_ON() will stop all hypervisor action immediately. In situations
+where continuing might allow an attacker to escalate privilege, a
+BUG_ON() can change a privilege escalation or information leak into a
+denial-of-service (an improvement). But in situations where
+continuing (say, returning an error) might be safe, then BUG_ON() can
+change a benign failure into denial-of-service (a degradation)
+
+- ASSERT() will stop the hypervisor during development, but allow
+hypervisor action to continue during production. In situations where
+continuing will at worst result in a denial-of-service, and at best
+may have little effect other than perhaps quirky behavior, using an
+ASSERT() will allow violation of assumptions to be detected as soon as
+possible, while not causing undue degradation in production
+hypervisors. However, in situations where continuing could cause
+privilege escalation or information leaks, using an ASSERT() can
+introduce security vulnerabilities.
It's not always clear what the best way is to handle unexpected conditions: whether with ASSERT(), domain_crash(), BUG_ON(), or some other method. All methods have a risk of introducing security vulnerabilities and unnecessary instabilities to production systems.

Provide guidelines for different options and when to use them.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v2:
- Clarify meaning of "or" clause
- Add domain_crash as an option
- Make it clear that ASSERT() is not an error handling mechanism.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Konrad Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
 CODING_STYLE | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)
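As an illustration of the first two guidelines in the patch (passing an error up the stack versus crashing the guest), here is a small standalone sketch. It does not use Xen's real types or its domain_crash(): `struct model_domain`, `model_domain_crash()`, `model_get_ref()` and `model_put_ref()` are hypothetical names, and the reference-count scenario is an assumption chosen only to make the two cases concrete.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a guest; not Xen's struct domain. */
struct model_domain {
    bool crashed;       /* set once the guest has been killed */
    unsigned int refs;  /* references the guest holds on some resource */
};

/*
 * Hypothetical stand-in for domain_crash(): the guest dies, the
 * hypervisor keeps running.
 */
static void model_domain_crash(struct model_domain *d)
{
    d->crashed = true;
}

/*
 * The caller already checks a return value and nothing has been modified
 * yet, so an unexpected argument is simply reported up the stack.
 */
static int model_get_ref(struct model_domain *d, unsigned int idx,
                         unsigned int nr_entries)
{
    if ( idx >= nr_entries )
        return -EINVAL;

    d->refs++;
    return 0;
}

/*
 * No error path exists here (void return). A put without a matching get
 * means the guest's bookkeeping is inconsistent; the hypervisor-side state
 * is still intact, so only the guest is crashed and the count is left alone.
 */
static void model_put_ref(struct model_domain *d)
{
    if ( d->refs == 0 )
    {
        model_domain_crash(d);
        return;
    }

    d->refs--;
}
```

In this model an out-of-range `idx` leaves the domain untouched and returns `-EINVAL`, while an unbalanced `model_put_ref()` marks only that guest as crashed. Whether a real case belongs in one category or the other depends on what state has already been modified when the unexpected condition is detected, as the RATIONALE section discusses.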