Message ID | 20221019194159.2923873-1-jane.chu@oracle.com |
---|---|
State | New |
Series | [v3,1/1] vsprintf: protect kernel from panic due to non-canonical pointer dereference |
On Wed, Oct 19, 2022 at 01:41:59PM -0600, Jane Chu wrote:
> Having stepped on a local kernel bug where reading sysfs has led to
> out-of-bounds pointer dereference by vsprintf() which led to GPF panic.
> And the reason for GPF is that the OOB pointer was turned to a
> non-canonical address such as 0x7665645f63616465.
>
> vsprintf() already has this line of defense
> 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> 		return "(efault)";
> Since a non-canonical pointer can be detected by kern_addr_valid()
> on architectures that present VM holes as well as meaningful
> implementation of kern_addr_valid() that detects the non-canonical
> addresses, this patch adds a check on non-canonical string pointer by
> kern_addr_valid() and "(efault)" to alert the user that something
> is wrong instead of unnecessarily panicking the server.
>
> On the other hand, if the non-canonical string pointer is dereferenced
> elsewhere in the kernel, by virtue of being non-canonical, a crash
> is expected to be immediate.

What if there is no other dereference except the one that happened in printf()?

Just to point out here that I formally NAKed this on the basis that NULL
and error pointers are special; for the bogus pointers we need to crash ASAP,
no matter what code issues it. I.o.w. printf() is not special for that
kind of pointers (i.e. bogus pointers, but not special).
On 19/10/2022 21.41, Jane Chu wrote:
> Having stepped on a local kernel bug where reading sysfs has led to
> out-of-bounds pointer dereference by vsprintf() which led to GPF panic.

Just to be completely clear, the out-of-bounds dereference did not
happen in vsprintf if I understand your description right. Essentially
you have an array of char* pointers, and you accessed beyond that array,
where of course some random memory contents then turned out not to be a
real pointer, and that bogus pointer value was passed into vsprintf() as
a %s argument.

> And the reason for GPF is that the OOB pointer was turned to a
> non-canonical address such as 0x7665645f63616465.

That's "ved_cade", or more properly "edac_dev"...

>
> vsprintf() already has this line of defense
> 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> 		return "(efault)";
> Since a non-canonical pointer can be detected by kern_addr_valid()
> on architectures that present VM holes as well as meaningful
> implementation of kern_addr_valid() that detects the non-canonical
> addresses, this patch adds a check on non-canonical string pointer by
> kern_addr_valid() and "(efault)" to alert the user that something
> is wrong instead of unnecessarily panicking the server.
>
> On the other hand, if the non-canonical string pointer is dereferenced
> elsewhere in the kernel, by virtue of being non-canonical, a crash
> is expected to be immediate.

I'm with Andy on this one; we don't add random checks like this in the
kernel, not in vsprintf or elsewhere.

check_pointer_msg is/was actually more about checking the various
%p<foo> extensions, where it is (more) expected that somebody does

	struct foo *f = get_a_foo();
	pr_debug("got %pfoo\n", f);
	if (IS_ERR(f)) {
		...
	}

[possibly in a not so obvious path], and the PAGE_SIZE check is
similarly for cases where the "base" pointer is actually NULL but what
is passed is &f->member.

Rasmus
On Wed 2022-10-19 13:41:59, Jane Chu wrote:
> Having stepped on a local kernel bug where reading sysfs has led to
> out-of-bounds pointer dereference by vsprintf() which led to GPF panic.
> And the reason for GPF is that the OOB pointer was turned to a
> non-canonical address such as 0x7665645f63616465.
>
> vsprintf() already has this line of defense
> 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> 		return "(efault)";
> Since a non-canonical pointer can be detected by kern_addr_valid()
> on architectures that present VM holes as well as meaningful
> implementation of kern_addr_valid() that detects the non-canonical
> addresses, this patch adds a check on non-canonical string pointer by
> kern_addr_valid() and "(efault)" to alert the user that something
> is wrong instead of unnecessarily panicking the server.
>
> On the other hand, if the non-canonical string pointer is dereferenced
> elsewhere in the kernel, by virtue of being non-canonical, a crash
> is expected to be immediate.

Just for the record, this patch is going to be abandoned. Some reasons
are mentioned in this thread. Others are in the threads for previous
versions, see

https://lore.kernel.org/r/20221017194447.2579441-1-jane.chu@oracle.com
https://lore.kernel.org/r/20221017191611.2577466-1-jane.chu@oracle.com

Best Regards,
Petr
On Wed, Oct 19, 2022 at 11:33:47PM +0300, Andy Shevchenko wrote:
> On Wed, Oct 19, 2022 at 01:41:59PM -0600, Jane Chu wrote:
> > Having stepped on a local kernel bug where reading sysfs has led to
> > out-of-bounds pointer dereference by vsprintf() which led to GPF panic.
> > And the reason for GPF is that the OOB pointer was turned to a
> > non-canonical address such as 0x7665645f63616465.
> >
> > vsprintf() already has this line of defense
> > 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > 		return "(efault)";
> > Since a non-canonical pointer can be detected by kern_addr_valid()
> > on architectures that present VM holes as well as meaningful
> > implementation of kern_addr_valid() that detects the non-canonical
> > addresses, this patch adds a check on non-canonical string pointer by
> > kern_addr_valid() and "(efault)" to alert the user that something
> > is wrong instead of unnecessarily panicking the server.
> >
> > On the other hand, if the non-canonical string pointer is dereferenced
> > elsewhere in the kernel, by virtue of being non-canonical, a crash
> > is expected to be immediate.
>
> What if there is no other dereference except the one that happened in printf()?
>
> Just to point out here that I formally NAKed this on the basis that NULL
> and error pointers are special; for the bogus pointers we need to crash ASAP,
> no matter what code issues it. I.o.w. printf() is not special for that
> kind of pointers (i.e. bogus pointers, but not special).

Hey Andy,

Do we want to have user space programs crash the kernel?

This patch leads to making the kernel more hardened so that we do
not crash when there are bugs but continue on.

Would we not want that experience for users?
On Thu, Oct 20, 2022 at 10:52:03AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 19, 2022 at 11:33:47PM +0300, Andy Shevchenko wrote:
> > On Wed, Oct 19, 2022 at 01:41:59PM -0600, Jane Chu wrote:
> > > Having stepped on a local kernel bug where reading sysfs has led to
> > > out-of-bounds pointer dereference by vsprintf() which led to GPF panic.
> > > And the reason for GPF is that the OOB pointer was turned to a
> > > non-canonical address such as 0x7665645f63616465.
> > >
> > > vsprintf() already has this line of defense
> > > 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > > 		return "(efault)";
> > > Since a non-canonical pointer can be detected by kern_addr_valid()
> > > on architectures that present VM holes as well as meaningful
> > > implementation of kern_addr_valid() that detects the non-canonical
> > > addresses, this patch adds a check on non-canonical string pointer by
> > > kern_addr_valid() and "(efault)" to alert the user that something
> > > is wrong instead of unnecessarily panicking the server.
> > >
> > > On the other hand, if the non-canonical string pointer is dereferenced
> > > elsewhere in the kernel, by virtue of being non-canonical, a crash
> > > is expected to be immediate.
> >
> > What if there is no other dereference except the one that happened in printf()?
> >
> > Just to point out here that I formally NAKed this on the basis that NULL
> > and error pointers are special; for the bogus pointers we need to crash ASAP,
> > no matter what code issues it. I.o.w. printf() is not special for that
> > kind of pointers (i.e. bogus pointers, but not special).
>
> Hey Andy,
>
> Do we want to have user space programs crash the kernel?
>
> This patch leads to making the kernel more hardened so that we do
> not crash when there are bugs but continue on.

Fine, how to push a user to report a bug in the kernel if for them
there is no bug?

OK, let's assume user recognizes this as a bug, what should they do in order
to provide a better description of the bug, so developer can easily debug
and fix it?

> Would we not want that experience for users?

Yes, if it is a bug in the kernel we want to know it with all possible details.
Hiding bugs is a way to nowhere.
On Thu 2022-10-20 19:03:23, Andy Shevchenko wrote:
> On Thu, Oct 20, 2022 at 10:52:03AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Oct 19, 2022 at 11:33:47PM +0300, Andy Shevchenko wrote:
> > > On Wed, Oct 19, 2022 at 01:41:59PM -0600, Jane Chu wrote:
> > > > Having stepped on a local kernel bug where reading sysfs has led to
> > > > out-of-bounds pointer dereference by vsprintf() which led to GPF panic.
> > > > And the reason for GPF is that the OOB pointer was turned to a
> > > > non-canonical address such as 0x7665645f63616465.
> > > >
> > > > vsprintf() already has this line of defense
> > > > 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > > > 		return "(efault)";
> > > > Since a non-canonical pointer can be detected by kern_addr_valid()
> > > > on architectures that present VM holes as well as meaningful
> > > > implementation of kern_addr_valid() that detects the non-canonical
> > > > addresses, this patch adds a check on non-canonical string pointer by
> > > > kern_addr_valid() and "(efault)" to alert the user that something
> > > > is wrong instead of unnecessarily panicking the server.
> > > >
> > > > On the other hand, if the non-canonical string pointer is dereferenced
> > > > elsewhere in the kernel, by virtue of being non-canonical, a crash
> > > > is expected to be immediate.
> > >
> > > What if there is no other dereference except the one that happened in printf()?
> > >
> > > Just to point out here that I formally NAKed this on the basis that NULL
> > > and error pointers are special; for the bogus pointers we need to crash ASAP,
> > > no matter what code issues it. I.o.w. printf() is not special for that
> > > kind of pointers (i.e. bogus pointers, but not special).
> >
> > Hey Andy,
> >
> > Do we want to have user space programs crash the kernel?
> >
> > This patch leads to making the kernel more hardened so that we do
> > not crash when there are bugs but continue on.
>
> Fine, how to push a user to report a bug in the kernel if for them
> there is no bug?
>
> OK, let's assume user recognizes this as a bug, what should they do in order
> to provide a better description of the bug, so developer can easily debug
> and fix it?

WARN() would provide similar information as panic() without actually
crashing the kernel.

> > Would we not want that experience for users?
>
> Yes, if it is a bug in the kernel we want to know it with all possible details.
> Hiding bugs is a way to nowhere.

I agree but we should always distinguish between fatal problems where
the system could hardly continue working and unexpected behavior that
is not critical.

Many error code paths handle unexpected situations. Some problems are
caused by users and some by bugs in the code. The kernel could always
refuse doing some operation rather than crash. People will report
it because it does not work. And there are non-destructive ways
to show useful debugging information.

Best Regards,
Petr
On Tue, Oct 25, 2022 at 10:40:37AM +0200, Petr Mladek wrote:
> On Thu 2022-10-20 19:03:23, Andy Shevchenko wrote:
> > On Thu, Oct 20, 2022 at 10:52:03AM -0400, Konrad Rzeszutek Wilk wrote:

...

> > OK, let's assume user recognizes this as a bug, what should they do in order
> > to provide a better description of the bug, so developer can easily debug
> > and fix it?
>
> WARN() would provide similar information as panic() without actually
> crashing the kernel.

Unless one provides panic_on_warn (or how is it called?).

> > > Would we not want that experience for users?
> >
> > Yes, if it is a bug in the kernel we want to know it with all possible details.
> > Hiding bugs is a way to nowhere.
>
> I agree but we should always distinguish between fatal problems where
> the system could hardly continue working and unexpected behavior that
> is not critical.
>
> Many error code paths handle unexpected situations. Some problems are
> caused by users and some by bugs in the code. The kernel could always
> refuse doing some operation rather than crash. People will report
> it because it does not work. And there are non-destructive ways
> to show useful debugging information.

Initially, if I understand correctly, the idea of that check was exactly
to guard against special pointers (NULL and error). Now this is getting
wider and I'm not sure hiding a crash is a good way to go.

Hypothetical situation: the "invalid" pointer is just one that gets its
LSBs shuffled a bit (some of the frameworks use lower bits to keep some
information there). That said, the kernel is not going to crash elsewhere.
How will the user know that an unmasked pointer went to printf()?

I honestly think that this or similar change will bring more harm than help.
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c414a8d9f1ea..b38c12ef1e45 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -698,6 +698,9 @@ static const char *check_pointer_msg(const void *ptr)
 
 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
 		return "(efault)";
 
+	if (!kern_addr_valid((unsigned long)ptr))
+		return "(efault)";
+
 	return NULL;
 }
Having stepped on a local kernel bug where reading sysfs has led to
out-of-bounds pointer dereference by vsprintf() which led to GPF panic.
And the reason for GPF is that the OOB pointer was turned to a
non-canonical address such as 0x7665645f63616465.

vsprintf() already has this line of defense
	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
		return "(efault)";
Since a non-canonical pointer can be detected by kern_addr_valid()
on architectures that present VM holes as well as meaningful
implementation of kern_addr_valid() that detects the non-canonical
addresses, this patch adds a check on non-canonical string pointer by
kern_addr_valid() and "(efault)" to alert the user that something
is wrong instead of unnecessarily panicking the server.

On the other hand, if the non-canonical string pointer is dereferenced
elsewhere in the kernel, by virtue of being non-canonical, a crash
is expected to be immediate.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 lib/vsprintf.c | 3 +++
 1 file changed, 3 insertions(+)