Message ID | 20191018094304.37056-6-glider@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add KernelMemorySanitizer infrastructure | expand |
On Fri 2019-10-18 11:42:43, glider@google.com wrote: > Petr Mladek suggested we use > this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK > instead of checking the spinlock status in kmsan_pr_err() I would like to understand why the check is needed. My guess is that it prevents a infinite recursion. Is this the case? It might be possible to debug this using trace_printk(). It is important. If it is the recursion then there are two possibilities how to prevent it. Either prevent calling the recursive printk(). Or prevent calling the memory checker recursive. I am not sure what makes more sense. Is printk() the only piece of code where you need to avoid recursion? I wonder how it is prevented in the other cases. > This appears to be less intrusive, although we'll need to declare > printk_context in some printk header. It is easier than the original approach. But the main motivation is that it is more reliable. The spinlock is a global lock. But it seems that it is enough to check the state of the local CPU. Finally, could you please CC me to the patch(es) that are using this variable? I would actually prefer to be in CC of entire patchsets so that I see the full context. Best Regards, Petr
On Mon, Oct 21, 2019 at 11:09 AM Petr Mladek <pmladek@suse.com> wrote: > > On Fri 2019-10-18 11:42:43, glider@google.com wrote: > > Petr Mladek suggested we use > > this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK > > instead of checking the spinlock status in kmsan_pr_err() > > I would like to understand why the check is needed. > > My guess is that it prevents a infinite recursion. > Is this the case? It might be possible to debug this using > trace_printk(). This indeed used to prevent infinite recursion, however looks like the check isn't needed anymore. When I remove it, KMSAN doesn't hang, probably because printk is doing a better job at deadlock prevention. What I'm seeing now is that e.g. in the following case: ptr = kmalloc(sizeof(int), GFP_KERNEL); if (ptr) pr_info("true\n"); else pr_info("false\n"); KMSAN detects errors within pr_info(), namely in vprintk_store(). If I understand correctly, printing from that point causes printk to use the per-cpu buffer, which is flushed once we're done with the original printing: [ 58.984971][ T8390] BUG: KMSAN: uninit-value in kmsan_handle_vprintk+0xa0/0xf0 ... [ 59.061976][ C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0 ... [ 59.062457][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 ... [ 59.062961][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 ... [ 59.063338][ C0] Lost 6207 message(s)! So it turns out we'd better disable reporting under logbuf_lock, otherwise these lost messages will confuse the developers. Because the first pr_info() isn't called by KMSAN itself, the tool still needs a way to know it's running inside printk. > It is important. If it is the recursion then there are > two possibilities how to prevent it. Either prevent > calling the recursive printk(). Or prevent calling > the memory checker recursive. I am not sure what makes > more sense. > > Is printk() the only piece of code where you need to avoid > recursion? I wonder how it is prevented in the other cases. > > > This appears to be less intrusive, although we'll need to declare > > printk_context in some printk header. > > It is easier than the original approach. But the main motivation > is that it is more reliable. The spinlock is a global lock. > But it seems that it is enough to check the state of the local > CPU. > > Finally, could you please CC me to the patch(es) that are using > this variable? I would actually prefer to be in CC of entire > patchsets so that I see the full context. > > Best Regards, > Petr
On Wed, Oct 23, 2019 at 7:57 PM Alexander Potapenko <glider@google.com> wrote: > > On Mon, Oct 21, 2019 at 11:09 AM Petr Mladek <pmladek@suse.com> wrote: > > > > On Fri 2019-10-18 11:42:43, glider@google.com wrote: > > > Petr Mladek suggested we use > > > this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK > > > instead of checking the spinlock status in kmsan_pr_err() > > > > I would like to understand why the check is needed. > > > > My guess is that it prevents a infinite recursion. > > Is this the case? It might be possible to debug this using > > trace_printk(). > This indeed used to prevent infinite recursion, however looks like the > check isn't needed anymore. > When I remove it, KMSAN doesn't hang, probably because printk is doing > a better job at deadlock prevention. > > What I'm seeing now is that e.g. in the following case: > ptr = kmalloc(sizeof(int), GFP_KERNEL); > if (ptr) > pr_info("true\n"); > else > pr_info("false\n"); > > KMSAN detects errors within pr_info(), namely in vprintk_store(). > If I understand correctly, printing from that point causes printk to > use the per-cpu buffer, which is flushed once we're done with the > original printing: > > [ 58.984971][ T8390] BUG: KMSAN: uninit-value in > kmsan_handle_vprintk+0xa0/0xf0 > ... > [ 59.061976][ C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0 > ... > [ 59.062457][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 > ... > [ 59.062961][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 > ... > [ 59.063338][ C0] Lost 6207 message(s)! > > So it turns out we'd better disable reporting under logbuf_lock, > otherwise these lost messages will confuse the developers. > Because the first pr_info() isn't called by KMSAN itself, the tool > still needs a way to know it's running inside printk. > > > It is important. If it is the recursion then there are > > two possibilities how to prevent it. Either prevent > > calling the recursive printk(). Or prevent calling > > the memory checker recursive. I am not sure what makes > > more sense. > > > > Is printk() the only piece of code where you need to avoid > > recursion? I wonder how it is prevented in the other cases. > > > > > This appears to be less intrusive, although we'll need to declare > > > printk_context in some printk header. > > > > It is easier than the original approach. But the main motivation > > is that it is more reliable. The spinlock is a global lock. > > But it seems that it is enough to check the state of the local > > CPU. > > > > Finally, could you please CC me to the patch(es) that are using > > this variable? I would actually prefer to be in CC of entire > > patchsets so that I see the full context. I am very sorry about this. I'm going to re-send the patches soon and CC everyone on every patch. What I thought to be reducing the email burden on every reviewer turned out to be really inconvenient. > > Best Regards, > > Petr > > > > -- > Alexander Potapenko > Software Engineer > > Google Germany GmbH > Erika-Mann-Straße, 33 > 80636 München > > Geschäftsführer: Paul Manicle, Halimah DeLaine Prado > Registergericht und -nummer: Hamburg, HRB 86891 > Sitz der Gesellschaft: Hamburg
On Wed 2019-10-23 19:57:39, Alexander Potapenko wrote: > On Mon, Oct 21, 2019 at 11:09 AM Petr Mladek <pmladek@suse.com> wrote: > > > > On Fri 2019-10-18 11:42:43, glider@google.com wrote: > > > Petr Mladek suggested we use > > > this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK > > > instead of checking the spinlock status in kmsan_pr_err() > > > > I would like to understand why the check is needed. > > > > My guess is that it prevents a infinite recursion. > > Is this the case? It might be possible to debug this using > > trace_printk(). > This indeed used to prevent infinite recursion, however looks like the > check isn't needed anymore. > When I remove it, KMSAN doesn't hang, probably because printk is doing > a better job at deadlock prevention. Great. > What I'm seeing now is that e.g. in the following case: > ptr = kmalloc(sizeof(int), GFP_KERNEL); > if (ptr) > pr_info("true\n"); > else > pr_info("false\n"); > > KMSAN detects errors within pr_info(), namely in vprintk_store(). > If I understand correctly, printing from that point causes printk to > use the per-cpu buffer, which is flushed once we're done with the > original printing: > > [ 58.984971][ T8390] BUG: KMSAN: uninit-value in > kmsan_handle_vprintk+0xa0/0xf0 > ... > [ 59.061976][ C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0 > ... > [ 59.062457][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 > ... > [ 59.062961][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 Please, do you have an explanation where the uninitialized values come from? Is it a false positive? Or is there really a bug how the printk() parameters are passed? > ... > [ 59.063338][ C0] Lost 6207 message(s)! This message comes from report_message_lost(). It means that many recursive messages did not fit into the limited per-CPU buffers. > So it turns out we'd better disable reporting under logbuf_lock, > otherwise these lost messages will confuse the developers. > Because the first pr_info() isn't called by KMSAN itself, the tool > still needs a way to know it's running inside printk. OK. Well, I still would like to understand why the recursive messages are printed at all. Is it a bug in the printk() code or a false positive? There should be no recursive messages if we fix the bug or make some annotations to prevent the false positives. That said, I am not completely against disabling the recursive messages. But it would be pity to hide some real bugs. Best Regards, Petr
On Thu, Oct 24, 2019 at 2:46 PM Petr Mladek <pmladek@suse.com> wrote: > > On Wed 2019-10-23 19:57:39, Alexander Potapenko wrote: > > On Mon, Oct 21, 2019 at 11:09 AM Petr Mladek <pmladek@suse.com> wrote: > > > > > > On Fri 2019-10-18 11:42:43, glider@google.com wrote: > > > > Petr Mladek suggested we use > > > > this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK > > > > instead of checking the spinlock status in kmsan_pr_err() > > > > > > I would like to understand why the check is needed. > > > > > > My guess is that it prevents a infinite recursion. > > > Is this the case? It might be possible to debug this using > > > trace_printk(). > > This indeed used to prevent infinite recursion, however looks like the > > check isn't needed anymore. > > When I remove it, KMSAN doesn't hang, probably because printk is doing > > a better job at deadlock prevention. > > Great. > > > What I'm seeing now is that e.g. in the following case: > > ptr = kmalloc(sizeof(int), GFP_KERNEL); > > if (ptr) > > pr_info("true\n"); > > else > > pr_info("false\n"); > > > > KMSAN detects errors within pr_info(), namely in vprintk_store(). > > If I understand correctly, printing from that point causes printk to > > use the per-cpu buffer, which is flushed once we're done with the > > original printing: > > > > [ 58.984971][ T8390] BUG: KMSAN: uninit-value in > > kmsan_handle_vprintk+0xa0/0xf0 > > ... > > [ 59.061976][ C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0 > > ... > > [ 59.062457][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 > > ... > > [ 59.062961][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 > > Please, do you have an explanation where the uninitialized values come > from? Is it a false positive? Or is there really a bug how the > printk() parameters are passed? I believe these are true bugs. The problem is, when we pass an uninitialized argument into printk(), KMSAN will report an error every time this uninitialized argument is used. E.g. for an uninitialized format string there'll be at least strlen(fmt) reports in format_decode(), followed by several reports in vsnprintf(). Although these reports seem to be real, printing only the first of them should be more than enough. In the future we'll actually want KMSAN to check every printk() argument (which will require parsing the format string to figure out the arguments' lengths), but disable reports within printk. > > ... > > [ 59.063338][ C0] Lost 6207 message(s)! > > This message comes from report_message_lost(). It means that many > recursive messages did not fit into the limited per-CPU buffers. > > > So it turns out we'd better disable reporting under logbuf_lock, > > otherwise these lost messages will confuse the developers. > > Because the first pr_info() isn't called by KMSAN itself, the tool > > still needs a way to know it's running inside printk. > > OK. Well, I still would like to understand why the recursive messages > are printed at all. Is it a bug in the printk() code or a false positive? > > There should be no recursive messages if we fix the bug or make > some annotations to prevent the false positives. > > That said, I am not completely against disabling the recursive > messages. But it would be pity to hide some real bugs. > > Best Regards, > Petr -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Halimah DeLaine Prado Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg
On Mon 2019-10-28 14:09:29, Alexander Potapenko wrote: > On Thu, Oct 24, 2019 at 2:46 PM Petr Mladek <pmladek@suse.com> wrote: > > On Wed 2019-10-23 19:57:39, Alexander Potapenko wrote: > > > What I'm seeing now is that e.g. in the following case: > > > ptr = kmalloc(sizeof(int), GFP_KERNEL); > > > if (ptr) > > > pr_info("true\n"); > > > else > > > pr_info("false\n"); > > > > > > KMSAN detects errors within pr_info(), namely in vprintk_store(). > > > If I understand correctly, printing from that point causes printk to > > > use the per-cpu buffer, which is flushed once we're done with the > > > original printing: > > > > > > [ 58.984971][ T8390] BUG: KMSAN: uninit-value in > > > kmsan_handle_vprintk+0xa0/0xf0 > > > ... > > > [ 59.061976][ C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0 > > > ... > > > [ 59.062457][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 > > > ... > > > [ 59.062961][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 > > > > Please, do you have an explanation where the uninitialized values come > > from? Is it a false positive? Or is there really a bug how the > > printk() parameters are passed? > I believe these are true bugs. > The problem is, when we pass an uninitialized argument into printk(), > KMSAN will report an error every time this uninitialized argument is > used. I see, thanks for explanation. > E.g. for an uninitialized format string there'll be at least > strlen(fmt) reports in format_decode(), followed by several reports in > vsnprintf(). > Although these reports seem to be real, printing only the first of > them should be more than enough. Isn't this a generic problem? I mean that uninitialized values can be passed and used also in many other locations. printk() is special because this problem causes infinite loop. But it would make sense to report also any other problematic value only once. > In the future we'll actually want KMSAN to check every printk() > argument (which will require parsing the format string to figure out > the arguments' lengths), but disable reports within printk. What is the motivation for this, please? It looks to me that you want to do very paranoid checks of variables passed to printk()? Do you want to prevent printk() crashes? Or do you want to make sure that printk() produces correct values? From my POV, printk() is debugging tool. It is used to show values that people are interested in. On one hand, it might make sense to warn people that a particular value was not initalized. On the other hand, printk() is not important for the kernel behavior. It just reads values and does not affect any behavior. I would like to understand how many printk-related code is worth the effort. Best Regards, Petr
On Tue, Oct 29, 2019 at 1:02 PM Petr Mladek <pmladek@suse.com> wrote: > > On Mon 2019-10-28 14:09:29, Alexander Potapenko wrote: > > On Thu, Oct 24, 2019 at 2:46 PM Petr Mladek <pmladek@suse.com> wrote: > > > On Wed 2019-10-23 19:57:39, Alexander Potapenko wrote: > > > > What I'm seeing now is that e.g. in the following case: > > > > ptr = kmalloc(sizeof(int), GFP_KERNEL); > > > > if (ptr) > > > > pr_info("true\n"); > > > > else > > > > pr_info("false\n"); > > > > > > > > KMSAN detects errors within pr_info(), namely in vprintk_store(). > > > > If I understand correctly, printing from that point causes printk to > > > > use the per-cpu buffer, which is flushed once we're done with the > > > > original printing: > > > > > > > > [ 58.984971][ T8390] BUG: KMSAN: uninit-value in > > > > kmsan_handle_vprintk+0xa0/0xf0 > > > > ... > > > > [ 59.061976][ C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0 > > > > ... > > > > [ 59.062457][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 > > > > ... > > > > [ 59.062961][ C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900 > > > > > > Please, do you have an explanation where the uninitialized values come > > > from? Is it a false positive? Or is there really a bug how the > > > printk() parameters are passed? > > I believe these are true bugs. > > The problem is, when we pass an uninitialized argument into printk(), > > KMSAN will report an error every time this uninitialized argument is > > used. > > I see, thanks for explanation. > > > E.g. for an uninitialized format string there'll be at least > > strlen(fmt) reports in format_decode(), followed by several reports in > > vsnprintf(). > > Although these reports seem to be real, printing only the first of > > them should be more than enough. > > Isn't this a generic problem? I mean that uninitialized values > can be passed and used also in many other locations. More or less, yes. > printk() is special because this problem causes infinite loop. But it > would make sense to report also any other problematic value only once. Yes, printk is a bit different, because finding errors in it causes other printk() calls which now don't cause infinite loops, but may produce out-of-order error messages. > > > In the future we'll actually want KMSAN to check every printk() > > argument (which will require parsing the format string to figure out > > the arguments' lengths), but disable reports within printk. > > What is the motivation for this, please? > > It looks to me that you want to do very paranoid checks of variables > passed to printk()? Do you want to prevent printk() crashes? Or > do you want to make sure that printk() produces correct values? Simply passing an uninitialized value to printk() may result in cryptic error messages from printk guts, which may be harder to understand without knowing how printk works. The idea is to check the arguments right before printk, so that we can print a really simple diagnostic message. > From my POV, printk() is debugging tool. It is used to show values > that people are interested in. On one hand, it might make sense to warn > people that a particular value was not initalized. On the other hand, > printk() is not important for the kernel behavior. It just > reads values and does not affect any behavior. > > I would like to understand how many printk-related code is > worth the effort. I think overall you're right. The feature I'm talking about isn't a critical part of KMSAN functionality, so in order to keep the code simple I'd better drop it, as long as we're able to report bugs when uninitialized memory is passed to printk. The only change to printk that I'll probably still have to make is to initialize the result of vscnprintf() here: https://github.com/google/kmsan/blob/master/kernel/printk/printk.c#L1921 Passing uninitialized data to vscnprintf() causes its result to also be uninitialized, which causes an avalanche of new reports in vprintk_store() I'll send the updated patch (together with other KMSAN patches) your way today. > Best Regards, > Petr
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index b4045e782743..5ebfc8bc5693 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -42,7 +42,7 @@ struct printk_safe_seq_buf { }; static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq); -static DEFINE_PER_CPU(int, printk_context); +DEFINE_PER_CPU(int, printk_context); #ifdef CONFIG_PRINTK_NMI static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
Petr Mladek suggested we use this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK instead of checking the spinlock status in kmsan_pr_err() This appears to be less intrusive, although we'll need to declare printk_context in some printk header. Signed-off-by: Alexander Potapenko <glider@google.com> To: Alexander Potapenko <glider@google.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: linux-mm@kvack.org --- Change-Id: Idb5e7c7cedf6ab6ba635a64423d58d702ad3eca3 --- kernel/printk/printk_safe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)