Message ID | 201707082230.ECB51545.JtFFFVHOOSMLOQ@I-love.SAKURA.ne.jp (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hello, On (07/08/17 22:30), Tetsuo Handa wrote: > Hmm... should we consider addressing console_sem problem before > introducing printing kernel thread and offloading to that kernel > thread? printk-kthread addresses a completely different set of problems. console_sem is hard to fix quickly, because it involves rework of tty/fbcon/drm/etc/etc/etc sub-systems; printk is the easiest part here... -ss
On Sat 2017-07-08 22:30:47, Tetsuo Handa wrote: > What I want to mention here is that messages which were sent to printk() > were not printed to not only /dev/tty0 but also /dev/ttyS0 (I'm passing > "console=ttyS0,115200n8 console=tty0" to kernel command line.) I don't care > if output to /dev/tty0 is delayed, but I expect that output to /dev/ttyS0 > is not delayed, for I'm anayzing things using printk() output sent to serial > console (serial.log in my VMware configuration). Hitting this problem when we > cannot allocate memory results in failing to save printk() output. Oops, it > is sad. Would it be acceptable to remove "console=tty0" parameter and push the messages only to the serial console? Also there is the patchset from Peter Zijlstra that allows to use early console all the time, see https://lkml.kernel.org/r/20161018170830.405990950@infradead.org The current code flushes each line to all enabled consoles one by one. If there is a deadlock in one console, everything gets blocked. We are trying to make printk() more robust. But it is much more complicated than we anticipated. Many changes open another can of worms. It seems to be a job for years. > Hmm... should we consider addressing console_sem problem before > introducing printing kernel thread and offloading to that kernel thread? As Sergey said, the console rework seems to be much bigger task than introducing the kthread. Also if we would want to handle each console separately (as a fallback) it would be helpful to have separate kthread for each enabled console or for the less reliable consoles at least. Best Regards, Petr
On Mon, Jul 10, 2017 at 2:59 PM, Petr Mladek <pmladek@suse.com> wrote: > On Sat 2017-07-08 22:30:47, Tetsuo Handa wrote: >> What I want to mention here is that messages which were sent to printk() >> were not printed to not only /dev/tty0 but also /dev/ttyS0 (I'm passing >> "console=ttyS0,115200n8 console=tty0" to kernel command line.) I don't care >> if output to /dev/tty0 is delayed, but I expect that output to /dev/ttyS0 >> is not delayed, for I'm anayzing things using printk() output sent to serial >> console (serial.log in my VMware configuration). Hitting this problem when we >> cannot allocate memory results in failing to save printk() output. Oops, it >> is sad. > > Would it be acceptable to remove "console=tty0" parameter and push > the messages only to the serial console? > > Also there is the patchset from Peter Zijlstra that allows to > use early console all the time, see > https://lkml.kernel.org/r/20161018170830.405990950@infradead.org > > > The current code flushes each line to all enabled consoles one > by one. If there is a deadlock in one console, everything > gets blocked. > > We are trying to make printk() more robust. But it is much more > complicated than we anticipated. Many changes open another can > of worms. It seems to be a job for years. > > >> Hmm... should we consider addressing console_sem problem before >> introducing printing kernel thread and offloading to that kernel thread? > > As Sergey said, the console rework seems to be much bigger task > than introducing the kthread. > > Also if we would want to handle each console separately (as a > fallback) it would be helpful to have separate kthread for each > enabled console or for the less reliable consoles at least. Since the console-loggin-in-kthread comes up routinely, and equally often people say "but I dont want to make my serial console delayed": Should we make kthread-based printk a per-console opt-in? fbcon and other horror shows with deep nesting of entire subsystems and their locking hierarchy would do that. Truly simple console drivers like serial or maybe logging to some firmware/platform service for recovery after rebooting would not. Of course we'd also need one kthread per console, and we'd need to have at least some per-console locking (plus an overall console lock on top for both registering/unregistering consoles and all the legacy users like fbdev that need much more work to untangle). We could even restrict the per-console locking (i.e. those which can go ahead while someone else is holding the main or other console_locks) just for those console drivers which do not use a kthread, to cut down the audit burden to something manageable. Just my 2 cents, thrown in from the sideline. -Daniel
Hello, On (07/10/17 20:07), Daniel Vetter wrote: [..] > > Would it be acceptable to remove "console=tty0" parameter and push > > the messages only to the serial console? > > > > Also there is the patchset from Peter Zijlstra that allows to > > use early console all the time, see > > https://lkml.kernel.org/r/20161018170830.405990950@infradead.org > > > > > > The current code flushes each line to all enabled consoles one > > by one. If there is a deadlock in one console, everything > > gets blocked. > > > > We are trying to make printk() more robust. But it is much more > > complicated than we anticipated. Many changes open another can > > of worms. It seems to be a job for years. > > > > > >> Hmm... should we consider addressing console_sem problem before > >> introducing printing kernel thread and offloading to that kernel thread? > > > > As Sergey said, the console rework seems to be much bigger task > > than introducing the kthread. > > > > Also if we would want to handle each console separately (as a > > fallback) it would be helpful to have separate kthread for each > > enabled console or for the less reliable consoles at least. > > Since the console-loggin-in-kthread comes up routinely, and equally > often people say "but I dont want to make my serial console delayed": > Should we make kthread-based printk a per-console opt-in? fbcon and > other horror shows with deep nesting of entire subsystems and their > locking hierarchy would do that. Truly simple console drivers like > serial or maybe logging to some firmware/platform service for recovery > after rebooting would not. > > Of course we'd also need one kthread per console, and we'd need to > have at least some per-console locking (plus an overall console lock > on top for both registering/unregistering consoles and all the legacy > users like fbdev that need much more work to untangle). We could even > restrict the per-console locking (i.e. those which can go ahead while > someone else is holding the main or other console_locks) just for > those console drivers which do not use a kthread, to cut down the > audit burden to something manageable. > > Just my 2 cents, thrown in from the sideline. (replying to both Petr and Daniel) interesting direction, gents. and this is what I thought about over the weekend; it's very sketchy and I didn't spend too much time on it. (I'm on a sick leave now, sorry). it's quite close to what you guys have mentioned above. a) keep console_sem only to protect console drivers list modification b) add a semaphore/mutex to struct console c) move global console_seq/etc to struct console e) use a single kthread for printing, but do console_unlock() multi passes, printing unseen logbuf messages on per-console basis so console_lock()/console_unlock() will simply protect console drivers list from concurrent manipulation; it will not prevent us from printing. now, there are places where console_lock() serves a special purpose - it makes sure that no new lines are printed to the console while we scroll it/flip it/etc. IOW while we do "some things" to a particular console. the problem here, is that this also blocks printing to all of the registered console drivers, not just the one we are touching now. therefore, what I was thinking about is to disable/enable that particular console in all of the places where we really want to stop printing to this console for a bit. IOW, something like console_lock() : down(console_sem); console_disable(con) : lock(con->lock); : con->flags &= ~CON_ENABLED; : unlock(con->lock) console_unlock() : for_each_console(con) : while (con->console_seq != log_next_seq) { : msg_print_text(); : con->console_seq++; : : call_console_drivers() : : if (con->flags & CON_ENABLED) : : con->write() : } : up(console_sem); // do "some things" to this console. it's disabled, so no // ->write() callback would be called in the meantime console_lock() : down(console_sem); console_enable(con) : lock(con->lock); : con->flags |= CON_ENABLED; : unlock(con->lock) // so now we enabled that console again. it's ->console_seq is // probably behind the rest of consoles, so console_unlock() // will ->write() all the unseen message to this console. console_unlock() : for_each_console(con) : while (con->console_seq != log_next_seq) { : msg_print_text(); : con->console_seq++; : : call_console_drivers() : : if (con->flags & CON_ENABLED) : : con->write() : } : up(console_sem); so this does change the behavior. may be even a lot. consoles now will not look the same, in some cases: some consoles can be ahead, some can be behind (as long as CON_ENABLED bit is cleared for the "do some things" part). and this requires a number of changes in fb/tty/etc code. not just shuffling of console_lock()/console_unlock() calls, but also console_disable()/console_enable() calls... and we need to pass struct console to console_disable()/console_enable()... another thing is, ideally, only !CON_ENABLED consoles will now see "dropped messages". if some particular console is !CON_ENABLED for long time, then well, just like it happens now, logbuf may run out of space and we will drop potentially unseen messages. but with this change, we will drop messages only on !CON_ENABLED consoles. if there are CON_ENABLED console(-s), we will print logbuf messages to those consoles. so may be we have more chances saving the kernel logs now. just a sketch... -ss
On (07/11/17 11:31), Sergey Senozhatsky wrote: [..] > (replying to both Petr and Daniel) > > interesting direction, gents. > > and this is what I thought about over the weekend; it's very sketchy and > I didn't spend too much time on it. (I'm on a sick leave now, sorry). > > it's quite close to what you guys have mentioned above. > > a) keep console_sem only to protect console drivers list modification > b) add a semaphore/mutex to struct console > c) move global console_seq/etc to struct console > e) use a single kthread for printing, but do console_unlock() multi passes, > printing unseen logbuf messages on per-console basis > > > so console_lock()/console_unlock() will simply protect console drivers > list from concurrent manipulation; it will not prevent us from printing. > now, there are places where console_lock() serves a special purpose - it > makes sure that no new lines are printed to the console while we scroll > it/flip it/etc. IOW while we do "some things" to a particular console. > the problem here, is that this also blocks printing to all of the registered > console drivers, not just the one we are touching now. therefore, what I was > thinking about is to disable/enable that particular console in all of the > places where we really want to stop printing to this console for a bit. > > IOW, something like > > > > console_lock() > : down(console_sem); > > console_disable(con) > : lock(con->lock); > : con->flags &= ~CON_ENABLED; > : unlock(con->lock) > > console_unlock() > : for_each_console(con) > : while (con->console_seq != log_next_seq) { > : msg_print_text(); > : con->console_seq++; > : > : call_console_drivers() > : : if (con->flags & CON_ENABLED) > : : con->write() > : } > : up(console_sem); > > > // do "some things" to this console. it's disabled, so no > // ->write() callback would be called in the meantime > > console_lock() > : down(console_sem); > > console_enable(con) > : lock(con->lock); > : con->flags |= CON_ENABLED; > : unlock(con->lock) > > > // so now we enabled that console again. it's ->console_seq is > // probably behind the rest of consoles, so console_unlock() > // will ->write() all the unseen message to this console. > > console_unlock() > : for_each_console(con) > : while (con->console_seq != log_next_seq) { > : msg_print_text(); > : con->console_seq++; > : > : call_console_drivers() > : : if (con->flags & CON_ENABLED) > : : con->write() > : } > : up(console_sem); > ok, obviously stupid. I meant to hold con->lock between console_disable() and console_enable(). so no other CPU can unregister it, etc. printk->console_unlock(), thus, can either have a racy con->flags check (no con->lock taken) or try something like down_trylock(&con->lock): if it fails, continue. but need to look more. -ss
On Tue, Jul 11, 2017 at 01:57:10PM +0900, Sergey Senozhatsky wrote: > On (07/11/17 11:31), Sergey Senozhatsky wrote: > [..] > > (replying to both Petr and Daniel) > > > > interesting direction, gents. > > > > and this is what I thought about over the weekend; it's very sketchy and > > I didn't spend too much time on it. (I'm on a sick leave now, sorry). > > > > it's quite close to what you guys have mentioned above. > > > > a) keep console_sem only to protect console drivers list modification > > b) add a semaphore/mutex to struct console > > c) move global console_seq/etc to struct console > > e) use a single kthread for printing, but do console_unlock() multi passes, > > printing unseen logbuf messages on per-console basis > > > > > > so console_lock()/console_unlock() will simply protect console drivers > > list from concurrent manipulation; it will not prevent us from printing. > > now, there are places where console_lock() serves a special purpose - it > > makes sure that no new lines are printed to the console while we scroll > > it/flip it/etc. IOW while we do "some things" to a particular console. > > the problem here, is that this also blocks printing to all of the registered > > console drivers, not just the one we are touching now. therefore, what I was > > thinking about is to disable/enable that particular console in all of the > > places where we really want to stop printing to this console for a bit. > > > > IOW, something like > > > > > > > > console_lock() > > : down(console_sem); > > > > console_disable(con) > > : lock(con->lock); > > : con->flags &= ~CON_ENABLED; > > : unlock(con->lock) > > > > console_unlock() > > : for_each_console(con) > > : while (con->console_seq != log_next_seq) { > > : msg_print_text(); > > : con->console_seq++; > > : > > : call_console_drivers() > > : : if (con->flags & CON_ENABLED) > > : : con->write() > > : } > > : up(console_sem); > > > > > > // do "some things" to this console. it's disabled, so no > > // ->write() callback would be called in the meantime > > > > console_lock() > > : down(console_sem); > > > > console_enable(con) > > : lock(con->lock); > > : con->flags |= CON_ENABLED; > > : unlock(con->lock) > > > > > > // so now we enabled that console again. it's ->console_seq is > > // probably behind the rest of consoles, so console_unlock() > > // will ->write() all the unseen message to this console. > > > > console_unlock() > > : for_each_console(con) > > : while (con->console_seq != log_next_seq) { > > : msg_print_text(); > > : con->console_seq++; > > : > > : call_console_drivers() > > : : if (con->flags & CON_ENABLED) > > : : con->write() > > : } > > : up(console_sem); > > > > ok, obviously stupid. > > I meant to hold con->lock between console_disable() and console_enable(). > so no other CPU can unregister it, etc. printk->console_unlock(), thus, > can either have a racy con->flags check (no con->lock taken) or try > something like down_trylock(&con->lock): if it fails, continue. I don't think you need the CON_ENABLED flag, just holding the per-console lock should be enough (I hope). Or what exactly is the idea behind this. I'm also not sure whether dropping the main console_lock is a good idea. What I had in mind for the printk look is to not even hold the main console_lock, but only grab the individual console_locks (plus the printk buffer spinlock ofc), so for_each_console(con) if (!mutex_trylock(con->mutex)) continue; /* pseudo-code, whatever we need to check to make sure * this console is real and fully registered. */ if (!(con->flags & CON_ENABLED)) continue; if (con_requires_kthread(con)) { wake_up(con->printk_wq); /* this is for consoles that grab massive amounts * of locks, like fbcon. We could repurpose klogd * for this perhaps. */ continue; } /* do the actual printing business */ } Very rough pseudo-code draft without looking at the details. The things we'd need to do to get there: - Audit _all_ the places that use console_lock to protect global data structures. Excessively sprinkle lockdep_assert_held(&console_lock_map); over them to make sure we don't break stuff. We'll probably want to stuff that lockdep assert into for_each_console (and have a special/open-coded one for the printk loop). - Add con->mutex. Make sure that lock properly serializes against the last step in register_console and the first step in unregister_console. CON_ENABLED seems like the critical flag. - Wrap all call to console callbacks in a mutex_lock(con->mutex) critical sections. - Sprinkle lockdep_assert_held(con->mutex) into all the console callbacks of a few common console backends (fbcon + serial should be enough), to make sure that part is solid. - Do the above changes in the printk loop. It also needs to be extracted from console_unlock so that we can replace the if (console_try_lock()) console_unlock(); pattern for pushing out the printk buffer with maybe a new printk_flush() function, which does _not_ try to acquire console_lock. - console_unlock() still needs to flush out the printk buffers, to make sure we haven't lost any lines. Or we'll rely on klogd to ensure everything gets printed, when the trylock path doesn't work out. I think this would give us a reasonable locking design, allows us to not stall on slow consoles when trying to dump emergency output to serial, and it would decouple printk entirely from the huge console_lock mess. And as long as we carefully audit for global stuff (everywhere, not just in printk.c) in the first step I think it should be a safe transition. Cheers, Daniel
On (07/11/17 09:50), Daniel Vetter wrote: [..] > > ok, obviously stupid. > > > > I meant to hold con->lock between console_disable() and console_enable(). > > so no other CPU can unregister it, etc. printk->console_unlock(), thus, > > can either have a racy con->flags check (no con->lock taken) or try > > something like down_trylock(&con->lock): if it fails, continue. > > I don't think you need the CON_ENABLED flag, just holding the per-console > lock should be enough (I hope). Or what exactly is the idea behind this. > I'm also not sure whether dropping the main console_lock is a good idea. CON_ENABLED thing is completely broken, yes. my apologies. what I really wanted to think about was something as below vprintk_emit() down_trylock(console_sem) console_unlock() console_unlock() runs under console_sem, but in order to ->write() it does down_trylock(con->lock) on every console. the functions that modify consoles do: down(console_sem) down(con->lock); up(console_sem); "do things to con" up(con->lock); down_trylock(console_sem) // if it fails then someone else will do the printing console_unlock(); so console_unlock() will "pass on" those "frozen" consoles. the next time we are in console_unlock() again, we will notice that console has its "seen message IDX" behind the current log idx and will flush the (if console semaphore will be available). so the loop is for_each_console (con) { if (!down_trylock(con->lock)) continue; while (con->console_seq != log_next_seq) { msg_print_text(); con->console_seq++; if (!(con->flags & CON_ENABLED)) break; if (!con->write) break; if (!cpu_online(smp_processor_id()) && !(con->flags & CON_ANYTIME)) break; if (con->flags & CON_EXTENDED) con->write(con, ext_text, ext_len); else con->write(con, text, len); } up(con->lock); } up(console_sem); > What I had in mind for the printk look is to not even hold the main > console_lock, but only grab the individual console_locks (plus the printk > buffer spinlock ofc), so may be console_sem won't be needed for printk at all. need to think more. I think I just wanted to jump over all those "suspend all console drivers" etc for hibernate and other cases that we might be missing at the moment. so, no big and heavy console manipulations should be performed under console_sem. we take console_sem briefly, find the right console, take its ->lock, unlock console_sem. from now on the we the console, nothing else should be able to concurrently un-register it, etc. etc. > for_each_console(con) > if (!mutex_trylock(con->mutex)) > continue; > > /* pseudo-code, whatever we need to check to make sure > * this console is real and fully registered. */ > if (!(con->flags & CON_ENABLED)) > continue; > > if (con_requires_kthread(con)) { > wake_up(con->printk_wq); > > /* this is for consoles that grab massive amounts > * of locks, like fbcon. We could repurpose klogd > * for this perhaps. */ > > continue; > } > > /* do the actual printing business */ > } hm... ok. very close, but not exactly what I was thinking about. may be guys your ideas are better. per-console printing kthread, hm. interesting. > Very rough pseudo-code draft without looking at the details. The things > we'd need to do to get there: > > - Audit _all_ the places that use console_lock to protect global data > structures. Excessively sprinkle lockdep_assert_held(&console_lock_map); > over them to make sure we don't break stuff. We'll probably want to > stuff that lockdep assert into for_each_console (and have a > special/open-coded one for the printk loop). > > - Add con->mutex. Make sure that lock properly serializes against > the last step in register_console and the first step in > unregister_console. CON_ENABLED seems like the critical flag. a silly side-note, we must be able to console_unlock() from IRQ. mutex_trylock() cannot be used in atomic, because it might sleep, unlike semaphore. > - Wrap all call to console callbacks in a mutex_lock(con->mutex) critical > sections. > > - Sprinkle lockdep_assert_held(con->mutex) into all the console callbacks > of a few common console backends (fbcon + serial should be enough), to > make sure that part is solid. > > - Do the above changes in the printk loop. It also needs to be extracted > from console_unlock so that we can replace the > > if (console_try_lock()) > console_unlock(); > > pattern for pushing out the printk buffer with maybe a new > printk_flush() function, which does _not_ try to acquire console_lock. > > - console_unlock() still needs to flush out the printk buffers, to make > sure we haven't lost any lines. Or we'll rely on klogd to ensure > everything gets printed, when the trylock path doesn't work out. > > I think this would give us a reasonable locking design, allows us to not > stall on slow consoles when trying to dump emergency output to serial, and > it would decouple printk entirely from the huge console_lock mess. And as > long as we carefully audit for global stuff (everywhere, not just in > printk.c) in the first step I think it should be a safe transition. many thanks for the inputs! I'll think more about it. -ss
On Tue, Jul 11, 2017 at 11:48 AM, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > On (07/11/17 09:50), Daniel Vetter wrote: > [..] >> > ok, obviously stupid. >> > >> > I meant to hold con->lock between console_disable() and console_enable(). >> > so no other CPU can unregister it, etc. printk->console_unlock(), thus, >> > can either have a racy con->flags check (no con->lock taken) or try >> > something like down_trylock(&con->lock): if it fails, continue. >> >> I don't think you need the CON_ENABLED flag, just holding the per-console >> lock should be enough (I hope). Or what exactly is the idea behind this. >> I'm also not sure whether dropping the main console_lock is a good idea. > > CON_ENABLED thing is completely broken, yes. my apologies. > what I really wanted to think about was something as below > > vprintk_emit() > down_trylock(console_sem) > console_unlock() > > console_unlock() runs under console_sem, but in order to ->write() it > does down_trylock(con->lock) on every console. > > the functions that modify consoles do: > > down(console_sem) > down(con->lock); > up(console_sem); > > "do things to con" > > up(con->lock); > > down_trylock(console_sem) // if it fails then someone else will do the printing > console_unlock(); > > so console_unlock() will "pass on" those "frozen" consoles. the next time > we are in console_unlock() again, we will notice that console has its "seen > message IDX" behind the current log idx and will flush the (if console > semaphore will be available). I think the problem with that is that console_sem is such a huge outer lock that dropping it is currently not possible. At least not until we've pushed it back down into register_console and friends, through the entire fbcon layer. That's why I think adding console->lock as an entirely inner lock is the only approach we can do right now. But that means printk writing must be extremely careful, and stop relying on console_sem completely. I think that is doable, but a bit of work (mostly in annotating all the current stuff protected by console_sem). > so the loop is > > for_each_console (con) { > if (!down_trylock(con->lock)) > continue; > > while (con->console_seq != log_next_seq) { > msg_print_text(); > con->console_seq++; > > if (!(con->flags & CON_ENABLED)) > break; > if (!con->write) > break; > if (!cpu_online(smp_processor_id()) && > !(con->flags & CON_ANYTIME)) > break; I think the above checks don't need to be done for every msg? Or do I misunderstand something here? As long as we hold con->lock, things shouldn't change. > if (con->flags & CON_EXTENDED) > con->write(con, ext_text, ext_len); > else > con->write(con, text, len); > } > > up(con->lock); > } > > up(console_sem); > >> What I had in mind for the printk look is to not even hold the main >> console_lock, but only grab the individual console_locks (plus the printk >> buffer spinlock ofc), so > > may be console_sem won't be needed for printk at all. need to think > more. I think I just wanted to jump over all those "suspend all console > drivers" etc for hibernate and other cases that we might be missing > at the moment. > > > so, no big and heavy console manipulations should be performed under > console_sem. we take console_sem briefly, find the right console, take > its ->lock, unlock console_sem. from now on the we the console, nothing > else should be able to concurrently un-register it, etc. etc. As-is (i.e. without rewriting fbcon/fbdev) you can't assume that console_sem is only held briefly. That's why I think any printk redesign needs to be entirely uncoupled from console_sem. And I think that's doable (with enough care). >> for_each_console(con) >> if (!mutex_trylock(con->mutex)) >> continue; >> >> /* pseudo-code, whatever we need to check to make sure >> * this console is real and fully registered. */ >> if (!(con->flags & CON_ENABLED)) >> continue; >> >> if (con_requires_kthread(con)) { >> wake_up(con->printk_wq); >> >> /* this is for consoles that grab massive amounts >> * of locks, like fbcon. We could repurpose klogd >> * for this perhaps. */ >> >> continue; >> } >> >> /* do the actual printing business */ >> } > > hm... ok. very close, but not exactly what I was thinking about. > may be guys your ideas are better. per-console printing kthread, > hm. interesting. I don't think we want a per-console kthread for everything, because that would delay serial console (and other very simple consoles that don't drag in an entire locking tree). But if you have something like fbcon in your console list there's a _very_ high chance that ->write or ->unblank (not 100% on the exact callchains here) will grab some random heavy-weight lock from a completely different subsystem and either deadlock, or cause massive delays. Per-console kthread would work around all these problems, as long as we make the critical path of printk _never_ attempt to take the console_sem (not even with a trylock imo, since then you'd get delays by retrying from klogd eventually). >> Very rough pseudo-code draft without looking at the details. The things >> we'd need to do to get there: >> >> - Audit _all_ the places that use console_lock to protect global data >> structures. Excessively sprinkle lockdep_assert_held(&console_lock_map); >> over them to make sure we don't break stuff. We'll probably want to >> stuff that lockdep assert into for_each_console (and have a >> special/open-coded one for the printk loop). >> >> - Add con->mutex. Make sure that lock properly serializes against >> the last step in register_console and the first step in >> unregister_console. CON_ENABLED seems like the critical flag. > > a silly side-note, > we must be able to console_unlock() from IRQ. mutex_trylock() cannot > be used in atomic, because it might sleep, unlike semaphore. TIL. mutex_trylock not working from atomic feels super-silly. Why is that? That would mean all the locking for console->lock needs to be a semaphore, and we also need to hand-roll the lockdep annotations. Yuck. We do need a lock that you can sleep under, because some console drivers will have to do that while holding that lock (hence also per-console kthread). >> - Wrap all call to console callbacks in a mutex_lock(con->mutex) critical >> sections. >> >> - Sprinkle lockdep_assert_held(con->mutex) into all the console callbacks >> of a few common console backends (fbcon + serial should be enough), to >> make sure that part is solid. >> >> - Do the above changes in the printk loop. It also needs to be extracted >> from console_unlock so that we can replace the >> >> if (console_try_lock()) >> console_unlock(); >> >> pattern for pushing out the printk buffer with maybe a new >> printk_flush() function, which does _not_ try to acquire console_lock. >> >> - console_unlock() still needs to flush out the printk buffers, to make >> sure we haven't lost any lines. Or we'll rely on klogd to ensure >> everything gets printed, when the trylock path doesn't work out. >> >> I think this would give us a reasonable locking design, allows us to not >> stall on slow consoles when trying to dump emergency output to serial, and >> it would decouple printk entirely from the huge console_lock mess. And as >> long as we carefully audit for global stuff (everywhere, not just in >> printk.c) in the first step I think it should be a safe transition. > > many thanks for the inputs! I'll think more about it. I'm super happy that finally someone is looking into fixing this chaos for real. It has hurt us for a long time in gfx-land. -Daniel
--- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3899,7 +3899,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) goto nopage; /* Make sure we know about allocations which stall for too long */ - if (time_after(jiffies, alloc_start + stall_timeout)) { + if (0 && time_after(jiffies, alloc_start + stall_timeout)) { warn_alloc(gfp_mask & ~__GFP_NOWARN, ac->nodemask, "page allocation stalls for %ums, order:%u", jiffies_to_msecs(jiffies-alloc_start), order);