Message ID | 20230626115700.13873-1-byungchul@sk.com (mailing list archive) |
---|---|
Series | DEPT(Dependency Tracker) |
On Mon, Jun 26, 2023 at 08:56:35PM +0900, Byungchul Park wrote:
> From now on, I can work on LKML again! I'm wondering whether DEPT has been
> helping with kernel debugging, even though it's still in patch form.
>
> I'm happy to see that DEPT reports a real problem in practice. See:
>
> https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
> https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
>
> Nevertheless, I apologize for the lack of documentation. I promise to add it
> before users need DEPT's APIs. For now, you can use DEPT just by turning
> CONFIG_DEPT on.
>
> ---
>
> Hi Linus and folks,
>
> I've been developing a tool for detecting deadlock possibilities by
> tracking wait/event rather than lock(?) acquisition order to try to
> cover all synchronization mechanisms. It's based on v6.2-rc2.
>
> Benefits:
>
> 0. Works with all lock primitives.
> 1. Works with wait_for_completion()/complete().
> 2. Works with 'wait' on PG_locked.
> 3. Works with 'wait' on PG_writeback.
> 4. Works with swait/wakeup.
> 5. Works with waitqueue.
> 6. Works with wait_bit.
> 7. Multiple reports are allowed.
> 8. Deduplication control on multiple reports.
> 9. Withstands false positives thanks to 6.
> 10. Easy to tag any wait/event.
>
> Future work:
>
> 0. To make it more stable.
> 1. To separate Dept from Lockdep.
> 2. To improve performance in terms of time and space.
> 3. To use Dept as a dependency engine for Lockdep.
> 4. To add any missing tags of wait/event in the kernel.
> 5. To deduplicate stack traces.

If you run this today, does it find any issues with any subsystems /
drivers that the current lockdep code does not find? Have you run your
tool on patches sent to the different mailing lists for new drivers and
code added to the tree to verify that it can find issues easily?

In other words, why do we need this at all? What makes it 'better' than
what we already have that works for us today? What benefit is it?

thanks,

greg k-h
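To make the wait/event idea concrete, here is a minimal toy sketch in C. It assumes a simplified model in which dep[a][b] means "event a cannot happen until event b happens" and a cycle in that graph is a possible deadlock. The event classes, the helper names (add_dependency(), record_context_a(), record_context_b(), has_deadlock_possibility()) and the two-context scenario are hypothetical illustrations, not DEPT's actual data structures or API. The scenario is one context that holds a mutex M across wait_for_completion(&c) while the only context that calls complete(&c) must first take M. With a single lock involved, a purely lock-order view has no second lock to compare against, yet the wait and the event depend on each other.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy wait/event dependency model (hypothetical, not DEPT's real code).
 * dep[a][b] == true means "event a cannot happen until event b happens".
 * A cycle in this graph is a possible deadlock.
 */
enum event_class {
	EV_UNLOCK_M,	/* mutex M being released */
	EV_COMPLETE_C,	/* complete() being called on completion C */
	NR_CLASSES
};

static bool dep[NR_CLASSES][NR_CLASSES];

/* A context waited for 'waited' while it still owed the event 'pending'. */
static void add_dependency(enum event_class pending, enum event_class waited)
{
	assert(pending < NR_CLASSES && waited < NR_CLASSES);
	dep[pending][waited] = true;
}

static void reset_dependencies(void)
{
	for (int a = 0; a < NR_CLASSES; a++)
		for (int b = 0; b < NR_CLASSES; b++)
			dep[a][b] = false;
}

/* state: 0 = unvisited, 1 = on the current DFS path, 2 = finished. */
static bool dfs_finds_cycle(int node, int state[])
{
	state[node] = 1;
	for (int next = 0; next < NR_CLASSES; next++) {
		if (!dep[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: circular dependency */
		if (state[next] == 0 && dfs_finds_cycle(next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool has_deadlock_possibility(void)
{
	int state[NR_CLASSES] = { 0 };

	for (int i = 0; i < NR_CLASSES; i++)
		if (state[i] == 0 && dfs_finds_cycle(i, state))
			return true;
	return false;
}

/*
 * Context A: mutex_lock(&m); wait_for_completion(&c); mutex_unlock(&m);
 * A cannot release M until C has been completed.
 */
static void record_context_a(void)
{
	add_dependency(EV_UNLOCK_M, EV_COMPLETE_C);
}

/*
 * Context B: mutex_lock(&m); complete(&c); mutex_unlock(&m);
 * B cannot complete C until it gets M, i.e. until M is released.
 */
static void record_context_b(void)
{
	add_dependency(EV_COMPLETE_C, EV_UNLOCK_M);
}
```

Recording context A alone leaves the graph acyclic; adding context B closes the cycle M -> C -> M, which is where a tool of this kind would emit a report. The adjacency matrix and three-colour DFS only keep the sketch short; a real tracker would need dynamic classes and much more care about which context owes which event.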
On Mon, Jun 26, 2023 at 03:02:22PM +0200, Greg KH wrote:
> If you run this today, does it find any issues with any subsystems /
> drivers that the current lockdep code does not find?

Yes, it found some deadlocks. The following issue was a deadlock involving
PG_locked that DEPT detected and lockdep couldn't. Check it out:

https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/

> Have you run your tool on patches sent to the different mailing lists for
> new drivers and code added to the tree to verify that it can find issues
> easily?

I had been working with GPU driver developers to add DEPT to the CI system
for their new drivers, to verify that it can find issues easily. Now that
I have my stuff mostly organized, I will restart that work.

> In other words, why do we need this at all? What makes it 'better' than
> what we already have that works for us today? What benefit is it?

AS IS: Lockdep can detect deadlocks caused by wrong lock usage, e.g. a bad
acquisition order. Once it reports an issue, you must resolve it or work
around it to see further reports, even if it's not one you are interested in.

TO BE: DEPT can detect deadlocks caused not only by locks but also by any
waits, e.g. wait_for_completion(), PG_locked, PG_writeback, dma fence and
so on. Last but not least, DEPT can report multiple issues within a single
boot, so issues you are not interested in no longer block the further
reports that are valuable to you.

However, yes, DEPT needs to mature further. I'd like to do that together.

Byungchul

> thanks,
>
> greg k-h
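The "multiple reports" point can be sketched as follows, assuming a hypothetical report table keyed by the set of event classes that take part in a cycle, encoded as a bitmask. Each distinct cycle is printed once and repeats are suppressed, rather than reporting stopping entirely after the first hit. The names report_cycle(), reported[], nr_reported and MAX_REPORTS are illustrative only, not DEPT's real deduplication code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_REPORTS 64

/* Hypothetical report table: one entry per distinct cycle already printed. */
static uint32_t reported[MAX_REPORTS];
static int nr_reported;

/*
 * class_mask has one bit set per event class taking part in the cycle.
 * Returns true if a new report was printed, false if it was suppressed.
 */
static bool report_cycle(uint32_t class_mask, const char *desc)
{
	assert(class_mask != 0);	/* a cycle involves at least one class */

	for (int i = 0; i < nr_reported; i++)
		if (reported[i] == class_mask)
			return false;	/* same cycle seen before: deduplicate */

	if (nr_reported == MAX_REPORTS)
		return false;	/* table full: stay quiet rather than overflow */

	reported[nr_reported++] = class_mask;
	printf("possible deadlock: %s\n", desc);
	return true;
}
```

Keying on the class set means the same cycle discovered from a different starting point (M -> C -> M versus C -> M -> C) counts as one issue, while an unrelated cycle still gets its own report. The trade-off is that two genuinely different cycles over exactly the same classes would also be merged.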