Message ID | 20250221050804.2764553-1-raag.jadav@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | Cleanup io.h | expand |
On Fri, Feb 21, 2025, at 06:08, Raag Jadav wrote: > In a wider effort to improve build speeds, we're attempting to split/cleanup > core headers. > > This series attempts to cleanup io.h with "include what you need" approach. > > This depends on earlier modifications available in immutable tag[1]. Feel > free to carry your subsystem patches with it, or let Andy know if you'd > rather let him carry them. > > [1] https://lore.kernel.org/r/Z7cqCaME4LxTTBn6@black.fi.intel.com/ > > v2: Fix drm_draw.c build errors Hi Raag, I think your patch is to linux/io.h is correct and we should eventually apply it, but I think the header file cleanup needs to be done in a little more structured way or it ends up causing a lot of extra work for very little gain. As you already found, removing an old indirect #include that is no longer needed usually leads to some files breaking. The more impactful your change is in terms of build speed, the more things break! I think in this case, removing linux/err.h and linux/bug.h made very little difference because they are very small files in terms of what else they include. The approach that I would suggest you take is: 1. identify a header file that is included indirectly in a lot of places and also has a ton of indirect inclusions inside it 2. split it up in a way that most indirect inclusions can be replaced with a much smaller varient 3. repeat step 1 and 2 for additional headers, splitting them up but leaving the inclusions in place for the moment 4. test thousands of randconfig builds across architectures with the unnecessary indirect inclusions removed, send fixup patches for the drivers to add the missing direct #include statements, one patch per subsystem 5. once the known regressions are fixed, actually propose removing the extra #include statements and fix the remaining regressions If you are looking for a good place to start with step 1, the way I've done this in the past is to calculate for each header file how often it gets included in a particular build and how large the preprocessed files are, then sort them by the product of the two. These are the top entries I see for an arm64 defconfig #incl #lines file 7054 41901 include/linux/module.i 7146 39685 include/linux/fs.i 7057 39996 include/linux/elf.i 5667 44323 include/linux/energy_model.i 5667 44322 include/linux/device.i 5668 42467 include/linux/device/driver.i 4707 47776 include/linux/mm.i 7151 29653 include/linux/kobject.i 7151 29494 include/linux/sysfs.i 7146 29090 include/linux/percpu-rwsem.i 7152 28963 include/linux/rcuwait.i 7154 28929 include/linux/sched/signal.i 7151 28608 include/linux/kernfs.i 3713 55040 include/linux/kprobes.i 3707 55113 include/linux/kgdb.i 3718 54591 include/linux/ftrace.i 7366 27519 include/linux/slab.i 7146 28276 include/linux/ioprio.i 7190 28029 include/linux/radix-tree.i 7159 28137 include/linux/idr.i 7146 28098 include/linux/iocontext.i 3390 59188 include/linux/bio.i 7146 28073 include/linux/list_lru.i 7192 27842 include/linux/xarray.i 3390 58566 include/linux/blk_types.i 3176 62442 include/linux/writeback.i 7366 26873 include/linux/percpu-refcount.i 3538 55872 include/linux/highmem.i 7054 27934 include/linux/kmod.i 7382 26691 include/linux/gfp.i 7054 27925 include/linux/umh.i 3538 55255 include/linux/cacheflush.i 7382 26416 include/linux/topology.i 3053 63637 include/linux/memcontrol.i 7193 27010 include/linux/sched/mm.i 7382 26210 include/linux/mmzone.i 7382 26210 include/linux/memory_hotplug.i 4707 40968 include/linux/huge_mm.i 3428 56061 include/linux/bvec.i 3713 49335 include/linux/rethook.i ... 5159 28781 include/linux/io.i You can pick any one of these and try to analyze why the header gets included absolutely everywhere, and why it is so huge, then try to come up with an idea for how to improve that based on what you found. When you do the analysis for linux/io.h, you find that it's only on #72 on that list (for arm64 defconfig), and that almost all of its 28781 lines come from one of these indirectly included files (sorted by number of lines after preprocessing), not from bug.h or err.h: 27569 include/linux/pgtable.h 24636 include/linux/mmu_notifier.h 24217 include/linux/page-flags.h 23860 include/linux/mmap_lock.h 23670 include/linux/mm_types.h 21468 include/linux/srcu.h 21355 include/linux/uprobes.h 20746 include/linux/workqueue.h 20091 include/linux/timer.h 19950 include/linux/ktime.h 19949 include/linux/timekeeping.h 19536 include/linux/jiffies.h 19428 include/linux/timex.h 19428 include/linux/time.h 19427 include/linux/time32.h 18716 include/linux/percpu_counter.h 18595 include/linux/percpu.h 18390 include/linux/sched.h 16879 include/linux/maple_tree.h 16699 include/linux/seqlock.h 16527 include/linux/kref.h 16472 include/linux/completion.h 16429 include/linux/swait.h 16388 include/linux/wait.h 16364 include/linux/rwsem.h Note that a lot of these are indirectly included multiple times below linux/io.h. Almost none of the above have any business being included in linux/io.h. Arnd
On Fri, Feb 21, 2025 at 11:15:47AM +0100, Arnd Bergmann wrote: > On Fri, Feb 21, 2025, at 06:08, Raag Jadav wrote: > > In a wider effort to improve build speeds, we're attempting to split/cleanup > > core headers. > > > > This series attempts to cleanup io.h with "include what you need" approach. > > > > This depends on earlier modifications available in immutable tag[1]. Feel > > free to carry your subsystem patches with it, or let Andy know if you'd > > rather let him carry them. > > > > [1] https://lore.kernel.org/r/Z7cqCaME4LxTTBn6@black.fi.intel.com/ > > > > v2: Fix drm_draw.c build errors > > Hi Raag, > > I think your patch is to linux/io.h is correct and we should > eventually apply it, but I think the header file cleanup needs > to be done in a little more structured way or it ends up causing > a lot of extra work for very little gain. > > As you already found, removing an old indirect #include that is > no longer needed usually leads to some files breaking. The more > impactful your change is in terms of build speed, the more > things break! I think in this case, removing linux/err.h and > linux/bug.h made very little difference because they are very > small files in terms of what else they include. > > The approach that I would suggest you take is: > > 1. identify a header file that is included indirectly in a lot > of places and also has a ton of indirect inclusions inside it > 2. split it up in a way that most indirect inclusions can be > replaced with a much smaller varient > 3. repeat step 1 and 2 for additional headers, splitting them > up but leaving the inclusions in place for the moment > 4. test thousands of randconfig builds across architectures > with the unnecessary indirect inclusions removed, send > fixup patches for the drivers to add the missing direct > #include statements, one patch per subsystem > 5. once the known regressions are fixed, actually propose > removing the extra #include statements and fix the remaining > regressions > > If you are looking for a good place to start with step 1, > the way I've done this in the past is to calculate for each > header file how often it gets included in a particular build > and how large the preprocessed files are, then sort them > by the product of the two. These are the top entries I > see for an arm64 defconfig > > #incl #lines file > 7054 41901 include/linux/module.i > 7146 39685 include/linux/fs.i > 7057 39996 include/linux/elf.i > 5667 44323 include/linux/energy_model.i > 5667 44322 include/linux/device.i > 5668 42467 include/linux/device/driver.i > 4707 47776 include/linux/mm.i > 7151 29653 include/linux/kobject.i > 7151 29494 include/linux/sysfs.i > 7146 29090 include/linux/percpu-rwsem.i > 7152 28963 include/linux/rcuwait.i > 7154 28929 include/linux/sched/signal.i > 7151 28608 include/linux/kernfs.i > 3713 55040 include/linux/kprobes.i > 3707 55113 include/linux/kgdb.i > 3718 54591 include/linux/ftrace.i > 7366 27519 include/linux/slab.i > 7146 28276 include/linux/ioprio.i > 7190 28029 include/linux/radix-tree.i > 7159 28137 include/linux/idr.i > 7146 28098 include/linux/iocontext.i > 3390 59188 include/linux/bio.i > 7146 28073 include/linux/list_lru.i > 7192 27842 include/linux/xarray.i > 3390 58566 include/linux/blk_types.i > 3176 62442 include/linux/writeback.i > 7366 26873 include/linux/percpu-refcount.i > 3538 55872 include/linux/highmem.i > 7054 27934 include/linux/kmod.i > 7382 26691 include/linux/gfp.i > 7054 27925 include/linux/umh.i > 3538 55255 include/linux/cacheflush.i > 7382 26416 include/linux/topology.i > 3053 63637 include/linux/memcontrol.i > 7193 27010 include/linux/sched/mm.i > 7382 26210 include/linux/mmzone.i > 7382 26210 include/linux/memory_hotplug.i > 4707 40968 include/linux/huge_mm.i > 3428 56061 include/linux/bvec.i > 3713 49335 include/linux/rethook.i > ... > 5159 28781 include/linux/io.i > > You can pick any one of these and try to analyze why the > header gets included absolutely everywhere, and why it is > so huge, then try to come up with an idea for how to > improve that based on what you found. > > When you do the analysis for linux/io.h, you find that > it's only on #72 on that list (for arm64 defconfig), and > that almost all of its 28781 lines come from one of these > indirectly included files (sorted by number of lines after > preprocessing), not from bug.h or err.h: > > 27569 include/linux/pgtable.h > 24636 include/linux/mmu_notifier.h > 24217 include/linux/page-flags.h > 23860 include/linux/mmap_lock.h > 23670 include/linux/mm_types.h > 21468 include/linux/srcu.h > 21355 include/linux/uprobes.h > 20746 include/linux/workqueue.h > 20091 include/linux/timer.h > 19950 include/linux/ktime.h > 19949 include/linux/timekeeping.h > 19536 include/linux/jiffies.h > 19428 include/linux/timex.h > 19428 include/linux/time.h > 19427 include/linux/time32.h > 18716 include/linux/percpu_counter.h > 18595 include/linux/percpu.h > 18390 include/linux/sched.h > 16879 include/linux/maple_tree.h > 16699 include/linux/seqlock.h > 16527 include/linux/kref.h > 16472 include/linux/completion.h > 16429 include/linux/swait.h > 16388 include/linux/wait.h > 16364 include/linux/rwsem.h > > Note that a lot of these are indirectly included multiple > times below linux/io.h. Almost none of the above have any > business being included in linux/io.h. Excellent! Thanks for laying out a great analysis and points worth considering. It seems this will require extensive study and motivation, and is definitely not as obvious and simple as this series, but I'll surely look into it later on. Raag
On Fri, Feb 21, 2025 at 10:38:02AM +0530, Raag Jadav wrote: > In a wider effort to improve build speeds, we're attempting to split/cleanup > core headers. > > This series attempts to cleanup io.h with "include what you need" approach. > > This depends on earlier modifications available in immutable tag[1]. Feel > free to carry your subsystem patches with it, or let Andy know if you'd > rather let him carry them. > > [1] https://lore.kernel.org/r/Z7cqCaME4LxTTBn6@black.fi.intel.com/ Oh, you are fast! Please, read my comments in v1 as well an take them into account.
On Fri, Feb 21, 2025 at 11:15:47AM +0100, Arnd Bergmann wrote: > On Fri, Feb 21, 2025, at 06:08, Raag Jadav wrote: > > In a wider effort to improve build speeds, we're attempting to split/cleanup > > core headers. > > > > This series attempts to cleanup io.h with "include what you need" approach. > > > > This depends on earlier modifications available in immutable tag[1]. Feel > > free to carry your subsystem patches with it, or let Andy know if you'd > > rather let him carry them. > > > > [1] https://lore.kernel.org/r/Z7cqCaME4LxTTBn6@black.fi.intel.com/ > > > > v2: Fix drm_draw.c build errors > > Hi Raag, > > I think your patch is to linux/io.h is correct and we should > eventually apply it, but I think the header file cleanup needs > to be done in a little more structured way or it ends up causing > a lot of extra work for very little gain. > > As you already found, removing an old indirect #include that is > no longer needed usually leads to some files breaking. The more > impactful your change is in terms of build speed, the more > things break! I think in this case, removing linux/err.h and > linux/bug.h made very little difference because they are very > small files in terms of what else they include. While this is all true, removing unneeded inclusions rarely can lead to the "extra work with a little gain". When there is a replacement to the low level ones, it's also an improvement in my opinion and won't be harmful in the future. But I agree, that the stuff is way too tangled already and requires an enormous work to untangle it, even if doing it structurally. ... Do you have your scripts for the showed statistics being published somewhere?
On Fri, Feb 21, 2025, at 17:50, Andy Shevchenko wrote: > On Fri, Feb 21, 2025 at 11:15:47AM +0100, Arnd Bergmann wrote: >> As you already found, removing an old indirect #include that is >> no longer needed usually leads to some files breaking. The more >> impactful your change is in terms of build speed, the more >> things break! I think in this case, removing linux/err.h and >> linux/bug.h made very little difference because they are very >> small files in terms of what else they include. > > While this is all true, removing unneeded inclusions rarely can lead to the > "extra work with a little gain". When there is a replacement to the low > level ones, it's also an improvement in my opinion and won't be harmful in > the future. But I agree, that the stuff is way too tangled already and requires > an enormous work to untangle it, even if doing it structurally. The problem I see with prematurely applying small improvements like this one is that they always cause build regressions, at least if the change is any good. If we can find some more impactful changes like this one, we can group them together in a branch and test them a lot better before they even reach linux-next. I mainly want to avoid people getting angry at Raag for repeatedly breaking their subsystems by pushing small patches one at a time. > Do you have your scripts for the showed statistics being published somewhere? I had a good set of scripts on an older machine and might still have some backups of that somewhere, but just hacked up something ad-hoc today beased on what I remembered from that time. Here are the snippets that you might find useful. A patch to Kbuild to create a list of each included header for each object file built in a given configuration (similar to the .filename.o.d files, but in a format I found more convenient): --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -307,7 +307,8 @@ cmd_ld_single = $(if $(objtool-enabled)$(is-single-obj-m), ; $(LD) $(ld_flags) - endif quiet_cmd_cc_o_c = CC $(quiet_modtag) $@ - cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< \ + cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< ; \ + $(CC) $(c_flags) -E -o - $< | grep ^\#.*include | cut -f 2 -d\" | sort -u > $@.includes \ $(cmd_ld_single) \ $(cmd_objtool) shell oneliner to find the header files that are most commonly included from those files: $ find -name \*includes | xargs cat | sort | uniq -c | sed -e 's:\./\|\././::g' | sort -rn | head -n 1000 > mostincluded oneliner to preprocess each of those headers $ cat mostincluded | grep include/linux | while read a i ; do gcc -E $i -o ${i%.h}.i ${GCCARGS} ; done oneliner to sort by product of includes and lines: $ cat mostincluded | grep include/linux/ | while read a b ; do if [ -e ${b%.h}.i ] ; then echo $a `wc -l ${b%.h}.i` ; fi ; done | sort -n -k2 | while read a b c ; do echo $[$a * $b] $a $b $c ; done | sort -nr > fulllist In the old days, I had cleaner versions of these in an automated script, and produced a .dot file to visualize the dependencies with graphviz. I did get to the point of more than doubling compile speed, so there was a clear incentive to continue. In fact, the further I got along the way, the better the savings. In the end I gave up when I could not get to a useful subset to upstream first that wouldn't already break hundreds of drivers. The best idea I have to avoid that is to pick one header to clean up from my list and do all the prerequisites but not actually break anything at first. Arnd