mbox series

[v2,0/2] Cleanup io.h

Message ID 20250221050804.2764553-1-raag.jadav@intel.com (mailing list archive)
Headers show
Series Cleanup io.h | expand

Message

Raag Jadav Feb. 21, 2025, 5:08 a.m. UTC
In a wider effort to improve build speeds, we're attempting to split/cleanup
core headers.

This series attempts to cleanup io.h with "include what you need" approach.

This depends on earlier modifications available in immutable tag[1]. Feel
free to carry your subsystem patches with it, or let Andy know if you'd
rather let him carry them.

[1] https://lore.kernel.org/r/Z7cqCaME4LxTTBn6@black.fi.intel.com/

v2: Fix drm_draw.c build errors

Raag Jadav (2):
  io.h: drop unused headers
  drm/draw: include bug.h

 drivers/gpu/drm/drm_draw.c | 1 +
 include/linux/io.h         | 3 ---
 2 files changed, 1 insertion(+), 3 deletions(-)


base-commit: b16e9f8547a328b19af59afc213ce323124d11e9

Comments

Arnd Bergmann Feb. 21, 2025, 10:15 a.m. UTC | #1
On Fri, Feb 21, 2025, at 06:08, Raag Jadav wrote:
> In a wider effort to improve build speeds, we're attempting to split/cleanup
> core headers.
>
> This series attempts to cleanup io.h with "include what you need" approach.
>
> This depends on earlier modifications available in immutable tag[1]. Feel
> free to carry your subsystem patches with it, or let Andy know if you'd
> rather let him carry them.
>
> [1] https://lore.kernel.org/r/Z7cqCaME4LxTTBn6@black.fi.intel.com/
>
> v2: Fix drm_draw.c build errors

Hi Raag,

I think your patch is to linux/io.h is correct and we should
eventually apply it, but I think the header file cleanup needs
to be done in a little more structured way or it ends up causing
a lot of extra work for very little gain.

As you already found, removing an old indirect #include that is
no longer needed usually leads to some files breaking. The more
impactful your change is in terms of build speed, the more
things break! I think in this case, removing linux/err.h and
linux/bug.h made very little difference because they are very
small files in terms of what else they include.

The approach that I would suggest you take is:

1. identify a header file that is included indirectly in a lot
   of places and also has a ton of indirect inclusions inside it
2. split it up in a way that most indirect inclusions can be
   replaced with a much smaller varient
3. repeat step 1 and 2 for additional headers, splitting them
   up but leaving the inclusions in place for the moment
4. test thousands of randconfig builds across architectures
   with the unnecessary indirect inclusions removed, send
   fixup patches for the drivers to add the missing direct
   #include statements, one patch per subsystem
5. once the known regressions are fixed, actually propose
   removing the extra #include statements and fix the remaining
   regressions

If you are looking for a good place to start with step 1, 
the way I've done this in the past is to calculate for each
header file how often it gets included in a particular build
and how large the preprocessed files are, then sort them
by the product of the two. These are the top entries I
see for an arm64 defconfig

#incl #lines file
7054 41901 include/linux/module.i
7146 39685 include/linux/fs.i
7057 39996 include/linux/elf.i
5667 44323 include/linux/energy_model.i
5667 44322 include/linux/device.i
5668 42467 include/linux/device/driver.i
4707 47776 include/linux/mm.i
7151 29653 include/linux/kobject.i
7151 29494 include/linux/sysfs.i
7146 29090 include/linux/percpu-rwsem.i
7152 28963 include/linux/rcuwait.i
7154 28929 include/linux/sched/signal.i
7151 28608 include/linux/kernfs.i
3713 55040 include/linux/kprobes.i
3707 55113 include/linux/kgdb.i
3718 54591 include/linux/ftrace.i
7366 27519 include/linux/slab.i
7146 28276 include/linux/ioprio.i
7190 28029 include/linux/radix-tree.i
7159 28137 include/linux/idr.i
7146 28098 include/linux/iocontext.i
3390 59188 include/linux/bio.i
7146 28073 include/linux/list_lru.i
7192 27842 include/linux/xarray.i
3390 58566 include/linux/blk_types.i
3176 62442 include/linux/writeback.i
7366 26873 include/linux/percpu-refcount.i
3538 55872 include/linux/highmem.i
7054 27934 include/linux/kmod.i
7382 26691 include/linux/gfp.i
7054 27925 include/linux/umh.i
3538 55255 include/linux/cacheflush.i
7382 26416 include/linux/topology.i
3053 63637 include/linux/memcontrol.i
7193 27010 include/linux/sched/mm.i
7382 26210 include/linux/mmzone.i
7382 26210 include/linux/memory_hotplug.i
4707 40968 include/linux/huge_mm.i
3428 56061 include/linux/bvec.i
3713 49335 include/linux/rethook.i
...
5159 28781 include/linux/io.i

You can pick any one of these and try to analyze why the
header gets included absolutely everywhere, and why it is
so huge, then try to come up with an idea for how to
improve that based on what you found.

When you do the analysis for linux/io.h, you find that
it's only on #72 on that list (for arm64 defconfig), and
that almost all of its 28781 lines come from one of these
indirectly included files (sorted by number of lines after
preprocessing), not from bug.h or err.h:

27569 include/linux/pgtable.h
24636 include/linux/mmu_notifier.h
24217 include/linux/page-flags.h
23860 include/linux/mmap_lock.h
23670 include/linux/mm_types.h
21468 include/linux/srcu.h
21355 include/linux/uprobes.h
20746 include/linux/workqueue.h
20091 include/linux/timer.h
19950 include/linux/ktime.h
19949 include/linux/timekeeping.h
19536 include/linux/jiffies.h
19428 include/linux/timex.h
19428 include/linux/time.h
19427 include/linux/time32.h
18716 include/linux/percpu_counter.h
18595 include/linux/percpu.h
18390 include/linux/sched.h
16879 include/linux/maple_tree.h
16699 include/linux/seqlock.h
16527 include/linux/kref.h
16472 include/linux/completion.h
16429 include/linux/swait.h
16388 include/linux/wait.h
16364 include/linux/rwsem.h

Note that a lot of these are indirectly included multiple
times below linux/io.h. Almost none of the above have any
business being included in linux/io.h.

      Arnd
Raag Jadav Feb. 21, 2025, 11:26 a.m. UTC | #2
On Fri, Feb 21, 2025 at 11:15:47AM +0100, Arnd Bergmann wrote:
> On Fri, Feb 21, 2025, at 06:08, Raag Jadav wrote:
> > In a wider effort to improve build speeds, we're attempting to split/cleanup
> > core headers.
> >
> > This series attempts to cleanup io.h with "include what you need" approach.
> >
> > This depends on earlier modifications available in immutable tag[1]. Feel
> > free to carry your subsystem patches with it, or let Andy know if you'd
> > rather let him carry them.
> >
> > [1] https://lore.kernel.org/r/Z7cqCaME4LxTTBn6@black.fi.intel.com/
> >
> > v2: Fix drm_draw.c build errors
> 
> Hi Raag,
> 
> I think your patch is to linux/io.h is correct and we should
> eventually apply it, but I think the header file cleanup needs
> to be done in a little more structured way or it ends up causing
> a lot of extra work for very little gain.
> 
> As you already found, removing an old indirect #include that is
> no longer needed usually leads to some files breaking. The more
> impactful your change is in terms of build speed, the more
> things break! I think in this case, removing linux/err.h and
> linux/bug.h made very little difference because they are very
> small files in terms of what else they include.
> 
> The approach that I would suggest you take is:
> 
> 1. identify a header file that is included indirectly in a lot
>    of places and also has a ton of indirect inclusions inside it
> 2. split it up in a way that most indirect inclusions can be
>    replaced with a much smaller varient
> 3. repeat step 1 and 2 for additional headers, splitting them
>    up but leaving the inclusions in place for the moment
> 4. test thousands of randconfig builds across architectures
>    with the unnecessary indirect inclusions removed, send
>    fixup patches for the drivers to add the missing direct
>    #include statements, one patch per subsystem
> 5. once the known regressions are fixed, actually propose
>    removing the extra #include statements and fix the remaining
>    regressions
> 
> If you are looking for a good place to start with step 1, 
> the way I've done this in the past is to calculate for each
> header file how often it gets included in a particular build
> and how large the preprocessed files are, then sort them
> by the product of the two. These are the top entries I
> see for an arm64 defconfig
> 
> #incl #lines file
> 7054 41901 include/linux/module.i
> 7146 39685 include/linux/fs.i
> 7057 39996 include/linux/elf.i
> 5667 44323 include/linux/energy_model.i
> 5667 44322 include/linux/device.i
> 5668 42467 include/linux/device/driver.i
> 4707 47776 include/linux/mm.i
> 7151 29653 include/linux/kobject.i
> 7151 29494 include/linux/sysfs.i
> 7146 29090 include/linux/percpu-rwsem.i
> 7152 28963 include/linux/rcuwait.i
> 7154 28929 include/linux/sched/signal.i
> 7151 28608 include/linux/kernfs.i
> 3713 55040 include/linux/kprobes.i
> 3707 55113 include/linux/kgdb.i
> 3718 54591 include/linux/ftrace.i
> 7366 27519 include/linux/slab.i
> 7146 28276 include/linux/ioprio.i
> 7190 28029 include/linux/radix-tree.i
> 7159 28137 include/linux/idr.i
> 7146 28098 include/linux/iocontext.i
> 3390 59188 include/linux/bio.i
> 7146 28073 include/linux/list_lru.i
> 7192 27842 include/linux/xarray.i
> 3390 58566 include/linux/blk_types.i
> 3176 62442 include/linux/writeback.i
> 7366 26873 include/linux/percpu-refcount.i
> 3538 55872 include/linux/highmem.i
> 7054 27934 include/linux/kmod.i
> 7382 26691 include/linux/gfp.i
> 7054 27925 include/linux/umh.i
> 3538 55255 include/linux/cacheflush.i
> 7382 26416 include/linux/topology.i
> 3053 63637 include/linux/memcontrol.i
> 7193 27010 include/linux/sched/mm.i
> 7382 26210 include/linux/mmzone.i
> 7382 26210 include/linux/memory_hotplug.i
> 4707 40968 include/linux/huge_mm.i
> 3428 56061 include/linux/bvec.i
> 3713 49335 include/linux/rethook.i
> ...
> 5159 28781 include/linux/io.i
> 
> You can pick any one of these and try to analyze why the
> header gets included absolutely everywhere, and why it is
> so huge, then try to come up with an idea for how to
> improve that based on what you found.
> 
> When you do the analysis for linux/io.h, you find that
> it's only on #72 on that list (for arm64 defconfig), and
> that almost all of its 28781 lines come from one of these
> indirectly included files (sorted by number of lines after
> preprocessing), not from bug.h or err.h:
> 
> 27569 include/linux/pgtable.h
> 24636 include/linux/mmu_notifier.h
> 24217 include/linux/page-flags.h
> 23860 include/linux/mmap_lock.h
> 23670 include/linux/mm_types.h
> 21468 include/linux/srcu.h
> 21355 include/linux/uprobes.h
> 20746 include/linux/workqueue.h
> 20091 include/linux/timer.h
> 19950 include/linux/ktime.h
> 19949 include/linux/timekeeping.h
> 19536 include/linux/jiffies.h
> 19428 include/linux/timex.h
> 19428 include/linux/time.h
> 19427 include/linux/time32.h
> 18716 include/linux/percpu_counter.h
> 18595 include/linux/percpu.h
> 18390 include/linux/sched.h
> 16879 include/linux/maple_tree.h
> 16699 include/linux/seqlock.h
> 16527 include/linux/kref.h
> 16472 include/linux/completion.h
> 16429 include/linux/swait.h
> 16388 include/linux/wait.h
> 16364 include/linux/rwsem.h
> 
> Note that a lot of these are indirectly included multiple
> times below linux/io.h. Almost none of the above have any
> business being included in linux/io.h.

Excellent! Thanks for laying out a great analysis and points worth
considering. It seems this will require extensive study and motivation,
and is definitely not as obvious and simple as this series, but I'll
surely look into it later on.

Raag
andriy.shevchenko@linux.intel.com Feb. 21, 2025, 4:12 p.m. UTC | #3
On Fri, Feb 21, 2025 at 10:38:02AM +0530, Raag Jadav wrote:
> In a wider effort to improve build speeds, we're attempting to split/cleanup
> core headers.
> 
> This series attempts to cleanup io.h with "include what you need" approach.
> 
> This depends on earlier modifications available in immutable tag[1]. Feel
> free to carry your subsystem patches with it, or let Andy know if you'd
> rather let him carry them.
> 
> [1] https://lore.kernel.org/r/Z7cqCaME4LxTTBn6@black.fi.intel.com/

Oh, you are fast! Please, read my comments in v1 as well an take them into
account.
andriy.shevchenko@linux.intel.com Feb. 21, 2025, 4:50 p.m. UTC | #4
On Fri, Feb 21, 2025 at 11:15:47AM +0100, Arnd Bergmann wrote:
> On Fri, Feb 21, 2025, at 06:08, Raag Jadav wrote:
> > In a wider effort to improve build speeds, we're attempting to split/cleanup
> > core headers.
> >
> > This series attempts to cleanup io.h with "include what you need" approach.
> >
> > This depends on earlier modifications available in immutable tag[1]. Feel
> > free to carry your subsystem patches with it, or let Andy know if you'd
> > rather let him carry them.
> >
> > [1] https://lore.kernel.org/r/Z7cqCaME4LxTTBn6@black.fi.intel.com/
> >
> > v2: Fix drm_draw.c build errors
> 
> Hi Raag,
> 
> I think your patch is to linux/io.h is correct and we should
> eventually apply it, but I think the header file cleanup needs
> to be done in a little more structured way or it ends up causing
> a lot of extra work for very little gain.
> 
> As you already found, removing an old indirect #include that is
> no longer needed usually leads to some files breaking. The more
> impactful your change is in terms of build speed, the more
> things break! I think in this case, removing linux/err.h and
> linux/bug.h made very little difference because they are very
> small files in terms of what else they include.

While this is all true, removing unneeded inclusions rarely can lead to the
"extra work with a little gain". When there is a replacement to the low
level ones, it's also an improvement in my opinion and won't be harmful in
the future. But I agree, that the stuff is way too tangled already and requires
an enormous work to untangle it, even if doing it structurally.

...

Do you have your scripts for the showed statistics being published somewhere?
Arnd Bergmann Feb. 21, 2025, 5:15 p.m. UTC | #5
On Fri, Feb 21, 2025, at 17:50, Andy Shevchenko wrote:
> On Fri, Feb 21, 2025 at 11:15:47AM +0100, Arnd Bergmann wrote:
>> As you already found, removing an old indirect #include that is
>> no longer needed usually leads to some files breaking. The more
>> impactful your change is in terms of build speed, the more
>> things break! I think in this case, removing linux/err.h and
>> linux/bug.h made very little difference because they are very
>> small files in terms of what else they include.
>
> While this is all true, removing unneeded inclusions rarely can lead to the
> "extra work with a little gain". When there is a replacement to the low
> level ones, it's also an improvement in my opinion and won't be harmful in
> the future. But I agree, that the stuff is way too tangled already and requires
> an enormous work to untangle it, even if doing it structurally.

The problem I see with prematurely applying small improvements like this
one is that they always cause build regressions, at least if the change
is any good. If we can find some more impactful changes like this one,
we can group them together in a branch and test them a lot better before
they even reach linux-next.

I mainly want to avoid people getting angry at Raag for repeatedly
breaking their subsystems by pushing small patches one at a time.

> Do you have your scripts for the showed statistics being published somewhere?

I had a good set of scripts on an older machine and might still
have some backups of that somewhere, but just hacked up something
ad-hoc today beased on what I remembered from that time. Here
are the snippets that you might find useful.

A patch to Kbuild to create a list of each included header for each
object file built in a given configuration (similar to the .filename.o.d
files, but in a format I found more convenient):

--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -307,7 +307,8 @@ cmd_ld_single = $(if $(objtool-enabled)$(is-single-obj-m), ; $(LD) $(ld_flags) -
 endif
 
 quiet_cmd_cc_o_c = CC $(quiet_modtag)  $@
-      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< \
+      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< ; \
+                   $(CC) $(c_flags) -E -o - $< | grep ^\#.*include | cut -f 2 -d\" | sort -u > $@.includes \
                $(cmd_ld_single) \
                $(cmd_objtool)
 
shell oneliner to find the header files that are most commonly included
from those files:

$ find -name \*includes | xargs cat | sort | uniq -c | sed -e 's:\./\|\././::g' | sort -rn | head -n 1000 > mostincluded

oneliner to preprocess each of those headers 

$ cat mostincluded | grep include/linux | while read a i ; do gcc -E $i -o ${i%.h}.i ${GCCARGS} ; done

oneliner to sort by product of includes and lines:

$ cat mostincluded | grep include/linux/ | while read a b ; do if [ -e ${b%.h}.i ] ; then echo $a `wc -l ${b%.h}.i` ; fi ; done | sort -n -k2 | while read a b c ; do echo $[$a * $b] $a $b $c ; done | sort -nr > fulllist

In the old days, I had cleaner versions of these in an automated script,
and produced a .dot file to visualize the dependencies with graphviz.
I did get to the point of more than doubling compile speed, so there was
a clear incentive to continue. In fact, the further I got along the way,
the better the savings. In the end I gave up when I could not
get to a useful subset to upstream first that wouldn't already break
hundreds of drivers.

The best idea I have to avoid that is to pick one header to clean up
from my list and do all the prerequisites but not actually break anything
at first.

      Arnd