diff mbox series

[v4] fs/coredump: Enable dynamic configuration of max file note size

Message ID 20240506193700.7884-1-apais@linux.microsoft.com (mailing list archive)
State New, archived
Headers show
Series [v4] fs/coredump: Enable dynamic configuration of max file note size | expand

Commit Message

Allen Pais May 6, 2024, 7:37 p.m. UTC
Introduce the capability to dynamically configure the maximum file
note size for ELF core dumps via sysctl.

Why is this being done?
We have observed that during a crash when there are more than 65k mmaps
in memory, the existing fixed limit on the size of the ELF notes section
becomes a bottleneck. The notes section quickly reaches its capacity,
leading to incomplete memory segment information in the resulting coredump.
This truncation compromises the utility of the coredumps, as crucial
information about the memory state at the time of the crash might be
omitted.

This enhancement removes the previous static limit of 4MB, allowing
system administrators to adjust the size based on system-specific
requirements or constraints.

Eg:
$ sysctl -a | grep core_file_note_size_min
kernel.core_file_note_size_max = 4194304

$ sysctl -n kernel.core_file_note_size_min
4194304

$echo 519304 > /proc/sys/kernel/core_file_note_size_min

$sysctl -n kernel.core_file_note_size_min
519304

Attempting to write beyond the ceiling value of 16MB
$echo 17194304 > /proc/sys/kernel/core_file_note_size_min
bash: echo: write error: Invalid argument

Signed-off-by: Vijay Nag <nagvijay@microsoft.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>

---
Changes in v4:
   - Rename core_file_note_size_max to core_file_note_size_min [kees]
   - Rename core_file_note_size_max to MAX_FILE_NOTE_SIZE to
     CORE_FILE_NOTE_SIZE_DEFAULT and MAX_ALLOWED_NOTE_SIZE to
     CORE_FILE_NOTE_SIZE_MAX [Kees]
   - change core_file_note_size_allowed to static and const [Kees]
Changes in v3:
   - Fix commit message to reflect the correct sysctl knob [Kees]
   - Add a ceiling for maximum pssible note size(16M) [Allen]
   - Add a pr_warn_once() [Kees]
Changes in v2:
   - Move new sysctl to fs/coredump.c [Luis & Kees]
   - rename max_file_note_size to core_file_note_size_max [kees]
   - Capture "why this is being done?" int he commit message [Luis & Kees]
---
 fs/binfmt_elf.c          |  7 +++++--
 fs/coredump.c            | 15 +++++++++++++++
 include/linux/coredump.h |  1 +
 3 files changed, 21 insertions(+), 2 deletions(-)

Comments

kernel test robot May 7, 2024, 3:11 p.m. UTC | #1
Hi Allen,

kernel test robot noticed the following build errors:

[auto build test ERROR on kees/for-next/execve]
[also build test ERROR on brauner-vfs/vfs.all linus/master v6.9-rc7 next-20240507]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Allen-Pais/fs-coredump-Enable-dynamic-configuration-of-max-file-note-size/20240507-033907
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
patch link:    https://lore.kernel.org/r/20240506193700.7884-1-apais%40linux.microsoft.com
patch subject: [PATCH v4] fs/coredump: Enable dynamic configuration of max file note size
config: loongarch-randconfig-001-20240507 (https://download.01.org/0day-ci/archive/20240507/202405072249.fLkavX40-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240507/202405072249.fLkavX40-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405072249.fLkavX40-lkp@intel.com/

All errors (new ones prefixed by >>):

   fs/binfmt_elf.c: In function 'fill_files_note':
>> fs/binfmt_elf.c:1598:21: error: 'core_file_note_size_min' undeclared (first use in this function)
    1598 |         if (size >= core_file_note_size_min) {
         |                     ^~~~~~~~~~~~~~~~~~~~~~~
   fs/binfmt_elf.c:1598:21: note: each undeclared identifier is reported only once for each function it appears in


vim +/core_file_note_size_min +1598 fs/binfmt_elf.c

  1569	
  1570	/*
  1571	 * Format of NT_FILE note:
  1572	 *
  1573	 * long count     -- how many files are mapped
  1574	 * long page_size -- units for file_ofs
  1575	 * array of [COUNT] elements of
  1576	 *   long start
  1577	 *   long end
  1578	 *   long file_ofs
  1579	 * followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...
  1580	 */
  1581	static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm)
  1582	{
  1583		unsigned count, size, names_ofs, remaining, n;
  1584		user_long_t *data;
  1585		user_long_t *start_end_ofs;
  1586		char *name_base, *name_curpos;
  1587		int i;
  1588	
  1589		/* *Estimated* file count and total data size needed */
  1590		count = cprm->vma_count;
  1591		if (count > UINT_MAX / 64)
  1592			return -EINVAL;
  1593		size = count * 64;
  1594	
  1595		names_ofs = (2 + 3 * count) * sizeof(data[0]);
  1596	 alloc:
  1597		/* paranoia check */
> 1598		if (size >= core_file_note_size_min) {
  1599			pr_warn_once("coredump Note size too large: %u (does kernel.core_file_note_size_min sysctl need adjustment?\n",
  1600				      size);
  1601			return -EINVAL;
  1602		}
  1603		size = round_up(size, PAGE_SIZE);
  1604		/*
  1605		 * "size" can be 0 here legitimately.
  1606		 * Let it ENOMEM and omit NT_FILE section which will be empty anyway.
  1607		 */
  1608		data = kvmalloc(size, GFP_KERNEL);
  1609		if (ZERO_OR_NULL_PTR(data))
  1610			return -ENOMEM;
  1611	
  1612		start_end_ofs = data + 2;
  1613		name_base = name_curpos = ((char *)data) + names_ofs;
  1614		remaining = size - names_ofs;
  1615		count = 0;
  1616		for (i = 0; i < cprm->vma_count; i++) {
  1617			struct core_vma_metadata *m = &cprm->vma_meta[i];
  1618			struct file *file;
  1619			const char *filename;
  1620	
  1621			file = m->file;
  1622			if (!file)
  1623				continue;
  1624			filename = file_path(file, name_curpos, remaining);
  1625			if (IS_ERR(filename)) {
  1626				if (PTR_ERR(filename) == -ENAMETOOLONG) {
  1627					kvfree(data);
  1628					size = size * 5 / 4;
  1629					goto alloc;
  1630				}
  1631				continue;
  1632			}
  1633	
  1634			/* file_path() fills at the end, move name down */
  1635			/* n = strlen(filename) + 1: */
  1636			n = (name_curpos + remaining) - filename;
  1637			remaining = filename - name_curpos;
  1638			memmove(name_curpos, filename, n);
  1639			name_curpos += n;
  1640	
  1641			*start_end_ofs++ = m->start;
  1642			*start_end_ofs++ = m->end;
  1643			*start_end_ofs++ = m->pgoff;
  1644			count++;
  1645		}
  1646	
  1647		/* Now we know exact count of files, can store it */
  1648		data[0] = count;
  1649		data[1] = PAGE_SIZE;
  1650		/*
  1651		 * Count usually is less than mm->map_count,
  1652		 * we need to move filenames down.
  1653		 */
  1654		n = cprm->vma_count - count;
  1655		if (n != 0) {
  1656			unsigned shift_bytes = n * 3 * sizeof(data[0]);
  1657			memmove(name_base - shift_bytes, name_base,
  1658				name_curpos - name_base);
  1659			name_curpos -= shift_bytes;
  1660		}
  1661	
  1662		size = name_curpos - (char *)data;
  1663		fill_note(note, "CORE", NT_FILE, size, data);
  1664		return 0;
  1665	}
  1666
Allen May 7, 2024, 4:36 p.m. UTC | #2
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on kees/for-next/execve]
> [also build test ERROR on brauner-vfs/vfs.all linus/master v6.9-rc7 next-20240507]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Allen-Pais/fs-coredump-Enable-dynamic-configuration-of-max-file-note-size/20240507-033907
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
> patch link:    https://lore.kernel.org/r/20240506193700.7884-1-apais%40linux.microsoft.com
> patch subject: [PATCH v4] fs/coredump: Enable dynamic configuration of max file note size
> config: loongarch-randconfig-001-20240507 (https://download.01.org/0day-ci/archive/20240507/202405072249.fLkavX40-lkp@intel.com/config)
> compiler: loongarch64-linux-gcc (GCC) 13.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240507/202405072249.fLkavX40-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202405072249.fLkavX40-lkp@intel.com/
>

 Thanks for reporting. The kernel builds fine with
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git
for-next/execve.
The issue is with loongarch-randconfig, which has CONFIG_SYSCTL set to
"n". It is needed for this patch.


Thanks,
Allen

> All errors (new ones prefixed by >>):
>
>    fs/binfmt_elf.c: In function 'fill_files_note':
> >> fs/binfmt_elf.c:1598:21: error: 'core_file_note_size_min' undeclared (first use in this function)
>     1598 |         if (size >= core_file_note_size_min) {
>          |                     ^~~~~~~~~~~~~~~~~~~~~~~
>    fs/binfmt_elf.c:1598:21: note: each undeclared identifier is reported only once for each function it appears in
>
>
> vim +/core_file_note_size_min +1598 fs/binfmt_elf.c
>
>   1569
>   1570  /*
>   1571   * Format of NT_FILE note:
>   1572   *
>   1573   * long count     -- how many files are mapped
>   1574   * long page_size -- units for file_ofs
>   1575   * array of [COUNT] elements of
>   1576   *   long start
>   1577   *   long end
>   1578   *   long file_ofs
>   1579   * followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...
>   1580   */
>   1581  static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm)
>   1582  {
>   1583          unsigned count, size, names_ofs, remaining, n;
>   1584          user_long_t *data;
>   1585          user_long_t *start_end_ofs;
>   1586          char *name_base, *name_curpos;
>   1587          int i;
>   1588
>   1589          /* *Estimated* file count and total data size needed */
>   1590          count = cprm->vma_count;
>   1591          if (count > UINT_MAX / 64)
>   1592                  return -EINVAL;
>   1593          size = count * 64;
>   1594
>   1595          names_ofs = (2 + 3 * count) * sizeof(data[0]);
>   1596   alloc:
>   1597          /* paranoia check */
> > 1598          if (size >= core_file_note_size_min) {
>   1599                  pr_warn_once("coredump Note size too large: %u (does kernel.core_file_note_size_min sysctl need adjustment?\n",
>   1600                                size);
>   1601                  return -EINVAL;
>   1602          }
>   1603          size = round_up(size, PAGE_SIZE);
>   1604          /*
>   1605           * "size" can be 0 here legitimately.
>   1606           * Let it ENOMEM and omit NT_FILE section which will be empty anyway.
>   1607           */
>   1608          data = kvmalloc(size, GFP_KERNEL);
>   1609          if (ZERO_OR_NULL_PTR(data))
>   1610                  return -ENOMEM;
>   1611
>   1612          start_end_ofs = data + 2;
>   1613          name_base = name_curpos = ((char *)data) + names_ofs;
>   1614          remaining = size - names_ofs;
>   1615          count = 0;
>   1616          for (i = 0; i < cprm->vma_count; i++) {
>   1617                  struct core_vma_metadata *m = &cprm->vma_meta[i];
>   1618                  struct file *file;
>   1619                  const char *filename;
>   1620
>   1621                  file = m->file;
>   1622                  if (!file)
>   1623                          continue;
>   1624                  filename = file_path(file, name_curpos, remaining);
>   1625                  if (IS_ERR(filename)) {
>   1626                          if (PTR_ERR(filename) == -ENAMETOOLONG) {
>   1627                                  kvfree(data);
>   1628                                  size = size * 5 / 4;
>   1629                                  goto alloc;
>   1630                          }
>   1631                          continue;
>   1632                  }
>   1633
>   1634                  /* file_path() fills at the end, move name down */
>   1635                  /* n = strlen(filename) + 1: */
>   1636                  n = (name_curpos + remaining) - filename;
>   1637                  remaining = filename - name_curpos;
>   1638                  memmove(name_curpos, filename, n);
>   1639                  name_curpos += n;
>   1640
>   1641                  *start_end_ofs++ = m->start;
>   1642                  *start_end_ofs++ = m->end;
>   1643                  *start_end_ofs++ = m->pgoff;
>   1644                  count++;
>   1645          }
>   1646
>   1647          /* Now we know exact count of files, can store it */
>   1648          data[0] = count;
>   1649          data[1] = PAGE_SIZE;
>   1650          /*
>   1651           * Count usually is less than mm->map_count,
>   1652           * we need to move filenames down.
>   1653           */
>   1654          n = cprm->vma_count - count;
>   1655          if (n != 0) {
>   1656                  unsigned shift_bytes = n * 3 * sizeof(data[0]);
>   1657                  memmove(name_base - shift_bytes, name_base,
>   1658                          name_curpos - name_base);
>   1659                  name_curpos -= shift_bytes;
>   1660          }
>   1661
>   1662          size = name_curpos - (char *)data;
>   1663          fill_note(note, "CORE", NT_FILE, size, data);
>   1664          return 0;
>   1665  }
>   1666
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
kernel test robot May 7, 2024, 4:55 p.m. UTC | #3
Hi Allen,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kees/for-next/execve]
[also build test WARNING on brauner-vfs/vfs.all linus/master v6.9-rc7 next-20240507]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Allen-Pais/fs-coredump-Enable-dynamic-configuration-of-max-file-note-size/20240507-033907
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
patch link:    https://lore.kernel.org/r/20240506193700.7884-1-apais%40linux.microsoft.com
patch subject: [PATCH v4] fs/coredump: Enable dynamic configuration of max file note size
config: arm64-randconfig-r054-20240507 (https://download.01.org/0day-ci/archive/20240508/202405080045.cTS3F5Or-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240508/202405080045.cTS3F5Or-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405080045.cTS3F5Or-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> fs/coredump.c:67:27: warning: 'core_file_note_size_max' defined but not used [-Wunused-const-variable=]
      67 | static const unsigned int core_file_note_size_max = CORE_FILE_NOTE_SIZE_MAX;
         |                           ^~~~~~~~~~~~~~~~~~~~~~~


vim +/core_file_note_size_max +67 fs/coredump.c

    62	
    63	static int core_uses_pid;
    64	static unsigned int core_pipe_limit;
    65	static char core_pattern[CORENAME_MAX_SIZE] = "core";
    66	static int core_name_size = CORENAME_MAX_SIZE;
  > 67	static const unsigned int core_file_note_size_max = CORE_FILE_NOTE_SIZE_MAX;
    68	unsigned int core_file_note_size_min = CORE_FILE_NOTE_SIZE_DEFAULT;
    69
Kees Cook May 7, 2024, 5:02 p.m. UTC | #4
On Mon, 06 May 2024 19:37:00 +0000, Allen Pais wrote:
> Introduce the capability to dynamically configure the maximum file
> note size for ELF core dumps via sysctl.
> 
> Why is this being done?
> We have observed that during a crash when there are more than 65k mmaps
> in memory, the existing fixed limit on the size of the ELF notes section
> becomes a bottleneck. The notes section quickly reaches its capacity,
> leading to incomplete memory segment information in the resulting coredump.
> This truncation compromises the utility of the coredumps, as crucial
> information about the memory state at the time of the crash might be
> omitted.
> 
> [...]

I adjusted file names, but put it in -next. I had given some confusing
feedback on v3, but I didn't realize until later; apologies for that! The
end result is the sysctl is named kernel.core_file_note_size_limit and
the internal const min/max variables have the _min and _max suffixes.

Applied to for-next/execve, thanks!

[1/1] fs/coredump: Enable dynamic configuration of max file note size
      https://git.kernel.org/kees/c/81e238b1299e

Take care,
Allen May 7, 2024, 5:13 p.m. UTC | #5
> On Mon, 06 May 2024 19:37:00 +0000, Allen Pais wrote:
> > Introduce the capability to dynamically configure the maximum file
> > note size for ELF core dumps via sysctl.
> >
> > Why is this being done?
> > We have observed that during a crash when there are more than 65k mmaps
> > in memory, the existing fixed limit on the size of the ELF notes section
> > becomes a bottleneck. The notes section quickly reaches its capacity,
> > leading to incomplete memory segment information in the resulting coredump.
> > This truncation compromises the utility of the coredumps, as crucial
> > information about the memory state at the time of the crash might be
> > omitted.
> >
> > [...]
>
> I adjusted file names, but put it in -next. I had given some confusing
> feedback on v3, but I didn't realize until later; apologies for that! The
> end result is the sysctl is named kernel.core_file_note_size_limit and
> the internal const min/max variables have the _min and _max suffixes.
>
> Applied to for-next/execve, thanks!
>
> [1/1] fs/coredump: Enable dynamic configuration of max file note size
>       https://git.kernel.org/kees/c/81e238b1299e
>

 I should have put some thought into the feedback.
Thank you for reviewing and fixing the patch.
diff mbox series

Patch

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 5397b552fbeb..4dc7eb265a97 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1564,7 +1564,6 @@  static void fill_siginfo_note(struct memelfnote *note, user_siginfo_t *csigdata,
 	fill_note(note, "CORE", NT_SIGINFO, sizeof(*csigdata), csigdata);
 }
 
-#define MAX_FILE_NOTE_SIZE (4*1024*1024)
 /*
  * Format of NT_FILE note:
  *
@@ -1592,8 +1591,12 @@  static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm
 
 	names_ofs = (2 + 3 * count) * sizeof(data[0]);
  alloc:
-	if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */
+	/* paranoia check */
+	if (size >= core_file_note_size_min) {
+		pr_warn_once("coredump Note size too large: %u (does kernel.core_file_note_size_min sysctl need adjustment?\n",
+			      size);
 		return -EINVAL;
+	}
 	size = round_up(size, PAGE_SIZE);
 	/*
 	 * "size" can be 0 here legitimately.
diff --git a/fs/coredump.c b/fs/coredump.c
index be6403b4b14b..20807c3c5477 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -56,10 +56,16 @@ 
 static bool dump_vma_snapshot(struct coredump_params *cprm);
 static void free_vma_snapshot(struct coredump_params *cprm);
 
+#define CORE_FILE_NOTE_SIZE_DEFAULT (4*1024*1024)
+/* Define a reasonable max cap */
+#define CORE_FILE_NOTE_SIZE_MAX (16*1024*1024)
+
 static int core_uses_pid;
 static unsigned int core_pipe_limit;
 static char core_pattern[CORENAME_MAX_SIZE] = "core";
 static int core_name_size = CORENAME_MAX_SIZE;
+static const unsigned int core_file_note_size_max = CORE_FILE_NOTE_SIZE_MAX;
+unsigned int core_file_note_size_min = CORE_FILE_NOTE_SIZE_DEFAULT;
 
 struct core_name {
 	char *corename;
@@ -1020,6 +1026,15 @@  static struct ctl_table coredump_sysctls[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname       = "core_file_note_size_min",
+		.data           = &core_file_note_size_min,
+		.maxlen         = sizeof(unsigned int),
+		.mode           = 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra1		= &core_file_note_size_min,
+		.extra2		= (unsigned int *) &core_file_note_size_max,
+	},
 };
 
 static int __init init_fs_coredump_sysctls(void)
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index d3eba4360150..f6be9fd2aea7 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -46,6 +46,7 @@  static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
 #endif
 
 #if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
+extern unsigned int core_file_note_size_min;
 extern void validate_coredump_safety(void);
 #else
 static inline void validate_coredump_safety(void) {}