diff mbox series

[v2] xmmap: inform Linux users of tuning knobs on ENOMEM

Message ID 20210630000132.GA2653@dcvr (mailing list archive)
State Accepted
Commit e0e19d5d265e5e59acf3c9e69266bc67c0e0b364
Headers show
Series [v2] xmmap: inform Linux users of tuning knobs on ENOMEM | expand

Commit Message

Eric Wong June 30, 2021, 12:01 a.m. UTC
This series is now down to a single patch.

I wanted to make things more transparent to users without
privileges to raise sys.vm.max_map_count and/or RLIMIT_DATA;
but it doesn't seem possible to account for libc/zlib/etc. doing
mmap() without our knowledge (usually via malloc).

So I think giving users some information to feed their sysadmins
is the best we can do in this situation:

--------------8<-----------
Subject: [PATCH] xmmap: inform Linux users of tuning knobs on ENOMEM

Linux users may benefit from additional information on how to
avoid ENOMEM from mmap despite the system having enough RAM to
accomodate them.  We can't reliably unmap pack windows to work
around the issue since malloc and other library routines may
mmap without our knowledge.

Signed-off-by: Eric Wong <e@80x24.org>
---
 config.c          |  3 ++-
 git-compat-util.h |  1 +
 object-file.c     | 16 +++++++++++++++-
 packfile.c        |  4 ++--
 read-cache.c      |  3 ++-
 5 files changed, 22 insertions(+), 5 deletions(-)

Comments

Jeff King June 30, 2021, 2:46 a.m. UTC | #1
On Wed, Jun 30, 2021 at 12:01:32AM +0000, Eric Wong wrote:

> This series is now down to a single patch.
> 
> I wanted to make things more transparent to users without
> privileges to raise sys.vm.max_map_count and/or RLIMIT_DATA;
> but it doesn't seem possible to account for libc/zlib/etc. doing
> mmap() without our knowledge (usually via malloc).

Oh, I should have read this one before reviewing the inner parts of v1. :)

In general I agree that trying to manage our map count can never be
foolproof. As you note, other parts of the system may contribute to that
count. But even within Git, we don't have any mechanism for unmapping
many non-packfiles. So if you have 30,000 packs, you may hit the limit
purely via the .idx files (and ditto for the new .rev files, and
probably commit-graph files, etc).

That said, I'm not opposed to handling xmmap() failures more gracefully,
as your series did. It's not foolproof, but it might help in some cases.

> So I think giving users some information to feed their sysadmins
> is the best we can do in this situation:

This seems OK to me, too. Translators might complain a bit about the
message-lego. I don't have a strong opinion.

-Peff
Eric Wong June 30, 2021, 10:07 a.m. UTC | #2
Jeff King <peff@peff.net> wrote:
> On Wed, Jun 30, 2021 at 12:01:32AM +0000, Eric Wong wrote:
> 
> > This series is now down to a single patch.
> > 
> > I wanted to make things more transparent to users without
> > privileges to raise sys.vm.max_map_count and/or RLIMIT_DATA;
> > but it doesn't seem possible to account for libc/zlib/etc. doing
> > mmap() without our knowledge (usually via malloc).
> 
> Oh, I should have read this one before reviewing the inner parts of v1. :)
> 
> In general I agree that trying to manage our map count can never be
> foolproof. As you note, other parts of the system may contribute to that
> count. But even within Git, we don't have any mechanism for unmapping
> many non-packfiles. So if you have 30,000 packs, you may hit the limit
> purely via the .idx files (and ditto for the new .rev files, and
> probably commit-graph files, etc).

Yeah, the most annoying thing with my original series was when
I hit "inflate: out of memory" once I stopped xmmap from dying.
I suspect that would be a worse far error message for users who
aren't familiar with how malloc works.

> That said, I'm not opposed to handling xmmap() failures more gracefully,
> as your series did. It's not foolproof, but it might help in some cases.

I've also been wondering if we can maintain a watermark based
on reading the contents of /proc/sys/vm/max_map_count and the
mappings we make.   Then we could start dropping mappings
when we hit half (or some other threshold) of that sysctl.
Similar for RLIMIT_DATA (though that defaults to unlimited
on my Debian system).

OTOH, I also wonder if we're overusing mmap when we could be
just as well served with pread.

I'm not up-to-date on modern mmap performance and maybe CPU
vulnerability mitigations nowadays make mmap more compelling.
However, once upon a time in 2006, pread could be a hair quicker:

https://lore.kernel.org/git/Pine.LNX.4.64.0612182234260.3479@woody.osdl.org/T/#u
(But that info could be out-of-date...)

I'm also somewhat paranoid when it comes to mmap since rogue
processes could be truncating mmap-ed files to cause bus errors.

> > So I think giving users some information to feed their sysadmins
> > is the best we can do in this situation:
> 
> This seems OK to me, too. Translators might complain a bit about the
> message-lego. I don't have a strong opinion.

*shrug*  I saw my original patches already ended up in `seen'
(commit 7b79212a93c375365c06cab5c0018ab97a4185cf)
Jeff King June 30, 2021, 5:18 p.m. UTC | #3
On Wed, Jun 30, 2021 at 10:07:22AM +0000, Eric Wong wrote:

> > In general I agree that trying to manage our map count can never be
> > foolproof. As you note, other parts of the system may contribute to that
> > count. But even within Git, we don't have any mechanism for unmapping
> > many non-packfiles. So if you have 30,000 packs, you may hit the limit
> > purely via the .idx files (and ditto for the new .rev files, and
> > probably commit-graph files, etc).
> 
> Yeah, the most annoying thing with my original series was when
> I hit "inflate: out of memory" once I stopped xmmap from dying.
> I suspect that would be a worse far error message for users who
> aren't familiar with how malloc works.

Yep. When I've seen this problem in the wild, that was exactly the
message I hit, too.

> > That said, I'm not opposed to handling xmmap() failures more gracefully,
> > as your series did. It's not foolproof, but it might help in some cases.
> 
> I've also been wondering if we can maintain a watermark based
> on reading the contents of /proc/sys/vm/max_map_count and the
> mappings we make.   Then we could start dropping mappings
> when we hit half (or some other threshold) of that sysctl.
> Similar for RLIMIT_DATA (though that defaults to unlimited
> on my Debian system).

Maybe. You'd still need to teach Git to start closing pack idx files.

One downside of closing any maps is that it leads to races with
simultaneous repacks. Usually if a pack goes away we'll re-scan the
objects directory and look for new packs. But in pack-objects, we commit
ourselves to a certain packed representation. If that pack goes away
(and we aren't holding on to it via an mmap or open descriptor), then we
have to bail.

If we hit a point where the alternative is calling die() because we
couldn't map or allocate something, then risking the race is OK. But
unmapping at a more conservative level opens us up to that race even
when we would have otherwise succeeded.

> OTOH, I also wonder if we're overusing mmap when we could be
> just as well served with pread.
> 
> I'm not up-to-date on modern mmap performance and maybe CPU
> vulnerability mitigations nowadays make mmap more compelling.
> However, once upon a time in 2006, pread could be a hair quicker:

You can compile with NO_MMAP now to emulate it with pread(), but I doubt
it performs that well. If you shrink the packedGitWindowSize, then I
think use_pack() would bring in and operate on reasonable amounts of
bytes at a time.

But lots of other code will mmap whole files (.idx code, packed-refs
lookup, the index). We'd end up copying those files entirely into
memory, rather than accessing them directly page-wise.  That increases
RAM usage (we are copying into our own heap) and makes operations
inherently O(n) in the file sizes (one of the big reasons we mmap the
packed-refs file is so that we can binary search without having to
process the whole thing).

Those cases could be converted to use pread() more directly. Doing so
for .idx files might not be too bad, as the records are fixed-length and
we know where each step of the binary search will land. For packed-refs
it would get pretty hairy; we jump to a particular offset and then have
to scan around (sometimes backwards) for the beginning of the record.

> > > So I think giving users some information to feed their sysadmins
> > > is the best we can do in this situation:
> > 
> > This seems OK to me, too. Translators might complain a bit about the
> > message-lego. I don't have a strong opinion.
> 
> *shrug*  I saw my original patches already ended up in `seen'
> (commit 7b79212a93c375365c06cab5c0018ab97a4185cf)

That doesn't necessarily mean much. Only that the topic was "seen" by
the maintainer and didn't look like complete garbage. :)

-Peff
diff mbox series

Patch

diff --git a/config.c b/config.c
index f9c400ad30..79ae9f2dea 100644
--- a/config.c
+++ b/config.c
@@ -3051,7 +3051,8 @@  int git_config_set_multivar_in_file_gently(const char *config_filename,
 		if (contents == MAP_FAILED) {
 			if (errno == ENODEV && S_ISDIR(st.st_mode))
 				errno = EISDIR;
-			error_errno(_("unable to mmap '%s'"), config_filename);
+			error_errno(_("unable to mmap '%s'%s"),
+					config_filename, mmap_os_err());
 			ret = CONFIG_INVALID_FILE;
 			contents = NULL;
 			goto out_free;
diff --git a/git-compat-util.h b/git-compat-util.h
index fb6e9af76b..fa6dd92219 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -876,6 +876,7 @@  char *xstrndup(const char *str, size_t len);
 void *xrealloc(void *ptr, size_t size);
 void *xcalloc(size_t nmemb, size_t size);
 void *xmmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
+const char *mmap_os_err(void);
 void *xmmap_gently(void *start, size_t length, int prot, int flags, int fd, off_t offset);
 int xopen(const char *path, int flags, ...);
 ssize_t xread(int fd, void *buf, size_t len);
diff --git a/object-file.c b/object-file.c
index f233b440b2..b9c3219793 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1023,12 +1023,26 @@  void *xmmap_gently(void *start, size_t length,
 	return ret;
 }
 
+const char *mmap_os_err(void)
+{
+	static const char blank[] = "";
+#if defined(__linux__)
+	if (errno == ENOMEM) {
+		/* this continues an existing error message: */
+		static const char enomem[] =
+", check sys.vm.max_map_count and/or RLIMIT_DATA";
+		return enomem;
+	}
+#endif /* OS-specific bits */
+	return blank;
+}
+
 void *xmmap(void *start, size_t length,
 	int prot, int flags, int fd, off_t offset)
 {
 	void *ret = xmmap_gently(start, length, prot, flags, fd, offset);
 	if (ret == MAP_FAILED)
-		die_errno(_("mmap failed"));
+		die_errno(_("mmap failed%s"), mmap_os_err());
 	return ret;
 }
 
diff --git a/packfile.c b/packfile.c
index 755aa7aec5..9ef6d98292 100644
--- a/packfile.c
+++ b/packfile.c
@@ -652,8 +652,8 @@  unsigned char *use_pack(struct packed_git *p,
 				PROT_READ, MAP_PRIVATE,
 				p->pack_fd, win->offset);
 			if (win->base == MAP_FAILED)
-				die_errno("packfile %s cannot be mapped",
-					  p->pack_name);
+				die_errno(_("packfile %s cannot be mapped%s"),
+					  p->pack_name, mmap_os_err());
 			if (!win->offset && win->len == p->pack_size
 				&& !p->do_not_close)
 				close_pack_fd(p);
diff --git a/read-cache.c b/read-cache.c
index 77961a3885..a80902155c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2236,7 +2236,8 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 
 	mmap = xmmap_gently(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
-		die_errno(_("%s: unable to map index file"), path);
+		die_errno(_("%s: unable to map index file%s"), path,
+			mmap_os_err());
 	close(fd);
 
 	hdr = (const struct cache_header *)mmap;