From: Patrick Steinhardt
Date: Fri, 7 Jan 2022 11:53:35 +0100
To: git@vger.kernel.org
Cc: iwiedler@gitlab.com
Subject: [PATCH 0/1] Async-signal safety in signal handlers
X-Patchwork-Id: 12706498

Hi,

we have recently observed a Git process which had been hanging around
for more than a month on one of our servers in production. A backtrace
showed that the git-fetch(1) process was deadlocked in its signal
handler while trying to free memory. Functions like malloc, free and
most I/O functions aren't reentrant though, which means they must not be
executed in signal handlers, as specified in signal-safety(7).

The fix for git-fetch(1) is rather simple: we can just unlink(2) the
lockfiles, which is indeed allowed, but skip freeing memory.

But in fact, this is a wider issue we have: we mostly didn't pay
attention to those restrictions, and thus we freely call
non-async-signal-safe functions. It's less clear what to do about this
in most of the cases though:

    - git-clone(1) tries to clean up the ".git" directory and its
      worktree on being killed, but needs to allocate memory to compute
      the corresponding paths. We can try to preallocate the buffer, but
      it's not clear whether there is a proper upper bound.

    - git-gc(1) will try to commit "gc.log" and write to stderr, both of
      which aren't allowed. I think we'll have to just bail and leave it
      behind in a partially-written state.

    - git-repack(1) tries to remove "pack/.tmp-*" files, which calls
      opendir(3P), readdir(3P) and closedir(3P) and allocates memory. We
      probably have to keep track of all temporary files we create in a
      global list, which we can then access in our signal handler.

    - git-worktree(1) is doing the same as git-clone(1), trying to prune
      the new worktree if it's killed. Again, we'd probably have to
      preallocate a buffer to compute paths.

    - HTTP pushes do all sorts of HTTP requests in their signal handler
      to unlock the remote server. I don't really see what to do about
      this except drop the code -- setting a global "please clean up and
      exit now" flag is probably not going to fly well.

The tempfiles and tmp-objdir code already handles signals correctly.

Patrick

Patrick Steinhardt (1):
  fetch: fix deadlock when cleaning up lockfiles in async signals

 builtin/clone.c |  2 +-
 builtin/fetch.c | 17 +++++++++++------
 transport.c     | 11 ++++++++---
 transport.h     | 14 +++++++++++++-
 4 files changed, 33 insertions(+), 11 deletions(-)