From patchwork Fri Dec 6 12:42:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Couder X-Patchwork-Id: 13897172 Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E5F0A205AA3 for ; Fri, 6 Dec 2024 12:43:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.49 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733488992; cv=none; b=evpSBNv61nh7b2hoXfKwhXgEFfN0CRmtcMi7DyqWblntoI7nKei9CacEHKzumiYido4lcFVd0rtdsWe50JQG3G3jHYXe5dfLrYflaGOImrtWoq6drL9oGmgS3vLvleTIHK1IGurxDzPo+YkaU7dMsUiA9I/X7slt2PGhHmweKMM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733488992; c=relaxed/simple; bh=JQh6V06k4lThjh4865pSf5cjgG8MzpacqgbD7cBS+p4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oFJ8lMOyQsebuwqbzeyyBheMEON7STrP2TN11iMnWfpH7qk1hMeOCgmK+PGvaFcye3L+uCroxUE+lgPopsN/O7YYzbPUY/HzjeKawAhCX/Wtp0WMlm2nUtucyhkB89kmed4FEmhJug6VE7AK9bErcwv/icKXroVFK6tvmBQVydw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=QBw4nkQR; arc=none smtp.client-ip=209.85.218.49 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="QBw4nkQR" Received: by mail-ej1-f49.google.com with SMTP id a640c23a62f3a-a9a0ec0a94fso251358966b.1 for ; Fri, 06 Dec 2024 04:43:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733488988; x=1734093788; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=pJcXylVXhIZnD6LVag+4lg1ND9yDNAOCRfQCaY+ewBo=; b=QBw4nkQRWi6jAOoOROOmrNGChzR5mEbtDRHoRi0Ruju7+XibphC1bcQ1QhRBQzUX/Z qv8tjgDYIgxakjZqObSsAq2kcxZ/s+F9VzYRv/kH0edxLabFR8Nf8D4YFWoT+6q/W4Tw cwmo1cYXuPfRMtAubVT2EgR/Uwmy2XXicDhXIPqMhgCy9tzsFSEfzaEp+KSPrdDslaoG vYp7vNIqTP4baHxPKJO3KtQhMIfiUtgCEaksAp6cG0wa2Zd/Uzr5rntUEihwzSAq+vHs 1fgP+UpWTEloBZuuwqPPJ6AsWJ3+f2bZNoeQNy9jYMWAdYFLKBZZ4ELR34yrsykSS+X7 S/xw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733488988; x=1734093788; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=pJcXylVXhIZnD6LVag+4lg1ND9yDNAOCRfQCaY+ewBo=; b=l6Wxpnq0fVRh3orsckefVOIw9nD8IX3mDlltxktEt1VJwULEUZy6x59yWU6qCBybOC bhjZhCgv5V5ni3bt96B38IQsdithOUSemdaSkprjZknwMBTvQMFmUxeg0JXYY4JwsmBo TYOkK2lENoZk2ctOJwb1wN28WLP4wOZaEjoAUvTAE9zZ3Vfl69vvE8sdwfvdKt11mNuN No2GoYcfI+/0Rouj7WfyM7JTb807yBhxZ/UL3xrGTQE9bElebGuWElqsmaWuY/DjQYNL vUHWeDUYhJ4DB9/b/zOx/8gMXCg5YsirRmHcuIgs/wGEloIvcaboEGCt8J6VBvGaFoog R5FQ== X-Gm-Message-State: AOJu0YxRQZBQd7KudS1yL5ZeN608Pe7zrV6i+GhmkfgdZCRQvDJtE+4d aR1hu9jgO+S1rC69/oXIDeHq6x+/gLtZBdvjW8D71c4+IaVtJRDJqxJlVQ== X-Gm-Gg: ASbGncu0cBiw0lPNwcYoTteAtjN4nQci4cEmxbmSJh3UdmkMV/3ecSdYp0gOf3tczZD d8hYDyOicEbh2a1Q/7fluXIg/D6/i7wIK2TnbjgoTJEuVCBfD+rpfjlpBXk6BKofffRZylDueY1 qXPrWhX1LEa5LI2bqQ39r68xXdmyKRua/Ry7lzcTy0EUeStJS5qGJNOX4qVstdhGTbV1fgBzUYm QEaVw2sgRRKhv0sno7WbWleAwGTXhkKa8zpED6bxMitlt7gi8bxebx/AvZK4ksmrc4Ae0Omy15G 8Q4T+3KYy9QtS3xhgEJyv7JxWGe9GMmy X-Google-Smtp-Source: AGHT+IF2eub8dcEpOGoCAIPwcn0fprMCM6/jsY1/Cc8qw4Yu7m6kC75qjuyWBil9qDMKYx7EozqxgQ== X-Received: by 2002:a17:906:31d1:b0:aa5:f288:e7d8 with SMTP id a640c23a62f3a-aa63a128104mr184483866b.26.1733488987887; Fri, 06 Dec 2024 04:43:07 -0800 (PST) Received: from christian-Precision-5550.. (176-138-135-207.abo.bbox.fr. [176.138.135.207]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-aa625e4da51sm236527266b.37.2024.12.06.04.43.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Dec 2024 04:43:06 -0800 (PST) From: Christian Couder To: git@vger.kernel.org Cc: Junio C Hamano , John Cai , Patrick Steinhardt , Taylor Blau , Eric Sunshine , Christian Couder , Christian Couder Subject: [PATCH v3 1/5] version: refactor strbuf_sanitize() Date: Fri, 6 Dec 2024 13:42:44 +0100 Message-ID: <20241206124248.160494-2-christian.couder@gmail.com> X-Mailer: git-send-email 2.47.1.402.gc25c94707f In-Reply-To: <20241206124248.160494-1-christian.couder@gmail.com> References: <20240910163000.1985723-1-christian.couder@gmail.com> <20241206124248.160494-1-christian.couder@gmail.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The git_user_agent_sanitized() function performs some sanitizing to avoid special characters being sent over the line and possibly messing up with the protocol or with the parsing on the other side. Let's extract this sanitizing into a new strbuf_sanitize() function, as we will want to reuse it in a following patch, and let's put it into strbuf.{c,h}. While at it, let's also make a few small improvements: - use 'size_t' for 'i' instead of 'int', - move the declaration of 'i' inside the 'for ( ... )', - use strbuf_detach() to explicitly detach the string contained by the 'sb' strbuf. Helped-by: Eric Sunshine Signed-off-by: Christian Couder --- strbuf.c | 9 +++++++++ strbuf.h | 7 +++++++ version.c | 9 ++------- 3 files changed, 18 insertions(+), 7 deletions(-) diff --git a/strbuf.c b/strbuf.c index 3d2189a7f6..cccfdec0e3 100644 --- a/strbuf.c +++ b/strbuf.c @@ -1082,3 +1082,12 @@ void strbuf_strip_file_from_path(struct strbuf *sb) char *path_sep = find_last_dir_sep(sb->buf); strbuf_setlen(sb, path_sep ? path_sep - sb->buf + 1 : 0); } + +void strbuf_sanitize(struct strbuf *sb) +{ + strbuf_trim(sb); + for (size_t i = 0; i < sb->len; i++) { + if (sb->buf[i] <= 32 || sb->buf[i] >= 127) + sb->buf[i] = '.'; + } +} diff --git a/strbuf.h b/strbuf.h index 003f880ff7..884157873e 100644 --- a/strbuf.h +++ b/strbuf.h @@ -664,6 +664,13 @@ typedef int (*char_predicate)(char ch); void strbuf_addstr_urlencode(struct strbuf *sb, const char *name, char_predicate allow_unencoded_fn); +/* + * Trim and replace each character with ascii code below 32 or above + * 127 (included) using a dot '.' character. Useful for sending + * capabilities. + */ +void strbuf_sanitize(struct strbuf *sb); + __attribute__((format (printf,1,2))) int printf_ln(const char *fmt, ...); __attribute__((format (printf,2,3))) diff --git a/version.c b/version.c index 41b718c29e..951e6dca74 100644 --- a/version.c +++ b/version.c @@ -24,15 +24,10 @@ const char *git_user_agent_sanitized(void) if (!agent) { struct strbuf buf = STRBUF_INIT; - int i; strbuf_addstr(&buf, git_user_agent()); - strbuf_trim(&buf); - for (i = 0; i < buf.len; i++) { - if (buf.buf[i] <= 32 || buf.buf[i] >= 127) - buf.buf[i] = '.'; - } - agent = buf.buf; + strbuf_sanitize(&buf); + agent = strbuf_detach(&buf, NULL); } return agent; From patchwork Fri Dec 6 12:42:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Couder X-Patchwork-Id: 13897173 Received: from mail-ej1-f43.google.com (mail-ej1-f43.google.com [209.85.218.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8B72C2066E2 for ; Fri, 6 Dec 2024 12:43:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.43 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733488994; cv=none; b=On0+2RyG22TbOcFa8PFi0Ik7DbF/bwbKGj63Bb7AvRPyjSs5GBQT8u5VZe1/FZ+uOOqhwFTXQTcCc+6il36+H5NH2SpM+FC8gOXs9J85l0/WF9jHNBvT5VJ1CjjOmoI3XzGGPw9oTJMljYgQV07S3Fo1lYAr8c5YCTt8UL8k5kU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733488994; c=relaxed/simple; bh=h5Ujg4TvBzOI1DoAa3pdEpCkrl67TZ1WnYiXeHXujm4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cH753CE1QJyn34iLEBxfFHCECuLv++zPAoWtFjyv7IKn+wg52bLkXeE+Ysls/3aQs1+mvcsMvGqIZDr8FR1EDZ+zjp5MJJ8myRFlbhWM05E3pO1cxgbBFNsih6sk9PMJtHUQY+y6xiZZ27DoRrEcGNkzBqpQ8jQgu4Xpz1klSKM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Bh072ZcL; arc=none smtp.client-ip=209.85.218.43 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Bh072ZcL" Received: by mail-ej1-f43.google.com with SMTP id a640c23a62f3a-aa62f0cb198so209806666b.2 for ; Fri, 06 Dec 2024 04:43:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733488990; x=1734093790; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=lvuy2LexqVQOKGFHw2pGQOzHXlUxKpe6JWZ2AST6r1s=; b=Bh072ZcLsJhfMjn/3d9CQ+iUKtvAu6uUx8FHxHlkDk7IGugFbWJHNqb9zc0OaFcXev nNmufrpWhS53hqFMYIItxtsFMBaY4twbpcf0wmBLL7z/gUcY5klXkkdc9NgPxdWXs9oL ebTxjG46EUfPh9JJR4MUQNN3vibzTxVLEMK8J/dUjf+az1gTMAYGYoYXIkHb4GrNqx7y lYth5M2DSrryPZGmHFpPJbZTipvG04l7EjVzpVqgcy340DHfDhgmAWNaLr7Zot2H02CB tmH0Q5Vb9NqZJcgJDovyRFAl0OLl+7j4/G5eApSwtbsYMQNjJL8EHG6BS1qLi3QWP8id RRBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733488990; x=1734093790; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lvuy2LexqVQOKGFHw2pGQOzHXlUxKpe6JWZ2AST6r1s=; b=QHK1YIT4729xpMnuVfKzcXpdBmSARzobf4XVUNngUczjkZvKFBhdtLdM1i+eT/gpJb +CqEVjRw8y8ZsbcvKmTAto5AVjUWZA7yES/5lhZhXwNUyMPiNPzSfsZj6nkrp5Vwl+dN j3jJUv1ZA1b0A20QgLG95vWxNkeicjgQ+NmzFSYQG/ySbmZ+8BUyk7LTtAQRraNaxESA WJrp15tn5JYFt7MnExH3hebs/I9Pn58HZh/CIeCpJm6amq2R4wXOp7sJ0MrgkwNQT8R+ JBxsRIQBJUQIK+XJxUEvuJXKPUeSC8FRiQkQxhPHDkW0oeczRXLhX4MmJ0730pvcHjH9 edvA== X-Gm-Message-State: AOJu0YzXm5mr+NZIoCMm+cCsrWSzA6GsifD2GOt+Wae2MSBu/d/cqQm+ TrYjH+JrN/xLB9yEj3E2OUaWzY7FOFrl+ZsLY3oCPKNdMUg5NXcAPa9kqw== X-Gm-Gg: ASbGncuFcr3cJw/n3y/hTa8oY9sCoJA4SBerJodZClqnejAhQIle2SjRm4seoo2rVcj uJnqb34gxFzUfwKLsB7oo9CtciVI5nrrHr2fSW6fmt/ZaC4xV1DAl8npon3B0SfrsoDvy/M0Ayk dHQLZBYJEThxW7ciBWDYi+IJM6PnIXgYLIElpHP2hu1l0FM8SjP96d5gJYbV5RWrM8hRu0hqEHA 8hKvSgAxRM/l+T/qTzX+6zFGyIIr8x1scbENcLTtUG4acXvxgdGOf4o/gZu32yh+pQRici38Db9 l5L4AAHGOrG1nBZnE8ckJ+U/h8dSMtT0 X-Google-Smtp-Source: AGHT+IHmrg+n2BvQu4diXgvunW5+gk6r27RfF9oLxrGwIFfVxQWZ84KFboCLbNxfKo6IfcVIVimrwg== X-Received: by 2002:a17:906:3191:b0:aa6:273f:452b with SMTP id a640c23a62f3a-aa63a005dfdmr171327166b.16.1733488990206; Fri, 06 Dec 2024 04:43:10 -0800 (PST) Received: from christian-Precision-5550.. (176-138-135-207.abo.bbox.fr. [176.138.135.207]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-aa625e4da51sm236527266b.37.2024.12.06.04.43.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Dec 2024 04:43:08 -0800 (PST) From: Christian Couder To: git@vger.kernel.org Cc: Junio C Hamano , John Cai , Patrick Steinhardt , Taylor Blau , Eric Sunshine , Christian Couder , Christian Couder Subject: [PATCH v3 2/5] strbuf: refactor strbuf_trim_trailing_ch() Date: Fri, 6 Dec 2024 13:42:45 +0100 Message-ID: <20241206124248.160494-3-christian.couder@gmail.com> X-Mailer: git-send-email 2.47.1.402.gc25c94707f In-Reply-To: <20241206124248.160494-1-christian.couder@gmail.com> References: <20240910163000.1985723-1-christian.couder@gmail.com> <20241206124248.160494-1-christian.couder@gmail.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 We often have to split strings at some specified terminator character. The strbuf_split*() functions, that we can use for this purpose, return substrings that include the terminator character, so we often need to remove that character. When it is a whitespace, newline or directory separator, the terminator character can easily be removed using an existing triming function like strbuf_rtrim(), strbuf_trim_trailing_newline() or strbuf_trim_trailing_dir_sep(). There is no function to remove that character when it's not one of those characters though. Let's introduce a new strbuf_trim_trailing_ch() function that can be used to remove any trailing character, and let's refactor existing code that manually removed trailing characters using this new function. We are also going to use this new function in a following commit. Signed-off-by: Christian Couder --- strbuf.c | 7 +++++++ strbuf.h | 3 +++ trace2/tr2_cfg.c | 10 ++-------- 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/strbuf.c b/strbuf.c index cccfdec0e3..c986ec28f4 100644 --- a/strbuf.c +++ b/strbuf.c @@ -134,6 +134,13 @@ void strbuf_trim_trailing_dir_sep(struct strbuf *sb) sb->buf[sb->len] = '\0'; } +void strbuf_trim_trailing_ch(struct strbuf *sb, int c) +{ + while (sb->len > 0 && sb->buf[sb->len - 1] == c) + sb->len--; + sb->buf[sb->len] = '\0'; +} + void strbuf_trim_trailing_newline(struct strbuf *sb) { if (sb->len > 0 && sb->buf[sb->len - 1] == '\n') { diff --git a/strbuf.h b/strbuf.h index 884157873e..5e389ab065 100644 --- a/strbuf.h +++ b/strbuf.h @@ -197,6 +197,9 @@ void strbuf_trim_trailing_dir_sep(struct strbuf *sb); /* Strip trailing LF or CR/LF */ void strbuf_trim_trailing_newline(struct strbuf *sb); +/* Strip trailing character c */ +void strbuf_trim_trailing_ch(struct strbuf *sb, int c); + /** * Replace the contents of the strbuf with a reencoded form. Returns -1 * on error, 0 on success. diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c index 22a99a0682..9da1f8466c 100644 --- a/trace2/tr2_cfg.c +++ b/trace2/tr2_cfg.c @@ -35,10 +35,7 @@ static int tr2_cfg_load_patterns(void) tr2_cfg_patterns = strbuf_split_buf(envvar, strlen(envvar), ',', -1); for (s = tr2_cfg_patterns; *s; s++) { - struct strbuf *buf = *s; - - if (buf->len && buf->buf[buf->len - 1] == ',') - strbuf_setlen(buf, buf->len - 1); + strbuf_trim_trailing_ch(*s, ','); strbuf_trim_trailing_newline(*s); strbuf_trim(*s); } @@ -74,10 +71,7 @@ static int tr2_load_env_vars(void) tr2_cfg_env_vars = strbuf_split_buf(varlist, strlen(varlist), ',', -1); for (s = tr2_cfg_env_vars; *s; s++) { - struct strbuf *buf = *s; - - if (buf->len && buf->buf[buf->len - 1] == ',') - strbuf_setlen(buf, buf->len - 1); + strbuf_trim_trailing_ch(*s, ','); strbuf_trim_trailing_newline(*s); strbuf_trim(*s); } From patchwork Fri Dec 6 12:42:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Couder X-Patchwork-Id: 13897174 Received: from mail-ed1-f48.google.com (mail-ed1-f48.google.com [209.85.208.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9626E206F3D for ; Fri, 6 Dec 2024 12:43:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.48 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733488997; cv=none; b=L5OmT1z4sna6HkMVoBaaEU2ouHP6GyiQnzOjS3+WPy2TxQgBi2h0wF1UTYi/foxNhiXtDHVtNKwv3EU7QDbR/QHGKDwJ8rBZUKBYdHid+YJelIDIpzbjCP7FszqLXHSUwGPsn3nRTo04Hpu6aHg10yTi6IaPzFpbJv0jQVq937I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733488997; c=relaxed/simple; bh=/OU5W+Ki21aNHHzjDBNwHNtfHEcTkmry+xMA4Zh4wl0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=G2nGNqZrp63sV08gRnWPe/iWU+1eRg7BgJhCtnLcScYe+rh9Yl3B95QJBsUgAaaYXxuVrDUJIFNmdqg0f8QimNAKvfusmsr13+cai8uIuCmF5iA+ArcuwyyJBA0kHPCE9wLIA4KEu4tLf02Xjq0ei3BuzufltnSsN1Wr3e/MQ9Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=McImlnHA; arc=none smtp.client-ip=209.85.208.48 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="McImlnHA" Received: by mail-ed1-f48.google.com with SMTP id 4fb4d7f45d1cf-5d0d4a2da4dso3283757a12.1 for ; Fri, 06 Dec 2024 04:43:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733488992; x=1734093792; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ZLbM4W161DFHevTdn8YlWPSMN3Tu5v5SM6bsZPWwvPU=; b=McImlnHAHmUMKXt2voAGd1Dc5DSfUlxAcSh97vtJIcdhKCxLUQoF0Mv7PYz9NB6ZRx x+2KrTQbYUC6nckTF4IBWpC1MWBp6t5JxzMMnj0YplozKaXACJN3G73Iur0E9UGY8nko CYcx8mi9wWY3/Lvxh8Te4+yFk+/8HTPCnrUeAzd9mpsmyaFDyb9JTsMAEFUQmNTiQHsc k2YFbO0NycntvxV6rCSXwv7fdcV2qStBpCbDc6rSGOqhzJ9Xe2Tr1qsG27nlmkgSVi4K a7zkam8bJ7x5GMkaq5l6H+WMbke3/XEYONgFGI90EBCwc5FPVZuI6ERCyNNcQ8F8x1e3 sOCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733488992; x=1734093792; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ZLbM4W161DFHevTdn8YlWPSMN3Tu5v5SM6bsZPWwvPU=; b=sW5sb+QZ0jXdhZ1/7+YALyRiASwg5R4LzyXf/80Y7BmCqBfCYGCiWjmPeHAtsELuHp vqyE3WU/Wuj3DWrnOzpMMfLYMCr3Xg1E1pfVlnx0bUonv2OUNJNkKGd6EdYF/rJqSPMF PrOUZRNINnrzrQqJxBLCrCaGZlHORkqPzwZTt544hJYEHrUgGw3duWSQ+0uGQ27LUQzL mCMBZYKgS+VVorBKCUBEmh1mQZeDKyp6/5hlWPN+E4ikDPDARJZ/DFv3Thgq3zE3JMRU OlP2JOs9C7E8NHJ06m4uoW861mGr0A11WZgSaM4HfANW0iW2dXoTKV2RvW1ws92nle8N rWsA== X-Gm-Message-State: AOJu0YwdyoTASe/+0EQToSqQ3YeBo5Ki5bd/W4jOEbvpIBhWVmk40UTf K2yTfShUMdZMuyor8oRHiSH0/paJRm3kO2+OmWDUhOOWrg3PTkQCnPNx4w== X-Gm-Gg: ASbGncuhx2/CKLw4opLHBP5BPH+rUpF7l0/Ui84iwbGuKiGumBUQnc0W88cOWXm9T6O xTWur2lYmQ6Xoqge4JSRSVUkNJ15vxlhEXsBp4BLrjGWYmoHKLcqH0UHQt0NGOwvv0FYSmQ6zZd TRhATsg0rUc/ZeY5qFq71+Vse3819rrZiSZOeJ+zHLvvLtRmhFB+X+fGd7myrT+uSocrWmiEUzJ UjTith+f3PAPZsPK/WELt4h9wTRdUxwjrbFboNFPIiMrV0Hbej6m9WlU9KQHsv4AJ9XMgBNah98 +sq6E85GmdR0QcDbovYT1aYbduqPJeeO X-Google-Smtp-Source: AGHT+IHZ5y2yWa20vIzsD6M+7HjYLZmsNcY9/z9GqDx4eLaCzjpTf5Lca0MckVXz6thoGI29xQEp/A== X-Received: by 2002:a17:907:774b:b0:aa6:2e71:d94e with SMTP id a640c23a62f3a-aa63a0aa4dbmr268587766b.25.1733488991789; Fri, 06 Dec 2024 04:43:11 -0800 (PST) Received: from christian-Precision-5550.. (176-138-135-207.abo.bbox.fr. [176.138.135.207]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-aa625e4da51sm236527266b.37.2024.12.06.04.43.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Dec 2024 04:43:10 -0800 (PST) From: Christian Couder To: git@vger.kernel.org Cc: Junio C Hamano , John Cai , Patrick Steinhardt , Taylor Blau , Eric Sunshine , Christian Couder , Christian Couder Subject: [PATCH v3 3/5] Add 'promisor-remote' capability to protocol v2 Date: Fri, 6 Dec 2024 13:42:46 +0100 Message-ID: <20241206124248.160494-4-christian.couder@gmail.com> X-Mailer: git-send-email 2.47.1.402.gc25c94707f In-Reply-To: <20241206124248.160494-1-christian.couder@gmail.com> References: <20240910163000.1985723-1-christian.couder@gmail.com> <20241206124248.160494-1-christian.couder@gmail.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 When a server S knows that some objects from a repository are available from a promisor remote X, S might want to suggest to a client C cloning or fetching the repo from S that C should use X directly instead of S for these objects. Note that this could happen both in the case S itself doesn't have the objects and borrows them from X, and in the case S has the objects but knows that X is better connected to the world (e.g., it is in a $LARGEINTERNETCOMPANY datacenter with petabit/s backbone connections) than S. Implementation of the latter case, which would require S to omit in its response the objects available on X, is left for future improvement though. Then C might or might not, want to get the objects from X, and should let S know about this. To allow S and C to agree and let each other know about C using X or not, let's introduce a new "promisor-remote" capability in the protocol v2, as well as a few new configuration variables: - "promisor.advertise" on the server side, and: - "promisor.acceptFromServer" on the client side. By default, or if "promisor.advertise" is set to 'false', a server S will not advertise the "promisor-remote" capability. If S doesn't advertise the "promisor-remote" capability, then a client C replying to S shouldn't advertise the "promisor-remote" capability either. If "promisor.advertise" is set to 'true', S will advertise its promisor remotes with a string like: promisor-remote=[;]... where each element contains information about a single promisor remote in the form: name=[,url=] where is the urlencoded name of a promisor remote and is the urlencoded URL of the promisor remote named . For now, the URL is passed in addition to the name. In the future, it might be possible to pass other information like a filter-spec that the client should use when cloning from S, or a token that the client should use when retrieving objects from X. It might also be possible in the future for "promisor.advertise" to have other values. For example a value like "onlyName" could prevent S from advertising URLs, which could help in case C should use a different URL for X than the URL S is using. (The URL S is using might be an internal one on the server side for example.) By default or if "promisor.acceptFromServer" is set to "None", C will not accept to use the promisor remotes that might have been advertised by S. In this case, C will not advertise any "promisor-remote" capability in its reply to S. If "promisor.acceptFromServer" is set to "All" and S advertised some promisor remotes, then on the contrary, C will accept to use all the promisor remotes that S advertised and C will reply with a string like: promisor-remote=[;]... where the elements are the urlencoded names of all the promisor remotes S advertised. In a following commit, other values for "promisor.acceptFromServer" will be implemented, so that C will be able to decide the promisor remotes it accepts depending on the name and URL it received from S. So even if that name and URL information is not used much right now, it will be needed soon. Helped-by: Taylor Blau Helped-by: Patrick Steinhardt Signed-off-by: Christian Couder --- Documentation/config/promisor.txt | 17 ++ Documentation/gitprotocol-v2.txt | 54 ++++++ connect.c | 9 + promisor-remote.c | 195 +++++++++++++++++++++ promisor-remote.h | 36 +++- serve.c | 26 +++ t/t5710-promisor-remote-capability.sh | 241 ++++++++++++++++++++++++++ upload-pack.c | 3 + 8 files changed, 580 insertions(+), 1 deletion(-) create mode 100755 t/t5710-promisor-remote-capability.sh diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt index 98c5cb2ec2..9cbfe3e59e 100644 --- a/Documentation/config/promisor.txt +++ b/Documentation/config/promisor.txt @@ -1,3 +1,20 @@ promisor.quiet:: If set to "true" assume `--quiet` when fetching additional objects for a partial clone. + +promisor.advertise:: + If set to "true", a server will use the "promisor-remote" + capability, see linkgit:gitprotocol-v2[5], to advertise the + promisor remotes it is using, if it uses some. Default is + "false", which means the "promisor-remote" capability is not + advertised. + +promisor.acceptFromServer:: + If set to "all", a client will accept all the promisor remotes + a server might advertise using the "promisor-remote" + capability. Default is "none", which means no promisor remote + advertised by a server will be accepted. By accepting a + promisor remote, the client agrees that the server might omit + objects that are lazily fetchable from this promisor remote + from its responses to "fetch" and "clone" requests from the + client. See linkgit:gitprotocol-v2[5]. diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt index 1652fef3ae..f25a9a6ad8 100644 --- a/Documentation/gitprotocol-v2.txt +++ b/Documentation/gitprotocol-v2.txt @@ -781,6 +781,60 @@ retrieving the header from a bundle at the indicated URI, and thus save themselves and the server(s) the request(s) needed to inspect the headers of that bundle or bundles. +promisor-remote= +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The server may advertise some promisor remotes it is using or knows +about to a client which may want to use them as its promisor remotes, +instead of this repository. In this case should be of the +form: + + pr-infos = pr-info | pr-infos ";" pr-info + + pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url + +where `pr-name` is the urlencoded name of a promisor remote, and +`pr-url` the urlencoded URL of that promisor remote. + +In this case, if the client decides to use one or more promisor +remotes the server advertised, it can reply with +"promisor-remote=" where should be of the form: + + pr-names = pr-name | pr-names ";" pr-name + +where `pr-name` is the urlencoded name of a promisor remote the server +advertised and the client accepts. + +Note that, everywhere in this document, `pr-name` MUST be a valid +remote name, and the ';' and ',' characters MUST be encoded if they +appear in `pr-name` or `pr-url`. + +If the server doesn't know any promisor remote that could be good for +a client to use, or prefers a client not to use any promisor remote it +uses or knows about, it shouldn't advertise the "promisor-remote" +capability at all. + +In this case, or if the client doesn't want to use any promisor remote +the server advertised, the client shouldn't advertise the +"promisor-remote" capability at all in its reply. + +The "promisor.advertise" and "promisor.acceptFromServer" configuration +options can be used on the server and client side respectively to +control what they advertise or accept respectively. See the +documentation of these configuration options for more information. + +Note that in the future it would be nice if the "promisor-remote" +protocol capability could be used by the server, when responding to +`git fetch` or `git clone`, to advertise better-connected remotes that +the client can use as promisor remotes, instead of this repository, so +that the client can lazily fetch objects from these other +better-connected remotes. This would require the server to omit in its +response the objects available on the better-connected remotes that +the client has accepted. This hasn't been implemented yet though. So +for now this "promisor-remote" capability is useful only when the +server advertises some promisor remotes it already uses to borrow +objects from. + GIT --- Part of the linkgit:git[1] suite diff --git a/connect.c b/connect.c index 58f53d8dcb..898bf3b438 100644 --- a/connect.c +++ b/connect.c @@ -22,6 +22,7 @@ #include "protocol.h" #include "alias.h" #include "bundle-uri.h" +#include "promisor-remote.h" static char *server_capabilities_v1; static struct strvec server_capabilities_v2 = STRVEC_INIT; @@ -487,6 +488,7 @@ void check_stateless_delimiter(int stateless_rpc, static void send_capabilities(int fd_out, struct packet_reader *reader) { const char *hash_name; + const char *promisor_remote_info; if (server_supports_v2("agent")) packet_write_fmt(fd_out, "agent=%s", git_user_agent_sanitized()); @@ -500,6 +502,13 @@ static void send_capabilities(int fd_out, struct packet_reader *reader) } else { reader->hash_algo = &hash_algos[GIT_HASH_SHA1]; } + if (server_feature_v2("promisor-remote", &promisor_remote_info)) { + char *reply = promisor_remote_reply(promisor_remote_info); + if (reply) { + packet_write_fmt(fd_out, "promisor-remote=%s", reply); + free(reply); + } + } } int get_remote_bundle_uri(int fd_out, struct packet_reader *reader, diff --git a/promisor-remote.c b/promisor-remote.c index 9345ae3db2..ea418c4094 100644 --- a/promisor-remote.c +++ b/promisor-remote.c @@ -11,6 +11,7 @@ #include "strvec.h" #include "packfile.h" #include "environment.h" +#include "url.h" struct promisor_remote_config { struct promisor_remote *promisors; @@ -221,6 +222,18 @@ int repo_has_promisor_remote(struct repository *r) return !!repo_promisor_remote_find(r, NULL); } +int repo_has_accepted_promisor_remote(struct repository *r) +{ + struct promisor_remote *p; + + promisor_remote_init(r); + + for (p = r->promisor_remote_config->promisors; p; p = p->next) + if (p->accepted) + return 1; + return 0; +} + static int remove_fetched_oids(struct repository *repo, struct object_id **oids, int oid_nr, int to_free) @@ -292,3 +305,185 @@ void promisor_remote_get_direct(struct repository *repo, if (to_free) free(remaining_oids); } + +static int allow_unsanitized(char ch) +{ + if (ch == ',' || ch == ';' || ch == '%') + return 0; + return ch > 32 && ch < 127; +} + +static void promisor_info_vecs(struct repository *repo, + struct strvec *names, + struct strvec *urls) +{ + struct promisor_remote *r; + + promisor_remote_init(repo); + + for (r = repo->promisor_remote_config->promisors; r; r = r->next) { + char *url; + char *url_key = xstrfmt("remote.%s.url", r->name); + + strvec_push(names, r->name); + strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url); + + free(url); + free(url_key); + } +} + +char *promisor_remote_info(struct repository *repo) +{ + struct strbuf sb = STRBUF_INIT; + int advertise_promisors = 0; + struct strvec names = STRVEC_INIT; + struct strvec urls = STRVEC_INIT; + + git_config_get_bool("promisor.advertise", &advertise_promisors); + + if (!advertise_promisors) + return NULL; + + promisor_info_vecs(repo, &names, &urls); + + if (!names.nr) + return NULL; + + for (size_t i = 0; i < names.nr; i++) { + if (i) + strbuf_addch(&sb, ';'); + strbuf_addstr(&sb, "name="); + strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized); + if (urls.v[i]) { + strbuf_addstr(&sb, ",url="); + strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized); + } + } + + strbuf_sanitize(&sb); + + strvec_clear(&names); + strvec_clear(&urls); + + return strbuf_detach(&sb, NULL); +} + +enum accept_promisor { + ACCEPT_NONE = 0, + ACCEPT_ALL +}; + +static int should_accept_remote(enum accept_promisor accept, + const char *remote_name UNUSED, + const char *remote_url UNUSED) +{ + if (accept == ACCEPT_ALL) + return 1; + + BUG("Unhandled 'enum accept_promisor' value '%d'", accept); +} + +static void filter_promisor_remote(struct strvec *accepted, const char *info) +{ + struct strbuf **remotes; + const char *accept_str; + enum accept_promisor accept = ACCEPT_NONE; + + if (!git_config_get_string_tmp("promisor.acceptfromserver", &accept_str)) { + if (!accept_str || !*accept_str || !strcasecmp("None", accept_str)) + accept = ACCEPT_NONE; + else if (!strcasecmp("All", accept_str)) + accept = ACCEPT_ALL; + else + warning(_("unknown '%s' value for '%s' config option"), + accept_str, "promisor.acceptfromserver"); + } + + if (accept == ACCEPT_NONE) + return; + + /* Parse remote info received */ + + remotes = strbuf_split_str(info, ';', 0); + + for (size_t i = 0; remotes[i]; i++) { + struct strbuf **elems; + const char *remote_name = NULL; + const char *remote_url = NULL; + char *decoded_name = NULL; + char *decoded_url = NULL; + + strbuf_trim_trailing_ch(remotes[i], ';'); + elems = strbuf_split_str(remotes[i]->buf, ',', 0); + + for (size_t j = 0; elems[j]; j++) { + int res; + strbuf_trim_trailing_ch(elems[j], ','); + res = skip_prefix(elems[j]->buf, "name=", &remote_name) || + skip_prefix(elems[j]->buf, "url=", &remote_url); + if (!res) + warning(_("unknown element '%s' from remote info"), + elems[j]->buf); + } + + if (remote_name) + decoded_name = url_percent_decode(remote_name); + if (remote_url) + decoded_url = url_percent_decode(remote_url); + + if (decoded_name && should_accept_remote(accept, decoded_name, decoded_url)) + strvec_push(accepted, decoded_name); + + strbuf_list_free(elems); + free(decoded_name); + free(decoded_url); + } + + strbuf_list_free(remotes); +} + +char *promisor_remote_reply(const char *info) +{ + struct strvec accepted = STRVEC_INIT; + struct strbuf reply = STRBUF_INIT; + + filter_promisor_remote(&accepted, info); + + if (!accepted.nr) + return NULL; + + for (size_t i = 0; i < accepted.nr; i++) { + if (i) + strbuf_addch(&reply, ';'); + strbuf_addstr_urlencode(&reply, accepted.v[i], allow_unsanitized); + } + + strvec_clear(&accepted); + + return strbuf_detach(&reply, NULL); +} + +void mark_promisor_remotes_as_accepted(struct repository *r, const char *remotes) +{ + struct strbuf **accepted_remotes = strbuf_split_str(remotes, ';', 0); + + for (size_t i = 0; accepted_remotes[i]; i++) { + struct promisor_remote *p; + char *decoded_remote; + + strbuf_trim_trailing_ch(accepted_remotes[i], ';'); + decoded_remote = url_percent_decode(accepted_remotes[i]->buf); + + p = repo_promisor_remote_find(r, decoded_remote); + if (p) + p->accepted = 1; + else + warning(_("accepted promisor remote '%s' not found"), + decoded_remote); + + free(decoded_remote); + } + + strbuf_list_free(accepted_remotes); +} diff --git a/promisor-remote.h b/promisor-remote.h index 88cb599c39..814ca248c7 100644 --- a/promisor-remote.h +++ b/promisor-remote.h @@ -9,11 +9,13 @@ struct object_id; * Promisor remote linked list * * Information in its fields come from remote.XXX config entries or - * from extensions.partialclone. + * from extensions.partialclone, except for 'accepted' which comes + * from protocol v2 capabilities exchange. */ struct promisor_remote { struct promisor_remote *next; char *partial_clone_filter; + unsigned int accepted : 1; const char name[FLEX_ARRAY]; }; @@ -32,4 +34,36 @@ void promisor_remote_get_direct(struct repository *repo, const struct object_id *oids, int oid_nr); +/* + * Prepare a "promisor-remote" advertisement by a server. + * Check the value of "promisor.advertise" and maybe the configured + * promisor remotes, if any, to prepare information to send in an + * advertisement. + * Return value is NULL if no promisor remote advertisement should be + * made. Otherwise it contains the names and urls of the advertised + * promisor remotes separated by ';' + */ +char *promisor_remote_info(struct repository *repo); + +/* + * Prepare a reply to a "promisor-remote" advertisement from a server. + * Check the value of "promisor.acceptfromserver" and maybe the + * configured promisor remotes, if any, to prepare the reply. + * Return value is NULL if no promisor remote from the server + * is accepted. Otherwise it contains the names of the accepted promisor + * remotes separated by ';'. + */ +char *promisor_remote_reply(const char *info); + +/* + * Set the 'accepted' flag for some promisor remotes. Useful when some + * promisor remotes have been accepted by the client. + */ +void mark_promisor_remotes_as_accepted(struct repository *repo, const char *remotes); + +/* + * Has any promisor remote been accepted by the client? + */ +int repo_has_accepted_promisor_remote(struct repository *r); + #endif /* PROMISOR_REMOTE_H */ diff --git a/serve.c b/serve.c index d674764a25..5a40a7abb7 100644 --- a/serve.c +++ b/serve.c @@ -12,6 +12,7 @@ #include "upload-pack.h" #include "bundle-uri.h" #include "trace2.h" +#include "promisor-remote.h" static int advertise_sid = -1; static int advertise_object_info = -1; @@ -31,6 +32,26 @@ static int agent_advertise(struct repository *r UNUSED, return 1; } +static int promisor_remote_advertise(struct repository *r, + struct strbuf *value) +{ + if (value) { + char *info = promisor_remote_info(r); + if (!info) + return 0; + strbuf_addstr(value, info); + free(info); + } + return 1; +} + +static void promisor_remote_receive(struct repository *r, + const char *remotes) +{ + mark_promisor_remotes_as_accepted(r, remotes); +} + + static int object_format_advertise(struct repository *r, struct strbuf *value) { @@ -157,6 +178,11 @@ static struct protocol_capability capabilities[] = { .advertise = bundle_uri_advertise, .command = bundle_uri_command, }, + { + .name = "promisor-remote", + .advertise = promisor_remote_advertise, + .receive = promisor_remote_receive, + }, }; void protocol_v2_advertise_capabilities(void) diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh new file mode 100755 index 0000000000..000cb4c0f6 --- /dev/null +++ b/t/t5710-promisor-remote-capability.sh @@ -0,0 +1,241 @@ +#!/bin/sh + +test_description='handling of promisor remote advertisement' + +. ./test-lib.sh + +# Setup the repository with three commits, this way HEAD is always +# available and we can hide commit 1 or 2. +test_expect_success 'setup: create "template" repository' ' + git init template && + test_commit -C template 1 && + test_commit -C template 2 && + test_commit -C template 3 && + test-tool genrandom foo 10240 >template/foo && + git -C template add foo && + git -C template commit -m foo +' + +# A bare repo will act as a server repo with unpacked objects. +test_expect_success 'setup: create bare "server" repository' ' + git clone --bare --no-local template server && + mv server/objects/pack/pack-* . && + packfile=$(ls pack-*.pack) && + git -C server unpack-objects --strict <"$packfile" +' + +check_missing_objects () { + git -C "$1" rev-list --objects --all --missing=print > all.txt && + perl -ne 'print if s/^[?]//' all.txt >missing.txt && + test_line_count = "$2" missing.txt && + if test "$2" -lt 2 + then + test "$3" = "$(cat missing.txt)" + else + test -f "$3" && + sort <"$3" >expected_sorted && + sort actual_sorted && + test_cmp expected_sorted actual_sorted + fi +} + +initialize_server () { + count="$1" + missing_oids="$2" + + # Repack everything first + git -C server -c repack.writebitmaps=false repack -a -d && + + # Remove promisor file in case they exist, useful when reinitializing + rm -rf server/objects/pack/*.promisor && + + # Repack without the largest object and create a promisor pack on server + git -C server -c repack.writebitmaps=false repack -a -d \ + --filter=blob:limit=5k --filter-to="$(pwd)/pack" && + promisor_file=$(ls server/objects/pack/*.pack | sed "s/\.pack/.promisor/") && + >"$promisor_file" && + + # Check objects missing on the server + check_missing_objects server "$count" "$missing_oids" +} + +copy_to_server2 () { + oid_path="$(test_oid_to_path $1)" && + path="server/objects/$oid_path" && + path2="server2/objects/$oid_path" && + mkdir -p $(dirname "$path2") && + cp "$path" "$path2" +} + +test_expect_success "setup for testing promisor remote advertisement" ' + # Create another bare repo called "server2" + git init --bare server2 && + + # Copy the largest object from server to server2 + obj="HEAD:foo" && + oid="$(git -C server rev-parse $obj)" && + copy_to_server2 "$oid" && + + initialize_server 1 "$oid" && + + # Configure server2 as promisor remote for server + git -C server remote add server2 "file://$(pwd)/server2" && + git -C server config remote.server2.promisor true && + + git -C server2 config uploadpack.allowFilter true && + git -C server2 config uploadpack.allowAnySHA1InWant true && + git -C server config uploadpack.allowFilter true && + git -C server config uploadpack.allowAnySHA1InWant true +' + +test_expect_success "clone with promisor.advertise set to 'true'" ' + git -C server config promisor.advertise true && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.server2.url="file://$(pwd)/server2" \ + -c promisor.acceptfromserver=All \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is still missing on the server + check_missing_objects server 1 "$oid" +' + +test_expect_success "clone with promisor.advertise set to 'false'" ' + git -C server config promisor.advertise false && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.server2.url="file://$(pwd)/server2" \ + -c promisor.acceptfromserver=All \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is not missing on the server + check_missing_objects server 0 "" && + + # Reinitialize server so that the largest object is missing again + initialize_server 1 "$oid" +' + +test_expect_success "clone with promisor.acceptfromserver set to 'None'" ' + git -C server config promisor.advertise true && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.server2.url="file://$(pwd)/server2" \ + -c promisor.acceptfromserver=None \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is not missing on the server + check_missing_objects server 0 "" && + + # Reinitialize server so that the largest object is missing again + initialize_server 1 "$oid" +' + +test_expect_success "init + fetch with promisor.advertise set to 'true'" ' + git -C server config promisor.advertise true && + + test_when_finished "rm -rf client" && + mkdir client && + git -C client init && + git -C client config remote.server2.promisor true && + git -C client config remote.server2.fetch "+refs/heads/*:refs/remotes/server2/*" && + git -C client config remote.server2.url "file://$(pwd)/server2" && + git -C client config remote.server.url "file://$(pwd)/server" && + git -C client config remote.server.fetch "+refs/heads/*:refs/remotes/server/*" && + git -C client config promisor.acceptfromserver All && + GIT_NO_LAZY_FETCH=0 git -C client fetch --filter="blob:limit=5k" server && + + # Check that the largest object is still missing on the server + check_missing_objects server 1 "$oid" +' + +test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" ' + git -C server config promisor.advertise true && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.server2.url="file://$(pwd)/server2" \ + -c promisor.acceptfromserver=All \ + --no-local --filter="blob:limit=5k" server client && + + # Check that the largest object is still missing on the server + check_missing_objects server 1 "$oid" +' + +test_expect_success "setup for subsequent fetches" ' + # Generate new commit with large blob + test-tool genrandom bar 10240 >template/bar && + git -C template add bar && + git -C template commit -m bar && + + # Fetch new commit with large blob + git -C server fetch origin && + git -C server update-ref HEAD FETCH_HEAD && + git -C server rev-parse HEAD >expected_head && + + # Repack everything twice and remove .promisor files before + # each repack. This makes sure everything gets repacked + # into a single packfile. The second repack is necessary + # because the first one fetches from server2 and creates a new + # packfile and its associated .promisor file. + + rm -f server/objects/pack/*.promisor && + git -C server -c repack.writebitmaps=false repack -a -d && + rm -f server/objects/pack/*.promisor && + git -C server -c repack.writebitmaps=false repack -a -d && + + # Unpack everything + rm pack-* && + mv server/objects/pack/pack-* . && + packfile=$(ls pack-*.pack) && + git -C server unpack-objects --strict <"$packfile" && + + # Copy new large object to server2 + obj_bar="HEAD:bar" && + oid_bar="$(git -C server rev-parse $obj_bar)" && + copy_to_server2 "$oid_bar" && + + # Reinitialize server so that the 2 largest objects are missing + printf "%s\n" "$oid" "$oid_bar" >expected_missing.txt && + initialize_server 2 expected_missing.txt && + + # Create one more client + cp -r client client2 +' + +test_expect_success "subsequent fetch from a client when promisor.advertise is true" ' + git -C server config promisor.advertise true && + + GIT_NO_LAZY_FETCH=0 git -C client pull origin && + + git -C client rev-parse HEAD >actual && + test_cmp expected_head actual && + + cat client/bar >/dev/null && + + check_missing_objects server 2 expected_missing.txt +' + +test_expect_success "subsequent fetch from a client when promisor.advertise is false" ' + git -C server config promisor.advertise false && + + GIT_NO_LAZY_FETCH=0 git -C client2 pull origin && + + git -C client2 rev-parse HEAD >actual && + test_cmp expected_head actual && + + cat client2/bar >/dev/null && + + check_missing_objects server 1 "$oid" +' + +test_done diff --git a/upload-pack.c b/upload-pack.c index 43006c0614..c6550a8d51 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -31,6 +31,7 @@ #include "write-or-die.h" #include "json-writer.h" #include "strmap.h" +#include "promisor-remote.h" /* Remember to update object flag allocation in object.h */ #define THEY_HAVE (1u << 11) @@ -318,6 +319,8 @@ static void create_pack_file(struct upload_pack_data *pack_data, strvec_push(&pack_objects.args, "--delta-base-offset"); if (pack_data->use_include_tag) strvec_push(&pack_objects.args, "--include-tag"); + if (repo_has_accepted_promisor_remote(the_repository)) + strvec_push(&pack_objects.args, "--missing=allow-promisor"); if (pack_data->filter_options.choice) { const char *spec = expand_list_objects_filter_spec(&pack_data->filter_options); From patchwork Fri Dec 6 12:42:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Couder X-Patchwork-Id: 13897175 Received: from mail-ed1-f47.google.com (mail-ed1-f47.google.com [209.85.208.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7FC0B207675 for ; Fri, 6 Dec 2024 12:43:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.47 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733488997; cv=none; b=uk5UQ+TvdnMtNXtkVk+dPbPj9jAQuUR3IWomLVRvl1/aqQ07srnyGFXoNitHjq2u1FQQ11STWoYtCQHfKOzimEIfogZR+Nu1ePKARDoHHIHeVg6lYbDJS1s9J38Qg1axiL0zCAf7CTgshVUPL8q16BiVDAJzX7xVynTK0+VL53I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733488997; c=relaxed/simple; bh=Pw2miDFzsRu+dAJ/e8KYKWRRA55H+/Fq7pZfw4D6m1Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VMF2DDcQ0zoqm36vhRl7/EuLLuvfCR4q+nATteyl4L/pHZ5hhVWwe9bA3B/r2FjG/L0hiuyaSy5o15ln1i1kG119y9pHQMWx1exjpT3moGiYe1ZMCx1MuAFd7CFaAzs3IsQY5eQK7oPSpLvAagV9mKIb6xRMuxW7eJhCLEJmrMI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=GeUAYwSo; arc=none smtp.client-ip=209.85.208.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="GeUAYwSo" Received: by mail-ed1-f47.google.com with SMTP id 4fb4d7f45d1cf-5d3d2a30afcso377191a12.3 for ; Fri, 06 Dec 2024 04:43:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733488993; x=1734093793; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2hWfFOoMeiINHdqOPgLH7F2YzWjOHQKzRfyIVzr4PNQ=; b=GeUAYwSoeEJk+2Lx8e7OeFYED/b/kghmWRi2iq/SJpUykOT0OM9caySUR02VaYwO+1 eStha4RplhnbJqD31rpDpsKZxseMa7YIKiRPe/aSUNhPpa1/f8yOWy/51ckVECCF9a03 0nMsSu8R4ph0eUauyOEkmwoltcH+aqFXwt60BE6p2jw9tBd8rkI+f2fiyG5+vbGq22Hd lCWkbXVjAfExNdTLSi/P0z+colZF2a3NjfrJb00vfJYyArorBi9w7ssp+ZM2LX+B5bzO DkxEUCLgqx5IUn7iZhKYu3iWbc1cUER05i8GbIWxmqUNO3A4fXeEny05XoJvBI8IbQwA A9FQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733488993; x=1734093793; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2hWfFOoMeiINHdqOPgLH7F2YzWjOHQKzRfyIVzr4PNQ=; b=S7JrrWFT6o8ni/UULmE8fKbljqpQRfYWYNaAWYArqRRef3oWa36nLZf0Z4Yqb/aHoh k61RKhhdcJku/SKA3iFA1U2M6ax03e0nvnL3R4A8uZw12KmOTbgqYAaqji0PS0qF0q4O 2xWGG5mRmYVk8AGeL5bNeVt6wpri1CDF6MbH/LVjtSxoigomaW//oTnCEBgbhIpO/bsP ywz2MKVC5OpNa1x8aWA+CwqKtoSimXZEZuCK5aG0xfpp4EgPlmD5PSePn6Q8/ooDtUiZ k5xmWjyjF9c8Lw0xRHMZ+4BumjBE8NhZ47yme+qLz1sardOgCwzoLY7iZ+2u2/tDJFyM L7FQ== X-Gm-Message-State: AOJu0YzN2lM6fyMLC7Gca3Lo6RhluvsxP0kVj8sbWgtzogLp6IpXDIQ5 f4qntDMwsCklB5n6y0wOLSoEtQ+T6XbZZzwrRH1Hivo+F+t65FSbwKGLlg== X-Gm-Gg: ASbGnctHzpKI7ZItBSG5cCEociF5uL/oEGouB4oKfgpRNc7bV6YnCmJSv9g01yf3S2H 0hD+uSuzwpMtbKjfhv2rlBJwbDzegfbV7iZMeEiJ1TWVZ3ccjbyjUjQAbyruLyfU3N/b6Fvve3s WJYRs5+cgS7YzKrKn/pS+zpcwxAvsrU4zPn4vDt6nFvgy5z6ikfiXpgGKn2V46x/+CXUZfdjLa0 PXSFkppui3BI5QhgDq3xwxKQ60zBChHyf4uR902jo65CnslaqfbxibQnwGiX959xf2tkd0Bi094 +AR1DrCAYm3jhUdSN/oRqo2hTzFdrpmS X-Google-Smtp-Source: AGHT+IGN1mp8JpbM19sQWzUbSXR9vLhyGssv+qW7MM/t055XynbiZAxMpmHuowEYUKV8Azc3dcczbw== X-Received: by 2002:a17:906:23ea:b0:aa5:63a1:17cf with SMTP id a640c23a62f3a-aa639fecd9amr240695366b.20.1733488992775; Fri, 06 Dec 2024 04:43:12 -0800 (PST) Received: from christian-Precision-5550.. (176-138-135-207.abo.bbox.fr. [176.138.135.207]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-aa625e4da51sm236527266b.37.2024.12.06.04.43.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Dec 2024 04:43:12 -0800 (PST) From: Christian Couder To: git@vger.kernel.org Cc: Junio C Hamano , John Cai , Patrick Steinhardt , Taylor Blau , Eric Sunshine , Christian Couder , Christian Couder Subject: [PATCH v3 4/5] promisor-remote: check advertised name or URL Date: Fri, 6 Dec 2024 13:42:47 +0100 Message-ID: <20241206124248.160494-5-christian.couder@gmail.com> X-Mailer: git-send-email 2.47.1.402.gc25c94707f In-Reply-To: <20241206124248.160494-1-christian.couder@gmail.com> References: <20240910163000.1985723-1-christian.couder@gmail.com> <20241206124248.160494-1-christian.couder@gmail.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 A previous commit introduced a "promisor.acceptFromServer" configuration variable with only "None" or "All" as valid values. Let's introduce "KnownName" and "KnownUrl" as valid values for this configuration option to give more choice to a client about which promisor remotes it might accept among those that the server advertised. In case of "KnownName", the client will accept promisor remotes which are already configured on the client and have the same name as those advertised by the client. This could be useful in a corporate setup where servers and clients are trusted to not switch names and URLs, but where some kind of control is still useful. In case of "KnownUrl", the client will accept promisor remotes which have both the same name and the same URL configured on the client as the name and URL advertised by the server. This is the most secure option, so it should be used if possible. Signed-off-by: Christian Couder --- Documentation/config/promisor.txt | 22 ++++++--- promisor-remote.c | 60 ++++++++++++++++++++--- t/t5710-promisor-remote-capability.sh | 68 +++++++++++++++++++++++++++ 3 files changed, 138 insertions(+), 12 deletions(-) diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt index 9cbfe3e59e..d1364bc018 100644 --- a/Documentation/config/promisor.txt +++ b/Documentation/config/promisor.txt @@ -12,9 +12,19 @@ promisor.advertise:: promisor.acceptFromServer:: If set to "all", a client will accept all the promisor remotes a server might advertise using the "promisor-remote" - capability. Default is "none", which means no promisor remote - advertised by a server will be accepted. By accepting a - promisor remote, the client agrees that the server might omit - objects that are lazily fetchable from this promisor remote - from its responses to "fetch" and "clone" requests from the - client. See linkgit:gitprotocol-v2[5]. + capability. If set to "knownName" the client will accept + promisor remotes which are already configured on the client + and have the same name as those advertised by the client. This + is not very secure, but could be used in a corporate setup + where servers and clients are trusted to not switch name and + URLs. If set to "knownUrl", the client will accept promisor + remotes which have both the same name and the same URL + configured on the client as the name and URL advertised by the + server. This is more secure than "all" or "knownUrl", so it + should be used if possible instead of those options. Default + is "none", which means no promisor remote advertised by a + server will be accepted. By accepting a promisor remote, the + client agrees that the server might omit objects that are + lazily fetchable from this promisor remote from its responses + to "fetch" and "clone" requests from the client. See + linkgit:gitprotocol-v2[5]. diff --git a/promisor-remote.c b/promisor-remote.c index ea418c4094..b72d539c19 100644 --- a/promisor-remote.c +++ b/promisor-remote.c @@ -369,30 +369,73 @@ char *promisor_remote_info(struct repository *repo) return strbuf_detach(&sb, NULL); } +/* + * Find first index of 'vec' where there is 'val'. 'val' is compared + * case insensively to the strings in 'vec'. If not found 'vec->nr' is + * returned. + */ +static size_t strvec_find_index(struct strvec *vec, const char *val) +{ + for (size_t i = 0; i < vec->nr; i++) + if (!strcasecmp(vec->v[i], val)) + return i; + return vec->nr; +} + enum accept_promisor { ACCEPT_NONE = 0, + ACCEPT_KNOWN_URL, + ACCEPT_KNOWN_NAME, ACCEPT_ALL }; static int should_accept_remote(enum accept_promisor accept, - const char *remote_name UNUSED, - const char *remote_url UNUSED) + const char *remote_name, const char *remote_url, + struct strvec *names, struct strvec *urls) { + size_t i; + if (accept == ACCEPT_ALL) return 1; - BUG("Unhandled 'enum accept_promisor' value '%d'", accept); + i = strvec_find_index(names, remote_name); + + if (i >= names->nr) + /* We don't know about that remote */ + return 0; + + if (accept == ACCEPT_KNOWN_NAME) + return 1; + + if (accept != ACCEPT_KNOWN_URL) + BUG("Unhandled 'enum accept_promisor' value '%d'", accept); + + if (!strcasecmp(urls->v[i], remote_url)) + return 1; + + warning(_("known remote named '%s' but with url '%s' instead of '%s'"), + remote_name, urls->v[i], remote_url); + + return 0; } -static void filter_promisor_remote(struct strvec *accepted, const char *info) +static void filter_promisor_remote(struct repository *repo, + struct strvec *accepted, + const char *info) { struct strbuf **remotes; const char *accept_str; enum accept_promisor accept = ACCEPT_NONE; + struct strvec names = STRVEC_INIT; + struct strvec urls = STRVEC_INIT; if (!git_config_get_string_tmp("promisor.acceptfromserver", &accept_str)) { if (!accept_str || !*accept_str || !strcasecmp("None", accept_str)) accept = ACCEPT_NONE; + else if (!strcasecmp("KnownUrl", accept_str)) + accept = ACCEPT_KNOWN_URL; + else if (!strcasecmp("KnownName", accept_str)) + accept = ACCEPT_KNOWN_NAME; else if (!strcasecmp("All", accept_str)) accept = ACCEPT_ALL; else @@ -403,6 +446,9 @@ static void filter_promisor_remote(struct strvec *accepted, const char *info) if (accept == ACCEPT_NONE) return; + if (accept != ACCEPT_ALL) + promisor_info_vecs(repo, &names, &urls); + /* Parse remote info received */ remotes = strbuf_split_str(info, ';', 0); @@ -432,7 +478,7 @@ static void filter_promisor_remote(struct strvec *accepted, const char *info) if (remote_url) decoded_url = url_percent_decode(remote_url); - if (decoded_name && should_accept_remote(accept, decoded_name, decoded_url)) + if (decoded_name && should_accept_remote(accept, decoded_name, decoded_url, &names, &urls)) strvec_push(accepted, decoded_name); strbuf_list_free(elems); @@ -440,6 +486,8 @@ static void filter_promisor_remote(struct strvec *accepted, const char *info) free(decoded_url); } + strvec_clear(&names); + strvec_clear(&urls); strbuf_list_free(remotes); } @@ -448,7 +496,7 @@ char *promisor_remote_reply(const char *info) struct strvec accepted = STRVEC_INIT; struct strbuf reply = STRBUF_INIT; - filter_promisor_remote(&accepted, info); + filter_promisor_remote(the_repository, &accepted, info); if (!accepted.nr) return NULL; diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh index 000cb4c0f6..483cc8e16d 100755 --- a/t/t5710-promisor-remote-capability.sh +++ b/t/t5710-promisor-remote-capability.sh @@ -157,6 +157,74 @@ test_expect_success "init + fetch with promisor.advertise set to 'true'" ' check_missing_objects server 1 "$oid" ' +test_expect_success "clone with promisor.acceptfromserver set to 'KnownName'" ' + git -C server config promisor.advertise true && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.server2.url="file://$(pwd)/server2" \ + -c promisor.acceptfromserver=KnownName \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is still missing on the server + check_missing_objects server 1 "$oid" +' + +test_expect_success "clone with 'KnownName' and different remote names" ' + git -C server config promisor.advertise true && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.serverTwo.promisor=true \ + -c remote.serverTwo.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.serverTwo.url="file://$(pwd)/server2" \ + -c promisor.acceptfromserver=KnownName \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is not missing on the server + check_missing_objects server 0 "" && + + # Reinitialize server so that the largest object is missing again + initialize_server 1 "$oid" +' + +test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" ' + git -C server config promisor.advertise true && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.server2.url="file://$(pwd)/server2" \ + -c promisor.acceptfromserver=KnownUrl \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is still missing on the server + check_missing_objects server 1 "$oid" +' + +test_expect_success "clone with 'KnownUrl' and different remote urls" ' + ln -s server2 serverTwo && + + git -C server config promisor.advertise true && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.server2.url="file://$(pwd)/serverTwo" \ + -c promisor.acceptfromserver=KnownUrl \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is not missing on the server + check_missing_objects server 0 "" && + + # Reinitialize server so that the largest object is missing again + initialize_server 1 "$oid" +' + test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" ' git -C server config promisor.advertise true && From patchwork Fri Dec 6 12:42:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Couder X-Patchwork-Id: 13897176 Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B647203713 for ; Fri, 6 Dec 2024 12:43:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.49 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733489002; cv=none; b=RKIobJBkIakaAXP1IFhkK5fCKHETAKEr5EJhvOaHRwvE+ahDeYrDjsU37ESyCGHYCyR18ECYG3jH0K6Vnnfi/25QANa73jgH/kBUVsZUIlRj+PrqnyLE/hnacC3LzRs89vBtO5U7v/5YZMsyaGNSN7BauMrjjoQTmdHlbE8blac= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733489002; c=relaxed/simple; bh=wFX7R3seHkyzoaPdmaIobcrtgFRkvQs03PGNSGAscDw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ixlc90UYu6z6SWwOevVQADQe6U34DzfEMbNqgZGuPY3k2JKegAkpdBdD0lnSub3hJKJGxzhPu/xBhIkbfuSUeG7Hq8VhEXEXL/6v3XKc6xYf6J28d03k9D0BwXtPBiSaF6rAZGdS1Ls3drbcb/6sk44yiIn7AfG+oEkV4IJTrRM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=mN6VEMGO; arc=none smtp.client-ip=209.85.218.49 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="mN6VEMGO" Received: by mail-ej1-f49.google.com with SMTP id a640c23a62f3a-aa549d9dffdso291595366b.2 for ; Fri, 06 Dec 2024 04:43:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733488995; x=1734093795; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=cBJJNs1F1/hM2227V4WOWjzDcNf9tDld0JX40xnGRkE=; b=mN6VEMGOeNEBZVZ7COxwa+Ul5hBq1fFTY9Eg8C/j90V4lAyRetGMO7DPj+UmMbCZiT 8aBO09w+D/Vz5ZlQzs0QNopz3n+M4MMcKPIc36gsYeM5LejPIfrx/8Pq2x9DskYwUrSD QlHM2QOkgTo/kqravtBCCGrSl+bEp8RS+PEV5U29m8uX4zW5L0g0DsjZaP4ox+EIccSl /jhhhmdINsxSFo6lYaIRFO4QzkC8LLdZc37+FjEvCiLRMxjy92T/wdBWpiycnTIwqQ6l AuP2Clo3ExE0CBBUQKB8MM2pwYn8vafBh3yObGE+ZHov/G76ag7FdSofp55Ba91YQ2NQ B73g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733488995; x=1734093795; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cBJJNs1F1/hM2227V4WOWjzDcNf9tDld0JX40xnGRkE=; b=GrS9iLA0q7c9L+LDjU3gVXxuBHAAglQuG7aYdOxzejvyqUkbnuGJGy48lCYe4+IG0X q87b54KgSS7TBhCbpEke1a+T+JBk9V3fRnbx9qq5bDwrfVwxHT0UcSPtLhH0wZyj0uo5 n/PVwHGI9fW6bdFr4iAwZuxOdktMqP8zJQElDJt14bfeaoDW9DzGohdHiyZeJ7D8YZ+R 5elIYPSAn9I+3c7volEN/UA8/UsIAgl5W5NnEBN0hFM6m6Lt0xfA7ZuW7ljtiEJHr1GC mvkOLvJr1l/hdeoRr5XJ9CitC7j13EQwJIOahDBcitt19+L4k6TBRzscboTMTKpCsCFg tOZw== X-Gm-Message-State: AOJu0YyhtaW77bf41GZK+Gscg6ufbmeAfSk+Gx0tWPGhRnJUOxmPuPQ0 TuIOEkizCeE/zzIMudr8t17Fbl27CgK+Opxh3crvWle4Q0I8CqvqCCH3hg== X-Gm-Gg: ASbGncsAh/cKIIIhysWKMSZXJyJ6qvpzqxeJ+Guuw+4vhX8kkfggdwOBNhHvwL4Vu3M fmAjEAmhYUojPcOk9tQPgQ8rR9nD60hOU9Zi0rACCmRAitBYWRPJf2BOxp7+INdlNcmK3cTdUoa y+V30Lru2xNil0W5qw/mUJvIw+AnzXGdhExrsZuz+Q55xtzznaafueMRogXzwyr3wfcyGXna4t2 Bpbq/Uv1z9+5iHbXn0DWXKwf0wcUCLO6jCYgVooWA3IF4XNspdaV4cciF0iQ4RKRVjXTOSXejBF 47fszqdwCS/PaNecQ0ZHR9Xsekrmi/MD X-Google-Smtp-Source: AGHT+IExEKQgX7dXSQDum0UWp3IhV/krwSOjNRcwbb+SSXbTJgat0Eru9R5BVognViJXgv4sHSDoWQ== X-Received: by 2002:a17:906:3096:b0:aa5:274b:60e4 with SMTP id a640c23a62f3a-aa63a088bc4mr176478366b.29.1733488994751; Fri, 06 Dec 2024 04:43:14 -0800 (PST) Received: from christian-Precision-5550.. (176-138-135-207.abo.bbox.fr. [176.138.135.207]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-aa625e4da51sm236527266b.37.2024.12.06.04.43.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Dec 2024 04:43:13 -0800 (PST) From: Christian Couder To: git@vger.kernel.org Cc: Junio C Hamano , John Cai , Patrick Steinhardt , Taylor Blau , Eric Sunshine , Christian Couder , Christian Couder Subject: [PATCH v3 5/5] doc: add technical design doc for large object promisors Date: Fri, 6 Dec 2024 13:42:48 +0100 Message-ID: <20241206124248.160494-6-christian.couder@gmail.com> X-Mailer: git-send-email 2.47.1.402.gc25c94707f In-Reply-To: <20241206124248.160494-1-christian.couder@gmail.com> References: <20240910163000.1985723-1-christian.couder@gmail.com> <20241206124248.160494-1-christian.couder@gmail.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Let's add a design doc about how we could improve handling liarge blobs using "Large Object Promisors" (LOPs). It's a set of features with the goal of using special dedicated promisor remotes to store large blobs, and having them accessed directly by main remotes and clients. Signed-off-by: Christian Couder --- .../technical/large-object-promisors.txt | 530 ++++++++++++++++++ 1 file changed, 530 insertions(+) create mode 100644 Documentation/technical/large-object-promisors.txt diff --git a/Documentation/technical/large-object-promisors.txt b/Documentation/technical/large-object-promisors.txt new file mode 100644 index 0000000000..267c65b0d5 --- /dev/null +++ b/Documentation/technical/large-object-promisors.txt @@ -0,0 +1,530 @@ +Large Object Promisors +====================== + +Since Git has been created, users have been complaining about issues +with storing large files in Git. Some solutions have been created to +help, but they haven't helped much with some issues. + +Git currently supports multiple promisor remotes, which could help +with some of these remaining issues, but it's very hard to use them to +help, because a number of important features are missing. + +The goal of the effort described in this document is to add these +important features. + +We will call a "Large Object Promisor", or "LOP" in short, a promisor +remote which is used to store only large blobs and which is separate +from the main remote that should store the other Git objects and the +rest of the repos. + +By extension, we will also call "Large Object Promisor", or LOP, the +effort described in this document to add a set of features to make it +easier to handle large blobs/files in Git by using LOPs. + +This effort would especially improve things on the server side, and +especially for large blobs that are already compressed in a binary +format. + +This effort could help provide an alternative to Git LFS +(https://git-lfs.com/) and similar tools like git-annex +(https://git-annex.branchable.com/) for handling large files, even +though a complete alternative would very likely require other efforts +especially on the client side, where it would likely help to implement +a new object representation for large blobs as discussed in: + +https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/ + +0) Non goals +------------ + +- We will not discuss those client side improvements here, as they + would require changes in different parts of Git than this effort. ++ +So we don't pretend to fully replace Git LFS with only this effort, +but we nevertheless believe that it can significantly improve the +current situation on the server side, and that other separate +efforts could also improve the situation on the client side. + +- In the same way, we are not going to discuss all the possible ways + to implement a LOP or their underlying object storage. ++ +In particular we are not going to discuss pluggable ODBs or other +object database backends that could chunk large blobs, dedup the +chunks and store them efficiently. Sure, that would be a nice +improvement to store large blobs on the server side, but we believe +it can just be a separate effort as it's also not technically very +related to this effort. + +In other words, the goal of this document is not to talk about all the +possible ways to optimize how Git could handle large blobs, but to +describe how a LOP based solution could work well and alleviate a +number of current issues in the context of Git clients and servers +sharing Git objects. + +I) Issues with the current situation +------------------------------------ + +- Some statistics made on GitLab repos have shown that more than 75% + of the disk space is used by blobs that are larger than 1MB and + often in a binary format. + +- So even if users could use Git LFS or similar tools to store a lot + of large blobs out of their repos, it's a fact that in practice they + don't do it as much as they probably should. + +- On the server side ideally, the server should be able to decide for + itself how it stores things. It should not depend on users deciding + to use tools like Git LFS on some blobs or not. + +- It's much more expensive to store large blobs that don't delta + compress well on regular fast seeking drives (like SSDs) than on + object storage (like Amazon S3 or GCP Buckets). Using fast drives + for regular Git repos makes sense though, as serving regular Git + content (blobs containing text or code) needs drives where seeking + is fast, but the content is relatively small. On the other hand, + object storage for Git LFS blobs makes sense as seeking speed is not + as important when dealing with large files, while costs are more + important. So the fact that users don't use Git LFS or similar tools + for a significant number of large blobs has likely some bad + consequences on the cost of repo storage for most Git hosting + platforms. + +- Having large blobs handled in the same way as other blobs and Git + objects in Git repos instead of on object storage also has a cost in + increased memory and CPU usage, and therefore decreased performance, + when creating packfiles. (This is because Git tries to use delta + compression or zlib compression which is unlikely to work well on + already compressed binary content.) So it's not just a storage cost + increase. + +- When a large blob has been committed into a repo, it might not be + possible to remove this blob from the repo without rewriting + history, even if the user then decides to use Git LFS or a similar + tool to handle it. + +- In fact Git LFS and similar tools are not very flexible in letting + users change their minds about the blobs they should handle or not. + +- Even when users are using Git LFS or similar tools, they are often + complaining that these tools require significant effort to set up, + learn and use correctly. + +II) Main features of the "Large Object Promisors" solution +---------------------------------------------------------- + +The main features below should give a rough overview of how the +solution may work. Details about needed elements can be found in +following sections. + +Even if each feature below is very useful for the full solution, it is +very likely to be also useful on its own in some cases where the full +solution is not required. However, we'll focus primarily on the big +picture here. + +Also each feature doesn't need to be implemented entirely in Git +itself. Some could be scripts, hooks or helpers that are not part of +the Git repo. It could be helpful if those could be shared and +improved on collaboratively though. + +1) Large blobs are stored on LOPs +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Large blobs should be stored on special promisor remotes that we will +call "Large Object Promisors" or LOPs. These LOPs should be additional +remotes dedicated to contain large blobs especially those in binary +format. They should be used along with main remotes that contain the +other objects. + +Note 1 +++++++ + +To clarify, a LOP is a normal promisor remote, except that: + +- it should store only large blobs, + +- it should be separate from the main remote, so that the main remote + can focus on serving other objects and the rest of the repos (see + feature 4) below) and can use the LOP as a promisor remote for + itself. + +Note 2 +++++++ + +Git already makes it possible for a main remote to also be a promisor +remote storing both regular objects and large blobs for a client that +clones from it with a filter on blob size. But here we explicitly want +to avoid that. + +Rationale ++++++++++ + +LOP remotes should be good at handling large blobs while main remotes +should be good at handling other objects. + +Implementation +++++++++++++++ + +Git already has support for multiple promisor remotes, see +link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation]. + +Also, Git already has support for partial clone using a filter on the +size of the blobs (with `git clone --filter=blob:limit=`). Most +of the other main features below are based on these existing features +and are about making them easy and efficient to use for the purpose of +better handling large blobs. + +2) LOPs can use object storage +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A LOP could be using object storage, like an Amazon S3 or GCP Bucket +to actually store the large blobs, and could be accessed through a Git +remote helper (see linkgit:gitremote-helpers[7]) which makes the +underlying object storage appears like a remote to Git. + +Note +++++ + +A LOP could be a promisor remote accessed using a remote helper by +both some clients and the main remote. + +Rationale ++++++++++ + +This looks like the simplest way to create LOPs that can cheaply +handle many large blobs. + +Implementation +++++++++++++++ + +Remote helpers are quite easy to write as shell scripts, but it might +be more efficient and maintainable to write them using other languages +like Go. + +Other ways to implement LOPs are certainly possible, but the goal of +this document is not to discuss how to best implement a LOP or its +underlying object storage (see the "0) Non goals" section above). + +3) LOP object storage can be Git LFS storage +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The underlying object storage that a LOP uses could also serve as +storage for large files handled by Git LFS. + +Rationale ++++++++++ + +This would simplify the server side if it wants to both use a LOP and +act as a Git LFS server. + +4) A main remote can offload to a LOP with a configurable threshold +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +On the server side, a main remote should have a way to offload to a +LOP all its blobs with a size over a configurable threshold. + +Rationale ++++++++++ + +This makes it easy to set things up and to clean things up. For +example, an admin could use this to manually convert a repo not using +LOPs to a repo using a LOP. On a repo already using a LOP but where +some users would sometimes push large blobs, a cron job could use this +to regularly make sure the large blobs are moved to the LOP. + +Implementation +++++++++++++++ + +Using something based on `git repack --filter=...` to separate the +blobs we want to offload from the other Git objects could be a good +idea. The missing part is to connect to the LOP, check if the blobs we +want to offload are already there and if not send them. + +5) A main remote should try to remain clean from large blobs +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A main remote should try to avoid containing a lot of oversize +blobs. For that purpose, it should offload as needed to a LOP and it +should have ways to prevent oversize blobs to be fetched, and also +perhaps pushed, into it. + +Rationale ++++++++++ + +A main remote containing many oversize blobs would defeat the purpose +of LOPs. + +Implementation +++++++++++++++ + +The way to offload to a LOP discussed in 4) above can be used to +regularly offload oversize blobs. About preventing oversize blobs to +be fetched into the repo see 6) below. About preventing oversize blob +pushes, a pre-receive hook could be used. + +Also there are different scenarios in which large blobs could get +fetched into the main remote, for example: + +- A client that doesn't implement the "promisor-remote" protocol + (described in 6) below) clones from the main remote. + +- The main remote gets a request for information about a large blob + and is not able to get that information without fetching the blob + from the LOP. + +It might not be possible to completely prevent all these scenarios +from happening. So the goal here should be to implement features that +make the fetching of large blobs less likely. For example adding a +`remote-object-info` command in the `git cat-file --batch*` protocol +might make it possible for a main repo to respond to some requests +about large blobs without fetching them. + +6) A protocol negotiation should happen when a client clones +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a client clones from a main repo, there should be a protocol +negotiation so that the server can advertise one or more LOPs and so +that the client and the server can discuss if the client could +directly use a LOP the server is advertising. If the client and the +server can agree on that, then the client would be able to get the +large blobs directly from the LOP and the server would not need to +fetch those blobs from the LOP to be able to serve the client. + +Note +++++ + +For fetches instead of clones, see the "What about fetches?" FAQ entry +below. + +Rationale ++++++++++ + +Security, configurability and efficiency of setting things up. + +Implementation +++++++++++++++ + +A "promisor-remote" protocol v2 capability looks like a good way to +implement this. The way the client and server use this capability +could be controlled by configuration variables. + +Information that the server could send to the client through that +protocol could be things like: LOP name, LOP URL, filter-spec (for +example `blob:limit=`) or just size limit that should be used as +a filter when cloning, token to be used with the LOP, etc.. + +7) A client can offload to a LOP +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a client is using a LOP that is also a LOP of its main remote, +the client should be able to offload some large blobs it has fetched, +but might not need anymore, to the LOP. + +Rationale ++++++++++ + +On the client, the easiest way to deal with unneeded large blobs is to +offload them. + +Implementation +++++++++++++++ + +This is very similar to what 4) above is about, except on the client +side instead of the server side. So a good solution to 4) could likely +be adapted to work on the client side too. + +There might be some security issues here, as there is no negotiation, +but they might be mitigated if the client can reuse a token it got +when cloning (see 6) above). Also if the large blobs were fetched from +a LOP, it is likely, and can easily be confirmed, that the LOP still +has them, so that they can just be removed from the client. + +III) Benefits of using LOPs +--------------------------- + +Many benefits are related to the issues discussed in "I) Issues with +the current situation" above: + +- No need to rewrite history when deciding which blobs are worth + handling separately than other objects, or when moving or removing + the threshold. + +- If the protocol between client and server is developed and secured + enough, then many details might be setup on the server side only and + all the clients could then easily get all the configuration + information and use it to set themselves up mostly automatically. + +- Storage costs benefits on the server side. + +- Reduced memory and CPU needs on main remotes on the server side. + +- Reduced storage needs on the client side. + +IV) FAQ +------- + +What about using multiple LOPs on the server and client side? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +That could perhaps be useful in some cases, but it's more likely for +now than in most cases a single LOP will be advertised by the server +and should be used by the client. + +A case where it could be useful for a server to advertise multiple +LOPs is if a LOP is better for some users while a different LOP is +better for other users. For example some clients might have a better +connection to a LOP than others. + +In those cases it's the responsibility of the server to have some +documentation to help clients. It could say for example something like +"Users in this part of the world might want to pick only LOP A as it +is likely to be better connected to them, while users in other parts +of the world should pick only LOP B for the same reason." + +Trusting the LOPs advertised by the server, or not trusting them? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In some contexts, like in corporate setup where the server and all the +clients are parts of an internal network in a company where admins +have all the rights on every system, it's Ok, and perhaps even a good +thing, if the clients fully trust the server, as it can help ensure +that all the clients are on the same page. + +There are also contexts in which clients trust a code hosting platform +serving them some repos, but might not fully trust other users +managing or contributing to some of these repos. For example, the code +hosting platform could have hooks in place to check that any object it +receives doesn't contain malware or otherwise bad content. In this +case it might be OK for the client to use a main remote and its LOP if +they are both hosted by the code hosting platform, but not if the LOP +is hosted elsewhere (where the content is not checked). + +In other contexts, a client should just not trust a server. + +So there should be different ways to configure how the client should +behave when a server advertises a LOP to it at clone time. + +As the basic elements that a server can advertise about a LOP are a +LOP name and a LOP URL, the client should base its decision about +accepting a LOP on these elements. + +One simple way to be very strict in the LOP it accepts is for example +for the client to check that the LOP is already configured on the +client with the same name and URL as what the server advertises. + +In general default and "safe" settings should require that the LOP are +configured on the client separately from the "promisor-remote" +protocol and that the client accepts a LOP only when information about +it from the protocol matches what has been already configured +separately. + +What about LOP names? +~~~~~~~~~~~~~~~~~~~~~ + +In some contexts, for example if the clients sometimes fetch from each +other, it can be a good idea for all the clients to use the same names +for all the remotes they use, including LOPs. + +In other contexts, each client might want to be able to give the name +it wants to each remote, including each LOP, it interacts with. + +So there should be different ways to configure how the client accepts +or not the LOP name the server advertises. + +If a default or "safe" setting is used, then as such a setting should +require that the LOP be configured separately, then the name would be +configured separately and there is no risk that the server could +dictate a name to a client. + +Could the main remote be bogged down by old or paranoid clients? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Yes, it could happen if there are too many clients that are either +unwilling to trust the main remote or that just don't implement the +"promisor-remote" protocol because they are too old or not fully +compatible with the 'git' client. + +When serving such a client, the main remote has no other choice than +to first fetch from its LOP, to then be able to provide to the client +everything it requested. So the main remote, even if it has cleanup +mechanisms (see section II.4 above), would be burdened at least +temporarily with the large blobs it had to fetch from its LOP. + +Not behaving like this would be breaking backward compatibility, and +could be seen as segregating clients. For example, it might be +possible to implement a special mode that allows the server to just +reject clients that don't implement the "promisor-remote" protocol or +aren't willing to trust the main remote. This mode might be useful in +a special context like a corporate environment. There is no plan to +implement such a mode though, and this should be discussed separately +later anyway. + +A better way to proceed is probably for the main remote to show a +message telling clients that don't implement the protocol or are +unwilling to accept the advertised LOP(s) that they would get faster +clone and fetches by upgrading client software or properly setting +them up to accept LOP(s). + +Waiting for clients to upgrade, monitoring these upgrades and limiting +the use of LOPs to repos that are not very frequently accessed might +be other good ways to make sure that some benefits are still reaped +from LOPs. Over time, as more and more clients upgrade and benefit +from LOPs, using them in more and more frequently accessed repos will +become worth it. + +Corporate environments, where it might be easier to make sure that all +the clients are up-to-date and properly configured, could hopefully +benefit more and earlier from using LOPs. + +What about fetches? +~~~~~~~~~~~~~~~~~~~ + +There are different kinds of fetches. A regular fetch happens when +some refs have been updated on the server and the client wants the ref +updates and possibly the new objects added with them. A "backfill" or +"lazy" fetch, on the contrary, happens when the client needs to use +some objects it already knows about but doesn't have because they are +on a promisor remote. + +Regular fetch ++++++++++++++ + +In a regular fetch, the client will contact the main remote and a +protocol negotiation will happen between them. It's a good thing that +a protocol negotiation happens every time, as the configuration on the +client or the main remote could have changed since the previous +protocol negotiation. In this case, the new protocol negotiation +should ensure that the new fetch will happen in a way that satisfies +the new configuration of both the client and the server. + +In most cases though, the configurations on the client and the main +remote will not have changed between 2 fetches or between the initial +clone and a subsequent fetch. This means that the result of a new +protocol negotiation will be the same as the previous result, so the +new fetch will happen in the same way as the previous clone or fetch, +using, or not using, the same LOP(s) as last time. + +"Backfill" or "lazy" fetch +++++++++++++++++++++++++++ + +When there is a backfill fetch, the client doesn't necessarily contact +the main remote first. It will try to fetch from its promisor remotes +in the order they appear in the config file, except that a remote +configured using the `extensions.partialClone` config variable will be +tried last. See +link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation]. + +This is not new with this effort. In fact this is how multiple remotes +have already been working for around 5 years. + +When using LOPs, having the main remote configured using +`extensions.partialClone`, so it's tried last, makes sense, as missing +objects should only be large blobs that are on LOPs. + +This means that a protocol negotiation will likely not happen as the +missing objects will be fetched from the LOPs, and then there will be +nothing left to fetch from the main remote. + +To secure that, it could be a good idea for LOPs to require a token +from the client when it fetches from them. The client could get the +token when performing a protocol negotiation with the main remote (see +section II.6 above).