From patchwork Thu May 23 21:26:10 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13672304
Date: Thu, 23 May 2024 17:26:10 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 01/24] Documentation/gitpacking.txt: initial commit
Message-ID: <0f20c9becf452ef7a7e931b36336ccddba0f1d13.1716499565.git.me@ttaylorr.com>

Introduce a new manual page, gitpacking(7), to collect useful
information about advanced packing concepts in Git.

In future commits in this series, this manual page will expand to
describe the new pseudo-merge bitmaps feature, as well as include
examples, relevant configuration bits, use-cases, and so on.

Outside of this series, this manual page may absorb similar pieces from
other parts of Git's documentation about packing.
Signed-off-by: Taylor Blau
---
 Documentation/Makefile       |  1 +
 Documentation/gitpacking.txt | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)
 create mode 100644 Documentation/gitpacking.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 3f2383a12c7..920b6248aa4 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -51,6 +51,7 @@ MAN7_TXT += gitdiffcore.txt
 MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitfaq.txt
 MAN7_TXT += gitglossary.txt
+MAN7_TXT += gitpacking.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitremote-helpers.txt
 MAN7_TXT += gitrevisions.txt
diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt
new file mode 100644
index 00000000000..50e9900d845
--- /dev/null
+++ b/Documentation/gitpacking.txt
@@ -0,0 +1,34 @@
+gitpacking(7)
+=============
+
+NAME
+----
+gitpacking - Advanced concepts related to packing in Git
+
+SYNOPSIS
+--------
+gitpacking
+
+DESCRIPTION
+-----------
+
+This document aims to describe some advanced concepts related to packing
+in Git.
+
+Many of these concepts are currently scattered across the manual pages of
+various Git commands, including linkgit:git-pack-objects[1],
+linkgit:git-repack[1], and others, as well as linkgit:gitformat-pack[5],
+and parts of the `Documentation/technical` tree.
+
+There are many aspects of packing in Git that are not covered in this
+document that instead live in the aforementioned areas. Over time, those
+scattered bits may coalesce into this document.
+
+SEE ALSO
+--------
+linkgit:git-pack-objects[1]
+linkgit:git-repack[1]
+
+GIT
+---
+Part of the linkgit:git[1] suite

From patchwork Thu May 23 21:26:13 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13672305
Date: Thu, 23 May 2024 17:26:13 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 02/24] Documentation/gitpacking.txt: describe pseudo-merge bitmaps
Message-ID: <48afaa7492815350ae17405da1a8d09eb8e97c15.1716499565.git.me@ttaylorr.com>

Add some details to the gitpacking(7) manual page which motivate and
describe pseudo-merge bitmaps.

The exact on-disk format and many of the configuration knobs will be
described in subsequent commits.

Helped-by: Jeff King
Signed-off-by: Taylor Blau
---
 Documentation/gitpacking.txt | 72 ++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt
index 50e9900d845..f24396f0173 100644
--- a/Documentation/gitpacking.txt
+++ b/Documentation/gitpacking.txt
@@ -24,6 +24,78 @@ There are many aspects of packing in Git that are not covered in this
 document that instead live in the aforementioned areas. Over time, those
 scattered bits may coalesce into this document.
 
+== Pseudo-merge bitmaps
+
+NOTE: Pseudo-merge bitmaps are considered an experimental feature, so
+the configuration and many of the ideas are subject to change.
+
+=== Background
+
+Reachability bitmaps are most efficient when we have on-disk stored
+bitmaps for one or more of the starting points of a traversal. For this
+reason, Git prefers storing bitmaps for commits at the tips of refs,
+because traversals tend to start with those points.
+
+But if you have a large number of refs, it's not feasible to store a
+bitmap for _every_ ref tip. It takes up space, and just OR-ing all of
+those bitmaps together is expensive.
+
+One way we can deal with that is to create bitmaps that represent
+_groups_ of refs. When a traversal asks about the entire group, then we
+can use this single bitmap instead of considering each ref individually.
+Because these bitmaps represent the set of objects which would be
+reachable in a hypothetical merge of all of the commits, we call them
+pseudo-merge bitmaps.
+
+=== Overview
+
+The term "pseudo-merge bitmap" refers to a pair of bitmaps, as
+follows:
+
+Commit bitmap::
+
+	A bitmap whose set bits describe the set of commits included in the
+	pseudo-merge's "merge" bitmap (as below).
+
+Merge bitmap::
+
+	A bitmap whose set bits describe the reachability closure over the set
+	of commits in the pseudo-merge's "commits" bitmap (as above). An
+	identical bitmap would be generated for an octopus merge with the same
+	set of parents as described in the commits bitmap.
+
+Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
+for a given pseudo-merge are listed on either side of the traversal,
+either directly (by explicitly asking for them as part of the `HAVES`
+or `WANTS`) or indirectly (by encountering them during a fill-in
+traversal).
+
+=== Use-cases
+
+For example, suppose there exists a pseudo-merge bitmap with a large
+number of commits, all of which are listed in the `WANTS` section of
+some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
+bitmap machinery can quickly determine there is a pseudo-merge which
+satisfies some subset of the wanted objects on either side of the query.
+Then, we can inflate the EWAH-compressed bitmap, and `OR` it into the
+resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
+have to repeat the decompression and `OR`-ing step over a potentially
+large number of individual bitmaps, which can take proportionally more
+time.
+
+Another benefit of pseudo-merges arises when there is some combination
+of (a) a large number of references, with (b) poor bitmap coverage, and
+(c) deep, nested trees, making fill-in traversal relatively expensive.
+For example, suppose that there are so many tags that bitmapping each
+of them individually is infeasible. Without
+pseudo-merge bitmaps, computing the result of, say, `git rev-list
+--use-bitmap-index --count --objects --tags` would likely require a
+large amount of fill-in traversal. But when a large quantity of those
+tags are stored together in a pseudo-merge bitmap, the bitmap machinery
+can take advantage of the fact that we only care about the union of
+objects reachable from all of those tags, and answer the query much
+faster.
+
 SEE ALSO
 --------
 linkgit:git-pack-objects[1]

From patchwork Thu May 23 21:26:16 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13672306
Date: Thu, 23 May 2024 17:26:16 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 03/24] Documentation/technical: describe pseudo-merge bitmaps format
Message-ID: <44046f83c1a8a5971fd434debbf31df00658e2b8.1716499565.git.me@ttaylorr.com>

Prepare to implement pseudo-merge bitmaps over the next several commits
by first describing the serialization format which will store the new
pseudo-merge bitmaps themselves.
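To make the byte layout documented in this patch concrete, the following is a minimal C sketch of decoding the fixed-width pieces of the pseudo-merge section: one 12-byte lookup-table entry, one extended lookup-table entry, and the 24-byte trailing metadata. It assumes, per the format description, that every value is big-endian (network byte order). It additionally assumes that a lookup-table `offset` with its most-significant bit set carries the extended-table offset in its remaining 63 bits, which the documentation implies but does not spell out. `decode_lookup_entry()`, `ext_table_offset()`, `decode_trailer()` and the `pm_*` types are hypothetical names, not Git's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; Git's real reader may
 * differ. All on-disk values are big-endian (network byte order).
 */
#define PM_EXTENDED_FLAG ((uint64_t)1 << 63)

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

struct pm_lookup_entry {
	uint32_t commit_pos; /* bit position of the commit */
	uint64_t offset;     /* file offset, MSB masked off (assumption) */
	int extended;        /* 1: offset points at the extended table */
};

/* Decode one 12-byte lookup-table entry (4-byte pos + 8-byte offset). */
static void decode_lookup_entry(const unsigned char *p,
				struct pm_lookup_entry *e)
{
	uint64_t raw = read_be64(p + 4);

	e->commit_pos = read_be32(p);
	/* Parenthesize the mask: '!=' binds tighter than '&' in C. */
	e->extended = (raw & PM_EXTENDED_FLAG) != 0;
	e->offset = raw & ~PM_EXTENDED_FLAG;
}

/*
 * An extended-table entry is a 4-byte count N followed by N 8-byte
 * offsets; return the i-th offset (i must be less than N).
 */
static uint64_t ext_table_offset(const unsigned char *p, uint32_t i)
{
	assert(i < read_be32(p));
	return read_be64(p + 4 + 8 * (size_t)i);
}

struct pm_trailer {
	uint32_t nr_pseudo_merges;
	uint32_t nr_commits;
	uint64_t lookup_offset;  /* bytes from section start to lookup table */
	uint64_t extension_size; /* whole section, including this field */
};

/* 'end' points one past the last byte of the pseudo-merge section. */
static void decode_trailer(const unsigned char *end, struct pm_trailer *t)
{
	const unsigned char *p = end - 24; /* 4 + 4 + 8 + 8 bytes */

	t->nr_pseudo_merges = read_be32(p);
	t->nr_commits = read_be32(p + 4);
	t->lookup_offset = read_be64(p + 8);
	t->extension_size = read_be64(p + 16);
}
```

Note the explicit parentheses in `(raw & PM_EXTENDED_FLAG) != 0`: in C, `!=` and `==` bind more tightly than `&`, so writing the test without them would compare the mask against zero first.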

This format is implemented as an optional extension within the bitmap
v1 format, making it compatible with previous versions of Git, as well
as the original .bitmap implementation within JGit.

The format is described in detail in the patch contents below, but the
high-level description is as follows:

  - An array of pseudo-merge bitmaps, each containing a pair of EWAH
    bitmaps: one describing the set of pseudo-merge "parents", and
    another describing the set of object(s) reachable from those
    parents.

  - A lookup table to determine which pseudo-merge(s) a given commit
    appears in. An optional extended lookup table follows when there
    is at least one commit which appears in multiple pseudo-merge
    groups.

  - Trailing metadata, including the number of pseudo-merge(s), number
    of unique parents, the offset within the .bitmap file for the
    pseudo-merge commit lookup table, and the size of the optional
    extension itself.

Signed-off-by: Taylor Blau
---
 Documentation/technical/bitmap-format.txt | 132 ++++++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f5d200939b0..ee7775a2586 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -255,3 +255,135 @@ triplet is
 	- xor_row (4 byte integer, network byte order): ::
 	The position of the triplet whose bitmap is used to compress
 	this one, or `0xffffffff` if no such bitmap exists.
+
+Pseudo-merge bitmaps
+--------------------
+
+If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
+bytes (preceding the name-hash cache, commit lookup table, and trailing
+checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
+
+For more information on what pseudo-merges are, why they are useful, and
+how to configure them, see the information in linkgit:gitpacking[7].
+
+=== File format
+
+If enabled, pseudo-merge bitmaps are stored in an optional section near
+the end of a `.bitmap` file. The format is as follows:
+
+....
++-------------------------------------------+
+|               .bitmap File                |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge bitmaps (Variable Length)   |
+|  +---------------------------+            |
+|  | commits_bitmap (EWAH)     |            |
+|  +---------------------------+            |
+|  | merge_bitmap (EWAH)       |            |
+|  +---------------------------+            |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Lookup Table                             |
+|  +---------------------------+            |
+|  | commit_pos (4 bytes)      |            |
+|  +---------------------------+            |
+|  | offset (8 bytes)          |            |
+|  +------------+--------------+            |
+|                                           |
+|  Offset Cases:                            |
+|  -------------                            |
+|                                           |
+|  1. MSB Unset: single pseudo-merge bitmap |
+|     + offset to pseudo-merge bitmap       |
+|                                           |
+|  2. MSB Set: multiple pseudo-merges       |
+|     + offset to extended lookup table     |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Extended Lookup Table (Optional)         |
+|  +----+----------+----------+----------+  |
+|  | N  | Offset 1 |   ....   | Offset N |  |
+|  +----+----------+----------+----------+  |
+|  |    |  8 bytes |   ....   |  8 bytes |  |
+|  +----+----------+----------+----------+  |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge Metadata                    |
+|  +-----------------------------------+    |
+|  | # pseudo-merges (4 bytes)         |    |
+|  +-----------------------------------+    |
+|  | # commits (4 bytes)               |    |
+|  +-----------------------------------+    |
+|  | Lookup offset (8 bytes)           |    |
+|  +-----------------------------------+    |
+|  | Extension size (8 bytes)          |    |
+|  +-----------------------------------+    |
+|                                           |
++-------------------------------------------+
+....
+
+* One or more pseudo-merge bitmaps, each containing:
+
+  ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
+     commits included in this pseudo-merge.
+
+  ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
+     the set of objects reachable from all commits listed in the
+     `commits_bitmap`.
+
+* A lookup table, mapping pseudo-merged commits to the pseudo-merges
+  they belong to. Entries appear in increasing order of each commit's
+  bit position. Each entry is 12 bytes wide and consists of the
+  following:
+
+  ** `commit_pos`, a 4-byte unsigned value (in network byte-order)
+     containing the bit position for this commit.
+
+  ** `offset`, an 8-byte unsigned value (also in network byte-order)
+     containing one of two possible offsets, depending on whether or
+     not the most-significant bit is set.
+
+    *** If unset (i.e. `(offset & ((uint64_t)1<<63)) == 0`), the offset
+        (relative to the beginning of the `.bitmap` file) at which the
+        pseudo-merge bitmap for this commit can be read. This indicates
+        only a single pseudo-merge bitmap contains this commit.
+
+    *** If set (i.e. `(offset & ((uint64_t)1<<63)) != 0`), the offset
+        (again relative to the beginning of the `.bitmap` file) at which
+        the extended lookup table entry describing the set of
+        pseudo-merge bitmaps which contain this commit can be found. This
+        indicates that multiple pseudo-merge bitmaps contain this commit.
+
+* An (optional) extended lookup table (written if and only if there is
+  at least one commit which appears in more than one pseudo-merge).
+  There are as many entries as commits which appear in multiple
+  pseudo-merges. Each entry contains the following:
+
+  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
+     which contain a given commit.
+
+  ** An array of `N` 8-byte unsigned values, each of which is
+     interpreted as an offset (relative to the beginning of the
+     `.bitmap` file) at which a pseudo-merge bitmap for this commit can
+     be read. These values occur in no particular order.
+
+* Positions for all pseudo-merges, each stored as an 8-byte unsigned
+  value (in network byte-order) containing the offset (relative to the
+  beginning of the `.bitmap` file) of each consecutive pseudo-merge.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  pseudo-merges.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  unique commits which appear in any pseudo-merge.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes between the start of the pseudo-merge section and the
+  beginning of the lookup table.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes in the pseudo-merge section (including this field).

From patchwork Thu May 23 21:26:20 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13672307
Date: Thu, 23 May 2024 17:26:20 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 04/24] ewah: implement `ewah_bitmap_is_subset()`
Message-ID: <211d6f1412874d6211f4ce92f74bb3ed88292f8e.1716499565.git.me@ttaylorr.com>

In order to know whether a given pseudo-merge (consisting of a "parents"
bitmap and an "objects" bitmap) is "satisfied" and can be OR'd into the
bitmap result, we need to be able to quickly determine whether the
"parents" bitmap is a subset of the current set of objects reachable on
either side of a traversal.
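As a simplified illustration of what "satisfied" means here, the sketch below performs the same subset-then-OR step on plain, already-inflated word arrays. It is not the helper this patch adds: the real `ewah_bitmap_is_subset()` walks the EWAH-compressed "parents" bitmap with an iterator precisely so that it never has to inflate it. The `plain_bitmap` type and the `plain_is_subset()` and `apply_pseudo_merge()` helpers are hypothetical, and the sketch assumes the result bitmap is already large enough to hold the pseudo-merge's objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uncompressed bitmap: bit n lives in words[n / 64]. */
struct plain_bitmap {
	uint64_t *words;
	size_t nr; /* number of 64-bit words */
};

/* Return 1 if every bit set in 'a' is also set in 'b', 0 otherwise. */
static int plain_is_subset(const struct plain_bitmap *a,
			   const struct plain_bitmap *b)
{
	size_t i;

	for (i = 0; i < a->nr; i++) {
		/* Words past the end of 'b' are treated as all zeros. */
		uint64_t other = i < b->nr ? b->words[i] : 0;

		if (a->words[i] & ~other)
			return 0; /* fail fast on the first offending word */
	}
	return 1;
}

/*
 * If every commit in 'parents' is already in 'result', the pseudo-merge
 * is satisfied and its 'objects' bitmap can be OR'd into 'result'.
 * Returns 1 if the pseudo-merge was applied, 0 otherwise.
 */
static int apply_pseudo_merge(struct plain_bitmap *result,
			      const struct plain_bitmap *parents,
			      const struct plain_bitmap *objects)
{
	size_t i;

	if (!plain_is_subset(parents, result))
		return 0;

	assert(result->nr >= objects->nr); /* sketch does not grow 'result' */
	for (i = 0; i < objects->nr; i++)
		result->words[i] |= objects->words[i];
	return 1;
}
```

Checking the parents against the result (and not the other way around) is the point: the pseudo-merge only applies once every one of its commits is already known to be reachable on that side of the traversal.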

Implement a helper function to prepare for that, which determines
whether an EWAH bitmap (the parents bitmap from the pseudo-merge) is a
subset of a non-EWAH bitmap (in this case, the results bitmap from
either side of the traversal).

This function makes use of the EWAH iterator to avoid inflating any part
of the EWAH bitmap after we determine it is not a subset of the non-EWAH
bitmap. This "fail-fast" allows us to avoid a potentially large amount
of wasted effort.

Signed-off-by: Taylor Blau
---
 ewah/bitmap.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 ewah/ewok.h   |  6 ++++++
 2 files changed, 49 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index ac7e0af622a..d352fec54ce 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -138,6 +138,49 @@ void bitmap_or(struct bitmap *self, const struct bitmap *other)
 		self->words[i] |= other->words[i];
 }
 
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i;
+
+	ewah_iterator_init(&it, self);
+
+	for (i = 0; i < other->word_alloc; i++) {
+		if (!ewah_iterator_next(&word, &it)) {
+			/*
+			 * If we reached the end of `self`, and haven't
+			 * rejected `self` as a possible subset of
+			 * `other` yet, then we are done and `self` is
+			 * indeed a subset of `other`.
+			 */
+			return 1;
+		}
+		if (word & ~other->words[i]) {
+			/*
+			 * Otherwise, compare the next pair of
+			 * words. If the word from `self` has bit(s) not
+			 * in the word from `other`, `self` is not a
+			 * subset of `other`.
+			 */
+			return 0;
+		}
+	}
+
+	/*
+	 * If we got to this point, there may be zero or more words
+	 * remaining in `self`, with no remaining words left in `other`.
+	 * If there are any bits set in the remaining word(s) in `self`,
+	 * then `self` is not a subset of `other`.
+ */ + while (ewah_iterator_next(&word, &it)) + if (word) + return 0; + + /* `self` is definitely a subset of `other` */ + return 1; +} + void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other) { size_t original_size = self->word_alloc; diff --git a/ewah/ewok.h b/ewah/ewok.h index c11d76c6f33..2b6c4ac499c 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -179,7 +179,13 @@ void bitmap_unset(struct bitmap *self, size_t pos); int bitmap_get(struct bitmap *self, size_t pos); void bitmap_free(struct bitmap *self); int bitmap_equals(struct bitmap *self, struct bitmap *other); + +/* + * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set + * of bits in 'self' are a subset of the bits in 'other'. Returns 0 otherwise. + */ int bitmap_is_subset(struct bitmap *self, struct bitmap *other); +int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other); struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap); struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah); From patchwork Thu May 23 21:26:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13672308 Received: from mail-yw1-f169.google.com (mail-yw1-f169.google.com [209.85.128.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E10D41292E9 for ; Thu, 23 May 2024 21:26:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499588; cv=none; b=QKlL2wxHv542fBodQMTD52aRNbmt0oTjxRRQSWqK63oxeHv6tZ9FIvewmXDJbu6x1Jp5ix9PrV+I0QOsfL0dSZLGzkacsqv652CdiqW0phmxU+mPeS7bFHi6Sj2k4AL9pXXYCN4TMOp/qFcN4eojv4BHjsWlMStu1v98zhMpTk4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499588; c=relaxed/simple; 
Date: Thu, 23 May 2024 17:26:23 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()`
Message-ID: <650cac2dcf920b2bb64d7a026aa83228a2b2c354.1716499565.git.me@ttaylorr.com>

The pack-bitmap-writer machinery uses an oidmap (backed by khash.h) to map
from commits selected for bitmaps (by OID) to a bitmapped_commit structure
(containing the bitmap itself, among other things like its XOR offset,
etc.).
This map was not initialized until `bitmap_writer_build()`. New entries
are added in `pack-bitmap-write.c::store_selected()`, which is called by
the bitmap_builder machinery (which is responsible for traversing history
and generating the actual bitmaps).

Reorganize when this field is initialized and when entries are added to
it, so that we can quickly determine whether or not a commit is a
candidate for pseudo-merge selection. A commit that was already selected
to receive its own bitmap is not a candidate, since storing it in a
pseudo-merge would be redundant.

The changes are as follows:

  - Teach the existing `bitmap_writer_init()` function to initialize the
    `writer.bitmaps` field (instead of waiting until
    `bitmap_writer_build()`).

  - Add map entries in `push_bitmapped_commit()` (which is called via
    `bitmap_writer_select_commits()`) with OID keys and NULL values to
    track whether or not we *expect* to write a bitmap for some given
    commit.

  - Validate that a NULL entry is found matching the given key when we
    store a selected bitmap.
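The reserve-then-fill protocol described above can be illustrated with a small, self-contained sketch. It does not use khash. Instead, a hypothetical fixed-size linear table keyed by an OID string stands in for the oidmap, and `struct expect_map`, `expect_reserve()`, `expect_store()` and `expect_is_selected()` are invented names, not Git APIs. Selection reserves a key with a NULL value, and storing a bitmap is rejected unless the key was reserved first. The membership test shows how a later pseudo-merge selector could skip commits that already receive their own bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the khash-backed oidmap: a small linear
 * table keyed by a hex OID string. Not Git's API.
 */
#define EXPECT_MAX 16

struct expect_entry {
	const char *oid;    /* key: commit OID */
	const void *bitmap; /* value: NULL until the bitmap is stored */
};

struct expect_map {
	struct expect_entry entries[EXPECT_MAX];
	size_t nr;
};

static struct expect_entry *expect_find(struct expect_map *m, const char *oid)
{
	size_t i;

	assert(oid);
	for (i = 0; i < m->nr; i++)
		if (!strcmp(m->entries[i].oid, oid))
			return &m->entries[i];
	return NULL;
}

/* Selection: reserve the key with a NULL value; -1 on duplicate or full. */
static int expect_reserve(struct expect_map *m, const char *oid)
{
	if (expect_find(m, oid) || m->nr == EXPECT_MAX)
		return -1;
	m->entries[m->nr].oid = oid;
	m->entries[m->nr].bitmap = NULL;
	m->nr++;
	return 0;
}

/* Build: storing a bitmap is only valid for a previously reserved key. */
static int expect_store(struct expect_map *m, const char *oid,
			const void *bitmap)
{
	struct expect_entry *e = expect_find(m, oid);

	if (!e)
		return -1; /* analogous to "attempted to store non-selected commit" */
	e->bitmap = bitmap;
	return 0;
}

/* A pseudo-merge selector could skip commits that already get a bitmap. */
static int expect_is_selected(struct expect_map *m, const char *oid)
{
	return expect_find(m, oid) != NULL;
}
```

Reserving keys at selection time means the duplicate check happens once, where commits are chosen. The store step then only has to confirm that the key exists.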
Signed-off-by: Taylor Blau
---
 builtin/pack-objects.c |  3 ++-
 midx-write.c           |  2 +-
 pack-bitmap-write.c    | 24 ++++++++++++++++++------
 pack-bitmap.h          |  2 +-
 4 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 26a6d0d7919..6209264e60c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1340,7 +1340,8 @@ static void write_pack_file(void)
 					    hash_to_hex(hash));
 
 			if (write_bitmap_index) {
-				bitmap_writer_init(&bitmap_writer);
+				bitmap_writer_init(&bitmap_writer,
+						   the_repository);
 				bitmap_writer_set_checksum(&bitmap_writer, hash);
 				bitmap_writer_build_type_index(&bitmap_writer,
 					&to_pack, written_list, nr_written);
diff --git a/midx-write.c b/midx-write.c
index 7c0c08c64b2..c747d1a6af3 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -820,7 +820,7 @@ static int write_midx_bitmap(const char *midx_name,
 	for (i = 0; i < pdata->nr_objects; i++)
 		index[i] = &pdata->objects[i].idx;
 
-	bitmap_writer_init(&writer);
+	bitmap_writer_init(&writer, the_repository);
 	bitmap_writer_show_progress(&writer, flags & MIDX_PROGRESS);
 	bitmap_writer_build_type_index(&writer, pdata, index,
 				       pdata->nr_objects);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 6cae670412c..d8870155831 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -27,9 +27,12 @@ struct bitmapped_commit {
 	uint32_t commit_pos;
 };
 
-void bitmap_writer_init(struct bitmap_writer *writer)
+void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
 {
 	memset(writer, 0, sizeof(struct bitmap_writer));
+	if (writer->bitmaps)
+		BUG("bitmap writer already initialized");
+	writer->bitmaps = kh_init_oid_map();
 }
 
 void bitmap_writer_free(struct bitmap_writer *writer)
@@ -128,11 +131,21 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 static inline void push_bitmapped_commit(struct bitmap_writer *writer,
 					 struct commit *commit)
 {
+	int hash_ret;
+	khiter_t hash_pos;
+
 	if (writer->selected_nr >= writer->selected_alloc) {
 		writer->selected_alloc = (writer->selected_alloc + 32) * 2;
 		REALLOC_ARRAY(writer->selected, writer->selected_alloc);
 	}
 
+	hash_pos = kh_put_oid_map(writer->bitmaps, commit->object.oid,
+				  &hash_ret);
+	if (!hash_ret)
+		die(_("duplicate entry when writing bitmap index: %s"),
+		    oid_to_hex(&commit->object.oid));
+	kh_value(writer->bitmaps, hash_pos) = NULL;
+
 	writer->selected[writer->selected_nr].commit = commit;
 	writer->selected[writer->selected_nr].bitmap = NULL;
 	writer->selected[writer->selected_nr].write_as = NULL;
@@ -483,14 +496,14 @@ static void store_selected(struct bitmap_writer *writer,
 {
 	struct bitmapped_commit *stored = &writer->selected[ent->idx];
 	khiter_t hash_pos;
-	int hash_ret;
 
 	stored->bitmap = bitmap_to_ewah(ent->bitmap);
 
-	hash_pos = kh_put_oid_map(writer->bitmaps, commit->object.oid, &hash_ret);
-	if (hash_ret == 0)
-		die("Duplicate entry when writing index: %s",
+	hash_pos = kh_get_oid_map(writer->bitmaps, commit->object.oid);
+	if (hash_pos == kh_end(writer->bitmaps))
+		die(_("attempted to store non-selected commit: '%s'"),
 		    oid_to_hex(&commit->object.oid));
+
 	kh_value(writer->bitmaps, hash_pos) = stored;
 }
 
@@ -506,7 +519,6 @@ int bitmap_writer_build(struct bitmap_writer *writer,
 	uint32_t *mapping;
 	int closed = 1; /* until proven otherwise */
 
-	writer->bitmaps = kh_init_oid_map();
 	writer->to_pack = to_pack;
 
 	if (writer->show_progress)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3091095f336..f87e60153dd 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -114,7 +114,7 @@ struct bitmap_writer {
 	unsigned char pack_checksum[GIT_MAX_RAWSZ];
 };
 
-void bitmap_writer_init(struct bitmap_writer *writer);
+void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r);
 void bitmap_writer_show_progress(struct bitmap_writer *writer, int show);
 void bitmap_writer_set_checksum(struct bitmap_writer *writer,
 				const unsigned char *sha1);

From patchwork Thu May 23 21:26:26 2024
Date: Thu, 23 May 2024 17:26:26 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 06/24] pseudo-merge.ch: initial commit
Message-ID: <6647d8832ce1b70c4b46bd1191086f7e4bc19a34.1716499565.git.me@ttaylorr.com>

Add a new (empty) header file to contain the implementation for
selecting, reading, and applying pseudo-merge bitmaps.

For now this header and its corresponding implementation are left empty,
but they will evolve over the course of subsequent commit(s).

Signed-off-by: Taylor Blau
---
 Makefile       | 1 +
 pseudo-merge.c | 2 ++
 pseudo-merge.h | 6 ++++++
 3 files changed, 9 insertions(+)
 create mode 100644 pseudo-merge.c
 create mode 100644 pseudo-merge.h

diff --git a/Makefile b/Makefile
index 0285db56306..4705a69f57f 100644
--- a/Makefile
+++ b/Makefile
@@ -1105,6 +1105,7 @@ LIB_OBJS += prompt.o
 LIB_OBJS += protocol.o
 LIB_OBJS += protocol-caps.o
 LIB_OBJS += prune-packed.o
+LIB_OBJS += pseudo-merge.o
 LIB_OBJS += quote.o
 LIB_OBJS += range-diff.o
 LIB_OBJS += reachable.o
diff --git a/pseudo-merge.c b/pseudo-merge.c
new file mode 100644
index 00000000000..37e037ba272
--- /dev/null
+++ b/pseudo-merge.c
@@ -0,0 +1,2 @@
+#include "git-compat-util.h"
+#include "pseudo-merge.h"
diff --git a/pseudo-merge.h b/pseudo-merge.h
new file mode 100644
index 00000000000..cab8ff6960a
--- /dev/null
+++ b/pseudo-merge.h
@@ -0,0 +1,6 @@
+#ifndef PSEUDO_MERGE_H
+#define PSEUDO_MERGE_H
+
+#include "git-compat-util.h"
+
+#endif

From patchwork Thu May 23 21:26:29 2024
Date: Thu, 23 May 2024 17:26:29 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 07/24] pack-bitmap-write: support storing pseudo-merge commits

Prepare to write pseudo-merge bitmaps by annotating individual bitmapped
commits (which are represented by the `bitmapped_commit` structure) with
an extra bit indicating whether or not they are a pseudo-merge.

In subsequent commits, pseudo-merge bitmaps will be generated by
allocating a fake commit node with parents covering the full set of
commits represented by the pseudo-merge bitmap. These commits will be
added to the set of "selected" commits as usual, but will be written
specially instead of being included with the rest of the selected
commits.

Mechanically speaking, there are two parts of this change:

  - The bitmapped_commit struct gets a new bit indicating whether it is
    a pseudo-merge, or an ordinary commit selected for bitmaps.

  - A handful of changes to only write out the non-pseudo-merge commits
    when enumerating through the selected array (see the new
    `bitmap_writer_nr_selected_commits()` function). Pseudo-merge
    commits appear after all non-pseudo-merge commits, so it is safe to
    enumerate through the selected array like so:

        for (i = 0; i < bitmap_writer_nr_selected_commits(&writer); i++)
                if (writer.selected[i].pseudo_merge)
                        BUG("unexpected pseudo-merge");

    without encountering the BUG().
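The counting invariant behind `bitmap_writer_nr_selected_commits()` can be sketched with a trimmed-down, hypothetical model. The `demo_*` names are invented, and the real `struct bitmap_writer` has many more fields. Ordinary commits are appended first and pseudo-merges last. Under that ordering, subtracting the pseudo-merge count from the total gives the length of a prefix that holds only ordinary commits. One assumption is labeled here: this sketch bumps `pseudo_merges_nr` inside `demo_push()`. The patch only adds that field and leaves populating it to later commits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for bitmapped_commit/bitmap_writer. */
struct demo_commit {
	const char *name;
	unsigned pseudo_merge : 1;
};

struct demo_writer {
	struct demo_commit selected[8];
	size_t selected_nr;
	size_t pseudo_merges_nr;
};

/* Append a selected entry; pseudo-merges must come after ordinary commits. */
static void demo_push(struct demo_writer *w, const char *name,
		      unsigned pseudo_merge)
{
	assert(w->selected_nr < sizeof(w->selected) / sizeof(w->selected[0]));
	/* Ordering invariant: no ordinary commit after a pseudo-merge. */
	assert(pseudo_merge || !w->pseudo_merges_nr);

	w->selected[w->selected_nr].name = name;
	w->selected[w->selected_nr].pseudo_merge = !!pseudo_merge;
	w->selected_nr++;
	if (pseudo_merge)
		w->pseudo_merges_nr++; /* assumption: counter maintained here */
}

/* Number of ordinary (non-pseudo-merge) entries, which form a prefix. */
static size_t demo_nr_selected_commits(const struct demo_writer *w)
{
	return w->selected_nr - w->pseudo_merges_nr;
}

/* Walk the ordinary prefix, asserting that it holds no pseudo-merge. */
static size_t demo_count_ordinary(const struct demo_writer *w)
{
	size_t i;

	for (i = 0; i < demo_nr_selected_commits(w); i++)
		assert(!w->selected[i].pseudo_merge);
	return i;
}
```

Keeping pseudo-merges at the tail lets every writer loop reuse the same array and simply stop early, instead of filtering entries one by one.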
Signed-off-by: Taylor Blau
---
 object.h            |  2 +-
 pack-bitmap-write.c | 96 +++++++++++++++++++++++++++++----------------
 pack-bitmap.h       |  3 ++
 3 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/object.h b/object.h
index 99b9c8f114c..e6f9e89d3c5 100644
--- a/object.h
+++ b/object.h
@@ -81,7 +81,7 @@ void object_array_init(struct object_array *array);
  * reflog.c:                           10--12
  * builtin/show-branch.c:    0-------------------------------------------26
  * builtin/unpack-objects.c:                               2021
- * pack-bitmap.h:                                              22
+ * pack-bitmap.h:                                            2122
  */
 #define FLAG_BITS 28
 
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d8870155831..60eb1e71c98 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -25,8 +25,14 @@ struct bitmapped_commit {
 	int flags;
 	int xor_offset;
 	uint32_t commit_pos;
+	unsigned pseudo_merge : 1;
 };
 
+static inline int bitmap_writer_nr_selected_commits(struct bitmap_writer *writer)
+{
+	return writer->selected_nr - writer->pseudo_merges_nr;
+}
+
 void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
 {
 	memset(writer, 0, sizeof(struct bitmap_writer));
@@ -129,27 +135,31 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
  */
 
 static inline void push_bitmapped_commit(struct bitmap_writer *writer,
-					 struct commit *commit)
+					 struct commit *commit,
+					 unsigned pseudo_merge)
 {
-	int hash_ret;
-	khiter_t hash_pos;
-
 	if (writer->selected_nr >= writer->selected_alloc) {
 		writer->selected_alloc = (writer->selected_alloc + 32) * 2;
 		REALLOC_ARRAY(writer->selected, writer->selected_alloc);
 	}
 
-	hash_pos = kh_put_oid_map(writer->bitmaps, commit->object.oid,
-				  &hash_ret);
-	if (!hash_ret)
-		die(_("duplicate entry when writing bitmap index: %s"),
-		    oid_to_hex(&commit->object.oid));
-	kh_value(writer->bitmaps, hash_pos) = NULL;
+	if (!pseudo_merge) {
+		int hash_ret;
+		khiter_t hash_pos = kh_put_oid_map(writer->bitmaps,
+						   commit->object.oid,
+						   &hash_ret);
+
+		if (!hash_ret)
+			die(_("duplicate entry when writing bitmap index: %s"),
+			    oid_to_hex(&commit->object.oid));
+		kh_value(writer->bitmaps, hash_pos) = NULL;
+	}
 
 	writer->selected[writer->selected_nr].commit = commit;
 	writer->selected[writer->selected_nr].bitmap = NULL;
 	writer->selected[writer->selected_nr].write_as = NULL;
 	writer->selected[writer->selected_nr].flags = 0;
+	writer->selected[writer->selected_nr].pseudo_merge = pseudo_merge;
 
 	writer->selected_nr++;
 }
@@ -180,16 +190,20 @@ static void compute_xor_offsets(struct bitmap_writer *writer)
 
 	while (next < writer->selected_nr) {
 		struct bitmapped_commit *stored = &writer->selected[next];
-
 		int best_offset = 0;
 		struct ewah_bitmap *best_bitmap = stored->bitmap;
 		struct ewah_bitmap *test_xor;
 
+		if (stored->pseudo_merge)
+			goto next;
+
 		for (i = 1; i <= MAX_XOR_OFFSET_SEARCH; ++i) {
 			int curr = next - i;
 
 			if (curr < 0)
 				break;
+			if (writer->selected[curr].pseudo_merge)
+				continue;
 
 			test_xor = ewah_pool_new();
 			ewah_xor(writer->selected[curr].bitmap, stored->bitmap, test_xor);
@@ -205,6 +219,7 @@ static void compute_xor_offsets(struct bitmap_writer *writer)
 			}
 		}
 
+next:
 		stored->xor_offset = best_offset;
 		stored->write_as = best_bitmap;
 
@@ -217,7 +232,8 @@ struct bb_commit {
 	struct bitmap *commit_mask;
 	struct bitmap *bitmap;
 	unsigned selected:1,
-		 maximal:1;
+		 maximal:1,
+		 pseudo_merge:1;
 	unsigned idx; /* within selected array */
 };
 
@@ -255,17 +271,18 @@ static void bitmap_builder_init(struct bitmap_builder *bb,
 	revs.first_parent_only = 1;
 
 	for (i = 0; i < writer->selected_nr; i++) {
-		struct commit *c = writer->selected[i].commit;
-		struct bb_commit *ent = bb_data_at(&bb->data, c);
+		struct bitmapped_commit *bc = &writer->selected[i];
+		struct bb_commit *ent = bb_data_at(&bb->data, bc->commit);
 
 		ent->selected = 1;
 		ent->maximal = 1;
+		ent->pseudo_merge = bc->pseudo_merge;
 		ent->idx = i;
 
 		ent->commit_mask = bitmap_new();
 		bitmap_set(ent->commit_mask, i);
 
-		add_pending_object(&revs, &c->object, "");
+		add_pending_object(&revs, &bc->commit->object, "");
 	}
 
 	if (prepare_revision_walk(&revs))
@@ -444,8 +461,13 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 		struct commit *c = prio_queue_get(queue);
 
 		if (old_bitmap && mapping) {
-			struct ewah_bitmap *old = bitmap_for_commit(old_bitmap, c);
+			struct ewah_bitmap *old;
 			struct bitmap *remapped = bitmap_new();
+
+			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+				old = NULL;
+			else
+				old = bitmap_for_commit(old_bitmap, c);
 			/*
 			 * If this commit has an old bitmap, then translate that
 			 * bitmap and add its bits to this one. No need to walk
@@ -464,12 +486,14 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		pos = find_object_pos(writer, &c->object.oid, &found);
-		if (!found)
-			return -1;
-		bitmap_set(ent->bitmap, pos);
-		prio_queue_put(tree_queue,
-			       repo_get_commit_tree(the_repository, c));
+		if (!(c->object.flags & BITMAP_PSEUDO_MERGE)) {
+			pos = find_object_pos(writer, &c->object.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(ent->bitmap, pos);
+			prio_queue_put(tree_queue,
+				       repo_get_commit_tree(the_repository, c));
+		}
 
 		for (p = c->parents; p; p = p->next) {
 			pos = find_object_pos(writer, &p->item->object.oid,
@@ -499,6 +523,9 @@ static void store_selected(struct bitmap_writer *writer,
 
 	stored->bitmap = bitmap_to_ewah(ent->bitmap);
 
+	if (ent->pseudo_merge)
+		return;
+
 	hash_pos = kh_get_oid_map(writer->bitmaps, commit->object.oid);
 	if (hash_pos == kh_end(writer->bitmaps))
 		die(_("attempted to store non-selected commit: '%s'"),
@@ -631,7 +658,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 
 	if (indexed_commits_nr < 100) {
 		for (i = 0; i < indexed_commits_nr; ++i)
-			push_bitmapped_commit(writer, indexed_commits[i]);
+			push_bitmapped_commit(writer, indexed_commits[i], 0);
 		return;
 	}
 
@@ -664,7 +691,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 			}
 		}
 
-		push_bitmapped_commit(writer, chosen);
+		push_bitmapped_commit(writer, chosen, 0);
 
 		i += next + 1;
 		display_progress(writer->progress, i);
@@ -701,8 +728,11 @@ static void write_selected_commits_v1(struct bitmap_writer *writer,
 {
 	int i;
 
-	for (i = 0; i < writer->selected_nr; ++i) {
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); ++i) {
 		struct bitmapped_commit *stored = &writer->selected[i];
+		if (stored->pseudo_merge)
+			BUG("unexpected pseudo-merge among selected: %s",
+			    oid_to_hex(&stored->commit->object.oid));
 
 		if (offsets)
 			offsets[i] = hashfile_total(f);
@@ -735,10 +765,10 @@ static void write_lookup_table(struct bitmap_writer *writer, struct hashfile *f,
 	uint32_t i;
 	uint32_t *table, *table_inv;
 
-	ALLOC_ARRAY(table, writer->selected_nr);
-	ALLOC_ARRAY(table_inv, writer->selected_nr);
+	ALLOC_ARRAY(table, bitmap_writer_nr_selected_commits(writer));
+	ALLOC_ARRAY(table_inv, bitmap_writer_nr_selected_commits(writer));
 
-	for (i = 0; i < writer->selected_nr; i++)
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++)
 		table[i] = i;
 
 	/*
@@ -746,16 +776,16 @@ static void write_lookup_table(struct bitmap_writer *writer, struct hashfile *f,
 	 * bitmap corresponds to j'th bitmapped commit (among the selected
 	 * commits) in lex order of OIDs.
 	 */
-	QSORT_S(table, writer->selected_nr, table_cmp, writer);
+	QSORT_S(table, bitmap_writer_nr_selected_commits(writer), table_cmp, writer);
 
 	/* table_inv helps us discover that relationship (i'th bitmap
 	 * to j'th commit by j = table_inv[i])
 	 */
-	for (i = 0; i < writer->selected_nr; i++)
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++)
 		table_inv[table[i]] = i;
 
 	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
-	for (i = 0; i < writer->selected_nr; i++) {
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
 		struct bitmapped_commit *selected = &writer->selected[table[i]];
 		uint32_t xor_offset = selected->xor_offset;
 		uint32_t xor_row;
@@ -827,7 +857,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
 	header.version = htons(default_version);
 	header.options = htons(flags | options);
-	header.entry_count = htonl(writer->selected_nr);
+	header.entry_count = htonl(bitmap_writer_nr_selected_commits(writer));
 	hashcpy(header.checksum, writer->pack_checksum);
 
 	hashwrite(f, &header, sizeof(header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz);
@@ -839,7 +869,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		CALLOC_ARRAY(offsets, index_nr);
 
-	for (i = 0; i < writer->selected_nr; i++) {
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
 		struct bitmapped_commit *stored = &writer->selected[i];
 		int commit_pos = oid_pos(&stored->commit->object.oid,
 					 index, index_nr, oid_access);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index f87e60153dd..6937a0f090f 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -21,6 +21,7 @@ struct bitmap_disk_header {
 	unsigned char checksum[GIT_MAX_RAWSZ];
 };
 
+#define BITMAP_PSEUDO_MERGE (1u<<21)
 #define NEEDS_BITMAP (1u<<22)
 
 /*
@@ -109,6 +110,8 @@ struct bitmap_writer {
 	struct bitmapped_commit *selected;
 	unsigned int selected_nr, selected_alloc;
 
+	uint32_t pseudo_merges_nr;
+
 	struct progress *progress;
 	int show_progress;
 	unsigned char pack_checksum[GIT_MAX_RAWSZ];

From patchwork Thu May 23 21:26:32 2024
header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="Wjdg123k" Received: by mail-qv1-f46.google.com with SMTP id 6a1803df08f44-6ab9d3d878bso1491676d6.2 for ; Thu, 23 May 2024 14:26:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1716499594; x=1717104394; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=qU7wYPHttZ3eDvUe397jdb9pOvr2Pwy1R/Dotu6BEkw=; b=Wjdg123kuL+WeigOfKVDs+jMaYFOiYjqq8+tA4s3vynxeSdkYxI3FeIXz/T8yBImn8 EGOZJFllfaqB0PPXKgsBnuYxOdNQCeiYmWN5OhIbCU7vnpDEEOtrhh2KOpnTq9BoQ3e+ tSDaZUYuUcKECeLPZrhwLX+uMN5qcSNjFubyNK1gCIImj2+RRaCcUzwsKVtm3P7sPePy eGpF0T/E1ZH74U9us+hceQBmyAzY0eYcUz2WcHdZ+qVnx9xn96EwLq03KEqVRX2SkNlg xw0vQIoUEr8OS7lYw96F9X2YyVLQnDh4KZoh/eyRw1yfDnwiWR7Uyp8waEGkBgeAbEH5 usdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716499594; x=1717104394; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=qU7wYPHttZ3eDvUe397jdb9pOvr2Pwy1R/Dotu6BEkw=; b=diKiNXJQxv4Iq/xjHoWdSXwz9st1QHu7oX7ACzZI2ipTt2UppEAin06atPJgzDVsoe i/7tT9GcRsGyJG07RyZMkh7C1ejezDNyrdDJuBahAE2acTmODjRgZOqTj2Psk/E+T+IS F6yGKi20kdszXTF0l+ARgseqK2GkG4xb0xaWlm+yxomLsCctZ7Wrtdn4Ns79e+Ofmj3e b8PJPSAcfOK0g4uwSgYyP0JgN8nHM64SEdBWrRoOjgB2R3jmjLXnkTPfebUjRZgIwW5X JXjipsEklzrc2aHqIvXvMmhm8tuY6jcG5qVY+Um6lrGDXfOtrlDP+8xh6aFXYj/qw/ne +x3g== X-Gm-Message-State: AOJu0YybN7dX68FawIJQC0EmSI9lm2K0mUe52UM3RdkZA/s++ZU9kQ4u HwGtLcEgZebcXAsupmlNkgTJ0x0vAZskVPUW2P3+ybTLngn4iruY12zIwi9j5jFlOqT/jVu1VOh j X-Google-Smtp-Source: AGHT+IHMxxn/DhteTU6dcR8uBSBOeU0aIdvNkQcqkO59xaSdWt/wcsHtF/XH3P8Fii39dkqEciiUsw== X-Received: by 2002:a05:6214:4888:b0:6aa:f601:ce51 with SMTP id 6a1803df08f44-6abcd0d2d37mr4237956d6.62.1716499594292; Thu, 23 
May 2024 14:26:34 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6ac070f4b7dsm606516d6.67.2024.05.23.14.26.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 14:26:33 -0700 (PDT) Date: Thu, 23 May 2024 17:26:32 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Prepare to implement pseudo-merge bitmap selection by implementing a necessary new function, `bitmap_writer_has_bitmapped_object_id()`. This function returns whether or not the bitmap_writer selected the given object ID for bitmapping. This will allow the pseudo-merge machinery to reject candidates for pseudo-merges if they have already been selected as an ordinary bitmap tip. 
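The check described above is a set-membership test. The patch performs it with a khash lookup (`kh_get_oid_map()` on `writer->bitmaps`). The C below is a minimal, self-contained sketch of the same idea under stated assumptions:

- It replaces khash with a sorted array searched with `bsearch()`.
- It assumes SHA-1-sized (20-byte) object IDs.
- `struct toy_oid`, `toy_has_bitmapped()` and `toy_filter_candidates()` are hypothetical names, not Git APIs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's struct object_id (SHA-1 sized). */
struct toy_oid {
	unsigned char hash[20];
};

static int toy_oid_cmp(const void *a, const void *b)
{
	return memcmp(((const struct toy_oid *)a)->hash,
		      ((const struct toy_oid *)b)->hash, 20);
}

/*
 * Returns non-zero if 'oid' is among the 'nr' selected bitmap tips.
 * 'selected' must be sorted with toy_oid_cmp(); the patch uses a hash
 * map instead, which needs no sorting.
 */
static int toy_has_bitmapped(const struct toy_oid *selected, size_t nr,
			     const struct toy_oid *oid)
{
	return bsearch(oid, selected, nr, sizeof(*selected),
		       toy_oid_cmp) != NULL;
}

/*
 * Compacts 'candidates' in place, dropping any that already have an
 * ordinary bitmap, and returns how many remain.
 */
static size_t toy_filter_candidates(const struct toy_oid *selected,
				    size_t selected_nr,
				    struct toy_oid *candidates, size_t nr)
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++)
		if (!toy_has_bitmapped(selected, selected_nr, &candidates[i]))
			candidates[kept++] = candidates[i];
	return kept;
}
```

A hash map, as used in the patch, gives expected constant-time lookups without the sorting requirement. The sorted array only keeps the sketch short.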
Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 6 ++++++ pack-bitmap.h | 2 ++ 2 files changed, 8 insertions(+) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 60eb1e71c98..299aa8af6f5 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -130,6 +130,12 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer, } } +int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer, + const struct object_id *oid) +{ + return kh_get_oid_map(writer->bitmaps, *oid) != kh_end(writer->bitmaps); +} + /** * Compute the actual bitmaps */ diff --git a/pack-bitmap.h b/pack-bitmap.h index 6937a0f090f..e175f28e0de 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -125,6 +125,8 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer, struct packing_data *to_pack, struct pack_idx_entry **index, uint32_t index_nr); +int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer, + const struct object_id *oid); uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, struct packing_data *mapping); int rebuild_bitmap(const uint32_t *reposition, From patchwork Thu May 23 21:26:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13672312
Date: Thu, 23 May 2024 17:26:36 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Message-ID: <6bf372f4020dda272ab4f69cb42333465475dd91.1716499565.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The pseudo-merge selection code will be added in a subsequent commit, and will need a way to push the allocated commit structures into the bitmap writer from a separate compilation unit.
Rename the static `push_bitmapped_commit()` helper to `bitmap_writer_push_commit()` and declare it in the pack-bitmap.h header in order to make this possible. Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 9 ++++----- pack-bitmap.h | 2 ++ 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 299aa8af6f5..bc19b33ad16 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -140,9 +140,8 @@ int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer, * Compute the actual bitmaps */ -static inline void push_bitmapped_commit(struct bitmap_writer *writer, - struct commit *commit, - unsigned pseudo_merge) +void bitmap_writer_push_commit(struct bitmap_writer *writer, + struct commit *commit, unsigned pseudo_merge) { if (writer->selected_nr >= writer->selected_alloc) { writer->selected_alloc = (writer->selected_alloc + 32) * 2; @@ -664,7 +663,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer, if (indexed_commits_nr < 100) { for (i = 0; i < indexed_commits_nr; ++i) - push_bitmapped_commit(writer, indexed_commits[i], 0); + bitmap_writer_push_commit(writer, indexed_commits[i], 0); return; } @@ -697,7 +696,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer, } } - push_bitmapped_commit(writer, chosen, 0); + bitmap_writer_push_commit(writer, chosen, 0); i += next + 1; display_progress(writer->progress, i); diff --git a/pack-bitmap.h b/pack-bitmap.h index e175f28e0de..a7e2f56c971 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -127,6 +127,8 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer, uint32_t index_nr); int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer, const struct object_id *oid); +void bitmap_writer_push_commit(struct bitmap_writer *writer, + struct commit *commit, unsigned pseudo_merge); uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, struct packing_data *mapping); int rebuild_bitmap(const uint32_t 
*reposition, From patchwork Thu May 23 21:26:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13672313
Date: Thu, 23 May 2024 17:26:39 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 10/24] config: introduce `git_config_double()` Message-ID: <6c77671ae9ce063d5659ab0542d4bfa3e1303995.1716499565.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Future commits will want to parse a double-precision floating point value from configuration, but we have no way to parse such a value prior to this patch. The core of the routine is implemented in git_parse_double(). Unlike git_parse_unsigned() and git_parse_signed(), however, the function implemented here only works on type "double", and not related types like "float" or "long double". This is because "float" and "long double" use different functions to convert from ASCII strings to floating point values (strtof() and strtold(), respectively). Likewise, there is no pointer type that can assign to any of these values (except for "void *"), so the only way to define this trio of functions would be with a macro expansion that is parameterized over the floating point type and conversion function. That is all doable, but likely to be overkill given our current needs, which are only to parse double-precision floats.
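The commit message notes that also supporting `float` and `long double` would need a macro parameterized over the type and its conversion function. Below is a minimal sketch of what that rejected alternative might look like. `strtof()`, `strtod()` and `strtold()` are the standard C converters.

The following are assumptions, not part of the patch:

- The unit-suffix rule is simplified: an optional single `k`, `m` or `g`, as powers of 1024. The patch itself calls Git's internal `get_unit_factor()`.
- `DEFINE_FLOAT_PARSER`, `toy_unit_factor()` and the generated `toy_parse_*` functions are hypothetical names.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Simplified unit-suffix handling (an assumption, loosely modeled on the
 * patch's use of get_unit_factor()): no suffix, or a single k, m or g,
 * case-insensitive, as powers of 1024. Returns 0 for anything else.
 */
static unsigned long toy_unit_factor(const char *end)
{
	if (!*end)
		return 1;
	if (end[1])
		return 0;
	switch (*end) {
	case 'k': case 'K':
		return 1024UL;
	case 'm': case 'M':
		return 1024UL * 1024;
	case 'g': case 'G':
		return 1024UL * 1024 * 1024;
	default:
		return 0;
	}
}

/*
 * Defines 'name' as a parser for 'type' using the converter 'conv'.
 * Like git_parse_double(), it returns 1 on success and 0 on failure,
 * leaving errno set to EINVAL or ERANGE.
 */
#define DEFINE_FLOAT_PARSER(name, type, conv) \
static int name(const char *value, type *ret) \
{ \
	char *end; \
	type val; \
	unsigned long factor; \
 \
	if (!value || !*value) { \
		errno = EINVAL; \
		return 0; \
	} \
	errno = 0; \
	val = conv(value, &end); \
	if (errno == ERANGE) \
		return 0; \
	if (end == value) { \
		errno = EINVAL; \
		return 0; \
	} \
	factor = toy_unit_factor(end); \
	if (!factor) { \
		errno = EINVAL; \
		return 0; \
	} \
	*ret = val * factor; \
	return 1; \
}

/* One instantiation per type, each with its own standard converter. */
DEFINE_FLOAT_PARSER(toy_parse_float, float, strtof)
DEFINE_FLOAT_PARSER(toy_parse_double, double, strtod)
DEFINE_FLOAT_PARSER(toy_parse_long_double, long double, strtold)
```

Apart from the type and the converter, the macro body follows the same sequence of checks as the patch's `git_parse_double()`:

1. Reject an empty value.
2. Reset `errno` and convert.
3. Fail on `ERANGE`.
4. Fail if no digits were consumed.
5. Apply the unit factor.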
Signed-off-by: Taylor Blau --- config.c | 9 +++++++++ config.h | 7 +++++++ parse.c | 29 +++++++++++++++++++++++++++++ parse.h | 1 + 4 files changed, 46 insertions(+) diff --git a/config.c b/config.c index 77a0fd2d80e..7df89f17275 100644 --- a/config.c +++ b/config.c @@ -1243,6 +1243,15 @@ ssize_t git_config_ssize_t(const char *name, const char *value, return ret; } +double git_config_double(const char *name, const char *value, + const struct key_value_info *kvi) +{ + double ret; + if (!git_parse_double(value, &ret)) + die_bad_number(name, value, kvi); + return ret; +} + static const struct fsync_component_name { const char *name; enum fsync_component component_bits; diff --git a/config.h b/config.h index f4966e37494..f5f306f373d 100644 --- a/config.h +++ b/config.h @@ -261,6 +261,13 @@ unsigned long git_config_ulong(const char *, const char *, ssize_t git_config_ssize_t(const char *, const char *, const struct key_value_info *); +/** + * Identical to `git_config_int`, but for double-precision floating point + * values. + */ +double git_config_double(const char *, const char *, + const struct key_value_info *); + /** * Same as `git_config_bool`, except that integers are returned as-is, and * an `is_bool` flag is unset.
diff --git a/parse.c b/parse.c index 42d691a0fbb..7a60a4f816c 100644 --- a/parse.c +++ b/parse.c @@ -125,6 +125,35 @@ int git_parse_ssize_t(const char *value, ssize_t *ret) return 1; } +int git_parse_double(const char *value, double *ret) +{ + char *end; + double val; + uintmax_t factor; + + if (!value || !*value) { + errno = EINVAL; + return 0; + } + + errno = 0; + val = strtod(value, &end); + if (errno == ERANGE) + return 0; + if (end == value) { + errno = EINVAL; + return 0; + } + factor = get_unit_factor(end); + if (!factor) { + errno = EINVAL; + return 0; + } + val *= factor; + *ret = val; + return 1; +} + int git_parse_maybe_bool_text(const char *value) { if (!value) diff --git a/parse.h b/parse.h index 07d2193d698..6bb9a54d9ac 100644 --- a/parse.h +++ b/parse.h @@ -6,6 +6,7 @@ int git_parse_ssize_t(const char *, ssize_t *); int git_parse_ulong(const char *, unsigned long *); int git_parse_int(const char *value, int *ret); int git_parse_int64(const char *value, int64_t *ret); +int git_parse_double(const char *value, double *ret); /** * Same as `git_config_bool`, except that it returns -1 on error rather From patchwork Thu May 23 21:26:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13672314
Date: Thu, 23 May 2024 17:26:42 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 11/24] pseudo-merge: implement support for selecting pseudo-merge commits Message-ID: <180072ce84868265acfda8c1adf375e39a3b7610.1716499565.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Teach the new pseudo-merge machinery how to select non-bitmapped commits for inclusion in different pseudo-merge group(s) based on a handful of criteria.
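The criteria are spelled out in the `bitmapPseudoMerge.*` documentation added by this patch:

- Unstable candidates must not already be bitmapped, must be older than the `threshold` age cut-off, and are sampled at `sampleRate`.
- Stable candidates only need to be older than `stableThreshold`.

A classifier for a single reference tip that has already matched the pattern might look like the sketch below. The ordering of the checks and the caller-supplied random `draw` used for sampling are assumptions, not a description of Git's implementation. All names are hypothetical.

```c
#include <time.h>

/* Hypothetical per-group settings mirroring the documented options. */
struct toy_group_config {
	time_t threshold;        /* unstable tips must be at least this old */
	time_t stable_threshold; /* stable tips must be at least this old */
	double sample_rate;      /* proportion of eligible unstable tips kept */
};

enum toy_candidate { TOY_SKIP, TOY_UNSTABLE, TOY_STABLE };

/*
 * Classifies a reference tip that already matched the group's pattern.
 * 'draw' is a caller-supplied random number in [0, 1); comparing it with
 * sample_rate keeps roughly that proportion of eligible tips.
 */
static enum toy_candidate toy_classify(const struct toy_group_config *cfg,
				       time_t commit_date, int is_bitmapped,
				       double draw)
{
	/* Old enough to be stable: eligible even if already bitmapped. */
	if (commit_date <= cfg->stable_threshold)
		return TOY_STABLE;
	/* Unstable candidates must not already have an ordinary bitmap... */
	if (is_bitmapped)
		return TOY_SKIP;
	/* ...must be at least as old as 'threshold'... */
	if (commit_date > cfg->threshold)
		return TOY_SKIP;
	/* ...and must survive sampling. */
	return draw < cfg->sample_rate ? TOY_UNSTABLE : TOY_SKIP;
}
```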
Note that the selected pseudo-merge commits aren't actually used or written anywhere yet. This will be done in the following commit. Signed-off-by: Taylor Blau --- Documentation/config.txt | 2 + Documentation/config/bitmap-pseudo-merge.txt | 91 ++++ Documentation/gitpacking.txt | 83 ++++ pack-bitmap-write.c | 21 + pack-bitmap.h | 2 + pseudo-merge.c | 454 +++++++++++++++++++ pseudo-merge.h | 94 ++++ 7 files changed, 747 insertions(+) create mode 100644 Documentation/config/bitmap-pseudo-merge.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index 6f649c997c0..caa34311214 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -384,6 +384,8 @@ include::config/apply.txt[] include::config/attr.txt[] +include::config/bitmap-pseudo-merge.txt[] + include::config/blame.txt[] include::config/branch.txt[] diff --git a/Documentation/config/bitmap-pseudo-merge.txt b/Documentation/config/bitmap-pseudo-merge.txt new file mode 100644 index 00000000000..1f264eca99b --- /dev/null +++ b/Documentation/config/bitmap-pseudo-merge.txt @@ -0,0 +1,91 @@ +NOTE: The configuration options in `bitmapPseudoMerge.*` are considered +EXPERIMENTAL and may be subject to change or be removed entirely in the +future. For more information about the pseudo-merge bitmap feature, see +the "Pseudo-merge bitmaps" section of linkgit:gitpacking[7]. + +bitmapPseudoMerge.<name>.pattern:: + Regular expression used to match reference names. Commits + pointed to by references matching this pattern (and meeting + the below criteria, like `bitmapPseudoMerge.<name>.sampleRate` + and `bitmapPseudoMerge.<name>.threshold`) will be considered + for inclusion in a pseudo-merge bitmap. ++ +Commits are grouped into pseudo-merge groups based on whether or not +any reference(s) that point at a given commit match the pattern, which +is an extended regular expression. ++ +Within a pseudo-merge group, commits may be further grouped into +sub-groups based on the capture groups in the pattern. These
These +sub-groupings are formed from the regular expressions by concatenating +any capture groups from the regular expression, with a '-' dash in +between. ++ +For example, if the pattern is `refs/tags/`, then all tags (provided +they meet the below criteria) will be considered candidates for the +same pseudo-merge group. However, if the pattern is instead +`refs/remotes/([0-9])+/tags/`, then tags from different remotes will +be grouped into separate pseudo-merge groups, based on the remote +number. + +bitmapPseudoMerge..decay:: + Determines the rate at which consecutive pseudo-merge bitmap + groups decrease in size. Must be non-negative. This parameter + can be thought of as `k` in the function `f(n) = C * n^-k`, + where `f(n)` is the size of the `n`th group. ++ +Setting the decay rate equal to `0` will cause all groups to be the +same size. Setting the decay rate equal to `1` will cause the `n`th +group to be `1/n` the size of the initial group. Higher values of the +decay rate cause consecutive groups to shrink at an increasing rate. +The default is `1`. ++ +If all groups are the same size, it is possible that groups containing +newer commits will be able to be used less often than earlier groups, +since it is more likely that the references pointing at newer commits +will be updated more often than a reference pointing at an old commit. + +bitmapPseudoMerge..sampleRate:: + Determines the proportion of non-bitmapped commits (among + reference tips) which are selected for inclusion in an + unstable pseudo-merge bitmap. Must be between `0` and `1` + (inclusive). The default is `1`. + +bitmapPseudoMerge..threshold:: + Determines the minimum age of non-bitmapped commits (among + reference tips, as above) which are candidates for inclusion + in an unstable pseudo-merge bitmap. The default is + `1.week.ago`. + +bitmapPseudoMerge..maxMerges:: + Determines the maximum number of pseudo-merge commits among + which commits may be distributed. 
++ +For pseudo-merge groups whose pattern does not contain any capture +groups, this setting is applied for all commits matching the regular +expression. For patterns that have one or more capture groups, this +setting is applied separately to each distinct value of the capture group(s). ++ +For example, if your pattern is `refs/tags/`, then this setting +will distribute all tags into a maximum of `maxMerges` pseudo-merge +commits. However, if your pattern is, say, +`refs/remotes/([0-9]+)/tags/`, then this setting will be applied to +each remote's set of tags individually. ++ +Must be non-negative. The default value is 64. + +bitmapPseudoMerge.<name>.stableThreshold:: + Determines the minimum age of commits (among reference tips, + as above, however stable commits are still considered + candidates even when they have been covered by a bitmap) which + are candidates for a stable pseudo-merge bitmap. The default + is `1.month.ago`. ++ +Setting this threshold to a smaller value (e.g., 1.week.ago) will cause +more stable groups to be generated (which impose a one-time generation +cost) but those groups will likely become stale over time. Using a +larger value incurs the opposite penalty (fewer stable groups which are +more useful). + +bitmapPseudoMerge.<name>.stableSize:: + Determines the size (in number of commits) of a stable + pseudo-merge bitmap. The default is `512`. diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt index f24396f0173..4a6fcba6f72 100644 --- a/Documentation/gitpacking.txt +++ b/Documentation/gitpacking.txt @@ -96,6 +96,89 @@ can take advantage of the fact that we only care about the union of objects reachable from all of those tags, and answer the query much faster. +=== Configuration + +Reference tips are grouped into different pseudo-merge groups according
+ +Within a group, commits may be considered "stable", or "unstable" +depending on their age. These are adjusted by setting the +`bitmapPseudoMerge..stableThreshold` and +`bitmapPseudoMerge..threshold` configuration values, respectively. + +All stable commits are grouped into pseudo-merges of equal size +(`bitmapPseudoMerge..stableSize`). If the `stableSize` +configuration is set to, say, 100, then the first 100 commits (ordered +by committer date) which are older than the `stableThreshold` value will +form one group, the next 100 commits will form another group, and so on. + +Among unstable commits, the pseudo-merge machinery will attempt to +combine older commits into large groups as opposed to newer commits +which will appear in smaller groups. This is based on the heuristic that +references whose tip commit is older are less likely to be modified to +point at a different commit than a reference whose tip commit is newer. + +The size of groups is determined by a power-law decay function, and the +decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k/100)`, +where `f(n)` describes the size of the `n`-th pseudo-merge group. The +sample rate controls what percentage of eligible commits are considered +as candidates. The threshold parameter indicates the minimum age (so as +to avoid including too-recent commits in a pseudo-merge group, making it +less likely to be valid). The "maxMerges" parameter sets an upper-bound +on the number of pseudo-merge commits an individual group + +The "stable"-related parameters control "stable" pseudo-merge groups, +comprised of a fixed number of commits which are older than the +configured "stable threshold" value and may be grouped together in +chunks of "stableSize" in order of age. 
+ +The exact configuration for pseudo-merges is as follows: + +include::config/bitmap-pseudo-merge.txt[] + +=== Examples + +Suppose that you have a repository with a large number of references, +and you want a bare-bones configuration of pseudo-merge bitmaps that +will enhance bitmap coverage of the `refs/` namespace. You may start +with a configuration like so: + + [bitmapPseudoMerge "all"] + pattern = "refs/" + threshold = now + stableThreshold = never + sampleRate = 1 + maxMerges = 64 + +This will create pseudo-merge bitmaps for all references, regardless of +their age, and group them into at most 64 pseudo-merge commits. + +If you wanted to separate tags from branches when generating +pseudo-merge commits, you would instead define the pattern with a +capture group, like so: + + [bitmapPseudoMerge "all"] + pattern = "refs/(heads|tags)/" + +Suppose instead that you are working in a fork-network repository, with +each fork specified by some numeric ID, and whose refs reside in +`refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some +fork) in the network. In this instance, you may instead write something +like: + + [bitmapPseudoMerge "all"] + pattern = "refs/virtual/([0-9]+)/(heads|tags)/" + threshold = now + stableThreshold = never + sampleRate = 1 + maxMerges = 64 + +This would generate pseudo-merge group identifiers like "1234-heads" +and "5678-tags" (for branches in fork "1234" and tags in fork "5678", +respectively).
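Group identifiers like "1234-heads" in the example above come from joining whatever each capture group matched with a '-'. Below is a minimal sketch of that rule using POSIX `regcomp()`/`regexec()` with `REG_EXTENDED`. `toy_group_id()` is a hypothetical helper and the nine-group cap is arbitrary. Git's own matching code, which may use its bundled regex implementation and different flags, is not reproduced here.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Matches 'refname' against the extended regex 'pattern'. On a match,
 * writes the captured substrings joined by '-' into 'out' (an empty
 * string if the pattern has no groups) and returns 1. Returns 0 when
 * there is no match, and -1 if 'outsz' is 0 or 'pattern' fails to compile.
 */
static int toy_group_id(const char *pattern, const char *refname,
			char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m[10]; /* whole match plus up to nine capture groups */
	size_t i, len = 0;
	int first = 1, ret = 0;

	if (!outsz || regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	out[0] = '\0';
	if (!regexec(&re, refname, 10, m, 0)) {
		for (i = 1; i < 10 && i <= re.re_nsub; i++) {
			int n;

			if (m[i].rm_so < 0)
				continue; /* group did not take part in the match */
			n = snprintf(out + len, outsz - len, "%s%.*s",
				     first ? "" : "-",
				     (int)(m[i].rm_eo - m[i].rm_so),
				     refname + m[i].rm_so);
			if (n < 0 || (size_t)n >= outsz - len)
				break; /* buffer too small; 'out' is truncated */
			len += (size_t)n;
			first = 0;
		}
		ret = 1;
	}
	regfree(&re);
	return ret;
}
```

POSIX reports only the last iteration of a repeated subexpression. A pattern written as `refs/remotes/([0-9])+/tags/` would therefore capture just the final digit of a multi-digit remote number, so the capture group needs to enclose the `+`.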
+ SEE ALSO -------- linkgit:git-pack-objects[1] diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index bc19b33ad16..d5884ea5e9c 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -17,6 +17,7 @@ #include "trace2.h" #include "tree.h" #include "tree-walk.h" +#include "pseudo-merge.h" struct bitmapped_commit { struct commit *commit; @@ -39,11 +40,25 @@ void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r) if (writer->bitmaps) BUG("bitmap writer already initialized"); writer->bitmaps = kh_init_oid_map(); + writer->pseudo_merge_commits = kh_init_oid_map(); + + string_list_init_dup(&writer->pseudo_merge_groups); + + load_pseudo_merges_from_config(&writer->pseudo_merge_groups); +} + +static void free_pseudo_merge_commit_idx(struct pseudo_merge_commit_idx *idx) +{ + if (!idx) + return; + free(idx->pseudo_merge); + free(idx); } void bitmap_writer_free(struct bitmap_writer *writer) { uint32_t i; + struct pseudo_merge_commit_idx *idx; if (!writer) return; @@ -55,6 +70,10 @@ void bitmap_writer_free(struct bitmap_writer *writer) kh_destroy_oid_map(writer->bitmaps); + kh_foreach_value(writer->pseudo_merge_commits, idx, + free_pseudo_merge_commit_idx(idx)); + kh_destroy_oid_map(writer->pseudo_merge_commits); + for (i = 0; i < writer->selected_nr; i++) { struct bitmapped_commit *bc = &writer->selected[i]; if (bc->write_as != bc->bitmap) @@ -703,6 +722,8 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer, } stop_progress(&writer->progress); + + select_pseudo_merges(writer, indexed_commits, indexed_commits_nr); } diff --git a/pack-bitmap.h b/pack-bitmap.h index a7e2f56c971..1e730ea1e54 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -110,6 +110,8 @@ struct bitmap_writer { struct bitmapped_commit *selected; unsigned int selected_nr, selected_alloc; + struct string_list pseudo_merge_groups; + kh_oid_map_t *pseudo_merge_commits; /* oid -> pseudo merge(s) */ uint32_t pseudo_merges_nr; struct progress *progress; diff --git 
a/pseudo-merge.c b/pseudo-merge.c index 37e037ba272..0f6854c753f 100644 --- a/pseudo-merge.c +++ b/pseudo-merge.c @@ -1,2 +1,456 @@ #include "git-compat-util.h" #include "pseudo-merge.h" +#include "date.h" +#include "oid-array.h" +#include "strbuf.h" +#include "config.h" +#include "string-list.h" +#include "refs.h" +#include "pack-bitmap.h" +#include "commit.h" +#include "alloc.h" +#include "progress.h" + +#define DEFAULT_PSEUDO_MERGE_DECAY 1.0 +#define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64 +#define DEFAULT_PSEUDO_MERGE_SAMPLE_RATE 1 +#define DEFAULT_PSEUDO_MERGE_THRESHOLD approxidate("1.week.ago") +#define DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD approxidate("1.month.ago") +#define DEFAULT_PSEUDO_MERGE_STABLE_SIZE 512 + +static double gitexp(double base, int exp) +{ + double result = 1; + while (1) { + if (exp % 2) + result *= base; + exp >>= 1; + if (!exp) + break; + base *= base; + } + return result; +} + +static uint32_t pseudo_merge_group_size(const struct pseudo_merge_group *group, + const struct pseudo_merge_matches *matches, + uint32_t i) +{ + double C = 0.0f; + uint32_t n; + + /* + * The size of pseudo-merge groups decays according to a power series, + * which looks like: + * + * f(n) = C * n^-k + * + * , where 'n' is the n-th pseudo-merge group, 'f(n)' is its size, 'k' + * is the decay rate, and 'C' is a scaling value. + * + * The value of C depends on the number of groups, decay rate, and total + * number of commits. It is computed such that if there are M and N + * total groups and commits, respectively, that: + * + * N = f(0) + f(1) + ... 
f(M-1) + * + * Rearranging to isolate C, we get: + * + * N = \sum_{n=1}^M C / n^k + * + * N / C = \sum_{n=1}^M n^-k + * + * C = N / \sum_{n=1}^M n^-k + * + * For example, if we have a decay rate of 'k' being equal to 1.5, 'N' + * total commits equal to 10,000, and 'M' being equal to 6 groups, then + * the (rounded) group sizes are: + * + * { 5469, 1934, 1053, 684, 489, 372 } + * + * increasing the number of total groups, say to 10, scales the group + * sizes appropriately: + * + * { 5012, 1772, 964, 626, 448, 341, 271, 221, 186, 158 } + */ + for (n = 0; n < group->max_merges; n++) + C += 1.0 / gitexp(n + 1, group->decay); + C = matches->unstable_nr / C; + + return (uint32_t)((C / gitexp(i + 1, group->decay)) + 0.5); +} + +static void pseudo_merge_group_init(struct pseudo_merge_group *group) +{ + memset(group, 0, sizeof(struct pseudo_merge_group)); + + strmap_init_with_options(&group->matches, NULL, 0); + + group->decay = DEFAULT_PSEUDO_MERGE_DECAY; + group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES; + group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE; + group->threshold = DEFAULT_PSEUDO_MERGE_THRESHOLD; + group->stable_threshold = DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD; + group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE; +} + +static int pseudo_merge_config(const char *var, const char *value, + const struct config_context *ctx, + void *cb_data) +{ + struct string_list *list = cb_data; + struct string_list_item *item; + struct pseudo_merge_group *group; + struct strbuf buf = STRBUF_INIT; + const char *sub, *key; + size_t sub_len; + int ret = 0; + + if (parse_config_key(var, "bitmappseudomerge", &sub, &sub_len, &key)) + goto done; + + if (!sub_len) + goto done; + + strbuf_add(&buf, sub, sub_len); + + item = string_list_lookup(list, buf.buf); + if (!item) { + item = string_list_insert(list, buf.buf); + + item->util = xmalloc(sizeof(struct pseudo_merge_group)); + pseudo_merge_group_init(item->util); + } + + group = item->util; + + if (!strcmp(key, 
"pattern")) { + struct strbuf re = STRBUF_INIT; + + free(group->pattern); + if (*value != '^') + strbuf_addch(&re, '^'); + strbuf_addstr(&re, value); + + group->pattern = xcalloc(1, sizeof(regex_t)); + if (regcomp(group->pattern, re.buf, REG_EXTENDED)) + die(_("failed to load pseudo-merge regex for %s: '%s'"), + sub, re.buf); + + strbuf_release(&re); + } else if (!strcmp(key, "decay")) { + group->decay = git_config_double(var, value, ctx->kvi); + if (group->decay < 0) { + warning(_("%s must be non-negative, using default"), var); + group->decay = DEFAULT_PSEUDO_MERGE_DECAY; + } + } else if (!strcmp(key, "samplerate")) { + group->sample_rate = git_config_double(var, value, ctx->kvi); + if (!(0 <= group->sample_rate && group->sample_rate <= 1)) { + warning(_("%s must be between 0 and 1, using default"), var); + group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE; + } + } else if (!strcmp(key, "threshold")) { + if (git_config_expiry_date(&group->threshold, var, value)) { + ret = -1; + goto done; + } + } else if (!strcmp(key, "maxmerges")) { + group->max_merges = git_config_int(var, value, ctx->kvi); + if (group->max_merges < 0) { + warning(_("%s must be non-negative, using default"), var); + group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES; + } + } else if (!strcmp(key, "stablethreshold")) { + if (git_config_expiry_date(&group->stable_threshold, var, value)) { + ret = -1; + goto done; + } + } else if (!strcmp(key, "stablesize")) { + group->stable_size = git_config_int(var, value, ctx->kvi); + if (group->stable_size <= 0) { + warning(_("%s must be positive, using default"), var); + group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE; + } + } + +done: + strbuf_release(&buf); + + return ret; +} + +void load_pseudo_merges_from_config(struct string_list *list) +{ + struct string_list_item *item; + + git_config(pseudo_merge_config, list); + + for_each_string_list_item(item, list) { + struct pseudo_merge_group *group = item->util; + if (!group->pattern) + 
die(_("pseudo-merge group '%s' missing required pattern"), + item->string); + if (group->threshold < group->stable_threshold) + die(_("pseudo-merge group '%s' has unstable threshold " + "before stable one"), item->string); + } +} + +static int find_pseudo_merge_group_for_ref(const char *refname, + const struct object_id *oid, + int flags UNUSED, + void *_data) +{ + struct bitmap_writer *writer = _data; + struct object_id peeled; + struct commit *c; + uint32_t i; + int has_bitmap; + + if (!peel_iterated_oid(oid, &peeled)) + oid = &peeled; + + c = lookup_commit(the_repository, oid); + if (!c) + return 0; + + has_bitmap = bitmap_writer_has_bitmapped_object_id(writer, oid); + + for (i = 0; i < writer->pseudo_merge_groups.nr; i++) { + struct pseudo_merge_group *group; + struct pseudo_merge_matches *matches; + struct strbuf group_name = STRBUF_INIT; + regmatch_t captures[16]; + size_t j; + + group = writer->pseudo_merge_groups.items[i].util; + if (regexec(group->pattern, refname, ARRAY_SIZE(captures), + captures, 0)) + continue; + + if (captures[ARRAY_SIZE(captures) - 1].rm_so != -1) + warning(_("pseudo-merge regex from config has too many capture " + "groups (max=%"PRIuMAX")"), + (uintmax_t)ARRAY_SIZE(captures) - 2); + + for (j = !!group->pattern->re_nsub; j < ARRAY_SIZE(captures); j++) { + regmatch_t *match = &captures[j]; + if (match->rm_so == -1) + continue; + + if (group_name.len) + strbuf_addch(&group_name, '-'); + + strbuf_add(&group_name, refname + match->rm_so, + match->rm_eo - match->rm_so); + } + + matches = strmap_get(&group->matches, group_name.buf); + if (!matches) { + matches = xcalloc(1, sizeof(*matches)); + strmap_put(&group->matches, strbuf_detach(&group_name, NULL), + matches); + } + + if (c->date <= group->stable_threshold) { + ALLOC_GROW(matches->stable, matches->stable_nr + 1, + matches->stable_alloc); + matches->stable[matches->stable_nr++] = c; + } else if (c->date <= group->threshold && !has_bitmap) { + ALLOC_GROW(matches->unstable, 
matches->unstable_nr + 1, + matches->unstable_alloc); + matches->unstable[matches->unstable_nr++] = c; + } + + strbuf_release(&group_name); + } + + return 0; +} + +static struct commit *push_pseudo_merge(struct pseudo_merge_group *group) +{ + struct commit *merge; + + ALLOC_GROW(group->merges, group->merges_nr + 1, group->merges_alloc); + + merge = alloc_commit_node(the_repository); + merge->object.parsed = 1; + merge->object.flags |= BITMAP_PSEUDO_MERGE; + + group->merges[group->merges_nr++] = merge; + + return merge; +} + +static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits, + const struct object_id *oid) + +{ + struct pseudo_merge_commit_idx *pmc; + int hash_ret; + khiter_t hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, + &hash_ret); + + if (hash_ret) { + CALLOC_ARRAY(pmc, 1); + kh_value(pseudo_merge_commits, hash_pos) = pmc; + } else { + pmc = kh_value(pseudo_merge_commits, hash_pos); + } + + return pmc; +} + +#define MIN_PSEUDO_MERGE_SIZE 8 + +static void select_pseudo_merges_1(struct bitmap_writer *writer, + struct pseudo_merge_group *group, + struct pseudo_merge_matches *matches) +{ + uint32_t i, j; + uint32_t stable_merges_nr; + + if (!matches->stable_nr && !matches->unstable_nr) + return; /* all tips in this group already have bitmaps */ + + stable_merges_nr = matches->stable_nr / group->stable_size; + if (matches->stable_nr % group->stable_size) + stable_merges_nr++; + + /* make stable_merges_nr pseudo merges for stable commits */ + for (i = 0, j = 0; i < stable_merges_nr; i++) { + struct commit *merge; + struct commit_list **p; + + merge = push_pseudo_merge(group); + p = &merge->parents; + + /* + * For each pseudo-merge created above, add parents to the + * allocated commit node from the stable set of commits + * (older than the stable threshold).
+ */ + do { + struct commit *c; + struct pseudo_merge_commit_idx *pmc; + + if (j >= matches->stable_nr) + break; + + c = matches->stable[j++]; + /* + * Here and below, make sure that we keep our mapping of + * commits -> pseudo-merge(s) which include the keyed + * commit up-to-date. + */ + pmc = pseudo_merge_idx(writer->pseudo_merge_commits, + &c->object.oid); + + ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc); + + pmc->pseudo_merge[pmc->nr++] = writer->pseudo_merges_nr; + p = commit_list_append(c, p); + } while (j % group->stable_size); + + bitmap_writer_push_commit(writer, merge, 1); + writer->pseudo_merges_nr++; + } + + /* make up to group->max_merges pseudo merges for unstable commits */ + for (i = 0, j = 0; i < group->max_merges; i++) { + struct commit *merge; + struct commit_list **p; + uint32_t size, end; + + merge = push_pseudo_merge(group); + p = &merge->parents; + + size = pseudo_merge_group_size(group, matches, i); + end = size < MIN_PSEUDO_MERGE_SIZE ? matches->unstable_nr : j + size; + + /* + * For each pseudo-merge commit created above, add parents to + * the allocated commit node from the unstable set of commits + * (newer than the stable threshold). + * + * Account for the sample rate, since not every candidate from + * the set of unstable commits will be included as a pseudo-merge + * parent.
+ */ + for (; j < end && j < matches->unstable_nr; j++) { + struct commit *c = matches->unstable[j]; + struct pseudo_merge_commit_idx *pmc; + + if (j % (uint32_t)(1.0 / group->sample_rate)) + continue; + + pmc = pseudo_merge_idx(writer->pseudo_merge_commits, + &c->object.oid); + + ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc); + + pmc->pseudo_merge[pmc->nr++] = writer->pseudo_merges_nr; + p = commit_list_append(c, p); + } + + bitmap_writer_push_commit(writer, merge, 1); + writer->pseudo_merges_nr++; + if (end >= matches->unstable_nr) + break; + } +} + +static int commit_date_cmp(const void *va, const void *vb) +{ + timestamp_t a = (*(const struct commit **)va)->date; + timestamp_t b = (*(const struct commit **)vb)->date; + + if (a < b) + return -1; + else if (a > b) + return 1; + return 0; +} + +static void sort_pseudo_merge_matches(struct pseudo_merge_matches *matches) +{ + QSORT(matches->stable, matches->stable_nr, commit_date_cmp); + QSORT(matches->unstable, matches->unstable_nr, commit_date_cmp); +} + +void select_pseudo_merges(struct bitmap_writer *writer, + struct commit **commits, size_t commits_nr) +{ + struct progress *progress = NULL; + uint32_t i; + + if (!writer->pseudo_merge_groups.nr) + return; + + if (writer->show_progress) + progress = start_progress("Selecting pseudo-merge commits", + writer->pseudo_merge_groups.nr); + + for_each_ref(find_pseudo_merge_group_for_ref, writer); + + for (i = 0; i < writer->pseudo_merge_groups.nr; i++) { + struct pseudo_merge_group *group; + struct hashmap_iter iter; + struct strmap_entry *e; + + group = writer->pseudo_merge_groups.items[i].util; + strmap_for_each_entry(&group->matches, &iter, e) { + struct pseudo_merge_matches *matches = e->value; + + sort_pseudo_merge_matches(matches); + + select_pseudo_merges_1(writer, group, matches); + } + + display_progress(progress, i + 1); + } + + stop_progress(&progress); +} diff --git a/pseudo-merge.h b/pseudo-merge.h index cab8ff6960a..f809cf42aeb 100644 --- 
a/pseudo-merge.h +++ b/pseudo-merge.h @@ -2,5 +2,99 @@ #define PSEUDO_MERGE_H #include "git-compat-util.h" +#include "strmap.h" +#include "khash.h" +#include "ewah/ewok.h" + +struct commit; +struct string_list; +struct bitmap_index; +struct bitmap_writer; + +/* + * A pseudo-merge group tracks the set of non-bitmapped reference tips + * that match the given pattern. + * + * The matches are further segmented by the values of their capture + * groups, with consecutive capture groups joined together by '-' + * dash characters. + * + * Those groups are then ordered by committer date and partitioned + * into individual pseudo-merge(s) according to the decay, max_merges, + * sample_rate, and threshold parameters. + */ +struct pseudo_merge_group { + regex_t *pattern; + + /* capture group(s) -> struct pseudo_merge_matches */ + struct strmap matches; + + /* + * The individual pseudo-merge(s) that are generated from the + * above array of matches, partitioned according to the below + * parameters. + */ + struct commit **merges; + size_t merges_nr; + size_t merges_alloc; + + /* + * Pseudo-merge grouping parameters. See git-config(1) for + * more information. + */ + double decay; + int max_merges; + double sample_rate; + int stable_size; + timestamp_t threshold; + timestamp_t stable_threshold; +}; + +struct pseudo_merge_matches { + struct commit **stable; + struct commit **unstable; + size_t stable_nr, stable_alloc; + size_t unstable_nr, unstable_alloc; +}; + +/* + * Reads the repository's configuration: + * + * - bitmapPseudoMerge.<name>.pattern + * - bitmapPseudoMerge.<name>.decay + * - bitmapPseudoMerge.<name>.sampleRate + * - bitmapPseudoMerge.<name>.threshold + * - bitmapPseudoMerge.<name>.maxMerges + * - bitmapPseudoMerge.<name>.stableThreshold + * - bitmapPseudoMerge.<name>.stableSize + * + * and populates the given `list` with pseudo-merge groups. String + * entry keys are the pseudo-merge group names, and the values are + * pointers to the pseudo_merge_group structure itself.
+ */ +void load_pseudo_merges_from_config(struct string_list *list); + +/* + * A pseudo-merge commit index (pseudo_merge_commit_idx) maps a + * particular (non-pseudo-merge) commit to the list of pseudo-merge(s) + * it appears in. + */ +struct pseudo_merge_commit_idx { + uint32_t *pseudo_merge; + size_t nr, alloc; +}; + +/* + * Selects pseudo-merges from a list of commits, populating the given + * string_list of pseudo-merge groups. + * + * Populates the pseudo_merge_commits map with a commit_idx + * corresponding to each commit in the list. Counts the total number + * of pseudo-merges generated. + * + * Optionally shows a progress meter. + */ +void select_pseudo_merges(struct bitmap_writer *writer, + struct commit **commits, size_t commits_nr); #endif
From patchwork Thu May 23 21:26:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13672315
Date: Thu, 23 May 2024 17:26:46 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano
Subject: [PATCH v4 12/24] pack-bitmap-write.c: write pseudo-merge table
Message-ID: <90df19e43f54ac3950e05c5a007e47650e75fe1a.1716499565.git.me@ttaylorr.com>

Now that the pack-bitmap writer machinery understands how to select and store pseudo-merge commits, teach it how to write the new optional pseudo-merge .bitmap extension.

No readers yet exist for this new extension to the .bitmap format. The following commits will take any necessary preparatory steps before implementing the routines to read this new table.
In the meantime, the new `write_pseudo_merges()` function implements writing this new format as described by a previous commit in Documentation/technical/bitmap-format.txt. Writing this table is fairly straightforward and consists of a few sub-components:

  - a pair of bitmaps for each pseudo-merge (one for the pseudo-merge "parents", and another for the objects reachable from those parents)

  - for each commit, its object position followed by the offset of either (a) the pseudo-merge it belongs to, or (b) an extended lookup table if it belongs to >1 pseudo-merge groups

  - if there are any commits belonging to >1 pseudo-merge group, the extended lookup tables (each consisting of a 4-byte count of the pseudo-merge groups a commit appears in, followed by that many 8-byte offsets)

Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 131 ++++++++++++++++++++++++++++++++++++++++++++ pack-bitmap.h | 1 + 2 files changed, 132 insertions(+) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index d5884ea5e9c..47250398aa2 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -18,6 +18,7 @@ #include "tree.h" #include "tree-walk.h" #include "pseudo-merge.h" +#include "oid-array.h" struct bitmapped_commit { struct commit *commit; @@ -771,6 +772,130 @@ static void write_selected_commits_v1(struct bitmap_writer *writer, } } +static void write_pseudo_merges(struct bitmap_writer *writer, + struct hashfile *f) +{ + struct oid_array commits = OID_ARRAY_INIT; + struct bitmap **commits_bitmap = NULL; + off_t *pseudo_merge_ofs = NULL; + off_t start, table_start, next_ext; + + uint32_t base = bitmap_writer_nr_selected_commits(writer); + size_t i, j = 0; + + CALLOC_ARRAY(commits_bitmap, writer->pseudo_merges_nr); + CALLOC_ARRAY(pseudo_merge_ofs, writer->pseudo_merges_nr); + + for (i = 0; i < writer->pseudo_merges_nr; i++) { + struct bitmapped_commit *merge = &writer->selected[base + i]; + struct commit_list *p; + + if (!merge->pseudo_merge) + BUG("found non-pseudo merge commit at %"PRIuMAX, (uintmax_t)i); +
commits_bitmap[i] = bitmap_new(); + + for (p = merge->commit->parents; p; p = p->next) + bitmap_set(commits_bitmap[i], + find_object_pos(writer, &p->item->object.oid, + NULL)); + } + + start = hashfile_total(f); + + for (i = 0; i < writer->pseudo_merges_nr; i++) { + struct ewah_bitmap *commits_ewah = bitmap_to_ewah(commits_bitmap[i]); + + pseudo_merge_ofs[i] = hashfile_total(f); + + dump_bitmap(f, commits_ewah); + dump_bitmap(f, writer->selected[base+i].write_as); + + ewah_free(commits_ewah); + } + + next_ext = st_add(hashfile_total(f), + st_mult(kh_size(writer->pseudo_merge_commits), + sizeof(uint64_t))); + + table_start = hashfile_total(f); + + commits.alloc = kh_size(writer->pseudo_merge_commits); + CALLOC_ARRAY(commits.oid, commits.alloc); + + for (i = kh_begin(writer->pseudo_merge_commits); i != kh_end(writer->pseudo_merge_commits); i++) { + if (!kh_exist(writer->pseudo_merge_commits, i)) + continue; + oid_array_append(&commits, &kh_key(writer->pseudo_merge_commits, i)); + } + + oid_array_sort(&commits); + + /* write lookup table (non-extended) */ + for (i = 0; i < commits.nr; i++) { + int hash_pos; + struct pseudo_merge_commit_idx *c; + + hash_pos = kh_get_oid_map(writer->pseudo_merge_commits, + commits.oid[i]); + if (hash_pos == kh_end(writer->pseudo_merge_commits)) + BUG("could not find pseudo-merge commit %s", + oid_to_hex(&commits.oid[i])); + + c = kh_value(writer->pseudo_merge_commits, hash_pos); + + hashwrite_be32(f, find_object_pos(writer, &commits.oid[i], + NULL)); + if (c->nr == 1) + hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[0]]); + else if (c->nr > 1) { + if (next_ext & ((uint64_t)1<<63)) + die(_("too many pseudo-merges")); + hashwrite_be64(f, next_ext | ((uint64_t)1<<63)); + next_ext = st_add3(next_ext, + sizeof(uint32_t), + st_mult(c->nr, sizeof(uint64_t))); + } else + BUG("expected commit '%s' to have at least one " + "pseudo-merge", oid_to_hex(&commits.oid[i])); + } + + /* write lookup table (extended) */ + for (i = 0; i < commits.nr; 
i++) { + int hash_pos; + struct pseudo_merge_commit_idx *c; + + hash_pos = kh_get_oid_map(writer->pseudo_merge_commits, + commits.oid[i]); + if (hash_pos == kh_end(writer->pseudo_merge_commits)) + BUG("could not find pseudo-merge commit %s", + oid_to_hex(&commits.oid[i])); + + c = kh_value(writer->pseudo_merge_commits, hash_pos); + if (c->nr == 1) + continue; + + hashwrite_be32(f, c->nr); + for (j = 0; j < c->nr; j++) + hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[j]]); + } + + /* write positions for all pseudo merges */ + for (i = 0; i < writer->pseudo_merges_nr; i++) + hashwrite_be64(f, pseudo_merge_ofs[i]); + + hashwrite_be32(f, writer->pseudo_merges_nr); + hashwrite_be32(f, kh_size(writer->pseudo_merge_commits)); + hashwrite_be64(f, table_start - start); + hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t)); + + for (i = 0; i < writer->pseudo_merges_nr; i++) + bitmap_free(commits_bitmap[i]); + + free(pseudo_merge_ofs); + free(commits_bitmap); +} + static int table_cmp(const void *_va, const void *_vb, void *_data) { struct bitmap_writer *writer = _data; @@ -878,6 +1003,9 @@ void bitmap_writer_finish(struct bitmap_writer *writer, int fd = odb_mkstemp(&tmp_file, "pack/tmp_bitmap_XXXXXX"); + if (writer->pseudo_merges_nr) + options |= BITMAP_OPT_PSEUDO_MERGES; + f = hashfd(fd, tmp_file.buf); memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)); @@ -907,6 +1035,9 @@ void bitmap_writer_finish(struct bitmap_writer *writer, write_selected_commits_v1(writer, f, offsets); + if (options & BITMAP_OPT_PSEUDO_MERGES) + write_pseudo_merges(writer, f); + if (options & BITMAP_OPT_LOOKUP_TABLE) write_lookup_table(writer, f, offsets); diff --git a/pack-bitmap.h b/pack-bitmap.h index 1e730ea1e54..db9ae554fa8 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -37,6 +37,7 @@ enum pack_bitmap_opts { BITMAP_OPT_FULL_DAG = 0x1, BITMAP_OPT_HASH_CACHE = 0x4, BITMAP_OPT_LOOKUP_TABLE = 0x10, + BITMAP_OPT_PSEUDO_MERGES = 0x20, }; enum pack_bitmap_flags { 
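The lookup-table encoding written by `write_pseudo_merges()` can be illustrated with a few small helpers. This is a sketch with hypothetical names, not code from the series; it relies only on what the writer does: each commit's entry holds an 8-byte value that is either a plain pseudo-merge offset or, with bit 63 set (`(uint64_t)1<<63`), the offset of an extended table, and each extended table is a 4-byte count followed by that many 8-byte offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 63 marks an offset that points at an extended lookup table. */
#define PSEUDO_MERGE_EXT_FLAG ((uint64_t)1 << 63)

/* An offset with bit 63 already set cannot be encoded unambiguously. */
static int lookup_offset_fits(uint64_t ofs)
{
	return !(ofs & PSEUDO_MERGE_EXT_FLAG);
}

/* Encode an entry: a plain offset, or an extended-table offset with bit 63 set. */
static uint64_t encode_lookup_offset(uint64_t ofs, int extended)
{
	return extended ? (ofs | PSEUDO_MERGE_EXT_FLAG) : ofs;
}

static int lookup_offset_is_extended(uint64_t value)
{
	return (value & PSEUDO_MERGE_EXT_FLAG) != 0;
}

/* Strip the flag bit to recover the raw offset. */
static uint64_t lookup_offset_value(uint64_t value)
{
	return value & ~PSEUDO_MERGE_EXT_FLAG;
}

/* One extended table: a 4-byte count followed by `nr` 8-byte offsets. */
static uint64_t extended_table_size(uint32_t nr)
{
	return sizeof(uint32_t) + (uint64_t)nr * sizeof(uint64_t);
}
```

Reserving the top bit as a tag keeps every entry a fixed 8 bytes wide. The cost is that offsets must fit in 63 bits, which is why the writer dies with "too many pseudo-merges" when the next extended-table offset would already have that bit set.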
From patchwork Thu May 23 21:26:49 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13672316
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6ac070dc62dsm630836d6.38.2024.05.23.14.26.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 14:26:50 -0700 (PDT) Date: Thu, 23 May 2024 17:26:49 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 13/24] pack-bitmap: extract `read_bitmap()` function Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The pack-bitmap machinery uses the `read_bitmap_1()` function to read a bitmap from within the mmap'd region corresponding to the .bitmap file. As a side effect of calling this function, `read_bitmap_1()` increments the `index->map_pos` variable to reflect the number of bytes read. Extract the core of this routine into a separate function that operates over a `const unsigned char *`, a `size_t`, and a `size_t *` pointer instead of a `struct bitmap_index *` pointer. This function (called `read_bitmap()`) is part of the pack-bitmap.h API so that it can be used within the upcoming portion of the implementation in pseudo-merge.c. Rewrite the existing function, `read_bitmap_1()`, in terms of its more generic counterpart.
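The shape of this refactoring can be sketched in isolation. The following is a minimal, self-contained C illustration, not code from this series. `toy_index`, `read_record()`, and `read_record_1()` are hypothetical names. A length-prefixed record (a big-endian 32-bit length followed by that many bytes) stands in for the EWAH bitmap that `ewah_read_mmap()` parses. As with `read_bitmap()` in the diff below, the generic reader advances the caller's cursor only on success, and the struct-based wrapper simply forwards its fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bitmap_index: just the mmap'd
 * region and a read cursor. */
struct toy_index {
	const unsigned char *map;
	size_t map_size;
	size_t map_pos;
};

static uint32_t toy_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Generic reader over (map, map_size, *map_pos), analogous in shape to
 * read_bitmap(). Parses one length-prefixed record at map + *map_pos.
 * On success, points *payload at the record's bytes, stores its length
 * in *len, advances *map_pos past the record, and returns 0. On a
 * truncated record, returns -1 and leaves *map_pos untouched.
 */
static int read_record(const unsigned char *map, size_t map_size,
		       size_t *map_pos, const unsigned char **payload,
		       uint32_t *len)
{
	size_t avail;
	uint32_t n;

	assert(map_pos);
	if (*map_pos > map_size)
		return -1;
	avail = map_size - *map_pos;
	if (avail < 4)
		return -1;
	n = toy_get_be32(map + *map_pos);
	if (n > avail - 4)
		return -1;

	*payload = map + *map_pos + 4;
	*len = n;
	*map_pos += 4 + (size_t)n;
	return 0;
}

/* Struct-based wrapper, analogous in shape to read_bitmap_1(). */
static int read_record_1(struct toy_index *index,
			 const unsigned char **payload, uint32_t *len)
{
	return read_record(index->map, index->map_size, &index->map_pos,
			   payload, len);
}
```

Because the cursor is a plain `size_t *`, a caller that holds only a map pointer and its size, with no `struct bitmap_index`, can keep its own local cursor.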
Signed-off-by: Taylor Blau --- pack-bitmap.c | 24 +++++++++++++++--------- pack-bitmap.h | 2 ++ 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 35c5ef9d3cd..3519edb896b 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -129,17 +129,13 @@ static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st) return composed; } -/* - * Read a bitmap from the current read position on the mmaped - * index, and increase the read position accordingly - */ -static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) +struct ewah_bitmap *read_bitmap(const unsigned char *map, + size_t map_size, size_t *map_pos) { struct ewah_bitmap *b = ewah_pool_new(); - ssize_t bitmap_size = ewah_read_mmap(b, - index->map + index->map_pos, - index->map_size - index->map_pos); + ssize_t bitmap_size = ewah_read_mmap(b, map + *map_pos, + map_size - *map_pos); if (bitmap_size < 0) { error(_("failed to load bitmap index (corrupted?)")); @@ -147,10 +143,20 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) return NULL; } - index->map_pos += bitmap_size; + *map_pos += bitmap_size; + return b; } +/* + * Read a bitmap from the current read position on the mmaped + * index, and increase the read position accordingly + */ +static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) +{ + return read_bitmap(index->map, index->map_size, &index->map_pos); +} + static uint32_t bitmap_num_objects(struct bitmap_index *index) { if (index->midx) diff --git a/pack-bitmap.h b/pack-bitmap.h index db9ae554fa8..21aabf805ea 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -160,4 +160,6 @@ int bitmap_is_preferred_refname(struct repository *r, const char *refname); int verify_bitmap_files(struct repository *r); +struct ewah_bitmap *read_bitmap(const unsigned char *map, + size_t map_size, size_t *map_pos); #endif From patchwork Thu May 23 21:26:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13672317 Received: from mail-oi1-f169.google.com (mail-oi1-f169.google.com [209.85.167.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3873112BEA1 for ; Thu, 23 May 2024 21:26:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499616; cv=none; b=aIo3eooRhG2hFgM3H/lpscHtAIGO/x0yjNk6TyHKQPA7QOOxtJd2vajGx70cPvlUVMSCORNPQ9whfzVH0WniNO8HZZhWivFuM+SZlx1ha5vg4Pid/AUUA7cnuc/THCiRuL731J6xbFV+kJ04g4GghoWOyJWTWZmFZohzvOrvulw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499616; c=relaxed/simple; bh=Q5ZI6L1kOwAWAt36F1OOELWsCOdvl0IcNJI0VNSGe+g=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=SUrpFQHf6GhbGcnsoZ3hXK2Qwnw4/Q1JvfwASPDY8eyqE5uv0vMQUtR10tG4+Dd1KTAnEEkbcT3jTuxqykpspqAgud5ml7saRMYE62cWToRhz1EGrKuDflrbBwT2rp4U+poqTSgXu6vrVcJI0ea1LjTTeBgfEAO0Wj0DuRdwJaE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=Y/RxPkBb; arc=none smtp.client-ip=209.85.167.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="Y/RxPkBb" Received: by mail-oi1-f169.google.com with SMTP id 5614622812f47-3c9995562a0so4139676b6e.2 for ; Thu, 23 May 
2024 14:26:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1716499614; x=1717104414; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=Ry6S+SautFv9UU06O4Iu0WmsczwpKrNxBByii8xV/Tc=; b=Y/RxPkBbmfaxsKGiHp2mQvPvazE4D4av+MI0UIPcMg3+fp/xmt+z5GvjKTHW9kS7E4 l24qVYXLUOYUZUnTWGcQjqGd3UnlC0yc0avWDR6uaZcTuLpgYmi/B+TTSv6bBLvaFTBr hPan/lsA/foc/sCgeU+keVfsWDa8chnQxxOwDlxk5CQD3yHxifZT8ilK1xLPSkTEYF5f eoccGrZcUOmlwbbVRMBjgRoKDXmoCOM3WZVuryIoGF5AXsheor0bXl0L5aHOKsqQ2zMC HZ9/L+3REeccemZuSETw2+tzxE+lM1dJIe4criiWBnCiY6QueGNlBwkkUoOLcaQmxF1T ji2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716499614; x=1717104414; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Ry6S+SautFv9UU06O4Iu0WmsczwpKrNxBByii8xV/Tc=; b=sWPiHWWXctq05tsX0kzIromL+VmKh+V3/eI9wNr5LQPK7T5gitdJggpyxmR8Yugqpb tf/WErFE12CVF3qRFJxB9eBGkXFa3aAUiMdAoFHPRGFACjlRWiLgxelGKlInla1JJgE/ Su/5ifzE7xiilxP2DEYWVeAiEj4NEB5k+nIr6HyP+3m2rDTIzuwDiay1guw/1SNhTcYa tS0dLYcM+I86GpO+haBEAjsfydRQQEeUMBzPtj6q2dbw8jFtm1bkNB8CcPZ1Za+V1UYp cEYrCCGhdHS91cwM9cMUJkkn4JLfR0JmoT8fq36zHOO7u7C7e5ixVMaCtNCLeMZtRm0k UD9A== X-Gm-Message-State: AOJu0YxO3uvaIYsyHLE7usEoCuQVZuznzJ7d5JPXjgX22Ec474u55I/g aAFRkKVTuqoTsptrr1MDeKL8ERGJFiQUsW9B6wwf62IABMfTSKT4FtcNtrjJUnfSOsLqMoedTFw 5 X-Google-Smtp-Source: AGHT+IHC/KKxAS+qB7xFeFzGr1nPuwyOud4ZZsKCiK1dOeZFqeG5lyDLfeHutxFcm8LFZVuOsa/ITg== X-Received: by 2002:a05:6808:1a27:b0:3c6:ce0:6820 with SMTP id 5614622812f47-3d1a62fce1amr745157b6e.35.1716499614065; Thu, 23 May 2024 14:26:54 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id af79cd13be357-794abca5934sm3069985a.11.2024.05.23.14.26.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 14:26:53 -0700 (PDT) Date: Thu, 23 May 2024 17:26:52 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 14/24] pseudo-merge: scaffolding for reads Message-ID: <435ac048003d55355c8b58812d836933dcffae2b.1716499565.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Implement the scaffolding within the new pseudo-merge compilation unit that is necessary to use the pseudo-merge API from within the pack-bitmap.c machinery. The core of this scaffolding is two-fold: - The `pseudo_merge` structure itself, which represents an individual pseudo-merge bitmap. It has fields for both bitmaps, metadata about their positions within the memory-mapped region, and a few extra bits indicating whether or not the pseudo-merge has been satisfied and which bitmap(s), if any, have been read, since both are loaded lazily. - The `pseudo_merge_map` structure, which holds an array of `pseudo_merge` structures, as well as a pointer to the memory-mapped region containing the pseudo-merge serialization from within a .bitmap file. Note that the `bitmap_index` structure is defined statically within the pack-bitmap.c compilation unit, so we can't take in a `struct bitmap_index *`. Instead, wrap the primary components necessary to read the pseudo-merges in this new structure to avoid exposing the implementation details of the `bitmap_index` structure.
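The lazy-loading bookkeeping described above can be illustrated with a toy model. This is a hedged sketch, not the series' implementation. `toy_merge`, `toy_merge_map`, `toy_use_merge()`, and `toy_merge_bitmap()` are hypothetical. Each "bitmap" is reduced to a single big-endian 64-bit word instead of an EWAH bitmap. The objects bitmap is assumed to follow the commits bitmap directly, so its offset is only known once the commits bitmap has been read. The `reads` counter exists only to show that each bitmap is decoded at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified analogue of struct pseudo_merge. */
struct toy_merge {
	uint64_t commits;  /* stand-in for the commits bitmap */
	uint64_t bitmap;   /* stand-in for the objects bitmap */
	size_t at;         /* offset of the commits bitmap */
	size_t bitmap_at;  /* offset of the objects bitmap, known after commits are read */
	unsigned satisfied : 1,
		 loaded_commits : 1,
		 loaded_bitmap : 1;
};

/* Hypothetical, simplified analogue of struct pseudo_merge_map. */
struct toy_merge_map {
	struct toy_merge *v;
	size_t nr;
	const unsigned char *map;
	size_t map_size;
	unsigned reads;    /* number of words decoded so far */
};

static uint64_t toy_get_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decodes one word at *pos and advances *pos; returns -1 if out of bounds. */
static int toy_read_word(struct toy_merge_map *pm, size_t *pos, uint64_t *out)
{
	if (*pos > pm->map_size || pm->map_size - *pos < 8)
		return -1;
	*out = toy_get_be64(pm->map + *pos);
	*pos += 8;
	pm->reads++;
	return 0;
}

/* Loads the commits "bitmap" on first use only. */
static struct toy_merge *toy_use_merge(struct toy_merge_map *pm,
				       struct toy_merge *merge)
{
	if (!merge->loaded_commits) {
		size_t pos = merge->at;

		if (toy_read_word(pm, &pos, &merge->commits) < 0)
			return NULL;
		merge->bitmap_at = pos;
		merge->loaded_commits = 1;
	}
	return merge;
}

/* Loads the objects "bitmap" on first use; requires commits to be loaded. */
static const uint64_t *toy_merge_bitmap(struct toy_merge_map *pm,
					struct toy_merge *merge)
{
	if (!merge->loaded_commits)
		return NULL; /* bitmap_at is not known yet */
	if (!merge->loaded_bitmap) {
		size_t pos = merge->bitmap_at;

		if (toy_read_word(pm, &pos, &merge->bitmap) < 0)
			return NULL;
		merge->loaded_bitmap = 1;
	}
	return &merge->bitmap;
}
```

Keeping the flags as one-bit fields next to the offsets lets a pseudo-merge be zero-initialized (for example with calloc) and filled in only when a traversal first touches it.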
Signed-off-by: Taylor Blau --- pseudo-merge.c | 10 ++++++++ pseudo-merge.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 75 insertions(+) diff --git a/pseudo-merge.c b/pseudo-merge.c index 0f6854c753f..f0080d53c03 100644 --- a/pseudo-merge.c +++ b/pseudo-merge.c @@ -454,3 +454,13 @@ void select_pseudo_merges(struct bitmap_writer *writer, stop_progress(&progress); } + +void free_pseudo_merge_map(struct pseudo_merge_map *pm) +{ + uint32_t i; + for (i = 0; i < pm->nr; i++) { + ewah_pool_free(pm->v[i].commits); + ewah_pool_free(pm->v[i].bitmap); + } + free(pm->v); +} diff --git a/pseudo-merge.h b/pseudo-merge.h index f809cf42aeb..e9216baace8 100644 --- a/pseudo-merge.h +++ b/pseudo-merge.h @@ -97,4 +97,69 @@ struct pseudo_merge_commit_idx { void select_pseudo_merges(struct bitmap_writer *writer, struct commit **commits, size_t commits_nr); +/* + * Represents a serialized view of a file containing pseudo-merge(s) + * (see Documentation/technical/bitmap-format.txt for a specification + * of the format). + */ +struct pseudo_merge_map { + /* + * An array of pseudo-merge(s), lazily loaded from the .bitmap + * file. + */ + struct pseudo_merge *v; + size_t nr; + size_t commits_nr; + + /* + * Pointers into a memory-mapped view of the .bitmap file: + * + * - map: the beginning of the .bitmap file + * - commits: the beginning of the pseudo-merge commit index + * - map_size: the size of the .bitmap file + */ + const unsigned char *map; + const unsigned char *commits; + + size_t map_size; +}; + +/* + * An individual pseudo-merge, storing a pair of lazily-loaded + * bitmaps: + * + * - commits: the set of commit(s) that are part of the pseudo-merge + * - bitmap: the set of object(s) reachable from the above set of + * commits. + * + * The `at` and `bitmap_at` fields are used to store the locations of + * each of the above bitmaps in the .bitmap file. 
+ */ +struct pseudo_merge { + struct ewah_bitmap *commits; + struct ewah_bitmap *bitmap; + + off_t at; + off_t bitmap_at; + + /* + * `satisfied` indicates whether the given pseudo-merge has been + * used. + * + * `loaded_commits` and `loaded_bitmap` indicate whether the + * respective bitmaps have been loaded and read from the + * .bitmap file. + */ + unsigned satisfied : 1, + loaded_commits : 1, + loaded_bitmap : 1; +}; + +/* + * Frees the given pseudo-merge map, releasing any memory held by (a) + * parsed EWAH bitmaps, or (b) the array of pseudo-merges itself. Does + * not free the memory-mapped view of the .bitmap file. + */ +void free_pseudo_merge_map(struct pseudo_merge_map *pm); + #endif From patchwork Thu May 23 21:26:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13672318 Received: from mail-qv1-f54.google.com (mail-qv1-f54.google.com [209.85.219.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AE7CD128807 for ; Thu, 23 May 2024 21:26:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499620; cv=none; b=TEXp1USbzausr+ZFtHQq8keiPdz68SrYouxktwwGaJeGKjDHO0YYbRLD7g1c2680EASBncE/l9miGseLZiciseQdOJaWPcjznvjKW9DEZttp7415Cp20zA+kc4LQ3zZyw09COt2w7Z38r2H2X/rGL2SX4QaIr43lWYG21cSiOLU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499620; c=relaxed/simple; bh=Ayb/a+SMRANBR1ZiKPhvtui3V1wY4+opEJor5SJiQ9M=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=IFutQgbD7H45hacGsthsXYOwXQIZGsmsd/V4xV7/ZVVT6gcvHXYHASWJsXvBiW8Y29tDfo32Lo4zmuYTGwfrsqzYYPkVJMsfi93BKwbtG4B0LJYhPgc7KQBpeIdKpy0t+hsgFnFQHv2p1Afq/JBlFN35E3eHkep304W8bE4GS64= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=CSRot3au; arc=none smtp.client-ip=209.85.219.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="CSRot3au" Received: by mail-qv1-f54.google.com with SMTP id 6a1803df08f44-6ab9d01cbc4so1478206d6.1 for ; Thu, 23 May 2024 14:26:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1716499617; x=1717104417; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=8uWGIB5ySfjI8LPLNonCagCjxEOlE+8mvLD+0Julcas=; b=CSRot3auV7hY9JGLUh3520zi/oE29nH4REiE1mqGxdWVwZev/H4yNewLdzeZobyhTu VyeM4q9awwCrW/G/Aeqldzl4xVcI9IYDR6Ik24GLxwTQ5myRqqiesYi7ZLj+dzNvCSDi fB2XWZuEfPiDBALOH0u9Cq0/LzLoZi1OMxT5LrygBiewJZElVbGpiq2uDooOaayXU0LE 9BwdYPaft7kUNgrXIEdWuQ1yS3rAlJf0iq8ux99SDBUnpgzs0bUoZOM/5bzbl5dOLW7I jIRkqZOZOqdAG+rT0ISRa2nRHhfRdd/w1kJTL8ekk6tw2sFA0pQZ6NTdA6iF58z0Y5On ZnWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716499617; x=1717104417; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=8uWGIB5ySfjI8LPLNonCagCjxEOlE+8mvLD+0Julcas=; b=dkeHqxWPY03/lQ1qMvMT1TW/8IDqsfzdNyM8GvBgmZgypPvlxQS0J4Kja2bFwY8Qfs fUdwFIjEEfG/8qr0WmVIQg38MbRGYmXVEOn3SBnZuZp+rse42yDOWTN+doLmY14BYbcu 
QyJCOu+NekudjjWgzH3bEulCwj9xhA+ymYQT75O9xIPxByYpoRgHFq7FbgAfezobQX3H pIQfbiKEaePLBxeYtAFjLr1yW3bhv88bG/7QDAJ5sJP1ixcfsd2JaSXfVgtC/I9gcIgQ sc/hLMlX+Z+3qrPFz0e003PLYipNgxmfojKutpoF+S/JypfMRfsymk4Cbqr5kIy2Brd5 K/9g== X-Gm-Message-State: AOJu0Yx4kG/KCijyynwIj3rJFoiQSPJbt/UStxG3NMK6sGeOJHxC3Q+i +CYarVWa4y9qlloyGl9b1Qn8iJgxjqvqMKuqxx3dYr9tP5eF/az0OBI9XkiL1f63Km9wJk5s4U8 u X-Google-Smtp-Source: AGHT+IG0z0GOzossSoOLovVKbV80sUAL3eBkEz8CgK8whnn2hR/u/ZCjP6eJPhFqXouOuDgxp+WbJQ== X-Received: by 2002:a05:6214:4a81:b0:6aa:8aef:dfae with SMTP id 6a1803df08f44-6abcdaa9e5fmr4652306d6.55.1716499617172; Thu, 23 May 2024 14:26:57 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6ac070f4b7dsm609046d6.67.2024.05.23.14.26.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 14:26:56 -0700 (PDT) Date: Thu, 23 May 2024 17:26:55 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 15/24] pack-bitmap.c: read pseudo-merge extension Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Now that the scaffolding for reading the pseudo-merge extension has been laid, teach the pack-bitmap machinery to read the pseudo-merge extension when present. Note that pseudo-merges themselves are not yet used during traversal; that step will be taken by a future commit. In the meantime, read the table and initialize the pseudo_merge_map structure introduced by a previous commit. When the pseudo-merge extension is present, `load_bitmap_header()` performs basic sanity checks to make sure that the table is well-formed.
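The trailer that `load_bitmap_header()` reads in the diff below is parsed backwards from the end of the extension. All fields are big-endian. The diff's reads place them as follows:

- an 8-byte table size at `index_end - 8`;
- an 8-byte offset of the commit lookup table, relative to the start of the extension, at `index_end - 16`;
- a 4-byte commit count at `index_end - 20`;
- a 4-byte pseudo-merge count at `index_end - 24`;
- one 8-byte offset per pseudo-merge immediately before that.

The sketch below is a standalone, hypothetical parser of that layout as inferred from the diff. Documentation/technical/bitmap-format.txt remains the authoritative specification. `toy_pm_table` and `parse_pm_trailer()` are illustrative names. The sketch checks that the fixed 24-byte trailer and all `nr` offsets fit inside the table, which is somewhat stricter than the two checks in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary of a pseudo-merge extension trailer. */
struct toy_pm_table {
	size_t table_size;   /* total size of the extension, in bytes */
	size_t commits_ofs;  /* commit lookup table offset, from the extension start */
	size_t commits_nr;   /* number of commit lookup entries */
	size_t nr;           /* number of pseudo-merges */
	uint64_t *at;        /* per-pseudo-merge offsets (caller frees) */
};

static uint32_t toy_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t toy_be64(const unsigned char *p)
{
	return ((uint64_t)toy_be32(p) << 32) | toy_be32(p + 4);
}

/*
 * 'end' points just past the extension; 'avail' is the number of bytes
 * before 'end' that the extension may occupy. Returns 0 on success and
 * -1 if the trailer is malformed (t->at is only allocated on success).
 */
static int parse_pm_trailer(const unsigned char *end, size_t avail,
			    struct toy_pm_table *t)
{
	uint64_t table_size, commits_ofs, nr;
	const unsigned char *p;
	size_t i;

	if (avail < 24)
		return -1; /* no room for the fixed-size trailer */
	table_size = toy_be64(end - 8);
	if (table_size < 24 || table_size > avail)
		return -1;
	commits_ofs = toy_be64(end - 16);
	if (commits_ofs > table_size)
		return -1;
	nr = toy_be32(end - 24);
	if (nr > (table_size - 24) / 8)
		return -1; /* offsets would spill past the extension */

	t->table_size = (size_t)table_size;
	t->commits_ofs = (size_t)commits_ofs;
	t->commits_nr = toy_be32(end - 20);
	t->nr = (size_t)nr;
	t->at = calloc(t->nr ? t->nr : 1, sizeof(*t->at));
	if (!t->at)
		return -1;

	p = end - 24 - t->nr * 8;
	for (i = 0; i < t->nr; i++)
		t->at[i] = toy_be64(p + i * 8);
	return 0;
}
```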
Signed-off-by: Taylor Blau --- pack-bitmap.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/pack-bitmap.c b/pack-bitmap.c index 3519edb896b..fc9c3e2fc43 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -20,6 +20,7 @@ #include "list-objects-filter-options.h" #include "midx.h" #include "config.h" +#include "pseudo-merge.h" /* * An entry on the bitmap index, representing the bitmap for a given @@ -86,6 +87,9 @@ struct bitmap_index { */ unsigned char *table_lookup; + /* This contains the pseudo-merge cache within 'map' (if found). */ + struct pseudo_merge_map pseudo_merges; + /* * Extended index. * @@ -205,6 +209,41 @@ static int load_bitmap_header(struct bitmap_index *index) index->table_lookup = (void *)(index_end - table_size); index_end -= table_size; } + + if (flags & BITMAP_OPT_PSEUDO_MERGES) { + unsigned char *pseudo_merge_ofs; + size_t table_size; + uint32_t i; + + if (sizeof(table_size) > index_end - index->map - header_size) + return error(_("corrupted bitmap index file (too short to fit pseudo-merge table header)")); + + table_size = get_be64(index_end - 8); + if (table_size > index_end - index->map - header_size) + return error(_("corrupted bitmap index file (too short to fit pseudo-merge table)")); + + if (git_env_bool("GIT_TEST_USE_PSEUDO_MERGES", 1)) { + const unsigned char *ext = (index_end - table_size); + + index->pseudo_merges.map = index->map; + index->pseudo_merges.map_size = index->map_size; + index->pseudo_merges.commits = ext + get_be64(index_end - 16); + index->pseudo_merges.commits_nr = get_be32(index_end - 20); + index->pseudo_merges.nr = get_be32(index_end - 24); + + CALLOC_ARRAY(index->pseudo_merges.v, + index->pseudo_merges.nr); + + pseudo_merge_ofs = index_end - 24 - + (index->pseudo_merges.nr * sizeof(uint64_t)); + for (i = 0; i < index->pseudo_merges.nr; i++) { + index->pseudo_merges.v[i].at = get_be64(pseudo_merge_ofs); + pseudo_merge_ofs += sizeof(uint64_t); + } + } + + index_end -= 
table_size; + } } index->entry_count = ntohl(header->entry_count); From patchwork Thu May 23 21:26:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13672319 Received: from mail-qv1-f52.google.com (mail-qv1-f52.google.com [209.85.219.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B2437128804 for ; Thu, 23 May 2024 21:27:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.52 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499623; cv=none; b=T3UPYJxz7yNMfvozcMz50xcKdYUWHzRbNtyzH2KCg0gnwO1PNrv6hUj/295NjPkXNjFkdzJLGRz2UxfPitWRZzVBdt+hBopTu5NgkNSOzApxFNRRLrmBwKtlweim/DQSxXavsoiC+QCoNl+Xpg3YJoVUTZ8FcpG+E+Qkh1QB5Bc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499623; c=relaxed/simple; bh=IXbc+CIhtJrSC5vPkh2+CUL6JxXToWYrfWRgc5mKUlo=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=eRzBY/QeS/LdPjXVvC/giIy1DinYDn3TRLuWLsYEokt/yWlx3fRX6Tv+CMXJ3f5G2ag1ff4tjXH8ciUdi4BXVeQCzp/fL4+xjjVHXZ7y3YoI+tmjGpRXdiNQKpB19g1h7xpq/Oh81hEpOtR6nL5HzPWsJhqwHgmDogdDFbNZCVg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=b4s84MY2; arc=none smtp.client-ip=209.85.219.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com 
header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="b4s84MY2" Received: by mail-qv1-f52.google.com with SMTP id 6a1803df08f44-6ab975abb24so4702556d6.1 for ; Thu, 23 May 2024 14:27:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1716499620; x=1717104420; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=vGV2CaoHqUtYdRFml2QjMXVFzSQHg5ZJViOnrE6+zgU=; b=b4s84MY2Ba6DvEpNRDyrSnfpxWiYkwu5WUbcGpCSKLERMMeaP6n5Uq+/XBq0aMJhJO 90XDAYwZvjf77SI4pm7Ld464LbPl5LkCA3v7YuIe44/xe9/RcY+bpUAnbfPDKgQvC2Nn 1+gFn4ccdB87fYEhBd9DpSViAe987xHqAifLO2Ma+XdKPW3SmcELrxiee5Xe6LVJOPzg vt0x8ue3r2gZIMzX/z+q83vYmf+vAeI+9ZSz/qQj6o4ayb9F3A9LjEVge0EgzT9YFBHC rgTjTIzg2ZeVsw2u1Gu/+d/XCtVPsHO16pK6CZ8cxF9DO4cOV2P0+zdCkVaTl98a4NMe h3ag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716499620; x=1717104420; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=vGV2CaoHqUtYdRFml2QjMXVFzSQHg5ZJViOnrE6+zgU=; b=LPibm4CwjqRBB6ehyfuGQaDiHReGVkmpOf6E9prAeRzy9edXIvA77i+3yWL2lwzHr1 GERviLx41hf3MBQp+Ulgl5LGsUk2qfiso4e6y449z7gNIxGuheCoi1DEbHJ6eew/ilSX dof1sLIGA8UFVXTlitSlCcw8cg+zHs6eITNlDMuRns+qPJA6oBjFbIjSl8htdU+3udUK 6/LUsAB/MvtyzfikLiIxZXluN/43ehR4Qr8/jHyUfE30oRpDeqoC/SjThfyO2h1O5Onn ztkX+vuaTjfpVXxh18SbMt8HRsXwLbHUnmzwrk7QxX4QBLmWOEG3zuIz5bcEibHwDOPr YQsg== X-Gm-Message-State: AOJu0YzR+ZMljy8SRC4IPi33ENKsw+UaOmJ2xe2JVDSUun+eo1bfN99o Rll4YBsREcC6Umzb8Plq0oB2rfOaPeyPSYKt5/DSDz+IY/4sJZyCfbpOn8Vjrkv7Bl+YsZVg50V a X-Google-Smtp-Source: AGHT+IFoaK/x/hC0ox8tf8CmrjOKuAOtnzzgXArW/MLfxwT2O1xOM2dD+rgJL4kibqHG+exd+fdSeg== X-Received: by 2002:a05:6214:5685:b0:6a0:cf48:5196 with SMTP id 6a1803df08f44-6abcd0b1860mr3506566d6.54.1716499620335; Thu, 23 May 2024 14:27:00 -0700 (PDT) Received: from 
localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6ac070c2f02sm636586d6.1.2024.05.23.14.26.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 14:26:59 -0700 (PDT) Date: Thu, 23 May 2024 17:26:58 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 16/24] pseudo-merge: implement support for reading pseudo-merge commits Message-ID: <3a72e66cb6932fa73f8261043b6dcc262ebb6e31.1716499565.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Implement the basic API for reading pseudo-merge bitmaps, which consists of four basic functions: - pseudo_merge_bitmap() - use_pseudo_merge() - apply_pseudo_merges_for_commit() - cascade_pseudo_merges() These functions are all documented in pseudo-merge.h, but their rough descriptions are as follows: - pseudo_merge_bitmap() reads and inflates the objects EWAH bitmap for a given pseudo-merge - use_pseudo_merge() does the same as pseudo_merge_bitmap(), but on the commits EWAH bitmap, not the objects bitmap - apply_pseudo_merges_for_commit() applies every pseudo-merge containing a given commit that is satisfied by a given result set, and cascades any yet-unsatisfied pseudo-merges if any were applied in the previous step - cascade_pseudo_merges() applies all pseudo-merges which are satisfied but have not been previously applied, repeating this process until no more pseudo-merges can be applied The core of the API is the latter two functions, which are responsible for applying pseudo-merges during the object traversal implemented in the pack-bitmap machinery. The other two functions (pseudo_merge_bitmap() and use_pseudo_merge()) are low-level ways to interact with the pseudo-merge machinery, which will be useful in future commits.
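The fixed-point behavior described for cascade_pseudo_merges() can be modeled with plain 64-bit masks standing in for bitmaps. This is a simplified, hypothetical sketch. `toy_pmerge`, `toy_apply()`, and `toy_cascade()` are illustrative names, not part of the series. A pseudo-merge is satisfied once all of its commits are present in the result, or in `roots` when `roots` is given. At that point its objects are ORed into the result, and the scan repeats until a full pass satisfies nothing new.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pseudo-merge with 64-bit masks standing in for bitmaps. */
struct toy_pmerge {
	uint64_t commits;  /* the pseudo-merge's parent commits */
	uint64_t objects;  /* everything reachable from those commits */
	unsigned satisfied : 1;
};

/* Nonzero when every bit set in 'a' is also set in 'b'. */
static int toy_is_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Applies one pseudo-merge if it is not yet satisfied and all of its
 * commits appear in 'roots' (when given) or else in 'result'.
 * Returns 1 if it was applied, 0 otherwise.
 */
static int toy_apply(struct toy_pmerge *m, uint64_t *result, uint64_t *roots)
{
	if (m->satisfied)
		return 0;
	if (!toy_is_subset(m->commits, roots ? *roots : *result))
		return 0;
	*result |= m->objects;
	if (roots)
		*roots |= m->objects;
	m->satisfied = 1;
	return 1;
}

/* Repeats full passes until one applies nothing; returns the total applied. */
static int toy_cascade(struct toy_pmerge *v, size_t nr,
		       uint64_t *result, uint64_t *roots)
{
	int ret = 0;
	int any;

	do {
		size_t i;

		any = 0;
		for (i = 0; i < nr; i++) {
			if (toy_apply(&v[i], result, roots)) {
				any = 1;
				ret++;
			}
		}
	} while (any);

	return ret;
}
```

For example, suppose the result starts at `0x3` (commits 0 and 1) and there are two pseudo-merges, `{0x6, 0x106}` and `{0x3, 0x207}`. On the first pass, `{0x3, 0x207}` applies and contributes bit 2. On the second pass, `{0x6, 0x106}` is satisfied as well. The call returns 2 and leaves the result at `0x307`. Each productive pass marks at least one pseudo-merge as satisfied, so the loop makes at most `nr + 1` passes.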
Signed-off-by: Taylor Blau --- pseudo-merge.c | 235 +++++++++++++++++++++++++++++++++++++++++++++++++ pseudo-merge.h | 44 +++++++++ 2 files changed, 279 insertions(+) diff --git a/pseudo-merge.c b/pseudo-merge.c index f0080d53c03..7d131011497 100644 --- a/pseudo-merge.c +++ b/pseudo-merge.c @@ -10,6 +10,7 @@ #include "commit.h" #include "alloc.h" #include "progress.h" +#include "hex.h" #define DEFAULT_PSEUDO_MERGE_DECAY 1.0 #define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64 @@ -464,3 +465,237 @@ void free_pseudo_merge_map(struct pseudo_merge_map *pm) } free(pm->v); } + +struct pseudo_merge_commit_ext { + uint32_t nr; + const unsigned char *ptr; +}; + +static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm, + struct pseudo_merge_commit_ext *ext, size_t at) +{ + if (at >= pm->map_size) + return error(_("extended pseudo-merge read out-of-bounds " + "(%"PRIuMAX" >= %"PRIuMAX")"), + (uintmax_t)at, (uintmax_t)pm->map_size); + if (at + 4 >= pm->map_size) + return error(_("extended pseudo-merge entry is too short " + "(%"PRIuMAX" >= %"PRIuMAX")"), + (uintmax_t)(at + 4), (uintmax_t)pm->map_size); + + ext->nr = get_be32(pm->map + at); + ext->ptr = pm->map + at + sizeof(uint32_t); + + return 0; +} + +struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm, + struct pseudo_merge *merge) +{ + if (!merge->loaded_commits) + BUG("cannot use unloaded pseudo-merge bitmap"); + + if (!merge->loaded_bitmap) { + size_t at = merge->bitmap_at; + + merge->bitmap = read_bitmap(pm->map, pm->map_size, &at); + merge->loaded_bitmap = 1; + } + + return merge->bitmap; +} + +struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm, + struct pseudo_merge *merge) +{ + if (!merge->loaded_commits) { + size_t pos = merge->at; + + merge->commits = read_bitmap(pm->map, pm->map_size, &pos); + merge->bitmap_at = pos; + merge->loaded_commits = 1; + } + return merge; +} + +static struct pseudo_merge *pseudo_merge_at(const struct pseudo_merge_map *pm, + struct object_id 
*oid, + size_t want) +{ + size_t lo = 0; + size_t hi = pm->nr; + + while (lo < hi) { + size_t mi = lo + (hi - lo) / 2; + size_t got = pm->v[mi].at; + + if (got == want) + return use_pseudo_merge(pm, &pm->v[mi]); + else if (got < want) + hi = mi; + else + lo = mi + 1; + } + + warning(_("could not find pseudo-merge for commit %s at offset %"PRIuMAX), + oid_to_hex(oid), (uintmax_t)want); + + return NULL; +} + +struct pseudo_merge_commit { + uint32_t commit_pos; + uint64_t pseudo_merge_ofs; +}; + +#define PSEUDO_MERGE_COMMIT_RAWSZ (sizeof(uint32_t)+sizeof(uint64_t)) + +static void read_pseudo_merge_commit_at(struct pseudo_merge_commit *merge, + const unsigned char *at) +{ + merge->commit_pos = get_be32(at); + merge->pseudo_merge_ofs = get_be64(at + sizeof(uint32_t)); +} + +static int nth_pseudo_merge_ext(const struct pseudo_merge_map *pm, + struct pseudo_merge_commit_ext *ext, + struct pseudo_merge_commit *merge, + uint32_t n) +{ + size_t ofs; + + if (n >= ext->nr) + return error(_("extended pseudo-merge lookup out-of-bounds " + "(%"PRIu32" >= %"PRIu32")"), n, ext->nr); + + ofs = get_be64(ext->ptr + st_mult(n, sizeof(uint64_t))); + if (ofs >= pm->map_size) + return error(_("out-of-bounds read: (%"PRIuMAX" >= %"PRIuMAX")"), + (uintmax_t)ofs, (uintmax_t)pm->map_size); + + read_pseudo_merge_commit_at(merge, pm->map + ofs); + + return 0; +} + +static unsigned apply_pseudo_merge(const struct pseudo_merge_map *pm, + struct pseudo_merge *merge, + struct bitmap *result, + struct bitmap *roots) +{ + if (merge->satisfied) + return 0; + + if (!ewah_bitmap_is_subset(merge->commits, roots ? 
roots : result)) + return 0; + + bitmap_or_ewah(result, pseudo_merge_bitmap(pm, merge)); + if (roots) + bitmap_or_ewah(roots, pseudo_merge_bitmap(pm, merge)); + merge->satisfied = 1; + + return 1; +} + +static int pseudo_merge_commit_cmp(const void *va, const void *vb) +{ + struct pseudo_merge_commit merge; + uint32_t key = *(uint32_t*)va; + + read_pseudo_merge_commit_at(&merge, vb); + + if (key < merge.commit_pos) + return -1; + if (key > merge.commit_pos) + return 1; + return 0; +} + +static struct pseudo_merge_commit *find_pseudo_merge(const struct pseudo_merge_map *pm, + uint32_t pos) +{ + if (!pm->commits_nr) + return NULL; + + return bsearch(&pos, pm->commits, pm->commits_nr, + PSEUDO_MERGE_COMMIT_RAWSZ, pseudo_merge_commit_cmp); +} + +int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm, + struct bitmap *result, + struct commit *commit, uint32_t commit_pos) +{ + struct pseudo_merge *merge; + struct pseudo_merge_commit *merge_commit; + int ret = 0; + + merge_commit = find_pseudo_merge(pm, commit_pos); + if (!merge_commit) + return 0; + + if (merge_commit->pseudo_merge_ofs & ((uint64_t)1<<63)) { + struct pseudo_merge_commit_ext ext = { 0 }; + off_t ofs = merge_commit->pseudo_merge_ofs & ~((uint64_t)1<<63); + uint32_t i; + + if (pseudo_merge_ext_at(pm, &ext, ofs) < -1) { + warning(_("could not read extended pseudo-merge table " + "for commit %s"), + oid_to_hex(&commit->object.oid)); + return ret; + } + + for (i = 0; i < ext.nr; i++) { + if (nth_pseudo_merge_ext(pm, &ext, merge_commit, i) < 0) + return ret; + + merge = pseudo_merge_at(pm, &commit->object.oid, + merge_commit->pseudo_merge_ofs); + + if (!merge) + return ret; + + if (apply_pseudo_merge(pm, merge, result, NULL)) + ret++; + } + } else { + merge = pseudo_merge_at(pm, &commit->object.oid, + merge_commit->pseudo_merge_ofs); + + if (!merge) + return ret; + + if (apply_pseudo_merge(pm, merge, result, NULL)) + ret++; + } + + if (ret) + cascade_pseudo_merges(pm, result, NULL); + + return 
ret; +} + +int cascade_pseudo_merges(const struct pseudo_merge_map *pm, + struct bitmap *result, + struct bitmap *roots) +{ + unsigned any_satisfied; + int ret = 0; + + do { + struct pseudo_merge *merge; + uint32_t i; + + any_satisfied = 0; + + for (i = 0; i < pm->nr; i++) { + merge = use_pseudo_merge(pm, &pm->v[i]); + if (apply_pseudo_merge(pm, merge, result, roots)) { + any_satisfied |= 1; + ret++; + } + } + } while (any_satisfied); + + return ret; +} diff --git a/pseudo-merge.h b/pseudo-merge.h index e9216baace8..755edc054ae 100644 --- a/pseudo-merge.h +++ b/pseudo-merge.h @@ -162,4 +162,48 @@ struct pseudo_merge { */ void free_pseudo_merge_map(struct pseudo_merge_map *pm); +/* + * Loads the bitmap corresponding to the given pseudo-merge from the + * map, if it has not already been loaded. + */ +struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm, + struct pseudo_merge *merge); + +/* + * Loads the pseudo-merge and its commits bitmap from the given + * pseudo-merge map, if it has not already been loaded. + */ +struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm, + struct pseudo_merge *merge); + +/* + * Applies pseudo-merge(s) containing the given commit to the bitmap + * "result". + * + * If any pseudo-merge(s) were satisfied, returns the number + * satisfied, otherwise returns 0. If any were satisfied, the + * remaining unsatisfied pseudo-merges are cascaded (see below). + */ +int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm, + struct bitmap *result, + struct commit *commit, uint32_t commit_pos); + +/* + * Applies pseudo-merge(s) which are satisfied according to the + * current bitmap in result (or roots, see below). If any + * pseudo-merges were satisfied, repeat the process over unsatisfied + * pseudo-merge commits until no more pseudo-merges are satisfied. + * + * Result is the bitmap to which the pseudo-merge(s) are applied. 
+ * Roots (if given) is a bitmap of the traversal tip(s) for either
+ * side of a reachability traversal.
+ *
+ * Roots may be given instead of a populated result bitmap at the
+ * beginning of a traversal on either side where the reachability
+ * closure over tips is not yet known.
+ */
+int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
+			  struct bitmap *result,
+			  struct bitmap *roots);
+
 #endif

From patchwork Thu May 23 21:27:02 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13672320
Date: Thu, 23 May 2024 17:27:02 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 17/24] ewah: implement `ewah_bitmap_popcount()`
Message-ID: <42a836fda8add1e0f26394660b88f0e2e3e635e7.1716499565.git.me@ttaylorr.com>

Some of the pseudo-merge test helpers (which will be introduced in the
following commit) will want to indicate the total number of commits in
or objects reachable from a pseudo-merge.

Implement a popcount() function that operates on EWAH bitmaps to quickly
determine how many bits are set in each of the respective bitmaps.
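A minimal sketch of the counting idea, assuming the EWAH iterator has already expanded the compressed bitmap into plain 64-bit words (which is what `ewah_iterator_next()` hands back in the loop this patch adds). The names `popcount64()` and `popcount_words()` are hypothetical, not Git functions, and a portable bit-clearing loop stands in for `ewah_bit_popcount64()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the set bits in one 64-bit word. Each iteration clears the
 * lowest set bit, so the loop runs once per set bit.
 */
static unsigned popcount64(uint64_t word)
{
	unsigned count = 0;

	while (word) {
		word &= word - 1;
		count++;
	}
	return count;
}

/*
 * Sum the per-word counts over a sequence of uncompressed words,
 * mirroring the "count += popcount(word)" loop over an iterator.
 */
static size_t popcount_words(const uint64_t *words, size_t nr)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		count += popcount64(words[i]);
	return count;
}
```

Because the iterator yields fully expanded words, a run of N all-ones words contributes 64 * N to the total, just as a literal word with every bit set would.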
Signed-off-by: Taylor Blau
---
 ewah/bitmap.c | 14 ++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 15 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index d352fec54ce..dc2ca190f12 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -212,6 +212,20 @@ size_t bitmap_popcount(struct bitmap *self)
 	return count;
 }
 
+size_t ewah_bitmap_popcount(struct ewah_bitmap *self)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t count = 0;
+
+	ewah_iterator_init(&it, self);
+
+	while (ewah_iterator_next(&word, &it))
+		count += ewah_bit_popcount64(word);
+
+	return count;
+}
+
 int bitmap_is_empty(struct bitmap *self)
 {
 	size_t i;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 2b6c4ac499c..7074a6347b7 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -195,6 +195,7 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
 void bitmap_or(struct bitmap *self, const struct bitmap *other);
 
 size_t bitmap_popcount(struct bitmap *self);
+size_t ewah_bitmap_popcount(struct ewah_bitmap *self);
 int bitmap_is_empty(struct bitmap *self);
 
 #endif

From patchwork Thu May 23 21:27:05 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13672321
Date: Thu, 23 May 2024 17:27:05 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 18/24] pack-bitmap: implement test helpers for pseudo-merge
Message-ID: <06ba1a5bbfd206cd47c9d8d474371042dc226031.1716499565.git.me@ttaylorr.com>

Implement three new sub-commands for the "bitmap" test-helper:

  - t/helper test-tool bitmap dump-pseudo-merges
  - t/helper test-tool bitmap dump-pseudo-merge-commits
  - t/helper test-tool bitmap dump-pseudo-merge-objects

These three helpers dump the list of pseudo-merges, the "parents" of the
nth pseudo-merge, and the set of objects reachable from those parents,
respectively.

These helpers will be useful in subsequent patches when we add test
coverage for pseudo-merge bitmaps.
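A minimal sketch of the set-bit scan that `dump_ewah_object_ids()` performs, assuming the bitmap has already been expanded into plain 64-bit words. It uses a different but equivalent trick (report the lowest set bit, then clear it) instead of the shift-and-`ewah_bit_ctz64()` loop in this patch; `ctz64()` and `collect_bit_positions()` are hypothetical helpers, not Git functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable count-trailing-zeros; the caller must pass a nonzero word. */
static unsigned ctz64(uint64_t word)
{
	unsigned n = 0;

	while (!(word & 1)) {
		word >>= 1;
		n++;
	}
	return n;
}

/*
 * Write the absolute bit position of every set bit in "words" into
 * "out" (which must have room for all of them), in ascending order,
 * and return how many positions were written.
 */
static size_t collect_bit_positions(const uint64_t *words, size_t nr,
				    uint32_t *out)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		uint64_t word = words[i];
		uint32_t base = (uint32_t)(i * 64);

		while (word) {
			out[n++] = base + ctz64(word);
			word &= word - 1; /* clear the bit just reported */
		}
	}
	return n;
}
```

Each position produced this way corresponds to what the helper passes to `bit_pos_to_object_id()` before printing an object ID.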
Signed-off-by: Taylor Blau
---
 pack-bitmap.c          | 126 +++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.h          |   3 +
 t/helper/test-bitmap.c |  34 ++++++++---
 3 files changed, 156 insertions(+), 7 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index fc9c3e2fc43..c13074673af 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2443,6 +2443,132 @@ int test_bitmap_hashes(struct repository *r)
 	return 0;
 }
 
+static void bit_pos_to_object_id(struct bitmap_index *bitmap_git,
+				 uint32_t bit_pos,
+				 struct object_id *oid)
+{
+	uint32_t index_pos;
+
+	if (bitmap_is_midx(bitmap_git))
+		index_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos);
+	else
+		index_pos = pack_pos_to_index(bitmap_git->pack, bit_pos);
+
+	nth_bitmap_object_oid(bitmap_git, oid, index_pos);
+}
+
+int test_bitmap_pseudo_merges(struct repository *r)
+{
+	struct bitmap_index *bitmap_git;
+	uint32_t i;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	for (i = 0; i < bitmap_git->pseudo_merges.nr; i++) {
+		struct pseudo_merge *merge;
+		struct ewah_bitmap *commits_bitmap, *merge_bitmap;
+
+		merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+					 &bitmap_git->pseudo_merges.v[i]);
+		commits_bitmap = merge->commits;
+		merge_bitmap = pseudo_merge_bitmap(&bitmap_git->pseudo_merges,
+						   merge);
+
+		printf("at=%"PRIuMAX", commits=%"PRIuMAX", objects=%"PRIuMAX"\n",
+		       (uintmax_t)merge->at,
+		       (uintmax_t)ewah_bitmap_popcount(commits_bitmap),
+		       (uintmax_t)ewah_bitmap_popcount(merge_bitmap));
+	}
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return 0;
+}
+
+static void dump_ewah_object_ids(struct bitmap_index *bitmap_git,
+				 struct ewah_bitmap *bitmap)
+
+{
+	struct ewah_iterator it;
+	eword_t word;
+	uint32_t pos = 0;
+
+	ewah_iterator_init(&it, bitmap);
+
+	while (ewah_iterator_next(&word, &it)) {
+		struct object_id oid;
+		uint32_t offset;
+
+		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
+			if (!(word >> offset))
+				break;
+
+			offset += ewah_bit_ctz64(word >> offset);
+
+			bit_pos_to_object_id(bitmap_git, pos + offset, &oid);
+			printf("%s\n", oid_to_hex(&oid));
+		}
+		pos += BITS_IN_EWORD;
+	}
+}
+
+int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n)
+{
+	struct bitmap_index *bitmap_git;
+	struct pseudo_merge *merge;
+	int ret = 0;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	if (n >= bitmap_git->pseudo_merges.nr) {
+		ret = error(_("pseudo-merge index out of range "
+			      "(%"PRIu32" >= %"PRIuMAX")"),
+			    n, (uintmax_t)bitmap_git->pseudo_merges.nr);
+		goto cleanup;
+	}
+
+	merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+				 &bitmap_git->pseudo_merges.v[n]);
+	dump_ewah_object_ids(bitmap_git, merge->commits);
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return ret;
+}
+
+int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n)
+{
+	struct bitmap_index *bitmap_git;
+	struct pseudo_merge *merge;
+	int ret = 0;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	if (n >= bitmap_git->pseudo_merges.nr) {
+		ret = error(_("pseudo-merge index out of range "
+			      "(%"PRIu32" >= %"PRIuMAX")"),
+			    n, (uintmax_t)bitmap_git->pseudo_merges.nr);
+		goto cleanup;
+	}
+
+	merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+				 &bitmap_git->pseudo_merges.v[n]);
+
+	dump_ewah_object_ids(bitmap_git,
+			     pseudo_merge_bitmap(&bitmap_git->pseudo_merges,
+						 merge));
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return ret;
+}
+
 int rebuild_bitmap(const uint32_t *reposition,
 		   struct ewah_bitmap *source,
 		   struct bitmap *dest)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 21aabf805ea..4466b5ad0fb 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -73,6 +73,9 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
+int test_bitmap_pseudo_merges(struct repository *r);
+int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
+int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n);
 
 #define GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL \
 	"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL"
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index af43ee1cb5e..6af2b42678f 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -13,21 +13,41 @@ static int bitmap_dump_hashes(void)
 	return test_bitmap_hashes(the_repository);
 }
 
+static int bitmap_dump_pseudo_merges(void)
+{
+	return test_bitmap_pseudo_merges(the_repository);
+}
+
+static int bitmap_dump_pseudo_merge_commits(uint32_t n)
+{
+	return test_bitmap_pseudo_merge_commits(the_repository, n);
+}
+
+static int bitmap_dump_pseudo_merge_objects(uint32_t n)
+{
+	return test_bitmap_pseudo_merge_objects(the_repository, n);
+}
+
 int cmd__bitmap(int argc, const char **argv)
 {
 	setup_git_directory();
 
-	if (argc != 2)
-		goto usage;
-
-	if (!strcmp(argv[1], "list-commits"))
+	if (argc == 2 && !strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
-	if (!strcmp(argv[1], "dump-hashes"))
+	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
 		return bitmap_dump_hashes();
+	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
+		return bitmap_dump_pseudo_merges();
+	if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-commits"))
+		return bitmap_dump_pseudo_merge_commits(atoi(argv[2]));
+	if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-objects"))
+		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
 
-usage:
 	usage("\ttest-tool bitmap list-commits\n"
-	      "\ttest-tool bitmap dump-hashes");
+	      "\ttest-tool bitmap dump-hashes\n"
+	      "\ttest-tool bitmap dump-pseudo-merges\n"
+	      "\ttest-tool bitmap dump-pseudo-merge-commits \n"
+	      "\ttest-tool bitmap dump-pseudo-merge-objects ");
 
 	return -1;
 }

From patchwork Thu May 23 21:27:08 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13672322
Date: Thu, 23 May 2024 17:27:08 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 19/24] t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()`
Message-ID: <936f6d1b7e392367bb87a755d014633d7171f0ab.1716499565.git.me@ttaylorr.com>

One of the tests we'll want to add for pseudo-merge bitmaps needs to be
able to generate a large number of commits at a specific date.

Support the `--notick` option (with identical semantics to the
`--notick` option for `test_commit()`) within `test_commit_bulk` as a
prerequisite for that. Callers can then set the various _DATE variables
themselves.

Signed-off-by: Taylor Blau
---
 t/test-lib-functions.sh | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 862d80c9748..427b375b392 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -458,6 +458,7 @@ test_commit_bulk () {
 	indir=.
 	ref=HEAD
 	n=1
+	notick=
 	message='commit %s'
 	filename='%s.t'
 	contents='content %s'
@@ -488,6 +489,9 @@ test_commit_bulk () {
 			filename="${1#--*=}-%s.t"
 			contents="${1#--*=} %s"
 			;;
+		--notick)
+			notick=yes
+			;;
 		-*)
 			BUG "invalid test_commit_bulk option: $1"
 			;;
@@ -507,7 +511,10 @@ test_commit_bulk () {
 
 		while test "$total" -gt 0
 		do
-			test_tick &&
+			if test -z "$notick"
+			then
+				test_tick
+			fi &&
 			echo "commit $ref"
 			printf 'author %s <%s> %s\n' \
 				"$GIT_AUTHOR_NAME" \

From patchwork Thu May 23 21:27:11 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13672323
Date: Thu, 23 May 2024 17:27:11 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v4 20/24] pack-bitmap.c: use pseudo-merges during traversal

Now that all of the groundwork has been laid to support reading and
using pseudo-merges, make use of that work in this commit by teaching
the pack-bitmap machinery to use pseudo-merge(s) when available during
traversal.

The basic operation is as follows:

  - When enumerating objects on either side of a reachability query,
    first see if any subset of the roots satisfies some pseudo-merge
    bitmap. If it does, apply that pseudo-merge bitmap.

  - If any pseudo-merge bitmap(s) were applied in the previous step, OR
    them into the result[^1]. Then repeat the process over all
    pseudo-merge bitmaps (we'll refer to this as "cascading"
    pseudo-merges). Once this is done, OR in the resulting bitmap.

  - If there is no fill-in traversal to be done, return the bitmap for
    that side of the reachability query.
    If there is fill-in traversal, then for each commit we encounter via
    show_commit(), check whether any unsatisfied pseudo-merges that have
    that commit as one of their parents have been made satisfied by the
    presence of that commit. If so, OR in the object set from that
    pseudo-merge bitmap, and then cascade. If not, continue traversal.

A similar implementation is present in the boundary-based bitmap
traversal routines.

[^1]: Importantly, we cannot OR in the entire set of roots along with
      the objects reachable from whatever pseudo-merge bitmaps were
      satisfied. Doing so may leave dangling bits corresponding to any
      unsatisfied root(s) in the resulting bitmap, tricking other parts
      of the traversal into thinking we already have a reachability
      closure over those commit(s) when we do not.

Signed-off-by: Taylor Blau
---
 pack-bitmap.c                   | 112 ++++++++++-
 t/t5333-pseudo-merge-bitmaps.sh | 328 ++++++++++++++++++++++++++++++++
 2 files changed, 439 insertions(+), 1 deletion(-)
 create mode 100755 t/t5333-pseudo-merge-bitmaps.sh

diff --git a/pack-bitmap.c b/pack-bitmap.c
index c13074673af..e61058dada6 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -114,6 +114,9 @@ struct bitmap_index {
 	unsigned int version;
 };
 
+static int pseudo_merges_satisfied_nr;
+static int pseudo_merges_cascades_nr;
+
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
 	struct ewah_bitmap *parent;
@@ -1006,6 +1009,22 @@ static void show_commit(struct commit *commit UNUSED,
 {
 }
 
+static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git,
+						 struct bitmap *result,
+						 struct commit *commit,
+						 uint32_t commit_pos)
+{
+	int ret;
+
+	ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges,
+					     result, commit, commit_pos);
+
+	if (ret)
+		pseudo_merges_satisfied_nr += ret;
+
+	return ret;
+}
+
 static int add_to_include_set(struct bitmap_index *bitmap_git,
 			      struct include_data *data,
 			      struct commit *commit,
@@ -1026,6 +1045,10 @@ static int add_to_include_set(struct bitmap_index *bitmap_git,
 	}
 
 	bitmap_set(data->base, bitmap_pos);
+	if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit,
+					     bitmap_pos))
+		return 0;
+
 	return 1;
 }
 
@@ -1151,6 +1174,20 @@ static void show_boundary_object(struct object *object UNUSED,
 	BUG("should not be called");
 }
 
+static unsigned cascade_pseudo_merges_1(struct bitmap_index *bitmap_git,
+					struct bitmap *result,
+					struct bitmap *roots)
+{
+	int ret = cascade_pseudo_merges(&bitmap_git->pseudo_merges,
+					result, roots);
+	if (ret) {
+		pseudo_merges_cascades_nr++;
+		pseudo_merges_satisfied_nr += ret;
+	}
+
+	return ret;
+}
+
 static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 					    struct rev_info *revs,
 					    struct object_list *roots)
@@ -1160,6 +1197,7 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	unsigned int i;
 	unsigned int tmp_blobs, tmp_trees, tmp_tags;
 	int any_missing = 0;
+	int existing_bitmaps = 0;
 
 	cb.bitmap_git = bitmap_git;
 	cb.base = bitmap_new();
@@ -1167,6 +1205,25 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 
 	revs->ignore_missing_links = 1;
 
+	if (bitmap_git->pseudo_merges.nr) {
+		struct bitmap *roots_bitmap = bitmap_new();
+		struct object_list *objects = NULL;
+
+		for (objects = roots; objects; objects = objects->next) {
+			struct object *object = objects->item;
+			int pos;
+
+			pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos < 0)
+				continue;
+
+			bitmap_set(roots_bitmap, pos);
+		}
+
+		if (!cascade_pseudo_merges_1(bitmap_git, cb.base, roots_bitmap))
+			bitmap_free(roots_bitmap);
+	}
+
 	/*
 	 * OR in any existing reachability bitmaps among `roots` into
 	 * `cb.base`.
@@ -1178,8 +1235,10 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, continue; if (add_commit_to_bitmap(bitmap_git, &cb.base, - (struct commit *)object)) + (struct commit *)object)) { + existing_bitmaps = 1; continue; + } any_missing = 1; } @@ -1187,6 +1246,9 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, if (!any_missing) goto cleanup; + if (existing_bitmaps) + cascade_pseudo_merges_1(bitmap_git, cb.base, NULL); + tmp_blobs = revs->blob_objects; tmp_trees = revs->tree_objects; tmp_tags = revs->blob_objects; @@ -1242,6 +1304,13 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, return cb.base; } +static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git) +{ + uint32_t i; + for (i = 0; i < bitmap_git->pseudo_merges.nr; i++) + bitmap_git->pseudo_merges.v[i].satisfied = 0; +} + static struct bitmap *find_objects(struct bitmap_index *bitmap_git, struct rev_info *revs, struct object_list *roots, @@ -1249,9 +1318,32 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, { struct bitmap *base = NULL; int needs_walk = 0; + unsigned existing_bitmaps = 0; struct object_list *not_mapped = NULL; + unsatisfy_all_pseudo_merges(bitmap_git); + + if (bitmap_git->pseudo_merges.nr) { + struct bitmap *roots_bitmap = bitmap_new(); + struct object_list *objects = NULL; + + for (objects = roots; objects; objects = objects->next) { + struct object *object = objects->item; + int pos; + + pos = bitmap_position(bitmap_git, &object->oid); + if (pos < 0) + continue; + + bitmap_set(roots_bitmap, pos); + } + + base = bitmap_new(); + if (!cascade_pseudo_merges_1(bitmap_git, base, roots_bitmap)) + bitmap_free(roots_bitmap); + } + /* * Go through all the roots for the walk. 
The ones that have bitmaps * on the bitmap index will be `or`ed together to form an initial @@ -1262,11 +1354,21 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, */ while (roots) { struct object *object = roots->item; + roots = roots->next; + if (base) { + int pos = bitmap_position(bitmap_git, &object->oid); + if (pos > 0 && bitmap_get(base, pos)) { + object->flags |= SEEN; + continue; + } + } + if (object->type == OBJ_COMMIT && add_commit_to_bitmap(bitmap_git, &base, (struct commit *)object)) { object->flags |= SEEN; + existing_bitmaps = 1; continue; } @@ -1282,6 +1384,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, roots = not_mapped; + if (existing_bitmaps) + cascade_pseudo_merges_1(bitmap_git, base, NULL); + /* * Let's iterate through all the roots that don't have bitmaps to * check if we can determine them to be reachable from the existing @@ -1866,6 +1971,11 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, object_list_free(&wants); object_list_free(&haves); + trace2_data_intmax("bitmap", the_repository, "pseudo_merges_satisfied", + pseudo_merges_satisfied_nr); + trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades", + pseudo_merges_cascades_nr); + return bitmap_git; cleanup: diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh new file mode 100755 index 00000000000..4c9aebcffdc --- /dev/null +++ b/t/t5333-pseudo-merge-bitmaps.sh @@ -0,0 +1,328 @@ +#!/bin/sh + +test_description='pseudo-merge bitmaps' + +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 + +. 
./test-lib.sh + +test_pseudo_merges () { + test-tool bitmap dump-pseudo-merges +} + +test_pseudo_merge_commits () { + test-tool bitmap dump-pseudo-merge-commits "$1" +} + +test_pseudo_merges_satisfied () { + test_trace2_data bitmap pseudo_merges_satisfied "$1" +} + +test_pseudo_merges_cascades () { + test_trace2_data bitmap pseudo_merges_cascades "$1" +} + +tag_everything () { + git rev-list --all --no-object-names >in && + perl -lne ' + print "create refs/tags/" . $. . " " . $1 if /([0-9a-f]+)/ + ' expect && + + : >trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt \ + git rev-list --count --all --objects --use-bitmap-index >actual && + + test_pseudo_merges_satisfied 0 merges && + test_must_be_empty merges && + test_cmp expect actual +' + +test_expect_success 'pseudo-merges accurately represent their objects' ' + test_config bitmapPseudoMerge.test.pattern "refs/tags/" && + test_config bitmapPseudoMerge.test.maxMerges 8 && + test_config bitmapPseudoMerge.test.stableThreshold never && + + git repack -adb && + + test_pseudo_merges >merges && + test_line_count = 8 merges && + + for i in $(test_seq 0 $(($(wc -l commits && + + git rev-list --objects --no-object-names --stdin expect.raw && + test-tool bitmap dump-pseudo-merge-objects $i >actual.raw && + + sort -u expect && + sort -u actual && + + test_cmp expect actual || return 1 + done +' + +test_expect_success 'bitmap traversal with pseudo-merges' ' + : >trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt \ + git rev-list --count --all --objects --use-bitmap-index >actual && + git rev-list --count --all --objects >expect && + + test_pseudo_merges_satisfied 8 trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt \ + git rev-list --count --all --objects --use-bitmap-index >actual && + git rev-list --count --all --objects >expect && + + test_pseudo_merges_satisfied 8 merges && + test_line_count = 1 merges && + test_pseudo_merge_commits 0 >commits && + + test-tool bitmap list-commits >bitmaps && + bitmaps_nr="$(wc -l expect && + + 
test $(cat expect) -eq $(wc -l merges && + test_line_count = 1 merges && + + test_pseudo_merge_commits 0 >oids && + git cat-file --batch commits && + + test $(wc -l in && + git update-ref --stdin merges && + merges_nr="$(wc -l oids && + git cat-file --batch commits && + + expect="$(grep -c "^committer.*$old +0000$" commits)" && + actual="$(wc -l oids && + git cat-file --batch commits && + test $(wc -l err && + + cat >expect <<-EOF && + fatal: pseudo-merge group ${SQ}test${SQ} has unstable threshold before stable one + EOF + + test_cmp expect err +' + +test_expect_success 'pseudo-merge pattern with capture groups' ' + git init pseudo-merge-captures && + ( + cd pseudo-merge-captures && + + test_commit_bulk 128 && + tag_everything && + + for r in $(test_seq 8) + do + test_commit_bulk 16 && + + git rev-list HEAD~16.. >in && + + perl -lne "print \"create refs/remotes/$r/tags/\$. \$_\"" refs && + + test_pseudo_merges >merges && + for m in $(test_seq 0 $(($(wc -l oids && + grep -f oids refs | + perl -lne "print \$1 if /refs\/remotes\/([0-9]+)/" | + sort -u || return 1 + done >remotes && + + test $(wc -l merges && + test_line_count = 2 merges && + + test_pseudo_merge_commits 0 >commits-0.raw && + test_pseudo_merge_commits 1 >commits-1.raw && + + sort commits-0.raw >commits-0 && + sort commits-1.raw >commits-1 && + + comm -12 commits-0 commits-1 >overlap && + + test_line_count -gt 0 overlap + ) +' + +test_expect_success 'pseudo-merge overlap traversal' ' + ( + cd pseudo-merge-overlap && + + : >trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt \ + git rev-list --count --all --objects --use-bitmap-index >actual && + git rev-list --count --all --objects >expect && + + test_pseudo_merges_satisfied 2 trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt \ + git rev-list --count --all --objects --use-bitmap-index >actual && + git rev-list --count --all --objects >expect && + + test_pseudo_merges_satisfied 2 X-Patchwork-Id: 13672324 Received: from mail-qk1-f180.google.com 
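The cascade step in the pseudo-merge traversal patch above can be illustrated with a toy model. This is a minimal sketch, not git's `cascade_pseudo_merges()`. It assumes hypothetical names (`NWORDS`, `struct toy_pseudo_merge`, `set_bit()`, `get_bit()`, `parents_known()`, `toy_cascade()`) and uses plain fixed-size word arrays instead of EWAH bitmaps.

A pseudo-merge counts as satisfied once all of its parents are either traversal roots or already in the result. Its objects are then OR'd in, and the loop repeats until a full pass makes no progress. The roots themselves are never OR'd into the result, which is the concern raised in footnote [^1].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size bitset: NWORDS 64-bit words (128 positions). */
#define NWORDS 2

/*
 * Hypothetical stand-in for one pseudo-merge: the bit positions of its
 * parent commits, the objects reachable from those parents, and whether
 * it has already been applied during this traversal.
 */
struct toy_pseudo_merge {
	uint64_t parents[NWORDS];
	uint64_t objects[NWORDS];
	int satisfied;
};

static void set_bit(uint64_t *bits, size_t pos)
{
	bits[pos / 64] |= (uint64_t)1 << (pos % 64);
}

static int get_bit(const uint64_t *bits, size_t pos)
{
	return (int)((bits[pos / 64] >> (pos % 64)) & 1);
}

/* Is every parent either a traversal root or already in `result`? */
static int parents_known(const struct toy_pseudo_merge *pm,
			 const uint64_t *roots, const uint64_t *result)
{
	for (size_t w = 0; w < NWORDS; w++) {
		uint64_t known = result[w] | (roots ? roots[w] : 0);
		if (pm->parents[w] & ~known)
			return 0;
	}
	return 1;
}

/*
 * Apply every satisfied pseudo-merge, repeating until a full pass makes
 * no progress, since OR-ing in one pseudo-merge's objects can satisfy
 * another. `roots` is only consulted and never OR'd into `result`, so a
 * root that no pseudo-merge covers leaves no dangling bit behind.
 * Returns how many pseudo-merges became satisfied.
 */
static int toy_cascade(struct toy_pseudo_merge *pms, size_t nr,
		       const uint64_t *roots, uint64_t *result)
{
	int nr_satisfied = 0;
	int progress;

	do {
		progress = 0;
		for (size_t i = 0; i < nr; i++) {
			if (pms[i].satisfied ||
			    !parents_known(&pms[i], roots, result))
				continue;
			for (size_t w = 0; w < NWORDS; w++)
				result[w] |= pms[i].objects[w];
			pms[i].satisfied = 1;
			nr_satisfied++;
			progress = 1;
		}
	} while (progress);

	return nr_satisfied;
}
```

Suppose one pseudo-merge's parents are supplied only by another pseudo-merge's objects. A single call then satisfies both over two passes. A root bit with no pseudo-merge covering it stays out of `result`.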
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id af79cd13be357-794abd063f1sm2432785a.72.2024.05.23.14.27.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 14:27:16 -0700 (PDT) Date: Thu, 23 May 2024 17:27:15 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 21/24] pack-bitmap: extra trace2 information Message-ID: <9240b06a7d87e6321342a280daa74ed0e0948357.1716499565.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Add some extra trace2 lines to capture the number of bitmap lookups that are hits versus misses, as well as the number of reachability roots that have bitmap coverage (versus those that do not). Signed-off-by: Taylor Blau --- pack-bitmap.c | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index e61058dada6..1966b3b95f1 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -116,6 +116,10 @@ struct bitmap_index { static int pseudo_merges_satisfied_nr; static int pseudo_merges_cascades_nr; +static int existing_bitmaps_hits_nr; +static int existing_bitmaps_misses_nr; +static int roots_with_bitmaps_nr; +static int roots_without_bitmaps_nr; static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st) { @@ -1040,10 +1044,14 @@ static int add_to_include_set(struct bitmap_index *bitmap_git, partial = bitmap_for_commit(bitmap_git, commit); if (partial) { + existing_bitmaps_hits_nr++; + bitmap_or_ewah(data->base, partial); return 0; } + existing_bitmaps_misses_nr++; + bitmap_set(data->base, bitmap_pos); if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit, bitmap_pos)) @@ -1099,8 +1107,12 @@ static int add_commit_to_bitmap(struct bitmap_index *bitmap_git, { struct ewah_bitmap *or_with = bitmap_for_commit(bitmap_git, 
commit); - if (!or_with) + if (!or_with) { + existing_bitmaps_misses_nr++; return 0; + } + + existing_bitmaps_hits_nr++; if (!*base) *base = ewah_to_bitmap(or_with); @@ -1407,8 +1419,12 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, object->flags &= ~UNINTERESTING; add_pending_object(revs, object, ""); needs_walk = 1; + + roots_without_bitmaps_nr++; } else { object->flags |= SEEN; + + roots_with_bitmaps_nr++; } } @@ -1975,6 +1991,14 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, pseudo_merges_satisfied_nr); trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades", pseudo_merges_cascades_nr); + trace2_data_intmax("bitmap", the_repository, "bitmap/hits", + existing_bitmaps_hits_nr); + trace2_data_intmax("bitmap", the_repository, "bitmap/misses", + existing_bitmaps_misses_nr); + trace2_data_intmax("bitmap", the_repository, "bitmap/roots_with_bitmap", + roots_with_bitmaps_nr); + trace2_data_intmax("bitmap", the_repository, "bitmap/roots_without_bitmap", + roots_without_bitmaps_nr); return bitmap_git; From patchwork Thu May 23 21:27:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13672325 Received: from mail-qv1-f54.google.com (mail-qv1-f54.google.com [209.85.219.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 56EA912881C for ; Thu, 23 May 2024 21:27:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499642; cv=none; b=aYpcoPlmN/EDVJlpWkzuJbjkdDF2YqG3v9kdGP/ILL+7BDIfIsoQ93jzjPrF4UijvMKhm5jZosBnkcoos8+KWz2CV0MxzocfHDPSC212KjbQHAGQh4Tb4CrSxaZ0rNQ8hu4h1XH9wJfK85OeKRHis+TEWRvF76YVCSUja2d4H9A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
x=1717104440; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=sibx1wvZ5ibj1oa0U6cIkDGfjpB7yz0ftXmuVB2FEgg=; b=s4c7cVwNE9C3HmlKluPtRpJXh6kPXea8IRHeZKTG+y45mZ8Rzn3+vf8bREfKq9VFjK Ek761TMPOSEi5Gw/8K0259Hlk3lM/ypEfDW+3at1SRNAw5rkk6qfkr8hdNbNK+pudeVU LN1UFWmDCSvtyBmSVOlhNLl/XEOzFLUHtROg8jdV2hN0Q3zybIZrjXKH+GeNZcvg1JgJ v4txhWxuzISOgmWTKqEKpBvMNBm/Wr3YnRaWkDVeGaHMXHbZCtrqxFoGGtyaeekRctN0 v9jEbkbic3GCw3BlHz8eooEPERiJho+Wk4jkduldRc4hQ9rOnYFQ/ZFG05qM6qe0+qrO 6lRg== X-Gm-Message-State: AOJu0YzjKOcigExEc+lTCY00L02ZYHN/wScXATwA5E68KGDzv5+AV8pF kmlDvK/bhEh5tVzC5mIWRngfY75hAyx4peoox+Vxy8t5UdrhH4n8ZgXsyMWwcrbJbDQWYpHpBRC w X-Google-Smtp-Source: AGHT+IE0rHeWGQ1D0G8W8zMggGi9UpXYTk08Ww1nkncdQoOBR+TSu2RyP96iJow0/UcmGHMRrtmyhw== X-Received: by 2002:a05:6214:3385:b0:6aa:42a3:3cde with SMTP id 6a1803df08f44-6abcd10a865mr2544416d6.59.1716499639826; Thu, 23 May 2024 14:27:19 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6ac162f2356sm567706d6.98.2024.05.23.14.27.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 14:27:19 -0700 (PDT) Date: Thu, 23 May 2024 17:27:18 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 22/24] ewah: `bitmap_equals_ewah()` Message-ID: <625596a143229e6a9a8a36d2b0eaa480ac204809.1716499565.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Prepare to reuse existing pseudo-merge bitmaps by implementing a `bitmap_equals_ewah()` helper. 
This helper will be used to see if a raw bitmap (containing the set of parents for some pseudo-merge) is equal to any existing pseudo-merge's commits bitmap (which are stored as EWAH-compressed bitmaps on disk). Signed-off-by: Taylor Blau --- ewah/bitmap.c | 19 +++++++++++++++++++ ewah/ewok.h | 1 + 2 files changed, 20 insertions(+) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index dc2ca190f12..55928dada86 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -261,6 +261,25 @@ int bitmap_equals(struct bitmap *self, struct bitmap *other) return 1; } +int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other) +{ + struct ewah_iterator it; + eword_t word; + size_t i = 0; + + ewah_iterator_init(&it, other); + + while (ewah_iterator_next(&word, &it)) + if (word != (i < self->word_alloc ? self->words[i++] : 0)) + return 0; + + for (; i < self->word_alloc; i++) + if (self->words[i]) + return 0; + + return 1; +} + int bitmap_is_subset(struct bitmap *self, struct bitmap *other) { size_t common_size, i; diff --git a/ewah/ewok.h b/ewah/ewok.h index 7074a6347b7..5e357e24933 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -179,6 +179,7 @@ void bitmap_unset(struct bitmap *self, size_t pos); int bitmap_get(struct bitmap *self, size_t pos); void bitmap_free(struct bitmap *self); int bitmap_equals(struct bitmap *self, struct bitmap *other); +int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other); /* * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set From patchwork Thu May 23 21:27:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13672326 Received: from mail-qv1-f42.google.com (mail-qv1-f42.google.com [209.85.219.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9E8EC128815 for ; Thu, 23 May 2024 21:27:24 +0000 (UTC) 
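The `bitmap_equals_ewah()` helper in the patch above compares an uncompressed bitmap against a word stream without inflating the compressed side first. This is a minimal sketch of the same comparison idea under stated assumptions. The hypothetical `struct toy_run` list of `(word, count)` runs is not the EWAH encoding; it only stands in for an iterator that yields one 64-bit word at a time. The names `toy_iter_next()` and `toy_equals()` are also hypothetical.

As in the patch, positions past the end of either side count as zero, and the comparison stops at the first differing word.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy "compressed" bitmap: a list of (word, count) runs.
 * This is NOT the EWAH encoding; it only stands in for a stream of
 * 64-bit words produced one at a time by an iterator.
 */
struct toy_run {
	uint64_t word;
	size_t count;
};

struct toy_iter {
	const struct toy_run *runs;
	size_t nr_runs;
	size_t run;     /* index of the current run */
	size_t emitted; /* words already produced from the current run */
};

/* Produce the next decompressed word; return 0 once the stream ends. */
static int toy_iter_next(struct toy_iter *it, uint64_t *out)
{
	while (it->run < it->nr_runs &&
	       it->emitted == it->runs[it->run].count) {
		it->run++;
		it->emitted = 0;
	}
	if (it->run >= it->nr_runs)
		return 0;
	*out = it->runs[it->run].word;
	it->emitted++;
	return 1;
}

/*
 * Compare a plain word array with the stream word by word, treating
 * positions past the end of either side as zero, and bail out on the
 * first mismatching word.
 */
static int toy_equals(const uint64_t *words, size_t nr_words,
		      const struct toy_run *runs, size_t nr_runs)
{
	struct toy_iter it = { runs, nr_runs, 0, 0 };
	uint64_t word;
	size_t i = 0;

	while (toy_iter_next(&it, &word)) {
		uint64_t mine = i < nr_words ? words[i] : 0;
		if (word != mine)
			return 0;
		i++;
	}

	/* The stream ended first: any remaining set bit means unequal. */
	for (; i < nr_words; i++)
		if (words[i])
			return 0;
	return 1;
}
```

The trailing loop matters when the compressed side is shorter. Without it, an uncompressed bitmap with extra set bits past the end of the stream would wrongly compare equal.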
Date: Thu, 23 May 2024 17:27:21 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 23/24] pseudo-merge: implement support for finding existing merges Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To:

This patch implements support for reusing existing pseudo-merge commits when writing bitmaps, in cases where an existing pseudo-merge bitmap has exactly the same set of parents as one that we are about to write.

Note that unstable pseudo-merges are likely to change between consecutive repacks, and so are generally poor candidates for reuse. However, stable pseudo-merges (see the configuration option 'bitmapPseudoMerge..stableThreshold') are by definition unlikely to change between runs (as they represent long-running branches).

Because there is no index from a *set* of pseudo-merge parents to a matching pseudo-merge bitmap, we have to construct the bitmap corresponding to the set of parents for each pending pseudo-merge commit and see if a matching bitmap exists.
This is technically quadratic in the number of pseudo-merges, but is OK in practice for a few reasons:

  - non-matching pseudo-merge bitmaps are rejected as soon as they differ in a single bit

  - already-matched pseudo-merge bitmaps are discarded from subsequent rounds of search

  - the number of pseudo-merges is generally small, even for large repositories

To implement this, add (a) a function that finds a matching pseudo-merge given some uncompressed bitset describing its parents, and (b) a function that computes the bitset of parents for a given pseudo-merge commit, then (c) call the latter before computing the set of reachable objects for some pending pseudo-merge.

Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 15 +++++++- pack-bitmap.c | 32 ++++++++++++++++ pack-bitmap.h | 2 + pseudo-merge.c | 55 ++++++++++++++++++++++++++++ pseudo-merge.h | 7 ++++ t/t5333-pseudo-merge-bitmaps.sh | 65 +++++++++++++++++++++++++++++++++ 6 files changed, 174 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 47250398aa2..6e8060f8a0b 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -19,6 +19,10 @@ #include "tree-walk.h" #include "pseudo-merge.h" #include "oid-array.h" +#include "config.h" +#include "alloc.h" +#include "refs.h" +#include "strmap.h" struct bitmapped_commit { struct commit *commit; @@ -465,6 +469,7 @@ static int fill_bitmap_tree(struct bitmap_writer *writer, } static int reused_bitmaps_nr; +static int reused_pseudo_merge_bitmaps_nr; static int fill_bitmap_commit(struct bitmap_writer *writer, struct bb_commit *ent, @@ -490,7 +495,7 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, struct bitmap *remapped = bitmap_new(); if (commit->object.flags & BITMAP_PSEUDO_MERGE) - old = NULL; + old = pseudo_merge_bitmap_for_commit(old_bitmap, c); else old = bitmap_for_commit(old_bitmap, c); /* @@ -501,7 +506,10 @@ static int fill_bitmap_commit(struct bitmap_writer *writer, if (old && 
!rebuild_bitmap(mapping, old, remapped)) { bitmap_or(ent->bitmap, remapped); bitmap_free(remapped); - reused_bitmaps_nr++; + if (commit->object.flags & BITMAP_PSEUDO_MERGE) + reused_pseudo_merge_bitmaps_nr++; + else + reused_bitmaps_nr++; continue; } bitmap_free(remapped); @@ -631,6 +639,9 @@ int bitmap_writer_build(struct bitmap_writer *writer, the_repository); trace2_data_intmax("pack-bitmap-write", the_repository, "building_bitmaps_reused", reused_bitmaps_nr); + trace2_data_intmax("pack-bitmap-write", the_repository, + "building_bitmaps_pseudo_merge_reused", + reused_pseudo_merge_bitmaps_nr); stop_progress(&writer->progress); diff --git a/pack-bitmap.c b/pack-bitmap.c index 1966b3b95f1..70230e26479 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1316,6 +1316,37 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, return cb.base; } +struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit) +{ + struct commit_list *p; + struct bitmap *parents; + struct pseudo_merge *match = NULL; + + if (!bitmap_git->pseudo_merges.nr) + return NULL; + + parents = bitmap_new(); + + for (p = commit->parents; p; p = p->next) { + int pos = bitmap_position(bitmap_git, &p->item->object.oid); + if (pos < 0 || pos >= bitmap_num_objects(bitmap_git)) + goto done; + + bitmap_set(parents, pos); + } + + match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges, + parents); + +done: + bitmap_free(parents); + if (match) + return pseudo_merge_bitmap(&bitmap_git->pseudo_merges, match); + + return NULL; +} + static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git) { uint32_t i; @@ -2809,6 +2840,7 @@ void free_bitmap_index(struct bitmap_index *b) */ close_midx_revindex(b->midx); } + free_pseudo_merge_map(&b->pseudo_merges); free(b); } diff --git a/pack-bitmap.h b/pack-bitmap.h index 4466b5ad0fb..1171e6d9893 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -142,6 +142,8 @@ int rebuild_bitmap(const 
uint32_t *reposition, struct bitmap *dest); struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, struct commit *commit); +struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit); void bitmap_writer_select_commits(struct bitmap_writer *writer, struct commit **indexed_commits, unsigned int indexed_commits_nr); diff --git a/pseudo-merge.c b/pseudo-merge.c index 7d131011497..a117520996c 100644 --- a/pseudo-merge.c +++ b/pseudo-merge.c @@ -699,3 +699,58 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm, return ret; } + +struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm, + struct bitmap *parents) +{ + struct pseudo_merge *match = NULL; + size_t i; + + if (!pm->nr) + return NULL; + + /* + * NOTE: this loop is quadratic in the worst case (where no + * matching pseudo-merge bitmaps are found), but in practice + * this is OK for a few reasons: + * + * - Rejecting pseudo-merge bitmaps that do not match the + * given commit is done quickly (i.e. `bitmap_equals_ewah()` + * returns early when we know the two bitmaps aren't equal). + * + * - Already matched pseudo-merge bitmaps (which we track with + * the `->satisfied` bit here) are skipped as potential + * candidates. + * + * - The number of pseudo-merges should be small (in the + * hundreds for most repositories). + * + * If in the future this semi-quadratic behavior does become a + * problem, another approach would be to keep track of which + * pseudo-merges are still "viable" after enumerating the + * pseudo-merge commit's parents: + * + * - A pseudo-merge bitmap becomes non-viable when the bit(s) + * corresponding to one or more parent(s) of the given + * commit are not set in a candidate pseudo-merge's commits + * bitmap. + * + * - After processing all bits, enumerate the remaining set of + * viable pseudo-merge bitmaps, and check that their + * popcount() matches the number of parents in the given + * commit.
+ */ + for (i = 0; i < pm->nr; i++) { + struct pseudo_merge *candidate = use_pseudo_merge(pm, &pm->v[i]); + if (!candidate || candidate->satisfied) + continue; + if (!bitmap_equals_ewah(parents, candidate->commits)) + continue; + + match = candidate; + match->satisfied = 1; + break; + } + + return match; +} diff --git a/pseudo-merge.h b/pseudo-merge.h index 755edc054ae..2aca01d0566 100644 --- a/pseudo-merge.h +++ b/pseudo-merge.h @@ -206,4 +206,11 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm, struct bitmap *result, struct bitmap *roots); +/* + * Returns a pseudo-merge which contains the exact set of commits + * listed in the "parents" bitmap, or NULL if none could be found. + */ +struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm, + struct bitmap *parents); + #endif diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh index 4c9aebcffdc..f052f395a77 100755 --- a/t/t5333-pseudo-merge-bitmaps.sh +++ b/t/t5333-pseudo-merge-bitmaps.sh @@ -22,6 +22,10 @@ test_pseudo_merges_cascades () { test_trace2_data bitmap pseudo_merges_cascades "$1" } +test_pseudo_merges_reused () { + test_trace2_data pack-bitmap-write building_bitmaps_pseudo_merge_reused "$1" +} + tag_everything () { git rev-list --all --no-object-names >in && perl -lne ' @@ -325,4 +329,65 @@ test_expect_success 'pseudo-merge overlap stale traversal' ' ) ' +test_expect_success 'pseudo-merge reuse' ' + git init pseudo-merge-reuse && + ( + cd pseudo-merge-reuse && + + stable="1641013200" && # 2022-01-01 + unstable="1672549200" && # 2023-01-01 + + GIT_COMMITTER_DATE="$stable +0000" && + export GIT_COMMITTER_DATE && + test_commit_bulk --notick 128 && + GIT_COMMITTER_DATE="$unstable +0000" && + export GIT_COMMITTER_DATE && + test_commit_bulk --notick 128 && + + tag_everything && + + git \ + -c bitmapPseudoMerge.test.pattern="refs/tags/" \ + -c bitmapPseudoMerge.test.maxMerges=1 \ + -c bitmapPseudoMerge.test.threshold=now \ + -c 
bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \ + -c bitmapPseudoMerge.test.stableSize=512 \ + repack -adb && + + test_pseudo_merges >merges && + test_line_count = 2 merges && + + test_pseudo_merge_commits 0 >stable-oids.before && + test_pseudo_merge_commits 1 >unstable-oids.before && + + : >trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt git \ + -c bitmapPseudoMerge.test.pattern="refs/tags/" \ + -c bitmapPseudoMerge.test.maxMerges=2 \ + -c bitmapPseudoMerge.test.threshold=now \ + -c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \ + -c bitmapPseudoMerge.test.stableSize=512 \ + repack -adb && + + test_pseudo_merges_reused 1 merges && + test_line_count = 3 merges && + + test_pseudo_merge_commits 0 >stable-oids.after && + for i in 1 2 + do + test_pseudo_merge_commits $i || return 1 + done >unstable-oids.after && + + sort -u expect && + sort -u actual && + test_cmp expect actual && + + sort -u expect && + sort -u actual && + test_cmp expect actual + ) +' + test_done From patchwork Thu May 23 21:27:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13672327 Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com [209.85.160.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B122912BF29 for ; Thu, 23 May 2024 21:27:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499649; cv=none; b=J8JzKynBH6/9tTCopM4Qyb+cKv/qO6N4/6zQniC4nEj3avvFs1+v4p69xI6QEtVVDFpWX5Dx72T2GDV9eg8kuHHaPuLnaShaOw9H6fEkj7JlXmgwsG/kL6ur6e3NYY7veSz3WjBf7NMDbZD30v1bYJ58MV4EJoAYJM2aUWNMkxo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716499649; c=relaxed/simple; 
bh=1JlysWEi5+vieexP99dbIwxjrsXiZxm9+0moyQr3wh0=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=rcuEaOI2eFNW/F58SdQdSeresrqqW4gX+N9s3FtFBFK3tT0d02p6T/XMqlq+bIU9a1Z41x+IW2h0GXE15+lDy0CbNXt3vyUNsg+Gx1Uz7ZGWbyQEURZPfPQSHYRkQCgeQAoMbbtgHP+VzrlryW9RonZXg03Q3jPqXxERR/3Nsj0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=silnn9Zd; arc=none smtp.client-ip=209.85.160.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="silnn9Zd" Received: by mail-qt1-f174.google.com with SMTP id d75a77b69052e-43fb05b1ef2so1180661cf.1 for ; Thu, 23 May 2024 14:27:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1716499646; x=1717104446; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=0yLArB5HF63q5EyTyW3jWNMwIvlmnQ1EZW92cuvMM2o=; b=silnn9ZdyXiNquQMaL91JKc40wmrONHFuyryikDCBgSZqwHJ3qbhBplZrFzsfP3WHY e6zRE/B+LLaSnjjraW54gk9W/QlD56VJcQ19ulRzNP1+gmDMjFiEjd/HUDhuNyCHMsu8 KK/YgrVU78yoAHy790b6Mu8NBNMPkNQsW3PY9uRiCaaQlgBxZsuFMCcwVM7K4p8Dezl2 fYT8AKZJhe313/zGAfcQfxMsyqN7lA95OvZ+/ZuEdYEGZ8a3skYLifM7dHqGljI/OBVs FaIii+yBEqUlDlOXRTXZetW6BCEHyMZOvpU3pFoAAIGs8umqN+tPb//5ppAQh6Hcx1i/ X7OQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716499646; x=1717104446; 
h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=0yLArB5HF63q5EyTyW3jWNMwIvlmnQ1EZW92cuvMM2o=; b=UYXXFaTmBZyCs9ZI7ETjjmU9Y/XzhJ6gOuzUbvxw7B+70dLYs39zyq9aAx7OWZMjUe MpVAKgkEK3r/OYzsEBbTtaWpa3bEyuu0GpSMN3vK997oW3T7UMxzOxGhK59xoo4hJwD7 P2QPQg9ZGT0dsBZ3otSE79QPBazZZHT8+huuLjDDcyRzt/ixB9F8huDTASWTGFnO4xeV WFRZfTLu69h3VBObIQcwCmX9bSDzzcn3WRFaH/sDhOje9GScJqttEyrLYegzV0vPdRu6 dmf1FrqYWUyNhW1q2o6lY/394WzEdeAdHkiCSEMQdGXKuNlWzaLSeTKSO8wh4CVhLNxE FWUQ== X-Gm-Message-State: AOJu0Yyh+aHfLlkfa3DnOEUm1uvL25Q4/YEzvsLkMk/LIl4Ki8cR/jXf Bca8SQaIECMHSHs5rwao1ldDhYusppNNs3eENgEVMi3mf8WsQgStHizMgP19ZtD3j9gf/GfRGNi L X-Google-Smtp-Source: AGHT+IEcNFZ6BV9ZoOk42Gue5MYmYCqhTLTOntbW67nH1NRPmRrKLy7SY0Zh7YHik24h4sZ/bjYKSg== X-Received: by 2002:a05:622a:491:b0:43a:bcd7:9898 with SMTP id d75a77b69052e-43fb0e74c23mr4264831cf.5.1716499646194; Thu, 23 May 2024 14:27:26 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-43fb17f3bfasm601911cf.37.2024.05.23.14.27.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 14:27:25 -0700 (PDT) Date: Thu, 23 May 2024 17:27:24 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v4 24/24] t/perf: implement performance tests for pseudo-merge bitmaps Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Implement a straightforward performance test demonstrating the benefit of pseudo-merge bitmaps by measuring how long it takes to count reachable objects in a few different scenarios: - without bitmaps, to demonstrate a reasonable baseline - with bitmaps, but without pseudo-merges - with bitmaps and pseudo-merges Results from running this test on git.git are as follows: Test this tree ----------------------------------------------------------------------------------- 5333.2: git rev-list --count --all --objects (no bitmaps) 3.54(3.45+0.08) 5333.3: git rev-list --count --all --objects (no pseudo-merges) 0.43(0.40+0.03) 5333.4: git rev-list --count --all --objects (with pseudo-merges) 0.12(0.11+0.01) On a private repository which is much larger, and has many spikey parts of history that aren't merged into the 'master' branch, the results are as follows: Test this tree --------------------------------------------------------------------------------------- 5333.1: git rev-list --count --all --objects (no bitmaps) 122.29(121.31+0.97) 5333.2: git rev-list --count --all --objects (no pseudo-merges) 21.88(21.30+0.58) 5333.3: git rev-list --count --all --objects (with pseudo-merges) 5.05(4.77+0.28) Signed-off-by: Taylor Blau --- t/perf/p5333-pseudo-merge-bitmaps.sh | 32 ++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) create mode 100755 
t/perf/p5333-pseudo-merge-bitmaps.sh diff --git a/t/perf/p5333-pseudo-merge-bitmaps.sh b/t/perf/p5333-pseudo-merge-bitmaps.sh new file mode 100755 index 00000000000..2e8b1d2635e --- /dev/null +++ b/t/perf/p5333-pseudo-merge-bitmaps.sh @@ -0,0 +1,32 @@ +#!/bin/sh + +test_description='pseudo-merge bitmaps' +. ./perf-lib.sh + +test_perf_large_repo + +test_expect_success 'setup' ' + git \ + -c bitmapPseudoMerge.all.pattern="refs/" \ + -c bitmapPseudoMerge.all.threshold=now \ + -c bitmapPseudoMerge.all.stableThreshold=never \ + -c bitmapPseudoMerge.all.maxMerges=64 \ + -c pack.writeBitmapLookupTable=true \ + repack -adb +' + +test_perf 'git rev-list --count --all --objects (no bitmaps)' ' + git rev-list --objects --all +' + +test_perf 'git rev-list --count --all --objects (no pseudo-merges)' ' + GIT_TEST_USE_PSEUDO_MERGES=0 \ + git rev-list --objects --all --use-bitmap-index +' + +test_perf 'git rev-list --count --all --objects (with pseudo-merges)' ' + GIT_TEST_USE_PSEUDO_MERGES=1 \ + git rev-list --objects --all --use-bitmap-index +' + +test_done
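
The NOTE in pseudo_merge_for_parents() describes an alternative to comparing each candidate bitmap in full: walk the parent bits once, drop any pseudo-merge that lacks one of them ("non-viable"), and then confirm each survivor by checking that its popcount equals the number of parents. Below is a minimal, self-contained sketch of that filtering logic under simplifying assumptions: commit positions fit in 64 bits, a plain uint64_t stands in for the EWAH-compressed commits bitmap, and the names toy_pseudo_merge, toy_popcount() and toy_pseudo_merge_for_parents() are hypothetical, not part of Git. For clarity it checks every candidate for each parent bit, so it illustrates the viability test rather than any complexity improvement.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy model (not Git's API): a uint64_t stands in for the
 * EWAH-compressed "commits" bitmap carried by a real pseudo-merge.
 */
struct toy_pseudo_merge {
	uint64_t commits; /* bit i set => commit at position i is included */
	int satisfied;    /* already matched; skipped on later lookups */
};

unsigned toy_popcount(uint64_t x)
{
	unsigned n = 0;
	while (x) {
		x &= x - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Find an unsatisfied pseudo-merge whose commits bitmap equals "parents"
 * exactly. viable[] is caller-provided scratch space for nr entries.
 * Returns the matching index (and marks it satisfied), or -1.
 */
long toy_pseudo_merge_for_parents(struct toy_pseudo_merge *pm, size_t nr,
				  uint64_t parents, unsigned char *viable)
{
	unsigned nr_parents = toy_popcount(parents);
	unsigned bit;
	size_t i;

	/* Every not-yet-matched candidate starts out viable. */
	for (i = 0; i < nr; i++)
		viable[i] = !pm[i].satisfied;

	/* A candidate becomes non-viable once any parent bit is missing. */
	for (bit = 0; bit < 64; bit++) {
		uint64_t mask = (uint64_t)1 << bit;
		if (!(parents & mask))
			continue;
		for (i = 0; i < nr; i++)
			if (viable[i] && !(pm[i].commits & mask))
				viable[i] = 0;
	}

	/* Survivors contain all parents; equal popcount means no extras. */
	for (i = 0; i < nr; i++) {
		if (viable[i] && toy_popcount(pm[i].commits) == nr_parents) {
			pm[i].satisfied = 1;
			return (long)i;
		}
	}
	return -1;
}
```

For example, with two pseudo-merges whose commits bitmaps are both 0x03, successive lookups for the parent set 0x03 return index 0, then 1, then -1, because each match sets the satisfied flag, mirroring how the patch skips already-matched candidates.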