[0/6] Importing and exporting stashes to refs

Message ID

20220310173236.4165310-1-sandals@crustytoothpaste.net

Headers

Return-Path: <git-owner@kernel.org>
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>,
        Derrick Stolee <dstolee@gmail.com>,
        Thomas Gummerer <t.gummerer@gmail.com>
Subject: [PATCH 0/6] Importing and exporting stashes to refs
Date: Thu, 10 Mar 2022 17:32:30 +0000
Message-Id: <20220310173236.4165310-1-sandals@crustytoothpaste.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

Importing and exporting stashes to refs
  [0/6] Importing and exporting stashes to refs
  [1/6] builtin/stash: factor out generic function to look up stash info
  [2/6] builtin/stash: fill in all commit data
  [3/6] object-name: make get_oid quietly return an error
  [4/6] builtin/stash: provide a way to export stashes to a ref
  [5/6] builtin/stash: provide a way to import stashes from a ref
  [6/6] doc: add stash export and import to docs

Message

brian m. carlson March 10, 2022, 5:32 p.m. UTC

Stashes are currently stored using the reflog in a given repository.
This is an interesting and novel way to handle them, but there is no way
to easily move a stash across machines.  For example, stashes cannot be
bundled, pushed, or fetched.

This is suboptimal for a lot of reasons.  First, there is a recent push
towards ephemeral development environments, but many users make heavy
use of their stashes and wish to persist them long term[0].  Additionally,
it would be convenient to share a snapshot of in-progress work with a
colleague or a collaborator on a project.  And finally, many users wish
to sync their in-progress state across machines, and we currently have
no good way to do so, so they often do dangerous things like using cloud
syncing services for their repositories.

Let's solve this problem by allowing users to import and export stashes
to a chain of commits.  The commits used in a stash export are nearly
identical to those used in the stashes, with one notable change: the
first parent of a stash is a pointer to the previous stash, or an empty
commit if there is no previous stash.  All of the other parents used in
the stash commit are present following it in their normal order.
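
The layout described above can be sketched with a toy in-memory model. This is
only an illustration, not code from the series: the Commit class and the
helper names are hypothetical, and it assumes that "previous stash" means the
next-older entry, so that the newest stash ends up at the tip of the chain.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Commit:
    """Toy stand-in for a commit: a label plus an ordered tuple of parents."""
    name: str
    parents: Tuple["Commit", ...] = ()


def export_stashes(stashes, root):
    """Build the export chain from stashes given newest first.

    Each exported commit gets the previously exported commit (or the empty
    root) as its first parent, followed by the stash's own parents in their
    original order.
    """
    prev = root
    for stash in reversed(stashes):  # oldest first, so the newest is the tip
        prev = Commit("export of " + stash.name, (prev,) + stash.parents)
    return prev


def first_parent_names(tip, root):
    """Follow first parents from the tip down to the root, collecting labels."""
    names = []
    node = tip
    while node is not root:
        names.append(node.name)
        node = node.parents[0]
    return names


# Two stashes on the same base; stash@{0} also carries an untracked-files commit.
base = Commit("base")
s1 = Commit("stash@{1}", (base, Commit("index 1", (base,))))
s0 = Commit("stash@{0}", (base, Commit("index 0", (base,)), Commit("untracked 0")))
root = Commit("empty root")
tip = export_stashes([s0, s1], root)
print(first_parent_names(tip, root))
```

In this model the tip's parents after the first one are exactly the parents
of stash@{0}, which is what lets an import rebuild the original entry.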

This allows users to write their exported stashes to a single ref and
then push that ref to a remote or to bundle it for easy transport, and
then fetch it on the receiving side.  It also permits saving the index
and even untracked files and syncing them across machines, unlike
temporary commits.

We intentionally attempt to exactly round-trip commits between stashes,
although we don't do so for the exported data due to the base commit not
having identical timestamps.  Preserving the commits exactly lets us
more efficiently test our code and it also permits users to more easily
determine if they have the same data.

The tooling here is intentionally plumbing.  It's designed to be simple
and functional and get the basic job done.  If we want additional
features, we can add them in the future, but this should be a simple,
basic feature set that can support additional uses.

[0] For example, the present author has 124 stash entries in his
repository for this project.

brian m. carlson (6):
  builtin/stash: factor out generic function to look up stash info
  builtin/stash: fill in all commit data
  object-name: make get_oid quietly return an error
  builtin/stash: provide a way to export stashes to a ref
  builtin/stash: provide a way to import stashes from a ref
  doc: add stash export and import to docs

 Documentation/git-stash.txt |  27 +++
 builtin/stash.c             | 359 +++++++++++++++++++++++++++++++++---
 cache.h                     |  21 ++-
 object-name.c               |   6 +-
 t/t3903-stash.sh            |  52 ++++++
 5 files changed, 431 insertions(+), 34 deletions(-)

Comments

Junio C Hamano March 10, 2022, 7:14 p.m. UTC | #1

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> ...  The commits used in a stash export are nearly
> identical to those used in the stashes, with one notable change: the
> first parent of a stash is a pointer to the previous stash, or an empty
> commit if there is no previous stash.  All of the other parents used in
> the stash commit are present following it in their normal order.
> ...
> We intentionally attempt to exactly round-trip commits between stashes,
> although we don't do so for the exported data due to the base commit not
> having identical timestamps.  Preserving the commits exactly lets us
> more efficiently test our code and it also permits users to more easily
> determine if they have the same data.

Hmph, out of reflog entries stash@{0}, stash@{1}, stash@{2}, if we
create a chain of commits A, B, C such that

	A^2 = B, A^1 = stash@{0}
	B^2 = C, B^1 = stash@{1}
	         C^1 = stash@{2}

then the original stash entry commits can be recreated identically,
and after you export the stash as "A", you can "import" from it
without creating any new commit to represent the stash entries, no?

When we create A, if we use a predictable commit log message and
the same author/committer ident as A^1 (i.e. stash@{0}), and do it
the same for B and C, then no matter who exports the stash and at
which time, we'd get an identical result, I would presume.
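
A minimal sketch of why such a chain would be reproducible, assuming a
simplified commit serialization in the style of git's object hashing. The
wrapper message, the use of the empty tree for the wrapper commits, and the
function names are assumptions made for illustration; nothing here is taken
from the series.

```python
import hashlib

# Well-known SHA-1 id of git's empty tree; using it for wrappers is an assumption.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
WRAPPER_MESSAGE = "stash export wrapper"  # hypothetical fixed log message


def commit_id(tree, parents, ident, message):
    """Hash a simplified commit body the way git hashes objects."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author " + ident, "committer " + ident, "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def export_chain(stashes):
    """stashes: list of (stash_id, ident) pairs, newest first.

    Builds C, then B, then A, so that A^1 = stash@{0} and A^2 = B, and
    returns the id of A. Each wrapper reuses the ident of the stash entry
    it wraps, so no clock or user identity leaks into the result.
    """
    next_wrapper = None
    for stash_id, ident in reversed(stashes):
        parents = [stash_id] if next_wrapper is None else [stash_id, next_wrapper]
        next_wrapper = commit_id(EMPTY_TREE, parents, ident, WRAPPER_MESSAGE)
    return next_wrapper


# Made-up stash ids and idents, purely for demonstration.
stashes = [
    ("a" * 40, "A U Thor <author@example.com> 1646933550 +0000"),
    ("b" * 40, "A U Thor <author@example.com> 1646930000 +0000"),
    ("c" * 40, "A U Thor <author@example.com> 1646920000 +0000"),
]
print(export_chain(stashes) == export_chain(list(stashes)))
```

Because every input to the hash comes from the stash entries themselves, two
exports of the same entries yield the same id in this model.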

> The tooling here is intentionally plumbing.  It's designed to be simple
> and functional and get the basic job done.  If we want additional
> features, we can add them in the future, but this should be a simple,
> basic feature set that can support additional uses.

Sounds sensible.

brian m. carlson March 10, 2022, 9:04 p.m. UTC | #2

On 2022-03-10 at 19:14:59, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > ...  The commits used in a stash export are nearly
> > identical to those used in the stashes, with one notable change: the
> > first parent of a stash is a pointer to the previous stash, or an empty
> > commit if there is no previous stash.  All of the other parents used in
> > the stash commit are present following it in their normal order.
> > ...
> > We intentionally attempt to exactly round-trip commits between stashes,
> > although we don't do so for the exported data due to the base commit not
> > having identical timestamps.  Preserving the commits exactly lets us
> > more efficiently test our code and it also permits users to more easily
> > determine if they have the same data.
> 
> Hmph, out of reflog entries stash@{0}, stash@{1}, stash@{2}, if we
> create a chain of commits A, B, C such that
> 
> 	A^2 = B, A^1 = stash@{0}
> 	B^2 = C, B^1 = stash@{1}
> 	         C^1 = stash@{2}
> 
> then the original stash entry commits can be recreated identically,
> and after you export the stash as "A", you can "import" from it
> without creating any new commit to represent the stash entries, no?

True, that's an alternative approach.  Mine has the nice property that
you can see the items in the stash with log --first-parent, which I
found useful in my testing.  We could of course give yours that
property as well by reversing the order of the parents, but then the last
item in the chain would need a base commit or a different pattern.

Yours does have the nice property that the actual original stash
commits remain visible as well.

> When we create A, if we use a predictable commit log message and
> the same author/committer ident as A^1 (i.e. stash@{0}), and do it
> the same for B and C, then no matter who exports the stash and at
> which time, we'd get an identical result, I would presume.

True.

I do want to preserve my nice --first-parent property.  What I propose
to do is this: I'll take your approach and reverse the parents to
preserve the --first-parent chain and synthesize a predictable root
commit based on the fake ID information we use for stashes when nobody's
provided any.
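
As a rough sketch of the combined layout proposed here, assuming each wrapper
commit has the previous wrapper (or the synthetic root) as its first parent
and the original stash commit as its second. The names are hypothetical and
the graph is a plain dictionary rather than real git objects.

```python
def build_export(stash_ids, root="root"):
    """stash_ids: newest first. Returns (tip, parents).

    parents maps each commit name to its ordered list of parent names.
    """
    parents = {root: []}
    prev = root
    for i, stash in enumerate(reversed(stash_ids)):  # oldest first
        wrapper = f"wrapper-{i}"
        parents[wrapper] = [prev, stash]  # first parent: chain, second: stash
        prev = wrapper
    return prev, parents


def import_stashes(tip, parents):
    """Walk the first-parent chain and collect the second parents.

    Stops at the root commit, which has no parents. The result is ordered
    newest first, like stash@{0}, stash@{1}, and so on.
    """
    found = []
    node = tip
    while parents[node]:
        prev, stash = parents[node]
        found.append(stash)
        node = prev
    return found


tip, parents = build_export(["s0", "s1", "s2"])
print(import_stashes(tip, parents))
```

The import never creates a commit in this model: it only reads back the
second parents, which are the original stash commits.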

Junio C Hamano March 10, 2022, 9:38 p.m. UTC | #3

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I do want to preserve my nice --first-parent property.  What I propose
> to do is this: I'll take your approach and reverse the parents to
> preserve the --first-parent chain and synthesize a predictable root
> commit based on the fake ID information we use for stashes when nobody's
> provided any.

I am wondering if this can be made not an export format but a new
mechanism to store stashes that we use without having to export and
import.  Capping the series of "stash entry" commits with an extra
commit that is continuously amended, and recording which stash entry
has already been used (and not to be shown) etc., in the log message
part of that commit, would give us "stash drop" without rewriting
all the history and would easily bring us to feature parity with the
reflog based implementation, I would hope?
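
One way to picture the idea, purely as a sketch: the cap commit's log message
carries a list of dropped entries, and listing the stash filters against it.
The "dropped: " line format and both helper names are invented for this
example; the thread does not specify any format.

```python
DROP_PREFIX = "dropped: "  # hypothetical marker line in the cap commit message


def mark_dropped(cap_message, stash_id):
    """Return the amended cap message with one more entry marked as dropped."""
    return cap_message + DROP_PREFIX + stash_id + "\n"


def visible_entries(chain, cap_message):
    """Filter the chain of stash entries against the cap message."""
    dropped = {
        line[len(DROP_PREFIX):]
        for line in cap_message.splitlines()
        if line.startswith(DROP_PREFIX)
    }
    return [entry for entry in chain if entry not in dropped]


message = "stash state\n"
message = mark_dropped(message, "s1")
print(visible_entries(["s0", "s1", "s2"], message))
```

Dropping an entry in this model only amends the cap message; the chain of
stash entry commits underneath is left untouched.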

brian m. carlson March 10, 2022, 10:42 p.m. UTC | #4

On 2022-03-10 at 21:38:42, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > I do want to preserve my nice --first-parent property.  What I propose
> > to do is this: I'll take your approach and reverse the parents to
> > preserve the --first-parent chain and synthesize a predictable root
> > commit based on the fake ID information we use for stashes when nobody's
> > provided any.
> 
> I am wondering if this can be made not an export format but a new
> mechanism to store stashes that we use without having to export and
> import.  Capping the series of "stash entry" commits with an extra
> commit that is continuously amended, and recording which stash entry
> has already been used (and not to be shown) etc., in the log message
> part of that commit, would give us "stash drop" without rewriting
> all the history and would easily bring us to feature parity with the
> reflog based implementation, I would hope?

I had thought of providing a different stash format via
extensions.stashFormat or something instead of this.  The problem becomes
backward compatibility: we essentially can't use stashes with older
versions, and a very common use case in development is to work inside a
Docker container or such where the version of Git is whatever the OS
happened to ship, so such a change can be painful.  It also impedes
working with the repository using libgit2, which is heavily used in some
environments (e.g., the Rust toolchain), until someone ports that change
there.

I'm not opposed to seeing such a format change where refs/stash itself
becomes pushable via a format change, but for my present use case, I
don't want to do that right now.  It's possible that I might write such
a series in the future, or someone else could write it, but I won't be
writing it at the present moment.