Message ID | 20250320-b4-pks-mingw-lockfile-flake-v2-2-a84c90cfc6c2@pks.im (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | compat/mingw: fix EACCESS when opening files with `O_CREAT | O_EXCL` | expand |
Hi Patrick, On Thu, 20 Mar 2025, Patrick Steinhardt wrote: > In our CI systems we can observe that t0610 fails rather frequently. > This testcase races a bunch of git-update-ref(1) processes with one > another which are all trying to update a unique reference, where we > expect that all processes succeed and end up updating the reftable > stack. The error message in this case looks like the following: > > fatal: update_ref failed for ref 'refs/heads/branch-88': reftable: transaction prepare: I/O error > > Instrumenting the code with a couple of calls to `BUG()` in relevant > sites where we return `REFTABLE_IO_ERROR` quickly leads one to discover > that this error is caused when calling `flock_acquire()`, which is a > thin wrapper around our lockfile API. Curiously, the error code we get > in such cases is `EACCESS`, indicating that we are not allowed to access > the file. > > The root cause of this is an oddity of `CreateFileW()`, which is what > `_wopen()` uses internally. Quoting its documentation [1]: > > If you call CreateFile on a file that is pending deletion as a > result of a previous call to DeleteFile, the function fails. The > operating system delays file deletion until all handles to the file > are closed. GetLastError returns ERROR_ACCESS_DENIED. > > This behaviour is triggered quite often in the above testcase because > all the processes race with one another trying to acquire the lock for > the "tables.list" file. This is due to how locking works in the reftable > library when compacting a stack: > > 1. Lock the "tables.list" file and reads its contents. > > 2. Decide which tables to compact. > > 3. Lock each of the individual tables that we are about to compact. > > 4. Unlock the "tables.list" file. > > 5. Compact the individual tables into one large table. > > 6. Re-lock the "tables.list" file. > > 7. Write the new list of tables into it. > > 8. Commit the "tables.list" file. > > The important step is (4): we don't commit the file directly by renaming > it into place, but instead we delete the lockfile so that concurrent > processes can continue to append to the reftable stack while we compact > the tables. And because we use `DeleteFileW()` to do so, we may now race > with another process that wants to acquire that lockfile. So if we are > unlucky, we would now see `ERROR_ACCESS_DENIED` instead of the expected > `ERROR_FILE_EXISTS`, which the lockfile subsystem isn't prepared to > handle and thus it will bail out without retrying to acquire the lock. > > In theory, the issue is not limited to the reftable library and can be > triggered by every other user of the lockfile subsystem, as well. My gut > feeling tells me it's rather unlikely to surface elsewhere though. > > Fix the issue by translating the error to `EEXIST`. This makes the > lockfile subsystem handle the error correctly: in case a timeout is set > it will now retry acquiring the lockfile until the timeout has expired. > > With this, t0610 is now always passing on my machine whereas it was > previously failing in around 20-30% of all test runs. > > [1]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew Couldn't we simply handle `EACCES` the same way as `EEXIST` in step 4? This suggestion is different from v1, which would have affected all callers of `mingw_open()`. The reason I ask is that `RtlGetLastNtStatus()` is undocumented, and should therefore not be used. I know that I will be tasked with removing that call should it be introduced into Git's source code, and naturally I'd like to avoid that. I know that e.g. PostgreSQL used this undocumented function at least at some stage, but SQLite avoided it by introducing a simple poll strategy. We could also do that, but if there is already code in the reftable library that skips doing things if a `.lock` file exists, then doing the same if the `.lock` file cannot be created, too, should be a safe argument to make. Ciao, Johannes > > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > compat/mingw.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/compat/mingw.c b/compat/mingw.c > index f524c54d06d..50c80b1b750 100644 > --- a/compat/mingw.c > +++ b/compat/mingw.c > @@ -21,6 +21,9 @@ > #include "gettext.h" > #define SECURITY_WIN32 > #include <sspi.h> > +#include <winternl.h> > + > +#define STATUS_DELETE_PENDING ((NTSTATUS) 0xC0000056) > > #define HCAST(type, handle) ((type)(intptr_t)handle) > > @@ -621,6 +624,8 @@ int mingw_open (const char *filename, int oflags, ...) > wchar_t wfilename[MAX_PATH]; > open_fn_t open_fn; > > + DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NTAPI, RtlGetLastNtStatus, void); > + > va_start(args, oflags); > mode = va_arg(args, int); > va_end(args); > @@ -644,6 +649,21 @@ int mingw_open (const char *filename, int oflags, ...) > > fd = open_fn(wfilename, oflags, mode); > > + /* > + * Internally, `_wopen()` uses the `CreateFile()` API with CREATE_NEW, > + * which may error out with ERROR_ACCESS_DENIED and an NtStatus of > + * STATUS_DELETE_PENDING when the file is scheduled for deletion via > + * `DeleteFileW()`. The file essentially exists, so we map errno to > + * EEXIST instead of EACCESS so that callers don't have to special-case > + * this. > + * > + * This fixes issues for example with the lockfile interface when one > + * process has a lock that it is about to commit or release while > + * another process wants to acquire it. > + */ > + if (fd < 0 && create && GetLastError() == ERROR_ACCESS_DENIED && > + INIT_PROC_ADDR(RtlGetLastNtStatus) && RtlGetLastNtStatus() == STATUS_DELETE_PENDING) > + errno = EEXIST; > if (fd < 0 && (oflags & O_ACCMODE) != O_RDONLY && errno == EACCES) { > DWORD attrs = GetFileAttributesW(wfilename); > if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_DIRECTORY)) > > -- > 2.49.0.472.ge94155a9ec.dirty > >
On Wed, Mar 26, 2025 at 01:20:12PM +0100, Johannes Schindelin wrote: > On Thu, 20 Mar 2025, Patrick Steinhardt wrote: > > In our CI systems we can observe that t0610 fails rather frequently. > > This testcase races a bunch of git-update-ref(1) processes with one > > another which are all trying to update a unique reference, where we > > expect that all processes succeed and end up updating the reftable > > stack. The error message in this case looks like the following: > > > > fatal: update_ref failed for ref 'refs/heads/branch-88': reftable: transaction prepare: I/O error > > > > Instrumenting the code with a couple of calls to `BUG()` in relevant > > sites where we return `REFTABLE_IO_ERROR` quickly leads one to discover > > that this error is caused when calling `flock_acquire()`, which is a > > thin wrapper around our lockfile API. Curiously, the error code we get > > in such cases is `EACCESS`, indicating that we are not allowed to access > > the file. > > > > The root cause of this is an oddity of `CreateFileW()`, which is what > > `_wopen()` uses internally. Quoting its documentation [1]: > > > > If you call CreateFile on a file that is pending deletion as a > > result of a previous call to DeleteFile, the function fails. The > > operating system delays file deletion until all handles to the file > > are closed. GetLastError returns ERROR_ACCESS_DENIED. > > > > This behaviour is triggered quite often in the above testcase because > > all the processes race with one another trying to acquire the lock for > > the "tables.list" file. This is due to how locking works in the reftable > > library when compacting a stack: > > > > 1. Lock the "tables.list" file and reads its contents. > > > > 2. Decide which tables to compact. > > > > 3. Lock each of the individual tables that we are about to compact. > > > > 4. Unlock the "tables.list" file. > > > > 5. Compact the individual tables into one large table. > > > > 6. Re-lock the "tables.list" file. > > > > 7. Write the new list of tables into it. > > > > 8. Commit the "tables.list" file. > > > > The important step is (4): we don't commit the file directly by renaming > > it into place, but instead we delete the lockfile so that concurrent > > processes can continue to append to the reftable stack while we compact > > the tables. And because we use `DeleteFileW()` to do so, we may now race > > with another process that wants to acquire that lockfile. So if we are > > unlucky, we would now see `ERROR_ACCESS_DENIED` instead of the expected > > `ERROR_FILE_EXISTS`, which the lockfile subsystem isn't prepared to > > handle and thus it will bail out without retrying to acquire the lock. > > > > In theory, the issue is not limited to the reftable library and can be > > triggered by every other user of the lockfile subsystem, as well. My gut > > feeling tells me it's rather unlikely to surface elsewhere though. > > > > Fix the issue by translating the error to `EEXIST`. This makes the > > lockfile subsystem handle the error correctly: in case a timeout is set > > it will now retry acquiring the lockfile until the timeout has expired. > > > > With this, t0610 is now always passing on my machine whereas it was > > previously failing in around 20-30% of all test runs. > > > > [1]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew > > Couldn't we simply handle `EACCES` the same way as `EEXIST` in step 4? > > This suggestion is different from v1, which would have affected all > callers of `mingw_open()`. Yeah, but it basically has the same problem: we cannot tell whether EACCESS is caused by the race or by insufficient privileges. So the behaviour would be more self-contained, but it would still not be correct in the same way as it would be incorrect in `mingw_open()`. We do want to retry locking the file in case we raced, but when EACCESS is raised due to insufficient permissions we don't. > The reason I ask is that `RtlGetLastNtStatus()` is undocumented, and > should therefore not be used. I know that I will be tasked with removing > that call should it be introduced into Git's source code, and naturally > I'd like to avoid that. Unfortunate, but fair enough. It's quite surprising that it is not possible to figure out the exact status code without relying on undocumented functions. > I know that e.g. PostgreSQL used this undocumented function at least at > some stage, but SQLite avoided it by introducing a simple poll strategy. > We could also do that, but if there is already code in the reftable > library that skips doing things if a `.lock` file exists, then doing the > same if the `.lock` file cannot be created, too, should be a safe argument > to make. I did stumble over the PostgreSQL patch at one point indeed, yeah. Thanks for the pointer to SQLite. It indeed has the following snippet: #define winIoerrCanRetry1(a) (((a)==ERROR_ACCESS_DENIED) || \ ((a)==ERROR_SHARING_VIOLATION) || \ ((a)==ERROR_LOCK_VIOLATION) || \ ((a)==ERROR_DEV_NOT_EXIST) || \ ((a)==ERROR_NETNAME_DELETED) || \ ((a)==ERROR_SEM_TIMEOUT) || \ ((a)==ERROR_NETWORK_UNREACHABLE)) The function gets used via `winRetryIoerr()`, which is used in various I/O functions to retry the operation, including `winOpen()` to open or create a file. And it indeed uses a rather simple polling system there where it sleeps for 25ms up to 10 times. This certainly is something we could implement in `mingw_open()`: when we see that `CreateFileW()` has returned any of the above errors we simply retry the operation. It wouldn't fix the race itself, but it would hopefully make it less likely to hit. If you would be okay with such a solution I can implement it. Also, one thing to note: this problem isn't caused by the reftable library, it's caused by the lockfile subsystem. So if we don't want to do this in `mingw_open()`, any self-contained fix should go into the lockfile system, not into the reftable library, because we may hit the same symptoms anywhere else where we race around creation/deletion of a lockfile. We just happen to hit this case in the reftable library because the test is intentionally stress-testing and racing this code path. Patrick
Hi Patrick, On Fri, 28 Mar 2025, Patrick Steinhardt wrote: > On Wed, Mar 26, 2025 at 01:20:12PM +0100, Johannes Schindelin wrote: > > On Thu, 20 Mar 2025, Patrick Steinhardt wrote: > > > In our CI systems we can observe that t0610 fails rather frequently. > > > This testcase races a bunch of git-update-ref(1) processes with one > > > another which are all trying to update a unique reference, where we > > > expect that all processes succeed and end up updating the reftable > > > stack. The error message in this case looks like the following: > > > > > > fatal: update_ref failed for ref 'refs/heads/branch-88': reftable: transaction prepare: I/O error > > > > > > Instrumenting the code with a couple of calls to `BUG()` in relevant > > > sites where we return `REFTABLE_IO_ERROR` quickly leads one to discover > > > that this error is caused when calling `flock_acquire()`, which is a > > > thin wrapper around our lockfile API. Curiously, the error code we get > > > in such cases is `EACCESS`, indicating that we are not allowed to access > > > the file. > > > > > > The root cause of this is an oddity of `CreateFileW()`, which is what > > > `_wopen()` uses internally. Quoting its documentation [1]: > > > > > > If you call CreateFile on a file that is pending deletion as a > > > result of a previous call to DeleteFile, the function fails. The > > > operating system delays file deletion until all handles to the file > > > are closed. GetLastError returns ERROR_ACCESS_DENIED. > > > > > > This behaviour is triggered quite often in the above testcase because > > > all the processes race with one another trying to acquire the lock for > > > the "tables.list" file. This is due to how locking works in the reftable > > > library when compacting a stack: > > > > > > 1. Lock the "tables.list" file and reads its contents. > > > > > > 2. Decide which tables to compact. > > > > > > 3. Lock each of the individual tables that we are about to compact. > > > > > > 4. Unlock the "tables.list" file. > > > > > > 5. Compact the individual tables into one large table. > > > > > > 6. Re-lock the "tables.list" file. > > > > > > 7. Write the new list of tables into it. > > > > > > 8. Commit the "tables.list" file. > > > > > > The important step is (4): we don't commit the file directly by renaming > > > it into place, but instead we delete the lockfile so that concurrent > > > processes can continue to append to the reftable stack while we compact > > > the tables. And because we use `DeleteFileW()` to do so, we may now race > > > with another process that wants to acquire that lockfile. So if we are > > > unlucky, we would now see `ERROR_ACCESS_DENIED` instead of the expected > > > `ERROR_FILE_EXISTS`, which the lockfile subsystem isn't prepared to > > > handle and thus it will bail out without retrying to acquire the lock. > > > > > > In theory, the issue is not limited to the reftable library and can be > > > triggered by every other user of the lockfile subsystem, as well. My gut > > > feeling tells me it's rather unlikely to surface elsewhere though. > > > > > > Fix the issue by translating the error to `EEXIST`. This makes the > > > lockfile subsystem handle the error correctly: in case a timeout is set > > > it will now retry acquiring the lockfile until the timeout has expired. > > > > > > With this, t0610 is now always passing on my machine whereas it was > > > previously failing in around 20-30% of all test runs. > > > > > > [1]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew > > > > Couldn't we simply handle `EACCES` the same way as `EEXIST` in step 4? > > > > This suggestion is different from v1, which would have affected all > > callers of `mingw_open()`. > > Yeah, but it basically has the same problem: we cannot tell whether > EACCESS is caused by the race or by insufficient privileges. So the > behaviour would be more self-contained, but it would still not be > correct in the same way as it would be incorrect in `mingw_open()`. We > do want to retry locking the file in case we raced, but when EACCESS is > raised due to insufficient permissions we don't. My thinking was that in the call chain from reftable, it seems that the failure to lock the file is less bad than in other cases, that's why I asked whether that might be a layer where we could add a work-around without resorting to undocumented functions. > > The reason I ask is that `RtlGetLastNtStatus()` is undocumented, and > > should therefore not be used. I know that I will be tasked with removing > > that call should it be introduced into Git's source code, and naturally > > I'd like to avoid that. > > Unfortunate, but fair enough. It's quite surprising that it is not > possible to figure out the exact status code without relying on > undocumented functions. Indeed. > > I know that e.g. PostgreSQL used this undocumented function at least at > > some stage, but SQLite avoided it by introducing a simple poll strategy. > > We could also do that, but if there is already code in the reftable > > library that skips doing things if a `.lock` file exists, then doing the > > same if the `.lock` file cannot be created, too, should be a safe argument > > to make. > > I did stumble over the PostgreSQL patch at one point indeed, yeah. > > Thanks for the pointer to SQLite. It indeed has the following snippet: > > #define winIoerrCanRetry1(a) (((a)==ERROR_ACCESS_DENIED) || \ > ((a)==ERROR_SHARING_VIOLATION) || \ > ((a)==ERROR_LOCK_VIOLATION) || \ > ((a)==ERROR_DEV_NOT_EXIST) || \ > ((a)==ERROR_NETNAME_DELETED) || \ > ((a)==ERROR_SEM_TIMEOUT) || \ > ((a)==ERROR_NETWORK_UNREACHABLE)) > > The function gets used via `winRetryIoerr()`, which is used in various > I/O functions to retry the operation, including `winOpen()` to open or > create a file. And it indeed uses a rather simple polling system there > where it sleeps for 25ms up to 10 times. > > This certainly is something we could implement in `mingw_open()`: when > we see that `CreateFileW()` has returned any of the above errors we > simply retry the operation. It wouldn't fix the race itself, but it > would hopefully make it less likely to hit. If you would be okay with > such a solution I can implement it. > > Also, one thing to note: this problem isn't caused by the reftable > library, it's caused by the lockfile subsystem. So if we don't want to > do this in `mingw_open()`, any self-contained fix should go into the > lockfile system, not into the reftable library, because we may hit the > same symptoms anywhere else where we race around creation/deletion of a > lockfile. We just happen to hit this case in the reftable library > because the test is intentionally stress-testing and racing this code > path. As I mentioned, I had hoped that we could address this at another layer. But let's move forward with the `RtlGetLastNtStatus()` solution because, as you correctly pointed out, it is the only solution so far that lets Git determine precisely whether the underlying problem is a pending delete. I had only one remaining concern: If `RtlGetLastNtStatus()` has not yet been initialized, would we not potentially overwrite the last NTSTATUS while initializing it? And the answer I can give to myself is: unlikely. The `ntdll` is already loaded, so there won't be an update to the `NTSTATUS` there, likewise the `GetProcAddress()` call won't fail and hence also not update it. So let's go ahead with v2! Ciao, Johannes
On Fri, Mar 28, 2025 at 04:41:05PM +0100, Johannes Schindelin wrote: > On Fri, 28 Mar 2025, Patrick Steinhardt wrote: > > On Wed, Mar 26, 2025 at 01:20:12PM +0100, Johannes Schindelin wrote: > > > I know that e.g. PostgreSQL used this undocumented function at least at > > > some stage, but SQLite avoided it by introducing a simple poll strategy. > > > We could also do that, but if there is already code in the reftable > > > library that skips doing things if a `.lock` file exists, then doing the > > > same if the `.lock` file cannot be created, too, should be a safe argument > > > to make. > > > > I did stumble over the PostgreSQL patch at one point indeed, yeah. > > > > Thanks for the pointer to SQLite. It indeed has the following snippet: > > > > #define winIoerrCanRetry1(a) (((a)==ERROR_ACCESS_DENIED) || \ > > ((a)==ERROR_SHARING_VIOLATION) || \ > > ((a)==ERROR_LOCK_VIOLATION) || \ > > ((a)==ERROR_DEV_NOT_EXIST) || \ > > ((a)==ERROR_NETNAME_DELETED) || \ > > ((a)==ERROR_SEM_TIMEOUT) || \ > > ((a)==ERROR_NETWORK_UNREACHABLE)) > > > > The function gets used via `winRetryIoerr()`, which is used in various > > I/O functions to retry the operation, including `winOpen()` to open or > > create a file. And it indeed uses a rather simple polling system there > > where it sleeps for 25ms up to 10 times. > > > > This certainly is something we could implement in `mingw_open()`: when > > we see that `CreateFileW()` has returned any of the above errors we > > simply retry the operation. It wouldn't fix the race itself, but it > > would hopefully make it less likely to hit. If you would be okay with > > such a solution I can implement it. > > > > Also, one thing to note: this problem isn't caused by the reftable > > library, it's caused by the lockfile subsystem. So if we don't want to > > do this in `mingw_open()`, any self-contained fix should go into the > > lockfile system, not into the reftable library, because we may hit the > > same symptoms anywhere else where we race around creation/deletion of a > > lockfile. We just happen to hit this case in the reftable library > > because the test is intentionally stress-testing and racing this code > > path. > > As I mentioned, I had hoped that we could address this at another layer. > > But let's move forward with the `RtlGetLastNtStatus()` solution because, > as you correctly pointed out, it is the only solution so far that lets Git > determine precisely whether the underlying problem is a pending delete. > > I had only one remaining concern: If `RtlGetLastNtStatus()` has not yet > been initialized, would we not potentially overwrite the last NTSTATUS > while initializing it? And the answer I can give to myself is: unlikely. > The `ntdll` is already loaded, so there won't be an update to the > `NTSTATUS` there, likewise the `GetProcAddress()` call won't fail and > hence also not update it. > > So let's go ahead with v2! Great, thanks a lot for your expertise and guidance! Patrick
diff --git a/compat/mingw.c b/compat/mingw.c index f524c54d06d..50c80b1b750 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -21,6 +21,9 @@ #include "gettext.h" #define SECURITY_WIN32 #include <sspi.h> +#include <winternl.h> + +#define STATUS_DELETE_PENDING ((NTSTATUS) 0xC0000056) #define HCAST(type, handle) ((type)(intptr_t)handle) @@ -621,6 +624,8 @@ int mingw_open (const char *filename, int oflags, ...) wchar_t wfilename[MAX_PATH]; open_fn_t open_fn; + DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NTAPI, RtlGetLastNtStatus, void); + va_start(args, oflags); mode = va_arg(args, int); va_end(args); @@ -644,6 +649,21 @@ int mingw_open (const char *filename, int oflags, ...) fd = open_fn(wfilename, oflags, mode); + /* + * Internally, `_wopen()` uses the `CreateFile()` API with CREATE_NEW, + * which may error out with ERROR_ACCESS_DENIED and an NtStatus of + * STATUS_DELETE_PENDING when the file is scheduled for deletion via + * `DeleteFileW()`. The file essentially exists, so we map errno to + * EEXIST instead of EACCESS so that callers don't have to special-case + * this. + * + * This fixes issues for example with the lockfile interface when one + * process has a lock that it is about to commit or release while + * another process wants to acquire it. + */ + if (fd < 0 && create && GetLastError() == ERROR_ACCESS_DENIED && + INIT_PROC_ADDR(RtlGetLastNtStatus) && RtlGetLastNtStatus() == STATUS_DELETE_PENDING) + errno = EEXIST; if (fd < 0 && (oflags & O_ACCMODE) != O_RDONLY && errno == EACCES) { DWORD attrs = GetFileAttributesW(wfilename); if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_DIRECTORY))
In our CI systems we can observe that t0610 fails rather frequently. This testcase races a bunch of git-update-ref(1) processes with one another which are all trying to update a unique reference, where we expect that all processes succeed and end up updating the reftable stack. The error message in this case looks like the following: fatal: update_ref failed for ref 'refs/heads/branch-88': reftable: transaction prepare: I/O error Instrumenting the code with a couple of calls to `BUG()` in relevant sites where we return `REFTABLE_IO_ERROR` quickly leads one to discover that this error is caused when calling `flock_acquire()`, which is a thin wrapper around our lockfile API. Curiously, the error code we get in such cases is `EACCESS`, indicating that we are not allowed to access the file. The root cause of this is an oddity of `CreateFileW()`, which is what `_wopen()` uses internally. Quoting its documentation [1]: If you call CreateFile on a file that is pending deletion as a result of a previous call to DeleteFile, the function fails. The operating system delays file deletion until all handles to the file are closed. GetLastError returns ERROR_ACCESS_DENIED. This behaviour is triggered quite often in the above testcase because all the processes race with one another trying to acquire the lock for the "tables.list" file. This is due to how locking works in the reftable library when compacting a stack: 1. Lock the "tables.list" file and reads its contents. 2. Decide which tables to compact. 3. Lock each of the individual tables that we are about to compact. 4. Unlock the "tables.list" file. 5. Compact the individual tables into one large table. 6. Re-lock the "tables.list" file. 7. Write the new list of tables into it. 8. Commit the "tables.list" file. The important step is (4): we don't commit the file directly by renaming it into place, but instead we delete the lockfile so that concurrent processes can continue to append to the reftable stack while we compact the tables. And because we use `DeleteFileW()` to do so, we may now race with another process that wants to acquire that lockfile. So if we are unlucky, we would now see `ERROR_ACCESS_DENIED` instead of the expected `ERROR_FILE_EXISTS`, which the lockfile subsystem isn't prepared to handle and thus it will bail out without retrying to acquire the lock. In theory, the issue is not limited to the reftable library and can be triggered by every other user of the lockfile subsystem, as well. My gut feeling tells me it's rather unlikely to surface elsewhere though. Fix the issue by translating the error to `EEXIST`. This makes the lockfile subsystem handle the error correctly: in case a timeout is set it will now retry acquiring the lockfile until the timeout has expired. With this, t0610 is now always passing on my machine whereas it was previously failing in around 20-30% of all test runs. [1]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew Signed-off-by: Patrick Steinhardt <ps@pks.im> --- compat/mingw.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)