
[0/3] wildmatch: fix exponential behavior

Message ID: cover.1679328580.git.phillip.wood@dunelm.org.uk
Series: wildmatch: fix exponential behavior

Message

Phillip Wood March 20, 2023, 4:09 p.m. UTC
From: Phillip Wood <phillip.wood@dunelm.org.uk>

The wildmatch implementation in git suffers from exponential behavior as
described in [1] where the time taken for a failing match is exponential
in the number of wildcards it contains. The original implementation
imported from rsync is immune, but the optimizations introduced by [2,3]
failed to prevent unnecessary backtracking when handling '*' and '/**/'.
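
The backtracking problem can be illustrated with the strategy described in [1]. The C sketch below is not git's wildmatch() and not the change made by this series. It assumes a simplified syntax with only '*' (any run of characters, including '/') and '?' (exactly one character), with no character classes, no FNM_PATHNAME-style '/' handling and no '**'. The name glob_match() is hypothetical. On a mismatch it backtracks only to the most recent '*': once the text before a later '*' has matched, letting an earlier '*' absorb more characters cannot produce a match that the later '*' could not produce by absorbing them itself. The work is therefore bounded by roughly len(pattern) * len(name) instead of growing exponentially with the number of stars.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT git's wildmatch(): returns 1 if NAME matches
 * PATTERN, 0 otherwise. Assumed syntax: '*' matches any run of characters
 * (including '/'), '?' matches exactly one character, everything else is
 * a literal. No character classes, no '**', no pathname special cases.
 */
static int glob_match(const char *pattern, const char *name)
{
	size_t plen = strlen(pattern), nlen = strlen(name);
	size_t px = 0, nx = 0;           /* current positions */
	size_t next_px = 0, next_nx = 0; /* restart point; next_nx == 0: none */

	while (px < plen || nx < nlen) {
		if (px < plen) {
			char c = pattern[px];

			if (c == '*') {
				/*
				 * Try to match the rest of the pattern at nx;
				 * if that fails, retry with this '*' consuming
				 * one more character of the name.
				 */
				next_px = px;
				next_nx = nx + 1;
				px++;
				continue;
			}
			if (nx < nlen && (c == '?' || c == name[nx])) {
				px++;
				nx++;
				continue;
			}
		}
		/*
		 * Mismatch: restart only from the most recent '*'. Earlier
		 * stars are never revisited, which is what keeps the cost
		 * polynomial rather than exponential.
		 */
		if (next_nx > 0 && next_nx <= nlen) {
			px = next_px;
			nx = next_nx;
			continue;
		}
		return 0;
	}
	return 1;
}
```

For instance, matching "a*a*a*a*a*a*a*a*a*b" against a string of forty 'a' characters fails after at most one restart per character of the name, whereas a matcher that recurses at each '*' may try a very large number of ways of splitting the string between the stars before giving up.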

This bug was discussed on the security list, and the conclusion was
that it only affects operations that are already potential DoS vectors.

In the long term it would be nice to get rid of the recursion in the
wildmatch() code but the patches here focus on a minimal fix.

This series is based on maint. Unfortunately it conflicts with
my/wildmatch-cleanups when merged with seen. There are semantic
conflicts with the removal of dowild() in e303cf8092 (wildmatch:
more cleanups after killing uchar, 2023-02-26) as well as textual
conflicts around the change of uchar->char.

[1] https://research.swtch.com/glob
[2] 6f1a31f0aa (wildmatch: advance faster in <asterisk> + <literal> patterns, 2013-01-01)
[3] 46983441ae (wildmatch: make a special case for "*/" with FNM_PATHNAME, 2013-01-01)

Published-As: https://github.com/phillipwood/git/releases/tag/wildmatch-fixes%2Fv1
View-Changes-At: https://github.com/phillipwood/git/compare/73876f486...a74ab7138
Fetch-It-Via: git fetch https://github.com/phillipwood/git wildmatch-fixes/v1

Phillip Wood (3):
  wildmatch: fix exponential behavior
  wildmatch: avoid undefined behavior
  wildmatch: hide internal return values

 t/t3070-wildmatch.sh |  9 +++++++++
 wildmatch.c          | 23 ++++++++++++++++-------
 wildmatch.h          |  2 --
 3 files changed, 25 insertions(+), 9 deletions(-)

Comments

Junio C Hamano March 20, 2023, 5:58 p.m. UTC | #1
Phillip Wood <phillip.wood123@gmail.com> writes:

> This series is based on maint. Unfortunately it conflicts with
> my/wildmatch-cleanups when merged with seen. There are semantic
> conflicts with the removal of dowild() in e303cf8092 (wildmatch:
> more cleanups after killing uchar, 2023-02-26) as well as textual
> conflicts around the change of uchar->char.

Thanks. What's not in 'next' is fair game to break and force a
reroll ;-)

> Phillip Wood (3):
>   wildmatch: fix exponential behavior
>   wildmatch: avoid undefined behavior
>   wildmatch: hide internal return values

Derrick Stolee March 23, 2023, 2:19 p.m. UTC | #2
On 3/20/2023 12:09 PM, Phillip Wood wrote:
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
> 
> The wildmatch implementation in git suffers from exponential behavior as
> described in [1] where the time taken for a failing match is exponential
> in the number of wildcards it contains. The original implementation
> imported from rsync is immune, but the optimizations introduced by [2,3]
> failed to prevent unnecessary backtracking when handling '*' and '/**/'.
> 
> This bug was discussed on the security list, and the conclusion was
> that it only affects operations that are already potential DoS vectors.
> 
> In the long term it would be nice to get rid of the recursion in the
> wildmatch() code but the patches here focus on a minimal fix.

Thanks for these changes. The patches look good to me.

I particularly appreciate that there is a regression test to avoid
this accidentally happening again in the future. The two-second
timeout is a reasonable balance between "not taking too long" and
"will not be flaky, assuming the code is correct". I could imagine
that it might _pass_ unexpectedly if it runs on fast-enough hardware,
but that's not a huge concern right now. CI machines are not normally
significantly more powerful than a typical developer machine.

Thanks,
-Stolee

Phillip Wood March 24, 2023, 2:04 p.m. UTC | #3
Hi Stolee

On 23/03/2023 14:19, Derrick Stolee wrote:
> Thanks for these changes. The patches look good to me.
> 
> I particularly appreciate that there is a regression test to avoid
> this accidentally happening again in the future. The two-second
> timeout is a reasonable balance between "not taking too long" and
> "will not be flaky, assuming the code is correct". I could imagine
> that it might _pass_ unexpectedly if it runs on fast-enough hardware,
> but that's not a huge concern right now. CI machines are not normally
> significantly more powerful than a typical developer machine.

Thanks for taking the time to look at these again and for prompting me 
to add the regression test in the first place.

Best Wishes

Phillip
