Message ID | 20240108120029.38816-1-cgzones@googlemail.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Petr Lautrbach |
Headers | show |
Series | [RFC] libselinux: disable capturing in fcontext matching | expand |
On Mon, Jan 8, 2024 at 7:00 AM Christian Göttsche <cgzones@googlemail.com> wrote: > > The path of a file context definition is compared as a regular > expression against actual pathnames. Those definitions make frequently > use of groups, like `(/.*)?`, which are capturing by default, causing > the regex engine to extract and save the matched input. Matching > context definitions against pathnames only cares about whether it's a > match or not, potential captures are never accessed. > > Compile regular expressions (in the default PCRE2 variant) with the flag > PCRE2_NO_AUTO_CAPTURE to turn captured groups automatically into non > captured ones, like `(/.*)?` into `(?:/.*)?`. This saves some cycles > during lookup operations (~1.5%). > > Only potential regression would be the advanced usage of backreferences > or recursion/subroutine calls to numbered captures, which would need an > update to use named captures instead. > I don't think a ~1.5% improvement is going to be worth the potential for regressions. I don't know of any policy that makes use of capture groups, but there probably is someone somewhere who does. That being said, if others are strongly for it, I can be persuaded. Jim > Signed-off-by: Christian Göttsche <cgzones@googlemail.com> > --- > libselinux/src/regex.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c > index 88d82fed..87423c48 100644 > --- a/libselinux/src/regex.c > +++ b/libselinux/src/regex.c > @@ -80,7 +80,7 @@ int regex_prepare_data(struct regex_data **regex, char const *pattern_string, > return -1; > > (*regex)->regex = pcre2_compile( > - (PCRE2_SPTR)pattern_string, PCRE2_ZERO_TERMINATED, PCRE2_DOTALL, > + (PCRE2_SPTR)pattern_string, PCRE2_ZERO_TERMINATED, PCRE2_DOTALL | PCRE2_NO_AUTO_CAPTURE, > &errordata->error_code, &errordata->error_offset, NULL); > if (!(*regex)->regex) { > goto err; > -- > 2.43.0 > >
diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c index 88d82fed..87423c48 100644 --- a/libselinux/src/regex.c +++ b/libselinux/src/regex.c @@ -80,7 +80,7 @@ int regex_prepare_data(struct regex_data **regex, char const *pattern_string, return -1; (*regex)->regex = pcre2_compile( - (PCRE2_SPTR)pattern_string, PCRE2_ZERO_TERMINATED, PCRE2_DOTALL, + (PCRE2_SPTR)pattern_string, PCRE2_ZERO_TERMINATED, PCRE2_DOTALL | PCRE2_NO_AUTO_CAPTURE, &errordata->error_code, &errordata->error_offset, NULL); if (!(*regex)->regex) { goto err;
The path of a file context definition is compared as a regular expression against actual pathnames. Those definitions make frequently use of groups, like `(/.*)?`, which are capturing by default, causing the regex engine to extract and save the matched input. Matching context definitions against pathnames only cares about whether it's a match or not, potential captures are never accessed. Compile regular expressions (in the default PCRE2 variant) with the flag PCRE2_NO_AUTO_CAPTURE to turn captured groups automatically into non captured ones, like `(/.*)?` into `(?:/.*)?`. This saves some cycles during lookup operations (~1.5%). Only potential regression would be the advanced usage of backreferences or recursion/subroutine calls to numbered captures, which would need an update to use named captures instead. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> --- libselinux/src/regex.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)