[0/3] improve performance of pcre matches

Message ID 20230123014047.84911-1-carenas@gmail.com (mailing list archive)

Message

Carlo Marcelo Arenas Belón Jan. 23, 2023, 1:40 a.m. UTC
The following series optimizes the way PCRE2 matches are done by using
the match_data resources more efficiently. It shows over a 30% reduction
in CPU utilization while using only 1/5 of the memory, as shown in the
bug report[1] that started it all.

The first patch is just an administrative change that I found useful
while building and testing the code with a development version of PCRE2.

The second patch reverts the workaround merged in 30b3e9d2 (libselinux:
Workaround for heap overhead of pcre, 2023-01-12). It instead addresses the
increased memory utilization that the workaround was trying to prevent by
changing how the match_data required for each match is handled.

The last patch changes the single-threaded codepath to get a performance
improvement similar to the one made to the multithreaded codepath, by
intentionally leaking one match_data that can be reused for all matches.
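
As a rough illustration of the idea behind the last patch, here is a
library-free sketch that contrasts allocating scratch state on every match
with lazily allocating one object that is reused and never freed. It is not
the code from the series: struct scratch is a hypothetical stand-in for
pcre2_match_data, the substring search stands in for a regex match, and every
name below is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-match scratch object such as
 * pcre2_match_data; it only records where the last match started. */
struct scratch {
	size_t start;
};

/* Counts scratch allocations, purely for illustration. */
unsigned long scratch_allocs;

struct scratch *scratch_new(void)
{
	struct scratch *s = malloc(sizeof(*s));

	if (s)
		scratch_allocs++;
	return s;
}

/* Toy "match": a substring search standing in for a regex match.
 * Returns 1 on match, 0 on no match. */
int do_match(const char *needle, const char *subject, struct scratch *s)
{
	const char *hit;

	assert(s != NULL);
	hit = strstr(subject, needle);
	if (!hit)
		return 0;
	s->start = (size_t)(hit - subject);
	return 1;
}

/* Strategy A: allocate and free the scratch object on every call. */
int match_fresh(const char *needle, const char *subject)
{
	struct scratch *s = scratch_new();
	int rc;

	if (!s)
		return -1;
	rc = do_match(needle, subject, s);
	free(s);
	return rc;
}

/* Strategy B (single threaded only): allocate once on first use, reuse
 * for every later match, and intentionally never free it. */
int match_reused(const char *needle, const char *subject)
{
	static struct scratch *shared;

	if (!shared) {
		shared = scratch_new();
		if (!shared)
			return -1;
	}
	return do_match(needle, subject, shared);
}
```

Calling match_fresh() N times performs N allocations, while any number of
match_reused() calls performs one. The shared object is only safe when no
other thread can be matching at the same time, which is why this sketch
limits it to the single-threaded case.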

Carlo Marcelo Arenas Belón (3):
  scripts: respect an initial LD_LIBRARY_PATH with env_use_destdir
  libselinux: improve performance with pcre matches
  libselinux: use a static match_data if single threaded

 libselinux/src/regex.c            | 111 +++++++++++++++++-------------
 libselinux/src/selinux_internal.h |   4 ++
 scripts/env_use_destdir           |   8 ++-
 3 files changed, 73 insertions(+), 50 deletions(-)

[1] https://github.com/PCRE2Project/pcre2/issues/194

Comments

Stephen Smalley July 21, 2023, 4:16 p.m. UTC | #1
On Sun, Jan 22, 2023 at 8:46 PM Carlo Marcelo Arenas Belón
<carenas@gmail.com> wrote:
>
> The following series optimizes the way PCRE2 matches are done by using
> more efficiently the match_data resources and shows over 30% reduction
> in cpu utilization while only using 1/5 of the memory as shown in the
> bug report[1] that started it all.
>
> The first patch is just an administrative change that I found useful
> while building and testing the code with a development version of PCRE2.
>
> The second patch reverts the workaround merged in 30b3e9d2 (libselinux:
> Workaround for heap overhead of pcre, 2023-01-12) while addressing the
> increased memory utilization that it was trying to prevent by changing
> the way the match_data required for each match was being handled.
>
> The last patch changes the single threaded codepath to allow for a similar
> performance improvement done to the multithreaded codepath by intentionally
> leaking one match_data that could be reused for all matches.
>
> Carlo Marcelo Arenas Belón (3):
>   scripts: respect an initial LD_LIBRARY_PATH with env_use_destdir
>   libselinux: improve performance with pcre matches
>   libselinux: use a static match_data if single threaded
>
>  libselinux/src/regex.c            | 111 +++++++++++++++++-------------
>  libselinux/src/selinux_internal.h |   4 ++
>  scripts/env_use_destdir           |   8 ++-
>  3 files changed, 73 insertions(+), 50 deletions(-)
>
> [1] https://github.com/PCRE2Project/pcre2/issues/194

Sorry for the delay in responding. Looking at AOSP, I only see the 2nd
patch in this series applied, not the third one. Checking to see
whether the 3rd patch is still desired/needed.