Add regex-automata-0.4.8 benchmark #2109

Merged 2 commits on May 7, 2025
2 changes: 2 additions & 0 deletions collector/compile-benchmarks/README.md
@@ -44,6 +44,8 @@ They mostly consist of real-world crates.
- **nalgebra-0.33.0**: A linear algebra library. It exercises the new trait solver
in different ways than the old solver.
- **regex-1.5.5**: A regular expression parser. Used by many Rust programs.
- **regex-automata-0.4.8**: A regular expression matching engine. Used by `regex`, which is used by
many Rust programs.
- **ripgrep-13.0.0**: A line-oriented search tool. A widely-used utility, and a
binary crate.
- **ripgrep-14.1.1**: A line-oriented search tool. A widely-used utility, and a
5 changes: 5 additions & 0 deletions collector/compile-benchmarks/REUSE.toml
@@ -195,6 +195,11 @@ path = "regex-1.5.5/**"
SPDX-FileCopyrightText = "The Rust Project Developers (see https://thanks.rust-lang.org)"
SPDX-License-Identifier = "MIT OR Apache-2.0"

[[annotations]]
path = "regex-automata-0.4.8/**"
SPDX-FileCopyrightText = "The Rust Project Developers (see https://thanks.rust-lang.org)"
SPDX-License-Identifier = "MIT OR Apache-2.0"

[[annotations]]
path = "regression-31157/**"
SPDX-FileCopyrightText = "The Rust Project Developers (see https://thanks.rust-lang.org)"
@@ -0,0 +1,6 @@
{
"git": {
"sha1": "58e16f50f07729bf856570d1a8be0de0b4d5e9e0"
},
"path_in_vcs": "regex-automata"
}
12 changes: 12 additions & 0 deletions collector/compile-benchmarks/regex-automata-0.4.8/0-println.patch
@@ -0,0 +1,12 @@
diff --git a/src/dfa/accel.rs b/src/dfa/accel.rs
index c0ba18ea..009d534b 100644
--- a/src/dfa/accel.rs
+++ b/src/dfa/accel.rs
@@ -186,6 +186,7 @@ impl<'a> Accels<&'a [AccelTy]> {
pub fn from_bytes_unchecked(
mut slice: &'a [u8],
) -> Result<(Accels<&'a [AccelTy]>, usize), DeserializeError> {
+ std::println!("testing");
let slice_start = slice.as_ptr().as_usize();

let (accel_len, _) =
@@ -0,0 +1,17 @@
diff --git a/src/nfa/thompson/builder.rs b/src/nfa/thompson/builder.rs
index 6b69e878..24564db9 100644
--- a/src/nfa/thompson/builder.rs
+++ b/src/nfa/thompson/builder.rs
@@ -55,10 +55,10 @@ enum State {
/// that `Sparse` is used for via `Union`. But this creates a more bloated
/// NFA with more epsilon transitions than is necessary in the special case
/// of character classes.
- Sparse { transitions: Vec<Transition> },
+ Look { look: Look, next: StateID },
/// A conditional epsilon transition satisfied via some sort of
/// look-around.
- Look { look: Look, next: StateID },
+ Sparse { transitions: Vec<Transition> },
/// An empty state that records the start of a capture location. This is an
/// unconditional epsilon transition like `Empty`, except it can be used to
/// record position information for a capture group when using the NFA for
@@ -0,0 +1,13 @@
diff --git a/src/util/alphabet.rs b/src/util/alphabet.rs
index e0e4d2fc..de443a1f 100644
--- a/src/util/alphabet.rs
+++ b/src/util/alphabet.rs
@@ -798,7 +798,7 @@ impl ByteSet {
/// Return true if and only if this set is empty.
#[cfg_attr(feature = "perf-inline", inline(always))]
pub(crate) fn is_empty(&self) -> bool {
- self.bits.0 == [0, 0]
+ self.bits.0 == [0, 1]
}

/// Deserializes a byte set from the given slice. If the slice is of
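
Note on the `is_empty` hunk above: the `[0, 0]` comparison suggests the byte set is backed by two 128-bit words, one bit per byte value. The following is a minimal sketch under that assumption, showing what the original and patched comparisons each test. `SketchByteSet`, `add`, `contains`, and `is_empty_patched` are hypothetical names used for illustration and are not taken from the crate.

```rust
/// Hypothetical stand-in for a 256-bit byte set (assumed layout: two u128
/// words, one bit per possible byte value). Not the crate's actual type.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct SketchByteSet {
    bits: [u128; 2],
}

impl SketchByteSet {
    /// Set the bit for `byte`: bytes 0..=127 live in word 0, 128..=255 in word 1.
    fn add(&mut self, byte: u8) {
        let word = usize::from(byte / 128);
        let bit = u32::from(byte % 128);
        self.bits[word] |= 1u128 << bit;
    }

    /// Return true if the bit for `byte` is set.
    fn contains(&self, byte: u8) -> bool {
        let word = usize::from(byte / 128);
        let bit = u32::from(byte % 128);
        self.bits[word] & (1u128 << bit) != 0
    }

    /// Comparison from the `-` line of the hunk: true only when no bit is set.
    fn is_empty(&self) -> bool {
        self.bits == [0, 0]
    }

    /// Comparison from the `+` line of the hunk: true only when the set
    /// holds exactly the lowest bit of the second word (byte 128 here).
    fn is_empty_patched(&self) -> bool {
        self.bits == [0, 1]
    }
}

fn main() {
    let mut set = SketchByteSet::default();
    println!("empty set:    is_empty={} patched={}", set.is_empty(), set.is_empty_patched());

    set.add(128);
    println!("after add(128): is_empty={} patched={}", set.is_empty(), set.is_empty_patched());
}
```

Under this assumed layout the two comparisons disagree on both inputs shown, so the one-token patch changes the function's result as well as its source text.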
@@ -0,0 +1,13 @@
diff --git a/src/meta/limited.rs b/src/meta/limited.rs
index 5653adc9..b6b9d379 100644
--- a/src/meta/limited.rs
+++ b/src/meta/limited.rs
@@ -49,6 +49,8 @@ pub(crate) fn dfa_try_search_half_rev(
) -> Result<Option<HalfMatch>, RetryError> {
use crate::dfa::Automaton;

+ {}
+
let mut mat = None;
let mut sid = dfa.start_state_reverse(input)?;
if input.start() == input.end() {