Is this bug in JRE regexp implementation?

I'm trying to match the string iso_schematron_skeleton_for_xslt1.xsl against the regexp ([a-zA-Z|_])?(\w+|_|\.|-)+(@\d{4}-\d{2}-\d{2})?\.yang
The expected result is false; it should not match.
The problem is that the call to matcher.matches() never returns.
Is this a bug in the JRE regexp implementation?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HelloWorld_
I'm using
java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b15)
OpenJDK 64-Bit Server VM (build 25.181-b15, mixed mode)
java regex
Replace \w+ with \w. But a more efficient regex will look like "[a-zA-Z_]?[\\w.-]+(@\\d{4}-\\d{2}-\\d{2})?\\.yang"
– Wiktor Stribiżew
3 hours ago
Why should it match when your string ends with .xsl and your regex with .yang?
– Pushpesh Kumar Rajwanshi
2 hours ago
@PushpeshKumarRajwanshi It shouldn't; OP says the expected result is false. Instead, the program never terminates.
– Petr Janeček
2 hours ago
Yeah, just noticed :) Thanks for pointing that out.
– Pushpesh Kumar Rajwanshi
2 hours ago
You can also try a possessive quantifier, which never gives back an already matched part, even while backtracking. Try (\w+|_|\.|-)++ (notice the additional + at the end).
– Pshemo
2 hours ago
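A minimal sketch of the possessive variant, assuming the original pattern is kept as-is apart from the extra + (the class name and the sample file names are made up for illustration). It suggests the hang goes away, but also that the possessive group swallows the trailing .yang and never gives it back, so a plain name such as foo.yang is rejected while a name with an @date part still matches:

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Original pattern from the question with the outer + made possessive (++).
    static final Pattern POSSESSIVE = Pattern.compile(
            "([a-zA-Z|_])?(\\w+|_|\\.|-)++(@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    static boolean matches(String fileName) {
        return POSSESSIVE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Returns promptly: the engine may not backtrack into the possessive group.
        System.out.println(matches("iso_schematron_skeleton_for_xslt1.xsl")); // false
        // Caveat: the group also consumes ".yang" and will not release it.
        System.out.println(matches("foo.yang")); // false
        // '@' is not accepted by the group, so the rest of the pattern still applies.
        System.out.println(matches("foo@2018-10-26.yang")); // true
    }
}
```

If plain names like foo.yang must be accepted, the character-class rewrite is probably the safer fix.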
asked 3 hours ago
rkosegi
1 Answer
Accepted answer, score 9
The pattern contains nested quantifiers. The \w+ is inside a group that is itself quantified with +, so on a non-matching string the regex engine has to try a rapidly growing number of ways to split the same characters between the inner and the outer quantifier before it can give up (catastrophic backtracking). It makes more sense to make a character class out of the alternation group, i.e. (\w+|_|\.|-)+ => [\w.-]+.
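A rough way to see the effect without triggering the hang is to count how many characters the engine reads on short, non-matching inputs. This is only a sketch: CountingSequence and NestedQuantifierDemo are hypothetical helper names, and the counts depend on the runtime (on the Java 8 build from the question the nested pattern is expected to grow much faster than the flat one; newer runtimes may behave differently).

```java
import java.util.regex.Pattern;

/** Hypothetical helper: a CharSequence that counts how often the engine reads a char. */
final class CountingSequence implements CharSequence {
    private final String text;
    long reads; // incremented on every charAt call

    CountingSequence(String text) { this.text = text; }

    @Override public int length() { return text.length(); }
    @Override public char charAt(int index) { reads++; return text.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) { return text.substring(start, end); }
    @Override public String toString() { return text; }
}

class NestedQuantifierDemo {
    // Pattern from the question (nested quantifiers).
    static final Pattern NESTED = Pattern.compile(
            "([a-zA-Z|_])?(\\w+|_|\\.|-)+(@\\d{4}-\\d{2}-\\d{2})?\\.yang");
    // Rewritten pattern with a single character class.
    static final Pattern FLAT = Pattern.compile(
            "[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    /** Runs matches() and returns the number of charAt calls it triggered. */
    static long reads(Pattern pattern, String input) {
        CountingSequence seq = new CountingSequence(input);
        pattern.matcher(seq).matches();
        return seq.reads;
    }

    public static void main(String[] args) {
        String name = "iso_schematron_skeleton_for_xslt1";
        // Short prefixes only: the full 37-character input is the case that hangs.
        for (int len = 4; len <= 16; len += 4) {
            String input = name.substring(0, len) + ".xsl";
            System.out.printf("%2d chars: nested=%d flat=%d%n",
                    input.length(), reads(NESTED, input), reads(FLAT, input));
        }
    }
}
```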
Note that \w already matches _. Also, a | inside a character class matches a literal | char, so [a|b] matches a, | or b; it seems you should remove the | from your first character class.
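Both points are easy to check with Pattern.matches. A small sketch (CharClassDemo is just an illustrative name):

```java
import java.util.regex.Pattern;

class CharClassDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a|b]", "|"));      // true: | is a literal inside [...]
        System.out.println(matches("[a-zA-Z|_]", "|")); // true: the first class also accepts |
        System.out.println(matches("[a-zA-Z_]", "|"));  // false once the | is removed
        System.out.println(matches("\\w", "_"));        // true: \w covers the underscore
    }
}
```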
Use
Pattern.compile("[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang")
Note that you may use a non-capturing group ((?:...)) instead of a capturing one: you are only checking for a match, not extracting substrings, so this avoids overhead you do not need.
See the regex demo (the pattern is used with matches(), which requires a full string match, so I added ^ and $ in the regex demo).
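The difference between a full match and a substring search, and the effect of the non-capturing group, can be sketched as follows (FullMatchDemo and the sample file names are assumptions for illustration only):

```java
import java.util.regex.Pattern;

class FullMatchDemo {
    static final Pattern YANG = Pattern.compile(
            "[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang");
    // Same pattern with a capturing group, for comparison only.
    static final Pattern YANG_CAPTURING = Pattern.compile(
            "[a-zA-Z_]?[\\w.-]+(@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    public static void main(String[] args) {
        String input = "module.yang.bak";
        System.out.println(YANG.matcher(input).matches()); // false: the whole input must match
        System.out.println(YANG.matcher(input).find());    // true: "module.yang" is found inside it
        System.out.println(YANG.matcher("module@2018-10-26.yang").matches()); // true

        System.out.println(YANG.matcher("").groupCount());           // 0: (?:...) captures nothing
        System.out.println(YANG_CAPTURING.matcher("").groupCount()); // 1
    }
}
```

With matches() no explicit ^ and $ are needed in the Java code; they only matter in tools that search for the pattern anywhere in the text.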
+1. The regex engine in Java isn't fantastic - I've run into situations like this reasonably frequently, where you have to "think" a bit more about the regex in Java than in, say, Perl.
– Michael Berry
2 hours ago
Thanks for the optimization hints. Now it behaves as expected.
– rkosegi
2 hours ago
edited 2 hours ago
answered 2 hours ago
Wiktor Stribiżew