Is this bug in JRE regexp implementation?
I'm trying to match the string `iso_schematron_skeleton_for_xslt1.xsl` against the regexp `([a-zA-Z|_])?(\w+|_|\.|-)+(@\d{4}-\d{2}-\d{2})?\.yang`.
The expected result is `false`: it should not match.
The problem is that the call to `matcher.matches()` never returns.
Is this a bug in the JRE regexp implementation?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HelloWorld_
I'm using
java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b15)
OpenJDK 64-Bit Server VM (build 25.181-b15, mixed mode)
java regex
asked 3 hours ago by rkosegi
Replace `\w+` with `\w`. But a more efficient regex will look like `"[a-zA-Z_]?[\\w.-]+(@\\d{4}-\\d{2}-\\d{2})?\\.yang"`
– Wiktor Stribiżew
3 hours ago
Why should it match when your string ends with .xsl and your regex with .yang?
– Pushpesh Kumar Rajwanshi
2 hours ago
@PushpeshKumarRajwanshi It shouldn't, OP says the expected result is `false`. Instead, the program never terminates.
– Petr Janeček
2 hours ago
Yeah, just noticed :) Thanks for pointing that out.
– Pushpesh Kumar Rajwanshi
2 hours ago
You can also try using a possessive quantifier, which never gives back an already matched part, even while backtracking. Try with `(\w+|_|\.|-)++` (notice the additional `+` at the end).
– Pshemo
2 hours ago
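A minimal sketch of the possessive variant, assuming the date part of the pattern was originally `\d{4}-\d{2}-\d{2}` (the braces appear to have been lost on this page). The class name and the sample file names are made up for illustration. It suggests one caveat: the possessive group terminates quickly, but because `.` is one of its alternatives it also swallows a trailing `.yang` and does not give it back, so a plain `name.yang` no longer matches.

```java
import java.util.regex.Pattern;

public class PossessiveGroupDemo {
    // The question's pattern with the outer quantifier made possessive (++).
    // The {4}/{2} repetition counts are an assumed reconstruction.
    static final Pattern POSSESSIVE = Pattern.compile(
            "([a-zA-Z|_])?(\\w+|_|\\.|-)++(@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    static boolean matches(String input) {
        return POSSESSIVE.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Returns promptly instead of backtracking for a very long time.
        System.out.println(matches("iso_schematron_skeleton_for_xslt1.xsl")); // false
        // Caveat: the possessive group consumes ".yang" too, so this fails.
        System.out.println(matches("foo.yang"));                              // false
        // The '@' stops the group, so the rest of the pattern can still match.
        System.out.println(matches("foo@2018-02-20.yang"));                   // true
    }
}
```

If plain `name.yang` inputs must match, the character-class rewrite is likely the safer fix.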
1 Answer
9 votes, accepted
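The pattern contains nested quantifiers. The `\w+` is inside a group that is itself quantified with `+`, which makes it hard for the regex engine to process non-matching strings: the same run of word characters can be split between the inner and the outer quantifier in exponentially many ways, and a backtracking engine may try them all before giving up. It makes more sense to make a character class out of the alternation group, i.e. `(\w+|_|\.|-)+` => `[\w.-]+`.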
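To get a feel for the difference, here is a rough sketch that counts how many characters the engine reads while rejecting short inputs. The `CountingSequence` wrapper and the test inputs are my own illustration, not part of the question; inputs are kept short so the nested pattern still finishes. The exact counts depend on the JDK version, so none are shown.

```java
import java.util.regex.Pattern;

public class NestedQuantifierDemo {
    /** CharSequence wrapper that counts every character read by the regex engine. */
    static final class CountingSequence implements CharSequence {
        private final CharSequence text;
        long reads = 0;

        CountingSequence(CharSequence text) { this.text = text; }

        @Override public int length() { return text.length(); }
        @Override public char charAt(int index) { reads++; return text.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }
        @Override public String toString() { return text.toString(); }
    }

    // Simplified versions of the two forms discussed above.
    static final Pattern NESTED = Pattern.compile("(\\w+|_|\\.|-)+\\.yang");
    static final Pattern FLAT = Pattern.compile("[\\w.-]+\\.yang");

    /** Number of charAt calls made while matching the whole input. */
    static long reads(Pattern pattern, String input) {
        CountingSequence seq = new CountingSequence(input);
        pattern.matcher(seq).matches();
        return seq.reads;
    }

    public static void main(String[] args) {
        StringBuilder name = new StringBuilder();
        for (int n = 4; n <= 16; n += 4) {
            name.append("abcd");
            String input = name + ".xsl"; // never matches: wrong extension
            System.out.printf("%2d word chars: nested=%d flat=%d%n",
                    n, reads(NESTED, input), reads(FLAT, input));
        }
    }
}
```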
Note that `\w` already matches `_`. Also, a `|` inside a character class matches a literal `|` char, and `[a|b]` matches `a`, `|` or `b`, so it seems you should remove the `|` from your first character class.
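Both points are easy to check with a few lines. The class and helper names below are only for illustration:

```java
import java.util.regex.Pattern;

public class PipeInCharClass {
    /** True if the whole input matches the regex. */
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Inside [...] the pipe is a literal character, not alternation.
        System.out.println(matchesWhole("[a|b]", "|")); // true
        System.out.println(matchesWhole("[ab]", "|"));  // false
        // \w covers letters, digits and the underscore.
        System.out.println(matchesWhole("\\w", "_"));   // true
    }
}
```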
Use

    .compile("[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang")

Note that you may use a non-capturing group (`(?:...)`) instead of a capturing one to avoid overhead you do not need, as you are just checking for a match and not extracting substrings.
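Put together, a small self-contained check could look like the sketch below. The class name, the helper method and the two `.yang` file names are hypothetical samples, and the `{4}`/`{2}` repetition counts are assumed from the date-like shape of the pattern.

```java
import java.util.regex.Pattern;

public class YangFileNameCheck {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern YANG_FILE = Pattern.compile(
            "[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    /** True if the whole name looks like "module.yang" or "module@YYYY-MM-DD.yang". */
    static boolean isYangFileName(String name) {
        return YANG_FILE.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isYangFileName("iso_schematron_skeleton_for_xslt1.xsl")); // false
        System.out.println(isYangFileName("ietf-interfaces.yang"));                  // true
        System.out.println(isYangFileName("ietf-interfaces@2018-02-20.yang"));       // true
    }
}
```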
See the regex demo (as the pattern is used with `matches()` and thus requires a full string match, I added `^` and `$` in the regex demo).
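The difference between a full match and a substring search can be seen in a short sketch. It uses a simplified pattern and a made-up input, so treat it as an illustration only:

```java
import java.util.regex.Pattern;

public class MatchesVersusFind {
    static final Pattern PLAIN = Pattern.compile("[\\w.-]+\\.yang");
    static final Pattern ANCHORED = Pattern.compile("^[\\w.-]+\\.yang$");

    public static void main(String[] args) {
        String input = "module.yang.bak";
        // matches() must consume the entire input.
        System.out.println(PLAIN.matcher(input).matches());  // false
        // find() is satisfied by the substring "module.yang".
        System.out.println(PLAIN.matcher(input).find());     // true
        // With ^ and $ the search behaves like a full match for this input.
        System.out.println(ANCHORED.matcher(input).find());  // false
    }
}
```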
answered 2 hours ago (edited 2 hours ago) by Wiktor Stribiżew
+1. The regex engine in Java isn't fantastic - I've run into situations like this reasonably frequently where you have to "think" a bit more about the regex in Java than in, say, Perl.
– Michael Berry
2 hours ago
Thanks for the optimization hints. Now it behaves as expected.
– rkosegi
2 hours ago