Is this a bug in the JRE regexp implementation?

I'm trying to match the string iso_schematron_skeleton_for_xslt1.xsl against the regexp ([a-zA-Z|_])?(\w+|_|\.|-)+(@\d{4}-\d{2}-\d{2})?\.yang

The expected result is false; it should not match.

The problem is that the call to matcher.matches() never returns.

Is this a bug in the JRE regexp implementation?



import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld_


I'm using



java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b15)
OpenJDK 64-Bit Server VM (build 25.181-b15, mixed mode)









  • Replace \w+ with \w. But a more efficient regex will look like "[a-zA-Z_]?[\\w.-]+(@\\d{4}-\\d{2}-\\d{2})?\\.yang"
    – Wiktor Stribiżew
    3 hours ago

  • Why should it match when your string ends with .xsl and your regex with .yang?
    – Pushpesh Kumar Rajwanshi
    2 hours ago

  • @PushpeshKumarRajwanshi It shouldn't; OP says the expected result is false. Instead, the program never terminates.
    – Petr Janeček
    2 hours ago

  • yea just noticed :) Thanks for pointing out.
    – Pushpesh Kumar Rajwanshi
    2 hours ago

  • You can also try a possessive quantifier, which never gives back an already matched part, even while backtracking. Try (\w+|_|\.|-)++ (notice the additional + at the end).
    – Pshemo
    2 hours ago
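
For anyone who wants to try the possessive variant from the last comment, here is a minimal sketch. The class name, helper method and sample file names are made up for illustration, and the braces in \d{4} are my reading of the pattern. One thing to watch: since the possessive group also accepts dots and word characters, it seems to swallow a trailing ".yang" and never give it back, so in my reading only names with the @date part can still match.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // The question's pattern with the outer + made possessive (++), as the comment suggests.
    // Backslashes are doubled because this is a Java string literal.
    static final Pattern POSSESSIVE = Pattern.compile(
            "([a-zA-Z|_])?(\\w+|_|\\.|-)++(@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    static boolean matches(String name) {
        return POSSESSIVE.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Returns promptly: the possessive group is never re-split on failure.
        System.out.println(matches("iso_schematron_skeleton_for_xslt1.xsl")); // false
        // The possessive group consumes ".yang" too and cannot hand it back.
        System.out.println(matches("example.yang"));                          // false
        // Here the group stops at '@', so the rest of the pattern can still match.
        System.out.println(matches("example@2018-10-26.yang"));               // true
    }
}
```

So the possessive quantifier does stop the runaway backtracking, but it is not a drop-in replacement for this particular pattern.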















java regex






asked 3 hours ago by rkosegi











1 Answer
accepted










The pattern contains nested quantifiers. The \w+ is inside a group that is itself quantified with +, so the same run of word characters can be split between group iterations in a huge number of ways. A backtracking regex engine may try all of them before it can reject a non-matching string. It makes more sense to make a character class out of the alternation group, i.e. (\w+|_|\.|-)+ => [\w.-]+.
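
To get a feel for the effect without hanging the JVM, here is a rough sketch that runs the question's pattern against short non-matching inputs of growing length. The class name, the helper methods and the generated inputs are assumptions made for illustration, and the braces in \d{4} are my reading of the pattern. The timings are only indicative: they depend on the machine and on the Java version, and newer runtimes may handle some of these cases better.

```java
import java.util.regex.Pattern;

class NestedQuantifierDemo {
    // The question's pattern, with backslashes doubled for a Java string literal.
    static final Pattern NESTED = Pattern.compile(
            "([a-zA-Z|_])?(\\w+|_|\\.|-)+(@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    // Builds a non-matching name: n letters followed by ".xsl".
    static String input(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append(".xsl").toString();
    }

    static boolean matches(int n) {
        return NESTED.matcher(input(n)).matches();
    }

    public static void main(String[] args) {
        // Keep n small: on a plain backtracking engine each extra
        // character can roughly double the work.
        for (int n = 4; n <= 16; n += 4) {
            long start = System.nanoTime();
            boolean result = matches(n);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("n=" + n + " matches=" + result + " time=" + micros + "us");
        }
    }
}
```

The result is false every time, which is correct. The concern is how the time grows with the input length, not the answer.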



Note that \w already matches _. Also, a | inside a character class matches a literal | char, and [a|b] matches a, | or b, so it seems you should remove the | from your first character class.
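
Both points are easy to check with Pattern.matches. This is just a small sketch; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // First character class as written in the question (with the stray pipe).
    static boolean original(String s) {
        return Pattern.matches("[a-zA-Z|_]", s);
    }

    // Same class with the pipe removed.
    static boolean corrected(String s) {
        return Pattern.matches("[a-zA-Z_]", s);
    }

    // A single word character.
    static boolean wordChar(String s) {
        return Pattern.matches("\\w", s);
    }

    public static void main(String[] args) {
        // Inside [...] the pipe is an ordinary character, not alternation.
        System.out.println(original("|"));  // true
        System.out.println(corrected("|")); // false
        // \w covers letters, digits and the underscore.
        System.out.println(wordChar("_"));  // true
    }
}
```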



Use



.compile("[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang")
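
A minimal sketch of how the compiled pattern might be used. The class name, the isYangFile helper and the sample .yang names are hypothetical; only the .xsl name comes from the question.

```java
import java.util.regex.Pattern;

class YangFileName {
    // Suggested pattern, with backslashes doubled for a Java string literal.
    private static final Pattern YANG_FILE = Pattern.compile(
            "[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    // matches() requires the whole name to match, not just a part of it.
    static boolean isYangFile(String name) {
        return YANG_FILE.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isYangFile("iso_schematron_skeleton_for_xslt1.xsl")); // false
        System.out.println(isYangFile("example.yang"));                          // true
        System.out.println(isYangFile("example@2018-10-26.yang"));               // true
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call.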


Note that you may use a non-capturing group ((?:...)) instead of a capturing one to avoid overhead you do not need, since you are only checking for a match and not extracting substrings.
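
The difference shows up in Matcher.groupCount(), which reports how many capturing groups the pattern defines. A small sketch, with a made-up class name and helper:

```java
import java.util.regex.Pattern;

class GroupKindDemo {
    // Number of capturing groups declared by the regex (no match attempt needed).
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Capturing group: the engine has to record what it matched.
        System.out.println(groupCount("(@\\d{4}-\\d{2}-\\d{2})?\\.yang"));   // 1
        // Non-capturing group: grouping only, nothing is recorded.
        System.out.println(groupCount("(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang")); // 0
    }
}
```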



See the regex demo (as the pattern is used with matches() and thus requires a full string match, I added ^ and $ in the regex demo).
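
To illustrate why the anchors are needed outside Java's matches(), here is a sketch comparing matches() with find(). The class name, the helpers and the sample names are assumptions made for illustration. For these inputs find() with ^ and $ behaves like matches(), while a bare find() is happy with a partial hit.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    static final String REGEX = "[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang";

    // Whole input must match.
    static boolean fullMatch(String s) {
        return Pattern.compile(REGEX).matcher(s).matches();
    }

    // Any substring may match.
    static boolean partialMatch(String s) {
        return Pattern.compile(REGEX).matcher(s).find();
    }

    // Substring search, but pinned to start and end with ^ and $.
    static boolean anchoredFind(String s) {
        return Pattern.compile("^" + REGEX + "$").matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "example.yang.bak";
        System.out.println(fullMatch(s));     // false
        System.out.println(partialMatch(s));  // true, finds "example.yang"
        System.out.println(anchoredFind(s));  // false
        System.out.println(anchoredFind("example.yang")); // true
    }
}
```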






  • +1. The regex engine in Java isn't fantastic - I've run into situations like this reasonably frequently where you have to "think" a bit more about the regex in Java than in, say, Perl.
    – Michael Berry
    2 hours ago

  • Thanks for the optimization hints. Now it behaves as expected.
    – rkosegi
    2 hours ago










edited 2 hours ago

answered 2 hours ago by Wiktor Stribiżew










