Capture all numbers up to three digits

I have the following string:

1 2 134 2009

And I'd like to capture the numbers that have between 1 and 3 digits, so the result should be:

['1', '2', '134']

What I have now captures those, but also captures the "first 3" digits in strings that contain more than 3 digits. This is the current regex I have:

>>> re.findall(r'\d{1,3}', '1 2 134 2009')
['1', '2', '134', '200', '9']

# or a bit closer --

>>> re.findall(r'\d{1,3}(?!\d)', '1 2 134 2009')
['1', '2', '134', '009']

What would be the correct way to make sure that no other digit immediately precedes or follows the match?
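The negative lookahead (?!\d) only rejects a digit after the match, which is why 009 still slips through. A minimal sketch, assuming plain text like the example string: adding a negative lookbehind (?<!\d) also rejects a match that starts right after a digit.

```python
import re

text = '1 2 134 2009'

# (?<!\d) fails if a digit sits immediately before the match;
# (?!\d) fails if a digit sits immediately after it.
pattern = re.compile(r'(?<!\d)\d{1,3}(?!\d)')

print(pattern.findall(text))  # ['1', '2', '134']
```

Unlike \b, these lookarounds only look for digits, so a number attached to letters (the 12 in 'x12', for example) would still be matched; which behaviour is wanted depends on the input.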

asked 41 mins ago by David L

What is the logic to match 123 in ['1', '2', '123']?
– The fourth bird
38 mins ago

@Thefourthbird I suppose that it would be a 'self-contained number': for example, if someone looked at the above string, they could see that 4 numbers were contained in it. Not sure if I can give a more rigorous explanation.
– David L
36 mins ago

@Thefourthbird Oh, I see. Sorry, that was a typo; fixed.
– David L
34 mins ago

Tags: python, regex


2 Answers

Accepted answer:

Add word boundaries:

import re

result = re.findall(r'\b\d{1,3}\b', '1 2 134 2009')

print(result)

Output

['1', '2', '134']

From the documentation of \b:

Matches the empty string, but only at the beginning or end of a word.
A word is defined as a sequence of word characters. Note that
formally, \b is defined as the boundary between a \w and a \W
character (or vice versa), or between \w and the beginning/end of the
string. This means that r'\bfoo\b' matches 'foo', 'foo.', '(foo)',
'bar foo baz' but not 'foobar' or 'foo3'.

By default Unicode alphanumerics are the ones used in Unicode
patterns, but this can be changed by using the ASCII flag. Word
boundaries are determined by the current locale if the LOCALE flag is
used. Inside a character range, \b represents the backspace character,
for compatibility with Python's string literals.
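The note about Unicode patterns also affects \d: in a str pattern it matches any Unicode decimal digit, not only 0-9. A small sketch, assuming the input could contain non-ASCII digits (the Arabic-Indic digits below are an illustrative choice, not from the question), shows how the ASCII flag narrows both \d and \b:

```python
import re

# '\u0663\u0664' is ARABIC-INDIC DIGIT THREE followed by DIGIT FOUR.
text = '12 \u0663\u0664 2009'

pattern = r'\b\d{1,3}\b'

# Default (Unicode) matching: \d and \w include all Unicode decimal digits.
unicode_matches = re.findall(pattern, text)

# With re.ASCII, \d is only [0-9] and \w is only [a-zA-Z0-9_].
ascii_matches = re.findall(pattern, text, flags=re.ASCII)

print(unicode_matches == ['12', '\u0663\u0664'])  # True
print(ascii_matches)  # ['12']
```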

answered 38 mins ago by Daniel Mesejo

Thanks for this. For a 'word boundary', what does this include other than a space?
– David L
36 mins ago

@DavidL Updated the answer!
– Daniel Mesejo
30 mins ago
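To make the comment question concrete: \b needs a word character (\w) on one side and a non-word character or the string edge on the other, so punctuation such as '.', '-' or '(' separates numbers just like a space, while letters and '_' do not. A quick check on a made-up sample string, plus the examples quoted from the documentation:

```python
import re

# Made-up sample: spaces, '.' and '-' are non-word characters,
# but letters and '_' are word characters.
sample = 'a12 12.5 x_7 3-4'

matches = re.findall(r'\b\d{1,3}\b', sample)
print(matches)  # ['12', '5', '3', '4']

# The documentation's r'\bfoo\b' examples.
doc_cases = ['foo', 'foo.', '(foo)', 'bar foo baz', 'foobar', 'foo3']
doc_results = {s: bool(re.search(r'\bfoo\b', s)) for s in doc_cases}
print(doc_results)
```

Note that 12.5 comes back as '12' and '5'; if decimals can appear in the input, \b alone may not be the right separator.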


If your string contains only digits separated by whitespace, using re is overkill. You can simply split the string and check the length of each substring.

>>> numbers = '1 2 134 2009'
>>> [n for n in numbers.split() if len(n) <= 3]
['1', '2', '134']
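If some tokens might not be numbers, a small guard keeps the split approach safe. A sketch, where short_numbers and its max_len parameter are hypothetical names introduced here (str.isascii needs Python 3.7 or later):

```python
def short_numbers(text, max_len=3):
    """Hypothetical helper: return whitespace-separated tokens made only
    of ASCII digits and at most max_len characters long."""
    return [
        tok
        for tok in text.split()
        # isascii() rejects non-ASCII characters (e.g. superscript digits)
        # that isdigit() alone would accept
        if tok.isascii() and tok.isdigit() and len(tok) <= max_len
    ]


print(short_numbers('1 2 134 2009'))      # ['1', '2', '134']
print(short_numbers('1 abc 42 2009 7x'))  # ['1', '42']
```

Unlike the \b regex, this treats '12.5' as a single token and drops it instead of returning '12' and '5'.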

answered 35 mins ago by timgeb
