Capture all numbers up to three digits

I have the following string:



1 2 134 2009


I'd like to capture only the numbers that have between one and three digits, so the result should be:



['1', '2', '134']


My current regex captures those, but it also captures pieces of numbers that have more than three digits (for example '200' and '9', or '009', out of '2009'). These are the patterns I have tried:



>>> import re
>>> re.findall(r'\d{1,3}', '1 2 134 2009')
['1', '2', '134', '200', '9']

# a bit closer, with a negative lookahead:

>>> re.findall(r'\d{1,3}(?!\d)', '1 2 134 2009')
['1', '2', '134', '009']


What is the correct way to make sure that no other digit immediately precedes (or follows) the match?
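The '009' result is easier to understand once you look at where each match starts. The following sketch (an illustration added here, not code from the question) records each match of the lookahead pattern together with its (start, end) span via re.finditer. The last match starts at index 9, right after the digit '2' at index 8: the lookahead (?!\d) only inspects the character after a match, so nothing rules out a match that begins in the middle of a number.

```python
import re

text = '1 2 134 2009'

# (?!\d) is a negative lookahead: it only checks the character AFTER the match.
# Nothing checks the character BEFORE it, so a match may start mid-number.
matches = [(m.group(), m.span()) for m in re.finditer(r'\d{1,3}(?!\d)', text)]

for found, span in matches:
    print(repr(found), span)
```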










  • What is the logic to match 123 in ['1', '2', '123']?
    – The fourth bird
    38 mins ago










  • @Thefourthbird I suppose it would be a 'self-contained number': for example, someone looking at the string above could see that it contains 4 numbers. I'm not sure I can give a more rigorous explanation.
    – David L
    36 mins ago






  • @Thefourthbird Oh, I see. Sorry, that was a typo; it's fixed now.
    – David L
    34 mins ago














python regex






asked 41 mins ago by David L, edited 34 mins ago







2 Answers
Accepted answer










Add word boundaries:



import re

result = re.findall(r'\b\d{1,3}\b', '1 2 134 2009')

print(result)


Output



['1', '2', '134']
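Because \b only requires a transition between a word character and a non-word character (or the start/end of the string), separators other than spaces work too. A small sketch on made-up inputs (the sample strings are illustrative, not from the question):

```python
import re

# One to three digits with a word boundary on both sides
pattern = re.compile(r'\b\d{1,3}\b')

samples = ['1 2 134 2009', '1,22;333.4444', '(7) [88] 9999']
results = {s: pattern.findall(s) for s in samples}

for s, found in results.items():
    print(f'{s!r} -> {found}')
```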


From the documentation of \b:


Matches the empty string, but only at the beginning or end of a word.
A word is defined as a sequence of word characters. Note that
formally, \b is defined as the boundary between a \w and a \W
character (or vice versa), or between \w and the beginning/end of the
string. This means that r'\bfoo\b' matches 'foo', 'foo.', '(foo)',
'bar foo baz' but not 'foobar' or 'foo3'.


By default Unicode alphanumerics are the ones used in Unicode
patterns, but this can be changed by using the ASCII flag. Word
boundaries are determined by the current locale if the LOCALE flag is
used. Inside a character range, \b represents the backspace character,
for compatibility with Python’s string literals.
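The examples in the quoted documentation can be checked directly. The sketch below runs r'\bfoo\b' against them, then shows the effect of the ASCII flag mentioned in the second paragraph: in the default Unicode mode 'é' is a word character, so there is no boundary between 'é' and '1'; with re.ASCII it is a non-word character and the boundary appears. The string 'é12' is an illustrative input, written with an escape so the source encoding does not matter.

```python
import re

# Examples taken from the documentation quoted above
should_match = ['foo', 'foo.', '(foo)', 'bar foo baz']
should_not_match = ['foobar', 'foo3']

found = {s: bool(re.search(r'\bfoo\b', s)) for s in should_match + should_not_match}
for s, ok in found.items():
    print(f'{s!r:15} {ok}')

text = '\u00e912'  # 'é12': LATIN SMALL LETTER E WITH ACUTE followed by '12'

# Default (Unicode) mode: 'é' counts as \w, so no \b sits between 'é' and '1'
unicode_mode = re.findall(r'\b\d{1,3}\b', text)
# ASCII mode: \w is [a-zA-Z0-9_], so 'é' is \W and a \b exists before '1'
ascii_mode = re.findall(r'\b\d{1,3}\b', text, re.ASCII)

print(unicode_mode, ascii_mode)
```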







  • Thanks for this. What counts as a 'word boundary' other than a space?
    – David L
    36 mins ago










  • @DavidL Updated the answer!
    – Daniel Mesejo
    30 mins ago
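To expand on this exchange: a word boundary occurs wherever a word character (a letter, a digit or an underscore, in the default Unicode mode) meets a non-word character or the start/end of the string. Punctuation such as '.', '(' or ',' therefore counts, but a letter or '_' glued to a number does not. If the real requirement is only "no digit immediately before or after", a different technique from the accepted answer is to pair a negative lookbehind with the negative lookahead from the question. The comparison below uses an illustrative string:

```python
import re

text = 'x12 12_ 12.5 (7) 2009'

# Word boundaries: both neighbours must be non-word characters (or string edges)
with_boundaries = re.findall(r'\b\d{1,3}\b', text)

# Digit lookarounds: (?<!\d) and (?!\d) only require the neighbours to be non-digits
with_lookarounds = re.findall(r'(?<!\d)\d{1,3}(?!\d)', text)

print(with_boundaries)
print(with_lookarounds)
```

Both patterns reject '2009' and both split '12.5' into '12' and '5'; they differ only on numbers that touch letters or underscores.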

















If there are only digits separated by whitespace in your string, using re is overkill. You can simply split the string and check the length of the substrings.



>>> numbers = '1 2 134 2009'
>>> [n for n in numbers.split() if len(n) <= 3]
['1', '2', '134']
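The length check alone keeps any short token, including words such as 'ab'. If the input might contain non-numeric tokens, a minimal variation adds a str.isdigit() guard; the helper name short_numbers and its max_len parameter are hypothetical, chosen for illustration.

```python
def short_numbers(text, max_len=3):
    """Return whitespace-separated tokens made of 1..max_len digits."""
    # str.split() with no argument splits on runs of whitespace and never
    # yields empty strings, so every token has at least one character.
    # Note: str.isdigit() is also True for some non-ASCII digits (e.g. '²').
    return [tok for tok in text.split() if tok.isdigit() and len(tok) <= max_len]


print(short_numbers('1 2 134 2009'))
print(short_numbers('1 ab 22 3333 x7'))
```

Unlike the regex answers, this requires the whole token to be digits, so a token like '12.5' or '7,' is skipped rather than split.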





Answer using word boundaries: answered 38 mins ago by Daniel Mesejo, edited 30 mins ago











Answer using split(): answered 35 mins ago by timgeb



























             
