Imgur URL parser

I'm fairly new to Python, and I have been doing a few Edabit challenges to help with my problem-solving. I have just completed a somewhat difficult challenge, and I was hoping for some feedback.



The challenge itself:




Create a function that takes an imgur link (as a string) and extracts
the unique id and type. Return an object containing the unique id, and
a string indicating what type of link it is.



The link could be pointing to:



  • An album (e.g. http://imgur.com/a/cjh4E)

  • A gallery (e.g. http://imgur.com/gallery/59npG)

  • An image (e.g. http://imgur.com/OzZUNMM)

  • An image (direct link) (e.g. http://i.imgur.com/altd8Ld.png)

Examples



  • "http://imgur.com/a/cjh4E" ➞ { id: "cjh4E", type: "album" }

  • "http://imgur.com/gallery/59npG" ➞ { id: "59npG", type: "gallery" }

  • "http://i.imgur.com/altd8Ld.png" ➞ { id: "altd8Ld", type: "image" }



I came up with the following.



import re

def imgurUrlParser(url):

    url_regex = "^[http://www.|https://www.|http://|https://|www.]*[imgur|i.imgur]*\.com"
    url = re.match(url_regex, url).string

    gallery_regex = re.match(url_regex + "(/gallery/)(\w+)", url)
    album_regex = re.match(url_regex + "(/a/)(\w+)", url)
    image_regex = re.match(url_regex + "/(\w+)", url)
    direct_link_regex = re.match(url_regex + "(\w+)(\.\w+)", url)

    if gallery_regex:
        return { "id" : gallery_regex.group(2), "type" : "gallery" }
    elif album_regex:
        return { "id" : album_regex.group(2), "type" : "album" }
    elif image_regex:
        return { "id" : image_regex.group(1), "type" : "image" }
    elif direct_link_regex:
        return { "id" : direct_link_regex.group(1), "type" : "image" }









  • This code doesn't run. direct_link_regex is a string and strings don't have a group() method. Also, the parentheses for that regex are not balanced properly. Perhaps you copied an old version of your code?
    – JakeD
    6 hours ago






  • Edited. Thanks, Jake.
    – snorkle
    6 hours ago














python python-3.x programming-challenge regex url






asked 6 hours ago by snorkle
edited 5 hours ago by 200_success




2 Answers
By RFC 1738, the scheme and host portions of URLs are case-insensitive. Also, it is allowable to include a redundant port number in the URL.
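As a rough illustration of the two points above, the standard library's urllib.parse.urlsplit already lowercases the scheme and host and exposes the port separately. This is only a sketch: the helper name split_host is made up for this example and is not part of either reviewed solution.

```python
from urllib.parse import urlsplit


def split_host(url):
    """Return (scheme, hostname, port) for a URL; hypothetical helper."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme, and .hostname is the lowercased host
    # without the port; .port is an int, or None when no port is given.
    return parts.scheme, parts.hostname, parts.port


print(split_host('HtTP://imgur.COM:80/gallery/59npG'))  # ('http', 'imgur.com', 80)
print(split_host('http://imgur.com/a/cjh4E'))           # ('http', 'imgur.com', None)
```

Normalising the URL this way before matching is one option; the regex below handles case and port inside the pattern instead.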



Imgur also partners with certain other websites. For instance, when you upload an image through the question editor on a Stack Exchange site, it will end up on https://i.stack.imgur.com.



There is a lot of commonality in the various regexes. Consider combining them all into a single regex. Use named capture groups to avoid the magic group numbers.



A docstring with doctests would be very beneficial for this function.



import re

def parse_imgur_url(url):
    """
    Extract the type and id from an Imgur URL.

    >>> parse_imgur_url('http://imgur.com/a/cjh4E')
    {'id': 'cjh4E', 'type': 'album'}
    >>> parse_imgur_url('HtTP://imgur.COM:80/gallery/59npG')
    {'id': '59npG', 'type': 'gallery'}
    >>> parse_imgur_url('https://i.imgur.com/altd8Ld.png')
    {'id': 'altd8Ld', 'type': 'image'}
    >>> parse_imgur_url('https://i.stack.imgur.com/ELmEk.png')
    {'id': 'ELmEk', 'type': 'image'}
    >>> parse_imgur_url('http://not-imgur.com/altd8Ld.png')
    Traceback (most recent call last):
      ...
    ValueError: "http://not-imgur.com/altd8Ld.png" is not a valid imgur URL
    >>> parse_imgur_url('tftp://imgur.com/gallery/59npG')
    Traceback (most recent call last):
      ...
    ValueError: "tftp://imgur.com/gallery/59npG" is not a valid imgur URL
    >>> parse_imgur_url('Blah')
    Traceback (most recent call last):
      ...
    ValueError: "Blah" is not a valid imgur URL
    """
    match = re.match(
        r'^(?i:https?://(?:.+\.)?imgur\.com)(:\d+)?'
        r'/(?:(?P<album>a/)|(?P<gallery>gallery/))?(?P<id>\w+)',
        url
    )
    if not match:
        raise ValueError('"{}" is not a valid imgur URL'.format(url))
    return {
        'id': match.group('id'),
        'type': 'album' if match.group('album') else
                'gallery' if match.group('gallery') else
                'image',
    }



Note that the regex above relies on the (?aiLmsux-imsx:...) feature of Python 3.6, and the doctests rely on the predictable order of dictionary keys in Python 3.6 / 3.7.
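To make the scoped-flag remark concrete, here is a small sketch of how (?i:...) limits case-insensitivity to one part of a pattern. The pattern and the helper name extract_album_id are simplified assumptions for illustration, not the regex used above.

```python
import re

# (?i:...) turns on IGNORECASE only inside the group (Python 3.6+),
# so the scheme and host ignore case while the "/a/" path segment does not.
ALBUM_PATTERN = re.compile(r'(?i:https?://imgur\.com)/a/(?P<id>\w+)')


def extract_album_id(url):
    """Return the album id, or None if the URL does not match; illustrative only."""
    match = ALBUM_PATTERN.match(url)
    return match.group('id') if match else None


print(extract_album_id('HTTP://IMGUR.COM/a/cjh4E'))  # cjh4E
print(extract_album_id('http://imgur.com/A/cjh4E'))  # None
```

On versions before 3.6, a global re.IGNORECASE flag would be the fallback, at the cost of making the path case-insensitive too.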






answered 3 hours ago by 200_success (edited 3 hours ago)













    For a first pass, not bad! Your code is pretty easy to follow.



    Problems:



    1. Don't use [] to match alternative strings. [] is a character class that matches any single character from the set, so [imgur|i.imgur]* will match the empty string, g, mgi, etc. You probably wanted a non-capturing group, which is specified with (?:...); see the re docs.


    2. Name functions with snake_case, as recommended by PEP 8.


    3. The challenge as stated doesn't specify what should happen if the string passed in doesn't match the link format. Right now your code will throw an AttributeError, which isn't very helpful to the caller. I'd recommend raising an explicit exception with a more helpful message.


    4. Your last case, direct_link_regex, is never reached with valid input since it is handled by image_regex.
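The first problem in the list above can be checked directly. The sketch below compares the character class from the question with a non-capturing group; the variable names and the group pattern are assumptions chosen for this illustration.

```python
import re

# Inside [], every character stands for itself, so this class accepts any
# run (including an empty one) of the characters i, m, g, u, r, | and .
char_class = re.compile(r'[imgur|i.imgur]*')
# A non-capturing group with an optional "i." prefix matches only the two hosts.
group = re.compile(r'(?:i\.)?imgur')

for candidate in ['imgur', 'i.imgur', 'mgi', '']:
    print(repr(candidate),
          bool(char_class.fullmatch(candidate)),
          bool(group.fullmatch(candidate)))
```

The character class accepts all four candidates, while the group accepts only imgur and i.imgur.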


    Improvements:



    1. Concatenating the regex to handle each case is somewhat messy. It would be better to have a single regex which handles all cases.


    2. Regular expressions are usually expressed using raw strings, that is, strings with an r prefix. This helps with escaping characters correctly. In this case I'm guessing you just got lucky that it worked as you expected.


    3. Including a docstring is always a good idea, and you can even embed tests using doctest.
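A short sketch of why the raw-string advice above matters. In a normal string literal, Python interprets escapes it recognises, such as "\b", before the regex engine ever sees them. Escapes it does not recognise, such as "\w", are passed through unchanged, which is probably why the original patterns worked. The variable names here are illustrative only.

```python
import re

# In a normal literal "\b" is one backspace character (0x08);
# in a raw literal it stays backslash + "b", which re reads as a word boundary.
plain = "\bimgur\b"
raw = r"\bimgur\b"

print(len(plain), len(raw))                      # 7 9
print(bool(re.search(plain, "see imgur here")))  # False
print(bool(re.search(raw, "see imgur here")))    # True
```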


    How I would implement this function:



    import re

    def imgur_url_parser(url):
        """
        Parses an imgur url into components.

        >>> imgur_url_parser("http://imgur.com/a/cjh4E") == {"type": "album", "id": "cjh4E"}
        True
        >>> imgur_url_parser("http://imgur.com/gallery/59npG") == {"type": "gallery", "id": "59npG"}
        True
        >>> imgur_url_parser("http://i.imgur.com/altd8Ld.png") == {"type": "image", "id": "altd8Ld"}
        True
        >>> imgur_url_parser("http://imgur.com/OzZUNMM") == {"type": "image", "id": "OzZUNMM"}
        True
        """
        match = re.match(r"^https?://(?:www\.|i\.)?imgur\.com/([\w.]+)/?(\w*)$", url)
        if not match:
            raise ValueError('The string "{}" is not a valid imgur link'.format(url))
        # Empty when this is an image link
        if not match.group(2):
            # Remove image extension, if it exists
            image_id = re.sub(r"(\.\w+)?$", "", match.group(1))
            return {"id": image_id, "type": "image"}
        # Any two-segment path other than "a/..." is treated as a gallery
        url_type = "album" if match.group(1) == "a" else "gallery"
        return {"id": match.group(2), "type": url_type}


    if __name__ == "__main__":
        import doctest
        doctest.testmod()





      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      2
      down vote













      By RFC 1738, the scheme and host portions of URLs are case-insensitive. Also, it is allowable to include a redundant port number in the URL.



      Imgur also also partners with certain other websites. For instance, when you upload an image through the question editor Stack Exchange site, it will end up on https://i.stack.imgur.com.



      There is a lot of commonality in the various regexes. Consider combining them all into a single regex. Use named capture groups to avoid the magic group numbers.



      A docstring with doctests would be very beneficial for this function.



      import re

      def parse_imgur_url(url):
      """
      Extract the type and id from an Imgur URL.

      >>> parse_imgur_url('http://imgur.com/a/cjh4E')
      'id': 'cjh4E', 'type': 'album'
      >>> parse_imgur_url('HtTP://imgur.COM:80/gallery/59npG')
      'id': '59npG', 'type': 'gallery'
      >>> parse_imgur_url('https://i.imgur.com/altd8Ld.png')
      'id': 'altd8Ld', 'type': 'image'
      >>> parse_imgur_url('https://i.stack.imgur.com/ELmEk.png')
      'id': 'ELmEk', 'type': 'image'
      >>> parse_imgur_url('http://not-imgur.com/altd8Ld.png') is None
      Traceback (most recent call last):
      ...
      ValueError: "http://not-imgur.com/altd8Ld.png" is not a valid imgur URL
      >>> parse_imgur_url('tftp://imgur.com/gallery/59npG') is None
      Traceback (most recent call last):
      ...
      ValueError: "tftp://imgur.com/gallery/59npG" is not a valid imgur URL
      >>> parse_imgur_url('Blah') is None
      Traceback (most recent call last):
      ...
      ValueError: "Blah" is not a valid imgur URL
      """
      match = re.match(
      r'^(?i:https?://(?:.+.)?imgur.com)(:d+)?'
      r'/(?:(?P<album>a/)|(?P<gallery>gallery/))?(?P<id>w+)',
      url
      )
      if not match:
      raise ValueError('"" is not a valid imgur URL'.format(url))
      return
      'id': match.group('id'),
      'type': 'album' if match.group('album') else
      'gallery' if match.group('gallery') else
      'image',



      Note that the regex above relies on the (?aiLmsux-imsx:...) feature of Python 3.6, and the doctests rely on the predictable order of dictionary keys in Python 3.6 / 3.7.






      share|improve this answer


























        up vote
        2
        down vote













        By RFC 1738, the scheme and host portions of URLs are case-insensitive. Also, it is allowable to include a redundant port number in the URL.



        Imgur also also partners with certain other websites. For instance, when you upload an image through the question editor Stack Exchange site, it will end up on https://i.stack.imgur.com.



        There is a lot of commonality in the various regexes. Consider combining them all into a single regex. Use named capture groups to avoid the magic group numbers.



        A docstring with doctests would be very beneficial for this function.



        import re

        def parse_imgur_url(url):
        """
        Extract the type and id from an Imgur URL.

        >>> parse_imgur_url('http://imgur.com/a/cjh4E')
        'id': 'cjh4E', 'type': 'album'
        >>> parse_imgur_url('HtTP://imgur.COM:80/gallery/59npG')
        'id': '59npG', 'type': 'gallery'
        >>> parse_imgur_url('https://i.imgur.com/altd8Ld.png')
        'id': 'altd8Ld', 'type': 'image'
        >>> parse_imgur_url('https://i.stack.imgur.com/ELmEk.png')
        'id': 'ELmEk', 'type': 'image'
        >>> parse_imgur_url('http://not-imgur.com/altd8Ld.png') is None
        Traceback (most recent call last):
        ...
        ValueError: "http://not-imgur.com/altd8Ld.png" is not a valid imgur URL
        >>> parse_imgur_url('tftp://imgur.com/gallery/59npG') is None
        Traceback (most recent call last):
        ...
        ValueError: "tftp://imgur.com/gallery/59npG" is not a valid imgur URL
        >>> parse_imgur_url('Blah') is None
        Traceback (most recent call last):
        ...
        ValueError: "Blah" is not a valid imgur URL
        """
        match = re.match(
        r'^(?i:https?://(?:.+.)?imgur.com)(:d+)?'
        r'/(?:(?P<album>a/)|(?P<gallery>gallery/))?(?P<id>w+)',
        url
        )
        if not match:
        raise ValueError('"" is not a valid imgur URL'.format(url))
        return
        'id': match.group('id'),
        'type': 'album' if match.group('album') else
        'gallery' if match.group('gallery') else
        'image',



        Note that the regex above relies on the (?aiLmsux-imsx:...) feature of Python 3.6, and the doctests rely on the predictable order of dictionary keys in Python 3.6 / 3.7.






        share|improve this answer
























          up vote
          2
          down vote










          up vote
          2
          down vote









          By RFC 1738, the scheme and host portions of URLs are case-insensitive. Also, it is allowable to include a redundant port number in the URL.



          Imgur also also partners with certain other websites. For instance, when you upload an image through the question editor Stack Exchange site, it will end up on https://i.stack.imgur.com.



          There is a lot of commonality in the various regexes. Consider combining them all into a single regex. Use named capture groups to avoid the magic group numbers.



          A docstring with doctests would be very beneficial for this function.



          import re

          def parse_imgur_url(url):
          """
          Extract the type and id from an Imgur URL.

          >>> parse_imgur_url('http://imgur.com/a/cjh4E')
          'id': 'cjh4E', 'type': 'album'
          >>> parse_imgur_url('HtTP://imgur.COM:80/gallery/59npG')
          'id': '59npG', 'type': 'gallery'
          >>> parse_imgur_url('https://i.imgur.com/altd8Ld.png')
          'id': 'altd8Ld', 'type': 'image'
          >>> parse_imgur_url('https://i.stack.imgur.com/ELmEk.png')
          'id': 'ELmEk', 'type': 'image'
          >>> parse_imgur_url('http://not-imgur.com/altd8Ld.png') is None
          Traceback (most recent call last):
          ...
          ValueError: "http://not-imgur.com/altd8Ld.png" is not a valid imgur URL
          >>> parse_imgur_url('tftp://imgur.com/gallery/59npG') is None
          Traceback (most recent call last):
          ...
          ValueError: "tftp://imgur.com/gallery/59npG" is not a valid imgur URL
          >>> parse_imgur_url('Blah') is None
          Traceback (most recent call last):
          ...
          ValueError: "Blah" is not a valid imgur URL
          """
          match = re.match(
          r'^(?i:https?://(?:.+.)?imgur.com)(:d+)?'
          r'/(?:(?P<album>a/)|(?P<gallery>gallery/))?(?P<id>w+)',
          url
          )
          if not match:
          raise ValueError('"" is not a valid imgur URL'.format(url))
          return
          'id': match.group('id'),
          'type': 'album' if match.group('album') else
          'gallery' if match.group('gallery') else
          'image',



          Note that the regex above relies on the (?aiLmsux-imsx:...) feature of Python 3.6, and the doctests rely on the predictable order of dictionary keys in Python 3.6 / 3.7.






          share|improve this answer














          By RFC 1738, the scheme and host portions of URLs are case-insensitive. Also, it is allowable to include a redundant port number in the URL.



          Imgur also also partners with certain other websites. For instance, when you upload an image through the question editor Stack Exchange site, it will end up on https://i.stack.imgur.com.



          There is a lot of commonality in the various regexes. Consider combining them all into a single regex. Use named capture groups to avoid the magic group numbers.



          A docstring with doctests would be very beneficial for this function.



          import re

          def parse_imgur_url(url):
          """
          Extract the type and id from an Imgur URL.

          >>> parse_imgur_url('http://imgur.com/a/cjh4E')
          'id': 'cjh4E', 'type': 'album'
          >>> parse_imgur_url('HtTP://imgur.COM:80/gallery/59npG')
          'id': '59npG', 'type': 'gallery'
          >>> parse_imgur_url('https://i.imgur.com/altd8Ld.png')
          'id': 'altd8Ld', 'type': 'image'
          >>> parse_imgur_url('https://i.stack.imgur.com/ELmEk.png')
          'id': 'ELmEk', 'type': 'image'
          >>> parse_imgur_url('http://not-imgur.com/altd8Ld.png') is None
          Traceback (most recent call last):
          ...
          ValueError: "http://not-imgur.com/altd8Ld.png" is not a valid imgur URL
          >>> parse_imgur_url('tftp://imgur.com/gallery/59npG') is None
          Traceback (most recent call last):
          ...
          ValueError: "tftp://imgur.com/gallery/59npG" is not a valid imgur URL
          >>> parse_imgur_url('Blah') is None
          Traceback (most recent call last):
          ...
          ValueError: "Blah" is not a valid imgur URL
          """
          match = re.match(
          r'^(?i:https?://(?:.+.)?imgur.com)(:d+)?'
          r'/(?:(?P<album>a/)|(?P<gallery>gallery/))?(?P<id>w+)',
          url
          )
          if not match:
          raise ValueError('"" is not a valid imgur URL'.format(url))
          return
          'id': match.group('id'),
          'type': 'album' if match.group('album') else
          'gallery' if match.group('gallery') else
          'image',



          Note that the regex above relies on the (?aiLmsux-imsx:...) feature of Python 3.6, and the doctests rely on the predictable order of dictionary keys in Python 3.6 / 3.7.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 3 hours ago

























          answered 3 hours ago









          200_success

          124k14144401




          124k14144401






















              up vote
              1
              down vote













              For a first pass, not bad! Your code is pretty easy to follow.



              Problems:



              1. Don't use to match different strings. matches any set of characters, so [imgur|i.imgur]* will match ``, g, mgi, etc. You probably wanted a non-capturing group, which is specified with (?: ...), re Docs


              2. Name functions with snake_case, as recommended by PEP 8.


              3. The challenge as stated doesn't specify what should happen if the string passed in doesn't match the link format. Right now your code will throw an AttributeError, which isn't very helpful to the caller. I'd recommend raising an explicit exception with a more helpful message.


              4. Your last case, direct_link_regex is never reached with valid input since it is handled by image_regex.


              Improvements:



              1. Concatenating the regex to handle each case is somewhat messy. It would be better to have a single regex which handles all cases.


              2. Regular expressions are usually expressed using raw strings, that is, strings with an r prefix. This helps with escaping characters correctly. In this case I'm guessing you just got lucky that it worked as you expected.


              3. Including a docstring is always a good idea, and you can even embed tests using doctest.


              How I would implement this function:



              def imgur_url_parser(url):
              """
              Parses an imgur url into components.

              >>> imgur_url_parser("http://imgur.com/a/cjh4E") == "type": "album", "id": "cjh4E"
              True
              >>> imgur_url_parser("http://imgur.com/gallery/59npG") == "type": "gallery", "id": "59npG"
              True
              >>> imgur_url_parser("http://i.imgur.com/altd8Ld.png") == "type": "image", "id": "altd8Ld"
              True
              >>> imgur_url_parser("http://imgur.com/OzZUNMM") == "type": "image", "id": "OzZUNMM"
              True
              """
              match = re.match(r"^https?://(?:www.|i.)?imgur.com/([w.]+)/?(w*)$", url)
              if not match:
              raise ValueError('The string "" is not a valid imgur link'.format(url))
              # Empty when this is an image link
              if not match.group(2):
              # Remove image extension, if it exists
              image_id = re.sub(r"(.w+)?$", "", match.group(1))
              return "id": image_id, "type": "image"
              url_type = match.group(1) == "a" and "album" or "gallery"
              return "id": match.group(2), "type": url_type


              if __name__ == "__main__":
              import doctest
              doctest.testmod()





              share|improve this answer
























                up vote
                1
                down vote













                For a first pass, not bad! Your code is pretty easy to follow.



                Problems:



                1. Don't use to match different strings. matches any set of characters, so [imgur|i.imgur]* will match ``, g, mgi, etc. You probably wanted a non-capturing group, which is specified with (?: ...), re Docs


                2. Name functions with snake_case, as recommended by PEP 8.


                3. The challenge as stated doesn't specify what should happen if the string passed in doesn't match the link format. Right now your code will throw an AttributeError, which isn't very helpful to the caller. I'd recommend raising an explicit exception with a more helpful message.


                4. Your last case, direct_link_regex is never reached with valid input since it is handled by image_regex.


                Improvements:



                1. Concatenating the regex to handle each case is somewhat messy. It would be better to have a single regex which handles all cases.


                2. Regular expressions are usually expressed using raw strings, that is, strings with an r prefix. This helps with escaping characters correctly. In this case I'm guessing you just got lucky that it worked as you expected.


                3. Including a docstring is always a good idea, and you can even embed tests using doctest.


                How I would implement this function:



                import re


                def imgur_url_parser(url):
                    """
                    Parses an imgur url into components.

                    >>> imgur_url_parser("http://imgur.com/a/cjh4E") == {"type": "album", "id": "cjh4E"}
                    True
                    >>> imgur_url_parser("http://imgur.com/gallery/59npG") == {"type": "gallery", "id": "59npG"}
                    True
                    >>> imgur_url_parser("http://i.imgur.com/altd8Ld.png") == {"type": "image", "id": "altd8Ld"}
                    True
                    >>> imgur_url_parser("http://imgur.com/OzZUNMM") == {"type": "image", "id": "OzZUNMM"}
                    True
                    """
                    match = re.match(r"^https?://(?:www\.|i\.)?imgur\.com/([\w.]+)/?(\w*)$", url)
                    if not match:
                        raise ValueError('The string "{}" is not a valid imgur link'.format(url))
                    # Group 2 is empty when this is an image link
                    if not match.group(2):
                        # Remove the image extension, if it exists
                        image_id = re.sub(r"(\.\w+)?$", "", match.group(1))
                        return {"id": image_id, "type": "image"}
                    # Two path segments: "a" marks an album, anything else is treated as a gallery
                    url_type = "album" if match.group(1) == "a" else "gallery"
                    return {"id": match.group(2), "type": url_type}


                if __name__ == "__main__":
                    import doctest
                    doctest.testmod()
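One thing a single catch-all pattern does not do is reject unknown two-segment paths: anything other than /a/ ends up reported as a gallery. If that matters, a stricter variant with named groups is one option. This is only a sketch; the names IMGUR_URL, LINK_TYPES and parse_imgur_url are my own, and rejecting unknown prefixes is an assumption that goes beyond what the challenge asks for.

```python
import re

# Hypothetical stricter pattern: only "a" and "gallery" are accepted as
# path prefixes, and an optional file extension is matched separately.
IMGUR_URL = re.compile(
    r"^https?://(?:www\.|i\.)?imgur\.com/"
    r"(?:(?P<prefix>a|gallery)/)?"
    r"(?P<id>\w+)"
    r"(?:\.\w+)?/?$"
)

# Maps the optional path prefix to the link type; None means no prefix.
LINK_TYPES = {"a": "album", "gallery": "gallery", None: "image"}


def parse_imgur_url(url):
    """Return {"id": ..., "type": ...} or raise ValueError."""
    match = IMGUR_URL.match(url)
    if match is None:
        raise ValueError('The string "{}" is not a valid imgur link'.format(url))
    return {"id": match.group("id"), "type": LINK_TYPES[match.group("prefix")]}


if __name__ == "__main__":
    for link in [
        "http://imgur.com/a/cjh4E",
        "http://imgur.com/gallery/59npG",
        "http://i.imgur.com/altd8Ld.png",
        "http://imgur.com/OzZUNMM",
    ]:
        print(link, parse_imgur_url(link))
```

With this version a link such as http://imgur.com/foo/bar raises ValueError instead of being treated as a gallery.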





                  answered 5 hours ago









                  Gerrit0




















