Imgur URL parser
I'm fairly new to Python, and I have been doing a few Edabit challenges to improve my problem-solving. I have just completed a somewhat difficult challenge, and I was hoping for some feedback.
The challenge itself:
Create a function that takes an imgur link (as a string) and extracts
the unique id and type. Return an object containing the unique id, and
a string indicating what type of link it is.
The link could be pointing to:
- An album (e.g. http://imgur.com/a/cjh4E)
- A gallery (e.g. http://imgur.com/gallery/59npG)
- An image (e.g. http://imgur.com/OzZUNMM)
- An image (direct link) (e.g. http://i.imgur.com/altd8Ld.png)
Examples
- "http://imgur.com/a/cjh4E" ➞ { id: "cjh4E", type: "album" }
- "http://imgur.com/gallery/59npG" ➞ { id: "59npG", type: "gallery" }
- "http://i.imgur.com/altd8Ld.png" ➞ { id: "altd8Ld", type: "image" }
I came up with the following.
import re

def imgurUrlParser(url):
    url_regex = "^[http://www.|https://www.|http://|https://|www.]*[imgur|i.imgur]*.com"
    url = re.match(url_regex, url).string

    gallery_regex = re.match(url_regex + "(/gallery/)(\w+)", url)
    album_regex = re.match(url_regex + "(/a/)(\w+)", url)
    image_regex = re.match(url_regex + "/(\w+)", url)
    direct_link_regex = re.match(url_regex + "(\w+)(.\w+)", url)

    if gallery_regex:
        return {"id" : gallery_regex.group(2), "type" : "gallery"}
    elif album_regex:
        return {"id" : album_regex.group(2), "type" : "album"}
    elif image_regex:
        return {"id" : image_regex.group(1), "type" : "image"}
    elif direct_link_regex:
        return {"id" : direct_link_regex.group(1), "type" : "image"}
python python-3.x programming-challenge regex url
asked 6 hours ago by snorkle
edited 5 hours ago by 200_success
This code doesn't run. direct_link_regex is a string and strings don't have a group() method. Also, the parentheses for that regex are not balanced properly. Perhaps you copied an old version of your code?
– JakeD 6 hours ago
Edited. Thanks, Jake.
– snorkle 6 hours ago
2 Answers
By RFC 1738, the scheme and host portions of URLs are case-insensitive. Also, it is allowable to include a redundant port number in the URL.
Imgur also partners with certain other websites. For instance, when you upload an image through the question editor on a Stack Exchange site, it will end up on https://i.stack.imgur.com.
There is a lot of commonality in the various regexes. Consider combining them all into a single regex. Use named capture groups to avoid the magic group numbers.
A docstring with doctests would be very beneficial for this function.
import re

def parse_imgur_url(url):
    """
    Extract the type and id from an Imgur URL.

    >>> parse_imgur_url('http://imgur.com/a/cjh4E')
    {'id': 'cjh4E', 'type': 'album'}
    >>> parse_imgur_url('HtTP://imgur.COM:80/gallery/59npG')
    {'id': '59npG', 'type': 'gallery'}
    >>> parse_imgur_url('https://i.imgur.com/altd8Ld.png')
    {'id': 'altd8Ld', 'type': 'image'}
    >>> parse_imgur_url('https://i.stack.imgur.com/ELmEk.png')
    {'id': 'ELmEk', 'type': 'image'}
    >>> parse_imgur_url('http://not-imgur.com/altd8Ld.png') is None
    Traceback (most recent call last):
      ...
    ValueError: "http://not-imgur.com/altd8Ld.png" is not a valid imgur URL
    >>> parse_imgur_url('tftp://imgur.com/gallery/59npG') is None
    Traceback (most recent call last):
      ...
    ValueError: "tftp://imgur.com/gallery/59npG" is not a valid imgur URL
    >>> parse_imgur_url('Blah') is None
    Traceback (most recent call last):
      ...
    ValueError: "Blah" is not a valid imgur URL
    """
    # Scheme and host are matched case-insensitively; the port is optional.
    match = re.match(
        r'^(?i:https?://(?:.+\.)?imgur\.com)(:\d+)?'
        r'/(?:(?P<album>a/)|(?P<gallery>gallery/))?(?P<id>\w+)',
        url
    )
    if not match:
        raise ValueError('"{}" is not a valid imgur URL'.format(url))
    return {
        'id': match.group('id'),
        'type': 'album' if match.group('album') else
                'gallery' if match.group('gallery') else
                'image',
    }
Note that the regex above relies on the (?aiLmsux-imsx:...) feature of Python 3.6, and the doctests rely on the predictable order of dictionary keys in Python 3.6 / 3.7.
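As an aside on the case-insensitivity and port-number points: the following is a minimal sketch, not part of the answer's own code, of how the standard library's urllib.parse.urlsplit could normalise the scheme, host, and port before any pattern matching. The helper name split_imgur_url and the host check are assumptions made purely for illustration.

```python
from urllib.parse import urlsplit


def split_imgur_url(url):
    """Hypothetical helper: return (hostname, port, path segments)."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme; hostname is lowercased as well.
    if parts.scheme not in ("http", "https"):
        raise ValueError('"{}" is not a valid imgur URL'.format(url))
    host = parts.hostname or ""
    # Accept imgur.com itself and any subdomain (assumed rule for this sketch).
    if host != "imgur.com" and not host.endswith(".imgur.com"):
        raise ValueError('"{}" is not a valid imgur URL'.format(url))
    segments = [segment for segment in parts.path.split("/") if segment]
    return host, parts.port, segments


print(split_imgur_url("HtTP://imgur.COM:80/gallery/59npG"))
# ('imgur.com', 80, ['gallery', '59npG'])
```

With the host and port handled this way, only the path segments would still need to be classified as album, gallery, or image.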
answered 3 hours ago (edited 3 hours ago) by 200_success
For a first pass, not bad! Your code is pretty easy to follow.
Problems:
- Don't use [] to match different strings. [] matches any one character from a set, so [imgur|i.imgur]* will match the empty string, g, mgi, etc. You probably wanted a non-capturing group, which is specified with (?: ...) (see the re docs).
- Name functions with snake_case, as recommended by PEP 8.
- The challenge as stated doesn't specify what should happen if the string passed in doesn't match the link format. Right now your code will throw an AttributeError, which isn't very helpful to the caller. I'd recommend raising an explicit exception with a more helpful message.
- Your last case, direct_link_regex, is never reached with valid input since it is handled by image_regex.
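To make the first point concrete, here is a small sketch comparing the character class from the question with a non-capturing group. The grouped pattern (?:i\.)?imgur is only one assumed way of expressing the intent, and fully_matches is a hypothetical helper.

```python
import re

# A character class matches ONE character from the set; '*' repeats it.
char_class = re.compile(r"[imgur|i.imgur]*")
# A non-capturing group with an optional prefix matches whole words.
group = re.compile(r"(?:i\.)?imgur")


def fully_matches(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


for text in ["", "g", "mgi", "imgur", "i.imgur"]:
    print(repr(text), fully_matches(char_class, text), fully_matches(group, text))
```

The character class accepts all five inputs, while the grouped pattern accepts only imgur and i.imgur.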
Improvements:
- Concatenating the regex to handle each case is somewhat messy. It would be better to have a single regex which handles all cases.
- Regular expressions are usually expressed using raw strings, that is, strings with an r prefix. This helps with escaping characters correctly. In this case I'm guessing you just got lucky that it worked as you expected.
- Including a docstring is always a good idea, and you can even embed tests using doctest.
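A short illustration of why the r prefix matters, using \b as an assumed example since it means different things to Python and to the regex engine. Sequences such as \w happen to survive in a normal string only because Python does not treat them as string escapes.

```python
import re

# In a normal string, "\b" is the backspace character (U+0008).
plain = "\bimgur\b"
# In a raw string, r"\b" is a backslash followed by 'b',
# which the regex engine reads as a word boundary.
raw = r"\bimgur\b"

print(len(plain), len(raw))                # 7 9
print(re.search(plain, "see imgur here"))  # None
print(re.search(raw, "see imgur here"))    # a match object for 'imgur'
```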
How I would implement this function:
import re

def imgur_url_parser(url):
    """
    Parses an imgur url into components.

    >>> imgur_url_parser("http://imgur.com/a/cjh4E") == {"type": "album", "id": "cjh4E"}
    True
    >>> imgur_url_parser("http://imgur.com/gallery/59npG") == {"type": "gallery", "id": "59npG"}
    True
    >>> imgur_url_parser("http://i.imgur.com/altd8Ld.png") == {"type": "image", "id": "altd8Ld"}
    True
    >>> imgur_url_parser("http://imgur.com/OzZUNMM") == {"type": "image", "id": "OzZUNMM"}
    True
    """
    match = re.match(r"^https?://(?:www\.|i\.)?imgur\.com/([\w.]+)/?(\w*)$", url)
    if not match:
        raise ValueError('The string "{}" is not a valid imgur link'.format(url))
    # Empty when this is an image link
    if not match.group(2):
        # Remove image extension, if it exists
        image_id = re.sub(r"(\.\w+)?$", "", match.group(1))
        return {"id": image_id, "type": "image"}
    url_type = "album" if match.group(1) == "a" else "gallery"
    return {"id": match.group(2), "type": url_type}

if __name__ == "__main__":
    import doctest
    doctest.testmod()
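For context on the AttributeError mentioned in the problems list: re.match returns None when nothing matches, so accessing an attribute on the result fails. The snippet below is a minimal sketch with an assumed pattern and input, showing the failure that an explicit ValueError replaces.

```python
import re

# re.match returns None when the pattern does not match at the start.
no_match = re.match(r"^https?://", "Blah")
print(no_match)  # None

try:
    no_match.string  # attribute access on None
except AttributeError as exc:
    print("AttributeError:", exc)
```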
answered 5 hours ago by Gerrit0