Subclassed Python Counter for a more visually user-friendly __str__() method, for anagrams
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
7
down vote
favorite
I wrote a Counter
subclass suitable for anagrams, with a more visually user-friendly __str__()
method using alphabetic ordering and Unicode superscripts(/subscripts) for letter-counts, so you can compare at-a-glance if two phrases are anagrams or not:
c1 = LetterCounter("PRESIDENTTRUMP")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
Code below. My questions:
- translating integer to superscript(/subscript) is really clunky. Anything more compact?
Counter.items()
actually returns a view. It would be more elegant if it was similarly possible to definesorted_items()
as a view instead of takingsorted(list(self.items()))
- leave the subscript code in even though it's not currently used
applying.format()
to a variable-length list of tuples is a pain, can't just use *args.'0[0]0[1]'.format(x for x in value_count)
fails with'generator' object is not subscriptable
'0[0]0[1] '.format([x for x in value_count])
is wrong, the format is outside the list comprehension.[ '0[0]0[1] '.format(x) for x in value_count ]
won't allow us callLetterCounter.int_to_superscript()
on only the second item of each tuple.- any more elegant approaches?
- in general we're supposed to override
__str__()
not__repr__()
, but if we don't care about pickling why not just override__repr__()
, then we can directly see the object just by typing its name, don't need print? - after I independently wrote this I found Printing subscript in python
- it seems more friendly for general use/reposting to enter Unicode literals in source as 'u2070' rather than 'â°'
Code:
import collections
class LetterCounter(collections.Counter):
subscript_digits = ''.join([chr(0x2080 + d) for d in range(10)])
superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])
trans_to_superscript = str.maketrans("0123456789", superscript_digits)
trans_to_subscript = str.maketrans("0123456789", subscript_digits)
@classmethod
def int_to_superscript(cls, n):
return str(n).translate(cls.trans_to_superscript)
@classmethod
def int_to_subscript(cls, n):
return str(n).translate(cls.trans_to_subscript)
def __str__(self):
value_count = sorted(list(self.items()))
return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )
# Test code
c1, c2 = LetterCounter("PRESIDENTTRUMP") , LetterCounter("MRPUTINSREDPET")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
python unicode
add a comment |Â
up vote
7
down vote
favorite
I wrote a Counter
subclass suitable for anagrams, with a more visually user-friendly __str__()
method using alphabetic ordering and Unicode superscripts(/subscripts) for letter-counts, so you can compare at-a-glance if two phrases are anagrams or not:
c1 = LetterCounter("PRESIDENTTRUMP")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
Code below. My questions:
- translating integer to superscript(/subscript) is really clunky. Anything more compact?
Counter.items()
actually returns a view. It would be more elegant if it was similarly possible to definesorted_items()
as a view instead of takingsorted(list(self.items()))
- leave the subscript code in even though it's not currently used
applying.format()
to a variable-length list of tuples is a pain, can't just use *args.'0[0]0[1]'.format(x for x in value_count)
fails with'generator' object is not subscriptable
'0[0]0[1] '.format([x for x in value_count])
is wrong, the format is outside the list comprehension.[ '0[0]0[1] '.format(x) for x in value_count ]
won't allow us callLetterCounter.int_to_superscript()
on only the second item of each tuple.- any more elegant approaches?
- in general we're supposed to override
__str__()
not__repr__()
, but if we don't care about pickling why not just override__repr__()
, then we can directly see the object just by typing its name, don't need print? - after I independently wrote this I found Printing subscript in python
- it seems more friendly for general use/reposting to enter Unicode literals in source as 'u2070' rather than 'â°'
Code:
import collections
class LetterCounter(collections.Counter):
subscript_digits = ''.join([chr(0x2080 + d) for d in range(10)])
superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])
trans_to_superscript = str.maketrans("0123456789", superscript_digits)
trans_to_subscript = str.maketrans("0123456789", subscript_digits)
@classmethod
def int_to_superscript(cls, n):
return str(n).translate(cls.trans_to_superscript)
@classmethod
def int_to_subscript(cls, n):
return str(n).translate(cls.trans_to_subscript)
def __str__(self):
value_count = sorted(list(self.items()))
return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )
# Test code
c1, c2 = LetterCounter("PRESIDENTTRUMP") , LetterCounter("MRPUTINSREDPET")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
python unicode
1
I guess one improvement is to only call the translation to superscript once, on the concatenated output. We can get away with that in this case because the keys cannot contain digits.
– smci
2 days ago
add a comment |Â
up vote
7
down vote
favorite
up vote
7
down vote
favorite
I wrote a Counter
subclass suitable for anagrams, with a more visually user-friendly __str__()
method using alphabetic ordering and Unicode superscripts(/subscripts) for letter-counts, so you can compare at-a-glance if two phrases are anagrams or not:
c1 = LetterCounter("PRESIDENTTRUMP")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
Code below. My questions:
- translating integer to superscript(/subscript) is really clunky. Anything more compact?
Counter.items()
actually returns a view. It would be more elegant if it was similarly possible to definesorted_items()
as a view instead of takingsorted(list(self.items()))
- leave the subscript code in even though it's not currently used
applying.format()
to a variable-length list of tuples is a pain, can't just use *args.'0[0]0[1]'.format(x for x in value_count)
fails with'generator' object is not subscriptable
'0[0]0[1] '.format([x for x in value_count])
is wrong, the format is outside the list comprehension.[ '0[0]0[1] '.format(x) for x in value_count ]
won't allow us callLetterCounter.int_to_superscript()
on only the second item of each tuple.- any more elegant approaches?
- in general we're supposed to override
__str__()
not__repr__()
, but if we don't care about pickling why not just override__repr__()
, then we can directly see the object just by typing its name, don't need print? - after I independently wrote this I found Printing subscript in python
- it seems more friendly for general use/reposting to enter Unicode literals in source as 'u2070' rather than 'â°'
Code:
import collections
class LetterCounter(collections.Counter):
subscript_digits = ''.join([chr(0x2080 + d) for d in range(10)])
superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])
trans_to_superscript = str.maketrans("0123456789", superscript_digits)
trans_to_subscript = str.maketrans("0123456789", subscript_digits)
@classmethod
def int_to_superscript(cls, n):
return str(n).translate(cls.trans_to_superscript)
@classmethod
def int_to_subscript(cls, n):
return str(n).translate(cls.trans_to_subscript)
def __str__(self):
value_count = sorted(list(self.items()))
return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )
# Test code
c1, c2 = LetterCounter("PRESIDENTTRUMP") , LetterCounter("MRPUTINSREDPET")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
python unicode
I wrote a Counter
subclass suitable for anagrams, with a more visually user-friendly __str__()
method using alphabetic ordering and Unicode superscripts(/subscripts) for letter-counts, so you can compare at-a-glance if two phrases are anagrams or not:
c1 = LetterCounter("PRESIDENTTRUMP")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
Code below. My questions:
- translating integer to superscript(/subscript) is really clunky. Anything more compact?
Counter.items()
actually returns a view. It would be more elegant if it was similarly possible to definesorted_items()
as a view instead of takingsorted(list(self.items()))
- leave the subscript code in even though it's not currently used
applying.format()
to a variable-length list of tuples is a pain, can't just use *args.'0[0]0[1]'.format(x for x in value_count)
fails with'generator' object is not subscriptable
'0[0]0[1] '.format([x for x in value_count])
is wrong, the format is outside the list comprehension.[ '0[0]0[1] '.format(x) for x in value_count ]
won't allow us callLetterCounter.int_to_superscript()
on only the second item of each tuple.- any more elegant approaches?
- in general we're supposed to override
__str__()
not__repr__()
, but if we don't care about pickling why not just override__repr__()
, then we can directly see the object just by typing its name, don't need print? - after I independently wrote this I found Printing subscript in python
- it seems more friendly for general use/reposting to enter Unicode literals in source as 'u2070' rather than 'â°'
Code:
import collections
class LetterCounter(collections.Counter):
subscript_digits = ''.join([chr(0x2080 + d) for d in range(10)])
superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])
trans_to_superscript = str.maketrans("0123456789", superscript_digits)
trans_to_subscript = str.maketrans("0123456789", subscript_digits)
@classmethod
def int_to_superscript(cls, n):
return str(n).translate(cls.trans_to_superscript)
@classmethod
def int_to_subscript(cls, n):
return str(n).translate(cls.trans_to_subscript)
def __str__(self):
value_count = sorted(list(self.items()))
return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )
# Test code
c1, c2 = LetterCounter("PRESIDENTTRUMP") , LetterCounter("MRPUTINSREDPET")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
python unicode
python unicode
asked 2 days ago
smci
1886
1886
1
I guess one improvement is to only call the translation to superscript once, on the concatenated output. We can get away with that in this case because the keys cannot contain digits.
– smci
2 days ago
add a comment |Â
1
I guess one improvement is to only call the translation to superscript once, on the concatenated output. We can get away with that in this case because the keys cannot contain digits.
– smci
2 days ago
1
1
I guess one improvement is to only call the translation to superscript once, on the concatenated output. We can get away with that in this case because the keys cannot contain digits.
– smci
2 days ago
I guess one improvement is to only call the translation to superscript once, on the concatenated output. We can get away with that in this case because the keys cannot contain digits.
– smci
2 days ago
add a comment |Â
1 Answer
1
active
oldest
votes
up vote
9
down vote
1. Review
There is no docstring. What kind of object is a
LetterCounter
?The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.
The Python style guide recommends naming constants (like
superscript_digits
) usingALL_CAPITALS
. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.Instead of:
superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])
add an encoding declaration to the start of the source code:
# -*- coding: utf-8 -*-
and write:
SUPERSCRIPT_DIGITS = "â°¹²³â´âµâ¶â·â¸â¹"
The function
int_to_superscript
is short, and only called once, so it would make sense to inline it at its single point of use.There is no need for the call to
list
insorted(list(self.items()))
:sorted
accepts an iterable.Referring to tuple elements by index, for example
x[0]
andx[1]
, makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:value_count = sorted(list(self.items()))
return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )write something like:
return " ".join("".format(key, str(count).translate(SUPERSCRIPT_TRANS))
for key, count in sorted(self.items())or, if you are happy to use formatted string literals:
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))
2. Revised code
# -*- coding: utf-8 -*-
from collections import Counter
# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "â°¹²³â´âµâ¶â·â¸â¹")
class LetterCounter(Counter):
"""Subclass of Counter with readable string conversion using sorted
ordering of keys and superscript digits for counts.
>>> print(LetterCounter("HELLOWORLD"))
D¹ E¹ H¹ L³ O² R¹ W¹
"""
def __str__(self):
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))
3. Answers to questions
If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the
SortedItemsView
that you are looking for. (The package has noSortedCounter
, but it is not difficult to write one by subclassingSortedDict
.)It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.
The pickling protocol uses its own special methods like
__getstate__
, not__repr__
.There are situations in which it's useful for
eval(repr(o))
to return an object similar too
, but otherwise you are free to redefine__repr__
however you like.
7,4. Neat, thanks, that was the main thing I was looking for. 4. I would still keepsuperscript_trans
inside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like 'u2070' with 'â°' 3. Generally flouted, likestring.digits
5. I would have but the original formatting line was getting unreadably long, but you solved that. 6. Thanks 7. I knew all about tuple unpacking (viz. mentioning *args), I spent an unsuccessful hour trying to get format() to work with *args 1. The icing on the cake
– smci
2 days ago
Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
– smci
2 days ago
You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
– Peilonrayz
2 days ago
@Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
– smci
yesterday
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
9
down vote
1. Review
There is no docstring. What kind of object is a
LetterCounter
?The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.
The Python style guide recommends naming constants (like
superscript_digits
) usingALL_CAPITALS
. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.Instead of:
superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])
add an encoding declaration to the start of the source code:
# -*- coding: utf-8 -*-
and write:
SUPERSCRIPT_DIGITS = "â°¹²³â´âµâ¶â·â¸â¹"
The function
int_to_superscript
is short, and only called once, so it would make sense to inline it at its single point of use.There is no need for the call to
list
insorted(list(self.items()))
:sorted
accepts an iterable.Referring to tuple elements by index, for example
x[0]
andx[1]
, makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:value_count = sorted(list(self.items()))
return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )write something like:
return " ".join("".format(key, str(count).translate(SUPERSCRIPT_TRANS))
for key, count in sorted(self.items())or, if you are happy to use formatted string literals:
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))
2. Revised code
# -*- coding: utf-8 -*-
from collections import Counter
# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "â°¹²³â´âµâ¶â·â¸â¹")
class LetterCounter(Counter):
"""Subclass of Counter with readable string conversion using sorted
ordering of keys and superscript digits for counts.
>>> print(LetterCounter("HELLOWORLD"))
D¹ E¹ H¹ L³ O² R¹ W¹
"""
def __str__(self):
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))
3. Answers to questions
If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the
SortedItemsView
that you are looking for. (The package has noSortedCounter
, but it is not difficult to write one by subclassingSortedDict
.)It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.
The pickling protocol uses its own special methods like
__getstate__
, not__repr__
.There are situations in which it's useful for
eval(repr(o))
to return an object similar too
, but otherwise you are free to redefine__repr__
however you like.
7,4. Neat, thanks, that was the main thing I was looking for. 4. I would still keepsuperscript_trans
inside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like 'u2070' with 'â°' 3. Generally flouted, likestring.digits
5. I would have but the original formatting line was getting unreadably long, but you solved that. 6. Thanks 7. I knew all about tuple unpacking (viz. mentioning *args), I spent an unsuccessful hour trying to get format() to work with *args 1. The icing on the cake
– smci
2 days ago
Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
– smci
2 days ago
You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
– Peilonrayz
2 days ago
@Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
– smci
yesterday
add a comment |Â
up vote
9
down vote
1. Review
There is no docstring. What kind of object is a
LetterCounter
?The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.
The Python style guide recommends naming constants (like
superscript_digits
) usingALL_CAPITALS
. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.Instead of:
superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])
add an encoding declaration to the start of the source code:
# -*- coding: utf-8 -*-
and write:
SUPERSCRIPT_DIGITS = "â°¹²³â´âµâ¶â·â¸â¹"
The function
int_to_superscript
is short, and only called once, so it would make sense to inline it at its single point of use.There is no need for the call to
list
insorted(list(self.items()))
:sorted
accepts an iterable.Referring to tuple elements by index, for example
x[0]
andx[1]
, makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:value_count = sorted(list(self.items()))
return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )write something like:
return " ".join("".format(key, str(count).translate(SUPERSCRIPT_TRANS))
for key, count in sorted(self.items())or, if you are happy to use formatted string literals:
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))
2. Revised code
# -*- coding: utf-8 -*-
from collections import Counter
# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "â°¹²³â´âµâ¶â·â¸â¹")
class LetterCounter(Counter):
"""Subclass of Counter with readable string conversion using sorted
ordering of keys and superscript digits for counts.
>>> print(LetterCounter("HELLOWORLD"))
D¹ E¹ H¹ L³ O² R¹ W¹
"""
def __str__(self):
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))
3. Answers to questions
If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the
SortedItemsView
that you are looking for. (The package has noSortedCounter
, but it is not difficult to write one by subclassingSortedDict
.)It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.
The pickling protocol uses its own special methods like
__getstate__
, not__repr__
.There are situations in which it's useful for
eval(repr(o))
to return an object similar too
, but otherwise you are free to redefine__repr__
however you like.
7,4. Neat, thanks, that was the main thing I was looking for. 4. I would still keepsuperscript_trans
inside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like 'u2070' with 'â°' 3. Generally flouted, likestring.digits
5. I would have but the original formatting line was getting unreadably long, but you solved that. 6. Thanks 7. I knew all about tuple unpacking (viz. mentioning *args), I spent an unsuccessful hour trying to get format() to work with *args 1. The icing on the cake
– smci
2 days ago
Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
– smci
2 days ago
You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
– Peilonrayz
2 days ago
@Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
– smci
yesterday
add a comment |Â
up vote
9
down vote
up vote
9
down vote
1. Review
There is no docstring. What kind of object is a
LetterCounter
?The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.
The Python style guide recommends naming constants (like
superscript_digits
) usingALL_CAPITALS
. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.Instead of:
superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])
add an encoding declaration to the start of the source code:
# -*- coding: utf-8 -*-
and write:
SUPERSCRIPT_DIGITS = "â°¹²³â´âµâ¶â·â¸â¹"
The function
int_to_superscript
is short, and only called once, so it would make sense to inline it at its single point of use.There is no need for the call to
list
insorted(list(self.items()))
:sorted
accepts an iterable.Referring to tuple elements by index, for example
x[0]
andx[1]
, makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:value_count = sorted(list(self.items()))
return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )write something like:
return " ".join("".format(key, str(count).translate(SUPERSCRIPT_TRANS))
for key, count in sorted(self.items())or, if you are happy to use formatted string literals:
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))
2. Revised code
# -*- coding: utf-8 -*-
from collections import Counter
# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "â°¹²³â´âµâ¶â·â¸â¹")
class LetterCounter(Counter):
"""Subclass of Counter with readable string conversion using sorted
ordering of keys and superscript digits for counts.
>>> print(LetterCounter("HELLOWORLD"))
D¹ E¹ H¹ L³ O² R¹ W¹
"""
def __str__(self):
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))
3. Answers to questions
If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the
SortedItemsView
that you are looking for. (The package has noSortedCounter
, but it is not difficult to write one by subclassingSortedDict
.)It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.
The pickling protocol uses its own special methods like
__getstate__
, not__repr__
.There are situations in which it's useful for
eval(repr(o))
to return an object similar too
, but otherwise you are free to redefine__repr__
however you like.
1. Review
There is no docstring. What kind of object is a
LetterCounter
?The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.
The Python style guide recommends naming constants (like
superscript_digits
) usingALL_CAPITALS
. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.Instead of:
superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])
add an encoding declaration to the start of the source code:
# -*- coding: utf-8 -*-
and write:
SUPERSCRIPT_DIGITS = "â°¹²³â´âµâ¶â·â¸â¹"
The function
int_to_superscript
is short, and only called once, so it would make sense to inline it at its single point of use.There is no need for the call to
list
insorted(list(self.items()))
:sorted
accepts an iterable.Referring to tuple elements by index, for example
x[0]
andx[1]
, makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:value_count = sorted(list(self.items()))
return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )write something like:
return " ".join("".format(key, str(count).translate(SUPERSCRIPT_TRANS))
for key, count in sorted(self.items())or, if you are happy to use formatted string literals:
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))
2. Revised code
# -*- coding: utf-8 -*-
from collections import Counter
# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "â°¹²³â´âµâ¶â·â¸â¹")
class LetterCounter(Counter):
"""Subclass of Counter with readable string conversion using sorted
ordering of keys and superscript digits for counts.
>>> print(LetterCounter("HELLOWORLD"))
D¹ E¹ H¹ L³ O² R¹ W¹
"""
def __str__(self):
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))
3. Answers to questions
If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the
SortedItemsView
that you are looking for. (The package has noSortedCounter
, but it is not difficult to write one by subclassingSortedDict
.)It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.
The pickling protocol uses its own special methods like
__getstate__
, not__repr__
.There are situations in which it's useful for
eval(repr(o))
to return an object similar too
, but otherwise you are free to redefine__repr__
however you like.
edited 2 days ago
answered 2 days ago
Gareth Rees
41.8k394168
41.8k394168
7,4. Neat, thanks, that was the main thing I was looking for. 4. I would still keepsuperscript_trans
inside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like 'u2070' with 'â°' 3. Generally flouted, likestring.digits
5. I would have but the original formatting line was getting unreadably long, but you solved that. 6. Thanks 7. I knew all about tuple unpacking (viz. mentioning *args), I spent an unsuccessful hour trying to get format() to work with *args 1. The icing on the cake
– smci
2 days ago
Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
– smci
2 days ago
You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
– Peilonrayz
2 days ago
@Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
– smci
yesterday
add a comment |Â
7,4. Neat, thanks, that was the main thing I was looking for. 4. I would still keepsuperscript_trans
inside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like 'u2070' with 'â°' 3. Generally flouted, likestring.digits
5. I would have but the original formatting line was getting unreadably long, but you solved that. 6. Thanks 7. I knew all about tuple unpacking (viz. mentioning *args), I spent an unsuccessful hour trying to get format() to work with *args 1. The icing on the cake
– smci
2 days ago
Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
– smci
2 days ago
You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
– Peilonrayz
2 days ago
@Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
– smci
yesterday
7,4. Neat, thanks, that was the main thing I was looking for. 4. I would still keep
superscript_trans
inside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like 'u2070' with 'â°' 3. Generally flouted, like string.digits
5. I would have but the original formatting line was getting unreadably long, but you solved that. 6. Thanks 7. I knew all about tuple unpacking (viz. mentioning *args), I spent an unsuccessful hour trying to get format() to work with *args 1. The icing on the cake– smci
2 days ago
7,4. Neat, thanks, that was the main thing I was looking for. 4. I would still keep
superscript_trans
inside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like 'u2070' with 'â°' 3. Generally flouted, like string.digits
5. I would have but the original formatting line was getting unreadably long, but you solved that. 6. Thanks 7. I knew all about tuple unpacking (viz. mentioning *args), I spent an unsuccessful hour trying to get format() to work with *args 1. The icing on the cake– smci
2 days ago
Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
– smci
2 days ago
Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
– smci
2 days ago
You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
– Peilonrayz
2 days ago
You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
– Peilonrayz
2 days ago
@Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
– smci
yesterday
@Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
– smci
yesterday
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f203443%2fsubclassed-python-counter-for-a-more-visually-user-friendly-str-method-fo%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
1
I guess one improvement is to only call the translation to superscript once, on the concatenated output. We can get away with that in this case because the keys cannot contain digits.
– smci
2 days ago