Subclassed Python Counter for a more visually user-friendly __str__() method, for anagrams

I wrote a Counter subclass suitable for anagrams, with a more visually user-friendly __str__() method that uses alphabetic ordering and Unicode superscripts (or subscripts) for letter counts, so you can tell at a glance whether two phrases are anagrams:



c1 = LetterCounter("PRESIDENTTRUMP")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹


Code below. My questions:



  • translating integer to superscript(/subscript) is really clunky. Anything more compact?


  • Counter.items() actually returns a view. It would be more elegant if it was similarly possible to define sorted_items() as a view instead of taking sorted(list(self.items()))

  • leave the subscript code in even though it's not currently used


  • applying .format() to a variable-length list of tuples is a pain, can't just use *args.


    • '{0[0]}{0[1]}'.format(x for x in value_count) fails with 'generator' object is not subscriptable


    • '{0[0]}{0[1]} '.format([x for x in value_count]) is wrong, the format is outside the list comprehension.


    • [ '{0[0]}{0[1]} '.format(x) for x in value_count ] won't allow us to call LetterCounter.int_to_superscript() on only the second item of each tuple.

    • any more elegant approaches?


  • in general we're supposed to override __str__() not __repr__(), but if we don't care about pickling why not just override __repr__(), then we can directly see the object just by typing its name, don't need print?

  • after I independently wrote this I found Printing subscript in python

  • it seems more friendly for general use/reposting to enter Unicode literals in source as '\u2070' rather than '⁰'

Code:



import collections

class LetterCounter(collections.Counter):

    subscript_digits = ''.join([chr(0x2080 + d) for d in range(10)])
    superscript_digits = ''.join(['\u2070','\u00b9','\u00b2','\u00b3','\u2074','\u2075','\u2076','\u2077','\u2078','\u2079'])
    trans_to_superscript = str.maketrans("0123456789", superscript_digits)
    trans_to_subscript = str.maketrans("0123456789", subscript_digits)

    @classmethod
    def int_to_superscript(cls, n):
        return str(n).translate(cls.trans_to_superscript)
    @classmethod
    def int_to_subscript(cls, n):
        return str(n).translate(cls.trans_to_subscript)

    def __str__(self):
        value_count = sorted(list(self.items()))
        return ' '.join( '{}{}'.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )

# Test code
c1, c2 = LetterCounter("PRESIDENTTRUMP") , LetterCounter("MRPUTINSREDPET")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹









  • I guess one improvement is to only call the translation to superscript once, on the concatenated output. We can get away with that in this case because the keys cannot contain digits.
    – smci
    2 days ago
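
A minimal sketch of the idea in this comment, assuming the keys are single letters: build the plain key-plus-count string first, then translate the digits in one pass. The names SUPERSCRIPT_TRANS and format_counts are made up for illustration and are not part of the question's code.

```python
from collections import Counter

# Hypothetical module-level table mapping ASCII digits to superscripts.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def format_counts(counter):
    """Join key/count pairs as plain text, then translate digits once."""
    plain = " ".join(f"{key}{count}" for key, count in sorted(counter.items()))
    return plain.translate(SUPERSCRIPT_TRANS)


print(format_counts(Counter("PRESIDENTTRUMP")))

# Caveat: if a digit ever appears as a key, it gets translated too,
# so the key '1' below is rendered as a superscript.
print(format_counts(Counter("A1A")))
```

The second call shows why this shortcut is only safe when the input is known to contain no digits.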

















python unicode






asked 2 days ago by smci

1 Answer
1. Review



  1. There is no docstring. What kind of object is a LetterCounter?


  2. The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.


  3. The Python style guide recommends naming constants (like superscript_digits) using ALL_CAPITALS. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.



  4. Instead of:



    superscript_digits = ''.join(['\u2070','\u00b9','\u00b2','\u00b3','\u2074','\u2075','\u2076','\u2077','\u2078','\u2079'])


    add an encoding declaration to the start of the source code:



    # -*- coding: utf-8 -*-


    and write:



    SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"


  5. The function int_to_superscript is short, and only called once, so it would make sense to inline it at its single point of use.


  6. There is no need for the call to list in sorted(list(self.items())): sorted accepts an iterable.



  7. Referring to tuple elements by index, for example x[0] and x[1], makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:



    value_count = sorted(list(self.items()))
    return ' '.join( '{}{}'.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )


    write something like:



    return " ".join("{}{}".format(key, str(count).translate(SUPERSCRIPT_TRANS))
                    for key, count in sorted(self.items()))


    or, if you are happy to use formatted string literals:



    return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                    for key, count in sorted(self.items()))
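
On the *args point from the question, here is a small sketch of the three ways the pair can reach format(). The sample pairs are made up for illustration. The point is that unpacking in the loop header, rather than inside the format string, is what lets you transform only the count.

```python
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

pairs = [("D", 1), ("E", 2), ("L", 12)]  # made-up sample data

# Indexing into one tuple argument: the count cannot be transformed
# separately, because the whole tuple is passed as argument 0.
by_index = " ".join("{0[0]}{0[1]}".format(pair) for pair in pairs)

# Unpacking in the loop header gives each element a name, so the count
# can be translated before it is passed to format().
by_name = " ".join(
    "{}{}".format(key, str(count).translate(SUPERSCRIPT_TRANS))
    for key, count in pairs
)

# *pair also works, but only when no per-element transformation is needed.
by_star = " ".join("{}{}".format(*pair) for pair in pairs)

print(by_index)
print(by_name)
print(by_star)
```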


2. Revised code



# -*- coding: utf-8 -*-

from collections import Counter

# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

class LetterCounter(Counter):
    """Subclass of Counter with readable string conversion using sorted
    ordering of keys and superscript digits for counts.

    >>> print(LetterCounter("HELLOWORLD"))
    D¹ E¹ H¹ L³ O² R¹ W¹

    """
    def __str__(self):
        return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                        for key, count in sorted(self.items()))
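
As a usage sketch for the anagram case from the question, the class can be exercised as below. The class is repeated so the snippet runs on its own, and is_anagram is a hypothetical helper that is not part of the revised code.

```python
from collections import Counter

SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


class LetterCounter(Counter):
    """Counter whose str() lists sorted keys with superscript counts."""

    def __str__(self):
        return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                        for key, count in sorted(self.items()))


def is_anagram(a, b):
    """Hypothetical helper: equal letter counts means anagram."""
    return LetterCounter(a) == LetterCounter(b)


c1 = LetterCounter("PRESIDENTTRUMP")
c2 = LetterCounter("MRPUTINSREDPET")
print(c1)
print(c2)
print(is_anagram("PRESIDENTTRUMP", "MRPUTINSREDPET"))

# Counts of ten or more are translated digit by digit.
print(LetterCounter("A" * 12))
```

Comparing the counters with == gives the yes/no answer; the string form is only there for reading the two side by side.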


3. Answers to questions



  1. If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the SortedItemsView that you are looking for. (The package has no SortedCounter, but it is not difficult to write one by subclassing SortedDict.)


  2. It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.


  3. The pickling protocol uses its own special methods like __getstate__, not __repr__.


  4. There are situations in which it's useful for eval(repr(o)) to return an object similar to o, but otherwise you are free to redefine __repr__ however you like.
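
To illustrate point 4, here is a rough sketch of what you keep by overriding only __str__: the __repr__ inherited from Counter still produces an expression that rebuilds the object. The helper round_trips is a made-up name, and the class is repeated so the snippet runs on its own.

```python
from collections import Counter

SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


class LetterCounter(Counter):
    """Only __str__ is overridden; __repr__ is inherited from Counter."""

    def __str__(self):
        return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                        for key, count in sorted(self.items()))


def round_trips(obj):
    """Hypothetical helper: does eval(repr(obj)) rebuild an equal object?"""
    return eval(repr(obj), {"LetterCounter": LetterCounter}) == obj


c = LetterCounter("ABA")
print(str(c))          # friendly form from __str__
print(repr(c))         # inherited form, uses the subclass name
print(round_trips(c))
```

If __repr__ were replaced with the superscript form, the interactive prompt would show the friendly string directly, at the cost of losing this round trip.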






  • 7, 4. Neat, thanks, that was the main thing I was looking for. 4. I would still keep superscript_trans inside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like '\u2070' with '⁰'. 3. Generally flouted, like string.digits. 5. I would have, but the original formatting line was getting unreadably long; you solved that. 6. Thanks. 7. I knew all about tuple unpacking (viz. mentioning *args); I spent an unsuccessful hour trying to get format() to work with *args. 1. The icing on the cake.
    – smci
    2 days ago











  • Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
    – smci
    2 days ago











  • You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
    – Peilonrayz
    2 days ago











  • @Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
    – smci
    yesterday











Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f203443%2fsubclassed-python-counter-for-a-more-visually-user-friendly-str-method-fo%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
9
down vote













1. Review



  1. There is no docstring. What kind of object is a LetterCounter?


  2. The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.


  3. The Python style guide recommends naming constants (like superscript_digits) using ALL_CAPITALS. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.



  4. Instead of:



    superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])


    add an encoding declaration to the start of the source code:



    # -*- coding: utf-8 -*-


    and write:



    SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"


  5. The function int_to_superscript is short, and only called once, so it would make sense to inline it at its single point of use.


  6. There is no need for the call to list in sorted(list(self.items())): sorted accepts an iterable.



  7. Referring to tuple elements by index, for example x[0] and x[1], makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:



    value_count = sorted(list(self.items()))
    return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )


    write something like:



    return " ".join("".format(key, str(count).translate(SUPERSCRIPT_TRANS))
    for key, count in sorted(self.items())


    or, if you are happy to use formatted string literals:



    return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
    for key, count in sorted(self.items()))


2. Revised code



# -*- coding: utf-8 -*-

from collections import Counter

# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

class LetterCounter(Counter):
"""Subclass of Counter with readable string conversion using sorted
ordering of keys and superscript digits for counts.

>>> print(LetterCounter("HELLOWORLD"))
D¹ E¹ H¹ L³ O² R¹ W¹

"""
def __str__(self):
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))


3. Answers to questions



  1. If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the SortedItemsView that you are looking for. (The package has no SortedCounter, but it is not difficult to write one by subclassing SortedDict.)


  2. It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.


  3. The pickling protocol uses its own special methods like __getstate__, not __repr__.


  4. There are situations in which it's useful for eval(repr(o)) to return an object similar to o, but otherwise you are free to redefine __repr__ however you like.






share|improve this answer






















  • 7,4. Neat, thanks, that was the main thing I was looking for. 4. I would still keep superscript_transinside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like 'u2070' with '⁰' 3. Generally flouted, like string.digits5. I would have but the original formatting line was getting unreadably long, but you solved that. 6. Thanks 7. I knew all about tuple unpacking (viz. mentioning *args), I spent an unsuccessful hour trying to get format() to work with *args 1. The icing on the cake
    – smci
    2 days ago











  • Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
    – smci
    2 days ago











  • You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
    – Peilonrayz
    2 days ago











  • @Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
    – smci
    yesterday















up vote
9
down vote













1. Review



  1. There is no docstring. What kind of object is a LetterCounter?


  2. The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.


  3. The Python style guide recommends naming constants (like superscript_digits) using ALL_CAPITALS. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.



  4. Instead of:



    superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])


    add an encoding declaration to the start of the source code:



    # -*- coding: utf-8 -*-


    and write:



    SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"


  5. The function int_to_superscript is short, and only called once, so it would make sense to inline it at its single point of use.


  6. There is no need for the call to list in sorted(list(self.items())): sorted accepts an iterable.



  7. Referring to tuple elements by index, for example x[0] and x[1], makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:



    value_count = sorted(list(self.items()))
    return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )


    write something like:



    return " ".join("".format(key, str(count).translate(SUPERSCRIPT_TRANS))
    for key, count in sorted(self.items())


    or, if you are happy to use formatted string literals:



    return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
    for key, count in sorted(self.items()))


2. Revised code



# -*- coding: utf-8 -*-

from collections import Counter

# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

class LetterCounter(Counter):
"""Subclass of Counter with readable string conversion using sorted
ordering of keys and superscript digits for counts.

>>> print(LetterCounter("HELLOWORLD"))
D¹ E¹ H¹ L³ O² R¹ W¹

"""
def __str__(self):
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))


3. Answers to questions



  1. If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the SortedItemsView that you are looking for. (The package has no SortedCounter, but it is not difficult to write one by subclassing SortedDict.)


  2. It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.


  3. The pickling protocol uses its own special methods like __getstate__, not __repr__.


  4. There are situations in which it's useful for eval(repr(o)) to return an object similar to o, but otherwise you are free to redefine __repr__ however you like.






share|improve this answer






















  • 7,4. Neat, thanks, that was the main thing I was looking for. 4. I would still keep superscript_transinside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like 'u2070' with '⁰' 3. Generally flouted, like string.digits5. I would have but the original formatting line was getting unreadably long, but you solved that. 6. Thanks 7. I knew all about tuple unpacking (viz. mentioning *args), I spent an unsuccessful hour trying to get format() to work with *args 1. The icing on the cake
    – smci
    2 days ago











  • Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
    – smci
    2 days ago











  • You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
    – Peilonrayz
    2 days ago











  • @Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
    – smci
    yesterday













up vote
9
down vote










up vote
9
down vote









1. Review



  1. There is no docstring. What kind of object is a LetterCounter?


  2. The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.


  3. The Python style guide recommends naming constants (like superscript_digits) using ALL_CAPITALS. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.



  4. Instead of:



    superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])


    add an encoding declaration to the start of the source code:



    # -*- coding: utf-8 -*-


    and write:



    SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"


  5. The function int_to_superscript is short, and only called once, so it would make sense to inline it at its single point of use.


  6. There is no need for the call to list in sorted(list(self.items())): sorted accepts an iterable.



  7. Referring to tuple elements by index, for example x[0] and x[1], makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:



    value_count = sorted(list(self.items()))
    return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )


    write something like:



    return " ".join("".format(key, str(count).translate(SUPERSCRIPT_TRANS))
    for key, count in sorted(self.items())


    or, if you are happy to use formatted string literals:



    return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
    for key, count in sorted(self.items()))


2. Revised code



# -*- coding: utf-8 -*-

from collections import Counter

# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

class LetterCounter(Counter):
"""Subclass of Counter with readable string conversion using sorted
ordering of keys and superscript digits for counts.

>>> print(LetterCounter("HELLOWORLD"))
D¹ E¹ H¹ L³ O² R¹ W¹

"""
def __str__(self):
return " ".join(f"keystr(count).translate(SUPERSCRIPT_TRANS)"
for key, count in sorted(self.items()))


3. Answers to questions



  1. If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the SortedItemsView that you are looking for. (The package has no SortedCounter, but it is not difficult to write one by subclassing SortedDict.)


  2. It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.


  3. The pickling protocol uses its own special methods like __getstate__, not __repr__.


  4. There are situations in which it's useful for eval(repr(o)) to return an object similar to o, but otherwise you are free to redefine __repr__ however you like.






share|improve this answer














1. Review



  1. There is no docstring. What kind of object is a LetterCounter?


  2. The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.


  3. The Python style guide recommends naming constants (like superscript_digits) using ALL_CAPITALS. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.



  4. Instead of:



    superscript_digits = ''.join(['u2070','u00b9','u00b2','u00b3','u2074','u2075','u2076','u2077','u2078','u2079'])


    add an encoding declaration to the start of the source code:



    # -*- coding: utf-8 -*-


    and write:



    SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"


  5. The function int_to_superscript is short, and only called once, so it would make sense to inline it at its single point of use.


  6. There is no need for the call to list in sorted(list(self.items())): sorted accepts an iterable.



  7. Referring to tuple elements by index, for example x[0] and x[1], makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:



    value_count = sorted(list(self.items()))
    return ' '.join( ''.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )


    write something like:



    return " ".join("".format(key, str(count).translate(SUPERSCRIPT_TRANS))
    for key, count in sorted(self.items())


    or, if you are happy to use formatted string literals:



    return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                    for key, count in sorted(self.items()))
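
The snippets in item 7 rely on a translation table named SUPERSCRIPT_TRANS, which is defined in the revised code. As a minimal sketch of how that table turns an integer into superscript digits (the helper name to_superscript is hypothetical and only used here for illustration):

```python
# Map each ASCII digit to the corresponding Unicode superscript digit.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def to_superscript(n):
    """Return the decimal digits of n as superscript characters.

    Hypothetical helper, not part of the reviewed code.
    """
    return str(n).translate(SUPERSCRIPT_TRANS)


print(to_superscript(2))     # ²
print(to_superscript(1024))  # ¹⁰²⁴
```

Because str.translate works character by character, counts of ten or more need no special handling.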


2. Revised code



# -*- coding: utf-8 -*-

from collections import Counter

# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

class LetterCounter(Counter):
    """Subclass of Counter with readable string conversion using sorted
    ordering of keys and superscript digits for counts.

    >>> print(LetterCounter("HELLOWORLD"))
    D¹ E¹ H¹ L³ O² R¹ W¹

    """
    def __str__(self):
        return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                        for key, count in sorted(self.items()))
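
As a usage sketch for the anagram case from the question: the class is repeated so the example runs on its own, and the helper letters is a hypothetical addition that drops spaces and normalises case before counting.

```python
from collections import Counter

SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


class LetterCounter(Counter):
    """Counter whose str() lists keys in sorted order with superscript counts."""

    def __str__(self):
        return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                        for key, count in sorted(self.items()))


def letters(phrase):
    """Hypothetical helper: count only the letters of phrase, uppercased."""
    return LetterCounter(ch.upper() for ch in phrase if ch.isalpha())


a = letters("Dormitory")
b = letters("Dirty room")
print(a)       # D¹ I¹ M¹ O² R² T¹ Y¹
print(b)       # D¹ I¹ M¹ O² R² T¹ Y¹
print(a == b)  # True
```

Comparing the counters with == is enough to test for an anagram; the string form is only there for reading at a glance.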


3. Answers to questions



  1. If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the SortedItemsView that you are looking for. (The package has no SortedCounter, but it is not difficult to write one by subclassing SortedDict.)


  2. It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.


  3. The pickling protocol uses its own special methods like __getstate__, not __repr__.


  4. There are situations in which it's useful for eval(repr(o)) to return an object similar to o, but otherwise you are free to redefine __repr__ however you like.
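
Points 3 and 4 can be checked with a small experiment. This is a sketch under assumptions: the class name ReprLetterCounter is hypothetical, and the call to __reduce__ relies on CPython's Counter defining that method for pickling and copying.

```python
from collections import Counter


class ReprLetterCounter(Counter):
    """Hypothetical variant that uses the compact form as its repr."""

    # Translation table kept as a class attribute.
    _TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

    def __repr__(self):
        return " ".join(f"{key}{str(count).translate(self._TRANS)}"
                        for key, count in sorted(self.items()))


# The default Counter repr can be evaluated back into an equal object.
plain = Counter("ABBA")
assert eval(repr(plain), {"Counter": Counter}) == plain

# The custom repr is easier to read but is no longer valid Python source.
fancy = ReprLetterCounter("ABBA")
print(repr(fancy))  # A² B²

# Reconstruction goes through __reduce__, which never consults __repr__.
cls, args = fancy.__reduce__()
assert cls(*args) == fancy
```

So the cost of overriding __repr__ here is losing the eval(repr(o)) round trip, not pickling.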







answered 2 days ago (edited 2 days ago)
Gareth Rees

  • 7, 4. Neat, thanks, that was the main thing I was looking for. 4. I would still keep superscript_trans inside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like '\u2070' with '⁰'. 3. Generally flouted, like string.digits. 5. I would have, but the original formatting line was getting unreadably long; you solved that. 6. Thanks. 7. I knew all about tuple unpacking (viz. mentioning *args); I spent an unsuccessful hour trying to get format() to work with *args. 1. The icing on the cake.
    – smci
    2 days ago











  • Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
    – smci
    2 days ago











  • You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
    – Peilonrayz
    2 days ago











  • @Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why
    – smci
    yesterday
















