Subclassed Python Counter for a more visually user-friendly __str__() method

I wrote a Counter subclass suitable for anagrams, with a more visually user-friendly __str__() method using alphabetic ordering and Unicode superscripts(/subscripts) for letter-counts, so you can compare at-a-glance if two phrases are anagrams or not:

c1 = LetterCounter("PRESIDENTTRUMP")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹

Code below. My questions:

translating integer to superscript(/subscript) is really clunky. Anything more compact?

Counter.items() actually returns a view. It would be more elegant if it was similarly possible to define sorted_items() as a view instead of taking sorted(list(self.items()))

Should I leave the subscript code in even though it's not currently used?

applying .format() to a variable-length list of tuples is a pain, can't just use *args.
- '{0[0]}{0[1]}'.format(x for x in value_count) fails with 'generator' object is not subscriptable
- '{0[0]}{0[1]} '.format([x for x in value_count]) is wrong, the format is outside the list comprehension.
- [ '{0[0]}{0[1]} '.format(x) for x in value_count ] won't allow us to call LetterCounter.int_to_superscript() on only the second item of each tuple.
- any more elegant approaches?

in general we're supposed to override __str__() not __repr__(), but if we don't care about pickling why not just override __repr__(), then we can directly see the object just by typing its name, don't need print?

after I independently wrote this I found Printing subscript in python

it seems more friendly for general use/reposting to enter Unicode literals in source as '\u2070' rather than '⁰'

Code:

import collections

class LetterCounter(collections.Counter):

    subscript_digits = ''.join([chr(0x2080 + d) for d in range(10)])
    superscript_digits = ''.join(['\u2070','\u00b9','\u00b2','\u00b3','\u2074','\u2075','\u2076','\u2077','\u2078','\u2079'])
    trans_to_superscript = str.maketrans("0123456789", superscript_digits)
    trans_to_subscript = str.maketrans("0123456789", subscript_digits)

    @classmethod
    def int_to_superscript(cls, n):
        return str(n).translate(cls.trans_to_superscript)
    @classmethod
    def int_to_subscript(cls, n):
        return str(n).translate(cls.trans_to_subscript)

    def __str__(self):
        value_count = sorted(list(self.items()))
        return ' '.join( '{}{}'.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )

# Test code
c1, c2 = LetterCounter("PRESIDENTTRUMP") , LetterCounter("MRPUTINSREDPET")
print(c1)
D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹

asked 2 days ago by smci

I guess one improvement is to only call the translation to superscript once, on the concatenated output. We can get away with that in this case because the keys cannot contain digits.
– smci, 2 days ago
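
The idea in this comment could be sketched roughly as follows. The helper name format_counts is hypothetical and not part of the posted code; the sketch assumes, as the comment does, that no key contains an ASCII digit.

```python
from collections import Counter

# Digit -> superscript-digit translation table.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def format_counts(counter):
    """Hypothetical helper: join key/count pairs, then translate once."""
    plain = " ".join(f"{key}{count}" for key, count in sorted(counter.items()))
    # One translate() call over the whole string. This is only safe when
    # no key contains an ASCII digit, since those would be converted too.
    return plain.translate(SUPERSCRIPT_TRANS)


print(format_counts(Counter("PRESIDENTTRUMP")))  # D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
```

If a key can be a digit, as in Counter("A1"), the key itself gets turned into a superscript, so the per-count translation is the safer default for general input.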

python unicode

1 Answer

1. Review

There is no docstring. What kind of object is a LetterCounter?

The Python style guide recommends restricting lines to 79 characters. If you did this, then we wouldn't have to scroll the code horizontally to read it.

The Python style guide recommends naming constants (like superscript_digits) using ALL_CAPITALS. It's not compulsory to follow this guide but it makes it easier to collaborate with other Python programmers.

Instead of:

superscript_digits = ''.join(['\u2070','\u00b9','\u00b2','\u00b3','\u2074','\u2075','\u2076','\u2077','\u2078','\u2079'])

add an encoding declaration to the start of the source code:

# -*- coding: utf-8 -*-

and write:

SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
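
As a side note on why the superscripts cannot be generated with a single chr(base + d) expression the way the subscripts can: ¹, ² and ³ sit in the Latin-1 Supplement block (U+00B9, U+00B2, U+00B3), while the other superscript digits are at U+2070 and U+2074 to U+2079. A small check, offered as a sketch rather than as part of the reviewed code:

```python
import unicodedata

SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
# Subscript digits are contiguous, so they can be generated.
SUBSCRIPT_DIGITS = "".join(chr(0x2080 + d) for d in range(10))

# The literal and the escaped spelling denote the same string.
assert SUPERSCRIPT_DIGITS == (
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
)

for digit, char in enumerate(SUPERSCRIPT_DIGITS):
    # unicodedata.digit() returns the digit value of the character.
    assert unicodedata.digit(char) == digit
    print(f"U+{ord(char):04X} {unicodedata.name(char)}")
```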

The function int_to_superscript is short, and only called once, so it would make sense to inline it at its single point of use.

There is no need for the call to list in sorted(list(self.items())): sorted accepts an iterable.

Referring to tuple elements by index, for example x[0] and x[1], makes it hard for the reader to understand what the elements are. It is clearer to use tuple unpacking to assign meaningful names to the elements. So instead of:

value_count = sorted(list(self.items()))
return ' '.join( '{}{}'.format(x[0], LetterCounter.int_to_superscript(x[1])) for x in value_count )

write something like:

return " ".join("{}{}".format(key, str(count).translate(SUPERSCRIPT_TRANS))
                for key, count in sorted(self.items()))

or, if you are happy to use formatted string literals:

return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                for key, count in sorted(self.items()))
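
To check that the two spellings agree, here is a standalone sketch that runs both outside the class on a plain Counter. The variable names with_format and with_fstring are only for illustration.

```python
from collections import Counter

SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
counts = Counter("HELLOWORLD")

# str.format with tuple unpacking in the generator expression.
with_format = " ".join(
    "{}{}".format(key, str(count).translate(SUPERSCRIPT_TRANS))
    for key, count in sorted(counts.items())
)

# The same thing as a formatted string literal (Python 3.6+).
with_fstring = " ".join(
    f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
    for key, count in sorted(counts.items())
)

assert with_format == with_fstring
print(with_fstring)  # D¹ E¹ H¹ L³ O² R¹ W¹
```

In both cases the format string is applied once per (key, count) pair inside the generator expression, which is what the attempts using a single .format() call over the whole list were missing.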

2. Revised code

# -*- coding: utf-8 -*-

from collections import Counter

# Code point mapping from digits to superscript digits.
SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

class LetterCounter(Counter):
    """Subclass of Counter with readable string conversion using sorted
    ordering of keys and superscript digits for counts.

    >>> print(LetterCounter("HELLOWORLD"))
    D¹ E¹ H¹ L³ O² R¹ W¹

    """
    def __str__(self):
        return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                        for key, count in sorted(self.items()))
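
A short usage sketch for the anagram check from the question. The class is repeated so that the snippet runs on its own; the two test strings are the ones from the question.

```python
from collections import Counter

SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


class LetterCounter(Counter):
    """Counter whose str() lists sorted keys with superscript counts."""

    def __str__(self):
        return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                        for key, count in sorted(self.items()))


c1 = LetterCounter("PRESIDENTTRUMP")
c2 = LetterCounter("MRPUTINSREDPET")
print(c1)        # D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
print(c2)        # D¹ E² I¹ M¹ N¹ P² R² S¹ T² U¹
# Equal counters mean the two phrases are anagrams of each other.
print(c1 == c2)  # True
print(LetterCounter("HELLO") == LetterCounter("HOLE"))  # False
```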

3. Answers to questions

If you want a mapping with sorted views of its keys and items, then you need the sortedcontainers package, which has the SortedItemsView that you are looking for. (The package has no SortedCounter, but it is not difficult to write one by subclassing SortedDict.)
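
If an extra dependency is unwanted, a rough standard-library approximation is to subclass collections.abc.ItemsView and sort on every iteration. This is a different technique from sortedcontainers' SortedItemsView, which keeps the keys sorted as they are inserted; the sketch below assumes that re-sorting per iteration is acceptable. The names SortedItems and sorted_items are hypothetical, and the code relies on the _mapping attribute that CPython's MappingView stores.

```python
from collections import Counter
from collections.abc import ItemsView


class SortedItems(ItemsView):
    """Hypothetical live view yielding (key, count) pairs in key order."""

    def __iter__(self):
        # self._mapping is set by collections.abc.MappingView.__init__.
        for key in sorted(self._mapping):
            yield key, self._mapping[key]


class LetterCounter(Counter):
    def sorted_items(self):
        # Returns a view object, not a list; it reflects later updates.
        return SortedItems(self)


c = LetterCounter("BAB")
view = c.sorted_items()
print(list(view))  # [('A', 1), ('B', 2)]
c.update("0A")
print(list(view))  # [('0', 1), ('A', 2), ('B', 2)]
```

The view is created once and still shows the counts added afterwards, which is the behaviour the question asks for; the price is a sort on each pass.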

It is rarely a good idea to leave dead code (like the subscripts in your example). The problem is that dead code does not get tested, and so as the live code changes it is easy to forget to make corresponding changes to the dead code, so that when you come to try to resurrect the dead code you find that it is broken.

The pickling protocol uses its own special methods like __getstate__, not __repr__.

There are situations in which it's useful for eval(repr(o)) to return an object similar to o, but otherwise you are free to redefine __repr__ however you like.
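
To make the trade-off concrete, here is a sketch with a hypothetical PrettyCounter variant that reuses the pretty string as its repr. The inherited Counter repr round-trips through eval, while the pretty one does not, but it does show up directly in the interactive prompt and inside containers.

```python
from collections import Counter

SUPERSCRIPT_TRANS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


class PrettyCounter(Counter):
    """Hypothetical variant that reuses the pretty string as its repr."""

    def __str__(self):
        return " ".join(f"{key}{str(count).translate(SUPERSCRIPT_TRANS)}"
                        for key, count in sorted(self.items()))

    __repr__ = __str__


plain = Counter("HELLO")
# The inherited repr is a valid expression that rebuilds the counter.
assert eval(repr(plain), {"Counter": Counter}) == plain

pretty = PrettyCounter("HELLO")
print(repr(pretty))  # E¹ H¹ L² O¹
print([pretty])      # containers use repr(): [E¹ H¹ L² O¹]

try:
    eval(repr(pretty))
except SyntaxError:
    # The pretty form is not a Python expression.
    print("repr is no longer evaluable")
```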

answered 2 days ago by Gareth Rees (edited 2 days ago)

7,4. Neat, thanks, that was the main thing I was looking for. 4. I would still keep superscript_trans inside the class as a class attribute. 2. Per the question I asked, the line length would be fine if I replaced all the literals like '\u2070' with '⁰'. 3. Generally flouted, like string.digits. 5. I would have, but the original formatting line was getting unreadably long; you solved that. 6. Thanks. 7. I knew all about tuple unpacking (viz. mentioning *args); I spent an unsuccessful hour trying to get format() to work with *args. 1. The icing on the cake.
– smci, 2 days ago

Any response on the question about whether overriding __repr__() for convenience is ok, when we don't care about pickleability? It seems ok to me.
– smci, 2 days ago

You don't have to add the encoding - "If no encoding declaration is found, the default encoding is UTF-8." I don't think Python did this in Python 2 however.
– Peilonrayz, 2 days ago

@Peilonrayz: however you do want to add the encoding for the sake of third-party tools, SCM, difftools etc. Also, if people ever view/display it on a console which mangles the Unicode literals, the header will help them understand why.
– smci, yesterday
