Why is this loop faster than a dictionary comprehension for creating a dictionary?

I don't come from a software/computer science background but I love to code in Python and can generally understand why things are faster. I am really curious to know why this for loop runs faster than the dictionary comprehension. Any insights?




Problem : Given a dictionary a with these keys and values, return a dictionary with the values as keys and the keys as values. (challenge: do this in one line)




and the code



a = {'a': 'hi', 'b': 'hey', 'c': 'yo'}

b = {}
for i, j in a.items():
    b[j] = i

%%timeit: 932 ns ± 37.2 ns per loop

b = {v: k for k, v in a.items()}

%%timeit: 1.08 µs ± 16.4 ns per loop
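For reference, the two approaches can be wrapped into a self-contained, runnable sketch; the function names `invert_loop` and `invert_comp` are illustrative, not from the original post:

```python
# Runnable sketch of the two approaches from the question.
# Function names are mine, not from the original post.
a = {'a': 'hi', 'b': 'hey', 'c': 'yo'}

def invert_loop(d):
    # explicit loop: build the swapped dict one item at a time
    out = {}
    for k, v in d.items():
        out[v] = k
    return out

def invert_comp(d):
    # the one-line "challenge" answer: a dict comprehension
    return {v: k for k, v in d.items()}

print(invert_loop(a))  # {'hi': 'a', 'hey': 'b', 'yo': 'c'}
print(invert_comp(a))  # same result
```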









  • That's a mighty small dictionary to test that with.. – Martijn Pieters♦, 1 hour ago

  • Does similar timing hold true for a larger dictionary? Having only 3 elements is not much of a test. [Edit: beaten to the punch by Martijn! I'm glad I'm not the only one who thought 3 was a small number :-) ] – KarlMW, 1 hour ago

  • When I do this with a dictionary with 1000 random keys and values, the dictcomp is marginally faster. But not by much. – Martijn Pieters♦, 1 hour ago

  • Thank you so much for all your responses; my apologies, I should have tested with a larger dictionary. – Nadim Younes, 43 mins ago














python python-3.x performance python-internals dictionary-comprehension






edited 19 mins ago by Martijn Pieters♦

asked 1 hour ago by Nadim Younes







2 Answers
Two problems: you are testing with way too small an input, and a dictionary comprehension doesn't really have that much of an advantage over a plain for loop, not when the target name is a local variable.



Your input consists of just 3 key-value pairs. Testing with 1000 elements instead, we see that the timings are very close instead:



>>> import timeit
>>> from random import choice, randint; from string import ascii_lowercase as letters
>>> looped = '''
... b = {}
... for i, j in a.items():
...     b[j] = i
... '''
>>> dictcomp = '''b = {v: k for k, v in a.items()}'''
>>> def rs(): return ''.join([choice(letters) for _ in range(randint(3, 15))])
...
>>> a = {rs(): rs() for _ in range(1000)}
>>> len(a)
1000
>>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange()
>>> (total / count) * 1000000 # microseconds per run
66.62004760000855
>>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange()
>>> (total / count) * 1000000 # microseconds per run
64.5464928005822


The difference is there, the dict comp is faster but only just at this scale. With 100 times as many key-value pairs the difference is a bit bigger:



>>> a = {rs(): rs() for _ in range(100000)}
>>> len(a)
98476
>>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange()
>>> total / count * 1000 # milliseconds, different scale!
15.48140200029593
>>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange()
>>> total / count * 1000 # milliseconds, different scale!
13.674790799996117


but that's not that big a difference when you consider both processed nearly 100k key-value pairs.



So why the speed difference with 3 elements? Because a comprehension (dictionary, set, list comprehensions or a generator expression) is under the hood implemented as a new function, and calling that function has a base cost the plain loop doesn't have to pay.



Here's the disassembly of the bytecode for both alternatives; note the MAKE_FUNCTION and CALL_FUNCTION opcodes in the top-level bytecode for the dict comprehension. There is a separate section for what that function then does, and there are actually very few differences between the two approaches here:



>>> import dis
>>> dis.dis(looped)
1 0 BUILD_MAP 0
2 STORE_NAME 0 (b)

2 4 SETUP_LOOP 28 (to 34)
6 LOAD_NAME 1 (a)
8 LOAD_METHOD 2 (items)
10 CALL_METHOD 0
12 GET_ITER
>> 14 FOR_ITER 16 (to 32)
16 UNPACK_SEQUENCE 2
18 STORE_NAME 3 (i)
20 STORE_NAME 4 (j)

3 22 LOAD_NAME 3 (i)
24 LOAD_NAME 0 (b)
26 LOAD_NAME 4 (j)
28 STORE_SUBSCR
30 JUMP_ABSOLUTE 14
>> 32 POP_BLOCK
>> 34 LOAD_CONST 0 (None)
36 RETURN_VALUE
>>> dis.dis(dictcomp)
1 0 LOAD_CONST 0 (<code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>)
2 LOAD_CONST 1 ('<dictcomp>')
4 MAKE_FUNCTION 0
6 LOAD_NAME 0 (a)
8 LOAD_METHOD 1 (items)
10 CALL_METHOD 0
12 GET_ITER
14 CALL_FUNCTION 1
16 STORE_NAME 2 (b)
18 LOAD_CONST 2 (None)
20 RETURN_VALUE

Disassembly of <code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>:
1 0 BUILD_MAP 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 14 (to 20)
6 UNPACK_SEQUENCE 2
8 STORE_FAST 1 (k)
10 STORE_FAST 2 (v)
12 LOAD_FAST 1 (k)
14 LOAD_FAST 2 (v)
16 MAP_ADD 2
18 JUMP_ABSOLUTE 4
>> 20 RETURN_VALUE


The material differences: the looped code uses LOAD_NAME for b each iteration, and STORE_SUBSCR to store the key-value pair in the loaded dict. The dictionary comprehension uses MAP_ADD to achieve the same thing as STORE_SUBSCR, but doesn't have to load that b name each time. But with only 3 iterations, the MAKE_FUNCTION / CALL_FUNCTION combo the dict comprehension has to execute is the real drag on the performance:



>>> make_and_call = '(lambda i: None)(None)'
>>> dis.dis(make_and_call)
1 0 LOAD_CONST 0 (<code object <lambda> at 0x11d6ab270, file "<dis>", line 1>)
2 LOAD_CONST 1 ('<lambda>')
4 MAKE_FUNCTION 0
6 LOAD_CONST 2 (None)
8 CALL_FUNCTION 1
10 RETURN_VALUE

Disassembly of <code object <lambda> at 0x11d6ab270, file "<dis>", line 1>:
1 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
>>> count, total = timeit.Timer(make_and_call).autorange()
>>> total / count * 1000000
0.12945385499915574


More than 0.1 µs to create a function object with one argument, and call it (with an extra LOAD_CONST for the None value we pass in)! And that's just about the difference between the looped and comprehension timings for 3 key-value pairs.



But beyond a few key-value pairs, that cost fades away into nothingness. At this point the dict comprehension and the explicit loop basically do the same thing:



  • take the next key-value pair, pop those on the stack

  • call the dict.__setitem__ hook via a bytecode operation with the top two items on the stack (either STORE_SUBSCR or MAP_ADD). This doesn't count as a 'function call' as it's all internally handled in the interpreter loop.

This is different from a list comprehension, where the plain loop version would have to use list.append(), involving an attribute lookup, and a function call each loop iteration. The list comprehension speed advantage comes from this difference; see Python list comprehension expensive
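That contrast can be illustrated with a quick timing sketch (my own, not from the answer): the plain loop pays a `list.append` attribute lookup and function call per iteration, while the comprehension appends via a dedicated bytecode inside the interpreter loop:

```python
import timeit

# Sketch of the list-comprehension contrast described above: the plain loop
# must look up and call list.append each iteration; the comprehension appends
# via the LIST_APPEND opcode instead.
loop_stmt = """
out = []
for i in range(1000):
    out.append(i)
"""
comp_stmt = "out = [i for i in range(1000)]"

loop_t = min(timeit.repeat(loop_stmt, number=2000, repeat=3))
comp_t = min(timeit.repeat(comp_stmt, number=2000, repeat=3))
print(f"append loop:   {loop_t:.4f}s")
print(f"comprehension: {comp_t:.4f}s")  # typically noticeably faster
```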



What a dict comprehension does add is that the target dictionary name only needs to be looked up once, when binding b to the final dictionary object. If the target dictionary is a global instead of a local variable, the comprehension wins, hands down:



>>> namespace = {}
>>> count, total = timeit.Timer(looped, 'from __main__ import a; global b', globals=namespace).autorange()
>>> (total / count) * 1000000
76.72348440100905
>>> count, total = timeit.Timer(dictcomp, 'from __main__ import a; global b', globals=namespace).autorange()
>>> (total / count) * 1000000
64.72114819916897
>>> len(namespace['b'])
1000


So just use a dict comprehension. The difference with < 30 elements to process is too small to care about, and the moment you are generating a global or have more items, the dict comprehension wins out anyway.






  • So from what you are saying, Martijn, can it be concluded that comprehensions are really just for improving readability of code? And from the last segment in your answer, it also seems as though all dictionary comprehensions can be written with for loops, but not vice versa? – Rook, 45 mins ago

  • This is amazing, Martijn. Thank you. – Nadim Younes, 41 mins ago

  • @Rook: dictionary comprehensions only have a small speed advantage inside a function. Other comprehensions have a better chance to be faster, provided you use them for their purpose, so building a list or a dict or a set. All comprehensions can be written with a regular loop, but that's not being discussed here. – Martijn Pieters♦, 40 mins ago

















The reason this is surprising to you is that your dictionary is way too small to overcome the cost of creating a new function frame and pushing/popping it on the call stack.

Let's go under the skin of the two snippets:



In [1]: a = {'a': 'hi', 'b': 'hey', 'c': 'yo'}
   ...:
   ...: def reg_loop(a):
   ...:     b = {}
   ...:     for i, j in a.items():
   ...:         b[j] = i
   ...:

In [2]: def dict_comp(a):
   ...:     b = {v: k for k, v in a.items()}
   ...:

In [3]: %timeit reg_loop(a)
529 ns ± 7.89 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit dict_comp(a)
656 ns ± 5.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: import dis

In [6]: dis.dis(reg_loop)
4 0 BUILD_MAP 0
2 STORE_FAST 1 (b)

5 4 SETUP_LOOP 28 (to 34)
6 LOAD_FAST 0 (a)
8 LOAD_METHOD 0 (items)
10 CALL_METHOD 0
12 GET_ITER
>> 14 FOR_ITER 16 (to 32)
16 UNPACK_SEQUENCE 2
18 STORE_FAST 2 (i)
20 STORE_FAST 3 (j)

6 22 LOAD_FAST 2 (i)
24 LOAD_FAST 1 (b)
26 LOAD_FAST 3 (j)
28 STORE_SUBSCR
30 JUMP_ABSOLUTE 14
>> 32 POP_BLOCK
>> 34 LOAD_CONST 0 (None)
36 RETURN_VALUE

In [7]:

In [7]: dis.dis(dict_comp)
2 0 LOAD_CONST 1 (<code object <dictcomp> at 0x7fbada1adf60, file "<ipython-input-2-aac022159794>", line 2>)
2 LOAD_CONST 2 ('dict_comp.<locals>.<dictcomp>')
4 MAKE_FUNCTION 0
6 LOAD_FAST 0 (a)
8 LOAD_METHOD 0 (items)
10 CALL_METHOD 0
12 GET_ITER
14 CALL_FUNCTION 1
16 STORE_FAST 1 (b)
18 LOAD_CONST 0 (None)
20 RETURN_VALUE


In the second disassembly (the code using the dict comprehension) you have a MAKE_FUNCTION opcode which, as stated in the documentation, "Pushes a new function object on the stack", and later CALL_FUNCTION, which "Calls a callable object with positional arguments", and then:




pops all arguments and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object.




All these operations have their costs, but when the dictionary gets larger, the cost of assigning the key-value items to the dictionary increases as well. In other words, from a certain point on, the cost of calling the dictionary's __setitem__ hook in the loop exceeds the cost of creating and calling a function object on the fly.
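That crossover can be checked empirically. Here is a rough sketch (my own, redefining the two functions so it runs standalone) that times both versions at growing dictionary sizes:

```python
import timeit

# Rough sketch (not from the answer): measure both versions at growing sizes
# to watch the comprehension's fixed setup cost get amortized away.
def reg_loop(a):
    b = {}
    for i, j in a.items():
        b[j] = i
    return b

def dict_comp(a):
    return {v: k for k, v in a.items()}

for n in (3, 1000, 100000):
    a = {str(i): str(-i) for i in range(n)}   # n unique key/value pairs
    reps = max(1, 100000 // n)                # keep total work roughly constant
    t_loop = min(timeit.repeat(lambda: reg_loop(a), number=reps, repeat=3))
    t_comp = min(timeit.repeat(lambda: dict_comp(a), number=reps, repeat=3))
    print(f"n={n:>6}: loop {t_loop:.5f}s  comp {t_comp:.5f}s  ({reps} runs)")
```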



Also, note that there are certainly multiple other operations (opcodes, in this case) that play a crucial role here, which I think are worth investigating and considering; I'm going to leave that to you as an exercise ;).






share|improve this answer






















    Your Answer





    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52542742%2fwhy-is-this-loop-faster-than-a-dictionary-comprehension-for-creating-a-dictionar%23new-answer', 'question_page');

    );

    Post as a guest






























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    10
    down vote













    Two problems: you are testing with way too small an input, and a dictionary comprehension doesn't really have that much of an advantage over a plain for loop, not when the target name is a local variable.



    Your input consists of just 3 key-value pairs. Testing with 1000 elements instead, we see that the timings are very close instead:



    >>> import timeit
    >>> from random import choice, randint; from string import ascii_lowercase as letters
    >>> looped = '''
    ... b =
    ... for i,j in a.items():
    ... b[j]=i
    ... '''
    >>> dictcomp = '''b = v: k for k, v in a.items()'''
    >>> def rs(): return ''.join([choice(letters) for _ in range(randint(3, 15))])
    ...
    >>> a = rs(): rs() for _ in range(1000)
    >>> len(a)
    1000
    >>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange()
    >>> (total / count) * 1000000 # microseconds per run
    66.62004760000855
    >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange()
    >>> (total / count) * 1000000 # microseconds per run
    64.5464928005822


    The difference is there, the dict comp is faster but only just at this scale. With 100 times as many key-value pairs the difference is a bit bigger:



    >>> a = rs(): rs() for _ in range(100000)
    >>> len(a)
    98476
    >>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange()
    >>> total / count * 1000 # milliseconds, different scale!
    15.48140200029593
    >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange()
    >>> total / count * 1000 # milliseconds, different scale!
    13.674790799996117


    but that's not that big a difference when you consider both processed nearly 100k key-value pairs.



    So why the speed difference with 3 elements? Because a comprehension (dictionary, set, list comprehensions or a generator expression) is under the hood implemented as a new function, and calling that function has a base cost the plain loop doesn't have to pay.



    Here's the disassembly for the bytecode for both alternatives; note the MAKE_FUNCTION and CALL_FUNCTION opcodes in the top-level bytecode for the dict comprehension, there is a separate section for what that function then does, and there are actually very few differences in between the two approaches here:



    >>> import dis
    >>> dis.dis(looped)
    1 0 BUILD_MAP 0
    2 STORE_NAME 0 (b)

    2 4 SETUP_LOOP 28 (to 34)
    6 LOAD_NAME 1 (a)
    8 LOAD_METHOD 2 (items)
    10 CALL_METHOD 0
    12 GET_ITER
    >> 14 FOR_ITER 16 (to 32)
    16 UNPACK_SEQUENCE 2
    18 STORE_NAME 3 (i)
    20 STORE_NAME 4 (j)

    3 22 LOAD_NAME 3 (i)
    24 LOAD_NAME 0 (b)
    26 LOAD_NAME 4 (j)
    28 STORE_SUBSCR
    30 JUMP_ABSOLUTE 14
    >> 32 POP_BLOCK
    >> 34 LOAD_CONST 0 (None)
    36 RETURN_VALUE
    >>> dis.dis(dictcomp)
    1 0 LOAD_CONST 0 (<code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>)
    2 LOAD_CONST 1 ('<dictcomp>')
    4 MAKE_FUNCTION 0
    6 LOAD_NAME 0 (a)
    8 LOAD_METHOD 1 (items)
    10 CALL_METHOD 0
    12 GET_ITER
    14 CALL_FUNCTION 1
    16 STORE_NAME 2 (b)
    18 LOAD_CONST 2 (None)
    20 RETURN_VALUE

    Disassembly of <code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>:
    1 0 BUILD_MAP 0
    2 LOAD_FAST 0 (.0)
    >> 4 FOR_ITER 14 (to 20)
    6 UNPACK_SEQUENCE 2
    8 STORE_FAST 1 (k)
    10 STORE_FAST 2 (v)
    12 LOAD_FAST 1 (k)
    14 LOAD_FAST 2 (v)
    16 MAP_ADD 2
    18 JUMP_ABSOLUTE 4
    >> 20 RETURN_VALUE


    The material differences: the looped code uses LOAD_NAME for b each iteration, and STORE_SUBSCR to store the key-value pair in dict loaded. The dictionary comprehension uses MAP_ADD to achieve the same thing as STORE_SUBSCR but doesn't have to load that b name each time. But with 3 iterations only, the MAKE_FUNCTION / CALL_FUNCTION combo the dict comprehension has to execute is the real drag on the performance:



    >>> make_and_call = '(lambda i: None)(None)'
    >>> dis.dis(make_and_call)
    1 0 LOAD_CONST 0 (<code object <lambda> at 0x11d6ab270, file "<dis>", line 1>)
    2 LOAD_CONST 1 ('<lambda>')
    4 MAKE_FUNCTION 0
    6 LOAD_CONST 2 (None)
    8 CALL_FUNCTION 1
    10 RETURN_VALUE

    Disassembly of <code object <lambda> at 0x11d6ab270, file "<dis>", line 1>:
    1 0 LOAD_CONST 0 (None)
    2 RETURN_VALUE
    >>> count, total = timeit.Timer(make_and_call).autorange()
    >>> total / count * 1000000
    0.12945385499915574


    More than 0.1 ms to create a function object with one argument, and call it (with an extra LOAD_CONST for the None value we pass in)! And that's just about the difference between the looped and comprehension timings for 3 key-value pairs.



    But beyond a few key-value pairs, that cost fades away into nothingness. At this point the dict comprehension and the explicit loop basically do the same thing:



    • take the next key-value pair, pop those on the stack

    • call the dict.__setitem__ hook via a bytecode operation with the top two items on the stack (either STORE_SUBSCR or MAP_ADD. This doesn't count as a 'function call' as it's all internally handled in the interpreter loop.

    This is different from a list comprehension, where the plain loop version would have to use list.append(), involving an attribute lookup, and a function call each loop iteration. The list comprehension speed advantage comes from this difference; see Python list comprehension expensive



    What a dict comprehension does add, is that the target dictionary name only needs to be looked up once, when binding b to the the final dictionary object. If the target dictionary is a global instead of a local variable, the comprehension wins, hands down:



    >>> namespace = 
    >>> count, total = timeit.Timer(looped, 'from __main__ import a; global b', globals=namespace).autorange()
    >>> (total / count) * 1000000
    76.72348440100905
    >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a; global b', globals=namespace).autorange()
    >>> (total / count) * 1000000
    64.72114819916897
    >>> len(namespace['b'])
    1000


    So just use a dict comprehension. The difference with < 30 elements to process is too small to care about, and the moment you are generating a global or have more items, the dict comprehension wins out anyway.






    share|improve this answer






















    • So from what you are saying, Martijn, can it be concluded that comprehensions are really just for improving readability of code? And from the last segment in your answer, it also seems as though all dictionary comprehensions can be written with for loops but not vice-versa?
      – Rook
      45 mins ago











    • This is amazing Martijn. Thank you
      – Nadim Younes
      41 mins ago






    • 2




      @Rook: dictionary comprehensions only have a small speed advantage inside a function. Other comprehensions have a better chance to be faster, provided you use them for their purpose, so building a list or a dict or a set. All comprehensions can be written with a regular loop, but that's not being discussed here.
      – Martijn Pieters♦
      40 mins ago














    up vote
    10
    down vote













    Two problems: you are testing with way too small an input, and a dictionary comprehension doesn't really have that much of an advantage over a plain for loop, not when the target name is a local variable.



    Your input consists of just 3 key-value pairs. Testing with 1000 elements instead, we see that the timings are very close instead:



    >>> import timeit
    >>> from random import choice, randint; from string import ascii_lowercase as letters
    >>> looped = '''
    ... b =
    ... for i,j in a.items():
    ... b[j]=i
    ... '''
    >>> dictcomp = '''b = v: k for k, v in a.items()'''
    >>> def rs(): return ''.join([choice(letters) for _ in range(randint(3, 15))])
    ...
    >>> a = rs(): rs() for _ in range(1000)
    >>> len(a)
    1000
    >>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange()
    >>> (total / count) * 1000000 # microseconds per run
    66.62004760000855
    >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange()
    >>> (total / count) * 1000000 # microseconds per run
    64.5464928005822


    The difference is there, the dict comp is faster but only just at this scale. With 100 times as many key-value pairs the difference is a bit bigger:



    >>> a = rs(): rs() for _ in range(100000)
    >>> len(a)
    98476
    >>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange()
    >>> total / count * 1000 # milliseconds, different scale!
    15.48140200029593
    >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange()
    >>> total / count * 1000 # milliseconds, different scale!
    13.674790799996117


    but that's not that big a difference when you consider both processed nearly 100k key-value pairs.



    So why the speed difference with 3 elements? Because a comprehension (dictionary, set, list comprehensions or a generator expression) is under the hood implemented as a new function, and calling that function has a base cost the plain loop doesn't have to pay.



    Here's the disassembly for the bytecode for both alternatives; note the MAKE_FUNCTION and CALL_FUNCTION opcodes in the top-level bytecode for the dict comprehension, there is a separate section for what that function then does, and there are actually very few differences in between the two approaches here:



    >>> import dis
    >>> dis.dis(looped)
    1 0 BUILD_MAP 0
    2 STORE_NAME 0 (b)

    2 4 SETUP_LOOP 28 (to 34)
    6 LOAD_NAME 1 (a)
    8 LOAD_METHOD 2 (items)
    10 CALL_METHOD 0
    12 GET_ITER
    >> 14 FOR_ITER 16 (to 32)
    16 UNPACK_SEQUENCE 2
    18 STORE_NAME 3 (i)
    20 STORE_NAME 4 (j)

    3 22 LOAD_NAME 3 (i)
    24 LOAD_NAME 0 (b)
    26 LOAD_NAME 4 (j)
    28 STORE_SUBSCR
    30 JUMP_ABSOLUTE 14
    >> 32 POP_BLOCK
    >> 34 LOAD_CONST 0 (None)
    36 RETURN_VALUE
    >>> dis.dis(dictcomp)
    1 0 LOAD_CONST 0 (<code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>)
    2 LOAD_CONST 1 ('<dictcomp>')
    4 MAKE_FUNCTION 0
    6 LOAD_NAME 0 (a)
    8 LOAD_METHOD 1 (items)
    10 CALL_METHOD 0
    12 GET_ITER
    14 CALL_FUNCTION 1
    16 STORE_NAME 2 (b)
    18 LOAD_CONST 2 (None)
    20 RETURN_VALUE

    Disassembly of <code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>:
    1 0 BUILD_MAP 0
    2 LOAD_FAST 0 (.0)
    >> 4 FOR_ITER 14 (to 20)
    6 UNPACK_SEQUENCE 2
    8 STORE_FAST 1 (k)
    10 STORE_FAST 2 (v)
    12 LOAD_FAST 1 (k)
    14 LOAD_FAST 2 (v)
    16 MAP_ADD 2
    18 JUMP_ABSOLUTE 4
    >> 20 RETURN_VALUE


    The material differences: the looped code uses LOAD_NAME for b each iteration, and STORE_SUBSCR to store the key-value pair in dict loaded. The dictionary comprehension uses MAP_ADD to achieve the same thing as STORE_SUBSCR but doesn't have to load that b name each time. But with 3 iterations only, the MAKE_FUNCTION / CALL_FUNCTION combo the dict comprehension has to execute is the real drag on the performance:



    >>> make_and_call = '(lambda i: None)(None)'
    >>> dis.dis(make_and_call)
    1 0 LOAD_CONST 0 (<code object <lambda> at 0x11d6ab270, file "<dis>", line 1>)
    2 LOAD_CONST 1 ('<lambda>')
    4 MAKE_FUNCTION 0
    6 LOAD_CONST 2 (None)
    8 CALL_FUNCTION 1
    10 RETURN_VALUE

    Disassembly of <code object <lambda> at 0x11d6ab270, file "<dis>", line 1>:
    1 0 LOAD_CONST 0 (None)
    2 RETURN_VALUE
    >>> count, total = timeit.Timer(make_and_call).autorange()
    >>> total / count * 1000000
    0.12945385499915574


    More than 0.1 ms to create a function object with one argument, and call it (with an extra LOAD_CONST for the None value we pass in)! And that's just about the difference between the looped and comprehension timings for 3 key-value pairs.



    But beyond a few key-value pairs, that cost fades away into nothingness. At this point the dict comprehension and the explicit loop basically do the same thing:



    • take the next key-value pair, pop those on the stack

    • call the dict.__setitem__ hook via a bytecode operation with the top two items on the stack (either STORE_SUBSCR or MAP_ADD. This doesn't count as a 'function call' as it's all internally handled in the interpreter loop.

    This is different from a list comprehension, where the plain loop version would have to use list.append(), involving an attribute lookup, and a function call each loop iteration. The list comprehension speed advantage comes from this difference; see Python list comprehension expensive



    What a dict comprehension does add, is that the target dictionary name only needs to be looked up once, when binding b to the the final dictionary object. If the target dictionary is a global instead of a local variable, the comprehension wins, hands down:



    >>> namespace = 
    >>> count, total = timeit.Timer(looped, 'from __main__ import a; global b', globals=namespace).autorange()
    >>> (total / count) * 1000000
    76.72348440100905
    >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a; global b', globals=namespace).autorange()
    >>> (total / count) * 1000000
    64.72114819916897
    >>> len(namespace['b'])
    1000


    So just use a dict comprehension. The difference with < 30 elements to process is too small to care about, and the moment you are generating a global or have more items, the dict comprehension wins out anyway.






    • So from what you are saying, Martijn, can it be concluded that comprehensions are really just for improving readability of code? And from the last segment in your answer, it also seems as though all dictionary comprehensions can be written with for loops but not vice-versa?
      – Rook
      45 mins ago











    • This is amazing Martijn. Thank you
      – Nadim Younes
      41 mins ago






    • @Rook: dictionary comprehensions only have a small speed advantage inside a function. Other comprehensions have a better chance to be faster, provided you use them for their purpose, so building a list or a dict or a set. All comprehensions can be written with a regular loop, but that's not being discussed here.
      – Martijn Pieters♦
      40 mins ago












    edited 16 mins ago

























    answered 55 mins ago









    Martijn Pieters♦

    up vote
    4
    down vote













    The reason it's surprising to you is that your dictionary is way too small to amortize the cost of creating a new function frame and pushing/popping it on the call stack.



    Let's go under the skin of the two snippets:



    In [1]: a = {'a': 'hi', 'b': 'hey', 'c': 'yo'}
       ...:
       ...: def reg_loop(a):
       ...:     b = {}
       ...:     for i, j in a.items():
       ...:         b[j] = i
       ...:

    In [2]: def dict_comp(a):
       ...:     b = {v: k for k, v in a.items()}
       ...:

    In [3]: %timeit reg_loop(a)
    529 ns ± 7.89 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [4]: %timeit dict_comp(a)
    656 ns ± 5.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [5]: import dis

    In [6]: dis.dis(reg_loop)
    4 0 BUILD_MAP 0
    2 STORE_FAST 1 (b)

    5 4 SETUP_LOOP 28 (to 34)
    6 LOAD_FAST 0 (a)
    8 LOAD_METHOD 0 (items)
    10 CALL_METHOD 0
    12 GET_ITER
    >> 14 FOR_ITER 16 (to 32)
    16 UNPACK_SEQUENCE 2
    18 STORE_FAST 2 (i)
    20 STORE_FAST 3 (j)

    6 22 LOAD_FAST 2 (i)
    24 LOAD_FAST 1 (b)
    26 LOAD_FAST 3 (j)
    28 STORE_SUBSCR
    30 JUMP_ABSOLUTE 14
    >> 32 POP_BLOCK
    >> 34 LOAD_CONST 0 (None)
    36 RETURN_VALUE

    In [7]: dis.dis(dict_comp)
    2 0 LOAD_CONST 1 (<code object <dictcomp> at 0x7fbada1adf60, file "<ipython-input-2-aac022159794>", line 2>)
    2 LOAD_CONST 2 ('dict_comp.<locals>.<dictcomp>')
    4 MAKE_FUNCTION 0
    6 LOAD_FAST 0 (a)
    8 LOAD_METHOD 0 (items)
    10 CALL_METHOD 0
    12 GET_ITER
    14 CALL_FUNCTION 1
    16 STORE_FAST 1 (b)
    18 LOAD_CONST 0 (None)
    20 RETURN_VALUE


    In the second disassembly (the code using the dict comprehension) you have a MAKE_FUNCTION opcode which, as stated in the documentation, "Pushes a new function object on the stack", and later CALL_FUNCTION, which "Calls a callable object with positional arguments", and then:




    pops all arguments and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object.




    All these operations have their costs, but as the dictionary gets larger, the cost of assigning the key-value items to the dictionary grows with it. In other words, from a certain size on, the per-iteration cost of calling the dictionary's __setitem__ hook exceeds the one-time cost of creating and calling the comprehension's function object on the fly.
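A rough way to see that crossover is to time both approaches at increasing dictionary sizes (the sizes and numbers here are illustrative; exact results are machine- and version-dependent):

```python
import timeit

# Same two approaches, timed at increasing dictionary sizes.
looped = """
b = {}
for i, j in a.items():
    b[j] = i
"""
comp = "b = {v: k for k, v in a.items()}"

for size in (3, 1000, 30_000):
    # Build a dict of `size` string->int pairs in the timing setup.
    setup = f"a = {{str(n): n for n in range({size})}}"
    t_loop = min(timeit.repeat(looped, setup, repeat=3, number=20))
    t_comp = min(timeit.repeat(comp, setup, repeat=3, number=20))
    print(f"{size:>6} items  loop={t_loop:.5f}s  comp={t_comp:.5f}s")
```

At 3 items the function-call overhead dominates and the loop tends to win; at the larger sizes the comprehension usually pulls ahead.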



    Also, note that there are certainly multiple other operations (OP_CODEs in this case) that play a crucial role in this game, which I think are worth investigating and considering, and which I'm going to leave to you as an exercise ;).
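As a starting point for that exercise, here is a sketch (helper names are my own, and opcode sets differ across CPython versions; newer releases may even inline the comprehension entirely) that tallies the opcodes each version compiles to, so you can see exactly which instructions differ:

```python
import dis
from collections import Counter

def reg_loop(a):
    b = {}
    for i, j in a.items():
        b[j] = i

def dict_comp(a):
    b = {v: k for k, v in a.items()}

def opcode_counts(func):
    # Count opcodes in the function and in any nested code objects
    # (on older CPython the comprehension lives in its own <dictcomp>).
    counts = Counter(ins.opname for ins in dis.Bytecode(func))
    for const in func.__code__.co_consts:
        if hasattr(const, 'co_code'):
            counts.update(ins.opname for ins in dis.Bytecode(const))
    return counts

loop_ops = opcode_counts(reg_loop)
comp_ops = opcode_counts(dict_comp)

# Opcodes used by only one of the two versions:
print(sorted((loop_ops.keys() - comp_ops.keys()) | (comp_ops.keys() - loop_ops.keys())))
```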






        edited 14 mins ago

























        answered 43 mins ago









        Kasrâmvd

             
