Accessing dictionary items by position in Python 3.6+ efficiently

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
6
down vote

favorite












I understand dictionaries are insertion ordered in Python 3.6+, as an implementation detail in 3.6 and official in 3.7+.



Given they are ordered, it seems strange that no methods exist to retrieve the ith item of a dictionary by insertion order. The only solutions available appear to have O(n) complexity, either:



  1. Convert to a list via an O(n) process and then use list.__getitem__.


  2. enumerate dictionary items in a loop and return the value when the desired index is reached. Again, with O(n) time complexity.

Since getting an item from a list has O(1) complexity, is there a way to achieve the same complexity with dictionaries? Either with regular dict or collections.OrderedDict would work.



If it's not possible, is there a structural reason preventing such a method, or is this just a feature which has not yet been considered / implemented?










share|improve this question





















  • It's implemented as a linked list. Otherwise, it would be impossible to delete elements.
    – o11c
    3 hours ago










  • I can think of an obscure reason. It makes JSON-lines more stable without an enclosing list and individual dictionaries. Beyond that very minor detail, I've not really understood the hype
    – roganjosh
    3 hours ago











  • @o11c, OK, so there's clearly a gap in my understanding. But I can see (maybe) what you mean, perhaps you need to have O(n) position access to keep O(1) deletion vs O(n) for lists.
    – jpp
    3 hours ago







  • 2




    @o11c it is not implemented as a linked-list
    – juanpa.arrivillaga
    3 hours ago






  • 1




    I think they just don't want to add sequence-like behavior to what is fundamentally a mapping. IOW: you shouldn't be using your dict like a list, but it does maintain order.
    – juanpa.arrivillaga
    3 hours ago















up vote
6
down vote

favorite












I understand dictionaries are insertion ordered in Python 3.6+, as an implementation detail in 3.6 and official in 3.7+.



Given they are ordered, it seems strange that no methods exist to retrieve the ith item of a dictionary by insertion order. The only solutions available appear to have O(n) complexity, either:



  1. Convert to a list via an O(n) process and then use list.__getitem__.


  2. enumerate dictionary items in a loop and return the value when the desired index is reached. Again, with O(n) time complexity.

Since getting an item from a list has O(1) complexity, is there a way to achieve the same complexity with dictionaries? Either with regular dict or collections.OrderedDict would work.



If it's not possible, is there a structural reason preventing such a method, or is this just a feature which has not yet been considered / implemented?










share|improve this question





















  • It's implemented as a linked list. Otherwise, it would be impossible to delete elements.
    – o11c
    3 hours ago










  • I can think of an obscure reason. It makes JSON-lines more stable without an enclosing list and individual dictionaries. Beyond that very minor detail, I've not really understood the hype
    – roganjosh
    3 hours ago











  • @o11c, OK, so there's clearly a gap in my understanding. But I can see (maybe) what you mean, perhaps you need to have O(n) position access to keep O(1) deletion vs O(n) for lists.
    – jpp
    3 hours ago







  • 2




    @o11c it is not implemented as a linked-list
    – juanpa.arrivillaga
    3 hours ago






  • 1




    I think they just don't want to add sequence-like behavior to what is fundamentally a mapping. IOW: you shouldn't be using your dict like a list, but it does maintain order.
    – juanpa.arrivillaga
    3 hours ago













up vote
6
down vote

favorite









up vote
6
down vote

favorite











I understand dictionaries are insertion ordered in Python 3.6+, as an implementation detail in 3.6 and official in 3.7+.



Given they are ordered, it seems strange that no methods exist to retrieve the ith item of a dictionary by insertion order. The only solutions available appear to have O(n) complexity, either:



  1. Convert to a list via an O(n) process and then use list.__getitem__.


  2. enumerate dictionary items in a loop and return the value when the desired index is reached. Again, with O(n) time complexity.

Since getting an item from a list has O(1) complexity, is there a way to achieve the same complexity with dictionaries? Either with regular dict or collections.OrderedDict would work.



If it's not possible, is there a structural reason preventing such a method, or is this just a feature which has not yet been considered / implemented?










share|improve this question













I understand dictionaries are insertion ordered in Python 3.6+, as an implementation detail in 3.6 and official in 3.7+.



Given they are ordered, it seems strange that no methods exist to retrieve the ith item of a dictionary by insertion order. The only solutions available appear to have O(n) complexity, either:



  1. Convert to a list via an O(n) process and then use list.__getitem__.


  2. enumerate dictionary items in a loop and return the value when the desired index is reached. Again, with O(n) time complexity.

Since getting an item from a list has O(1) complexity, is there a way to achieve the same complexity with dictionaries? Either with regular dict or collections.OrderedDict would work.



If it's not possible, is there a structural reason preventing such a method, or is this just a feature which has not yet been considered / implemented?







python python-3.x dictionary python-internals






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked 3 hours ago









jpp

66.1k173981




66.1k173981











  • It's implemented as a linked list. Otherwise, it would be impossible to delete elements.
    – o11c
    3 hours ago










  • I can think of an obscure reason. It makes JSON-lines more stable without an enclosing list and individual dictionaries. Beyond that very minor detail, I've not really understood the hype
    – roganjosh
    3 hours ago











  • @o11c, OK, so there's clearly a gap in my understanding. But I can see (maybe) what you mean, perhaps you need to have O(n) position access to keep O(1) deletion vs O(n) for lists.
    – jpp
    3 hours ago







  • 2




    @o11c it is not implemented as a linked-list
    – juanpa.arrivillaga
    3 hours ago






  • 1




    I think they just don't want to add sequence-like behavior to what is fundamentally a mapping. IOW: you shouldn't be using your dict like a list, but it does maintain order.
    – juanpa.arrivillaga
    3 hours ago

















  • It's implemented as a linked list. Otherwise, it would be impossible to delete elements.
    – o11c
    3 hours ago










  • I can think of an obscure reason. It makes JSON-lines more stable without an enclosing list and individual dictionaries. Beyond that very minor detail, I've not really understood the hype
    – roganjosh
    3 hours ago











  • @o11c, OK, so there's clearly a gap in my understanding. But I can see (maybe) what you mean, perhaps you need to have O(n) position access to keep O(1) deletion vs O(n) for lists.
    – jpp
    3 hours ago







  • 2




    @o11c it is not implemented as a linked-list
    – juanpa.arrivillaga
    3 hours ago






  • 1




    I think they just don't want to add sequence-like behavior to what is fundamentally a mapping. IOW: you shouldn't be using your dict like a list, but it does maintain order.
    – juanpa.arrivillaga
    3 hours ago
















It's implemented as a linked list. Otherwise, it would be impossible to delete elements.
– o11c
3 hours ago




It's implemented as a linked list. Otherwise, it would be impossible to delete elements.
– o11c
3 hours ago












I can think of an obscure reason. It makes JSON-lines more stable without an enclosing list and individual dictionaries. Beyond that very minor detail, I've not really understood the hype
– roganjosh
3 hours ago





I can think of an obscure reason. It makes JSON-lines more stable without an enclosing list and individual dictionaries. Beyond that very minor detail, I've not really understood the hype
– roganjosh
3 hours ago













@o11c, OK, so there's clearly a gap in my understanding. But I can see (maybe) what you mean, perhaps you need to have O(n) position access to keep O(1) deletion vs O(n) for lists.
– jpp
3 hours ago





@o11c, OK, so there's clearly a gap in my understanding. But I can see (maybe) what you mean, perhaps you need to have O(n) position access to keep O(1) deletion vs O(n) for lists.
– jpp
3 hours ago





2




2




@o11c it is not implemented as a linked-list
– juanpa.arrivillaga
3 hours ago




@o11c it is not implemented as a linked-list
– juanpa.arrivillaga
3 hours ago




1




1




I think they just don't want to add sequence-like behavior to what is fundamentally a mapping. IOW: you shouldn't be using your dict like a list, but it does maintain order.
– juanpa.arrivillaga
3 hours ago





I think they just don't want to add sequence-like behavior to what is fundamentally a mapping. IOW: you shouldn't be using your dict like a list, but it does maintain order.
– juanpa.arrivillaga
3 hours ago













1 Answer
1






active

oldest

votes

















up vote
8
down vote



accepted










For an OrderedDict it's inherently O(n) because the ordering is recorded in a linked list.



For the builtin dict, there's a vector (a contiguous array) rather than a linked list, but pretty much the same thing in the end: the vector contains a few kind of "dummies", special internal values that mean "no key has been stored here yet" or "a key used to be stored here but no longer". That makes, e.g., deleting a key extremely cheap (just overwrite the key with a dummy value).



But without adding auxiliary data structures on top of that, there's no way to skip over the dummies without marching over them one at a time. Because Python uses a form of open addressing for collision resolution, and keeps the load factor under 2/3, at least a third of the vector's entries are dummies. the_vector[i] can be accessed in O(1) time, but really has no predictable relation to the i'th non-dummy entry.






share|improve this answer




















  • From my understanding of the >3.6 implementation, there are two vectors, the sparse-index array is where the open-addressing occurs, but that the actual entries vector is simply that, an array of the entries in order, with no dummies, no?
    – juanpa.arrivillaga
    3 hours ago







  • 4




    @juanpa.arrivillaga, it's more complicated - what isn't? ;-) There are "split" and "non-split" dicts under the covers, etc. For a regular old dict ("non-split"), deleting a key also sets the corresponding value slot to NULL, so same thing; you have to skip over the NULL values one at a time. See the loop in dictobject.c's dictiter_iternextkey(). Iterating over "the keys" actually iterates over the values, which are in insertion order, but can contain NULLs in arbitrary places. Once a none-NULL value is found, it contains a pointer back to the key.
    – Tim Peters
    3 hours ago











Your Answer





StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52507860%2faccessing-dictionary-items-by-position-in-python-3-6-efficiently%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
8
down vote



accepted










For an OrderedDict it's inherently O(n) because the ordering is recorded in a linked list.



For the builtin dict, there's a vector (a contiguous array) rather than a linked list, but pretty much the same thing in the end: the vector contains a few kind of "dummies", special internal values that mean "no key has been stored here yet" or "a key used to be stored here but no longer". That makes, e.g., deleting a key extremely cheap (just overwrite the key with a dummy value).



But without adding auxiliary data structures on top of that, there's no way to skip over the dummies without marching over them one at a time. Because Python uses a form of open addressing for collision resolution, and keeps the load factor under 2/3, at least a third of the vector's entries are dummies. the_vector[i] can be accessed in O(1) time, but really has no predictable relation to the i'th non-dummy entry.






share|improve this answer




















  • From my understanding of the >3.6 implementation, there are two vectors, the sparse-index array is where the open-addressing occurs, but that the actual entries vector is simply that, an array of the entries in order, with no dummies, no?
    – juanpa.arrivillaga
    3 hours ago







  • 4




    @juanpa.arrivillaga, it's more complicated - what isn't? ;-) There are "split" and "non-split" dicts under the covers, etc. For a regular old dict ("non-split"), deleting a key also sets the corresponding value slot to NULL, so same thing; you have to skip over the NULL values one at a time. See the loop in dictobject.c's dictiter_iternextkey(). Iterating over "the keys" actually iterates over the values, which are in insertion order, but can contain NULLs in arbitrary places. Once a none-NULL value is found, it contains a pointer back to the key.
    – Tim Peters
    3 hours ago















up vote
8
down vote



accepted










For an OrderedDict it's inherently O(n) because the ordering is recorded in a linked list.



For the builtin dict, there's a vector (a contiguous array) rather than a linked list, but pretty much the same thing in the end: the vector contains a few kind of "dummies", special internal values that mean "no key has been stored here yet" or "a key used to be stored here but no longer". That makes, e.g., deleting a key extremely cheap (just overwrite the key with a dummy value).



But without adding auxiliary data structures on top of that, there's no way to skip over the dummies without marching over them one at a time. Because Python uses a form of open addressing for collision resolution, and keeps the load factor under 2/3, at least a third of the vector's entries are dummies. the_vector[i] can be accessed in O(1) time, but really has no predictable relation to the i'th non-dummy entry.






share|improve this answer




















  • From my understanding of the >3.6 implementation, there are two vectors, the sparse-index array is where the open-addressing occurs, but that the actual entries vector is simply that, an array of the entries in order, with no dummies, no?
    – juanpa.arrivillaga
    3 hours ago







  • 4




    @juanpa.arrivillaga, it's more complicated - what isn't? ;-) There are "split" and "non-split" dicts under the covers, etc. For a regular old dict ("non-split"), deleting a key also sets the corresponding value slot to NULL, so same thing; you have to skip over the NULL values one at a time. See the loop in dictobject.c's dictiter_iternextkey(). Iterating over "the keys" actually iterates over the values, which are in insertion order, but can contain NULLs in arbitrary places. Once a none-NULL value is found, it contains a pointer back to the key.
    – Tim Peters
    3 hours ago













up vote
8
down vote



accepted







up vote
8
down vote



accepted






For an OrderedDict it's inherently O(n) because the ordering is recorded in a linked list.



For the builtin dict, there's a vector (a contiguous array) rather than a linked list, but pretty much the same thing in the end: the vector contains a few kind of "dummies", special internal values that mean "no key has been stored here yet" or "a key used to be stored here but no longer". That makes, e.g., deleting a key extremely cheap (just overwrite the key with a dummy value).



But without adding auxiliary data structures on top of that, there's no way to skip over the dummies without marching over them one at a time. Because Python uses a form of open addressing for collision resolution, and keeps the load factor under 2/3, at least a third of the vector's entries are dummies. the_vector[i] can be accessed in O(1) time, but really has no predictable relation to the i'th non-dummy entry.






share|improve this answer












For an OrderedDict it's inherently O(n) because the ordering is recorded in a linked list.



For the builtin dict, there's a vector (a contiguous array) rather than a linked list, but pretty much the same thing in the end: the vector contains a few kind of "dummies", special internal values that mean "no key has been stored here yet" or "a key used to be stored here but no longer". That makes, e.g., deleting a key extremely cheap (just overwrite the key with a dummy value).



But without adding auxiliary data structures on top of that, there's no way to skip over the dummies without marching over them one at a time. Because Python uses a form of open addressing for collision resolution, and keeps the load factor under 2/3, at least a third of the vector's entries are dummies. the_vector[i] can be accessed in O(1) time, but really has no predictable relation to the i'th non-dummy entry.







share|improve this answer












share|improve this answer



share|improve this answer










answered 3 hours ago









Tim Peters

40.3k66890




40.3k66890











  • From my understanding of the >3.6 implementation, there are two vectors, the sparse-index array is where the open-addressing occurs, but that the actual entries vector is simply that, an array of the entries in order, with no dummies, no?
    – juanpa.arrivillaga
    3 hours ago







  • 4




    @juanpa.arrivillaga, it's more complicated - what isn't? ;-) There are "split" and "non-split" dicts under the covers, etc. For a regular old dict ("non-split"), deleting a key also sets the corresponding value slot to NULL, so same thing; you have to skip over the NULL values one at a time. See the loop in dictobject.c's dictiter_iternextkey(). Iterating over "the keys" actually iterates over the values, which are in insertion order, but can contain NULLs in arbitrary places. Once a none-NULL value is found, it contains a pointer back to the key.
    – Tim Peters
    3 hours ago

















  • From my understanding of the >3.6 implementation, there are two vectors, the sparse-index array is where the open-addressing occurs, but that the actual entries vector is simply that, an array of the entries in order, with no dummies, no?
    – juanpa.arrivillaga
    3 hours ago







  • 4




    @juanpa.arrivillaga, it's more complicated - what isn't? ;-) There are "split" and "non-split" dicts under the covers, etc. For a regular old dict ("non-split"), deleting a key also sets the corresponding value slot to NULL, so same thing; you have to skip over the NULL values one at a time. See the loop in dictobject.c's dictiter_iternextkey(). Iterating over "the keys" actually iterates over the values, which are in insertion order, but can contain NULLs in arbitrary places. Once a none-NULL value is found, it contains a pointer back to the key.
    – Tim Peters
    3 hours ago
















From my understanding of the >3.6 implementation, there are two vectors, the sparse-index array is where the open-addressing occurs, but that the actual entries vector is simply that, an array of the entries in order, with no dummies, no?
– juanpa.arrivillaga
3 hours ago





From my understanding of the >3.6 implementation, there are two vectors, the sparse-index array is where the open-addressing occurs, but that the actual entries vector is simply that, an array of the entries in order, with no dummies, no?
– juanpa.arrivillaga
3 hours ago





4




4




@juanpa.arrivillaga, it's more complicated - what isn't? ;-) There are "split" and "non-split" dicts under the covers, etc. For a regular old dict ("non-split"), deleting a key also sets the corresponding value slot to NULL, so same thing; you have to skip over the NULL values one at a time. See the loop in dictobject.c's dictiter_iternextkey(). Iterating over "the keys" actually iterates over the values, which are in insertion order, but can contain NULLs in arbitrary places. Once a none-NULL value is found, it contains a pointer back to the key.
– Tim Peters
3 hours ago





@juanpa.arrivillaga, it's more complicated - what isn't? ;-) There are "split" and "non-split" dicts under the covers, etc. For a regular old dict ("non-split"), deleting a key also sets the corresponding value slot to NULL, so same thing; you have to skip over the NULL values one at a time. See the loop in dictobject.c's dictiter_iternextkey(). Iterating over "the keys" actually iterates over the values, which are in insertion order, but can contain NULLs in arbitrary places. Once a none-NULL value is found, it contains a pointer back to the key.
– Tim Peters
3 hours ago


















 

draft saved


draft discarded















































 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52507860%2faccessing-dictionary-items-by-position-in-python-3-6-efficiently%23new-answer', 'question_page');

);

Post as a guest













































































Comments

Popular posts from this blog

Long meetings (6-7 hours a day): Being “babysat” by supervisor

Is the Concept of Multiple Fantasy Races Scientifically Flawed? [closed]

Confectionery