List generation speed change at ~800 iterations

up vote
6
down vote

favorite

I recently have been developing a project for my students to explore some functionality of the Wolfram Language and have requested that they investigate the timing for several different methods of list generation. Here are the techniques i've asked them to explore:

p1 = 
 ListPlot[
 Table[First[Table[Prime[n], n, 1, k] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Black];
p2 = 
 ListPlot[
 Table[First[Prime[Range[k]] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Red];
p3 = 
 ListPlot[
 Table[First[Reap[Do[Sow[Prime[n]], n, k]] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Green];
p4 = 
 ListPlot[
 Table[
 First[Reap[For[n = 1, n <= k, n++, Sow[Prime[n]]]] // AbsoluteTiming], 
 k, 1, 200000, 1000], 
 PlotStyle -> Blue];
p5 = 
 ListPlot[
 Table[First[Array[Prime[#] &, k] // AbsoluteTiming], k, 1, 200000,1000], 
 PlotStyle -> Orange];

Show[p1, p2, p3, p4, p5]

The result is the following graph:

Plot of timing as a function of iterations

Is there something special happening around $n=80000$? Looks like the rate of change of cost for creating a list is abruptly changing near this value.

edited 5 hours ago

m_goldberg

82.2k869190

asked 7 hours ago

JEM

462313

3

Using Prime as test is somewhat atypical. It would provide much more insight using a function that is also vectorized such as f = Sin[N[#]] &;. I guess that Prime shifts from a very cheap algorithm for small primes (maybe a lookup table) to a more general, but also more expensive algorithm. The fact that the performance curves look continuous tells me that this switching is either well-balanced or that simply an additional effort has to be made for primes after the 80k-th one. Btw., Array[Prime, k] will speed up p5 a little.
â€“Â Henrik Schumacher
7 hours ago

1

@HenrikSchumacher I replaced Prime with Sin and no change in slope is visible.
â€“Â mikado
7 hours ago

@mikado Right. That was my point. Prime does not show how Mathematica works in general. I am not sure whether any real computations are done at all with Prime in this range of numbers; maybe only two different file formats for the storage of primes are used: one with low compression for the first 80000 primes and one with higher compression (and thus slower access) for the next 120000 primes.
â€“Â Henrik Schumacher
7 hours ago

Can this be the point where compilation no longer works because the numbers involved become too large to be represent by machine integers?
â€“Â Sjoerd Smit
7 hours ago

@SjoerdSmit I don't think so: Mathematica's integers are 64 bit long. N[Developer`$MaxMachineInteger/Prime[80000]] returns almost 10^13. So the machine integer border is far, far away.
â€“Â Henrik Schumacher
6 hours ago

Â |Â
show 2 more comments

up vote
6
down vote

favorite

p1 = 
 ListPlot[
 Table[First[Table[Prime[n], n, 1, k] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Black];
p2 = 
 ListPlot[
 Table[First[Prime[Range[k]] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Red];
p3 = 
 ListPlot[
 Table[First[Reap[Do[Sow[Prime[n]], n, k]] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Green];
p4 = 
 ListPlot[
 Table[
 First[Reap[For[n = 1, n <= k, n++, Sow[Prime[n]]]] // AbsoluteTiming], 
 k, 1, 200000, 1000], 
 PlotStyle -> Blue];
p5 = 
 ListPlot[
 Table[First[Array[Prime[#] &, k] // AbsoluteTiming], k, 1, 200000,1000], 
 PlotStyle -> Orange];

Show[p1, p2, p3, p4, p5]

The result is the following graph:

Plot of timing as a function of iterations

Is there something special happening around $n=80000$? Looks like the rate of change of cost for creating a list is abruptly changing near this value.

edited 5 hours ago

m_goldberg

82.2k869190

asked 7 hours ago

JEM

462313

3

Using Prime as test is somewhat atypical. It would provide much more insight using a function that is also vectorized such as f = Sin[N[#]] &;. I guess that Prime shifts from a very cheap algorithm for small primes (maybe a lookup table) to a more general, but also more expensive algorithm. The fact that the performance curves look continuous tells me that this switching is either well-balanced or that simply an additional effort has to be made for primes after the 80k-th one. Btw., Array[Prime, k] will speed up p5 a little.
â€“Â Henrik Schumacher
7 hours ago

1

@HenrikSchumacher I replaced Prime with Sin and no change in slope is visible.
â€“Â mikado
7 hours ago

@mikado Right. That was my point. Prime does not show how Mathematica works in general. I am not sure whether any real computations are done at all with Prime in this range of numbers; maybe only two different file formats for the storage of primes are used: one with low compression for the first 80000 primes and one with higher compression (and thus slower access) for the next 120000 primes.
â€“Â Henrik Schumacher
7 hours ago

Can this be the point where compilation no longer works because the numbers involved become too large to be represent by machine integers?
â€“Â Sjoerd Smit
7 hours ago

@SjoerdSmit I don't think so: Mathematica's integers are 64 bit long. N[Developer`$MaxMachineInteger/Prime[80000]] returns almost 10^13. So the machine integer border is far, far away.
â€“Â Henrik Schumacher
6 hours ago

Â |Â
show 2 more comments

up vote
6
down vote

favorite

p1 = 
 ListPlot[
 Table[First[Table[Prime[n], n, 1, k] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Black];
p2 = 
 ListPlot[
 Table[First[Prime[Range[k]] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Red];
p3 = 
 ListPlot[
 Table[First[Reap[Do[Sow[Prime[n]], n, k]] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Green];
p4 = 
 ListPlot[
 Table[
 First[Reap[For[n = 1, n <= k, n++, Sow[Prime[n]]]] // AbsoluteTiming], 
 k, 1, 200000, 1000], 
 PlotStyle -> Blue];
p5 = 
 ListPlot[
 Table[First[Array[Prime[#] &, k] // AbsoluteTiming], k, 1, 200000,1000], 
 PlotStyle -> Orange];

Show[p1, p2, p3, p4, p5]

The result is the following graph:

Plot of timing as a function of iterations

Is there something special happening around $n=80000$? Looks like the rate of change of cost for creating a list is abruptly changing near this value.

edited 5 hours ago

m_goldberg

82.2k869190

asked 7 hours ago

JEM

462313

p1 = 
 ListPlot[
 Table[First[Table[Prime[n], n, 1, k] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Black];
p2 = 
 ListPlot[
 Table[First[Prime[Range[k]] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Red];
p3 = 
 ListPlot[
 Table[First[Reap[Do[Sow[Prime[n]], n, k]] // AbsoluteTiming], k, 1, 200000, 1000], 
 PlotStyle -> Green];
p4 = 
 ListPlot[
 Table[
 First[Reap[For[n = 1, n <= k, n++, Sow[Prime[n]]]] // AbsoluteTiming], 
 k, 1, 200000, 1000], 
 PlotStyle -> Blue];
p5 = 
 ListPlot[
 Table[First[Array[Prime[#] &, k] // AbsoluteTiming], k, 1, 200000,1000], 
 PlotStyle -> Orange];

Show[p1, p2, p3, p4, p5]

The result is the following graph:

Plot of timing as a function of iterations

Is there something special happening around $n=80000$? Looks like the rate of change of cost for creating a list is abruptly changing near this value.

list-manipulation timing

edited 5 hours ago

m_goldberg

82.2k869190

asked 7 hours ago

JEM

462313

edited 5 hours ago

m_goldberg

82.2k869190

asked 7 hours ago

JEM

462313

edited 5 hours ago

m_goldberg

82.2k869190

edited 5 hours ago

m_goldberg

82.2k869190

edited 5 hours ago

m_goldberg

82.2k869190

asked 7 hours ago

JEM

462313

asked 7 hours ago

JEM

462313

asked 7 hours ago

JEM

462313

3

Using Prime as test is somewhat atypical. It would provide much more insight using a function that is also vectorized such as f = Sin[N[#]] &;. I guess that Prime shifts from a very cheap algorithm for small primes (maybe a lookup table) to a more general, but also more expensive algorithm. The fact that the performance curves look continuous tells me that this switching is either well-balanced or that simply an additional effort has to be made for primes after the 80k-th one. Btw., Array[Prime, k] will speed up p5 a little.
â€“Â Henrik Schumacher
7 hours ago

1

@HenrikSchumacher I replaced Prime with Sin and no change in slope is visible.
â€“Â mikado
7 hours ago

@mikado Right. That was my point. Prime does not show how Mathematica works in general. I am not sure whether any real computations are done at all with Prime in this range of numbers; maybe only two different file formats for the storage of primes are used: one with low compression for the first 80000 primes and one with higher compression (and thus slower access) for the next 120000 primes.
â€“Â Henrik Schumacher
7 hours ago

Can this be the point where compilation no longer works because the numbers involved become too large to be represent by machine integers?
â€“Â Sjoerd Smit
7 hours ago

@SjoerdSmit I don't think so: Mathematica's integers are 64 bit long. N[Developer`$MaxMachineInteger/Prime[80000]] returns almost 10^13. So the machine integer border is far, far away.
â€“Â Henrik Schumacher
6 hours ago

Â |Â
show 2 more comments

3

Using Prime as test is somewhat atypical. It would provide much more insight using a function that is also vectorized such as f = Sin[N[#]] &;. I guess that Prime shifts from a very cheap algorithm for small primes (maybe a lookup table) to a more general, but also more expensive algorithm. The fact that the performance curves look continuous tells me that this switching is either well-balanced or that simply an additional effort has to be made for primes after the 80k-th one. Btw., Array[Prime, k] will speed up p5 a little.
â€“Â Henrik Schumacher
7 hours ago

1

@HenrikSchumacher I replaced Prime with Sin and no change in slope is visible.
â€“Â mikado
7 hours ago

@mikado Right. That was my point. Prime does not show how Mathematica works in general. I am not sure whether any real computations are done at all with Prime in this range of numbers; maybe only two different file formats for the storage of primes are used: one with low compression for the first 80000 primes and one with higher compression (and thus slower access) for the next 120000 primes.
â€“Â Henrik Schumacher
7 hours ago

Can this be the point where compilation no longer works because the numbers involved become too large to be represent by machine integers?
â€“Â Sjoerd Smit
7 hours ago

@SjoerdSmit I don't think so: Mathematica's integers are 64 bit long. N[Developer`$MaxMachineInteger/Prime[80000]] returns almost 10^13. So the machine integer border is far, far away.
â€“Â Henrik Schumacher
6 hours ago

Using Prime as test is somewhat atypical. It would provide much more insight using a function that is also vectorized such as f = Sin[N[#]] &;. I guess that Prime shifts from a very cheap algorithm for small primes (maybe a lookup table) to a more general, but also more expensive algorithm. The fact that the performance curves look continuous tells me that this switching is either well-balanced or that simply an additional effort has to be made for primes after the 80k-th one. Btw., Array[Prime, k] will speed up p5 a little.
â€“Â Henrik Schumacher
7 hours ago

@HenrikSchumacher I replaced Prime with Sin and no change in slope is visible.
â€“Â mikado
7 hours ago

@mikado Right. That was my point. Prime does not show how Mathematica works in general. I am not sure whether any real computations are done at all with Prime in this range of numbers; maybe only two different file formats for the storage of primes are used: one with low compression for the first 80000 primes and one with higher compression (and thus slower access) for the next 120000 primes.
â€“Â Henrik Schumacher
7 hours ago

Can this be the point where compilation no longer works because the numbers involved become too large to be represent by machine integers?
â€“Â Sjoerd Smit
7 hours ago

@SjoerdSmit I don't think so: Mathematica's integers are 64 bit long. N[Developer`$MaxMachineInteger/Prime[80000]] returns almost 10^13. So the machine integer border is far, far away.
â€“Â Henrik Schumacher
6 hours ago

Â |Â
show 2 more comments

1 Answer
1

active

oldest

votes

up vote
3
down vote

Regarding this statement

Looks like the rate of change of cost for creating a list is abruptly changing near this value.

I have a different opinion. We can use RepeatedTiming to get a somewhat consistent measurement and check the runtime for one single Prime call taking the list creation out of the game

data1 = Table[i, First[RepeatedTiming[Prime[i]]], i, 1, 100000, 
 100];
ListLinePlot[data1]

Mathematica graphics

So we see the same jump, and I believe it is a switch from one algorithm to the next in calculating the primes. On closer inspection, you will find that the position of the jump appears to be precisely at n=78499. Why? I guess the reason is that we reached one million:

Prime[78498]
Prime[78499]
(* 999983 *)
(* 1000003 *)

In the documentation we find

Prime and PrimePi use sparse caching and sieving. For large n, the LagariasÃ¢Â€Â“MillerÃ¢Â€Â“Odlyzko algorithm for PrimePi is used, based on asymptotic estimates of the density of primes, and is inverted to give Prime.

So let's assume for n<78499 Mathematica uses caching. Then getting a result, mostly depends on how fast the lookup is. Of course, you need to take into account the real smart users, who, when using Prime[n] definitely will request Prime[n+1] (or 2,3,...).

Let us make an experiment. Instead of using RepeatedTiming, we will simply access dn primes and measure the time. So we will touch each prime but we measure in chunks:

With[dn = 100,
 dataDoForward = 
 Table[i, First[AbsoluteTiming[Do[Prime[j], j, i, i + dn]]], i,
 1, 150000, dn]
];
ListLinePlot[dataDoForward, PlotRange -> All]

Mathematica graphics

No surprise here. The same jump. However, what happens if we access the primes in the inner Do in reverse order?

With[dn = 100,
 dataDoReverse = 
 Table[i, 
 First[AbsoluteTiming[Do[Prime[j], j, i + dn, i, -1]]], i, 1, 
 150000, dn]
 ];
ListLinePlot[dataDoReverse, PlotRange -> All]

Mathematica graphics

It seems the Wolfram Developers are counting on that we are somewhat reasonable people that don't do these stunts. Please pay particular attention to the order of the difference in time. For 100 primes around n=75000 we need under 0.1 microseconds, while the reverse access is almost 1000x slower.

So my conclusion without official information is that primes under 1 million are heavily cached and accessing them loads some larger primes so that the next access is faster. For lager primes, an algorithm is used that really calculates the result. This seems stable in timing, no matter if you access chunks of primes forward or backwards

ListLinePlot[dataDoForward, dataDoReverse, 
 PlotRange -> Automatic, 0, 0.0005,
 PlotLegends -> "Forward Access", "Backward Access"
 ]

Mathematica graphics

I cannot explain, why the forward access method is a bit slower at the beginning for primes larger 1 million, but it is possible that some caches need to be cleared. This however is a topic for someone who actually knows what Wolfram is doing under the hood.

answered 1 hour ago

halirutanâ™¦

93k5212405

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "387"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f182427%2flist-generation-speed-change-at-800-iterations%23new-answer', 'question_page');

);

Post as a guest

Name

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
3
down vote

Regarding this statement

Looks like the rate of change of cost for creating a list is abruptly changing near this value.

I have a different opinion. We can use RepeatedTiming to get a somewhat consistent measurement and check the runtime for one single Prime call taking the list creation out of the game

data1 = Table[i, First[RepeatedTiming[Prime[i]]], i, 1, 100000, 
 100];
ListLinePlot[data1]

Mathematica graphics

Prime[78498]
Prime[78499]
(* 999983 *)
(* 1000003 *)

In the documentation we find

Prime and PrimePi use sparse caching and sieving. For large n, the LagariasÃ¢Â€Â“MillerÃ¢Â€Â“Odlyzko algorithm for PrimePi is used, based on asymptotic estimates of the density of primes, and is inverted to give Prime.

Let us make an experiment. Instead of using RepeatedTiming, we will simply access dn primes and measure the time. So we will touch each prime but we measure in chunks:

With[dn = 100,
 dataDoForward = 
 Table[i, First[AbsoluteTiming[Do[Prime[j], j, i, i + dn]]], i,
 1, 150000, dn]
];
ListLinePlot[dataDoForward, PlotRange -> All]

Mathematica graphics

No surprise here. The same jump. However, what happens if we access the primes in the inner Do in reverse order?

With[dn = 100,
 dataDoReverse = 
 Table[i, 
 First[AbsoluteTiming[Do[Prime[j], j, i + dn, i, -1]]], i, 1, 
 150000, dn]
 ];
ListLinePlot[dataDoReverse, PlotRange -> All]

Mathematica graphics

ListLinePlot[dataDoForward, dataDoReverse, 
 PlotRange -> Automatic, 0, 0.0005,
 PlotLegends -> "Forward Access", "Backward Access"
 ]

Mathematica graphics

answered 1 hour ago

halirutanâ™¦

93k5212405

add a commentÂ |Â

up vote
3
down vote

Regarding this statement

Looks like the rate of change of cost for creating a list is abruptly changing near this value.

I have a different opinion. We can use RepeatedTiming to get a somewhat consistent measurement and check the runtime for one single Prime call taking the list creation out of the game

data1 = Table[i, First[RepeatedTiming[Prime[i]]], i, 1, 100000, 
 100];
ListLinePlot[data1]

Mathematica graphics

Prime[78498]
Prime[78499]
(* 999983 *)
(* 1000003 *)

In the documentation we find

Prime and PrimePi use sparse caching and sieving. For large n, the LagariasÃ¢Â€Â“MillerÃ¢Â€Â“Odlyzko algorithm for PrimePi is used, based on asymptotic estimates of the density of primes, and is inverted to give Prime.

Let us make an experiment. Instead of using RepeatedTiming, we will simply access dn primes and measure the time. So we will touch each prime but we measure in chunks:

With[dn = 100,
 dataDoForward = 
 Table[i, First[AbsoluteTiming[Do[Prime[j], j, i, i + dn]]], i,
 1, 150000, dn]
];
ListLinePlot[dataDoForward, PlotRange -> All]

Mathematica graphics

No surprise here. The same jump. However, what happens if we access the primes in the inner Do in reverse order?

With[dn = 100,
 dataDoReverse = 
 Table[i, 
 First[AbsoluteTiming[Do[Prime[j], j, i + dn, i, -1]]], i, 1, 
 150000, dn]
 ];
ListLinePlot[dataDoReverse, PlotRange -> All]

Mathematica graphics

ListLinePlot[dataDoForward, dataDoReverse, 
 PlotRange -> Automatic, 0, 0.0005,
 PlotLegends -> "Forward Access", "Backward Access"
 ]

Mathematica graphics

answered 1 hour ago

halirutanâ™¦

93k5212405

add a commentÂ |Â

up vote
3
down vote

Regarding this statement

Looks like the rate of change of cost for creating a list is abruptly changing near this value.

I have a different opinion. We can use RepeatedTiming to get a somewhat consistent measurement and check the runtime for one single Prime call taking the list creation out of the game

data1 = Table[i, First[RepeatedTiming[Prime[i]]], i, 1, 100000, 
 100];
ListLinePlot[data1]

Mathematica graphics

Prime[78498]
Prime[78499]
(* 999983 *)
(* 1000003 *)

In the documentation we find

Prime and PrimePi use sparse caching and sieving. For large n, the LagariasÃ¢Â€Â“MillerÃ¢Â€Â“Odlyzko algorithm for PrimePi is used, based on asymptotic estimates of the density of primes, and is inverted to give Prime.

Let us make an experiment. Instead of using RepeatedTiming, we will simply access dn primes and measure the time. So we will touch each prime but we measure in chunks:

With[dn = 100,
 dataDoForward = 
 Table[i, First[AbsoluteTiming[Do[Prime[j], j, i, i + dn]]], i,
 1, 150000, dn]
];
ListLinePlot[dataDoForward, PlotRange -> All]

Mathematica graphics

No surprise here. The same jump. However, what happens if we access the primes in the inner Do in reverse order?

With[dn = 100,
 dataDoReverse = 
 Table[i, 
 First[AbsoluteTiming[Do[Prime[j], j, i + dn, i, -1]]], i, 1, 
 150000, dn]
 ];
ListLinePlot[dataDoReverse, PlotRange -> All]

Mathematica graphics

ListLinePlot[dataDoForward, dataDoReverse, 
 PlotRange -> Automatic, 0, 0.0005,
 PlotLegends -> "Forward Access", "Backward Access"
 ]

Mathematica graphics

answered 1 hour ago

halirutanâ™¦

93k5212405

Regarding this statement

Looks like the rate of change of cost for creating a list is abruptly changing near this value.

I have a different opinion. We can use RepeatedTiming to get a somewhat consistent measurement and check the runtime for one single Prime call taking the list creation out of the game

data1 = Table[i, First[RepeatedTiming[Prime[i]]], i, 1, 100000, 
 100];
ListLinePlot[data1]

Mathematica graphics

Prime[78498]
Prime[78499]
(* 999983 *)
(* 1000003 *)

In the documentation we find

Prime and PrimePi use sparse caching and sieving. For large n, the LagariasÃ¢Â€Â“MillerÃ¢Â€Â“Odlyzko algorithm for PrimePi is used, based on asymptotic estimates of the density of primes, and is inverted to give Prime.

Let us make an experiment. Instead of using RepeatedTiming, we will simply access dn primes and measure the time. So we will touch each prime but we measure in chunks:

With[dn = 100,
 dataDoForward = 
 Table[i, First[AbsoluteTiming[Do[Prime[j], j, i, i + dn]]], i,
 1, 150000, dn]
];
ListLinePlot[dataDoForward, PlotRange -> All]

Mathematica graphics

No surprise here. The same jump. However, what happens if we access the primes in the inner Do in reverse order?

With[dn = 100,
 dataDoReverse = 
 Table[i, 
 First[AbsoluteTiming[Do[Prime[j], j, i + dn, i, -1]]], i, 1, 
 150000, dn]
 ];
ListLinePlot[dataDoReverse, PlotRange -> All]

Mathematica graphics

ListLinePlot[dataDoForward, dataDoReverse, 
 PlotRange -> Automatic, 0, 0.0005,
 PlotLegends -> "Forward Access", "Backward Access"
 ]

Mathematica graphics

answered 1 hour ago

halirutanâ™¦

93k5212405

answered 1 hour ago

halirutanâ™¦

93k5212405

answered 1 hour ago

halirutanâ™¦

93k5212405

answered 1 hour ago

halirutanâ™¦

93k5212405

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Search This Blog

Iyfjky