Speed up Flatten[] of a large nested list

I have a large jagged list, that is, each sub-list has a different length. I would like to Flatten this list so that I can pass it to Histogram, but flattening seems to take an inordinate amount of time and memory.

jaggedList = Table[RandomReal[1, RandomSample[Range[400000, 800000], 1]], {n, 100}];


Just to illustrate, here are the lengths of the elements of the main list:

ListPlot[Length /@ jaggedList]

(Plot: list lengths)



A full Flatten takes a long time. My real data is several times larger, so it gets painfully slow:

fullFlatten = Flatten@jaggedList; // AbsoluteTiming
{10.0055, Null}


I noticed that flattening each (non-jagged) sub-list on its own is not a problem:

partialFlatten = Flatten /@ jaggedList; // AbsoluteTiming
{0.289219, Null}


Memory usage of the fully flattened result is huge, roughly three times that of the other two, even though the number of elements is the same:

ByteCount /@ {fullFlatten, partialFlatten, jaggedList}
{1460378864, 486808224, 486808224}


I would really appreciate any tips on what I can change to make this faster and more memory-efficient!










      list-manipulation






asked 1 hour ago by Anatoly




















          2 Answers



            accepted






Applying Join is much faster than Flatten:

SeedRandom[1]
jaggedList = Table[RandomReal[1, RandomSample[Range[400000, 800000], 1]], {n, 100}];

fullFlatten = Flatten@jaggedList; // AbsoluteTiming // First

8.2375848

fullFlatten2 = Join @@ jaggedList; // AbsoluteTiming // First

0.29729

fullFlatten2 == fullFlatten

True

ByteCount /@ {fullFlatten, fullFlatten2, jaggedList}

{1462957016, 487652456, 487667608}
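
To try the comparison without waiting on the full-size data, it can be sketched on a much smaller jagged list. This is only a sketch under assumptions: the names small, viaFlatten, viaJoin and repacked are made up for illustration, and whether Flatten actually unpacks may depend on the Mathematica version. Developer`PackedArrayQ reports whether a result is stored as a packed array, and Developer`ToPackedArray packs a list that has already been unpacked.

```mathematica
(* Hypothetical small test case: 20 sub-lists of 1000 to 2000 reals each *)
SeedRandom[1];
small = Table[RandomReal[1, RandomInteger[{1000, 2000}]], {20}];

viaFlatten = Flatten[small];
viaJoin = Join @@ small;

(* Both approaches should contain the same numbers in the same order *)
viaFlatten == viaJoin

(* Join keeps the data packed; Flatten may not, depending on the version *)
Developer`PackedArrayQ /@ {viaFlatten, viaJoin}

(* A list that has already been unpacked can be packed again *)
repacked = Developer`ToPackedArray[viaFlatten];
Developer`PackedArrayQ[repacked]

(* Compare the memory footprint of the three results *)
ByteCount /@ {viaFlatten, viaJoin, repacked}
```

Note that Join @@ removes only one level of nesting, so it matches a full Flatten here only because the list is nested exactly two levels deep.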







answered 1 hour ago by kglr











• Super appreciate it, that's exactly what I needed! Now I just need to speed up the Histogram!
  – Anatoly, 1 hour ago
















The difference between Flatten and Join (as used in @kglr's answer) is that Flatten unpacks the packed sub-lists. Here is a smaller example:

SeedRandom[1]
list = Table[RandomReal[1, RandomSample[2;;5, 1]], 3]

{{0.269558, 0.445678, 0.158104, 0.751213, 0.965444}, {0.0518202, 0.675946, 0.698472}, {0.344389, 0.830322, 0.556863}}

Turn on packing messages:

On["Packing"]

Then, using Flatten:

Flatten[list]

Developer`FromPackedArray::unpack: Unpacking array in call to HoldForm.

Developer`FromPackedArray::punpack: Unpacking array with dimensions {5} in call to Flatten.

Developer`FromPackedArray::unpack: Unpacking array in call to HoldForm.

Developer`FromPackedArray::punpack: Unpacking array with dimensions {3} in call to Flatten.

Developer`FromPackedArray::unpack: Unpacking array in call to HoldForm.

General::stop: Further output of Developer`FromPackedArray::unpack will be suppressed during this calculation.

Developer`FromPackedArray::punpack: Unpacking array with dimensions {3} in call to Flatten.

General::stop: Further output of Developer`FromPackedArray::punpack will be suppressed during this calculation.

{0.269558, 0.445678, 0.158104, 0.751213, 0.965444, 0.0518202, 0.675946, 0.698472, 0.344389, 0.830322, 0.556863}

and using Join:

Join @@ list

{0.269558, 0.445678, 0.158104, 0.751213, 0.965444, 0.0518202, 0.675946, 0.698472, 0.344389, 0.830322, 0.556863}

As you can see, Join generates no unpacking messages, so the data stays packed, which is why it is much faster.
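
Unpacking would also account for the memory figures in the question. As a rough sketch (the names packed and unpacked are illustrative, and exact byte counts can vary with version and platform), the per-element cost of the two representations can be compared directly:

```mathematica
(* RandomReal returns a packed array of machine reals *)
packed = RandomReal[1, 10^6];

(* Force the unpacked representation of the same numbers *)
unpacked = Developer`FromPackedArray[packed];

Developer`PackedArrayQ /@ {packed, unpacked}

(* Approximate bytes per element for each representation *)
N[ByteCount /@ {packed, unpacked}/Length[packed]]

(* Switch the diagnostic messages off again when done *)
Off["Packing"]
```

On a 64-bit system this typically comes out at about 8 bytes per element for the packed array and about three times that for the unpacked list, which would be consistent with the threefold ByteCount ratio reported in the question.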







answered 1 hour ago by Carl Woll



























                     
