Java 8 Stream API - Does any stateful intermediate operation guarantee a new source collection?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
21
down vote

favorite
5












Is the following statement true? (source and source - they seem to copy from each other or come from the same source).




The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state.




I have tested Stream::sorted as a snippet from sources above:



final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

list.stream()
.filter(i -> i > 5)
.sorted()
.forEach(list::remove);

System.out.println(list); // prints correctly [0, 1, 2, 3, 4, 5]


It works. I replaced Stream::sorted with Stream::distinct, Stream::limit and Stream::skip:



final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

list.stream()
.filter(i -> i > 5)
.distinct()
.forEach(list::remove);


For my surprise, the NullPointerException is thrown. All the tested methods follow the stateful intermediate operation characteristics. Yet, this unique behavior of Stream::sorted is not documented nor the Stream operations and pipelines part explains whether the stateful intermediate operations really guarantee a new source collection.



Where my confusion comes from and what is the explanation of the behavior above?










share|improve this question



























    up vote
    21
    down vote

    favorite
    5












    Is the following statement true? (source and source - they seem to copy from each other or come from the same source).




    The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state.




    I have tested Stream::sorted as a snippet from sources above:



    final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

    list.stream()
    .filter(i -> i > 5)
    .sorted()
    .forEach(list::remove);

    System.out.println(list); // prints correctly [0, 1, 2, 3, 4, 5]


    It works. I replaced Stream::sorted with Stream::distinct, Stream::limit and Stream::skip:



    final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

    list.stream()
    .filter(i -> i > 5)
    .distinct()
    .forEach(list::remove);


    For my surprise, the NullPointerException is thrown. All the tested methods follow the stateful intermediate operation characteristics. Yet, this unique behavior of Stream::sorted is not documented nor the Stream operations and pipelines part explains whether the stateful intermediate operations really guarantee a new source collection.



    Where my confusion comes from and what is the explanation of the behavior above?










    share|improve this question

























      up vote
      21
      down vote

      favorite
      5









      up vote
      21
      down vote

      favorite
      5






      5





      Is the following statement true? (source and source - they seem to copy from each other or come from the same source).




      The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state.




      I have tested Stream::sorted as a snippet from sources above:



      final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

      list.stream()
      .filter(i -> i > 5)
      .sorted()
      .forEach(list::remove);

      System.out.println(list); // prints correctly [0, 1, 2, 3, 4, 5]


      It works. I replaced Stream::sorted with Stream::distinct, Stream::limit and Stream::skip:



      final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

      list.stream()
      .filter(i -> i > 5)
      .distinct()
      .forEach(list::remove);


      For my surprise, the NullPointerException is thrown. All the tested methods follow the stateful intermediate operation characteristics. Yet, this unique behavior of Stream::sorted is not documented nor the Stream operations and pipelines part explains whether the stateful intermediate operations really guarantee a new source collection.



      Where my confusion comes from and what is the explanation of the behavior above?










      share|improve this question















      Is the following statement true? (source and source - they seem to copy from each other or come from the same source).




      The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state.




      I have tested Stream::sorted as a snippet from sources above:



      final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

      list.stream()
      .filter(i -> i > 5)
      .sorted()
      .forEach(list::remove);

      System.out.println(list); // prints correctly [0, 1, 2, 3, 4, 5]


      It works. I replaced Stream::sorted with Stream::distinct, Stream::limit and Stream::skip:



      final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

      list.stream()
      .filter(i -> i > 5)
      .distinct()
      .forEach(list::remove);


      For my surprise, the NullPointerException is thrown. All the tested methods follow the stateful intermediate operation characteristics. Yet, this unique behavior of Stream::sorted is not documented nor the Stream operations and pipelines part explains whether the stateful intermediate operations really guarantee a new source collection.



      Where my confusion comes from and what is the explanation of the behavior above?







      java java-8 java-stream






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited yesterday

























      asked yesterday









      Nikolas

      10.9k52956




      10.9k52956






















          3 Answers
          3






          active

          oldest

          votes

















          up vote
          27
          down vote



          accepted










          The API documentation makes no such guarantee “that subsequent operations no longer operate on the backing collection”, hence, you should never rely on such a behavior of a particular implementation.



          Your example happens to do the desired thing by accident; there’s not even a guarantee that the List created by collect(Collectors.toList()) supports the remove operation.



          To show a counter-example



          Set<Integer> set = IntStream.range(0, 10).boxed()
          .collect(Collectors.toCollection(TreeSet::new));
          set.stream()
          .filter(i -> i > 5)
          .sorted()
          .forEach(set::remove);


          throws a ConcurrentModificationException. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach is explicitly performing the action in no specified order, hence, the sorting is unnecessary.



          There are other optimizations imaginable, e.g. sorted().findFirst() could get converted to a “find the minimum” operation, without the need to copy the element into a new storage for sorting.



          So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.






          share|improve this answer


















          • 9




            @Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: “we can only suggest that you never actually do modify a backing collection while consuming a stream”.
            – Holger
            yesterday

















          up vote
          6
          down vote













          Well sorted has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.



          This is not just about sorted per-se, but what other optimization can be done to the stream pipeline, so that sorted could be entirely skipped. For example:



          List<Integer> sortedList = IntStream.range(0, 10)
          .boxed()
          .collect(Collectors.toList());

          StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
          .sorted()
          .forEach(sortedList::remove); // fails with CME, thus no copying occurred


          Of course, sorted needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.



          distinct on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...






          share|improve this answer





























            up vote
            2
            down vote













            You shouldn't have brought up the cases with a terminal operation forEach(list::remove) because list::remove is an interfering function and it violates the "non-interference" principle for terminal actions.



            It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.



            I believe that list::remove is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach.






            share|improve this answer






















            • I am aware the first paragraph of your answer well, Andrew. Here I’d use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
              – Nikolas
              yesterday











            • @Nikolas no one pointed that list::remove is inappropriate here. I wanted to highlight it.
              – Andrew Tobilko
              yesterday










            Your Answer





            StackExchange.ifUsing("editor", function ()
            StackExchange.using("externalEditor", function ()
            StackExchange.using("snippets", function ()
            StackExchange.snippets.init();
            );
            );
            , "code-snippets");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "1"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            convertImagesToLinks: true,
            noModals: false,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













             

            draft saved


            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52272533%2fjava-8-stream-api-does-any-stateful-intermediate-operation-guarantee-a-new-sou%23new-answer', 'question_page');

            );

            Post as a guest






























            3 Answers
            3






            active

            oldest

            votes








            3 Answers
            3






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            27
            down vote



            accepted










            The API documentation makes no such guarantee “that subsequent operations no longer operate on the backing collection”, hence, you should never rely on such a behavior of a particular implementation.



            Your example happens to do the desired thing by accident; there’s not even a guarantee that the List created by collect(Collectors.toList()) supports the remove operation.



            To show a counter-example



            Set<Integer> set = IntStream.range(0, 10).boxed()
            .collect(Collectors.toCollection(TreeSet::new));
            set.stream()
            .filter(i -> i > 5)
            .sorted()
            .forEach(set::remove);


            throws a ConcurrentModificationException. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach is explicitly performing the action in no specified order, hence, the sorting is unnecessary.



            There are other optimizations imaginable, e.g. sorted().findFirst() could get converted to a “find the minimum” operation, without the need to copy the element into a new storage for sorting.



            So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.






            share|improve this answer


















            • 9




              @Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: “we can only suggest that you never actually do modify a backing collection while consuming a stream”.
              – Holger
              yesterday














            up vote
            27
            down vote



            accepted










            The API documentation makes no such guarantee “that subsequent operations no longer operate on the backing collection”, hence, you should never rely on such a behavior of a particular implementation.



            Your example happens to do the desired thing by accident; there’s not even a guarantee that the List created by collect(Collectors.toList()) supports the remove operation.



            To show a counter-example



            Set<Integer> set = IntStream.range(0, 10).boxed()
            .collect(Collectors.toCollection(TreeSet::new));
            set.stream()
            .filter(i -> i > 5)
            .sorted()
            .forEach(set::remove);


            throws a ConcurrentModificationException. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach is explicitly performing the action in no specified order, hence, the sorting is unnecessary.



            There are other optimizations imaginable, e.g. sorted().findFirst() could get converted to a “find the minimum” operation, without the need to copy the element into a new storage for sorting.



            So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.






            share|improve this answer


















            • 9




              @Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: “we can only suggest that you never actually do modify a backing collection while consuming a stream”.
              – Holger
              yesterday












            up vote
            27
            down vote



            accepted







            up vote
            27
            down vote



            accepted






            The API documentation makes no such guarantee “that subsequent operations no longer operate on the backing collection”, hence, you should never rely on such a behavior of a particular implementation.



            Your example happens to do the desired thing by accident; there’s not even a guarantee that the List created by collect(Collectors.toList()) supports the remove operation.



            To show a counter-example



            Set<Integer> set = IntStream.range(0, 10).boxed()
            .collect(Collectors.toCollection(TreeSet::new));
            set.stream()
            .filter(i -> i > 5)
            .sorted()
            .forEach(set::remove);


            throws a ConcurrentModificationException. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach is explicitly performing the action in no specified order, hence, the sorting is unnecessary.



            There are other optimizations imaginable, e.g. sorted().findFirst() could get converted to a “find the minimum” operation, without the need to copy the element into a new storage for sorting.



            So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.






            share|improve this answer














            The API documentation makes no such guarantee “that subsequent operations no longer operate on the backing collection”, hence, you should never rely on such a behavior of a particular implementation.



            Your example happens to do the desired thing by accident; there’s not even a guarantee that the List created by collect(Collectors.toList()) supports the remove operation.



            To show a counter-example



            Set<Integer> set = IntStream.range(0, 10).boxed()
            .collect(Collectors.toCollection(TreeSet::new));
            set.stream()
            .filter(i -> i > 5)
            .sorted()
            .forEach(set::remove);


            throws a ConcurrentModificationException. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach is explicitly performing the action in no specified order, hence, the sorting is unnecessary.



            There are other optimizations imaginable, e.g. sorted().findFirst() could get converted to a “find the minimum” operation, without the need to copy the element into a new storage for sorting.



            So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited yesterday

























            answered yesterday









            Holger

            150k20206390




            150k20206390







            • 9




              @Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: “we can only suggest that you never actually do modify a backing collection while consuming a stream”.
              – Holger
              yesterday












            • 9




              @Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: “we can only suggest that you never actually do modify a backing collection while consuming a stream”.
              – Holger
              yesterday







            9




            9




            @Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: “we can only suggest that you never actually do modify a backing collection while consuming a stream”.
            – Holger
            yesterday




            @Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: “we can only suggest that you never actually do modify a backing collection while consuming a stream”.
            – Holger
            yesterday












            up vote
            6
            down vote













            Well sorted has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.



            This is not just about sorted per-se, but what other optimization can be done to the stream pipeline, so that sorted could be entirely skipped. For example:



            List<Integer> sortedList = IntStream.range(0, 10)
            .boxed()
            .collect(Collectors.toList());

            StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
            .sorted()
            .forEach(sortedList::remove); // fails with CME, thus no copying occurred


            Of course, sorted needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.



            distinct on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...






            share|improve this answer


























              up vote
              6
              down vote













              Well sorted has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.



              This is not just about sorted per-se, but what other optimization can be done to the stream pipeline, so that sorted could be entirely skipped. For example:



              List<Integer> sortedList = IntStream.range(0, 10)
              .boxed()
              .collect(Collectors.toList());

              StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
              .sorted()
              .forEach(sortedList::remove); // fails with CME, thus no copying occurred


              Of course, sorted needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.



              distinct on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...






              share|improve this answer
























                up vote
                6
                down vote










                up vote
                6
                down vote









                Well sorted has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.



                This is not just about sorted per-se, but what other optimization can be done to the stream pipeline, so that sorted could be entirely skipped. For example:



                List<Integer> sortedList = IntStream.range(0, 10)
                .boxed()
                .collect(Collectors.toList());

                StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
                .sorted()
                .forEach(sortedList::remove); // fails with CME, thus no copying occurred


                Of course, sorted needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.



                distinct on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...






                share|improve this answer














                Well sorted has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.



                This is not just about sorted per-se, but what other optimization can be done to the stream pipeline, so that sorted could be entirely skipped. For example:



                List<Integer> sortedList = IntStream.range(0, 10)
                .boxed()
                .collect(Collectors.toList());

                StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
                .sorted()
                .forEach(sortedList::remove); // fails with CME, thus no copying occurred


                Of course, sorted needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.



                distinct on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited yesterday

























                answered yesterday









                Eugene

                59.7k980139




                59.7k980139




















                    up vote
                    2
                    down vote













                    You shouldn't have brought up the cases with a terminal operation forEach(list::remove) because list::remove is an interfering function and it violates the "non-interference" principle for terminal actions.



                    It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.



                    I believe that list::remove is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach.






                    share|improve this answer






















                    • I am aware the first paragraph of your answer well, Andrew. Here I’d use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
                      – Nikolas
                      yesterday











                    • @Nikolas no one pointed that list::remove is inappropriate here. I wanted to highlight it.
                      – Andrew Tobilko
                      yesterday














                    up vote
                    2
                    down vote













                    You shouldn't have brought up the cases with a terminal operation forEach(list::remove) because list::remove is an interfering function and it violates the "non-interference" principle for terminal actions.



                    It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.



                    I believe that list::remove is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach.






                    share|improve this answer






















                    • I am aware the first paragraph of your answer well, Andrew. Here I’d use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
                      – Nikolas
                      yesterday











                    • @Nikolas no one pointed that list::remove is inappropriate here. I wanted to highlight it.
                      – Andrew Tobilko
                      yesterday












                    up vote
                    2
                    down vote










                    up vote
                    2
                    down vote









                    You shouldn't have brought up the cases with a terminal operation forEach(list::remove) because list::remove is an interfering function and it violates the "non-interference" principle for terminal actions.



                    It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.



                    I believe that list::remove is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach.






                    share|improve this answer














                    You shouldn't have brought up the cases with a terminal operation forEach(list::remove) because list::remove is an interfering function and it violates the "non-interference" principle for terminal actions.



                    It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.



                    I believe that list::remove is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach.







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited yesterday

























                    answered yesterday









                    Andrew Tobilko

                    20.5k83873




                    20.5k83873











                    • I am aware the first paragraph of your answer well, Andrew. Here I’d use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
                      – Nikolas
                      yesterday











                    • @Nikolas no one pointed that list::remove is inappropriate here. I wanted to highlight it.
                      – Andrew Tobilko
                      yesterday
















                    • I am aware the first paragraph of your answer well, Andrew. Here I’d use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
                      – Nikolas
                      yesterday











                    • @Nikolas no one pointed that list::remove is inappropriate here. I wanted to highlight it.
                      – Andrew Tobilko
                      yesterday















                    I am aware the first paragraph of your answer well, Andrew. Here I’d use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
                    – Nikolas
                    yesterday





                    I am aware the first paragraph of your answer well, Andrew. Here I’d use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
                    – Nikolas
                    yesterday













                    @Nikolas no one pointed that list::remove is inappropriate here. I wanted to highlight it.
                    – Andrew Tobilko
                    yesterday




                    @Nikolas no one pointed that list::remove is inappropriate here. I wanted to highlight it.
                    – Andrew Tobilko
                    yesterday

















                     

                    draft saved


                    draft discarded















































                     


                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52272533%2fjava-8-stream-api-does-any-stateful-intermediate-operation-guarantee-a-new-sou%23new-answer', 'question_page');

                    );

                    Post as a guest













































































                    Comments

                    Popular posts from this blog

                    What does second last employer means? [closed]

                    Installing NextGIS Connect into QGIS 3?

                    Confectionery