Java 8 Stream API - Does any stateful intermediate operation guarantee a new source collection?
Clash Royale CLAN TAG#URR8PPP
up vote
21
down vote
favorite
Is the following statement true? (source and source - they seem to copy from each other or come from the same source).
The
sorted()
operation is a âÂÂstateful intermediate operationâÂÂ, which means that subsequent operations no longer operate on the backing collection, but on an internal state.
I have tested Stream::sorted
as a snippet from sources above:
final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());
list.stream()
.filter(i -> i > 5)
.sorted()
.forEach(list::remove);
System.out.println(list); // prints correctly [0, 1, 2, 3, 4, 5]
It works. I replaced Stream::sorted
with Stream::distinct
, Stream::limit
and Stream::skip
:
final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());
list.stream()
.filter(i -> i > 5)
.distinct()
.forEach(list::remove);
For my surprise, the NullPointerException
is thrown. All the tested methods follow the stateful intermediate operation characteristics. Yet, this unique behavior of Stream::sorted
is not documented nor the Stream operations and pipelines part explains whether the stateful intermediate operations really guarantee a new source collection.
Where my confusion comes from and what is the explanation of the behavior above?
java java-8 java-stream
add a comment |Â
up vote
21
down vote
favorite
Is the following statement true? (source and source - they seem to copy from each other or come from the same source).
The
sorted()
operation is a âÂÂstateful intermediate operationâÂÂ, which means that subsequent operations no longer operate on the backing collection, but on an internal state.
I have tested Stream::sorted
as a snippet from sources above:
final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());
list.stream()
.filter(i -> i > 5)
.sorted()
.forEach(list::remove);
System.out.println(list); // prints correctly [0, 1, 2, 3, 4, 5]
It works. I replaced Stream::sorted
with Stream::distinct
, Stream::limit
and Stream::skip
:
final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());
list.stream()
.filter(i -> i > 5)
.distinct()
.forEach(list::remove);
For my surprise, the NullPointerException
is thrown. All the tested methods follow the stateful intermediate operation characteristics. Yet, this unique behavior of Stream::sorted
is not documented nor the Stream operations and pipelines part explains whether the stateful intermediate operations really guarantee a new source collection.
Where my confusion comes from and what is the explanation of the behavior above?
java java-8 java-stream
add a comment |Â
up vote
21
down vote
favorite
up vote
21
down vote
favorite
Is the following statement true? (source and source - they seem to copy from each other or come from the same source).
The
sorted()
operation is a âÂÂstateful intermediate operationâÂÂ, which means that subsequent operations no longer operate on the backing collection, but on an internal state.
I have tested Stream::sorted
as a snippet from sources above:
final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());
list.stream()
.filter(i -> i > 5)
.sorted()
.forEach(list::remove);
System.out.println(list); // prints correctly [0, 1, 2, 3, 4, 5]
It works. I replaced Stream::sorted
with Stream::distinct
, Stream::limit
and Stream::skip
:
final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());
list.stream()
.filter(i -> i > 5)
.distinct()
.forEach(list::remove);
For my surprise, the NullPointerException
is thrown. All the tested methods follow the stateful intermediate operation characteristics. Yet, this unique behavior of Stream::sorted
is not documented nor the Stream operations and pipelines part explains whether the stateful intermediate operations really guarantee a new source collection.
Where my confusion comes from and what is the explanation of the behavior above?
java java-8 java-stream
Is the following statement true? (source and source - they seem to copy from each other or come from the same source).
The
sorted()
operation is a âÂÂstateful intermediate operationâÂÂ, which means that subsequent operations no longer operate on the backing collection, but on an internal state.
I have tested Stream::sorted
as a snippet from sources above:
final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());
list.stream()
.filter(i -> i > 5)
.sorted()
.forEach(list::remove);
System.out.println(list); // prints correctly [0, 1, 2, 3, 4, 5]
It works. I replaced Stream::sorted
with Stream::distinct
, Stream::limit
and Stream::skip
:
final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());
list.stream()
.filter(i -> i > 5)
.distinct()
.forEach(list::remove);
For my surprise, the NullPointerException
is thrown. All the tested methods follow the stateful intermediate operation characteristics. Yet, this unique behavior of Stream::sorted
is not documented nor the Stream operations and pipelines part explains whether the stateful intermediate operations really guarantee a new source collection.
Where my confusion comes from and what is the explanation of the behavior above?
java java-8 java-stream
java java-8 java-stream
edited yesterday
asked yesterday
Nikolas
10.9k52956
10.9k52956
add a comment |Â
add a comment |Â
3 Answers
3
active
oldest
votes
up vote
27
down vote
accepted
The API documentation makes no such guarantee âÂÂthat subsequent operations no longer operate on the backing collectionâÂÂ, hence, you should never rely on such a behavior of a particular implementation.
Your example happens to do the desired thing by accident; thereâÂÂs not even a guarantee that the List
created by collect(Collectors.toList())
supports the remove
operation.
To show a counter-example
Set<Integer> set = IntStream.range(0, 10).boxed()
.collect(Collectors.toCollection(TreeSet::new));
set.stream()
.filter(i -> i > 5)
.sorted()
.forEach(set::remove);
throws a ConcurrentModificationException
. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach
is explicitly performing the action in no specified order, hence, the sorting is unnecessary.
There are other optimizations imaginable, e.g. sorted().findFirst()
could get converted to a âÂÂfind the minimumâ operation, without the need to copy the element into a new storage for sorting.
So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.
9
@Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: âÂÂwe can only suggest that you never actually do modify a backing collection while consuming a streamâÂÂ.
â Holger
yesterday
add a comment |Â
up vote
6
down vote
Well sorted
has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.
This is not just about sorted
per-se, but what other optimization can be done to the stream pipeline, so that sorted
could be entirely skipped. For example:
List<Integer> sortedList = IntStream.range(0, 10)
.boxed()
.collect(Collectors.toList());
StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
.sorted()
.forEach(sortedList::remove); // fails with CME, thus no copying occurred
Of course, sorted
needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.
distinct
on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...
add a comment |Â
up vote
2
down vote
You shouldn't have brought up the cases with a terminal operation forEach(list::remove)
because list::remove
is an interfering function and it violates the "non-interference" principle for terminal actions.
It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.
I believe that list::remove
is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach
.
I am aware the first paragraph of your answer well, Andrew. Here IâÂÂd use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
â Nikolas
yesterday
@Nikolas no one pointed thatlist::remove
is inappropriate here. I wanted to highlight it.
â Andrew Tobilko
yesterday
add a comment |Â
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
27
down vote
accepted
The API documentation makes no such guarantee âÂÂthat subsequent operations no longer operate on the backing collectionâÂÂ, hence, you should never rely on such a behavior of a particular implementation.
Your example happens to do the desired thing by accident; thereâÂÂs not even a guarantee that the List
created by collect(Collectors.toList())
supports the remove
operation.
To show a counter-example
Set<Integer> set = IntStream.range(0, 10).boxed()
.collect(Collectors.toCollection(TreeSet::new));
set.stream()
.filter(i -> i > 5)
.sorted()
.forEach(set::remove);
throws a ConcurrentModificationException
. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach
is explicitly performing the action in no specified order, hence, the sorting is unnecessary.
There are other optimizations imaginable, e.g. sorted().findFirst()
could get converted to a âÂÂfind the minimumâ operation, without the need to copy the element into a new storage for sorting.
So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.
9
@Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: âÂÂwe can only suggest that you never actually do modify a backing collection while consuming a streamâÂÂ.
â Holger
yesterday
add a comment |Â
up vote
27
down vote
accepted
The API documentation makes no such guarantee âÂÂthat subsequent operations no longer operate on the backing collectionâÂÂ, hence, you should never rely on such a behavior of a particular implementation.
Your example happens to do the desired thing by accident; thereâÂÂs not even a guarantee that the List
created by collect(Collectors.toList())
supports the remove
operation.
To show a counter-example
Set<Integer> set = IntStream.range(0, 10).boxed()
.collect(Collectors.toCollection(TreeSet::new));
set.stream()
.filter(i -> i > 5)
.sorted()
.forEach(set::remove);
throws a ConcurrentModificationException
. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach
is explicitly performing the action in no specified order, hence, the sorting is unnecessary.
There are other optimizations imaginable, e.g. sorted().findFirst()
could get converted to a âÂÂfind the minimumâ operation, without the need to copy the element into a new storage for sorting.
So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.
9
@Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: âÂÂwe can only suggest that you never actually do modify a backing collection while consuming a streamâÂÂ.
â Holger
yesterday
add a comment |Â
up vote
27
down vote
accepted
up vote
27
down vote
accepted
The API documentation makes no such guarantee âÂÂthat subsequent operations no longer operate on the backing collectionâÂÂ, hence, you should never rely on such a behavior of a particular implementation.
Your example happens to do the desired thing by accident; thereâÂÂs not even a guarantee that the List
created by collect(Collectors.toList())
supports the remove
operation.
To show a counter-example
Set<Integer> set = IntStream.range(0, 10).boxed()
.collect(Collectors.toCollection(TreeSet::new));
set.stream()
.filter(i -> i > 5)
.sorted()
.forEach(set::remove);
throws a ConcurrentModificationException
. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach
is explicitly performing the action in no specified order, hence, the sorting is unnecessary.
There are other optimizations imaginable, e.g. sorted().findFirst()
could get converted to a âÂÂfind the minimumâ operation, without the need to copy the element into a new storage for sorting.
So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.
The API documentation makes no such guarantee âÂÂthat subsequent operations no longer operate on the backing collectionâÂÂ, hence, you should never rely on such a behavior of a particular implementation.
Your example happens to do the desired thing by accident; thereâÂÂs not even a guarantee that the List
created by collect(Collectors.toList())
supports the remove
operation.
To show a counter-example
Set<Integer> set = IntStream.range(0, 10).boxed()
.collect(Collectors.toCollection(TreeSet::new));
set.stream()
.filter(i -> i > 5)
.sorted()
.forEach(set::remove);
throws a ConcurrentModificationException
. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach
is explicitly performing the action in no specified order, hence, the sorting is unnecessary.
There are other optimizations imaginable, e.g. sorted().findFirst()
could get converted to a âÂÂfind the minimumâ operation, without the need to copy the element into a new storage for sorting.
So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.
edited yesterday
answered yesterday
Holger
150k20206390
150k20206390
9
@Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: âÂÂwe can only suggest that you never actually do modify a backing collection while consuming a streamâÂÂ.
â Holger
yesterday
add a comment |Â
9
@Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: âÂÂwe can only suggest that you never actually do modify a backing collection while consuming a streamâÂÂ.
â Holger
yesterday
9
9
@Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: âÂÂwe can only suggest that you never actually do modify a backing collection while consuming a streamâÂÂ.
â Holger
yesterday
@Nikolas well, that blog is focusing on observed behavior and looking at the code, where it should have referred to the specification instead. But at least the conclusion is basically correct: âÂÂwe can only suggest that you never actually do modify a backing collection while consuming a streamâÂÂ.
â Holger
yesterday
add a comment |Â
up vote
6
down vote
Well sorted
has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.
This is not just about sorted
per-se, but what other optimization can be done to the stream pipeline, so that sorted
could be entirely skipped. For example:
List<Integer> sortedList = IntStream.range(0, 10)
.boxed()
.collect(Collectors.toList());
StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
.sorted()
.forEach(sortedList::remove); // fails with CME, thus no copying occurred
Of course, sorted
needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.
distinct
on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...
add a comment |Â
up vote
6
down vote
Well sorted
has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.
This is not just about sorted
per-se, but what other optimization can be done to the stream pipeline, so that sorted
could be entirely skipped. For example:
List<Integer> sortedList = IntStream.range(0, 10)
.boxed()
.collect(Collectors.toList());
StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
.sorted()
.forEach(sortedList::remove); // fails with CME, thus no copying occurred
Of course, sorted
needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.
distinct
on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...
add a comment |Â
up vote
6
down vote
up vote
6
down vote
Well sorted
has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.
This is not just about sorted
per-se, but what other optimization can be done to the stream pipeline, so that sorted
could be entirely skipped. For example:
List<Integer> sortedList = IntStream.range(0, 10)
.boxed()
.collect(Collectors.toList());
StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
.sorted()
.forEach(sortedList::remove); // fails with CME, thus no copying occurred
Of course, sorted
needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.
distinct
on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...
Well sorted
has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.
This is not just about sorted
per-se, but what other optimization can be done to the stream pipeline, so that sorted
could be entirely skipped. For example:
List<Integer> sortedList = IntStream.range(0, 10)
.boxed()
.collect(Collectors.toList());
StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
.sorted()
.forEach(sortedList::remove); // fails with CME, thus no copying occurred
Of course, sorted
needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.
distinct
on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...
edited yesterday
answered yesterday
Eugene
59.7k980139
59.7k980139
add a comment |Â
add a comment |Â
up vote
2
down vote
You shouldn't have brought up the cases with a terminal operation forEach(list::remove)
because list::remove
is an interfering function and it violates the "non-interference" principle for terminal actions.
It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.
I believe that list::remove
is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach
.
I am aware the first paragraph of your answer well, Andrew. Here IâÂÂd use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
â Nikolas
yesterday
@Nikolas no one pointed thatlist::remove
is inappropriate here. I wanted to highlight it.
â Andrew Tobilko
yesterday
add a comment |Â
up vote
2
down vote
You shouldn't have brought up the cases with a terminal operation forEach(list::remove)
because list::remove
is an interfering function and it violates the "non-interference" principle for terminal actions.
It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.
I believe that list::remove
is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach
.
I am aware the first paragraph of your answer well, Andrew. Here IâÂÂd use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
â Nikolas
yesterday
@Nikolas no one pointed thatlist::remove
is inappropriate here. I wanted to highlight it.
â Andrew Tobilko
yesterday
add a comment |Â
up vote
2
down vote
up vote
2
down vote
You shouldn't have brought up the cases with a terminal operation forEach(list::remove)
because list::remove
is an interfering function and it violates the "non-interference" principle for terminal actions.
It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.
I believe that list::remove
is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach
.
You shouldn't have brought up the cases with a terminal operation forEach(list::remove)
because list::remove
is an interfering function and it violates the "non-interference" principle for terminal actions.
It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.
I believe that list::remove
is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach
.
edited yesterday
answered yesterday
Andrew Tobilko
20.5k83873
20.5k83873
I am aware the first paragraph of your answer well, Andrew. Here IâÂÂd use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
â Nikolas
yesterday
@Nikolas no one pointed thatlist::remove
is inappropriate here. I wanted to highlight it.
â Andrew Tobilko
yesterday
add a comment |Â
I am aware the first paragraph of your answer well, Andrew. Here IâÂÂd use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
â Nikolas
yesterday
@Nikolas no one pointed thatlist::remove
is inappropriate here. I wanted to highlight it.
â Andrew Tobilko
yesterday
I am aware the first paragraph of your answer well, Andrew. Here IâÂÂd use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
â Nikolas
yesterday
I am aware the first paragraph of your answer well, Andrew. Here IâÂÂd use Iterator or collect to a List to remove it all from the source. The second paragraph makes sence.
â Nikolas
yesterday
@Nikolas no one pointed that
list::remove
is inappropriate here. I wanted to highlight it.â Andrew Tobilko
yesterday
@Nikolas no one pointed that
list::remove
is inappropriate here. I wanted to highlight it.â Andrew Tobilko
yesterday
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52272533%2fjava-8-stream-api-does-any-stateful-intermediate-operation-guarantee-a-new-sou%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password