Idiomatic Peeking with Java Stream API

The peek() method from the Java Stream API is often misunderstood. Let's try to clarify how it works and when it should be used.

Introduction

Let's start responsibly by inspecting peek()'s user manual (RTFM, as they say).

Returns a stream consisting of the elements of this stream, additionally performing the provided action on each element as elements are consumed from the resulting stream.
Sounds pretty straightforward. We can use it to apply side effects to every consumed Stream element:

Stream.of("one", "two", "three")
  .peek(e -> System.out.println(e + " was consumed by peek()."))
  .collect(Collectors.toList());

The result of such an operation is not surprising:

one was consumed by peek().
two was consumed by peek().
three was consumed by peek().

Stream/peek Lazy Evaluation

But… what will happen if we replace the terminal collect() operation with forEach()?

Stream.of("one", "two", "three")
  .peek(e -> System.out.println(e + " was consumed by peek()."))
  .forEach(System.out::println);

It might be tempting to think that we'll see a series of peek() logs followed by a series of forEach logs, but this is not the case:

one was consumed by peek().
one
two was consumed by peek().
two
three was consumed by peek().
three

It gets even more interesting if we get rid of the forEach call:

Stream.of("one", "two", "three")
  .peek(e -> System.out.println(e + " was consumed by peek()."));

The result would be… nothing:


As JavaDoc states:

Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.
So, because of lazy evaluation, Stream pipelines are traversed "vertically" rather than "horizontally", which allows us to avoid unnecessary calculations. Additionally, traversal is triggered only when a terminal operation is present. Hence the observed behaviour.
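
To see those avoided calculations in action, here is a minimal sketch (the println inside map() is purely illustrative) where the short-circuiting findFirst() means only the first element ever gets mapped:

Stream.of("one", "two", "three")
  .map(e -> {
    System.out.println(e + " was mapped.");
    return e.toUpperCase();
  })
  .findFirst();

// one was mapped.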

This is why it’s possible to represent and manipulate infinite sequences using Streams. (Because where do you store an infinite sequence? In the cloud?):

Stream.iterate(0, i -> i + 1)
  .peek(System.out::println)
  .findFirst();

The above operation completes almost immediately because of the lazy character of Stream traversal. Of course, if you try to collect the whole infinite sequence to some data structure, even laziness will not save you.
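
If we do want to collect part of an infinite sequence, a minimal sketch is to truncate it first with the intermediate limit() operation, which makes the pipeline finite again:

Stream.iterate(0, i -> i + 1)
  .limit(5)
  .collect(Collectors.toList());

// yields [0, 1, 2, 3, 4]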

So, we can see that peek() can’t be treated as an intermediate for-each replacement because it invokes the passed Consumer only on elements that are visited by the Stream.
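
For instance, in the sketch below the Consumer passed to peek() runs only for the elements that survive the preceding filter():

Stream.of("one", "two", "three")
  .filter(e -> e.startsWith("t"))
  .peek(e -> System.out.println(e + " was consumed by peek()."))
  .collect(Collectors.toList());

// two was consumed by peek().
// three was consumed by peek().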

Unfortunately, Streams do not always behave entirely lazily.

Proper Usage

Further inspection of the official docs reveals a note:

This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline
The point of confusion is the word "mainly". Let's have a peek (pun intended) at the English dictionary, where "mainly" is defined as "for the most part; chiefly".

We can see that non-debugging usages are neither forbidden nor discouraged.

So, technically, we should be able to, e.g., modify the Stream elements on the fly, which would not be possible in the immutable, functional world. Let's use the infamous mutable java.util.Date:

Stream.of(Date.from(Instant.EPOCH))
  .peek(d -> d.setTime(Long.MAX_VALUE))
  .forEach(System.out::println);

// Sun Aug 17 08:12:55 CET 292278994

and we can observe that the result is far away from the standard epoch, which means the mutating operation was indeed applied.

The problem is that this behaviour is highly deceptive because certain Stream implementations can optimize out peek() calls.

This eventually got clarified in an early draft of the JDK 9 docs:

In cases where the stream implementation is able to optimize away the production of some or all the elements (such as with short-circuiting operations like findFirst, or in the example described in count()), the action will not be invoked for those elements. (…) An implementation may choose to not execute the stream pipeline (either sequentially or in parallel) if it is capable of computing the count directly from the stream source.
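
We can check this ourselves with a minimal sketch (the exact behaviour depends on the JDK version; on JDK 9+ the sized pipeline below may be counted without traversing its elements):

long count = Stream.of("one", "two", "three")
  .peek(e -> System.out.println(e + " was consumed by peek()."))
  .count();

// on JDK 9+ this may print nothing at all - the count is computed directly from the source
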
So, now it's clear that peek() might not be the best choice if we want to perform side effects deterministically.

According to this, peek() might not even be that reliable a debugging tool after all.

Key Takeaways

  • The peek() method works fine as a debugging tool when we want to see what is being consumed by a Stream
  • It seems to work when applying mutating operations, but it should not be used this way: the behaviour is non-deterministic because internal optimizations may omit certain peek() calls (a deterministic alternative using map() is sketched after this list)
  • The discussion about whether mutating operations should be allowed would never take place if we were restricted to operating only on immutable values
  • We have around 292276977 years before we run out of java.util.Date range
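
As promised above, here is a minimal sketch of the deterministic alternative: instead of mutating elements inside peek(), derive new values with map(); every element that reaches the terminal operation is then, by definition, the transformed one:

Stream.of(Date.from(Instant.EPOCH))
  .map(d -> new Date(Long.MAX_VALUE)) // a new instance instead of mutating d
  .forEach(System.out::println);

// Sun Aug 17 08:12:55 CET 292278994
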
You May Also Like

Spock, Java and Maven

A few months ago I came across Groovy - a powerful language for the JVM platform which combines the power of Java with abilities typical of scripting languages (dynamic typing, metaprogramming).

Together with Groovy I discovered the Spock framework (https://code.google.com/p/spock/) - a specification framework for Groovy (of course you can test Java classes too!). But Spock is not only a test/specification framework - it also contains powerful mocking tools.

Even though Spock is dedicated to Groovy, there is no problem using it to test Java classes. In this post I'm going to describe how to configure a Maven project to build and run Spock specifications together with traditional JUnit tests.


Firstly, we need to prepare pom.xml by adding the necessary dependencies and plugins.

Two obligatory libraries are:
<dependency>
  <groupId>org.spockframework</groupId>
  <artifactId>spock-core</artifactId>
  <version>0.7-groovy-2.0</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.codehaus.groovy</groupId>
  <artifactId>groovy-all</artifactId>
  <version>${groovy.version}</version>
  <scope>test</scope>
</dependency>
Here, groovy.version is a property defined in pom.xml for more convenient use and easy version changes, just like this:
<properties>
  <gmaven-plugin.version>1.4</gmaven-plugin.version>
  <groovy.version>2.1.5</groovy.version>
</properties>

I've added a property for the gmaven-plugin version for the same reason ;)

Besides these two dependencies, we can use a few additional ones providing extra functionality:
  • cglib - for class mocking
  • objenesis - enables mocking classes without a default constructor
To add them to the project, put these lines in the <dependencies> section of pom.xml:
<dependency>
  <groupId>cglib</groupId>
  <artifactId>cglib-nodep</artifactId>
  <version>3.0</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.objenesis</groupId>
  <artifactId>objenesis</artifactId>
  <version>1.3</version>
  <scope>test</scope>
</dependency>

And that's all for the dependencies section. Now we will focus on the plugins necessary to compile Groovy classes. We need to add the gmaven-plugin with a gmaven-runtime-2.0 dependency in the plugins section:
<plugin>
  <groupId>org.codehaus.gmaven</groupId>
  <artifactId>gmaven-plugin</artifactId>
  <version>${gmaven-plugin.version}</version>
  <configuration>
    <providerSelection>2.0</providerSelection>
  </configuration>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.codehaus.gmaven.runtime</groupId>
      <artifactId>gmaven-runtime-2.0</artifactId>
      <version>${gmaven-plugin.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.codehaus.groovy</groupId>
          <artifactId>groovy-all</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.codehaus.groovy</groupId>
      <artifactId>groovy-all</artifactId>
      <version>${groovy.version}</version>
    </dependency>
  </dependencies>
</plugin>

With this configuration we can use Spock and write our first specifications. But there is one issue: the default settings of the maven-surefire plugin demand that test class names end with a "...Test" suffix, which is fine if we want to use such a naming scheme for our Spock tests. But if we want to name them like CommentSpec.groovy, or anything else with a "...Spec" ending (which in my opinion is much more readable), we need to make a little change in the surefire plugin configuration:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.15</version>
  <configuration>
    <includes>
      <include>**/*Test.java</include>
      <include>**/*Spec.java</include>
    </includes>
  </configuration>
</plugin>

As you can see, there is a little trick here ;) We add an include directive for standard Java JUnit tests ending with the "...Test" suffix, but there is also an entry for Spock tests ending with "...Spec". The trick is that we must write "**/*Spec.java", not "**/*Spec.groovy", otherwise Maven will not run the Spock tests (which is strange, and I spent some time figuring out why Maven couldn't run my specs).

A little update: instead of the "*.java" suffix for both types of tests we can write "*.class", which is in my opinion more readable and clean:
<include>**/*Test.class</include>
<include>**/*Spec.class</include>
(thanks to Tomek Pęksa for pointing this out!)

With such a configuration, we can write traditional JUnit tests and put them in the src/test/java directory, or Groovy Spock specifications and place them in src/test/groovy. And both will work together just fine :) In one of my next posts I'll write something about using Spock and its mocking abilities in practice, so stay tuned.
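
For illustration, the resulting layout might look like this (CommentTest and CommentSpec are just example names; CommentSpec was mentioned above):

src/test/java/CommentTest.java     - a traditional JUnit test
src/test/groovy/CommentSpec.groovy - a Spock specification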