Stream Summary Statistics

Execute multiple operations on a Java Stream at once to avoid repeated traversal. Note that the Stream becomes invalid after the terminal operation.

By Horatiu Dan · Jul. 18, 23 · Analysis

To leverage the various capabilities of Java Streams, one should first understand two general concepts – the stream and the stream pipeline. A Stream in Java is a sequential flow of data. A stream pipeline, on the other hand, represents a series of steps applied to the data, a series that ultimately produces a result.

My family and I recently visited the Legoland Resort in Germany – a great place, by the way – and there, among other attractions, we had the chance to observe in detail a sample of the brick-building process. Briefly, everything starts from the granular plastic that is melted, modeled accordingly, assembled, painted, stenciled if needed, and packed up in bags and boxes. All the steps are part of an assembly factory pipeline.

What is worth mentioning is the fact that the next step cannot be done until the previous one has been completed and also that the number of steps is finite. Moreover, at every step, each Lego element is touched to perform the corresponding operation, and then it moves only forward, never backward, so that the next step is done. The same applies to Java streams.

In Java, these steps are called stream operations, and they fall into three categories – one that starts the job (source), one that ends it and produces the result (terminal), and zero or more intermediate ones in between.

As a last consideration, it’s worth mentioning that intermediate operations transform the stream into another one but are never run until the terminal operation is invoked (they are lazily evaluated). Finally, once the result is produced and the initial goal is achieved, the stream is no longer valid.
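The two properties above, lazy evaluation and single use, can be observed with a minimal sketch. The class and method names below (StreamLifecycleDemo, intermediateOpsAreLazy, reuseIsRejected) are hypothetical and are not part of the article's project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamLifecycleDemo {

    // Returns true if the intermediate operation ran only once the terminal one was invoked.
    static boolean intermediateOpsAreLazy() {
        List<String> visited = new ArrayList<>();

        Stream<String> steps = Stream.of("melt", "mold", "paint") // source
                .peek(visited::add);                                // intermediate, not executed yet

        boolean nothingRanYet = visited.isEmpty();

        List<String> result = steps.collect(Collectors.toList());  // terminal, triggers the pipeline

        return nothingRanYet && visited.equals(result);
    }

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean reuseIsRejected() {
        Stream<String> steps = Stream.of("melt", "mold", "paint");
        steps.forEach(step -> { });     // the first terminal operation consumes the stream
        try {
            steps.forEach(step -> { }); // the stream has already been operated upon
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("lazy: " + intermediateOpsAreLazy());
        System.out.println("reuse rejected: " + reuseIsRejected());
    }
}
```

The sketch uses collect() rather than count() as the terminal operation, since count() is allowed to skip the pipeline when the size can be computed directly from the source.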

Abstract

Starting from the fact that a Java Stream is no longer valid once its terminal operation is done, this article presents a way of computing multiple results through a single stream traversal. This is accomplished by leveraging the Java summary statistics objects (in particular IntSummaryStatistics), which have been available since version 1.8.

Proof of Concept

The small project built to showcase the statistics computation uses the following:

  • Java 17
  • Maven 3.6.3
  • JUnit Jupiter Engine v.5.9.3

As a domain, there is one straightforward entity – a parent.

Java
 
public record Parent(String name, int age) { }


It is modeled by two attributes – the name and the age. While the name is present only to distinguish the parents, the age is the one of interest here.

The purpose is to be able to compute a few age statistics on a set of parents, that is:

  • The total sample count
  • The ages of the youngest and the oldest parent
  • The age range of the group
  • The average age
  • The total number of years the parents accumulate

The results are encapsulated into a ParentStats structure and represented as a record as well.

Java
 
public record ParentStats(long count,
                          int youngest,
                          int oldest,
                          int ageRange,
                          double averageAge,
                          long totalYearsOfAge) { }


In order to accomplish this, an interface is defined. 

Java
 
public interface Service {
 
    ParentStats getStats(List<Parent> parents);
}


For now, it has only one method, which receives a list of Parents as input and provides the desired statistics as output.

Initial Implementation

As the problem is trivial, an initial and imperative implementation of the service might be as below:

Java
 
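public class InitialService implements Service {
 
    @Override
    public ParentStats getStats(List<Parent> parents) {
        int count = parents.size();
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE; // lowest possible start value, so any age can replace it
        long sum = 0;                // long, to match ParentStats.totalYearsOfAge
        for (Parent parent : parents) {
            int age = parent.age();
            if (age < min) {
                min = age;
            }
            if (age > max) {
                max = age;
            }
            sum += age;
        }
 
        // Note: an empty list leaves min and max at their start values and yields NaN as the average.
        return new ParentStats(count, min, max, max - min, (double) sum / count, sum);
    }
}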


The code works, but it is focused on the how rather than on the what; the problem gets lost in the implementation details, which makes the code harder to read.

As the functional style and streams are already part of every Java developer’s practices, the next service implementation would most probably be chosen.

Java
 
public class StreamService implements Service {
 
    @Override
    public ParentStats getStats(List<Parent> parents) {
        int count = parents.size();
 
        int min = parents.stream()
                .mapToInt(Parent::age)
                .min()
                .orElseThrow(RuntimeException::new);
 
        int max = parents.stream()
                .mapToInt(Parent::age)
                .max()
                .orElseThrow(RuntimeException::new);
 
        int sum = parents.stream()
                .mapToInt(Parent::age)
                .sum();
 
        return new ParentStats(count, min, max, max - min, (double) sum/count, sum);
    }
}


The code is more readable now; the downside, though, is the redundant traversal needed to compute all the desired stats – three times in this particular case. As stated at the beginning of the article, once a terminal operation – min, max, or sum – is done, the stream is no longer valid, so each statistic needs a stream of its own. It would be convenient to compute the desired statistics without having to loop over the list of parents multiple times.
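To make the redundancy visible, the sketch below counts how often an age is read when min, max, and sum are each computed on their own stream. It works on a plain list of ages rather than on Parent objects, and the names TraversalCountDemo and ageReadsForThreePasses are hypothetical helpers introduced only for this illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

class TraversalCountDemo {

    // Hypothetical helper: returns how many times an age is read when
    // min, max and sum are each computed on a separate stream.
    static int ageReadsForThreePasses(List<Integer> ages) {
        AtomicInteger reads = new AtomicInteger();
        ToIntFunction<Integer> countingRead = age -> {
            reads.incrementAndGet(); // one increment per element visited
            return age;
        };

        ages.stream().mapToInt(countingRead).min();
        ages.stream().mapToInt(countingRead).max();
        ages.stream().mapToInt(countingRead).sum();

        return reads.get();
    }

    public static void main(String[] args) {
        // Six ages, three passes
        System.out.println(ageReadsForThreePasses(List.of(31, 32, 33, 34, 35, 36))); // prints 18
    }
}
```

For six elements, every age is read three times, once per terminal operation.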

Summary Statistics Implementation

In Java, the java.util package contains a family of summary statistics classes, one per primitive type – IntSummaryStatistics, LongSummaryStatistics, and DoubleSummaryStatistics.

According to the JavaDoc, IntSummaryStatistics is “a state object for collecting statistics such as count, min, max, sum, and average. This class is designed to work with (though does not require) streams”. 
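The "though does not require" part of the JavaDoc can be illustrated with a short sketch that feeds values into the state object by hand through its accept() method. The ManualStatsDemo class and its summarize() helper are hypothetical names used only here.

```java
import java.util.IntSummaryStatistics;

class ManualStatsDemo {

    // Collects the statistics without any stream, one accept() call per value.
    static IntSummaryStatistics summarize(int... ages) {
        IntSummaryStatistics stats = new IntSummaryStatistics();
        for (int age : ages) {
            stats.accept(age);
        }
        return stats;
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarize(31, 32, 33);
        System.out.println(stats.getCount() + " ages between "
                + stats.getMin() + " and " + stats.getMax()
                + ", average " + stats.getAverage());
    }
}
```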

It is a good candidate for the initial purpose; thus, the following implementation of the Service seems the preferred one.

Java
 
public class StatsService implements Service {
 
    @Override
    public ParentStats getStats(List<Parent> parents) {
        IntSummaryStatistics stats = parents.stream()
                .mapToInt(Parent::age)
                .summaryStatistics();
 
        return new ParentStats(stats.getCount(),
                stats.getMin(),
                stats.getMax(),
                stats.getMax() - stats.getMin(),
                stats.getAverage(),
                stats.getSum());
    }
}


There is only one stream of parents, the statistics are computed in a single traversal, and the code is much more readable this time.
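One detail worth keeping in mind is the empty input. When no values are recorded, getMin() returns Integer.MAX_VALUE, getMax() returns Integer.MIN_VALUE, and getAverage() returns 0, so an unguarded max - min overflows. The sketch below shows this, together with one possible guard; returning 0 as the range of an empty group is an assumption of this example, not something the article prescribes, and the EmptyStatsDemo class is hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

class EmptyStatsDemo {

    // Range computed as in the service above: overflows when nothing was recorded.
    static int naiveRange(IntSummaryStatistics stats) {
        return stats.getMax() - stats.getMin();
    }

    // Guarded variant; reporting 0 for an empty group is an assumption of this sketch.
    static int guardedRange(IntSummaryStatistics stats) {
        return stats.getCount() == 0 ? 0 : stats.getMax() - stats.getMin();
    }

    public static void main(String[] args) {
        IntSummaryStatistics empty = IntStream.empty().summaryStatistics();
        System.out.println("min = " + empty.getMin());          // Integer.MAX_VALUE
        System.out.println("max = " + empty.getMax());          // Integer.MIN_VALUE
        System.out.println("average = " + empty.getAverage());  // 0.0
        System.out.println("naive range = " + naiveRange(empty));
        System.out.println("guarded range = " + guardedRange(empty));
    }
}
```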

In order to check all three implementations, the following abstract base unit test is used.

Java
 
abstract class ServiceTest {
 
    private Service service;
 
    private List<Parent> mothers;
    private List<Parent> fathers;
    private List<Parent> parents;
 
    protected abstract Service setupService();
 
    @BeforeEach
    void setup() {
        service = setupService();
 
        mothers = IntStream.rangeClosed(1, 3)
                .mapToObj(i -> new Parent("Mother" + i, i + 30))
                .collect(Collectors.toList());
 
        fathers = IntStream.rangeClosed(4, 6)
                .mapToObj(i -> new Parent("Father" + i, i + 30))
                .collect(Collectors.toList());
 
        parents = new ArrayList<>(mothers);
        parents.addAll(fathers);
    }
 
    private void assertParentStats(ParentStats stats) {
        Assertions.assertNotNull(stats);
        Assertions.assertEquals(6, stats.count());
        Assertions.assertEquals(31, stats.youngest());
        Assertions.assertEquals(36, stats.oldest());
        Assertions.assertEquals(5, stats.ageRange());
 
        final int sum = 31 + 32 + 33 + 34 + 35 + 36;
 
        Assertions.assertEquals((double) sum/6, stats.averageAge());
        Assertions.assertEquals(sum, stats.totalYearsOfAge());
    }
 
    @Test
    void getStats() {
        final ParentStats stats = service.getStats(parents);
        assertParentStats(stats);
    }
}


As the stats are computed for all the parents, the mothers and fathers are first put together in the same parents list (we will see later why there were two lists in the first place).

The particular unit test for each implementation is trivial – it sets up the service instance.

Java
 
class StatsServiceTest extends ServiceTest {
 
    @Override
    protected Service setupService() {
        return new StatsService();
    }
}


Combining Statistics

In addition to the already used methods – getMin(), getMax(), getCount(), getSum(), getAverage() – IntSummaryStatistics provides a way to combine the state of another similar object into the current one. 

Java
 
void combine(IntSummaryStatistics other)


As we saw in the above unit test, initially, there are two source lists – mothers and fathers. It would be convenient to be able to directly compute the statistics without first merging them.

In order to accomplish this, the Service is enriched with the following method.

Java
 
default ParentStats getCombinedStats(List<Parent> mothers, List<Parent> fathers) {
    final List<Parent> parents = new ArrayList<>(mothers);
    parents.addAll(fathers);
    return getStats(parents);
}


The first two implementations – InitialService and StreamService – are not of interest here; thus, a default implementation was provided for convenience. It is overridden only by the StatsService. 

Java
 
@Override
public ParentStats getCombinedStats(List<Parent> mothers, List<Parent> fathers) {
    Collector<Parent, ?, IntSummaryStatistics> collector = Collectors.summarizingInt(Parent::age);
 
    IntSummaryStatistics stats = mothers.stream().collect(collector);
    stats.combine(fathers.stream().collect(collector));
 
    return new ParentStats(stats.getCount(),
            stats.getMin(),
            stats.getMax(),
            stats.getMax() - stats.getMin(),
            stats.getAverage(),
            stats.getSum());
}


By leveraging the combine() method, the statistics of the two source lists are computed separately and then merged directly, without first building a joined list.
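Stripped of the domain objects, the behavior of combine() can be sketched on plain ranges of ints that mirror the ages used in the unit test. The CombineDemo class and its merged() method are hypothetical names.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

class CombineDemo {

    static IntSummaryStatistics merged() {
        IntSummaryStatistics first = IntStream.rangeClosed(31, 33).summaryStatistics();
        IntSummaryStatistics second = IntStream.rangeClosed(34, 36).summaryStatistics();

        // combine() folds the state of 'second' into 'first'; 'second' itself is left unchanged.
        first.combine(second);
        return first;
    }

    public static void main(String[] args) {
        System.out.println(merged());
    }
}
```

Note that combine() mutates the object it is called on, so the first statistics object should not be reused afterwards as if it still described only the first group.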

The corresponding unit test is straightforward.

Java
 
@Test
void getCombinedStats() {
    final ParentStats stats = service.getCombinedStats(mothers, fathers);
    assertParentStats(stats);
}


Having seen the above Collector, the initial getStats() method may be written in terms of it as well. 

Java
 
@Override
public ParentStats getStats(List<Parent> parents) {
    IntSummaryStatistics stats = parents.stream()
            .collect(Collectors.summarizingInt(Parent::age));
 
    return new ParentStats(stats.getCount(),
            stats.getMin(),
            stats.getMax(),
            stats.getMax() - stats.getMin(),
            stats.getAverage(),
            stats.getSum());
}


Conclusion

Depending on the data type used, IntSummaryStatistics, LongSummaryStatistics, or DoubleSummaryStatistics are convenient out-of-the-box structures that one can use to quickly compute simple statistics while keeping the code readable and maintainable. 
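For completeness, here is a minimal sketch of the other two variants, used the same way as the int one. The OtherStatsDemo class, its method names, and the sample values are assumptions made for illustration only.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.stream.Collectors;

class OtherStatsDemo {

    // Long variant, obtained directly from a LongStream.
    static LongSummaryStatistics longStats(List<Long> values) {
        return values.stream()
                .mapToLong(Long::longValue)
                .summaryStatistics();
    }

    // Double variant, obtained through the corresponding collector.
    static DoubleSummaryStatistics doubleStats(List<Double> values) {
        return values.stream()
                .collect(Collectors.summarizingDouble(Double::doubleValue));
    }

    public static void main(String[] args) {
        // Values beyond the int range, which the long variant handles
        System.out.println(longStats(List.of(3_000_000_000L, 4_000_000_000L)));
        System.out.println(doubleStats(List.of(1.5, 2.5, 3.5)));
    }
}
```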


Published at DZone with permission of Horatiu Dan. See the original article here.

Opinions expressed by DZone contributors are their own.
