In this article, learn how the Dapr project can reduce the cognitive load on Java developers and decrease application dependencies.

Coding Java applications for the cloud requires not only a deep understanding of distributed systems, cloud best practices, and common patterns, but also an understanding of the Java ecosystem and how to combine many libraries to get things working. Tools and frameworks like Spring Boot have significantly impacted the developer experience by curating commonly used Java libraries, for example, logging (Log4j), parsing different formats (Jackson), and serving HTTP requests (Tomcat, Netty, the reactive stack).

While Spring Boot provides a set of abstractions, best practices, and common patterns, there are still two things that developers must know to write distributed applications. First, they must clearly understand which dependencies (clients/drivers) they must add to their applications depending on the available infrastructure. For example, they need to understand which database or message broker they need and what driver or client they need to add to their classpath to connect to it. Second, they must know how to configure that connection: the credentials, connection pools, retries, and other critical parameters for the application to work as expected. Understanding these configuration parameters pushes developers to know how these components (databases, message brokers, configuration stores, identity management tools) work to a point that goes beyond their responsibility of writing business logic for their applications.

Learning best practices, common patterns, and how a large set of application infrastructure components work is not bad, but it takes a lot of development time away from building important features for your application. In this short article, we will look into how the Dapr project can help Java developers not only implement best practices and distributed patterns out of the box but also reduce the application's dependencies and the amount of knowledge required to code their applications.

We will be looking at a simple example that you can find here. This Pizza Store application demonstrates some basic behaviors that most business applications can relate to. The application is composed of three services that allow customers to place pizza orders in the system. The application stores orders in a database, in this case PostgreSQL, and uses Kafka to exchange events between the services to cover asynchronous notifications.

Let's look at how to implement this with Spring Boot, and then let's add Dapr.

The Spring Boot Way

Using Spring Boot, developers can create these three services and start writing the business logic to process the order placed by the customer. Developers can use http://start.spring.io to select which dependencies their applications will have. For example, with the Pizza Store Service, they will need Spring Web (to host and serve the front end and some REST endpoints), but also the Spring Actuator extension if we aim to run these services on Kubernetes. But as with any application, if we want to store data, we will need a database/persistent storage, and we have many options to select from. If you look into Spring Data, you can see that Spring Data JPA provides an abstraction for SQL (relational) databases.
Besides Spring Data JPA, there are also NoSQL options and different layers of abstraction, depending on what your application is doing. If you decide to use Spring Data JPA, you are still responsible for adding the correct database driver to the application classpath. In the case of PostgreSQL, you can select it from the list as well.

We face a similar dilemma if we think about exchanging asynchronous messages between the application's services: there are too many options. Because we are developers and want to get things moving forward, we must make some choices here. Let's use PostgreSQL as our database and Kafka as our messaging system/broker.

I am a true believer in the Spring Boot programming model, including the abstraction layers and auto-configurations. However, as a developer, you are still responsible for ensuring that the right PostgreSQL JDBC driver and Kafka client are included in your services' classpath. While this is quite common in the Java space, there are a few drawbacks when dealing with larger applications that might consist of tens or hundreds of services.

Application and Infrastructure Dependency Drawbacks

Looking at our simple application, we can spot a couple of challenges that application and operations teams must deal with when taking this application to production. Let's start with application dependencies and their relationship with the infrastructure components we have decided to use.

The Kafka client included in all services needs to be kept in sync with the version of the Kafka instance that the application will use. This dependency pushes developers to ensure they use the same Kafka instance version for development purposes. If we want to upgrade the Kafka instance version, we need to upgrade, and therefore release again, every service that includes the Kafka client. This is particularly hard because Kafka tends to be used as a shared component across different services.

Databases such as PostgreSQL can be hidden behind a service and never exposed to other services directly. But imagine two or more services need to store data: if they choose different database versions, operations teams will need to deal with different stack versions, configurations, and maybe certifications for each version. Aligning on a single version, say PostgreSQL 16.x, once again couples all the services that need to store or read persistent data to their respective infrastructure components.

While versions, clients, and drivers create this coupling between applications and the available infrastructure, understanding complex configurations and their impact on application behavior is still a tough challenge. Spring Boot does a fantastic job of ensuring that all configurations can be externalized and consumed from environment variables or property files, and while this aligns perfectly with the 12-factor app principles and with container technologies such as Docker, choosing the values for these configuration parameters is the core problem. Connection pool sizes, retries, and reconnection mechanisms configured differently across environments are still, to this day, common issues when moving the same application from development environments to production. Learning how to configure Kafka and PostgreSQL for this example will depend a lot on how many concurrent orders the application receives and how many resources (CPU and memory) the application has available to run.
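To make this coupling concrete, here is a rough sketch of a service written "the Spring Boot way", where the order is saved through Spring Data JPA (and therefore the PostgreSQL JDBC driver) and an event is published through spring-kafka (and therefore the Kafka client). The PizzaOrder entity, repository, and topic name are hypothetical; the point is that the drivers and all of their connection settings ship with, and must be configured for, every such service:

```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Entity
class PizzaOrder {              // hypothetical entity, only here to make the sketch self-contained
    @Id
    Long id;
    String pizza;
}

interface PizzaOrderRepository extends JpaRepository<PizzaOrder, Long> { }

@Service
class OrderService {

    private final PizzaOrderRepository orders;              // talks to PostgreSQL via the JDBC driver on the classpath
    private final KafkaTemplate<String, PizzaOrder> kafka;  // talks to Kafka via the Kafka client on the classpath

    OrderService(PizzaOrderRepository orders, KafkaTemplate<String, PizzaOrder> kafka) {
        this.orders = orders;
        this.kafka = kafka;
    }

    void place(PizzaOrder order) {
        orders.save(order);          // needs datasource URL, credentials, pool sizing, ...
        kafka.send("orders", order); // needs bootstrap servers, serializers, retry settings, ...
    }
}
```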
Once again, learning the specifics of each infrastructure component is not a bad thing for developers, but it gets in the way of implementing new services and new functionality for the store.

Decoupling Infrastructure Dependencies and Reusing Best Practices With Dapr

What if we could extract best practices, configurations, and the decision of which infrastructure components we need for our applications behind a set of APIs that application developers can consume without worrying about which driver/client they need or how to configure the connections to be efficient, secure, and working across environments? This is not a new idea. Any company dealing with complex infrastructure and multiple services that need to connect to that infrastructure will sooner or later implement an abstraction layer on top of common services that developers can use. The main problem is that building those abstractions and then maintaining them over time is hard, costs development time, and tends to get bypassed by developers who do not agree with or like the features provided.

This is where Dapr offers a set of building blocks to decouple your applications from infrastructure. Dapr Building Block APIs allow you to set up different component implementations and configurations without exposing developers to the hassle of choosing the right drivers or clients to connect to the infrastructure. Developers focus on building their applications by just consuming APIs. Developers do not need to know about "infrastructure land," as they can consume and trust APIs to, for example, store and retrieve data and publish and subscribe to events. This separation of concerns allows operations teams to provide consistent configurations across environments, where we may want to use another version of PostgreSQL, Kafka, or a cloud provider service such as Google Pub/Sub. Dapr uses the component model to define these configurations without affecting the application behavior and without pushing developers to worry about any of those parameters or the client/driver version they need to use.

Dapr for Spring Boot Developers

So, how does this look in practice? Dapr typically deploys to Kubernetes, meaning you need a Kubernetes cluster to install Dapr. Learning how Dapr works and how to configure it might be too complicated and not related at all to developer tasks like building features. For development purposes, you can use the Dapr CLI, a command-line tool designed to be language agnostic, allowing you to run Dapr locally for your applications. I like the Dapr CLI, but once again, you will need to learn how to use it, how to configure it, and how it connects to your application.

As a Spring Boot developer, adding a new command-line tool feels strange, as it is not integrated with the tools that I am used to or with my IDE. If I see that I need to download a new CLI, or if I depend on deploying my apps into a Kubernetes cluster even to test them, I would probably step away and look for other tools and projects. That is why the Dapr community has worked so hard to integrate with Spring Boot more natively. These integrations tap seamlessly into the Spring Boot ecosystem without adding new tools or steps to your daily work.

Let's see how this works with concrete examples. You can add the following dependency to your Spring Boot application to integrate Dapr with Testcontainers:
```xml
<dependency>
  <groupId>io.diagrid.dapr</groupId>
  <artifactId>dapr-spring-boot-starter</artifactId>
  <version>0.10.7</version>
</dependency>
```

View the repository here. Testcontainers (now part of Docker) is a popular tool in Java for working with containers, primarily in tests, specifically integration tests that use containers to set up complex infrastructure. Our three Pizza Spring Boot services share this same dependency, which allows developers to enable their Spring Boot applications to consume the Dapr Building Block APIs for local development without any Kubernetes, YAML, or extra configuration.

Once you have this dependency in place, you can start using the Dapr SDK to interact with the Dapr Building Block APIs, for example, to store an incoming order using the Statestore API (a sketch of this call and of the pub/sub call appears at the end of this article). Here, `STATESTORE_NAME` is the name of a configured Statestore component, `KEY` is just the key under which we want to store the order, and `order` is the order that we received from the Pizza Store front end. Similarly, if you want to publish events to other services, you can use the Dapr PubSub API: the publishEvent API publishes an event containing the `order` as a payload to the Dapr PubSub component named `PUBSUB_NAME` and to the specific topic indicated by `PUBSUB_TOPIC`.

Now, how is this going to work? How is Dapr storing state when we call the saveState() API, and how are events published when we call publishEvent()? By default, the Dapr SDK will try to call the Dapr API endpoints on localhost, as Dapr was designed to run beside our applications.

For development purposes, to enable Dapr for your Spring Boot application, you can use one of the two built-in profiles: DaprBasicProfile or DaprFullProfile. The Basic profile provides access to the Statestore and PubSub APIs, but more advanced features such as Actors and Workflows will not work. If you want access to all Dapr Building Blocks, you can use the Full profile. Both of these profiles use in-memory implementations for the Dapr components, making your applications faster to bootstrap.

The dapr-spring-boot-starter was created to minimize the amount of Dapr knowledge developers need to start using it in their applications. For this reason, besides the dependency mentioned above, a test configuration is required to select which Dapr profile we want to use. Since Spring Boot 3.1.x, you can define a Spring Boot application that will be used for test purposes. The idea is to allow tests to set up your application with everything it needs to be tested. From within the test packages (`src/test/<package>`) you can define a new @SpringBootApplication class configured to use a Dapr profile. This is just a wrapper for our PizzaStore application, which adds a configuration that includes the DaprBasicProfile (a sketch of such a wrapper is shown below). With the DaprBasicProfile enabled, whenever we start our application for testing purposes, all the components that the Dapr APIs need will be started for our application to consume. If you need more advanced Dapr setups, you can always create your own domain-specific Dapr profiles.

Another advantage of using these test configurations is that we can also start the application with the test configuration for local development purposes by running `mvn spring-boot:test-run`. You can see how Testcontainers transparently starts the `daprio/daprd` container.
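The wrapper class itself is not listed in this article. Under Spring Boot 3.1+, it might look roughly like the following sketch; the class names and the package of DaprBasicProfile are assumptions based on the description above:

```java
import io.diagrid.dapr.DaprBasicProfile;   // assumed package; check the starter for the exact location
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Placed under src/test/java; wraps the production PizzaStore application and enables
// the Dapr Basic profile so the Statestore and PubSub components start automatically.
@SpringBootApplication
public class TestPizzaStoreApplication {

    public static void main(String[] args) {
        SpringApplication
                .from(PizzaStore::main)           // PizzaStore = the production @SpringBootApplication
                .with(DaprBasicProfile.class)     // or DaprFullProfile for all building blocks
                .run(args);
    }
}
```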
As a developer, how that container is configured does not matter, as long as we can consume the Dapr APIs. I strongly recommend checking out the full example here, where you can run the application on Kubernetes with Dapr installed, or start each service and test locally using Maven. If this example is too complex for you, I recommend these blog posts, where I create a very simple application from scratch:

- Using the Dapr StateStore API with Spring Boot
- Deploying and configuring our simple application in Kubernetes
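For reference, the two SDK calls described earlier (saving the order and publishing the order event) look roughly like this with the Dapr Java SDK. This is a minimal sketch: the component names and the string payload are placeholders, and in the Spring Boot starter the client would normally be injected rather than built by hand:

```java
import io.dapr.client.DaprClient;
import io.dapr.client.DaprClientBuilder;

public class DaprCallsSketch {

    private static final String STATESTORE_NAME = "kvstore";  // placeholder: must match the Statestore component
    private static final String PUBSUB_NAME = "pubsub";       // placeholder: must match the PubSub component
    private static final String PUBSUB_TOPIC = "topic";       // placeholder: topic for order events

    public static void main(String[] args) throws Exception {
        try (DaprClient client = new DaprClientBuilder().build()) {
            // Statestore API: store the incoming order under a key
            client.saveState(STATESTORE_NAME, "order-123", "order-payload").block();

            // PubSub API: publish an event carrying the order as payload
            client.publishEvent(PUBSUB_NAME, PUBSUB_TOPIC, "order-payload").block();
        }
    }
}
```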
Java 17 heralds a new era in Java's evolution, bringing forth the Foreign Function and Memory API (introduced as an incubating API under Project Panama) as part of its feature set. This API is designed to revolutionize the way Java applications interact with native code and memory. Its introduction is a response to the long-standing complexities and inefficiencies associated with the Java Native Interface (JNI), offering a more straightforward, safe, and efficient pathway for Java to interface with non-Java code. This modernization is not just an upgrade but a transformation in how Java developers approach native interoperability, promising to enhance performance, reduce boilerplate, and minimize error-prone code.

Background

Traditionally, interfacing Java with native code was predominantly handled through the Java Native Interface (JNI), a framework that allowed Java code to interact with applications and libraries written in other languages like C or C++. However, JNI's steep learning curve, performance overhead, and manual error handling made it less than ideal. The Java Native Access (JNA) library emerged as an alternative, offering easier use but at the cost of performance. Both approaches left a gap in the Java ecosystem for a more integrated, efficient, and developer-friendly way to interface with native code. The Foreign Function and Memory API fills this gap, overcoming the limitations of its predecessors and setting a new standard for native integration.

Overview of the Foreign Function and Memory API

The Foreign Function and Memory API is designed to provide seamless and efficient interaction with native code and memory. It comprises two main components: the Foreign Function API and the Memory Access API. The Foreign Function API facilitates calling native functions from Java code, addressing type safety and reducing the boilerplate code associated with JNI. The Memory Access API allows for safe and efficient operations on native memory, including allocation, access, and deallocation, mitigating the risks of memory leaks and undefined behavior.

Key Features and Advancements

The Foreign Function and Memory API introduces several key features that significantly advance Java's native interfacing capabilities.

Enhanced Type Safety

The API advances type safety in native interactions, addressing the runtime type errors commonly associated with JNI through early type resolution. This is achieved via a combination of method handles and a linking mechanism that checks the match between Java and native types before execution.

- Linking descriptors: By employing descriptors for native functions, the API ensures early type resolution, minimizing runtime type discrepancies and enhancing application stability.
- Method handles: The adoption of method handles not only enforces strong typing but also introduces flexibility and immutability in native method invocation, improving the safety and robustness of native calls.

Minimized Boilerplate Code

Addressing the verbosity inherent in JNI, the Foreign Function and Memory API offers a more concise approach to native method declaration and invocation, significantly reducing the required boilerplate code.

- Simplified method linking: With straightforward linking descriptors, the API removes the need for verbose JNI-style declarations, streamlining the process of interfacing with native libraries.
- Streamlined type conversions: The API's automatic mapping for common data types simplifies the translation between Java and native types, extending even to complex structures through direct memory layout descriptions.

Streamlined Resource Management

The API introduces a robust model for managing native resources, addressing the common pitfalls of memory management in JNI-based applications, such as leaks and manual deallocation.

- Scoped resource management: Through the concept of resource scopes, the API delineates the lifecycle of native allocations, ensuring automatic cleanup and reducing the likelihood of leaks.
- Integration with try-with-resources: Resource scopes are compatible with Java's try-with-resources mechanism, enabling deterministic resource management and further mitigating memory management issues.

Enhanced Performance

Designed with performance optimization in mind, the Foreign Function and Memory API outperforms its predecessors by reducing call overhead and optimizing memory operations, which is crucial for high-performance native interactions.

- Efficient memory operations: The Memory Access component optimizes native memory manipulation, offering low-overhead access crucial for applications demanding high throughput or minimal latency.
- Reduced call overhead: By refining the native call process and minimizing intermediary operations, the API achieves a more efficient execution path for native function invocations than JNI.

Seamless Java Integration

The API is crafted to complement existing Java features, ensuring a harmonious integration that leverages the strengths of the Java ecosystem.

- NIO compatibility: The API's synergy with Java NIO enables efficient data exchange between Java byte buffers and native memory, vital for I/O-centric applications.
- VarHandle and MethodHandle integration: By embracing VarHandle and MethodHandle, the API offers dynamic and sophisticated means of manipulating native memory and functions, enriching interaction with native code through Java's established handle framework.

Practical Examples

A Simple Downcall

To illustrate the API's utility, consider a scenario where a Java application needs to call a native library function, int sum(int a, int b), which adds two integers. With the Foreign Function and Memory API, this can be achieved with minimal boilerplate:

```java
// Java 17 incubator API: requires the jdk.incubator.foreign module
// (--add-modules jdk.incubator.foreign) and native access to be enabled.
MethodHandle sum = CLinker.getInstance().downcallHandle(
    LibraryLookup.ofPath("libnative.so").lookup("sum").get(),
    MethodType.methodType(int.class, int.class, int.class),
    FunctionDescriptor.of(CLinker.C_INT, CLinker.C_INT, CLinker.C_INT)
);

int result = (int) sum.invokeExact(5, 10);
System.out.println("The sum is: " + result);
```

This example demonstrates the simplicity and type safety of invoking native functions, contrasting sharply with the more cumbersome and error-prone JNI approach.

Calling a Struct-Manipulating Native Function

Consider a scenario where you have a native library function that manipulates a C struct: for instance, a function void updatePerson(Person* p, const char* name, int age) that updates a Person struct.
With the Memory Access API, you can define and manipulate this struct directly from Java:

```java
try (var scope = ResourceScope.newConfinedScope()) {
    var personLayout = MemoryLayout.structLayout(
        CLinker.C_POINTER.withName("name"),
        CLinker.C_INT.withName("age")
    );
    var personSegment = MemorySegment.allocateNative(personLayout, scope);
    var cString = CLinker.toCString("John Doe", scope);

    // Link the native updatePerson function and invoke it with the struct and string addresses.
    MethodHandle updatePerson = CLinker.getInstance().downcallHandle(
        LibraryLookup.ofPath("libperson.so").lookup("updatePerson").get(),
        MethodType.methodType(void.class, MemoryAddress.class, MemoryAddress.class, int.class),
        FunctionDescriptor.ofVoid(CLinker.C_POINTER, CLinker.C_POINTER, CLinker.C_INT)
    );
    updatePerson.invokeExact(personSegment.address(), cString.address(), 30);
}
```

This example illustrates how you can use the Memory Access API to interact with complex data structures expected by native libraries, providing a powerful tool for Java applications that need to work closely with native code.

Interfacing With Operating System APIs

Another common use case for the Foreign Function and Memory API is interfacing with operating-system-level APIs. For example, calling the POSIX getpid function, which returns the calling process's ID, can be done as follows:

```java
MethodHandle getpid = CLinker.getInstance().downcallHandle(
    LibraryLookup.ofDefault().lookup("getpid").get(),
    MethodType.methodType(int.class),
    FunctionDescriptor.of(CLinker.C_INT)
);

int pid = (int) getpid.invokeExact();
System.out.println("Process ID: " + pid);
```

This example demonstrates the ease with which Java applications can now invoke OS-level functions, opening up new possibilities for direct system interactions without relying on Java libraries or external processes.

Advanced Memory Access

The Memory Access API also allows for more advanced memory operations, such as slicing and iterating over memory segments. This is particularly useful for operations on arrays or buffers of native memory.

Working With Native Arrays

Suppose you need to interact with a native function that expects an array of integers. You can allocate, populate, and pass a native array as follows:

```java
var intArrayLayout = MemoryLayout.sequenceLayout(10, CLinker.C_INT);
try (var scope = ResourceScope.newConfinedScope()) {
    var intArraySegment = MemorySegment.allocateNative(intArrayLayout, scope);
    for (int i = 0; i < 10; i++) {
        CLinker.C_INT.set(intArraySegment.asSlice(i * CLinker.C_INT.byteSize()), i);
    }

    // Assuming a native function `void processArray(int* arr, int size)`
    MethodHandle processArray = CLinker.getInstance().downcallHandle(
        LibraryLookup.ofPath("libarray.so").lookup("processArray").get(),
        MethodType.methodType(void.class, MemoryAddress.class, int.class),
        FunctionDescriptor.ofVoid(CLinker.C_POINTER, CLinker.C_INT)
    );
    processArray.invokeExact(intArraySegment.address(), 10);
}
```

This example showcases how to create and manipulate native arrays, enabling Java applications to work with native libraries that process large datasets or perform bulk operations on data.

Byte Buffers and Direct Memory

The Memory Access API seamlessly integrates with Java's existing NIO buffers, allowing for efficient data transfer between Java and native memory. For instance, transferring data from a ByteBuffer to native memory can be achieved as follows:

```java
ByteBuffer javaBuffer = ByteBuffer.allocateDirect(100);
// Populate the ByteBuffer with data ...

try (var scope = ResourceScope.newConfinedScope()) {
    var nativeBuffer = MemorySegment.allocateNative(100, scope);
    nativeBuffer.asByteBuffer().put(javaBuffer);
    // Now nativeBuffer contains the data from javaBuffer, ready for native processing
}
```

This interoperability with NIO buffers enhances the flexibility and efficiency of data exchange between Java and native code, making it ideal for applications that require high-performance I/O operations.

Best Practices and Considerations

Scalability and Concurrency

When working with the Foreign Function and Memory API in concurrent or high-load environments, consider the implications for scalability and resource management. Leveraging ResourceScope effectively can help manage the lifecycle of native resources in complex scenarios.

Security Implications

Interfacing with native code can introduce security risks, such as buffer overflows or unauthorized memory access. Always validate inputs and outputs when dealing with native functions to mitigate these risks.

Debugging and Diagnostics

Debugging issues that span Java and native code can be challenging. Utilize Java's built-in diagnostic tools and consider logging or tracing native function calls to simplify debugging.

Future Developments and Community Involvement

The Foreign Function and Memory API is a living part of the Java ecosystem, with ongoing developments and enhancements influenced by community feedback and use cases. Active involvement in the Java community, through forums, JEP discussions, and contributions to OpenJDK, can help shape the future of this API and ensure it meets the evolving needs of Java developers.

Conclusion

The Foreign Function and Memory API in Java 17 represents a paradigm shift in Java's capability for native interfacing, offering unprecedented ease of use, safety, and performance. Through practical examples, we have seen how this API simplifies complex native interactions, from manipulating structs and arrays to interfacing with OS-level functions. As Java continues to evolve, the Foreign Function and Memory API stands as a testament to the language's adaptability and its commitment to meeting the needs of modern developers. With this API, the Java ecosystem is better equipped than ever to build high-performance, native-integrated applications, heralding a new era of Java-native interoperability.
Introduction to Datafaker

Datafaker is a modern framework that enables JVM programmers to efficiently generate fake data for their projects using over 200 data providers, allowing for quick setup and usage. Custom providers can be written when you need domain-specific data. In addition to providers, the generated data can be exported to popular formats like CSV, JSON, SQL, XML, and YAML. For a good introduction to the basic features, see "Datafaker: An Alternative to Using Production Data." Datafaker offers many features, such as working with sequences and collections and generating custom objects based on schemas (see "Datafaker 2.0").

Bulk Data Generation

In software development and testing, the need to generate data frequently arises, whether to conduct non-functional tests or to simulate burst loads. Let's consider a straightforward scenario: we have the task of generating 10,000 messages in JSON format to be sent to RabbitMQ. From my perspective, these options are worth considering:

- Developing your own tool: One option is to write a custom application from scratch to generate these records (messages). If the generated data needs to be more realistic, it makes sense to use Datafaker or JavaFaker.
- Using specific tools: Alternatively, we could select tools designed for particular databases or message brokers. For example, tools like voluble for Kafka provide specialized functionality for generating and publishing messages to Kafka topics; or a more modern tool like ShadowTraffic, which is currently under development and directed towards a container-based approach, which may not always be necessary.
- Datafaker Gen: Finally, we have the option of using Datafaker Gen, which I want to consider in the current article.

Datafaker Gen Overview

Datafaker Gen offers a command-line generator based on the Datafaker library that allows for the continuous generation of data in various formats and integration with different storage systems, message brokers, and backend services. Because this tool uses Datafaker, the generated data can be quite realistic. Configuration of the schema, format type, and sink can be done without rebuilding the project. Datafaker Gen consists of the following main components that can be configured:

1. Schema Definition

Users can define the schema for their records in the config.yaml file. The schema specifies the field definitions of the record based on the Datafaker providers. It also allows for the definition of embedded fields.

```yaml
default_locale: en-EN
fields:
  - name: lastname
    generators: [ Name#lastName ]
  - name: firstname
    generators: [ Name#firstName ]
```

2. Format

Datafaker Gen allows users to specify the format in which records will be generated. Currently, there are basic implementations for CSV, JSON, SQL, XML, and YAML formats. Additionally, formats can be extended with custom implementations. The configuration for formats is specified in the output.yaml file.

```yaml
formats:
  csv:
    quote: "@"
    separator: $$$$$$$
  json:
    formattedAs: "[]"
  yaml:
  xml:
    pretty: true
```

3. Sink

The sink component determines where the generated data will be stored or published. The basic implementation includes command-line output and text-file sinks. Additionally, sinks can be extended with custom implementations such as RabbitMQ, as demonstrated in the current article. The configuration for sinks is specified in the output.yaml file.
```yaml
sinks:
  rabbitmq:
    batchsize: 1  # when 1, each message contains one document; when >1, each message contains a batch of documents
    host: localhost
    port: 5672
    username: guest
    password: guest
    exchange: test.direct.exchange
    routingkey: products.key
```

Extensibility via Java SPI

Datafaker Gen uses the Java SPI (Service Provider Interface) to make it easy to add new formats or sinks. This extensibility allows Datafaker Gen to be customized according to specific requirements.

How To Add a New Sink in Datafaker Gen

Before adding a new sink, you may want to check whether it already exists in the datafaker-gen-examples repository. If it does not, you can refer to the examples there on how to add a new sink. When it comes to extending Datafaker Gen with new sink implementations, developers have two primary options to consider:

- Use the datafaker-gen-examples parent project and implement the sink interfaces for your extension there, similar to the sinks already available in that repository.
- Include the dependencies from the Maven repository to access the required interfaces. For this approach, Datafaker Gen should be built and available in the local Maven repository. This approach provides more flexibility in project structure and requirements.

1. Implementing a RabbitMQ Sink

To add a new RabbitMQ sink, one simply needs to implement the net.datafaker.datafaker_gen.sink.Sink interface. This interface contains two methods:

- getName: Defines the sink name.
- run: Triggers the generation of records and then sends or saves all the generated records to the specified destination. The method parameters include the configuration specific to this sink retrieved from the output.yaml file, as well as the data generation function and the desired number of lines to be generated.

```java
import java.util.Map;
import java.util.function.Function;

import com.google.gson.JsonArray;
import com.google.gson.JsonParser;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import net.datafaker.datafaker_gen.sink.Sink;

public class RabbitMqSink implements Sink {

    @Override
    public String getName() {
        return "rabbitmq";
    }

    @Override
    public void run(Map<String, ?> config, Function<Integer, ?> function, int numberOfLines) {
        // Read the output configuration (host is shown here; port, username, password,
        // exchange, and routing key are read from the same map) ...
        int numberOfLinesToPrint = numberOfLines;
        String host = (String) config.get("host");

        // Generate the lines
        String lines = (String) function.apply(numberOfLinesToPrint);

        // Send or save the results to the expected resource:
        // in this case, connect to RabbitMQ and publish the messages.
        ConnectionFactory factory = getConnectionFactory(host, port, username, password);
        try (Connection connection = factory.newConnection()) {
            Channel channel = connection.createChannel();
            JsonArray jsonArray = JsonParser.parseString(lines).getAsJsonArray();
            jsonArray.forEach(jsonElement -> {
                try {
                    channel.basicPublish(exchange, routingKey, null, jsonElement.toString().getBytes());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

2. Adding Configuration for the New RabbitMQ Sink

As previously mentioned, the configuration for sinks or formats can be added to the output.yaml file. The specific fields may vary depending on your custom sink. Below is an example configuration for a RabbitMQ sink:

```yaml
sinks:
  rabbitmq:
    batchsize: 1  # when 1, each message contains one document; when >1, each message contains a batch of documents
    host: localhost
    port: 5672
    username: guest
    password: guest
    exchange: test.direct.exchange
    routingkey: products.key
```
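The run method in step 1 delegates connection setup to a getConnectionFactory helper that is not shown in the article. A minimal version, added to the RabbitMqSink class and assuming plain username/password authentication, could look like this:

```java
// Hypothetical helper for RabbitMqSink: builds a com.rabbitmq.client.ConnectionFactory
// (already imported above) from the values read out of the sink configuration.
private static ConnectionFactory getConnectionFactory(String host, int port, String username, String password) {
    ConnectionFactory factory = new ConnectionFactory();
    factory.setHost(host);
    factory.setPort(port);
    factory.setUsername(username);
    factory.setPassword(password);
    return factory;
}
```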
3. Adding the Custom Sink via SPI

Adding a custom sink via SPI (Service Provider Interface) involves including the provider configuration in the ./resources/META-INF/services/net.datafaker.datafaker_gen.sink.Sink file. This file contains the fully qualified names of the sink implementations:

```properties
net.datafaker.datafaker_gen.sink.RabbitMqSink
```

Those are all three simple steps for extending Datafaker Gen. In this example, we are not providing a complete implementation of the sink, nor showing how to use additional libraries. To see the complete implementation, you can refer to the datafaker-gen-rabbitmq module in the example repository.

How To Run

Step 1

Build a JAR file based on the new implementation:

```shell
./mvnw clean verify
```

Step 2

Define the schema for records in the config.yaml file and place this file in the appropriate location where the generator should run. Additionally, define the sinks and formats in the output.yaml file, as demonstrated previously.

Step 3

Datafaker Gen can be executed in two ways:

1. Use the bash script from the bin folder in the parent project:

```shell
# Format json, number of lines 10000, and the new RabbitMQ sink
bin/datafaker_gen -f json -n 10000 -sink rabbitmq
```

2. Execute the JAR directly, like this:

```shell
java -cp [path_to_jar] net.datafaker.datafaker_gen.DatafakerGen -f json -n 10000 -sink rabbitmq
```

How Fast Is It?

The test was done based on the schema described above, which means that one document consists of two fields. Documents are written one by one to the RabbitMQ queue in JSON format. The table below shows the time taken for 10,000, 100,000, and 1 million records on my local machine:

| Records   | Time       |
|-----------|------------|
| 10,000    | 401 ms     |
| 100,000   | 11,613 ms  |
| 1,000,000 | 121,601 ms |

Conclusion

The Datafaker Gen tool enables the creation of flexible and fast data generators for various types of destinations. Built on Datafaker, it facilitates realistic data generation. Developers can easily configure the content of records, formats, and sinks to suit their needs. As a simple Java application, it can be deployed anywhere you want, whether in Docker or on on-premise machines. The full source code is available here. I would like to thank Sergey Nuyanzin for reviewing this article. Thank you for reading, and I am glad to be of help.
NoSQL databases provide a flexible and scalable option for storing and retrieving data. However, they can struggle with object-oriented programming paradigms, such as inheritance, which is a fundamental concept in languages like Java. This article explores the impedance mismatch that arises when dealing with inheritance in NoSQL databases.

The Inheritance Challenge in NoSQL Databases

The term "impedance mismatch" refers to the disconnect between the object-oriented world of programming languages like Java and the tabular, document-oriented, or graph-based structures of NoSQL databases. One area where this mismatch is particularly evident is in handling inheritance.

In Java, inheritance allows you to create a hierarchy of classes, where a subclass inherits properties and behaviors from its parent class. This concept is deeply ingrained in Java programming and is often used to model real-world relationships. However, NoSQL databases have no joins, so the inheritance structure needs to be handled differently.

Jakarta Persistence (JPA) and Inheritance Strategies

Before diving into more advanced solutions, it is worth mentioning that in the world of Jakarta Persistence (formerly known as JPA), there are strategies to simulate inheritance in relational databases. These strategies include:

- JOINED inheritance strategy: In this approach, fields specific to a subclass are mapped to a separate table from the fields common to the parent class. A join operation is performed to instantiate the subclass when needed.
- SINGLE_TABLE inheritance strategy: This strategy uses a single table that represents the entire class hierarchy. Discriminator columns are used to differentiate between the subclasses.
- TABLE_PER_CLASS inheritance strategy: Each concrete entity class in the hierarchy corresponds to its own table in the database.

These strategies work well in relational databases but are not directly applicable to NoSQL databases, primarily because NoSQL databases do not support traditional joins.

Live Code Session: Java SE, Eclipse JNoSQL, and MongoDB

In this live code session, we will create a Java SE project using MongoDB as our NoSQL database. We will focus on managing game characters, specifically Mario and Sonic characters, using Eclipse JNoSQL. You can run MongoDB locally using Docker or in the cloud with MongoDB Atlas. We will start with the database setup and then proceed to the Java code implementation.

Setting Up MongoDB Locally

To run MongoDB locally, you can use Docker with the following command:

```shell
docker run -d --name mongodb-instance -p 27017:27017 mongo
```

Alternatively, you can choose to run it in the cloud by following the instructions provided by MongoDB Atlas. With the MongoDB database up and running, let's create our Java project.

Creating the Java Project

We will create a Java SE project using Maven and the maven-archetype-quickstart archetype.
This project will utilize the following technologies and dependencies:

- Jakarta CDI
- Jakarta JSON-P
- Eclipse MicroProfile
- Eclipse JNoSQL database

Maven Dependencies

Add the following dependencies to your project's pom.xml file:

```xml
<dependencies>
    <dependency>
        <groupId>org.jboss.weld.se</groupId>
        <artifactId>weld-se-shaded</artifactId>
        <version>${weld.se.core.version}</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.eclipse</groupId>
        <artifactId>yasson</artifactId>
        <version>3.0.3</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>io.smallrye.config</groupId>
        <artifactId>smallrye-config-core</artifactId>
        <version>3.2.1</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.eclipse.microprofile.config</groupId>
        <artifactId>microprofile-config-api</artifactId>
        <version>3.0.2</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.eclipse.jnosql.databases</groupId>
        <artifactId>jnosql-mongodb</artifactId>
        <version>${jnosql.version}</version>
    </dependency>
    <dependency>
        <groupId>net.datafaker</groupId>
        <artifactId>datafaker</artifactId>
        <version>2.0.2</version>
    </dependency>
</dependencies>
```

Make sure to replace ${jnosql.version} with the appropriate version of Eclipse JNoSQL you intend to use. In the next section, we will proceed with implementing our Java code.

Implementing Our Java Code

Our GameCharacter class will serve as the parent class for all game characters and will hold the common attributes shared among them. We will use inheritance and a discriminator column to distinguish between Sonic and Mario characters. Here is the initial definition of the GameCharacter class:

```java
@Entity
@DiscriminatorColumn("type")
@Inheritance
public abstract class GameCharacter {

    @Id
    @Convert(UUIDConverter.class)
    protected UUID id;

    @Column
    protected String character;

    @Column
    protected String game;

    public abstract GameType getType();
}
```

In this code:

- We annotate the class with @Entity to indicate that it is a persistent entity in our MongoDB database.
- We use @DiscriminatorColumn("type") to specify that a discriminator column named "type" will be used to differentiate between subclasses.
- @Inheritance indicates that this class is part of an inheritance hierarchy.
- The GameCharacter class has a unique identifier (id), attributes for the character name (character) and game name (game), and an abstract method getType(), which its subclasses will implement to specify the character type.

Specialization Classes: Sonic and Mario

Now, let's create the specialization classes for the Sonic and Mario entities. These classes will extend the GameCharacter class and provide additional attributes specific to each character type. We will use @DiscriminatorValue to define the values the "type" discriminator column can take for each subclass.

```java
@Entity
@DiscriminatorValue("SONIC")
public class Sonic extends GameCharacter {

    @Column
    private String zone;

    @Override
    public GameType getType() {
        return GameType.SONIC;
    }
}
```

In the Sonic class:

- We annotate it with @Entity to indicate it is a persistent entity.
- @DiscriminatorValue("SONIC") specifies that the "type" discriminator column will have the value "SONIC" for Sonic entities.
- We add a zone attribute specific to Sonic characters.
- The getType() method returns GameType.SONIC, indicating that this is a Sonic character.
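The GameType enum returned by getType() is not listed in the article; based on how it is used, a minimal version is simply:

```java
// Discriminator values used by the GameCharacter hierarchy.
public enum GameType {
    SONIC,
    MARIO
}
```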
```java
@Entity
@DiscriminatorValue("MARIO")
public class Mario extends GameCharacter {

    @Column
    private String locations;

    @Override
    public GameType getType() {
        return GameType.MARIO;
    }
}
```

Similarly, in the Mario class:

- We annotate it with @Entity to indicate it is a persistent entity.
- @DiscriminatorValue("MARIO") specifies that the "type" discriminator column will have the value "MARIO" for Mario entities.
- We add a locations attribute specific to Mario characters.
- The getType() method returns GameType.MARIO, indicating that this is a Mario character.

With this modeling approach, you can easily distinguish between Sonic and Mario characters in your MongoDB database using the discriminator column "type." We will now create our first database integration with MongoDB using Eclipse JNoSQL. To simplify, we will generate data using the Data Faker library. Our Java application will insert Mario and Sonic characters into the database and perform basic operations.

Application Code

Here is the main application code that generates and inserts data into the MongoDB database:

```java
public class App {

    public static void main(String[] args) {
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            DocumentTemplate template = container.select(DocumentTemplate.class).get();

            DataFaker faker = new DataFaker();
            Mario mario = Mario.of(faker.generateMarioData());
            Sonic sonic = Sonic.of(faker.generateSonicData());

            // Insert Mario and Sonic characters into the database
            template.insert(List.of(mario, sonic));

            // Count the total number of GameCharacter documents
            long count = template.count(GameCharacter.class);
            System.out.println("Total of GameCharacter: " + count);

            // Find all Mario characters in the database
            List<Mario> marioCharacters = template.select(Mario.class).getResultList();
            System.out.println("Find all Mario characters: " + marioCharacters);

            // Find all Sonic characters in the database
            List<Sonic> sonicCharacters = template.select(Sonic.class).getResultList();
            System.out.println("Find all Sonic characters: " + sonicCharacters);
        }
    }
}
```

In this code:

- We use the SeContainer to manage our CDI container and initialize the DocumentTemplate from Eclipse JNoSQL.
- We create instances of Mario and Sonic characters using data generated by the DataFaker class.
- We insert these characters into the MongoDB database using the template.insert() method.
- We count the total number of GameCharacter documents in the database.
- We retrieve and display all Mario and Sonic characters from the database.

Resulting Database Structure

As a result of running this code, you will see data in your MongoDB database similar to the following structure:

```json
[
  {
    "_id": "39b8901c-669c-49db-ac42-c1cabdcbb6ed",
    "character": "Bowser",
    "game": "Super Mario Bros.",
    "locations": "Mount Volbono",
    "type": "MARIO"
  },
  {
    "_id": "f60e1ada-bfd9-4da7-8228-6a7f870e3dc8",
    "character": "Perfect Chaos",
    "game": "Sonic Rivals 2",
    "type": "SONIC",
    "zone": "Emerald Hill Zone"
  }
]
```

As shown in the database structure, each document contains a unique identifier (_id), a character name (character), a game name (game), and a discriminator column type to differentiate between Mario and Sonic characters. Depending on your generated data, you will see more characters in your MongoDB database. This integration demonstrates how to insert, count, and retrieve game characters using Eclipse JNoSQL and MongoDB. You can extend and enhance this application to manage and manipulate your game character data as needed.
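Both this App class and the repository examples that follow rely on Mario.of(...) and Sonic.of(...) factory methods that the article never lists. A minimal, hypothetical sketch for the Faker-based variant used in the next section might look like this; the Datafaker provider and its method names are assumptions and may differ in your Datafaker version:

```java
// Hypothetical factory method to be added to the Sonic class;
// faker.sonicTheHedgehog() and its accessors are assumed provider names.
public static Sonic of(Faker faker) {
    Sonic sonic = new Sonic();
    sonic.id = UUID.randomUUID();
    sonic.character = faker.sonicTheHedgehog().character();
    sonic.game = faker.sonicTheHedgehog().game();
    sonic.zone = faker.sonicTheHedgehog().zone();
    return sonic;
}
```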
We will now create repositories for managing game characters using Eclipse JNoSQL. We will have a Console repository for general game characters and a SonicRepository specifically for Sonic characters. These repositories will allow us to interact with the database and perform various operations easily. Let's define the repositories for our game characters.

Console Repository

```java
@Repository
public interface Console extends PageableRepository<GameCharacter, UUID> {
}
```

The Console repository extends PageableRepository and is used for general game characters. It provides common CRUD operations and pagination support.

Sonic Repository

```java
@Repository
public interface SonicRepository extends PageableRepository<Sonic, UUID> {
}
```

The SonicRepository also extends PageableRepository but is specifically designed for Sonic characters. It inherits common CRUD operations and pagination from the parent repository.

Main Application Code

Now, let's modify our main application code to use these repositories.

For the Console Repository

```java
public static void main(String[] args) {
    Faker faker = new Faker();
    try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
        Console repository = container.select(Console.class).get();
        for (int index = 0; index < 5; index++) {
            Mario mario = Mario.of(faker);
            Sonic sonic = Sonic.of(faker);
            repository.saveAll(List.of(mario, sonic));
        }
        long count = repository.count();
        System.out.println("Total of GameCharacter: " + count);
        System.out.println("Find all game characters: " + repository.findAll().toList());
    }
    System.exit(0);
}
```

In this code, we use the Console repository to save both Mario and Sonic characters, demonstrating its ability to manage general game characters.

For the Sonic Repository

```java
public static void main(String[] args) {
    Faker faker = new Faker();
    try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
        SonicRepository repository = container.select(SonicRepository.class).get();
        for (int index = 0; index < 5; index++) {
            Sonic sonic = Sonic.of(faker);
            repository.save(sonic);
        }
        long count = repository.count();
        System.out.println("Total of Sonic characters: " + count);
        System.out.println("Find all Sonic characters: " + repository.findAll().toList());
    }
    System.exit(0);
}
```

This code uses the SonicRepository to save Sonic characters specifically. It showcases how to work with a repository dedicated to a particular character type. With these repositories, you can easily manage, query, and filter game characters based on their type, simplifying the code and making it more organized.

Conclusion

In this article, we explored the seamless integration of MongoDB with Java using the Eclipse JNoSQL framework for efficient game character management. We delved into the intricacies of modeling game characters, addressing challenges related to inheritance in NoSQL databases while maintaining compatibility with Java's object-oriented principles. By employing discriminator columns, we could categorize characters and store them within the MongoDB database, creating a well-structured and extensible solution.

Through our Java application, we demonstrated how to generate sample game character data using the Data Faker library and efficiently insert it into MongoDB. We performed essential operations, such as counting the number of game characters and retrieving specific character types.
Moreover, we introduced the concept of repositories in Eclipse JNoSQL, showcasing their value in simplifying data management and enabling focused queries based on character types. This article provides a solid foundation for harnessing the power of Eclipse JNoSQL and MongoDB to streamline NoSQL database interactions in Java applications, making it easier to manage and manipulate diverse data sets. Source code
Concurrency is one of the most complex problems we (developers) can face in our daily work. Additionally, it is one of the most common problems we encounter while solving our day-to-day issues. The combination of both these factors is what truly makes concurrency and multithreading among the most dangerous issues software engineers may encounter. What is more, solving concurrency problems with low-level abstractions can be quite a cognitive challenge and lead to complex, nondeterministic errors. That is why most languages introduce higher-level abstractions that allow us to solve concurrency-related problems with relative ease and not spend time tweaking low-level switches.

In this article, I would like to dive deeper into such abstractions provided by the Java standard library, namely the ExecutorService interface and its implementations. I also want this article to be an entry point for my next article about benchmarking Java Streams. Before that, let's have a quick recap on processes and threads.

What Is a Process?

A process is the simplest unit that can be executed on its own inside our computers. Thanks to processes, we can split the work being done inside our machines into smaller, more modular, and manageable parts. Such an approach allows us to make particular parts more focused and thus more performant. A split like that also makes it possible to take full advantage of the multiple cores built into our CPUs. In general, each process is an instance of a particular program; for example, our up-and-running Java process is an instance of a JVM program. Moreover, each process is rooted inside the OS and has its own unique set of resources, accesses, and behavior (the program code), similar to our application users. Each process can have several threads (at least in most operating systems) that work together toward the completion of common tasks assigned by the process.

What Is a Thread?

A thread can be viewed as a branch of our code with a specific set of instructions that is executed in parallel to the work done by the rest of the application. Threads enable concurrent execution of multiple sequences of instructions within a single process. On the software level, we can differentiate two types of threads:

- Kernel (system) threads: Threads managed directly by the OS. The operating system kernel performs thread creation, scheduling, and management.
- Application (user) threads: Threads managed at the user level by a thread library or runtime environment, independent of the operating system. They are not visible to the operating system kernel, so the operating system manages them as if they were single-threaded processes.

Here, I will focus mostly on application threads. I will also mention CPU-related threads. These are hardware threads, a trait of our CPUs; their number describes the capability of our CPU to handle multiple threads simultaneously. In principle, threads can share resources in a much lighter fashion than processes. All threads within a process have access to all the data owned by their parent process. Additionally, each thread can have its own data, more commonly known as thread-local variables (or, in the case of Java, the newer and more recommended scoped values). What is more, switching between threads is much cheaper than switching between processes.

What Is a Thread Pool?

A thread pool is a more concrete term than both thread and process. It is related to application threads and describes a set of such threads that we can use inside our application. It works based on a very simple behavior:
we just take threads one by one from the pool until the pool is empty. However, there is an additional assumption to this rule: the threads are returned to the pool once their task is completed. Of course, applications may have more than one thread pool, and in fact, the more specialized our thread pools are, the better for us. With such an approach, we can limit contention within the application and remove single points of failure. The industry standard nowadays is to have at least a separate thread pool for database connections.

Threads, Thread Pools, and Java

In older versions of Java (before Java 21), all threads used inside the application were bound to CPU threads; thus, they were quite expensive and heavy. If by accident (or intent) you spawn too many threads in your Java application, for example by calling new Thread() directly, you can very quickly run out of resources, and the performance of your application will decrease rapidly, as, among other things, the CPU needs to do a lot of context switching. Project Loom, part of the Java 21 release, aimed to address this issue by adding virtual threads (threads that are not bound to CPU threads, so-called green threads) to the Java standard library. If you would like to know more about Loom and the changes it brings to Java threads, I recommend this article.

In Java, the concept of the thread pool is implemented by ThreadPoolExecutor, a class that represents a thread pool of finite size with the upper bound described by the maximumPoolSize parameter of the class constructor. As a side note, this executor is also used as an internal thread pool inside more complex executors.

Executor, ExecutorService, and Executors

Before we move on to describe the more complex executor implementations that utilize a ThreadPoolExecutor, there is one more question I would like to answer: what are the Executor and ExecutorService themselves?

Executor

Executor is an interface exposing only one method, with the following signature: void execute(Runnable command). The interface describes a very simple capability: a class implementing it can execute a provided Runnable. Its purpose is to decouple task submission from the mechanics of how the task will be run.

ExecutorService

ExecutorService is yet another interface, an extension of the Executor interface. It has a much more powerful contract than Executor, with around 13 methods to override if we decide to implement it. Its main aim is to help with the management and running of asynchronous tasks by wrapping such tasks in a Java Future. Additionally, ExecutorService extends the AutoCloseable interface. This allows us to use an ExecutorService in a try-with-resources block and close the resource in an orderly fashion.

Executors

The Executors class, on the other hand, is a utility class. It is the recommended way of spawning new instances of the executors; using the new keyword directly is not recommended for most of them. What is more, it provides methods for creating variations of Callable instances, for example, a Callable with a static return value.

With these three basic concepts described, we can move on to the different executor service implementations.

The Executor Services

As of now, the Java standard library provides four main implementations of the ExecutorService interface.
Each one provides a set of more or less unique features. They go as follows: ThreadPoolExecutor ForkJoinPool ScheduledThreadPoolExecutor ThreadPerTaskExecutor Additionally, there are three private static implementations in the Executors class which implement ExecutorService: DelegatedExecutorService DelegatedScheduledExecutorService AutoShutdownDelegatedExecutorService Overall, the dependency graph between the classes looks more or less like this: ThreadPoolExecutor As I said before, it is an implementation of the thread pool concept in Java. This executor represents a bounded thread pool with a dynamic number of threads. What this exactly means is that ThreadPoolExecutor will use a finite number of threads, but the number of used threads will never be higher than specified on pool creation. To achieve that, ThreadPoolExecutor uses two variables: corePoolSize and maximumPoolSize. The first one — corePoolSize — describes the minimal number of threads in the pool, so even if the threads are idle, the pool will keep them alive. On the other hand, the second one — maximumPoolSize — describes, as you have probably guessed by now, the maximum number of threads owned by the pool. This is our upper bound on the number of threads inside the pool; the pool will never have more threads than the value of this parameter. Additionally, ThreadPoolExecutor uses a BlockingQueue underneath to keep track of incoming tasks. ThreadPoolExecutor Behavior By default, if the current number of up-and-running threads is smaller than corePoolSize, calling the execute method will result in spawning a new thread with the incoming task as the thread's first piece of work — even if there are idle threads in the pool at the moment. If, for some reason, the pool is unable to add a new thread, it moves on to behavior two. If the number of running threads is higher than or equal to corePoolSize, or the pool was unable to spawn a new thread, calling the execute method will result in an attempt to add the new task to the queue: isRunning(c) && workQueue.offer(command). If the pool is still running, a new thread may be added to the pool without any task assigned — this is the only case where a thread is spawned without a task: addWorker(null, false);. On the contrary, if the pool is not running, the new command is removed from the queue — !isRunning(recheck) && remove(command) — and the pool rejects the command with a RejectedExecutionException: reject(command);. If for some reason the task cannot be added to the queue, the pool tries to start a new thread with the task as its first task: else if (!addWorker(command, false)). If that fails, the task is rejected, and a RejectedExecutionException is thrown with a message similar to the one below. Task X rejected from java.util.concurrent.ThreadPoolExecutor@3a71f4dd[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0] Above, you can see a very, very simplified visualization of ThreadPoolExecutor's internal state. From here on out, you can expect two outcomes: either Task4 (or Task5) will be processed before submitting Task6 to the pool, or Task6 will be submitted to the pool before the end of Task4 (or Task5) processing. The first scenario is quite boring, as everything stays the same from the perspective of the executor, so I will only spend a little time on it. The second scenario is much more interesting, as it results in a change of the executor's state.
Because the current number of running threads is smaller than corePoolSize, submitting Task6 to the executor will result in spawning a new worker for this task. The final state will look more or less like the one below. ThreadPoolExecutor Pool Size Fixed-size pool: By setting corePoolSize and maximumPoolSize to the same value, you essentially create a fixed-size thread pool; the number of threads run by the pool will never go below or above the set value — at least not for long. Unbounded pool: By setting maximumPoolSize to a high enough value, such as Integer.MAX_VALUE, you can make the pool practically unbounded. There is a practical limit of around 500 million ((2²⁹)-1) threads coming from the ThreadPoolExecutor implementation, but I bet that your machine will be down long before reaching that maximum. If you would like to know more about the reasoning behind this limit, there is a very nice JavaDoc describing it, located just after the declaration of the ThreadPoolExecutor class. I will just drop a hint that it is related to the way ThreadPoolExecutor holds its state. Spawning ThreadPoolExecutor The Executors class gives you a total of 6 methods to spawn a ThreadPoolExecutor. I will describe them in pairs, as that is how they are designed to work. public static ExecutorService newFixedThreadPool(int nThreads) public static ExecutorService newFixedThreadPool(int nThreads, ThreadFactory threadFactory) These methods create a fixed-size thread pool — the core and max sizes are equal. Additionally, you can pass a ThreadFactory as an argument if you prefer not to use the default one from the standard library. Executors.newFixedThreadPool(2); Executors.newFixedThreadPool(2, Executors.defaultThreadFactory()); The next pair of methods: public static ExecutorService newCachedThreadPool() public static ExecutorService newCachedThreadPool(ThreadFactory threadFactory) The methods above create de facto unbounded thread pools by setting maximumPoolSize to Integer.MAX_VALUE; as with the fixed thread pool, the second version allows passing a customized ThreadFactory. Executors.newCachedThreadPool(); Executors.newCachedThreadPool(Executors.defaultThreadFactory()); And the last two methods: public static ExecutorService newSingleThreadExecutor() public static ExecutorService newSingleThreadExecutor(ThreadFactory threadFactory) Both methods create a thread pool that uses a single thread. What is more, these methods spawn a ThreadPoolExecutor that is wrapped in an AutoShutdownDelegatedExecutorService. It only exposes the methods of ExecutorService: no ThreadPoolExecutor-specific methods are available. Moreover, the shutdown method is overridden with the help of Cleanable and is called when the executor becomes phantom reachable. ThreadPoolExecutor: Adding New Tasks The default way of adding new tasks to be executed by a ThreadPoolExecutor is to use one of the versions of the submit method. Alternatively, one may also use the execute method from the Executor interface directly; however, it is not the recommended way. The execute method returns void, not a Future, which means that you have less control over the task's execution — so the choice is yours. Of course, both approaches will trigger all the thread pool logic described above.
Runnable task = () -> System.out.print("test"); Future<?> submit = executorService.submit(task); vs executorService.execute(task); ForkJoinPool ForkJoinPool is a totally separate ExecutorService implementation whose main selling point is the concept of work stealing. Work stealing is quite a complex concept, worthy of a solely focused blog post. However, it can be reasonably simply described at a high enough level of abstraction — all threads within the pool try to execute any task submitted to the pool, no matter its original owner. That is why the concept is called work stealing: the threads "steal" each other's work. In theory, such an approach should result in noticeable performance gains, especially if submitted tasks are small or spawn other subtasks. In the near future, I am planning to publish a separate post solely focused on work stealing and the Fork/Join framework. Until then, you can read more about work stealing here. ForkJoinPool Behavior This executor is part of the Java Fork/Join framework introduced in Java 7, a set of classes aimed at more efficient parallel processing of tasks by utilizing the concept of work stealing. Currently, the framework is extensively used in the context of CompletableFuture and Streams. If you want to get the most out of the ForkJoinPool, I would recommend getting familiar with the Fork/Join framework as a whole and then trying to switch your approach to handling such tasks to one that is more in line with Fork/Join requirements. Before going head-on into the Fork/Join framework, please do benchmarks and performance tests, as the potential gains may not be as big as expected. Additionally, there is a table in the ForkJoinPool JavaDoc describing which method one should use for interacting with the ForkJoinPool to get the best results. The crucial parameter in the case of ForkJoinPool is parallelism — it describes the number of worker threads the pool will be using. By default, it is equal to the number of available processors on your CPU. In most cases, this is at least a sufficient setup, and I would recommend not changing it without proper performance tests. Keep in mind that Java threads are CPU-bound, and we can quickly run out of processing power to make progress on our tasks. Spawning ForkJoinPool Instances The Executors class provides two methods for spawning instances of ForkJoinPool: Executors.newWorkStealingPool(); Executors.newWorkStealingPool(2); The first one creates a ForkJoinPool with the default parallelism — the number of available processors — while the second gives us the possibility to specify the parallelism level ourselves. What is more, Executors spawns a ForkJoinPool with a FIFO queue underneath, while the default setting of ForkJoinPool itself (for example, via new ForkJoinPool(2)) is LIFO. Using LIFO via Executors is impossible; however, you can change the type of the underlying queue by using the asyncMode constructor parameter of the ForkJoinPool class. With the FIFO setting, the ForkJoinPool may be better suited for tasks that are never joined — so, for example, for plain callable or runnable usages. ScheduledThreadPoolExecutor ScheduledThreadPoolExecutor adds a layer over the classic ThreadPoolExecutor and allows for scheduling tasks. It supports three types of scheduling: Schedule once with a fixed delay — schedule Schedule at a fixed interval of time units — scheduleAtFixedRate Schedule with a fixed delay between executions — scheduleWithFixedDelay Of course, you can also use the "normal" ExecutorService API.
Just remember that in this case, the submit and execute methods are equivalent to calling the schedule method with 0 delay — i.e., instant execution of the provided task. As ScheduledThreadPoolExecutor extends ThreadPoolExecutor, some parts of its implementation are the same as in the classic ThreadPoolExecutor. Nevertheless, it uses its own task implementation, ScheduledFutureTask, and its own queue: DelayedWorkQueue. What is more, ScheduledThreadPoolExecutor always creates a fixed-size ThreadPoolExecutor as its underlying thread pool, so corePoolSize and maximumPoolSize are always equal. There is, however, a catch or two hidden inside the implementation of ScheduledThreadPoolExecutor. Firstly, if two tasks are scheduled to run (or end up scheduled to run) at the same time, they are executed in FIFO style based on submission time. The next catch is the logical consequence of the first: there are no real guarantees that a particular task will execute at a certain point in time — for example, it may be waiting in the queue as described above. Last but not least, if the executions of two tasks would overlap, the thread pool guarantees that the execution of the first one "happens before" the later one — essentially FIFO fashion, even if the executions are performed by different threads. Spawning ScheduledThreadPoolExecutor The Executors class gives us five ways to spawn a ScheduledThreadPoolExecutor. They are organized in a similar fashion to those of ThreadPoolExecutor. ScheduledThreadPoolExecutor with a fixed number of threads: Executors.newScheduledThreadPool(2); Executors.newScheduledThreadPool(2, Executors.defaultThreadFactory()); The first method allows us to create a ScheduledThreadPoolExecutor with a particular number of threads. The second method additionally lets us pass the ThreadFactory of our choice. ScheduledThreadPoolExecutor with a single thread: Executors.newSingleThreadScheduledExecutor(); Executors.newSingleThreadScheduledExecutor(Executors.defaultThreadFactory()); These create a single-thread scheduled executor whose underlying ThreadPoolExecutor has only one thread. In fact, here we spawn an instance of DelegatedScheduledExecutorService, which uses a ScheduledThreadPoolExecutor as a delegate, so in the end, the underlying ThreadPoolExecutor of the delegate has only one thread. The last way to spawn a ScheduledThreadPoolExecutor is via: Executors.unconfigurableScheduledExecutorService(new DummyScheduledExecutorServiceImpl()); This method allows you to wrap your own implementation of the ScheduledExecutorService interface with DelegatedScheduledExecutorService — one of the private static classes from the Executors class. It only exposes the methods of the ScheduledExecutorService interface. To a degree, we can view it as an encapsulation helper: you can have multiple public methods within your implementation, but once you wrap it with the delegate, all of them become hidden from the users. I am not exactly a fan of such an approach to encapsulation — it should be the implementation's own concern. However, maybe I am missing some other important use cases for all of these delegates. ThreadPerTaskExecutor This is one of the newest additions to the Java standard library and the Executors class. It is not a thread pool implementation, but rather a thread spawner. As the name suggests, each submitted task gets its own thread bound to its execution, and that thread starts alongside the start of task processing.
To achieve such behavior, this executor uses its own custom implementation of Future, namely ThreadBoundFuture. The lifecycle of threads created by this executor looks more or less like this: The thread is created as soon as the Future is created. The thread starts working only after being programmatically started by the Executor. The thread is interrupted when the Future is interrupted. The thread is stopped on Future completion. What is more, if the Executor is not able to start a new thread for a particular task and no exception is thrown along the way, the Executor will throw a RejectedExecutionException. Furthermore, the ThreadPerTaskExecutor holds a set of its started threads. Every time a thread is started, it is added to the set; respectively, when the thread is stopped, it is removed from the set. You can then use this set to keep track of how many threads the Executor is running at a given time via the threadCount() method. Spawning ThreadPerTaskExecutor The Executors class exposes two ways of spawning this Executor. I would say that one is more recommended than the other. Let's start with the not-recommended one: Executors.newThreadPerTaskExecutor(Executors.defaultThreadFactory()); The above method spawns a ThreadPerTaskExecutor with the provided thread factory. The reason why it is not recommended, at least by me, is that this instance of ThreadPerTaskExecutor will operate on plain old CPU-bound Java threads. In such a case, if you put a high enough number of tasks through the Executor, you can very, very easily run out of processing power for your application. Certainly, nothing stands in your way of doing the following "trick" and using virtual threads anyway: Executors.newThreadPerTaskExecutor(Thread.ofVirtual().factory()); However, there is no reason to do that when you can simply use the following: Executors.newVirtualThreadPerTaskExecutor(); This instance of ThreadPerTaskExecutor will take full advantage of virtual threads from Java 21. Such a setting should also greatly increase the number of tasks that your Executor will be able to handle before running out of processing power. Summary As you can see, Java provides a set of different executors, from the classic thread pool implementation in ThreadPoolExecutor to more complex ones like ThreadPerTaskExecutor, which takes full advantage of virtual threads, a feature of Java 21. What is more, each Executor implementation has its unique trait: ThreadPoolExecutor: Classic thread pool implementation ForkJoinPool: Work stealing ScheduledThreadPoolExecutor: Periodic scheduling of tasks ThreadPerTaskExecutor: Usage of virtual threads and the possibility to run each task in its own separate, short-lived thread Despite these differences, all of the executors have one defining trait: they all expose an API that makes concurrent processing of multiple tasks much easier. I hope that this knowledge will become useful for you sometime in the future. Thank you for your time. Note: Thank you to Michał Grabowski and Krzysztof Atlasik for the review.
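To tie the pieces above together, here is a minimal, hypothetical sketch (not taken from the article's code) that creates a virtual-thread-per-task executor, submits a couple of tasks via submit, and relies on the try-with-resources support that ExecutorService gains from extending AutoCloseable; it assumes Java 21.
Java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorSketch {
    public static void main(String[] args) throws Exception {
        // try-with-resources: close() waits for the submitted tasks and shuts the executor down
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // submit() returns a Future, giving us control over the result of each task
            List<Future<String>> futures = List.of(
                    executor.submit(() -> "task 1 done"),
                    executor.submit(() -> "task 2 done"));
            for (Future<String> future : futures) {
                System.out.println(future.get());
            }
        }
    }
}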
MongoDB is one of the most reliable and robust document-oriented NoSQL databases. It allows developers to provide feature-rich applications and services with various modern built-in functionalities, like machine learning, streaming, full-text search, etc. While not a classical relational database, MongoDB is nevertheless used by a wide range of different business sectors, and its use cases cover all kinds of architecture scenarios and data types. Document-oriented databases are inherently different from traditional relational ones, where data are stored in tables and a single entity might be spread across several such tables. In contrast, document databases store data in separate, unrelated collections, which eliminates the intrinsic heaviness of the relational model. However, given that real-world domain models are never so simplistic as to consist of unrelated, separate entities, document databases (including MongoDB) provide several ways to define multi-collection connections similar to classical database relationships, but much lighter, more economical, and more efficient. Quarkus, the "supersonic and subatomic" Java stack, is the new kid on the block that the most trendy and influential developers are desperately grabbing and fighting over. Its modern cloud-native facilities, as well as its design (compliant with best-of-breed standard libraries), together with its ability to build native executables, have seduced Java developers, architects, engineers, and software designers for a couple of years. We cannot go into further detail here on either MongoDB or Quarkus: the reader interested in learning more is invited to check the documentation on the official MongoDB website or Quarkus website. What we are trying to achieve here is to implement a relatively complex use case consisting of CRUDing a customer-order-product domain model using Quarkus and its MongoDB extension. In an attempt to provide a real-world-inspired solution, we're trying to avoid simplistic and caricatural examples based on a zero-connections, single-entity model (there are dozens nowadays). So, here we go! The Domain Model The diagram below shows our customer-order-product domain model: As you can see, the central document of the model is Order, stored in a dedicated collection named Orders. An Order is an aggregate of OrderItem documents, each of which points to its associated Product. An Order document also references the Customer who placed it. In Java, this is implemented as follows: Java @MongoEntity(database = "mdb", collection="Customers") public class Customer { @BsonId private Long id; private String firstName, lastName; private InternetAddress email; private Set<Address> addresses; ... } The code above shows a fragment of the Customer class. This is a POJO (Plain Old Java Object) annotated with the @MongoEntity annotation, whose parameters define the database name and the collection name. The @BsonId annotation is used to configure the document's unique identifier. While the most common approach is to implement the document's identifier as an instance of the ObjectId class, this would introduce a needless tight coupling between the MongoDB-specific classes and our document. The other properties are the customer's first and last name, the email address, and a set of postal addresses. Let's look now at the Order document.
Java @MongoEntity(database = "mdb", collection="Orders") public class Order { @BsonId private Long id; private DBRef customer; private Address shippingAddress; private Address billingAddress; private Set<DBRef> orderItemSet = new HashSet<>(); ... } Here we need to create an association between an order and the customer who placed it. We could have embedded the associated Customer document in our Order document, but this would have been a poor design because it would have redundantly defined the same object twice. Instead, we need a reference to the associated Customer document, and we create it using the DBRef class. The same thing happens for the set of associated order items where, instead of embedding the documents, we use a set of references. The rest of our domain model is quite similar and based on the same normalization ideas; for example, the OrderItem document: Java @MongoEntity(database = "mdb", collection="OrderItems") public class OrderItem { @BsonId private Long id; private DBRef product; private BigDecimal price; private int amount; ... } Here we need to reference the product that is the subject of the current order item. Last but not least, we have the Product document: Java @MongoEntity(database = "mdb", collection="Products") public class Product { @BsonId private Long id; private String name, description; private BigDecimal price; private Map<String, String> attributes = new HashMap<>(); ... } That's pretty much all as far as our domain model is concerned. There are, however, some additional packages that we need to look at: serializers and codecs. In order to be exchanged on the wire, all our objects, be they business or purely technical ones, have to be serialized and deserialized. These operations are the responsibility of specially designated components called serializers/deserializers. As we have seen, we're using the DBRef type to define associations between different collections. Like any other object, a DBRef instance should be able to be serialized and deserialized. The MongoDB driver provides serializers/deserializers for the majority of the data types used in the most common cases. However, for some reason, it doesn't provide serializers/deserializers for the DBRef type. Hence, we need to implement our own, and this is what the serializers package does. Let's look at these classes: Java public class DBRefSerializer extends StdSerializer<DBRef> { public DBRefSerializer() { this(null); } protected DBRefSerializer(Class<DBRef> dbrefClass) { super(dbrefClass); } @Override public void serialize(DBRef dbRef, JsonGenerator jsonGenerator, SerializerProvider serializerProvider) throws IOException { if (dbRef != null) { jsonGenerator.writeStartObject(); jsonGenerator.writeStringField("id", (String)dbRef.getId()); jsonGenerator.writeStringField("collectionName", dbRef.getCollectionName()); jsonGenerator.writeStringField("databaseName", dbRef.getDatabaseName()); jsonGenerator.writeEndObject(); } } } This is our DBRef serializer and, as you can see, it's a Jackson serializer. This is because the quarkus-mongodb-panache extension that we're using here relies on Jackson. Perhaps a future release will use JSON-B but, for now, we're stuck with Jackson. It extends the StdSerializer class, as usual, and serializes its associated DBRef object by using the JSON generator, passed as an input argument, to write the DBRef components on the output stream; i.e., the object ID, the collection name, and the database name.
For more information concerning the DBRef structure, please see the MongoDB documentation. The deserializer performs the complementary operation, as shown below: Java public class DBRefDeserializer extends StdDeserializer<DBRef> { public DBRefDeserializer() { this(null); } public DBRefDeserializer(Class<DBRef> dbrefClass) { super(dbrefClass); } @Override public DBRef deserialize(JsonParser jsonParser, DeserializationContext deserializationContext) throws IOException, JacksonException { JsonNode node = jsonParser.getCodec().readTree(jsonParser); return new DBRef(node.findValue("databaseName").asText(), node.findValue("collectionName").asText(), node.findValue("id").asText()); } } This is pretty much all that needs to be said as far as the serializers/deserializers are concerned. Let's move on to see what the codecs package brings us. Java objects are stored in a MongoDB database using the BSON (Binary JSON) format. In order to store information, the MongoDB driver needs the ability to map Java objects to their associated BSON representation. It does that by means of the Codec interface, which contains the required abstract methods for mapping Java objects to BSON and the other way around. By implementing this interface, one can define the conversion logic between Java and BSON, and conversely. The MongoDB driver includes the required Codec implementations for the most common types but again, for some reason, when it comes to DBRef, this implementation is only a dummy one which raises UnsupportedOperationException. Having contacted the MongoDB driver implementers, I didn't succeed in finding any solution other than implementing my own Codec mapper, as shown by the class DocstoreDBRefCodec. For brevity reasons, we won't reproduce this class' source code here. Once our dedicated Codec is implemented, we need to register it with the MongoDB driver, such that it is used when mapping DBRef types to Java objects and conversely. In order to do that, we need to implement the CodecProvider interface which, as shown by the class DocstoreDBRefCodecProvider, returns via its abstract get() method the concrete class responsible for performing the mapping; i.e., in our case, DocstoreDBRefCodec. And that's all we need to do here, as Quarkus will automatically discover and use our customized CodecProvider implementation. Please have a look at these classes to see and understand how things are done. The Data Repositories Quarkus Panache greatly simplifies the data persistence process by supporting both the active record and the repository design patterns. Here, we'll be using the second one. As opposed to similar persistence stacks, Panache relies on compile-time bytecode enhancement of the entities. It includes an annotation processor that automatically performs these enhancements. All that this annotation processor needs in order to do its job is a class like the one below: Java @ApplicationScoped public class CustomerRepository implements PanacheMongoRepositoryBase<Customer, Long>{} The code above is all that you need in order to define a complete service able to persist Customer document instances. Your repository class needs to implement the PanacheMongoRepositoryBase interface and parameterize it with your entity type and your object ID type, in our case a Long.
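To give a sense of what this buys us, here is a small, hypothetical usage sketch (the CustomerService class below is illustrative and not part of the article's code base; the imports assume a recent, jakarta-based Quarkus version); persist, findById, and listAll are operations inherited from the Panache repository base type:
Java
import java.util.List;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class CustomerService {

    @Inject
    CustomerRepository customerRepository; // the Panache repository defined above

    public void register(Customer customer) {
        // persist() is inherited from PanacheMongoRepositoryBase
        customerRepository.persist(customer);
    }

    public Customer findCustomer(Long id) {
        // findById() is inherited as well
        return customerRepository.findById(id);
    }

    public List<Customer> allCustomers() {
        return customerRepository.listAll();
    }
}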
The Panache annotation processor will generate all the methods required to perform the most common CRUD operations, including but not limited to saving, updating, deleting, querying, paging, sorting, transaction handling, etc. All these details are fully explained here. Another possibility is to extend PanacheMongoRepository instead of PanacheMongoRepositoryBase and to use the provided ObjectId keys instead of customizing them as Long, as we did in our example. Whether you choose the first or the second alternative is only a matter of preference. The REST API In order for our Panache-generated persistence service to become effective, we need to expose it through a REST API. In the most common case, we would have to manually craft this API, together with its implementation, consisting of the full set of required REST endpoints. This fastidious and repetitive work can be avoided by using the quarkus-mongodb-rest-data-panache extension, whose annotation processor is able to automatically generate the required REST endpoints out of interfaces having the following pattern: Java public interface CustomerResource extends PanacheMongoRepositoryResource<CustomerRepository, Customer, Long> {} Believe it if you want: this is all you need to generate a full REST API implementation with all the endpoints required to invoke the persistence service generated previously by the mongodb-panache extension annotation processor. Now we are ready to build our REST API as a Quarkus microservice. We chose to build this microservice as a Docker image, with the help of the quarkus-container-image-jib extension, by simply including the following Maven dependency: XML <dependency> <groupId>io.quarkus</groupId> <artifactId>quarkus-container-image-jib</artifactId> </dependency> With this dependency in place, the quarkus-maven-plugin will create a local Docker image to run our microservice. The parameters of this Docker image are defined in the application.properties file, as follows: Properties files quarkus.container-image.build=true quarkus.container-image.group=quarkus-nosql-tests quarkus.container-image.name=docstore-mongodb quarkus.mongodb.connection-string = mongodb://admin:admin@mongo:27017 quarkus.mongodb.database = mdb quarkus.swagger-ui.always-include=true quarkus.jib.jvm-entrypoint=/opt/jboss/container/java/run/run-java.sh Here we define the name of the newly created Docker image as quarkus-nosql-tests/docstore-mongodb. This is the concatenation of the quarkus.container-image.group and quarkus.container-image.name parameters, separated by a "/". The property quarkus.container-image.build having the value true instructs the Quarkus plugin to bind the build operation to the package phase of Maven. This way, simply executing a mvn package command generates a Docker image able to run our microservice. This may be verified by running the docker images command. The property named quarkus.jib.jvm-entrypoint defines the command to be run by the newly generated Docker image. quarkus-run.jar is the standard Quarkus microservice startup file, used when the base image is ubi8/openjdk-17-runtime, as in our case. The properties quarkus.mongodb.connection-string and quarkus.mongodb.database = mdb define the MongoDB database connection string and the name of the database. Last but not least, the property quarkus.swagger-ui.always-include includes the Swagger UI in our microservice space, allowing us to test it easily. Let's see now how to run and test the whole thing.
Running and Testing Our Microservices Now that we looked at the details of our implementation, let's see how to run and test it. We chose to do it on behalf of the docker-compose utility. Here is the associated docker-compose.yml file: YAML version: "3.7" services: mongo: image: mongo environment: MONGO_INITDB_ROOT_USERNAME: admin MONGO_INITDB_ROOT_PASSWORD: admin MONGO_INITDB_DATABASE: mdb hostname: mongo container_name: mongo ports: - "27017:27017" volumes: - ./mongo-init/:/docker-entrypoint-initdb.d/:ro mongo-express: image: mongo-express depends_on: - mongo hostname: mongo-express container_name: mongo-express links: - mongo:mongo ports: - 8081:8081 environment: ME_CONFIG_MONGODB_ADMINUSERNAME: admin ME_CONFIG_MONGODB_ADMINPASSWORD: admin ME_CONFIG_MONGODB_URL: mongodb://admin:admin@mongo:27017/ docstore: image: quarkus-nosql-tests/docstore-mongodb:1.0-SNAPSHOT depends_on: - mongo - mongo-express hostname: docstore container_name: docstore links: - mongo:mongo - mongo-express:mongo-express ports: - "8080:8080" - "5005:5005" environment: JAVA_DEBUG: "true" JAVA_APP_DIR: /home/jboss JAVA_APP_JAR: quarkus-run.jar This file instructs the docker-compose utility to run three services: A service named mongo running the Mongo DB 7 database A service named mongo-express running the MongoDB administrative UI A service named docstore running our Quarkus microservice We should note that the mongo service uses an initialization script mounted on the docker-entrypoint-initdb.d directory of the container. This initialization script creates the MongoDB database named mdb such that it could be used by the microservices. JavaScript db = db.getSiblingDB(process.env.MONGO_INITDB_ROOT_USERNAME); db.auth( process.env.MONGO_INITDB_ROOT_USERNAME, process.env.MONGO_INITDB_ROOT_PASSWORD, ); db = db.getSiblingDB(process.env.MONGO_INITDB_DATABASE); db.createUser( { user: "nicolas", pwd: "password1", roles: [ { role: "dbOwner", db: "mdb" }] }); db.createCollection("Customers"); db.createCollection("Products"); db.createCollection("Orders"); db.createCollection("OrderItems"); This is an initialization JavaScript that creates a user named nicolas and a new database named mdb. The user has administrative privileges on the database. Four new collections, respectively named Customers, Products, Orders and OrderItems, are created as well. In order to test the microservices, proceed as follows: Clone the associated GitHub repository: $ git clone https://github.com/nicolasduminil/docstore.git Go to the project: $ cd docstore Build the project: $ mvn clean install Check that all the required Docker containers are running: $ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 7882102d404d quarkus-nosql-tests/docstore-mongodb:1.0-SNAPSHOT "/opt/jboss/containe…" 8 seconds ago Up 6 seconds 0.0.0.0:5005->5005/tcp, :::5005->5005/tcp, 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp, 8443/tcp docstore 786fa4fd39d6 mongo-express "/sbin/tini -- /dock…" 8 seconds ago Up 7 seconds 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp mongo-express 2e850e3233dd mongo "docker-entrypoint.s…" 9 seconds ago Up 7 seconds 0.0.0.0:27017->27017/tcp, :::27017->27017/tcp mongo Run the integration tests: $ mvn -DskipTests=false failsafe:integration-test This last command will run all the integration tests which should succeed. These integration tests are implemented using the RESTassured library. 
The listing below shows one of these integration tests located in the docstore-domain project: Java @QuarkusIntegrationTest @TestMethodOrder(MethodOrderer.OrderAnnotation.class) public class CustomerResourceIT { private static Customer customer; @BeforeAll public static void beforeAll() throws AddressException { customer = new Customer("John", "Doe", new InternetAddress("john.doe@gmail.com")); customer.addAddress(new Address("Gebhard-Gerber-Allee 8", "Kornwestheim", "Germany")); customer.setId(10L); } @Test @Order(10) public void testCreateCustomerShouldSucceed() { given() .header("Content-type", "application/json") .and().body(customer) .when().post("/customer") .then() .statusCode(HttpStatus.SC_CREATED); } @Test @Order(20) public void testGetCustomerShouldSucceed() { assertThat (given() .header("Content-type", "application/json") .when().get("/customer") .then() .statusCode(HttpStatus.SC_OK) .extract().body().jsonPath().getString("firstName[0]")).isEqualTo("John"); } @Test @Order(30) public void testUpdateCustomerShouldSucceed() { customer.setFirstName("Jane"); given() .header("Content-type", "application/json") .and().body(customer) .when().pathParam("id", customer.getId()).put("/customer/{id}") .then() .statusCode(HttpStatus.SC_NO_CONTENT); } @Test @Order(40) public void testGetSingleCustomerShouldSucceed() { assertThat (given() .header("Content-type", "application/json") .when().pathParam("id", customer.getId()).get("/customer/{id}") .then() .statusCode(HttpStatus.SC_OK) .extract().body().jsonPath().getString("firstName")).isEqualTo("Jane"); } @Test @Order(50) public void testDeleteCustomerShouldSucceed() { given() .header("Content-type", "application/json") .when().pathParam("id", customer.getId()).delete("/customer/{id}") .then() .statusCode(HttpStatus.SC_NO_CONTENT); } @Test @Order(60) public void testGetSingleCustomerShouldFail() { given() .header("Content-type", "application/json") .when().pathParam("id", customer.getId()).get("/customer/{id}") .then() .statusCode(HttpStatus.SC_NOT_FOUND); } } You can also use the Swagger UI interface for testing purposes by firing your preferred browser at http://localhost:8080/q:swagger-ui. Then, in order to test endpoints, you can use the payloads in the JSON files located in the src/resources/data directory of the docstore-api project. You also can use the MongoDB UI administrative interface by going to http://localhost:8081 and authenticating yourself with the default credentials (admin/pass). You can find the project source code in my GitHub repository. Enjoy!
Have you ever wished for a coding assistant who could help you write code faster, reduce errors, and improve your overall productivity? In this article, I'll share my journey and experiences with GitHub Copilot, a coding companion, and how it has boosted my productivity. The article is specifically focused on IntelliJ IDEA, which we use for building Java Spring-based microservices. Six months ago, I embarked on a journey to explore GitHub Copilot, an AI-powered coding assistant, while working on Java Spring microservices projects in IntelliJ IDEA. At first, my experience was not so good. I found the suggestions it provided to be inappropriate, and it seemed to hinder rather than help my development work. But I decided to persist with the tool, and today I am reaping some of the benefits, although there is still a lot of scope for improvement. Common Patterns Let's dive into some scenarios where GitHub Copilot has played a vital role. Exception Handling Consider the following method: Java private boolean isLoanEligibleForPurchaseBasedOnAllocation(LoanInfo loanInfo, PartnerBank partnerBank){ boolean result = false; try { if (loanInfo != null && loanInfo.getFico() != null) { Integer fico = loanInfo.getFico(); // Removed Further code for brevity } else { logger.error("ConfirmFundingServiceImpl::isLoanEligibleForPurchaseBasedOnAllocation - Loan info is null or FICO is null"); } } catch (Exception ex) { logger.error("ConfirmFundingServiceImpl::isLoanEligibleForPurchaseBasedOnAllocation - An error occurred while checking loan eligibility for purchase based on allocation, detail error:", ex); } return result; } Initially, without GitHub Copilot, we would have to manually add the exception handling code. However, with Copilot, as soon as we added the try block and started adding catch blocks, it automatically suggested the logger message and generated the entire catch block. None of the content in the catch block was typed manually. Additionally, the other logger.error call in the else branch was prefilled automatically by Copilot as soon as we started typing logger.error. Mocks for Unit Tests In unit testing, we often need to create mock objects. Consider the scenario where we need to create a list of PartnerBankFundingAllocation objects: Java List<PartnerBankFundingAllocation> partnerBankFundingAllocations = new ArrayList<>(); when(this.fundAllocationRepository.getPartnerBankFundingAllocation(partnerBankObra.getBankId(), "Fico")).thenReturn(partnerBankFundingAllocations); If we create a single object and push it to the list: Java PartnerBankFundingAllocation partnerBankFundingAllocation = new PartnerBankFundingAllocation(); partnerBankFundingAllocation.setBankId(9); partnerBankFundingAllocation.setScoreName("Fico"); partnerBankFundingAllocation.setScoreMin(680); partnerBankFundingAllocation.setScoreMax(1000); partnerBankFundingAllocations.add(partnerBankFundingAllocation); GitHub Copilot automatically suggests code for the remaining objects. We just need to keep hitting Enter and adjust values if the suggestions are inappropriate. Java PartnerBankFundingAllocation partnerBankFundingAllocation2 = new PartnerBankFundingAllocation(); partnerBankFundingAllocation2.setBankId(9); partnerBankFundingAllocation2.setScoreName("Fico"); partnerBankFundingAllocation2.setScoreMin(660); partnerBankFundingAllocation2.setScoreMax(679); partnerBankFundingAllocations.add(partnerBankFundingAllocation2); Logging/Debug Statements GitHub Copilot also excels in helping with logging and debugging statements.
Consider the following code snippet: Java if (percentage < allocationPercentage){ result = true; logger.info("ConfirmFundingServiceImpl::isLoanEligibleForPurchaseBasedOnAllocation - Loan is eligible for purchase"); } else{ logger.info("ConfirmFundingServiceImpl::isLoanEligibleForPurchaseBasedOnAllocation - Loan is not eligible for purchase"); } In this example, all the logger information statements are auto-generated by GitHub Copilot. It takes into account the context of the code condition and suggests relevant log messages. Code Commenting It helps in adding comments at the top of the method. In the code snippet below, the comment above the method is generated by the Copilot. We just need to start typing in // This method. Java // THis method is used to get the loan program based on the product sub type public static String getLoanProgram(List<Product> products, Integer selectedProductId) { String loanProgram = ""; if (products != null && products.size() > 0) { Product product = products.stream().filter(p -> p.getProductId().equals(selectedProductId)).findFirst().orElse(null); if (product != null) { String productSubType = product.getProductSubType(); switch (productSubType) { case "STANDARD": loanProgram = "Standard"; break; case "PROMO": loanProgram = "Promo"; break; default: loanProgram = "NA"; break; } } } return loanProgram; } Alternatively, we can use a prompt like // Q : What is this method doing?. Copilot will add the second line, // A : This method is used to log the payload for the given api name. Java // Q : What is this method doing? // A : This method is used to log the payload for the given api name public static void logPayload(String apiName, Object payload) { try { if (payload != null && apiName != null && apiName.trim().length() > 0) { ObjectMapper mapper = new ObjectMapper(); String payloadResponse = mapper.writeValueAsString(payload); logger.info("UnderwritingUtility::logPayload - For api : " + apiName + ", payload : " + payloadResponse); } else { logger.error("UnderwritingUtility::logPayload - Either object was null of api name was null or empty"); } } catch (Exception ex) { logger.error("UnderwritingUtility::logPayload - An error occurred while logging the payload, detail error : ", ex); } } Another example of a different method we type in a prompt: // Q : What is this method doing?. Copilot will add the second line, // A : This method is used to validate the locale from request, if locale is not valid then set the default locale. Java //Q - Whats below method doing? 
//A - This method is used to validate the locale from request, if locale is not valid then set the default locale public static boolean isLocaleValid(LoanQuoteRequest loanQuoteRequest){ boolean result = false; try{ if (org.springframework.util.StringUtils.hasText(loanQuoteRequest.getLocale())){ String localeStr = loanQuoteRequest.getLocale(); logger.info("UnderwritingUtility::validateLocale - Locale from request : " + localeStr); Locale locale = new Locale.Builder().setLanguageTag(localeStr).build(); // Get the language part String language = locale.getLanguage(); if (language.equalsIgnoreCase("en")){ result = true; if (!localeStr.equalsIgnoreCase(UwConstants.DEFAULT_LOCALE_CODE)){ loanQuoteRequest.setLocale(UwConstants.DEFAULT_LOCALE_CODE); } } else if (language.equalsIgnoreCase("es")){ result = true; if (!localeStr.equalsIgnoreCase(UwConstants.SPANISH_LOCALE_CODE)){ loanQuoteRequest.setLocale(UwConstants.SPANISH_LOCALE_CODE); } } } else{ result = true; loanQuoteRequest.setLocale(UwConstants.DEFAULT_LOCALE_CODE); } } catch (Exception ex){ logger.error("UnderwritingUtility::validateLocale - An error occurred, detail error : ", ex); } return result; } Closing Thoughts The benefits of using GitHub Copilot in IntelliJ for Java Spring Microservices development are significant. It saves time, reduces errors, and allows us to focus on core business logic instead of writing repetitive code. As we embark on our coding journey with GitHub Copilot, here are a few tips: Be patient and give it some time to learn and identify common coding patterns that we follow. Keep an eye on the suggestions and adjust them as needed. Sometimes, it hallucinates. Experiment with different scenarios to harness the full power of Copilot. Stay updated with Copilot's improvements and updates to make the most of this cutting-edge tool. We can use this in combination with ChatGPT. Here is an article on how it can help boost our development productivity. Happy coding with GitHub Copilot!
The Spring AI is a new project of the Spring ecosystem that streamlines the creation of AI applications in Java. By using Spring AI together with PostgreSQL pgvector, you can build generative AI applications that draw insights from your data. First, this article introduces you to the Spring AI ChatClient that uses the OpenAI GPT-4 model to generate recommendations based on user prompts. Next, the article shows how to deploy PostgreSQL with the PGVector extension and perform vector similarity searches using the Spring AI EmbeddingClient and Spring JdbcClient. Adding Spring AI Dependency Spring AI supports many large language model (LLM) providers, with each LLM having its own Spring AI dependency. Let's assume that you prefer working with OpenAI models and APIs. Then, you need to add the following dependency to a project: XML <dependency> <groupId>org.springframework.ai</groupId> <artifactId>spring-ai-openai-spring-boot-starter</artifactId> <version>{latest.version}</version> </dependency> Also, at the time of writing, Spring AI was in active development, with the framework artifacts being released in the Spring Milestone and/or Snapshot repositories. Thus, if you still can't find Spring AI on https://start.spring.io/, then add the repositories to the pom.xml file: XML <repositories> <repository> <id>spring-milestones</id> <name>Spring Milestones</name> <url>https://repo.spring.io/milestone</url> <snapshots> <enabled>false</enabled> </snapshots> </repository> <repository> <id>spring-snapshots</id> <name>Spring Snapshots</name> <url>https://repo.spring.io/snapshot</url> <releases> <enabled>false</enabled> </releases> </repository> </repositories> Setting Up OpenAI Module The OpenAI module comes with several configuration properties, allowing the management of connectivity-related settings and fine-tuning the behavior of OpenAI models. At a minimum, you need to provide your OpenAI API key, which will be used by Spring AI to access GPT and embedding models. Once the key is created, add it to the application.properties file: Properties files spring.ai.openai.api-key=sk-... Then, if necessary, you can select particular GPT and embedding models: Properties files spring.ai.openai.chat.model=gpt-4 spring.ai.openai.embedding.model=text-embedding-ada-002 In the end, you can test that the OpenAI module is configured properly by implementing a simple assistant with Spring AI's ChatClient: Java // Inject the ChatClient bean @Autowired private ChatClient aiClient; // Create a system message for ChatGPT explaining the task private static final SystemMessage SYSTEM_MESSAGE = new SystemMessage( """ You're an assistant who helps to find lodging in San Francisco. Suggest three options. Send back a JSON object in the format below. [{\"name\": \"<hotel name>\", \"description\": \"<hotel description>\", \"price\": <hotel price>}] Don't add any other text to the response. Don't add the new line or any other symbols to the response. Send back the raw JSON. 
"""); public void searchPlaces(String prompt) { // Create a Spring AI prompt with the system message and the user message Prompt chatPrompt = new Prompt(List.of(SYSTEM_MESSAGE, new UserMessage(prompt))); // Send the prompt to ChatGPT and get the response ChatResponse response = aiClient.generate(chatPrompt); // Get the raw JSON from the response and print it String rawJson = response.getGenerations().get(0).getContent(); System.out.println(rawJson); } For the sake of the experiment, if you pass the "I'd like to stay near the Golden Gate Bridge" prompt, then the searchPlaces the method might provide lodging recommendations as follows: JSON [ {"name": "Cavallo Point", "description": "Historic hotel offering refined rooms, some with views of the Golden Gate Bridge, plus a spa & dining.", "price": 450}, {"name": "Argonaut Hotel", "description": "Upscale, nautical-themed hotel offering Golden Gate Bridge views, plus a seafood restaurant.", "price": 300}, {"name": "Hotel Del Sol", "description": "Colorful, retro hotel with a pool, offering complimentary breakfast & an afternoon cookies reception.", "price": 200} ] Starting Postgres With PGVector If you run the previous code snippet with the ChatClient, you'll notice that it usually takes over 10 seconds for the OpenAI GPT model to generate a response. The model has a broad and deep knowledge base, and it takes time to produce a relevant response. Apart from the high latency, the GPT model might not have been trained on data that is relevant to your application workload. Thus, it might generate responses that are far from being satisfactory for the user. However, you can always expedite the search and provide users with accurate responses if you generate embeddings on a subset of your data and then let Postgres work with those embeddings. The pgvector extension allows storing and querying vector embeddings in Postgres. The easiest way to start with PGVector is by starting a Postgres instance with the extension in Docker: Shell mkdir ~/postgres-volume/ docker run --name postgres \ -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \ -p 5432:5432 \ -v ~/postgres-volume/:/var/lib/postgresql/data -d ankane/pgvector:latest Once started, you can connect to the container and enable the extension by executing the CREATE EXTENSION vector statement: Shell docker exec -it postgres psql -U postgres -c 'CREATE EXTENSION vector' Lastly, add the Postgres JDBC driver dependency to the pom.xml file: XML <dependency> <groupId>org.postgresql</groupId> <artifactId>postgresql</artifactId> <version>{latest.version}</version> </dependency> Configure the Spring DataSource by adding the following settings to the application.properties file: Properties files spring.datasource.url = jdbc:postgresql://127.0.0.1:5432/postgres spring.datasource.username = postgres spring.datasource.password = password Performing Vector Similarity Search With Spring AI At a minimum, the vector similarity search is a two-step process. First, you need to use an embedding model to generate a vector/embedding for a provided user prompt or other text. Spring AI supports the EmbeddingClient that connects to OpenAI's or other providers' embedding models and generates a vectorized representation for the text input: Java // Inject the Spring AI Embedding client @Autowired private EmbeddingClient aiClient; public List<Place> searchPlaces(String prompt) { // Use the Embedding client to generate a vector for the user prompt List<Double> promptEmbedding = aiClient.embed(prompt); ... 
} Second, you use the generated embedding to perform a similarity search across vectors stored in the Postgres database. For instance, you can use the Spring JdbcClient for this task: Java @Autowired private JdbcClient jdbcClient; // Inject the Spring AI Embedding client @Autowired private EmbeddingClient aiClient; public List<Place> searchPlaces(String prompt) { // Use the Embedding client to generate a vector for the user prompt List<Double> promptEmbedding = aiClient.embed(prompt); // Perform the vector similarity search StatementSpec query = jdbcClient.sql( "SELECT name, description, price " + "FROM airbnb_listing WHERE 1 - (description_embedding <=> :user_prompt::vector) > 0.7 " + "ORDER BY description_embedding <=> :user_prompt::vector LIMIT 3") .param("user_prompt", promptEmbedding.toString()); // Return the recommended places return query.query(Place.class).list(); } The description_embedding column stores embeddings that were pre-generated for Airbnb listing overviews from the description column. The Airbnb embeddings were produced by the same model that is used by Spring AI's EmbeddingClient for the user prompts. Postgres uses PGVector to calculate the cosine distance (<=>) between the Airbnb and user prompt embeddings (description_embedding <=> :user_prompt::vector) and then returns only those Airbnb listings whose description is > 0.7 similar to the provided user prompt. The similarity is measured as a value in the range from 0 to 1. The closer the similarity is to 1, the more related the vectors are. What's Next Spring AI and PostgreSQL PGVector provide all the essential capabilities needed for building generative AI applications in Java. If you're curious to learn more, watch this hands-on tutorial. It guides you through the process of creating a lodging recommendation service in Java from scratch, optimizing similarity searches with specialized indexes, and scaling with distributed Postgres (YugabyteDB).
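As a side note, the pre-generation of those listing embeddings is not shown in the article; it could be sketched with the same two Spring beans. The snippet below is a hypothetical illustration that assumes an airbnb_listing table with id, description, and description_embedding columns and reuses the EmbeddingClient and JdbcClient calls used above:
Java
// Hypothetical sketch: pre-compute an embedding for a listing description
// and store it in the description_embedding vector column.
@Autowired
private JdbcClient jdbcClient;

@Autowired
private EmbeddingClient aiClient;

public void embedListing(long listingId, String description) {
    // Generate the vector with the same embedding model used for the user prompts
    List<Double> embedding = aiClient.embed(description);

    // Store it, casting the textual representation to the pgvector type
    jdbcClient.sql(
            "UPDATE airbnb_listing SET description_embedding = :embedding::vector WHERE id = :id")
        .param("embedding", embedding.toString())
        .param("id", listingId)
        .update();
}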
Over a decade has passed since Robert C. Martin posted an article about Clean Architecture on the Clean Coder Blog (in 2012). Later (in 2017), it was followed up by an entire book on the same topic. This tutorial does not aim to argue for or against the concept; it is written to present some practical aspects of how to implement the persistence layer while following the ideas of Clean Architecture. One aspect discussed by the concept is that the business core (which is normally referred to as being "inner") must not depend on any technical or environmental details (referred to as being "outer"). The motivation behind this rule is to express importance: in other words, less important parts of the code shall depend on more important ones, but not the other way around. In this regard, business logic is important (as it is the reason the given software exists); how it is deployed and how data is stored is less important. This has several implications. For example, one implication is that none of the business classes shall be annotated by framework-related annotations; in practice, no Spring annotations on classes with business logic, no JPA annotations on the entities these classes work on, etc. This is fairly easy to comply with in some aspects. For example, it is completely natural to allow a REST controller to depend on a business class and disallow the opposite direction (and this is not even specific to Clean Architecture; most layered architectures also contain this rule). The situation is rather different in other aspects, such as persistence: unlike layered architecture models, which would allow business classes to depend on "lower" layers, Clean Architecture clearly does not allow business classes to depend on persistence classes. A quick description of how it still works is that instead of persistence offering the business its capabilities, the business defines what it expects from its persistence layer. Practically speaking, the business defines a Java interface with all the business-relevant persistence methods, and regardless of what kind of persistence engine will actually handle the data, an adapter is going to bridge business needs and technical capabilities. Setting Up Our Domain and Expectations In order to provide practical hints, let's define a simple business domain. In this tutorial, we are going to implement a small library: it can handle books (identified by ISBN, holding basic attributes like title and number of pages), readers (identified by name, holding address and telephone number), and relations between them (in order to see which person read which book). Before implementing the actual logic, we can already define the expected behavior of the application. The logic shall be able to: Persist new business entities (book, person) Provide existing entities Register new borrowing events (add the book to the person's book list and add the person to the book's reader list) Provide a list of all the book titles accompanied by the lengths of the books We can define this not only in human language but in the form of unit tests, too — which I actually did, on the one hand to advocate for TDD in general, and on the other hand because I find it particularly useful for ensuring that the different variations have the same capabilities. You can find the tests — as well as the complete code examples — on GitHub.
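To illustrate what such an expectation-level test might look like, here is a minimal, hypothetical sketch. It is written against the LibraryService interface shown next, assumes JUnit 5 and a made-up LibraryServiceFactory returning the variation under test, and is not taken from the linked repository:
Java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import org.junit.jupiter.api.Test;

class LibraryServiceExpectationsTest {

    // Hypothetical factory returning the variation under test
    private final LibraryService libraryService = LibraryServiceFactory.createLibraryService();

    @Test
    void listsAllBookTitlesWithTheirLengths() {
        // Illustrative values only
        libraryService.createNewBook("978-0-0000-0000-0", "Some Book", 123);
        libraryService.createNewReader("John Doe", "Some Street 1", "555-0100");
        libraryService.borrowBook("John Doe", "978-0-0000-0000-0");

        List<String> titlesWithLengths = libraryService.listAllBooksWithLengths();
        assertEquals(1, titlesWithLengths.size());
    }
}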
The interface that our application must fulfill is then: Java public interface LibraryService { void createNewBook(final String isbn, final String title, final int numberOfPages); Object findBookByIsbn(String isbn); void createNewReader(String name, String address, String phoneNumber); Object findReaderByName(String name); void borrowBook(String nameOfPerson, String isbnOfBook); List<String> listAllBooksWithLengths(); } Variation 1: Persistence Manages Its Own, Business-Independent Structures This might seem the most straightforward way: the business core defines its data model, and in its persistence interface (which belongs to the business core!) it describes commands and queries on that model, such as: Java public class Book { private String title; private String isbn; private int numberOfPages; private List<Person> readers = new ArrayList<>(); } public class Person { private String name; private String address; private String phoneNumber; private List<Book> booksRead = new ArrayList<>(); } public interface LibraryPersistenceProvider { void saveBook(Book book); void savePerson(Person person); Person findPersonByName(String name); Book findBookByIsbn(String isbn); List<Book> getBooks(); } In this variation, the persistence code uses the business classes only to read attributes while persisting and to set attributes while loading. This also allows the persistence layer to use any kind of data representation; for example, in this tutorial everything is persisted into Maps. This is quite convenient for several purposes but also brings some trade-offs: even in such a minimal domain, it is easy to see that if books and people form a fully connected graph, loading any of the entities causes the entire data model to be loaded. This leads to the motivation for the next variation. Variation 1B: Persistence Uses the Business Classes and Defines Its Own Subclasses The most straightforward solution for the problem mentioned above is to let persistence fill the list of books read and the list of readers of a book with proxy objects, which only load the actual contents when they are touched — in other words, to implement proxy-based lazy loading. This requires the persistence layer to extend the business classes. Note that the Java class hierarchy of Variation 1 contains only business-relevant classes, while this variation contains business-relevant and persistence-relevant classes, too. Also, while in Variation 1 the business always works with its own classes, in this variation the business classes might work with instances of persistence classes. This might lead to some unexpected problems. Imagine that at some point no lazy loading is implemented yet, but a new business rule is introduced that allows only users with the role "admin" to list who read a given book.
This leads to the motivation of the next variation.

Variation 1B: Persistence Uses the Business Classes and Defines Its Own Subclasses

The most straightforward solution for the problem mentioned above is to let persistence fill the list of books read and the list of readers of a book with proxy objects, which only load their actual contents when they are touched - in other words, to implement proxy-based lazy loading. This requires persistence to extend the business classes. Note that the Java class hierarchy of Variation 1 contains only business-relevant classes, while this variation contains persistence-relevant classes, too. Also, while in Variation 1 the business always works with its own classes, in this variation business code might work with instances of persistence classes. This might lead to some unexpected problems. Imagine that at some point no lazy loading is implemented yet, but a new business rule is introduced that allows only users with the role "admin" to list who read a given book. Let's assume that to ensure this rule always applies, a guard is added to the getter method; for example:

Java
List<Person> getReaders() {
    if (!currentUserHasAdminRights()) {
        throw new AdminRightsNeededException();
    }
    return this.readers;
}

If lazy loading is later added without examining the original getter of the business class, the persistence extension of the class might simply implement the getter as follows:

Java
List<Person> getReaders() {
    if (!readersAlreadyLoaded()) {
        loadReadersFromDB();
    }
    return this.readers;
}

This means that the business guard is overridden by the persistence code (which changes the business behavior of the system), yet in the meantime all the unit tests will likely still pass, as they are written against the original business class! Also note that implementing lazy loading this way is essentially a guess by persistence that the business might not need some fields (or at least might not need them immediately). This raises the motivation to let the business tell persistence which fields are needed for a given operation, which leads to our next variant.

Variation 2: Define Only Interfaces

Let's consider one of the fundamental ideas behind Clean Architecture again: the business shall define what it expects from persistence, and this set of expectations might contain attribute-level details, too. In our example above, to list all book titles with their lengths, the object that persistence provides to the business does not even have to contain an ISBN or a list of readers. This can be captured by using interfaces, such as:

Java
public interface LibraryPersistenceProvider {
    <T extends HasTitle & HasNumberOfPages> List<T> getBooks();
}

In this construct, the business entities might be actual classes that simply implement all the HasXXX interfaces, but it is also possible for the business to define its model purely with interfaces. For example:

Java
public interface HasTitle {
    String getTitle();
}

public interface HasIsbn {
    String getIsbn();
}

public interface HasReaders {
    List<HasName> getReaders();
}

public interface HasNumberOfPages {
    int getNumberOfPages();
}

public interface Book extends HasTitle, HasIsbn, HasReaders, HasNumberOfPages {
}

At first sight, this feels more than odd: the business clearly does not know what actual instances it uses in its internal flows. On the other hand (unless every class of the domain model is final), this statement is true for the previous variations, too; it was just hidden behind the generic fact that classes can be extended. Let's look at the main benefits of this approach:

It is clear to the business that it uses instances coming from foreign parts of the code, and it shall not assume anything that is not explicitly described in an interface (for example, it should not assume that a getter also contains a security guard).
The business can define exactly what it needs for a given flow.
Business methods cannot hide the intentions of the further methods they call.

To understand the last point, take a look at the following example:

Java
private void doSomeBusinessFlow() {
    final var instance = fetchFromDB();
    doSomeAction1(instance);
}

private <T extends HasTitle & HasIsbn & HasReaders> void doSomeAction1(T onThis) {
    ... some business logic with onThis.getReaders(); ...
    doSomeAction2(onThis);
}

private <K extends HasTitle & HasIsbn> void doSomeAction2(K onThis) {
    ...
}

private <L extends HasTitle & HasIsbn & HasReaders> L fetchFromDB() {
    ...
}

Here, assuming that doSomeAction2 wants to access Title and ISBN, doSomeAction1 has no way to hide this fact, even if it only wants to access the Readers. Using traditional business objects (like having one Book class) would not bring this kind of transparency. Note that this is not only true while writing the business code for the first time; it must be maintained all the time. If doSomeAction2 is reworked and needs access to NumberOfPages too, this change must be propagated to all the methods up to the point where the instance is fetched from the DB. This can also help identify overgrown business methods: the more attributes a method expects, the more it is probably doing, which might be an indicator that the given method should be the target of refactoring. It is also worth noting that, by using inline classes and lambdas, creating instances for the given requirements is really straightforward.
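For example, a lambda can satisfy a single-method requirement, and a small record (or anonymous class) can cover a combination of the interfaces defined above - a minimal sketch with assumed names, not code from the repository:

Java
// Minimal sketch (assumed names, not repository code): a lambda satisfies a
// single-method interface, while a small record covers a combination of them.
public class RequirementViewExamples {

    // Record handed to flows that expect <T extends HasTitle & HasNumberOfPages>.
    record BookTitleView(String bookTitle, int pages) implements HasTitle, HasNumberOfPages {
        @Override
        public String getTitle() {
            return bookTitle;
        }

        @Override
        public int getNumberOfPages() {
            return pages;
        }
    }

    public static void main(String[] args) {
        HasTitle titleOnly = () -> "Some Book";                    // lambda for a single attribute
        BookTitleView view = new BookTitleView("Some Book", 123);  // combined view with placeholder data
        System.out.println(titleOnly.getTitle() + " has " + view.getNumberOfPages() + " pages");
    }
}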
Conclusion

As shown above, there are multiple ways to implement data access while sticking to the ideas of Clean Architecture. It is also clear that there is no golden hammer that fits all possible situations. This is exactly why considering unusual solutions is important: there are cases where they are a better fit than their usual counterparts. As hinted above, full code examples for all three presented variations, using the example domain of a library, can be found in the GitHub repository linked earlier.

Additional Remarks

The first and most important remark is that Clean Architecture covers far more aspects than those summarized above. Every IT professional (even non-developers) is encouraged to get familiar with it. As a further remark, note that all variations are tested by the same set of unit tests, a set that is independent of the variation details. On the trade-off side, this is the reason why LibraryService returns Objects and not variation-specific classes.
Distributed computing, also known as distributed processing, involves connecting numerous computer servers through a network to form a cluster. This cluster, known as a "distributed system," enables the sharing of data and the coordination of processing power. Distributed computing provides many benefits, such as:

Scalability through a "scale-out" architecture
Enhanced performance by leveraging parallelism
Increased resilience through redundancy
Cost-effectiveness by utilizing low-cost, commodity hardware

There are two main advantages of distributed computing:

Utilizing the collective processing capabilities of a clustered system
Minimizing network hops by performing computations on the cluster members that hold the data

In this blog post, we explore how Hazelcast simplifies distributed computing (both as a self-managed and as a managed service). Hazelcast offers three solutions for distributed computing, depending on the use case:

Option #1: Entry Processor

An entry processor is a feature that executes your code on a map entry in an atomic manner, so you can update, remove, and read map entries on the cluster members (servers) that own them. It is a good option for performing bulk processing on an IMap.

How To Create EntryProcessor

Java
public class IncrementingEntryProcessor implements EntryProcessor<Integer, Integer, Integer> {

    public Integer process(Map.Entry<Integer, Integer> entry) {
        Integer value = entry.getValue();
        entry.setValue(value + 1);
        return value + 1;
    }

    @Override
    public EntryProcessor<Integer, Integer, Integer> getBackupProcessor() {
        return IncrementingEntryProcessor.this;
    }
}

How To Use EntryProcessor

Java
IMap<Integer, Integer> map = hazelcastInstance.getMap("myMap");
for (int i = 0; i < 100; i++) {
    map.put(i, i);
}
Map<Integer, Object> res = map.executeOnEntries(new IncrementingEntryProcessor());

How To Optimize EntryProcessor Performance

Offloadable refers to the capability of transferring execution from the partition thread to an executor thread. ReadOnly indicates the ability to refrain from acquiring a lock on the key. You can learn more about the entry processor in our documentation.
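For illustration, a read-only entry processor that opts into both behaviors might look roughly like this. This is a sketch, not code from the Hazelcast documentation; the Offloadable and ReadOnly interfaces live in com.hazelcast.core in recent releases, and exact package names may vary between versions.

Java
// Sketch of a read-only, offloaded entry processor. It never calls
// entry.setValue(), so no key lock and no backup processing are needed.
import java.util.Map;

import com.hazelcast.core.Offloadable;
import com.hazelcast.core.ReadOnly;
import com.hazelcast.map.EntryProcessor;

public class ReadingEntryProcessor
        implements EntryProcessor<Integer, Integer, Integer>, Offloadable, ReadOnly {

    @Override
    public Integer process(Map.Entry<Integer, Integer> entry) {
        return entry.getValue(); // read-only work on the entry
    }

    @Override
    public EntryProcessor<Integer, Integer, Integer> getBackupProcessor() {
        return null; // nothing to replicate to backup replicas
    }

    @Override
    public String getExecutorName() {
        return Offloadable.OFFLOADABLE_EXECUTOR; // run on the default offloadable executor
    }
}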
Option #2: Java Executor Service

Simply put, you can run your Java code on cluster members and obtain the resulting output. Java boasts a standout feature in its Executor framework, which enables the asynchronous execution of tasks such as database queries, intricate calculations, and image rendering. In the Java Executor framework, you implement tasks in two ways: Callable or Runnable. Similarly, using Hazelcast, you can implement tasks in two ways: Callable or Runnable.

How To Implement a Callable Task

Java
public class SumTask implements Callable<Integer>, Serializable, HazelcastInstanceAware {

    private transient HazelcastInstance hazelcastInstance;

    public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
        this.hazelcastInstance = hazelcastInstance;
    }

    public Integer call() throws Exception {
        IMap<String, Integer> map = hazelcastInstance.getMap("map");
        int result = 0;
        for (String key : map.localKeySet()) {
            System.out.println("Calculating for key: " + key);
            result += map.get(key);
        }
        System.out.println("Local Result: " + result);
        return result;
    }
}

How To Implement a Runnable Task

Java
public class EchoTask implements Runnable, Serializable {

    private final String msg;

    public EchoTask(String msg) {
        this.msg = msg;
    }

    @Override
    public void run() {
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
        }
        System.out.println("echo:" + msg);
    }
}

How To Scale the Executor Service

To scale up, improve the processing capacity of the cluster member (JVM). You can do this by increasing the pool-size property mentioned in Configuring Executor Service (i.e., increasing the thread count). However, please be aware of your member's capacity; if it cannot handle the additional load caused by increasing the thread count, you may want to improve the member's resources (CPU, memory, etc.). For example, set the pool size to 5 and run the above MasterMember. You will see that EchoTask is run as soon as it is produced. To scale out, add more members instead of increasing only one member's capacity. You may want to expand your cluster by adding more physical or virtual machines. For example, in the EchoTask scenario in the Runnable section, you can create another Hazelcast instance that automatically gets involved in the executions started in MasterMember and starts processing. You can read more about the Executor Service in our documentation.

Option #3: Pipeline

Develop a data pipeline capable of swiftly executing batch or streaming processes across cluster members. A pipeline consists of three elements: one or more sources, processing stages, and at least one sink. Depending on the data source, pipelines can be applied to various use cases: stream processing on a continuous stream of data (i.e., events) to provide results in real time as the data is generated, or batch processing of a fixed amount of static data for routine tasks such as generating daily reports. In Hazelcast, pipelines can be defined using either SQL or the Jet API. Below are a few more resources:

SQL pipelines
Jet API pipelines
Pipelines to enrich streams demo
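To make the Jet API option more concrete, a minimal batch pipeline might look like the following sketch. It assumes Hazelcast 5-style APIs (an embedded member with the Jet engine enabled, a test source, and a logger sink); adapt the setup to your deployment.

Java
// Minimal batch pipeline sketch (Hazelcast 5-style API assumed): enable the Jet
// engine, read a fixed test source, apply one processing stage, and log results.
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.test.TestSources;

public class SimplePipelineJob {

    public static void main(String[] args) {
        // The Jet engine is disabled by default in Hazelcast 5, so enable it explicitly.
        Config config = new Config();
        config.getJetConfig().setEnabled(true);
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        Pipeline pipeline = Pipeline.create();
        pipeline.readFrom(TestSources.items(1, 2, 3, 4, 5)) // fixed batch source for illustration
                .map(i -> i * 10)                           // a trivial processing stage
                .writeTo(Sinks.logger());                   // sink: log the results on the members

        hz.getJet().newJob(pipeline).join();                // submit the job and wait for completion
        hz.shutdown();
    }
}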
When Should You Choose Entry Processor, Executor Service, or Pipeline?

Opting for an entry processor is a suitable choice when conducting bulk processing on a map. Typically, this involves iterating through a loop of keys, retrieving the value with map.get(key), modifying the value, and finally putting the entry back into the map with map.put(key, value). The executor service is well suited for executing arbitrary Java code on your cluster members. Pipelines are well suited for scenarios where you need to process multiple entries (i.e., aggregations or joins) or when multiple computing steps need to be performed in parallel. With pipelines, you can also update maps based on the results of your computation using an entry processor sink. So here you have three options for distributed computing using Hazelcast.

We look forward to your feedback and comments about this blog post! Share your experience on the Hazelcast GitHub repository.