
Batch Processing in Mule 4.0

First of all, it's important to know that Mule allows you to process messages in batches.

Overview

Within a Mule application, batch processing provides a construct for asynchronously processing larger-than-memory data sets that are split into individual records.

Batch jobs allow you to describe a reliable process that automatically splits up source data and stores it in persistent queues, which makes it possible to process large data sets reliably.

In the event that the application is redeployed or Mule crashes, the job execution can resume from the point where it stopped.

Batch Job

A batch job is a scope that splits large messages into records that Mule processes asynchronously. In the same way flows process messages, batch jobs process records.

Within an application, we can add a Batch Job scope, which splits messages into individual records, performs actions upon each record, reports on the results, and can push the processed output to other systems or queues.
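As a rough sketch, a Batch Job scope in Mule 4 sits inside a flow and wraps its steps in a process-records section, with an on-complete section for reporting. The flow, job, and step names below are hypothetical, and the snippet assumes the standard batch namespace (http://www.mulesoft.org/schema/mule/batch) is declared:

<flow name="syncContactsFlow">
    <!-- Hypothetical trigger; any event source can start the flow -->
    <scheduler>
        <scheduling-strategy>
            <fixed-frequency frequency="1" timeUnit="HOURS"/>
        </scheduling-strategy>
    </scheduler>

    <!-- The batch job splits the incoming payload (e.g. an array)
         into individual records -->
    <batch:job jobName="syncContactsBatchJob">
        <batch:process-records>
            <batch:step name="logRecordStep">
                <!-- Inside a step, payload refers to a single record -->
                <logger level="INFO" message="#[payload]"/>
            </batch:step>
        </batch:process-records>
        <batch:on-complete>
            <!-- Here payload is a BatchJobResult with counts of
                 processed, successful, and failed records -->
            <logger level="INFO" message="#[payload]"/>
        </batch:on-complete>
    </batch:job>
</flow>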

A batch job contains one or more batch steps that act upon records as they move through the batch job.

Each batch step in a batch job contains processors that act upon a record to transform, route, enrich, or otherwise process data contained within it.

By leveraging the functionality of existing Mule processors, the batch step offers a lot of flexibility regarding how a batch job processes records.
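To illustrate, a step can transform each record with DataWeave, and a later step can use an accept policy so that it only sees records that failed earlier. The step names and the transformation are illustrative, and the ee namespace for the Transform Message component is assumed to be declared:

<batch:process-records>
    <batch:step name="enrichStep">
        <!-- Transform the current record; assumes each record is an object -->
        <ee:transform>
            <ee:message>
                <ee:set-payload><![CDATA[%dw 2.0
output application/java
---
payload ++ { processedAt: now() }]]></ee:set-payload>
            </ee:message>
        </ee:transform>
    </batch:step>

    <!-- This step only receives records that failed in an earlier step -->
    <batch:step name="reportFailuresStep" acceptPolicy="ONLY_FAILURES">
        <logger level="WARN" message="#[payload]"/>
    </batch:step>
</batch:process-records>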

A batch job executes when the flow reaches the process-records section of the batch job. When triggered, Mule creates a new batch job instance.

When the job instance becomes executable, the batch engine submits a task for each record block to the I/O pool to process each record.

Parallelism occurs automatically, at the record block level. The Mule runtime engine uses its autotuning capabilities to determine how many threads to use and the level of parallelism to apply.

When the batch job starts executing, Mule splits the incoming message into records, stores them in a persistent queue, and then queries and schedules those records in blocks for processing.

By default, each block contains 100 records. You can customize this block size according to the performance you require.
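For example, both the block size and the number of record blocks processed in parallel can be tuned directly on the batch job. The values below are illustrative rather than recommendations:

<!-- Queue records in blocks of 200 instead of the default 100,
     and cap parallel processing at 4 blocks at a time -->
<batch:job jobName="largeDataSetJob" blockSize="200" maxConcurrency="4">
    <batch:process-records>
        <batch:step name="processStep">
            <logger level="INFO" message="#[payload]"/>
        </batch:step>
    </batch:process-records>
</batch:job>

Note that a bigger block size means fewer but heavier tasks per thread, so the right value depends on the size of your records and the available memory.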

Common scenarios for batch processing include:

  • Synchronizing data sets between business applications, such as syncing contacts between NetSuite and Salesforce.
  • Extracting, transforming and loading (ETL) information into a target system, such as uploading data from a flat file (CSV) to Hadoop.
  • Handling large quantities of incoming data from an API into a legacy system.