Mule allows you to process messages in batches.
Within an application, you can initiate a batch job scope, which is a block of code that splits messages into individual records, performs actions upon each record, then reports on the results and potentially pushes the processed output to other systems or queues.
For example, you can use batch processing when:
- Synchronizing data sets between business applications, such as syncing contacts between NetSuite and Salesforce.
- Extracting, transforming and loading (ETL) information into a target system, such as uploading data from a flat file (CSV) to Hadoop.
- Handling large quantities of incoming data from an API into a legacy system.
Overview:
Within a Mule application, batch processing provides a construct for asynchronously processing larger-than-memory data sets that are split into individual records.
Batch jobs describe a reliable process that automatically splits up source data and stores it in persistent queues, which makes it possible to process large data sets dependably.
In the event that the application is redeployed or Mule crashes, the job execution is able to resume at the point it stopped.

The job is then expressed in terms of processing individual records, providing semantics for record level variables, aggregation, and error handling.
Basic Anatomy
The heart of Mule’s batch processing lies within the batch job. A batch job is a scope that splits large messages into records that Mule processes asynchronously. In the same way flows process messages, batch jobs process records.

The batch XML structure was modified in Mule 4.0. The example below shows abbreviated details to highlight the batch elements.
<flow name="flowOne">
    <batch:job jobName="batchJob">
        <batch:process-records>
            <batch:step name="batchStep1">
                <event processor/>
                <event processor/>
            </batch:step>
            <batch:step name="batchStep2">
                <event processor/>
                <event processor/>
            </batch:step>
        </batch:process-records>
    </batch:job>
</flow>
A batch job contains one or more batch steps that act upon records as they move through the batch job.
Each batch step in a batch job contains processors that act upon a record to transform, route, enrich, or otherwise process data contained within it. By leveraging the functionality of existing Mule processors, the batch step offers a lot of flexibility regarding how a batch job processes records. A batch job executes when the flow reaches the process-records section of the batch job. When triggered, Mule creates a new batch job instance.
When the batch job starts executing, Mule splits the incoming message into records, stores them in a persistent queue, and then queries and schedules those records for processing in blocks. By default, each block contains 100 records; you can customize this block size according to the performance you require. After all the records have passed through all batch steps, the runtime ends the batch job instance and reports the batch job result, indicating which records succeeded and which failed during processing.
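The block size can be tuned through the blockSize attribute on the batch job. The sketch below assumes a hypothetical job name and step; only the blockSize attribute is the point of the example.

```xml
<!-- Sketch: raise the block size from the default of 100 to 200 records.
     Larger blocks can reduce I/O against the queue at the cost of more memory. -->
<batch:job jobName="syncContactsBatch" blockSize="200">
    <batch:process-records>
        <batch:step name="transformStep">
            <!-- event processors for each record go here -->
        </batch:step>
    </batch:process-records>
</batch:job>
```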
Error Handling:
Batch jobs can handle record-level failures that occur during processing to prevent a single bad record from failing the complete batch job. Further, you can set or remove variables on individual records so that, during batch processing, Mule can route or otherwise act upon each record according to the value of such a variable.
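Record-level failure tolerance is controlled with the maxFailedRecords attribute on the batch job. A minimal sketch, with a hypothetical job name:

```xml
<!-- maxFailedRecords="-1" lets the instance keep processing no matter how many
     records fail; the default of 0 stops the instance on the first failed record,
     and a positive number stops it once that many records have failed. -->
<batch:job jobName="faultTolerantBatch" maxFailedRecords="-1">
    <batch:process-records>
        <batch:step name="loadStep">
            <!-- a failure here is recorded, and processing continues -->
        </batch:step>
    </batch:process-records>
</batch:job>
```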
Batch Job vs. Batch Job Instance:
- A batch job is the scope element in an application in which Mule processes a message payload as a batch of records. The term batch job is inclusive of all three phases of processing: Load and Dispatch, Process, and On Complete.
- A batch job instance is an occurrence in a Mule application whenever a Mule flow executes a batch job. Mule creates the batch job instance in the Load and Dispatch phase. Every batch job instance is identified internally using a unique String known as batch job instance id.
This identifier is useful if you want, for example, to pass the batch job instance ID to an external system for referencing and managing data, improve the job's custom logging, or send email or SMS notifications for meaningful events around that specific batch job instance.
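Inside a step, the instance ID can be read from the batchJobInstanceId variable. The sketch below assumes a hypothetical job and step name and simply logs the ID:

```xml
<!-- Sketch: log the current batch job instance ID from within a step -->
<batch:job jobName="reportingBatch">
    <batch:process-records>
        <batch:step name="logStep">
            <logger level="INFO"
                    message="#['Processing record in instance: ' ++ vars.batchJobInstanceId]"/>
        </batch:step>
    </batch:process-records>
</batch:job>
```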
Batch Job Processing Phases
Each batch job contains three different phases:
- Load and Dispatch.
- Process.
- On Complete.
Load and Dispatch
This first phase is implicit. During this phase, the runtime performs all the “behind the scenes” work to create a batch job instance. Essentially, this is the phase during which Mule turns a serialized message payload into a collection of records for processing as a batch. You don’t need to configure anything for this activity to occur, though it is useful to understand the tasks Mule completes during this phase.
- Mule splits the message using DataWeave. This first step creates a new batch job instance.
- Mule exposes the batch job instance ID through the batchJobInstanceId variable. This variable is available in every step and the On Complete phase.
- Mule creates a persistent queue and associates it with the new batch job instance.
- For each item generated by the splitter, Mule creates a record and stores it in the queue. This activity is "all or nothing": Mule either successfully generates and queues a record for every item, or the whole message fails during this phase.
- Mule presents the batch job instance, with all its queued-up records, to the first batch step for processing.
After this phase completes, the flow continues. The next phase, Process, is asynchronous, meaning that the flow does not wait for the batch job instance to finish processing all of its records.
Process
During the “Process” phase, the runtime begins processing the records in the batch asynchronously. Each record moves through the processors in the first batch step, then is sent back to the original queue while it waits to be processed by the second batch step and so on until every record has passed through every batch step.
Only one queue exists; records are picked from it for each batch step, processed, and returned to it, and each record keeps track of which steps it has passed through while it sits on this queue.
Note that a batch job instance does not wait for all its queued records to finish processing in one batch step before pushing any of them to the next batch step. Queues are persistent.
Mule persists a list of all records as they succeed or fail to process through each batch step. If a record should fail to be processed by an event processor in a batch step, the runtime continues processing the batch, skipping over the failed record in each subsequent batch step.
At the end of this phase, the batch job instance completes and, therefore, ceases to exist.

Beyond simple processing of records, there are several things you can do with records within a batch step.
- You can apply filters by configuring an acceptExpression on a batch step to prevent the step from processing certain records. For example, you can set a filter so that a step skips any records that failed to process in the preceding step.
- You can use a batch aggregator processor to aggregate records in groups, sending them as bulk upserts to external sources or services. For example, rather than upserting each contact (that is, a record) in a batch to Google Contacts, you can configure a batch aggregator to accumulate, say, 100 records, then upsert all of them to Google Contacts in one chunk.
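The two techniques above can be combined in one job. The sketch below assumes hypothetical job and step names; the acceptExpression, acceptPolicy, and aggregator size values are illustrative:

```xml
<batch:job jobName="contactsBatch">
    <batch:process-records>
        <!-- acceptExpression filters records; acceptPolicy="NO_FAILURES" (the default)
             only admits records that have not failed in earlier steps -->
        <batch:step name="upsertStep"
                    acceptExpression="#[payload.email != null]"
                    acceptPolicy="NO_FAILURES">
            <!-- accumulate 100 records, then upsert them as one bulk call -->
            <batch:aggregator size="100">
                <!-- bulk upsert of the aggregated records goes here -->
            </batch:aggregator>
        </batch:step>
        <!-- acceptPolicy="ONLY_FAILURES" admits only records that failed earlier,
             e.g. to log or compensate for them -->
        <batch:step name="recoverStep" acceptPolicy="ONLY_FAILURES">
            <!-- handling for failed records goes here -->
        </batch:step>
    </batch:process-records>
</batch:job>
```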