This content originally appeared on DEV Community and was authored by sadiul hakim
A great Spring Batch application is robust, scalable, and fault-tolerant. This tutorial explains key components and concepts to help you build such applications, from basic job configuration to advanced features like parallel processing and error handling.
Core Components and Configuration
1. Job Launcher and Task Executor
To run a Spring Batch job, you need a JobLauncher. For synchronous execution, you can use the default JobLauncher. For asynchronous execution, you can configure a TaskExecutorJobLauncher. This is essential for non-blocking operations, especially in web applications.
- @EnableBatchProcessing: Enables Spring Batch's infrastructure and automatically configures a default JobLauncher and other core components.
- TaskExecutorJobLauncher: This implementation of JobLauncher runs jobs asynchronously using a TaskExecutor.
- SimpleAsyncTaskExecutor: A basic TaskExecutor that uses a new thread for each task. You can enable virtual threads by calling setVirtualThreads(true) for lightweight, high-concurrency tasks.
@Bean
public JobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
    TaskExecutorJobLauncher jobLauncher = new TaskExecutorJobLauncher();
    jobLauncher.setJobRepository(jobRepository);

    // Each launched job runs on its own (virtual) thread instead of blocking the caller.
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setVirtualThreads(true);
    jobLauncher.setTaskExecutor(taskExecutor);

    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}
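With this launcher in place, run() returns as soon as the job is handed to the executor. Here is a minimal usage sketch; importJob and the run.id parameter name are assumptions for illustration.

public void launch(JobLauncher asyncJobLauncher, Job importJob) throws Exception {
    JobParameters params = new JobParametersBuilder()
            .addLong("run.id", System.currentTimeMillis()) // unique parameters => new JobInstance
            .toJobParameters();

    // Returns immediately; the job keeps running on the executor's thread.
    asyncJobLauncher.run(importJob, params);
}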
2. Tasklets
Not all steps in a batch job need a classic reader, processor, and writer. For simple, procedural logic like file cleanup or system commands, you can use a Tasklet.
- Tasklet: This is a functional interface with a single execute() method. It returns a RepeatStatus object.
- RepeatStatus:
  - RepeatStatus.CONTINUABLE: Tells the framework to call the execute() method again. This is useful for repeating a task until a condition is met (a sketch follows the example below).
  - RepeatStatus.FINISHED: Indicates the task is complete.
- A Tasklet must be passed to a Step to be executed.
@Bean
public Step cleanupStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("cleanupStep", jobRepository)
            .tasklet((contribution, chunkContext) -> {
                // Your custom logic, e.g., deleting a temporary directory
                System.out.println("Cleaning up temporary files...");
                return RepeatStatus.FINISHED;
            }, transactionManager)
            .build();
}
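And a minimal sketch of RepeatStatus.CONTINUABLE: the tasklet keeps being invoked until a work source is drained. The Queue bean (workQueue) is a hypothetical component used only for illustration.

@Bean
public Step drainQueueStep(JobRepository jobRepository, PlatformTransactionManager transactionManager,
                           Queue<String> workQueue) {
    return new StepBuilder("drainQueueStep", jobRepository)
            .tasklet((contribution, chunkContext) -> {
                String item = workQueue.poll();
                if (item == null) {
                    return RepeatStatus.FINISHED;    // nothing left, stop calling execute()
                }
                System.out.println("Handling " + item);
                return RepeatStatus.CONTINUABLE;     // ask the framework to call execute() again
            }, transactionManager)
            .build();
}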
Reading from Multiple Resources
Handling multiple input files in a single batch job is a common requirement. Spring Batch provides powerful components for this.
1. MultiResourceItemReaderBuilder
The MultiResourceItemReaderBuilder simplifies reading from multiple files. It manages the list of files and delegates the actual reading of each file to an underlying ResourceAwareItemReaderItemStream.
2. ResourceAwareItemReaderItemStream
This interface acts as an intermediate layer. It receives a single Resource (e.g., a file) from the MultiResourceItemReader and sets it on the inner reader (such as a FlatFileItemReader). It then manages the lifecycle of this inner reader through its open(), close(), and read() methods. It does not read data itself; it only manages the resource for the actual reader.
3. FlatFileItemReader
This is the concrete reader that reads lines from a single file. It:

- Reads lines sequentially.
- Keeps track of its state (last read line) in the ExecutionContext for restartability.
- Requires a LineMapper to parse each line into an object.
The overall flow is as follows:
- The MultiResourceItemReader receives a list of files (e.g., from inputFolderPath).
- It delegates a single file to a MultiFileTeamReader (a custom implementation of ResourceAwareItemReaderItemStream).
- The MultiFileTeamReader sets this file as the resource for the FlatFileItemReader.
- The FlatFileItemReader reads lines from the file, parsing them into objects.
- When a file is finished, the MultiResourceItemReader moves to the next file, repeating the process.
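The wiring might look roughly like the sketch below. It is a minimal version that uses a FlatFileItemReader directly as the delegate (FlatFileItemReader already implements ResourceAwareItemReaderItemStream); a custom delegate such as the MultiFileTeamReader above would slot into the same place. The Team type, the input.folder.path property, and the CSV layout are assumptions for illustration.

@Bean
@StepScope
public MultiResourceItemReader<Team> multiFileTeamReader(
        @Value("${input.folder.path}") String inputFolderPath) throws IOException {

    // Collect every CSV file in the input folder.
    Resource[] resources = new PathMatchingResourcePatternResolver()
            .getResources("file:" + inputFolderPath + "/*.csv");

    // Inner reader: parses the lines of whichever file is currently assigned to it.
    // (Team is a hypothetical domain type, e.g. public record Team(String name, long score) {}.)
    FlatFileItemReader<Team> delegate = new FlatFileItemReaderBuilder<Team>()
            .name("teamFileReader")
            .delimited()
            .names("name", "score")
            .targetType(Team.class)
            .build();

    // Outer reader: iterates over the files and hands each one to the delegate.
    return new MultiResourceItemReaderBuilder<Team>()
            .name("multiFileTeamReader")
            .resources(resources)
            .delegate(delegate)
            .build();
}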
State Management and Restartability
1. ExecutionContext
The ExecutionContext is a key-value store that holds state information for a Job or Step. It's crucial for restartability, as it allows the framework to save the last successfully processed position. Most readers and writers automatically save their state to the ExecutionContext.
2. ItemStream
The ItemStream interface provides a lifecycle for stateful components in Spring Batch.
- open(ExecutionContext): Called at the beginning of a step to initialize resources and restore state.
- update(ExecutionContext): Called periodically to save the current state.
- close(): Called at the end of a step to release resources.

The FlatFileItemReader implements ItemStream to save the last read line position, allowing a job to be restarted from where it left off.
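The same contract applies to custom readers. Below is a minimal sketch of a stateful reader that stores its position in the ExecutionContext so a restarted job resumes where it left off; the current.index key and the in-memory list are assumptions for illustration.

public class InMemoryItemReader implements ItemStreamReader<String> {

    private static final String INDEX_KEY = "current.index";

    private final List<String> items;
    private int currentIndex = 0;

    public InMemoryItemReader(List<String> items) {
        this.items = items;
    }

    @Override
    public void open(ExecutionContext executionContext) {
        // Restore the saved position on restart; start from zero on the first run.
        if (executionContext.containsKey(INDEX_KEY)) {
            currentIndex = executionContext.getInt(INDEX_KEY);
        }
    }

    @Override
    public void update(ExecutionContext executionContext) {
        // Called after each chunk commit so the position survives a failure.
        executionContext.putInt(INDEX_KEY, currentIndex);
    }

    @Override
    public void close() {
        // Nothing to release for an in-memory source.
    }

    @Override
    public String read() {
        return currentIndex < items.size() ? items.get(currentIndex++) : null;
    }
}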
3. Sharing Data Between Steps
To pass data from one step to another, you can promote keys from the StepExecutionContext to the JobExecutionContext.
- Manual Promotion: Access the JobExecutionContext via StepExecution.getJobExecution().getExecutionContext() at the end of a step.
- Automatic Promotion: Use the ExecutionContextPromotionListener. Configure it with the keys you want to promote, and it will automatically transfer those values to the JobExecutionContext, as in the sketch below.
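A minimal sketch of automatic promotion, assuming a max.score key that the step places into its own ExecutionContext:

@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    listener.setKeys(new String[] {"max.score"}); // keys to copy into the JobExecutionContext
    return listener;
}

@Bean
public Step scoringStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("scoringStep", jobRepository)
            .tasklet((contribution, chunkContext) -> {
                // Put the value into the *step* ExecutionContext; the listener promotes it
                // to the JobExecutionContext when the step completes.
                chunkContext.getStepContext().getStepExecution()
                        .getExecutionContext().putLong("max.score", 100L);
                return RepeatStatus.FINISHED;
            }, transactionManager)
            .listener(promotionListener())
            .build();
}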
You can then access data from the JobExecutionContext in other steps using the @Value annotation with the SpEL expression #{jobExecutionContext['your.key']}.
@Bean
@StepScope
public SomeComponent myComponent(@Value("#{jobExecutionContext['max.score']}") Long maxScore) {
    // ...
}
Controlling Output Files
To add custom headers or footers to a flat file, you can use the following callbacks in your FlatFileItemWriter:
- FlatFileHeaderCallback: Adds text to the top of the output file.
- FlatFileFooterCallback: Adds text to the bottom of the output file.
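Both callbacks receive a Writer and can be registered on the writer builder. A minimal sketch, where the Team type, the field names, and the output path are assumptions for illustration:

@Bean
public FlatFileItemWriter<Team> teamWriter() {
    return new FlatFileItemWriterBuilder<Team>()
            .name("teamWriter")
            .resource(new FileSystemResource("output/teams.csv"))
            .headerCallback(writer -> writer.write("name,score"))           // first line of the file
            .footerCallback(writer -> writer.write("# generated by batch")) // last line of the file
            .delimited()
            .names("name", "score")
            .build();
}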
Parallel and Fault-Tolerant Execution
1. Parallel Step Execution
You can run multiple steps in parallel to improve performance by using a SimpleFlow with a TaskExecutor.
- Define a separate SimpleFlow for each step you want to run in parallel (see the sketch below).
- Use a FlowBuilder to create a "split" flow, adding the individual flows to it.
- Provide a TaskExecutor to the split() method to manage the parallel threads.
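For example, the individual flows might be defined like this (stepOne and stepTwo are assumed step beans), with the split and job wiring following below:

// Wrap each step in its own flow so it can be added to the split
SimpleFlow stepOneFlow = new FlowBuilder<SimpleFlow>("stepOneFlow").start(stepOne).build();
SimpleFlow stepTwoFlow = new FlowBuilder<SimpleFlow>("stepTwoFlow").start(stepTwo).build();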
// Define a split flow for the parallel steps
SimpleFlow parallelFlow = new FlowBuilder<SimpleFlow>("parallelFlow")
        .split(new SimpleAsyncTaskExecutor())
        .add(stepOneFlow, stepTwoFlow)
        .build();

// Use the parallel flow in your job
return new JobBuilder("myParallelJob", jobRepository)
        .flow(initialStep)       // run the initial step first
        .next(parallelFlow)      // executes stepOne and stepTwo in parallel
        .next(finalStep)
        .end()
        .build();
2. Restarting Jobs
Job restartability is a core feature of Spring Batch.
- Failed Jobs: If a job fails, you can restart it with the same parameters. Spring Batch will automatically skip previously successful steps and only rerun the failed steps.
- New Job Instance: If you run a job with new parameters, it’s treated as a new instance. All steps will run from the beginning.
- allowStartIfComplete(true): Use this StepBuilder setting if you want a successful step to rerun when a job is restarted. By default, successful steps are skipped. You need this for tasks that must always run, like a cleanup step (see the sketch after the table below).
| Scenario | Does Step Run Again? | Need allowStartIfComplete(true)? |
|---|---|---|
| Restarting a failed job with same params | No (unless it failed) | Yes, if you want a completed step to run again. |
| Running job with new parameters | Yes (it's a new job instance) | No. |
| Restarting a completed job with same params | No | Yes, if you want any step to run again. |
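For instance, the cleanup step from earlier could be flagged so it runs on every execution, even after it has already completed in a previous run (a sketch reusing the earlier example):

@Bean
public Step cleanupStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("cleanupStep", jobRepository)
            .tasklet((contribution, chunkContext) -> {
                System.out.println("Cleaning up temporary files...");
                return RepeatStatus.FINISHED;
            }, transactionManager)
            .allowStartIfComplete(true) // rerun even if it completed in a previous execution
            .build();
}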
3. Skip and Retry
Spring Batch’s fault tolerance mechanisms allow you to handle transient and non-critical failures gracefully.
- Skip: Ignores specific exceptions and continues processing.
  - skip(Exception.class): Skips items that cause the specified exception.
  - skipLimit(n): Sets the maximum number of skips allowed before the job fails.
  - Best for non-critical, permanent failures (e.g., a bad record format).
- Retry: Retries processing a failed item a specified number of times before giving up.
  - retry(Exception.class): Retries items that cause the specified exception.
  - retryLimit(n): Sets the maximum number of retry attempts.
  - Best for transient failures (e.g., a temporary network issue).

You can use both together within a faultTolerant() step, as in the sketch below.
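A minimal sketch of a fault-tolerant chunk step; the reader, processor, writer, and the exception types chosen here are assumptions for illustration:

@Bean
public Step importStep(JobRepository jobRepository, PlatformTransactionManager transactionManager,
                       ItemReader<Team> reader, ItemProcessor<Team, Team> processor,
                       ItemWriter<Team> writer) {
    return new StepBuilder("importStep", jobRepository)
            .<Team, Team>chunk(100, transactionManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .faultTolerant()
            .skip(FlatFileParseException.class)        // bad record format: drop the item
            .skipLimit(10)                             // fail the step after 10 skips
            .retry(TransientDataAccessException.class) // temporary failure: try again
            .retryLimit(3)                             // give up after 3 attempts
            .build();
}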
4. Listeners
Listeners provide hooks into the job execution lifecycle for custom logic like logging or alerting.
- SkipListener: Provides callbacks (onSkipInRead, onSkipInWrite, onSkipInProcess) to track skipped items.
- RetryListener: Provides callbacks (open, close, onError) to track retry attempts and outcomes.
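A minimal SkipListener sketch that simply logs skipped items (Team and the log destination are assumptions); it can be registered on a fault-tolerant step with .listener(new LoggingSkipListener()):

public class LoggingSkipListener implements SkipListener<Team, Team> {

    @Override
    public void onSkipInRead(Throwable t) {
        System.out.println("Skipped while reading: " + t.getMessage());
    }

    @Override
    public void onSkipInProcess(Team item, Throwable t) {
        System.out.println("Skipped while processing " + item + ": " + t.getMessage());
    }

    @Override
    public void onSkipInWrite(Team item, Throwable t) {
        System.out.println("Skipped while writing " + item + ": " + t.getMessage());
    }
}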