Spring Batch Tutorial Part #5



This content originally appeared on DEV Community and was authored by sadiul hakim

A great Spring Batch application is robust, scalable, and fault-tolerant. This tutorial explains key components and concepts to help you build such applications, from basic job configuration to advanced features like parallel processing and error handling.

⚙ Core Components and Configuration

1. Job Launcher and Task Executor

To run a Spring Batch job, you need a JobLauncher. For synchronous execution, you can use the default JobLauncher. For asynchronous execution, you can configure a TaskExecutorJobLauncher. This is essential for non-blocking operations, especially in web applications.

  • @EnableBatchProcessing: This annotation enables Spring Batch's infrastructure, automatically configuring a default JobLauncher, a JobRepository, and other core components. (With Spring Boot 3+, these components are auto-configured for you; adding the annotation tells Boot to back off and use your own configuration instead.)
  • TaskExecutorJobLauncher: This implementation of JobLauncher runs jobs asynchronously using a TaskExecutor.
  • SimpleAsyncTaskExecutor: A basic TaskExecutor that uses a new thread for each task. You can enable virtual threads by calling .setVirtualThreads(true) for lightweight, high-concurrency tasks.
@Bean
public JobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
    TaskExecutorJobLauncher jobLauncher = new TaskExecutorJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setVirtualThreads(true);
    jobLauncher.setTaskExecutor(taskExecutor);
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}
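With the asynchronous launcher in place, a caller (for example, a web controller) gets control back immediately instead of blocking until the job finishes. A minimal sketch of launching a job this way; `importJob` is a hypothetical Job bean, and the `run.id` parameter is just one common way to force a fresh JobInstance:

```java
// Sketch: launching a job through the async launcher.
// "importJob" is a hypothetical Job bean name.
public JobExecution launch(JobLauncher asyncJobLauncher, Job importJob) throws Exception {
    JobParameters params = new JobParametersBuilder()
        .addLong("run.id", System.currentTimeMillis()) // unique params -> new JobInstance
        .toJobParameters();
    // Returns immediately; the job runs on the executor's thread
    return asyncJobLauncher.run(importJob, params);
}
```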

2. Tasklets

Not all steps in a batch job need a classic reader, processor, and writer. For simple, procedural logic like file cleanup or system commands, you can use a Tasklet.

  • Tasklet: This is a functional interface with a single execute(StepContribution, ChunkContext) method that returns a RepeatStatus.
  • RepeatStatus:
    • RepeatStatus.CONTINUABLE: Tells the framework to call the execute() method again. This is useful for repeating tasks until a condition is met.
    • RepeatStatus.FINISHED: Indicates the task is complete.

A Tasklet must be passed to a Step to be executed.

@Bean
public Step cleanupStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("cleanupStep", jobRepository)
        .tasklet((contribution, chunkContext) -> {
            // Your custom logic, e.g., deleting a temporary directory
            System.out.println("Cleaning up temporary files...");
            return RepeatStatus.FINISHED;
        }, transactionManager)
        .build();
}
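To see RepeatStatus.CONTINUABLE in action, here is a hedged sketch of a Tasklet that repeats until a condition is met. The work queue and its contents are hypothetical, not part of Spring Batch:

```java
// Sketch: a Tasklet that drains a (hypothetical) work queue, returning
// CONTINUABLE so the framework calls execute() again until it is empty.
@Bean
public Step drainQueueStep(JobRepository jobRepository,
                           PlatformTransactionManager transactionManager,
                           Queue<String> workQueue) { // hypothetical queue bean
    return new StepBuilder("drainQueueStep", jobRepository)
        .tasklet((contribution, chunkContext) -> {
            String item = workQueue.poll();
            if (item == null) {
                return RepeatStatus.FINISHED;    // queue empty: step completes
            }
            System.out.println("Processing " + item);
            return RepeatStatus.CONTINUABLE;     // run execute() again
        }, transactionManager)
        .build();
}
```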

📚 Reading from Multiple Resources

Handling multiple input files in a single batch job is a common requirement. Spring Batch provides powerful components for this.

1. MultiResourceItemReaderBuilder

The MultiResourceItemReaderBuilder simplifies configuring a MultiResourceItemReader, which reads from multiple files by managing the list of resources and delegating the actual reading of each file to an underlying ResourceAwareItemReaderItemStream.

2. ResourceAwareItemReaderItemStream

This interface acts as an intermediate layer. It receives a single Resource (e.g., a file) from the MultiResourceItemReader and sets it on the inner reader (like a FlatFileItemReader). It then manages the lifecycle of this inner reader through its open(), close(), and read() methods. It does not read data itself; it only manages the resource for the actual reader.

3. FlatFileItemReader

This is the concrete reader that reads lines from a single file. It:

  • Reads lines sequentially.
  • Keeps track of its state (last read line) in the ExecutionContext for restartability.
  • Requires a LineMapper to parse each line into an object.

The overall flow is as follows:

  1. MultiResourceItemReader receives a list of files (e.g., from inputFolderPath).
  2. It delegates a single file to a MultiFileTeamReader (which is a custom implementation of ResourceAwareItemReaderItemStream).
  3. The MultiFileTeamReader sets this file as the resource for the FlatFileItemReader.
  4. The FlatFileItemReader reads lines from the file, parsing them into objects.
  5. When a file is finished, the MultiResourceItemReader moves to the next file, repeating the process.
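The flow above can be sketched as a bean definition. The `Team` type and `MultiFileTeamReader` follow this tutorial's naming; the `inputFolderPath` parameter is assumed to be a resource location pattern (Spring's default conversion turns such a String into a Resource[]):

```java
// Sketch: wiring a MultiResourceItemReader over a folder of input files.
// The path pattern and delegate bean are assumptions based on the flow above.
@Bean
@StepScope
public MultiResourceItemReader<Team> multiResourceReader(
        @Value("#{jobParameters['inputFolderPath']}") Resource[] inputFiles,
        MultiFileTeamReader delegate) {
    return new MultiResourceItemReaderBuilder<Team>()
        .name("multiResourceReader")
        .resources(inputFiles) // the list of files to iterate over
        .delegate(delegate)    // ResourceAwareItemReaderItemStream handling one file at a time
        .build();
}
```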

🔄 State Management and Restartability

1. ExecutionContext

The ExecutionContext is a key-value store that holds state information for a Job or Step. It’s crucial for restartability, as it allows the framework to save the last successfully processed position. Most readers and writers automatically save their state to the ExecutionContext.

2. ItemStream

The ItemStream interface provides a lifecycle for stateful components in Spring Batch.

  • open(ExecutionContext): Called at the beginning of a step to initialize resources and restore state.
  • update(ExecutionContext): Called periodically to save the current state.
  • close(): Called at the end of a step to release resources.

The FlatFileItemReader implements ItemStream to save the last read line position, allowing a job to be restarted from where it left off.
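To make the lifecycle concrete, here is a minimal sketch of a custom ItemStream that persists a counter so processing can resume after a restart. The class and key name are illustrative, not from the tutorial:

```java
// Sketch: a stateful component that saves/restores a counter through the
// ExecutionContext. The key name "counting.stream.count" is arbitrary.
public class CountingStream implements ItemStream {

    private static final String KEY = "counting.stream.count";
    private long count;

    @Override
    public void open(ExecutionContext ctx) {
        // Restore state on a restart; defaults to 0 on a fresh run
        count = ctx.containsKey(KEY) ? ctx.getLong(KEY) : 0L;
    }

    @Override
    public void update(ExecutionContext ctx) {
        ctx.putLong(KEY, count); // called periodically so progress survives a crash
    }

    @Override
    public void close() {
        // Release any resources opened in open()
    }

    public void increment() {
        count++;
    }
}
```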

3. Sharing Data Between Steps

To pass data from one step to another, you can promote keys from the StepExecutionContext to the JobExecutionContext.

  • Manual Promotion: Access the JobExecutionContext via StepExecution.getJobExecution().getExecutionContext() at the end of a step.
  • Automatic Promotion: Use the ExecutionContextPromotionListener. Configure it with the keys you want to promote, and it will automatically transfer those values to the JobExecutionContext.
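A sketch of the automatic-promotion approach; the `max.score` key matches the example used elsewhere in this tutorial, and the step body is a hypothetical placeholder:

```java
// Sketch: promote "max.score" from the StepExecutionContext to the
// JobExecutionContext using ExecutionContextPromotionListener.
@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    listener.setKeys(new String[] {"max.score"}); // keys to copy after the step finishes
    return listener;
}

@Bean
public Step scoringStep(JobRepository jobRepository, PlatformTransactionManager tx) {
    return new StepBuilder("scoringStep", jobRepository)
        .tasklet((contribution, chunkContext) -> {
            // Hypothetical value written to the step's context
            chunkContext.getStepContext().getStepExecution()
                .getExecutionContext().putLong("max.score", 42L);
            return RepeatStatus.FINISHED;
        }, tx)
        .listener(promotionListener())
        .build();
}
```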

You can access data from the JobExecutionContext in other steps using the @Value annotation with the SpEL expression #{jobExecutionContext['your.key']}.

@Bean
@StepScope
public SomeComponent myComponent(@Value("#{jobExecutionContext['max.score']}") Long maxScore) {
    // ...
}

📝 Controlling Output Files

To add custom headers or footers to a flat file, you can use the following callbacks in your FlatFileItemWriter:

  • FlatFileHeaderCallback: Adds text to the top of the output file.
  • FlatFileFooterCallback: Adds text to the bottom of the output file.
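Both callbacks can be supplied as lambdas on the writer builder. A sketch, assuming the `Team` type from earlier examples; the output path, field names, and header/footer text are placeholders:

```java
// Sketch: a FlatFileItemWriter with a custom header and footer.
@Bean
public FlatFileItemWriter<Team> teamWriter() {
    return new FlatFileItemWriterBuilder<Team>()
        .name("teamWriter")
        .resource(new FileSystemResource("output/teams.csv"))          // hypothetical path
        .headerCallback(writer -> writer.write("id,name"))             // FlatFileHeaderCallback
        .footerCallback(writer -> writer.write("-- end of export --")) // FlatFileFooterCallback
        .delimited()
        .names("id", "name") // hypothetical Team fields
        .build();
}
```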

🚀 Parallel and Fault-Tolerant Execution

1. Parallel Step Execution

You can run multiple steps in parallel to improve performance by using a SimpleFlow with a TaskExecutor.

  1. Define a separate SimpleFlow for each step you want to run in parallel.
  2. Use a FlowBuilder to create a “split” flow, adding the individual flows to it.
  3. Provide a TaskExecutor to the split() method to manage the parallel threads.
// Define a split flow for parallel steps
SimpleFlow parallelFlow = new FlowBuilder<SimpleFlow>("parallelFlow")
    .split(new SimpleAsyncTaskExecutor())
    .add(stepOneFlow, stepTwoFlow)
    .build();

// Wrap the initial step in a flow so the job can be built as a flow job
SimpleFlow initialFlow = new FlowBuilder<SimpleFlow>("initialFlow")
    .start(initialStep)
    .build();

// Use the parallel flow in your job
return new JobBuilder("myParallelJob", jobRepository)
    .start(initialFlow)
    .next(parallelFlow) // Executes stepOne and stepTwo in parallel
    .next(finalStep)
    .build()  // builds the FlowJobBuilder
    .build(); // builds the Job

2. Restarting Jobs

Job restartability is a core feature of Spring Batch.

  • Failed Jobs: If a job fails, you can restart it with the same parameters. Spring Batch will automatically skip previously successful steps and only rerun the failed steps.
  • New Job Instance: If you run a job with new parameters, it’s treated as a new instance. All steps will run from the beginning.
  • allowStartIfComplete(true): Use this StepBuilder setting if you want a successful step to rerun when a job is restarted. By default, successful steps are skipped. You need this for tasks that must always run, like a cleanup step.
| Scenario | Does the step run again? | Need allowStartIfComplete(true)? |
| --- | --- | --- |
| Restarting a failed job with the same parameters | No (unless it failed) | Yes, if you want a completed step to run again |
| Running the job with new parameters | Yes (it's a new job instance) | No |
| Restarting a completed job with the same parameters | No | Yes, if you want any step to run again |

3. Skip and Retry

Spring Batch’s fault tolerance mechanisms allow you to handle transient and non-critical failures gracefully.

  • Skip: Ignores specific exceptions and continues processing.

    • skip(Exception.class): Skips items that cause the specified exception.
    • skipLimit(n): Sets the maximum number of skips allowed before the job fails.
    • Best for non-critical, permanent failures (e.g., a bad record format).
  • Retry: Retries processing a failed item a specified number of times before giving up.

    • retry(Exception.class): Retries items that cause the specified exception.
    • retryLimit(n): Sets the maximum number of retry attempts.
    • Best for transient failures (e.g., a temporary network issue).

You can use both together within a faultTolerant() step.
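A sketch of a chunk-oriented step combining both mechanisms. The exception types, chunk size, and the `Team` reader/writer beans are assumptions chosen to match the guidance above (retry for transient failures, skip for bad records):

```java
// Sketch: a fault-tolerant step that retries transient failures and
// skips unparseable records, with assumed exception types.
@Bean
public Step faultTolerantStep(JobRepository jobRepository,
                              PlatformTransactionManager transactionManager,
                              ItemReader<Team> reader,
                              ItemWriter<Team> writer) {
    return new StepBuilder("faultTolerantStep", jobRepository)
        .<Team, Team>chunk(10, transactionManager)
        .reader(reader)
        .writer(writer)
        .faultTolerant()
        .retry(TransientDataAccessException.class) // transient failures: retry...
        .retryLimit(3)                             // ...up to 3 attempts
        .skip(FlatFileParseException.class)        // bad records: skip...
        .skipLimit(10)                             // ...at most 10 before failing
        .build();
}
```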

4. Listeners

Listeners provide hooks into the job execution lifecycle for custom logic like logging or alerting.

  • SkipListener: Provides callbacks (onSkipInRead, onSkipInWrite, onSkipInProcess) to track skipped items.
  • RetryListener: Provides callbacks (open, close, onError) to track retry attempts and outcomes.
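A sketch of a SkipListener that simply logs skipped items; the `Team` type parameters are an assumption carried over from the earlier examples:

```java
// Sketch: a SkipListener logging each skipped item and its cause.
public class LoggingSkipListener implements SkipListener<Team, Team> {

    @Override
    public void onSkipInRead(Throwable t) {
        System.err.println("Skipped during read: " + t.getMessage());
    }

    @Override
    public void onSkipInProcess(Team item, Throwable t) {
        System.err.println("Skipped in process: " + item + " - " + t.getMessage());
    }

    @Override
    public void onSkipInWrite(Team item, Throwable t) {
        System.err.println("Skipped in write: " + item + " - " + t.getMessage());
    }
}
```

Register it on the fault-tolerant step with .listener(new LoggingSkipListener()).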

