본문 바로가기

프로그래밍/Spring

[Spring Batch] 1. 알아보자

1. Spring Batch

 1) Spring batch란? 

  - Spring Batch는 Job과 Step으로 구성되어 있음.

  - 하나의 Spring Batch안에는 여러 Job이 존재 할 수 있고, 그 Job 안에는 여러 개의 Step 또는 Tasklet을 존재 할 수 있음.

  - Job -> Step -> ItemReader - ItemProcessor - ItemWriter


#https://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#domain 에서 가져온 이미지 입니다.


  2) 장점

   - 간단하게 대용량 배치를 만들 수 있다.

   - 이미 만들어진 많은 모듈들을 사용해서 손쉽게 구현가능(CSV 파싱, DB에서 가지고 오기, S3 등에 파일업로드 등)

  

  3) 단점

    - 어렵다.(배우면 된다.)


2. Class 설명

  1) ItemReader

  - Step 안에서 데이터를 가져오는 역할을 하는 Class이다.

  - 하나의 아이템이 리턴되거나 null

package org.springframework.batch.item;

/**
* Strategy interface for providing the data. <br>
*
* Implementations are expected to be stateful and will be called multiple times
* for each batch, with each call to {@link #read()} returning a different value
* and finally returning <code>null</code> when all input data is exhausted.<br>
*
* Implementations need <b>not</b> be thread-safe and clients of a {@link ItemReader}
* need to be aware that this is the case.<br>
*
* A richer interface (e.g. with a look ahead or peek) is not feasible because
* we need to support transactions in an asynchronous batch.
*
* @author Rob Harrop
* @author Dave Syer
* @author Lucas Ward
* @since 1.0
*/
public interface ItemReader<T> {

/**
* Reads a piece of input data and advance to the next one. Implementations
* <strong>must</strong> return <code>null</code> at the end of the input
* data set. In a transactional setting, caller might get the same item
* twice from successive calls (or otherwise), if the first call was in a
* transaction that rolled back.
*
* @throws ParseException if there is a problem parsing the current record
* (but the next one may still be valid)
* @throws NonTransientResourceException if there is a fatal exception in
* the underlying resource. After throwing this exception implementations
* should endeavour to return null from subsequent calls to read.
* @throws UnexpectedInputException if there is an uncategorised problem
* with the input data. Assume potentially transient, so subsequent calls to
* read might succeed.
* @throws Exception if an there is a non-specific error.
* @return T the item to be processed
*/
T read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException;

}


 2) ItemProcessor

  - Step 안에서 가져온 데이터를 가공하는 역할을 하는 Class이다.

  - Chunk 사이즈 만큼 Item들이 List로 들어온다.

package org.springframework.batch.item;

/**
* Interface for item transformation. Given an item as input, this interface provides
* an extension point which allows for the application of business logic in an item
* oriented processing scenario. It should be noted that while it's possible to return
* a different type than the one provided, it's not strictly necessary. Furthermore,
* returning null indicates that the item should not be continued to be processed.
*
* @author Robert Kasanicky
* @author Dave Syer
*/
public interface ItemProcessor<I, O> {

/**
* Process the provided item, returning a potentially modified or new item for continued
* processing. If the returned result is null, it is assumed that processing of the item
* should not continue.
*
* @param item to be processed
* @return potentially modified or new item for continued processing, null if processing of the
* provided item should not continue.
* @throws Exception
*/
O process(I item) throws Exception;
}


 3) ItemWriter

  - Step 안에서 데이터를 쓰는 역할을 하는 Class이다.

package org.springframework.batch.item;

import java.util.List;

/**
* <p>
* Basic interface for generic output operations. Class implementing this
* interface will be responsible for serializing objects as necessary.
* Generally, it is responsibility of implementing class to decide which
* technology to use for mapping and how it should be configured.
* </p>
*
* <p>
* The write method is responsible for making sure that any internal buffers are
* flushed. If a transaction is active it will also usually be necessary to
* discard the output on a subsequent rollback. The resource to which the writer
* is sending data should normally be able to handle this itself.
* </p>
*
* @author Dave Syer
* @author Lucas Ward
*/
public interface ItemWriter<T> {

/**
* Process the supplied data element. Will not be called with any null items
* in normal operation.
*
* @param items items to be written
* @throws Exception if there are errors. The framework will catch the
* exception and convert or rethrow it as appropriate.
*/
void write(List<? extends T> items) throws Exception;

}


3. 간단 용어 설명

  1) Item 

   - 데이터의 가장 작은 구성 요소를 말함.


 2) Chunk

   - commit-interval

   - ItemReader 읽은 데이터를 Processor 통해 가공 한 후 ItemWriter 넘겨지는 갯수를 의미함.

   - 트랜젝션이 걸려 있다면 한 트랜젝션 안에서 처리할 Item의 수이다.


4. 옵션

     (1) 옵션 설명

- spring.batch.initializer.enabled : Spring Batch 실행시에 Database 내 Table create 등 실행여부

- spring.batch.job.enabled : Spring Batch 실행시에 Context내 모든 Job들 실행 여부

- spring.batch.job.names : 실행할 Job 리스트, 콤마(,)로 구분한다.

- spring.batch.schema : db 스키마 초기화 sql파일 위치 (바로가기)

- spring.batch.table-prefix : 테이블명 앞에 붙일 명칭


     (2) 설정

  - application.properties

spring.batch.initializer.enabled=false
spring.batch.job.enabled=false
spring.batch.job.names=test1,test2,test3
spring.batch.schema=org/springframework/batch/core/schema-h2.sql
spring.batch.table-prefix=tmp_


  - application.yml

spring:
batch:
schema: org/springframework/batch/core/schema-h2.sql
table-prefix: tmp_
initializer:
enabled: false
job:
enabled: false
names: test1,test2,test3


* 참고페이지

 - https://projects.spring.io/spring-batch/

 - https://docs.spring.io/spring-batch/3.0.x/reference/html/index.html

 - https://docs.spring.io/spring-batch/trunk/reference/html/listOfReadersAndWriters.html