Spring Batch is one of the most popular open source batch processing frameworks available today. It also supports advanced features such as optimization and partitioning, which makes it well suited to high-volume, high-performance enterprise applications.
In this article, we will discuss using Spring Batch to process files from AWS S3 (Simple Storage Service).
The lifecycle of a batch process is: read a large chunk of data, process it, and then write the transformed data back to some storage. So the main components of a batch process are a reader, a processor and a writer.
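To make the read, process, write cycle concrete, here is a minimal plain-Java sketch of a chunk-oriented loop. It is only an illustration of the idea, not Spring Batch's actual implementation: the class name ChunkLoopSketch, the run method and the chunk size of 2 are all made up for this example, and the reader is assumed to signal the end of input by returning null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Plain-Java sketch of a chunk-oriented step; none of these names come from Spring Batch.
class ChunkLoopSketch {

    // Reads items until the reader returns null, processing each one and
    // handing the results to the writer one chunk at a time.
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int written = 0;
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.get()) != null) {
            chunk.add(processor.apply(item));
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // pass a copy of the full chunk
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) { // flush the final, partially filled chunk
            writer.accept(new ArrayList<>(chunk));
            written += chunk.size();
        }
        return written;
    }

    public static void main(String[] args) {
        Iterator<String> source = Arrays.asList("a", "b", "c").iterator();
        // Reader: returns null once the source is exhausted (assumption of this sketch)
        Supplier<String> reader = () -> source.hasNext() ? source.next() : null;
        Function<String, String> processor = String::toUpperCase;
        Consumer<List<String>> writer = chunk -> System.out.println("writing " + chunk);

        int count = run(reader, processor, writer, 2);
        System.out.println(count + " items written");
    }
}
```

In a real job, Spring Batch drives this loop for you; you only supply the reader, processor and writer.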
Batch Reader
Spring Batch provides various item readers such as:
- FlatFileItemReader
- HibernatePagingItemReader
- IbatisPagingItemReader
- JdbcPagingItemReader
- JmsItemReader
- MongoItemReader
As you may know, there is no built-in reader for S3. You can write your own item reader by implementing the ItemReader interface. Here, though, I will show you how to build an item reader for S3 in a few simple steps!
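For comparison, the hand-written route could look roughly like the sketch below. To keep it runnable without any dependencies, it declares its own SimpleItemReader interface as a hypothetical stand-in for Spring Batch's ItemReader, whose read() method likewise returns null once the input is exhausted. The class name InMemoryLineReader and the sample data are also assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hands back one line per read() call from bytes that are already in memory.
class InMemoryLineReader implements SimpleItemReader<String> {
    private final BufferedReader lines;

    InMemoryLineReader(byte[] content) {
        this.lines = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(content), StandardCharsets.UTF_8));
    }

    @Override
    public String read() throws IOException {
        return lines.readLine(); // readLine() already returns null at end of input
    }

    public static void main(String[] args) throws Exception {
        // Sample data standing in for the bytes downloaded from S3
        byte[] content = "1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8);
        InMemoryLineReader reader = new InMemoryLineReader(content);
        String line;
        while ((line = reader.read()) != null) {
            System.out.println(line);
        }
    }
}

// Hypothetical stand-in for Spring Batch's ItemReader<T>, declared here so the sketch runs on its own.
interface SimpleItemReader<T> {
    T read() throws Exception; // returns null once the input is exhausted
}
```

A reader like this only yields raw lines; turning each line into an object is still up to you, which is one reason to reuse FlatFileItemReader instead.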
The approach
Here I will use FlatFileItemReader as the ItemReader implementation, with a custom resource. The resource will be a ByteArrayResource whose input is the bytes read from S3. Simple, isn't it?
The code to read bytes from S3 will look like:
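    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3Object;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public byte[] getBytes() throws IOException {
        // "bucket" and "file" are placeholders for your bucket name and object key
        S3Object object = getClient().getObject(new GetObjectRequest("bucket", "file"));
        try (InputStream is = object.getObjectContent()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // IOUtils is a stream-copy utility, for example the one in Apache Commons IO
            IOUtils.copy(is, out);
            return out.toByteArray();
        }
    }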
And here goes the code for building the ItemReader:
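    // YourItem is a placeholder for your own item class
    public ItemReader<YourItem> reader() throws IOException {
        FlatFileItemReader<YourItem> reader = new FlatFileItemReader<>();
        // Wrap the bytes downloaded from S3 (see getBytes() above) in an in-memory resource
        reader.setResource(new ByteArrayResource(getBytes(), "s3 bytes"));

        DefaultLineMapper<YourItem> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);    // your tokenizer
        lineMapper.setFieldSetMapper(fieldMapper); // your field mapper
        reader.setLineMapper(lineMapper);
        return reader;
    }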
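If the roles of the tokenizer and the field mapper are unclear, the plain-Java sketch below illustrates the two stages that a line mapper chains together: splitting a line into fields, then building an object from those fields. It deliberately does not use the Spring Batch classes; the Customer type, the comma delimiter and the method names are all assumptions made for this example.

```java
// Plain-Java illustration of the two-stage line mapping; these are not Spring Batch classes.
class LineMappingSketch {

    // Hypothetical item type, used only for this example.
    static final class Customer {
        final int id;
        final String name;

        Customer(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Stage 1, the tokenizer's job: split a delimited line into trimmed fields.
    static String[] tokenize(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    // Stage 2, the field mapper's job: turn the fields into a domain object.
    static Customer mapFields(String[] fields) {
        if (fields.length != 2) {
            throw new IllegalArgumentException("expected 2 fields, got " + fields.length);
        }
        return new Customer(Integer.parseInt(fields[0]), fields[1]);
    }

    // The line mapper simply chains the two stages.
    static Customer mapLine(String line) {
        return mapFields(tokenize(line));
    }

    public static void main(String[] args) {
        Customer c = mapLine("42, alice");
        System.out.println(c.id + " -> " + c.name);
    }
}
```

Keeping the two stages separate means you can change the file layout (the tokenizer) without touching the code that builds your objects (the field mapper).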
That's it! Now we have an S3 item reader that can be used in your Spring Batch application. There are some issues with this approach, though; continue to part 2 of this article, where I will show you a better way to implement an S3 file reader.
I will also be writing a detailed article on how to build an S3 item writer. Stay tuned!