{"id":14,"date":"2020-07-18T06:56:26","date_gmt":"2020-07-18T06:56:26","guid":{"rendered":"https:\/\/system.camp\/index.php\/2020\/07\/18\/elasticsearch-dump-using-java-with-multi-threading-for-faster-processing\/"},"modified":"2020-10-06T07:23:23","modified_gmt":"2020-10-06T07:23:23","slug":"elasticsearch-dump-using-java-with-multi-threading-for-faster-processing","status":"publish","type":"post","link":"https:\/\/system.camp\/tutorial\/elasticsearch-dump-using-java-with-multi-threading-for-faster-processing\/","title":{"rendered":"Uploading docs in ElasticSearch using Java with async for super-fast processing with examples"},"content":{"rendered":"\n
I wanted a quick and easy way to dump a lot of objects to my ElasticSearch endpoint, and I made the rookie mistake of adding the “official” Maven dependency for the ElasticSearch Java client.<\/p>\n\n\n\n
That jar is huge. It contains pretty much all of the server and client code. I tweeted about this, but it looks like there isn’t going to be a fix soon.<\/p>\n\n\n\n
Worry not: I was able to find a lightweight ElasticSearch client for Java called “Jest”. This is how you add the dependency:<\/p>\n\n\n\n
<dependency>\n <groupId>io.searchbox<\/groupId>\n <artifactId>jest<\/artifactId>\n <version>6.3.1<\/version>\n<\/dependency><\/code><\/pre>\n\n\n\nInitializing the client:<\/h2>\n\n\n\n
I signed up for an account on “elastic.co”, run by the parent company that owns ElasticSearch. Using AWS ElasticSearch was really hard and involved a lot of requirements and steps. elastic.co was one click and also had a free trial, so I went with it.<\/p>\n\n\n\n
JestClient getJestClient() {\n JestClientFactory factory = new JestClientFactory();\n factory.setHttpClientConfig(\n new HttpClientConfig.Builder(\n \"https:\/\/<your-endpoint>-central1.gcp.cloud.es.io:9243\")\n .defaultCredentials(\"elastic\", \"password\")\n .build()\n );\n return factory.getObject();\n }<\/code><\/pre>\n\n\n\nThe JestClient offers a lot of APIs, but since I just want to dump a lot of documents to my endpoint, I used its bulk async methods. This is how I use them.<\/p>\n\n\n\n
First create a list of “Index” items.<\/h2>\n\n\n\nList<Index> indexList = new ArrayList<>();\nString jsonString = objectMapper.writeValueAsString(feedDao);\nIndex index = new Index.Builder(jsonString).index(\"feeds\").type(\"doc\")\n .id(feedDao.getFeedId()).build();\nindexList.add(index);<\/code><\/pre>\n\n\n\nMy object is called “FeedDao”. I convert it to JSON using an ObjectMapper and add the resulting Index to indexList.<\/p>\n\n\n\n