{"id":14,"date":"2020-07-18T06:56:26","date_gmt":"2020-07-18T06:56:26","guid":{"rendered":"https:\/\/system.camp\/index.php\/2020\/07\/18\/elasticsearch-dump-using-java-with-multi-threading-for-faster-processing\/"},"modified":"2020-10-06T07:23:23","modified_gmt":"2020-10-06T07:23:23","slug":"elasticsearch-dump-using-java-with-multi-threading-for-faster-processing","status":"publish","type":"post","link":"https:\/\/system.camp\/tutorial\/elasticsearch-dump-using-java-with-multi-threading-for-faster-processing\/","title":{"rendered":"Uploading docs in ElasticSearch using Java with async for super-fast processing with examples"},"content":{"rendered":"\n

I wanted a quick and easy way to dump a lot of objects into my ElasticSearch endpoint, and I made the rookie mistake of adding the “official” Maven dependency for the ElasticSearch Java client.<\/p>\n\n\n\n

That jar is huge: it bundles pretty much all of the server and client code. I tweeted about this, but it looks like there isn’t going to be a fix soon.<\/p>\n\n\n\n

Worry not, I was able to find a lightweight ElasticSearch client for Java called “Jest”. This is how you add the dependency:<\/p>\n\n\n\n

<dependency>\n    <groupId>io.searchbox<\/groupId>\n    <artifactId>jest<\/artifactId>\n    <version>6.3.1<\/version>\n<\/dependency><\/code><\/pre>\n\n\n\n

Initializing the client:<\/h2>\n\n\n\n

I signed up for an account on “elastic.co”, the parent company that owns ElasticSearch. Using AWS ElasticSearch was really hard and had a lot of requirements and steps. elastic.co was one click and also had a free trial, so I went with it.<\/p>\n\n\n\n

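import io.searchbox.client.JestClient;\nimport io.searchbox.client.JestClientFactory;\nimport io.searchbox.client.config.HttpClientConfig;\n\n\/\/ Builds a Jest client for the given endpoint using basic-auth credentials.\nJestClient getJestClient() {\n    JestClientFactory factory = new JestClientFactory();\n    factory.setHttpClientConfig(\n        new HttpClientConfig.Builder(\n            \"https:\/\/<your-endpoint>-central1.gcp.cloud.es.io:9243\")\n            .defaultCredentials(\"elastic\", \"password\")\n            .build()\n    );\n    return factory.getObject();\n}<\/code><\/pre>\n\n\n\n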

The JestClient offers a lot of APIs, but since I just want to dump a lot of documents to my endpoint, I used its bulk and async methods. This is how I use them.<\/p>\n\n\n\n

First create a list of “Index” items.<\/h2>\n\n\n\n
List<Index> indexList = new ArrayList<>();\nString jsonString = objectMapper.writeValueAsString(feedDao);\nIndex index = new Index.Builder(jsonString).index(\"feeds\").type(\"doc\")\n              .id(feedDao.getFeedId()).build();\nindexList.add(index);<\/code><\/pre>\n\n\n\n

My object is called “FeedDao”. I convert it to JSON using an ObjectMapper, wrap the JSON in an Index action, and add that action to indexList.<\/p>\n\n\n\n

Then you need to create a Bulk request object. Make sure you have already created your “index” on ES. You can create the index with this simple API: https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/indices-create-index.html<\/a><\/p>\n\n\n\n

This is the Bulk object, built using the builder pattern:<\/p>\n\n\n\n

Bulk bulk = new Bulk.Builder()\n     .defaultIndex(\"feeds\")\n     .defaultType(\"doc\")\n     .addAction(indexList)\n     .build();<\/code><\/pre>\n\n\n\n
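If indexList grows very large, one option is to split it into smaller batches and build one Bulk request per batch instead of a single giant request. The following is only a minimal sketch using the JDK alone: the class name BulkBatcher, the method partition, and the batch size of 2 are hypothetical names and values chosen for illustration, not part of Jest.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkBatcher {

    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            docs.add(i);
        }
        System.out.println(partition(docs, 2)); // prints [[0, 1], [2, 3], [4]]
    }
}
```

Each inner list could then be passed to its own Bulk builder in place of the full indexList. A suitable batch size depends on your document size and cluster, so treat any number as a starting point to tune.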

Calling executeAsync is a bit different: you pass a handler that deals with both success and failure. You can do whatever you want with the results.<\/p>\n\n\n\n

jestClient.executeAsync(bulk, new JestResultHandler<JestResult>() {\n      @Override\n      public void completed(JestResult result) {\n        log.info(result);\n      }\n      @Override\n      public void failed(Exception ex) {\n        log.error(ex);\n      }\n});<\/code><\/pre>\n\n\n\n
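Because the async call hands the result to a callback, a short-lived program may reach the end of main before either callback has fired. One way to wait is a CountDownLatch. The sketch below runs on the JDK alone: ResultHandler is a hypothetical stand-in with the same two callbacks as the handler above, and fakeExecuteAsync is a made-up substitute for the real client call, so none of these names come from Jest.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class AsyncWaitSketch {

    /** Hypothetical stand-in with the same two callbacks as the handler above. */
    public interface ResultHandler<T> {
        void completed(T result);
        void failed(Exception ex);
    }

    /** Made-up async call: invokes the handler from another thread. */
    static void fakeExecuteAsync(String request, ResultHandler<String> handler) {
        CompletableFuture.runAsync(() -> handler.completed("ok: " + request));
    }

    /** Sends one request and blocks until a callback fires or the timeout elapses. */
    public static String sendAndWait(String request, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<String> outcome = new AtomicReference<>("timed out");
        fakeExecuteAsync(request, new ResultHandler<String>() {
            @Override
            public void completed(String result) {
                outcome.set(result);
                done.countDown(); // release the waiting thread
            }

            @Override
            public void failed(Exception ex) {
                outcome.set("failed: " + ex.getMessage());
                done.countDown(); // release the waiting thread on failure too
            }
        });
        try {
            done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(sendAndWait("bulk-1", 5000));
    }
}
```

With several bulk requests in flight, the same idea works by creating the latch with one count per request and counting down once in each handler.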

That’s it. I hope this was useful to you. Feel free to bookmark this for later, or use the search on this site to find it again.<\/p>\n\n\n\n

If there is an error somewhere, please let me know in the comments.<\/p>\n","protected":false},"excerpt":{"rendered":"

Fastest way to upload a lot of documents to “ElasticSearch” using Java. With code examples and tutorials.<\/p>\n","protected":false},"author":1,"featured_media":112,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[40,35],"tags":[10,9,3],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/posts\/14"}],"collection":[{"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/comments?post=14"}],"version-history":[{"count":3,"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/posts\/14\/revisions"}],"predecessor-version":[{"id":115,"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/posts\/14\/revisions\/115"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/media\/112"}],"wp:attachment":[{"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/media?parent=14"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/categories?post=14"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/system.camp\/wp-json\/wp\/v2\/tags?post=14"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}