I want to sync two dependent databases (elasticsearch and casandra) with my parent database: postgres. I am trying to implement a method in this article: https://qafoo.com/blog/086_how_to_synchronize_a_database_with_elastic_search.html. So I came up with 2 methods
Sync before updating/inserting data into dependent databases
router.put('/account/edit', function(req, res) { syncElasticWithDatabase().then(() => { elastiClient.update({...}); // client for elasticsearch cassandraClient.execute({...}); // client for cassandra req.end(); }) })
syncElasticWithDatabase() uses data in updates table (from postgres), this method can be slow since some people would have to wait for syncElasticWithDatabase() to finish. I like this method because I leverage sequantial_ids (check article for details). The data is synced before new data comes in, allowing dependencies to catch up and only missed out data will be synced. Preventing reindex/reinsert unlike options 2 below.
Using a backround process (ei: running every 24 hours), I could sync data by selecting "missed out data" from
update_errortable, which contains data when elasticsearch or cassandra fail. Here's a rough examplerouter.put('/account/edit', function(req, res) { psqlClient.query('UPDATE....').then(() => { elastiClient.update({...}); // client for elasticsearch cassandraClient.execute({...}); // client for cassandra }).catch(err => { psqlClient.query('INERT INTO update_error ....') }) })However this method would require to reindex or reinsert data, because in some cases elasticsearch could insert data while cassandra didn't or either way. Because of this I will need a separate column that will record database type that failed. This way I can select data that failed since the last synchronization time for each type of database (ealsticsearch or cassandra).
Questions:
Method 1 seems perfect, but this would mean some people would have to wait for longer than others to update their account due to
syncElasticWithDatabase(). However the article above does exactly the same (look at their diagram) or I am misunderstanding something?Because of the delay described above (if I'm correct), I introduced option 2. However it's just too much in order to sync IMHO. Yet I spent a good time thinking about this... So are there easier or better methods than 1 and 2?
Would Apache Zoo Keeper help in my case?
Thanks :)
Other reference
Sync elasticsearch on connection with database - nodeJS
https://gocardless.com/blog/syncing-postgres-to-elasticsearch-lessons-learned/