As described in one of the forum topics (Requisition data feed), in SELV v3 we've started using Kafka to transfer data from the requisition and referencedata schemas to the newly created requisitionbatch schema. Unfortunately, we've recently discovered that the data is no longer being transferred at all. In the logs we've found the following message for almost all of the sink workers:
Task is being killed and will not recover until manually restarted
It turned out that an unrecoverable exception thrown during sink task execution causes the task to shut down silently. The whole service keeps running, but the sink tasks no longer transfer any data, so our schema is out of date. We found two main causes of these exceptions, and I'm hoping that after resolving them no others will occur.
The first one is a duplicate key violation:
duplicate key value violates unique constraint “batch_req_prod_fac_per”
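For this one, an option we're considering (assuming the sink is the Kafka Connect JDBC sink connector) is switching it to upsert mode keyed on the columns behind that constraint. A rough sketch of the change via the Connect REST API is below; the endpoint, connector name, and key fields are placeholders and would need to match our actual setup:

```python
import requests

CONNECT_URL = "http://localhost:8083"        # assumed Kafka Connect REST endpoint
SINK_CONNECTOR = "requisitionbatch-sink"     # hypothetical sink connector name

# Fetch the current sink configuration and switch it to upsert mode keyed on the
# columns behind the batch_req_prod_fac_per constraint (field names below are
# placeholders and would have to match the real constraint columns).
config = requests.get(f"{CONNECT_URL}/connectors/{SINK_CONNECTOR}/config").json()
config.update({
    "insert.mode": "upsert",
    "pk.mode": "record_value",
    "pk.fields": "requisitionid,productid,facilityid,periodid",  # placeholders
})
requests.put(f"{CONNECT_URL}/connectors/{SINK_CONNECTOR}/config", json=config)
```

Would upsert be an acceptable workaround here, or should we rather track down why the same message is delivered more than once in the first place (as far as we understand, Kafka Connect sinks are at-least-once by default)?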
The second one is a mapping issue with one of the columns:
column “golivedate” is of type date but expression is of type integer
(more detailed logs can be found there)
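For the golivedate column, if the source side is a Debezium connector, one thing we've come across is that with the default temporal precision mode a DATE column is serialized as a plain int32 (days since the epoch), which the sink then tries to write as an integer. Switching the source connector to time.precision.mode=connect should make it emit Connect's Date logical type instead. A rough sketch of that change (endpoint and connector name are placeholders):

```python
import requests

CONNECT_URL = "http://localhost:8083"       # assumed Kafka Connect REST endpoint
SOURCE_CONNECTOR = "referencedata-source"   # hypothetical Debezium source connector name

# Fetch the existing source config and switch the temporal precision mode so that
# DATE columns (e.g. golivedate) are emitted as Connect's Date logical type
# instead of a plain int32, which the sink otherwise writes as an integer.
config = requests.get(f"{CONNECT_URL}/connectors/{SOURCE_CONNECTOR}/config").json()
config["time.precision.mode"] = "connect"
requests.put(f"{CONNECT_URL}/connectors/{SOURCE_CONNECTOR}/config", json=config)
```

Does that sound like the right direction, or is there a better way to handle this on the sink side?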
We would like to open a discussion on how to best approach resolving them.
- What is the best way to ensure that messages are processed only once, or do you have another idea why the duplicate key issue occurs?
- How should we configure the mapping of certain columns, or do you know why it isn't handled automatically? (Both columns have the same name and type: date.)
- Do you have any other recommendations for running data-pumps in a production environment? We would like to monitor whether any of the tasks are down. Our current ideas are to call the connectors' REST API regularly as a health check, or to watch the logs and catch such messages with some tool (see the sketch after this list). In either case, for now, we plan to notify the administrators if any of the connectors is down.
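For the health-check idea in the last point, a minimal sketch of what we have in mind: polling the Connect REST API and reporting (or restarting) any task that is not RUNNING. The endpoint URL and the notification step are placeholders:

```python
import requests

CONNECT_URL = "http://localhost:8083"   # assumed Kafka Connect REST endpoint

def check_connectors() -> None:
    """Flag (and optionally restart) any connector task that is not RUNNING."""
    for name in requests.get(f"{CONNECT_URL}/connectors").json():
        status = requests.get(f"{CONNECT_URL}/connectors/{name}/status").json()
        for task in status.get("tasks", []):
            if task["state"] != "RUNNING":
                # Placeholder: notify administrators here (e-mail, Slack, etc.)
                print(f"{name} task {task['id']} is {task['state']}")
                # Optionally attempt an automatic restart of the failed task
                requests.post(
                    f"{CONNECT_URL}/connectors/{name}/tasks/{task['id']}/restart"
                )

if __name__ == "__main__":
    check_connectors()
```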
Thanks in advance for sharing your thoughts.