I'm implementing an importer that uses a pool of workers (powered by Celery and RabbitMQ). The imported data includes user data, identified by a unique user ID, and is stored in a PostgreSQL database.
I usually receive many import files, and they are processed concurrently. Several files can contain data for the same user ID, which leads to situations where two imports try to process data for the same user ID at the same time.
If a row for the user ID already exists, there's no issue: the row is locked with SELECT ... FOR UPDATE. However, if no row exists for that ID yet, both imports try to INSERT a row with the same user ID.
Because the user ID column has a UNIQUE constraint, the second transaction fails.
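To illustrate, here is a minimal sketch of the per-user step each worker currently runs, assuming psycopg2 and a hypothetical `users` table with a UNIQUE `user_id` column (the real schema and column names differ). When two workers run this concurrently for a user ID that doesn't exist yet, both take the INSERT branch and the second one hits the unique-constraint violation:

```python
import psycopg2  # assumed driver; the workers use SQLAlchemy-style sessions in practice

def import_user(conn, user_id, data):
    with conn:  # one transaction per user record
        with conn.cursor() as cur:
            # Lock the row if it already exists
            cur.execute(
                "SELECT user_id FROM users WHERE user_id = %s FOR UPDATE",
                (user_id,),
            )
            if cur.fetchone():
                cur.execute(
                    "UPDATE users SET data = %s WHERE user_id = %s",
                    (data, user_id),
                )
            else:
                # Race: another worker may INSERT the same user_id first,
                # so this INSERT can fail with a unique-constraint violation
                cur.execute(
                    "INSERT INTO users (user_id, data) VALUES (%s, %s)",
                    (user_id, data),
                )
```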
How can I prevent this from happening? Are there options other than locking the entire table?