I'm working on a microservice that serves REST endpoints for saving data to and retrieving data from a database, using Spring Data.
Let's call the entity class Foo; it has a simple Long ID field and some other data fields. IDs for each Foo are not auto-generated in this service; they are supplied by an external source that guarantees uniqueness.
The service has one POST endpoint that serves both the create and update functions of the CRUD model. It calls a corresponding function in the service layer of the code, let's call this function AddData(Foo foo_incoming). The body of the POST request contains the data to save to the database and the ID of the Foo to save the data to. The logic of AddData looks like this:
@Service("fooService")
public class FooServiceImpl {
    @Autowired
    FooRepository fooRepository; // Subinterface of JpaRepository

    @Transactional
    public Long AddData(Foo foo_incoming) {
        Optional<Foo> foo_check = fooRepository.findById(foo_incoming.getId());
        Foo foo_exists;
        // Exists already?
        if (foo_check.isEmpty()) {
            // New Foo
            foo_exists = fooRepository.saveAndFlush(foo_incoming);
        } else {
            // Update existing Foo
            foo_exists = foo_check.get();
            foo_exists.addToFieldA(foo_incoming.getFieldA());
            foo_exists.addToFieldB(foo_incoming.getFieldB());
        }
        return foo_exists.getId();
    }
}
This one function is in charge of both creating the initial record for a Foo and updating the record.
When POST requests come in for some Foo with ID=1 (call it foo-1) that doesn't exist yet, everything works as long as there's a reasonable amount of time between them: the first request creates the initial record for foo-1, and all subsequent calls only update it. That is, enough time passes for saveAndFlush to actually flush to the database, so subsequent calls to findById find foo-1 there, fall through to the else block, and just update its fields.
The problem I'm running into is that when N POSTs for the same Foo (same ID) are sent to the service quickly enough, all the corresponding calls to AddData seem to happen concurrently. So, if foo-1 doesn't exist yet, findById(1) returns empty in each of those calls, saveAndFlush gets called N times for Foos with ID=1, and that raises a DataIntegrityViolationException.
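To make the interleaving concrete, here's a self-contained sketch of the check-then-act race (plain Java, no Spring; FakeFooTable is just an illustrative stand-in for the database table, with the duplicate-key failure simulated by an IllegalStateException):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Stand-in for the database table, keyed by primary key.
class FakeFooTable {
    private final Map<Long, String> rows = new HashMap<>();

    Optional<String> findById(Long id) {
        return Optional.ofNullable(rows.get(id));
    }

    void insert(Long id, String data) {
        if (rows.containsKey(id)) {
            // Stands in for the DataIntegrityViolationException on a PK collision.
            throw new IllegalStateException("duplicate key: " + id);
        }
        rows.put(id, data);
    }
}

public class CheckThenActRace {
    public static void main(String[] args) {
        FakeFooTable table = new FakeFooTable();

        // Two concurrent requests both run the "exists?" check first...
        boolean seenByA = table.findById(1L).isPresent(); // false
        boolean seenByB = table.findById(1L).isPresent(); // false
        System.out.println("seen by A: " + seenByA + ", seen by B: " + seenByB);

        // ...so both take the "new Foo" branch and both try to insert.
        table.insert(1L, "data from A"); // succeeds
        try {
            table.insert(1L, "data from B"); // duplicate key
        } catch (IllegalStateException e) {
            System.out.println("second insert failed: " + e.getMessage());
        }
    }
}
```

The point is that the decision (empty vs. present) and the action (insert vs. update) are two separate steps, and nothing stops another request from slipping in between them.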
I've been digging around the web for days trying to solve this.
- The method is already annotated @Transactional. I've tried @Transactional(isolation = Isolation.SERIALIZABLE) on just the method and on the entire class; it doesn't help.
- I've tried annotating the findById and saveAndFlush methods in FooRepository with @Lock(LockModeType.PESSIMISTIC_READ) and @Lock(LockModeType.PESSIMISTIC_WRITE), respectively; no luck.
- I've tried adding a @Version field to Foo; no change.
I can't figure out how to force calls to AddData to run serially; I thought that's what @Transactional(isolation = Isolation.SERIALIZABLE) was supposed to do.
I'm considering giving "create" and "update" their own functions - making a PUT endpoint for create. But the PUT endpoint would have a similar issue: to prevent primary-key collisions in code, I'd still have to do the same findById check before performing saveAndFlush. And given the way this service is actually used, the PUT endpoint may not be an option.
Wrapping the saveAndFlush call in a try/catch block does catch the exception, to my surprise. I could add some fallback logic that calls findById again when saveAndFlush fails and updates the record it finds, but if there's a way to avoid the exception being thrown in the first place, I'd prefer that.
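For reference, the fallback I have in mind would be shaped roughly like this (again plain Java, no Spring; FakeRepo and addData are illustrative stand-ins, with ConcurrentHashMap.putIfAbsent playing the role of the insert that either wins or hits the unique constraint):

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative stand-in for FooRepository. putIfAbsent gives "first writer
// creates the row, later writers see the existing row" - the same shape as
// catching DataIntegrityViolationException and then updating instead.
class FakeRepo {
    private final ConcurrentMap<Long, StringBuilder> rows = new ConcurrentHashMap<>();

    /** Try the insert; if the row already exists, fall back to updating it. */
    Long addData(Long id, String data) {
        StringBuilder existing = rows.putIfAbsent(id, new StringBuilder(data));
        if (existing != null) {
            // Equivalent of: catch the duplicate-key exception, call
            // findById again, and update the entity it now finds.
            synchronized (existing) {
                existing.append(',').append(data);
            }
        }
        return id;
    }

    Optional<String> findById(Long id) {
        StringBuilder sb = rows.get(id);
        return sb == null ? Optional.empty() : Optional.of(sb.toString());
    }
}

public class RetryOnDuplicate {
    public static void main(String[] args) throws Exception {
        FakeRepo repo = new FakeRepo();
        int n = 8;
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int k = i;
            workers[i] = new Thread(() -> repo.addData(1L, "chunk" + k));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        // All n requests land on the single row for ID=1, in some order.
        System.out.println(repo.findById(1L).orElse("missing"));
    }
}
```

This is the "funky logic" I'd rather avoid, but it does survive N concurrent requests without any of them failing.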
Any suggestions would be appreciated!
EDIT: Some more context that may be useful. This microservice runs in a Kubernetes cluster where there can potentially be many instances of this service serving requests concurrently. I'm still researching handling concurrency of multiple instances, and figuring that out isn't something I have to do on my own - my team is developing several microservices like this, we may develop a common library to address such issues for all of them.
EDIT 2: I forgot to mention that, as of now, I'm running the service against an H2 database, not a real production database. Might that have something to do with this?
And I'll reiterate: what's happening here is that multiple calls check the database for foo-1 before foo-1 exists yet. Because of that, I don't think database locking is going to help me here - there's no row to lock on. I thought forcing AddData to happen serially would solve this problem, and I'm completely stumped as to why adding @Transactional(isolation = Isolation.SERIALIZABLE) to AddData isn't doing that for me.