The scenario you described should be covered by optimistic locking using the versioning field. One thing I think I didn’t clarify is that the idempotency-key is more for use for creating new resources - typically a POST/PUT /api/resourceName and less for a PUT /api/resourceName/id.
Check this document out on strong/weak etags and if-match: https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests
Double checking the Requisition I noticed that it looks like we don’t yet use etags. A good alignment here would be to introduce Hibernate’s optimistic locking using a version field as we discussed, and then using that version as a weak etag for the Requisition. This has the benefit that we can then use that version field in the if-match header, and it’s simple for a client with a network connection to send a lightweight HEAD request with that etag to see if they have a stale state.
For Requisition then I think we should:
- Return an etag for the Requisition which is the Version field we talked about for optimistic locking. This is a “weak” etag, so you’ll need a “W/“ in front.
- Use if-match in PUT requests with the previous etag (aka version).
- Use an idempotency-key for create/POST calls such as initiate, approve, submit, etc… (the status changes)
- We shouldn’t use the field modifiedDate for stale state or concurrency checking, the etag with if-match is better suited. However this doesn’t mean we have to remove it from the Response - I won’t presume that no other clients are using this field for other purposes at this point.
As to this point about error responses and idempotency-key, I think I muddied the water with too much of an edge case. When creating a resource, if the previous request failed and now we’re getting a 409 with no Location header set that should be enough to tell the client that the cause is the idempotency-key. A body with an error message for the idempotency-key would clarify it. Either way the client knows to change the key and retry the request to find out why the original request failed. Not ideal but it’s an edge case. If during a create a 409 is returned with a Location header, then we know the previous request succeeded and where to go to load the resource we did create.
I also agree as you pointed out, it’s important to check if the key is used as soon as possible - or more importantly not after a transaction to the DB has been committed. While rolling the transaction back does sound plausible, I think I’d prefer to get a 409 back to the client as soon as possible. This lets the client know what’s happened quickly. i.e. if the API can respond with a 409 ASAP, then as a client I know quickly that that key can’t be used again. If the client has to wait some N seconds till after a bunch of logic checks and I/O occur on the server, that’s N seconds wasted with no feedback occurring over a likely poor network.
The last thing I’d note is that idempotency-key requests should likely be checked after all the authorization steps. It’d be weird to get a 409 for something when in reality I should have gotten a 403 first.
Best,
Josh
···
On 06.06.2018 08:50, Josh Zamor wrote:
Agreed that both idempotent requests and optimistic locking would be needed for OLMIS-4728. And for optimistic locking I have no objection to the incremental version field managed via Hibernate. It sounds like the right approach.
As to v4 UUID’s for the idempotency-key one thought that comes to mind is that it could be enough to have this key cached/stored in the UI’s storage along with the Requisition object when making the request. In this way no more context encoded in the key would be needed. This works in a very similar way for payment API’s - it’s important as a client of the API to remember the idempotency-key you’re trying to use until you know that the key is no longer useful. The key is no longer useful to the client once it hears back from the API. That’s essentially the lifecycle of this ID. The client needs to associate it with an action it’s trying to attempt, and refreshing the page shouldn’t cause the UI to forget that action.
That leads to the question of requiring the key and error codes. I think it’s fine to keep them as optional (but recommended). For the error code the 409 sounds right so long as the server has used the key before. And that brings up a useful detail. If the server responds to a request with a 400, 403, etc I don’t think it’s useful to mark the key as used. If the client doesn’t hear the response, and retries, sending a 409 back masks the original 400 or 403. However if the response is a 200, mark the key as used and send back a 409 with the Location header set to the resource that would have been returned in the original 200’s body (if any). For PUT of course this isn’t really needed, however for POST I think this makes sense.
Best,
Josh
On Jun 5, 2018, at 9:10 AM, Sebastian Brudziński sbrudzinski@soldevelo.com wrote:
Thanks for the input, Josh.
I’d agree that we don’t need to think or consider pessimistic locking for now.
On idempotent requests, that would definitely be helpful as well. I think we need something more than a random UUID generated at the page load as the idempotency-key. I’m not sure yet what would work better and possibly include some context though… I’ll follow up on that. Two immediate thoughts at this point: How would we want to handle the case with two requests and the same idempotency-key? Would we immediately return an error code (eg. 409 Conflict) for the second and following requests? Also, would the idempotency-key be required in the requests or just supported? It looks like Stripe API simply supports idempotency but doesn’t enforce it, which sounds like a good solution for us as well. We could use it on UI, but still give other clients option to call the API without generating random numbers.
It sounds like we would want to implement both optimistic locking and idempotent requests as a part of OLMIS-4728, am I understanding this correctly? As for the optimistic locking solution - are we happy with the incremental version field on the requisition object?
Best regards,
Sebastian.
**
SolDevelo** Sp. z o.o. [LLC] /
www.soldevelo.com
Al. Zwycięstwa 96/98, 81-451, Gdynia, Poland
Phone: +48 58 782 45 40 / Fax:
+48 58 782 45 41
On 02.06.2018 01:02, Josh Zamor wrote:
Thanks for this writeup Sebastian,
I must be forgetting my history here as I had thought we had already implemented optimistic locking in the database. I didn’t realize we attempted it in Java - which might work if someone has a very outdated (likely an old cache on the device) Requisition, but not truly for concurrent requests.
Two things come to mind:
- I find it unlikely that we really have different users trying to update the same Requisition at the same time. Nor that we need pessimistic locking.
- That said we should have solved the “double click” problem which I suppose we thought we did last time, however it looks like we left a gap by not doing what all payment APIs do. The basic idea is to have a form submission/api call send along a random number in the HTTP header which uniquely identifies the request. This request could be the save or submit a Requisition or whatever else it is that the user is attempting to “do once”, but for which they may accidentally do twice (or which our HTTP library might retry if it doesn’t hear back from the network). On the server end we look at that number passed to us and once we see it, we mark it down as being “used”, and we will never accept another request which uses that number. It’s important that this number be remembered outside of the action’s regular database transaction - you don’t want to entertain two requests simply because the first request hasn’t committed yet. An in-memory cache, could be Redis though it’s more than needed strictly speaking, that writes this number down immediately is I think the most effective approach. A quick google search brought up what Stripe does: https://stripe.com/docs/api?lang=curl#idempotent_requests
- I think we should use UUID v4, as we do already for resource IDs.
- We may need to use something with more context than just a random UUID (v4) if we think that the processing of the request is so long that the user could reload the Requisition view, get a new number, and send a new request all while the original request is being processed and the transaction has yet to commit.
- We should also expire the keys/unique numbers, and 24h is probably good for us as well.
I don’t think we need anything more than optimistic locking and creating idempotent requests. The trigger would be useful if we wanted to protect the table against non-hibernate SQL access, however our architecture specifically forbids any other client from using the Service’s database directly and so I think we can skip the trigger.
Best,
Josh
On Jun 1, 2018, at 9:30 AM, Sebastian Brudziński sbrudzinski@soldevelo.com wrote:
Hello everyone,
we have recently started working on OLMIS-4728 which tackles an issue that Malawi has started experiencing after updating to OpenLMIS 3.3. This is once again related to concurrent updates.
In simple words - requisition update can currently overwrite most of the requisition properties (including status) if it’s run concurrently with other requests that change requisition state. This is especially noticeable for updates that take longer time (several seconds). Our current mechanism to prevent concurrent updates is based on the timestamp property that is manually set and validated during updates (the received timestamp must match the database one). This works, but only for a case that doesn’t involve any concurrency. For concurrent requests, this validation can pass for multiple requests and all the updates will be successfully carried, since the timestamps will match until one of the updates commits to the database.
The main problem therefore is that we are validating this at the application logic level. The proposed solution is moving this to the database level and introducing optimistic locking in one of the following ways:
- Use optimistic concurrency control using @Version annotation.
A requisition would have a new property called “version”. This field is automatically updated by Hibernate as it’s modified. Both incremental numbers and timestamps are supported, but Hibernate recommends dedicated version number property (). This essentially adds the version property to the “where” clause of the database update query and in case no rows are affected, a org.hibernate.StaleStateException is returned, indicating the version has changed in the meantime. This ensures that no two updates with the same version will ever be commited into the database. We would still keep our timestamp validation in order to fail faster if the timestamp mismatch is detected. The version-based validation would primarily protect against concurrent updates. This also means it wouldn’t need to be exposed to the REST API. This approach is also well documented here:
- Implement custom post-update trigger
We would implement a custom trigger that is fired after each update on the requisition and validate the timestamps there. This means that we would remove updating the timestamp from the application logic level and move it to the post-update trigger as well (only if the validation passed). This would allow us to re-use the timestamp field that we already have. We could also keep the check on the application level to fail faster if the mismatch is already detected. Custom trigger gives us more flexibility, but may also require more work.
“Why do you have concurrent requests in Malawi?”
I can see this question coming and I’ve been looking/asking for the reason as well. My best guess as of now is a mix of poor Internet connection + users refreshing and re-saving/re-submitting requisitions when it’s taking too long. Whatever the reason is, our API should be able to handle concurrent requests (even due to the fact that multiple users can work on the same requisition).
I’ve also done a basic proof of concept for proposed solution #1 - I’ve added the version field to the requisition, added artificial 5 seconds wait in the update endpoint and was able to get the error as expected. Of course more testing and work is required if we decide to go with this solution.
I’m looking for any feedback you may have concerning the problem itself or the proposed solutions. Can you see any issues with either of the approach? Do you have any preferences? Do you see other solutions?
Thanks,
Sebastian.
**
SolDevelo** Sp. z o.o. [LLC] /
www.soldevelo.com
Al. Zwycięstwa 96/98, 81-451, Gdynia, Poland
Phone: +48 58 782 45 40 / Fax:
+48 58 782 45 41
–
You received this message because you are subscribed to the Google Groups “OpenLMIS Dev” group.
To unsubscribe from this group and stop receiving emails from it, send an email to
openlmis-dev+unsubscribe@googlegroups.com.
To post to this group, send email to
openlmis-dev@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/openlmis-dev/8469f3a4-b54e-1047-dafd-5097ea2d1094%40soldevelo.com.
For more options, visit
https://groups.google.com/d/optout.
Sebastian Brudziński
Senior Software Developer / Team Leader
Sebastian Brudziński
Senior Software Developer / Team Leader
Sebastian Brudziński
Senior Software Developer / Team Leader
sbrudzinski@soldevelo.com