Approaching performance improvements for resources with large data


(Sebastian Brudziński) #1

Hi everyone,

I’m starting this topic in response to the recent performance-improvement work that we are doing. I’d like to discuss the options that we have and should consider when it comes to improving the performance of endpoints that return very large datasets. A good example is FacilityType Approved Products (FTAPs). Per the non-functional requirements analysis, a country may be managing up to 10,000 FTAPs in a single program, so that’s the number of them that requisition initiation may need to fetch. A single FTAP contains a full representation of an orderable (plus ProgramOrderables) and the facility type, with all related resources, which makes the representation of even a single approved product quite big.

Of course, the long-term plan is to have the orderables versioned and cached on the UI. This is a valid approach, but since it is a solution that we won’t be able to implement in the near future, let’s not focus on it in this topic. Let’s discuss what options we have to deliver the best performance boost over the next months (v3.6 - v3.7).

One thing that we are currently doing is adding the expanded resource pattern (http://docs.openlmis.org/en/latest/conventions/codeStyleguide.html#restful-interface-design-documentation). This helps us limit the size of the response while still allowing clients to get the full representation where needed (or only the parts they are interested in).
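As a quick illustration of the pattern from the client side, here is a minimal sketch; the endpoint path, parameter name, and values are assumptions for illustration, not confirmed API details.

```typescript
// Hypothetical client-side use of the expanded resource pattern.
// The endpoint path and parameter names are illustrative only.
async function fetchApprovedProducts(programId: string, expand?: string) {
  const params = new URLSearchParams({ program: programId });
  if (expand) {
    // e.g. expand=orderable pulls the full orderable only where it is needed
    params.set('expand', expand);
  }
  const response = await fetch(`/api/facilityTypeApprovedProducts?${params}`);
  return response.json();
}

// The default call returns slim references; the second expands orderables.
const slimFtaps = fetchApprovedProducts('some-program-id');
const fullFtaps = fetchApprovedProducts('some-program-id', 'orderable');
```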

Of course, we are also looking into any other query improvements/optimizations that we can make, but the biggest problem with resources like FTAPs is the size of the response: the time required to serialize it into JSON, send it over the wire, and then deserialize it on the client side. All of this takes a considerable amount of time when the representation is several megabytes in size.

What I’m looking for in this topic are any additional solutions or ideas that we can consider in the upcoming weeks and deliver as part of the performance work.

Thanks!
Sebastian.


(Elias Muluneh) #2

Hi Sebastian,

I played with the Initiate Requisition workflow for the Essential Medicines program and observed a few things that may be of interest here. It looks like some issues may be specific to particular functionality rather than cross-cutting, so looking at issues case by case may help. This is not exhaustive, but I think the following ideas are worth considering. Please take a look at the API interactions for the initiate R&R here:

  1. The first API call to initiate the R&R returns an 8.1 MB response. However, there does not seem to be any use for this data: when the page redirects to the requisition page, another call is made to fetch the same data. Would it help if one of the calls did not return the full requisition object?

  2. Some API calls, like the first one, are served without compression. The second request seems to have been compressed, and the data transferred is 1013 KB, while the first call transferred the JSON uncompressed. Compressing all JSON responses may decrease the time it takes to transfer data, especially where the connection is poor.

  3. Some responses contain data that could benefit from a different representation or from normalization/denormalization. Consider using something like https://github.com/paularmstrong/normalizr to expand such objects (see the sketch after this list). Consider the following example: all 19 objects in the array contain a copy of the same schedule object, which could have been represented as a single object and denormalized on the client side using normalizr.

  4. Consider a long datatype for identifiers instead of UUIDs. This might be too bold, and I’m not sure how feasible it is, but changing from UUID to long may reduce the response size. As an experiment, I replaced all UUID fields in the 8.1 MB response above with 6-digit numbers, and the file size dropped to 6 MB. I think this is a significant reduction in response size.
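To make point 3 above concrete, here is a rough sketch of how normalizr could collapse the repeated schedule objects into a single entity and expand them again when needed; the shapes and field names are made up for illustration and are not the actual API response.

```typescript
import { denormalize, normalize, schema } from 'normalizr';

// Made-up shapes loosely based on the example above: many periods,
// each embedding a full copy of the same processing schedule.
const scheduleSchema = new schema.Entity('schedules');
const periodSchema = new schema.Entity('periods', {
  processingSchedule: scheduleSchema,
});

const periods = [
  { id: 'p1', name: 'Jan 2019', processingSchedule: { id: 's1', code: 'M1' } },
  { id: 'p2', name: 'Feb 2019', processingSchedule: { id: 's1', code: 'M1' } },
  // ...and so on for the remaining periods, all repeating the same schedule
];

// Normalizing keeps exactly one copy of the schedule, referenced by id.
const { entities, result } = normalize(periods, [periodSchema]);

// When the UI needs full objects again, denormalize re-expands the references.
const expandedPeriods = denormalize(result, [periodSchema], entities);
```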

Thanks,
Elias


(Josh Zamor) #3

Thank you @Elias_Muluneh,

I’ll follow up with a second post on overall thoughts; however, I first wanted to respond to your points directly for clarity:

Regarding your first point: I could have sworn that we’ve fixed this before, but I’m not finding the ticket. Agreed, we shouldn’t make the round trip twice.

On compression: agreed, we have a ticket for this that we can work on: https://openlmis.atlassian.net/browse/OLMIS-4531

On normalization: Normalizr is interesting, thanks for this, though I think your point shows that the de-normalized data coming from the server is part of the problem. From the start we’ve done a bad job of returning normalized REST data. We’ve elevated normalized REST resources in our conventions, however I think we’re still falling into this trap, either by doing this directly or by leaning too much on the Expanded Resource Pattern. I think we need to settle our UI caching approach and start moving toward normalized REST resources. Normalizr looks like it could be useful for utilizing these normalized responses on the client side, though I’d leave that to those who are working with that codebase every day.

On identifiers: I remember running this experiment early on, and I believe compression made the difference between UUIDs and JSON numbers very small. My experiment didn’t apply to an 8 MB file, so perhaps I’m remembering incorrectly at this scale. Have you tried running both files through compression to see if the gap closes considerably?


(Josh Zamor) #4

Thank you @Sebastian_Brudzinski for this excellent start.

When I think about what we’ve accomplished in OpenLMIS performance-wise, which to fully reflect on could use its own post, I think there are two distinct avenues that’ll lead to the best returns today.

Shared Dictionaries

A shared dictionary is a good analogy for what we get when we make REST resources normalized & cacheable. It forms the basis for having a shared definition of reference/meta data without including it in every response, or implementing complicated expanded representations that give flexibility to the developer but make it easy to shoot oneself in the foot by focusing too much on representation size versus the network overhead introduced by multiple calls.

Are we sure about this @Sebastian_Brudzinski? What makes achieving a shared dictionary of Products/Orderables not viable in 3.6 or 3.7? As @Elias_Muluneh’s experiment with different ID formats hints at, removing the entire Orderable from the Requisition would be a significant bandwidth saving. I think we need to put achieving a shared dictionary on the table for 3.7, at the very least.
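To illustrate what that could look like (the type and field names below are made up, not the actual DTOs), a requisition line item would carry only an orderable id, and the UI would resolve it against a cached shared dictionary:

```typescript
// Made-up types to illustrate the idea; these are not the actual DTOs.
interface Orderable {
  id: string;
  productCode: string;
  fullProductName: string;
}

interface RequisitionLineItem {
  id: string;
  orderableId: string; // a reference instead of an embedded Orderable
  requestedQuantity: number;
}

// The UI resolves the reference against a locally cached shared dictionary.
function resolveOrderable(
  item: RequisitionLineItem,
  dictionary: Map<string, Orderable>,
): Orderable | undefined {
  return dictionary.get(item.orderableId);
}
```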

Server and UI storage latency

Long ago we decided against solving our performance issues by introducing a caching layer between the service and its database. We had good reasons for this at the time (the complexity it introduces and the low-hanging fruit in using our ORM correctly chief among them). However, I think we’re now at the point where we’ve picked most of the low-hanging fruit and we’re (just about) ready to start using a caching layer. For the Java code, a few of the last barriers seem to be:

  • We have Redis, however I think our configuration of Redis in Spring Boot 1.x could be improved. Last I looked, support in Spring Boot 2 was significantly different/better than what we get in 1.x. Is there a path forward that gets us some of the benefits of 2.x without having to do the upgrade today?
  • We need good exemplars of how caching should be utilized: one for reads and updates of single resources, which should be easy enough, and another for caching of search results, which is more involved.
  • Conventions written down that point to these exemplars, plus specific rules to avoid issues (e.g. Redis isn’t a method for cross-service communication).

On the UI side I believe we still haven’t achieved:

  • A ubiquitous storage mechanism that can scale to our needs for large Requisitions and many shared dictionaries (e.g. 10k products). I believe we’ve had a false start with localStorage, and have made some attempts with IndexedDB (a minimal sketch follows this list). v1 and v2 of OpenLMIS used IndexedDB; it’s about time we solve our localStorage problem.
  • A ubiquitous library for managing shared dictionaries. A shared dictionary will become out of date, so the library should help refresh it when needed; ideally a shared dictionary would also support incremental updates, and the library would help with that too. We should be figuring out how to support this concept of a ubiquitous library in the UI architecture, yet that architecture has been in limbo for almost a year. I think we need to refocus the UI architecture vision on the shorter term, and these unmet performance needs are a way to do that.
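As a starting point for the storage bullet above, here is a minimal IndexedDB sketch for keeping a shared dictionary of orderables on the client; the database name, store name, and update flow are assumptions rather than an agreed design.

```typescript
// Hypothetical shared-dictionary store backed by IndexedDB.
// Database/store names and the update flow are illustrative only.
function openDictionaryDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open('openlmis-dictionaries', 1);
    request.onupgradeneeded = () => {
      // One object store per dictionary, keyed by resource id.
      request.result.createObjectStore('orderables', { keyPath: 'id' });
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Incremental refresh: upsert only the orderables changed since the last sync.
async function upsertOrderables(changed: { id: string }[]): Promise<void> {
  const db = await openDictionaryDb();
  const tx = db.transaction('orderables', 'readwrite');
  const store = tx.objectStore('orderables');
  changed.forEach(orderable => store.put(orderable));
  return new Promise((resolve, reject) => {
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```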

(Elias Muluneh) #5

Thanks for the response @joshzamor

I tried gzipping the file with numeric identifiers and the result is 489 KB. The original 8.1 MB file was 1013 KB compressed.
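For anyone who wants to repeat the comparison, a quick Node.js sketch along these lines should do; the file names are placeholders for the two saved response bodies, not actual artifacts from this thread.

```typescript
import { readFileSync } from 'fs';
import { gzipSync } from 'zlib';

// Compare raw and gzipped sizes of the two saved response bodies.
for (const file of ['response-uuids.json', 'response-numeric-ids.json']) {
  const raw = readFileSync(file);
  const gzipped = gzipSync(raw);
  console.log(`${file}: ${raw.length} bytes raw, ${gzipped.length} bytes gzipped`);
}
```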


(Josh Zamor) #6

Thanks for following up @Elias_Muluneh, that’s more of a difference than I thought I’d see. Since the space for UUIDs is something like 2^122, we’re clearly paying for that in bandwidth. How many bits did you set aside for your numbers?


(Sebastian Brudziński) #7

Thank you both!

We have managed to collect a lot of good ideas. There are only 2 sprints remaining until the code freeze for 3.6, so I don’t think that any bigger changes, like the shared dictionary or JSON normalization, will make it in on time. We can definitely talk about their priorities when planning for 3.7, though.

I’m happy to report that we have started looking into application-level caching this sprint. It would be great to have a pattern for this in the 3.6 release, at least for single-resource retrieval, and to compare the improvements we can get with this approach. If it works well, we can continue research on the more complex caching for searches.

There are also some smaller issues we should consider looking into before the release - nginx compression for POST requests, or the unnecessary double requisition retrieval on initiate.

Thanks again for replying; this is a great starting point for future performance work!

Best,
Sebastian.