Storing demo data in microservice images

Hey everyone,

There was some discussion at this morning’s technical committee meeting about demo data for each microservice. Currently, each service’s demo data is stored in its own GitHub repository (and baked into its built Docker image), and this is what was proposed going forward. This seemed to make the most sense, so that:

  • Each service could “own” its own data
  • Demo data could be source- and version-controlled with the service (as the service evolves and moves to new versions, the demo data would evolve with it and be “stamped” to the same version)

However, some valid concerns were raised in the technical committee:

  • GitHub only allows file sizes of up to 100 MB, a limit that would be reached quite quickly, so a workaround would need to be found (Git LFS?)
  • Having large files in a GitHub repository would make operations like git clone potentially time-consuming, especially for developers with slow Internet connections
  • Storing large files in the built Docker image would make the image large as well, which could make pulling images time-consuming too

Feedback/discussion is appreciated about this approach and these concerns.

Shalom,

Chongsun

– ​

There are 10 kinds of people in this world: those who understand binary, and those who don’t.

Chongsun Ahn | chongsun.ahn@villagereach.org

Software Development Engineer

VillageReach | Starting at the Last Mile

2900 Eastlake Ave. E, Suite 230, Seattle, WA 98102, USA

DIRECT: 1.206.512.1536 | CELL: 1.206.910.0973 | FAX: 1.206.860.6972

SKYPE: chongsun.ahn.vr

www.villagereach.org

Connect on Facebook, Twitter and our Blog

Maybe the Malawi team can elaborate more, but from what I recall, being unable to complete a ‘docker pull’ was one of the bigger issues they faced on site.

My thoughts:

  • 100 MB limit - Git LFS is something worth looking into

  • Large files in git - this issue is generally solved with submodules by codebases struggling with repository size. Submodules have their pitfalls, however.

  • Large Docker images:

      • Can we use the same mechanism we use for extension points here? Implementers not interested in demo data could simply not use the companion images with demo data.

      • Can we save space by compressing the files? They could be unpacked on container start (if demo data is enabled).
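The compress-and-unpack idea could look roughly like this. A minimal sketch, assuming the image ships a `demo-data.tar.gz` and honors a `DEMO_DATA` environment flag (both names are illustrative, not anything we have today):

```shell
#!/bin/sh
# Sketch of "ship compressed, unpack on start" (all names illustrative).

# At image build time, the demo data would be packed once:
mkdir -p demo-data
echo "facility,product,quantity" > demo-data/sample.csv
tar -czf demo-data.tar.gz demo-data
rm -r demo-data

# At container start (e.g. in an entrypoint script), unpack only
# when demo data is explicitly enabled:
DEMO_DATA=true
if [ "$DEMO_DATA" = "true" ] && [ -f demo-data.tar.gz ]; then
  tar -xzf demo-data.tar.gz
fi
```

In a real image the tarball would be added with a Dockerfile COPY and the if-block would live in the container’s entrypoint, so running without demo data costs nothing at startup.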

Regards,

Pawel


SolDevelo
Sp. z o.o. [LLC] / www.soldevelo.com
Al. Zwycięstwa 96/98, 81-451, Gdynia, Poland
Phone: +48 58 782 45 40 / Fax: +48 58 782 45 41

···

On Tue, May 29, 2018 at 9:06 PM, Chongsun Ahn chongsun.ahn@villagereach.org wrote:


You received this message because you are subscribed to the Google Groups “OpenLMIS Dev” group.

To unsubscribe from this group and stop receiving emails from it, send an email to openlmis-dev+unsubscribe@googlegroups.com.

To post to this group, send email to openlmis-dev@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/openlmis-dev/B1818EA5-AFFC-4247-9590-2CA93A914F9F%40villagereach.org.

For more options, visit https://groups.google.com/d/optout.


Paweł Gesek

    Technical Project Manager

     pgesek@soldevelo.com / +48 690 020 875

The more I reflect on this, the more I’m reverting to my previous thinking: keep demo data and “performance data” separate. I think we do need to incorporate more of what performance data is today into demo data by making it better (i.e. reasonable and less random). I also think we need the approach you’re exploring, Chongsun, for what we don’t really have today: GBs (okay, maybe hundreds of MBs) of data for more robust demos and visualizations. I’d propose focusing this voluminous set on “line item data”: stock card line items, requisition line items, order line items, and so on. This line item data will be measured in the hundreds of thousands at least, perhaps even in the millions.

If we re-orient back to this thinking, we’d keep demo data in git, version-controlled and just large enough to “define the world” (e.g. facilities, products, requisitions, smaller sets of line items) for most casual uses. We’d then clean up and redefine performance data to become our desired very large set of line item data. In general it’d live outside of git, and more importantly outside of the Service’s git repo and Docker Hub image. We’d build and publish it separately, so that those demos, developers, etc. that need very large sets of line item data could load it in place of, or on top of, the demo data. I’d expect our performance testing, UAT and demo systems to typically be loaded with this set of data.

This would ensure we don’t burden developers and implementers with always having to download this large demo data set; it’s also apparently what the OpenMRS community has done. And it would ensure we focus this technique where it’s needed: very large sets of line item demo data.
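As a sketch of what loading a separately-published data set on top of seeded demo data might look like (the archive URL, database name and credentials below are purely illustrative, not existing artifacts):

```shell
# Hypothetical: fetch the separately published line-item data set and
# load it on top of an already-seeded demo database.
curl -fL -o lineitem-data.sql.gz \
  "https://example.org/openlmis/perf-data/lineitem-data.sql.gz"
gunzip -c lineitem-data.sql.gz | psql -h localhost -U postgres -d open_lmis
```

The point being: the large set stays out of every clone and pull, and only the environments that want it pay the download cost.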

Perhaps we should name it something other than “performance data” however. Very Large Demo Data? Anyone have a better name?

Best,

Josh

···

On Wednesday, May 30, 2018 at 4:30:11 AM UTC-7, Paweł Gesek wrote:


Hey all,

Some updates:

  • Git LFS does seem to be a workable way to store large files on GitHub; I’ve tried it locally and on the build servers. However, it would probably require developers to install the tool on their dev machines, which is an extra step.
  • Docker Hub does compress pushed images, which mitigates the size somewhat (the stock management image went from 244 MB to 531 MB, while its compressed size on Docker Hub only went from 131 MB to 197 MB), but as the demo data balloons, compressed size may still become an issue.
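For reference, the Git LFS setup itself is small; a sketch, with the tracked path pattern as an assumption about where demo data would live:

```shell
# One-time per developer machine (requires the git-lfs package):
git lfs install

# Track large demo-data files; the pattern here is illustrative.
# This records a rule in .gitattributes, which gets committed
# alongside the data:
git lfs track "demo-data/*.csv"
git add .gitattributes
git commit -m "Track demo data with Git LFS"
```

After this, tracked files are stored as small pointers in the repository, and clones fetch the actual content from the LFS store on demand — which is what gets around the 100 MB limit.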

Shalom,

Chongsun


···

On May 30, 2018, at 9:30 PM, josh.zamor@openlmis.org wrote:


You received this message because you are subscribed to the Google Groups “OpenLMIS Dev” group.

To unsubscribe from this group and stop receiving emails from it, send an email to openlmis-dev+unsubscribe@googlegroups.com.

To post to this group, send email to openlmis-dev@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/openlmis-dev/0d75425b-14a2-46c9-b439-d5d0e34beae0%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.