Persistent Storage with Docker in Production - Which Solution and Why?

Hello, I’ve recently started working for a company that wants to break their monolithic SaaS application up into containerized microservices. I’m having a hard time grasping a fundamental part of persistent storage, though. Why are there so many different competing platforms? Portworx, REX-Ray, StorageOS, Flocker, Infinit, etc.

My Questions

  1. Why wouldn’t someone simply spin up an NFS server and use a hierarchical folder structure there as their storage backend? What gains do you get when using one of these tools?

  2. How dangerous is it to use something like this with Docker? What are the common causes of catastrophic data loss in a Docker-based environment?

  3. What persistent storage solution would you recommend and why? My company operates a SaaS platform. The data payloads are small in size (5 KB to 100 KB). Data processing is small to medium in resource consumption. Overall volume is medium, but it continues to grow. We’re hoping to move our monolithic application entirely to the cloud as separate containerized microservices, including our data warehouse.

  4. Somewhat unrelated, but it ties in. What are the strengths of using Kubernetes as an orchestrator as opposed to Rancher/Cattle? Isn’t Kubernetes over-engineered for a small-to-medium-sized platform? Are there any strengths to using Kubernetes in Rancher aside from the one-click installation?

Thank you for the insight. Sorry for the naivety. I welcome all documentation and supplemental reading material.

EDIT: For context we are using Azure as our underlying Cloud platform.

I can answer the 2nd point:

Docker is best suited to a microservice-based architecture where the application runs inside the containers, while storage and any live session state are maintained outside them, for example in a shared cache or a database.

Basically, you should not store anything inside the Docker container itself. There are several reasons for this:

Consider upgrades: someone on your team builds a newer image of the application, and you need the container running with that latest image. The current and most popular way of doing this is to take the existing container down and spin up a new container with the same runtime parameters as the old one, but with the newer image. This is one of the biggest reasons why containers should always be stateless and not hold any data. You can have all your data mounted from somewhere outside the container and keep sessions in a database or something like Memcached. A rough sketch of that flow is below.
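For example, a minimal sketch of that upgrade pattern with the Docker CLI (the image name `myapp` and volume name `myapp-data` are just placeholders, not anything from your setup):

```sh
# Create a named volume once; it lives outside any container's lifecycle
docker volume create myapp-data

# Run the current version with the volume mounted at the app's data path
docker run -d --name myapp -v myapp-data:/var/lib/myapp myapp:1.0

# Upgrade: remove the old container, keep the volume, start the newer image
docker stop myapp && docker rm myapp
docker run -d --name myapp -v myapp-data:/var/lib/myapp myapp:1.1
```

Because the data lives in the named volume rather than in the container's writable layer, replacing the container loses nothing.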

One of the big use cases for Docker is building clusters. If you start keeping data inside your containers, keeping that data in sync between the application containers becomes an overhead you have to manage yourself; with an external volume, every container simply mounts the same backing store (see the sketch below).
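As a hedged example of sharing one backing store across containers, Docker's built-in `local` volume driver can mount an NFS export as a named volume (the server address `nfs.example.com` and export path `/exports/myapp` are placeholders):

```sh
# Hypothetical NFS-backed named volume; several containers can mount it
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=nfs.example.com,rw \
  --opt device=:/exports/myapp \
  myapp-shared

docker run -d --name worker1 -v myapp-shared:/data myapp:1.1
docker run -d --name worker2 -v myapp-shared:/data myapp:1.1
```

The storage platforms you listed essentially take this idea further, adding things like replication, snapshots, and scheduler integration on top of the basic shared-volume model.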

The Docker community in general does not recommend keeping any data inside containers, so hardly anyone has taken that risk in production, and nobody wants to be the first storyteller of how they messed up production because of it.