I have a side-project that is basically a React frontend, a Django API server and a Node universal React renderer. The killer feature is its Elasticsearch database, which searches almost 2.5M large texts and 200K named objects. All the data is stored in a PostgreSQL database and there's some Python code that copies that stuff over to Elasticsearch for indexing.
The PostgreSQL database is about 10GB and the Elasticsearch (version 6.1.0) indices are about 6GB. It's moderately big, and even though individual searches take ~75ms on average (in production), it's hefty. At least for a side-project.
On my MacBook Pro laptop, I use Docker to do development. Docker makes it really easy to run one command that starts memcached, Django, an AWS Product API Node app, create-react-app for the search and a separate create-react-app for the stats web app.
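The actual docker-compose.yml isn't shown here, but as a rough sketch (the service names, images and commands are made up for illustration) it's something along these lines:

# docker-compose.yml (sketch, not the real file)
version: '3'
services:
  memcached:
    image: memcached
  web:               # the Django API server
    build: .
    command: python manage.py runserver 0.0.0.0:8000
  product-api:       # the AWS Product API Node app
    build: ./product-api
    command: node index.js
  searchapp:         # create-react-app for the search
    build: ./searchapp
    command: yarn start
  statsapp:          # create-react-app for the stats web app
    build: ./statsapp
    command: yarn start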
At first I tried to run PostgreSQL and Elasticsearch in Docker too, but after many attempts I just had to give up. It was too slow, and Elasticsearch would keep crashing even after I increased Docker's memory allocation to 4GB.
This very blog (www.peterbe.com) has a similar stack: Redis, PostgreSQL and Elasticsearch all running in Docker. It works great. One single docker-compose up web starts everything I need. But when it comes to much larger databases, I found my macOS host to be much more performant.
So the dark side of this is that I have to remember to do more things when starting work on this project. My PostgreSQL was installed with Homebrew and is always running on my laptop. For Elasticsearch, I have to open a dedicated terminal and go to a specific location to start the Elasticsearch for this project (e.g. make start-elasticsearch).
The way I do this is that I have this in my Django project's settings.py:
import dj_database_url
from decouple import config, Csv

DATABASES = {
    'default': config(
        'DATABASE_URL',
        # Hostname 'docker.for.mac.host.internal' assumes
        # you have at least Docker 17.12.
        # For older versions of Docker use 'docker.for.mac.localhost'
        default='postgresql://peterbe@docker.for.mac.host.internal/songsearch',
        cast=dj_database_url.parse
    )
}

ES_HOSTS = config('ES_HOSTS', default='docker.for.mac.host.internal:9200', cast=Csv())
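I don't show here how ES_HOSTS gets consumed, but assuming the Django side talks to Elasticsearch through elasticsearch-dsl (an assumption for illustration), the wiring could look roughly like this:

from elasticsearch_dsl.connections import connections

# ES_HOSTS is a list because of the Csv() cast,
# e.g. ['docker.for.mac.host.internal:9200']
connections.configure(default={'hosts': ES_HOSTS})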
(Actually, in reality the defaults in the settings.py code are localhost and I use docker-compose.yml environment variables to override this, but the point is hopefully still there.)
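In other words, the docker-compose.yml has something like this for the Django service (the service name and values here are just illustrative):

  web:
    environment:
      - DATABASE_URL=postgresql://peterbe@docker.for.mac.host.internal/songsearch
      - ES_HOSTS=docker.for.mac.host.internal:9200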
And that's basically it. Now I get Docker to do what various virtualenvs and terminal scripts used to do, but with the performance of running the big databases on the host.