Developing with Docker (danielquinn.org)

submitted 1 year ago* (last edited 1 year ago) by danielquinn@lemmy.ca to c/python@programming.dev

I've been writing code professionally for 24 years, 15 of them in Python and 9 of those with Docker. I got tired of running into the same complications every time I started a new job, so I wrote this. Maybe you'll find it useful, or it could even start a conversation, but this post has been a long time coming.

Update: I had a few requests for a demo repo as a companion to this post, so I wrote one today. It includes a very small Django demo using Docker, Compose, and GitLab CI.

[-] jacksilver@lemmy.world 8 points 1 year ago

Hey OP, it looks like you're the author of the post? If so I'm curious how you handle cloud services like AWS or Azure when taking this approach? One of the major issues I've run into when working with teams is how to test or evaluate against cloud services without creating an entire infrastructure in the cloud for testing.

[-] danielquinn@lemmy.ca 9 points 1 year ago

It's a tough one, but there are a few options.

For AWS, my favourite one is LocalStack, a Docker image that you can stand up like any other service and then tell it to emulate common AWS services: S3, Lambda, etc. They claim to support 80 different services which is... nuts. They've got a strange licensing model though, which last time I used it meant that they support some of the more common services for free, but if you want more you gotta pay... and they aren't cheap. I don't know if anything like this exists for Azure.
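A minimal sketch of pointing an application at LocalStack rather than real AWS. It assumes the `boto3` package is installed and that LocalStack is reachable as a Compose service named `localstack` on its default port 4566. The `S3_ENDPOINT_URL` variable and both helper names are made up for illustration and don't come from the post.

```python
import os


def s3_client_kwargs(env=None):
    """Build extra keyword arguments for boto3.client("s3", ...).

    S3_ENDPOINT_URL is a hypothetical variable name. When it is set
    (for example to http://localstack:4566 in a Compose file) the client
    talks to LocalStack; when it is absent, boto3 uses the real AWS endpoint.
    """
    env = os.environ if env is None else env
    kwargs = {}
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs


def make_s3_client(env=None):
    """Create the client. boto3 is imported lazily so the helper above
    can be tested without it installed."""
    import boto3

    # Credentials and region are read by boto3 from the usual AWS_*
    # environment variables, so only the endpoint needs special handling.
    return boto3.client("s3", **s3_client_kwargs(env))
```

The application code stays the same in every environment; only the presence of the endpoint variable decides whether it talks to the emulator or the real service.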

The next-best choice is to use a stand-in. Many cloud services are just managed+branded Free software projects. RDS is either PostgreSQL or MySQL, ElastiCache is just Redis, etc. For these, you can just stand up a copy of the actual service and since the APIs are identical, you should be fine. Where it gets tricky is when the cloud provider has messed with the API or added functionality that doesn't exist elsewhere. SQS for example is kind of like RabbitMQ but not.

In those cases, it's a question of how your application interacts with this service. If it's by way of an external package (say Celery to SQS for example), then using RabbitMQ locally and SQS in production is probably fine because it's Celery that's managing the distinction and not you. They've done the work of testing compatibility, so theoretically you don't have to.

If however your application is the kind of thing that interacts with this service on a low level, opening a direct connection and speaking its protocol yourself, that's probably not a good idea.

That leaves the third option, which isn't great, but I've done it and it's not so bad: use the cloud service in development. Normally this is done by having separate services spun up per user or even with a role account. When your app writes to an S3 bucket locally, it's actually writing to a real bucket called companyname-username-projectbucket. With tools like Terraform, the fiddly process of setting all this up can be drastically simplified, so it's not so bad -- just make sure that the developers are aware of the fact that their actions can incur costs is all.
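As a rough illustration of the per-developer naming scheme, here is a small helper that builds a name in the `companyname-username-projectbucket` shape. The function name and the clean-up rules are assumptions of this sketch; the only external facts it relies on are that S3 bucket names must be lowercase and between 3 and 63 characters long.

```python
import re


def dev_bucket_name(company, username, bucket):
    """Return a per-developer bucket name such as acme-alice-uploads."""
    parts = [company, username, bucket]
    # Lowercase each part and collapse anything other than a-z or 0-9
    # into a single hyphen, then trim hyphens from both ends.
    cleaned = [re.sub(r"[^a-z0-9]+", "-", p.lower()).strip("-") for p in parts]
    if not all(cleaned):
        raise ValueError("every part must contain at least one letter or digit")
    name = "-".join(cleaned)
    if not 3 <= len(name) <= 63:
        raise ValueError(f"bucket name must be 3-63 characters: {name!r}")
    return name
```

Something like `dev_bucket_name("Acme", "Alice.Smith", "uploads")` would then give each developer a predictable bucket that Terraform (or whatever provisions the resources) can create from the same inputs.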

If none of the above are suitable, then it's probably time to stub out the service and then rely more heavily on a QA or staging environment that better reflects production.

[-] jacksilver@lemmy.world 3 points 1 year ago

I appreciate the response!

I've definitely used tools like LocalStack before and when it works it's great, but sadly it doesn't usually provide a 1-to-1 replacement.

Seeing your different approaches is helpful and I will have to see what elements I can pull into my current projects!

[-] rhacer@lemmy.world 5 points 1 year ago

This was a great read. Thank you!

[-] fubarx@lemmy.ml 4 points 1 year ago

Good stuff.

A few things I'd change:

A CLI to simplify local vs docker vs cloud operations. Reduces chance of operator error. Have had good luck with python click library.
Moving config settings into separate JSON and .env files to avoid loading too many config and secrets in the docker-compose file.
For AWS, I'd go with CDK. That way, cloud deployment is all in python (or typescript).
For cloud, you can also package Django into a single lambda, with dependencies inside a lambda layer. Not sure I'd use it in heavy production, but for small apps, really handy.
Inside Django settings, you can switch DB and services whether running local (sqlite, Redis), docker (postgres, RabbitMQ), or cloud (RDS, SQS).
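A sketch of what the last suggestion could look like in a Django settings module. The `APP_MODE` variable, the `DB_*` variable names and the `db` host default are hypothetical; the two `ENGINE` strings are Django's built-in SQLite and PostgreSQL backends.

```python
import os


def database_settings(mode, env=None):
    """Return a Django DATABASES dict for the given run mode."""
    env = os.environ if env is None else env
    if mode == "local":
        # Plain SQLite file, no server required.
        return {
            "default": {
                "ENGINE": "django.db.backends.sqlite3",
                "NAME": "db.sqlite3",
            }
        }
    if mode in ("docker", "cloud"):
        # PostgreSQL in both cases (a Compose service or RDS);
        # only the connection details differ.
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env.get("DB_NAME", "app"),
                "USER": env.get("DB_USER", "app"),
                "PASSWORD": env.get("DB_PASSWORD", ""),
                "HOST": env.get("DB_HOST", "db"),
                "PORT": env.get("DB_PORT", "5432"),
            }
        }
    raise ValueError(f"unknown mode: {mode!r}")
```

In `settings.py` this would be wired up with something like `DATABASES = database_settings(os.environ.get("APP_MODE", "local"))`.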

[-] danielquinn@lemmy.ca 1 points 1 year ago

I don't mean to be snarky, but I feel like you didn't actually read the post 'cause pretty much everything you've suggested is the opposite of what I was trying to say.

A CLI to make things simple sounds nice, but given that the whole idea is to harmonise the develop/test/deploy process, writing a whole program to hide the differences is counterproductive.
Config settings should be hard-coded into your docker-compose file and absolutely not stored in .json or .env files. The litmus test here is: "How many steps does it take to get this project running?" If it's more than 1 (docker compose up) it's too many.
Suggesting that one package Django into a single Lambda seems like an odd take on a post about Docker.

[-] fubarx@lemmy.ml 1 points 1 year ago

OK, you wanted a conversation… :-)

I did read the post, but I assumed it was the starting point of a system or mechanism, not the end-point. Wanting to just run "docker compose up" is fine, but there is more to developing and deploying to production (and continuing post-launch).

That's why I mentioned the CLI. It lets you go from a simple local app (Django on sqlite) to a Docker one (postgres, celery, redis, etc.), to all the way out to the cloud (ECS/EKS/serverless lambda/RDS), without having to remember what commands do what or managing lots of separate docker-compose files.

I can see we are VERY far apart on how docker should be used in moving toward a production-ready system.

For one thing, recommending putting secrets inside docker-compose is an instantly disqualifying piece of advice. There's a whole 'secrets' section of docker compose that is there to prevent people from inadvertently including those in cleartext and baking them into images: https://docs.docker.com/compose/how-tos/use-secrets/.

Github itself has a secret scanning mechanism to prevent leakage: https://docs.github.com/en/code-security/secret-scanning/introduction/about-secret-scanning. For gitlab, there's also Blackbox or HashiCorp vault. Putting AWS key/secret inside a repo can be VERY expensive and open one to legal liability if the account is misused. Repeated infractions could lead to AWS banning one's account.

I really recommend you take down that part of your post, instead of proliferating bad practices.

As for the rest, to each their own.

[-] danielquinn@lemmy.ca 2 points 1 year ago

I feel like you must have read an entirely different post, which must be a failing in my writing.

I would never condone baking secrets into a compose file, which is why the values in compose.yaml aren't secrets. The idea is that your compose file is used exclusively for testing and development, where the data isn't real, and the priority is easing development. When you deploy, you don't use that compose file because your environment is populated by whatever you use in production (typically Kubernetes these days).

You should not store your development database password in a .env file because it's not a secret. The AWS keys listed in the compose are meant to be exactly as they are there: XXX, because LocalStack doesn't care what these values are, only that they exist.
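One way to picture this: the application reads its configuration from the environment and nothing else, so the Compose file supplies throwaway development values and production supplies real ones. This is only a sketch under that assumption; the `DB_*` variable names and both helper names are invented for the example.

```python
import os


def require(name, env):
    """Return a required environment variable or fail loudly."""
    try:
        return env[name]
    except KeyError:
        raise RuntimeError(f"missing required environment variable: {name}") from None


def load_settings(env=None):
    """Read database settings from the environment, with no per-environment branching."""
    env = os.environ if env is None else env
    return {
        "HOST": require("DB_HOST", env),
        "NAME": require("DB_NAME", env),
        "USER": require("DB_USER", env),
        # In development this is a throwaway value set in the Compose file;
        # in production the orchestrator injects the real one.
        "PASSWORD": require("DB_PASSWORD", env),
    }
```

Because there are no defaults, a missing variable fails at startup in every environment instead of silently falling back to a development value.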

As for the CLI thing, again I think you've missed the point. The idea is to start from a position of "I'm building images" and therefore never have a "local app (Django, sqlite)" because sqlite should not be used unless that's what's used in production. There should be little to no difference between development and production, so scripting a bridge between these doesn't make a lot of sense to me.

[-] ALERT@sh.itjust.works 3 points 1 year ago

requesting RSS feed for your blog. thank you

[-] danielquinn@lemmy.ca 2 points 1 year ago* (last edited 1 year ago)

High praise! Just keep in mind that my blog is a mixed bag of topics. A little code, lots of politics, and some random stuff to boot.

[-] ALERT@sh.itjust.works 3 points 1 year ago

from my 15 years of RSS experience, a typical feed contains not much more than 5% of useful information, so I am prepared for filtering, no worries :)

[-] __init__@programming.dev 2 points 1 year ago

Nice. You should check out devcontainers if you haven’t already. Maybe it deviates a little from the dev/prod parity idea, but you can use it with a compose file like you described. It’s saved my current team quite a bit of headache in maintaining local dev environments and keeping everyone in sync as the project evolves.

[-] sip@programming.dev 1 points 1 year ago* (last edited 1 year ago)

are you me?

instead of that partial thing at the top I extend a base one into web, worker, test and build (simulate CI step).

[-] onlinepersona@programming.dev 1 points 1 year ago

I either missed it or it isn't in the "developer tools" section: how do you connect this to an IDE or editor with an LSP or DAP? The image might have python:3.12 but locally you only have python:3.6 mind you, so it's not something one can ignore. How do you handle this?

Anti Commercial-AI license

[-] dallen@programming.dev 2 points 1 year ago

In VS Code these should work through the Remote-Containers flow, just like they do through Remote-SSH.

[-] mapto@lemmy.world 0 points 1 year ago

It is not realistic to replicate a production setup in development when you're working with sensitive user data. I've worked in different contexts (law enforcement, healthcare, financial services) where we've had complicated setups (in one instance including a thing called a pre-staging environment), but never would a sizeable team of developers have access to user data, and thus to a realistic setup in terms of size, let alone of quality of data.

[-] danielquinn@lemmy.ca 1 points 1 year ago

It sounds like you're confusing the application with the data. Nothing in this model requires the use of production data.

[-] mapto@lemmy.world 0 points 1 year ago

Just trying not to confuse realistic testing with self-deception :) Not convinced testing with synthetic data can pretend to be similar to a production environment.

[-] danielquinn@lemmy.ca 1 points 1 year ago

But there's nothing stopping you from loading realistic (or even real) data into a system like this. They're entirely different concepts. Indeed, I've loaded gigabytes of production data into systems similar to what I'm proposing here (taking all necessary precautions of course). At one company, I even built a system that pulled production into a developer-friendly snapshot while simultaneously pseudo-anonymising that data so it can safely (for some value of ${safe}) be tinkered with in development.

In fact, adhering to a system like this makes such things easier, since you don't have to make any concessions to "this is how we do it in development". You just pull a snapshot from the environment you want to work with and load it into your Compose session.
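For a sense of what pseudo-anonymising a snapshot might involve, here is a small sketch using a keyed hash. It is not the system described above: the row layout, field names and helper names are invented, and a real pipeline would need to cover every identifying column, not just two.

```python
import hashlib
import hmac


def pseudonymise(value, key):
    """Map a value to a stable token using HMAC-SHA256.

    The same input and key always give the same token, so relationships
    between rows survive, but the original value is not stored.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def scrub_user(row, key):
    """Return a copy of a user row with identifying fields replaced."""
    token = pseudonymise(row["email"], key)
    scrubbed = dict(row)
    # .invalid is a reserved top-level domain, so these addresses can never receive mail.
    scrubbed["email"] = f"user-{token}@example.invalid"
    scrubbed["name"] = f"User {token[:8]}"
    return scrubbed
```

The key should live outside the development environment: anyone who holds it can check whether a known address maps to a given token.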