Hello all, I would like a reality check regarding our tech stack and whether I’m the one with the wrong expectations. I’ll try to describe the situation as neutrally as possible, but forgive me my bias. Here’s the state:
I work at a Series A fintech company in Europe, mainly on the backend. I did frontend in the past and would consider myself a fullstack dev, but have focused on the backend for the last few years.
I joined one year ago; the backend codebase was started roughly 9 months before that by a team of 3 devs. The company focused on hiring senior developers with a lot of industry experience in the beginning (we only hired 1 junior and a few mids after that), and all of the starting devs have 10+ years of industry experience. The CTO had experience leading 100+ people. Some of the starting devs run media content about programming, best practices, etc. The CTO has Go experience, the rest not very much.
The codebase is built in Go following Clean Architecture using a variety of tools, here’s what they are and what they do:
- Goa: Design-first specification of APIs; generates types and interfaces. Just used for HTTP at the moment.
- Gorm: On the database end of the stack we use Gorm as the ORM.
- Wire: DI management framework. We specify what provides dependencies, and then Wire generates files to link them all up to those that need them. No one in the company really understands it, and everyone hates it.
In general we rely a lot on code generation. Clients for the frontend are generated as well. The layers are also separated, meaning that for each entity we have a “model”, as well as a database implementation and one or more things on the API side. This results in mappers that map: database ↔ “model” ↔ API payload/response.
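To make the layering concrete, here is a minimal sketch of the database ↔ “model” ↔ API mapping for a single entity. All type, field, and function names are hypothetical and not taken from our codebase; in the real setup the database struct would carry Gorm tags and the API struct would be generated by Goa. It shows why adding one field means touching three structs and two mappers before any generated code is counted.

```go
package main

import "fmt"

// UserRow is a hypothetical database-layer struct (what the ORM would persist).
type UserRow struct {
	ID       uint
	FullName string
	Email    string
	Nickname *string // the newly added, nullable column
}

// User is the hypothetical domain "model" used by the business logic.
type User struct {
	ID       uint
	Name     string
	Email    string
	Nickname string // new field, empty string means "not set"
}

// UserResponse is a hypothetical API payload (assumed to be generated in practice).
type UserResponse struct {
	ID       uint    `json:"id"`
	Name     string  `json:"name"`
	Email    string  `json:"email"`
	Nickname *string `json:"nickname,omitempty"` // new field again
}

// rowToModel maps database -> model; it has to learn about the new field.
func rowToModel(r UserRow) User {
	u := User{ID: r.ID, Name: r.FullName, Email: r.Email}
	if r.Nickname != nil {
		u.Nickname = *r.Nickname
	}
	return u
}

// modelToResponse maps model -> API response; same change a third time.
func modelToResponse(u User) UserResponse {
	resp := UserResponse{ID: u.ID, Name: u.Name, Email: u.Email}
	if u.Nickname != "" {
		nick := u.Nickname
		resp.Nickname = &nick
	}
	return resp
}

func main() {
	nick := "gopher"
	row := UserRow{ID: 1, FullName: "Ada", Email: "ada@example.com", Nickname: &nick}
	resp := modelToResponse(rowToModel(row))
	fmt.Println(resp.Name, *resp.Nickname)
}
```

The separation itself is not unusual; the cost comes from every field being declared and mapped once per layer.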
One of my first PRs was to add a field to the database and the model and return it in the API. It took me multiple days to do this, and PRs doing things like that are always ~40 files big. Usually you need to actually touch 5-10 files (mappers, etc.) and the rest are generated. PR reviews are not pleasant because of that. After 1 year in that codebase this task would take me 2-3 hours today. It took me 30 minutes on backends I worked on before. If the additional field requires adding a new (existing) dependency to the code, it will take ~1 day, because Wire complicates this a lot and the codebase is already heavily coupled.
Our business logic is captured in usecases that are called by the service that fulfils the HTTP requests. Refactorings / changes in there that don’t change any inputs are usually quite easy, because you’re just touching the inside logic. But again, if you need to include another package (could be a domain or a service) that needs to be DIed, it will take additional hours or days.
Features that go beyond adding small stuff to existing things usually take weeks because of all the setting up required in Goa, Gorm, Wire and the architecture in general.
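For readers who have not used this style, here is a minimal sketch of a usecase with constructor injection, wired by hand. Wire generates plain Go code that calls such constructors in dependency order, so this is roughly the shape of what has to exist either way. Every name below is hypothetical and the logic is deliberately trivial; the point is that a newly needed dependency changes the constructor signature, and every place that builds the usecase (or every provider set) has to follow.

```go
package main

import "fmt"

// AccountRepo is a hypothetical dependency the usecase already had.
type AccountRepo interface {
	Balance(id string) (int, error)
}

// RateService is a hypothetical dependency that a new feature needs.
type RateService interface {
	Rate(from, to string) float64
}

// ConvertBalance is a hypothetical usecase holding its dependencies.
type ConvertBalance struct {
	accounts AccountRepo
	rates    RateService
}

// NewConvertBalance is the constructor ("provider"). Adding RateService
// changed this signature, so all callers and wiring code must change too.
func NewConvertBalance(a AccountRepo, r RateService) *ConvertBalance {
	return &ConvertBalance{accounts: a, rates: r}
}

// Execute returns the account balance converted into the target currency.
func (uc *ConvertBalance) Execute(id, currency string) (float64, error) {
	bal, err := uc.accounts.Balance(id)
	if err != nil {
		return 0, fmt.Errorf("load balance: %w", err)
	}
	return float64(bal) * uc.rates.Rate("EUR", currency), nil
}

// memAccounts is an in-memory fake so the sketch runs without a database.
type memAccounts map[string]int

func (m memAccounts) Balance(id string) (int, error) {
	v, ok := m[id]
	if !ok {
		return 0, fmt.Errorf("account %q not found", id)
	}
	return v, nil
}

// fixedRate is a fake rate service that always returns the same rate.
type fixedRate float64

func (f fixedRate) Rate(_, _ string) float64 { return float64(f) }

func main() {
	// Hand-written wiring: the part a DI code generator would produce.
	uc := NewConvertBalance(memAccounts{"a1": 1000}, fixedRate(2))
	v, err := uc.Execute("a1", "USD")
	fmt.Println(v, err)
}
```

With a handful of dependencies this wiring is a few lines in `main`; the pain described above comes from how many packages depend on each other, not from the injection style as such.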
For the first half year I was under the assumption that I just didn’t get the elaborate structure and features that our codebase had. The further I dug, the more I found that even the developers who created the codebase are unhappy with it, and it’s generally considered a mess by all the other developers as well.
Since I joined, the number of engineers has been pretty constant at 11-13, plus 2 EMs and 1 Head of Dev. The CTO left half a year ago. One of the engineers has worked with Go previously.
In the year I’ve been working on it, nothing has changed from an architecture perspective, besides other developers introducing event sourcing for one model, which in my opinion makes everything that relies on this model more complicated. The codebase also became even more coupled, just by adding features.
The business is largely unhappy with the speed at which we can deliver features. I am as well. It is very unsatisfying to take hours or days to deliver very little. We increased the delivery speed in the last weeks and months at the behest of business leadership, but by using hacks on top of hacks to deliver at an increased (still slow) speed.
I understand startup pressure is always high and that we will go out of business if we don’t deliver, which is used as a justification for why we ended up with this messy codebase. But I have a hard time piecing together why this stack was chosen in the first place, because everything is basically custom and you can’t just google solutions. Also, we had very experienced senior engineers building this codebase. It would be another thing if the codebase were in a terrible state almost 2 years into a startup and had been built by a squad of university grads. Connecting the dots between “super experienced starting team” and the state we are in is where I have a hard time. We’re not building tech platforms or AI models, we’re implementing business logic and flows.
A few months ago we started a new product. I proposed (in writing, laid out over a few pages) to build it in another stack, based on TypeScript, because basically everyone has TypeScript experience and we use it in the frontend. Not only did the engineers who commented not like it - they were afraid. The head of dev was also heavily against it. The main reasons were that we could not use our already developed tools (which don’t work well for us) and the risk that, if this also ended up being a horrible codebase, we would have two horrible things to maintain. I backed off and proposed to start a new Go service instead. This idea was more welcome, but only if we used the same tools we already have, like Goa - basically adding all the weight again. This was not done in the end because I could not justify the extra ramp-up time to get this service up.
So we’re basically in the situation that we can’t do anything new, but the existing codebase is not improving meaningfully. From my perspective the main problems with the codebase currently are that domains are so coupled that everything needs everything, that we introduced custom tooling that makes our job harder, and that we have premature abstractions. Fixing the domain things will take a while and is a big endeavour. Improving all the other stuff, plus some refactoring to make things simpler, which I’m constantly pushing for, will make it a little better but won’t fix the main problem.
We had several open discussions about the state of our codebase. Everyone working on it hates it. Head of dev says it’s in an okay state. But nothing significant is happening/moving. I try to move as many things as I possibly can, and try to encourage others to do so as well, as this is my job as a senior engineer. Recently small improvements are being made more often, because delivery pressure is a bit lower than usual. I have raised these problems a few times already with our dev leadership. In my opinion we need a strong leader with Go experience to lead us forward, but we don’t have that and I’m not there yet. Head of dev promised a few times that things will be addressed, but I’m seeing very little of it.
I’m not very fulfilled by this job currently, but I’m also not completely hating it, so I’d still like to give this half a year to move in the right direction.
Here are the reality check questions:
- Is this a normal state to be expected in most companies / teams?
- Am I expecting too much?
- How did we actually end up in this situation, even before I joined, given the experienced team?
- It seems like it’s not a Go-specific problem, but a matter of how we set it up and the architecture - is that correct?
- I explored what productive frameworks are out there currently that would be able to replace our custom Go stack. With Next.js, Remix and Laravel out there, it seems a lot of the problems and custom solutions we have would be covered by those, and it seems like productivity and shipping speed would be a lot higher using one of these frameworks. Maybe this is just wishful thinking and reality is also complex when using those?
And a more productive question: Has anyone been in a similar situation - how did you resolve it?
Didn't read it all and I don't know about Wire but generated code should be in separate commits, making the PR way more readable.
You can always squash them if needed after the code review.
To me, generated code should not be committed at all. Again, I know nothing about this stack, but code generators can behave differently on different machines due to versions, flags and even the OS. To deliver consistent results they should run in a consistent environment. It’s a build-time concern which CI/CD should take care of.
In general it should not be checked in, but as with everything there are exceptions. If you need it to be deterministic and want to evaluate all changes to the generated code, it can be useful, precisely for the reason you cite in opposition: a small change in your build environment can change what was generated. If that isn’t diffed against preceding versions, I think we could contrive cases where that would be an issue. Seems sufficient to me to caution that there are always exceptions.
Totally agree, generated code shouldn't be checked in 99% of the time. I'd check it in if it's something like an OpenAPI spec file that's generated; everything else can then use that spec file for generating clients, and those don't get checked in.
Problem is, we depend on all of them directly in the backend, except the clients. So we need to generate them locally. We have CI checking that there is no drift in generation, though.
A build tool like Bazel would be able to “hide” the generated code. Essentially you never see it in the repo because it’s generated on the fly and cached.