POC + 1
After the POC, what comes next?
At this point you need to manage the project.
This is also called project management.
Features On Demand
At the beginning of this process, we laid out all the products that would have to be transitioned to our new system.
We ranked these items by their mission criticality, the owners' risk tolerance, product lifecycle, etc.
This allows us to figure out what product to transition first:
… Maybe let’s not start with the mission-critical product with no flex.
Starting development from least impactful to most impactful is an exercise in project management.
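The ranking exercise itself is simple enough to sketch. Here's a minimal, hypothetical version: the factor names, weights, and products below are all made up for illustration, not the real list.

```python
# Score each product on the factors we care about, weighted by how
# much each factor matters, then migrate lowest-scoring (least
# impactful) first. All names and numbers are illustrative.

WEIGHTS = {
    "mission_criticality": 3,
    "owner_risk_aversion": 2,
    "lifecycle_maturity": 1,
}

# 1 = low, 5 = high on each factor (hypothetical products)
products = {
    "prototype-service": {
        "mission_criticality": 1,
        "owner_risk_aversion": 1,
        "lifecycle_maturity": 1,
    },
    "flagship-product": {
        "mission_criticality": 5,
        "owner_risk_aversion": 5,
        "lifecycle_maturity": 5,
    },
}

def impact_score(scores: dict) -> int:
    """Weighted sum across factors; higher = riskier to migrate."""
    return sum(WEIGHTS[factor] * value for factor, value in scores.items())

# Migration order: least impactful first
order = sorted(products, key=lambda name: impact_score(products[name]))
```

The exact weights matter far less than the conversation you have while assigning them.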
With the least impactful service, we could literally take our initial POC and add the minimum layer of features on top of it.
We deployed this to a team that was gung-ho about upgrading, and they could afford to be gung-ho about tech upgrades, since they were still in the prototype phase of their product lifecycle.
We’d take an env, add whatever missing feature was needed, then move on to the next.
With each step up in impact, the required feature set grew too.
We’d build the features for the product we’re migrating AS WE’RE MIGRATING IT.
Then we’d immediately move on to the next product and start layering on the next suite of features required for the more complex product.
POC + 1 turns into POC + 2, all built directly for each product being migrated.
The project management side of it is fitting the next set of features into the capacity of the team:
- X number of team members
- The estimate for the next layer of features
- The sum total, divided across the team, gives you an idea of how much time the entire project will take.
Sure it's voodoo math, and can't be taken too seriously, but it does tell you whether you have a shot in hell.
More importantly, the exercise gave us an itemized list defining which products to upgrade and migrate, and in what order.
This is actually just what project management is, apparently.
But it was an absolutely critical step, and without it the project surely would have failed.
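The capacity math above really is back-of-envelope. A sketch of it, with entirely made-up team size and estimates, might look like:

```python
# Back-of-envelope schedule estimate: for each migration "layer"
# (a product plus the new features it needs), divide the estimated
# work by team capacity, then sum. Every number here is invented.

TEAM_SIZE = 4  # hypothetical headcount

# (product, estimated person-weeks for its feature layer),
# ordered least impactful -> most impactful
layers = [
    ("prototype-service", 2),
    ("internal-tooling", 4),
    ("customer-facing-api", 8),
    ("flagship-product", 16),
]

total_weeks = 0.0
for product, person_weeks in layers:
    weeks = person_weeks / TEAM_SIZE  # naive: assumes perfect parallelism
    total_weeks += weeks
    print(f"{product}: ~{weeks:.1f} weeks")

print(f"Total: ~{total_weeks:.1f} weeks")
```

It ignores ramp-up, coordination overhead, and everything else real, which is exactly why it's voodoo math: useful for "do we have a shot," useless as a commitment.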
Incur Technical Debt to Pay Technical Debt
Some of you may have caught on to the trap we laid for ourselves here.
If we have one product on POC + 1, and another product on POC + 2… shouldn’t we turn around and update the first product to be POC + 2?
The long-term answer is yes, but the time budget for this project says no.
The result: right off the bat we were incurring new technical debt, with each product we worked on sitting on the bleeding edge until we moved on to the next one.
That team that was gung-ho on upgrades? They ended up on the oldest version of our new system for over a year.
There’s probably a lesson in there.
It would take a full year post-migration to pull everything up into a consistent state.
But during the migration, we had to be laser-focused on getting all the critical products migrated before D-day.
- Project Management is useful, actually?
- Managing breadth and depth of a project requires careful planning
- Add features as impact/scale increase
- Technical Debt can be good too
- It's like a credit card: something that is incredibly enabling, as long as you pay the bill
- Effective project management informs your tech debt budget
Gifts At Gunpoint
Moving from one product team to the next wasn’t exactly a smooth ride.
We had stack-ranked each product in terms of company impact, which meant that as we moved up the chain, the risk tolerance of each new product team would be lower than before.
In short order we collided head-first with some product teams.
The core issue: a deep conflict in development processes.
The deliberate, steady development workflow of our flagship product?
Meet the rough-n-ready build-as-you-go infrastructure migration project.
Apparently saying things like “we'll build as we migrate” and “the timeline is last week” did not sit well with the team managing the primary profit-center for the entire damn company.
What resulted was a series of… tense… meetings.
Previously our team had been extremely accommodating of our flagship product team; they’re making the money, after all.
All of a sudden we’re coming in hot and telling them they have no choice, we’re doing it now, and of course it’s not tested (it’s not even complete yet).
One of the things I wish I had done better during this process was empathize with their team more.
It likely would have been much easier to convince them if I had “helped” them discover the solution…
…As long as the solution was the one our team had already decided on.
Instead, I took a much more glib approach of “Too bad, eat infrastructure.”
This felt expedient, and it can be hard to be polite in the trenches.
But it resulted in some burnt bridges.
I expended a ton of personal political capital dragging the project past the finish line.
This meant a rocky relationship with the flagship product team, and took time to rebuild the trust between our teams.
That being said, once we got past the whole thing our teams started working much better together.
Short timelines forced fast feedback cycles, and a more equal power dynamic let us build solutions that enabled both teams to move quickly and safely.
- Empathy is a Force Multiplier
- The only regret I have from this project is that I didn’t empathize with others enough.
- Things likely would have been smoother if the product team was on-side.
- Nothing angers people like having leverage clumsily applied to them.
- Tailor your Approach to the Customer
- Different teams have different priorities and processes
- Embrace their processes
- As long as you get what you want
- can lead to better working relationships
- Hard Conversations
- People may not like being stood up to, but they will respect it.
- You can't build trust in an org without earning the respect of the org.
Despite the graceless consensus building, we pushed through and were successful.
To be fair, Google did send an email two months before Deprecation Day stating that they would extend the deadline, something that wasn’t totally unexpected.
But we had to plan for the worst, and things were (naturally) taking longer than expected; we were already in the process of stack-ranking which products we could let slip without too much impact to the business.
In the end, over 700 servers migrated, within 6 months, on a system that was built on the fly.
Did I mention that this was all occurring just as a global pandemic sent everyone home?
And that meant being locked in a two-bedroom apartment with a two-year-old(!)
The real cost of this project was the human kind, in the form of our familiar old friend burnout.
We lost team members in the aftermath, and I feel like I'm just now getting back to a healthy state 3 years on.
Thankfully, we were all given the opportunity to recover.
The new system worked, and it worked well, if a bit clunky in some places.
But we had architected a solution that didn’t require a ton of hand-holding once implemented- something very much the opposite of the previous system.
This allowed us to concentrate on smoothing out the rough patches, as well as addressing the technical debt we had incurred during the migration.
But most importantly, it was at whatever pace each of us was comfortable with.
From the start, I knew this was going to be a “long-term short-term solution”.
So we built a system that didn't really need any major changes after the fact.
After the initial migration we added a ton of automation: implementing CI/CD through the tech stack, hardening system security, and improving monitoring and alerting throughout.
Since the migration, the amount of toil and fire-fighting we have to do has dropped drastically, freeing us up for exactly that kind of improvement work.
Don't Do What Donny Don't Does
Look, Technical Debt can be difficult to manage.
But if you don’t manage it, you run the risk of having things start to seize.
In our case, it seized in a most spectacular fashion.
Through luck, perseverance, a great team & a shit-ton of work, we were able to pull our infrastructure through when our technical debt went to collections.