What I wish I'd known about rewrites
I can’t say enough good things about How to Survive a Ground-Up Rewrite Without Losing Your Sanity. Having been party to a few projects like this, a lot of this advice rings true to me. Let me quote you the good parts!
You must identify the business value of the rewrite:
The key to fixing the "developers will cry less" thing is to identify, specifically, what the current, crappy system is holding you back from doing. E.g. are you not able to pass a security audit? Does the website routinely fall over in a way that customers notice? Is there some sexy new feature you just can't add because the system is too hard to work with? Identifying that kind of specific problem both means you're talking about something observable by the rest of the business, and also that you're in a position to make smart tradeoffs when things blow up (as they will).
The danger of unicorn rewrites:
For the Unhappy Rewrite, the biz value wasn't perfectly clear. And, thus, as often happens in that case, everyone assumed that, in the bright, shiny world of the New System, all their own personal pet peeves would be addressed. The new system would be faster! It would scale better! The front end would be beautiful and clever and new! It would bring our customers coffee in bed and read them the paper.
Delivering value incrementally is of the greatest importance:
Over my career, I've come to place a really strong value on figuring out how to break big changes into small, safe, value-generating pieces. It's a sort of meta-design -- designing the process of gradual, safe change.
But “big bang” incremental delivery is accidental waterfall:
False incrementalism is breaking a large change up into a set of small steps, but where none of those steps generate any value on their own. E.g. you first write an entire new back end (but don't hook it up to anything), and then write an entire new front end (but don't launch it, because the back end doesn't have the legacy data yet), and then migrate all the legacy data. It's only after all of those steps are finished that you have anything of any value at all.
Always keep failure on the table:
If a 3-month rewrite is economically rational, but a 13-month one is a giant loss, you'll generate a lot value by realizing which of those two you're actually facing.
I really wish I’d thought of applying “The Shrink Ray”, an idea borrowed from Kellan Elliot-McCrea:
We have a pattern we call shrink ray. It's a graph of how much the old system is still in place. Most of these run as cron jobs that grep the codebase for a key signature. Sometimes usage is from wire monitoring of a component. Sometimes there are leaderboards. There is always a party when it goes to zero. A big party.
Engineer your migration scripts as idempotent, repeatable machines. You’re going to run them a lot:
Basically, treat your migration code as a first class citizen. It will save you a lot of time in the long run.
Finally, you should fear rewrites, but developing the skill to pull them off is huge:
I want to wrap up by flipping this all around -- if you learn to approach your rewrites with this kind of ferocious, incremental discipline, you can tackle incredibly hard problems without fear.
Whenever I talk to people with monolithic applications, slow-running test suites, and an urge to do something drastic, I want to mind-meld the ideas above into their brains. You can turn code around, but it takes time, patience, and a culture of relentless improvement and quality to make it happen.