Our team recently lost a critical member suddenly and unexpectedly. This has caused a great deal of consternation, as he was the glue that held our build system together. For years my company has given short shrift to devops, figuring that good enough was good enough.
This developer was originally hired to do devops for the team, but, as he had the desire and skills to move into a developer role, he was fairly quickly spending half of his time (at least) writing code. I had worked with him over several years on getting the build system to the point where it ran along with only irregular maintenance, and, frankly, we ignored the odd collection of servers on which we were running.
There were always fires to put out or new products to get off the ground and to customers, so sitting down and actually planning out our infrastructure never seemed all that exciting or important.
I'm sure many of you know how this goes. With that critical piece of the team gone, we've been scrambling to figure out who owns which servers (this is part of a large corporation, and we've got servers owned and managed by at least three different entities), what exactly each one does, and what login credentials we have.
I'm not blaming the other developer, as I'm also to blame. Frankly the entire lab shares some responsibility as well. That's not really important at this point. What is important, however, is learning from the mistake and getting our system not only back up and running, but also documented and understood to the point that we can hire someone to manage it and improve it.
There are always excuses for not getting the build (or the project, or the test harness) documented, and sometimes those excuses are valid. Unfortunately, there frequently comes a time when you have to pay for that technical debt.
Here's hoping that I personally and my team in general will learn and grow from this experience and find the right balance between moving quickly and ensuring our systems and designs are stable.