A bit of background:
We have a test infrastructure that is fairly large and has grown over a number of years to the state it's in today. Like any piece of code that has grown and extended and morphed over time, it's gotten a bit crufty in areas.
A couple of engineers wanted to add a new feature to the test framework, so that tests could set different timeouts for setup, test run, and teardown. As they started to get into it, they discovered that adding the feature would be a lot easier if they could do some refactoring as well.
So they did. By this point they were pretty deep in the test infrastructure code and touching a lot of things that are used by a lot of tests.
We walked in the next day to a hundred or more machines that had all leaked out of tests. Tickets were being autologged constantly, no one could reserve machines, and it took a few hours to clean everything up.
Blowups happen, and in the land of disasters this was pretty darn minor. Our automated tests had one bad night, and our defect tracking system and reservation code got a bit of a workout. That was it. No customers were affected, no releases slipped, and no smoke emerged from the lab.
I'd rather see my team tackle the big problems and occasionally fail (and failure really doesn't happen that often) than have them be afraid to try things. It's important to go in there and refactor code that's getting crufty. It's important to extend and enhance the test infrastructure. Let's not let fear of breaking something get in the way of that.
It's okay to fail. Most of the time you will succeed, and in any case, it's better to fail than to not try.