Quest for Best Practices after Configure Experiences / Crash Recovery

Hello:
I'm starting a new day looking at a damaged multisite install. Not sure how I got here, but I'm darned sure I don't want to be here again. The crashed site is one problem, but it illustrates bigger issues:

  • Why can't I fix it?
  • Why did it break in the first place?
  • What should I do in the future to prevent this kind of mess?

My errors:

  1. I believed that if the database was okay, the install was okay. Silly me!
  2. I overlooked that the software backup must be snapshot-consistent with the DB.
  3. I overestimated the stability of WordPress when plugins change.
  4. I assumed that WordPress diagnostic facilities would enable me to isolate and resolve problems.

So... I'm either missing something basic, or I need to go back to the drawing board. I probably need to solve both problems.

Wish List:

  • Some way to diagnose & fix issues by isolating code paths, variables & such
  • A sandbox utility to quickly and easily clone a production environment to test upgrades before going live
  • Ideally two server instances with the ability to flip from one to the other

Specific Failures:

  • define('WP_DEBUG', true); was worse than useless. A single home page refresh spewed thousands of deprecated-function warnings into the log (it is still running), which is not good for my confidence;
  • AND... it didn't present any information that seemed relevant to the dead console issue;
  • The utter absence of any ability to get or create a stack trace, or to install diagnostic messaging, leaves me stranded;
  • My attempts to create a trace with a previously installed logger failed: it wouldn't chirp to the browser console most of the time. I'm guessing the utility depends on reaching a "page-footer", which never happens because the do_action hits a null hook and wipes out;
  • Repeated code in themes, plugins & such confounds my sadly meager ability to predict which code path will run.
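
For reference, here is the WP_DEBUG setup from the first bullet, extended with the companion constants that, as far as I know, WordPress's own debugging documentation describes. This is only a sketch of a wp-config.php fragment. It assumes the default wp-content location for the log file, and it does nothing to reduce the volume of deprecation warnings; it only keeps them out of the rendered page and in one file.

```php
// Goes in wp-config.php, above the "stop editing" comment.
define( 'WP_DEBUG', true );          // turn on WordPress debug mode
define( 'WP_DEBUG_LOG', true );      // write notices and errors to wp-content/debug.log
define( 'WP_DEBUG_DISPLAY', false ); // keep them out of the HTML output
@ini_set( 'display_errors', 0 );     // ask PHP itself not to print errors either
```

With this in place the log can be read or tailed from the server even when the dashboard itself is unusable.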

So the problem set has come full circle: "don't know", "can't guess" and "can't discover".
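
On the "can't discover" point: plain PHP can record a stack trace without depending on any page-footer hook, by writing straight to the error log. What follows is a minimal sketch, assuming it would live in a small must-use plugin file. The function name trace_to_log is hypothetical and not part of WordPress; debug_backtrace() and error_log() are standard PHP.

```php
<?php
/**
 * Hypothetical helper (not a WordPress API): write the current call
 * stack to the PHP error log and return the same text.
 */
function trace_to_log( string $label ): string {
    // Skip argument capture to keep the output small.
    $frames = debug_backtrace( DEBUG_BACKTRACE_IGNORE_ARGS );
    $lines  = [];
    foreach ( $frames as $i => $frame ) {
        // 'file' and 'line' can be missing for internal calls.
        $file    = $frame['file'] ?? '[internal]';
        $line    = $frame['line'] ?? 0;
        $lines[] = sprintf( '#%d %s:%d %s()', $i, $file, $line, $frame['function'] );
    }
    $message = $label . "\n" . implode( "\n", $lines );
    error_log( $message ); // goes wherever PHP's error_log setting points
    return $message;
}
```

Calling trace_to_log('before do_action') just ahead of the suspect do_action should record how execution reached that point. With WP_DEBUG_LOG enabled, the output would normally land in wp-content/debug.log alongside the other messages.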

Specific Questions:

  1. Is there some debug/diagnostic facility that I've missed?
  2. Is there an easy way to clone an instance for testing?
  3. How should I go about diagnosing my dead console?