When Your Verify Script Becomes the Bottleneck

TL;DR

A verify script is supposed to preserve confidence, not consume the day. When the command grows past an hour, the answer is not to weaken it. Refactor it like production code: measure first, split it into gates, cache repeated reads and scans, add timing, support safe partial diagnostics, and ensure any resume path proves freshness before reusing evidence. The full verify command should remain the final authority.

The first version of a verify: script usually feels clean. It runs the checks you care about. It gives you one command to trust. It turns “the model said it fixed it” into “the system still behaves the way we said it should behave.”

That is the right instinct. But then the project grows.

The script starts checking documentation readiness, package freshness, runtime evidence, dependency governance, workflow safety, UI proof, source-input freshness, external blockers, schema compatibility, and regression coverage. Each check is reasonable. Each one protects something real.

Then one day, the command takes more than an hour.

At that point, the verifier has a new failure mode: people stop running it.

That is dangerous because the slow verifier is still carrying product knowledge. It knows which warnings are blocking, which artifacts are stale, and which evidence can be trusted. The fix is not to delete checks, hide failures, or turn blocking issues into warnings. That makes the command faster by making it less honest.

The better fix is to treat the verifier as a product.

Start with measurement. Capture a cold baseline before changing behavior. How long did the full run take? Which gates existed? How many recursive scans happened? How often did you parse the same JSON files, hash the same inputs, or spawn nested PowerShell processes? If you do not measure first, you are guessing.

Next, modularize the script. The top-level command should become an orchestrator, not a thousand-line bag of gate logic. Each gate should have one job: return a normalized result, declare its source inputs, and report its timing. The final report can still preserve the same fields, formulas, and readiness semantics.

Then add a real cache. Most verifier slowness is repeated work: reading the same files, walking the same directories, parsing the same package files, hashing the same evidence, and rechecking the same workflows. A process-local cache does not weaken the verifier. It just prevents the verifier from proving the same fact 10 times in a single run.

The next improvement is diagnostic partial runs. After changing one area, you should be able to run only the relevant gate and its dependencies, rather than the whole suite. But partial runs must stay honest. They should write partial artifacts. They should never refresh the canonical final report. They should never claim release readiness.

Resume support follows the same rule. Reusing a previous gate artifact is fine only when freshness is proven: the same relevant source inputs, the same verifier fingerprint, the same toolchain assumptions, the same Git state, and fresh dependency artifacts. If any of that changed, rerun the gate.

Finally, make runtime performance itself part of readiness. The verifier should report slowest gates, slowest operations, cache stats, subprocess durations, and timeout behavior. A verifier that cannot explain where its time went is not production-ready.

The goal is not merely a faster script. The goal is a verifier that people will actually run, with the same strictness and clearer evidence. Moving fast is useful. Moving fast with a one-hour command nobody wants to run is just delayed drift. The verify script keeps the vibes honest only if it stays usable