July 1, 20265 min read

How to Make a Local QLoRA Starter Fail Safely

A local QLoRA starter should prove data, GPU safety, metrics, tests, and blockers before it claims progress. Here is the small loop I use on owned hardware.

#local AI #QLoRA #RTX 5090 #fine tuning #5090 reports

Share LinkedIn

A local fine-tune should not get credit because a Python process started.

It gets credit when it proves the data path, resource limits, metric output, tests, and blockers. That is the bar I want for owned hardware. The model run can fail. The run log should not lie.

Summary: On July 1, 2026, my Phase 3 5090 starter loaded a verified 150-row Phase 2 dataset, ran with TRUNCATE=1 and SCALE=5, wrote 5 metric rows, and passed 7 tests. It did not claim a full model improvement. The point was a safe QLoRA starter that leaves proof and records why the real train is still blocked.

fail-safe local QLoRA starter loop

What problem does a local QLoRA starter need to solve?

Local model work fails in boring ways. The dataset path is wrong. A checkpoint is partial. GPU memory spikes. A script appends stale metrics. A test passes against toy rows instead of the real training data.

That is why my first goal is not a big benchmark. It is a starter loop that fails safely. If the model load works, great. If it does not, the run should still leave a clean metric row, a clear fallback label, and a blocker I can fix later.

What should the first run prove?

The first run should prove the data is real and the output count is exact.

For my July 1 Phase 3 run, the loader read the Phase 2 dataset with 150 verified rows and provenance. The direct run used TRUNCATE=1 and SCALE=5. That made success measurable: exactly 5 metric rows written to the metrics file.

That is much better than "training started." A started run can still be junk. A counted output gives the next verifier something to check.

How do I keep owned hardware safe?

Put resource checks in the script, not in a sticky note.

In my starter, _check_resources() calls nvidia-smi before the heavy load and before each iteration. The script waits if GPU memory is above 72 percent or temperature is above 82 C. Those numbers fit this box and this run. Your thresholds should fit your card, cooling, and risk tolerance.

The important part is not the exact number. The important part is that the limit is code. The run logs the check before it touches the expensive path.

Why is fallback not cheating?

A fallback is honest when it is labeled.

My starter attempts the Unsloth QLoRA path. If model load or train fails, it writes stub_qlora metrics and logs the fallback. That is not a claim that the model improved. It is a claim that the pipeline survived, wrote proof, and recorded why the real train is blocked.

That matters for unattended work. A crashed script can hide the useful failure. A labeled fallback keeps the queue moving while preserving the truth.

What tests matter before I trust the run?

Test the things that would make the claim false.

For this starter, the test suite checks the real dataset load, the expected row count from SCALE=5, CLI environment behavior, row integrity, sync hashes, no shadow files, and current scratch logs. The useful number was not "many tests." It was 7 tests aimed at the actual failure modes.

If a future change breaks the data path, metric count, sync, or proof log, the starter should fail loudly before I trust the run again.

What should the public claim say?

The public claim should be smaller than the ambition.

For this run, the honest claim is "starter complete." It is not "fine-tune complete." The blockers are still real: model weights, CUDA and WSL limits, memory fit, and the full train path. The artifact should say that plainly.

That is the difference between publishing progress and publishing theater. A local AI report can be useful before the model is fully trained if it tells the truth about what passed, what failed, and what proof exists.

What is the smallest useful version?

Use one dataset, one capped runner, one metrics file, and one test command.

Start with a tiny scale. Make the output count exact. Add GPU checks before the expensive call. Write a fallback row only if it is visibly marked. Then run the tests from a clean shell.

Do that before you tune rank, batch size, learning rate, or export format. If the starter cannot prove 5 safe rows, it has no business claiming a larger run.

Accompanying prompt

What the prompt does: It turns a local fine-tuning idea into a small fail-safe starter with proof, resource checks, and honest blockers.

Copy/paste this prompt:

Copy-ready prompt

Paste the exact block into your coding agent.

No article chrome, no footnotes, no formatting drift.

Role: You are designing a fail-safe local QLoRA starter for owned hardware. Context: Hardware: [GPU, VRAM, cooling notes] Dataset: [path, row count, provenance, privacy limits] Current goal: [what the first run should prove] Task: 1. Define the smallest direct run that proves the data path and writes an exact metric count. 2. Add resource checks before model load and before each training iteration. 3. Define the fallback behavior if model load or training fails. 4. Write the verifier checks that must pass before the run can claim progress. 5. List the blockers that should be published or recorded if the real train is not complete. Output: - A one-command starter run. - A metrics schema with required fields. - A test checklist tied to the claim. - A blocker list with owner, next action, and proof needed. Constraints: - Keep the first run small. - Use exact file paths and row counts when available. - Label any stub or fallback metric clearly. - Do not claim model improvement without a verifier result.

31 lines1020 chars

Ready

Copy the block above.

If you are running local agents or training loops that can spend tokens while you are away, put hard rails around them. AgentGuard sets budget, token, and rate limits for agent runs. Install it with pip install agentguard47, or read the docs at https://bmdpat.com/tools/agentguard

Get the local AI lab notes

Benchmark rows, VRAM fit checks, quant choices, and what actually runs on consumer GPUs. M-F, only when there is something worth sending.

Patrick Hughes

Building BMD HODL — a one-person AI-operated holding company. Nashville, Tennessee. Twenty-Two agents.