
Doc-to-LoRA

Internalize a document into a small language model (Gemma 2 2B) using Doc-to-LoRA so it can answer questions WITHOUT the document in the prompt. Use when the...

46 downloads
Free
Reviewed

Doc-to-LoRA Skill

Internalize any document into a small model's weights in seconds. No fine-tuning loop, no RAG retrieval at query time. The model "knows" the document.

How It Works (30-second summary)

A trained hypernetwork reads your document and instantly generates LoRA adapter weights for every layer of Gemma 2 2B. The adapter is applied to the base model, which can then answer questions about the document without it being in the prompt.

Document --> Context Encoder --> Perceiver --> HyperLoRA --> LoRA weights
                                                                |
                                                    Apply to Gemma 2 2B
                                                                |
                                                    Answer questions (no doc in prompt)

For architecture details, read references/ARCHITECTURE.md in this skill directory.
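
To make the "apply to the base model" step concrete, here is a minimal sketch of how a LoRA update changes one weight matrix. It uses the standard LoRA formulation, W' = W + (alpha / r) * B A, and is not code from this skill. The matrix sizes, the `alpha` value, and the function names are illustrative assumptions. In Doc-to-LoRA the A and B factors would come from the hypernetwork instead of a training loop.

```python
# Illustrative only: standard LoRA update on a single weight matrix.
# Shapes, values, and names are assumptions, not taken from the skill's code.

def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def apply_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    rank = len(lora_a)              # lora_a has shape (rank, in_features)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# A 2x3 base weight with a rank-1 adapter.
base = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]   # (rank=1, in_features=3)
lora_b = [[0.5], [-0.5]]     # (out_features=2, rank=1)

adapted = apply_lora(base, lora_a, lora_b, alpha=2.0)
print(adapted)  # prints [[2.0, 2.0, 3.0], [-1.0, -1.0, -3.0]]
```

The base weights stay untouched, so swapping documents only means swapping the small A and B factors.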

Prerequisites

Run setup once. This installs dependencies and downloads model weights (~7GB total).

bash ${CLAUDE_SKILL_DIR}/scripts/setup.sh

If setup was already completed, skip this step. Check with:

test -d trained_d2l/gemma_demo && echo "Weights present" || echo "Run setup first"
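
The same check can be done from Python, for example at the top of a wrapper script. This is a sketch, not part of the skill. It looks for the checkpoint file used in the commands in this document, which is slightly stricter than the directory test above. The function name `weights_status` is hypothetical.

```python
from pathlib import Path

# Checkpoint path as used by the commands in this document.
CHECKPOINT = Path("trained_d2l/gemma_demo/checkpoint-80000/pytorch_model.bin")

def weights_status(root="."):
    """Report whether the checkpoint file exists under the given root directory."""
    path = Path(root) / CHECKPOINT
    if path.is_file():
        return f"Weights present: {path}"
    return "Run setup first"

if __name__ == "__main__":
    print(weights_status())
```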

Workflow A: PyTorch Path (simpler, ~10GB RAM)

Use this when the user provides a document and wants answers.

Step 1: Internalize a document

python ${CLAUDE_SKILL_DIR}/scripts/internalize.py \
  --input "path/to/document.txt" \
  --checkpoint trained_d2l/gemma_demo/checkpoint-80000/pytorch_model.bin

Or pass text directly:

python ${CLAUDE_SKILL_DIR}/scripts/internalize.py \
  --text "Paste the document content here..." \
  --checkpoint trained_d2l/gemma_demo/checkpoint-80000/pytorch_model.bin

Step 2: Ask questions

python ${CLAUDE_SKILL_DIR}/scripts/query.py \
  --question "What is the main finding?" \
  --checkpoint trained_d2l/gemma_demo/checkpoint-80000/pytorch_model.bin

For multiple questions, pass them comma-separated:

python ${CLAUDE_SKILL_DIR}/scripts/query.py \
  --question "Question 1?,Question 2?,Question 3?" \
  --checkpoint trained_d2l/gemma_demo/checkpoint-80000/pytorch_model.bin
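
When the questions come from a list, a small helper can build the comma-separated value. The sketch below assumes that query.py splits the `--question` value on every comma, which this document implies but does not state. Under that assumption a question that itself contains a comma would be split in two, so the helper rejects it. `join_questions` is a hypothetical name.

```python
def join_questions(questions):
    """Join questions into one comma-separated string for --question.

    Assumes the script splits on every comma, so a question containing a
    comma is rejected instead of being silently broken in two.
    """
    cleaned = [q.strip() for q in questions if q.strip()]
    for q in cleaned:
        if "," in q:
            raise ValueError(f"Question contains a comma: {q!r}")
    return ",".join(cleaned)

print(join_questions(["Who wrote it?", "When was it published?"]))
# prints Who wrote it?,When was it published?
```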

Workflow B: MLX Path (faster, ~6GB RAM, recommended for Mac)

Use this for best performance on Apple Silicon. Two-phase: export once, query fast.

Step 1: Export LoRA adapter from document

python scripts/export_d2l_to_mlx_adapter.py \
  --checkpoint trained_d2l/gemma_demo/checkpoint-80000/pytorch_model.bin \
  --context-file "path/to/document.txt" \
  --output-dir adapters_d2l

Step 2: Query with MLX (lightweight, Metal-accelerated)

python ${CLAUDE_SKILL_DIR}/scripts/query_mlx.py \
  --adapter-dir adapters_d2l \
  --question "What is the main finding?"

When to Use Which Path

| Scenario | Path | Why |
|---|---|---|
| Quick one-off question about a doc | PyTorch | Simpler, no export step |
| Many questions about the same doc | MLX | Export once, query fast and cheap |
| RAM-constrained (16GB Mac) | MLX | ~6GB vs ~10GB at query time |
| Multiple documents to compare | MLX | Export each, swap adapters instantly |
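
For the multiple-documents case, one way to organize things is one adapter directory per document. The sketch below only builds the export commands with the flags shown in Workflow B and does not run them. The per-document directory layout under `adapters_d2l` and the example file names are assumptions.

```python
import shlex
from pathlib import Path

CHECKPOINT = "trained_d2l/gemma_demo/checkpoint-80000/pytorch_model.bin"

def export_command(doc_path, adapters_root="adapters_d2l"):
    """Build the Workflow B export command for one document."""
    # Hypothetical layout: one output directory per document, named after the file.
    out_dir = Path(adapters_root) / Path(doc_path).stem
    return [
        "python", "scripts/export_d2l_to_mlx_adapter.py",
        "--checkpoint", CHECKPOINT,
        "--context-file", str(doc_path),
        "--output-dir", str(out_dir),
    ]

for doc in ["docs/report_a.txt", "docs/report_b.txt"]:  # hypothetical files
    print(shlex.join(export_command(doc)))
```

Each resulting directory can then be passed to `--adapter-dir` when querying, which is what makes switching between documents cheap.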

Limitations

  • Base model: The released weights support Gemma 2 2B only. A model this small has limited reasoning ability.
  • Document length: Up to ~6144 tokens (~4000-5000 words). Longer docs are chunked.
  • Training required for new base models: To support a different base model, the hypernetwork must be trained first (on 8xA100 GPUs). Only inference is Mac-friendly.
  • Factual recall, not reasoning: Best for "what does the doc say" questions, not deep multi-hop reasoning over the document.
  • No real-time updates: Once internalized, the adapter is static. If the document changes, re-internalize it.
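
How the skill chunks long documents is not described here. If you want to pre-split a long document yourself, the sketch below cuts it into pieces by word count. It uses 4000 words, the lower end of the estimate above, as a rough stand-in for the ~6144-token limit. Word counts only approximate token counts, and `chunk_words` is a hypothetical helper.

```python
def chunk_words(text, max_words=4000):
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

sample = "word " * 9000          # stand-in for a long document
chunks = chunk_words(sample)
print([len(c.split()) for c in chunks])  # prints [4000, 4000, 1000]
```

Splitting on word boundaries can cut a sentence in half. Splitting on paragraphs is a reasonable refinement if that matters for your questions.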

Troubleshooting

| Problem | Fix |
|---|---|
| ModuleNotFoundError: No module named 'ctx_to_lora' | Run setup: bash ${CLAUDE_SKILL_DIR}/scripts/setup.sh |
| FileNotFoundError: trained_d2l/... | Download weights: uv run huggingface-cli download SakanaAI/doc-to-lora --local-dir trained_d2l |
| RuntimeError: MPS backend out of memory | Use MLX path instead, or close other apps |
| ImportError: bitsandbytes | Expected on Mac. The scripts auto-disable quantization on non-CUDA. |
| Answers seem wrong / generic | Check if LoRA is applied: outputs should differ from baseline. Try rephrasing. |
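
The "outputs should differ from baseline" check can be made systematic once you have answers from both the base model and the adapted model. How you collect the baseline answers depends on the scripts and is not shown here. The sketch below only compares two sets of answers. The function name and the sample answers are made up for illustration.

```python
def differing_answers(baseline, adapted):
    """Return the questions whose adapted answer differs from the baseline answer.

    Both arguments map question text to answer text. The comparison ignores
    case and surrounding whitespace.
    """
    def norm(s):
        return s.strip().lower()
    return [
        q for q in baseline
        if q in adapted and norm(baseline[q]) != norm(adapted[q])
    ]

# Made-up answers for illustration only.
baseline = {"Where was Einstein born?": "I am not sure."}
adapted = {"Where was Einstein born?": "Ulm, Germany"}

changed = differing_answers(baseline, adapted)
print(f"{len(changed)} of {len(baseline)} answers changed")  # prints 1 of 1 answers changed
```

If no answers change across several document-specific questions, the adapter is probably not being applied.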

Example End-to-End

User: "Internalize this Wikipedia article and tell me about the person."

# Save the article
cat > /tmp/article.txt << 'EOF'
Albert Einstein was a German-born theoretical physicist...
EOF

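# Internalize + query (PyTorch path)
python ${CLAUDE_SKILL_DIR}/scripts/internalize.py \
  --input /tmp/article.txt \
  --checkpoint trained_d2l/gemma_demo/checkpoint-80000/pytorch_model.bin
python ${CLAUDE_SKILL_DIR}/scripts/query.py \
  --question "Where was Einstein born?" \
  --checkpoint trained_d2l/gemma_demo/checkpoint-80000/pytorch_model.bin
# Expected: "Germany" or "Ulm, Germany"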

Skill Info

Creator: Manojbhat09
Downloads: 46
Published: Mar 15, 2026
Updated: Mar 16, 2026