Cloud LLM bills add up. For personal automation and sensitive workflows, I run Ollama locally and orchestrate with n8n + Python.
Architecture (Job Hunt Automation pattern)
n8n cron → Next.js API → Python ingest → Ollama tailor → Gmail PDF → Postgres
- Discover jobs via Playwright
- Filter in dashboard
- Compose cover letters + emails with local model
- Track applications and replies
When local wins
- Secrets stay on your machine (encrypted git-crypt)
- Unlimited iterations during prompt tuning
- No per-token anxiety for batch jobs
When cloud wins
- Highest reasoning quality for architecture decisions
- Multimodal (screenshots, PDFs) at scale
- SLA-backed APIs for customer-facing agents
Hardware reality (Mac / Linux)
- 8–16GB RAM: 7B–8B quant models for drafting
- 32GB+: 14B models for better instruction following
- Always measure time-to-first-token vs cloud latency
Takeaway
Hybrid LLM strategy: Ollama for volume and privacy; GPT/Claude for customer-facing agent peaks. Engineer the router — don't pick one religion.



