Abstract
We compare domain-tuned retrievers against leading foundation models under identical retrieval and compute budgets: Finance and Clinical corpora at top-K=5 (with and without rerankers), UniProtKB at top-K=2, shared GPT-4o-mini generation, and a t3.large EC2 instance for the local models. We find that dense retrievers specialized on production queries and paired with a cross-encoder reranker consistently yield higher faithfulness and answer relevancy while reducing catastrophic errors.
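The setup summarized above amounts to a retrieve, optionally rerank, then generate pipeline evaluated under fixed budgets. The sketch below is a minimal illustration under stated assumptions, not the study's implementation: the model checkpoints (all-MiniLM-L6-v2, ms-marco-MiniLM-L-6-v2), the in-memory corpus, and the prompt are placeholders; only the top-K budgets, the optional cross-encoder reranking, and the shared GPT-4o-mini generation step come from the abstract. Faithfulness and answer relevancy would be scored downstream on the returned answers.

```python
# Minimal sketch of the shared retrieve -> (rerank) -> generate pipeline.
# Model names, the in-memory index, and the prompt are illustrative
# placeholders, not the checkpoints or code used in the study.
from sentence_transformers import SentenceTransformer, CrossEncoder, util
from openai import OpenAI

TOP_K = 5  # 5 for the Finance/Clinical corpora, 2 for UniProtKB

retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # placeholder bi-encoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")            # placeholder reranker
client = OpenAI()  # expects OPENAI_API_KEY in the environment


def answer(query: str, corpus: list[str], use_reranker: bool = True) -> str:
    # Dense retrieval: embed the corpus and the query, keep the top-K passages.
    corpus_emb = retriever.encode(corpus, convert_to_tensor=True)
    query_emb = retriever.encode([query], convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=TOP_K)[0]
    passages = [corpus[h["corpus_id"]] for h in hits]

    # Optional cross-encoder reranking of the retrieved passages.
    if use_reranker:
        scores = reranker.predict([(query, p) for p in passages])
        passages = [p for _, p in sorted(zip(scores, passages), reverse=True)]

    # Shared generation step: GPT-4o-mini answers from the retrieved context.
    context = "\n\n".join(passages)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```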