When to Fine-Tune vs. When to Prompt

The Central Question

You have a task: classify support tickets, write in your brand voice, extract structured data from medical records. Should you fine-tune a model for it, or should you write a good prompt and use a general-purpose model? This question comes up constantly in production AI development, and the wrong answer is expensive either way.

Arguments for Prompting (Start Here)

Prompting should be your default. The reasons:

No training cost: A good prompt costs API calls to develop. Fine-tuning costs GPU hours plus your engineering time.
No deployment overhead: You use the same API endpoint. Fine-tuned models require separate serving infrastructure.
Updatable: Changing a prompt takes minutes. Updating a fine-tuned model requires a new training run.
Composable: A well-prompted model can handle multiple tasks; a fine-tuned model is optimized for one.

A surprising number of tasks that seem to require fine-tuning can be handled with thorough prompting: clear instructions, few-shot examples, schema definition, and output validation. Before fine-tuning, spend a week on prompt engineering.

When Fine-Tuning Makes Sense

Fine-tuning is worth the investment when:

Quality ceiling: You've exhausted prompt engineering and can't reach acceptable quality. Fine-tuning often closes the remaining gap.
Latency: Prompts that include many few-shot examples use many tokens. Fine-tuning "burns in" examples, reducing prompt length and latency.
Cost at scale: If you're making millions of API calls, a smaller fine-tuned model may be dramatically cheaper than a large prompted model.
Proprietary format: Teaching a model your specific data format (an unusual citation style, a proprietary schema) is much more reliable via fine-tuning than prompting.
Sensitive data: If sending data to an external API is unacceptable, you must use a self-hosted fine-tuned model.
Style/voice: Adapting to a specific writing style (a brand voice, a person's communication style) works better with fine-tuning than prompting for subtle differences.

The Decision Framework

Simple decision tree:

Have you tried a thorough prompt with 5+ examples? If not, do that first.
What's the quality gap? If prompting gets you to 85% of your target quality, fine-tuning may close the remaining 15%. If prompting gets you to 40%, re-examine your task definition.
What's your volume? Under 100K calls/month, prompting is almost always cheaper. Over 1M calls/month, calculate the cost crossover.
Can you tolerate 2-4 week iteration cycles? Fine-tuning experiments take time. If you need to iterate fast, stick with prompting.

Prompt + Fine-Tune Hybrid

The dichotomy is false. Many production systems fine-tune a model for the task's domain and format, then use prompting to handle task-specific instructions that change across requests. The fine-tune handles stable, high-value adaptations; the prompt handles dynamic, request-specific customization.