The AI Engineering Stack I Actually Use

A pragmatic look at what tools, models, and patterns are worth the hype in 2026 — from someone building things with them daily.

AI engineering, tools, LLMs, stack

Every week there's a new model or framework claiming to change everything. Most of it is noise. Here's what's actually in my stack after building AI-driven projects for the past two years.

Models

Claude for most generation tasks — coding assistance, content, structured output. The reasoning capabilities on complex, multi-step problems are consistently better than alternatives I've tried.

Gemini Flash for high-volume, latency-sensitive tasks where cost matters. The speed-to-quality ratio is hard to beat for things like classification or summarization at scale.

Local models via Ollama when I need offline capability or privacy guarantees. Mistral 7B and Llama 3 cover most of my local needs. The gap with frontier models is real but narrowing fast.
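As a sketch of what the local path looks like: Ollama exposes a small HTTP API on its default port, so a non-streaming call needs nothing beyond the standard library. This assumes the server is running locally and the model has already been pulled; the helper names are mine, not Ollama's.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port


def build_payload(prompt: str, model: str = "mistral") -> bytes:
    """Request body for Ollama's /api/generate endpoint."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()


def generate(prompt: str, model: str = "mistral") -> str:
    """Blocking, non-streaming call against a locally running Ollama server."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because it's just HTTP on localhost, swapping `"mistral"` for `"llama3"` is the whole migration.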

Frameworks

I've tried most of the orchestration frameworks. My current opinions:

Anthropic SDK / OpenAI SDK directly — Just use these. They're good, they're maintained, and you won't spend hours debugging which layer of abstraction swallowed your tool call.
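A minimal sketch of what "just use the SDK directly" means in practice, assuming the `anthropic` package and an `ANTHROPIC_API_KEY` in the environment; the model id is an example, and the payload helper is my own split, kept separate so it can be tested without a network call.

```python
def build_request(prompt: str, model: str = "claude-sonnet-4-20250514") -> dict:
    """Assemble the keyword arguments the Messages API expects."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(prompt: str) -> str:
    """One direct call, no orchestration layer in between."""
    import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from env

    client = anthropic.Anthropic()
    message = client.messages.create(**build_request(prompt))
    return message.content[0].text
```

When a tool call goes missing, there is exactly one place to look: the request you built and the response you got back.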

Structured output — I do almost all structured extraction with JSON mode or tool use rather than parsing unstructured text. Much more reliable.

Dev environment

Claude Code — AI-assisted coding inside the terminal. I use it for most things I'd previously have taken to a search engine. It's most useful for tasks with clear context: refactoring a specific function, generating boilerplate, explaining unfamiliar code.

Antigravity — VS Code-based IDE by Google. Familiar environment, clean integration with the Google ecosystem, fast to set up.

GitLab — Version control and CI/CD in one place. The built-in pipelines handle a lot: testing, building, deploying — no external services needed. For solo projects, often more than enough.

Claude Cowork — For everything around data preparation and research. Structuring, summarizing, making sense of information — before the actual development process starts. Saves a lot of time on groundwork.

Higgsfield.ai — AI-generated background graphics and videos. Get to usable visuals fast without spending hours in image editing.

The key thing I've learned: keep your context tight. A small, focused conversation gets better results than a long one dragging lots of accumulated context.

What I'd change

If I were starting fresh today:

  • I'd spend more time on evals earlier. The hardest part of AI engineering isn't building the first version — it's knowing when it's actually working well enough to ship.
  • I'd be more skeptical of RAG as a default answer. Vector search over embeddings is powerful, but it's also easy to build something that seems to work in demos and fails in production.
  • I'd design for model upgrades from the start. Models improve fast. If your system is tightly coupled to a specific model's quirks, every upgrade is a migration project.
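On the last point, the decoupling can be as small as routing every completion through a thin interface, so a model swap is a config change rather than a codebase migration. The names here are illustrative, not a real library; the fake backend stands in for a vendor SDK call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelConfig:
    """Everything model-specific lives here, not in app code."""
    name: str
    max_tokens: int = 1024


class LLMClient:
    """App code depends on this interface, never on a vendor SDK directly."""

    def __init__(self, config: ModelConfig,
                 backend: Callable[[str, ModelConfig], str]):
        self.config = config
        self.backend = backend

    def complete(self, prompt: str) -> str:
        return self.backend(prompt, self.config)


def echo_backend(prompt: str, config: ModelConfig) -> str:
    """Stand-in for a real SDK call; useful in tests too."""
    return f"[{config.name}] {prompt}"
```

Upgrading a model then means changing one `ModelConfig` and re-running your evals, not hunting down every call site that leaned on the old model's quirks.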

The field moves fast. What I wrote here will be at least partially wrong in six months.