
Architecting for AI-native organizations: Lessons from the field


Written by Santeri Salonen, AI Architect

As an AI architect, I've been involved in a lot of generative AI projects over the past few years. Some worked, but most didn't make it to production, including several of my own.

The failure rate usually isn't about a lack of technical skills. It's about building the wrong things in the wrong ways.

In this post, I'll share what I've learned.

 

1. Solve your problems, not generic ones

Avoid: Building custom tools for generic tasks, such as PDF processing, PowerPoint generation, or Excel manipulation. 

Do this instead: Focus on processes unique to your work or your industry. Use off-the-shelf products for everything else.

It's tempting to build custom tools for PDF processing, document generation, or Excel manipulation. The problems are well-defined, and we know when we are done. 

But wait. Are we now building the same thing as Microsoft? Or Google?

Meanwhile, the workflow that's genuinely unique to our business sits untouched.

The difficult part is figuring out whether a problem is specific to us or whether every other company has the same need. If it's generic, someone with more resources is likely already solving it. We should save our capacity for problems only we can solve.

 

2. Question necessity before adding complexity

Avoid: Multi-agent systems by default, shiny new protocols, heavy frameworks, or "AI agents" as buzzword compliance.

Do this instead: Single agent + multiple tools → multi-agent only when forced.

A demo from a major cloud provider. Five specialized agents: router, research, synthesis, quality, and formatting. A travel planning query bounces between agents through message queues and shared state.

47 seconds and six API calls later, we get an answer.

The obvious question: what if we just gave all that context to one agent with the same tools?
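
To make that concrete, here is a minimal sketch of the single-agent shape: one loop, one model call at a time, and plain Python functions as tools. The call_llm helper, the tool names, and the canned responses are all placeholders invented for illustration, not any particular provider's SDK.

```python
# A minimal sketch of "one agent, many tools". call_llm() is a hypothetical
# stand-in for a single chat-completion call; here it replays canned
# responses so the sketch runs end to end.

def search_flights(destination: str) -> str:
    # Placeholder tool; a real one would call a flights API.
    return f"3 direct flights to {destination} found"

def search_hotels(destination: str) -> str:
    # Placeholder tool; a real one would call a hotels API.
    return f"12 hotels available in {destination}"

TOOLS = {"search_flights": search_flights, "search_hotels": search_hotels}

_CANNED = iter([
    {"tool_name": "search_flights", "tool_args": {"destination": "Lisbon"}},
    {"content": "Found 3 direct flights to Lisbon; here is a draft itinerary."},
])

def call_llm(messages: list[dict]) -> dict:
    # A real implementation would send `messages` to your provider and let
    # the model either answer or request one of the tools above.
    return next(_CANNED)

def run_agent(user_query: str) -> str:
    messages = [
        {"role": "system", "content": "You are a travel planner. Use tools when needed."},
        {"role": "user", "content": user_query},
    ]
    while True:
        reply = call_llm(messages)
        if reply.get("tool_name"):  # the model asked for a tool
            result = TOOLS[reply["tool_name"]](**reply["tool_args"])
            messages.append({"role": "tool", "content": result})
        else:  # the model produced the final answer
            return reply["content"]

print(run_agent("Plan a long weekend in Lisbon"))
```

No router, no queues, no shared state: the context travels in one message list, and each new capability is just another entry in TOOLS.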

We see humans organize into teams with specialized roles, so we assume LLMs should too. But LLMs don't have the same constraints. The model already has whatever knowledge it was trained on. There's no specialist to consult. No one's on vacation.

Real companies treat coordination overhead as something to minimize, because coordination cost grows faster than team size. No one starts a company with an org chart; we add one reluctantly when we have to.

The same applies to architectural complexity.


Photo from Recordly's 22nd hackathon in October 2025.

 

3. Build prototypes first, architecture second

Avoid: Traditional waterfall, i.e. architecting the data layer, building processing pipelines, and locking in the architecture before seeing user reactions.

Do this instead: LLM-assisted rapid prototyping with mock data and real user feedback.

Kimi Räikkönen once told his team mid-race: "Leave me alone, I know what I'm doing." That confidence was earned through direct experience.

We can't actually know what we're doing until we've put something in front of real users. The question is whether we earn that understanding after a week of prototyping or after nine months of architecture.

A company wants an AI text-to-SQL solution. Business users should be able to ask "how much revenue did we generate last quarter for product X?"

Problem: no data platform. The sales data sits in Salesforce, product data in spreadsheets, and customer data in three different systems. The textbook approach is to spend 6-9 months building a data warehouse first.

Or we could build the UI in a week. Mock up the chat interface. Wire up one or two real queries, and put it in front of actual users.

This way, we learn what questions people actually ask, which ones they care about, and how they phrase things. Maybe 80% of queries only need two data sources, not all of them.
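
As a sketch of what that week-one prototype might look like: one mocked data source in SQLite and a placeholder where the LLM call would go. The table, the numbers, and the generate_sql helper are all made up for illustration.

```python
# A week-one prototype sketch: mock data in SQLite, with a hard-coded SQL
# "generation" step standing in for the LLM call.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, quarter TEXT, revenue REAL);
    INSERT INTO sales VALUES ('Product X', '2025-Q3', 125000.0),
                             ('Product X', '2025-Q2',  98000.0),
                             ('Product Y', '2025-Q3',  40000.0);
""")

SCHEMA = "sales(product TEXT, quarter TEXT, revenue REAL)"

def generate_sql(question: str, schema: str) -> str:
    # Placeholder for the LLM call that turns a question plus the schema
    # into SQL; hard-coded so the prototype runs end to end.
    return ("SELECT SUM(revenue) FROM sales "
            "WHERE product = 'Product X' AND quarter = '2025-Q3'")

def answer(question: str) -> float:
    sql = generate_sql(question, SCHEMA)
    return conn.execute(sql).fetchone()[0]

print(answer("How much revenue did we generate last quarter for product X?"))
```

The point isn't the code; it's that real users can type real questions into something like this within days, and every question they ask tells us what the eventual data layer actually needs to cover.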

A warning though: don't ship PoCs that run on localhost. A demo that can't integrate into real workflows is just theater, not progress.

We can fix the architecture later, but what we can't fix is building the wrong thing.

 

4. Optimize for learning velocity, not abstraction

Avoid: Heavy frameworks that abstract away LLM interactions and create layers between you and actual problem-solving. 

Do this instead: Thin wrappers that keep you close to the LLM API where real value is created.

Generative AI is not a mature field; it's a rapidly evolving one.

In mature fields, we optimize for efficiency. Standards emerge. Best practices solidify. We pick proven frameworks and follow established patterns.

In evolving fields, we optimize for adaptation. 

LLM APIs have added a lot in the past 18 months: structured chat messages, inline images, inline PDFs, native function calling, structured output schemas, streaming responses. The next 18 months will likely bring just as much.

Consider two teams building a customer support chatbot: 

Team A: 200 lines of custom code wrapping the API. When inline PDFs launched, they added 15 lines. When structured outputs arrived, 20 more. Each update took about an hour.

Team B: A popular framework with 100+ dependencies. When inline PDFs launched, they waited three weeks for the maintainers to add support. When structured outputs arrived, they upgraded the framework version, which broke their existing tool-calling code. Two days debugging dependency conflicts.

Team A experiments with new capabilities the day they launch. Team B spends more time fighting their tooling than solving user problems.

The real issue is feedback loop speed. Heavy abstractions slow it down. When something breaks, we debug the framework, not our actual problem. When new capabilities arrive, thick layers make them hard to access. The framework's opinions become our constraints.

The LLM API is where the actual work happens. Every layer between us and that API is friction.
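
For a sense of scale, a thin wrapper can be a single function. The sketch below assumes an OpenAI-style chat completions endpoint; the URL, model name, and parameters would change for another provider, but the shape stays the same.

```python
# A thin-wrapper sketch, assuming an OpenAI-style chat completions API.
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"

def chat(messages: list[dict], model: str = "gpt-4o", **extra) -> dict:
    """One function, no framework: send messages, return the raw response.

    New API capabilities (response_format, tools, ...) pass straight through
    via **extra, so adopting them is a one-line change at the call site
    rather than a framework upgrade.
    """
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": model, "messages": messages, **extra},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()

# Example: trying structured outputs the day they launch is just another
# keyword argument, not a dependency bump.
reply = chat(
    [{"role": "user", "content": "Summarise this ticket as JSON."}],
    response_format={"type": "json_object"},
)
print(reply["choices"][0]["message"]["content"])
```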


Photo from a client AI workshop earlier this year. 

 

Remember that generative AI is a tool

Generative AI isn't a magical value creator. It's a tool. And like any tool, it only creates value when we use it to solve our real problems.

Foundation model APIs from OpenAI, Anthropic, and Google handle the heavy lifting. They understand text, images, and PDFs. They follow instructions, use tools, and produce structured outputs.

The hard parts are everything around the API call: what context to provide, how to structure the interaction, when to ask for clarification, how to handle failures, and what to do when the output is wrong.
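
To illustrate one of those problems, here is a sketch of handling a wrong output: validate the model's reply against a simple contract and retry with feedback before giving up. It reuses the hypothetical chat() wrapper sketched above, and the ticket-classification contract is invented for illustration.

```python
# A sketch of output validation with retries, building on the chat()
# wrapper from the previous section; the JSON contract is hypothetical.
import json

REQUIRED_KEYS = {"category", "urgency"}

def classify_ticket(text: str, max_retries: int = 2) -> dict:
    messages = [
        {"role": "system", "content": "Reply with JSON containing 'category' and 'urgency'."},
        {"role": "user", "content": text},
    ]
    for _ in range(max_retries + 1):
        reply = chat(messages, response_format={"type": "json_object"})
        content = reply["choices"][0]["message"]["content"]
        try:
            data = json.loads(content)
            if REQUIRED_KEYS.issubset(data):
                return data  # output matches the contract
        except json.JSONDecodeError:
            pass
        # Feed the failure back and ask again instead of crashing.
        messages.append({"role": "assistant", "content": content})
        messages.append({"role": "user",
                         "content": "That was not valid. Return only the required JSON."})
    raise ValueError("No valid output after retries; route to a human fallback.")
```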

These problems don't get solved by waiting for GPT-6 or Claude 5. They get solved by putting working systems in front of real users.

It's tempting to focus on the wrong things. 

Fine-tuning our own model sounds cool. But the next generation of foundation models will likely outperform our fine-tuned model out of the box. They'll probably ship before our fine-tuning project completes anyway.

Another trap is adding guardrails until we get deterministic behavior. Enough guardrails and we've rebuilt a rule-based system – just slower and more expensive. The power of an LLM lies in generalization, which essentially is useful hallucination. If we need 100% predictability, we should write code.

My final advice: build simple systems, put them in front of users quickly, learn what actually matters, iterate, and keep learning.
