
Problem Frame: AI Agents


It almost goes without saying that organisations need AI enablement, especially in the era of agent systems. There is too much information on the subject, from CNBC to your not-so-favourite AI evangelist, and it is hard to understand the solutions and where they fit within your organisation. In this guide, I tackle one of the most challenging aspects of deploying an agentic solution at your firm: choosing the right tool. It addresses four questions:

  • What are the options?
  • Where do they fit?
  • What are the costs?
  • Will my data be safe, or will the solution provider use it for its own purposes, such as model training?

This guide works through four steps:

  1. Clear the organisational gates that determine which tools are even eligible.
  2. Understand what is out there in the landscape.
  3. Frame the problem, applying a decision tree to the eligible tools based on your specific use case.
  4. Run a final feasibility gate check before building.

Executive takeaway:

  • Use this guide to prioritise high-ROI workflows and avoid “tool-first” experimentation.
  • The eight gates are a leadership control system: they reduce data/compliance risk, vendor sprawl, and runaway spend before anyone builds.
  • Adopt a standard operating model: 2-week pilot, baseline first, then scale only when results are measurable.

🏢  ENTERPRISE / LARGE CORP: IT security review required for any tool outside the approved tenant. Start every evaluation with Gate 1: can the approved stack (Power Automate, Copilot Studio) deliver this? Only proceed to new vendors when the answer is genuinely no.

🚀  SMB / FREELANCER / INDIVIDUAL: No vendor approval process, so any tool is theoretically available. The gates still apply (cost, data handling, ownership, evidence), but they are self-imposed disciplines rather than organisational requirements. Speed of iteration is a genuine competitive advantage.

Step 1 — Clear the Organisational Gates

Before looking at any tool, answer the eight questions below. In an enterprise context, failing any one of these gates means the automation is not ready to build — regardless of how capable the tool is. For SMBs, these are the disciplines that prevent the most common failure modes.

Rule of thumb: Choose the lightest tool that can reliably deliver the outcome. Do not start from a tool and work backwards to a problem.

| Gate | Question to Answer | What to Check |
|---|---|---|
| 1. Approved Stack | Can this be delivered within the existing tenant (e.g. Power Automate / Copilot Studio), or does it require a new vendor and governance work? | Enterprise: start here. A new vendor means IT security review, procurement, legal, and change management, which adds weeks to months of lead time. Use the approved stack until it genuinely cannot deliver the outcome. SMB: no approval process, but match to existing subscriptions first to avoid unnecessary tool sprawl. |
| 2. Connectivity | Do the target systems have reliable interfaces (REST API, connector, MCP), or will you need a legacy fallback (adapter / RPA / Computer Use)? | Assess every system in the chain. A missing or broken interface for even one step can make the entire automation non-viable. Legacy systems (SOAP, no API) require an adapter layer or GUI automation; both add fragility and cost. Check the MCP marketplace before building custom connectors. |
| 3. Data Handling | What data classes are involved (PII, customer data, confidential IP), and where will prompts, logs, and traces live? | Prompts sent to a cloud LLM contain whatever context you include. Ensure sensitive data classes are permissible under the tool’s data processing terms. Confirm where logs and traces are stored, because these often contain prompt content. Enterprise: map to your data classification policy before any build. |
| 4. Vendor Risk | What are the training and data-retention terms of each tool, and what compliance exposure do they create? | Review the provider’s DPA (Data Processing Agreement). Understand whether prompt data is used for model training, how long it is retained, and in which regions it is processed. For regulated industries (finance, healthcare, legal), this gate often rules out consumer-tier tools entirely. |
| 5. Cost | What is the expected cost per run and per month, and how will you cap and alert on spend? | Agentic tools have variable costs: a single complex task can consume $0.50–$2+ in tokens. At volume (50+ runs/day), costs compound quickly. Define a per-run budget, a monthly ceiling, and a spend alert before launch. Deterministic tools (n8n, Power Automate) carry near-zero incremental token cost. |
| 6. Reliability | What is the failure mode (vendor outage, auth expiry, API change), and what is the fallback plan? | Every external dependency is a failure surface. Define what happens when the LLM API is unavailable, when an OAuth token expires, or when an upstream API changes its schema. For production automations, the fallback plan is as important as the happy path. Human-in-loop gates are often the right answer. |
| 7. Ownership | Who operates, updates, and debugs the automation after launch? | The person who builds the automation is rarely the person who maintains it. Name an owner before launch. Define what happens when it breaks at 2am on a Sunday, when a dependency changes, or when the original builder leaves. Enterprise: this must go through change management. SMB: assign it explicitly even if it’s you. |
| 8. Evidence | What will you measure (time saved, error rate, cycle time), and what baseline proves it is better than today? | Without a pre-automation baseline, you cannot demonstrate ROI. Measure the manual process now: how long it takes, how often it fails, what it costs in staff time. Define what success looks like before building, not after. This also becomes your eval baseline for the agent itself. |
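
To make Gate 6 concrete, here is a minimal sketch of a fallback wrapper. It is an illustration under assumptions, not any vendor's API: `call_with_fallback`, `DependencyError`, `healthy_agent`, `flaky_agent`, and the in-memory review queue are all hypothetical names. The idea is to retry a step a bounded number of times and, if the dependency stays unavailable, hand the item to a human instead of dropping it.

```python
from typing import Callable, List


class DependencyError(Exception):
    """Raised when an external dependency (LLM API, OAuth, upstream API) fails."""


def call_with_fallback(
    step: Callable[[str], str],
    item: str,
    review_queue: List[str],
    max_attempts: int = 3,
) -> str:
    """Try `step` up to `max_attempts` times, then hand the item to a human."""
    for _ in range(max_attempts):
        try:
            return step(item)
        except DependencyError:
            # A real implementation would log the error and back off here.
            continue
    # Fallback path: queue the item for human review rather than dropping it.
    review_queue.append(item)
    return "queued-for-human-review"


# Hypothetical stand-ins for a real agent call.
def healthy_agent(item: str) -> str:
    return f"processed:{item}"


def flaky_agent(item: str) -> str:
    raise DependencyError("LLM API unavailable")


queue: List[str] = []
ok_result = call_with_fallback(healthy_agent, "invoice-001", queue)
failed_result = call_with_fallback(flaky_agent, "invoice-002", queue)
```

In this sketch the failed item ends up in `queue`, which stands in for whatever human-in-loop channel you already use (a ticket, an approval inbox, a shared mailbox).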

Step 2 — The Agent Landscape: What Is Actually Out There

Once you have cleared the organisational gates, use the following table, which maps the tools most commonly discussed in 2026. The landscape is noisy, with new frameworks appearing monthly, but most production use cases are still served by a small set of proven options.

I am aware that by the time you read this, there may be yet another agentic paradigm designed to confuse you. A good rule of thumb is to recognise that these are workflows—whether they are managed in the cloud or developed locally. It is perfectly feasible to create an agentic workflow from scratch, using only connections to large language models.
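
To show how little is needed for a from-scratch agentic workflow, here is a minimal sketch of the core loop: ask the model, run the tool it requests, feed the result back, and stop on a final answer. Everything here is an assumption for illustration. `fake_llm` is a stand-in that you would replace with your provider's client, and the `TOOL` / `FINAL` / `RESULT` text protocol is invented for this example rather than taken from any real API.

```python
from typing import Callable, Dict, List


def word_count(text: str) -> str:
    """A trivial tool the agent can call."""
    return str(len(text.split()))


TOOLS: Dict[str, Callable[[str], str]] = {"word_count": word_count}


def fake_llm(history: List[str]) -> str:
    """Stand-in for a real model call (assumption: swap in your provider's client).

    Returns either 'TOOL <name> <argument>' or 'FINAL <answer>'.
    """
    last = history[-1]
    if last.startswith("RESULT "):
        return f"FINAL The text has {last[len('RESULT '):]} words."
    return f"TOOL word_count {history[0]}"


def run_agent(task: str, llm: Callable[[List[str]], str], max_steps: int = 5) -> str:
    """Loop: ask the model, run any requested tool, feed the result back."""
    history = [task]
    for _ in range(max_steps):
        reply = llm(history)
        if reply.startswith("FINAL "):
            return reply[len("FINAL "):]
        _, tool_name, argument = reply.split(" ", 2)
        history.append("RESULT " + TOOLS[tool_name](argument))
    return "stopped: step limit reached"  # hard cap keeps cost bounded


answer = run_agent("count the words in this sentence", fake_llm)
```

The `max_steps` cap is the important design choice: it is the simplest form of the spend control described in Gate 5.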

This is not an exhaustive list of agentic solutions—just a curated selection of high-signal options.

Enterprise note: Tools marked “Enterprise (IT review)” in the Org Fit column require a new vendor approval. Start with what is already cleared and only evaluate these when the approved stack cannot deliver.

| Tool | Solution Type | Category | What It Does | Org Fit | Key Trade-off |
|---|---|---|---|---|---|
| Power Automate + Copilot Studio | RPA and agent workflow | Deterministic + Agentic | Microsoft’s native automation platform. Power Automate for rule-based flows; Copilot Studio for reasoning steps. Deep M365 integration. | Enterprise ✓✓; SMB (M365 only) | Pre-approved in most M365 tenants. Limited outside the Microsoft ecosystem. |
| n8n | Open-source workflow | Deterministic | Open-source workflow automation. Hundreds of native connectors. SaaS or self-hosted. The cheapest orchestration layer available. | SMB ✓✓; Enterprise (IT review) | Best connector coverage. Self-hosted = IT security review in enterprise. Near-zero per-run token cost. |
| Claude Cowork / Computer Use | GUI-based automation where a program or AI agent operates a computer like a human would | Agentic / Local | AI agent that operates your local desktop: reads files, fills forms, navigates apps. Only viable path for systems with no API. Can also be used in a sandbox environment to operate legacy systems (e.g. mainframes). | SMB ✓✓; Enterprise (case-by-case) | Fragile when UI changes. Subscription limits apply for high-volume use. Enterprise: data handling gate is critical. |
| OpenClaw | Local agent runtime (self-hosted) | Agentic / Local | Open-source equivalent of Cowork. Full control over the runtime, model, and data residency. Runs 24/7 without a laptop. Not recommended for enterprise due to security concerns. | SMB / Freelancer ✓✓ | Higher technical setup bar. No managed hosting. Best for privacy-sensitive or always-on use cases. |
| Perplexity Computer | Cloud agent (managed execution) | Agentic / Cloud | Cloud agent optimised for long-running web research and information gathering tasks. Set-and-forget model. | SMB ✓; Enterprise (vendor review) | Less customisable for complex multi-step logic. Good for research-heavy tasks that do not touch internal systems. |
| Claude Managed Agents | Managed agent platform | Agentic / Cloud | Anthropic’s hosted agent runtime. Built-in audit logs, RBAC, spend controls, and tracing. Direct API for production volume. | Enterprise ✓✓; SMB (if scale warrants) | Best governance posture for enterprise. Higher cost at Opus tier. Requires direct API contract for production use. |
| CrewAI | Agent framework (multi-agent orchestration) | Framework | Python framework for multi-agent workflows with defined roles (researcher, writer, reviewer). Fast to prototype. | SMB ✓✓; Enterprise (dev team) | Good for role-based pipelines. Less mature state management and persistence. Lighter than LangGraph. |
| LangGraph | Agent framework (stateful orchestration) | Framework | Production-grade Python framework for stateful, multi-step agent workflows. Supports human-in-loop, branching, retries, and persistence natively. | Both (dev team required) | The most robust choice for complex production agents. Steepest learning curve. Works with any LLM and any connector. |

Hybrid is the practical winning pattern. Use a deterministic tool (n8n or Power Automate) for the predictable 80% of the workflow and add a single agentic layer for the variable 20%. This is cheaper, more reliable, and easier to debug than a fully agentic approach.
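
The hybrid pattern can be sketched in a few lines: try a rule-based path first and escalate to the agentic layer only when the rules do not match. This is a simplified illustration under assumptions. `deterministic_handler`, `stub_agent`, `route`, and the invoice message format are hypothetical, and in practice the deterministic part would usually live in n8n or Power Automate rather than in Python.

```python
import re
from typing import Callable, Optional


def deterministic_handler(message: str) -> Optional[str]:
    """Handle the predictable cases with plain rules; return None if unsure."""
    match = re.fullmatch(r"invoice (\d+) paid", message.strip().lower())
    if match:
        return f"mark-paid:{match.group(1)}"
    return None


def stub_agent(message: str) -> str:
    """Placeholder for the agentic layer (assumption: a real LLM call goes here)."""
    return f"agent-review:{message}"


def route(message: str, agent: Callable[[str], str] = stub_agent) -> str:
    """Try the rule-based path first; escalate to the agent only when needed."""
    result = deterministic_handler(message)
    return result if result is not None else agent(message)


routine = route("Invoice 1042 paid")
unusual = route("Customer disputes invoice 1042 and wants a partial refund")
```

Only the second message reaches the agent, so token spend scales with the unusual cases rather than with total volume.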

Step 3 — Frame the Problem

With eligible tools narrowed by the gates and understood from the landscape, answer four questions to identify the right fit. This step eliminates most remaining ambiguity.

  1. What exactly is the output? (e.g. “daily sales report from 6 apps,” “analyse 20 PDFs and write an executive summary,” “fill legacy forms + email approvals”)
  2. How often does it run, and how variable are the steps?
       • Predictable / repeatable every time → deterministic (n8n / Power Automate)
       • Steps change based on data or context → agentic (reasoning + tools)
  3. Where does the work happen, and does each system pass the connectivity gate?
       • Local desktop / files / apps → Computer Use
       • Cloud / web services with REST APIs or MCP → Managed Agents / n8n
       • Legacy / no API → adapter layer or Computer Use fallback (flag the fragility)
  4. Who owns and maintains this long-term?
       • Non-technical team / enterprise IT → no-code / approved stack first
       • Dev / engineering → code frameworks

Write the answer in one sentence: “Every weekday, extract data from a legacy Windows app [no API: Computer Use, flagged as fragile], run web research, create a PowerPoint, and email it; steps vary slightly each time.”
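
The four questions above can be read as a small decision tree. The sketch below encodes the arrows from questions 2 to 4 as a function. The `ProblemFrame` fields and the `suggest` helper are hypothetical names introduced for illustration, and the output is a starting shortlist to discuss, not a verdict.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemFrame:
    output: str            # Q1: what exactly is produced
    steps_vary: bool       # Q2: do steps change with data or context?
    location: str          # Q3: "local", "cloud", or "legacy"
    technical_owner: bool  # Q4: is the long-term owner a dev team?


def suggest(frame: ProblemFrame) -> List[str]:
    """Map the answers to questions 2-4 onto the guide's suggested approaches."""
    notes = []
    notes.append("agentic (reasoning + tools)" if frame.steps_vary
                 else "deterministic (n8n / Power Automate)")
    if frame.location == "local":
        notes.append("Computer Use")
    elif frame.location == "cloud":
        notes.append("Managed Agents / n8n")
    elif frame.location == "legacy":
        notes.append("adapter layer or Computer Use fallback (fragile)")
    else:
        raise ValueError(f"unknown location: {frame.location!r}")
    notes.append("code frameworks" if frame.technical_owner
                 else "no-code / approved stack first")
    return notes


# The weekday-report example, assuming a non-technical owner.
frame = ProblemFrame(
    output="weekday PowerPoint report emailed to the team",
    steps_vary=True,
    location="legacy",
    technical_owner=False,
)
suggestion = suggest(frame)
```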

Step 4 — Feasibility Gate Check

Before building, run the eight gates as a final go / no-go. Any gate that cannot be answered or that hits its blocker condition is a reason to pause — not to deprioritise and proceed.

| Gate | Question | Blocker Condition |
|---|---|---|
| Approved Stack | Can the approved stack deliver this, or is a new vendor required? | New vendor → flag the governance work and timeline before committing. |
| Connectivity | Does every system in the chain have a usable interface? | No interface → Computer Use fallback or adapter. Document the fragility. |
| Data Handling | Are data classes and prompt destinations within policy? | PII or confidential IP → confirm DPA and data-residency before build. |
| Vendor Risk | Are training/retention terms acceptable for the data involved? | Unacceptable terms → use a different provider or on-prem model. |
| Cost | Is the per-run and monthly cost modelled with a cap and alert in place? | No cost model → do not go to production. Agents have variable, open-ended costs. |
| Reliability | Is there a documented failure mode and fallback plan? | No fallback → add a human-in-loop gate or accept the outage risk explicitly. |
| Ownership | Is a named owner assigned for ongoing operation and debugging? | No owner → do not launch. Automation without ownership degrades silently. |
| Evidence | Is a baseline measured and a success metric defined? | No baseline → you cannot prove the automation is better than today. |
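
The go / no-go rule is simple enough to express as a checklist function. In this sketch, which uses hypothetical names (`GATES`, `gate_check`, `decision`), a gate counts as cleared only when it is explicitly answered `True`. A `False` answer and a missing answer are treated the same way, matching the rule that an unanswered gate is a reason to pause.

```python
from typing import Dict, List, Optional

GATES = [
    "Approved Stack", "Connectivity", "Data Handling", "Vendor Risk",
    "Cost", "Reliability", "Ownership", "Evidence",
]


def gate_check(answers: Dict[str, Optional[bool]]) -> List[str]:
    """Return the gates that block the build.

    True = cleared, False = blocker condition hit, None or missing = unanswered.
    """
    return [gate for gate in GATES if answers.get(gate) is not True]


def decision(answers: Dict[str, Optional[bool]]) -> str:
    """Summarise the check as 'go' or 'pause: <blocking gates>'."""
    blockers = gate_check(answers)
    return "go" if not blockers else "pause: " + ", ".join(blockers)


all_clear = {gate: True for gate in GATES}

proposal = dict(all_clear)
proposal["Ownership"] = False  # no named owner yet
del proposal["Evidence"]       # baseline never measured, so unanswered

verdict_clear = decision(all_clear)
verdict_proposal = decision(proposal)
```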

Cost Rule of Thumb

| Cost Item | Reality Check (April 2026) |
|---|---|
| Deterministic tool (n8n, Power Automate) | Near-zero incremental per-run cost. Fixed subscription or self-hosting overhead only. |
| Agentic (Haiku / Sonnet tier) | $0.05–$0.50 per complex task. Viable at high volume with the right routing. |
| Agentic (Opus tier) | $0.50–$2+ per complex task. At 50+ runs/day this is $500–$2,000+/month. Requires an explicit monthly cap. |
| Middleware / legacy adapter | MuleSoft, Boomi, n8n Cloud, or iPaaS licensing adds its own cost layer for legacy integrations. Model separately. |
| Optimisation | Route simple steps to Haiku, complex reasoning to Sonnet/Opus. Use prompt caching. Deterministic sub-flows for the predictable 80%. |
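
The monthly figures follow from simple arithmetic, sketched below together with a basic cap-and-alert check. The function names and the 80% alert threshold are assumptions for illustration. Note that the $500–$2,000 range for Opus-tier work at 50 runs/day corresponds to about 20 run-days per month; running every calendar day would be higher.

```python
def monthly_cost(cost_per_run: float, runs_per_day: int, days_per_month: int = 20) -> float:
    """Estimated monthly spend: cost per run x runs per day x run-days per month."""
    return cost_per_run * runs_per_day * days_per_month


def spend_status(spent: float, monthly_cap: float, alert_ratio: float = 0.8) -> str:
    """Classify month-to-date spend against the cap (alert at 80% by default)."""
    if spent >= monthly_cap:
        return "cap-reached"  # stop or pause agentic runs
    if spent >= alert_ratio * monthly_cap:
        return "alert"        # notify the owner before the cap is hit
    return "ok"


# Opus-tier range from the table: $0.50 to $2.00 per task at 50 runs/day.
low_estimate = monthly_cost(0.50, 50)    # 500.0
high_estimate = monthly_cost(2.00, 50)   # 2000.0
every_day = monthly_cost(2.00, 50, 30)   # 3000.0 if it runs all 30 days

status = spend_status(spent=1700.0, monthly_cap=2000.0)  # "alert"
```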

Quick-Start Checklist

Use this as the go / no-go gate before any build begins:

  1. Identify org type (Enterprise or SMB/Freelancer).
  2. Gate 1 — Approved Stack: can the existing tenant deliver this? If yes, stop here and use it.
  3. Gate 2 — Connectivity: map every system to be touched. Document any gaps.
  4. Gate 3 — Data Handling: identify data classes and confirm prompt destinations are within policy.
  5. Gate 4 — Vendor Risk: review DPA and training/retention terms for any new tool.
  6. Gate 5 — Cost: model per-run and monthly cost. Set a cap and a spend alert.
  7. Gate 6 — Reliability: document the failure mode and the fallback plan.
  8. Gate 7 — Ownership: name the operator before launch, not after.
  9. Gate 8 — Evidence: measure the manual baseline now. Define what success looks like.
  10. Review the landscape table. Strike out any tools that fail the gates.
  11. Answer the four problem-framing questions. Write a one-sentence summary.
  12. Follow the decision tree on the eligible tools.
  13. Run the feasibility gate check. Any blocker condition → pause.
  14. Pilot with one narrow use case for two weeks. Measure against the baseline.
  15. Generalise only when the pilot passes its pre-set success bar.
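
Steps 14 and 15 depend on a success bar that is set before the pilot starts. Here is a minimal sketch of that comparison. The `Metrics` fields, the 30% time-saving bar, and all the numbers are illustrative assumptions, not recommended targets.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    minutes_per_run: float  # average cycle time
    error_rate: float       # fraction of runs needing rework (0.0 to 1.0)


def time_saving(baseline: Metrics, pilot: Metrics) -> float:
    """Relative reduction in cycle time versus the manual baseline."""
    return (baseline.minutes_per_run - pilot.minutes_per_run) / baseline.minutes_per_run


def pilot_passes(baseline: Metrics, pilot: Metrics, min_time_saving: float = 0.30) -> bool:
    """Pass only if time saved meets the bar and quality is no worse than today."""
    return (time_saving(baseline, pilot) >= min_time_saving
            and pilot.error_rate <= baseline.error_rate)


# Illustrative numbers only: a manual baseline and two possible pilot outcomes.
baseline = Metrics(minutes_per_run=40.0, error_rate=0.08)
strong_pilot = Metrics(minutes_per_run=22.0, error_rate=0.05)
weak_pilot = Metrics(minutes_per_run=35.0, error_rate=0.05)

strong_result = pilot_passes(baseline, strong_pilot)
weak_result = pilot_passes(baseline, weak_pilot)
```

The weak pilot still improves on the baseline, but it does not clear the pre-set bar, so under the checklist it would not be generalised yet.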

Conclusion and Next Steps

The agent ecosystem will continue to evolve, but the leadership decision remains the same: fund outcomes, not tools. Start with a business workflow, clear the eight gates (especially approved stack, data handling, spend controls, and ownership), then choose the lightest approach that can deliver reliably. In practice, the most repeatable pattern is hybrid: deterministic automation for the predictable 80% and a single agentic layer for the variable 20%—lower cost, better auditability, and easier operations.

Action for leadership (operating mechanism): Nominate 3–5 candidate workflows and require every proposal to pass a gate check and a 2-week pilot with a defined baseline (cycle time, error rate, cost-to-serve) before it can scale.

  1. Nominate one workflow and write the one-sentence problem statement (output, frequency, where work happens, systems touched, and named owner).
  2. Run the eight gates and stop on any blocker (data classification, vendor terms, missing connectivity, no fallback, no cost cap).
  3. Pilot the smallest end-to-end slice (one input → one output) for two weeks and measure against the baseline (time, quality, rework, and cost).
  4. Scale only after the pilot clears a pre-set success bar. Expand with deterministic steps first and add agentic reasoning only where variability requires it; this protects reliability and spend.
  5. Before production rollout, lock in ownership, audit/logging, fallback (human-in-loop), change control, and spend caps/alerts.

To move quickly, standardise intake: submit the one-sentence problem statement plus constraints (data class, systems touched, volume, and owner). With that, you can route the request through the gates and select the simplest viable approach without reopening the same governance questions each time.