A literature note for keeping up with [[AI Agents]] architectures and workflows, which seem to change daily. Occasionally, I do a point-in-time summary to establish a baseline for where we are as an industry.
- [[From Buzz to Building - Introduction to GenAI for Developers - Part 2 - The Technical Stack]]
- [[A High-level Overview of the AI Agent Technical Stack in Early 2026]]
[The Landscape of Agentic Reinforcement Learning for LLMs: A Survey](https://arxiv.org/pdf/2509.02547)
# How Did We Get Here?
![[image-32.png]]
The billion-dollar question: what is phase 5?
# Tradeoffs
Like any architecture discipline, creating agentic systems requires weighing tradeoffs. You still have the classic system-design tradeoffs (e.g., [[Availability]] vs. [[Consistency]]), but you also need to weigh:
- Autonomy vs. controllability
- Latency vs. reliability vs. cost
- Capability vs. safety
These tradeoffs can all be tuned through choices in the agentic architecture:
- Tool selection
- Verification level
- Backtracking
# AI Agent Architectures
## Single Agent
### [[Reason Act (ReAct)]]
![[Reason Act (ReAct)#Overview]]
### Reflexion
![[Reflexion#Overview]]
### Reasoning and Acting through Scratchpad Examples (RAISE)
### Plan-and-Execute Agents
An approach where a plan is generated up front and then executed. The planning phase typically limits the number of tool calls and can be done in a single interaction with the LLM, so the overall approach requires fewer LLM calls. A minimal sketch follows.
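A minimal sketch of the pattern, assuming a hypothetical `llm()` completion function and `TOOLS` registry (stand-ins, not a specific framework's API):

```python
# Minimal plan-and-execute sketch. `llm` and `TOOLS` are hypothetical
# stand-ins, not a specific framework's API.
import json

def llm(prompt: str) -> str:
    raise NotImplementedError  # wire up your model provider here

TOOLS = {
    "search": lambda query: f"(stub) results for: {query}",
    "summarize": lambda text: llm(f"Summarize: {text}"),
}

def plan_and_execute(task: str) -> list[str]:
    # Single planning call: the LLM emits the whole plan as JSON up front.
    plan = json.loads(llm(
        'Return a JSON list of steps, each {"tool": ..., "input": ...}, '
        f"to accomplish: {task}"
    ))
    # Execution phase: run each step without further planning calls.
    return [TOOLS[step["tool"]](step["input"]) for step in plan]
```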
### Search Agents
#### Tree-of-Thought
[[Tree-of-Thoughts (ToT)]]
#### Graph-of-Thought
#### Monte-Carlo Planning
## Multi-Agent
### Characteristics
#### Centralized vs. decentralized
#### Holonic
#### Coalition
### Modular Reasoning, Knowledge and Language (MRKL) Systems
![[Modular Reasoning, Knowledge and Language (MRKL) Systems]]
# Architecture Components
## Orchestrator
[[Agent Harness]]
[GitHub - jrswab/axe: A lightweight CLI for running single-purpose AI agents. Define focused agents in TOML, trigger them from anywhere: pipes, git hooks, cron, or the terminal.](https://github.com/jrswab/axe)
### Orchestration Approaches
#### Role-based Orchestration
#### Task-based Orchestration
### Orchestrator Types
#### Rule-based Orchestrators
#### Graph-based Planners
#### Behaviour-tree-style control
#### LLM-based Orchestrators
## Tools
### Code Execution
### APIs
### Search
## The Brain
### The Model
#### Learning Strategies
##### Reinforcement Learning
[[Reinforcement Learning (RL)]] optimizes for long-horizon returns.
##### Imitation Learning (IL)
A more pragmatic approach than RL when high-quality traces exist, because it avoids unsafe exploration and is cheaper in tool-rich environments.
As a con, though, it inherits biases and coverage gaps from the demonstration set. To address this, IL is often combined with verification and repair loops (critics, self-correction) to handle out-of-distribution cases, as sketched below.
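A hedged sketch of that verify-and-repair loop; `actor`, `critic`, and `repair` are hypothetical model calls, not a real API:

```python
# Verify-and-repair loop around an imitation-learned policy.
# `actor`, `critic`, and `repair` are hypothetical stand-in model calls.
from dataclasses import dataclass

@dataclass
class Verdict:
    ok: bool
    feedback: str

def actor(task: str) -> str:
    raise NotImplementedError  # IL policy proposes a draft

def critic(task: str, draft: str) -> Verdict:
    raise NotImplementedError  # verifier checks the draft

def repair(task: str, draft: str, feedback: str) -> str:
    raise NotImplementedError  # targeted self-correction

def run_with_verification(task: str, max_repairs: int = 2) -> str:
    draft = actor(task)
    for _ in range(max_repairs):
        verdict = critic(task, draft)
        if verdict.ok:
            return draft
        draft = repair(task, draft, verdict.feedback)
    return draft  # best effort once the repair budget is spent
```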
![[AI Agent Architectures-1773768448271.webp]]
##### In-context Learning
Enables rapid task adaptation via prompting and exemplars without parameter updates. Essentially, this is [[Few-shot Classification]] done within the context window.
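For example, a few-shot prompt packs labeled exemplars straight into the context window (the labels and tickets here are illustrative):

```python
# Few-shot classification entirely in-context: no parameter updates,
# just exemplars prepended to the query. Examples are illustrative.
EXEMPLARS = [
    ("Reset my password", "account"),
    ("Card was charged twice", "billing"),
    ("App crashes on launch", "bug"),
]

def few_shot_prompt(query: str) -> str:
    shots = "\n".join(f"Ticket: {t}\nLabel: {l}" for t, l in EXEMPLARS)
    return f"{shots}\nTicket: {query}\nLabel:"  # model completes the label

print(few_shot_prompt("I want a refund"))
```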
### Cognitive Mindset
#### Reasoning
#### Planning
#### Knowledge Retrieval
#### Optimization
## Memory
[[Agentic Memory Tools]]
### Short Term Working Context
#### User Prompts
[Prompting best practices for Agentic Systems](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices#agentic-systems)
### Long-term State
## Infrastructure
### Observability / Evaluation
Observability defines what is learnable: if you don't log the data, you can't learn from it. A minimal trace logger is sketched after this list.
- Audit logs
- Prompt, tool, and state traces
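One minimal way to capture these traces is an append-only JSONL log; the record schema here is an assumption for illustration, not a standard:

```python
# Append-only JSONL trace log: one record per prompt, tool call, or
# state change. The record schema is an assumption for illustration.
import json, time, uuid

def log_trace(path: str, kind: str, payload: dict) -> None:
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "kind": kind,  # e.g., "prompt" | "tool_call" | "tool_result" | "state"
        "payload": payload,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_trace("trace.jsonl", "tool_call", {"tool": "search", "input": "agent memory"})
```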
### Execution and Safety
- Sandboxed tool execution
- Schema validation
- Identity and permission enforcement
- Policy gates and guardrails
### Evaluation
Benchmarks, task suites
#### Verifiers / Critics
#### Finetuning
- Supervised fine-tuning on traces (tool calls + outcomes)
- [[Reinforcement Learning from Human Feedback (RLHF)]]
- [[Direct Preference Optimization (DPO)]]
# Strategies
- Constrain the agent through explicit limits on time, tokens, tool calls
- **Structured Action Spaces** - use tools and outputs that are strongly typed and must pass schema validation or policy checks. This mitigates the effects of agentic missteps and can help stop them from compounding (see the schema-validation sketch after this list).
- [Prompt Caching](https://x.com/trq212/status/2024574133011673516)
- **Circuit breaker pattern:** Each tool category (Home Assistant, Apple ecosystem, Gmail, etc.) has a circuit breaker. Three consecutive failures trip it open: the agent stops trying and reports that the service is down. After 5 minutes, the breaker allows a retry (see the circuit-breaker sketch after this list).
- Recent practices emphasize a trace-first data flywheel: run the agent in realistic environments, log full trajectories (prompts, tool calls, tool outputs, and outcomes), and continuously mine failures for targeted improvements (better prompts, new tools, better verifiers, or finetuning)
- Practical deployments adopt adaptive optimization: fast-path execution for routine cases, slower verified paths for high-risk actions, explicit budgets (time, tokens, tool calls), and permission gates that bound side effects even when the model is capable
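A sketch of a structured action space using `pydantic` for schema validation; the tool name and fields are illustrative:

```python
# Structured action space: every agent action must parse against a schema
# before it executes. The tool and its fields are illustrative.
from typing import Literal
from pydantic import BaseModel, ValidationError

class SendEmail(BaseModel):
    action: Literal["send_email"]
    to: str
    subject: str
    body: str

def dispatch(raw_action: str) -> str:
    try:
        action = SendEmail.model_validate_json(raw_action)
    except ValidationError as e:
        # A malformed action is rejected here instead of compounding downstream.
        return f"rejected: {e.error_count()} schema violations"
    return f"sending to {action.to}: {action.subject}"
```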
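The circuit breaker described above could look like this minimal class, with the note's thresholds (three failures to trip, five-minute cooldown):

```python
# Per-tool-category circuit breaker: trips open after 3 consecutive
# failures, then allows a retry once the 5-minute cooldown has elapsed.
import time

class CircuitBreaker:
    def __init__(self, threshold: int = 3, cooldown_s: float = 300.0):
        self.threshold, self.cooldown_s = threshold, cooldown_s
        self.failures, self.opened_at = 0, None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.cooldown_s:
            self.opened_at, self.failures = None, 0  # half-open: retry
            return True
        return False  # open: report the service as down instead of retrying

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1
        if self.failures >= self.threshold:
            self.opened_at = time.time()
```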
## Memory
- Bookend Approach - starts each session by reading a summary document that was created at the end of the previous session. Can run updates throughout the session.
- Database with Semantic Search - embed memories and retrieve them by similarity (see the embedding sketch after this list)
- Example types of memories from Doris:
- `identity` - Core facts: names, relationships, ages, birthdays
- `family` - Context about family members, schools, activities
- `preference` - How we like things done ("no cheerleading", "truth over comfort")
- `project` - Things I'm working on (Doris itself is in here)
- `decision` - Architectural choices, decisions made in past conversations
- `context` - Recurring themes, background info
- `health`, `financial` - Sensitive categories with appropriate handling
- Supported by memory extraction:
- **Explicit logging** - I can say "remember this" or "log this decision"
- **Auto-extraction** - Haiku reviews conversations and pulls out facts worth remembering
- **Session summaries** - Rich summaries of longer sessions with reasoning and open questions
- Graph-Based State Management: Simple chains and loops are too fragile for complex tasks. Modeling the agent's logic as a formal state graph (using LangGraph) has been essential. This allows for explicit state management, error-handling nodes, and self-correction paths, making the agent far more resilient. A minimal LangGraph sketch follows this list.
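A minimal version of the embed-and-search memory, assuming a hypothetical `embed()` function that returns a vector:

```python
# Semantic-search memory: store (text, vector) pairs, retrieve by cosine
# similarity. `embed()` is a hypothetical stand-in for an embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError  # call your embedding model here

class MemoryStore:
    def __init__(self):
        self.items: list[tuple[str, np.ndarray]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scored = sorted(
            self.items,
            key=lambda it: float(
                np.dot(q, it[1]) / (np.linalg.norm(q) * np.linalg.norm(it[1]))
            ),
            reverse=True,
        )
        return [text for text, _ in scored[:k]]
```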
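And a minimal LangGraph state graph with a self-correction edge; the node logic is a stub, and the API reflects the LangGraph docs at time of writing, so verify against your installed version:

```python
# Sketch of a LangGraph state graph with a self-correction path.
# Node logic is a stub; check the current LangGraph API before relying on this.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    draft: str
    error: str
    attempts: int

def generate(state: State) -> dict:
    return {"draft": "attempt", "attempts": state["attempts"] + 1}

def check(state: State) -> dict:
    # Stub verifier: pretend the first attempt fails validation.
    return {"error": "" if state["attempts"] > 1 else "failed validation"}

def route(state: State) -> str:
    # Self-correction edge: loop back to generate until clean or out of budget.
    return "generate" if state["error"] and state["attempts"] < 3 else END

builder = StateGraph(State)
builder.add_node("generate", generate)
builder.add_node("check", check)
builder.add_edge(START, "generate")
builder.add_edge("generate", "check")
builder.add_conditional_edges("check", route)
graph = builder.compile()
print(graph.invoke({"draft": "", "error": "", "attempts": 0}))
```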
## Infrastructure
- Use components like sandboxed tool execution, schema validation, identity/permission enforcement, audit logs, caching, and observability (traces of prompts, tool calls, and intermediate state)
# Workflows and Orchestration Approaches
- Thread-Based (i.e. Process-based) Workflow ([AGENT THREADS](https://www.youtube.com/watch?v=-WBHNFAB0OE))
- Base Thread: Your fundamental unit of work
- P Thread: Parallel execution for scaling output
- C Thread: Chained work for production-sensitive tasks
- F Thread: Fusion threads for rapid prototyping and confidence
- B Thread: Meta structures with agents prompting agents
- L Thread: Long duration, high autonomy workflows
- The Coordinator
- Sequential
- Iterative Refinement between two agents
## Coding Workflows
- Plan -> Build -> Review -> Fix
![[image-31.png]]
# Technical Stack
# Sources
Topics That Need A Home:
- [[Agentic Feedback Loops]]