The Hybrid Interaction Engine
The Hybrid Interaction Engine (HIE) is the coordination layer between the LLM, the rendered UI blocks, and the avatar.
In most agentic UIs, the model creates visual artifacts — cards, lists, previews, forms — but has no way to stay aware of them after rendering, and no way to react when the user interacts with them.
HIE solves this by tracking everything the model creates, feeding user interactions back as structured events, and driving the avatar's expressions through data-driven rules.
Three policies
P1: Persistent grounding
Every UI block the model creates is tracked. A visual-context summary is continuously sent to the model so it always knows what the user currently sees.
Without this, when a user says "tell me more about the second one", the model has no idea which results are visible. With persistent grounding, it does.
P2: Interaction normalization
User interactions with blocks — click, select, expand, dismiss, submit — are captured and sent to the model as structured events it can reason about.
Without this, the model shows a list and the user clicks one, but the model has no idea a click happened. With P2, the model can say "you selected the Q4 budget document — would you like me to summarize it?" without the user restating their intent.
P3: Embodiment arbitration
When the agent has a visual presence (the avatar), its expression state is set by rules defined as data rather than by logic scattered through the code.
Eighteen trigger rules map system events to avatar expressions. Higher-priority events override lower-priority ones, and timing rules keep the avatar from flickering between expressions during multi-step operations. A separate verbosity director sets how verbose the model should be (minimal, brief, normal, or detailed) based on the current UI density.
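The priority-plus-timing arbitration described above could be sketched roughly as follows. This is a minimal illustration, not the actual rule set: the event names, the `confused` and `idle` expressions, the priority numbers, and the hold durations are all assumptions made for the example.

```typescript
// Hypothetical sketch of rule-based expression arbitration.
// Only "happy" and "thinking" appear in this document; everything else is assumed.
type Expression = "idle" | "thinking" | "happy" | "confused";

interface TriggerRule {
  event: string;        // system event name (assumed naming scheme)
  expression: Expression;
  priority: number;     // higher number wins
  holdMs: number;       // how long lower-priority rules are blocked
}

// Three illustrative rules; the real table is described as having 18.
const RULES: TriggerRule[] = [
  { event: "tool.started", expression: "thinking", priority: 2, holdMs: 500 },
  { event: "search.results", expression: "happy", priority: 3, holdMs: 1500 },
  { event: "tool.failed", expression: "confused", priority: 5, holdMs: 2000 },
];

interface AvatarState {
  expression: Expression;
  priority: number;
  heldUntil: number;    // timestamp in milliseconds
}

function arbitrate(state: AvatarState, event: string, now: number): AvatarState {
  const rule = RULES.find((r) => r.event === event);
  if (!rule) return state; // no rule for this event: keep the current expression

  // While the hold window is open, only equal or higher priority may replace it.
  const holdActive = now < state.heldUntil;
  if (holdActive && rule.priority < state.priority) return state;

  return {
    expression: rule.expression,
    priority: rule.priority,
    heldUntil: now + rule.holdMs,
  };
}
```

Because the rules are plain data, adding or retuning an expression means editing the table rather than touching the call sites that emit events.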
The closed loop in practice
- The user asks for budget documents.
- GriMoire renders search results in a block. The avatar shifts to happy (search found results).
- HIE tracks what those visible results are and sends a visual summary to the model.
- The user clicks one result. HIE sends a [User interaction: selected item 2] event. The avatar shifts to thinking.
- The model responds with awareness: "You selected the Q4 budget document. Would you like me to summarize it or check who has access?"
- The user never had to restate what they were looking at.
Events
Everything inside HIE is driven by typed events. Each event comes from a producer and has a delivery mode that controls what happens next.
| Category | Examples | Delivery |
|---|---|---|
| Thread | started, continued, reset | Local only |
| Block | created, updated, removed | Silent |
| Interaction | click result, select item | Silent or triggers reply |
| Task | focused, recap requested | Triggers reply |
| Artifact | result ready, recap ready | Silent |
| Form | opened, submitted, cancelled | Silent |
| Tool | execution completed, failed | Local only |
| Shell | logs toggled, settings toggled | Local only |
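One way to picture the table above is as a discriminated union with a function that derives the delivery mode from the category. The type and field names here (`HieEvent`, `expectsReply`, the kebab-case event names) are assumptions for illustration, not the actual event schema.

```typescript
// Hypothetical typing of the event categories in the table above.
type Delivery = "local" | "silent" | "reply";

type HieEvent =
  | { category: "thread"; name: "started" | "continued" | "reset" }
  | { category: "block"; name: "created" | "updated" | "removed" }
  | { category: "interaction"; name: "click-result" | "select-item"; expectsReply: boolean }
  | { category: "task"; name: "focused" | "recap-requested" }
  | { category: "artifact"; name: "result-ready" | "recap-ready" }
  | { category: "form"; name: "opened" | "submitted" | "cancelled" }
  | { category: "tool"; name: "completed" | "failed" }
  | { category: "shell"; name: "logs-toggled" | "settings-toggled" };

function deliveryFor(event: HieEvent): Delivery {
  switch (event.category) {
    case "thread":
    case "tool":
    case "shell":
      return "local";   // never sent to the model
    case "block":
    case "artifact":
    case "form":
      return "silent";  // sent as context, no reply requested
    case "interaction":
      // The table lists "silent or triggers reply"; the flag that decides is assumed.
      return event.expectsReply ? "reply" : "silent";
    case "task":
      return "reply";
  }
}
```

The exhaustive `switch` means that adding a new category to the union without assigning it a delivery mode fails at compile time.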
What HIE tracks
HIE keeps a single state object that gets updated as events arrive.
Task context is a snapshot of the current task, not a history. Each new event replaces it.
| Task kind | Triggered by |
|---|---|
| search | Search tool starts |
| click-result | User clicks a search result |
| select | User selects items |
| look | File preview or detail view |
| summarize | Summary or recap operation |
| chat-about | Conversational follow-up |
| focus | User focuses on a specific block |
| recap | Recap requested |
| form | Form opened |
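The "snapshot, not history" behaviour can be illustrated with a small state update in which each new task replaces the previous one. The shape of `TaskContext` and `HieState` is an assumption; only the task kinds come from the table above.

```typescript
// Task kinds from the table above; the surrounding shapes are assumed.
type TaskKind =
  | "search" | "click-result" | "select" | "look" | "summarize"
  | "chat-about" | "focus" | "recap" | "form";

interface TaskContext {
  kind: TaskKind;
  blockId?: string;   // hypothetical link to the block that triggered the task
  startedAt: number;
}

interface HieState {
  task: TaskContext | null;
}

// The new task overwrites the old one; nothing is appended to a history.
function applyTask(state: HieState, next: TaskContext): HieState {
  return { ...state, task: next };
}
```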
Not every block becomes an artifact. Transient blocks like search results, selection lists, and confirmation dialogs are tracked as blocks but not recorded as artifacts. Content blocks (info cards, file previews, charts, etc.) become artifacts when they have source context.
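The artifact rule above reduces to a two-part check. In this sketch the block type names are illustrative, and "source context" is modelled as an optional `sourceUrl` field, which is an assumption about how that context is represented.

```typescript
// Illustrative block types; the real list is longer ("etc." in the text above).
type BlockType =
  | "search-results" | "selection-list" | "confirmation-dialog"
  | "info-card" | "file-preview" | "chart";

interface Block {
  id: string;
  type: BlockType;
  sourceUrl?: string; // assumed stand-in for "source context"
}

const TRANSIENT_TYPES: ReadonlySet<BlockType> = new Set<BlockType>([
  "search-results",
  "selection-list",
  "confirmation-dialog",
]);

function isArtifact(block: Block): boolean {
  if (TRANSIENT_TYPES.has(block.type)) return false; // tracked as a block only
  return block.sourceUrl !== undefined;              // content block with source context
}
```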
How the model sees UI changes
Visual context
When a block appears, HIE summarizes it and sends that summary to the model. For blocks with references (like search results), the summary includes numbered items:
[Visual context: Search results for "SPFx architecture" (5 results):
1) Architecture Guide — /sites/docs/Architecture.aspx
2) SPFx Overview — /sites/dev/SPFx-Overview.docx
...]
This is what enables follow-ups like "summarize document 3".
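A minimal sketch of how such a summary could be built, and how a follow-up like "document 3" could be resolved against it. The function names and the `ResultItem` shape are hypothetical; the output format mirrors the sample above.

```typescript
interface ResultItem {
  title: string;
  url: string;
}

// Builds a summary in the format of the sample above.
// \u2014 is the dash that separates title and path in that sample.
function summarizeSearchBlock(query: string, items: ResultItem[]): string {
  const lines = items.map((item, i) => `${i + 1}) ${item.title} \u2014 ${item.url}`);
  return `[Visual context: Search results for "${query}" (${items.length} results):\n${lines.join("\n")}]`;
}

// Maps a 1-based reference from the user ("document 3") back to the tracked item.
function resolveReference(items: ResultItem[], oneBased: number): ResultItem | undefined {
  return items[oneBased - 1];
}
```

Keeping the numbering stable between what the model is told and what the host tracks is what lets an ordinal in the user's message resolve to one specific item.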
Interaction context
Interaction messages separate the action description (trusted) from the payload data (untrusted) to limit prompt-injection risk. Beyond blocks and interactions, HIE also sends focus events, recap updates, and form state changes to the model.
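The trusted/untrusted split might look something like the sketch below. The first line follows the `[User interaction: selected item 2]` form shown earlier; the wrapper used for the payload line is an assumption, as are the type names.

```typescript
type Action = "click" | "select" | "expand" | "dismiss" | "submit";

const PAST_TENSE: Record<Action, string> = {
  click: "clicked",
  select: "selected",
  expand: "expanded",
  dismiss: "dismissed",
  submit: "submitted",
};

interface Interaction {
  action: Action;
  itemIndex: number;                // 1-based, as shown to the user
  payload: Record<string, string>;  // content from documents or forms: untrusted
}

function formatInteraction(interaction: Interaction): string {
  // Trusted line: built only from host-controlled values.
  const trusted = `[User interaction: ${PAST_TENSE[interaction.action]} item ${interaction.itemIndex}]`;
  // Untrusted line: JSON-encoded so embedded newlines or quotes cannot
  // start a new line that looks like a host-generated message.
  const untrusted = `[Untrusted payload: ${JSON.stringify(interaction.payload)}]`;
  return `${trusted}\n${untrusted}`;
}
```

Encoding reduces the risk rather than eliminating it; the model still needs to be instructed to treat the payload line as data, not instructions.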
When block summaries get too long, HIE compresses them using a fast model — keeping numbered references but removing filler.
When updates are sent
| Channel | When the model gets a state refresh |
|---|---|
| Text | Before each typed user message |
| Voice | Every 5 model responses, plus whenever blocks change |
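The refresh schedule in the table can be expressed as a small decision function. The names are hypothetical, and whether a block change also resets the voice response counter is not stated in this document; this sketch leaves the counter untouched in that case.

```typescript
type Trigger =
  | { channel: "text"; kind: "user-message" }
  | { channel: "voice"; kind: "model-response" }
  | { channel: "voice"; kind: "blocks-changed" };

interface RefreshCounter {
  voiceResponses: number; // model responses since the last scheduled refresh
}

const VOICE_REFRESH_EVERY = 5;

function shouldRefresh(counter: RefreshCounter, trigger: Trigger): boolean {
  if (trigger.channel === "text") return true;        // before each typed message
  if (trigger.kind === "blocks-changed") return true; // voice: any block change

  // Voice: count model responses and refresh on every fifth one.
  counter.voiceResponses += 1;
  if (counter.voiceResponses >= VOICE_REFRESH_EVERY) {
    counter.voiceResponses = 0;
    return true;
  }
  return false;
}
```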
Multi-step flows
HIE detects common multi-step patterns and provides flow hints to the model.
| Flow | Trigger | Steps |
|---|---|---|
| Search then drill | Search completes | Search → click result → detail |
| Browse then open | Browse completes | Browse → select file → open |
| Confirm before action | Confirmation dialog shown | Confirm → execute |
| Select then act | Selection list shown | Select items → act on selection |
| Compose then submit | Compose form shown | Fill form → submit |
Flows are advisory — HIE provides hints but does not enforce state transitions.
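Since flows are advisory, they can be represented as a lookup table that produces a hint string and nothing more. The trigger identifiers and the hint format below are assumptions; the flow names and steps come from the table above.

```typescript
interface Flow {
  name: string;
  trigger: string;  // assumed identifier for the triggering event
  steps: string[];
}

const FLOWS: Flow[] = [
  { name: "Search then drill", trigger: "search-completed", steps: ["search", "click result", "detail"] },
  { name: "Browse then open", trigger: "browse-completed", steps: ["browse", "select file", "open"] },
  { name: "Confirm before action", trigger: "confirmation-shown", steps: ["confirm", "execute"] },
  { name: "Select then act", trigger: "selection-list-shown", steps: ["select items", "act on selection"] },
  { name: "Compose then submit", trigger: "compose-form-shown", steps: ["fill form", "submit"] },
];

// Returns a hint for the model, or null when no flow matches.
// Nothing here blocks or validates the next action: the hint is advice only.
function flowHint(trigger: string): string | null {
  const flow = FLOWS.find((f) => f.trigger === trigger);
  if (!flow) return null;
  return `[Flow hint: ${flow.name} (${flow.steps.join(" → ")})]`;
}
```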
Lifecycle
HIE starts up lazily — on the first voice connection or the first typed message. It persists across voice connect/disconnect cycles within the same page session, so earlier visual context can still influence later turns.
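The lifecycle described above amounts to lazy creation plus deliberate non-teardown on voice disconnect. A rough sketch, with all names assumed:

```typescript
interface HieSession {
  visualContext: string[]; // summaries accumulated during the page session
  voiceConnected: boolean;
}

// Module-level holder: lives as long as the page session.
let session: HieSession | null = null;

function isStarted(): boolean {
  return session !== null;
}

// Creates the session on first use and returns the same one afterwards.
function ensureSession(): HieSession {
  if (session === null) {
    session = { visualContext: [], voiceConnected: false };
  }
  return session;
}

function onTypedMessage(): HieSession {
  return ensureSession();
}

function onVoiceConnect(): void {
  ensureSession().voiceConnected = true;
}

// Disconnecting voice only flips the flag; tracked context is kept.
function onVoiceDisconnect(): void {
  if (session !== null) session.voiceConnected = false;
}
```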
Examples
Example 1: "Search for SPFx"
Example 2: "Summarize document 3"
Example 3: "Send it by email"
Across all three examples, the chain is: search results (tracked block) → summary (artifact) → email form (artifact). The search step is tracked but not recorded as an artifact — the artifact chain starts at the summary.
What HIE is not
- It is not generic browser vision or pixel-based understanding.
- It is not a replacement for good block design.
- It is not a graph-orchestration system for arbitrary workflows.
- It is not a new wire protocol.
It is a coordination layer between the model, the visible UI, and the embodiment layer.
The concepts behind HIE are described in "Reflexive UI Awareness as Host-Side Orchestration: Continuous Grounding, Interaction Normalization, and Embodiment Policy for Agentic Interface".