
The Hybrid Interaction Engine

The Hybrid Interaction Engine (HIE) is the coordination layer between the LLM, the rendered UI blocks, and the avatar.

In most agentic UIs, the model creates visual artifacts — cards, lists, previews, forms — but has no way to stay aware of them after rendering, and no way to react when the user interacts with them.

HIE solves this by tracking everything the model creates, feeding user interactions back as structured events, and driving the avatar's expressions through data-driven rules.

Three policies

P1: Persistent grounding

Every UI block the model creates is tracked. A visual-context summary is continuously sent to the model so it always knows what the user currently sees.

Model triggers tool (block rendered in the action panel) → block tracked (type, content, timestamp) → summary sent to model (visual-context message) → model responds, aware of the visible state.

Without this, when a user says "tell me more about the second one", the model has no idea which results are visible. With persistent grounding, it does.
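The tracking step above can be sketched in a few lines. This is a minimal illustration, not the actual HIE API; the `TrackedBlock` shape and `BlockTracker` class are assumptions.

```typescript
// Hypothetical shapes for illustration only.
interface TrackedBlock {
  id: string;
  type: string;      // e.g. "search-results", "info-card"
  createdAt: number; // timestamp, used to order the summary
  summary: string;   // short, model-facing description
}

class BlockTracker {
  private blocks = new Map<string, TrackedBlock>();

  track(block: TrackedBlock): void {
    this.blocks.set(block.id, block);
  }

  remove(id: string): void {
    this.blocks.delete(id);
  }

  // Builds the visual-context message the model receives.
  visualContext(): string {
    const parts = [...this.blocks.values()]
      .sort((a, b) => a.createdAt - b.createdAt)
      .map((b) => `${b.type}: ${b.summary}`);
    return `[Visual context: ${parts.join("; ")}]`;
  }
}
```

Because removed blocks drop out of the map, the summary always reflects what is currently on screen rather than everything ever rendered.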

P2: Interaction normalization

User interactions with blocks — click, select, expand, dismiss, submit — are captured and sent to the model as structured events it can reason about.

User interacts (click, select, dismiss) → event normalized into a standard format → sent to model as a [User interaction:] message → model reasons about what the user did.

Without this, the model can show a list and the user can click an item, but the model never learns that a click happened. With P2, the model can say "you selected the Q4 budget document — would you like me to summarize it?" without the user restating their intent.
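A normalized interaction could look like the sketch below. The event shape and formatter are illustrative assumptions, not HIE's actual types.

```typescript
// Hypothetical normalized-event shape.
type InteractionAction = "click" | "select" | "expand" | "dismiss" | "submit";

interface InteractionEvent {
  action: InteractionAction;
  blockId: string;     // which rendered block the user touched
  itemIndex?: number;  // for numbered references inside a block
}

// Formats the structured event into the message the model sees.
function formatInteraction(e: InteractionEvent): string {
  const item = e.itemIndex !== undefined ? ` item ${e.itemIndex}` : "";
  return `[User interaction: ${e.action}${item} in block ${e.blockId}]`;
}
```

The point of the standard format is that every producer (list, card, form) emits the same shape, so the model only ever has to reason about one kind of interaction message.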

P3: Embodiment arbitration

When the agent has a visual presence (the avatar), its expression state is driven by declarative, data-driven rules rather than by code scattered across the UI.

Data event (search results, error, form submit) → rule engine (18 priority-based rules) → priority check (higher priority wins) → expression set (avatar reacts visually).

18 trigger rules map system events to avatar expressions. Higher-priority events override lower-priority ones, with timing rules to prevent the avatar from flickering between expressions during multi-step operations. A separate verbosity director adjusts how verbose the model should be based on the current UI density (minimal, brief, normal, or detailed).
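Priority-based arbitration with a hold window can be sketched as follows. The rule shape, event names, and hold-time mechanism are assumptions used to illustrate how higher-priority expressions win and how flicker is suppressed; they are not HIE's actual rules.

```typescript
// Illustrative rule shape; the real engine has 18 such rules.
interface ExpressionRule {
  event: string;
  expression: string;
  priority: number; // higher wins
  holdMs: number;   // minimum time before a lower-priority override
}

class ExpressionArbiter {
  private current?: { rule: ExpressionRule; since: number };

  constructor(private rules: ExpressionRule[]) {}

  // Returns the expression the avatar should show after this event.
  onEvent(event: string, now: number): string | undefined {
    const rule = this.rules.find((r) => r.event === event);
    if (!rule) return this.current?.rule.expression;
    const cur = this.current;
    const held = cur !== undefined && now - cur.since < cur.rule.holdMs;
    // A lower-priority event cannot interrupt a held, higher-priority expression,
    // which is what prevents flicker during multi-step operations.
    if (cur && held && rule.priority < cur.rule.priority) {
      return cur.rule.expression;
    }
    this.current = { rule, since: now };
    return rule.expression;
  }
}
```

With this structure, adding or reprioritizing a reaction is a data change, not a code change.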

The closed loop in practice

  1. The user asks for budget documents.
  2. GriMoire renders search results in a block. The avatar shifts to happy (search found results).
  3. HIE tracks what those visible results are and sends a visual summary to the model.
  4. The user clicks one result. HIE sends a [User interaction: selected item 2] event. The avatar shifts to thinking.
  5. The model responds with awareness: "You selected the Q4 budget document. Would you like me to summarize it or check who has access?"
  6. The user never had to restate what they were looking at.

Events

Everything inside HIE is driven by typed events. Each event comes from a producer and has a delivery mode that controls what happens next.

Event producers:

  • Block runtime: create, update, remove
  • Action panel: clicks, selects, dismiss
  • Form lifecycle: open, submit, cancel
  • Tool completion: success or failure
  • Thread lifecycle: start, continue, reset

Each producer emits a typed event (category + payload + delivery mode). The delivery modes are:

  • Local only: updates HIE state, not sent to the model
  • Silent: sent to the model without prompting a reply
  • Triggers reply: sent to the model and prompts a response
| Category | Examples | Delivery |
| --- | --- | --- |
| Thread | started, continued, reset | Local only |
| Block | created, updated, removed | Silent |
| Interaction | click result, select item | Silent or triggers reply |
| Task | focused, recap requested | Triggers reply |
| Artifact | result ready, recap ready | Silent |
| Form | opened, submitted, cancelled | Silent |
| Tool | execution completed, failed | Local only |
| Shell | logs toggled, settings toggled | Local only |
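The delivery-mode routing above can be sketched as a small dispatcher. The type names and sink interface are hypothetical; only the three-way routing rule comes from the table.

```typescript
type DeliveryMode = "local-only" | "silent" | "triggers-reply";

interface TypedEvent {
  category: "thread" | "block" | "interaction" | "task" | "artifact" | "form" | "tool" | "shell";
  name: string;
  delivery: DeliveryMode;
  payload?: unknown;
}

// Every event updates local HIE state; only non-local events reach the
// model, and only "triggers-reply" events prompt a model response.
function dispatch(
  e: TypedEvent,
  sinks: {
    updateState: (e: TypedEvent) => void;
    sendToModel: (e: TypedEvent, promptReply: boolean) => void;
  },
): void {
  sinks.updateState(e);
  if (e.delivery !== "local-only") {
    sinks.sendToModel(e, e.delivery === "triggers-reply");
  }
}
```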

What HIE tracks

HIE keeps a single state object that gets updated as events arrive.

Typed event → state updater (runs on each event) → current state.

The state has three parts:

  • Task context: what the user is doing right now
  • Artifacts: content blocks worth remembering
  • Shell state: logs open, settings open, app visible

Task context is a snapshot of the current task, not a history. Each new event replaces it.

| Task kind | Triggered by |
| --- | --- |
| search | Search tool starts |
| click-result | User clicks a search result |
| select | User selects items |
| look | File preview or detail view |
| summarize | Summary or recap operation |
| chat-about | Conversational follow-up |
| focus | User focuses on a specific block |
| recap | Recap requested |
| form | Form opened |

Not every block becomes an artifact. Transient blocks like search results, selection lists, and confirmation dialogs are tracked as blocks but not recorded as artifacts. Content blocks (info cards, file previews, charts, etc.) become artifacts when they have source context.
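The snapshot-and-artifact rule can be sketched as a pure state updater. The shapes below are assumptions; what they illustrate from the text is that task context is replaced on every event while artifacts only accumulate for non-transient blocks with source context.

```typescript
interface TaskContext { kind: string }
interface Artifact { blockId: string; kind: string; sourceContext: string }

interface HieState {
  task?: TaskContext;
  artifacts: Artifact[];
}

// Applies a block event: the task snapshot is always replaced, and an
// artifact is only recorded for a content block that has source context.
function applyBlockEvent(
  state: HieState,
  block: { id: string; kind: string; transient: boolean; sourceContext?: string },
  task: TaskContext,
): HieState {
  const artifacts =
    !block.transient && block.sourceContext
      ? [...state.artifacts, { blockId: block.id, kind: block.kind, sourceContext: block.sourceContext }]
      : state.artifacts;
  return { task, artifacts }; // new task snapshot replaces the old one
}
```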

How the model sees UI changes

Visual context

Block ready → block tracker (summarizes by type) → context injector (debounces and dedupes) → sent to model as a [Visual context: ...] message.

When a block appears, HIE summarizes it and sends that summary to the model. For blocks with references (like search results), the summary includes numbered items:

[Visual context: Search results for "SPFx architecture" (5 results):
1) Architecture Guide — /sites/docs/Architecture.aspx
2) SPFx Overview — /sites/dev/SPFx-Overview.docx
...]

This is what enables follow-ups like "summarize document 3".
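A formatter for the numbered summary shown above might look like this. The input shape is an assumption; the output format mirrors the example message from the text.

```typescript
interface ResultRef { title: string; path: string }

// Produces a numbered-reference summary so the model can resolve
// follow-ups like "summarize document 3".
function summarizeSearchResults(query: string, refs: ResultRef[]): string {
  const lines = refs.map((r, i) => `${i + 1}) ${r.title} — ${r.path}`);
  return `[Visual context: Search results for "${query}" (${refs.length} results):\n${lines.join("\n")}]`;
}
```

The stable 1-based numbering is the contract: the same indices appear in later interaction messages, so "item 2" always means the same visible result.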

Interaction context

User interacts (click, select, submit) → interaction formatter (separates action from payload) → context injector → [User interaction: ...] message.

Interaction messages separate the action description (trusted) from the payload data (untrusted) to limit prompt-injection risk. Beyond blocks and interactions, HIE also sends focus events, recap updates, and form state changes to the model.

When block summaries get too long, HIE compresses them using a fast model — keeping numbered references but removing filler.
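The trusted/untrusted split could be realized as below. The delimiter convention is an illustrative assumption; the idea from the text is that HIE authors the action description itself, while user- or document-derived data is fenced off as content.

```typescript
// The action string is composed by HIE (trusted); the payload may contain
// user- or document-controlled text (untrusted), so it is fenced as data
// the model should treat as content, not as instructions.
function formatInteractionMessage(action: string, payload: unknown): string {
  const data = JSON.stringify(payload);
  return `[User interaction: ${action}]\n<untrusted-data>${data}</untrusted-data>`;
}
```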

When updates are sent

| Channel | When the model gets a state refresh |
| --- | --- |
| Text | Before each typed user message |
| Voice | Every 5 model responses, plus whenever blocks change |

Multi-step flows

HIE detects common multi-step patterns and provides flow hints to the model.

| Flow | Trigger | Steps |
| --- | --- | --- |
| Search then drill | Search completes | Search → click result → detail |
| Browse then open | Browse completes | Browse → select file → open |
| Confirm before action | Confirmation dialog shown | Confirm → execute |
| Select then act | Selection list shown | Select items → act on selection |
| Compose then submit | Compose form shown | Fill form → submit |
These flows are hints only: HIE provides them to the model but does not enforce state transitions.
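Since nothing is enforced, flow detection reduces to a lookup from trigger to expected steps. The event names and hint wording below are assumptions; the trigger-to-steps mapping comes from the table above.

```typescript
// Advisory flow hints: a trigger event maps to an expected step sequence.
const FLOW_HINTS: Record<string, string[]> = {
  "search-completed": ["search", "click result", "detail"],
  "browse-completed": ["browse", "select file", "open"],
  "confirmation-shown": ["confirm", "execute"],
  "selection-shown": ["select items", "act on selection"],
  "compose-shown": ["fill form", "submit"],
};

// Returns a hint string for the model, or undefined; nothing is enforced.
function flowHint(event: string): string | undefined {
  const steps = FLOW_HINTS[event];
  return steps && `Likely flow: ${steps.join(" → ")}`;
}
```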

Lifecycle

HIE starts up lazily — on the first voice connection or the first typed message. It persists across voice connect/disconnect cycles within the same page session, so earlier visual context can still influence later turns.

Examples

Example 1: "Search for SPFx"

1. New conversation turn: starts a new thread.
2. Visual state refreshed: the model sees any existing blocks.
3. Search tool called: a loading block appears.
4. Block tracked: the task is set to search.
5. Results arrive: the block is updated with results.
6. Context sent to model: a numbered list of results.
7. Flow hint starts: the search-then-drill flow is detected.

Example 2: "Summarize document 3"

1. Thread continues: references the visible results.
2. Visual state refreshed: the model sees the numbered result list.
3. File content retrieved: the model reads the referenced document.
4. Summary card created: an info card with the summary content.
5. Artifact recorded: the summary is linked back to the search results.

Example 3: "Send it by email"

1. Compose form tool called: an email form preset.
2. Source context captured: HIE links the form to the summary.
3. Form block created: the form appears in the action panel.
4. User fills and submits: the form data is sent.
5. Submission confirmed: the form artifact is marked as submitted.

Across all three examples, the chain is: search results (tracked block) → summary (artifact) → email form (artifact). The search step is tracked but not recorded as an artifact — the artifact chain starts at the summary.

What HIE is not

  • It is not generic browser vision or pixel-based understanding.
  • It is not a replacement for good block design.
  • It is not a graph-orchestration system for arbitrary workflows.
  • It is not a new wire protocol.

It is a coordination layer between the model, the visible UI, and the embodiment layer.