
Expressive avatar

The avatar is a visual conversation partner. It reacts to what is said, to data-layer events, and to user interactions with UI blocks.

Users choose their avatar from seven visages:

| Visage | Description |
|---|---|
| Classic | The book mascot — the core GriMoire character |
| AnonyMousse | A mask-inspired face |
| Robot | A pixel-art robot |
| Black Cat | A dark feline character |
| Orange Cat | A warm-toned feline character |
| Squirrel | A small woodland character |
| Particle | A particle-based abstract visage |

Each character is an SVG file with named groups for movable facial parts: #brows, #left_eye, #right_eye, #mouth, and #faceRoot. Some visages use PNG sprite parts (separate brow, eye, and mouth images) embedded inside SVG <image> tags. Others use pure SVG paths.

Expression transforms are dampened per visage so the same emotion looks natural on each character shape — the classic mascot has a small face on a book body, so its mouth movement is more conservative than the squirrel's.
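A minimal sketch of this per-visage dampening; the damping factors and function names are illustrative assumptions, not the engine's tuned values:

```typescript
// Hypothetical per-visage damping factors (values are assumptions).
// Smaller faces get more conservative mouth movement.
const MOUTH_DAMPING: Record<string, number> = {
  classic: 0.6, // small face on a book body
  squirrel: 1.0,
};

// Scale a raw mouth-openness value by the visage's damping factor.
function dampedMouthOpen(raw: number, visage: string): number {
  return raw * (MOUTH_DAMPING[visage] ?? 1.0);
}
```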

Users can also disable the avatar entirely if they prefer a text-only interface.

How expressions are rendered

At 60 frames per second, the avatar renderer applies SVG transforms (translate, rotate, scale) to each facial part group:

  • Brows — lift offset (pixels) + rotation (degrees). Surprised lifts brows the most. Confused is asymmetric: left brow up, right brow down.
  • Eyes — scale 0.88–1.22 + position offset. Surprised widens eyes. Happy squints them down in a Duchenne smile.
  • Mouth — computed from lip sync data (openness, width, round) combined with expression boosts. Happy lifts and widens. Surprised drops the mouth open.
  • FaceRoot — subtle floating sine wave for natural breathing, plus scale modulation.

Transitions between expressions are lerped over 120ms–350ms depending on the target expression, so changes feel smooth rather than abrupt.
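The per-frame update described above can be sketched as follows; ExpressionPose, stepPose, and the field set are illustrative names, not the renderer's actual API:

```typescript
// A reduced pose model: one value per facial-part transform.
interface ExpressionPose {
  browLift: number;   // px
  browRotate: number; // degrees
  eyeScale: number;   // roughly 0.88–1.22
  mouthOpen: number;  // 0–1
}

function lerp(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}

// Advance `current` toward `target` given elapsed ms and the target
// expression's transition duration (120–350 ms depending on expression).
function stepPose(
  current: ExpressionPose,
  target: ExpressionPose,
  dtMs: number,
  transitionMs: number
): ExpressionPose {
  const t = Math.min(1, dtMs / transitionMs);
  return {
    browLift: lerp(current.browLift, target.browLift, t),
    browRotate: lerp(current.browRotate, target.browRotate, t),
    eyeScale: lerp(current.eyeScale, target.eyeScale, t),
    mouthOpen: lerp(current.mouthOpen, target.mouthOpen, t),
  };
}
```

The resulting values would be written to the SVG groups each frame as translate/rotate/scale transforms.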

Seven expressions

The avatar supports seven expression states:

| Expression | Visual effect | Transition |
|---|---|---|
| Idle | Warm micro-smile, neutral baseline | 300ms |
| Listening | Eyebrows raised, eyes open wider, light smile — attentive | 250ms |
| Thinking | Brows lift dramatically, eyes drift upward and left — contemplative | 300ms |
| Speaking | Light brow engagement, mouth driven by lip sync | 200ms |
| Surprised | Brows shoot up, eyes widen dramatically, mouth drops | 120ms |
| Happy | Eyes squint down (Duchenne smile), mouth beams upward, cheeks lift | 300ms |
| Confused | Asymmetric brows (left up, right down), asymmetric eyes, off-center grimace | 350ms |

Each expression is defined with specific values for brow offset, brow rotation, eye scale, mouth lift, mouth width, and mouth openness.
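As a sketch, such definitions could be stored as a typed record; every numeric value below is a placeholder — only the transition durations come from the table above:

```typescript
type Expression = "idle" | "listening" | "thinking" | "speaking"
  | "surprised" | "happy" | "confused";

// One definition per expression; placeholder values, not the tuned ones.
interface ExpressionDef {
  browOffset: number;   // px lift
  browRotation: number; // degrees
  eyeScale: number;
  mouthLift: number;
  mouthWidth: number;
  mouthOpenness: number;
  transitionMs: number;
}

const EXPRESSIONS: Record<Expression, ExpressionDef> = {
  idle:      { browOffset: 0, browRotation: 0,  eyeScale: 1.0,  mouthLift: 0.1,  mouthWidth: 1.0, mouthOpenness: 0,   transitionMs: 300 },
  listening: { browOffset: 2, browRotation: 0,  eyeScale: 1.08, mouthLift: 0.2,  mouthWidth: 1.0, mouthOpenness: 0,   transitionMs: 250 },
  thinking:  { browOffset: 5, browRotation: 3,  eyeScale: 1.0,  mouthLift: 0,    mouthWidth: 0.9, mouthOpenness: 0,   transitionMs: 300 },
  speaking:  { browOffset: 1, browRotation: 0,  eyeScale: 1.0,  mouthLift: 0,    mouthWidth: 1.0, mouthOpenness: 0.3, transitionMs: 200 },
  surprised: { browOffset: 8, browRotation: 0,  eyeScale: 1.22, mouthLift: -0.2, mouthWidth: 0.8, mouthOpenness: 0.8, transitionMs: 120 },
  happy:     { browOffset: 1, browRotation: 0,  eyeScale: 0.88, mouthLift: 0.5,  mouthWidth: 1.3, mouthOpenness: 0.2, transitionMs: 300 },
  confused:  { browOffset: 4, browRotation: -6, eyeScale: 0.95, mouthLift: -0.1, mouthWidth: 0.9, mouthOpenness: 0.1, transitionMs: 350 },
};
```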

Expression triggering

Expressions are triggered by four independent layers, coordinated through a priority system:

  • Sentiment analysis — text patterns → expression
  • Data expressions — system events → expression
  • LLM explicit control — the set_expression tool
  • Idle animations — inactivity → tier progression

Sentiment analysis

Regex patterns detect emotion in user and assistant messages:

  • User says "thanks" or "awesome" → happy
  • User says "wow" or "no way" → surprised
  • User says "don't understand" or "huh" → confused
  • Assistant says "let me check" or "analyzing" → thinking

When regex patterns find no match, an async call to the Nano model provides a fallback classification.
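The regex layer could look like this sketch; the patterns are just the examples listed above (not the full production set), and a null return stands in for deferring to the async Nano fallback:

```typescript
// Example sentiment rules, first match wins.
const SENTIMENT_RULES: Array<{ pattern: RegExp; expression: string }> = [
  { pattern: /\b(thanks|awesome)\b/i, expression: "happy" },
  { pattern: /\b(wow|no way)\b/i, expression: "surprised" },
  { pattern: /\b(don'?t understand|huh)\b/i, expression: "confused" },
  { pattern: /\b(let me check|analyzing)\b/i, expression: "thinking" },
];

// Returns the matched expression, or null to fall back to the Nano model.
function classifySentiment(text: string): string | null {
  for (const rule of SENTIMENT_RULES) {
    if (rule.pattern.test(text)) return rule.expression;
  }
  return null;
}
```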

Data expressions

18 priority-based rules react to system events. A representative sample:

| Trigger | Expression | Priority | Context |
|---|---|---|---|
| Search found 3+ results | happy | 10 | Successful retrieval |
| Search found 0 results | confused | 12 | Empty result set |
| Tool execution error | confused | 15 | Something went wrong |
| User clicked an item | thinking | 8 | Processing selection |
| User confirmed an action | happy | 10 | Positive confirmation |
| Form submitted | happy | 10 | Successful submission |
| Form validation failed | confused | 12 | Input error |

Higher-priority events override lower-priority ones. The system also suppresses reverts during multi-step operations to prevent expression flickering.

LLM explicit control

The model can call set_expression directly at priority 20, which always wins over sentiment and data rules. This lets the model set the avatar's expression when it has specific conversational intent.
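A minimal sketch of the priority arbitration across layers; the priorities come from this section (data rules 8–15, set_expression at 20), while the request type and resolve function are illustrative:

```typescript
// One pending request per triggering layer.
interface ExpressionRequest {
  expression: string;
  priority: number; // e.g. data rules 8–15, LLM set_expression 20
}

// Keep the highest-priority active request; ties go to the newest.
function resolve(requests: ExpressionRequest[]): string | null {
  let best: ExpressionRequest | null = null;
  for (const r of requests) {
    if (!best || r.priority >= best.priority) best = r;
  }
  return best ? best.expression : null;
}
```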

Idle animations

A 5-tier inactivity progression adds life when the conversation pauses:

| Tier | Duration | Visual effect |
|---|---|---|
| Normal | 0–30s | Standard behavior |
| Breathing | 30–60s | Gentle sine wave Y offset |
| Sparkle | 1–2 min | Random particles brighten |
| Drift | 2–5 min | Looser physics, more particle movement |
| Wind | 5+ min | Directional wind force across particles |

Ambient sound analysis from the microphone caps idle progression: voice detected forces return to normal, typing caps at breathing, environmental noise caps at sparkle.
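The tier progression and ambient caps can be sketched as follows; the thresholds follow the table above, and the function names are illustrative:

```typescript
type IdleTier = "normal" | "breathing" | "sparkle" | "drift" | "wind";
type Ambient = "voice" | "typing" | "noise" | "quiet";

const TIER_ORDER: IdleTier[] = ["normal", "breathing", "sparkle", "drift", "wind"];

// Map seconds of inactivity to a tier (thresholds from the table above).
function tierForInactivity(seconds: number): IdleTier {
  if (seconds < 30) return "normal";
  if (seconds < 60) return "breathing";
  if (seconds < 120) return "sparkle";
  if (seconds < 300) return "drift";
  return "wind";
}

// Ambient sound caps the progression: voice resets to normal,
// typing caps at breathing, environmental noise caps at sparkle.
function applyAmbientCap(tier: IdleTier, ambient: Ambient): IdleTier {
  const cap: Record<Ambient, IdleTier> = {
    voice: "normal",
    typing: "breathing",
    noise: "sparkle",
    quiet: "wind", // no cap
  };
  const cappedIndex = TIER_ORDER.indexOf(cap[ambient]);
  return TIER_ORDER[Math.min(TIER_ORDER.indexOf(tier), cappedIndex)];
}
```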

Lip sync

The avatar's mouth movement is driven by Web Audio frequency-band analysis:

  • LipSyncAnalyzer — Web Audio DSP frequency-band analysis
  • Phoneme classification — 14 viseme categories from 5 frequency bands
  • Mouth transform — openness + width + round

Primary: Frequency-based DSP

The lip sync analyzer is pure Web Audio DSP that analyzes the assistant's audio output in real time:

  1. AnalyserNode.getByteFrequencyData() captures the audio spectrum
  2. 5-band frequency analysis (sub, low/F1, mid/F2, high/fricatives, very-high/sibilants)
  3. Heuristic classification into silence plus 14 speech visemes (sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, I, O, U)
  4. Flickering prevention with hold logic and 2-frame minimum
  5. Outputs a 3-parameter mouth model: openness (jaw drop), width (smile/narrowness), round (lip protrusion)

This works across both voice paths — WebRTC realtime audio and any other audio source — without external service dependencies.
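A rough sketch of the band split and a toy classifier; the bin edges, the assumed 1024-bin spectrum, and the three-way classification are illustrative simplifications of the actual viseme heuristic:

```typescript
interface Bands { sub: number; low: number; mid: number; high: number; veryHigh: number; }

// Average byte magnitude (0–255) over a bin range of an
// AnalyserNode.getByteFrequencyData() spectrum.
function bandEnergy(spectrum: Uint8Array, from: number, to: number): number {
  let sum = 0;
  for (let i = from; i < to; i++) sum += spectrum[i];
  return sum / (to - from);
}

// Split the spectrum into the five bands. Assumes 1024 bins;
// bin edges here are chosen for illustration only.
function splitBands(spectrum: Uint8Array): Bands {
  return {
    sub: bandEnergy(spectrum, 0, 4),
    low: bandEnergy(spectrum, 4, 20),        // F1 region
    mid: bandEnergy(spectrum, 20, 80),       // F2 region
    high: bandEnergy(spectrum, 80, 300),     // fricatives
    veryHigh: bandEnergy(spectrum, 300, 512), // sibilants
  };
}

// Toy classification: silence vs a sibilant vs a vowel.
function classify(b: Bands): string {
  const total = b.sub + b.low + b.mid + b.high + b.veryHigh;
  if (total < 10) return "sil";
  if (b.veryHigh > b.low && b.veryHigh > b.mid) return "SS";
  return b.mid > b.low ? "E" : "aa";
}
```

The real analyzer would then map the chosen viseme, with hold logic, onto the openness/width/round mouth model.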

Attribution

The lip sync approach and mouth shape definitions are adapted from lipsync-engine (MIT License, Beer Digital LLC). The implementation is rebuilt directly on the Web Audio API with no external dependencies.

Procedural fallback

If audio capture itself fails, a simple procedural animation provides basic mouth movement.

Four personalities

Each personality changes the avatar's voice, visual theme, and behavioral characteristics:

| Personality | Voice | Visual theme | Character |
|---|---|---|---|
| Normal | alloy | Light blue particles, calm glow | Professional, balanced |
| Funny | shimmer | Gold particles, sparkle trails, playful physics | Witty, energetic |
| Harsh | echo | Gray particles, tight springs, sharp trails | Blunt, direct |
| Devil | ash | Red particles, ember trails, red eye glow | Dark, theatrical |

Personality affects particle colors, glow radius and color, trail style and length, spring strength, noise amplitude, chaos factor, and background gradient. The Devil personality is the only one with a unique eye glow effect.
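As a sketch, the personality mapping might be a static record; only the voice names, colors, and the Devil-only eye glow come from the table above — every other field and numeric value is an assumption:

```typescript
interface PersonalityTheme {
  voice: string;
  particleColor: string;
  springStrength: number; // assumed scale
  chaosFactor: number;    // assumed scale
  eyeGlow?: string;       // Devil only
}

const PERSONALITIES: Record<string, PersonalityTheme> = {
  normal: { voice: "alloy",   particleColor: "lightblue", springStrength: 0.5, chaosFactor: 0.1 },
  funny:  { voice: "shimmer", particleColor: "gold",      springStrength: 0.4, chaosFactor: 0.4 },
  harsh:  { voice: "echo",    particleColor: "gray",      springStrength: 0.8, chaosFactor: 0.2 },
  devil:  { voice: "ash",     particleColor: "red",       springStrength: 0.6, chaosFactor: 0.3, eyeGlow: "red" },
};
```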

Gaze tracking

The avatar's eyes track toward the action panel for 3 seconds when blocks are created or updated. This creates a natural feeling that the assistant is "looking at" what it just showed.

Gaze is applied as directional offsets across facial regions:

  • Pupils — strongest effect (most noticeable eye direction change)
  • Eyes — medium effect
  • Other regions — subtle effect

Movement is lerped at 10% per frame for smooth, natural-feeling motion. Gaze auto-reverts to center after the configured timeout.
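The weighted gaze easing can be sketched as follows; the 10%-per-frame rate comes from this section, while the region weights are assumed values:

```typescript
// Assumed per-region weights: pupils strongest, other regions subtle.
const GAZE_WEIGHTS = { pupils: 1.0, eyes: 0.5, other: 0.2 };

// Move 10% of the remaining distance toward the weighted target each frame.
function easeGaze(current: number, target: number, weight: number): number {
  return current + (target * weight - current) * 0.1;
}
```

Setting the target back to 0 after the timeout reuses the same easing to drift the gaze to center.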