Expressive avatar
The avatar is a visual conversation partner. It reacts to what is said, to data-layer events, and to user interactions with UI blocks.
Visage gallery
Users choose their avatar from seven visages:
| Visage | Description |
|---|---|
| Classic | The book mascot — the core GriMoire character |
| AnonyMousse | A mask-inspired face |
| Robot | A pixel-art robot |
| Black Cat | A dark feline character |
| Orange Cat | A warm-toned feline character |
| Squirrel | A small woodland character |
| Particle | A particle-based abstract visage |
Each character is an SVG file with named groups for the movable facial parts: `#brows`, `#left_eye`, `#right_eye`, `#mouth`, and `#faceRoot`. Some visages use PNG sprite parts (separate brow, eye, and mouth images) embedded inside SVG `<image>` tags; others use pure SVG paths.
Expression transforms are dampened per visage so the same emotion looks natural on each character shape — the classic mascot has a small face on a book body, so its mouth movement is more conservative than the squirrel's.
Users can also disable the avatar entirely if they prefer a text-only interface.
How expressions are rendered
At 60 frames per second, the avatar renderer applies SVG transforms (translate, rotate, scale) to each facial part group:
- Brows — lift offset (pixels) + rotation (degrees). Surprised lifts brows the most. Confused is asymmetric: left brow up, right brow down.
- Eyes — scale 0.88–1.22 + position offset. Surprised widens eyes. Happy squints them down in a Duchenne smile.
- Mouth — computed from lip sync data (openness, width, round) combined with expression boosts. Happy lifts and widens. Surprised drops the mouth open.
- FaceRoot — subtle floating sine wave for natural breathing, plus scale modulation.
Transitions between expressions are lerped over 120ms–350ms depending on the target expression, so changes feel smooth rather than abrupt.
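The transition behavior described above can be sketched as a per-frame lerp over a parameter set. This is an illustrative sketch, not the shipped renderer; the `FaceParams` fields and function names are assumptions.

```typescript
// Illustrative subset of the per-part transform parameters.
interface FaceParams {
  browLift: number;   // px (positive = up)
  browRotate: number; // degrees
  eyeScale: number;   // 0.88–1.22
  mouthLift: number;  // px
}

// Linear interpolation between two scalar values.
function lerp(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}

// Advance a transition toward `target`, given the per-expression duration
// (120–350 ms). `t` is clamped to 1 so the pose settles exactly on target.
function stepTransition(
  from: FaceParams,
  target: FaceParams,
  elapsedMs: number,
  durationMs: number
): FaceParams {
  const t = Math.min(1, elapsedMs / durationMs);
  return {
    browLift: lerp(from.browLift, target.browLift, t),
    browRotate: lerp(from.browRotate, target.browRotate, t),
    eyeScale: lerp(from.eyeScale, target.eyeScale, t),
    mouthLift: lerp(from.mouthLift, target.mouthLift, t),
  };
}
```

At 60 fps the renderer would call `stepTransition` each frame with the accumulated elapsed time and write the result back into the SVG group transforms.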
Seven expressions
The avatar supports seven expression states:
| Expression | Visual effect | Transition |
|---|---|---|
| Idle | Warm micro-smile, neutral baseline | 300ms |
| Listening | Eyebrows raised, eyes open wider, light smile — attentive | 250ms |
| Thinking | Brows lift dramatically, eyes drift upward and left — contemplative | 300ms |
| Speaking | Light brow engagement, mouth driven by lip sync | 200ms |
| Surprised | Brows shoot up, eyes widen dramatically, mouth drops | 120ms |
| Happy | Eyes squint down (Duchenne smile), mouth beams upward, cheeks lift | 300ms |
| Confused | Asymmetric brows (left up, right down), asymmetric eyes, off-center grimace | 350ms |
Each expression is defined with specific values for brow offset, brow rotation, eye scale, mouth lift, mouth width, and mouth openness.
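A data structure like the following could hold those definitions. The transition durations come from the table above; every pose value here is an illustrative placeholder, not the shipped tuning, and the sketch uses symmetric brows (the real confused pose is asymmetric per brow).

```typescript
type Expression =
  | "idle" | "listening" | "thinking" | "speaking"
  | "surprised" | "happy" | "confused";

interface ExpressionDef {
  browOffset: number;   // px lift (positive = up)
  browRotation: number; // degrees
  eyeScale: number;     // 0.88–1.22
  mouthLift: number;    // px
  mouthWidth: number;   // relative widen factor
  mouthOpen: number;    // 0–1 openness boost
  transitionMs: number; // from the expressions table
}

// Pose values are placeholders; only transitionMs reflects the table.
const EXPRESSIONS: Record<Expression, ExpressionDef> = {
  idle:      { browOffset: 0, browRotation: 0,  eyeScale: 1.0,  mouthLift: 1,  mouthWidth: 1.0,  mouthOpen: 0,   transitionMs: 300 },
  listening: { browOffset: 3, browRotation: 2,  eyeScale: 1.08, mouthLift: 2,  mouthWidth: 1.05, mouthOpen: 0,   transitionMs: 250 },
  thinking:  { browOffset: 6, browRotation: 4,  eyeScale: 1.0,  mouthLift: 0,  mouthWidth: 0.95, mouthOpen: 0,   transitionMs: 300 },
  speaking:  { browOffset: 2, browRotation: 1,  eyeScale: 1.0,  mouthLift: 1,  mouthWidth: 1.0,  mouthOpen: 0.2, transitionMs: 200 },
  surprised: { browOffset: 9, browRotation: 5,  eyeScale: 1.22, mouthLift: -4, mouthWidth: 0.9,  mouthOpen: 0.8, transitionMs: 120 },
  happy:     { browOffset: 2, browRotation: 2,  eyeScale: 0.88, mouthLift: 5,  mouthWidth: 1.2,  mouthOpen: 0.1, transitionMs: 300 },
  confused:  { browOffset: 0, browRotation: -3, eyeScale: 1.0,  mouthLift: -1, mouthWidth: 0.9,  mouthOpen: 0,   transitionMs: 350 },
};
```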
Expression triggering
Expressions are triggered by four independent layers, coordinated through a priority system:
Sentiment analysis
Regex patterns detect emotion in user and assistant messages:
- User says "thanks" or "awesome" → happy
- User says "wow" or "no way" → surprised
- User says "don't understand" or "huh" → confused
- Assistant says "let me check" or "analyzing" → thinking
When regex patterns find no match, an async call to the Nano model provides a fallback classification.
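A minimal sketch of this regex-first layer, using the example phrases above. The pattern lists and function names are illustrative; the actual rule set is larger, and a `null` result is where the async Nano fallback would take over.

```typescript
type Sentiment = "happy" | "surprised" | "confused" | "thinking";

// First-match-wins pattern tables for each speaker role.
const USER_PATTERNS: Array<[RegExp, Sentiment]> = [
  [/\b(thanks|awesome)\b/i, "happy"],
  [/\b(wow|no way)\b/i, "surprised"],
  [/\b(don'?t understand|huh)\b/i, "confused"],
];

const ASSISTANT_PATTERNS: Array<[RegExp, Sentiment]> = [
  [/\b(let me check|analyzing)\b/i, "thinking"],
];

// Returns a sentiment, or null when no pattern matches — the caller then
// falls back to the async Nano model classification.
function classifySentiment(
  text: string,
  role: "user" | "assistant"
): Sentiment | null {
  const patterns = role === "user" ? USER_PATTERNS : ASSISTANT_PATTERNS;
  for (const [re, expr] of patterns) {
    if (re.test(text)) return expr;
  }
  return null;
}
```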
Data expressions
18 priority-based rules react to system events; a representative subset:
| Trigger | Expression | Priority | Context |
|---|---|---|---|
| Search found 3+ results | happy | 10 | Successful retrieval |
| Search found 0 results | confused | 12 | Empty result set |
| Tool execution error | confused | 15 | Something went wrong |
| User clicked an item | thinking | 8 | Processing selection |
| User confirmed an action | happy | 10 | Positive confirmation |
| Form submitted | happy | 10 | Successful submission |
| Form validation failed | confused | 12 | Input error |
Higher-priority events override lower-priority ones. The system also suppresses reverts during multi-step operations to prevent expression flickering.
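The priority resolution can be sketched as follows. The rule list is a small subset of the 18 rules, and the event names and `resolve` function are assumptions made for illustration.

```typescript
interface DataRule {
  event: string;      // system event identifier (illustrative names)
  expression: string;
  priority: number;   // higher wins
}

// Subset of the data-expression rules from the table above.
const RULES: DataRule[] = [
  { event: "search:results", expression: "happy",    priority: 10 },
  { event: "search:empty",   expression: "confused", priority: 12 },
  { event: "tool:error",     expression: "confused", priority: 15 },
  { event: "item:clicked",   expression: "thinking", priority: 8 },
];

// Resolve a batch of simultaneous events: the highest-priority matching
// rule wins; unmatched events are ignored.
function resolve(events: string[]): DataRule | null {
  let winner: DataRule | null = null;
  for (const ev of events) {
    const rule = RULES.find((r) => r.event === ev);
    if (rule && (!winner || rule.priority > winner.priority)) {
      winner = rule;
    }
  }
  return winner;
}
```

In the real system the winner would also suppress expression reverts while a multi-step operation is in flight, to avoid flickering.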
LLM explicit control
The model can call set_expression directly at priority 20, which always wins over sentiment and data rules. This lets the model set the avatar's expression when it has specific conversational intent.
Idle animations
A 5-tier inactivity progression adds life when the conversation pauses:
| Tier | Duration | Visual effect |
|---|---|---|
| Normal | 0–30s | Standard behavior |
| Breathing | 30–60s | Gentle sine wave Y offset |
| Sparkle | 1–2 min | Random particles brighten |
| Drift | 2–5 min | Looser physics, more particle movement |
| Wind | 5+ min | Directional wind force across particles |
Ambient sound analysis from the microphone caps the idle progression: detected voice forces a return to normal, typing caps the tier at breathing, and environmental noise caps it at sparkle.
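The tier thresholds and ambient caps above combine like this. The tier and ambient names mirror the table; the function names are illustrative.

```typescript
type IdleTier = "normal" | "breathing" | "sparkle" | "drift" | "wind";
type Ambient = "voice" | "typing" | "noise" | "quiet";

const TIER_ORDER: IdleTier[] = ["normal", "breathing", "sparkle", "drift", "wind"];

// Map seconds of inactivity onto the 5-tier progression from the table.
function tierForInactivity(seconds: number): IdleTier {
  if (seconds < 30) return "normal";
  if (seconds < 60) return "breathing";
  if (seconds < 120) return "sparkle";
  if (seconds < 300) return "drift";
  return "wind";
}

// Ambient sound caps the progression: voice resets to normal, typing caps
// at breathing, environmental noise caps at sparkle, quiet allows all tiers.
const AMBIENT_CAP: Record<Ambient, IdleTier> = {
  voice: "normal",
  typing: "breathing",
  noise: "sparkle",
  quiet: "wind",
};

function effectiveTier(seconds: number, ambient: Ambient): IdleTier {
  const raw = tierForInactivity(seconds);
  const cap = AMBIENT_CAP[ambient];
  return TIER_ORDER.indexOf(raw) <= TIER_ORDER.indexOf(cap) ? raw : cap;
}
```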
Lip sync
The avatar's mouth movement is driven by Web Audio frequency-band analysis:
Primary: Frequency-based DSP
The lip sync analyzer is pure Web Audio DSP that analyzes the assistant's audio output in real time:
- `AnalyserNode.getByteFrequencyData()` captures the audio spectrum
- 5-band frequency analysis (sub, low/F1, mid/F2, high/fricatives, very-high/sibilants)
- Heuristic classification into 14 phoneme visemes (sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, I, O, U)
- Flicker prevention with hold logic and a 2-frame minimum per viseme
- Outputs a 3-parameter mouth model: openness (jaw drop), width (smile/narrowness), round (lip protrusion)
This works across both voice paths — WebRTC realtime audio and any other audio source — without external service dependencies.
The lip sync approach and mouth shape definitions are adapted from lipsync-engine (MIT License, Beer Digital LLC). The implementation is rebuilt directly on the Web Audio API with no external dependencies.
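The band-to-mouth mapping idea can be sketched as a pure function over the five normalized band energies. The weights and heuristics here are assumptions for illustration; the real analyzer classifies 14 visemes with hold logic before producing the mouth parameters.

```typescript
// The 3-parameter mouth model driven by the lip sync analyzer.
interface MouthParams {
  openness: number; // jaw drop, 0–1
  width: number;    // smile/narrowness, 0–1
  round: number;    // lip protrusion, 0–1
}

// Map 5 normalized band energies (sub, low/F1, mid/F2, high, very-high)
// to mouth parameters. Weights are illustrative, not the shipped values.
function mouthFromBands(
  bands: [number, number, number, number, number]
): MouthParams {
  const [sub, low, mid, high, veryHigh] = bands;
  // Low/F1 energy dominates jaw openness (open vowels like "aa").
  const openness = Math.min(1, sub * 0.3 + low * 0.9);
  // High-band fricatives and sibilants ("SS", "E", "I") widen the mouth.
  const width = Math.min(1, mid * 0.4 + high * 0.5 + veryHigh * 0.6);
  // Rounded vowels ("O", "U") concentrate energy low with little sibilance.
  const round = Math.max(0, low * 0.6 - veryHigh * 0.8);
  return { openness, width, round };
}
```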
Procedural fallback
If audio capture itself fails, a simple procedural animation provides basic mouth movement.
Four personalities
Each personality changes the avatar's voice, visual theme, and behavioral characteristics:
| Personality | Voice | Visual theme | Character |
|---|---|---|---|
| Normal | alloy | Light blue particles, calm glow | Professional, balanced |
| Funny | shimmer | Gold particles, sparkle trails, playful physics | Witty, energetic |
| Harsh | echo | Gray particles, tight springs, sharp trails | Blunt, direct |
| Devil | ash | Red particles, ember trails, red eye glow | Dark, theatrical |
Personality affects particle colors, glow radius and color, trail style and length, spring strength, noise amplitude, chaos factor, and background gradient. The Devil personality is the only one with a unique eye glow effect.
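A personality theme could be represented as a single config record per personality. The voice names come from the table; the colors and physics values below are illustrative placeholders, and only the Devil entry defines `eyeGlow`.

```typescript
interface PersonalityTheme {
  voice: string;          // TTS voice from the table
  particleColor: string;  // illustrative hex values
  trailLength: number;    // 0–1, relative trail persistence
  springStrength: number; // 0–1, particle spring tightness
  eyeGlow?: string;       // only the Devil personality sets this
}

const PERSONALITIES: Record<string, PersonalityTheme> = {
  normal: { voice: "alloy",   particleColor: "#9ecbff", trailLength: 0.3, springStrength: 0.5 },
  funny:  { voice: "shimmer", particleColor: "#ffd700", trailLength: 0.8, springStrength: 0.3 },
  harsh:  { voice: "echo",    particleColor: "#888888", trailLength: 0.2, springStrength: 0.9 },
  devil:  { voice: "ash",     particleColor: "#ff3b30", trailLength: 0.7, springStrength: 0.6, eyeGlow: "#ff0000" },
};
```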
Gaze tracking
The avatar's eyes track toward the action panel for 3 seconds when blocks are created or updated. This creates a natural feeling that the assistant is "looking at" what it just showed.
Gaze is applied as directional offsets across facial regions:
- Pupils — strongest effect (most noticeable eye direction change)
- Eyes — medium effect
- Other regions — subtle effect
Movement is lerped at 10% per frame for smooth, natural-feeling motion. Gaze auto-reverts to center after the configured timeout.
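The per-frame gaze lerp can be sketched as follows; the region strength values are assumptions, and only the 10%-per-frame lerp factor comes from the text above.

```typescript
interface Vec2 { x: number; y: number; }

// Per-region gaze strength: pupils move most, other regions barely.
// These values are illustrative.
const REGION_STRENGTH = { pupils: 1.0, eyes: 0.5, other: 0.15 };

// Move `current` 10% of the remaining distance toward the scaled target
// each frame, producing an exponential ease toward the gaze point.
function stepGaze(current: Vec2, target: Vec2, strength: number): Vec2 {
  const goal = { x: target.x * strength, y: target.y * strength };
  return {
    x: current.x + (goal.x - current.x) * 0.1,
    y: current.y + (goal.y - current.y) * 0.1,
  };
}
```

Reverting to center is the same call with a `{x: 0, y: 0}` target after the timeout fires.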