
Expressive avatar

The avatar is a visual conversation partner. It reacts to what is said, to data-layer events, and to user interactions with UI blocks.

Users choose their avatar from seven visages:

| Visage | Description |
|---|---|
| Classic | The book mascot — the core GriMoire character |
| AnonyMousse | A mask-inspired face |
| Robot | A pixel-art robot |
| Black Cat | A dark feline character |
| Orange Cat | A warm-toned feline character |
| Squirrel | A small woodland character |
| Particle | A particle-based abstract visage |

Each character is an SVG file with named groups for movable facial parts: #brows, #left_eye, #right_eye, #mouth, and #faceRoot. Some visages use PNG sprite parts (separate brow, eye, and mouth images) embedded inside SVG <image> tags. Others use pure SVG paths.

Expression transforms are dampened per visage so the same emotion looks natural on each character shape — the classic mascot has a small face on a book body, so its mouth movement is more conservative than the squirrel's.
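A minimal sketch of this per-visage dampening; the damping factors and function names are illustrative assumptions, not the engine's tuned values:

```typescript
// Hypothetical per-visage damping factors (values are assumptions).
// Smaller faces get more conservative mouth movement.
const MOUTH_DAMPING: Record<string, number> = {
  classic: 0.6, // small face on a book body
  squirrel: 1.0,
};

// Scale a raw mouth-openness value by the visage's damping factor.
function dampedMouthOpen(raw: number, visage: string): number {
  return raw * (MOUTH_DAMPING[visage] ?? 1.0);
}
```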

Users can also disable the avatar entirely if they prefer a text-only interface.

How expressions are rendered

At 60 frames per second, the avatar renderer applies SVG transforms (translate, rotate, scale) to each facial part group:

  • Brows — lift offset (pixels) + rotation (degrees). Surprised lifts brows the most. Confused is asymmetric: left brow up, right brow down.
  • Eyes — scale 0.88–1.22 + position offset. Surprised widens eyes. Happy squints them down in a Duchenne smile.
  • Mouth — computed from lip sync data (openness, width, round) combined with expression boosts. Happy lifts and widens. Surprised drops the mouth open.
  • FaceRoot — subtle floating sine wave for natural breathing, plus scale modulation.

Transitions between expressions are lerped over 120ms–350ms depending on the target expression, so changes feel smooth rather than abrupt.
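The per-frame update described above can be sketched as follows; ExpressionPose, stepPose, and the field set are illustrative names, not the renderer's actual API:

```typescript
// A reduced pose model: one value per facial-part transform.
interface ExpressionPose {
  browLift: number;   // px
  browRotate: number; // degrees
  eyeScale: number;   // roughly 0.88–1.22
  mouthOpen: number;  // 0–1
}

function lerp(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}

// Advance `current` toward `target` given elapsed ms and the target
// expression's transition duration (120–350 ms depending on expression).
function stepPose(
  current: ExpressionPose,
  target: ExpressionPose,
  dtMs: number,
  transitionMs: number
): ExpressionPose {
  const t = Math.min(1, dtMs / transitionMs);
  return {
    browLift: lerp(current.browLift, target.browLift, t),
    browRotate: lerp(current.browRotate, target.browRotate, t),
    eyeScale: lerp(current.eyeScale, target.eyeScale, t),
    mouthOpen: lerp(current.mouthOpen, target.mouthOpen, t),
  };
}
```

The resulting values would be written to the SVG groups each frame as translate/rotate/scale transforms.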

Seven expressions

The avatar supports seven expression states:

| Expression | Visual effect | Transition |
|---|---|---|
| Idle | Warm micro-smile, neutral baseline | 300ms |
| Listening | Eyebrows raised, eyes open wider, light smile — attentive | 250ms |
| Thinking | Brows lift dramatically, eyes drift upward and left — contemplative | 300ms |
| Speaking | Light brow engagement, mouth driven by lip sync | 200ms |
| Surprised | Brows shoot up, eyes widen dramatically, mouth drops | 120ms |
| Happy | Eyes squint down (Duchenne smile), mouth beams upward, cheeks lift | 300ms |
| Confused | Asymmetric brows (left up, right down), asymmetric eyes, off-center grimace | 350ms |

Each expression is defined with specific values for brow offset, brow rotation, eye scale, mouth lift, mouth width, and mouth openness.
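As a sketch, such definitions could be stored as a typed record; every numeric value below is a placeholder — only the transition durations come from the table above:

```typescript
type Expression = "idle" | "listening" | "thinking" | "speaking"
  | "surprised" | "happy" | "confused";

// One definition per expression; placeholder values, not the tuned ones.
interface ExpressionDef {
  browOffset: number;   // px lift
  browRotation: number; // degrees
  eyeScale: number;
  mouthLift: number;
  mouthWidth: number;
  mouthOpenness: number;
  transitionMs: number;
}

const EXPRESSIONS: Record<Expression, ExpressionDef> = {
  idle:      { browOffset: 0, browRotation: 0,  eyeScale: 1.0,  mouthLift: 0.1,  mouthWidth: 1.0, mouthOpenness: 0,   transitionMs: 300 },
  listening: { browOffset: 2, browRotation: 0,  eyeScale: 1.08, mouthLift: 0.2,  mouthWidth: 1.0, mouthOpenness: 0,   transitionMs: 250 },
  thinking:  { browOffset: 5, browRotation: 3,  eyeScale: 1.0,  mouthLift: 0,    mouthWidth: 0.9, mouthOpenness: 0,   transitionMs: 300 },
  speaking:  { browOffset: 1, browRotation: 0,  eyeScale: 1.0,  mouthLift: 0,    mouthWidth: 1.0, mouthOpenness: 0.3, transitionMs: 200 },
  surprised: { browOffset: 8, browRotation: 0,  eyeScale: 1.22, mouthLift: -0.2, mouthWidth: 0.8, mouthOpenness: 0.8, transitionMs: 120 },
  happy:     { browOffset: 1, browRotation: 0,  eyeScale: 0.88, mouthLift: 0.5,  mouthWidth: 1.3, mouthOpenness: 0.2, transitionMs: 300 },
  confused:  { browOffset: 4, browRotation: -6, eyeScale: 0.95, mouthLift: -0.1, mouthWidth: 0.9, mouthOpenness: 0.1, transitionMs: 350 },
};
```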

Expression triggering

Expressions are triggered by four independent layers, coordinated through a priority system:

  • Sentiment analysis — text patterns → expression
  • Data expressions — system events → expression
  • LLM explicit control — the set_expression tool
  • Idle animations — inactivity → tier progression

Sentiment analysis

Regex patterns detect emotion in user and assistant messages:

  • User says "thanks" or "awesome" → happy
  • User says "wow" or "no way" → surprised
  • User says "don't understand" or "huh" → confused
  • Assistant says "let me check" or "analyzing" → thinking

When regex patterns find no match, an async call to the Nano model provides a fallback classification.
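The regex layer could look like this sketch; the patterns are just the examples listed above (not the full production set), and a null return stands in for deferring to the async Nano fallback:

```typescript
// Example sentiment rules, first match wins.
const SENTIMENT_RULES: Array<{ pattern: RegExp; expression: string }> = [
  { pattern: /\b(thanks|awesome)\b/i, expression: "happy" },
  { pattern: /\b(wow|no way)\b/i, expression: "surprised" },
  { pattern: /\b(don'?t understand|huh)\b/i, expression: "confused" },
  { pattern: /\b(let me check|analyzing)\b/i, expression: "thinking" },
];

// Returns the matched expression, or null to fall back to the Nano model.
function classifySentiment(text: string): string | null {
  for (const rule of SENTIMENT_RULES) {
    if (rule.pattern.test(text)) return rule.expression;
  }
  return null;
}
```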

Data expressions

18 priority-based rules react to system events. A representative sample:

| Trigger | Expression | Priority | Context |
|---|---|---|---|
| Search found 3+ results | happy | 10 | Successful retrieval |
| Search found 0 results | confused | 12 | Empty result set |
| Tool execution error | confused | 15 | Something went wrong |
| User clicked an item | thinking | 8 | Processing selection |
| User confirmed an action | happy | 10 | Positive confirmation |
| Form submitted | happy | 10 | Successful submission |
| Form validation failed | confused | 12 | Input error |

Higher-priority events override lower-priority ones. The system also suppresses reverts during multi-step operations to prevent expression flickering.

LLM explicit control

The model can call set_expression directly at priority 20, which always wins over sentiment and data rules. This lets the model set the avatar's expression when it has specific conversational intent.
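A minimal sketch of the priority arbitration across layers; the priorities come from this section (data rules 8–15, set_expression at 20), while the request type and resolve function are illustrative:

```typescript
// One pending request per triggering layer.
interface ExpressionRequest {
  expression: string;
  priority: number; // e.g. data rules 8–15, LLM set_expression 20
}

// Keep the highest-priority active request; ties go to the newest.
function resolve(requests: ExpressionRequest[]): string | null {
  let best: ExpressionRequest | null = null;
  for (const r of requests) {
    if (!best || r.priority >= best.priority) best = r;
  }
  return best ? best.expression : null;
}
```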

Idle animations

A 5-tier inactivity progression adds life when the conversation pauses:

| Tier | Duration | Visual effect |
|---|---|---|
| Normal | 0–30s | Standard behavior |
| Breathing | 30–60s | Gentle sine wave Y offset |
| Sparkle | 1–2 min | Random particles brighten |
| Drift | 2–5 min | Looser physics, more particle movement |
| Wind | 5+ min | Directional wind force across particles |

Ambient sound analysis from the microphone caps idle progression: voice detected forces return to normal, typing caps at breathing, environmental noise caps at sparkle.
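The tier progression and ambient caps can be sketched as follows; the thresholds follow the table above, and the function names are illustrative:

```typescript
type IdleTier = "normal" | "breathing" | "sparkle" | "drift" | "wind";
type Ambient = "voice" | "typing" | "noise" | "quiet";

const TIER_ORDER: IdleTier[] = ["normal", "breathing", "sparkle", "drift", "wind"];

// Map seconds of inactivity to a tier (thresholds from the table above).
function tierForInactivity(seconds: number): IdleTier {
  if (seconds < 30) return "normal";
  if (seconds < 60) return "breathing";
  if (seconds < 120) return "sparkle";
  if (seconds < 300) return "drift";
  return "wind";
}

// Ambient sound caps the progression: voice resets to normal,
// typing caps at breathing, environmental noise caps at sparkle.
function applyAmbientCap(tier: IdleTier, ambient: Ambient): IdleTier {
  const cap: Record<Ambient, IdleTier> = {
    voice: "normal",
    typing: "breathing",
    noise: "sparkle",
    quiet: "wind", // no cap
  };
  const cappedIndex = TIER_ORDER.indexOf(cap[ambient]);
  return TIER_ORDER[Math.min(TIER_ORDER.indexOf(tier), cappedIndex)];
}
```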

Lip sync

The avatar's mouth movement is driven by Web Audio frequency-band analysis:

  • LipSyncAnalyzer — Web Audio DSP frequency-band analysis
  • Phoneme classification — 14 viseme categories from 5 frequency bands
  • Mouth transform — openness + width + round

Primary: Frequency-based DSP

The lip sync analyzer is pure Web Audio DSP that analyzes the assistant's audio output in real time:

  1. AnalyserNode.getByteFrequencyData() captures the audio spectrum
  2. 5-band frequency analysis (sub, low/F1, mid/F2, high/fricatives, very-high/sibilants)
  3. Heuristic classification into silence plus 14 speech visemes (sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, I, O, U)
  4. Flickering prevention with hold logic and 2-frame minimum
  5. Outputs a 3-parameter mouth model: openness (jaw drop), width (smile/narrowness), round (lip protrusion)

This works across both voice paths — WebRTC realtime audio and any other audio source — without external service dependencies.
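A rough sketch of the band split and a toy classifier; the bin edges, the assumed 1024-bin spectrum, and the three-way classification are illustrative simplifications of the actual viseme heuristic:

```typescript
interface Bands { sub: number; low: number; mid: number; high: number; veryHigh: number; }

// Average byte magnitude (0–255) over a bin range of an
// AnalyserNode.getByteFrequencyData() spectrum.
function bandEnergy(spectrum: Uint8Array, from: number, to: number): number {
  let sum = 0;
  for (let i = from; i < to; i++) sum += spectrum[i];
  return sum / (to - from);
}

// Split the spectrum into the five bands. Assumes 1024 bins;
// bin edges here are chosen for illustration only.
function splitBands(spectrum: Uint8Array): Bands {
  return {
    sub: bandEnergy(spectrum, 0, 4),
    low: bandEnergy(spectrum, 4, 20),        // F1 region
    mid: bandEnergy(spectrum, 20, 80),       // F2 region
    high: bandEnergy(spectrum, 80, 300),     // fricatives
    veryHigh: bandEnergy(spectrum, 300, 512), // sibilants
  };
}

// Toy classification: silence vs a sibilant vs a vowel.
function classify(b: Bands): string {
  const total = b.sub + b.low + b.mid + b.high + b.veryHigh;
  if (total < 10) return "sil";
  if (b.veryHigh > b.low && b.veryHigh > b.mid) return "SS";
  return b.mid > b.low ? "E" : "aa";
}
```

The real analyzer would then map the chosen viseme, with hold logic, onto the openness/width/round mouth model.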

Attribution

The lip sync approach and mouth shape definitions are adapted from lipsync-engine (MIT License, Beer Digital LLC). The implementation is rebuilt directly on the Web Audio API with no external dependencies.

Procedural fallback

If audio capture itself fails, a simple procedural animation provides basic mouth movement.

Four personalities

Each personality changes the avatar's voice, visual theme, and behavioral characteristics:

| Personality | Voice | Visual theme | Character |
|---|---|---|---|
| Normal | alloy | Light blue particles, calm glow | Professional, balanced |
| Funny | shimmer | Gold particles, sparkle trails, playful physics | Witty, energetic |
| Harsh | echo | Gray particles, tight springs, sharp trails | Blunt, direct |
| Devil | ash | Red particles, ember trails, red eye glow | Dark, theatrical |

Personality affects particle colors, glow radius and color, trail style and length, spring strength, noise amplitude, chaos factor, and background gradient. The Devil personality is the only one with a unique eye glow effect.
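As a sketch, the personality mapping might be a static record; only the voice names, colors, and the Devil-only eye glow come from the table above — every other field and numeric value is an assumption:

```typescript
interface PersonalityTheme {
  voice: string;
  particleColor: string;
  springStrength: number; // assumed scale
  chaosFactor: number;    // assumed scale
  eyeGlow?: string;       // Devil only
}

const PERSONALITIES: Record<string, PersonalityTheme> = {
  normal: { voice: "alloy",   particleColor: "lightblue", springStrength: 0.5, chaosFactor: 0.1 },
  funny:  { voice: "shimmer", particleColor: "gold",      springStrength: 0.4, chaosFactor: 0.4 },
  harsh:  { voice: "echo",    particleColor: "gray",      springStrength: 0.8, chaosFactor: 0.2 },
  devil:  { voice: "ash",     particleColor: "red",       springStrength: 0.6, chaosFactor: 0.3, eyeGlow: "red" },
};
```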

Gaze tracking

The avatar's eyes track toward the action panel for 3 seconds when blocks are created or updated. This creates a natural feeling that the assistant is "looking at" what it just showed.

Gaze is applied as directional offsets across facial regions:

  • Pupils — strongest effect (most noticeable eye direction change)
  • Eyes — medium effect
  • Other regions — subtle effect

Movement is lerped at 10% per frame for smooth, natural-feeling motion. Gaze auto-reverts to center after the configured timeout.
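The weighted gaze easing can be sketched as follows; the 10%-per-frame rate comes from this section, while the region weights are assumed values:

```typescript
// Assumed per-region weights: pupils strongest, other regions subtle.
const GAZE_WEIGHTS = { pupils: 1.0, eyes: 0.5, other: 0.2 };

// Move 10% of the remaining distance toward the weighted target each frame.
function easeGaze(current: number, target: number, weight: number): number {
  return current + (target * weight - current) * 0.1;
}
```

Setting the target back to 0 after the timeout reuses the same easing to drift the gaze to center.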