Edge AI · Zero cloud · Zero inference cost

On-device AI
for interactive toys

LLM, speech recognition & voice synthesis compiled to ONNX — running offline on edge hardware and consumer devices. Manufacturers pay nothing for inference.

Edge AI Platform

Full inference pipeline.
Entirely offline.

Wake word to spoken response — everything runs on the device. No cloud, no API costs, no latency penalties.

Runtime

ONNX-Native

Models compiled to ONNX IR with INT4/INT8 quantization. Single-file deployment with custom operator fusion for embedded targets.

Voice

Full Voice Pipeline

End-to-end ASR, LLM, TTS on-device. Wake word, noise suppression, and child-friendly voice synthesis. Real-time streaming.

Character

Personality Engine

Character-consistent dialogue with emotional state tracking. Persistent memory stored on local flash across sessions.
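A minimal sketch of how session-persistent memory could work, assuming state is small enough to serialize as JSON on local flash. The class, file layout, and field names here are illustrative, not the actual SDK.

```python
import json, os, tempfile

class CharacterMemory:
    """Emotional state + remembered facts, persisted across sessions."""
    def __init__(self, path):
        self.path = path
        self.state = {"mood": "neutral", "facts": []}
        if os.path.exists(path):
            with open(path) as f:
                self.state = json.load(f)

    def remember(self, fact):
        self.state["facts"].append(fact)
        self._flush()

    def set_mood(self, mood):
        self.state["mood"] = mood
        self._flush()

    def _flush(self):
        # Write-then-rename so a mid-write power loss never corrupts memory.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.path)

path = os.path.join(tempfile.mkdtemp(), "memory.json")
m = CharacterMemory(path)
m.remember("favorite color is blue")
m.set_mood("happy")

m2 = CharacterMemory(path)  # new session, same flash
print(m2.state["mood"])     # → happy
```

The write-then-rename pattern matters on flash: the toy can lose power at any moment, and a half-written state file would otherwise erase the character's memory.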

Deploy

OTA Model Updates

Delta-compressed updates over WiFi or BLE. Hot-swap weights without device restart. Staged rollouts with automatic fallback.
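A sketch of the delta-apply and fallback logic, assuming an update ships as (offset, bytes) patches plus a checksum of the expected result. The wire format and function names are illustrative, not the shipping protocol.

```python
import hashlib

def apply_delta(weights: bytes, patches) -> bytes:
    """Apply (offset, chunk) patches to a weights blob."""
    buf = bytearray(weights)
    for offset, chunk in patches:
        buf[offset:offset + len(chunk)] = chunk
    return bytes(buf)

def hot_swap(active: bytes, patches, expected_sha256: str) -> bytes:
    """Return the patched weights, or the old ones if verification fails."""
    candidate = apply_delta(active, patches)
    if hashlib.sha256(candidate).hexdigest() != expected_sha256:
        return active  # automatic fallback: keep serving current weights
    return candidate

old = b"\x00" * 8
new = apply_delta(old, [(2, b"\xff\xff")])
digest = hashlib.sha256(new).hexdigest()

swapped = hot_swap(old, [(2, b"\xff\xff")], digest)   # good patch
corrupt = hot_swap(old, [(2, b"\xff\x00")], digest)   # bad patch
print(swapped == new, corrupt == old)  # → True True
```

Because verification happens before the swap, a corrupted or truncated download never replaces the running model; the device simply stays on its current weights.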

Safety

COPPA Compliant

On-device content filtering. No audio or conversation data leaves the device. Privacy by architecture, not just policy.
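A toy-sized sketch of the on-device gate, assuming a static blocklist compiled into the firmware. A production filter would use a small classifier model; the word list and redirect line here are placeholders.

```python
BLOCKLIST = {"credit card", "home address"}  # placeholder terms

def safe_reply(text: str) -> str:
    """Check a generated reply on-device before it is spoken aloud."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return "Let's talk about something else!"
    return text

print(safe_reply("What's your favorite dinosaur?"))
# → What's your favorite dinosaur?
```

The key property is where this runs: the check sits between generation and speech on the device itself, so nothing needs to be sent anywhere to be moderated.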

SDK

Analytics SDK

Privacy-preserving insights via federated aggregation. Session metrics, engagement scoring, and crash telemetry.
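A sketch of what a device-side aggregation step might produce: per-session aggregates only, never audio or transcripts. The field names are illustrative.

```python
def session_metrics(events):
    """Reduce a session's raw events to the aggregates that leave the device."""
    turns = [e for e in events if e["type"] == "turn"]
    n = len(turns)
    return {
        "turns": n,
        "avg_reply_ms": sum(t["reply_ms"] for t in turns) / n if n else 0,
        "crashes": sum(1 for e in events if e["type"] == "crash"),
    }

events = [
    {"type": "turn", "reply_ms": 180},
    {"type": "turn", "reply_ms": 120},
    {"type": "crash"},
]
print(session_metrics(events))
# → {'turns': 2, 'avg_reply_ms': 150.0, 'crashes': 1}
```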

inference.py
from neohumans import EdgeRuntime

# fully offline — no cloud, no cost
rt = EdgeRuntime(
    format="onnx",
    device="auto",
    models=[
        "neo-llm",
        "neo-asr",
        "neo-tts"
    ]
)

# listen → think → speak
audio = rt.listen()
reply = rt.think(audio)
rt.speak(reply)
Device Offload

Your users' devices
become the compute

The ALIVE module connects to nearby iPhones, Android phones, or Macs over a local mesh. Model weights live on the consumer device. Inference happens there. The toy just talks.


iPhone / iPad

iOS App

Weights stored locally


Android

Android App

Weights stored locally


Mac

macOS App

Weights stored locally

BLE + WiFi Direct · local mesh

ALIVE Module

Minimal compute
Mic + Speaker

Receives inference results


$0

Inference cost
per conversation

100%

Offline capable
no internet needed

3

Platforms
iOS, Android & macOS

01

Install the app

User downloads the NeoHumans companion app on their phone, tablet, or Mac.

02

Download weights

ONNX model weights are downloaded once to the device. Around 480 MB total.

03

Mesh connect

ALIVE module pairs over BLE or WiFi Direct. No internet needed.

04

Talk

Audio routes to the device for inference. Response streams back to the toy.
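The four steps above can be sketched from the toy's side as a short loop. The peer and transport objects below are stand-ins for the BLE / WiFi Direct mesh; the real firmware and companion-app protocol are not shown here.

```python
class Phone:
    """Simulates a paired device running the companion app
    (steps 01-02: app installed, weights already downloaded)."""
    def infer(self, audio: bytes) -> bytes:
        # On the real device: ASR → LLM → TTS against local weights.
        return b"reply-to:" + audio

class AliveModule:
    """Mic + speaker + minimal compute."""
    def __init__(self):
        self.peer = None

    def pair(self, phone):            # step 03: mesh connect
        self.peer = phone

    def talk(self, audio: bytes):     # step 04: route audio, stream reply
        return self.peer.infer(audio)

toy = AliveModule()
toy.pair(Phone())
print(toy.talk(b"hello"))  # → b'reply-to:hello'
```

The point of the shape: the module never holds weights or runs models; it only moves audio across the mesh and plays what comes back.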

Models

Small models.
Real conversations.

Each model ships as a single ONNX file — quantized and optimized for embedded deployment.

Language

neo-llm

1.3B parameters
ONNX size: ~400 MB
Quantization: INT4
Context: 4K tokens
Throughput: ~30 tok/s

Speech Recognition

neo-asr

82M parameters
ONNX size: ~45 MB
Quantization: INT8
Streaming: Yes
Latency: ~50ms

Text to Speech

neo-tts

48M parameters
ONNX size: ~35 MB
Quantization: INT8
Voices: 32 presets
First chunk: ~30ms

Total on-device footprint

~480 MB on disk

Peak RAM

~800 MB

E2E Latency

under 200ms
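The headline numbers above are internally consistent; a quick back-of-envelope check, assuming TTS can begin on the first generated LLM token:

```python
# Disk footprint: the three ONNX files sum to the stated total.
print(400 + 45 + 35)  # → 480 (MB)

# Latency: ASR (~50 ms) + first LLM token at ~30 tok/s + first TTS
# chunk (~30 ms), assuming synthesis starts on the first token.
e2e_ms = 50 + 1000 / 30 + 30
print(round(e2e_ms))  # → 113, under the 200 ms budget
```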

Hardware

Runs on
real hardware

Validated across production edge platforms. Plus consumer devices via the companion app.


Qualcomm

QCS6490 / QCS4490

Hexagon NPU

~30 tok/s


NVIDIA Jetson

Orin Nano / NX

CUDA + TensorRT

~55 tok/s


Raspberry Pi

CM4 / Pi 5

ARM NEON SIMD

~10 tok/s

Soon

Custom Silicon

Purpose-built ASIC

Designed for toy-grade AI

TBD

+ Consumer Device Offload

iPhone · iPad · Android · MacBook · iMac

When the toy lacks compute, it offloads inference to the nearest device running the NeoHumans app.
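"Nearest device" could be as simple as strongest signal among peers that already hold the weights; a sketch, with hypothetical peer records (RSSI in dBm, so closer to 0 means closer to the toy):

```python
def pick_offload_peer(peers):
    """Choose an offload target: best signal among devices that are
    running the companion app and have the weights downloaded."""
    ready = [p for p in peers if p["has_weights"]]
    return max(ready, key=lambda p: p["rssi"], default=None)

peers = [
    {"name": "Dad's iPhone",    "rssi": -48, "has_weights": True},
    {"name": "Living-room Mac", "rssi": -60, "has_weights": True},
    {"name": "Guest Android",   "rssi": -40, "has_weights": False},
]
print(pick_offload_peer(peers)["name"])  # → Dad's iPhone
```

The strongest nearby device without the app is skipped: a peer is only usable once the one-time weight download has completed.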

For Manufacturers

Zero inference cost.
Infinite conversations.

Every conversation runs on hardware your customers already own. No servers. No per-token billing. No scaling nightmares. Ship the toy.

$0

per conversation
per device, forever

0 ms

network latency
everything is local

0 data

sent to the cloud
COPPA by architecture

Let's build

Ready to ship
smarter toys?

We work with toy manufacturers and hardware partners to bring on-device AI to production. Let's talk.