
Voice Emerges as the Next Major AI Interface, Moving Beyond Text and Screens



By admin | Feb 05, 2026 | 2 min read



Mati Staniszewski, co-founder and CEO of ElevenLabs, believes voice is emerging as the next primary interface for artificial intelligence. He suggests that as AI models evolve beyond text and screens, voice will become the dominant way people interact with machines, fundamentally changing our relationship with technology. Looking ahead, Staniszewski says, “hopefully all our phones will go back in our pockets, and we can immerse ourselves in the real world around us, with voice as the mechanism that controls technology.”

This perspective helped drive ElevenLabs’ recent $500 million funding round, which valued the company at $11 billion. The view that voice is critical is gaining traction across the AI sector. Both OpenAI and Google are emphasizing voice capabilities in their next-generation models, while Apple seems to be developing always-on, voice-adjacent technologies through acquisitions like Q.ai. As AI integrates into wearables, vehicles, and other hardware, control is shifting from touchscreens to speech, positioning voice as a central competitive arena in AI’s next phase.

Seth Pierrepont, a general partner at Iconiq Capital, reinforced this outlook during a talk at Web Summit. He noted that while screens will remain important for gaming and entertainment, conventional input methods like keyboards are beginning to seem “outdated.” Pierrepont also highlighted that as AI systems become more agentic, interactions will evolve—models will incorporate guardrails, integrations, and contextual awareness to respond with less direct user prompting.

Staniszewski identifies this move toward agentic AI as one of the most significant shifts happening. Instead of requiring detailed instructions, future voice systems will use persistent memory and accumulated context to make exchanges feel more natural and less demanding for users. This progression will also affect how voice models are implemented. Although high-quality audio models have traditionally relied on cloud computing, ElevenLabs is developing a hybrid approach that combines cloud and on-device processing. This strategy is designed to support new hardware like headphones and wearables, where voice acts as a constant companion rather than an occasional feature.

ElevenLabs has already teamed up with Meta to integrate its voice technology into platforms such as Instagram and Horizon Worlds, Meta’s virtual reality environment. Staniszewski expressed openness to collaborating with Meta on its Ray-Ban smart glasses as voice interfaces expand into new device categories.

However, as voice becomes ever more present and woven into everyday devices, it raises serious questions about privacy, surveillance, and the extent of personal data these systems might retain. With voice technology moving closer to users’ daily routines, concerns are growing about potential data misuse, an issue companies like Google have already faced criticism over.



