Every morning, as Nandita Mohan sifts through her emails, her college pals are in her ear — recounting their day, reminiscing, reflecting on what it’s like to have graduated in the throes of a pandemic.
Mohan isn’t on the phone, nor is she listening to an especially personal podcast; she’s using Cappuccino, an app that takes voice recordings from a closed group of friends or family and delivers them as downloadable audio.
“Just hearing all of us makes me value our friendship, and hearing their voices is a game changer,” the 23-year-old Bay Area software programmer says.
Audio messaging has been available for years: voice memos on WhatsApp are especially big in India, and WeChat audio messages are popular in China. The pandemic's social distancing has also made voice memos an easy way for people to stay in touch while sidestepping Zoom fatigue. But now a new wave of hip apps is building the immediacy and rawness of audio into the core experience, making voice once again the main way people connect. From phone calls to messaging and back to audio: the way we use our phones may be coming full circle.
The best-known audio-focused network is Clubhouse, the buzzy, invite-only app that debuted to glowing reviews for its talk-show-like twist on the chatrooms of the early internet. Using it is akin to dropping in on a conversation at an (online) party.
But Clubhouse's promise was shattered by its lack of moderation and the unfettered chatter of misogynistic venture capitalists. New York Times reporter Taylor Lorenz, once a fan of the app, was subjected to harassment in Clubhouse sessions after calling out one VC's behavior.
“I don't plan on opening the app again,” Lorenz told Wired. “I don't want to support any network that doesn't take user safety seriously.” Her experience wasn't a one-off, and since then darker, racist elements have appeared on the app, suggesting that the behavior that mars every other social platform also exists beneath Clubhouse's exclusive, cool veneer.
Gaming chat app Discord, meanwhile, has exploded during the pandemic. The service uses voice-over-IP software to let people talk to one another in real time instead of typing (an idea that came from video gamers who found typing while also playing impossible). In June, to tap into people's need for connection during the pandemic, Discord announced a new slogan, “Your place to talk,” along with efforts to make the service appear less gamer-centric. The marketing push seems to have worked: by October, Discord estimated 6.7 million users, up from 1.4 million in February, just before the pandemic hit.
But while Discord's communities, or “servers,” can be as small and innocent as kids organizing remote-but-simultaneous sleepovers, they have also included far-right extremists who have used the service to organize the Charlottesville white supremacist rallies and the recent insurrection at the US Capitol.
On both Discord and Clubhouse, in-group cultures (nerdy gamers in Discord's case, overconfident venture capitalists in Clubhouse's) have led to instances of groupthink that are, at best, off-putting and, at worst, bigoted. Yet both still hold an appeal: isn't it cool to talk and literally be heard? After all, that's the foundational promise of social media: the democratization of voice.
Speak and you shall be heard
The intimacy of voice makes audio social media that much more appealing in an age of pandemic social distancing and isolation. Jimi Tele, the CEO of Chekmate, a “text-free” dating app that connects users only through voice and video, says that the intimacy of voice inspired him to launch an app that would be “catfish-proof,” referring to the practice of deceiving others online with fake profiles.
“We wanted to break away from the anonymity and gamification that texting allows and instead create a community rooted in authenticity where users are encouraged to be themselves without judgment,” Tele says. Users start out with voice memos that average about five seconds, and their messages then get progressively longer. And while Chekmate has a video option, Tele says that the app's several thousand users overwhelmingly prefer to use their voices. “They are perceived as less intimidating [than video messages],” he says.
This immediacy and authenticity are why Gilles Poupardin created Cappuccino. He wondered why there wasn't already a product that gathered voice memos together into a single downloadable file. “Everyone has a group chat with friends,” he says. “But what if you could hear your friends? That's really powerful.”
Mohan agrees. She says that her group of friends started out in a Facebook Messenger group chat, then tried Zoom calls early in the pandemic before switching to Cappuccino. But those discussions would inevitably devolve into a highlight reel of big events. “There was no time for details,” she laments. The daily Cappuccino “beans,” as the stitched-together recordings are called, let Mohan's circle of friends keep up with one another in a very intimate way. “My one friend is moving to a new apartment in a new city, and she was just talking about how she goes to get coffee in her kitchen,” Mohan says. “That's something I would never know in a Zoom call, because it's so small.”
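Established platforms are taking notice, too. Twitter, which already lets users post voice tweets, is testing Spaces, a feature for live audio conversations. “We were interested in whether audio could add an additional layer of connection to the public conversation,” says Rémy Bourgoin, a senior software engineer on Twitter's voice tweets and Spaces team.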
Bourgoin says that the vision is for Spaces to be “as intimate and comfortable as attending a well-hosted dinner party. You don’t need to know everyone there to have a good time, but you should feel comfortable sitting at the table.”
You may have snorted in disbelief reading that Twitter wants to create a space that is “comfortable” and “intimate.” After all, Twitter doesn’t exactly have a stellar track record in creating an online environment that is welcoming and protects vulnerable users from abuse.
Bourgoin says the team is deliberately moving slowly before releasing Spaces beyond its beta and a small group of users. It has even built in captioning, a rare accessibility feature on audio networks. “Right now, Spaces can be reported by anyone who is in the Space,” Bourgoin says. “Reports will be reviewed by our team, who will evaluate for violations of the Twitter Rules.”
Ah, moderation. Moderating audio is far more difficult than moderating text. Searchable text and automoderators have been used with some success, but human moderators seem to be the most thorough way to block people who don't abide by community rules, and that approach exposes the moderators themselves to harm. On platforms where people can jump in and chat at any time, the same open, democratized format that makes audio attractive turns moderation into a nightmare. “That's definitely a huge challenge with any user-generated platform,” says Austin Petersmith, who launched Capiche.fm in beta last year. The site grew out of a software community and works a bit like a call-in radio show: hosts call each other to start the show, then invite listeners to chime in while they're “on air.”
As users of Clubhouse have learned, voice-only spaces can quickly get ugly, just like anywhere else on the internet. People who already suffer from online abuse in text form (marginalized, female or non-binary, non-white, and/or younger users) are unlikely to want to make the leap to a place where they can be abused in a different, harder-to-police format.
There's also reason to believe that these newer, less regulated platforms will attract the hundreds of disaffected, far-right, conspiracy-minded extremists and QAnon believers, who are now creating their own podcast networks.
Still, these audio social networks seem to offer something that traditional social media cannot. One of the format's main benefits is that it gives users the immediate connection of a voice or video call, but on their own terms. Phone calls (and Zoom calls, for that matter) require some planning. Audio social media, by contrast, can be created and listened to at your own convenience, without the constant interruptions of news alerts, notifications, and doomscrolling. As Mohan, who listens to her friends every morning, says of Cappuccino: “It engages me and forces me to listen more carefully as each person is talking. I even take notes of things I want to respond to and say.”
For Mohan, the recordings from her circle of five friends have become a beloved ritual, allowing her to catch up with her friends at her own pace. “Every day, in the middle of my work day, I’ll record my Cappuccino,” she says, referring to the recording she makes on the app. “It feels really personal. I’m hearing all their voices and I feel on top of what they [her friends] are doing in their day to day.”