Emoji with Speech: Rethinking Visual Interfaces to Increase Accessibility

If we kept score of how often we use emoji in a day's conversations, these emotional icons might well outnumber our words. From videos to GIFs to memes and many other forms of visual media, sending emoji represents a shift in the way we communicate, one that dates back to their invention in 1998 by Shigetaka Kurita, an engineer at the Japanese phone carrier DOCOMO.

There is an overwhelming body of research on emoji, spanning overlapping fields: linguistics, socio-cultural theory in psychology, and machine learning (particularly sentiment analysis), where brands like Spotify, Airbnb, and Instagram analyze billions of data points to make sense of what these emoji represent. A 2015 study found that 92% of online users use emoji in conversations.

What if emoji could talk?

Over a year of studying web accessibility, I came to realize the emotional and social value of emoji in the digital era, and I learned in a recent interview with a low-vision user that emoji and GIFs are among the ways she interacts with her friends. I took on this project to study emoji through the lens of web accessibility, and quickly prototyped a few solutions presenting alternative ways to access emoji through speech interfaces.

Sketches of initial chatbot idea for EmojiSpeak

I also learned about Damon Rose, a blind reporter at the BBC, who talked about "glance culture" on the BBC's "Ouch" podcast on disability. He described emoji as one more example of how the technological world is becoming more visual: a glance culture in which users take in many things at a glance on a screen. This creates an exclusionary experience for the visually impaired community, who must keep up, understand the different memes, and get the references.

There is an abundance of anecdotes about how difficult it is for a blind person or screen-reader user to read emoji, as they carry long-winded alternate text and image descriptions. I came across a well-written article by Veronica Lewis, a college student who reviews tech devices, detailing the current technical challenges of how emoji create confusion as they are presented on the web.
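To illustrate how long-winded those descriptions get, this sketch uses Python's standard `unicodedata` module to list the formal Unicode names behind a single visible emoji. One "man facepalming: medium-light skin tone" glyph is actually a sequence of five code points, each with its own name for assistive technology to contend with (screen readers typically use friendlier CLDR short names, but the verbosity problem is the same):

```python
import unicodedata

# One visible emoji ("man facepalming: medium-light skin tone")
# is a sequence of five code points, each with its own formal name.
emoji = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"
for ch in emoji:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch, '<unnamed>')}")
```

Run as-is, this prints FACE PALM, an EMOJI MODIFIER for skin tone, a ZERO WIDTH JOINER, MALE SIGN, and a VARIATION SELECTOR: five names for one picture.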

User Interviews

Plenty of use cases shaped the focus of my research, and I sketched at least 30 prototypes as I went deeper into this topic. But I gathered the most data once I started observing end users.

User A: Completely Blind. Has never used emoji in conversations, as he found no need for them. Uses a MacBook and an iPhone interchangeably to send text messages.

  • Learned to use emoji in under a minute during observation, on both iOS and macOS. Knew where to find the emoji keyboard right off the bat.
  • Played a game with emoji, sending pairs of them to friends and having them guess the meaning.
  • Asked about the different kinds of heart emoji: why are there 17, and in different colors? Why would I use this one versus another? His first reaction to the descriptive nature of emoji alt text was to question each emoji's usage context.
Completely blind user asked why there are 17 different heart emoji
  • Apple provides a search bar for the emoji keyboard on macOS, but not on the iPhone.
  • Would search Google for an emoji's meaning, then copy and paste the emoji into a text message.

User B: Completely Blind. Loves emoji related to health and fitness, and friendly characters.

  • Apple’s iPhone interface is easy to use; she is an expert at using emoji.
  • She is also an expert at using her phone for convenience, particularly e-commerce and YouTube for yoga.
  • In our conversation, it seemed iPhones have largely fixed the problems of using emoji in the texting interface.
  • What remains boils down to texting etiquette: emoji are sometimes not announced as emoji.

Competitive Analysis

MIT researchers are working on DeepMoji, where a user types in a sentence and is matched with an equivalent emoji that befits the statement. It uses a predictive model trained on millions of tweets that contain emoji. The more users participate, the better the model understands the nuances of how language is used to express emotion. Most emoji research nowadays centers on data science and sentiment analysis, built on data tools and artificial intelligence. For my project, I wanted to try something where users of all abilities can participate by submitting their own definitions of emoji, rather than relying on an automated solution.

Emojipedia is the biggest resource for emoji and their origins. Although many users rely on it as an emoji reference, it is a closed system, authored by researchers and writers. I am looking for a community-driven solution that also builds empathy around the issue of accessibility and image descriptions.

Urban Dictionary, unlike regular dictionaries, invites anyone to add a word and define it. It is the closest product to the solution I have in mind, in terms of creating a community-driven platform and contextualizing a visual language like emoji. Its very text-oriented content also helps with compatibility with screen readers and text-to-speech technology.

Synthesis of Key Findings

Challenges in Glance Culture 
The rise of “visual content as interface” on touch screens and mobile devices has paved the way for users to adopt emoji in daily conversations. No longer confined to early adopters and millennials, emoji and stickers are used by 92% of the online population in text messages and on social media platforms, to the point that the little pictograms now appear in court documents.

Emoji, as a consequence of this glance culture, are most often used on highly visual communication platforms, and so exclude blind and screen-reader users.

How might we make emoji accessible through voice interaction or speech output?

  • Visual Language — the contextual usage of emoji spans different types of users (by age, race, or ability) and evolves constantly with use.
  • Technical Barrier — emoji image descriptions can be complex and confusing to hear, especially when several are sent in a row.
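A quick sketch of that second barrier, using Python's standard `unicodedata` module: when the same emoji is sent several times in a row, a name-by-name readout turns three celebratory taps into a long, repetitive announcement.

```python
import unicodedata

# Three party poppers in a row, read out by Unicode name.
message = "\U0001F389" * 3  # 🎉🎉🎉
readout = ", ".join(unicodedata.name(ch).lower() for ch in message)
print(readout)  # party popper, party popper, party popper
```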

“Intent” feature for VUI design to help identify use cases
I identified three use cases that categorize how emoji are used:

  • Emoji as Action — as pictograms, emoji often stand in for active words. “Clapping hands”, “kissing face”, and “woman dancing” are emoji that depict action.
  • Emoji as Representation — for a visually impaired user who hasn’t used emoji, usage begins with the person’s own experience. For example, Sandra, who loves to do yoga, often uses the “person in lotus position” emoji as a symbol of her favorite activity.
  • Emoji as Reaction — born from social platforms that promote quick-response interaction, e.g. Facebook’s “like”, “haha”, “wow”, “sad”, and “angry”.

Rapid Prototyping

Solution A: Emoji Dictionary

To capture human-provided context for what an emoji means, I created a prototype that invites users to co-design emoji meanings.


Solution B: Emoji Composer

This prototype simulates a posting/public messaging interface that presents the current Unicode descriptions of emoji, with each text description read out loud through text-to-speech.
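A minimal sketch of the composer's read-aloud step, substituting Python's standard `unicodedata` names for the descriptions the prototype presents (and leaving out the actual text-to-speech engine, which is platform-dependent):

```python
import unicodedata

def describe(text: str) -> str:
    """Replace each pictograph with its Unicode name in parentheses,
    giving a text-to-speech engine plain words to read."""
    parts = []
    for ch in text:
        if unicodedata.category(ch) == "So":  # "Symbol, other": most emoji
            parts.append(f"({unicodedata.name(ch, 'emoji').lower()} emoji)")
        else:
            parts.append(ch)
    return "".join(parts)

print(describe("namaste \U0001F9D8"))  # namaste (person in lotus position emoji)
```

The resulting string can then be handed to any speech API; the `describe` helper and its parenthesized format are my own illustration, not the prototype's exact output.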


Rapid Prototype Feedback:

“How the emoji presents itself on the web seems like the first problem that needs to be solved. Fix that and you have already fixed its usability.”

“How is this about glance culture? Maybe reimagine the design of the emoji keyboard? How can one access it without a screen? How do you categorize a large dataset? What would the gesture be like?”

“If you are in the design space and have enough time, what can be done to create a multimodal approach? Like a heart emoji that plays the sound of a beating heart and provides a pattern of haptic feedback?”

Final Execution

EmojiSpeak is a platform that explores alternative ways to access emoji, and aims to educate about and promote accessibility on the web.


Accessibility as a starting point

EmojiSpeak is a thesis project born out of my 100 Days of Inclusive Design exploration on Instagram. You may view the Instagram page here.

When I learned about digital accessibility in my second year of grad school, it immediately opened up a new world of design principles that, when incorporated into a UX designer’s toolkit, can enhance the usability of digital products.

Day 22: A11y is a numeronym for accessibility

The Web Content Accessibility Guidelines (WCAG) 2.0 are a set of guidelines for making content accessible to a wider range of people with disabilities, including blindness and low vision, deafness and hearing loss, learning disabilities, cognitive limitations, limited movement, speech disabilities, photosensitivity, and combinations of these. By extension, accessibility also serves groups beyond people with disabilities, such as the aging population.