HTML design and the audible experience

The HTML we write shapes what screen reader users hear. Every element, every attribute, every piece of text we include becomes part of an audible experience.

This article explores how HTML translates into spoken announcements and how we can design better experiences for people who listen to the web.

The web has a voice

When someone uses a screen reader, they hear the web rather than see it. The visual layout disappears. Colours, spacing, and visual hierarchy become irrelevant. What remains is structure and meaning, conveyed through synthesised speech.

A button looks like a button to sighted users. To screen reader users, it sounds like one. The screen reader announces "button" after reading the label. This audio cue tells someone they can activate it.

This is the audible experience. It's shaped entirely by the HTML we write.

How HTML becomes speech

Screen readers don't read your source code directly. They read from something called the accessibility tree, a simplified version of the page that contains only the information assistive technologies need.

The browser builds this tree from your HTML. It extracts:

  • Names: What is this thing called?
  • Roles: What type of thing is it?
  • States: What condition is it in?

These three pieces of information form the foundation of every screen reader announcement.
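As a toy model (not how any real screen reader works internally), you can picture the three properties joining into a single announcement string:

```javascript
// Toy sketch: compose an announcement from name, role, and state.
// Real screen readers are far more sophisticated; this only
// illustrates how the three properties combine.
function announce({ name, role, state }) {
  // Skip any property that is missing (e.g. an element with no state).
  return [name, role, state].filter(Boolean).join(', ');
}

announce({ name: 'Send message', role: 'button' });
// → "Send message, button"
```

The examples in the rest of this article follow this same name, role, state pattern.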

Names, roles, and states in practice

Consider a simple checkbox:

<input type="checkbox" id="terms" checked />
<label for="terms">I agree to the terms</label>

A screen reader might announce: "I agree to the terms, checkbox, checked."

  • Name: "I agree to the terms" (from the label)
  • Role: "checkbox" (from the input type)
  • State: "checked" (from the checked attribute)

The announcement tells someone everything they need to know to understand and interact with this control. They know what it's for, what it is, and what condition it's in.

Native HTML speaks fluently

Standard HTML elements come with built-in accessibility. A <button> announces as a button. A <nav> announces as navigation. A <table> announces as a table with row and column information.

<button>Send message</button>

Screen reader announces: "Send message, button."

No extra work required. The browser knows what a button is and communicates that to assistive technologies automatically.

This is why semantic HTML matters so much. When we use the right elements, we get the right announcements for free.

When HTML falls silent

Problems arise when we ignore semantics. A <div> with a click handler might look like a button, but it sounds like nothing:

<div class="btn" onclick="sendMessage()">Send message</div>

Screen reader announces: "Send message."

No role. No indication that this is interactive. Someone listening to the page has no idea they can click this. The visual design speaks, but the HTML stays silent.

Making silent elements speak

When you cannot use native HTML, WAI-ARIA lets you add the missing information:

<div
  class="btn"
  role="button"
  tabindex="0"
  onclick="sendMessage()"
  onkeydown="if (event.key === 'Enter' || event.key === ' ') { event.preventDefault(); sendMessage(); }"
>
  Send message
</div>

Screen reader announces: "Send message, button."

The role="button" tells assistive technologies this is a button. The tabindex="0" makes it keyboard focusable. The keydown handler makes Enter and Space activate it.

But notice how much work this takes compared to just using a <button>. Native HTML remains the better choice whenever possible.

The danger of ARIA

ARIA is powerful but risky. It can make things worse when used incorrectly.

<button role="link">Submit form</button>

This button now announces as a link. Someone expecting link behaviour will be confused when it acts like a button. The audible experience no longer matches reality.

Use ARIA only when necessary, and test thoroughly when you do.

Designing the audible experience

Good HTML design means thinking about what people will hear, not just what they'll see.

Ask yourself:

  • Does this element have a clear name?
  • Does the role match what it does?
  • Are state changes communicated?

Clear names

Every interactive element needs an accessible name. For buttons and links, visible text usually provides this. For icons and images, you need alternatives:

<button aria-label="Close dialog">
  <svg aria-hidden="true"></svg>
</button>

This is an icon button with an accessible name.

Screen reader announces: "Close dialog, button."

Without the aria-label, this would announce as just "button", leaving someone guessing what it does.

Matching roles

The role should match the behaviour. If it navigates, use a link. If it triggers an action, use a button. If it's a list of options, use radio buttons or a select.

This looks like tabs but sounds like buttons:

<div class="tabs">
  <button class="tab">Overview</button>
  <button class="tab active">Details</button>
  <button class="tab">Reviews</button>
</div>

This sounds like tabs:

<div role="tablist">
  <button role="tab" aria-selected="false">Overview</button>
  <button role="tab" aria-selected="true">Details</button>
  <button role="tab" aria-selected="false">Reviews</button>
</div>

The second version tells screen reader users this is a tabbed interface, with one tab currently selected. The audible experience matches the visual one.
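Matching the role also means matching the keyboard behaviour that role implies: people expect the Arrow keys to move between tabs, wrapping at either end. A minimal sketch of that logic (the function name and wiring are illustrative, not from any library):

```javascript
// Sketch: which tab should receive focus after a key press.
// Tabs conventionally wrap around at either end of the list.
function nextTabIndex(current, total, key) {
  if (key === 'ArrowRight') return (current + 1) % total;
  if (key === 'ArrowLeft') return (current - 1 + total) % total;
  return current; // any other key leaves focus where it is
}
```

In a real widget you would use this to move focus, update aria-selected on the newly active tab, and keep tabindex="0" on only that tab (the roving tabindex pattern).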

Communicating state

When things change, screen reader users need to know. A disclosure that expands? Announce whether it's expanded or collapsed:

<button aria-expanded="false" aria-controls="details">Show details</button>
<div id="details" hidden>More information here.</div>

When aria-expanded changes to "true", screen readers announce the new state. Someone listening knows the content is now visible.
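A minimal sketch of the toggle itself, assuming the markup above (the function name is mine, not from any library). The point is to flip aria-expanded and the hidden attribute together, so what people hear never drifts from what people see:

```javascript
// Sketch: toggle a disclosure so visibility and aria-expanded
// always change in the same step.
function toggleDisclosure(button, panel) {
  const expanded = button.getAttribute('aria-expanded') === 'true';
  button.setAttribute('aria-expanded', String(!expanded));
  panel.hidden = expanded; // hide if it was expanded, show if it wasn't
}
```

Wire this to the button's click handler and both the visual and the audible state update from the same attribute change.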

Content as conversation

Think of your page as a conversation with someone who cannot see it. You need to tell them:

  • What things are (roles)
  • What things are called (names)
  • What condition things are in (states)
  • How things are organised (structure)

Headings create a table of contents people can navigate. Lists group related items. Landmarks define regions of the page.

<nav aria-label="Main">
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/about">About</a></li>
    <li><a href="/contact">Contact</a></li>
  </ul>
</nav>

Someone can jump directly to "Main navigation" and hear a list of three links. The structure itself communicates meaning.

Testing the audible experience

You cannot design a good audible experience without listening to it. Turn on a screen reader and hear your page:

  • VoiceOver on macOS: Command + F5
  • NVDA on Windows: Free download from nvaccess.org
  • VoiceOver on iOS: Settings, then Accessibility, then VoiceOver
  • TalkBack on Android: Settings, then Accessibility, then TalkBack

Listen to your forms. Navigate your page by headings. Tab through interactive elements. Does what you hear make sense? Would you understand this page if you couldn't see it?

The limits of automation

Automated testing tools check whether your HTML is valid. They can catch missing labels, invalid ARIA, and structural errors.

But they cannot tell you if your page sounds good. They cannot judge whether "Click here" is a useful link name or whether your heading structure tells a coherent story.

Automated tools catch perhaps 30-40% of accessibility issues. The rest require human judgement. The audible experience is one of those things only humans can evaluate.

Writing HTML is writing for the ear

Every time you write HTML, you're writing something that will be spoken aloud to someone. Your markup has a voice.

A well-structured page sounds organised and navigable. Forms with proper labels sound clear and usable. Interactive elements with correct roles sound familiar and predictable.

A poorly-structured page sounds chaotic. Unlabelled buttons sound like mysteries. Divs pretending to be buttons sound like nothing at all.

The HTML we write creates the audible experience.

When we take care with our markup, we take care of the people who depend on it.
