WCAG 1.2.4 requires that captions are provided for all live audio content in synchronized media — in plain terms, any real-time event you broadcast with sound, like a webinar, product launch, or live stream, needs captions that appear as it happens. Those captions must carry the dialogue plus speaker identity and meaningful sounds, with no more than a broadcast delay.

What Success Criterion 1.2.4 actually says

The normative text from the W3C is one line: “Captions are provided for all live audio content in synchronized media.” It is a Level AA criterion — the conformance tier almost every regulation, contract, and lawsuit settlement points to in practice. “Synchronized media” means audio or video synchronized with another format, like a live video feed with a soundtrack. The word doing all the work here is live.

W3C defines live as “information captured from a real-world event and transmitted to the receiver with no more than a broadcast delay” — not enough time to script and proofread captions the way you would for a recorded file. The intent is to “enable people who are deaf or hard of hearing to watch real-time presentations.” And like recorded captions, live captions must “identify who is speaking and notate sound effects and other significant audio,” not just transcribe the words.

Live is not the same as prerecorded

This is why 1.2.4 is a separate criterion from 1.2.2:

  • Prerecorded (1.2.2, Level A): the audio already exists. You have hours or days to write, edit, and verify a WebVTT file before anyone watches.
  • Live (1.2.4, Level AA): the audio is created in the moment. Captions must be produced as people speak, with no second take.

There is no pre-written caption file to fall back on. That real-time production problem — not a missing <track> element — is the core of meeting 1.2.4, and why the bar is Level AA rather than A.
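
To make the difference concrete, here is a minimal Python sketch of what "produced as people speak" implies: each cue is stamped and emitted as the captioner's text arrives, rather than read from a finished file. The function names and the sample feed are hypothetical. WebVTT is used only as a familiar cue notation, since live platforms often carry captions in other formats.

```python
from typing import Optional


def format_timestamp(seconds: float) -> str:
    """Render seconds since stream start as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def make_cue(start: float, end: float, text: str, speaker: Optional[str] = None) -> str:
    """Build one WebVTT cue; a <v> voice span labels the speaker when known."""
    body = f"<v {speaker}>{text}</v>" if speaker else text
    return f"{format_timestamp(start)} --> {format_timestamp(end)}\n{body}\n"


# Hypothetical live feed: (start, end, speaker, text) items arriving one at a time.
live_feed = [
    (0.0, 2.5, "Host", "Welcome, everyone."),
    (2.5, 4.0, None, "[applause]"),  # meaningful sound, no speaker label
]

print("WEBVTT\n")
for start, end, speaker, text in live_feed:
    # In a real event each cue would be pushed to viewers as soon as it exists.
    print(make_cue(start, end, text, speaker))
```

The speaker label and the bracketed sound cue reflect the requirement that live captions identify who is speaking and note significant audio, not only the words.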

Who this helps

Live captions exist primarily for people who are Deaf or hard of hearing — roughly 48 million Americans have some degree of hearing loss, per the CDC. Without real-time captions, a live webinar or town hall is inaccessible to them as it happens; a recording posted the next day does not let them participate, ask questions, or react in the moment. W3C puts the benefit plainly: people who are deaf or have a hearing loss “can access the auditory information in the synchronized media content through captions.”

The audience is wider, too: attendees in sound-off offices, non-native speakers who read the language more comfortably than they hear it, and anyone parsing unfamiliar names, product terms, or accents in a fast-moving talk.

Concrete examples that fall under 1.2.4

If you broadcast any of these with audio, 1.2.4 applies:

  • Live webinars and online workshops — the most common case for small businesses.
  • Product launches and live demos streamed to customers.
  • Live streams on your site, YouTube Live, LinkedIn Live, or similar.
  • Virtual conferences, town halls, and AMAs with a live audience.
  • Live web casts — W3C’s own example is “a news organization [that] provides a live, captioned web cast.”

W3C also cites an orchestra web cast using CART captioning that “captures lyrics and dialog as well as identifies non-vocal music by title, movement, composer” — a reminder that meaningful non-speech audio counts here too.

How to provide live captions

There are two realistic paths, and they differ sharply in quality:

1. CART (Communication Access Realtime Translation) — the reliable option. A trained captioner listens to your event and types a verbatim transcript in real time, which streams to viewers as on-screen captions. W3C’s sufficient technique G9 is “creating captions for live synchronized media,” paired with either open captions (G93) burned into the video or closed captions (G87) the viewer can toggle. CART is the gold standard for webinars, launches, and broadcasts because a human handles names, jargon, and crosstalk accurately.

2. Automatic speech recognition (ASR) — with caveats. Zoom, Microsoft Teams, and YouTube Live all offer built-in auto-captions you can turn on in a click. But raw ASR routinely mangles proper nouns, technical terms, and homophones, and adds latency. W3C’s caption guidance is consistent: automatic captions “do not meet user needs or accessibility requirements unless they are confirmed to be fully accurate.” For a casual internal stream, edited ASR may pass; for a public launch or anything legally exposed, book a CART captioner.
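
One way to see how ASR handles names and jargon is to compare a rehearsal transcript against the glossary you would otherwise hand a captioner. The sketch below is a rough illustration, not a quality metric: the function name, transcript, and glossary are all hypothetical, and the check only makes sense for terms that were actually spoken.

```python
from typing import List


def missing_glossary_terms(transcript: str, glossary: List[str]) -> List[str]:
    """Return glossary terms that never appear in the transcript (case-insensitive)."""
    haystack = transcript.lower()
    return [term for term in glossary if term.lower() not in haystack]


# Hypothetical auto-caption output from a rehearsal.
asr_output = "Today we are launching acne cloud with Priya Raman."
glossary = ["Acme Cloud", "Priya Raman"]

print(missing_glossary_terms(asr_output, glossary))
```

A term that was spoken but is missing from the captions is a sign that a human captioner, or at least a human reviewer, is needed for that event.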

Setup tips:

  • Schedule the captioner early and share names, agenda, and a glossary so they can prep.
  • In Zoom, enable live captioning or connect a CART provider via the caption API.
  • In Teams, turn on live captions and add a professional captioner where accuracy matters.
  • On YouTube Live, stream captions from a CART encoder rather than relying on auto-captions.
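
The Zoom tip above can be sketched as follows. This assumes Zoom's third-party captioning URL accepts a plain-text POST with `seq` (an increasing sequence number) and `lang` query parameters; treat that as an assumption and check Zoom's current documentation before relying on it. The function name and the placeholder URL are hypothetical, and the sketch only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_caption_request(api_token_url: str, seq: int, text: str, lang: str = "en-US") -> Request:
    """Prepare (but do not send) one caption POST for a third-party caption URL."""
    # Assumption: seq and lang are appended as query parameters to the token URL.
    separator = "&" if "?" in api_token_url else "?"
    url = f"{api_token_url}{separator}{urlencode({'seq': seq, 'lang': lang})}"
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


# Placeholder URL; a real one would be copied from the meeting's caption settings.
request = build_caption_request(
    "https://example.invalid/closedcaption?id=123", 1, "Welcome, everyone."
)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)`. In practice a CART provider's software normally handles this step, so you only need to hand the provider the caption URL.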

How to test for 1.2.4

You test this differently from recorded video, because there is no file to inspect — you verify the live experience:

  1. Inventory your live events — every webinar, launch, stream, and town hall you broadcast with audio.
  2. Confirm captions appear in real time. During the event or a rehearsal, turn captions on and check that the on-screen text keeps pace with the audio, with no more than a broadcast delay.
  3. Check accuracy and speaker labels. Read along: are names, terms, and punctuation right, and is it clear who is speaking?
  4. Confirm meaningful sounds are conveyed where they matter to understanding.
  5. Verify your provider and workflow. Is a CART captioner booked, or the platform configured to stream accurate captions? Automated scanners cannot judge a live event — this is a planning-and-rehearsal check a human runs before you go live, and exactly the gap a hands-on accessibility audit closes.
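
Steps 2 and 3 can be backed with numbers from a rehearsal. The sketch below is one possible approach, assuming you have a short reference transcript, the caption text that actually appeared, and a few hand-timed samples of when a phrase was spoken and when its caption showed up. All names and sample values are hypothetical. WCAG sets no numeric threshold for delay or accuracy, so the results inform a human judgment rather than replace it.

```python
from typing import List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word dropped from the captions
                current[j - 1] + 1,      # extra word in the captions
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


def max_caption_delay(samples: List[Tuple[float, float]]) -> float:
    """Largest gap in seconds between a phrase being spoken and its caption appearing."""
    return max(shown - spoken for spoken, shown in samples)


# Hypothetical rehearsal data.
spoken_text = "welcome to the acme launch"
caption_text = "welcome to the acne launch"
timing_samples = [(10.0, 12.5), (20.0, 23.0)]  # (spoken at, caption shown at)

print(word_error_rate(spoken_text, caption_text))
print(max_caption_delay(timing_samples))
```

The comparison splits on whitespace only, so punctuation differences count as errors; normalize both strings first if that is not what you want. Speaker labels and sound cues still need to be checked by reading along.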

Live captioning is not a fringe concern — it sits at the center of major web-access litigation. In the National Association of the Deaf’s cases against Harvard and MIT, the complaints targeted both inaccurate captions and uncaptioned live-streamed content; both universities settled via consent decrees the NAD calls among the most comprehensive online-accessibility requirements in higher education. More broadly, the U.S. Department of Justice has long affirmed that the ADA reaches online content, and WCAG 2.1 AA — the tier 1.2.4 lives in — is the practical standard cited in ADA Title III web claims and in government procurement.

This is general information, not legal advice. For your specific exposure, consult a qualified attorney.

Fixing it the right way

No accessibility overlay can caption a live event. Overlay widgets bolt a toolbar onto your page; they cannot listen to your webinar and produce accurate real-time text, let alone identify speakers or describe sounds. Meeting 1.2.4 is an operational fix, not a script you paste in.

Real remediation means building captioning into how you run events: booking a CART captioner or configuring your platform to stream verified captions, prepping a glossary, and rehearsing the caption feed before you go live. It belongs in the same plan as your recorded-media work under 1.2.2 and 1.2.1, alongside your screen-reader fixes. Not sure which events are exposed? Start with a free scan, then map the rest against the full list of success criteria with our remediation team.