On Tuesday, March 9, we got the latest update on YouTube’s automated captioning efforts. I heard it on NPR’s afternoon program “All Things Considered,” in which Robert Siegel interviewed Ken Harrenstien of Google, with a (female) interpreter voicing the Google engineer’s answers.

Audio and transcript are available at http://www.npr.org/templates/story/story.php?storyId=124501330.

Harrenstien acknowledges that automated captioning today stumbles on proper names, including trademarks and product names: “YouTube” comes out as “You, too!” Automated captioning also has difficulty with videos that have music or other sounds in the background. But he describes himself as a technology optimist and expects things to be much improved in 10 years.

Benefits of captioning

Curb cuts have become the symbol of how solutions designed for disabled people (here, wheelchair users) also meet the needs of others (people pushing strollers, roll-aboard luggage, or shopping carts). Captions likewise have benefits that extend well beyond people with hearing impairments.

  • Deaf and hearing-impaired people can enjoy the huge inventory of videos on YouTube. (The still frame that opens this post is from an announcement by President Obama in response to the Chilean earthquake. Making emergency and other time-sensitive news available to those who cannot hear meets the requirements of US laws and regulations. More importantly, it meets the moral and ethical standards we expect of a civilized society, one that includes everyone in the polity.)
  • If you’re in a noisy environment, or close to others who would be bothered by the audio, you can follow what the video is saying even without headphones.
  • Small companies can afford to caption their webcasts, which are often the main way people learn about new products.
  • Non-native speakers of English have a much better chance of understanding speech at ordinary (rapid) rates with the help of captions.
  • Captions can feed machine translation services, so there will soon be captions in languages other than English; as automated speech-to-text technology improves, we’re going to see videos in spoken languages other than English captioned as well.
  • Captions are much better input than speech for (current) search technology, so there’s hope of finding segments of videos whose content might not appear anywhere in written form.

Professional captioners need not despair

I read the YouTube blog post of March 4 and the comments that followed it, and recalled last November’s announcement of a limited trial with selected partners. In his comment on the recent announcement, James worries that people who, like him, earn their living as captioners for post-production houses will lose their jobs to automated captioning. My response seconds HowCheap’s comment: professional captioners will continue to find work, both as editors of automated speech-to-text output and with organizations that prefer to do their own captioning. Organizations that produce professional-quality video typically start from a written script, adjust it for the few changes that occur in the spoken version, and then synchronize the text with the video.

Most of the videos on YouTube are uploaded by individuals or small organizations who may not be aware of the benefits of captioning and likely don’t know about the tools available. According to YouTube’s fact sheet: “Every minute 20 hours of video is uploaded to YouTube.” That works out to 1,200 hours of new video every hour, or 28,800 hours every day, a volume far beyond the capacity of professional captioners and the organizations that employ them.

A proposal for improving the quality of captions

How shall we improve the quality of automatically produced captions?

I’d like to see interpreter training programs (ITPs) make editing automated captions a course assignment, a program requirement, or a component of an internship. Engaging with someone else’s spoken language is a challenge. Other people phrase things in ways you don’t; they use unfamiliar vocabulary and proper names (streets, towns, people, products) that you need to look up. ITPs that train sign language interpreters, and those that train people to interpret between two spoken languages, may both admit students whose listening, writing, or spelling skills are lacking. How many caption-editing assignments are enough? Should we also coordinate quality checks by students in the same or a different program? Such assignments would give students a greater appreciation of the challenges of speech in online settings, through a task that provides an authentic service.

VRS and VRI

For ITPs that train sign language interpreters, practice listening to online speech is great preparation for work settings such as VRS and VRI. Video Relay Service (VRS) in the US is regulated by the FCC: deaf signers who cannot use the telephone (because their speech is not intelligible and they cannot hear well enough to understand speech over the phone) use intermediaries (interpreters) to communicate with hearing non-signers. (Think of simple tasks such as calling the school to say your child will be absent, scheduling a haircut, or ordering a pizza for delivery, not to mention more complex transactions involving prescriptions, real estate contract negotiations, or billing disputes.) Video Remote Interpreting (VRI), in which the deaf and hearing parties are physically together and the interpreter is remote, places similar demands on the interpreter: listening to speech over a phone or data line and rendering accurate translations in real time.

Broad multidisciplinary participation for open source content quality

Programs that train instructors in English as a Second Language (ESL) could also participate. Students in speech therapy and audiology would benefit both from direct engagement with spoken language “in the wild” and from working with future colleagues in other disciplines. There are advantages to engaging a variety of people who are studying for professions that demand expertise in spoken and written English.

This looks like an open source content development effort to me. Yes, it will require some coordination, but not a great deal of overhead. How about it, ITP directors?