Googling “Gone with the Wind”

December 17, 2010

The New York Times reported yesterday that YouTube, a Google subsidiary, is in negotiations to purchase Next New Networks, the leading provider of original video programming for the Internet. While it is too early to comment on a deal that is not yet done, it has some interesting implications for the deaf and hard-of-hearing community. Even if this deal falls through, it would not be surprising to see future efforts to acquire companies that independently produce original video content for the Web.

As a video site best known for hosting mom-and-pop video content, YouTube's possible acquisition of Next New Networks would bring it into more direct competition with commercial video distributors like Hulu and Netflix. In the accessibility space, what separates YouTube from Netflix is its pioneering automatic transcription technology, which captions the spoken dialogue in its videos without any human intervention. Does this mean that automatic transcription will extend to content that is typically the domain of Hulu and Blip.tv? As it stands today, independent web providers have not responded well to requests to caption their content.

While automatic transcription is not a sure thing today — its accuracy is far below acceptable access standards for closed captioning, and it may take years for it to approach that of an average-quality human captioner — there is the possibility that deaf and hard-of-hearing people could start to enjoy so-called Webisodes on a par with their hearing peers, at little or no cost to the Webisode providers.
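Caption accuracy of this kind is commonly measured as word error rate (WER): the number of word substitutions, insertions, and deletions in a transcript, divided by the length of the reference transcript. As a minimal illustration (not any captioner's actual evaluation code), here is a standard Levenshtein-distance WER in Python:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of words in the reference transcript,
    computed as a word-level Levenshtein edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One wrong word out of five: 20% WER
print(wer("the quick brown fox jumps", "the quick brown box jumps"))  # 0.2
```

On this toy pair, a single misrecognized word out of five already yields a 20% error rate; human-quality captioning is typically held to far stricter standards than early automatic systems could meet.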

In the last ten years, the deaf and hard-of-hearing community has found itself increasingly shut out of most video content on the Internet. Prior to the advent of the Internet as a mass medium in the mid-1990s, all commercial video content not shown at movie theaters was shown on only one other medium — the television set. The Americans with Disabilities Act of 1990, the Television Decoder Circuitry Act of 1990, and a comprehensive closed captioning infrastructure made it possible for the deaf and hard-of-hearing to enjoy closed-captioned video content on almost the same terms as their hearing peers. There was no other major video distribution medium besides the television and the silver screen, so the existing captioning infrastructure was adequate to cover, as required by law, virtually all available video content.

While the pre-broadband Internet had some negative impact on video accessibility for the deaf, the arrival of YouTube and the wide adoption of broadband Internet resulted in an explosion of video content that is not closed-captioned and that far outnumbers traditional TV and movie content. While most online video content is non-commercial (i.e., uploaded by individuals like you and me) and thus out of the realm of commercial captioning agencies, there are hundreds of thousands, possibly millions, of hours of commercially produced online video content that is not required to be captioned under existing pre-2010 laws. The recently signed Twenty-First Century Communications and Video Accessibility Act of 2010, the first major law to mandate captioning for Internet-broadcast video, covers only online content originally broadcast on television. However, even if all online video content were required to be captioned, the current infrastructure could not support it. The lack of a digital/online video broadcasting format standard and the sheer amount of online commercial video content are major roadblocks.

From a business standpoint, it makes practical sense that YouTube would consider this acquisition. It is the Wild West in the online video programming arena — network-supported Web channels are battling with scores of independent web networks that have sprung up in the past several years. Yet in any industry where too many companies pursue the same set of consumers, consolidation is inevitable: many of these web networks will either die out or be acquired by larger companies, including traditional cable and broadcast networks and major video content distributors such as Google.

The coming industry consolidation may make it easier for the remaining companies to support captioning standards for their video content. Whether through automatic transcription or other technologies that can address the problem of captioning millions of hours of online video, it is highly likely that something will be done to effectively address online video accessibility for the deaf and hard-of-hearing. There is a strong business case for this: transcribing all video content makes that content searchable. As the world's leading search engine, Google — the parent of YouTube — already recognizes the value of entering the video programming arena.
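To make the search argument concrete, here is a toy sketch (purely illustrative, not YouTube's actual indexing) of how timestamped caption text turns a video into a searchable document:

```python
from collections import defaultdict

def build_index(segments):
    """Map each word in a set of caption segments to the start times
    (in seconds) of the segments where it is spoken."""
    index = defaultdict(list)
    for start_seconds, text in segments:
        for word in text.lower().split():
            index[word].append(start_seconds)
    return index

# Hypothetical caption track: (start time in seconds, caption text)
captions = [
    (0.0, "welcome to the show"),
    (4.5, "today we talk about captions"),
    (9.0, "captions make video searchable"),
]

index = build_index(captions)
print(index["captions"])  # [4.5, 9.0]
```

Searching the index for a word returns the moments in the video where it is spoken — exactly the capability a video file without a transcript cannot offer, and the reason transcription serves search as well as accessibility.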


    Comments

    Dianrez December 18, 2010 at 12:34 am

    Automatic transcription on YouTube at present is far from acceptable; in most cases the output is gibberish, like trying to make sense of a word salad. Mentioning it in the context of captioning Internet content risks the industry jumping at easy answers to avoid the whole issue.

    We should never, ever accept less than optimal captioning. Automatic transcription has a long way to go to reach usable levels and almost always requires editing by human eyes.


    Michael Janger December 18, 2010 at 10:59 am

    Dear Diane,

    Great to hear from you. I agree with you that automatic transcription today is not satisfactory at all. If we are to accept automatic transcription as optimal captioning, it should pass a certain quality standard. I believe we should legislate this standard into law to force the content industry to develop innovative solutions — whether completely automatic, completely human, or as you suggested, a combination of the two — that meet this standard.

    The sheer volume of video content on the Internet is a real problem, one that I believe a 100% human-based solution will not adequately address. On the other hand, a key business driver pushing the industry toward 100% accuracy is the need for video content to become searchable. That is Google's holy grail, and I am willing to bet that other companies in the search space want it too. Almost all text on the Web has become searchable. Video usage on the Web is exploding, but video searchability is lagging far behind text. Incorporating all dialogue into searchable text would absolutely change the way video search is done, and truly benefit the deaf and hard-of-hearing along the way.

    But that is not enough. Optimal captioning includes description of environmental sounds such as doors opening and phones ringing, and proper placement of captions with speakers on the screen. Automatic transcription will not address that for a while, even if it achieves 100% spoken-language accuracy. Intelligent software may eventually recognize environmental sounds; until then, some human intervention will be necessary to deliver a complete captioning experience for the deaf and hard-of-hearing. The question is, will there be enough warm bodies to optimally caption the sheer multitude of video content on the Internet?

    Google’s introduction of automatic transcription last year is a great start. I read somewhere that Google is using feedback loops and algorithms to constantly reevaluate videos transcribed with automatic transcription, and using that to continually improve accuracy in future videos. It’s an interesting approach.

    Any company working toward 100% accuracy will see benefits beyond improving the video experience for the deaf and hard-of-hearing. There is no good business case for taking the “easy way out” and delivering a substandard captioning experience.

    -Michael Janger


    Michael Janger December 18, 2010 at 11:27 am

    A friend just shared with me this article on speech recognition: http://robertfortner.posterous.com/the-unrecognized-death-of-speech-recognition

    You should also read the comments below that article. Very interesting discussion on this topic!


    Christian Vogler December 18, 2010 at 5:55 pm

    The death of speech recognition is greatly exaggerated. For one thing, people can and do use it in limited, dictation-style contexts. A friend of mine is a computer science professor who suffers from carpal tunnel syndrome and uses speech recognition to do his work. For another, dead ends in research are nothing new. That doesn't mean we have to abandon all hope for substantial progress in the future. Research frequently advances in short bursts of intense activity that revolutionize a field, alternating with long periods of little or only incremental improvement.

