Real-Time Captioning is Taken up a Notch
By Sandra L. Howe
People with hearing disabilities may one day soon be holding small, palm-sized computers at lectures and family events, and in boardrooms, banks and hospital rooms – and “listening” to other people using these devices.
It’s already starting to happen at the Alexander Graham Bell Museum in Cape Breton, where deaf and hard of hearing people will carry these portable units to translate the guide’s spoken word into real-time text. If this pilot project is successful, other museums across Canada are likely to adopt this approach.
Voice recognition has made huge strides in the past few years, and the increased ease and convenience of using this technology is making it much more appealing to colleges and universities.
The Liberated Learning Program in Halifax, Nova Scotia, was the first to incorporate this innovative approach into several classes. Since 1998, several Saint Mary’s University (SMU) professors have been speaking in class while a huge screen behind them translates their words into text for students who are deaf or hard of hearing. University College of Cape Breton (UCCB) is not far behind; next year, its professors will start utilizing this new technology as well.
Only a couple of years ago, SMU professors had to wheel cumbersome specialized computers into their classrooms and hook them up in order to use voice-recognition software. Today, they carry wireless, networked laptops. Mainstream desktop computers have caught up to the point where professors can now use their own regular computers to record new vocabulary words and teach the software to recognize them. With the click of a button, those changes are transferred directly to the classroom computers. Nor do professors have to worry about file management; transcripts of their lectures are automatically saved to a network.
Traditionally, the speaker had to say loudly, “Period!” “Comma!” and “New paragraph!” when using voice-recognition software in order to punctuate the sentence. This was impractical for the professor, yet without it, the lecture was written as one long, unbroken stream of text that students found impossible to decipher.
Now, the tiny pause a speaker normally takes after completing a phrase or sentence serves as a cue, telling the computer to leave a two-line space. This makes the lecture far more readable without encumbering the lecturer.
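The pause-as-cue idea can be illustrated with a short sketch. This is not the software used at SMU; it assumes a recognizer that reports a start and end time for each word, and the function name, the 0.6-second threshold and the sample timings are all hypothetical.

```python
from typing import List, Optional, Tuple

# (word, start_seconds, end_seconds); timings are invented for illustration.
Word = Tuple[str, float, float]


def break_on_pauses(words: List[Word], min_pause: float = 0.6, gap: str = "\n\n") -> str:
    """Join words with spaces, but insert `gap` wherever the silence between
    the end of one word and the start of the next is at least `min_pause`."""
    pieces: List[str] = []
    prev_end: Optional[float] = None
    for text, start, end in words:
        if prev_end is not None:
            # A long enough silence is treated as the end of a phrase.
            pieces.append(gap if start - prev_end >= min_pause else " ")
        pieces.append(text)
        prev_end = end
    return "".join(pieces)


sample = [
    ("voice", 0.0, 0.4), ("recognition", 0.45, 1.1), ("has", 1.15, 1.3),
    ("improved", 1.35, 1.9),
    # The speaker pauses for about 0.9 seconds here.
    ("it", 2.8, 2.9), ("is", 2.95, 3.1), ("now", 3.15, 3.4), ("portable", 3.45, 4.0),
]
print(break_on_pauses(sample))
```

The threshold is the design choice that matters: set too low, ordinary hesitations split a sentence; set too high, the text runs together again.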
The plain-text files of several years ago have now become attractive web pages and PowerPoint slides that enable a student to click on any word and listen to the audio part of the lecture as they read. These files are also posted on the Internet for students to download.
Eventually, voice recognition is expected to evolve to the point where it will be able to immediately understand and translate average human speech, without prior training. At present, the technology is “in transition” to this ultimate goal. However, SMU professors no longer have to invest hours training the software to understand their own voices. Now, a third party can be brought in to transcribe a lecture manually. The software then uses that transcript to learn the professor’s speech patterns and translate them accurately.
Although deaf and hard of hearing students are the most obvious users of this technology at SMU and UCCB, other students also rely on it, such as those with learning disabilities or dexterity problems, or those learning English as a second language. “And for that matter, what student wouldn’t want a complete transcript of the lecture?” Dr. David Leitch, the program’s director, points out.
Leitch says that this technology has increased the students’ independence, as they no longer constantly need to ask classmates to take notes for them. And since all the information is presented directly to the student, they can decide what is and isn’t relevant, rather than relying on someone else’s judgment.
So how well does this software translate words? A demonstration reveals that it does well with multi-syllable words, which provide more auditory data and are thus more recognizable. One-syllable words, such as “of,” “that” and “for,” are more difficult to translate. This can cause confusion, although students apparently are able to decode these errors with practice. Often the software also converts any stuttering or coughing into one-syllable words. Accuracy is typically in the 80- to 90-per-cent range, going up as high as 98 per cent when the speaker is reading from a text. Well-enunciated, well-organized, moderately paced speech seems to result in maximum overall accuracy.
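The article does not say how these accuracy figures were measured. One common approach, shown here only as an assumption, is to compare the software’s output with a corrected transcript word by word and count the substitutions, insertions and deletions needed to turn one into the other. The function names and sample sentences below are hypothetical.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn `hyp` into `ref` (Levenshtein distance over words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from output
                            curr[j - 1] + 1,      # extra word in output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 minus the word error rate, relative to the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_edit_distance(ref, hyp) / len(ref)


corrected = "well organized speech at a moderate pace is recognized best"
recognized = "well organized speech at a moderate pace his recognized best"
print(f"{word_accuracy(corrected, recognized):.0%}")
```

In this made-up example, one short word out of ten is misrecognized, which puts the result at the upper end of the typical range quoted above.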
American Sign Language (ASL) interpreters are still present at SMU, even in classes where this technology is used. Their presence benefits Deaf students whose primary language is ASL, and it also means that class discussions are translated in addition to the lectures. A third reason is that, while transcripts are corrected prior to being posted online for students to review, these corrections come too late for those who want to engage in real-time class discussions.
“Sooner rather than later, this voice recognition technology will be available in any classroom that is networked and has a reasonable projector,” says Dr. Leitch. “This means that any student with a disability will be able to demand that their educational institution makes this technology available.” Minimum computer requirements are Windows XP and a modern processor that is Pentium III or higher.
The voice technology has improved so much that Leitch predicts that eventually it will be seen everywhere. It will be used by the general public, not just people with disabilities.
“The uses will continue to snowball,” he says. “In fact, I can imagine the day is coming when the application will also be available for all cell phones.”
(Sandra L. Howe is a freelance writer living in Dartmouth, Nova Scotia.)