Please use this identifier to cite or link to this item: http://hdl.handle.net/1807/24924

Title: Summarizing Spoken Documents Through Utterance Selection
Authors: Zhu, Xiaodan
Advisor: Penn, Gerald
Department: Computer Science
Keywords: speech
Issue Date: 2-Sep-2010
Abstract: The inherently linear and sequential nature of speech raises the need for better ways to navigate through spoken documents. The navigation strategy I focus on in this thesis is summarization, which aims to identify important excerpts in spoken documents. A basic characteristic that distinguishes speech summarization from traditional text summarization is the availability and use of speech-related features. Most previous research, however, has approached this source from the perspective of descriptive linguistics, considering only those prosodic features that appear in that literature. The experiments in this dissertation suggest that incorporating prosody does help, but that its usefulness is very limited, much less than has been suggested in some previous research. We reassess the role of prosodic features versus features arising from speech recognition transcripts, as well as baseline selection in error-prone and disfluency-filled spontaneous speech. These problems interact with each other, and isolated observations have hampered a comprehensive understanding to date. The effectiveness of these prosodic features is limited largely because they are poor predictors of content relevance and redundancy. Nevertheless, untranscribed audio does contain more information than just prosody. This dissertation shows that collecting statistics from far more complex acoustic patterns does allow state-of-the-art summarization models to be estimated directly. To this end, we propose an acoustics-based summarization model that is estimated directly on acoustic patterns. We empirically determine the extent to which this acoustics-based model can effectively replace ASR-based models. The use of written sources in speech summarization has also been limited, namely to noisy speech recognition transcripts. Predicting the salience of utterances can indeed benefit from more sources than raw audio alone.
Since speaking and writing are two basic means of communication and are by nature closely related, speech is in many situations accompanied by relevant written text. The richer semantics conveyed in that written text provide information beyond what the speech itself offers. This thesis uses such information in content selection to help identify salient utterances in the corresponding spoken documents. We also employ this richer content to find the structure of spoken documents (i.e., subtopic boundaries), which may in turn help summarization.
URI: http://hdl.handle.net/1807/24924
Appears in Collections: Doctoral
Department of Computer Science - Doctoral theses

Files in This Item:

File: Zhu_Xiaodan_201006_PhD_thesis.pdf
Size: 791.88 kB
Format: Adobe PDF

Items in T-Space are protected by copyright, with all rights reserved, unless otherwise indicated.