Please use this identifier to cite or link to this item: http://hdl.handle.net/1807/17257

Title: A Graph Approach to Measuring Text Distance
Authors: Tsang, Vivian
Advisor: Stevenson, Suzanne
Department: Computer Science
Keywords: Graph Approaches in NLP; Semantic Distance; Distributional Methods; Ontological Methods
Issue Date: 26-Feb-2009
Abstract: Text comparison is a key step in many natural language processing (NLP) applications in which texts can be classified on the basis of their semantic distance (how similar or different the texts are). For example, comparing the local context of an ambiguous word with that of a known word can help identify the sense of the ambiguous word. Typically, a distributional measure is used to capture the implicit semantic distance between two pieces of text. In this thesis, we introduce an alternative method of measuring the semantic distance between texts as a combination of distributional information and relational/ontological knowledge. We propose a novel distance measure within a network-flow formalism that combines these two distinct components so that they are not treated as separate and orthogonal pieces of information. First, we represent each text as a collection of frequency-weighted concepts within a relational thesaurus. Then, we use a network-flow method that provides an efficient way of measuring the semantic distance between two texts by taking advantage of the inherent graph structure of an ontology. We evaluate our method on a variety of NLP tasks. In our task-based evaluation, we find that our method performs well on two of three tasks. We introduce a novel measure intended to capture how well our network-flow method performs on a dataset (represented as a collection of frequency-weighted concepts). In our analysis, we find that an integrated approach, rather than a purely distributional or graphical analysis, is more effective in explaining the performance inconsistency. Finally, we address a complexity issue that arises from the overhead required to incorporate more sophisticated concept-to-concept distances into the network-flow framework. We propose a graph transformation method that generates a pared-down network requiring less time to process. The new method achieves a significant speed improvement and, as indicated in our analysis, the transformation does not seriously hamper performance.
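Illustration: the two steps described in the abstract (frequency-weighted concept profiles, then a flow between them over the ontology graph) can be sketched roughly as follows. This is not the formulation used in the thesis; it is a minimal sketch under strong assumptions: the ontology is a small tree, every edge costs 1, and concept counts are normalized to sum to 1. The taxonomy `PARENT` and the function names are hypothetical. On a tree, the minimum-cost flow has a simple closed form: each edge carries the absolute net surplus of the subtree below it.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent); not taken from the thesis.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}


def normalize(freqs):
    """Turn raw concept counts into a mass distribution that sums to 1."""
    total = sum(freqs.values())
    return {concept: count / total for concept, count in freqs.items()}


def depth(node):
    """Number of edges between a node and the root of the toy taxonomy."""
    d = 0
    while node in PARENT:
        node = PARENT[node]
        d += 1
    return d


def tree_flow_distance(freqs_a, freqs_b):
    """Minimum cost of moving text A's concept mass onto text B's.

    Assumes every concept appears in the toy taxonomy and every edge
    has unit cost (both are simplifying assumptions of this sketch).
    """
    mass_a, mass_b = normalize(freqs_a), normalize(freqs_b)

    # Net surplus (supply minus demand) at each concept node.
    surplus = defaultdict(float)
    for concept, mass in mass_a.items():
        surplus[concept] += mass
    for concept, mass in mass_b.items():
        surplus[concept] -= mass

    cost = 0.0
    # Visit the deepest nodes first, so a subtree's surplus is complete
    # before it is pushed across the edge to its parent.
    for node in sorted(PARENT, key=depth, reverse=True):
        flow = surplus[node]
        cost += abs(flow)              # mass crossing the edge node -> parent
        surplus[PARENT[node]] += flow
    return cost
```

For instance, `tree_flow_distance({"dog": 2, "cat": 2}, {"dog": 1, "tree": 3})` has to move 0.75 of the mass from the "animal" branch to "tree", a path of four edges, giving a distance of 3.0. A general ontology is not a tree and concept-to-concept costs need not be uniform, in which case a general minimum-cost flow solver would be needed instead of this closed form.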
URI: http://hdl.handle.net/1807/17257
Appears in Collections: Doctoral; Department of Computer Science - Doctoral theses

Files in This Item:

File: Tsang_Vivian_Y_200811_PhD_thesis.pdf
Size: 1.28 MB
Format: Adobe PDF

Items in T-Space are protected by copyright, with all rights reserved, unless otherwise indicated.