<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="6.x">Drupal-Biblio</source-app><ref-type>32</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Cameron Marlow</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">A Language-Based Approach to Categorical Analysis</style></title><secondary-title><style face="normal" font="default" size="100%">Media Arts and Sciences</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">classification</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2001</style></year></dates><publisher><style face="normal" font="default" size="100%">Massachusetts Institute of Technology</style></publisher><pub-location><style face="normal" font="default" size="100%">Cambridge, MA</style></pub-location><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">With the digitization of media, computers can be employed to help us with the process of classification, 
both by learning from our behavior to perform the task for us and by exposing new ways for us to think 
about our information. Given that most of our media comes in the form of electronic text, research in this 
area focuses on building automatic text classification systems. The standard representation employed by 
these systems, known as the bag-of-words approach to information retrieval, represents documents as 
collections of words. As a byproduct of this model, automatic classifiers have difficulty distinguishing 
between different meanings of a single word. 

This research presents a new computational model of electronic text, called a synchronic imprint, which 
uses structural information to contextualize the meaning of words. Every concept in the body of a text is 
described by its relationships with other concepts in the same text, allowing classification systems to 
distinguish between alternative meanings of the same word. This representation is applied both to the
standard problem of text classification and to the task of enabling people to better identify large
bodies of text. The latter is achieved through the development of a visualization tool named flux that 
models synchronic imprints as a spring network. </style></abstract><work-type><style face="normal" font="default" size="100%">Masters Thesis</style></work-type></record></records></xml>