This is a brief summary of responses to my query regarding automatic
thesaurus generation from large corpora. I am very grateful to Bob
Krovetz, Johan Hagman, Sara Rydin and Bill Mann for helpful
The following people worked or are working on automatic generation of
meaningful hierarchical thesauri:
Sharon Caraballo (esp. her recent Ph.D. dissertation available from
her home page);
Marti Hearst (a 1992 paper available from Marti Hearst's home page);
Gregory Grefenstette (I found it more difficult to locate relevant
Johan Hagman (results will be presented at JADT
Sara Rydin (started work on this for her Ph.D. thesis).
Virtually all of the work I located concentrates on automatic
detection of hyponymy/hypernymy relations on the basis of textual
clues such as "X, including x, y and z" (this normally implies that x,
y and z are kinds of X).
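As a rough illustration of how one such textual clue might be matched, here is a minimal Python sketch. It is not taken from any of the work listed above: the regular expression, the name extract_hyponyms and the restriction to single-word terms are all assumptions made for the example, and a real system would work on parsed noun phrases and many more patterns.

```python
import re

# Hypothetical pattern for the clue "X, including x, y and z".
# Assumption: every term is a single word; real systems use noun phrases.
INCLUDING = re.compile(
    r"(?P<hypernym>\w+),? including "
    r"(?P<hyponyms>\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)"
)

# Splits a matched list such as "x, y and z" or "x, y, and z" into items.
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs suggested by the 'including' clue."""
    pairs = []
    for match in INCLUDING.finditer(sentence):
        hypernym = match.group("hypernym")
        for item in SEPARATOR.split(match.group("hyponyms")):
            if item:
                pairs.append((item, hypernym))
    return pairs
```

For the sentence "He repairs instruments, including guitars, pianos and drums." this returns the pairs (guitars, instruments), (pianos, instruments) and (drums, instruments). A sentence without the clue yields an empty list.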
Bill Mann also mentions the Oingo search engine which, it is
claimed, actually takes advantage of such techniques.
-- Adam P.