We are pleased to announce the release of SenseClusters, a free software
package that does unsupervised discovery of word senses by clustering
together instances of a word (or words) that are used in similar contexts
in raw text. It supports a wide range of clustering techniques based on
both context vectors and similarity matrices.

SenseClusters is flexible, and can be used in any application that
requires clustering of similar instances of text. Examples could include
word sense discrimination, synonymy identification, text classification,
and summarization. It can also be used to implement models such as Latent
Semantic Analysis (LSA).

SenseClusters takes a user through the entire process of unsupervised
learning of word senses, including text preprocessing, feature selection,
context vector and similarity matrix construction, dimensionality
reduction via singular value decomposition (SVD), and clustering via both
agglomerative and partitional algorithms.
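
The steps listed above can be illustrated with a small sketch. This is not
SenseClusters code and does not reflect its actual interface: it is a toy
example in Python that assumes NumPy in place of SVDPACKC and a hand-written
average-link routine in place of Cluto. All function names, the stop list,
and the sample contexts are hypothetical.

```python
import numpy as np

# Hypothetical stop list and target word, chosen only for this toy example.
STOP = {"the", "and", "was", "at", "to", "from", "of", "we", "she", "after"}
TARGET = "bank"

def context_vectors(contexts, min_count=2):
    """Preprocess, select features, and build one count vector per context."""
    tokenized = [[w for w in c.lower().split() if w not in STOP and w != TARGET]
                 for c in contexts]
    doc_freq = {}
    for toks in tokenized:
        for w in set(toks):
            doc_freq[w] = doc_freq.get(w, 0) + 1
    # Feature selection: keep words that occur in at least min_count contexts.
    vocab = sorted(w for w, n in doc_freq.items() if n >= min_count)
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(contexts), len(vocab)))
    for i, toks in enumerate(tokenized):
        for w in toks:
            if w in index:
                X[i, index[w]] += 1.0
    return X, vocab

def reduce_svd(X, k):
    """Project the context vectors onto the top k singular dimensions."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine_similarity(X):
    """Pairwise cosine similarity matrix between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty contexts
    Xn = X / norms
    return Xn @ Xn.T

def agglomerative(sim, n_clusters):
    """Average-link agglomerative clustering over a similarity matrix."""
    clusters = [[i] for i in range(sim.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = sim[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(sim.shape[0], dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

contexts = [
    "the bank approved the loan and the money was deposited",
    "she deposited money at the bank to repay the loan",
    "the loan from the bank covered the money owed",
    "the river bank was muddy after the water rose",
    "we fished from the muddy bank of the river",
    "water from the river flooded the bank",
]

X, vocab = context_vectors(contexts)
reduced = reduce_svd(X, k=2)
labels = agglomerative(cosine_similarity(reduced), n_clusters=2)
print(labels)
```

In this toy data the first three contexts (financial sense) should receive
one label and the last three (river sense) another. A partitional method
such as k-means could be run on the reduced vectors instead of the
agglomerative step.
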
SenseClusters provides a great deal of native functionality, and also
offers seamless interfaces to a number of powerful tools, including
CLUTO (a clustering toolkit), SVDPACKC (which carries out singular value
decomposition), and the Ngram Statistics Package.

For general information please visit:
For immediate download of the first public release (0.47) please visit:
This is an active project, and the principal designer and lead developer
(Amruta Purandare, firstname.lastname@example.org) and I would be delighted to hear
any comments, requests, or even bug reports that you might have. You can
see some of our future plans in our Todo list, which is distributed with
Ted and Amruta
PS To subscribe to the SenseClusters mailing list(s), visit:
--
# Ted Pedersen                      http://www.umn.edu/~tpederse #
# Department of Computer Science               email@example.com #
# University of Minnesota, Duluth                                #
# Duluth, MN 55812                                (218) 726-8770 #