I would like to thank every person who took the time to answer my
question about synonyms and the use of quantitative methods to analyze
them. I received numerous helpful links and materials (see below for a
summary). Thank you to Mark Turner, Viktor Pekar, Stefan Th. Gries, Rada
Mihalcea, Guy Aston, Stefan Schneider, Antoinette Renouf and Ramesh
Krishnamurthy.
I forgot to translate the two words I am interested in: the Italian
'perfino' and 'persino' both mean 'even' (as a focus particle). Since they
are function words, it would perhaps be even more surprising to find two
forms with identical meaning here. From some preliminary tests, however,
they seem very close indeed.
What I am wondering is this: if I use quantitative measures such as the
z-score, how large a difference between the scores for the two words (the
point of delicacy or resolution) would be sufficient to distinguish
synonyms from near-synonyms?
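For reference, one common formulation of the collocational z-score (usually
attributed to Berry-Rogghe) compares the observed number O of times a
collocate occurs within a span of S positions around the node with the
number E expected by chance: p = Fc / (N - Fn), E = p * Fn * S, and
z = (O - E) / sqrt(E * (1 - p)), where N is the corpus size in tokens, Fn
the node frequency and Fc the collocate frequency. The Python sketch below
is only an illustration under my own assumptions: the function names, the
whitespace tokenization and the span size are hypothetical choices, not
taken from any of the references listed here. It computes z-scores for the
collocates of two node words (e.g. 'perfino' and 'persino') and ranks the
collocates by how differently they attract the two words; it does not by
itself answer the threshold question.

```python
import math
from collections import Counter


def collocate_counts(tokens, node, span=4):
    """Count tokens occurring within `span` positions left and right of each `node`.

    Windows are truncated at the corpus edges, and overlapping windows
    (two nodes close together) count shared positions twice.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def collocational_z(observed, node_freq, coll_freq, corpus_size, span_positions):
    """z-score for one node/collocate pair (Berry-Rogghe-style formulation)."""
    # Probability of the collocate at a position not occupied by the node.
    p = coll_freq / (corpus_size - node_freq)
    # Expected co-occurrences summed over all node windows.
    expected = p * node_freq * span_positions
    return (observed - expected) / math.sqrt(expected * (1 - p))


def compare_profiles(tokens, node_a, node_b, span=4):
    """Return (collocate, z_a, z_b) rows, largest |z_a - z_b| first."""
    freqs = Counter(tokens)
    if freqs[node_a] == 0 or freqs[node_b] == 0:
        raise ValueError("both node words must occur in the corpus")
    n = len(tokens)
    obs_a = collocate_counts(tokens, node_a, span)
    obs_b = collocate_counts(tokens, node_b, span)
    rows = []
    for c in set(obs_a) | set(obs_b):
        if c in (node_a, node_b):
            continue  # skip the two node words themselves
        # Counter returns 0 for a collocate never seen near one of the nodes,
        # so such a collocate gets a negative z-score for that node.
        z_a = collocational_z(obs_a[c], freqs[node_a], freqs[c], n, 2 * span)
        z_b = collocational_z(obs_b[c], freqs[node_b], freqs[c], n, 2 * span)
        rows.append((c, z_a, z_b))
    rows.sort(key=lambda r: abs(r[1] - r[2]), reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical toy text; a real study would read and tokenize corpus output.
    text = "ha invitato perfino il sindaco e persino il prefetto ma non perfino lui"
    for c, z_a, z_b in compare_profiles(text.lower().split(), "perfino", "persino", span=2):
        print(f"{c:10s} {z_a:7.2f} {z_b:7.2f}")
```

Because z-scores grow with corpus size and node frequency, a raw difference
between two z-scores has no fixed significance level. A test designed for
comparing two words, such as a chi-square test on a node-by-collocate
contingency table, may be a more defensible way to ask whether an observed
difference matters.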
Here is a list of articles and links I was referred to. Some of them are
directly downloadable from the Internet.
Manning, Christopher D. and Hinrich Schütze. 1999. Foundations of
Statistical Natural Language Processing. Cambridge, MA: MIT Press, 680 pp.
(Chapter 8.) With an implementation of the idea at:
Church, Kenneth Ward and Patrick Hanks. 1990. Word Association Norms,
Mutual Information, and Lexicography. Computational Linguistics 16:22-29.
Church, Kenneth Ward, William Gale, Patrick Hanks and Donald Hindle.
1991. Using Statistics in Lexical Analysis. In: Zernik, Uri (ed.).
Lexical Acquisition: Exploiting On-line Resources to Build a Lexicon.
Hillsdale, NJ: Lawrence Erlbaum, pp. 115-164.
Church, Kenneth Ward, William Gale, Patrick Hanks, Donald Hindle and
Rosamund Moon. 1994. Lexical Substitutability. In: Atkins, Beryl T. Sue
and Antonio Zampolli (eds.). Computational Approaches to the Lexicon.
Oxford, New York: Oxford University Press, pp. 153-177.
Lin, Dekang et al. Identifying Synonyms among Distributionally Similar
Words. Available at: http://www.cs.ualberta.ca/~lindek/papers.htm.
Zaiu, Diana and Graeme Hirst: work on near-synonymy (see also Phil
Edmonds). More material is available from their CL group web page.
Gries, Stefan Th. 2001. A corpus-linguistic analysis of -ic and -ical
adjectives. ICAME Journal 25:65-108.
Gries, Stefan Th. 2003. Testing the sub-test: A collocational-overlap
analysis of English -ic and -ical adjectives. International Journal of
Corpus Linguistics 8(1):31-61. (To appear in a few weeks.)
Krishnamurthy, R. 1996. Ethnic, Racial and Tribal: The Language of Racism?
In: Caldas-Coulthard and Coulthard (eds.). Texts and Practices. Routledge.
Krishnamurthy, R. 2000. Collocation: From Silly Ass to Lexical Sets. In:
Heffer, C. and H. Sauntson (eds.). Words in Context: A Tribute to John
Sinclair on his Retirement. Birmingham.
Krishnamurthy, R. Forthcoming. Corpus, Collocation, and Lexical Sets. In:
Proceedings of the HUSSE (Hungarian Society for the Study of English)
Thematic Conference "Empirically Based Approaches to Linguistic
Description", University of Debrecen, Hungary. [On sad/unhappy.]
Many thanks again,
Anna-Maria De Cesare
On Fri, 16 May 2003, Anna-Maria De Cesare wrote:
> I am currently working on two Italian words ('perfino' and 'persino'),
> which I suspect are absolute synonyms. My goal is to demonstrate their
> synonymy using quantitative methods (I will use the Italian corpus
> CORIS, which is not yet POS-tagged).
> I was wondering if anybody could refer me to similar studies or give
> me a hint on how to proceed. Any suggestions are welcome!
> Thank you very much in advance for your time,
> Anna-Maria De Cesare
> Visiting Scholar
> Dept. of Romance Languages
> and Literatures
> University of Chicago
This archive was generated by hypermail 2b29 : Fri May 23 2003 - 16:35:56 MET DST