Corpora: POS tagging of spoken corpora (summary)

From: Jean Veronis
Date: Tue Sep 19 2000 - 15:40:16 MET DST

    A few people responded to my query (thanks to Lars Borin, Mats Eeg-Olofsson,
    Andrew Harley, Joakim Nivre, Paul Rayson, Geoffrey Sampson, Maria
    Wolters and Jakub Zavrel), but as I suspected, there seem to be only a
    handful of publications on this topic:

    Eeg-Olofsson, M. (1991). Word-Class Tagging: Some Computational Tools.
    Doctoral dissertation. University of Göteborg: Department of Computational

    Garside, R. (1995) Grammatical tagging of the spoken part of the British
    National Corpus: a progress report. In Leech, G., Myers, G. and Thomas, J.
    (eds) (1995), Spoken English on Computer: Transcription, Mark-up and
    Application. pp.161-7

    Sampson, G. R. (1995) English for the Computer, Clarendon Press (ch. 6).

    Leech, G. N., Myers, G., & Thomas, J. (1995). Spoken English on Computer:
    Transcription, mark-up and application. London: Longman.

    Nivre, J., Grönqvist, L., Gustafsson, M., Lager, T. & Sofkova, S. (1996)
    Tagging Spoken Language Using Written Language Statistics. In Proceedings
    of the 16th International Conference of Computational Linguistics
    (COLING-96). Copenhagen: Center for Language Technology. [Available at:

    Nivre, J. & Grönqvist, L. (in press) Tagging a Corpus of Spoken Swedish. To
    appear in International Journal of Corpus Linguistics. [Available at:

    Rahman, A. & Sampson, G.R. "Extending grammar annotation standards to
    spontaneous speech", in J.M. Kirk, ed., Corpora Galore: Analyses and
    Techniques in Describing English, Rodopi (Amsterdam), 1999, pp. 295-311.

    Sampson, G. R. (1995). English for the Computer: The SUSANNE Corpus and
    Analytic Scheme. Oxford: Clarendon Press.

    Smith, N. (1997) Improving a tagger, in Garside, R., Leech, G., and
    McEnery, A. (eds.) Corpus Annotation: Linguistic Information from Computer
    Text Corpora. Longman, London, pp. 137-150.

    Van Eynde, F., Zavrel, J. & Daelemans, W. (2000). Part of Speech Tagging and
    Lemmatisation for the Spoken Dutch Corpus. In: M. Gavrilidou, G.
    Carayannis, S. Markantonatou, S. Piperidis & G. Stainhaouer (eds.),
    Proceedings of the Second International Conference on Language Resources
    and Evaluation. European Language Resources Association, Paris, 1427-1433.

    CANCODE project
    CHRISTINE Corpus
    Stefan Rapp's thesis
