If your files are of a reasonable length, have you considered storing
pointers to files only, and resolving the positions inside each file
automatically at look-up time?
Of course, the overhead may be too high if tokens have to be identified
on the fly, but I am using this approach with a tokenised corpus, and
the speed is acceptable.
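
To make the idea concrete, here is a minimal sketch in Python. It is an illustration only, not the code I actually run: the names build_file_index and lookup, the in-memory dict of file texts and the simple \w+ tokeniser are all assumptions made for the example, and the offsets it returns are character offsets rather than byte positions.

```python
import re
from collections import defaultdict

# Assumption: a token is any run of word characters.
TOKEN = re.compile(r"\w+")

def build_file_index(files):
    """Map each word type to the sorted file numbers it occurs in.

    `files` is a hypothetical dict {file number: text}; a real corpus
    would be read from disk instead.
    """
    index = defaultdict(set)
    for file_no, text in files.items():
        for match in TOKEN.finditer(text):
            index[match.group().lower()].add(file_no)
    return {word: sorted(nums) for word, nums in index.items()}

def lookup(word, index, files):
    """Resolve positions at look-up time by rescanning the listed files.

    Returns (file number, character offset) pairs.
    """
    word = word.lower()
    hits = []
    for file_no in index.get(word, []):
        for match in TOKEN.finditer(files[file_no]):
            if match.group().lower() == word:
                hits.append((file_no, match.start()))
    return hits
```

The index then holds at most one entry per file for each word type, however frequent the word is; the price is a rescan of every listed file on each query.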
Hope that helps,
Mickel Grönroos wrote:
> Does anybody have any experience of using a relational database to store
> index information for a concordance service?
> I'm building a test interface for the Bank of Finnish and plan to store
> pointers to specific locations in the corpus in a database column, e.g.
> something like 344:2555 would point to corpus file number 344, byte
> position 2555.
> The obvious problem is how one should handle common words, as every
> occurrence of a specific type needs a pointer of its own. So, if the
> frequency of some common word is, say, 50,000, this would generate 50,000
> pointers as well. Putting these in one field in a column seems to be
> rather foolish. Does anybody know how to avoid this?
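
If the full pointers are to stay in the database, the usual relational alternative to one large field is one row per occurrence, with an index on the word. A minimal sketch follows, assuming SQLite through Python's sqlite3 module; the table name, column names and helper functions are hypothetical and chosen only for illustration.

```python
import sqlite3

def make_db():
    """Create an in-memory database with one row per occurrence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE occurrence (
            word     TEXT    NOT NULL,
            file_no  INTEGER NOT NULL,
            byte_pos INTEGER NOT NULL,
            PRIMARY KEY (word, file_no, byte_pos)
        )
        """
    )
    return conn

def add_occurrences(conn, rows):
    """Insert (word, file_no, byte_pos) tuples, e.g. ("ja", 344, 2555)."""
    conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", rows)
    conn.commit()

def find(conn, word):
    """Return all (file_no, byte_pos) pointers for a word, in corpus order."""
    cursor = conn.execute(
        "SELECT file_no, byte_pos FROM occurrence "
        "WHERE word = ? ORDER BY file_no, byte_pos",
        (word,),
    )
    return cursor.fetchall()
```

A word with a frequency of 50,000 then simply occupies 50,000 short rows, and the composite primary key doubles as the index used for look-up by word.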
--
Tylman Ule, Tel. 07071/29-78490, Fax 07071/550520
Seminar für Sprachwissenschaft, Universität Tübingen
Kleine Wilhelmstraße 113, 72074 Tübingen