
Powered by <TEI:TOK>
Maarten Janssen, 2014-

TEITOK - a Tokenized TEI environment

TEITOK is a web-based platform for viewing, creating, and editing corpora with both rich textual mark-up and linguistic annotation, initially developed at the Centro de Linguística da Universidade de Lisboa, later at CELGA-ILTEC, and currently maintained at the ÚFAL institute of Charles University, Prague.

The system has a modular design, with numerous modules that serve a wide range of corpus types. Below are examples of some of these modules and of the types of corpora TEITOK can handle. New modules are added frequently, and custom modules can be added as well.

The source is maintained at GitLab and some conversion tools are maintained on GitHub.

GitLab page | Facebook page | Google group


Manuscript-based corpora

  • Align your manuscript with your transcript
  • Display each manuscript line with its transcription
  • Transcribe directly from the manuscript
  • Search directly for manuscript fragments
  • Keep multiple editions within the same file

Audio-based corpora

  • Align your audio with your transcription
  • Transcribe directly from the audio file
  • Scroll the transcription vertically while the waveform scrolls horizontally
  • Search directly for audio segments

Dependency Grammar

  • Keep dependency relations inside any corpus type
  • Visualize dependency trees for any sentence
  • Edit trees easily
  • Search using dependency relations

Geolocation Coordinates

  • Map documents onto the world map
  • Documents are clustered into counted groups
  • Access the documents from the map
  • Compare corpus queries on the world map
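
To illustrate what clustering documents into counted groups can look like, here is a minimal sketch that bins documents into a fixed latitude/longitude grid and counts the documents per cell. It is not TEITOK's own code: the function name `cluster_counts`, the `lat`/`lon` keys, the `cell_size` parameter, and the sample documents are all assumptions made for this example, and a real map view would typically recompute the clusters per zoom level.

```python
from collections import Counter


def cluster_counts(documents, cell_size=10.0):
    """Group documents by grid cell and count them (hypothetical helper)."""
    counts = Counter()
    for doc in documents:
        # Floor division puts each coordinate in the cell that contains it,
        # including negative coordinates (west of Greenwich, south of the equator).
        cell = (int(doc["lat"] // cell_size), int(doc["lon"] // cell_size))
        counts[cell] += 1
    return counts


# Made-up sample documents; the coordinates are approximate city locations.
documents = [
    {"id": "doc1", "lat": 38.72, "lon": -9.14},  # Lisbon
    {"id": "doc2", "lat": 40.21, "lon": -8.43},  # Coimbra
    {"id": "doc3", "lat": 41.15, "lon": -8.61},  # Porto
    {"id": "doc4", "lat": 50.08, "lon": 14.44},  # Prague
]

counts = cluster_counts(documents)
```

With a 10-degree grid, the Coimbra and Porto documents share one cell, so a map marker for that cell would carry the count 2.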

Edit from CQP Query

  • Search for words often incorrectly annotated
  • Click on any token in a KWIC list to edit it
  • Edit all results in a systematic way
  • Edit each result individually in a list
  • Pre-modify each result by a regular expression
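
The idea of pre-modifying each result with a regular expression can be sketched as follows. This is an illustration under assumptions, not TEITOK's implementation: the token dictionaries, the `pos` attribute, the tag values, and the helper `pre_modify` are hypothetical. The sketch only proposes new values, so each one can still be reviewed before it is saved.

```python
import re


def pre_modify(results, pattern, replacement, attribute="pos"):
    """Propose a regex-based change for each query result (hypothetical helper).

    The input list is left untouched; only results whose value would
    actually change produce a proposal.
    """
    proposals = []
    for token in results:
        old = token[attribute]
        new = re.sub(pattern, replacement, old)
        if new != old:
            proposals.append(
                {"id": token["id"], "attribute": attribute, "old": old, "new": new}
            )
    return proposals


# Made-up query results: tokens suspected of carrying a wrong part-of-speech tag.
results = [
    {"id": "w-12", "form": "run", "pos": "NN"},
    {"id": "w-48", "form": "runs", "pos": "NNS"},
    {"id": "w-77", "form": "ran", "pos": "VBD"},
]

# Propose retagging every noun tag (NN or NNS) as VB.
proposals = pre_modify(results, r"^NNS?$", "VB")
```

Here two proposals are produced (for `w-12` and `w-48`), while `w-77` is skipped because its tag does not match the pattern.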

Stand-off Annotations

  • Add stand-off annotations to any corpus file
  • Edit using an efficient interface
  • Annotate over discontinuous regions
  • Incorporate annotations into the CQP corpus
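
As a rough picture of how a stand-off annotation can cover a discontinuous region, the sketch below stores an annotation separately from the text and lets it point at token identifiers that are not adjacent. The `Annotation` class, the token IDs, the label, and both helper functions are assumptions for illustration only; they do not reflect the format TEITOK actually uses.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """A stand-off annotation: a label plus the IDs of the tokens it covers."""
    label: str
    token_ids: list


# Made-up tokenized sentence, keyed by token ID in text order.
tokens = {"w-1": "She", "w-2": "gave", "w-3": "the", "w-4": "idea", "w-5": "up"}

# The annotation lives outside the text and covers two non-adjacent tokens.
ann = Annotation(label="phrasal_verb", token_ids=["w-2", "w-5"])


def annotated_text(annotation, tokens):
    """Join the forms of the tokens the annotation points at."""
    return " ".join(tokens[t] for t in annotation.token_ids)


def is_discontinuous(annotation, order):
    """True if the covered tokens are not all adjacent in the text."""
    positions = sorted(order.index(t) for t in annotation.token_ids)
    return any(b - a > 1 for a, b in zip(positions, positions[1:]))


text = annotated_text(ann, tokens)
gap = is_discontinuous(ann, list(tokens))
```

Because the annotation only references token IDs, the text itself stays unchanged, and the same tokens can take part in any number of overlapping annotations.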