TEITOK Help Pages

Batch Edit

TEITOK is meant not only to distribute corpora, but also to maintain and edit them. Editing in TEITOK is easy: if you spot an error in any type of annotation, you can click on the word and correct it. You can do that not only in the document view but also from the KWIC list, which means you can search for very specific contexts where you know there are errors. You can also use the verticalized view to edit fields in a more structured way.

The most powerful editing mode in TEITOK, however, is the multi-edit: you use a CQL query to edit multiple tokens in one go. The simplest use works as follows: say we spot that all occurrences of the word betwixt have been marked as a noun, whereas they should all have been prepositions. So you go to search (index.php?action=cqp) and run a CQL query for the word betwixt.
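As an illustration of what such a query could look like, here is a minimal sketch that assembles CQL token expressions as strings. The attribute names word and pos, and the tag value NOUN, are assumptions made for the example; the attributes and tagset of your own corpus may differ. The helper cql_token is hypothetical and not part of TEITOK.

```python
def cql_token(**constraints):
    """Build one CQL token expression, e.g. [word="betwixt" & pos="NOUN"].

    Note: values are inserted as-is; quotes inside values are not escaped.
    """
    parts = [f'{attr}="{value}"' for attr, value in constraints.items()]
    return "[" + " & ".join(parts) + "]"


# Every occurrence of the word form (attribute name "word" is assumed).
all_betwixt = cql_token(word="betwixt")

# Only occurrences currently tagged as a noun (attribute "pos" and the
# tag "NOUN" are assumed; substitute the tags your corpus uses).
noun_betwixt = cql_token(word="betwixt", pos="NOUN")

print(all_betwixt)
print(noun_betwixt)
```

The resulting string is what you would paste into the search box. Restricting the query to the incorrect tag keeps the result list limited to the tokens that actually need changing.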
The multi-edit, however, does not let you simply change a large number of occurrences at once: it forces you to verify them. Without that step, it would be too easy to unintentionally make incorrect changes to your corpus, which are hard to undo afterwards. So it shows a list of the first occurrences, and you have to confirm which of them should be modified; there is a "select all" button at the bottom. This is mostly to make you check that your search was not too broad: the word betwixt can also be an adverb, in constructions like in betwixt. So in this case, you can select all occurrences except those that are adverbs.

The list only shows the first occurrences, typically the first 500, although you can change that. This is not so much because a large list is hard to verify as for a technical reason: the correction is submitted via an HTML POST request, and there is a hard limit of 1000 items per POST request. To do the next batch, you either have to jump to the next selection, or change the first batch, reindex the corpus, and then search again. The latter only works if the change removes the corrected items from the search results, so for that we would need to refine the query so that it no longer matches tokens that have already been corrected.

Contextual Search

A more complex use of the multiedit module is to do contextual searches: we only want to change certain tokens, say all the occurrences of betwixt after a preposition. The search for that is simple enough:
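A two-token query of this kind could be sketched as follows. The attribute names word and pos and the tag ADP (used here for prepositions) are assumptions for the example, and cql_sequence is a hypothetical helper, not a TEITOK function. The same helper can optionally prefix one token with @, the CQP target marker that multiedit needs for multi-token queries, as the following paragraph explains.

```python
def cql_sequence(tokens, target=None):
    """Join CQL token expressions into one sequence query.

    tokens: list of token expressions, e.g. ['[pos="ADP"]', '[word="betwixt"]']
    target: index of the token to mark with @ (the CQP target marker),
            or None to leave the query unmarked.
    """
    marked = [
        ("@" + tok) if i == target else tok
        for i, tok in enumerate(tokens)
    ]
    return " ".join(marked)


# A preposition followed by betwixt ("pos" and "ADP" are assumed names).
tokens = ['[pos="ADP"]', '[word="betwixt"]']

plain = cql_sequence(tokens)               # ordinary contextual search
targeted = cql_sequence(tokens, target=1)  # betwixt marked as the edit target

print(plain)
print(targeted)
```

The unmarked form is enough for an ordinary search; the marked form additionally tells the system that the second token, not the preposition before it, is the one to edit.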
To use multiedit in multi-token queries, we need to explicitly tell the system which word we are trying to edit. You do this with the CQP target function: you indicate the target in a search by adding an @ in front of it. So the correct search for multiedit in this case is the same query with an @ in front of the betwixt token.

Individual changes

Instead of refining our search, we can also choose to go through the list by hand. If we know that betwixt is often tagged incorrectly, but there is no easy way to determine what the correct tag would be (which is, for instance, the case for the Spanish que), then we search for the word itself and go through the results one by one. The individual change also lets you provide a systematic change first, for instance if we want to correct the lemma for all occurrences of a given word.