TEITOK Help Pages

Working with spoken data

TEITOK can work with spoken corpora of various types: not only textual corpora consisting of transcribed oral data, but also proper oral transcriptions, and even time-aligned transcriptions. It can do this either for an entirely oral corpus or for a mixed corpus in which some files are spoken while others are written. TEITOK can display the audio file on top of the text so that you can listen to the text, and that can in principle even be the audio of a video file.

To make TEITOK work properly with spoken data, several parts need to be configured, and this page gives an overview of the important aspects. All of these are general settings; there is also a dedicated interface for time-aligned spoken data: the wavesurfer interface.

Recommended XML codes

XML files in TEITOK follow the TEI/XML guidelines. The section on spoken data in the TEI guidelines is very sparse compared to the rest of the framework, and the description is not always clear. Below is a short list of the codes established as best practice amongst the spoken projects in TEITOK. The codes with a slash behind them, such as <pause/>, are self-closing tags without any content inside, while the other tags say something about whatever is between the opening and closing tag, so <unclear>Betty</unclear> represents a segment where the speaker probably says "Betty". All these codes are typically used inside utterances: <u>
Audio search results

For time-aligned spoken data, TEITOK can render search results that allow you to immediately listen to the corresponding audio. In order for this to work, a number of items need to be in place:
With all of these properly set up, search results are presented as utterances, with the matching tokens highlighted, and in front of each utterance a play button that asks the browser to load the audio file and play the segment corresponding to the utterance.

Symbol-based rendering

In spoken data, much linguistic mark-up traditionally used special symbols; for instance, truncated words were often followed by a & sign to indicate the word was truncated:
This will put a pseudo-element after any

This provides a very convenient way to display mark-up in a form that is familiar to the target audience, while the symbol-based mark-up is generated by the computer and hence does not interfere with the text. However, there is a small complication: when the content of the deleted element is suppressed, which is what happens by default in the @form view, the pseudo-element is not suppressed, meaning the & will stay as a ghost element. This cannot be solved in CSS alone, and hence TEITOK provides a JavaScript solution that can be used from CSS: when changing views, any token that no longer has any content is adorned with an attribute that CSS rules can then match.
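
The general idea of marking empty tokens so that CSS can react to them can be sketched as follows. This is only an illustration under stated assumptions, not TEITOK's actual implementation: the attribute name `data-empty`, the function `markEmptyTokens`, and the `tok` selector in the comments are hypothetical names chosen for the example.

```javascript
// Hypothetical attribute name; the attribute TEITOK actually sets may differ.
const EMPTY_ATTR = "data-empty";

// Mark every token whose visible text is empty, and unmark all others.
// `tokens` is any iterable of DOM-like elements (textContent, setAttribute,
// removeAttribute), for example document.querySelectorAll("tok") in a
// browser (the "tok" selector is an assumption for this sketch).
function markEmptyTokens(tokens) {
  let marked = 0;
  for (const tok of tokens) {
    if (tok.textContent.trim() === "") {
      // Nothing left to show in this view: flag the token for CSS.
      tok.setAttribute(EMPTY_ATTR, "1");
      marked += 1;
    } else {
      // Content is visible again: remove a flag left by an earlier view.
      tok.removeAttribute(EMPTY_ATTR);
    }
  }
  return marked;
}

// A stylesheet could then suppress the ghost symbol with a rule such as:
//   [data-empty]::after { content: none; }
```

Such a function would be called each time the view changes, after the token contents have been updated, so that the flag always reflects what is currently displayed.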