
Workshop ‘Creating an annotated corpus for cultural historical research in the CLARIAH Media Suite’


The Media Suite is one of the research environments developed within the Dutch CLARIAH research infrastructure. As an innovative digital research environment, the Media Suite is a networked, university-level access point to a large variety of digital collections, comprising key broadcast, film, paper and oral history collections from NISV, Eye Filmmuseum, the KB and DANS. Moreover, the environment offers new ways of browsing, searching and analyzing these collections with digital tools developed specifically for it. Among other approaches, the environment's tools support exploratory research and browsing, close reading and qualitative analysis with video annotation tools, as well as data-driven distant reading and visualization of collection data. Beyond the digitized collection items, which amount to a couple of million audiovisual items, the environment is also the unique access point to automatic data enrichments, such as automatic speech recognition (ASR) data for broadcast collections and optical character recognition (OCR) data for historical paper collections. Working in the Media Suite, you will learn to use the environment's tools to create, annotate and analyze your own corpus.

At the end of the workshop, students can:

  • Understand the scope and aims of the Media Suite environment as a tool for distant and close reading of multimedia sources.
  • Log in and create a research project of their own in the Media Suite environment.
  • Create a corpus of sources from several collections in the Media Suite environment and explore and annotate them.
  • Identify key issues in working within the Media Suite environment and critically reflect on the value of working with the Media Suite as part of their ReMA or PhD research projects.


Preparation:

  • Complete the tutorial “Logging in, Workspace and Creating a User Project” before the workshop. The tutorial should take no longer than 10–15 minutes.
  • Get a general overview of the collections available in the Media Suite. A list of available collections, with links to descriptions, can be found here:
  • Read the articles on the reading list (see below).
  • Prepare your top three topics and/or questions that you would like to explore during the workshop, and send these to the workshop organizers in advance.


9.30 – 12.30: Getting to know the Media Suite

  • 9.30 – 10.00: Introduction to the Media Suite (presentation by Christian Olesen)
  • 10.00 – 10.20: First impressions from students and discussion of the kinds of topics that could be researched

Break (10 mins)

  • 10.30 – 11.30: Searching and bookmarking together – we first go through the different search functionalities together, after which each student works on a topic of their choice in one or more collections, with assistance from the workshop leaders. In this part of the workshop, you will learn how collection search in the Media Suite depends on the types of (meta)data available, and that the available data and metadata vary across collections. You will also learn to distinguish between data, metadata and enrichments.

Break (15 mins)

13.30 – 15.15: Corpus building

  • 13.30 – 13.50: Identifying a research problem and deciding on a research question, working individually to explore the collections in the Media Suite environment.
  • 13.50 – 14.00: Short presentations: prepare a 5-minute presentation on your planned research and your search strategies for the remainder of the afternoon.

Break (15 minutes)

  • 14.15 – 15.00: Searching and bookmarking on individual projects
  • 15.00 – 15.15: Report back to the group: a short presentation by each student, including one unexpected find and one issue encountered during the search.

15.45 – 17.15: Annotating and analyzing the corpus

  • 15.45 – 16.45: Segmenting, annotating and analyzing the corpus
  • 16.45 – 17.15: Wrap-up and discussion of the abstract assignment

Additional links:

ASR progress and statistics:


Reading list:

  • Susan Aasman et al., ‘Tales of a Tool Encounter: Exploring Video Annotation for Doing Media History’, VIEW Journal of European Television History and Culture 7 (14) (2018): 73–87.
  • Liliana Melgar-Estrada, Marijn Koolen, Kaspar Beelen, Hugo Huurdeman, Mari Wigham, Carlos Martinez-Ortiz, Jaap Blom and Roeland Ordelman, ‘The CLARIAH Media Suite: A Hybrid Approach to System Design in the Humanities’, in Proceedings of the 2019 Conference on Human Information Interaction and Retrieval (CHIIR ’19) (New York: Association for Computing Machinery, 2019), 373–377.
  • Christian Gosvig Olesen and Ivan Kisjes, ‘From Text Mining to Visual Classification: Rethinking Computational New Cinema History with Jean Desmet’s Digitised Business Archive’, Tijdschrift voor Mediageschiedenis 21 (2) (2018): 127–145.
  • Roeland Ordelman, Liliana Melgar, Jasmijn Van Gorp and Julia Noordegraaf, ‘Media Suite: Unlocking Audiovisual Archives for Mixed Media Scholarly Research’, in Selected Papers from the CLARIN Annual Conference 2018, Pisa, 8–10 October 2018, no. 159 (Linköping: Linköping University Electronic Press, 2019), 133–143.

Final assignment

To obtain 1 EC, students should complete the final assignment, which is a 250-word abstract for a DH conference. Guidelines:

Abstract assignment:

Criteria and requirements:

(1) Scope and relevance of the research problem: the problem is clearly defined and relevant to the workshop’s aims and to the student’s area of research.
(2) Selection of sources and explanation of the digital method in the abstract: the pitch draws on a logical selection of sources in the Media Suite and connects these to the chosen method in a meaningful way.


Bookings are closed for this course.