Researching the Danish web 2005-2020 – Workshop

This project is a dissemination, community-building and competence-building project. It aims to expand the number of users of the annual corpora of the entire Danish web that were established in the Pilot projects P001, P002 and P004, and thereby also to increase knowledge of the Cultural Heritage Cluster and widen the pool of possible future users. The proposed project needs fewer resources than a ‘normal’ Pilot project (cf. below).

The previous Pilot project ‘Probing a nation’s web domain – the historical development of the Danish web’ investigates the Danish web and its development from 2005 onwards. It has developed robust procedures for extracting large amounts of data from Netarkivet for analysis on the Cultural Heritage Cluster (the ETL process, documented in a report by Per Møldrup-Dalum). In addition, it has developed procedures for preprocessing and cleaning the material, as well as an algorithm that is a necessary prerequisite for delimiting and establishing one corpus per year in which each website is present only once.

The result of P001 is one sub-corpus per year, accompanied by methodological descriptions of how it was established and of the biases and shortcomings of the material. So far, this material has been analyzed only by the participants in P001, but it is a treasure trove that deserves to be analyzed by a much wider research community. The present project intends to make this possible by inviting researchers from all humanities disciplines to participate in a three-day gathering.

The present dissemination project will:

  1. Extract the relevant material from Netarkivet, following the procedures already established in the relevant previous projects.
  2. Prepare the gathering (access provision for participants, walk-through of how the corpus is established, brief introduction to R, demos of showcase analyses, etc.).
  3. Invite all researchers in Denmark who are interested in gaining experience with Big Data studies of material from Netarkivet, up to a maximum of approx. 20 researchers (first come, first served; all participants must apply for access to Netarkivet). Since 2012, through a series of workshops and online courses, NetLab (a part of DIGHUMLAB) has established a nationwide community of researchers studying the archived web (the project leader of this application heads NetLab), and this network can therefore be used to recruit participants.
  4. At the gathering, introduce researchers from a variety of disciplines to Big Data analyses of the Danish web, and ensure that they get hands-on experience with the Cultural Heritage Cluster.

The present dissemination project will create awareness of the Cultural Heritage Cluster facility and of Netarkivet, and open both up to a wider research community that may not be aware of the many research possibilities they offer. In addition, it constitutes a unique opportunity to provide access to and hands-on experience with the Cultural Heritage Cluster for more researchers than individual Pilot projects can probably accommodate. Finally, it will foster interdisciplinary collaboration that may lead to cross-disciplinary Big Data projects in the future.

The project is led by: Niels Brügger