The historical development of tracking and e-commerce on the Danish Web

The purpose of the project is to map and analyse the historical development of two different (but related) technologies/functionalities on the Danish web:

  1. Tracking technologies (e.g. HTTP and Flash cookies, beacons, fingerprinting, HTML web storage etc.), and
  2. Shopping baskets (e-commerce).

Studies have shown widespread use of tracking technologies and e-commerce on the live web, but a historical study of the development of these technologies on the Danish web has, to our knowledge, not been done. The data for the project will be the historical Danish web as it is preserved in Netarkivet.

The main research questions are:

  1. When and how have different tracking technologies been used on the Danish web, and how pervasive is the reach of companies like Facebook and Google/Alphabet?
  2. When and how have different technologies for e-commerce been used on the Danish web, and which are the main companies involved in e-commerce over time?

E-commerce and tracking are related because targeted advertising is one of the main reasons for tracking users online. A study that searches for technologies for both tracking and e-commerce is also advantageous because we expect a similar methodology to be useful for the two types of web functionality.

The project will build upon experience gained in the project “Probing a nation’s web domain – the Historical Development of the Danish Web”, the first pilot project of the DeiC Cultural Heritage Cluster, and it will use the procedure for selecting and delimiting corpora developed in that project. But where the Probing project has focused on questions like the size of the web, the number of file types and word frequencies, the project proposed here will aim to develop a methodology for searching for specific parts of the source code of websites. The aim is to identify the specific traces of these technologies (e.g. lines of code, names or similar) and the data sources in which they can be found (the crawl.logs, the Solr index, the WARC files etc.), and then to analyse the rise, spread and possible decline of different technologies for tracking and shopping online.

The project will benefit greatly from, or even require, dialogue with the IT specialists and curators of Netarkivet in order to ascertain the harvesting and preservation practices that result in the capture (or not) of these types of functionality. The analysis will require the use of the DeiC Cultural Heritage Cluster, both because of the necessary level of protection of the data in Netarkivet and in order to have sufficient processing power to analyse around 1.2 million Danish websites per year.
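
To make the idea of searching source code for traces more concrete, the following is a minimal sketch in Python, not the project's actual method. It assumes that archived pages are already available as (year, HTML) pairs; how they would be extracted from Netarkivet is not shown. The signature list, the names `SIGNATURES`, `find_traces` and `count_by_year`, and the sample pages are all hypothetical and chosen for illustration only.

```python
import re
from collections import Counter

# Illustrative signatures only (assumptions, not the project's actual list).
# Each maps a technology label to a pattern searched for in the page source.
SIGNATURES = {
    "google_analytics": re.compile(
        r"google-analytics\.com/(?:ga|analytics|urchin)\.js", re.I
    ),
    "facebook": re.compile(r"connect\.facebook\.net", re.I),
    "web_storage": re.compile(r"\b(?:localStorage|sessionStorage)\b"),
    "shopping_basket": re.compile(
        r"\b(?:indk(?:ø|oe)bskurv|kurv|basket|cart)\b", re.I
    ),
}


def find_traces(html):
    """Return the set of technology labels whose pattern occurs in the source."""
    return {name for name, pattern in SIGNATURES.items() if pattern.search(html)}


def count_by_year(pages):
    """Count, per year, how many pages contain each technology.

    `pages` is an iterable of (year, html) pairs.
    """
    counts = {}
    for year, html in pages:
        counts.setdefault(year, Counter()).update(find_traces(html))
    return counts


# Hypothetical sample pages, for demonstration only.
sample_pages = [
    (2006, '<script src="http://www.google-analytics.com/urchin.js"></script>'),
    (2012, '<script src="//connect.facebook.net/en_US/all.js"></script>'
           '<a href="/kurv">Kurv</a>'),
]
yearly_counts = count_by_year(sample_pages)
```

Simple keyword patterns like these will produce false positives (for example, the word "cart" in ordinary text), so a real analysis would need more specific traces and validation against known examples.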

For results regarding the first research question visit this link

The project is led by Janne Nielsen.