HDFS Layout

There are three major folders of interest, as shown in the tree structure below:

/
├── datapool - A read-only folder containing data made available by the Kgl. Bibliotek. Projects are given access as needed.
│   ├── dk-web-crawl-log - Webarchive crawl logs
│   │   ├── ...
│   ├── dk-web-solr - Webarchive records retrieved from the Solr index
│   │   ├── ...
│   ├── dk-web-text - Webarchive text records
│   │   ├── ...
│   └── dk-web-warc - Webarchive ARC/WARC records and data extracted directly from these
│       ├── ...
├── projects - Each project has a folder here that is readable and writable only by members of that project. This is where you store the results of your calculations.
│   ├── p002 - The project folder for the project p002
│   ├── p003 - The project folder for the project p003
│   └── pc002003 - The projects p002 and p003 were closely related and needed to share some data, so this is a project folder for BOTH of these projects
└── user - Where your HDFS user home is found
    ├── abrp002 - The home of the user abrp002
    ├── ...
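
As a rough illustration of how the layout above might be used from a script, the following Python sketch builds paths for the three top-level folders and reports which one a given path belongs to. The helper names (`project_dir`, `user_home`, `area_of`) and the `AREAS` table are hypothetical and introduced only for this example; the folder names and their descriptions are taken from the tree above. The sketch only manipulates path strings and does not contact HDFS.

```python
from pathlib import PurePosixPath

# Top-level folders and their purpose, as described in the tree above.
AREAS = {
    "datapool": "read-only data made available by the Kgl. Bibliotek",
    "projects": "project folders, read/writable only for project members",
    "user": "HDFS user homes",
}


def project_dir(project: str) -> str:
    """Return the HDFS folder of a project, e.g. 'p002' or 'pc002003'."""
    return str(PurePosixPath("/projects") / project)


def user_home(user: str) -> str:
    """Return the HDFS home folder of a user, e.g. 'abrp002'."""
    return str(PurePosixPath("/user") / user)


def area_of(path: str) -> str:
    """Return the top-level folder ('datapool', 'projects' or 'user') of an absolute path."""
    parts = PurePosixPath(path).parts
    # An absolute path has '/' as its first part and the top-level folder as its second.
    if len(parts) < 2 or parts[0] != "/":
        raise ValueError(f"not an absolute path below a top-level folder: {path!r}")
    if parts[1] not in AREAS:
        raise ValueError(f"not one of the three major folders: {path!r}")
    return parts[1]


if __name__ == "__main__":
    for p in (project_dir("p002"), user_home("abrp002"), "/datapool/dk-web-warc"):
        print(f"{p}: {AREAS[area_of(p)]}")
```

Paths built this way could then be handed to whichever HDFS client you use, for example as the argument to `hdfs dfs -ls`.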