User Home

There are two user-home folders of interest:

/home/USERNAME

This is the normal POSIX user-home.

It is a personal folder on shared storage, and it is automounted on any KAC host when you log in.

This means you do NOT have a separate home folder on each host. Instead, you have a single home folder that is the same no matter which host you are logged in on.

This is also the file system you see in RStudio and Jupyter.

You are the only user (note: admins can get access) who can read or write files in your user-home. Use it to store your programs and scripts, preferably in the /home/USERNAME/projects/ folder.

Do NOT store any sizeable data in your POSIX user-home. Doing so risks filling up the shared storage, which would prevent other users from accessing the system, and the data may be caught up in the audit log system and preserved unnecessarily. Data should be kept in HDFS.
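
A minimal sketch of moving a data file into HDFS, assuming the standard Hadoop `hdfs dfs` client is available on the KAC hosts and that the cluster's default filesystem is HDFS (so a plain path such as /user/USERNAME refers to your HDFS user-home). The helper names `hdfs_user_home`, `put_command` and `upload` are hypothetical, chosen for illustration only.

```python
import getpass
import subprocess
from typing import List, Optional


def hdfs_user_home(user: Optional[str] = None) -> str:
    """Return the HDFS user-home path (/user/USERNAME) for the given or current user."""
    return f"/user/{user or getpass.getuser()}"


def put_command(local_file: str, user: Optional[str] = None) -> List[str]:
    """Build an `hdfs dfs -put` command that copies local_file into the HDFS user-home."""
    # Assumption: `hdfs` is on PATH and fs.defaultFS points at the cluster's HDFS.
    return ["hdfs", "dfs", "-put", local_file, hdfs_user_home(user) + "/"]


def upload(local_file: str, user: Optional[str] = None) -> None:
    """Run the upload; raises subprocess.CalledProcessError if hdfs reports a failure."""
    subprocess.run(put_command(local_file, user), check=True)
```

For example, `upload("measurements.csv")` would copy measurements.csv into your HDFS user-home; once the copy is confirmed, the local file can be deleted from your POSIX user-home.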

Note that when a job starts on a processing node, it runs as your user account, but your home folder is NOT mounted there. This means your jobs CANNOT refer to files in your home folder.
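
Because processing nodes do not mount /home, one cheap safeguard is to check a job's input and output paths before submitting it. The sketch below is a hypothetical helper (`paths_under_posix_home` is not part of any KAC or Hadoop tool) that flags any path pointing into /home, including `file://` URIs.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

HOME = PurePosixPath("/home")


def paths_under_posix_home(paths: Iterable[str]) -> List[str]:
    """Return the paths that point into /home, which is not mounted on processing nodes."""
    flagged = []
    for raw in paths:
        # Treat file:///home/... the same as a bare /home/... path.
        local = raw[len("file://"):] if raw.startswith("file://") else raw
        path = PurePosixPath(local)
        # Compare whole path components, so /homework is not mistaken for /home.
        if path == HOME or HOME in path.parents:
            flagged.append(raw)
    return flagged
```

A job launcher could refuse to submit whenever this returns a non-empty list, prompting you to copy the files to HDFS first.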

HDFS:///user/USERNAME

You ALSO have a user home folder on the distributed HDFS storage.

This home folder is totally distinct from your POSIX user-home in /home/USERNAME.

You are the only user (note: admins can get access) who can read or write files in your HDFS user-home.

Files in HDFS:///user/USERNAME are available on all KAC hosts and can therefore be read and written by your Hadoop jobs.

Use it to store the input and output data files for your Hadoop jobs. The normal workflow is to run jobs that write their output to HDFS:///user/USERNAME until you are happy with the results, and then move the results to HDFS:///projects/PROJECTNAME to share them with the rest of your project team.
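
The final step of that workflow, moving a finished result out of your HDFS user-home and into the shared project folder, can be sketched as below. This assumes the `hdfs dfs` client is available, that HDFS is the default filesystem, and that /projects/PROJECTNAME already exists with write access for project members; `promote_command`, `promote` and the example names are hypothetical.

```python
import subprocess
from typing import List


def promote_command(user: str, project: str, name: str) -> List[str]:
    """Build the `hdfs dfs -mv` command that moves /user/USER/NAME to /projects/PROJECT/NAME."""
    src = f"/user/{user}/{name}"
    dest = f"/projects/{project}/{name}"
    return ["hdfs", "dfs", "-mv", src, dest]


def promote(user: str, project: str, name: str) -> None:
    """Move a finished result into the shared project folder; raises CalledProcessError on failure."""
    # Assumes /projects/PROJECT already exists and you have write access to it.
    subprocess.run(promote_command(user, project, name), check=True)
```

For example, `promote("alice", "climate", "run42")` would move /user/alice/run42 to /projects/climate/run42. Using `-mv` rather than `-cp` leaves a single shared copy instead of duplicating the data in HDFS.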