Additional Information About Downloading Files

Downloading Files Via the Web Interface

The default method of downloading files in Dataverse is the web interface. You can select individual files or multiple files by using the checkboxes to the left of the files, or at the top of the file download page. If you choose multiple files, you can only choose from the files displayed on the current page. To increase the number of files displayed on the page, scroll to the bottom of the file list and use the “Files Per Page” selection tool.

If multiple files are chosen, the files will be downloaded as a compressed (.zip) file with the name “dataverse-files(#).zip”. The download will preserve any folder structure that exists in the dataset. You must extract the compressed file to a folder to see the files.
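For readers who prefer to unpack the download from a script, the extraction step can be sketched in Python with the standard zipfile module. The function name and the file and folder names in the usage note are illustrative assumptions, not part of Dataverse.

```python
import zipfile
from pathlib import Path


def extract_download(zip_path, dest_dir):
    """Extract a downloaded .zip into dest_dir, keeping its folder structure."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall recreates any sub-folders stored in the archive.
        archive.extractall(dest)
        # Return the member names so the caller can see what was unpacked.
        return archive.namelist()
```

For example, extract_download("dataverse-files.zip", "my_dataset") would unpack the archive into a folder named my_dataset.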

You may also download a file by clicking on the file name and then selecting the Access File button on the side menu.

Downloading Files via URL

The Dataverse installation displays a plaintext URL for the location of the file under the Metadata tab on the file page. This Download URL can be used to directly access the file via API (or in a web browser, if needed).

When downloading larger files, in order to ensure a reliable download, we recommend using one of the additional download options suggested on this page.

Downloading Tabular Files

Tabular data files offer additional options: you can explore the data using any data exploration or visualization External Tools (if they have been enabled), or choose from a number of download options specific to tabular data.

Ingested files can be downloaded in several different ways.

  • A tab-separated values file (the default option), an open format that is easy to use with free software

  • The original file, which may be in a proprietary format that requires special software

  • RData format, if the installation has been configured to provide it

  • The variable metadata for the file, in DDI format
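The same options can also be requested by URL. The sketch below builds one URL per option; the format=original and format=RData query parameters and the /metadata/ddi path are taken from the Dataverse Data Access API guide as an assumption and should be checked against the guide for your installation's version. The function name and dictionary keys are hypothetical.

```python
BASE_URL = "https://dataverse.asu.edu"  # installation used in this guide


def tabular_download_urls(file_id, base_url=BASE_URL):
    """Return one Data Access API URL per tabular download option."""
    file_url = f"{base_url}/api/access/datafile/{file_id}"
    return {
        # Default: the ingested tab-separated version of the file.
        "tab": file_url,
        # The file as originally uploaded (assumed query parameter).
        "original": f"{file_url}?format=original",
        # RData, only if the installation provides it (assumed parameter).
        "rdata": f"{file_url}?format=RData",
        # Variable-level metadata in DDI format (assumed path).
        "ddi": f"{file_url}/metadata/ddi",
    }
```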

Downloading Files via the Dataverse API

For more options for downloading individual or multiple files, please see the official Dataverse API guides related to file downloads below. The commands shown here are intended to be run from a Linux command line.

Dataverse Data Access API

Dataverse Native API for File Downloads

Some notes on using the Dataverse API:

NOTE: We are not using DOIs for files, so you cannot use a persistent ID for a file; you have to use the database File ID.

NOTE: You can find the File ID by hovering over the file name and looking at the URL at the bottom left of the screen.

NOTE: You can find your API key by logging into Dataverse, clicking on your name at the top right, and then choosing “API token”.

Per the API notes, the general format for the URL of a file is:

Basically, every file that is not restricted should be able to be downloaded with wget using the following format:

Or, using the API:

curl -L -O -J -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://dataverse.asu.edu/api/access/datafile/<fileid>
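The curl command above can also be expressed in Python using only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and unlike curl's -O -J options it does not read the file name from the server, so the caller supplies the output path.

```python
import shutil
import urllib.request


def build_file_request(file_id, api_token=None,
                       base_url="https://dataverse.asu.edu"):
    """Build a request for one file, identified by its database File ID."""
    url = f"{base_url}/api/access/datafile/{file_id}"
    # The token is only needed for restricted or unpublished files.
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    return urllib.request.Request(url, headers=headers)


def download_file(file_id, out_path, api_token=None):
    """Download one file to out_path; redirects are followed automatically."""
    request = build_file_request(file_id, api_token)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)
```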

You can use the Data Access API to download all of the files in a dataset. The command would be similar to this:

curl -L -O -J -H "X-Dataverse-key:<API key>" "https://dataverse.asu.edu/api/access/dataset/:persistentId/?persistentId=doi:10.48349/ASU/QDQ4MH"

The persistent DOI can be found in the metadata of your dataset (only the suffix, the last 6 characters, changes from one dataset to another). For this option you will first need to log in to the Dataverse console and generate an API key using the “API Token” option in the main menu below your name (top right). Copy that key into the above command in place of <API key>. This command will create a .zip file containing the downloaded files.
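If you build this URL in a script, the persistent ID can be passed as an ordinary query parameter. The sketch below shows one way to do that in Python; the function name is hypothetical, and the DOI in the usage note is the example from this page.

```python
from urllib.parse import urlencode


def dataset_zip_url(persistent_id, base_url="https://dataverse.asu.edu"):
    """Return the Data Access API URL that zips all files in a dataset."""
    # Keep ':' and '/' unescaped so the DOI stays readable in the URL.
    query = urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{base_url}/api/access/dataset/:persistentId/?{query}"
```

Calling dataset_zip_url("doi:10.48349/ASU/QDQ4MH") produces the same URL used in the curl command above.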

Other Options for Multiple File Downloads

Note: As with the download manager steps, these suggestions are presented for informational purposes only. The ASU Library does not guarantee they will work for you and cannot provide technical assistance for them. These suggestions may not work if a guestbook has been enabled for a dataset.

Sometimes it may be necessary to download all of the files from a dataset at the command line of a server or Linux virtual environment, such as the ASU Agave Cluster. The Dataverse file download API works well in some instances, but if there are numerous files or files that are very large, you may need a script that will create a list of files and then download them individually.
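The list-then-download approach can be sketched in Python as follows. This is an illustration under stated assumptions, not one of the scripts described below: it assumes the Dataverse Native API returns the dataset as JSON with the file list under data.latestVersion.files, that each entry has a dataFile object with id and filename, and that an optional directoryLabel holds the folder. All function names are hypothetical, and the sketch only covers published, unrestricted files.

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlencode

BASE_URL = "https://dataverse.asu.edu"


def fetch_dataset_json(persistent_id, base_url=BASE_URL):
    """Fetch the dataset description (assumed Native API endpoint)."""
    query = urlencode({"persistentId": persistent_id}, safe=":/")
    url = f"{base_url}/api/datasets/:persistentId/?{query}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def list_dataset_files(dataset_json):
    """Return (file_id, relative_path) pairs from the dataset JSON."""
    listing = []
    for entry in dataset_json["data"]["latestVersion"]["files"]:
        data_file = entry["dataFile"]
        folder = entry.get("directoryLabel", "")
        name = data_file["filename"]
        listing.append((data_file["id"], f"{folder}/{name}" if folder else name))
    return listing


def download_all(persistent_id, dest_dir, base_url=BASE_URL):
    """Download every listed file individually, recreating folders."""
    for file_id, rel_path in list_dataset_files(fetch_dataset_json(persistent_id)):
        target = Path(dest_dir) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(
            f"{base_url}/api/access/datafile/{file_id}", str(target)
        )
```

Downloading files one at a time avoids building a single large .zip on the server, which is the situation where the whole-dataset download tends to struggle.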

Python Script from Odum Institute

Don Sizemore from the Odum Institute has provided a Python script that can be used for multiple file downloads. This script can be copied to the server or virtual machine that you will be copying files to. To invoke it, run it using “python <name of script>.py” and add the parameters for the Dataverse installation URL, persistent DOI, version of the files, directory name, and API token, if needed. An example of this command would be:

python download_dataset.py -d='https://dataverse.asu.edu' -p='doi:authority/shoulder/identifier' -v=1.0

These parameters are explained in the script as well.

Python Script to Download All files in a Particular Dataset

The following script can be used to download all of the files in a specific dataset. When you run it, you will be prompted for the suffix of the persistent ID of your dataset (e.g., QQ45YH). The script creates a folder with that name on your local system and downloads the files into it. This option uses the Dataverse API and should work without an API key as long as the dataset is published and the files are not restricted. (Make sure to change the extension to .py after saving the attachment.)

GNU Wget

The following command uses wget to download the files from a dataset, so wget must be installed on the system you are using. Copy the command into a file and save it with a .sh extension to create a shell script that you can modify as needed and run. To run the script on a Windows computer, you will likely need a Linux environment such as Windows Subsystem for Linux (WSL), as well as wget. On a Mac you can use the Terminal but will still need to install wget.

mkdir -p fileDownloads/<dataset persistent ID> && wget -r -e robots=off -nH --cut-dirs=3 --content-disposition "https://dataverse.asu.edu/api/datasets/:persistentId/dirindex?persistentId=doi:10.48349/ASU/<dataset persistent ID>" -P fileDownloads/<dataset persistent ID>/

Note: For <dataset persistent ID>, substitute the last 6 characters of your dataset DOI (e.g., for doi:10.48349/ASU/IDEZ4P the persistent ID would be IDEZ4P).

There are other ways to write scripts or use wget commands for file downloads. Please see the wget documentation for installation and use.

Downloading Files via Globus File Transfer

Globus is a file transfer tool that optimizes the uploading and downloading of large or numerous files between endpoints. Globus allows users to define “endpoints” on a variety of platforms and devices and to transfer files between them, which makes moving the files in a dataset much more efficient. An enterprise installation of Globus is in use at Arizona State University and is funded by the ASU Knowledge Enterprise (KE) Research Technology Office (RTO). This method of file transfer uses the Dataverse Globus transfer app that was developed by Scholars Portal/Borealis specifically for the Dataverse platform.

To access the Globus transfer option, the user must have the Globus Connect Personal application installed on their computer. They may have it running on multiple devices and will be able to access any location where it is running from their local device.

A user may transfer collections of materials to and from Dataverse using Globus. The user must have “write” access to the collections to see them in the Dataverse Globus integration console. Write access is granted within Globus by whoever creates the collection, so that will need to be coordinated independently.

To download files using Globus, perform the following steps:

  1. Click on the name of the dataset from which you would like to download files. Make sure the Files tab is selected.

  2. Click the down arrow on the Access Dataset menu option and then choose Globus Transfer.

  3. The Dataverse Globus Transfer Tool will be launched in a separate window. The Active Personal Endpoints tab will show a list of files in your dataset. Single-click each file or folder that you would like to transfer; it will appear on the right side, in the transfer list. Double-clicking a folder opens it without adding it to the transfer list.

  4. When the right side of the transfer list shows all the files/folders you would like to transfer, click the Submit Transfer button.

  5. A message will appear indicating that the transfer was successful. The files will be copied to your local device under the directory or folder that is specified as your “Home” directory.

Downloading Files via Download Manager
