Need to download lots of files or huge files?

The user interface is generally the best way to download files from the Dataverse. However, the Dataverse platform relies on your browser's download settings, which may cause downloads of very large files or large numbers of files to fail.

You can also use the “Free Download Manager” (FDM), a program with a browser extension that “catches” files downloaded from the web and manages them for you. This includes monitoring file integrity, so if you lose your connection it can pick up where it left off. You set a download location and select options to manage the downloads, such as scheduling a download time, limiting bandwidth, or running the download in the background so that it does not take up too many system resources.

The ASU Library cannot help you download research data from the repository and does not accept or ship external hard drives or discs for downloads.

Typically you will still need to click each file to download it, but once each file is queued, the download manager will handle the downloads. You can even schedule each one to download at a specific time. There are configuration options for adjusting bandwidth and for splitting the job into threads to improve speed. Some transfer time is unavoidable, but if the FDM is properly configured, you should not need to watch over the transfer while it runs.

How to install the Free Download Manager

  1. Go to https://www.freedownloadmanager.org/ and download the installer for Windows or another platform.

  2. Run the installer to install the FDM.

  3. Open the program, click the menu top right and click on "Preferences."

  4. Scroll to “Browser Integration” and click the button for the browser that will be used for the downloads (e.g., Edge, Firefox, or Chrome).

  5. Follow the instructions on the page that opens to add the extension to the browser (e.g., click the “Add to Firefox” button).

  6. Set any other settings on the preference page (or leave them all at the defaults)

Connect to the file locations

  1. Go to the site that you want to download a file from. Make sure the FDM shows as added on the browser.

  2. Choose a file on the site to download and follow whatever steps are required to download it. The FDM will take over the download action and appear with a list of files in the queue to download.

  3. If the preferences page displays instead of the queue list, click the left arrow at the top right to go to the download list.

From here, you can stop and start downloads, delete downloads, and see the progress of the downloads. You can quickly adjust the bandwidth allocated for the downloads from the quick menu at the bottom left and use the scheduler to schedule a day/time to download the file.

Other Options for Multiple File Downloads

Note: As with the download manager steps, these suggestions are presented for informational purposes only. The ASU Library does not guarantee they will work for you and cannot provide technical assistance for them. These suggestions may not work if a guestbook has been enabled for a dataset.

Sometimes it may be necessary to download all of the files from a dataset at the command line of a server or Linux virtual environment, such as the ASU Agave Cluster. The Dataverse file download API works well in some instances, but if there are numerous files or files that are very large, you may need a script that will create a list of files and then download them individually.

Python Script

Don Sizemore from the Odum Institute has provided a Python script that can be used for multiple file downloads. This script can be copied to the server or virtual machine that you will be copying files to. To invoke it, run it using “python <name of script>.py” and add the parameters for the Dataverse installation URL, persistent DOI, version of the files, directory name, and API token, if needed. An example of this command would be:

python download_dataset.py -d='https://dataverse.asu.edu' -p='doi:authority/shoulder/identifier' -v=1.0

These parameters are explained in the script as well.
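Separately from that script, the general "list the files, then download them one at a time" approach can be sketched in a few lines of Python. This is a minimal illustration rather than the script described above: it assumes the dataset is published and unrestricted, that no guestbook is enabled, and that the Dataverse Native API returns the file list under data.latestVersion.files. The function names (dataset_url, file_url, list_files, download_all) are hypothetical names chosen for this example.

```python
import json
import os
import shutil
import urllib.parse
import urllib.request

BASE_URL = "https://dataverse.asu.edu"  # installation URL used on this page


def dataset_url(base_url, doi):
    """Build the Native API URL that returns a dataset's metadata and file list."""
    query = urllib.parse.urlencode({"persistentId": doi})
    return f"{base_url}/api/datasets/:persistentId/?{query}"


def file_url(base_url, file_id):
    """Build the Data Access API URL for one file, using its database file ID."""
    return f"{base_url}/api/access/datafile/{file_id}"


def list_files(dataset_json):
    """Return (file_id, filename) pairs from a parsed dataset response.

    Assumes the file list sits under data.latestVersion.files.
    """
    files = dataset_json["data"]["latestVersion"]["files"]
    return [(f["dataFile"]["id"], f["dataFile"]["filename"]) for f in files]


def download_all(base_url, doi, target_dir="."):
    """Fetch the file list for one dataset, then download each file in turn."""
    with urllib.request.urlopen(dataset_url(base_url, doi)) as response:
        dataset_json = json.load(response)
    for file_id, filename in list_files(dataset_json):
        # basename() guards against path separators in server-supplied names
        destination = os.path.join(target_dir, os.path.basename(filename))
        with urllib.request.urlopen(file_url(base_url, file_id)) as src, \
                open(destination, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams in chunks, so large files are fine
```

Calling download_all(BASE_URL, "doi:authority/shoulder/identifier") with a real DOI would save each file into the current directory. Downloading files one at a time means a failure affects only the file in progress, not the whole dataset.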

WGET Script

Another command-line option for downloading files is a solution provided by Philipp Conzett from UiT. This option involves downloading a spreadsheet, editing it to include the DOIs of the datasets that you are working with, and then generating a script from the spreadsheet data. Depending on how you create the script, this option allows you to download files from multiple datasets at once. The instructions for this method are as follows:

  1. Create a list of all published dataset DOIs in collection abc and collection def by running the following command in a bash command line, adapted to your case:

    curl 'https://dataverse.asu.edu/api/search?q=*&type=dataset&subtree=abc&subtree=def' | jq -r '.data | .items | .[] | .global_id' > dataset_dois.txt

  2. Copy the contents of dataset_dois.txt and paste into cell A2 in the LibreOffice spreadsheet dataverse_download_all_files_from_datasets.ods (link below).

  3. Copy cells B2 and C2 to the end of the contents of column A.

  4. Copy the contents of column C from cell C2 onward.

  5. Paste into a plain text document, and save it as dataverse_download_all_files_from_datasets.sh (or similar).

  6. In the command line, run the script created in the previous step.

This will download all the files from collection abc and collection def into a sub-folder called fileDownload. Within that sub-folder, the files for each dataset are placed in a folder named after the dataset's DOI suffix.
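For readers who prefer not to depend on jq, the DOI-listing command in step 1 can be approximated in Python. This is a sketch under assumptions, not part of the method above: it uses the same Search API endpoint and the same global_id field as the curl command, and it assumes the Search API returns results in pages controlled by the per_page and start parameters, so it requests pages until none remain. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def search_url(base_url, collections, per_page=1000, start=0):
    """Build a Search API URL listing datasets in the given collections."""
    params = [("q", "*"), ("type", "dataset")]
    # One subtree parameter per collection alias, as in the curl command
    params += [("subtree", alias) for alias in collections]
    # Assumption: per_page and start control paging of the results
    params += [("per_page", per_page), ("start", start)]
    return f"{base_url}/api/search?{urllib.parse.urlencode(params)}"


def extract_dois(search_json):
    """Pull the global_id of each item, like the jq filter in step 1."""
    return [item["global_id"] for item in search_json["data"]["items"]]


def fetch_all_dois(base_url, collections, per_page=1000):
    """Request result pages until an empty page comes back."""
    dois, start = [], 0
    while True:
        url = search_url(base_url, collections, per_page, start)
        with urllib.request.urlopen(url) as response:
            page = extract_dois(json.load(response))
        if not page:
            return dois
        dois.extend(page)
        start += len(page)
```

Writing the result of fetch_all_dois("https://dataverse.asu.edu", ["abc", "def"]) to a file, one DOI per line, would produce the equivalent of dataset_dois.txt.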

Using the Dataverse API for File Downloads

For more options for downloading individual or multiple files, please see the official Dataverse API guides related to file downloads below.

Dataverse Data Access API

Dataverse Native API for File Downloads


Some notes on using the Dataverse API:

NOTE: We are not using DOIs for files, so you cannot use a persistent ID for a file. You have to use the database file ID instead.

NOTE: You can find the file ID by hovering over the file name and looking at the URL shown at the bottom left of the screen.

NOTE: You can find your API key by logging into Dataverse, clicking on your name at the top right, and then choosing “API token.”

Per the API notes, the general format for the URL of a file is:

https://dataverse.asu.edu/api/access/datafile/<fileid>

Any file that is not restricted should be downloadable with wget using the following format:

wget https://dataverse.asu.edu/api/access/datafile/<fileid>

Or, using curl with your API token, which is needed for restricted files that you have permission to access:

curl -L -O -J -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://dataverse.asu.edu/api/access/datafile/<fileid>
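The same request can be made from Python when curl or wget is not available. The sketch below mirrors the curl command: it sends the API token in the X-Dataverse-key header and streams the response to disk. The function names and the destination argument are hypothetical, and unlike curl's -O -J options this example does not read the file name from the server, so you supply it yourself.

```python
import shutil
import urllib.request


def build_file_request(file_id, api_token=None, base_url="https://dataverse.asu.edu"):
    """Prepare a Data Access API request for one file.

    The token is optional: unrestricted files can be fetched without it.
    """
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    return urllib.request.Request(
        f"{base_url}/api/access/datafile/{file_id}", headers=headers
    )


def download_file(file_id, destination, api_token=None):
    """Download one file by database file ID and save it to `destination`."""
    request = build_file_request(file_id, api_token)
    # urlopen follows redirects by default, like curl's -L option
    with urllib.request.urlopen(request) as src, open(destination, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

For example, download_file(<fileid>, "myfile.csv", api_token="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx") would save the file under the name you choose.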
