...

NOTE: There is no limit on the size of a ZIP file to be uploaded, but a ZIP file can contain no more than 1000 files.
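Before uploading, it can help to check a ZIP file against this limit locally. The following is a minimal Python sketch, not part of Dataverse. The function names and the `MAX_FILES_PER_ZIP` constant are hypothetical, and it assumes that directory entries do not count toward the 1000-file limit.

```python
import zipfile

# Limit taken from the note above; the constant name is hypothetical.
MAX_FILES_PER_ZIP = 1000


def count_zip_files(zip_path):
    """Return the number of file entries in a ZIP archive.

    Assumption: directory entries are not counted toward the limit.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sum(1 for info in archive.infolist() if not info.is_dir())


def within_zip_limit(zip_path, limit=MAX_FILES_PER_ZIP):
    """Return True if the archive holds no more than `limit` files."""
    return count_zip_files(zip_path) <= limit
```

For example, `within_zip_limit("my_data.zip")` returns `False` for an archive that would need to be split into several smaller ZIP files before upload.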

Uploading using Direct Upload

For most file uploads, the user interface is adequate and works well. Normal file uploads use a portion of the available network bandwidth, as well as some temporary storage on the user’s computer (or the server). With larger files, these resources can be insufficient and the upload can fail.

If users are uploading large files (i.e., greater than 3 GB), it may be necessary to use the AWS multipart upload functionality to help the upload succeed. This allows a large file to be broken into smaller pieces for upload, which eliminates many of the resource issues that can occur when uploading it as a single file. It requires configuring the specific dataset to use the direct upload capability when uploading files.
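To illustrate how a large file is broken into smaller pieces, here is a minimal Python sketch that only computes the byte ranges of the parts. It does not upload anything and is not how Dataverse or AWS implement the feature. The function name `plan_parts` is hypothetical, and the part size used is an assumption, since the real part size is set by the storage configuration.

```python
def plan_parts(file_size, part_size):
    """Return a list of (part_number, offset, length) tuples that
    together cover `file_size` bytes.

    Part numbers start at 1, matching S3 multipart upload numbering.
    `part_size` is an illustrative assumption, not a Dataverse setting.
    """
    if file_size < 0 or part_size <= 0:
        raise ValueError("file_size must be >= 0 and part_size must be > 0")
    parts = []
    offset = 0
    number = 1
    while offset < file_size:
        # The last part may be shorter than part_size.
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

With these assumptions, a 5 GiB file planned with 1 GiB parts (`plan_parts(5 * 1024**3, 1024**3)`) yields five parts. Each part can then be sent and retried independently, which is why a failure late in the transfer does not force the whole file to be resent.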

It is important for our team to work with you to assess your files and determine the best upload option for your project.

NOTE: If direct upload is configured for your dataset file store, files compressed in ZIP format will not be extracted on upload, so the automatic extraction described in the “Upload files via the web interface” section of this document does not apply.

For more information on direct upload, see: https://guides.dataverse.org/en/latest/developers/big-data-support.html#id2  

Uploading Files via DVWebLoader

DVWebLoader is a small web application that can be configured with Dataverse to upload a whole directory/folder tree of files into a Dataverse dataset, retaining each file's relative path within the directory/folder. Before uploading, DVWebLoader checks the dataset contents and, by default, does not upload files that already exist in it. Users can modify the default selection by checking or unchecking specific files before initiating the upload. DVWebLoader currently works only with S3 stores that have direct upload enabled; it will not work with other types of stores in Dataverse.
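The default selection described above can be pictured with a short Python sketch. This is not DVWebLoader's code: the function names are hypothetical, and it assumes that a file "already exists" when its relative path matches a path in the dataset, whereas the real tool may compare files by other criteria.

```python
from pathlib import Path


def relative_file_paths(root):
    """List every file under `root` as a POSIX-style path relative to it."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def default_selection(local_paths, existing_paths):
    """Return the local paths that are not already in the dataset.

    Assumption: files are matched by relative path only.
    """
    existing = set(existing_paths)
    return [p for p in local_paths if p not in existing]
```

Here `existing_paths` stands in for the list of file paths already in the dataset. The result corresponds to the files that would be checked by default, which the user can then adjust before starting the upload.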

...

NOTE: If you use the -directupload flag, ZIP files will not be extracted!

Ingesting Tabular Files

When files of type XLSX, CSV, R, Stata, or SPSS are uploaded to Dataverse, an ingest process analyzes each file and converts it to a simple tabular format that can be read by analysis tools. Dataverse does its best to extract the important data from the original file format into columns that the data curation tools can read. The original copy of the file is retained, and users can download either version of the file.
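As a rough picture of what converting a file to a simple tabular format means, the Python sketch below turns CSV text into tab-delimited text. It is only an illustration and not the Dataverse ingest process, which also analyzes the file's variables. The function name is hypothetical, and the choice of tab-delimited output is an assumption made for this example.

```python
import csv
import io


def csv_to_tab_delimited(csv_text):
    """Convert CSV text into tab-delimited text, one row per line.

    Illustrative only: the real ingest process does more than
    change the delimiter.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()
```

The input text is left unchanged by the function, in the same way that Dataverse keeps the original file alongside the converted version.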

...