Preparing to share your data: Essentials of a Research Data Management Plan

“An ounce of prevention is worth a pound of cure!” – Benjamin Franklin

Preparing for publication

Preparing early will save you and your users time. Ideally, planning for research data publishing begins before you even start your research project, guided by a data management plan. Because you will ultimately be sharing your work, organize and structure your research data so that it makes sense to users who were never part of your original project.

Consider how you expect people to access your dataset files. Provide instructions on how to access and use the datasets as intended. See: Preparing data for publication by file type.

Organize your files so they make sense to end users. If you have many files, from tens to thousands, and have been working in a flat folder structure, separate them into folders to make them easier to navigate.

Publish only what is needed. Not all of your data may need to be shared. Consider sharing only the files that are needed to reproduce your results or that were necessary to reach your conclusions. See: Other data publishing considerations

Manage user expectations for downloading. Dataverse is not an active research notebook system. Users may need to download and analyze the datasets on a local system. Some of the most extensive datasets in our repository could take upwards of 60 hours to download under normal circumstances. 

If you are submitting very large or numerous files, provide instructions for using a download manager, or consider alternative access options, so that users can access your research datasets promptly. Factors such as local internet speed, firewalls, and the limitations of end-user laptops and desktop computers will affect download times.
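
The download-time concern can be made concrete with a rough calculation. The sketch below is only illustrative: the function name, the decimal unit convention (1 GB = 1000 MB), and the example figures are assumptions, not measurements from the repository, and real downloads are usually slower than the nominal connection speed.

```python
def estimate_download_hours(size_gb: float, speed_mbps: float) -> float:
    """Estimate hours needed to download `size_gb` gigabytes
    at a sustained speed of `speed_mbps` megabits per second.

    Assumes decimal units (1 GB = 1000 MB) and ignores protocol
    overhead, throttling, and interruptions.
    """
    if size_gb < 0 or speed_mbps <= 0:
        raise ValueError("size must be non-negative and speed must be positive")
    size_megabits = size_gb * 1000 * 8  # gigabytes -> megabytes -> megabits
    seconds = size_megabits / speed_mbps
    return seconds / 3600
```

Under these assumptions, a hypothetical 450 GB dataset at a sustained 100 megabits per second works out to 10 hours. Running the same calculation for the connection speeds your users are likely to have can help you decide whether to split or compress files before depositing them.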

You don’t need to create a dataset for every file, and be careful not to confuse datasets with files. A dataset contains the metadata (the descriptive and administrative information about your project) and the files from the project. Files can be organized into a folder structure or tagged by keywords within a dataset. Do users need to understand the relationship of the file hierarchy, or do they need to filter files by keyword? Thinking about structure can make work easier for you and your users.

The ASU Library indexes Dataverse at the dataset level into our library search interfaces. Provide as much information as necessary in each dataset, and avoid creating many datasets that repeat the same information, which can clutter and confuse search results.

You don’t have to go it alone. ASU Library experts provide consultations and can help you get started before you spend time uploading files, only to realize you need to rethink their structure and sharing method.

FAIR Data Sharing and the CARE Principles

Before publishing, make your research data FAIR: Findable, Accessible, Interoperable, and Reusable. Benefits of the FAIR Principles include:

  • Promoting Research Impact – making your data available for discovery increases the likelihood of others finding your work, as well as increasing its relevancy

  • Upholding Funding Requirements – depositing your data in a disciplinary repository or ASU’s Research Data Repository helps you meet granting agency and institutional requirements for data sharing and curation

  • Supporting Preservation – depositing your research data begins a preservation process that ensures your data is maintained over the long-term

  • Fostering Data Utilization – applying comprehensive metadata to your research data provides essential context for yourself and others to utilize in the future

  • Encouraging New Discoveries – making your data available helps others build on your findings and make scientific discoveries with potential benefit to society in unimaginable ways

  • Endorsing Open Access – by sharing your data in this way, you support the open access movement, which in turn helps the scientific community and society as a whole

The people- and purpose-oriented CARE Principles for Indigenous Data Governance complement FAIR and reflect the crucial role of research data in advancing Indigenous innovation and self-determination. Our curatorial team is informed by the CARE Principles and will work with you where appropriate.

The Essentials: What You Need To Know

“Everything should be made as simple as possible, but no simpler.” – Albert Einstein

The essential considerations that need to be answered

Many questions must be addressed throughout the planning process, and at times it can feel overwhelming. However, as the old saying goes, it is possible to eat an elephant. You just have to do it one bite at a time. So, approach each question independently. Break things down until they are manageable for your purposes. You do not have to do this alone. The research data managers in ASU’s Office of Research Data Management are here to help you plan for and meet your active research data needs.

You can review our library tutorials and read our guide, which features an introduction to the DMPTool, a tool that can help you build your proposal and request feedback. See: Other data publishing considerations

Components of Responsible Data Management and Curation

The following elements, if planned out appropriately, will set you up for success when it comes time to submit your data to ASU’s Research Data Repository. Don’t find yourself scrambling at the last minute. If you utilize these recommendations, meeting ASU’s submission requirements will be a much smoother process.

Big picture data curation and management planning

Define and document the purpose of your research

  • What are you ultimately working towards?

  • Decide and indicate what data you will collect, and how. Are you recording numeric information, images, or text modeling? What data type best represents the information you are trying to capture?

  • How much data do you anticipate collecting?

    • Indicate file storage size in addition to sample sizes.

    • Will the files be compressed? Will end users need the raw data, or only the processed subset that you plan to share? What do users need to download and use? Keep in mind that the larger the files, the longer they will take to download.

  • What programs or methods will be needed to collect the data? Will these programs or methods be open source or proprietary?

    • Benefits of Open Source: Less likely to encounter digital obsolescence; more autonomy in how the format is used; less likely to violate licensing; tend to work better across various platforms; generally easier to preserve over the long-term

    • Benefits of Proprietary: More support for technical issues

  • Will you be using any data produced by others, and how will you properly cite and give credit to those data producers?

Assign roles and responsibilities

  • Who owns the data and has the right to manage it? Will roles need to be divided or shared? Does the funding agency have any rights to consider?

Focusing on research data curation

When publishing your research, provide context. Describe your data so that it is understandable to others!

  • What metadata schemas will you utilize to explain your data? Will it be a schema standard for your field?

  • For subject or keyword metadata, will you be using a controlled vocabulary? What syntax rules will you apply? For example, will terms be plural or singular, and will they be capitalized?
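
One way to apply such syntax rules consistently is to map every keyword variant to a single preferred term. The sketch below is a minimal illustration: the vocabulary, the preferred forms (plural, capitalized), and the function name are all hypothetical, and a real project would substitute the controlled vocabulary used in its field.

```python
# Hypothetical controlled vocabulary: each accepted variant (lowercase)
# maps to one preferred term. Here the assumed rule is "plural, capitalized".
PREFERRED_TERMS = {
    "soil": "Soils",
    "soils": "Soils",
    "microbe": "Microorganisms",
    "microbes": "Microorganisms",
    "microorganisms": "Microorganisms",
}


def normalize_keywords(raw_keywords):
    """Return (matched, unmatched) keyword lists.

    `matched` holds preferred terms without duplicates, in first-seen order.
    `unmatched` holds the original keywords that are not in the vocabulary,
    so a person can review them rather than having them silently dropped.
    """
    matched, unmatched = [], []
    for keyword in raw_keywords:
        key = " ".join(keyword.split()).lower()  # trim and collapse whitespace
        term = PREFERRED_TERMS.get(key)
        if term is None:
            unmatched.append(keyword)
        elif term not in matched:
            matched.append(term)
    return matched, unmatched
```

With this example vocabulary, the keywords "soil", " Microbes ", and "SOILS" collapse to the two preferred terms "Soils" and "Microorganisms", while an unlisted keyword such as "drought" is returned separately for review.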

Choose file formats that work best for the type of data you are collecting

  • Will you use open standard formats or proprietary ones? (Ex: CSV versus Excel)

    • Benefits of Open Formats: Less likely to encounter digital obsolescence; more autonomy in how the format is used; less likely to violate licensing; tend to work better across various platforms; generally easier to preserve over the long-term

    • Benefits of Proprietary: More support for technical issues

Design a folder hierarchy and file naming convention that is consistent and easy to organize

  • Consider not only how you will look for files, but also how others will

Ensure folder and file names tell you what will be in the folder or file without having to open it

  • Example Folder Hierarchy: [Project]/[Experiment]/[Type of File]

    • Example File Name: YYYYMMDD_[Experiment]_[Sample Type]
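
The example hierarchy can be sketched as a small helper that decides where a file belongs. This is an illustration only: the extension-to-folder mapping, the folder names, and the function name are assumptions chosen for the example, and the function only computes a path without moving any files.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a "type of file" folder.
TYPE_FOLDERS = {
    ".csv": "data",
    ".tif": "images",
    ".txt": "docs",
    ".py": "code",
}


def target_path(project: str, experiment: str, filename: str) -> PurePosixPath:
    """Build [Project]/[Experiment]/[Type of File]/filename for one file.

    Files with an unrecognized (or missing) extension go to "other".
    """
    suffix = PurePosixPath(filename).suffix.lower()
    type_folder = TYPE_FOLDERS.get(suffix, "other")
    return PurePosixPath(project) / experiment / type_folder / filename
```

For a hypothetical project "soil-study" and experiment "exp01", the file 20220101_exp01_control.csv would be placed at soil-study/exp01/data/20220101_exp01_control.csv.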

Follow naming conventions that are both computer-readable and human-readable

  • This helps ensure folders and files are accessible across many digital platforms

  • Do not use special characters or spaces in file names – utilize dashes or underscores instead

  • Use the ISO 8601 standard to disambiguate dates: a four-digit year first, then a two-digit month, then a two-digit day (YYYY-MM-DD). For example, January 1, 2022 becomes 2022-01-01.
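
These naming rules can be applied automatically when files are created. The sketch below is one possible approach, not a required convention: the function names and the choice to replace every run of non-alphanumeric characters with a single dash are assumptions, and it uses the dashed YYYY-MM-DD date form described in the last bullet.

```python
import re
from datetime import date


def clean_part(text: str) -> str:
    """Replace spaces and special characters with single dashes."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "-", text.strip())
    return cleaned.strip("-")


def build_file_name(day: date, experiment: str, sample_type: str, extension: str) -> str:
    """Build a name of the form YYYY-MM-DD_[Experiment]_[Sample Type].ext.

    Underscores separate the fields; dashes replace spaces and special
    characters inside a field. The extension is lowercased.
    """
    parts = [day.isoformat(), clean_part(experiment), clean_part(sample_type)]
    return "_".join(parts) + "." + extension.lstrip(".").lower()
```

For example, a hypothetical experiment named "Growth Trial #2" with sample type "leaf (control)" recorded on January 1, 2022 yields 2022-01-01_Growth-Trial-2_leaf-control.csv. Because the date comes first in year-month-day order, sorting file names alphabetically also sorts them chronologically.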

Ensure you have working storage and backup system plans to prevent data loss

  • Best practice recommends creating 3 copies that are geographically distributed

    • Example: The original copy, plus a local external copy and an external remote copy

  • Options for backing up your data include computer hard drives, external hard drives, and organizational servers
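
Keeping several copies only helps if the copies stay identical to the original. As one hedged illustration, the sketch below compares SHA-256 checksums of backup copies against the original file. Checksums are an assumption added here for the example rather than a stated repository requirement, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copies: Iterable[Path]) -> Dict[str, bool]:
    """Map each copy's path to True if its contents match the original."""
    expected = sha256_of(original)
    return {str(copy): sha256_of(copy) == expected for copy in copies}
```

Running a check like this after each backup, and again periodically, can flag a copy that was truncated or corrupted before it is needed.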

Consider your data’s security

  • Will it need to be encrypted for reasons such as personally identifiable information?

  • Who should have access?

  • Can you set user permissions to control access?

  • How will you secure and store passwords?

  • Ensure your backup and storage procedures maintain this security!

Repository curators will provide you with a “readme” text file template to submit with your datasets; it covers much of the above information. That information will also be used to fill in the metadata fields that describe and contextualize your content in the repository, which improves programmatic discovery and access. See Cornell’s Guide to writing “readme” style metadata for more information.

Recommended reading:
Briney, Kristin. Data Management for Researchers: Organize, Maintain and Share Your Data for Research Success. Pelagic Publishing, 2015. Print edition.

A good nuts-and-bolts look at managing your data. Chapter 6, Improving Data Analysis and Spreadsheet Best Practices, is worth the price of admission.