Let's Talk Research Data! Guide

Where to deposit research data?

Research data should be submitted to institutional (UNB Dataverse), discipline-specific, or community-recognized repositories where possible, or to generalist repositories if no suitable community resource is available.

A tool to assist in identifying repositories (from DataCite):  https://repositoryfinder.datacite.org 

Available repositories: 

UNB Dataverse 

Federated Research Data Repository (FRDR)

  • a platform for Canadian researchers to deposit and share research data, and to facilitate the discovery of research data in Canadian repositories. This is a collaborative project between the Portage Network, CARL, and Compute Canada. FRDR uses Compute Canada resources to store research data, as well as Globus services to transfer files and search for information. FRDR is particularly suitable for archiving and sharing large data sets (300 GB or 25,000 files).
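
To judge whether a dataset is in the size range mentioned above, a short script can total the files in a folder. This is a minimal sketch: the function names are hypothetical, and treating 300 GB and 25,000 files as rough "large dataset" indicators is an assumption drawn from the figures quoted above, not an FRDR rule.

```python
import os

# Figures quoted in the guide; using them as thresholds is an assumption.
LARGE_BYTES = 300 * 10**9
LARGE_FILE_COUNT = 25_000


def summarize_dataset(root):
    """Return (total_bytes, file_count) for all regular files under root."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total_bytes += os.path.getsize(path)
                file_count += 1
    return total_bytes, file_count


def looks_large(total_bytes, file_count):
    """True if either figure reaches the size range quoted in the guide."""
    return total_bytes >= LARGE_BYTES or file_count >= LARGE_FILE_COUNT


if __name__ == "__main__":
    size, count = summarize_dataset(".")
    print(f"{count} files, {size / 10**9:.2f} GB, large: {looks_large(size, count)}")
```

Run it from the top-level folder of the dataset you plan to deposit.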

Getting started with FRDR 

To get started with the FRDR demo, go to https://demo.frdr.ca/ and attempt to log in (in the header menu). From there you'll be prompted to create an account. You can use a Google, ORCID, or Compute Canada account, or you can create a new account with Globus. (Note: Logging in with a university account is not currently supported, but it is expected to be available in the future.)
 
When you have an account, select the "Deposit Data" button on the FRDR demo homepage and you'll see a message asking you to email support to receive permission to use the demo. (This is the process only during limited production. Once an administrator receives your email, they can add you to the FRDR depositor group.)
 
In the demo you can perform test submissions, get a sense of the metadata form to fill out, use your browser or Globus transfer to upload some test data, etc. You will need to download Globus Connect Personal to upload large datasets to FRDR using Globus, or to download large data files or entire datasets.
 
Some useful links:
FRDR documentation: https://www.frdr.ca/docs/en/home/
Globus Connect Personal download: https://www.globus.org/globus-connect-personal
An example of a data record in FRDR: https://www.frdr.ca/repo/handle/doi:10.20383/101.0111

Discipline-specific data repositories suggested by Nature.com 

Some repositories on this page may only accept data from those funded by specific sources or may charge for hosting data. Be aware of any deposition policies for your chosen repository. The list includes the following disciplines and areas: 

  • Biological Sciences
    • Nucleic acid sequence
    • Protein sequence
    • Molecular & supramolecular structure
    • Neuroscience
    • Omics (functional genomics, Metabolomics, Proteomics) 
    • Taxonomy & species diversity
    • Mathematical & modeling resources
    • Cytometry & immunology
    • Imaging
    • Organism-focused resources
  • Health Sciences 
  • Chemistry & chemical biology
  • Earth & environmental sciences
  • Physics, astrophysics & astronomy
  • Social sciences
  • Generalist repositories

Science-specific repositories suggested by PLOS One

The list includes subjects and areas such as:

  • Biochemistry
  • Biomedical Sciences
  • Marine Sciences
  • Model organisms
  • Neuroscience
  • Omics
  • Physical Sciences
  • Sequencing
  • Social Sciences
  • Structural Databases
  • Taxonomic & Species Diversity
  • Unstructured and/or Large Data

DataCite 

  • an international not-for-profit organization for archiving and sharing research data. The platform hosts data and shares it under open terms of use (for example, the CC0 waiver).

re3data.org - Registry of Research Data Repositories

  • a global registry of research data repositories that covers research data repositories from different academic disciplines. 

Open Science Framework

Figshare

Zenodo

Dryad ("Building on our Successes: Past and Present"; Dryad-Zenodo integration)

GitHub (a development platform for sharing source code, including open-source projects)

Dat - a distributed data community

Dat is a peer-to-peer platform for publishing datasets both large and small. Its design borrows concepts from distributed revision control systems, allowing multiple users to contribute changes and updates to a dataset while retaining authorship information and preserving older versions. Dat was initially funded by the Knight Foundation under an initiative that "seeks to increase the traction of the open data movement by providing better tools for collaboration." The Try Dat section of the project site contains a detailed tutorial that covers creating, publishing, and updating a dataset. Reference datasets are also provided in a number of formats, including a CSV on recent earthquakes, a JSON file of recently published DOIs, and Bionode format genomics data. The tutorial covers installing Dat on Windows, macOS, and Linux. Dat is free software, distributed under the BSD license, with source code available on Github.

 

The infographic below will help you to make an informed decision on where to deposit your research data.

[Infographic: decision tree for choosing where to deposit research data]

 

Preparing a dataset for publication

The Project Close-out Checklist for Research Data (from Caltech). The close-out checklist describes a range of activities that help ensure research data are properly managed at the end of a project or on a researcher's departure. Activities include: making stewardship decisions, preparing files for archiving, sharing data, and setting aside important files in a "FINAL" folder. (Note: a generic, editable version is available under a Creative Commons Attribution license.)

The FAIR data principles are a set of community-developed principles for sharing data. FAIR stands for Findable, Accessible, Interoperable, Reusable. [Association of European Research Libraries]

 

First there was FAIR; now there is TRUST (Transparency, Responsibility, User focus, Sustainability, Technology). (Lin, D., Crabtree, J., Dillo, I. et al. The TRUST Principles for digital repositories. Sci Data 7, 144 (2020). https://doi.org/10.1038/s41597-020-0486-7; The TRUST Principles: An RDA Community Effort.)

Automated FAIR Data Assessment Tool (F-UJI)

F-UJI is a web service to assess FAIRness of research data based on metrics developed by the FAIRsFAIR project. This website aims to demonstrate the application of the web service as a backend to implement a user-friendly web application that allows the evaluation of FAIRness of digital research data objects (aka data sets). F-UJI is a result of the FAIRsFAIR “Fostering FAIR Data Practices In Europe” project, which received funding from the European Union’s Horizon 2020 project call H2020-INFRAEOSC-2018-2020.

Guidance on Depositing Existing Data in Public Repositories (PDF)

- Panel on Research Ethics, Government of Canada

Can I share my data? 

- this decision tree is designed to be a quick and easy-to-use guide to alert Canadian researchers to situations where research data derived from human participants either may not be shared publicly or may require some modification before sharing. It relies heavily on the Canadian Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans - TCPS 2 (2018), which addresses consent and secondary use of information for research purposes.

Data documentation & metadata

In most cases, the terms data documentation and metadata can be used interchangeably. Both help readers understand raw data in detail and allow other researchers to discover, use, and properly cite research data. It is important to start documenting your data at the very beginning of the research project. This could include:

  • making notes of all file formats, workflow details, and information about how the data will be recorded and processed;
  • explaining codes, variables, and abbreviations;
  • planning where the data will be stored in the short and long term so that other researchers can find and reuse your data.

Metadata is data about data. It is metadata that makes your research data discoverable by a search engine. Metadata, in general, contains several elements, such as:

  • title
  • creator
  • identifier (DOI)
  • date created
  • format
  • subject
  • funder(s)
  • rights/licensing
  • location
  • methodology 
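
The elements above can be sketched as a simple machine-readable record. This is a minimal illustration rather than any repository's schema: the snake_case key names, the `missing_elements` helper, and every value in the example record are hypothetical placeholders.

```python
import json

# Element names follow the list above; the snake_case keys are an assumption.
CORE_ELEMENTS = [
    "title", "creator", "identifier", "date_created", "format",
    "subject", "funders", "rights", "location", "methodology",
]


def missing_elements(record):
    """Return the core elements that are absent or empty in a record."""
    return [name for name in CORE_ELEMENTS if not record.get(name)]


# Hypothetical record; every value below is a placeholder, not a real dataset.
example_record = {
    "title": "Example stream temperature measurements",
    "creator": "Doe, Jane",
    "identifier": "10.1234/example-doi",
    "date_created": "2023-05-01",
    "format": "text/csv",
    "subject": ["hydrology", "water temperature"],
    "funders": ["Example Funding Agency"],
    "rights": "CC BY 4.0",
    "location": "New Brunswick, Canada",
    "methodology": "Hourly logger readings, averaged per day.",
}

print(json.dumps(example_record, indent=2))
print("Missing elements:", missing_elements(example_record))
```

A check like this is useful before deposit, because empty fields are easier to fill while the project is still active.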

Metadata standards consist of elements specific to your research area or discipline. Many disciplines adopt their own metadata standards tailored to the particular needs of the research area. The diagram below shows some metadata standards.

List of standards in your field by Digital Curation

Examples of metadata standards by Stanford University Libraries: 

Metadata Best Practices Guide  by Dataverse North group

Guide to writing 'README' style metadata from Cornell

A quick guide to writing a README file from UBC

Standards-based metadata is generally preferable, but where no appropriate standard exists, writing “README” style metadata is an appropriate strategy. A README file provides information about a data file and is intended to help ensure that the data can be correctly interpreted, by yourself at a later date or by others when sharing or publishing data. 
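
As an illustration of README-style metadata, the sketch below assembles a plain-text README from a few descriptive fields. The section names (Title, Creator, Contact, Files, Variables) and the `build_readme` function are assumptions chosen for this example; the Cornell and UBC guides listed above describe fuller templates. All values are placeholders.

```python
def build_readme(title, creator, contact, files, variables):
    """Assemble a plain-text README from a few descriptive fields.

    files maps file names to descriptions; variables maps variable
    names or codes to their meaning and units.
    """
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Contact: {contact}",
        "",
        "Files:",
    ]
    lines += [f"  {name}: {description}" for name, description in files.items()]
    lines += ["", "Variables:"]
    lines += [f"  {name}: {meaning}" for name, meaning in variables.items()]
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
readme_text = build_readme(
    title="Example stream temperature measurements",
    creator="Doe, Jane",
    contact="jane.doe@example.org",
    files={"temps_2023.csv": "Daily mean water temperature per site"},
    variables={"site_id": "Sampling site code", "temp_c": "Temperature in degrees Celsius"},
)
print(readme_text)
```

Saving the result as README.txt next to the data files keeps the documentation in an open, plain-text format.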

Other recommendations from BioMedCentral:

The Anatomy of a Data Note and BMC Research Notes

Data formats  

Software applications come and go. Proprietary formats created by software are typically controlled by the company that makes it, and might therefore restrict the use of research data. For this reason, it is recommended to archive research data in an open, non-proprietary format.

Type of Data    Recommended Format
text            .txt
document        .pdf
tabular         .csv
image           .tiff
archiving       .zip
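
The recommendations above can be used to flag files that may need converting before archiving. This is a minimal sketch: the `RECOMMENDED` mapping copies the table above, while the function names and example file names are hypothetical.

```python
from pathlib import Path

# Recommended formats copied from the table above.
RECOMMENDED = {
    "text": ".txt",
    "document": ".pdf",
    "tabular": ".csv",
    "image": ".tiff",
    "archiving": ".zip",
}


def is_recommended(filename):
    """True if the file's final extension is one of the recommended formats."""
    return Path(filename).suffix.lower() in set(RECOMMENDED.values())


def files_to_review(filenames):
    """Return the files whose extension is not on the recommended list."""
    return [name for name in filenames if not is_recommended(name)]


# Hypothetical file names, for illustration only.
print(files_to_review(["results.csv", "survey.sav", "notes.txt", "figure1.TIFF"]))
```

A file flagged here is not necessarily unsuitable; the check only looks at the extension, so the final decision stays with the researcher and the repository's own format policy.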

 

Global Change Master Directory (GCMD) Keywords

A hierarchical set of controlled Earth Science vocabularies that help ensure Earth science data, services, and variables are described in a consistent and comprehensive manner and allow for the precise searching of metadata and subsequent retrieval of data, services, and variables. 

Data analysis and cleanup 

SPSS 

A software package used for statistical analysis, which includes but is not limited to descriptive statistics (cross tabulation, frequencies, descriptives, descriptive ratio statistics) and bivariate statistics (means, ANOVA, t-test, correlation). Available for Mac, Windows, and Unix. This software is available to UNB students through UNB Virtual Lab.

SAS

A software suite for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics.

RStudio

A free, open-source integrated development environment for R. It lets you view R code, graphs, data, and output results side by side. Input data can be in CSV, SPSS, or SAS formats. The software is available for Mac, Windows, and Linux.

OpenRefine 

OpenRefine (formerly Google Refine) is a free, open-source tool for working with large data sets: cleaning them and transforming them from one format into another.

 

Working with sensitive data

What is sensitive data?

The term "sensitive data" refers to data that is capable of identifying an individual, species, object, process, or location at risk of discrimination, harm, or unwanted attention. Most sensitive data cannot be shared, but there are exceptions. 

An example of sensitive data is information that is protected against unwarranted disclosure. Among these are personal data, proprietary data, and other restricted or confidential information that must be protected from unauthorized access.

If you deal with sensitive data, your work might need to be overseen by one of UNB's Research Ethics Boards (REBs). UNB has two REBs, one for each campus: REB Fredericton campus and REB Saint John campus. Here you can find all the necessary ethics forms with instructions on how to apply.

How to incorporate security into RDM planning?

(from the Tri-Agency RDM Policy FAQ)

When conducting research that involves sensitive data or has potential for dual use, researchers may need to take additional measures to balance the need for data-sharing and access with that for protection from threats. To ensure that the integrity of their research is not compromised and research results (e.g., data sets, publications, patents) are secure and protected until they choose to disseminate them, researchers should put in place good physical and cyber security practices and infrastructure. These practices should be agreed to by all research team members and partners.

 For more information on safeguarding research, consult the Safeguarding Your Research portal. 

Sharing sensitive data 

When you work with sensitive data, you will have specific management and dissemination requirements. Data that is sensitive should be password protected and encrypted and stored on a secure server with role-based access controls.

Can I share my data? 

- this decision tree is designed to be a quick and easy-to-use guide to alert Canadian researchers to situations where research data derived from human participants either may not be shared publicly or may require some modification before sharing. It relies heavily on the Canadian Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans - TCPS 2 (2018), which addresses consent and secondary use of information for research purposes.

The Research Ethics Boards at UNB shared the following links, which include information, forms, and guidelines:

Resources 

Sensitive Data Toolkit for Researchers 

- language for informed consent; to assist researchers in the development of deposit-friendly language for ethics approval and informed consent

Glossary of Terms for Sensitive Data used for Research Purposes 

- definitions of common terms in the Canadian context

Human Participant Research Data Risk Matrix

- helps researchers determine the risk level associated with human participant research data, and decide on how to manage, deposit, and use them appropriately in the future

De-identification Guidance

- this guidance is intended to help Canadian researchers minimize disclosure risk when sharing data collected from human participants

Ethical Considerations in the Use of Geospatial data for research and statistics

- this guidance document explores ethical considerations in the use of geospatial data for research, analysis and statistics. It has been developed by the UK Statistics Authority’s Centre for Applied Data Ethics in partnership with geospatial colleagues.

Current Best Practices for Generalizing Sensitive Species Occurrence Data (PDF)

- this document aims to provide best practices for dealing with sensitive primary species occurrence data, and guidance on how to make as much data available as possible without exposing the species to harm because data has been placed in the public domain

Guidance on Depositing Existing Data in Public Repositories (PDF)

- Panel on Research Ethics, Government of Canada

TCPS2 (2022) was updated on January 11, 2023. Highlights of changes are available here

- new requirements for REB review of data repositories and new definitions/requirements for consent for data reuse 

 

Analysis and tools for preparing sensitive data for sharing  

Amnesia

Amnesia is a free data anonymization tool that transforms relational and transactional databases into a dataset where formal privacy guarantees hold. In Amnesia, direct identifiers like names are removed and secondary identifiers like birth dates and zip codes are transformed to prevent individuals from being identified. Amnesia supports k-anonymity and k^m-anonymity.
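
To make the idea of k-anonymity concrete: a dataset is k-anonymous when every combination of secondary (quasi-) identifier values is shared by at least k records. The sketch below checks that property on a small table. It is an independent illustration, not Amnesia's interface; the function names, column names, and rows are all hypothetical.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    return all(count >= k for count in group_sizes(rows, quasi_identifiers).values())


# Hypothetical rows: names already removed, birth dates and postal codes generalized.
rows = [
    {"birth_year": "1980-1989", "postal_prefix": "E3B", "diagnosis": "A"},
    {"birth_year": "1980-1989", "postal_prefix": "E3B", "diagnosis": "B"},
    {"birth_year": "1990-1999", "postal_prefix": "E2L", "diagnosis": "A"},
    {"birth_year": "1990-1999", "postal_prefix": "E2L", "diagnosis": "C"},
]

print(is_k_anonymous(rows, ["birth_year", "postal_prefix"], k=2))
print(is_k_anonymous(rows, ["birth_year", "postal_prefix"], k=3))
```

Here each combination of birth-year range and postal prefix appears twice, so the table satisfies k = 2 but not k = 3. Passing such a check does not by itself make data safe to share; it addresses only one kind of disclosure risk.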

 

 

Indigenous research

 

(from the Tri-Agency RDM Policy FAQ)

In an effort to support Indigenous communities to conduct research and partner with the broader research community, the agencies recognize that data related to research by and with Indigenous communities must be managed in accordance with data management principles developed and approved by these communities. These include, but are not limited to considerations of data collection, ownership, protection, use and sharing. The principles of ownership, control, access and possession (OCAP®) are one model for First Nations data governance, but this model does not necessarily respond to the distinct needs and values of distinct First Nations, Inuit and Métis communities. 

With respect to Indigenous research, the agencies acknowledge the importance of ethical considerations and refer grant recipients to the framework for the ethical conduct of research involving First Nations, Inuit, and Métis Peoples outlined in Chapter 9 of the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS 2). Decisions to deposit and/or share Indigenous research data and knowledge should be guided by principles of research with Indigenous Peoples.

Moving forward, the agencies plan to support the development of Indigenous RDM protocols that aim to ensure community consent, access and ownership of Indigenous data, and protection of Indigenous intellectual property rights. This next phase in advancing Indigenous RDM in Canada is outlined in Setting New Directions to Support Indigenous Research and Research Training in Canada 2019-2022.

Recommended resources

Online course - Fundamentals of OCAP® 

CARE principles for Indigenous Data Governance  (Collective benefit, Authority to control, Responsibility, and Ethics = CARE)

Carroll, S. R., Herczog, E., Hudson, M., Russell, K., & Stall, S. (2021). Operationalizing the CARE and FAIR Principles for Indigenous data futures. Scientific Data, 8(1), Article 1. https://doi.org/10.1038/s41597-021-00892-0

Canada Research Coordinating Committee. (2020, October 5). Strengthening Indigenous research capacity. https://www.canada.ca/en/research-coordinating-committee/priorities/indigenous-research.html

Ellenwood, D. (2020, 19). “Information Has Value”: The Political Economy of Information Capitalism. In the Library With the Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2020/information-has-value-the-political-economy-of-information-capitalism/

GIDA: Global Indigenous Data Alliance. https://www.gida-global.org

Government of Canada. (2021, October 29). Tri-Agency Research Data Management Policy - Frequently Asked Questions. Innovation, Science and Economic Development Canada. https://science.gc.ca/site/science/en/interagency-research-funding/policies-and-guidelines/research-data-management/tri-agency-research-data-management-policy-frequently-asked-questions

Hudson, M. (2020). Indigenous Data Sovereignty: Towards an Equitable and Inclusive Digital Future. A Digital New Deal: Visions of Justice in a Post-Covid World. https://itforchange.net/digital-new-deal/2020/11/01/indigenous-data-sovereignty-towards-an-equitable-and-inclusive-digital-future/

IEEE Standards Association. (n.d.). Recommended Practice for Provenance of Indigenous Peoples’ Data. IEEE Standards Association. https://standards.ieee.org/ieee/2890/10318/

Kukutai, T., & Taylor, J. (2016). Indigenous Data Sovereignty: Toward an agenda. ANU Press. https://doi.org/10.22459/CAEPR38.11.2016

National Indigenous Knowledge and Language Alliance. (2022). Projects. National Indigenous Knowledge and Language Alliance. https://www.nikla-ancla.com/projects

Mukurtu - a free and open-source platform built with Indigenous communities to manage and share digital cultural heritage. https://www.mukurtu.org/

Creating a data management plan: templates & examples

What is a data management plan (DMP)? 

A DMP is a document outlining how you handle (organize, store, and share) your research data both during the project and after the project is completed. 

Guidelines on how to prepare DMPs:

Online tools for preparing data management plans (DMPs)

Canada

UK

USA

 

Public DMP exemplars and templates created using various online tools and shared publicly by their owners: 

Data DOI

A Digital Object Identifier, or DOI (DOI System), is a unique persistent identifier for a published digital object such as a book, article, study, or dataset. The word 'persistent' means that it never changes. The idea behind a persistent identifier is that it doesn't break when a website gets updated.
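
In practice, a DOI is turned into a link by appending it to the https://doi.org/ resolver address. The sketch below normalizes a few common ways a DOI is written and builds that link. The `doi_to_url` function is a hypothetical helper; the two example DOIs are the ones that appear elsewhere in this guide.

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Turn a DOI written in a common form into a https://doi.org/ link."""
    doi = doi.strip()
    # Drop a leading resolver address or "doi:" label if one is present.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters while keeping the slash separator.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi:10.20383/101.0111"))
print(doi_to_url("https://doi.org/10.1038/s41597-020-0486-7"))
```

Because the resolver address stays the same even if the repository reorganizes its website, citing the https://doi.org/ form is what keeps the link persistent.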

How to obtain a DOI for a data set?

A DOI can be created by publishing organizations, not by individual people. Many data repositories can publish research data and assign a DOI to a data set. This data DOI can then be used to cite your data set in a publication. View the list of data repositories to choose which one is more appropriate for the type of data you deal with and carefully read their Terms and Conditions as some repositories may charge you for using their services.  

Here is a list of selected data repositories where a DOI can be assigned free of charge to a dataset:

How to use a data DOI?

By properly citing the data and including the DOI, you're giving proper credit to the creators who conducted the research and providing the scholarly community a clearer picture of the impact of the research.

APA 6th edition:
Refer to Publication Manual of the American Psychological Association, 6th edition (2010), pp. 210-211 (data set) and p. 212 (unpublished raw data) [UNB Library: BF76.7 .P83 2010b; OCLC:316736612].

APA Style Guide to E-Resources:
Refer to APA Style Guide to Electronic References (2012) [UNB Library: PN 171 .F56 A63 2007; OCLC:795354092].

Data set:

Author. (Year). Title of data set (version number). Location: Name of the creator. 

or 

Author. (Year). Title of data set (version number). Retrieved from http://

Raw data (unpublished, untitled work):

Author. (Year). [Description of study topic]. Unpublished raw data.
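
The templates above can be filled in programmatically, which helps when citing many data sets consistently. This is a minimal sketch of the "Retrieved from" and "Unpublished raw data" patterns only; the function names and all example values are hypothetical, and the output should still be checked against the APA manual.

```python
def cite_dataset(author, year, title, version, url):
    """Fill the 'Author. (Year). Title (version). Retrieved from URL' template."""
    # Avoid a doubled period when the author string already ends with initials.
    author = author.rstrip(".")
    return f"{author}. ({year}). {title} ({version}). Retrieved from {url}"


def cite_raw_data(author, year, description):
    """Fill the template for unpublished, untitled raw data."""
    author = author.rstrip(".")
    return f"{author}. ({year}). [{description}]. Unpublished raw data."


# Hypothetical values, for illustration only.
print(cite_dataset("Doe, J.", 2023, "Example stream temperature measurements",
                   "Version 2", "https://doi.org/10.1234/example-doi"))
print(cite_raw_data("Doe, J.", 2023, "Stream temperature readings"))
```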

 

 

RDM training, 101 Readings & other resources

Research Data Management (RDM) is an emerging service at UNB Libraries, focused on providing support for data management planning, storing, and publishing. RDM is an increasingly important part of research and scholarly communications. Our website is currently under construction, and we are working on creating content and services relevant to our research community. Please contact RDM Services for details or to ask what we have to offer.

A free online course with all you need to know for research data management, along with ways to engage and share data with business, policymakers, media, and the wider public.
The self-paced training course will take 15 to 20 hours to complete in eight structured modules. The course is packed with videos, quizzes, and real-life examples of data management, along with valuable tips from experts in data management, data sharing, and science communication.

A registry for online learning resources focusing on research data management. It was created in a collaboration between the U.S. Geological Survey's Community for Data Integration, the Earth Sciences Information Partnership (ESIP), and DataONE. 

Data Carpentry develops and teaches workshops on the fundamental data skills needed to conduct research. Their mission is to provide researchers with high-quality, domain-specific training covering the full lifecycle of data-driven research.

- provides links to resources from a wide range of websites and organizations, sorted by topic and audience. 

This instance of The Art of Literary Text Analysis is created in Jupyter Notebooks based on the Python scripting language. Other programming choices are available, and many conceptual aspects of the guide are relevant regardless of the language and implementation. 

Combining features of a premier research journal, a leading educational publication, and a popular magazine, HDSR provides a centralized, authoritative, and peer-reviewed publishing community to service the growing profession.

RDM costing tools for grant proposals

What will it cost to manage and share my data? (from OpenAIRE) Infographics provide information for researchers on the costs of research data management, how these can be addressed in advance, and the community resources available. 

Want to estimate the RDM cost for your research project? This tool is for you:

costing tool

 

RDM 101 READING ...

 

 

"Make scientific data FAIR." Nature 570, 27-29 (2019)

"Everyone needs a data-management plan. They sound dull, but data-management plans are essential, and funders must explain why." Nature 555, 286 (2018) doi: 10.1038/d41586-018-03065-z 

available in UNB library ... 

rdm for researchers  fd     Stodden    sharing data

Funding, News, Announcements

Tri-Agency Research Data Management Policy

On March 16th, 2021 following an extended period of consultation and revision, Canada's three federal funding agencies (NSERC, SSHRC, CIHR) released the Tri-Agency Research Data Management (RDM) Policy "to support Canadian research excellence by promoting sound RDM and data stewardship practices." 

The Canadian Association of Research Libraries (CARL), of which UNB Libraries is a member, has been actively engaged in the development of the new policy from its earliest stages, and the CARL-sponsored Portage Network has been working to develop RDM-related expertise, resources, and infrastructure at research institutions across Canada.

UNB Libraries' Research Data Management Services are available and ready to help UNB researchers meet new and emerging requirements from the Tri-Agencies. Our services include one-on-one consultations, classroom instruction and workshops, support in developing data management plans (including use of the Portage Network's planning tool, DMP Assistant), and UNB's own research data repository, UNB Dataverse. To find out more, visit our UNB Libraries' RDM Services page and research guide, or contact us at rdm.services@unb.ca.

 
