‘Data should not be gathering dust on a shelf’
In the old days, it was perfectly acceptable for scientists to keep their research data stored away in a desk drawer or on a floppy disk. But not in the age of open science. Responsibly storing, preserving and sharing data has become an increasingly important part of the research process. Tilburg University researcher Michèle Nuijten and information specialist Petra Ploeg explain why good data management matters, and how data can be kept safe and secure far beyond the lifespan of a research project.
On the Tilburg University campus, personal computers and memory sticks are no longer considered good places to store research data. Still, not every researcher has made the switch to sustainable and transparent data management. “Some researchers are more invested and engaged in good data management than others,” says assistant professor in methodology and statistics Michèle Nuijten, whose research focuses on improving science practices. “Not everyone realizes that collecting and preserving data requires more than saving your data set to your computer or emailing it to a co-author.”
“A data set often isn’t just a single Excel worksheet; it’s much more complex”
So what exactly is the problem with keeping research data in the My Documents folder on your computer or in your email inbox? According to Nuijten, a lot can go wrong. If not stored and preserved properly, data can easily become corrupted, uninterpretable or lost. “A research project takes months or years to complete, so you can imagine that you collect huge amounts of data over such a long period of time. A data set is often not a single Excel worksheet, but a complex set of different files, documents and versions,” she says. “Before you start collecting data, you need to think about how you’ll store all that information so that you and others will still be able to access it and make sense of it, whether in five months or five years.”
Petra Ploeg, information specialist at the Tilburg University library, says data sets are too valuable to be left gathering digital dust on a computer. “Data sets often contain a wealth of unique information, which can be useful even years after the original research has been concluded. You don’t want such valuable data to be forgotten on a disk somewhere.”
After researchers are done with a specific data set, they don’t need to keep looking after it themselves. To prevent data from becoming orphaned, researchers are encouraged to leave it in the care of data repositories: large information warehouses set up to store, manage and share scientific data. “By preserving data according to the FAIR principles, a repository ensures that data is, and continues to be, findable, accessible, interoperable and reusable,” Ploeg explains.
Michèle Nuijten explains how she keeps her data secure during and after a research project: “I usually set up a shared folder with automatic backup, which my co-authors and I work in while we’re still in the process of collecting data, running analyses and writing up results. Once the research is completed, I always deposit my data in a repository. That way, I know my data is safely stored for the long run.”
Nuijten is an ardent supporter of data sharing. Whenever possible, she says, researchers should make their raw data openly available in a data repository. “If data is locked away, it’s impossible to see what the conclusions of a study are based on. Others will just have to trust that no errors or questionable analytic choices were made,” she explains. “Data sharing allows us to detect and correct mistakes. Plus, there is often a lot more information to be gained from a data set if it’s used by other researchers to answer different research questions.”
“Repositories make your data sustainable”
Petra Ploeg has noticed that some Tilburg University researchers are hesitant to deposit their data in a repository because they worry they will lose ownership of or control over their data. They don’t always realize how much control they keep over what they share, whom they share it with, and under which conditions. “Researchers put a lot of time and effort into their data set, so they might worry about what will happen to their data or whether others will reap the benefits of the hard work they’ve done. But if you deposit your data in a repository, that doesn’t mean you’re giving it away. Quite the contrary: repositories make your data sustainable, ensuring your hard work is not wasted.”
Nuijten understands the concerns some researchers may have. “If you spend years building a data set, it’s understandable that you want the chance to use your data for your own publications before you allow other researchers to use it for theirs,” she says. “But there are solutions for that. For example, it’s possible to place your data under embargo for a given period of time, so that it doesn’t become publicly available immediately.”
There are many different data repositories to choose from. Some are discipline-specific or focused on particular types of data, while others are general-purpose. According to Petra Ploeg, it’s best to choose a repository that is certified. “That way, you know you’re depositing your data in a trusted repository. In addition, major research funders such as NWO and ERC require data to be deposited in a certified repository.”
At Tilburg University, the central data repository is TiU Dataverse. This general-purpose repository, managed by the university library, is built on the Dataverse software developed at Harvard University. Researchers from all Tilburg University faculties can deposit their data in TiU Dataverse. In addition, the university operates three domain-specific repositories: PROFILES, LISS Panel Data, and DNB Household Survey. All four Tilburg University repositories are certified with the international CoreTrustSeal quality label or its predecessor, the Data Seal of Approval, which certifies that data is stored safely and sustainably.
Good data management
According to Petra Ploeg, researchers on the Tilburg University campus are increasingly finding their way to TiU Dataverse. “We’ve had Dataverse for some time now, but interest in it has grown steadily over the last several years. More and more researchers come to us for assistance in managing and preserving their data.”
Ploeg thinks the rising interest in sustainable data management is, at least in part, driven by the open science movement, which calls for more transparency and accessibility in science. “In the age of open science, researchers are increasingly being required by their funder or institution to store their data in a transparent and sustainable way,” she says. “But more importantly, there is a growing awareness that good data management benefits individual scientists as well as science as a whole.”