Researchers on RDM: Dr Pim Huijnen, Utrecht University

The LEARN Project is all about creating tools and resources which help institutions improve the way research data is managed. Through workshops and other activities, we’ve spoken to many institutions about the challenges they face in this area.

However, data management isn’t something that only comes from institutions. Researchers also need to be actively involved, and institutions need to understand how to encourage researchers when it comes to storing data and making it available.

Pim Huijnen is a Digital Humanities researcher and Assistant Professor of Digital Cultural History at Utrecht University.

To better understand this topic, we’re publishing a number of blog posts focused on researchers and their data. Our first interview is with Dr Pim Huijnen, a Digital Humanities researcher and Assistant Professor of Digital Cultural History at Utrecht University.

Our conversation highlighted why some researchers prefer to store data locally rather than in a central repository. Institutions will have to consider this issue if they want to convince more researchers to put their data in repositories.

I’m a historian working at Utrecht University as an assistant professor in a cultural history group responsible for Digital Humanities research. I also work on a digital humanities project, Translantis, which uses new computational techniques to do historical research. Our data comes from the Delpher historical newspaper archive. We’re trying to come up with new techniques to extract information from this massive data set, rather than browsing through it in the normal way. It’s all about distant reading.

When we talk about data management, we have to first make clear what data in this context really means. For me and my colleagues, it’s really the source material, the digitised originals, that we consider our data. Broadly speaking we don’t generate new data that could be reused in the context of someone else’s research.

We do come across issues related to data, but they all relate to the original material. For example, we do store data on a server at a national computing centre, but we don’t know how long we can keep it there. How long will they provide a server for our data?

Issues like this mean that we also try to keep our own data stored locally on our own computers. When it’s stored locally, nobody can say that they’re going to stop storing our data, which would force us to find another solution.

I keep my data close to me so I can get to it when I want, but also to avoid bureaucracy. As soon as I go to the IT department, they’ll ask me to fill out some forms and I might have to go to my department and ask for money. I get tired even thinking about it.

This makes me think that it’s better to ask my department for an extra computer on which to store my data instead of filling out all of these forms.

Copyright is also something I worry about, because I make individual arrangements with data providers. In some cases they tell me that I can use the data in any way I like for my project, but they also ask me not to share it with anybody. As soon as I store my data somewhere else, I can’t be 100% sure that nobody will get to it, and I will get into trouble if they do.

Compared with researchers from other domains, the way I work could be seen as quite informal but I’m very comfortable with it and I don’t see my techniques changing now or in the near future. If the amount of data increases drastically or if I’m part of a larger group that needs to get to my data, however, I will need to change the way I work.

It would be very helpful to me if I could access not only storage but also more processing power to work with my data. You need a fairly large computer in terms of memory and processing power to analyse the data sets I collect in a genuine distant-reading way. Otherwise, I mostly just see how far my computer goes, and it turns out that I can only analyse 20-30 years at a time, when really I’d like to look at a 100-year period.

Want to know more about this topic? The issue of data storage for Humanities researchers is also considered in LEARN’s Toolkit of Best Practice for Research Data Management, for example in Case Study 9 from University College London – Challenges and Opportunities for Research Data Management in the Arts, Humanities and Social Sciences: a practitioner’s viewpoint (p. 45). Download the free Toolkit.
