Seems the big scientific brouhaha at the moment is PLoS’s recent (clarifications to their) policy that data for a paper will be shared. In my field, the answer to the title of this post is, “OF COURSE!”
However, I get that cultures vary by field in what kind of data sharing is expected and in how much credit should be given to those who share data (citation, certainly, but what about authorship?). As has been discussed, there are also “lots” of corner-cases about where and how exactly the policy does or should apply. My guess is that PLoS actually left this intentionally vague, so that Editors can use their judgement (although hopefully they have been trained on what exactly the policy is meant to do; I can’t find the tweet that suggested this).
But someone might scoop me on the analysis
I asked my dad, who is a medical doctor spending just a fraction of his time on research, how he felt about this argument. There are lots of pressures on his time aside from doing analysis on data. His funding for his time spent on research does not come from taxpayers. If anyone should be sympathetic to this argument, it should be him. His response? “Well, you better do all your analysis the first time then!”
If you’re going to hoard all your data for the day, years down the line, when you might publish your analyses from it, then there’s a chance you could get hit by a bus, have the data get damaged, or otherwise just not ever get around to it, despite all your best intentions to publish it. I get that not everything people are doing directly translates to life-or-death decisions, but if there’s scientific insight that might be gained from your data, how is it not wrong to slow the progress of that insight?
Furthermore, if the tacit understanding in your field is that “when data are shared, along with some other substantial contributions, that’s standard grounds for authorship”, then it seems to me that it would be a breach of publication ethics not to include the source of the data as an author. The key point of “peer review” is that it’s done by your peers, and if it really is so unusual to have someone provide data without being a formal author, then you should trust your peers to catch that.
It’s too hard to put my data into a format that people will use
This may or may not be a corner-case, but a lot of the time, you could just submit the excel file, data table, or whatever other form the data is in, zipped together as “Supplemental File 1. Data collected in 29 different files, separated by the fleezle criterion”.
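To show how little effort the zip-it-up route takes, here is a minimal sketch in Python. The function name and the file and directory names are made up for illustration; it just bundles whatever is in a directory, untouched, into one archive.

```python
import zipfile
from pathlib import Path


def bundle_supplement(data_dir, archive_path):
    """Zip every file under data_dir, as-is, into a single archive.

    Hypothetical helper for illustration; no cleaning or reformatting is done.
    """
    data_dir = Path(data_dir)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(data_dir.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in data_dir.
            if not path.is_file() or path.resolve() == archive_path.resolve():
                continue
            # Store paths relative to data_dir so the folder layout survives.
            archive.write(path, arcname=path.relative_to(data_dir).as_posix())
    return archive_path
```

Something like `bundle_supplement("raw_data", "Supplemental_File_1.zip")` (both names are placeholders) would then produce the one file to upload.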
My favorite source code license (under which I’ve released my processing/analysis code) is the Community Research and Academic Programming License. It’s obviously not designed for releasing data (it’s even unclear whether scientific data is subject to copyright protection and therefore licensable), but I think something very much like it could be useful for releasing data, especially in assuaging fears that it might be ugly and not in a totally pristine format. I might grumble when people’s released data tables are in a terrible format, but I actually curse them when it’s just plain not available.
What’s the “right” thing to do about data sharing?
There are two questions tied up in this one. First, what do we want the world to look like? And second, only once we know where we’re going, how do we get there?
For the first question, I think almost everyone will agree that, all else being equal, more science and better science will get done the more free the data is. You want more, better science, don’t you? Of course, that’s only true if people feel they can be appropriately compensated for the effort they go through to make and share the data. I think if we can assuage people’s fears that their hard work will go unrecognized by funding bodies and hiring, tenure, and promotion committees (in almost all cases, composed of your scientific peers), then we can probably get them on board with freely sharing their data.
How we get to that kind of world is a whole ’nother question. You might disagree, but I think PLoS is on the right track. As Gandhi probably never said, you must be the change you wish to see. PLoS is putting itself out there, as it has done in improving other areas of science publication. Will this totally fix the apportioning of credit in every field? No, but I think the discussion about it will help bring the issue to the fore, so we can at least start moving towards that world. “Datasets generated” isn’t currently a standard section on a CV, but should it be? I think so (and may go ahead and update mine now).
Disclosure: My PhD supervisor, Michael Eisen, is a co-founder of PLoS and on its board of directors. I have not spoken with Mike about the content of this post or the “new” PLoS policy.