instruere...inlustrare...delectare Disputations

Tuesday, December 01, 2009

Science and data, ii

The atmospheric and oceanographic data used in climate modeling have two key characteristics that a lot of other scientific data sets do not have: sparsity and scarcity.

The sparsity of the data means there are too few measurements to fully determine the global oceanographic and atmospheric state at any point in time, much less over time. Various analytical and numerical techniques are required to fuse disparate data and interpolate between points.
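To make the point concrete, here is a minimal sketch of two common ways to fill the gap between sparse measurements. The observations are made-up numbers, and the function names are mine, not anything from an actual climate data product. Both methods are defensible, yet they give different answers for the same unmeasured location.

```python
from bisect import bisect_left

# Hypothetical sparse observations: (position in km, temperature in deg C).
# These values are invented for illustration; they are not real measurements.
observations = [(0.0, 10.0), (100.0, 14.0), (400.0, 8.0)]


def nearest_neighbor(obs, x):
    """Return the value of the observation closest to x."""
    return min(obs, key=lambda p: abs(p[0] - x))[1]


def linear(obs, x):
    """Linearly interpolate between the two observations that bracket x."""
    xs = [p[0] for p in obs]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the observed range")
    i = bisect_left(xs, x)
    if xs[i] == x:
        # x coincides with an observation, so no interpolation is needed.
        return obs[i][1]
    (x0, y0), (x1, y1) = obs[i - 1], obs[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Estimate the temperature at 200 km, where nothing was measured.
estimate_nn = nearest_neighbor(observations, 200.0)
estimate_lin = linear(observations, 200.0)
print(estimate_nn, estimate_lin)
```

Neither estimate is wrong on its face. Deciding between them takes knowledge of how the quantity actually varies between the measurement points, which is exactly what the data alone can't tell you.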

These techniques may or may not be controversial, and they may or may not be sound. Most of them, though, will have been chosen over alternatives for which there is also at least some justification.

If you want to know what techniques are used, and the justification for using them rather than others, then you have to read the scientific papers describing them. Emails, particularly informal emails of a political rather than scientific nature, don't cut it.

Which brings me to the scarcity of this data. A relatively small number of organizations have ownership of most of it, and there are legitimate reasons for them to control its dissemination. If I own a certain set of raw data, then I am to some extent answerable for any conclusions anyone else draws from that data. The concern some people have expressed over data processing shenanigans on the part of the CRU would apply equally to any other organization that processed the same data.

That said, the processing of the raw data into releasable products has to be done in a manner open to verification by an independent party. If you want to verify my data processing, then I should describe to you what I'm doing, at a level of detail such that you could produce the same product given the same raw data. (Ideally, at least one independent party would be given the same raw data, to complete the verification.)
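As a rough sketch of what that kind of verification could look like: the processing recipe below is a stand-in I invented (drop or zero-fill missing values, then average by station), and the data are made up. The idea is only that two parties running the same documented steps on the same raw data should arrive at a byte-for-byte identical product, which a hash makes easy to check.

```python
import hashlib
import json


def process(raw, drop_missing=True):
    """Hypothetical recipe: handle missing values, then average by station."""
    by_station = {}
    for station, value in raw:
        if value is None:
            if drop_missing:
                continue
            value = 0.0  # the alternative choice: treat missing as zero
        by_station.setdefault(station, []).append(value)
    return {s: round(sum(v) / len(v), 3) for s, v in sorted(by_station.items())}


def fingerprint(product):
    """Hash a canonical JSON encoding so two products can be compared exactly."""
    encoded = json.dumps(product, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Invented raw data: (station id, reading), with one missing reading.
raw_data = [("A", 10.1), ("A", 10.4), ("B", None), ("B", 7.9)]

# The owner publishes a fingerprint of the released product.
published = fingerprint(process(raw_data))

# An independent party follows the same documented steps on the same raw data.
independent = fingerprint(process(raw_data))

# Someone who had to guess at the missing-value handling gets a different product.
guessed = fingerprint(process(raw_data, drop_missing=False))

print(published == independent, published == guessed)
```

The mismatch in the last case shows why the description has to go down to the level of choices like how missing readings are handled; a summary such as "we averaged by station" would not be enough to reproduce the product.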

Of course, we want both the data processing to be done right and the right data processing to be done. And the second question, whether the raw data was processed appropriately, is at least as technical a question as the first. We don't emerge from the womb knowing when it would be scientific malpractice to use anything less than cubic splines, so simply pointing out what processing was or wasn't done isn't an argument against it.

To sum up, climate modeling is by its nature substantially different from a lot of other scientific work. There are unconstrained choices to be made, of a kind that don't arise if, say, you're trying to determine the boiling point of a chemical solution at one atmosphere. This means, on the one hand, that we would expect to find differences of opinion about which choices to make, and on the other hand that the conclusions are not as certain as they were in our high school science labs.

And finally, a hermeneutic of suspicion cuts all ways. That two models produce different results tells us nothing about which model is better; that two scientists disagree tells us nothing about who is more correct. It's not just that most of us don't know enough to form an independent opinion on climate change; we don't even know enough to form an independent opinion on whose opinion on climate change we should adopt.