Data nirvana. A thought following the Bean Review of UK statistics

One point to emerge from Charlie Bean’s excellent interim review of the Office for National Statistics (ONS) in the UK is that the ONS should provide better access to the micro data behind our statistics.

Here are two reasons why a data nirvana of complete access is desirable.

First, aggregates are necessarily constructed under a delicate set of assumptions.  To give an example, a price index that embraces apples and oranges will use assumptions about optimising consumers and a particular utility function to justify weighting each sub-index by its share of total expenditure.  Other data constructs might invoke the assumption of perfect competition.

In an ideal world, academics and other data users could easily take apart these and other assumptions and replace them with an alternative that is potentially better, or simply better suited to some other purpose.
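To make the point concrete, here is a minimal sketch of how the same sub-indices and the same expenditure shares give different aggregates depending on the formula chosen. The figures and function names are made up for illustration and are not ONS data or ONS methods. The arithmetic version treats the basket as fixed; the geometric version in effect allows consumers to substitute towards whatever has become relatively cheaper.

```python
import math

def arithmetic_index(sub_indices, shares):
    """Share-weighted arithmetic mean of sub-indices (fixed-basket style)."""
    return sum(shares[k] * sub_indices[k] for k in sub_indices)

def geometric_index(sub_indices, shares):
    """Share-weighted geometric mean of sub-indices (allows substitution)."""
    return math.exp(sum(shares[k] * math.log(sub_indices[k]) for k in sub_indices))

# Hypothetical figures: each sub-index is relative to a base period of 100,
# and the shares are each item's share of total expenditure.
sub_indices = {"apples": 110.0, "oranges": 95.0}
shares = {"apples": 0.6, "oranges": 0.4}

arithmetic = arithmetic_index(sub_indices, shares)
geometric = geometric_index(sub_indices, shares)
print(f"arithmetic: {arithmetic:.2f}, geometric: {geometric:.2f}")
```

With these numbers the geometric aggregate comes out a little below the arithmetic one, which is the general pattern whenever relative prices move. A user with access to the sub-indices could swap in whichever formula suits their purpose.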

Second, the information from sub-aggregates that goes into compiling an aggregate frequently arrives at different times.  Early releases of the aggregate are often completed using forecasts, or model- and judgement-based assumptions, to fill in the data that are missing at that point.  Other parts of the data production process may involve formal or informal filtering like this, such as outlier and error detection, or the judgements invoked to reconcile competing data sources (like the output/income/expenditure data in the national accounts).

Complete access would allow users to experiment with their own solutions to these filtering problems.  The ONS themselves may not always be the best at solving them, and their solution may not be the best for a particular user’s purpose.
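As a rough illustration of what that experimentation might look like, the sketch below builds an early estimate of an aggregate when one component has not yet arrived, under two alternative fill-in rules. The component names, figures and rules are all hypothetical and far simpler than anything used in practice.

```python
def carry_forward(name, previous, latest):
    """Assume the missing component is unchanged from the previous period."""
    return previous[name]

def match_reported_growth(name, previous, latest):
    """Assume the missing component grew in line with the reported ones."""
    reported = [k for k, v in latest.items() if v is not None]
    growth = sum(latest[k] for k in reported) / sum(previous[k] for k in reported)
    return previous[name] * growth

def early_aggregate(previous, latest, fill_rule):
    """Sum the components, filling any missing value with the chosen rule."""
    total = 0.0
    for name, value in latest.items():
        if value is None:
            value = fill_rule(name, previous, latest)
        total += value
    return total

# Hypothetical levels: construction has not yet reported for the latest period.
previous = {"services": 80.0, "manufacturing": 15.0, "construction": 5.0}
latest = {"services": 81.6, "manufacturing": 15.3, "construction": None}

estimate_flat = early_aggregate(previous, latest, carry_forward)
estimate_growth = early_aggregate(previous, latest, match_reported_growth)
print(f"carry forward: {estimate_flat:.1f}, match growth: {estimate_growth:.1f}")
```

The two rules give different first estimates from identical inputs, and neither is obviously right. With the underlying data a user could test which rule would have produced the smallest revisions for the series they care about.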

A third reason, really a generalisation of my first two points, is that the optimal data series or index will depend on the use to which it is put.  The CPI is not the best index for me to use to track changes in the purchasing power of my salary over time, since my expenditure patterns don’t match those of the average respondent to expenditure surveys.  Other consumer price indices could be optimised for preserving the purchasing power of benefit recipients, or for forecasting future values of the conventional CPI itself.  The possibilities are many.
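A small sketch of the salary example, using invented sub-indices and invented expenditure shares: the same price data, reweighted by one person's spending pattern rather than the average, gives a noticeably different measure of inflation.

```python
def weighted_index(sub_indices, shares):
    """Expenditure-share-weighted average of sub-indices (shares are normalised)."""
    total = sum(shares.values())
    return sum(shares[k] / total * sub_indices[k] for k in sub_indices)

# Hypothetical sub-indices, each relative to a base period of 100.
sub_indices = {"food": 104.0, "rent": 110.0, "transport": 98.0}

# Hypothetical shares: the average household versus a renter who travels little.
average_shares = {"food": 0.3, "rent": 0.3, "transport": 0.4}
my_shares = {"food": 0.2, "rent": 0.6, "transport": 0.2}

headline = weighted_index(sub_indices, average_shares)
personal = weighted_index(sub_indices, my_shares)
print(f"headline: {headline:.1f}, personal: {personal:.1f}")
```

Here the personal index rises by roughly three points more than the headline one, simply because rent takes a larger share of the budget.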

This kind of nirvana is difficult to achieve because it risks breaching anonymity, particularly in the case of business respondents, some of which may be large enough to identify easily.  Data collection, and data quality in particular, requires cooperation, and that cooperation requires credible protection of anonymity so that information isn’t used for commercial advantage.  To that extent, full access is problematic.

The ONS do have systems for researcher access: remote terminals allow users to dive into the data and run code on ONS servers, with checks for anonymity threats before anything is released.  But the systems are cumbersome and expensive.



