Making the connection

Data flow 10 January 2012

By Peter Elias

Survey-based data collection methods underlie much empirical research in the social and economic sciences. Large-scale sampling and surveying of known populations is a tried and tested method used to create many of the major data resources supported by the ESRC. Understanding Society is a good example of such a resource, as are the Birth Cohort Studies and, at an international level, the European Social Survey.

Data collected by sample surveys are relatively expensive to compile. Whether through face-to-face or telephone interviews, the process is both time-consuming and costly. Cheaper web surveys are becoming increasingly popular, but for high-quality research in the social sciences these do not provide the same degree of control over the information collected that can be obtained via an interviewer-administered approach.

Response rates across all major surveys are falling, possibly because people are leading increasingly busy lives but also because of 'survey fatigue' – a lower willingness to co-operate as more organisations attempt to gather information from individuals by survey methods.

Data linking provides another approach, one that has the potential not just to add high value to existing surveys but also to create new data resources that place no reliance whatsoever on survey-based methods. It takes advantage of our 'digital footprint' – the records we create as we go about our daily lives. These may record transactions we engage in, registrations we undertake, our communications with each other and web-based activity. Examples include administrative records (eg, social security payments, income tax payments, school and college enrolments, hospitalisation and GP records), transaction records (eg, mortgage payments, electricity billing, use of loyalty cards when shopping), internet activity such as web searching and use of social media, and remote sensing records (eg, road transport sensing devices). While these examples reveal the diversity of the electronic information we generate, they share certain common characteristics:

  • they are, in most cases, personal records – they may contain detailed information about an individual that could be misused if placed in the wrong hands
  • they are not designed specifically as research resources even though the information they contain might have research value
  • they belong, in general, to big datasets covering large segments of the population.

Some countries have developed their use of administrative records to the point where these are now routinely substituted for the more traditional ways of generating data resources for social and economic research.

In Finland, for example, the population census is conducted not via a census form delivered to households but by drawing on 30 different registers and administrative files. In the UK, it is within the devolved administrations that data linking has progressed to the point where it significantly enhances existing surveys and facilitates linkage across different types of administrative records. While these examples indicate the value of linked data, the move towards more routine research use of data linkage is rightly proceeding with caution, given the risks and problems involved.
