NHS Digital Data Release Register - reformatted

Clinical Practice Research Datalink (CPRD)

Project 1 — DARS-NIC-113074-D9M1C

Opt outs honoured: Yes - patient objections upheld (Section 251 NHS Act 2006)

Sensitive: Non Sensitive

When: 2019/02 — 2019/07.

Repeats: One-Off

Legal basis: Health and Social Care Act 2012 – s261(7)

Categories: Anonymised - ICO code compliant


  • MRIS - Bespoke


5a. Objective for processing: The data controller is the Department of Health and Social Care, with the Secretary of State for Health and Social Care (acting as part of the Crown) acting through the Clinical Practice Research Datalink centre (hereinafter referred to as CPRD) within the Medicines and Healthcare products Regulatory Agency. The same arrangement applies to the data processor: it is CPRD who actually process the data, but they are not listed as a data processor. The data processor is the Department of Health and Social Care. The Clinical Practice Research Datalink (CPRD) is a centre of the Medicines and Healthcare products Regulatory Agency (MHRA), an executive agency of the Department of Health & Social Care (DHSC). The MHRA regulates medicines, medical devices and blood components for transfusion in the UK and acts as the executive agency. CPRD is the UK’s pre-eminent research service, providing access to primary care data (that has been anonymised) linked by NHS Digital to other similarly pseudonymised health data. This data is provided by NHS Digital and others for the purposes of public health research, including the monitoring of drug safety. All such data is linked (in its identifiable form) by NHS Digital only. CPRD is jointly funded by the MHRA and the National Institute for Health Research (NIHR). CPRD’s aims are to support vital public health research and to inform advances in patient safety in the delivery of patient care pathways. These depend on access to accurate, real-time, representative patient data to produce reliable, evidence-based clinical and drug safety guidance.
The legal bases for processing the data provided by NHS Digital are:
  • Gathering of GP patient data and collation with other data sets to produce data-sets that have been anonymised: medical research under Article 9(2)(j); drug and device safety under Article 9(2)(i) of the General Data Protection Regulation.
CPRD services are designed to maximise the way de-identified NHS clinical data can be used to improve and safeguard public health. For more than 20 years data provided by CPRD have been used in a range of drug safety and epidemiological studies that have impacted on health care, and resulted in over 1700 peer-reviewed publications. In addition to supporting high-quality observational research, CPRD is developing world-leading services based on using real world data to support clinical trials and intervention studies. The intention is to continue to link CPRD primary care data to NHS Digital’s secondary care and other datasets, as linkage greatly increases the scale, depth, completeness and therefore value of data available for public health research. The outputs of such research based on linked data in turn improve and protect patient care pathways/treatments and provide clinical benefits for the UK, supporting delivery of CPRD’s core objectives. CPRD’s research and data services are based on a database of de-identified longitudinal primary care records contributed by consenting GP practices from the four UK nations, and on the ability to link primary care data to secondary care data (and other data sets) from the NHS, the Office for National Statistics (ONS) and Public Health England (PHE). One of CPRD’s main priorities is to increase the number of national data sets that are linked to primary care data and made available on a routine basis to the research community.
Such collection and linkages occur under the appropriate permissions (ethical and s251), which have been granted to CPRD by the East Midlands & Derby Research Ethics Committee (REC), and the Health Research Authority (HRA). NHS Digital has been providing secondary and other data for linkage with CPRD primary care data for a number of years. Data linkage is carried out exclusively by NHS Digital as the Trusted Third Party (TTP) for this purpose. Linked data sets currently available include extracts from Civil Registration data; Hospital Episode Statistics (HES), which encompasses Admitted Patient Care, Critical Care, Outpatient and Accident & Emergency data; Patient Reported Outcome Measures (PROMs); Diagnostic Imaging Dataset (DID); Mental Health data; National Cancer Registry; Deprivation data including Townsend Score and Index of Multiple Deprivation. Critical care is supplied as a separate dataset by NHS Digital, but is integrated with Admitted Patient Care. Data can only be used for public health research purposes in research recommended for approval by the Independent Scientific Advisory Committee (ISAC) for MHRA database research. CPRD make the final decision on access, and ensure compliance with NHS Digital’s requirements within the data sharing agreement, e.g. security of the third party. Access to CPRD data and services will not be permitted in circumstances that may result in loss of public trust or for activities that may undermine the integrity of the CPRD database. For this study CPRD will receive the linked data file from NHS Digital. Imperial College London will send nominal pollution codes and English postcodes to NHS Digital. St George’s, University of London will receive the final pseudonymised dataset from CPRD. The legal bases for CPRD processing the data linked by NHS Digital are Article 9(2)(j) and Article 6(1)(e) of the General Data Protection Regulation.
The request is not for NHS Digital data but for NHS Digital to carry out a trusted third party data linkage between the English postcodes sent by Imperial College London and the primary care health records sent by the GP system providers on behalf of CPRD. Section 251 support is in place to cover the linkage. Associations between long-term concentrations of outdoor air pollution and health have been evaluated using epidemiological cohort studies. Substantial reviews of the epidemiological, toxicological and mechanistic literature have concluded that the evidence is sufficient, or suggestive, to infer causality for a range of health outcomes. The US Health Effects Institute (HEI) identified in their 2014 Research Agenda the need to improve understanding of the nature of the relationships between pollutants and health at low levels of air pollution currently prevalent in North America, Europe, and other high-income countries. In response to the HEI’s request for applications, a consortium of European investigators proposed a joint study utilising existing individual and administrative cohorts. The ESCAPE cohort study (European Study of Cohorts for Air Pollution Effects) has previously analysed a number of individual cohorts across Europe. The proposal from the European consortium, now funded by the HEI, aims to develop previous cohort studies by group members by combining a pooled analysis of the ESCAPE cohorts with local analyses of the administrative cohorts, utilising pollution concentrations derived from European scale and local pollution models at a 100m grid resolution. The Health Effects Institute has funded 14 institutions in a European collaborative study to bring together cohorts from a large number of countries, including England, to study the associations between concentrations of pollutants and mortality and disease incidence. None of the 14 institutions will have a role in this study and the linked dataset.
The US Health Effects Institute funds St George’s, University of London to carry out the work; however, it has no control over the outputs of the study. The aim of the study is to assess associations between long-term average concentrations of particulate matter, nitrogen dioxide, sulphur dioxide, black carbon and ozone and the risk of death and disease incidence in England. This investigation comprises a survival analysis incorporating measures of air pollutants and patient characteristics such as age, sex, body mass index, smoking status and index of multiple deprivation score, for all-cause and cause-specific mortality and the incidence of coronary and cerebrovascular disease, dementia, and lung cancer. Annual concentrations of pollutants including nitrogen dioxide, particles and ozone will be provided by Imperial College London in pseudonymised form to CPRD, who will construct the final dataset. The pollutant concentrations have been derived from models based upon data from satellites, land utilization data and monitoring stations. Noise levels have also been derived from statistical models based upon measurements and building topology. The pollution data is provided for all postcodes (postcode centroid) in England for 2010 and is also extrapolated to other years. The data will not be used for commercial purposes, not provided in record level form to any third party and not used for direct marketing.

Expected Benefits:

The expected benefit from this study will be an improved understanding of the nature of the relationships between pollutants and health at low levels of air pollution currently prevalent in the UK. Air pollution, particularly nitrogen dioxide and particles emitted in diesel exhaust, continues to be of concern to government agencies, health organisations, environmental groups and the public. In 2009 the UK Committee on the Medical Effects of Air Pollutants (COMEAP) concluded that the available evidence supported a causal association between long-term exposure to particulate air pollution, represented by PM2.5, and mortality. A recent assessment of the consequence of lifelong exposure to air pollution by the Royal Colleges highlighted the dangerous impact on the nation’s health. Clarification of the nature of the relationships at relatively low concentrations will enable the burden of air pollution at current levels and the impact of any policy scenarios to be evaluated more accurately, leading to appropriate, cost-effective pollution abatement strategies and, in turn, to improved protection of human health and the environment. The outputs from this study will provide evidence for the association between air pollution and health. These results will be incorporated into evidential assessments by national and international bodies such as COMEAP, the WHO and the US Environmental Protection Agency. Such assessments are used in setting guideline and limit values for air pollution and provide inputs to cost benefit modelling exercises such as Defra’s current assessment of mitigation strategies for the UK to meet NO2 limit values as directed by the European Commission and confirmed by UK courts after challenge by ClientEarth. The outputs will be hazard ratios and confidence intervals for a range of diseases.
These measures are incorporated into systematic reviews and meta-analyses undertaken by governments/health organisations. The large English cohort will contribute data to these reviews as well as provide specific evidence for a UK population. A recent example of how these data feed into evidential reviews and into policy and public health benefits is the recently published NO2 review by COMEAP. The summary coefficient for nitrogen dioxide (NO2) and mortality from the review was used by Defra in their cost-benefit analysis to determine strategies to achieve mandatory reductions in concentrations of NO2. The hazard ratios (HRs) were used to quantify the reduction in years of life lost, which translates into monetary benefits. The benefits of the outputs from this study will be improved information for the characterisation of the effects of air pollution on health in the UK. As described, the outputs feed into a process that leads to the formulation of air pollution control strategies that will reduce the risks of long-term exposure to air pollution in the general population. The outputs from this research will be disseminated via conference presentations and publications in the peer review literature. Publication in open-access journals enables the results to reach the widest audience world-wide and ensures the results are included in literature searches as part of systematic reviews. The outputs will provide coefficients for input into cost benefit models in order to formulate appropriate mitigation strategies to reduce air pollution emissions. Examples might include controls on engine emissions, traffic volumes or low emission zones. For example, such plans are detailed in Defra’s consultation for reducing NO2 concentrations. Air pollution exposure is ubiquitous. The Royal Colleges recently assessed the lifelong burden of air pollution exposure. They estimated that 40,000 deaths per annum were attributable to long-term exposure to outdoor air pollution.
The benefits will be achieved by the data controller and third parties as described above. The outputs are an important input to evidential reviews and cost benefit analysis undertaken by Government departments, Health organisations and academics. The benefit will be measured using years of life lost (for mortality) and the attributable number of deaths. The health effects of air pollution are routinely monitored (by COMEAP for example) and reviews routinely undertaken. WHO is currently undertaking a review of the evidence in support of its revision of the air pollution guidelines. The US EPA also regularly updates its assessments. Depending upon the findings from this study these organisations may consider updating their recommendations or they will include them in their next planned assessments.


The outputs from the analyses, comprising summary statistics and hazard ratios with associated 95% confidence intervals, will be included in a report to the study sponsors (Health Effects Institute). The findings will be published in specialist peer reviewed epidemiological journals to be decided at the end of the study. The findings from the study will be presented at the first International Society for Environmental Epidemiology meeting and the first HEI annual review meeting after completion of the study. Publication in an HEI report and in peer review journals will enable the findings from the study to be included in evidential reviews by organisations and Governmental agencies such as the UK Committee on Air Pollution, World Health Organisation and the US Environmental Protection Agency. These evidential reviews provide the scientific basis for advice to Government Departments in cost/benefit calculations, e.g. the recent Air Quality Strategy published by Defra. Imperial College London and CPRD will also disseminate the study findings on their websites and in internal newsletters/publications. No individual level data will be included in any reports, journal publications or conference abstracts/presentations/posters. The target date for the production of the output is 30/06/2019. For the pathways of dissemination of the outputs there will be presentations at scientific conferences: the annual HEI conference and the International Society for Environmental Epidemiology, both of which are open to stakeholders and the public. The output will also be published in peer reviewed open access journal papers. No specific public/patient engagement activities are currently planned but suitable routes of dissemination will be considered and put in place. All outputs will be restricted to aggregate data with small numbers suppressed in line with the HES Analysis Guide.
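
The small-number suppression mentioned above can be illustrated with a minimal sketch. The record does not state the cut-off, so the threshold below is a hypothetical placeholder rather than the value in the HES Analysis Guide, and the function name and example counts are made up for illustration.

```python
from typing import Dict, Optional

# Hypothetical cut-off: the record only says "small numbers suppressed in line
# with the HES Analysis Guide" and does not give the actual value.
ASSUMED_THRESHOLD = 7


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = ASSUMED_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Return a copy of counts with values from 1 to threshold replaced by None.

    Zero counts are kept, and larger counts pass through unchanged.
    """
    return {
        label: (None if 1 <= value <= threshold else value)
        for label, value in counts.items()
    }


# Made-up aggregate counts, for illustration only.
example_counts = {"lung cancer": 1250, "dementia": 4, "stroke": 0}
published = suppress_small_counts(example_counts)
```

This sketch covers primary suppression only. It does not round totals or apply secondary suppression to stop a hidden cell being recovered from row or column sums.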


The only identifier required for the linkage of CPRD primary care data to Imperial College London pollution data is patient postcode; this is not needed for the research study itself but will be sent by the GP system providers and Imperial College London to NHS Digital to generate the bridging file. The GP system providers will not submit any other identifiers to NHS Digital. This bridging file will contain the nominal codes that have successfully been linked to the CPRD primary care data and the patient pseudonyms, which will be used by CPRD to create a linked pseudonymised dataset. The final dataset that will be sent to St George’s, University of London will be pseudonymised data. This data linkage requires the identifier common to CPRD and Imperial College London (postcode) to permit accurate linkage of CPRD’s primary care health records and Imperial College London’s air pollution datasets for all English practices into a new linked dataset for the research study. Imperial College London provide only environmental data for all English postcodes to NHS Digital. This data is generated from annual average concentrations of air pollutants (particulate matter, nitrogen dioxide, ozone and black carbon), which were modelled using a state-of-the-art European model that combines information from satellite data and chemical transport models with information on the road network, land use and monitored pollutant concentrations. These models were developed and published by the Swiss Tropical and Public Health Institute, Basel, as part of this project. These air pollution maps (100m x 100m resolution) were sent to Imperial College London. A second set of modelled pollutant concentrations was produced by Imperial College London using UK-specific land use data and model specification. Imperial College London then linked each English postcode centroid (x,y coordinate) to these air pollution maps using a geographic information system.
Annual estimates of noise exposures will be assigned to English postcode centroids using a version of the CNOSSOS-EU model developed by Imperial College London. Imperial College London and St George’s, University of London do not know which postcodes contain patient data held in CPRD. No clinical data from the GP system providers is sent to NHS Digital, and at no stage do CPRD, Imperial College London or St George’s, University of London receive any patient identifiers. Personal identifiers including name, date of birth, postcode and NHS number are removed at source by the GP system providers and replaced by pseudonymised system patient and practice identifiers (GP System Practice and Patient ID) prior to transfer of data to CPRD. CPRD then replaces the original GP System Practice and Patient ID with a CPRD patient pseudonym (CPRD Patient ID). Identifiable data fields for CPRD patients flow directly from GP system providers to NHS Digital. The legal basis for the lawful flow of identifiable data is primarily CPRD’s s251 support (ref: ECC 5-05 (a)/2012). This support permits “GP practices and specified others (according to the approved ‘Master Dataset’ list) to [1] transfer confidential patient information to NHS Digital; [2] NHS Digital to receive identifiers, undertake linkages and provide CPRD a de-identified dataset.” Under the described legal basis, the steps below will be used to transfer, store and process data as part of this linkage.
Step 1. Transfer of identifiers
Transfer of data from Imperial College London to NHS Digital, and from GP system providers to NHS Digital, will be via secure file transfer protocol (SFTP) servers, which are encrypted to ensure security of electronic data in transit.
Step 1a. Imperial College London will securely provide to NHS Digital, as the Trusted Third Party (TTP) for linkages, a file containing all English postcodes held by Imperial College London, since at this stage it is not clear which postcodes will link to the CPRD patient data and be relevant to the study. The file consists of one data field (English Postcode) and the Imperial College London nominal pollution code (a pseudonym attached to each English postcode within the Imperial College London dataset and used to link the pollution data). The nominal code sent by Imperial College London is for the creation of the bridging file sent to CPRD, which is explained in Step 2.
Step 1b. In parallel, CPRD requests that participating GP system providers securely provide to NHS Digital a file containing information on all patients held in CPRD. The file consists of the one identifiable data field (Postcode) and the GP System Practice Key and Patient Key (pseudonymised data fields assigned to each unique individual in CPRD).
Step 2. Creation and provision of bridging file by the Trusted Third Party
NHS Digital match the identifiable data field (Postcode) received from GP system providers to the English postcode and nominal pollution code file from Imperial College London. NHS Digital supply CPRD with a bridging file containing a pseudonymised patient identifier (Study ID), the Imperial College London nominal pollution code and the GP System Practice Key and Patient Key for each linked postcode, which can be used to merge the primary care dataset with the second Imperial College London dataset containing nominal pollution codes and postcodes. Additionally, NHS Digital generate and supply a study specific pseudonymised patient identifier for each linked patient (Study ID). NHS Digital securely releases the bridging file via secure file transfer protocol (SFTP) to CPRD. Once the bridging file has been supplied to CPRD, and CPRD confirm the linkage as valid, NHS Digital will delete the file supplied by Imperial College London (Step 1a). Imperial College London data that has not been matched to the primary care dataset will be deleted by NHS Digital. Data supplied by the GP system providers to NHS Digital (Step 1b) is utilised for CPRD routine linkage and will be retained. It is emphasised that following data linkage by NHS Digital using patient identifiable fields, there is no further flow or use of identifiable data at any point past this stage.
Step 3. Extraction and provision to CPRD
Imperial College London will send to CPRD, securely by SFTP, a file containing all the nominal pollution codes with the pollution data.
Step 4. Creation of study dataset by CPRD
CPRD use the GP System Practice and Patient IDs in the bridging file (supplied by NHS Digital and initially provided by the GP System Providers) to generate the associated CPRD patient pseudonym (CPRD Record Key) using internal lookup files. A patient cohort file containing CPRD Record Key is combined with the CPRD Patient Key generated from the bridging file received from NHS Digital (Step 2) to generate a list of Imperial College London nominal pollution codes corresponding to each CPRD patient in the cohort. The bridging file supplied to CPRD in Step 2, containing the nominal pollution code, will be used to link the pollution data in the file sent by Imperial College London in Step 3. The air pollution data that has not been linked will be discarded by CPRD. CPRD creates an anonymised study dataset for release to researchers containing Imperial College London nominal pollution codes for all CPRD patients in the cohort. CPRD Record Key and Imperial College London nominal pollution code are not included in the dataset.
Step 5. Release of study dataset to St George’s, University of London
CPRD ensure that the research applicants (St George’s, University of London) have signed a bespoke Dataset Agreement, previously agreed with Imperial College London. This will include any additional terms and conditions required by Imperial College London, before any release of the linked data outside of CPRD. The study dataset is then sent securely to St George’s, University of London by CPRD, using SFTP. St George’s, University of London researchers use the study dataset under the Dataset Agreement to produce research outcomes as approved under their Independent Scientific Advisory Committee (ISAC) protocol. CPRD retain a copy of the study dataset for archiving purposes once the data has been successfully transferred to and verified by St George’s, University of London. CPRD also deletes Imperial College London data not included in the study after preparation of the dataset has been undertaken (Step 4).
The resulting dataset will be accessible solely by employees of St George’s, University of London who will process and analyse the data to obtain findings for research outcomes. The data will be held on St George’s, University of London servers in the UK and will not be stored elsewhere at any time. The request for this particular type of data linkage has been initiated by St George’s, University of London. After this initial linkage and dissemination, the data will also be available to other researchers subject to a suitable application submitted through CPRD’s ISAC process. The environmental data provided by Imperial College London will be linked to the wider CPRD database and will be available to other researchers subject to a suitable application submitted through CPRD’s ISAC process.
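
The linkage in Steps 1 to 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure NHS Digital or CPRD actually run. The function names, field names, postcode normalisation and the way a Study ID is generated are all hypothetical, and any data passed in would be made up. The point it shows is that the postcode is used only to match records and never leaves the trusted third party step.

```python
import secrets


def normalise_postcode(postcode):
    """Assumed normalisation: drop spaces and upper-case before matching."""
    return postcode.replace(" ", "").upper()


def build_bridging_file(gp_rows, postcode_code_rows):
    """Step 2 (trusted third party): match on postcode, then drop the postcode.

    gp_rows: iterable of (postcode, practice_key, patient_key) from Step 1b.
    postcode_code_rows: iterable of (postcode, nominal_code) from Step 1a.
    """
    code_by_postcode = {
        normalise_postcode(pc): code for pc, code in postcode_code_rows
    }
    bridging = []
    for postcode, practice_key, patient_key in gp_rows:
        code = code_by_postcode.get(normalise_postcode(postcode))
        if code is None:
            continue  # unmatched patients do not appear in the bridging file
        bridging.append({
            "study_id": secrets.token_hex(8),  # hypothetical Study ID scheme
            "practice_key": practice_key,
            "patient_key": patient_key,
            "nominal_code": code,
        })
    return bridging


def build_study_dataset(bridging, record_key_lookup, clinical_by_record_key,
                        pollution_by_code):
    """Steps 3 and 4 (CPRD): attach pollution values to clinical records.

    Keys and nominal codes are used for joining only and are left out of
    the rows that are released.
    """
    dataset = []
    for row in bridging:
        record_key = record_key_lookup[(row["practice_key"], row["patient_key"])]
        released = {"study_id": row["study_id"]}
        released.update(clinical_by_record_key[record_key])
        released.update(pollution_by_code[row["nominal_code"]])
        dataset.append(released)
    return dataset
```

Each party in this sketch sees only the inputs of its own function: the matching function never receives clinical or pollution values, and the dataset function never receives a postcode.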
All organisations party to this agreement must comply with the Data Sharing Framework Contract requirements, including those regarding the use (and purposes of that use) by “Personnel” (as defined within the Data Sharing Framework Contract, i.e. employees, agents and contractors of the Data Recipient who may have access to that data).