NHS Digital Data Release Register - reformatted

Clinical Practice Research Datalink (CPRD)

Project 1 — DARS-NIC-113074-D9M1C

Opt outs honoured: Yes - patient objections upheld (Section 251 NHS Act 2006)

Sensitive: Non Sensitive

When: 2019/02 — 2019/07.

Repeats: One-Off

Legal basis: Health and Social Care Act 2012 – s261(7)

Categories: Anonymised - ICO code compliant

Datasets:

  • MRIS - Bespoke

Objectives:

The data controller is the Department of Health and Social Care, with the Secretary of State for Health and Social Care (acting as part of the Crown) acting through the Clinical Practice Research Datalink centre (hereinafter referred to as CPRD) within the Medicines and Healthcare products Regulatory Agency. The same arrangement applies to the data processor: the data processor is the Department of Health and Social Care, although it is CPRD who actually process the data but are not listed as data processors. The Clinical Practice Research Datalink (CPRD) is a centre of the Medicines and Healthcare products Regulatory Agency (MHRA), an executive agency of the Department of Health & Social Care (DHSC). The MHRA regulates medicines, medical devices and blood components for transfusion in the UK and acts as the executive agency. CPRD is the UK’s pre-eminent research service, providing access to primary care data (that has been anonymised) linked by NHS Digital to other similarly pseudonymised health data. This data is provided by NHS Digital and others for the purposes of public health research, including the monitoring of drug safety. All such data is linked (in its identifiable form) by NHS Digital only. CPRD is jointly funded by the MHRA and the National Institute for Health Research (NIHR). CPRD’s aims are to support vital public health research and to inform advances in patient safety in the delivery of patient care pathways. These depend on access to accurate, real-time, representative patient data to produce reliable, evidence-based clinical and drug safety guidance.
The legal bases for processing the data provided by NHS Digital are: • Gathering of GP patient data and collation with other data sets to produce data-sets that have been anonymised: medical research under Article 9(2)(j); drug and device safety under Article 9(2)(i) of the General Data Protection Regulation CPRD services are designed to maximise the way de-identified NHS clinical data can be used to improve and safeguard public health. For more than 20 years data provided by CPRD have been used in a range of drug safety and epidemiological studies that have impacted on health care, and resulted in over 1700 peer-reviewed publications. In addition to supporting high-quality observational research, CPRD is developing world-leading services based on using real world data to support clinical trials and intervention studies. The intention is to continue to link CPRD primary care data to NHS Digital’s secondary care and other datasets, as linkage greatly increases the scale, depth, completeness and therefore value of data available for public health research. The outputs of such research based on linked data in turn improve and protect patient care pathways/treatments and provide clinical benefits for the UK, supporting delivery of CPRD’s core objectives. CPRD’s research and data services are based on a database of de-identified longitudinal primary care records contributed by consenting GP practices from the four UK nations, and on the ability to link primary care data to secondary care data (and other data sets), from the NHS, Office of National Statistics (ONS) and Public Health England (PHE). One of CPRD’s main priorities is to increase the number of national data sets that are linked to primary care data and made available on a routine basis to the research community. 
Such collection and linkages occur under the appropriate permissions (ethical and s251), which have been granted to CPRD by the East Midlands & Derby Research Ethics Committee (REC), and the Health Research Authority (HRA). NHS Digital has been providing secondary and other data for linkage with CPRD primary care data for a number of years. Data linkage is carried out exclusively by NHS Digital as the Trusted Third Party (TTP) for this purpose. Linked data sets currently available include extracts from Civil Registration data; Hospital Episode Statistics (HES), which encompasses Admitted Patient Care, Critical Care, Outpatient and Accident & Emergency data; Patient Reported Outcome Measures (PROMs); Diagnostic Imaging Dataset (DID); Mental Health data; National Cancer Registry; Deprivation data including Townsend Score and Index of Multiple Deprivation. Critical care is supplied as a separate dataset by NHS Digital, but is integrated with Admitted Patient Care. Data can only be used for public health research purposes in research recommended for approval by ISAC for MHRA database research. CPRD make the final decision on access, and ensure compliance with NHS Digital’s requirements within the data sharing agreement, e.g. security of the third party. Access to CPRD data and services will not be permitted in circumstances that may result in loss of public trust or for activities that may undermine the integrity of the CPRD database. For this study CPRD will receive the linked data file from NHS Digital. Imperial College London will send nominal pollution codes and English postcodes to NHS Digital. St Georges University of London will receive the final pseudonymised dataset from CPRD. The legal bases for CPRD processing the data linked by NHS Digital is article 9(2)(j) and article 6(1)(e) of the General Data Protection Regulation. 
The request is not for NHS Digital data but for NHS Digital to carry out a trusted third party data linkage between the English postcodes sent by Imperial College London and the primary care health records sent by the GP system providers on behalf of CPRD. Section 251 support is in place to cover the linkage. Associations between long-term concentrations of outdoor air pollution and health have been evaluated using epidemiological cohort studies. Substantial reviews of the epidemiological, toxicological and mechanistic literature have concluded that the evidence is sufficient, or suggestive, to infer causality for a range of health outcomes. The US Health Effects Institute (HEI) identified in their 2014 Research Agenda the need to improve understanding of the nature of the relationships between pollutants and health at low levels of air pollution currently prevalent in North America, Europe, and other high-income countries. In response to the HEI’s request for applications, a consortium of European investigators proposed a joint study utilising existing individual and administrative cohorts. The ESCAPE cohort study (European Study of Cohorts for Air Pollution Effects) has previously analysed a number of individual cohorts across Europe. The proposal from the European consortium, now funded by the HEI, aims to build on previous cohort studies by group members by combining a pooled analysis of the ESCAPE cohorts with local analyses of the administrative cohorts, utilising pollution concentrations derived from European scale and local pollution models at a 100m grid resolution. The Health Effects Institute has funded 14 institutions in a European collaborative study to bring together cohorts from a large number of countries, including England, to study the associations between concentrations of pollutants and mortality and disease incidence. None of the 14 institutions will have a role in this study and the linked dataset.
The US Health Effects Institute funds St George’s, University of London to carry out the work; however, it has no control over the outputs of the study. The aim of the study is to assess associations between long-term average concentrations of particulate matter, nitrogen dioxide, sulphur dioxide, black carbon and ozone and the risk of death and disease incidence in England. This investigation comprises a survival analysis incorporating measures of air pollutants and patient characteristics such as age, sex, body mass index, smoking status and index of multiple deprivation score, for all-cause and cause-specific mortality and the incidence of coronary and cerebrovascular disease, dementia, and lung cancer. Annual concentrations of pollutants including nitrogen dioxide, particles and ozone will be provided by Imperial College London in pseudonymised form to CPRD, who will construct the final dataset. The pollutant concentrations have been derived from models based upon data from satellites, land utilisation data and monitoring stations. Noise levels have also been derived from statistical models based upon measurements and building topology. The pollution data is provided for all postcodes (postcode centroid) in England for 2010 and is also extrapolated to other years. The data will not be used for commercial purposes, will not be provided in record level form to any third party and will not be used for direct marketing.

Expected Benefits:

The expected benefit from this study will be an improved understanding of the nature of the relationships between pollutants and health at low levels of air pollution currently prevalent in the UK. Air pollution, particularly nitrogen dioxide and particles emitted in diesel exhaust, continues to be of concern to government agencies, health organisations, environmental groups and the public. In 2009 the UK Committee on the Medical Effects of Air Pollutants concluded that the available evidence supported a causal association between long-term exposure to particulate air pollution, represented by PM2.5, and mortality. A recent assessment of the consequences of lifelong exposure to air pollution by the Royal Colleges highlighted the dangerous impact on the nation’s health. Clarification of the nature of the relationships at relatively low concentrations will enable the burden of air pollution at current levels, and the impact of any policy scenarios, to be evaluated more accurately. This in turn should lead to appropriate, cost-effective pollution abatement strategies and improved protection of human health and the environment. The outputs from this study will provide evidence for the association between air pollution and health. These results will be incorporated into evidential assessments by national and international bodies such as the UK Committee on the Medical Effects of Air Pollutants (COMEAP), the WHO and the US Environmental Protection Agency. Such assessments are used in setting guideline and limit values for air pollution and provide inputs to cost benefit modelling exercises such as Defra’s current assessment of mitigation strategies for the UK to meet NO2 limit values as directed by the European Commission and confirmed by UK courts after challenge by ClientEarth. The outputs will be hazard ratios and confidence intervals for a range of diseases.
These measures are incorporated into systematic reviews and meta-analysis undertaken by governments/health organisations. The large English cohort will contribute data to these reviews as well as provide specific evidence for a UK population. A recent example of how these data feed into evidential reviews and into policy and public health benefits is the recently published NO2 review by COMEAP. The summary coefficient for nitrogen dioxide (NO2) and mortality from the review was used by Defra in their cost-benefit analysis to determine strategies to achieve mandatory reductions in concentrations of NO2. The HRs were used to quantify reduction in years of life lost which translates into monetary benefits. The benefits of the outputs from this study will be improved information for the characterisation of the effects of air pollution on health in the UK. As described, the outputs feed into a process that lead to the formulation of air pollution control strategies that will reduce the risks of long-term exposure to air pollution in the general population. The outputs from this research will be disseminated via conference presentations and publications in the peer review literature. Publication in open-access journals enables the results to reach the widest audience world-wide and ensures the results are included in literature searches as part of systematic reviews. The outputs will provide coefficients for input into cost benefit models in order to formulate appropriate mitigation strategies to reduce air pollution emissions. Examples might include controls on engine emissions, traffic volumes or low emission zones. For example, such plans are detailed in Defra’s consultation for reducing NO2 concentrations. Air pollution exposure is ubiquitous. The Royal Colleges recently assessed the lifelong burden of air pollution exposure. They estimate that 40,000 deaths per annum were attributed to long-term exposure to outdoor air pollution. 
The benefits will be achieved by the data controller and third parties as described above. The outputs are an important input to evidential reviews and cost benefit analysis undertaken by Government departments, Health organisations and academics. The benefit will be measured using years of life lost (for mortality) and the attributable number of deaths. The health effects of air pollution are routinely monitored (by COMEAP for example) and reviews routinely undertaken. WHO is currently undertaking a review of the evidence in support of its revision of the air pollution guidelines. The US EPA also regularly updates its assessments. Depending upon the findings from this study these organisations may consider updating their recommendations or they will include them in their next planned assessments.

Outputs:

The outputs from the analyses comprising summary statistics and hazard ratios and associated 95% confidence intervals will be included in a report to the study sponsors (Health Effects Institute). The findings will be published in specialist peer reviewed epidemiological journals to be decided at the end of the study. The findings from the study will be presented at the first International Society for Environmental Epidemiology meeting and the first HEI annual review meeting after completion of the study. Publication in an HEI report and in peer review journals will enable the findings from the study to be included in evidential reviews by organisations and Governmental agencies such as the UK Committee on Air Pollution, World Health Organisation and the US Environmental Protection Agency. These evidential reviews provide the scientific basis for advice to Government Departments in cost/benefit calculations e.g. the recent Air Quality Strategy published by Defra. Imperial College London and CPRD will also disseminate the study findings on their websites and internal newsletters/ publication. No individual level data will be included in any reports, journal publications or conference abstracts/presentations/posters. The target date for the production of the output is 30/06/2019. For the pathways of dissemination of the outputs there will be presentations at scientific conferences: the annual HEI conference and the International Society for Environmental Epidemiology both of which are open to stakeholders and the public. The output will also be published in peer reviewed open access journal papers. No specific public / patient engagement activities are currently planned but suitable routes of dissemination will be considered and put in place. All outputs will be restricted to aggregate data with small numbers suppressed in line with the HES Analysis Guide.
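The small-number suppression described above can be illustrated with a minimal sketch. It is not taken from the HES Analysis Guide: the function name, the suppression marker, the choice to leave zero counts visible and the threshold are all assumptions made for illustration, and the threshold is deliberately a required argument so that the value set by the applicable guidance has to be supplied by the analyst.

```python
def suppress_small_numbers(counts, threshold, marker="*"):
    """Replace small non-zero counts with a marker before publication.

    counts:    mapping of category label -> aggregate count
    threshold: counts strictly below this value (and above zero) are hidden;
               the correct value must come from the applicable guidance
               (hypothetical parameter, not quoted from the source)
    marker:    placeholder written in place of a suppressed count
    """
    suppressed = {}
    for label, count in counts.items():
        if 0 < count < threshold:
            suppressed[label] = marker  # too small to publish safely
        else:
            suppressed[label] = count   # zero or large enough: keep as-is
    return suppressed


if __name__ == "__main__":
    # Illustrative counts only; not study data.
    example = {"lung cancer": 3, "dementia": 0, "all-cause mortality": 412}
    print(suppress_small_numbers(example, threshold=5))
```

A real release would usually also need to guard against a suppressed value being recovered from row or column totals, which this sketch does not attempt.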

Processing:

The only identifier required for the linkage of CPRD primary care data to Imperial College London pollution data is patient postcode; this is not needed for the research study itself but will be sent by the GP system providers and Imperial College London to NHS Digital to generate the bridging file. The GP system providers will not submit any other identifiers to NHS Digital. This bridging file will contain the nominal codes that have been successfully linked to the CPRD primary care data and the patient pseudonyms, which will be used by CPRD to create a linked pseudonymised dataset. The final dataset that will be sent to St George’s, University of London will be pseudonymised data. This data linkage requires the identifier common to CPRD and Imperial College London (postcode) to permit accurate linkage of CPRD’s primary care health records and Imperial College London’s air pollution datasets for all English practices into a new linked dataset for the research study. Imperial College London provide only environmental data for all English postcodes to NHS Digital. This data was generated as follows. Annual average concentrations of air pollutants (particulate matter, nitrogen dioxide, ozone and black carbon) were modelled using a state-of-the-art European model which combines information from satellite data and chemical transport models with information on the road network, land use and monitored pollutant concentrations. These models were developed and published by the Swiss Tropical and Public Health Institute, Basel, as part of this project. The resulting air pollution maps (100m x 100m resolution) were sent to Imperial College London. A second set of modelled pollutant concentrations was produced by Imperial College London using UK specific land use data and model specification. Imperial College London then linked each English postcode centroid (x,y coordinate) to these air pollution maps using a geographic information system.
Annual estimates of noise exposures will be assigned to English postcode centroids using a version of the CNOSSOS-EU model developed by Imperial College London. Imperial College London and St Georges University London do not know which postcodes contain patient data held in CPRD. No clinical data from the GP system providers is sent to NHS Digital, and at no stage do CPRD, Imperial College London or St George’s, University of London receive any patient identifiers. Personal identifiers including name, date of birth, postcode and NHS number are removed at source by the GP system providers and replaced by pseudonymised system patient and practice identifiers (GP System Practice and Patient ID) prior to transfer of data to CPRD. CPRD then replaces the original GP System Practice and Patient ID with a CPRD patient pseudonym (CPRD Patient ID). Identifiable data fields for CPRD patients flow directly from GP system providers to NHS Digital. The legal basis for the lawful flow of identifiable data is primarily CPRD’s s251 support (ref: ECC 5-05 (a)/2012). This support permits “GP practices and specified others (according to the approved ‘Master Dataset’ list) to [1] transfer confidential patient information to NHS Digital; [2] NHS Digital to receive identifiers, undertake linkages and provide CPRD a de-identified dataset.” Under the described legal basis, the following steps explained below will be used to transfer, store and process data as part of this linkage. Step 1. Transfer of identifiers (transfer of data from Imperial College London to NHS Digital, and from GP system providers to NHS Digital, will be via secure file transfer protocol (SFTP) servers which are encrypted to ensure security of electronic data in transit). Step 1a. 
Imperial College London will securely provide to NHS Digital, as the Trusted Third Party (TTP) for linkages, a file containing all English postcodes held by Imperial College London, since at this stage it is not clear which postcodes will link to the CPRD patient data and be relevant to the study. The file consists of one data field (English Postcode) and the Imperial College London nominal pollution code (a pseudonym attached to each English postcode within the Imperial College London dataset and used to link the pollution data). The nominal code sent by Imperial College London is for the creation of the bridging file sent to CPRD, which is explained in Step 2.
Step 1b. In parallel, CPRD requests that participating GP system providers securely provide to NHS Digital a file containing information on all patients held in CPRD. The file consists of the one identifiable data field (Postcode) and the GP System Practice Key and Patient Key (pseudonymised data fields assigned to each unique individual in CPRD).
Step 2. Creation and provision of bridging file by the Trusted Third Party: NHS Digital match the identifiable data field (Postcode) received from GP system providers to the English postcode and nominal pollution code file from Imperial College London. NHS Digital also generate a study specific pseudonymised patient identifier (Study ID) for each linked patient. NHS Digital supply CPRD with a bridging file containing the Study ID, the Imperial College London nominal pollution code and the GP System Practice Key and Patient Key for each linked postcode. The bridging file can be used to merge the primary care dataset with the second Imperial College London dataset containing nominal pollution codes and postcodes. NHS Digital securely releases the bridging file via secure file transfer protocol (SFTP) to CPRD.
Once the bridging file has been supplied to CPRD, and CPRD confirm the linkage as valid, NHS Digital will delete the file supplied by Imperial College London (Step 1a). Imperial College London data that has not been matched to the primary care dataset will also be deleted by NHS Digital. Data supplied by the GP system providers to NHS Digital (Step 1b) is utilised for CPRD routine linkage and will be retained. It is emphasised that following data linkage by NHS Digital using patient identifiable fields, there is no further flow or use of identifiable data at any point past this stage.
Step 3. Extraction and provision to CPRD: Imperial College London will send to CPRD, securely by SFTP, a file containing all the nominal pollution codes with the pollution data.
Step 4. Creation of study dataset by CPRD: CPRD use the GP System Practice and Patient IDs in the bridging file (supplied by NHS Digital and initially provided by the GP system providers) to generate the associated CPRD patient pseudonym (CPRD Record Key) using internal lookup files. A patient cohort file containing CPRD Record Key is combined with the CPRD Patient Key generated from the bridging file received from NHS Digital (Step 2) to generate a list of Imperial College London nominal pollution codes corresponding to each CPRD patient in the cohort. The bridging file supplied to CPRD in Step 2, which contains the nominal pollution code, will be used to link the pollution data in the file sent by Imperial College London in Step 3. The air pollution data that has not been linked will be discarded by CPRD. CPRD creates an anonymised study dataset for release to researchers containing Imperial College London nominal pollution codes for all CPRD patients in the cohort. CPRD Record Key and Imperial College London nominal pollution code are not included in the dataset.
Step 5.
Release of study dataset to St George’s, University of London: CPRD ensure that the research applicants (St George’s, University of London) have signed a bespoke Dataset Agreement, previously agreed with Imperial College London. This will include any additional terms and conditions required by Imperial College London, before any release of the linked data outside of CPRD. The study dataset is then sent securely to St George’s, University of London by CPRD, using SFTP. St George’s, University of London researchers use the study dataset under the Dataset Agreement to produce research outcomes as approved under their Independent Scientific Advisory Committee (ISAC) protocol. CPRD retain a copy of the study dataset for archiving purposes once the data has been successfully transferred to and verified by St George’s, University of London. CPRD also delete Imperial College London data not included in the study after preparation of the dataset has been undertaken (Step 4). The resulting dataset will be accessible solely by employees of St George’s, University of London, who will process and analyse the data to obtain findings for research outcomes. The data will be held on St George’s, University of London servers in the UK and will not be stored elsewhere at any time. The request for this particular type of data linkage has been initiated by St George’s, University of London. After this initial linkage and dissemination, the environmental data provided by Imperial College London will be linked to the wider CPRD database and will also be available to other researchers, subject to a suitable application submitted through CPRD’s ISAC process.
All organisations party to this agreement must comply with the Data Sharing Framework Contract requirements, including those regarding the use (and purposes of that use) by “Personnel” (as defined within the Data Sharing Framework Contract ie: employees, agents and contractors of the Data Recipient who may have access to that data).
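
The linkage in Steps 1 to 4 above can be sketched in a few lines of code. This is an illustrative outline only, not the implementation used by NHS Digital or CPRD. The function names, field names, postcode normalisation rule and the use of a random token as the Study ID are all assumptions, as is carrying the Study ID into the final dataset. For brevity the sketch joins directly on the GP system keys and omits the intermediate CPRD Record Key lookup.

```python
import secrets


def normalise_postcode(postcode):
    """Strip whitespace and upper-case a postcode (assumed matching rule)."""
    return "".join(postcode.split()).upper()


def build_bridging_file(imperial_rows, gp_rows):
    """Step 2 (trusted third party): match on postcode, then drop it.

    imperial_rows: iterable of (postcode, nominal_pollution_code)   [Step 1a]
    gp_rows:       iterable of (postcode, practice_key, patient_key) [Step 1b]
    Returns bridging rows that contain pseudonyms only, never the postcode.
    """
    code_by_postcode = {
        normalise_postcode(pc): code for pc, code in imperial_rows
    }
    bridging = []
    for postcode, practice_key, patient_key in gp_rows:
        code = code_by_postcode.get(normalise_postcode(postcode))
        if code is None:
            continue  # unmatched patients are left out of the bridging file
        bridging.append({
            "study_id": secrets.token_hex(8),  # hypothetical Study ID format
            "nominal_pollution_code": code,
            "practice_key": practice_key,
            "patient_key": patient_key,
        })
    return bridging


def build_study_dataset(bridging, pollution_by_code, clinical_by_keys):
    """Step 4 (CPRD): attach pollution values, then drop the linking keys.

    pollution_by_code: nominal_pollution_code -> dict of pollutant values [Step 3]
    clinical_by_keys:  (practice_key, patient_key) -> dict of clinical fields
    """
    dataset = []
    for row in bridging:
        clinical = clinical_by_keys.get((row["practice_key"], row["patient_key"]))
        pollution = pollution_by_code.get(row["nominal_pollution_code"])
        if clinical is None or pollution is None:
            continue  # unlinked pollution or clinical data is discarded
        # Output keeps no postcode, GP system keys or nominal pollution code.
        dataset.append({"study_id": row["study_id"], **clinical, **pollution})
    return dataset
```

The point of the design is separation of knowledge: only the party running `build_bridging_file` ever sees postcodes alongside patient pseudonyms, and only the party running `build_study_dataset` sees clinical and pollution values together.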


Project 2 — DARS-NIC-108098-D2L3V

Opt outs honoured: No - data flow is not identifiable (Section 251 NHS Act 2006)

Sensitive: Non Sensitive

When: 2020/03 — 2020/03.

Repeats: One-Off

Legal basis: Health and Social Care Act 2012 – s261(7)

Categories: Anonymised - ICO code compliant

Datasets:

  • MRIS - Bespoke

Objectives:

The data controller is the Department of Health and Social Care, with the Secretary of State for Health and Social Care (acting as part of the Crown) acting through the Clinical Practice Research Datalink centre (hereinafter referred to as CPRD) within the Medicines and Healthcare products Regulatory Agency. The same arrangement applies to the data processor: the data processor is the Department of Health and Social Care, although it is CPRD who actually process the data but are not listed as data processors. The Clinical Practice Research Datalink (CPRD) is a centre of the Medicines and Healthcare products Regulatory Agency (MHRA), an executive agency of the Department of Health & Social Care (DHSC). The MHRA regulates medicines, medical devices and blood components for transfusion in the UK and acts as the executive agency. CPRD is the UK’s pre-eminent research service, providing access to primary care data (that has been anonymised) linked by NHS Digital to other similarly pseudonymised health data. This data is provided by NHS Digital and others for the purposes of public health research, including the monitoring of drug safety. All such data is linked (in its identifiable form) by NHS Digital only. CPRD is jointly funded by the MHRA and the National Institute for Health Research (NIHR). CPRD’s aims are to support vital public health research and to inform advances in patient safety in the delivery of patient care pathways. These depend on access to accurate, real-time, representative patient data to produce reliable, evidence-based clinical and drug safety guidance.
The legal bases for processing the data provided by NHS Digital are: • Gathering of GP patient data and collation with other data sets to produce data-sets that have been anonymised: medical research under Article 9(2)(j); drug and device safety under Article 9(2)(i) of the General Data Protection Regulation CPRD services are designed to maximise the way de-identified NHS clinical data can be used to improve and safeguard public health. For more than 20 years data provided by CPRD have been used in a range of drug safety and epidemiological studies that have impacted on health care, and resulted in over 1700 peer-reviewed publications. In addition to supporting high-quality observational research, CPRD is developing world-leading services based on using real world data to support clinical trials and intervention studies. The intention is to continue to link CPRD primary care data to NHS Digital’s secondary care and other datasets, as linkage greatly increases the scale, depth, completeness and therefore value of data available for public health research. The outputs of such research based on linked data in turn improve and protect patient care pathways/treatments and provide clinical benefits for the UK, supporting delivery of CPRD’s core objectives. CPRD’s research and data services are based on a database of de-identified longitudinal primary care records contributed by consenting GP practices from the four UK nations, and on the ability to link primary care data to secondary care data (and other data sets), from the NHS, Office of National Statistics (ONS) and Public Health England (PHE). One of CPRD’s main priorities is to increase the number of national data sets that are linked to primary care data and made available on a routine basis to the research community. 
Such collection and linkages occur under the appropriate permissions (ethical and s251), which have been granted to CPRD by the East Midlands & Derby Research Ethics Committee (REC), and the Health Research Authority (HRA). NHS Digital has been providing secondary and other data for linkage with CPRD primary care data for a number of years. Data linkage is carried out exclusively by NHS Digital as the Trusted Third Party (TTP) for this purpose. Linked data sets currently available include extracts from Civil Registration data; Hospital Episode Statistics (HES), which encompasses Admitted Patient Care, Critical Care, Outpatient and Accident & Emergency data; Patient Reported Outcome Measures (PROMs); Diagnostic Imaging Dataset (DID); Mental Health data; National Cancer Registry; Deprivation data including Townsend Score and Index of Multiple Deprivation. Critical care is supplied as a separate dataset by NHS Digital, but is integrated with Admitted Patient Care. Data can only be used for public health research purposes in research recommended for approval by ISAC for MHRA database research. CPRD make the final decision on access, and ensure compliance with NHS Digital’s requirements within the data sharing agreement, e.g. security of the third party. Access to CPRD data and services will not be permitted in circumstances that may result in loss of public trust or for activities that may undermine the integrity of the CPRD database. This application is to support two separate research projects. Both projects involve linkage of record level data from the same databases: the Clinical Practice Research Datalink (CPRD), and the Midlands and North West Bowel Cancer Screening Hub (Public Health England). Both studies are funded by the National Awareness and Early Diagnosis Initiative (NAEDI) and are administered by Cancer Research UK. No data from either study will be made available to third-parties and no elements of the work will take place outside the UK. 
The two study projects are presented below. 1. Project 1: An enhanced role for primary care in bowel cancer screening: an observational study investigating primary care use among bowel screening non-responders. Purpose – Despite the efficient provision of bowel cancer screening programmes in the UK, low participation remains a problem, especially in lower socio-economic groups. Primary care professionals can have an important role in increasing participation among non-responders, but little is known about how the non-responders use primary care. The study’s main aim is to explore and describe the utilisation of primary care services by non-responders to bowel cancer screening 25 months after the last invitation to screening, in order to identify opportunities to engage with the non-responders. It also aims to compare responders and non-responders to identify if there are differences in the way they use primary care. The primary research questions are: 1. How frequently do non-responders to bowel cancer screening consult with primary care and what are their main reasons for consultation (diagnoses, symptoms and procedures)? 2. Which professionals are more frequently involved in the care of non-responders? 3. How are the non-responders characterised in terms of socio-demographic characteristics such as age, gender, marital status, ethnicity and deprivation? 4. How frequently do non-responders engage in health-seeking behaviours such as health screening programmes (i.e. cervical cancer/breast cancer) or other preventative activities? Secondary research questions for Project 1 are: 1. Amongst non-responders, are socio-demographic characteristics associated with frequency of attendance to consultations (very low/low frequency attenders versus other attenders)? 2. Amongst non-responders, are lifestyle risk factors for Colorectal Cancer (CRC) associated with frequency of attendance (very low/low frequency attenders versus other attenders)? 3. 
Are lifestyle risk factors for CRC, multimorbidity and poor health status associated with responder status (non-responders versus responders) to bowel cancer screening? 4. Do the identified patterns of consultation (frequency and main reasons for consultation) vary according to responder status (non-responders versus responders) to bowel cancer screening? Data will be linked between CPRD and the Midlands and North West (NW) Bowel Cancer Screening (BCS) Hub. The study population is composed of patients living in the Midlands and North West area who are eligible for bowel cancer screening (aged 60-74), classified as either responders or non-responders. CPRD data is requested for all patients who received an invitation from the Bowel Cancer Screening Programme from April 2014 to April 2016. This is limited to those in the Midlands and North West. The estimated cohort size for non-responders is 66,275. The University of Edinburgh (UoE) will extract data from a 25-month period. Using descriptive statistics, UoE will explore and describe reasons for consultation and patterns of attendance according to the non-responders' socio-demographic characteristics (such as age, gender and deprivation) and calculate consultation rates (taking into account the patients’ age and gender). Using multivariate binary logistic regression, UoE will compare responders and non-responders in order to identify groups in need of more support and information, and examine whether patterns of consultation differ between the two groups. A detailed understanding of how bowel screening non-responders use primary care will allow for the identification of optimum opportunities to engage with them. More effective primary care-based strategies can help to improve bowel screening uptake and reduce current disparities. Furthermore, they have the potential to increase the proportion of cancers diagnosed earlier and reduce mortality from the disease in the long term. 2. 
Project 2: The influence of a negative Faecal Occult Blood test (FOBt) on the response of screening invitees and healthcare providers to symptoms of colorectal cancer. Purpose – Bowel cancer screening has the potential to significantly reduce deaths from colorectal (bowel) cancer and has been introduced across the UK. However, approximately 40% of cancers will not be detected by the test, and therefore there is a need for awareness of the symptoms of colorectal cancer in the general population, and for primary care to respond effectively to symptomatic patients. Previous work has shown that significant numbers of invitees believe that a one-off test confers long-term protection from the disease. The aim of this study is to determine whether the pattern of symptom presentation to primary care differs between individuals who have accepted offers of bowel screening and received a negative result, and those who have not yet been invited or who declined to take part. The cohort size is 7,800. The primary research questions are: 1. Does the pattern of bowel-associated symptom presentation to primary care differ between individuals who have accepted offers of FOBt screening and received a negative result, those who declined their invitation to take part in screening, and those who live in an area where roll-out of the screening programme had yet to commence? 2. Does the pattern of GP referral for Colorectal Cancer (CRC) associated investigations differ between individuals who have accepted offers of FOBt screening and received a negative result, those who declined their invitation to take part in screening, and those who live in an area where roll-out of the screening programme had yet to commence? 3. 
Does the pattern of bowel-associated diagnoses in primary care differ between individuals who have accepted offers of FOBt screening and received a negative result, those who declined their invitation to take part in screening, and those who live in an area where roll-out of the screening programme had yet to commence? Secondary research questions for Project 2 are: 1. Does the pattern of bowel-associated consultations in primary care differ by socio-economic status? 2. Does the pattern of bowel-associated consultations in primary care differ between different ethnic groups? This study will use a linked dataset from CPRD and PHE’s Midlands and North West Programme Hub to investigate patterns of bowel symptom presentation in primary care over a six-month period. The study will also use established linkages with Hospital Episode Statistics and civil registration death data to complement routinely linked CPRD data. HES data will be used to identify bowel-related investigations and diagnoses, and civil registration death data will be used to determine the date and cause of death for any patient who died during the six-month follow-up. At the completion of this study the University of Edinburgh will have a comprehensive picture of the pattern of response to bowel symptoms amongst invitees to FOBt screening in England and unique insights into how this response is moderated by ethnicity and socioeconomic status. Further, this work has the potential to lead to better integration of early diagnosis and screening efforts in colorectal cancer.

Expected Benefits:

Both studies benefit from collaborators who have important roles in bowel screening provision in England and Scotland, and who can influence not only the clinical community but also policy makers. Summaries of findings from the studies will be prepared and disseminated to the UK Bowel Screening Programmes, Health Psychologists and primary care. Summaries will also be shared with other relevant contacts such as the Scottish Coordinator of Screening Programmes and the study funder (Cancer Research UK). A comprehensive research report for each study will be prepared for the study funder and will also help to inform future discussions with Cancer Screening Programmes. The final reports are expected as soon as results from the studies are available (expected to be in 2019). In order to disseminate results to primary care professionals, policy makers and researchers (and to meet funder requirements), papers from both studies will be published Open Access. The research team will aim for the British Medical Journal, the British Journal of General Practice and the British Journal of Cancer. Presentations at national conferences (such as the National Cancer Research Institute Annual Conference) and international conferences (such as the Annual Cancer and Primary Care Network (Ca-PRI) Conference) are planned. “Negative FOBt study”: This study, along with the complementary study components already published, will provide crucial information to help determine the potential impact of a negative test result on how patients and GPs respond when presented with symptoms associated with a colorectal cancer diagnosis following a negative screening test result. Screening programmes inevitably miss a proportion of cancers and some cancers will develop between screening rounds. Even with a fully implemented programme, approximately 75% of all colorectal cancers will be diagnosed symptomatically in primary care. 
This study will provide new and unique insights which will inform on-going initiatives in primary care, in collaboration with the national screening programmes, to promote symptom awareness and encourage prompt help-seeking, timely referral and early diagnosis. Furthermore, it will generate a comprehensive picture of how patients respond to symptoms, and provide insights into which patient characteristics moderate this response. Finally, the study will provide a better understanding of the limitations of colorectal cancer screening tests among screening participants, and will generate benchmark data for further analyses of symptom awareness among patients attending screening with the new faecal immunochemical test (FIT). “Non-responders using primary care study”: Bowel cancer screening programmes can contribute to reducing mortality from the disease, but increased participation is required for this to happen. Current uptake in England is below 60%, and there are substantial challenges in ensuring equitable uptake, especially among invitees with lower socio-economic status, men and ethnic minorities. Evidence shows that a personal recommendation from a GP or other health care professional can increase participation in bowel cancer screening. However, despite the important role that primary care can have in promoting screening uptake, information on the profile of non-responders consulting in primary care is scarce. When primary care strategies (which have been increasing over the years) do not have sufficient information on the patients they are trying to reach, they are missing opportunities to engage with them. A detailed understanding of how non-responders use primary care will allow for the identification of optimum opportunities to engage with patients, especially hard-to-reach groups who consult in primary care. 
Study findings will comprehensively describe the profile of patients who require more effective support, information and risk assessment, and will inform target populations for future initiatives aiming to increase informed participation in bowel screening. More effective primary care-based strategies can help to improve bowel screening uptake and reduce current disparities. Furthermore, they have the potential to increase the proportion of cancers diagnosed earlier and reduce mortality from the disease. These wider benefits are expected in the long term (5-10 years), and should be considered as part of a larger context in which other public health strategies are developed to increase bowel screening uptake, in addition to providing optimum treatment when a cancer is actually diagnosed.

Outputs:

All outputs will contain only data that is aggregated with small numbers suppressed in line with the HES Analysis Guide. Data analysis for both studies will commence as soon as the linked datasets are received and is expected to be finished before the end of 2019. Data will not be used for sales and marketing purposes. Research reports will be prepared for both studies and will be submitted to the funder (CRUK). Reports will be used to inform discussions with NHS Cancer Screening Programmes and NSD Scotland. CRUK will also have a summary of the study results for their website. The final reports are expected as soon as results from the studies are available. Disseminating results to primary care professionals, policy makers and researchers is paramount. All papers will be published Open Access as per the funder’s requirements. Manuscript submission for both studies is expected in 2019. Target journals for both studies include the British Medical Journal, the British Journal of General Practice and the British Journal of Cancer. Presentations are planned at the National Cancer Research Institute (NCRI) annual meeting and the 11th Cancer and Primary Care Network (Ca-PRI) Conference. Specific outputs are described separately for Project 1 and Project 2. Project 1: An enhanced role for primary care in bowel cancer screening: an observational study investigating primary care use among bowel screening non-responders. Research outputs will fill a gap by examining patterns of health care utilisation in detail, along with the non-responders’ socio-demographic characteristics. The study will investigate attendance at preventative activities (as a proxy for health-seeking behaviour) and socio-demographics as these are associated with higher uptake. 
It will compare non-responders’ presentation of lifestyle risk factors for CRC according to different frequencies of attendance in order to identify patients who might require more effective support, health promotion and risk assessment than others. By comparing responders and non-responders the study will identify groups in need of more support and information and examine which (if any) patterns are exclusive to non-responders. In order to inform the data analysis protocol for this study, a literature review of challenges in analysing routine datasets was prepared by the research team. The output was a comprehensive report which was presented at the SAPC Conference and at the Dealing with Data Conference at the University of Edinburgh (2014). The study is part of a larger project which has also developed and tested the feasibility of a bowel screening brief intervention in routine practice. Feasibility study results are in press at BMJ Open. Project 2: The influence of a negative Faecal Occult Blood test (FOBt) on the response of screening invitees and healthcare providers to symptoms of colorectal cancer. The output data from this data-linkage study will provide a comprehensive picture of the pattern of response to symptoms suggestive of colorectal cancer, following a negative FOBt result. Data will include the presentation and frequency of both colorectal-specific and non-specific symptoms, clinical investigations and GP referrals among screening participants in England, and will provide unique insights into how this response is moderated by socioeconomic status. The study is part of a larger project exploring the influence of a negative FOBt result on response to symptoms of colorectal cancer. Complementary qualitative components of this project have already resulted in one published article and a second article which is currently under review with the journal Health Expectations.
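The small-number suppression mentioned at the start of this section can be illustrated with a minimal sketch. The threshold of 5 and the "*" marker are assumptions made for illustration only; the applicable guidance should be checked for the actual rule, which may also require rounding or secondary suppression. The function name and the example table are hypothetical.

```python
# Assumed threshold: counts from 1 up to this value are hidden.
# This value is illustrative and is not taken from the register entry.
SUPPRESSION_THRESHOLD = 5


def suppress_small_numbers(counts, threshold=SUPPRESSION_THRESHOLD, marker="*"):
    """Return a copy of an aggregate table with small non-zero counts masked.

    Zero counts and counts above the threshold are left unchanged.
    """
    return {
        group: (marker if 0 < n <= threshold else n)
        for group, n in counts.items()
    }


# Hypothetical aggregate table: non-responders by age band.
table = {"60-64": 120, "65-69": 4, "70-74": 0}
suppressed = suppress_small_numbers(table)
print(suppressed)
```

In this sketch only the aggregated counts are ever passed to the function, which mirrors the statement that outputs contain aggregated data only.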

Processing:

A linked HES-primary care dataset already exists and is held by CPRD, with the linked HES data having previously been provided to CPRD under a Data Sharing Agreement (DSA) with NHS Digital (NIC-15625-T8K6L). Patient identifiers required for linkage of CPRD Primary care data to the Midlands and North West Bowel Cancer Screening data are the NHS number, date of birth, gender and postcode; these are not needed for the research study itself but will be sent by the GP system providers to NHS Digital. NHS Digital already hold the Midlands and North West Bowel Cancer Screening data on behalf of Public Health England. The bespoke dataset that will be received by the University of Edinburgh will be pseudonymised data. This bespoke data linkage requires CPRD and Midlands and North West Bowel Cancer Screening patient identifiers – namely date of birth, postcode, NHS number and gender – to permit accurate linkage of CPRD and Midlands and North West Bowel Cancer Screening datasets into a new single dataset for the research study. No clinical data from the GP system providers or PHE is sent to NHS Digital, and at no stage do CPRD or University of Edinburgh receive any patient identifiers. NHS Digital hold the clinical data on behalf of PHE. Personal identifiers including name, date of birth, postcode and NHS number are removed at source by the GP system providers and replaced by pseudonymised system patient and practice identifiers (GP System Practice Key and GP System Patient Key) prior to transfer of data to CPRD. CPRD then replaces the original GP System Practice Key and GP System Patient Key with a CPRD patient pseudonym (CPRD Patient Study ID). Identifiable data fields for CPRD patients flow directly from GP system providers to NHS Digital. The legal support for the lawful flow of identifiable data is primarily CPRD’s s251 support (ref: ECC 5-05 (a)/2012). 
This support permits “GP practices and specified others (according to the approved ‘Master Dataset’ list) to [1] transfer confidential patient information to NHS Digital; [2] NHS Digital to receive identifiers, undertake linkages and provide the CPRD a de-identified dataset.” CPRD has obtained further clarification from CAG (via a s251 amendment in December 2017) that the PHE bowel cancer screening dataset (Midlands and North West) is part of CPRD’s Master Dataset List, and that CPRD has ongoing CAG approval for linkages to this dataset. CPRD also have Research Ethics Committee (REC) approval (ref: 05/MRE04/87) for the research study and this linkage to take place. CPRD have a Data Sharing Agreement (DSA) with PHE and this permits CPRD to receive and process BCS pseudonymised patient data. Under the legal basis described, the steps below will be used to transfer, store and process data as part of this linkage.
Step 1. Transfer of patient identifiers
Step 1a. The Midlands and North West Bowel Cancer Screening (PHE) dataset is held at NHS Digital and not at PHE itself. At the request of CPRD, PHE provides instructions to NHS Digital, as the Trusted Third Party (TTP) for linkages, to use patient identifiers from the Midlands and North West Bowel Cancer Screening dataset to create the pseudonymised study IDs required for linkage. This means that the flow of PHE patient identifiers will remain within NHS Digital.
Step 1b. In parallel, CPRD requests that participating GP system providers securely provide to NHS Digital a file containing information on all patients held in CPRD. The file consists of the four identifiable data fields (NHS Number, Date of birth, Gender and Postcode) and the GP System Practice Key and GP System Patient Key (pseudonymised data fields assigned to each unique individual in CPRD). 
Transfer of data from GP system providers to NHS Digital will be via secure file transfer protocol (SFTP) servers, which are encrypted to ensure security of electronic data in transit.
Step 2. Creation and provision of bridging file by the Trusted Third Party
Step 2a. Bridging file to CPRD. NHS Digital match the identifiable data fields held on behalf of PHE and participating GP system providers. NHS Digital supply CPRD with a bridging file containing pseudonymised patient identifiers (the GP System Practice Key and GP System Patient Key) for each linked patient that can be used to merge the primary care dataset with the Midlands and North West BCS dataset. Additionally, NHS Digital generate and supply a Midlands and North West BCS specific pseudonymised patient identifier for each linked patient (Study ID). NHS Digital securely releases the bridging file via secure file transfer protocol (SFTP) to CPRD. The bridging file will be supplied to CPRD, and CPRD will confirm the linkage as valid.
Step 2b. Bridging file to Trusted Third Party. NHS Digital also releases a second bridging file in parallel, containing a Midlands and North West BCS study specific pseudonymised patient identifier for each linked patient (Study ID), to NHS Digital. This is done since the Midlands and North West BCS dataset is held by NHS Digital on behalf of PHE, and not at PHE itself. Data supplied by the GP system providers to NHS Digital (Step 1b) is utilised for CPRD routine linkage and will be retained. It is emphasised that following data linkage by NHS Digital using patient identifiable fields, there is no further flow or use of identifiable data at any point past this stage.
Step 3. Extraction of matching Keys by NHS Digital. NHS Digital will match the Study ID for the Midlands and North West data and extract the required clinical information. The Study ID generated by NHS Digital for the specific study is then matched to the clinical information and extracted. 
The new file will contain no personally identifiable details. The file containing the Study ID and clinical variables is sent to CPRD via secure transfer. NHS Digital will apply opt outs to the PHE data before it is disseminated from NHS Digital.
Step 4. Creation of the linked PHE-CPRD dataset by CPRD. Upon receipt of the clinical variables from NHS Digital, CPRD uses the Study ID to match the file containing CPRD GP System Practice Key, GP System Patient Key and the required clinical information at record level. This linked dataset remains at CPRD and is only released to researchers after further pseudonymisation.
Step 5. Dataset extract. The linked dataset held by CPRD is then used to create two project specific linked datasets, limited to the requested patient cohort and clinical information as approved by the Independent Scientific Advisory Committee (ISAC) for each project. This will involve the creation of a file containing a linked BCS (Midlands and North West)-CPRD patient dataset extract for ‘Project 1’ as explained in the Purpose section above. For ‘Project 2’, CPRD will repeat the process, adding linked HES and ONS data to the PHE-CPRD patient dataset extract using CPRD IDs and its existing HES and ONS linked datasets provided by NHS Digital. CPRD will then further pseudonymise the Keys used in the linked dataset extracts to further ensure patient data cannot be identified.
Step 6. Creation of study dataset by CPRD and release to UoE. Prior to release of the linked dataset extract, CPRD ensures the UoE researchers have signed a bespoke Dataset Agreement (inclusive of any additional PHE terms and conditions) which has been previously agreed with PHE. CPRD then transfers, with approval of PHE, and via secure file transfer protocol (SFTP), the two dataset extracts to the University, and confirms safe receipt of this. 
Project 1 will have a pseudonymised data extract containing the Midlands and North West Bowel Cancer Screening data linked to the CPRD primary care data. Project 2 will have a pseudonymised data extract containing the Midlands and North West Bowel Cancer Screening data linked to CPRD primary care data and its previously linked IMD, HES and mortality data. The IMD, HES and mortality data are part of the established routinely linked dataset which CPRD receive as part of a separate Data Sharing Agreement with NHS Digital.
Analysis undertaken. The linked datasets received by Edinburgh researchers will not be linked again with any other data by Edinburgh. In order to answer the research questions, both descriptive statistics and multivariate analysis of data using a conditional logistic regression model will be carried out. The bowel cancer screening data linked under this application will be linked to the wider CPRD database and will be available to other researchers, subject to a suitable application submitted through CPRD’s ISAC process.
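The flow of keys through Steps 1 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method NHS Digital or CPRD actually use: all records are synthetic, matching is assumed to be an exact match on all four identifier fields, the random hexadecimal pseudonyms are an arbitrary choice, and every function and field name (`build_bridging_file`, `practice_key`, `release_id` and so on) is hypothetical.

```python
import secrets

# Identifier fields named in Step 1b. Exact matching on all four is an assumption.
IDENTIFIER_FIELDS = ("nhs_number", "date_of_birth", "gender", "postcode")


def identifier_tuple(record):
    return tuple(record[field] for field in IDENTIFIER_FIELDS)


def build_bridging_file(gp_identifier_rows, screening_identifier_rows):
    """Step 2 (trusted third party): match on identifiers, emit keys only."""
    gp_by_identifiers = {identifier_tuple(r): r for r in gp_identifier_rows}
    bridge_for_cprd = []
    study_id_by_screening_row = {}
    for row in screening_identifier_rows:
        gp = gp_by_identifiers.get(identifier_tuple(row))
        if gp is None:
            continue  # unmatched patients are not linked
        study_id = secrets.token_hex(8)  # random pseudonym, assumed format
        bridge_for_cprd.append({
            "practice_key": gp["practice_key"],
            "patient_key": gp["patient_key"],
            "study_id": study_id,
        })
        study_id_by_screening_row[row["screening_row"]] = study_id
    return bridge_for_cprd, study_id_by_screening_row


def extract_screening_variables(study_id_by_screening_row, screening_clinical):
    """Step 3: attach the Study ID to clinical variables; no identifiers."""
    return [
        {"study_id": study_id, **screening_clinical[row_id]}
        for row_id, study_id in study_id_by_screening_row.items()
    ]


def link_at_cprd(bridge_for_cprd, screening_extract, primary_care):
    """Step 4: join primary care and screening data through the bridging file."""
    screening_by_study_id = {r["study_id"]: r for r in screening_extract}
    linked = []
    for entry in bridge_for_cprd:
        care = primary_care.get((entry["practice_key"], entry["patient_key"]))
        screening = screening_by_study_id.get(entry["study_id"])
        if care is not None and screening is not None:
            linked.append({**entry, **care, **screening})
    return linked


def release_extract(linked_rows):
    """Step 5: replace all linkage keys with a fresh pseudonym before release."""
    hidden = {"practice_key", "patient_key", "study_id"}
    return [
        {"release_id": secrets.token_hex(8),
         **{k: v for k, v in row.items() if k not in hidden}}
        for row in linked_rows
    ]


# Synthetic example data (no real patients).
gp_identifier_rows = [
    {"nhs_number": "0000000001", "date_of_birth": "1950-01-01", "gender": "F",
     "postcode": "ZZ1 1ZZ", "practice_key": "P1", "patient_key": "K1"},
    {"nhs_number": "0000000002", "date_of_birth": "1952-02-02", "gender": "M",
     "postcode": "ZZ2 2ZZ", "practice_key": "P1", "patient_key": "K2"},
]
screening_identifier_rows = [
    {"nhs_number": "0000000001", "date_of_birth": "1950-01-01", "gender": "F",
     "postcode": "ZZ1 1ZZ", "screening_row": 0},
    {"nhs_number": "0000000009", "date_of_birth": "1949-09-09", "gender": "M",
     "postcode": "ZZ9 9ZZ", "screening_row": 1},
]
screening_clinical = {0: {"responder": False}, 1: {"responder": True}}
primary_care = {("P1", "K1"): {"consultations": 3},
                ("P1", "K2"): {"consultations": 0}}

bridge, study_ids = build_bridging_file(gp_identifier_rows, screening_identifier_rows)
extract = extract_screening_variables(study_ids, screening_clinical)
released = release_extract(link_at_cprd(bridge, extract, primary_care))
```

The point of the sketch is the separation of roles: only `build_bridging_file` ever sees the identifier fields, while the later functions work solely with keys and clinical variables, and the released rows carry none of the intermediate keys.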