NHS Digital Data Release Register - reformatted

Medicines and Healthcare Products Regulatory Agency (MHRA)

Project 1 — DARS-NIC-08477-H7S0Z

Opt outs honoured: Y, N

Sensitive: Sensitive, and Non Sensitive

When: 2016/12 — 2017/08.

Repeats: One-Off, Ongoing

Legal basis: Health and Social Care Act 2012, Section 251 approval is in place for the flow of identifiable data, Section 42(4) of the Statistics and Registration Service Act (2007) as amended by section 287 of the Health and Social Care Act (2012)

Categories: Anonymised - ICO code compliant, Identifiable

Datasets:

  • Mental Health Minimum Data Set
  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Critical Care
  • Hospital Episode Statistics Outpatients
  • Hospital Episode Statistics Accident and Emergency
  • Diagnostic Imaging Dataset
  • Patient Reported Outcome Measures (Linkable to HES)
  • Bridge file: Hospital Episode Statistics to Mental Health Minimum Data Set
  • Office for National Statistics Mortality Data
  • Mental Health and Learning Disabilities Data Set

Benefits:

As evidenced by existing studies using linked data to the CPRD primary care database (on an ongoing basis), results derived from publications undertaken by CPRD customers have frequently impacted public health and informed clinical guidelines. Notable examples and relevant publications include the following: A CPRD study on the prevalence and management of atrial fibrillation (AF) in the general population led to initiatives to improve the care of women and older people with AF [1]. The recently published and very important NICE guidance consultation document “Suspected cancer: Recognition of suspected cancer in children, young people and adults” derived its evidence for management, investigation and referral for most of the cancers discussed largely from GPRD and CPRD studies and, for some of them, entirely from CPRD research [2]. In a large cohort of patients seen in general practice with irritable bowel syndrome, the very considerable extent of psychological co-morbidity was accurately described using data from the CPRD [3], and other research in CPRD has highlighted changing diagnostic "fashions" in the recognition of chronic fatigue syndrome [4]. Both of these studies had implications for the management of patients with long-term physical and psychological problems in general practice. An important CPRD study detailed the epidemiology of gastrointestinal (GI) bleeding, leading to modification of NICE guidance on GI bleeding [5], while a series of studies on the complications of coeliac disease and inflammatory bowel disease led to a modification of the risk estimates of these conditions in clinical practice [6-8]. Publications: [1] Majeed A, K Moser, K Carroll (2001) Trends in the prevalence and management of atrial fibrillation in general practice in England and Wales, 1994–1998: analysis of data from the general practice research database. Heart (3) 284 – 288 [2] National Institute for Health and Care Excellence. Suspected cancer (update). 
Anticipated publication date: May 2015. Suspected cancer: recognition and management of suspected cancer in children, young people and adults (update). http://www.nice.org.uk/guidance/indevelopment/gidcgwave0618. [3] Jones R, Latinovic R, Charlton J, Gulliford M. Physical and psychological co-morbidity in irritable bowel syndrome: a matched cohort study using the General Practice Research Database. Aliment Pharmacol Ther. 2006 Sep 1;24(5):879-86. PubMed PMID: 16918893. [4] Gallagher AM, Thomas JM, Hamilton WT, White PD. Incidence of fatigue symptoms and diagnoses presenting in UK primary care from 1990 – 2001. JRSM (2004); 97: 571-575. [5] Crooks CJ, West J, Card TR. Comorbidities affect risk of nonvariceal upper gastrointestinal bleeding. Gastroenterology 2013;144:1384-1393, 1393 e1381-1382; quiz e1318-1389. [6] Crooks CJ, Card TR, West J. Defining upper gastrointestinal bleeding from linked primary and secondary care data and the effect on occurrence and 28 day mortality. BMC Health Serv Res 2012;12:392. [7] Crooks CJ, Card TR, West J. Excess long-term mortality following non-variceal upper gastrointestinal bleeding: a population-based cohort study. PLoS Med 2013;10:e1001437. [8] West J, Logan RF, Smith CJ, Hubbard RB, Card TR. Malignancy and mortality in people with coeliac disease: population based cohort study. BMJ 2004;329:716-719. The same principles apply to both UK-based and international customers; through using UK healthcare data provided by CPRD, researchers are expected to produce results of clinical importance to the UK public. Making this data available to a range of international customers opens up the resource to a broader base of talent and expertise that can run data analytics to provide measurable benefits to UK public health and the NHS.
Linking DIDs to CPRD GOLD would enable, among other things, further validation of diagnostic records and more accurate phenotypic definition of diseases; linking PROMS to CPRD GOLD would enable quality of life information to be introduced into the clinical records; linking MHLDDSMD to CPRD GOLD would, among other things, be a step towards addressing the major health and social challenges the country faces and responding to national research priorities, including the Dementia Challenge sponsored by the Cabinet Office.
Third Party (Research) Licence – Risk Mitigation: In recognition of the fact that there may still be a small residual risk of re-identification in specific contexts (e.g. where rare patterns or combinations of data items occur in the data and may be recognisable to the recipient of the data), CPRD establishes a legal licence with the end user that obliges them not to re-identify, or to attempt to re-identify, a data subject. This is particularly a risk where an organisation that controls identifiable data wishes to link its data with other health care data. Extra licence controls are established in these circumstances.

Outputs:

CPRD customers using these linked data will be producing (on an ongoing basis) research publications in peer-reviewed journals and presentations at scientific conferences. CPRD customers include academic institutions, pharmaceutical companies, governmental centres, and research charities, undertaking medical data research.
Trialviz: Trialviz is a feasibility and protocol optimisation tool created by CPRD that generates numbers of eligible patients based on inclusion/exclusion criteria. Access to the tool is currently restricted to internal researchers. No decision has yet been made regarding whether Trialviz will be released to external customers. Outputs are restricted to feasibility numbers and no identifiable or medical information is provided. Per internal policy, Trialviz suppresses small numbers in line with the HES Analysis Guide.
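As an illustration of the suppression step applied to feasibility numbers, the sketch below masks small counts before they are reported. It is a minimal example and not Trialviz code: the function names, the `*` marker and the default threshold of 5 are assumptions made for illustration, and the actual threshold and any rounding rules should be taken from the HES Analysis Guide.

```python
from typing import Dict, Optional


def suppress_small_count(count: int, threshold: int = 5) -> Optional[int]:
    """Return None when a count falls in the suppressed range 1..threshold.

    The threshold is an assumed, illustrative default; zero is reported as-is.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if 1 <= count <= threshold:
        return None
    return count


def format_feasibility_counts(counts: Dict[str, int], threshold: int = 5) -> Dict[str, str]:
    """Render counts for output, replacing suppressed values with '*'."""
    rendered = {}
    for label, n in counts.items():
        value = suppress_small_count(n, threshold)
        rendered[label] = "*" if value is None else str(value)
    return rendered


if __name__ == "__main__":
    # Hypothetical feasibility results keyed by inclusion criterion.
    print(format_feasibility_counts({"criterion_a": 3, "criterion_b": 0, "criterion_c": 120}))
```

Keeping the threshold as a parameter means the same routine could be reused where a dataset has its own suppression controls.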

Processing:

CPRD receives and supplies only pseudonymised datasets. Customers may receive more than one data linkage if they are licensed to do so. CPRD will use a licence agreement which mirrors the requirements of HSCIC. The CPRD enables linkage of NHS and other health related data for research projects approved by the Independent Scientific Advisory Committee (ISAC) of the MHRA. CPRD stores no data that carries anything other than a pseudonymised ID, as all coded data has its personal identifiers removed before entering the CPRD data domain. Data are only ever made available for medical and health research projects, approved by ISAC, and not for any other purpose. All researchers are made fully aware they must not use data outside of the approved protocol and that an extension study or new use of data requires additional or new ISAC approval and a new legal arrangement. CPRD and ISAC require that research results be published in the scientific literature or shared with the regulators. Following protocol approval by ISAC, CPRD will enable research by researchers who can demonstrate, by peer review, previous experience of undertaking the proposed research and who work for an organisation with which CPRD has a legal contract that covers all the obligations expected by the release of a research dataset.
Scientific Governance: The Independent Scientific Advisory Committee (ISAC) is a non-statutory expert advisory body established in 2006 by the Secretary of State to provide advice on research related requests to access data from the MHRA Yellow Card Scheme and the General Practice Research Database. The ISAC provides expert advisory support for all studies seeking access to data available under the Clinical Practice Research Datalink (CPRD). ISAC consists of an appointed Chair (0.5 FTE), a Senior Scientific Officer and a multi-disciplinary appointed panel.
The generalist members of the panel cover statistics, primary care, secondary care, patient and public involvement, drug safety and outcomes. They may be supplemented by specialist members drawn from the communities represented in linked data sets (e.g. cancer and MINAP). Applications to ISAC are screened and categorised for risk according to a high, medium and low risk assessment. Low risk protocols are subject to Chair approval; medium risk protocols are assessed by the Chair to determine whether they are submitted to the panel; and high risk protocols (all drug safety studies, for instance) are subject to committee review. Protocols can be approved, rejected, or conditionally approved subject to review and/or addressing any comments made by ISAC. ISAC will be reinstating publication of meeting minutes shortly, and these are to include summary information describing the nature of the studies reviewed. CPRD controls the use to which its data may be put by licence agreements, which reference ISAC approved protocols. Licence agreements, which are contractually binding in nature, limit the use of data to medical and health research purposes. CPRD will ensure that the use of data will comply with the Care Act and its application by HSCIC. Within this limit, research requires the approval by ISAC of a protocol, which meets stringent standards. Guidance for submission of a protocol is attached. This provides detailed guidance on the requirements of ISAC, including the expectation that findings of scientific and public health importance are disseminated by publication. Applications will also be streamed according to the disclosure-risk categories agreed in the CPRD Section 251 application. A summary report of applications is shared with the Confidentiality Advisory Group (CAG) as a condition of the former Ethics and Confidentiality Committee (ECC) approval subject to 12 months disclosure reporting and a half year review.
As and when new linkages are established, CPRD will work with the data source and Data Controller to establish suitable arrangements for representation on ISAC, or parallel approval mechanisms. This will ensure that suitable expertise in the understanding of specific data sets and their utility is incorporated into the approval process.
Information Governance: All studies operate under an appropriate legal gateway to enable the processing of identifiable data used for the purpose of linkage by the Trusted Third Party. This may be Section 251 approval (without consent) through the Confidentiality Advisory Group (CAG) of the Health Research Authority or under consent. CPRD has approval from the ECC / CAG for linkage by the trusted third party and residual identifiers that have an agreed research purpose (e.g. Date of Death). CPRD will make further applications to the CAG for any processing that goes outside of the existing permissions, for example in terms of additional linkages. A new dataset will only be linked once CAG approval is obtained.
Ethics approval: CPRD has an overarching approval provided by East Midlands (Derby) Research Ethics Committee for observational research using primary care and linked data provided by CPRD. Any study that involves interaction or intervention with patients requires a separate ethical review and approval before any data will be provided. Data are collected from consenting practices. On acceptance as a CPRD contributor, an initial data collection of all available historic EHR data is taken from the practice computer. Subsequent data collections are performed on an incremental basis approximately every four to six weeks, transmitted electronically to the CPRD via the secure NHS intranet. The data are verified for integrity and completeness before further processing. If a collection fails these checks a re-collection is requested.
The physical and logical separation of identifier data services from the clinical research data services is a fundamental tenet of the CPRD security model that is built around the Health and Social Care Information Centre acting as a Trusted Third Party for linkage and CPRD itself acting as the research data processor. Direct identifiers are streamed separately to the data linkage service (HSCIC) by data sources. This approach will be maintained even where the HSCIC holds the data (e.g. HES). It is important to note that CPRD recognises that data that has been de-identified (had the direct identifiers removed) can still disclose identity, whether through the rarity of information expressed in a record, through the association of the data in a record with other records accessible to the recipient, or through other knowledge possessed by the recipient. It is recognised that combinations of data, particularly within complex, granular or linked data sets, provide a potential for disclosing identity or personal information, especially where a researcher possesses a local clinical data source that they are attempting to supplement for research purposes. CPRD mitigates this risk through the application of legal agreements with researchers that will prevent the use of the data in conjunction with any other data from any other source for the purpose of re-identifying or attempting to re-identify an individual. CPRD also holds the right of audit with research data recipients to establish that relevant terms of use (including the full terms of the data sharing agreement with the research user), security and confidentiality conditions are being adhered to. On request from HSCIC, CPRD will provide to HSCIC the results of any such audit.
Physical Security Measures: CPRD operates to a high standard to ensure that when data is transmitted and/or stored, this is done in a way that protects the data.
All data in CPRD is stored in a “Tier 3” data centre that is compliant with Government standards to operate in a way that meets the full requirements for managing and storing such important data. The measures are always under review and are subject to audit. Security measures include:
  • Multifactor authentication for access
  • Monitoring of access
  • Round the clock security staff presence
  • Robust firewalls and other access restrictions
A back-up store of the data mirrors the above features but in an alternative location to allow for business continuity.
Data processing: Collection loading takes collections as received, extracted and pre-loaded by CPRD, and links them to a collection so that they can be processed. Stage 1 identifies if there is a viable collection to process. This involves checking for the presence of key and optional data files, and a check of the structure within each of the files to ensure that it is correct. The collection is then archived. Files or fields that are not required in the processing are then stripped away. The resulting data files are archived for merging in subsequent collections. Data from all data collections is combined in order to enable the identification and appropriate processing of updated records. The latest version of each updated record is retained as the current version. Deleted records are removed from all data by referencing a special mandatory collection file which contains logs of records deleted since the last collection. During the next stage the text data in the data files are encoded, replacing them with numeric lookups, reducing the size of the database, and rendering it easier to manipulate computationally. Quality assurance analysis then takes place, resulting in patient acceptability criteria and practice up to standard dates. Feedback reports are also generated and sent to practices to help them identify problems and encourage better and more standardised recording.
The final processing stage reorganises the data into a consistent column order, and sorts by patient identifier, ready for use in the query tools.
All CPRD staff are appropriately trained and have the necessary understanding of the governance processes pertaining to relevant laws. They will also be aware that any misuse of data may result in disciplinary procedures and, in the case of a severe breach, dismissal and immediate removal from the premises. Training covering use of data is mandatory for CPRD staff and licence-holders prior to accessing data. Operational staff who are responsible for collection of data and interaction with site staff are precluded from access to data. Data is kept on restricted servers and drives accessible only to appropriately trained research staff.
Online access – Primary care data are available online via a secure portal. A purpose-built query tool allows customers to define patient cohorts and an “extract” tool then enables cuts of the data as specified, against a cohort or control group.
Flat files – Flat files allow licensed customers the same access as online data but supplied on an encrypted hard drive that is sent by courier.
Datasets – The CPRD Data Team will extract datasets for researchers against a query specification. The query and its output content will be agreed with the researcher prior to generation of the data sets. In accordance with guidance from the ICO, CPRD does not permit personal data to be processed outside its own servers, and hence any such data is retained within the EU.
De-sensitising Data: Where there are potentially identifying characteristics that are indicated by the research, the data is made less specific. For example, where the age of data subjects is relevant to the research, the researchers are normally provided with year of birth rather than exact date of birth. In a study involving children, the month of birth is normally provided to the researcher.
Similar processes are in place for geographic data such as where derivations are required from postcode. The standard identifiable geography for the researcher is the region and is restricted to a minimum population of about a million.
Pseudonymisation: Pseudonymisation is applicable in a number of contexts within CPRD to enable recognition of the fact that different records relate to the same individual whilst not revealing the identity of that individual. Whilst it is primarily applicable to the data subject identity, it may also be applied to other individuals recorded in their records (such as clinicians providing clinical care), organisations (such as hospital trusts or general medical practices) and geospatial identifiers (such as postcode, or grid reference).
Primary Pseudonymisation: CPRD establishes data subject pseudonymisation through the establishment of a compound pseudonym key that comprises a practice identifier and a patient identifier (within that practice). This compound key is not identifiable within the source GP EHR.
Multiple Pseudonym Layers: CPRD processes data and makes it available internally to CPRD researchers. In doing so, CPRD replaces the original data source pseudonym(s) with a second layer pseudonym or person ID. This creates multiple layers of separation, such that an adversary would need to translate the CPRD person ID back to the data source pseudonym ID and then gain access to the data source patient index, in order to directly identify a data subject. When linked data is supplied to third parties, the person ID may be replaced by a third layer pseudonym ID that establishes a further layer of separation.
Encryption: Encryption is used for data in transit between secure locations. This will apply to both identifier data for linkage and clinical / research data.
Although the clinical data is pseudonymised, there remains the residual risk of re-identification or the risk of inclusion of disclosive content and data is only intended for processing by authorised recipients. Encryption will mitigate the risk and provide assurance. The general default minimum standard for encryption will be AES 256 using a complex pass-phrase consisting of 12 characters and a mix of upper case, lower case, numeric and special characters.
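The multiple pseudonym layers described in this section can be sketched as follows. This is a minimal illustration only: the source does not state how CPRD derives its pseudonyms, so the use of keyed HMAC-SHA256, the example key values and all function names here are assumptions.

```python
import hashlib
import hmac


def compound_pseudonym(practice_id: str, patient_id: str) -> str:
    """Layer 1: compound key of a practice identifier and a patient
    identifier within that practice (the separator is an assumption)."""
    return f"{practice_id}:{patient_id}"


def _keyed_pseudonym(value: str, key: bytes) -> str:
    """Derive a stable pseudonym that cannot be reversed without the key.
    HMAC-SHA256 is an assumed technique, not one named in the source."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def person_id(source_pseudonym: str, internal_key: bytes) -> str:
    """Layer 2: person ID that replaces the data source pseudonym."""
    return _keyed_pseudonym(source_pseudonym, internal_key)


def release_id(internal_person_id: str, release_key: bytes) -> str:
    """Layer 3: pseudonym that may replace the person ID when linked
    data is supplied to a third party."""
    return _keyed_pseudonym(internal_person_id, release_key)


if __name__ == "__main__":
    # Hypothetical identifiers and keys, for illustration only.
    layer1 = compound_pseudonym("practice-001", "patient-42")
    layer2 = person_id(layer1, b"internal-secret-key")
    layer3 = release_id(layer2, b"per-release-secret-key")
    print(layer1, layer2[:12], layer3[:12])
```

Under these assumptions, using a separate key per release would mean that the same person receives unrelated identifiers in different supplied datasets, so an adversary would have to undo each layer in turn before reaching the data source patient index.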

Objectives:

Clinical Practice Research Datalink (CPRD) is a governmental centre, jointly funded by the Medicines and Healthcare Products Regulatory Agency (MHRA) and the National Institute for Health Research (NIHR), the remit of which is to enable medical research through sharing pseudonymised clinical data. The CPRD in-house teams add value to the data sources through building tools and methodologies, conducting data characterisation studies and facilitating verification and validation services. CPRD wishes to be in a position to share with its customers linked pseudonymised data so they can undertake approved research projects looking to improve public health and patient outcomes. Researchers using CPRD data can undertake, among others, drug safety, disease epidemiology, healthcare utilisation and health economics and outcomes studies that often directly inform clinical guidelines or regulatory decisions. An additional incentive is to provide pseudonymised data in the context of clinical drug safety or randomised controlled trials that can help build a more complete picture for the enrolled patients and increase transparency and efficiency of study results and design. These linkages will open the door to a large number of research questions, given that the granularity and detail of this information is not found in primary care data alone. Primary care data, in the main, forms the core of the CPRD service and linkage to other data sets. Each individual practice has control over the extent of its participation with the CPRD. When the extract is initiated the practice has the choice whether to “consent” to additional linkages - and hence the extract to the Trusted Third Party (TTP). At any time subsequently, they can revoke the consent to linkage, or the consent to the CPRD extract. The GP Electronic Healthcare Records (EHR) registration options allow for selection of an individual patient and for that patient to be flagged as opting out of the CPRD pseudonymised extract.
In the event that this option is selected, the patient’s data will not be extracted from the GP EHR for CPRD or for linkage (to the TTP). The Vision system has an inbuilt process for this, but CPRD also reviews and respects Read codes that flag patient objections to their data being used for various purposes, by not collecting this data. The number of patients opting out is between one and two per cent of the total patient population. CPRD promotes the right of patients to exercise this opt out by the provision of posters and information leaflets to practices. CPRD most recently provided this information between December 2014 and January 2015. All practices sending data for inclusion in CPRD were sent two posters and an initial batch of 30 patient leaflets. More patient leaflets, without a limit on numbers, are available to practices on request.
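The opt-out handling described above amounts to excluding a patient from extraction whenever an objection is recorded. A minimal sketch follows; the record layout, the `opted_out` flag and the placeholder objection codes are hypothetical and do not correspond to real Read codes or to the Vision system's actual process.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical placeholders: the real objection codes are not given in the source.
OBJECTION_CODES: Set[str] = {"OBJECTION_CODE_A", "OBJECTION_CODE_B"}


def has_objection(patient: Dict) -> bool:
    """True if the patient is flagged as opted out at registration or has
    any recorded code that signals an objection to data use."""
    if patient.get("opted_out", False):
        return True
    return any(code in OBJECTION_CODES for code in patient.get("codes", []))


def patients_to_extract(patients: Iterable[Dict]) -> List[str]:
    """Return the IDs of patients whose data may be extracted."""
    return [p["patient_id"] for p in patients if not has_objection(p)]


if __name__ == "__main__":
    # Hypothetical practice records, for illustration only.
    practice = [
        {"patient_id": "p1", "codes": ["SOME_CLINICAL_CODE"]},
        {"patient_id": "p2", "codes": ["OBJECTION_CODE_A"]},
        {"patient_id": "p3", "opted_out": True, "codes": []},
    ]
    print(patients_to_extract(practice))
```

Applying the check before extraction, rather than filtering afterwards, matches the stated intent that data for these patients is not collected at all.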


Project 2 — DARS-NIC-15625-T8K6L

Opt outs honoured: Y, N

Sensitive: Non Sensitive, and Sensitive

When: 2017/09 — 2018/02.

Repeats: Ongoing

Legal basis: Section 251 approval is in place for the flow of identifiable data

Categories: Anonymised - ICO code compliant, Identifiable

Datasets:

  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Critical Care
  • Hospital Episode Statistics Accident and Emergency
  • Hospital Episode Statistics Outpatients
  • Diagnostic Imaging Dataset
  • Office for National Statistics Mortality Data (linkable to HES)
  • Patient Reported Outcome Measures (Linkable to HES)
  • Mental Health Minimum Data Set
  • Mental Health Services Data Set
  • Office for National Statistics Mortality Data

Benefits:

Past and existing studies (on an ongoing basis) use linked data with the CPRD primary care database to generate research results. These studies are expected to produce benefits of clinical importance to the UK public, and to be published in peer-reviewed journals and presented at scientific conferences. Some recent examples and other relevant publications resulting from linked data research which resulted in clinical benefits are presented below. Case Study 1: The effectiveness of the influenza vaccine against hospital admissions and mortality in individuals with type 2 diabetes Seasonal influenza accounts for a significant proportion of excess winter mortality. Current policy in the UK and in many countries worldwide recommends annual flu vaccinations for patients with chronic conditions such as diabetes, though evidence to support such policies is limited. Imperial College London recently investigated the effectiveness of the influenza vaccine at reducing cardiovascular and respiratory hospital admissions and mortality in patients with type 2 diabetes. The study used linkages between CPRD GOLD primary care data, Hospital Episode Statistics (HES) and the Office for National Statistics (ONS) mortality data to look at admissions and death in 125,000 patients over a seven-year period. Influenza vaccination was associated with a reduction in the rate of hospital admissions for acute cardiovascular and respiratory disease and a reduction in all-cause mortality across the seven flu seasons. The study has been widely reported within healthcare and mainstream media and supports current flu vaccination initiatives in the UK and beyond. Reference 1: Vamos EP et al. Effectiveness of the influenza vaccine in preventing admission to hospital and death in people with type 2 diabetes. CMAJ. 2016 Oct 4;188(14):E342-E351. 
Case Study 2: Risk associated with the prescription of long-acting β2-agonists (LABA), short-acting β2-agonists (SABA) or inhaled corticosteroids (ICS) for asthma in primary care. Omalizumab is a recent antibody-based treatment developed to help control moderate to severe allergic asthma, when symptom control with inhaled corticosteroids (ICS) is inadequate. ICS are frequently prescribed alongside long-acting β2-agonists (LABA). A 2010 study using CPRD data (then GPRD) linked with Hospital Episode Statistics investigated the risk of asthma-related death and hospitalisation among patients on ICS or LABA therapy. The study was important to establish the relative risk across commonly-prescribed asthma treatments and concluded that LABA exposure was not associated with an increased risk for all-cause mortality. This study was subsequently incorporated into NICE guidelines released in 2013 outlining evidence-based recommendations for omalizumab use in patients with severe persistent asthma. Reference 2: de Vries F, Setakis E, Zhang B, van Staa TP. Long-acting {beta}2-agonists in adult asthma and the pattern of risk of death and severe asthma outcomes: a study using the GPRD. Eur Respir J. 2010 Sep;36(3):494-502. Additional references describing health benefits of CPRD and linked data. Example Reference 3: The CPRD and the RCGP: building on research success by enhancing benefits for patients and practices. Antonis A Kousoulis, Imran Rafi, Chair, and Simon de Lusignan Br J Gen Pract. 2015 Feb; 65(631): 54–55. 
Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4325440/?tool=pmcentrez Example Reference 4: Quality Improvement's greatest hits of 2016, Hannah Price, Head of Quality Improvement http://www.rcgp.org.uk/clinical-and-research/clinical-news/quality-improvements-greatest-hits-of-2016.aspx RCGP and Clinical Practice Research Datalink (CPRD) have joined forces to produce innovative data reports focusing on prescribing and patient safety, which enable benchmarking and case-finding. Over 100 practices from all four nations of the UK have participated in the successful pilot stage of the projects. This phase is now coming to an end and the reports will be rolled out to all practices in the CPRD network during 2017. Example Reference 5: A recently published systematic review (Oyinlola et al 2016) identified 43 CPRD studies that have been used in 25 medical guidance documents. The reviewers found that use of data from the CPRD to inform guidelines has increased in recent years and noted the importance of linking data to extend research to medical conditions that are treated in multiple settings (e.g. primary and secondary care). Reference: Oyinlola JO, Campbell J, Kousoulis AA. Is real world evidence influencing practice? A systematic review of CPRD research in NICE guidances. BMC Health Serv Res. 2016 Jul 26;16:299. Example Reference 6: A review of patients with learning disabilities (LD) at the Winterbourne View private hospital was established for all aspects of care for this patient group. This study used three years of CPRD data alongside HES APC data to describe the level of GP prescribing of psychotropic medication to patients with LD, and explored whether a relevant diagnostic indication was recorded. The results of the study led NHS England to promise rapid and sustained action to tackle over-prescribing, and an urgent letter sent to professionals to urge they review their own prescribing. Reference: Glover, G, Williams R. 
'Prescribing of psychotropic drugs to people with learning disabilities and/or autism by general practitioners in England'. Public Health England. June 2015.

Outputs:

CPRD customers using linked data products will be producing (on an ongoing basis) research publications in peer-reviewed journals and presentations at scientific conferences. CPRD customers include academic institutions, pharmaceutical companies, Governmental centres and research charities. These all undertake medical and health data research, which may result in formal publications. All data included in such outputs by CPRD customers will be aggregated, with small numbers suppressed in line with the HES Analysis Guide (or dataset specific suppression controls). A selection of recent publications resulting from use of CPRD linked data is presented below.

  • Moss S, Melia J, Sutton J, Mathews C, Kirby M. (2016) ‘Prostate-specific antigen testing rates and referral patterns from general practice data in England.’ Int J Clin Pract. 2016 Apr;70(4):312-8. doi: 10.1111/ijcp.12784. Epub 2016 Mar 14.
  • William Hollingworth, (Professor), Mousumi Biswas, Rachel L Maishman, Mark J Dayer, Theresa McDonagh, Sarah Purdya, Barnaby C Reeves, Chris A Rogers, Rachael Williams, Maria Pufulete. (2016) ‘The healthcare costs of heart failure during the last five years of life: A retrospective cohort study’ International Journal of Cardiology, Volume 224, 1 December 2016, Pages 132–138
  • Laurence Baril, Dominique Rosillon, Corinne Willame, Maria Genalin Angelo, Julia Zima, Judith H. van den Bosch, Tjeerd Van Staa, Rachael Boggon, Eveline M. Bunge, Sonia Hernandez-Diaz, Christina D. Chambers. (2015) ‘Risk of spontaneous abortion and other pregnancy outcomes in 15–25 year old women exposed to human papillomavirus-16/18 AS04-adjuvanted vaccine in the United Kingdom’ Vaccine, Vol 33, Issue 48, 27 November 2015, Pages 6884–6891
  • Taylor S, Taylor RJ, Lustig RL, Schuck-Paim C, Haguinet F, Webb DJ, Logie J, Matias G, Fleming DM (2016). ‘Modelling estimates of the burden of respiratory syncytial virus infection in children in the UK.’ BMJ Open. (2016) Jun 2;6(6):e009337. doi: 10.1136/bmjopen-2015-009337.
  • Wing K, Bhaskaran K, Smeeth L, van Staa TP, Klungel OH, Reynolds RF, Douglas I (2016). ‘Optimising case detection within UK electronic health records: use of multiple linked databases for detecting liver injury.’ BMJ Open. 2016 Sep 2;6(9):e012102. doi: 10.1136/bmjopen-2016-012102.
  • Alexandre L, Clark AB, Bhutta HY, Chan SS, Lewis MP, Hart AR. (2016) ‘Association Between Statin Use After Diagnosis of Esophageal Cancer and Survival: A Population-Based Cohort Study.’ Gastroenterology. 2016 Apr;150(4):854-65.e1; quiz e16-7. doi: 10.1053/j.gastro.2015.12.039. Epub 2016 Jan 9.

Processing:

CPRD has established agreements with General Practices and agreed contracts with their data processors, the GP clinical IT system providers, enabling the extraction of agreed data from the primary care electronic health record (EHR). Protecting patient confidentiality is paramount to CPRD. A number of processes and procedures are in place to safeguard the identity and confidentiality of patient data received and supplied by CPRD. An overview of these is presented below, including minimised dataset extraction, data transformation, strong and multiple pseudonymisation, and governance and scrutiny on approvals to use linked data. The CPRD Policy for Managing Anonymisation and the Risk of Identification in Observational Research sets out the management processes employed to ensure that CPRD appropriately anonymises patient data for observational research purposes, and complies with the Information Commissioner’s Office (ICO) Code on Anonymisation and with Office for National Statistics (ONS) requirements on use of death registration data.

1) Data Collection
Data collected by CPRD includes all coded patient primary care data, together with gender, year of birth and, for patients aged 16 and under, year and month of birth. CPRD does not receive patient name, address, full date of birth, NHS Number or free text medical notes. To enable the linking of primary care records to other health related data, GP EHR suppliers provide certain patient identifiers directly to NHS Digital. These are: NHS Number, full date of birth, postcode and gender. CPRD does receive gender but does not receive any of the other identifiers. The Trusted Third Party (NHS Digital) provides the linkage service for CPRD. Data collections are received by CPRD securely through an N3 link. Data arrives as a series of incremental collections from practices that have agreed to share data with CPRD.
Data collections, once received, are checked for content (data structure and format), completeness (presence of key and optional data files) and continuity. The collection is then archived. Details of each data collection are logged to an administrative database, and data is made available for processing. Database creation (a build process) is undertaken monthly by taking a snapshot of the fully processed data and organising it into a structure which enables tools to query and extract the data for use in observational and interventional research studies. CPRD retains the data collected up to the point that a GP Practice withdraws from participation. This is to ensure that CPRD can create, if needed, datasets for purposes such as validation of previous research or longitudinal studies. Patient opt-outs remain respected from the point of notification to CPRD.

2) Data transformation
CPRD does not release the same linked data to external researchers that it receives from NHS Digital. The changes made between data receipt and release are termed ‘data transformation’. This is done to protect patient confidentiality, and also to better facilitate relevant research. Transformation involves removing the provider codes provided by NHS Digital. Data provided by CPRD is matched to the former Strategic Health Authority (SHA) boundaries, based on the address of the GP Practice. This ‘blurs’ the link between hospital activity records and other potential identifiers collected and provided (Gender, Year of Birth, Date of Death and Ethnicity). For example, transformation of the CPRD linked HES Admitted Patient Care (APC) data involves:
(i) The encrypted_“HESID” id field provided to CPRD by NHS Digital is not released to customers. CPRD creates a pseudonym linked to a unique patient activity record in the HES data.
(ii) Encoding of the record level identifier (epikey).
The epikey variable has been encoded by CPRD to minimise the risk of breaching licensing conditions through linkage of these data to other HES data sources containing patient identifiable information. The epikey is encoded with a new key each time data is processed, so that the epikey for the same record differs in every release of CPRD linked HES APC data. This prevents different researchers from linking patients across copies of the same dataset, and prevents comparison with older release versions of the data.
(iii) Collating data across years, formatting date and diagnosis fields, and dropping fields (mainly provider and geographical based) from standard release of the data.

The episode-level data files received from NHS Digital are transformed into a normalised data structure containing the following tables:
1. Hospitalisations
2. Episodes
3. Diagnosis
4. Procedures
5. Augmented Care
6. Critical Care
7. Maternity
8. Health Resource Group

3) Pseudonymisation process
CPRD has agreed pseudonymisation processes with each GP EHR system provider as well as the Trusted Third Party used for data linkage. The overarching process for patient data pseudonymisation comprises the following stages to protect patient confidentiality at all times:
(i) GP system provider – the provider replaces the patient identifiers (NHS number) in each patient record with a system practice ID and system patient ID before its secure transfer to CPRD.
(ii) CPRD – on collection of the patient data, CPRD replaces the original data source patient and practice ID from the GP system provider with a CPRD patient and practice pseudonym.
(iii) Data linkage – where this is undertaken by the Trusted Third Party (using patient identifiers sent directly from GPs), all linked patient record data are anonymised by the TTP before release to CPRD. Similarly, where cancer registry data is received by CPRD from Public Health England for linkage, PHE anonymise patient data before release to CPRD.
(iv) Data release – the linked data is cut by CPRD to minimise data ultimately released to third party researchers, with the linked patient ID replaced again by a further patient ID at release, establishing yet another layer of separation. Record level identifiers (such as epikey, attendkey, aekey) are additionally encoded such that the record level pseudonym differs in every release of the linked data.

Combined with the wider processes and procedures noted in this section, the above pseudonymisation process precludes users of linked data from identifying patients from the data provided.

4) Data use
Access to and use of the data are controlled. With the exception of interventional and clinical studies (which require separate Health Research Authority approval), researchers must gain approval for their study protocol from the Independent Scientific Advisory Committee for MHRA Database Research (ISAC). Approved applications to ISAC are published on the CPRD website at https://www.cprd.com/ISAC/datause.asp. CPRD may generate aggregate level linked data without ISAC approval to inform feasibility and design of external research (observational and interventional/clinical) and for assessment of ISAC protocols. CPRD will undertake such assessments on behalf of external researchers with no release of record level linked data permitted outside of CPRD. ISAC therefore plays a key data governance role. Approval from ISAC is required if access to anonymised patient level data is requested for observational research and there is an intention to publish results, or where the study depends on access to primary care data linked to other health related data. ISAC's role is to determine whether a research proposal is of public health value and will be conducted by researchers with the appropriate level of expertise, to highlight any ethical or confidentiality issues that may arise in the proposed research, and to consider the scientific merit of the proposed methods and overall study.
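
The per-release encoding of record level identifiers described under 3) above can be sketched in Python as follows. This is an illustration only, not CPRD's actual method: the use of HMAC-SHA256, the 32-byte random key, and the function names `new_release_key` and `encode_identifier` are all assumptions. The point it shows is that a fresh secret key per release gives the same record a different pseudonym in every release, while staying consistent within a release.

```python
import hashlib
import hmac
import secrets


def new_release_key() -> bytes:
    """Generate a fresh random secret key for one data release (assumed 32 bytes)."""
    return secrets.token_bytes(32)


def encode_identifier(record_id: str, release_key: bytes) -> str:
    """Derive a keyed pseudonym for a record level identifier such as an epikey.

    Assumption: HMAC-SHA256 stands in for whatever encoding is really used.
    Without the release key the pseudonym cannot be recomputed from the
    original identifier.
    """
    digest = hmac.new(release_key, record_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


if __name__ == "__main__":
    key_release_1 = new_release_key()
    key_release_2 = new_release_key()
    epikey = "100000012345"  # made-up identifier for illustration

    # Same key gives a stable pseudonym within one release.
    print(encode_identifier(epikey, key_release_1) == encode_identifier(epikey, key_release_1))
    # Different keys give unlinkable pseudonyms across releases.
    print(encode_identifier(epikey, key_release_1) == encode_identifier(epikey, key_release_2))
```

In this sketch the release key would need to be held securely by the data provider and never shipped with the released data, otherwise the separation between releases is lost.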
5) Data release
Release of patient level linked data to third party researchers will only occur after:
a) All required approvals (including ISAC approval) have been obtained;
b) Data is determined to be anonymised as per CPRD’s Policy for Managing Anonymisation and the Risk of Identification in Observational Research;
c) Additional requirements on anonymisation relating to ONS death registration data have been met, with agreement from ONS (see below);
d) Researchers are provided with robust contracts defining terms of use relating to secure access, retention and destruction of the data; and
e) Access to data sub-licensed by CPRD, having been provided by NHS Digital, the Office for National Statistics or any other Data Controller or Custodian, is granted under terms compatible with the terms under which the data is provided to CPRD.

With regard to release of ONS death registration data:
  • ONS death data provided by NHS Digital is stored separately by CPRD and interrogated on a case by case basis to assess the scientific value of research applications;
  • Sub-national geographic data are not provided to researchers without additional review and approvals from ISAC and, where relevant, HRA CAG;
  • As standard, CPRD match each GP practice postcode to a larger geographical area aligned with the historical NHS Strategic Health Authority boundaries, ensuring an underlying population size of at least 2 million persons.
The GP practice postcode and the hospital or other institutional identifier are not released;
  • The ISAC review includes a risk assessment of patient re-identification, and if appropriate research applicants are required to outline risk mitigation plans;
  • CPRD’s Policy for Managing Anonymisation and the Risk of Identification in Observational Research sets out CPRD’s policy for the release for publication of data relating to small cell counts;
  • Restrictions on the number of stratified analyses are imposed in the case of research proposals investigating rare diseases or treatments to minimise the risk of re-identification;
  • ISAC approvals only allow exact ONS dates of death for use in calculating the time to death from a given event of interest (e.g. a particular diagnosis) for the purpose of survival analyses and where there is a clear benefit to public health from the proposed research; and
  • Researchers are also contractually bound to maintain patient anonymity and prevent inadvertent re-identification of patients.

In accordance with guidance from the Information Commissioner’s Office (ICO), CPRD does not permit personal data to be processed outside its own servers, and hence any such data is retained within the EU.

6) Data Access Management
CPRD processes patient data and makes it available internally to CPRD researchers. To control third-party access to linked data and minimise data released to third parties, the CPRD Observational Research Team will extract datasets for researchers against a query specification or primary care data defined cohort. The query and its output content will be agreed with the researcher prior to generation of the data sets. This is the only process by which applicants to CPRD may access linked data. Hewlett Packard and Sungard are recorded as data storage addresses because, for the purposes of this application, Sungard is considered to be the initial back-up and recovery site and Hewlett Packard the 'back-up to the back-up'.
They are not involved in processing of the data in any way (Sungard provide a facilities management and site management service). CPRD have confirmed that neither HP nor Sungard has access to the server (neither administrative nor user rights).

7) Information Security Measures
CPRD is part of a wider Government agency (the MHRA) and conforms to the 10 National Data Guardian data security standards as well as to NHS Digital requirements. The MHRA meets NHS Information Governance Toolkit standards on information security, and details on standards and arrangements are set out in CPRD’s approved System Level Security Policy (SLSP). CPRD operates to a high standard to ensure that data is protected when it is transmitted and/or stored. All data in CPRD is stored in a “Tier 3” data centre that is compliant with Government standards for managing and storing such important data. The measures are always under review and are subject to audit. Security measures include:
  • Multifactor authentication for access
  • Monitoring of access
  • Round the clock security staff presence
  • Robust firewalls and other access restrictions
A back-up store of the data (provided by named data processors) mirrors the above features but in an alternative location to allow for business continuity.

8) Data destruction and disposal
Data destruction standards (currently NHS Digital ‘Destruction and Disposal of Sensitive Data’ guidelines v3.2) will be met through planned implementation in MHRA of a Blancco LUN Eraser tool, to guarantee that sensitive data is properly erased and sanitized securely and permanently. This tool ensures compliance with industry standards and regulations, including PCI DSS, HIPAA, SOX, ISO 27001 and the EU General Data Protection Regulation, and the tool will be in place by August 2017.

9) Encryption
Encryption is used for data in transit between secure locations.
This will apply to both identifier data for linkage and clinical / research data. Although the clinical data is pseudonymised, there remains the residual risk of re-identification or the risk of inclusion of disclosive content, and data is only intended for processing by authorised recipients. Encryption mitigates the risk and provides assurance. The default minimum standard for encryption will be AES 256 using a complex pass-phrase consisting of 12 characters and a mix of upper case, lower case, numeric and special characters.

10) Training
All CPRD staff and licensed data users are appropriately trained and have the necessary understanding of the governance processes pertaining to relevant laws. They will also be aware that any misuse of data may result in disciplinary procedures and, in the case of a severe breach, dismissal and immediate removal from the premises. Training covering use of data is mandatory for CPRD staff and licence-holders prior to accessing data. CPRD staff who are responsible for the collection of data and interaction with site staff are precluded from access to data. Data is kept on restricted servers and drives accessible only to appropriately trained research staff. NHS Digital permits CPRD sub-licensees to share data with third parties subject to the third parties collaborating on the same research as the sub-licensee, and subject to the terms, checks and controls carried out by CPRD in relation to sub-licences. Details of such licences will be published and shared with NHS Digital.
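
The pass-phrase standard stated under 9) Encryption can be expressed as a simple check. The following Python sketch is illustrative only: it reads "consisting of 12 characters" as a minimum length of 12 (an assumption), treats "special characters" as the ASCII punctuation set (also an assumption), and uses a hypothetical function name. It validates a pass-phrase against the stated mix of character classes and does not perform any encryption itself.

```python
import string


def meets_passphrase_policy(passphrase: str, min_length: int = 12) -> bool:
    """Return True if the pass-phrase has the required length and character mix.

    Assumptions: `min_length` is a lower bound, and "special characters"
    means any character in string.punctuation.
    """
    return (
        len(passphrase) >= min_length
        and any(c.isupper() for c in passphrase)
        and any(c.islower() for c in passphrase)
        and any(c.isdigit() for c in passphrase)
        and any(c in string.punctuation for c in passphrase)
    )


if __name__ == "__main__":
    for candidate in ("Abcdefgh1!xy", "abcdefgh1!xy", "Abcdefgh1!x"):
        print(candidate, meets_passphrase_policy(candidate))
```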

Objectives:

CPRD is the UK’s pre-eminent research service, providing access to anonymised (in line with the ICO code of anonymisation) primary care data linked by NHS Digital to other similarly anonymised health data provided by NHS Digital and others for the purposes of public health research including the monitoring of drug safety. All such data is linked (in its identifiable form) by NHS Digital only. CPRD is jointly funded by the MHRA and the National Institute for Health Research (NIHR). CPRD’s aims are to support vital public health research and to inform advances in patient safety in the delivery of patient care pathways. These depend on access to accurate, real-time representative patient data to produce reliable evidence-based clinical and drug safety guidance. CPRD services are designed to maximise the way anonymised NHS clinical data can be used to improve and safeguard public health. For more than 20 years data provided by CPRD have been used in a range of drug safety and epidemiological studies that have impacted on health care, and resulted in over 1700 peer-reviewed publications. In addition to supporting high-quality observational research, CPRD is developing world-leading services based on using real world data to support clinical trials and intervention studies. The intention is to continue to link anonymised CPRD primary care data to NHS Digital’s secondary care and other datasets, as linkage greatly increases the scale, depth, completeness and therefore value of data available for public health research. The outputs of such research based on linked data in turn improve and protect patient care pathways/treatments and provide clinical benefits for the UK, supporting delivery of CPRD’s core objectives.
CPRD’s research and data services are based on a database of anonymised longitudinal primary care records contributed by consenting GP practices from the four UK nations, and on the ability to link primary care data to secondary care data (and other data sets) from the NHS, the Office for National Statistics (ONS) and Public Health England (PHE). One of CPRD’s main priorities is to increase the number of national data sets that are linked to primary care data and made available on a routine basis to the research community. Such collection and linkages occur under the appropriate permissions (ethical and s251), which have been granted to CPRD by the East Midlands – Derby Research Ethics Committee (REC) and the Health Research Authority (HRA). NHS Digital has been providing secondary and other data for linkage with CPRD primary care data for a number of years. Data linkage is carried out exclusively by NHS Digital as the Trusted Third Party (TTP) for this purpose. Linked data sets currently available include extracts from ONS Death Registration data; Hospital Episode Statistics (HES), which encompasses Admitted Patient Care, Critical Care, Outpatient and Accident & Emergency data; Patient Reported Outcome Measures (PROMs); Diagnostic Imaging Dataset (DID); Mental Health data; National Cancer Registry; and Deprivation data including Townsend Score and Index of Multiple Deprivation. Critical care is supplied as a separate dataset by NHS Digital, but is integrated with Admitted Patient Care. Data can only be used for public health research purposes in research recommended for approval by ISAC for MHRA database research. CPRD make the final decision on access, and ensure compliance with NHS Digital’s requirements within the data sharing agreement, including (e.g.) security of the third party. Access to CPRD data and services will not be permitted in circumstances that may result in loss of public trust or for activities that may undermine the integrity of the CPRD database.