NHS Digital Data Release Register - reformatted

Genomics England

Project 1 — DARS-NIC-12784-R8W7V

Opt outs honoured: No - consent provided by participants of research study, No - data flow is not identifiable (Reasonable Expectation, Consent (Reasonable Expectation))

Sensitive: Sensitive, and Non Sensitive

When: 2016/04 (or before) — 2020/01.

Repeats: Ongoing, One-Off

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC, Health and Social Care Act 2012 – s261(2)(c)

Categories: Identifiable, Anonymised - ICO code compliant


  • Hospital Episode Statistics Accident and Emergency
  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Outpatients
  • Hospital Episode Statistics Critical Care
  • MRIS - List Cleaning Report
  • MRIS - Flagging Current Status Report
  • MRIS - Cause of Death Report
  • Mental Health and Learning Disabilities Data Set
  • Mental Health Minimum Data Set
  • Bridge file: Hospital Episode Statistics to Diagnostic Imaging Dataset
  • Diagnostic Imaging Dataset
  • Bridge file: Hospital Episode Statistics to Mental Health Minimum Data Set
  • Patient Reported Outcome Measures (Linkable to HES)
  • MRIS - Cohort Event Notification Report
  • MRIS - Members and Postings Report
  • Mental Health Services Data Set


The aim is to create a new genomic medicine service for the NHS, transforming the way people are cared for. Patients may be offered a diagnosis where there wasn’t one before. In time, there is the potential for new and more effective treatments. The project will also enable new medical research. Combining genomic sequence data with medical records creates a ground-breaking resource. Researchers will study how best to use genomics in healthcare and how best to interpret the data to help patients. The causes, diagnosis and treatment of disease will also be investigated. We also aim to kick-start a UK genomics industry. This is currently the largest national sequencing project of its kind in the world.

Genomics England is seeking to obtain information from participants’ medical records that spans their entire lifetime. The DNA sequence, information from patients’ health records and any other information given to the Project will be collected and stored securely by the Project as a resource for use by approved researchers for future scientific and medical purposes during the life and after the death of participants. Diagnoses arising from the sequencing and analysis of participants’ DNA are already being fed back to participants, many of whom are receiving a diagnosis for the first time.

Genomics England’s legacy will be a genomics service ready for adoption by the NHS, high ethical standards and public support for genomics, new medicines, treatments and diagnostics, and a country which hosts the world’s leading genomic companies. It is a bold ambition with benefits for all.

Yielded Benefits:

As of December 2017, over 41,000 genomes had been sequenced. Participant stories can be found at: https://www.genomicsengland.co.uk/alexs-story/

Genomics England has built upon its commitment to lead on the Government’s technology and innovation agenda by forging partnerships with industry. Examples of this include a new industry collaboration with leading life sciences companies Inivata and Thermo Fisher Scientific to improve understanding of cancer.

Public Health England has announced that Whole Genome Sequencing (WGS) is now being used to identify different strains of tuberculosis (TB). This is the first time that WGS has been used as a diagnostic solution for managing a disease on this scale anywhere in the world. The technique, developed in conjunction with the University of Oxford, allows faster and more accurate diagnoses, meaning patients can be treated with precisely the right medication more quickly.

Genomics England has now engaged the devolved nations and is recruiting participants from Scotland and Wales.

Update May 2018:

Over 60,000 genomes have now been sequenced and over 12,000 clinical reports have been issued to NHS Genomic Medicine Centres. Thirty disease and cross-cutting research domains have had their plans approved and now have access to 100,000 Genomes Project data. The number of users with access to the Genomics England Research Environment is now over 1,300.

Twelve publications have arisen from or refer to the 100,000 Genomes Project during the last year, including:

  • The 100,000 Genomes Project: bringing whole genome sequencing to the NHS. Clare Turnbull et al. BMJ 2018; doi: https://doi.org/10.1136/bmj.k1687 (24 April 2018)
  • Identification of rare sequence variation underlying heritable pulmonary arterial hypertension. Nicholas W. Morrell et al. Nature Communications 2018;9; doi:10.1038/s41467-018-03672-4 (12 April 2018)
  • Introducing genomics into cancer care. Sue Hill. Br J Surg 2018;105(2):e14–e15 (17 January 2018)
  • Missense variants in the X-linked gene PRPS1 cause retinal degeneration in females. Alessia Fiorentino, Kaoru Fujinami, Gavin Arno et al. Hum Mutat 2017; doi:10.1002/humu.23349 (17 October 2017)

See https://www.genomicsengland.co.uk/category/updates/ and https://www.genomicsengland.co.uk/about-gecip/publications/ for details of news and publications.

Genomics England created the Discovery Forum in July 2017 to build on the work of the GENE Consortium. The Discovery Forum provides a platform for collaboration and engagement between Genomics England, industry partners, academia, the NHS and the wider UK genomics landscape.

Expected Benefits:

The overall benefits realisation for the project is established by the Department of Health (DoH). Each individual research study will have its own specific aims and benefits that underpin the DoH benefits. The 10 key benefits have been drafted as:

1. It is anticipated that many of the circa 20,000 patients with rare diseases who provide their genomes for sequencing as part of the Project will receive a formal diagnosis for the first time.
2. The speed of processing the data from Whole Genome Sequences should be greatly increased, with an associated acceleration of diagnosis: something that previously took years to identify should, under the Project, be possible in a few months.
3. It is hoped that genomic diagnosis as a result of the Project will enable clinicians to make cancer treatment more personalised by determining how effective treatments like Herceptin or radiotherapy are likely to be. This will improve the effectiveness of treatments and may provide financial savings.
4. Although not all patients involved in the Project will benefit from a significant improvement in their own condition, for most the benefit will be in knowing that they will be helping people like them in the future.
5. The Project has already identified issues with the current approach for collecting DNA from cancer tumours. A current study within the Project is looking at identifying optimum methods for collecting DNA from cancer tumours. This is something that has previously been incredibly difficult to do at scale and which is essential for high quality Whole Genome Sequencing.
6. As a result of the high standards of ethical practice and transparency underpinning the Project, the case will be made for collecting genomic data, linking it to the phenotypic data and sharing it in a controlled way with academics, researchers and industry.
7. The creation of NHS Genomic Medicine Centres will allow engagement and feedback to patients with rare diseases and cancer from the Project and will provide the infrastructure to bring about transformational change in the NHS so that it continues to deliver world-leading healthcare in the future.
8. As a result of the Project, the NHS and Public Health workforce will benefit from additional education in genomic medicine, including 550 places for an MSc in Genomic Medicine over the next 3 years, increased capacity in the scientific workforce, and a legacy of education and training in genomics for the future workforce.
9. The secure dataset of genomic and clinical data which is created as a result of the Project will enable clinicians, researchers and industry to discover new variants with a view to creating new diagnostics and treatments.
10. The Project will kick-start the development of the UK industry in Whole Genome Sequencing. The global genomics market was valued at an estimated £7.6 billion in 2013 and is expected to reach over £13 billion by 2018.


All outputs from research environments will be anonymised. The outputs will relate to the purposes described above for each of the research areas. Proof of concept outputs will be produced during the summer of 2015, with a move to researcher-created outputs from the autumn of 2015 onwards. The specific outputs are defined by the research groups and then verified as anonymous when an extract is requested.


Amendment - Genomics England has engaged the Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU) at the University of Oxford to act as a Data Processor. The Data Processor will provide data handling services related to the acquisition and cleaning of registry-based data provided by the HSCIC for the consented Genomics England participants. The University of Oxford will access identifiable record-level data in the performance of this function. The scope of data processing activities is limited. All data processing activities will be performed in the Genomics England Data Centre and no data will leave these servers. Genomics England will remain the Data Controller and remain responsible for all aspects of system security and access control. Oxford will not access data remotely or take any data away from the Genomics England Data Centre.

There are three principal stages of processing:

1. Data acquisition, cleansing, quality verification, linkage and de-identification
2. Identification of participant cohorts that meet research scope parameters
3. Data analysis for research using de-identified data

The first stage focuses on the acquisition of data and quality verification to ensure it is complete, accurate and compliant with the NHS data dictionary and other applicable data standards. The data is provided over a period of time (related to the treatment of participants) and associated with their longitudinal data from other NHS sources. The intention over the course of the Project is to link this data with other data, such as primary, secondary, social and participant-provided data. For this application the request is limited to HES Data. The richness of the high-quality data sets is crucial to the success of the 100,000 Genomes Project in delivering value to the NHS.
The evaluation of whole genome sequencing (WGS) data in the context of rich and extended phenotypes derived from electronic health records, such as blood pressure, cholesterol, glucose, and pharmacogenomics, adds significant value. The richness of the Project dataset will allow us to move beyond the primary phenotype of the rare disease, cancer or infectious disease that led to the patient’s enrolment to evaluate the WGS in the context of other continuous traits, diseases and response to therapy. As soon as the data completeness and quality have been confirmed, the data is de-identified, as all subsequent processing can be performed without direct identifiers. This de-identification is a key facet of the 100,000 Genomes Project.

The second stage is focused on the confirmation and approval of a valid research scope and on selecting a de-identified cohort of participants that fulfils the focus of the research request. The researchers will be members of a Genomics England Clinical Interpretation Partnership (GeCIP) or the GENE Consortium.

GeCIP. The overall aim of the Genomics England Clinical Interpretation Partnership (GeCIP) is to create a thriving, sustainable environment for researchers and clinical (NHS) disease experts. The activities of GeCIP will inform NHS feedback to clinicians and the multidisciplinary teams by providing enhanced data interpretation, additional information on pathogenicity of variants, and functional characterisation.

GENE Consortium. Genomics England is running an industry trial during the calendar year 2015. Twelve pharma, biotech and diagnostics companies have committed to invest monetary and FTE resources to understand how best to realise the value from working with Genomics England, our Bioinformatics Platform Partners and the wider NHS. Across the 100,000 Genomes Project, Genomics England will be at the forefront of life science programmes in the UK and worldwide.
For example, gene discovery in the 100,000 Genomes Project will create significant opportunities for scientific innovation and place particular emphasis upon national and international collaborations. Where possible, we will work with key international programmes including Development Disorders (DDD) and Orphanet, and complement the work of the International Rare Diseases Consortium (IRDC).

All research requests will be assessed to ensure they fall within the approved use purposes set out in the Genomics England Protocol and comply with the boundaries of the research group (Genomics England Clinical Interpretation Partnership or GENE Consortium). Each research request will be for a sub-set of the de-identified data, with the specific data requirements specified in the request. The researchers also declare any data they wish to bring into the environment and any tools they wish to use for analysis.

The third stage is the research analysis of the de-identified approved data sets in the virtual data centre environments. Researchers perform all the analysis and processing within the environments hosted by Genomics England; they do not extract de-identified data. Researchers will use pre-declared data and tools to perform their analysis. If researchers want to extract any anonymised results data, they must first place the results in a secure folder for anonymisation verification before they can be extracted.

A simplified view of the Genomics England Data Flow is shown below. Note the de-identified export boundaries into the Genomics England Core Research Repository.

Genomics England provides the HSCIC with a cohort for linkage and receives HES data from the HSCIC on a monthly basis. Every quarter Genomics England provides an updated cohort to the HSCIC, and the HSCIC provides the historical data for the extra cohort members. Because the existing cohort is already flagged with the HSCIC, Genomics England receives historical data only for the extra cohort members each quarter.
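
As an illustration only, the quarterly cohort refresh described above can be sketched as a set difference: members already flagged with the HSCIC are skipped, and only the newly added members are queued for a historical extract. The class name, method name and participant identifiers below are hypothetical and are not taken from any Genomics England or HSCIC system; the sketch also assumes the cohort only grows between quarters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CohortTracker:
    """Hypothetical tracker of participant IDs already flagged for linkage."""

    flagged: set[str] = field(default_factory=set)

    def quarterly_update(self, cohort: set[str]) -> set[str]:
        """Return the members needing a historical extract, then flag them.

        Members flagged in an earlier quarter are excluded, mirroring the
        rule that historical data is supplied only for the extra members.
        """
        new_members = cohort - self.flagged
        self.flagged |= new_members
        return new_members


# Example with made-up identifiers: two quarterly submissions.
tracker = CohortTracker()
first_quarter = tracker.quarterly_update({"P001", "P002"})
second_quarter = tracker.quarterly_update({"P001", "P002", "P003"})
```

In this sketch the first submission requests history for both initial members, while the second requests it only for the one member added since.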