NHS Digital Data Release Register - reformatted

University Of Liverpool

Project 1 — DARS-NIC-14337-J4N1T

Opt outs honoured: N

Sensitive: Non Sensitive

When: 2016/04 (or before) — 2018/02.

Repeats: Ongoing

Legal basis: Health and Social Care Act 2012

Categories: Anonymised - ICO code compliant

Datasets:

  • Hospital Episode Statistics Admitted Patient Care

Benefits:

The information that will be collected from this, the British Orthopaedic Surgery Surveillance (BOSS) study, has been planned in conjunction with patients, their parents and treating clinicians. The formation of a prospective SCFE database was a recommendation by NICE in their recent review of SCFE – the BOSS study is therefore meeting this demand. The information gained from the BOSS study will be the largest study undertaken into the rare disease of SCFE, and will yield information concerning the effects of different interventions on patient outcomes. This will have direct implications for the way that surgical care is delivered, with surgeons being able to benchmark their practice against others. The findings of the BOSS study will inform the feasibility of a clinical trial into interventions for SCFE (i.e. are there enough cases, enough surgeon engagement and enough variation/uncertainty in practice to warrant a trial?). A clinical trial would be the gold-standard means to ensure evidence based care is being delivered to patients. However, if a clinical trial is not feasible, the BOSS study will significantly enrich the current evidence to enhance patient care. The BOSS study group has biannual presentations; at the British Society of Children’s Orthopaedic Surgery (BSCOS) and at a national BOSS Collaborator meeting. BSCOS and the British Orthopaedic Association (BOA) have advocated that their members engage with the BOSS study. The results should therefore have direct, relevant and measurable impact on the clinicians involved. The BOSS study group have worked with members of the SCFE NICE review group to ensure that the findings of the study will address questions raised within the recent NICE review. The BOSS study will therefore have a direct effect on surgical practice and a positive impact on the care of patients. 
The BOSS study will begin recruitment during 2016 (with HSCIC to aid case identification), will collect outcomes at 2 years (from 2018) and will therefore report in early 2020.

Outputs:

(1) To determine the case mix of SCFE across the UK, the variation in surgical practice and clinical and radiographic outcomes up to 2 years. There will be a published report which will be sent to Trusts and Clinicians on an annual basis during the study, and the final report will be published no later than one year after completion of the study. (2) The results will inform the feasibility of a clinical trial into the surgical treatments for SCFE, and will inform NICE in relation to the guideline on surgery for SCFE. This final report will be published in peer-reviewed journals (e.g. the British Medical Journal) in line with NIHR expectations. (3) The protocol will be published during 2016, and the case mix and variations in surgical practice will be published in late 2017. Publication of 2-year follow-up will not be possible until 2019. NIHR funding for this study is for 5 years, so the project will be completed by 2020. If a clinical trial were to ensue, this would require sufficient case numbers, surgeon engagement, patient engagement and a well-balanced trial question. The BOSS Study is a good cost-effective mechanism by which to ‘test’ surgeon engagement, case numbers and begin to understand the variation in practice. This project will therefore inform the feasibility of all aspects of a trial.

Processing:

Data will be used as a reference from which to determine the completeness of cases reported by clinicians and prompt additional reporting. Data will be processed by a team within the clinical trials unit at the University of Liverpool. The BOSS study team will cross-check the HES identified case against the list of cases already reported to them by clinicians through the REDCap clinical trials platform. Cases will be identified only by age/sex of patient, date of admission, date of surgery and hospital. The team will make the assumption that no hospital will have more than one admission per day sharing the same details (this is a rare disease and even in larger children’s hospitals more than one case per day is unusual). If a case is not reported to them, the team will contact the clinician participating in the British Orthopaedic Surgery Surveillance (BOSS) study (all hospitals have identified such an individual with the support of the British Orthopaedic Association and the British Children’s Orthopaedic Association). The team will then ask the clinician to verify/refute the diagnosis. If the diagnosis is verified they will ask the clinician to submit anonymous details of the case through the REDCap clinical trials platform (this is in keeping with the successful UKOSS/ British Association of Pediatric Surgeons-Congenital Anomaly Surveillance System (BAPS-CASS) reporting systems). The BOSS Study has been granted nationwide NHS research approval (HRA-cohort 3), and national ethics approval.
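
The cross-check described above can be sketched as a comparison on the non-identifying fields listed (age, sex, date of admission, date of surgery, hospital). This is a minimal illustration only, not the study team's actual procedure: the `CaseKey` type, the `find_unreported` function and the example records are all hypothetical, and it relies on the stated assumption that no hospital has two admissions on the same day sharing the same details.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CaseKey:
    """Non-identifying fields used to compare a HES case with a REDCap report."""
    hospital: str
    age: int
    sex: str
    admission_date: date
    surgery_date: date


def find_unreported(hes_cases, redcap_cases):
    """Return HES cases that have no REDCap report with the same key."""
    reported = set(redcap_cases)
    return [case for case in hes_cases if case not in reported]


# Hypothetical example records (not real data).
hes = [
    CaseKey("Hospital A", 12, "M", date(2016, 5, 3), date(2016, 5, 4)),
    CaseKey("Hospital B", 13, "F", date(2016, 6, 10), date(2016, 6, 10)),
]
redcap = [
    CaseKey("Hospital A", 12, "M", date(2016, 5, 3), date(2016, 5, 4)),
]

for case in find_unreported(hes, redcap):
    # Each unmatched case would prompt a query to the nominated clinician.
    print(f"Query clinician at {case.hospital}: admission {case.admission_date}")
```

An exact match on all five fields is the simplest reading of the text. In practice a recording difference in any one field would make a reported case look unreported, which is one reason the clinician is asked to verify or refute the diagnosis before anything is submitted.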

Objectives:

The British Orthopaedic Surgery Surveillance (BOSS) study is a mechanism for researching the treatment of rare orthopaedic diseases within the UK. The methodology detailed in the study protocol is in keeping with similar successful studies of rare diseases: the UK Obstetric Surveillance System (UKOSS) in Obstetrics and Gynaecology, BAPS-CASS (British Association of Paediatric Surgeons – Congenital Anomaly Surveillance Study) and the BPSU (British Paediatric Surveillance Unit). In these studies, routine data (i.e. disease and anomaly registers) is often used to verify the completeness of case ascertainment – though this is the first time that HES has been used to attempt to augment case identification. The diseases of interest within the BOSS Study are Slipped Capital Femoral Epiphysis (SCFE) and Perthes’ disease. Both are rare hip diseases of adolescence. SCFE is always admitted to hospital for surgery, therefore the data captured within HES is likely to be good. Perthes’ disease is only occasionally admitted; therefore HES data is unlikely to be useful to identify cases of new disease. The data processing and analysis hereafter relates solely to SCFE. All orthopaedic units treating children within the UK are being asked to submit data to the service evaluation/audit (supported by Orthopaedic Specialist Societies, NICE and the National Clinical Director for Children). As of February 2016, over 150 UK hospitals have agreed to supply data to the study. Details of any new case of SCFE will be recorded by clinicians prospectively using the secure online REDCap clinical trials platform. All English hospitals have a nominated representative, and the University of Liverpool has made separate applications for the Scottish and Welsh data to the respective organisations. This data will be managed by the Liverpool clinical trials unit, who are overseeing the delivery of the BOSS Study.
HRA have given nationwide permission for sites to collect this data, without any additional local approvals and without patient consent (as data is anonymised and data forms part of routine care). The care offered to children affected by the diseases of interest varies considerably around the UK, and beyond. These variations exist owing to the beliefs by the surgeon to which treatment is best and the experience and the skill that the local surgical team can provide. By adequately documenting variations in disease and surgical practice and recording the outcomes in current care, only then can one begin to influence change to improve care across the UK. HES data will be used to ensure maximum case ascertainment is achieved within the study, to ensure the generalisability of the results. When a case of disease is identified within HES (by ICD code) during the study period, the BOSS team in Liverpool will be notified. HSCIC will share with the BOSS team the age and gender of patient, date of admission, date of surgery and hospital. No unique identifiers will be captured; therefore the patient will not be identifiable except to the treating clinicians. The data supplied by HES will then be used to check completeness of the REDCap database (i.e. that supplied prospectively by clinicians) against the HES record. In the event that a case is identified within HES, but has not been reported through RedCap, the nominated surgeon-lead at the relevant hospital will be contacted to ask them to determine the validity of the diagnosis, and if appropriate, formally report the details of the case through REDCap. If the BOSS surveillance mechanism is successful, it will be expanded to other diseases. The intention is that the data generated may serve as stand-alone service improvement, and may generate feasibility research for future clinical trials of treatment interventions.


Project 2 — DARS-NIC-147982-J7KGV

Opt outs honoured: N

Sensitive: Non Sensitive, and Sensitive

When: 2016/04 (or before) — 2017/02.

Repeats: Ongoing

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC

Categories: Identifiable

Datasets:

  • MRIS - Flagging Current Status Report
  • MRIS - Cause of Death Report

Objectives:

The data supplied by the NHSIC to Cancer Research Centre will be used only for the approved Medical Research Project MR1025.


Project 3 — DARS-NIC-16656-D9B5T

Opt outs honoured: N

Sensitive: Non Sensitive

When: 2016/09 — 2018/02.

Repeats: Ongoing

Legal basis: Health and Social Care Act 2012

Categories: Anonymised - ICO code compliant

Datasets:

  • Hospital Episode Statistics Outpatients
  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Accident and Emergency

Benefits:

Benefits from reviewed journal papers and related analysis. 1. The impact of trends in gastrointestinal infections on health care utilisation. Analysis indicating the impact of gastrointestinal (GI) infection trends on health care utilisation and the extent to which this is mediated by socioeconomic and health service related factors, will indicate how targeted interventions that reduce GI infections and actions that influence the health seeking behaviour of people with GI could reduce healthcare usage. Alongside this analysis Liverpool are working with local public health and environmental health teams to develop targeted interventions to reduce inequalities in the causes and consequences of gastrointestinal infections. This analysis will inform the development of these interventions leading to more effective approaches. For example this could include actions to support parents caring for children with gastrointestinal infections and promoting alternatives to A&E by enhancing support through pharmacies and primary care. 2. The environmental determinants of health care utilisation. This analysis will identify the extent to which environmental factors, such as air pollution, flood risk, housing quality and fuel poverty influence health care utilisation and inequalities in these effects by area deprivation. Previously strategies to manage demand for health care services have focused on service redesign rather than environmental determinants of health. This analysis will be used to develop strategies with local partners to reduce demand for health care by addressing important environmental determinants. The results will indicate the potential savings to the NHS from investment in initiatives to reduce fuel poverty or improve air quality, for example. This will then lead to benefits both through improving health and reducing preventable health care costs. 3. The effect of changes in social care funding and welfare reform on health care utilisation. 
This analysis will indicate the effect of changes in social care funding and welfare reform on health care utilisation and the factors that might mitigate these effects. Funding for social care is currently being reduced relative to demand, and major welfare reforms are being introduced, however, it is not currently known what effect this is having on healthcare utilisation. The analysis will indicate the potential costs to the health service of these policies. It will inform national policy debates about the costs and benefits of different approaches to welfare reform and the allocation of resources for health and social care services. It will help identify the characteristics of local systems that are more resilient to these changes – enabling the development of local health, social care and welfare systems that can better improve health and reduce health inequalities. 4. The health inequalities impact of initiatives to promote neighbourhood resilience. This analysis will indicate the health inequalities impact of a number of local initiatives that aim to promote economic, environmental and social resilience in disadvantaged neighbourhoods in the North West. These include initiatives to improve housing, increase financial security, reduce social isolation and improve public involvement and governance. This will indicate what works and provide evidence for local authorities across the country helping them develop initiatives that promote resilience, improve health and reduce inequalities. 5. What components of resilience have the greatest impact on health. The University have developed a model of resilience with local authorities in the North West that focuses on economic, environmental, social and governance systems. However it is not yet known what the relative impact of these components is on health and health inequalities. 
Analyses will indicate the health gains that could be expected for investments in different components of this model and the interactions between them. This will enable the more efficient use of resources to develop more resilient systems that reduce health inequalities. 6. The impact on health care utilisation of new models of out of hospital treatment and care and community orientated primary care. There are currently a large number of new models of out of hospital treatment and care being developed across the country, particularly as part of the Vanguard programme. New initiatives are often overlaid on top of and interact with existing programmes and wider system changes. The NHS and local authority partners of the NWC CLAHRC have identified this as a priority for the research programme supported by the NWC CLAHRC over the next 3 years. Analyses will identify the components of new models of care along with wider system changes that appear to be effective, both within primary care and at the interface of primary, secondary and social care. Analysis will particularly focus on how these effects differ across socioeconomic groups and interact with the social and environmental determinants of health. This will support the development of out of hospital care that addresses inequalities and improves health whilst reducing healthcare utilisation. For example, this could include new approaches for incorporating wider social support in general practice through the third sector or identifying the key components for the effective integration of health and social care teams. 7. Predicting adverse trends in neighbourhood health. Increasingly, health and social care systems are using risk prediction and stratification methods to target resources and interventions. These have tended to use individual risk factors and model risk at the individual level. This tends to neglect the impact of environmental and area-based determinants of health outcomes. 
This paper will outline the methods used to develop a risk prediction model that is based on neighbourhood level analysis, incorporating a broader set of individual and environmental determinants than models based solely on individual risk factors. Publishing the methods for producing the model will enable the robust development of a tool that local authorities and NHS organisations can use to target the right actions at the right risk conditions in the right neighbourhoods to most effectively improve health and reduce health service demand (see below). Conferences/presentations. 1. NIHR HPRU annual conference – 2017 This presentation will be used to disseminate the early results from the analysis for Paper 1 to an audience of NHS, Health Protection and Environmental Health practitioners. This will enable them to develop more effective approaches that reduce the impact of gastrointestinal infections in disadvantaged neighbourhoods. 2. European Public Health Association Conference – 2017 This presentation will be used to disseminate and discuss the early results from the analysis for paper 2 to an international audience of public health practitioners, policy makers and academics. This will enable them to make the case for investment in and development of strategies to reduce demand for health care by addressing important environmental determinants of health. It will also stimulate cross-country learning about effective approaches to reduce environmental determinants of health, leading to improved public health policies. 3. Public Health England Annual Conference and Local Government Association Conferences – 2018 These conferences will be used to present early findings from the analysis for papers 4 and 5 to audiences of public health practitioners, other local authority professionals and local government policy makers. 
This will enable them to make evidenced based decisions about how scarce resources are invested locally in actions to improve the social determinants of health. For example this could indicate whether investment in employment services is likely to be more or less effective than investment in services to reduce social isolation and which are likely to be the important components of these initiatives that increase effectiveness. 4. Annual Primary Care Conference - 2018. This conference will be used to present findings from Papers 6 & 7 to an audience of GPs, Commissioners and other health care professionals – demonstrating the impact of new models of out of hospital care that have been developed in the North West. This will enable other regions to learn about what works for which patient groups enabling the sharing of best practice and the improvement of health and social care services. Policy and practice briefings 1. Developing resilient neighbourhoods. This will synthesise the results from the analysis outlined for papers 4 and 5 above with other research being carried out through the NWC CLAHRC on neighbourhood resilience- including systematic reviews of the evidence and qualitative research in the intervention neighbourhoods. It will provide practical advice for local government organisations indicating approaches that are likely to be effective at promoting resilience and addressing the social determinants of health. This will lead to more effective local government policies and activities that deliver greater health benefits than would otherwise be the case. 2. New models of out of hospital treatment and care, what works for whom? This will synthesise the results from the analysis outlined for papers 6 and 7 above with other research being carried out through the NWC CLAHRC on out of hospital care including systematic reviews of the evidence and qualitative research in the intervention neighbourhoods and GP practices. 
It will provide practical advice for NHS and local government organisations indicating approaches to out of hospital care that are likely to be effective at reducing health inequalities and reducing demand for health and social care services. Importantly it will identify which components are likely to be particularly effective in deprived neighbourhoods and which approaches risk widening health inequalities. 3. Using neighbourhood predictive modelling to plan and target prevention. This will provide a practical guide for local government and NHS organisations to use the neighbourhood risk model developed through this project to better target resources and adapt services to local needs. This will lead to benefits through the development of more appropriate local services. Other outputs 1. Construction of longitudinal panel dataset of neighbourhood indicators with linked socioeconomic data. This dataset will be a resource that will be used by a number research projects within the NWC CLAHRC for the purposes outlined in this application. Statistical code used to develop the indicators will be made available to other researchers and the longitudinal panel dataset could also be made available more broadly for research that benefits health and social care. As outlined above where possible and following risk assessment and guidance from the HSCIC these data will be made available as Open Data. The National Institute for Health Research and the Medical Research council have recognised the need for more research that uses routine datasets such as this to evaluate the impact of public policies as “natural experiments”. This work will provide a major advance in these methods and data resources to support them leading to benefits to patients and the public through the rapid evaluation of public policies that have an impact on health. 2. Predictive modeling tool freely available to local authority and NHS organisations. 
The predictive modeling interface will enable local authorities and NHS organisations to better target resources and adapt services to local needs. This will lead to the more efficient and effective use of resources leading to health benefits for patients and the public. 3. Web based Neighbourhood Resilience Interface developed. The development of this freely available interface will support community groups and residents in disadvantaged neighbourhoods to identify local needs; monitor progress and advocate for change. This will lead to improved and more effective local services, it will support local community groups in making the case for funding in disadvantaged areas leading to increased investment.

Outputs:

Planned journal submissions for publications. At least 8 publications in high impact peer reviewed journals are expected from this work. These are outlined below. Paper 1. The impact of gastrointestinal disease trends on health care utilisation and the extent to which these are mediated by socioeconomic and health service related factors - Lancet Infectious Diseases - January 2018. Paper 2. The environmental determinants of health care utilisation and inequalities in these effects by area deprivation - International Journal of Epidemiology - January 2018. Paper 3. The effect of changes in social care funding and welfare reform on health care utilisation 2010 and 2017; are some places more resilient than others? - British Medical Journal - January 2018. Paper 4. The health inequalities impact of initiatives to promote neighbourhood resilience - American Journal of Public Health - January 2019. Paper 5. What components of resilience have the greatest impact on health - the implications for inequalities - Journal of Epidemiology and Community Health - January 2018. Paper 6. The impact on health care utilisation of new models of out of hospital treatment and care - British Medical Journal - January 2018. Paper 7. The impact on health care utilisation of community orientated primary care - British Medical Journal - January 2018. Paper 8. Predicting adverse trends in neighbourhood health - American Journal of Public Health - April 2019. The findings from the research will be disseminated through the following conference presentations: NIHR HPRU annual conference - 2017; European Public Health Association Conference - 2017; Public Health England Annual Conference - 2018; Local Government Association Conference - 2018; Annual Primary Care Conference - 2018. Policy and Practice Briefing papers. The University of Liverpool will produce a series of freely available briefing papers directed at practitioners, commissioners and policy makers in local government and NHS organisations. 1. 
Developing resilient neighbourhoods. 2. New models of out of hospital treatment and care, what works for whom? 3. Using neighbourhood predictive modeling to plan and target prevention. Other Outputs. Longitudinal panel dataset of neighbourhood indicators. The initial product of this project will be a longitudinal panel dataset of neighbourhood indicators. This will initially be used by research groups within the NIHR CLAHRC NWC as outlined above. Where possible and following risk assessment and guidance from the HSCIC these data will be made available as Open Data. Where necessary this will involve removing sensitive indicators and aggregating indicators to higher geographies to ensure anonymity is maintained. Open Data available by September 2018. Predictive modeling tool. As outlined in the analysis section for Objective 3, a predictive model will be developed that can be used by local government and NHS organisations to predict those areas that are most likely to experience adverse trends in health outcomes and health care utilisation in the future. An online interface will be developed that enables local authorities to use this model to visualise and identify high-risk neighbourhoods. This will be made freely available for use by local government and NHS organisations. Available January 2019. Web based Neighbourhood Resilience Interface. As outlined above, web based presentations of the longitudinal panel dataset of neighbourhood indicators will be developed that enable local groups to interact with the data, including mapping data, comparing neighbourhoods and visualising trends over time. This will support community groups to identify local needs, monitor progress and advocate for change, promoting transparency and accountability. This will be freely and publicly available. Developed January 2019. 
All outputs will be risk assessed for the potential of re-identification and will only include aggregate data with small numbers suppressed in line with HES analysis guidance.
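
Small-number suppression of aggregate counts can be sketched as follows. This is an illustration only: the threshold of 5, the choice to leave zero counts unsuppressed, and the function and area names are assumptions made for the example, not values taken from this application. The rules actually applied would be those of the HES analysis guidance in force.

```python
SUPPRESSION_THRESHOLD = 5  # assumed value; the applicable guidance sets the real one


def suppress_small_counts(counts, threshold=SUPPRESSION_THRESHOLD):
    """Replace counts between 1 and `threshold` (inclusive) with None.

    Zero counts and counts above the threshold are returned unchanged
    (keeping zeros is an assumption made for this sketch).
    """
    return {
        area: (None if 0 < count <= threshold else count)
        for area, count in counts.items()
    }


# Hypothetical area labels and counts (not real data).
admissions = {"LSOA_A": 3, "LSOA_B": 12, "LSOA_C": 0}
print(suppress_small_counts(admissions))
```

This sketch covers primary suppression only. Where totals are published alongside the cells, a suppressed value can sometimes be recovered by subtraction, so a fuller treatment would also need secondary suppression or rounding.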

Processing:

Step 1. Indicator development. In the first step of data processing indicators will be developed for each Lower Super Output Area (LSOA) in England from 2004-05 to 2017-18. The data request has been limited to these years as this is the minimum number of years that is sufficient to measure change over time within neighbourhoods. This process will involve a number of stages to develop robust indicators which are likely to be sensitive to socioeconomic and environmental change, national social and welfare policy changes and local health and social care redesign initiatives. Initially the University of Liverpool are developing theoretical models for the exposures and interventions being investigated. These outline the likely mechanisms through which these factors are likely to have an impact on hospital activity. As well as developing theoretical models of the impact of national socioeconomic, environmental and policy changes, Liverpool are working with local stakeholders to identify, prioritise and develop models for local NHS and Council initiatives. These will then be used to identify candidate indicators that are likely to be affected by these changes and initiatives. Indicator definitions will be developed and the data quality and precision tested. Categories will be refined and time periods pooled to provide sample sizes within each cell that give estimates that are sufficiently precise and comply with the HSCIC Small Numbers Policy / HES analysis guide. The reliability and validity of indicators will be investigated by testing the association between candidate indicators and other measures of similar constructs from different data sources. In particular, indicators will be compared to measures derived from a household health survey, which has been carried out across neighbourhoods in the North West. Indicators will then be refined in consultation with local NHS and Local Authority stakeholders. 
It is likely that the indicators will include measures of particular groups of morbidities (e.g chronic conditions, mental health or alcohol related conditions, accidents), some will be age specific (e.g asthma admissions in children, accidents on children, falls amongst older people), some will be limited to particular admission type (e.g emergency admissions for particular chronic conditions) some will be directly related to processes of care – e.g delayed discharge, length of stay etc). Where relevant indicators will be replicated at higher geographies and by GP practice. Step 2 – Matching and linking LSOA level data. In Step 2 data will be matched at the LSOA level to other national datasets indicating socioeconomic change, national social and welfare policy changes, environmental changes, morbidity trends and uptake of local authority and NHS initiatives. These datasets only include pseudonymised data and do not include any personal data, and linkage will only occur at the area level minimizing the risks of re-identification due to data linkage. National and local small area datasets that will be used alongside neighbourhood level indicators derived from HES data: National Datasets. • Modelled LSOA level prescribing data • LSOA population estimates • Housing overcrowding data (census) • Modelled LSOA air quality indicators for 2001, 2005, 2008 and 2012 • Crime data by LSOA • Economic activity • Self-reported health (census) • DWP statistics on the number of claimants of welfare benefits by LSOA • The number of laboratory reports for gastrointestinal infections by LSOA • Flood warning areas mapped to LSOA • Density of fast food and alcohol outlets, access to green spaces, • Housing quality indicators • Small area fuel poverty indicators. Local datasets. 
• Number of people receiving emergency food from food banks by LSOA (local authority) • Number of people attending swimming / gym activities by LSOA (local authority) • Number of people receiving social care services by LSOA (local authority) • Number of people requesting debt/financial/housing/welfare advice by LSOA (local authority) • Numbers accessing credit unions (local authority) • Local authority licensing data (local authority) This will result in a longitudinal panel dataset of neighbourhood indicators of hospital activity and potential determinants of health and health care use. To achieve Objective 2, LSOAs within this dataset will then be mapped to areas involved in a number of area-based interventions in the North West of England. The Collaboration for Leadership in Applied Health Research North West Coast (CLAHRC NWC) is working with the NHS, Local Government organisations and residents to prioritise existing interventions, to develop and change those based on evidence, and to evaluate their impact on health and health inequalities. These include health and social care service redesign initiatives as well as initiatives that aim to promote the resilience of local economic, social, environmental and governance systems. GP practice codes will also be mapped to groups of GP practices involved in health and social care redesign initiatives that are targeting GP registered populations rather than particular neighbourhoods. These intervention areas will then be matched with both national and regional (NW) control areas with similar characteristics, in order to evaluate the impact of these interventions on health outcomes and health service use. Step 3 Analysis. Objective 1 – Nationwide analysis. Analysis for Objective 1 will use the longitudinal panel dataset for the whole country. 
Longitudinal analysis methods will be used to investigate the association between socioeconomic changes, welfare policy changes, environmental changes and infectious disease trends within neighbourhoods and changes in indicators of health service utilisation. Mediation and interaction analysis will then investigate whether these effects are modified by other neighbourhood characteristics – e.g area deprivation, characteristics of the physical environment, health and social care services, local governance arrangements. Objective 2 – Evaluations of local initiatives Analysis for Objective 2 will use the longitudinal panel dataset for local intervention areas alongside data from national and regional matched control areas to evaluate the impact of interventions whilst controlling for the contextual and national trends identified through the analysis for Objective 1. Objective 3 – Predictive models. This analysis will use the findings from Objectives 1 and 2 in multivariable analysis to develop predictive models of the modifiable factors driving adverse health trends and increases in demand for health services at the neighbourhood level. These will be developed to not only identify neighbourhoods at high risk, but also to predict those areas that are most likely to experience adverse trends in health outcomes and health care utilisation in the future. Working with local government and NHS organisations the University of Liverpool will develop and evaluate approaches for the practical application of these predictive models to support the more effective use of local resources. Objective 4 - Community led approaches for monitoring progress on health inequalities at the neighbourhood level. A selection of the indicators from the aggregate longitudinal panel dataset will be developed in order that they can be made publically available as Open Data (see below controls in place to minimise risks). 
Working with a network of community organisations who are part of the NWC CLAHRC Community Researcher and Engagement Network (COREN), these indicators will be used to test out new community led approaches for monitoring progress on health inequalities at the neighbourhood level. This will involve the development of web based presentations of data that would enable local groups to identify local needs, monitor progress and advocate for change, promoting transparency and accountability. Data governance, management and controls in place for data access and procedures to minimise the risk of re-identification. The Integrated Longitudinal Research Resource. The usage of the HES data included in this request and the other small area datasets will be managed through the Integrated Longitudinal Research Resource (ILRR). The ILRR is a data management resource at the University of Liverpool established by the NIHR CLAHRC NWC in collaboration with the NIHR Gastrointestinal Health Protection Research Unit (GIHPRU) and the Consumer Data Research Centre (CDRC). The ILRR includes a dedicated Data Scientist, secure servers and robust policies for data sharing and data usage. The ILRR is overseen by a governance board, which approves access to data for specific usages based on criteria specific to each dataset. The governance board includes representatives from the NIHR CLAHRC NWC, NIHR GIHPRU and CDRC, NHS and Local government partners, a public advisor and an NHS information governance expert. Controls in place for managing access to the HES data in this request. Only ILRR data scientists based at the University of Liverpool will have access to the record level HES data included in this request. No third party will have access to the record level data. The HES data included in this request and the panel of aggregate longitudinal neighbourhood indicators derived from that data will be consistently documented, catalogued and coded and stored in a secure SQL server database.
Only aggregate data with small numbers suppressed in line with the HES analysis guide will be made available to other researchers. This aggregated small area data will still be treated as safeguarded data, with specific data items only being made available to researchers as needed for specific analysis plans, and with data only released after any risks of re-identification have been assessed and mitigated by ILRR data scientists. Access to the aggregated panel dataset of neighbourhood indicators will be limited to research groups that are part of the NIHR NWC CLAHRC (unless data is made available as Open Data – see below). These research groups include academic researchers from Liverpool, Lancaster and Central Lancashire Universities as well as analysts from NHS and Local Government organisations. Each group of researchers will outline a detailed analysis plan relating to each of the Objectives above, describing which aggregate indicators of hospital activity they require access to and which indicators related to socioeconomic change, national social and welfare policy changes, environmental changes, morbidity trends and local area based local authority and NHS interventions. Each of these detailed analysis plans will be reviewed by the Integrated Longitudinal Research Resource (ILRR) governance board. Data will be released only if it is to be used according to the purposes outlined in this application. Only aggregate data limited to the variables required for the specific analysis of each group will be released. Each request will be assessed by an experienced Data Scientist to identify whether there are any risks of data being re-identified as a result of the linkage with other data sources, and to mitigate these risks. This risk assessment will be based on the procedures outlined in the Anonymisation Standard for Publishing Health and Social Care Data Specification.
None of the datasets that will be used to develop linked LSOA indicators include any personal data, therefore the risk of re-identification due to data linkage is low. Open data. As outlined under Objective 4 the aim is to develop a selection of the aggregate indicators derived from HES data so that they could be released as Open Data. The risk of re-identification for each of these indicators will be assessed using the procedures outlined in the Anonymisation Standard for Publishing Health and Social Care Data Specification and measures taken to ensure the risk of re-identification is low enough to allow public release. For example this could involve aggregating these indicators to the ward level (average population size 10,000), rather than the LSOA level, or pooling data over a number of years. The HSCIC will be consulted before any indicator is released under the Open Government License. These Open Data aggregate indicators will then be used in work with a network of community organisations and members of the public who are part of the NWC CLAHRC Community Researcher and Engagement Network (COREN), to involve members of the public in identifying local needs, monitoring progress and advocating for change to improve services. The role of the institutions involved in these grants. The University of Liverpool (UoL) will be the sole data controller and data processor for this application and all record level data will be processed at the UoL. Only data scientists based at the UoL and employed by the UoL will have access to the record level data. The ILRR governance board, which includes representatives from the NIHR CLAHRC NWC, NIHR GIHPRU, CDRC and local NHS and LA organisations, will oversee procedures and processes for accessing the small area aggregate level data derived from the record level data, and assess and approve requests from research groups to use this data.
These research groups will only have access to aggregate datasets that have been risk assessed by data scientists at UoL and comply with HES small number analysis guidance. These research groups will include partners who are members of the NIHR CLAHRC NWC collaboration, including researchers from Liverpool, Lancaster and Central Lancashire Universities, as well as analysts from local NHS and Local Government organisations. As is required by the NIHR, the research from this project will be published in peer-reviewed journals that are compliant with the NIHR policy on Open Access.
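The aggregation and small-number suppression described above can be illustrated with a minimal sketch. It assumes record-level rows that carry only an LSOA code and a year; the field names, the example LSOA codes and the threshold of 5 are illustrative assumptions, not values taken from this application or from the HES analysis guide.

```python
from collections import Counter

# Hypothetical threshold: counts from 1 up to this value are withheld.
# The real rule is whatever the HES analysis guide specifies.
SMALL_NUMBER_THRESHOLD = 5


def build_panel(records):
    """Count records per (lsoa, year) to form an aggregate panel."""
    return Counter((r["lsoa"], r["year"]) for r in records)


def suppress_small_numbers(panel, threshold=SMALL_NUMBER_THRESHOLD):
    """Replace counts in the range 1..threshold with None before release."""
    return {
        key: (None if 0 < count <= threshold else count)
        for key, count in panel.items()
    }


# Made-up example rows: 12 admissions in one LSOA, 3 in another.
records = (
    [{"lsoa": "E01000001", "year": 2015}] * 12
    + [{"lsoa": "E01000002", "year": 2015}] * 3
)
panel = build_panel(records)
released = suppress_small_numbers(panel)
```

In this sketch the cell with 12 admissions is released unchanged and the cell with 3 admissions is withheld; pooling years or aggregating to ward level, as mentioned under Open data, would be applied before this step.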

Objectives:

HES data will be used to develop a longitudinal panel of neighbourhood (Lower Super Output Area - LSOA) indicators. These will be used to investigate the impact on health care utilisation of risk factors, policies and interventions. Analysis of this longitudinal panel will: 1. Investigate the impact across England of socioeconomic changes, national health and welfare policy changes, environmental changes and infectious disease trends on healthcare utilisation and whether there are neighbourhood level characteristics that modify these effects. Analysis will investigate inequalities between neighbourhoods in the consequences of these adverse trends and events. Analyses for this Objective will indicate the contextual factors driving adverse health outcomes and health service utilisation at the neighbourhood level. 2. Evaluate the impact of area based local authority and NHS economic, environmental, social, governance and service redesign activities on health outcomes and demand for health and social care services. 3. To develop predictive models of the factors driving adverse health trends and increases in demand for health services at the neighbourhood level, that can then be used by local agencies to better target resources at the root causes of ill-health and health service demand and the neighbourhoods most affected. 4. To develop new approaches for monitoring progress on health inequalities at the neighbourhood level and involving the public in using data to influence local services and policies - supporting Open Data initiatives to promote transparency and accountability.


Project 4 — DARS-NIC-19237-R3T6S

Opt outs honoured: N

Sensitive: Sensitive

When: 2017/03 — 2018/02.

Repeats: Ongoing

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC

Categories: Identifiable

Datasets:

  • MRIS - Flagging Current Status Report
  • MRIS - Cause of Death Report
  • MRIS - Members and Postings Report
  • MRIS - Cohort Event Notification Report

Benefits:

ONS Mortality and Cancer Registry Data have been received on a quarterly basis since 2012. This was vital information during the conduct of the trial as any deceased participants, or those diagnosed with lung cancer, were annotated on the database and marked as "Off Study" so that the UKLS project team would not contact them again to arrange repeat scans or complete follow up questionnaires. The trial has now finished but the follow up data on deaths and lung cancer diagnosis is still required. Once the outcome data is available, the success of the screening can be evaluated. This analysis will be provided to the UK National Screening Committee (UKNSC) to help inform decision-making as to whether a lung cancer screening programme should be implemented in the UK. The UKNSC will not be using only the UKLS analysis, but will also receive analysis from a larger lung cancer screening trial run in the Netherlands (NELSON - European Nederlands-Leuvens Screening Onderzoek). The NELSON Trial is due to report soon. A future exercise to combine UKLS and NELSON data is anticipated, but further data sharing will not be undertaken prior to approval from NHS Digital by means of a separate application.

Outputs:

Data already received from NHS Digital has not had any analysis undertaken, nor has any data been published, released or reported upon. Data received to date has been utilised to update the UKLS database to ensure University of Liverpool did not contact deceased individuals. The initial findings/conclusions of the trial data were published in the BMJ-Thorax Online First. This included methods, trial design, recruitment, randomisation, nodule management, number of cancers, treatment, cost effectiveness modelling. The full report of these aspects of the UKLS trial has also been published by the funder, National Institute for Health Research, Health Technology Assessment Programme (NIHR HTA). Both of these are available as open publications. Future specific outputs, anticipated June to December 2017 in the form of presentations and publications in peer-reviewed journals, will be added to the UKLS website. One specific output will be a report to the UK National Screening Committee on the cost effectiveness and mortality benefit of introducing a lung cancer screening programme into the UK. Prior to this report it will be necessary for the statistician to analyse the data on causes of death and lung cancer diagnoses. The UKLS statistician is also designing a risk model to predict lung cancer utilising nodule data from the UKLS study; if successful, this will be submitted for publication. The most appropriate publication will be identified when the analysis is complete. This may include Epidemiological or Radiological publications, such as BMJ-Thorax. Submission to a publication does not guarantee acceptance so it may be submitted to more than one publication before being accepted. Outputs will contain only data that is aggregated with small numbers suppressed in line with the HES Analysis Guide.

Processing:

An updated cohort Excel file (containing details of participants who have given informed consent) will be sent to NHS Digital by the UKLS Project Manager. The updated cohort file results from the removal of those participants who have died (as informed by previous data received from NHS Digital). NHS Digital will upload the linked dataset file onto their secure portal and notify the UKLS Lung Cancer IT Technician. The file will be downloaded and saved as a password protected document into a folder. The UKLS Project Manager will update the UKLS database with deaths and cancers notified to ensure no further contact with those individuals is attempted. The Lung Cancer IT Technician will write/run queries to extract selected data from the UKLS ONS/Cancer Registry database. The output will include the pseudonymised unique patient identifier (MPI) in order that it can be linked to subject data held by UKLS. Subject data is data that has been provided by the participants as part of the trial. For those randomised to the CT screening arm, details of CT scan results and any treatment received as part of the trial are held. This data will not be linked with any other patient-level data. Researchers have access to the pseudonymised data for analysis only, which is imported into statistical software, usually SAS, STATA, or Excel. Only substantive employees of the University of Liverpool will process the data and only for the purpose as defined in this agreement. The analysis (as anonymised, aggregate data) will be the subject of publication (see specific outputs); however, record level data will be viewed by the named users in this agreement only. The clinical database used within the UKLS has data for 4,061 subjects; all data is held securely (with additional password protection) and accessed only by the named users, in compliance with The University of Liverpool Data Policies. 
Although the database includes NHS numbers, only pseudonymised data will be made available to researchers; subjects will be identified using a pseudonymised unique identifier in any extracted data. Any analysis will be viewed by the named users in this agreement only. Further data sharing beyond the named users may be required in the future; however, this will be requested by means of a further application to NHS Digital.
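As an illustration only, the extraction step described above (replacing the NHS number with a pseudonymised unique identifier before data reaches researchers) might look like the following sketch. The field names, the `UKLS-` identifier format and the sequential numbering are hypothetical; the application does not describe how the identifier is generated.

```python
def pseudonymise(rows, id_field="nhs_number", prefix="UKLS"):
    """Return (extract, key_table).

    extract   : rows with the direct identifier removed and a study_id added.
    key_table : mapping from the original identifier to the study_id, to be
                held separately and securely from the extract.
    """
    key_table = {}
    extract = []
    for row in rows:
        original = row[id_field]
        # Assign one stable pseudonym per individual (hypothetical format).
        if original not in key_table:
            key_table[original] = f"{prefix}-{len(key_table) + 1:05d}"
        cleaned = {k: v for k, v in row.items() if k != id_field}
        cleaned["study_id"] = key_table[original]
        extract.append(cleaned)
    return extract, key_table


# Made-up rows with obviously fake identifiers.
rows = [
    {"nhs_number": "0000000001", "event": "death"},
    {"nhs_number": "0000000002", "event": "lung cancer diagnosis"},
    {"nhs_number": "0000000001", "event": "lung cancer diagnosis"},
]
extract, key_table = pseudonymise(rows)
```

The same individual receives the same `study_id` across rows, so extracted records can still be linked to subject data without exposing the NHS number.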

Objectives:

The overall aim of the trial was to provide data required for an informed decision about the introduction of population screening for lung cancer. This involved establishing the impact of screening on lung cancer mortality, determining the best screening strategy and assessing the physical and psychological consequences and the health implications of screening. An additional objective was to create a resource for future improvements to screening strategies. It was initially anticipated that the pilot study would be followed by a more in-depth extended trial with a larger cohort of people. However, this did not receive further funding, and data will be limited to the pilot study with a cohort of 4,061 participants, for which recruitment has now ended. Although further data is being requested under this agreement, it will be 'follow up' data for the original cohort only e.g. Cause of Death, Date of Death and Cancer Registry data. The data will be recorded on the United Kingdom Lung Cancer Screening Trial (UKLS) database and pseudonymised data given to the named researchers in order to ascertain any mortality advantage to screening and inform the UK National Screening committee. Any future sharing of record-level data would be subject to an amendment application requiring NHS Digital approval.


Project 5 — DARS-NIC-19805-M6T5R

Opt outs honoured: N

Sensitive: Sensitive, and Non Sensitive

When: 2016/04 (or before) — 2016/08.

Repeats: One-Off

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC

Categories: Identifiable

Datasets:

  • Hospital Episode Statistics Accident and Emergency
  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Critical Care
  • Hospital Episode Statistics Outpatients

Benefits:

There are both direct and indirect benefits to healthcare. This application for HSCIC HES data is part of a research study funded by the Medical Research Council Hubs for Trials Methodology Research (MRC HTMR). The study is standalone and independent, but is broadly part of a wider programme of funding involving research aiming to develop the use of health informatics, including electronic medical records in prospective medical research. The Standard and New Antiepileptic Drugs (SANAD) RCT is a multicentre, pragmatic RCT of worldwide significance, informing the first line use of antiepileptic drugs in clinical practice and prompting a review of national treatment guidelines. The subsequent study SANAD II (EudraCT No: 2012-001884-64, ISRCTN Number: 30294119) is on-going. It opened recruitment in 2013, will be recruiting 1510 participants for a duration of 5.5 years, and is expected to exert a significant influence on the evidence for the treatment of epilepsy, the most common neurological disease. The study to which this application refers will directly inform the methodology employed in the data collection and analyses of the SANAD II study. The assessment of the additional benefits of data from electronic medical records, particularly with regards to the analysis of health economic outcomes and methods to address missing data, will inform the subsequent methods employed in the analyses of SANAD II. For example, if implementing data from electronic medical records provides greater benefit when addressing missing data, this data will subsequently be requested for all participants of SANAD II. This directly benefits health and social care by informing the methodology to be employed in the data collection and analyses of a NIHR HTA funded RCT, therefore maximizing the power of the data and results and the subsequent impact on patient care. This output will be measurable based on the methods subsequently employed in SANAD II.
Through output via presentation and publication to the research community involved in clinical trials and clinical trials methodology, there will be indirect benefits to health and social care. The output of this study aims to inform the implementation of data from electronic medical records in prospective clinical research including RCTs. Improved knowledge of the attributes, additional benefits and efficiency of data accessed from electronic medical records will inform the design of future RCTs. Resultantly, RCTs will use electronic medical records for the objectives where a benefit is offered, over standard methods of data collection. This will result in improved efficiency (and therefore costs) of RCTs, frequently funded through public sources and improved participant experience. For example, the number of clinical trial follow up appointments may be reduced if data can be adequately collected using electronic medical records. Finally, indirectly, the assessment of data from electronic medical records in RCTs assessing treatments for epilepsy may also indicate potential utility of electronic medical records in the routine clinical monitoring of epilepsy, although this hypothesis is not being explicitly assessed.

Outputs:

Final results from this study and the associated outputs are expected by study completion (12/2017). All presented or published results will be on a strictly anonymous basis. Non-identifiable aggregate data will be used in presentations and publications with the suppression of small numbers in line with the HES analysis guide when the output involves specific clinical details. The output will consist of descriptive statistics and statistical measures of agreement between data retrieved from electronic medical records, including HES data, and data collected through standard methods during SANAD II. The nature of the sample (60 included participants) results in a possibility that small numbers may be identified for data variables of interest. For example, if 5 participants experience hospital admissions or admission to critical care and data from electronic medical records provides significant benefits over standard RCT methods, this would be important to include in any output. As we are primarily concerned with the agreement and additional benefits of data from electronic medical records rather than specific clinical details, there will be no requirement to include explicit clinical details in any output. In order to present the differences between data from electronic medical records and data recorded through standard methods in SANAD II, there will be a need to highlight the availability of specific data variables. For example, outputs may present that ‘details of MRI scans were available in X number of patients’ rather than ‘X number of patients had an MRI scan demonstrating temporal sclerosis’. The exclusion of specific clinical details at record level in addition to demographic variables and geographical location for individuals involved will ensure participant anonymity is maintained; it is the available data variables and agreement between datasets that will inform the outputs of this study. 
In all output, aggregate data will be de-identified and all measures will be taken to ensure that individuals cannot be identified. For example, during the analysis the additional benefits of assessing IMD by LSOA to inform the health economic analysis will be examined. However, there will be no need to present the LSOA of individual participants in any presentation; only the aggregate results will be presented. Rare events are not expected, but if these occur and there remains any risk of identification, details will be omitted from all presentations and publications and small numbers suppressed in line with the HES analysis guide. This study will have both direct and indirect outputs. In the first instance, the study will inform the analyses to be undertaken in the SANAD II RCT. Specific components will include the analysis of health economic outcomes and optimal methods to address missing RCT data. For example, if incorporating data from electronic medical records in place of traditional methods such as multiple imputation provides a more rigorous dataset, electronic medical records data will subsequently be sought for all participants of SANAD II and included in the analyses on completion of the trial. This output will take the form of a study report and local presentation to the SANAD II study team. This will occur on completion of the study by December 2017. Notably, all members of the team for this study are also involved in SANAD II. This study will also inform the clinical trials community, contributing to the development and improvement of efficient RCT design with the incorporation of data from electronic medical records. The output will be disseminated to clinicians and academics involved in the conduct of clinical trials and research concerning clinical trials methodology.
Members of the public and non-academics may have access to the output through presentations and publications but there is no planned specific dissemination to these groups, with the exception of the participant study report that will be provided on completion of the study. This is justified as the output will be primarily informing the methodological aspects of clinical trials. Indirectly, the assessment of data from electronic medical records in RCTs assessing treatments for epilepsy may also indicate potential utility of electronic medical records in the routine clinical monitoring of epilepsy. Although not directly assessing routine clinical practice or the patients’ perspective in this study, a parallel theme funded by the MRC HTMR involves assessing patients’ perspectives with regards to clinical trials methodology, including the development of ‘core outcome sets’ for clinical trials. There are multiple objectives to this project and the dissemination of findings aims to take the following forms: - Assess the attributes of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a randomised controlled trial (RCT), SANAD II: o A narrative assessment of the methods and feasibility of access will be presented at academic conferences including the International Clinical Trials Methodology Conference 2017 and Association of British Neurologists Annual Meeting 2017. o An assessment of the feasibility, agreement and reliability of data from routine sources will be presented at academic conferences and published in a peer-reviewed clinical journal. The manuscript will initially be submitted to the British Medical Journal during 2016. 
- Assess the additional benefit of data from electronic medical records compared to data collected using standard methods, applied to the aims of a RCT, SANAD II: o The additional benefit of data from routine sources applied to the assessment of clinical efficacy, adverse events, health economic outcomes and addressing missing RCT data will be presented at academic conferences as above and published in a peer-reviewed journal. The manuscript will initially be submitted to Clinical Trials on completion of the study in December 2017 and if not selected for publication will be submitted to similar methodological journals. - Assess the efficiency of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a RCT, SANAD II: o A narrative assessment of the efficiency of accessing and formatting data from routine sources and a discussion of the relative value of accessing data from routine sources will be presented at academic conferences and published in a peer-reviewed journal. The manuscript will initially be submitted to Clinical Trials on completion of the study in December 2017 and if not selected for publication will be submitted to similar methodological journals. Finally with respect to all study objectives, formal study reports will be submitted to the MRC Hubs for Trials Methodology, the funder of this study, to inform the wider programme of research aiming to develop the use of health informatics, including electronic medical records in prospective medical research.

Processing:

The legal gateway for the flow of data into the HSCIC is informed patient consent. This study is sponsored by the University of Liverpool and has been approved by the North of Scotland Research Ethics Service and Health Research Authority. The specific methodological activities involved in the processing of data are as follows: The SANAD II Data Manager will identify eligible participants by review of data recorded for participants enrolled in SANAD II. Eligible participants will be those aged 16 years and over, with capacity to consent and having completed a minimum of 12 months follow up in SANAD II. Participants’ date of birth, date of enrolment and consent details (to identify those with capacity to consent) will be screened. The names and addresses of eligible individuals will subsequently be retrieved. An invitation pack will be sent via the postal services containing a participant information leaflet, consent form and pre-paid addressed envelope. Informed written consent will be requested for access to identifiable data from electronic medical records for the equivalent time period in SANAD II. Organisations including the HSCIC are specifically named in the consent form. Full, explicit details of the data flows and processing activities are detailed in the consent materials and form. HSCIC feedback has been sought in an earlier application. There are approximately 70 consented participants in this study. Data from consenting participants will be requested from electronic medical records held by specific ‘routine data sources’. The HSCIC HES data will be requested for participants resident in England. Data will also be requested from The Secure Anonymised Information Linkage Databank (SAIL) for participants resident in Wales and the General Practitioners for participants resident in the North West of England. 
The General Practitioners will be approached by the study team and if permitted primary care data will be transcribed in the practice by the Principal Investigator. Data from consenting participants’ electronic medical records will be requested from HSCIC on an identifiable, record level basis, with individuals identified by NHS Number. The rationale for this is to allow linking of data regarding an individual from electronic medical records from all routine data sources to the data collected using standard methodology as part of SANAD II, in order to compare the datasets and permit the analyses. Data will be collected for the equivalent time period the individual has been enrolled in SANAD II and will be requested on one occasion only. Data will include medical, demographic and socio-economic variables. The NHS Number (and name and date of birth if required and indicated by HSCIC) to identify the consenting participant will be securely transferred from the Clinical Trials Research Centre, University of Liverpool to the HSCIC. Subsequently, data from participants’ electronic medical records provided by the HSCIC will be securely transferred to the University of Liverpool. In both cases, data will be transferred using the HSCIC Secure File Transfer (SFT) System. The consent materials and form explicitly permit these data flows in this study. Participants’ data from electronic medical records (accessed through HSCIC, SAIL and participants’ GPs) will be securely transferred to the University of Liverpool Clinical Trials Research Centre and linked to the data collected as part of SANAD II in order to permit the intended analyses. The SANAD II Data Manager will perform this linking and will therefore receive and access the data from electronic medical records in the first instance. Following linking, the SANAD II Data Manager will pseudonymise the complete dataset with participants identified only by their Unique Study Number. 
At this stage the dataset will then be accessible to the study team members involved in the analysis. Therefore, all data from electronic medical records and SANAD II data collected using standard methods will be pseudonymised to all members of the team for this study. The SANAD II Data Manager, who must perform the linkage, will have access to the demographic variables and medical data of consenting participants but will have no requirement to, and will not, access participants’ medical data for the purpose of linking. Data regarding individuals received from all sources will be linked. Therefore, secondary care data received from the HSCIC HES datasets will be linked to data collected using standard methods in SANAD II. In addition, for a small subset of participants resident in the North West of England, data retrieved from General Practitioners will be linked to both HES data and data collected during SANAD II. This process is necessary to perform a full assessment of the agreement and additional benefits of routinely recorded data (from all data sources) compared to data collected using standard methods in SANAD II. All pseudonymised study data from electronic medical records and SANAD II will be stored using the University of Liverpool Research Data Management Service’s DataStore (http://www.liv.ac.uk/csd/research-data-management/storage) at all times. Data is stored electronically on University of Liverpool central servers, located in an access controlled server room and connected to the main University network, located behind a firewall. Physical access is limited to Computer Services Department staff. Data will be encrypted using industry standard techniques meeting the Information Governance Toolkit standard (8HN20). Data will not be transferred to an additional location. The PI for this study will act as data custodian.
The University of Liverpool Information Security Policy and Research Data Management Policy provide further information regarding data security. The pseudonymised dataset will be accessed by specific members of the study team based in and employed by the University of Liverpool. Data will then be analysed to assess the following objectives: 1. Assess the attributes of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a randomised controlled trial (RCT), SANAD II: A narrative assessment of feasibility will be followed with a quantitative assessment of agreement between data from electronic medical records and data collected using standard methods in SANAD II. Agreement will be compared at the individual level. Methods to account for paired data would include Bland-Altman methods for continuous data and cross-tabulations and kappa statistics for categorical data. Subsequently, relevant outcomes of the RCT will be examined. 2. Assess the additional benefit of data from electronic medical records compared to data collected using standard methods, relevant to the aims of a RCT, SANAD II: An exploratory analysis will assess the additional benefits of accessing data from electronic medical records. The assessment of clinical efficacy, adverse events, health economic outcomes and methods to address missing RCT data will be examined. Where linked data are available, agreement will be compared at the individual level in the first instance. Methods to account for paired data would include Bland-Altman methods for continuous data and cross-tabulations and kappa statistics for categorical data. 3. 
Assess the efficiency of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a RCT, SANAD II: The relative value of accessing data from different sources will be discussed in the context of the prior analyses, including knowledge of the relationship between datasets and the assessment of methods of addressing missing data. The optimal ‘mix’ of data from routine sources and standard methods will be discussed. The potential impact on the data collection processes if SANAD II were to be repeated will be considered and the quantitative methods that could be used in future research proposed. All personal data in this study will be kept strictly confidential and will be handled, stored and destroyed in accordance with the Data Protection Act 1998.
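
The agreement methods named above (Bland-Altman methods for continuous data, kappa statistics for categorical data) can be illustrated with a minimal Python sketch. It is not the study's analysis code: the function names, argument names and example values are assumptions made for illustration, and the sketch uses the conventional 95% limits of agreement (mean difference plus or minus 1.96 sample standard deviations) and unweighted Cohen's kappa.

```python
from collections import Counter
from statistics import mean, stdev


def bland_altman_limits(routine, trial):
    """Return (bias, lower, upper): mean paired difference and 95% limits of agreement."""
    # Paired differences between the routine-source value and the trial-collected value.
    diffs = [r - t for r, t in zip(routine, trial)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


def cohen_kappa(routine, trial):
    """Unweighted Cohen's kappa for two paired categorical codings."""
    n = len(routine)
    # Observed proportion of pairs on which the two sources agree.
    observed = sum(r == t for r, t in zip(routine, trial)) / n
    routine_counts = Counter(routine)
    trial_counts = Counter(trial)
    # Agreement expected by chance, from the marginal frequencies of each source.
    expected = sum(routine_counts[c] * trial_counts[c] for c in routine_counts) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both sources use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical paired data for three and four participants respectively.
bias, lower, upper = bland_altman_limits([10, 12, 14], [9, 12, 13])
kappa = cohen_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"])
```

In this sketch each list position corresponds to one participant, so the comparison is made at the individual level as the objectives describe.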

Objectives:

This application for HSCIC HES data is part of a research study funded by the Medical Research Council Hubs for Trials Methodology Research (MRC HTMR). The study is standalone and independent, but is broadly part of a wider programme of funding involving research aiming to develop the use of health informatics, including electronic medical records in prospective medical research. Data regarding patients’ primary and secondary care is routinely recorded in electronic medical records by a number of organisations including the HSCIC. Such data retrieved from electronic medical records has demonstrated utility in clinical research. Electronic medical records have an established role in providing the dataset for retrospective, observational clinical and record linkage studies. In addition, in prospective studies, electronic medical records can provide useful, additional data that can inform analyses such as the long term assessment of mortality. Although there is a precedent for the use of data retrieved from electronic medical records in retrospective clinical studies and to a lesser extent in prospective studies, there is limited evidence of the attributes of such data when accessed to measure prospective outcomes as part of a pragmatic Randomised Controlled Trial (RCT). An assessment of data retrieved from electronic medical records in the context of prospective clinical research becomes particularly relevant where such data are now being used to conduct all stages of a RCT, including recruitment, intervention and follow up assessments, despite the feasibility, agreement, additional benefit and efficiency being unclear. This study will assess the feasibility, agreement and additional benefits of data retrieved from electronic medical records in measuring the objectives of a RCT. Subsequently, the efficiency and relative value of accessing data from electronic medical records compared to collecting data using standard RCT methodology will be explored. 
The electronic medical records will be requested from ‘routine data sources’, primarily the HSCIC but also The Secure Anonymised Information Linkage Databank for participants resident in Wales and the General Practitioner for participants resident in the North West of England, accessed through NorthWest eHealth. The study will directly inform the methodology of the NIHR Health Technology Assessment Programme funded RCT Standard and New Antiepileptic Drugs II (SANAD II) (EudraCT No: 2012-001884-64, ISRCTN Number: 30294119). For example, accessing electronic medical records for participants of SANAD II may positively inform the health economic analyses and methods to address missing data. This will subsequently inform the methods to be performed in the final trial analyses on completion of SANAD II in 2018, including the access and implementation of data from electronic medical records. Improving the completeness of SANAD II data and precision of the analyses will positively influence health and social care by maximising the value of data collected and outcomes in this publicly funded RCT. Furthermore, the outcomes of this study will indirectly inform the methodology of similar pragmatic RCTs in the future. The specific objectives in this study where access to electronic medical records held by the HSCIC will be requested are as follows: 1. Assess the attributes of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a randomised controlled trial (RCT), SANAD II: a. Assessment of the feasibility of accessing data from routine sources b. Assessment of the agreement of data from routine sources 2. Assess the additional benefit of data from electronic medical records compared to data collected using standard methods, relevant to the aims of a RCT, SANAD II: a. Assessment of clinical efficacy b. Assessment of adverse events c. Assessment of health economic outcomes d. 
Assessment of the methods of addressing missing RCT data 3. Assess the efficiency of data from electronic medical records compared to data collected using standard methods, in measuring the outcomes of a RCT, SANAD II: a. Assessment of the efficiency of procedures to access / obtain data b. Assessment of the efficiency of procedures to format data c. Explore the relative value of accessing data from routine data sources


Project 6 — DARS-NIC-311179-R5V5Y

Opt outs honoured: N

Sensitive: Non Sensitive

When: 2016/04 (or before) — 2016/08.

Repeats: One-Off

Legal basis: Informed Patient consent to permit the receipt, processing and release of data by the HSCIC

Categories: Identifiable

Datasets:

  • Hospital Episode Statistics Accident and Emergency
  • Hospital Episode Statistics Admitted Patient Care
  • Hospital Episode Statistics Outpatients

Benefits:

Past Benefits 1. Report for RCLCF This report is a requirement of the funding body to ascertain that adequate outputs are produced from their financial contribution and be accountable to the general public and its trustees as to exactly how and for what purpose voluntary public funding is being utilised in the name of Lung Cancer Research. It is imperative that the RCLCF is satisfied that the RCLCRP is constantly finding and utilising all information available to it to further develop and prove the accuracy of the LLP Risk Model when used as an early detection tool for Lung Cancer. Risk Prediction models incorporating multiple risk factors have been recognised as a method of identifying individuals at high risk of developing lung cancer. Thus accurate selection of high-risk individuals for lung cancer screening requires robust methods for prediction. The LLP has produced a risk model that has been utilised for identifying high risk individuals for screening in the first UK lung screening programme. As early diagnosis can save lives, the LLP have developed a new generation of risk model, the LLPi that may assist in identifying individuals at high risk of developing lung cancer using hospital episodes as surrogates of disease history. Unlike most risk models that are based on (biased) questionnaire data, the LLPi took advantage of the available hospital episode statistics data to corroborate questionnaire data for disease status. This resulted in the LLPi with a good calibration and c-statistic of 0.85 – one of the highest in lung cancer risk modelling. The Cancer Registry, HES and ONS data being made available to the RCLCRP is a fundamental aspect in aiding the development and testing of the LLP and LLPi Risk Model and developing new biomarkers for disease detection and management. 
Such data also allows further co-morbidity links and contributory factors to be investigated, analysed and reported on and thus enables patient recruitment strategy development for future research and further funding to be sought. The findings of the RCLCRP enable its funder, the RCLCF, to develop informed policy, target fundraising and influence UK Public Health by generating information and publicity campaigns to raise awareness of lung cancer. Individuals and public health professionals can make use of this information to take decisions and advise on lifestyle choices relating to an individual’s risk of developing lung cancer, particularly if they have known pre-existing conditions or persist with their current lifestyle. 2. Publications Publication is fundamental to the provision of evidence-based medicine and the delivery of an effective healthcare system. For example, these publications benefit the wider community of clinicians both when investigating the possible presence of Lung Cancer within patients and potential risk of Lung Cancer developing where certain health risk pre-cursors are present in their history, whether these be actual co-morbidity diseases already present or socio-demographic health patterns. They also demonstrate how the Risk Model could be used as a tool within public health organisations for preventative/deterrent purposes when patients are advised by their clinicians to make improvements to their health or change their habits to potentially improve life expectancy. The publications can also contribute to scientific knowledge on development and validation of biomarkers used to detect or differentiate lung cancer, for example: methods for detecting lung cancer; characterisation of molecular changes in cancer cells; the nature of DNA mutation and methylation as a hallmark of different lung cancer sub-types. 
These results are put into clinically relevant context by the data we obtain relating to other diseases (HES), the incidence of cancer amongst previously healthy recruits (cancer registry) and outcome (ONS). Our publications also highlight the need to develop drugs to improve life expectancy timelines when Lung Cancer is detected. Delineation of different molecular classes of lung cancer is contributing significantly to changes in medical practice, leading to new targeted therapies. 3. Report for EU FP7 Funded Projects (LCAOS & CURELUNG) The detailed reports of the findings and impact of research to which the RCLCRP contributed are seen by EU Commissioners and Scientific Committees to inform on the effectiveness and impact of the work, appropriate utilisation of funding and further progress required to implement the findings. These help to guide future policy decisions on research goals and investment (including the structure of the current funding scheme, Horizon 2020). Future Benefits The establishment of the LLP case/control cohort has provided an important resource that is internationally recognised and will continue to provide benefits in the future. The ongoing update of associated medical data will further enhance the utility of this research resource, for example: 1. The molecular biomarker group within the RCLCRP aims to utilise bronchial washings and/or sputum and/or blood to develop molecular assays for early diagnosis of lung cancer. The integration of HES, Cancer Registry and ONS data with the molecular data will allow us to improve the LLP Risk Model tool for application of personalised risk assessment alongside the development of future molecular assays (either targeting those at highest risk, or attenuating the results to account for known confounding factors). 2. 
Characterisation of risk factors for lung cancer is of considerable health and economic importance, as they can be used to inform prevention, screening and treatment policy. The group will continue to develop the Liverpool Lung Project (LLP) risk model for lung cancer and is identifying epigenetic and genetic biomarkers for early detection and prognosis of lung cancer. 3. The United Kingdom Lung Cancer Screening Trial (UKLS) utilising the LLP risk model will be used to help refine and improve risk assessment tools, providing more efficient targeting of screening populations and other interventions. The RCLCRP and associated HSCIC data provides an important corollary to the CT screening setting and an opportunity to publish comparative studies to inform the direction of lung cancer early detection. 4. Application of HSCIC data to clinical problems provides an opportunity for training and education of the next generation of scientists and medics. Anonymised HES data will be utilised for the academic training of future PhD students and other affiliated research scientists. This will promote innovation, exploit new technologies and produce world-class scientists that will contribute to the continued development of life science research, which provides an important economic driver and improves healthcare. 5. Specific benefits to the Health and Social Care system include: the use of molecular-epidemiological risk assessments prior to clinical diagnosis and markers of pre-clinical carcinogenesis in patients with a high risk of developing lung cancer will reduce the incidence of clinically detectable lung cancer, given the appropriate intervention strategies. Early detection research provides the most cost-effective strategy for improved mortality, as treatment at an earlier stage not only provides better patient outcome, but is cheaper in the long term.

Outputs:

Past Outputs 1. Annual Report(s) for funding body, e.g. The Roy Castle Lung Cancer Foundation (RCLCF) to identify type of research undertaken, recruitment statistics and specific research developments within the funding period. This report is seen by the RCLCF Executive and Scientific Committee and its Trustees to inform policy and quantify the benefit of future funding of the research programme. 2. The RCLCRP and its collaborators have produced many peer-reviewed publications in a selection of high-ranking journals. As an example, publications during 2013 & 2014 that made use of ONS, MCCR or HES data included: • Contribution to a study that examined over 500 lung tumours for DNA methylation and demonstrated a prognostic DNA methylation signature for stage I Non-Small Cell Lung Cancer (NSCLC) (Sandoval et al., Journal of Clinical Oncology 2013 [47]). • Discovery and validation of a microRNA expression signature that identifies NSCLC (Bediaga et al., British Journal of Cancer 2013 [48]). • Examination of the molecular genetic profile of carcinoid cancers, implicating chromatin-remodelling genes (Fernandez-Cuesta et al., Nature Communications 2013 [49]). • Aiding the definition of a genomics-based classification of human lung tumours (Clinical Lung Cancer Genome Project & Network Genomic Medicine, Science Translational Medicine, 2013 [50]). • LLP Biobank samples helped identify a new tumour suppressor gene for lung cancer (Gkirtzimanaki et al., Proceedings of the National Academy of Sciences USA 2013 [51]). • The importance of risk prediction models to lung cancer screening has been highlighted (Field et al., 2013 in Lancet, Lancet Oncology and Journal of Surgical Oncology [48, 52, 53]). 
• We have investigated factors associated with dropout in a 5-year follow-up of individuals at high risk of lung cancer in the LLP follow-up cohort (Marcus et al., International Journal of Oncology, 2013 [54]) and looked at the impact of co-morbidity on lung cancer mortality (Marcus et al., Oncology Letters, 2013 [55]). • Genome Wide Association Studies (GWAS) and epidemiology continue to provide a useful insight into lung cancer susceptibility (TRICL, ILCCO & SYNERGY publications): - Lung cancer risk among different professions (Behrens, Occupational and Environmental Medicine, 2013 [56]; Consonni et al., International Journal of Cancer 2014 [57]). - New methods for smoking assessment in lung cancer risk (Vlaanderen et al., American Journal of Epidemiology 2014 [58]). - A pooled analysis of case-control studies conducted between 1985 and 2010 (Olsson et al., Am J Epidemiol 2013 [59]). - SYNERGY – Welding and Lung Cancer in a Pooled Analysis of Case-Control Studies (Kendzia et al., Am J Epidemiol 2013 [60]) - Analysis of the relationship between second hand tobacco smoke and lung cancer histology (Kim et al., International Journal of Cancer 2014 [61]). - Associations of risk variants for other cancers with lung cancer risk (Park et al., Journal of the National Cancer Institute 2014 [62]). • Two publications have utilised anonymous HES data: - Marcus MW, Chen Y, Duffy SW, Field JK. Impact of comorbidity on lung cancer mortality - a report from the Liverpool Lung Project. Oncol Lett. 2015 Apr;9(4):1902-1906. - Marcus MW, Chen Y, Raji OY, Duffy SW, Field JK. LLPi: Liverpool Lung Project Risk Prediction Model for Lung Cancer Incidence. Cancer Prev Res (Phila) 2015 Jun;8:570-5. N.B. References of Publications listed above can be found in SD11 – Publication References. Much of this work has also been presented at major cancer conferences (e.g. NCRI Annual UK Meeting, American Association of Cancer Research Annual Meeting and World Lung Cancer Conference). 3. 
Report on the LCAOS & CURELUNG Projects (EU FP7 Collaborations): LCAOS - development of a Breath Test for the early detection of Lung Cancer; CURELUNG - the (epi)genetics of lung cancer. The selection process used to identify the cohorts for these studies included a knowledge of their health status. Access to HSCIC data for these individuals provided important information on their cancer and respiratory disease history, which was utilised at the sample analysis stage. For example, in LCAOS, HES information may show that a patient whose lung capacity was low and who had breathing difficulties when the sample was taken had a hospital episode some 12 months later that diagnosed a lung disease which may have been present at the time of sampling. For CURELUNG, respiratory disease status informed risk-stratification analysis; outcome data was used to investigate the possibility of treatment stratification based on DNA methylation. 4. The RCLCRP has established, through the Liverpool Lung Project, one of the largest prospective lung cancer case-control and cohort populations in Europe (>11,500 participants) with epidemiological, clinical & outcome data and specimens incorporated into the LLP Biobank. This is a resource that has been and will continue to be utilised for a wide variety of research projects, generating additional investment and providing opportunities for exploitation of results in the form of risk prediction models, biomarkers for cancer detection, characterisation of lung disease and identification of targets for treatment. 5. The RCLCRP was instrumental in the initiation of the United Kingdom Lung Cancer Screening Trial (UKLS) utilising the LLP risk model. Professor Field is the clinical investigator of the UKLS and the trial was run from the University of Liverpool Cancer Trial Unit. Future Outputs 1. Reports: Further reports for grant awarding bodies will be produced. 
This will include reports in support of additional funding applications for further analysis, ensuring maximum utility and benefit from the data provided. 2. Publications: It is anticipated that the analysis from this study will be included in internationally renowned oncology, epidemiology and public health journals (in keeping with our proven publication record, above). Publications will be prepared for 2015, 2016, and 2017. 3. Presentations: In accordance with previous years it is expected that presentations will be given at major cancer conferences. These presentations will provide dissemination of results from ongoing studies of LLP Risk Modelling, Methylation, MicroRNA, Sequencing, etc. Nature of Outputs The LLP project provides detailed clinical outcomes together with the patient’s epidemiological questionnaires, complemented by the excellent HES data, and in-depth molecular-epidemiological LLP investigations into molecular biomarker groups and DNA sequencing projects. The majority of outputs will contain aggregate data only; very occasionally individual level data will be presented (e.g. patient characteristics for tumour samples analysed), but these will be coded and completely anonymised to prevent identification. No HSCIC linked record level data will be shared directly with commercial companies or third party organisations or included directly in any outputs. In some instances the data will consist of anonymised, characteristic data linked to a sample shared for research purposes; e.g. it may state that “the sample was from a patient of 60 years old with a diagnosis of COPD present for 10 years who was diagnosed with lung cancer at 65 years and died of heart failure aged 70 and the patient had been hospitalised for COPD on 6 occasions”. All outputs are research outputs, not commercial, although some research is undertaken within a commercial environment (e.g. pharmaceutical or life-science/biomarker companies).

Processing:

All data processing of the original HSCIC dataset will take place at The University of Liverpool (UoL) and be carried out by the RCLCRP IT staff at the APEX Building (3rd Floor). SQL queries will be written to extract selected data from the HES database. IT staff will link the extracted data to subject data held by the Roy Castle Research Programme; any patient identifiable data fields supplied by HSCIC will not be made available to researchers. SQL is used to anonymise the data by linking them to a unique patient identifier (MPI). Anonymised data are then imported into statistical software. The clinical database used within the RCLCRP has data for 14,000+ subjects; all data is held securely (with additional password protection) and accessed only by trained personnel, in compliance with the University of Liverpool Data Policies. These records have NHS number and a unique identifier. These identifiers will be used to identify subjects in the HES dataset, but only the local code will be used to identify subjects in any extracted data. Additionally, subsets of the data will be exported, anonymously, and used with statistical software at the University of Liverpool. Data used in the subsets relates to the health status (comorbidities), previous disease history or outcome (death, subsequent disease) of subjects who have provided informed consent and donated samples and/or lifestyle/clinical history to the LLP (RCLCRP). Data on patient identifiers or dates relating to any episode/event are not shared. The most frequent user of the data is the statistician employed on the LLP (RCLCRP) studies at the University of Liverpool, although other university researchers also have access to the anonymous data associated with participants in their studies. However, these researchers only have access to anonymous data extracted previously by the LLP (RCLCRP) personnel as part of approved research studies associated with the LLP (RCLCRP). 
The purpose of all uses of the data is the same (the study of lung disease) as set out in the ethically approved study documentation. The HES and ONS datasets will not be shared with a 3rd Party; extracted anonymous data will only be released to research collaborators following informed consent and ethics approval, and release will be covered by a Material Transfer Agreement (MTA), in accordance with local and national guidelines. The data is not accessed directly by the external researchers. Provided that subjects have consented to use by external collaborators, specific anonymous data (extracted by the LLP (RCLCRP) IT and statistical staff) may be released to external researchers (typically as part of a larger dataset) following approval of a Material Transfer Agreement by the study Sponsor (The University of Liverpool) and approval of the specific collaborative study by the local NRES ethics committee. All researchers using anonymous data belong to recognised research institutions or registered commercial companies covered by a Material Transfer Agreement. The recognised research institutions and registered commercial companies (strictly those for which the University of Liverpool RCLCRP has MTAs in place) are listed within SD10 – LLP Collaborators. The purpose for which data will be shared is specific to each MTA and the organisation with which it is in place, and is always for research purposes. The individual level data which may occasionally be presented to one of these organisations may be, for example, a sample of blood or tissue with the shared anonymised data that the sample was from a patient of a particular age, who had perhaps encountered a number of episodes of hospitalisation for e.g. COPD or another condition. The data may divulge the age in years, number of hospitalisations for investigations for e.g. 
lung disease, or perhaps that the sample subject has a diagnosis of lung or another cancer and the number of years for which the cancer has been present. Death related data would be limited to age at death or survival period from a specific treatment or diagnosis. In short, an anonymised timeline of medical history may be the kind of data shared in association with the human material, but this would be devoid of dates or potential patient identifiers. The high incidence and mortality of lung cancer helps ensure that it is very unlikely that anyone would be able to identify an individual from the nature of the data presented, but care is always taken to ensure that this is the case, especially in publications (where data aggregation is the norm). Geographical data (e.g. postcode) are always aggregated and provider data is not a focus of the research. Data released might include disease or comorbidity status derived from HES or outcome/death status derived from ONS along with data about the subject or samples collected by other legal means (with the consent of the subject) such as case note review. However, this is never provided with any personal identifiers or dates attached, so no link to the initial HES/ONS data or to any individual can be made by the researcher using the data supplied. Data format consists of an encrypted, password protected data file in a recognised database or statistical software file format. Data provided to external collaborators is totally anonymous, and confidentiality is governed by a number of clauses in the MTA. Under no circumstance would any third party organisation or employee (within a UoL MTA) be able to link any identifiable patient data to material or data shared by UoL RCLCRP. Data is not always aggregated, but is sufficiently coded to prevent identification of individuals (data stripped of personal identifiers before use & in any representation). 
This de-identification meets the requirements outlined within the HES Analysis Guide March 2015. Data is often, but not always, aggregated; however, even when data is not aggregated it is still compliant with the March 2015 HES Analysis Guide, in particular Sections 4, 5 and 6. In a similar way to the establishment of a PSEUDO_HESID (as stated within the HES Analysis Guide), the UoL RCLCRP MPI No. is used within the RCLCRP study when the HSCIC data is received by the HSCIC authorised IT employee and utilised by the statistician. Similarly, when samples or data are shared with any other organisation, this UoL RCLCRP MPI No. provides a link that can only be used by RCLCRP staff to integrate data. Therefore, no patient can be linked to any of the data received other than within the UoL RCLCRP by approved staff operating within the UoL data governance framework. Only those UoL employees listed to HSCIC are able to access the data. At no point is any of the HES, Cancer Registry or ONS data used by UoL RCLCRP employees to demonstrate linked patterns of Hospital Admissions to Cancer rates or death statistics.
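
The replacement of supplied identifiers with the local MPI code, as described above, can be sketched as follows. The processing description states that SQL is used for this step; the Python below is only an illustrative equivalent, and every field name (`nhs_number`, `admission_date`, `mpi`, `diagnosis`) and the function name are hypothetical rather than taken from the HES specification or the RCLCRP database.

```python
def pseudonymise(hes_rows, nhs_to_mpi, drop_fields=("nhs_number", "admission_date")):
    """Replace the NHS number with the local MPI code and drop identifying fields.

    hes_rows    -- list of dicts, one per hospital episode (hypothetical layout)
    nhs_to_mpi  -- dict mapping NHS number to the local study identifier (MPI)
    drop_fields -- identifier and date fields that must not reach researchers
    """
    released = []
    for row in hes_rows:
        mpi = nhs_to_mpi.get(row["nhs_number"])
        if mpi is None:
            # Episode does not belong to a consented study participant; skip it.
            continue
        # Copy every field except identifiers and dates, then attach the local code.
        clean = {key: value for key, value in row.items() if key not in drop_fields}
        clean["mpi"] = mpi
        released.append(clean)
    return released


# Hypothetical example: one consented participant, one unrelated episode.
episodes = [
    {"nhs_number": "A1", "admission_date": "2014-03-01", "diagnosis": "J44"},
    {"nhs_number": "Z9", "admission_date": "2014-05-12", "diagnosis": "C34"},
]
extract = pseudonymise(episodes, {"A1": "MPI-0001"})
```

Because the mapping from NHS number to MPI stays with the staff performing the linkage, a researcher holding only the extract cannot link a row back to an individual, which is the property the text above relies on.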

Objectives:

Over the years, the Roy Castle Lung Cancer Research Programme (RCLCRP) has been at the forefront of ground breaking research in early detection of lung cancer. Lung cancer is the leading cause of cancer-related death in most developed countries, with mortality rates exceeding those of colon, breast and prostate cancer combined (Jemal et al, 2010; Siegel et al, 2011). Given that more than 94% of the patients diagnosed with lung cancer in the UK die of the disease within five years, the primary objective is to detect lung cancer at an earlier, potentially more curable stage (the 5-year survival rate for stage IA tumours is ~70%). Lung cancer is predominantly a disease of the elderly, with an average age at diagnosis of around 60-70 years, and often presents very late at an advanced stage (Alberg et al, 2007; Dela Cruz et al, 2011). Although the pathogenesis of lung cancer is not yet fully understood, researchers have suggested the potential role of the occurrence of concomitant diseases in the aetiology of lung cancer. Due to increasing longevity and rapidly ageing populations, the number of people with more than one comorbid condition is expected to increase sharply in the coming decades (van den Akker et al, 1998; Yancik et al, 2001). This increase might lead to an increase in the incidence of lung cancer, and the comorbidity burden might lead to increased overall and/or lung cancer-specific mortality. To this end, a documentation of previous history of diseases is essential for exploring the impact of comorbidity on lung cancer. A rich source of data for exploring the potential role of comorbidity in lung cancer pathogenesis is the Hospital Episode Statistics (HES). The Liverpool Lung Project (LLP) intends to link details of all admissions, outpatient appointments and accident and emergency (A&E) attendances of all participants in the LLP at NHS hospitals in England to the epidemiology data gathered through a detailed questionnaire for all LLP patients. 
In addition, all information gathered will be linked to the ONS data to study the mortality patterns of all participants in the LLP. The in-house database system will be used to collate all data and the output of the analysis will be documented in scientific literature.