NHS Digital Data Release Register - reformatted

Office For National Statistics (ons) projects

997 data files in total were disseminated unsafely (information about files used safely is missing for TRE/"system access" projects).


🚩 Office For National Statistics (ons) was sent multiple files from the same dataset, in the same month, both with optouts respected and with optouts ignored. Office For National Statistics (ons) may not have compared the two files, but the identifiers are consistent between datasets, and outside of a good TRE NHS Digital cannot know what recipients actually do.

Maternity Services Data Set Legal Notice — DARS-NIC-631605-F2H3M

Type of data: information not disclosed for TRE projects

Opt outs honoured: Identifiable, No (Statutory exemption to flow confidential data without consent)

Legal basis: Health and Social Care Act 2012 - s261(5)(d); Other-Statistics and Registration Services Act (2007) Section 45C

Purposes: No (Agency/Public Body)

Sensitive: Sensitive, and Non-Sensitive

When: DSA runs 2022-11-28 — 2025-11-27 2022.12 — 2024.02.

Access method: One-Off, Ongoing

Data-controller type: OFFICE FOR NATIONAL STATISTICS

Sublicensing allowed: No

Datasets:

  1. MSDS (Maternity Services Data Set) v1.5
  2. MSDS (Maternity Services Data Set) v2.0
  3. Maternity Services Data Set (MSDS) v1.5
  4. Maternity Services Data Set (MSDS) v2

Outputs:

The outputs expected from this work include statistics, statistical reports, presentations, methodology papers and analyses to carry out ONS’ function to produce statistics for the public good. Should outputs contain any data, this data will be aggregated with small numbers suppressed.
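As an illustration only, small-number suppression of aggregated counts could be sketched as below. The agreement does not state a threshold or a suppression method, so the threshold of 10, the function name and the choice to leave true zeros unsuppressed are all assumptions made for this sketch, not ONS practice.

```python
from typing import Dict, Optional

# Hypothetical threshold: the agreement does not specify one.
SUPPRESSION_THRESHOLD = 10


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Replace any non-zero count below `threshold` with None (suppressed).

    Assumption: zero counts are published as-is. Secondary suppression
    (preventing recovery of a suppressed cell from totals) is not handled.
    """
    return {
        group: (None if 0 < n < threshold else n)
        for group, n in counts.items()
    }


# Made-up aggregated counts, for illustration only.
example = {"region_a": 1520, "region_b": 4, "region_c": 0}
suppressed = suppress_small_counts(example)
```

In this sketch only the count of 4 is withheld; a real disclosure-control process would also consider whether suppressed cells can be recovered from published totals.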

Any official statistics produced will be published so that they are publicly available, for example on the Office for National Statistics website, in statistical bulletins, analysis articles or in data dashboards.

Projects focused on new topics or new data sources will include exploration of the feasibility of using and/or linking the data as well as quality assessments of the data and linkage prior to full development of analysis and publication of statistics. Briefings for technical/expert audiences or methodology reports may be written for internal use in the development and assessment of new data sources and analysis. These reports will be published alongside experimental statistics or official statistics on the ONS website or in peer-reviewed journals as appropriate.

All ONS statistical teams engage regularly with users and will seek to provide frequent updates on the production of official statistics through ONS’ regular stakeholder conversations, presentations and events.

ONS have a publishing strategy and team which supports analytical teams to develop different outputs to suit a range of different audiences. ONS also has active communications, media and social media teams to promote dissemination of statistics to reach as wide an audience as possible.

Health and Social Care:

ONS aim to integrate a functionally anonymised version of the MSDS into the existing Public Health Data Asset, along with births registrations and birth notifications data, within a few months of receipt of the MSDS to support the COVID-19 and pregnancy work.

In relation to child and maternal health, the first phase of work on the infant mortality risk factors is due to be completed by Autumn 2022 after which time MSDS could be integrated into the linked dataset for analysis and reporting in 2022/23.

For the wider work on health and social care statistics ONS are looking to improve and develop new statistics using newly linked data to investigate the purposes outlined within section 5a. No specific target dates can be given as ONS will not know if and when experimental or National Statistics will be produced using MSDS data for these purposes until the initial stage of exploration and feasibility are complete.

Population and Census Statistics

Population and census statistics development requires data to be available by November 2022 to enable initial analysis to be complete by the end of December 2022. There is a need to have this information written up for the 2023 National Statistician’s Recommendation on the future of population and social statistics. Work will however continue beyond 2022, as methods and data sources are further developed.


ONS / NHS Digital TRE Public Health Asset — DARS-NIC-420710-X0H1P

Type of data: information not disclosed for TRE projects

Opt outs honoured: Identifiable (Statutory exemption to flow confidential data without consent)

Legal basis:

Purposes: No (Agency/Public Body)

Sensitive: Sensitive

When: DSA runs 2021-02-18 — 2022-02-17

Access method: One-Off, Ongoing

Data-controller type: HEALTH & SOCIAL CARE INFORMATION CENTRE, OFFICE FOR NATIONAL STATISTICS, NHS ENGLAND - X26, OFFICE FOR NATIONAL STATISTICS

Sublicensing allowed: Yes

Datasets:

  1. Emergency Care Data Set (ECDS)
  2. Hospital Episode Statistics Accident and Emergency (HES A and E)
  3. Hospital Episode Statistics Admitted Patient Care (HES APC)
  4. Hospital Episode Statistics Outpatients (HES OP)

Objectives:

The objective of this application is to seek permission for ONS to make an anonymised version of an existing dataset it holds containing NHS Digital data (called the ‘Public health data asset’) available for use by approved researchers in its Trusted Research Environment. This data asset (to be called the Public Health Research Database) includes a number of underlying data sources that have previously been linked at a record level for statistical purposes. The core features of the asset are as follows.

ONS currently has approved access to NHS Digital controlled identifiable data for its functions (essentially the production of official statistics). For the HES data, the legal basis is section 45C of the Statistics and Registration Service Act (2007), as amended by the Digital Economy Act (2017). Full details are available in this linked Data Sharing Agreement.

Note that no new data will be disseminated under this agreement – ONS will use the HES data already being disseminated under DARS-NIC-175120-W5G2X. Details on this data are included within this agreement to set out the intention, but the sub-licensing (and making available to other researchers) currently only applies to the HES data shared under NIC-175120.

NIC-175120 includes birth notification data, HES data and Improving Access to Psychological Therapies data. The purpose of the data sharing agreement is for ONS, as the executive arm of the UK Statistics Authority (UKSA), to carry out the production of official statistics. Legal basis is Section 45C of the Statistics and Registration Service Act (2007) as amended by the Digital Economy Act 2017.

A key part of the processing of these data for the production of statistics involves linking them at a record level to ONS held 2011 Census and mortality data. This linkage allows ONS to create a person level health based research dataset that includes a large proportion of the England and Wales population who were present in 2011. It includes: their characteristics as recorded on the 2011 Census, if they have subsequently died, cause of death (including if from COVID-19 this year), and what underlying conditions they have/had using evidence from the event based HES data.

The HES data also allow ONS to include hospitalisation from COVID-19 as an outcome (where the patient subsequently recovers).

This subsequent linked data asset is termed the ‘Public health data asset’. For internal ONS purposes only, it also includes person level information on comorbidities derived from the event based GP Extraction Service Data for Pandemic Planning and Response (GDPPR).

The approved ONS statistical purposes for the data in the DSAs with NHS Digital include using this powerful linked data to model the risk of COVID-19 mortality and morbidity based on the other socio-economic and health information in the linked dataset. As at December 2020, ONS had already released outputs based on the data that were important for decision making and public debate around the pandemic, notably the relative risk of COVID-19 mortality across different ethnic groups.

However, there is far more insight that could be gained from this powerful linked ‘Public health data asset’ than can be reasonably researched by ONS on its own. Such missed insights could be of importance in decision making in the fight against the pandemic, and as such could ultimately save lives. It is therefore clearly in the public interest that these data be made available to approved researchers as long as the data can be made available in an anonymised form that is proportionate, and minimises data protection risks.

Therefore, this agreement permits the additional data processing required to ensure that the pseudonymised data held by ONS is suitably transformed ready for access in a TRE by approved applications and researchers, such that the data is anonymous (and therefore non-personal) in the hands of the researchers given the technical and contractual controls that ONS apply to the manipulated data asset. Such external research will be limited to statistical research for COVID-19 purposes.

The resulting dataset (the Public Health Research Database) will be held within the ONS Trusted Research Environment (TRE) and be jointly controlled by ONS and NHS Digital, with a sub-licence issued by NHS Digital setting out the basis for the creation and onward sharing of the ONS asset by ONS. Although the application process will be managed by ONS, it has been developed in collaboration with NHS Digital. The agreed process flow will include the referral of all applications to NHS Digital for consent under the terms of the Statistics and Registration Service Act 2007, in conjunction with approval by ONS. Only once approval has been granted by ONS will access to the data be given.

In terms of data minimisation, ONS already minimised the HES data that it holds under its own DSAs for statistical purposes, to the years and variables needed to achieve its statistical goals. These variables used by ONS did include personal identifiers to enable the linkage of those data to the 2011 Census and mortality data. But such variables are not needed by ONS analysts post linkage, and have been removed from the linked dataset being used by ONS analysts, i.e. only the necessary data linkers in ONS access these identifiers. These identifiers will obviously remain absent in the Public Health Research Database.

As at December 2020 Data Sharing Agreement DARS-NIC-175120-W5G2X provides the following data periods:
• Hospital Episode Statistics (HES) APC - record level identifiable 2009/10 through to 2020/21.
• HES OP - record level identifiable 2009/10 through to 2020/21.
• HES A&E - record level identifiable 2009/10 through to 2019/20. (discontinued April 2020).

In addition, only the relevant information needed to achieve ONS’s statistical coronavirus response has been / will be drawn across from other datasets such as Census, Mortality, HES and GDPPR, into the Public Health Data Asset being used internally by ONS analysts for its statistical purposes. In terms of the functionally anonymous Public Health Research Database that will only include some of the information on this internal data asset, the following have been derived and will be on the first iteration of the Public Health Research Database made available under this agreement:

Only derived ‘yes/no’ variables that indicate whether there is evidence of a subject having suffered from a particular condition in the past based on their HES records within the last 3 years. This is only a small set of comorbidities (i.e. 28 of these yes/no variables for broad conditions relevant to risk of poor outcomes from COVID-19 such as cancer, COPD, asthma, diabetes).
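A minimal sketch of how such yes/no variables might be derived from event-level records is given below. It is illustrative only: the record layout, the four-item condition list (the agreement refers to 28 conditions), the function names and the choice of 1 March 2020 as the reference day are assumptions, not the ONS method.

```python
from datetime import date
from typing import Dict, Iterable, Set, Tuple

# "As at March 2020" comes from the agreement; the exact day is an assumption.
REFERENCE_DATE = date(2020, 3, 1)
LOOKBACK_YEARS = 3

# Hypothetical subset of conditions, named after the examples in the text.
CONDITIONS = ["cancer", "copd", "asthma", "diabetes"]

# (person_id, episode_date, condition): a simplified stand-in for HES events.
Episode = Tuple[str, date, str]


def years_before(d: date, years: int) -> date:
    """Return the same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def derive_flags(
    episodes: Iterable[Episode],
    reference: date = REFERENCE_DATE,
    years: int = LOOKBACK_YEARS,
) -> Dict[str, Dict[str, bool]]:
    """Build one yes/no flag per condition per person from in-window episodes."""
    window_start = years_before(reference, years)
    seen: Dict[str, Set[str]] = {}
    for person_id, episode_date, condition in episodes:
        conditions = seen.setdefault(person_id, set())
        if window_start <= episode_date <= reference and condition in CONDITIONS:
            conditions.add(condition)
    return {
        pid: {c: (c in conds) for c in CONDITIONS}
        for pid, conds in seen.items()
    }


# Made-up records, for illustration only.
episodes = [
    ("p1", date(2019, 6, 1), "asthma"),
    ("p1", date(2015, 1, 1), "cancer"),   # outside the three-year window
    ("p2", date(2018, 3, 1), "diabetes"),
]
flags = derive_flags(episodes)
```

The point of the sketch is that only the derived binary flags, not the underlying episode dates or diagnosis codes, would need to appear in the research dataset.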

There already is (and in future will be) additional information in the internal linked data (the Public Health Data Asset) that ONS has produced for statistics.

These are not yet part of the Public Health Research Database, but they may be once such information has been fully integrated, QA’d, and confirmed as statistically useful by ONS within its internal version of the data (the Public Health Data Asset).

The types of this additional derived information are set out below, but to be clear at this stage, this is just to flag they may be added to the Public Health Research Database in future. In the case of items 1-3, these are derived from the HES and will be taken to already be covered by this agreement, should these additions be made.

Items 4 and 5 are/would be information derived from other datasets and therefore would require an amendment to this agreement before they could be added:

1. Additional binary comorbidity information for further conditions where there is a link to coronavirus. See box 1 here for a longer list of comorbidities:
https://www.bmj.com/content/371/bmj.m3731

2. Currently, the time related element of the comorbidity variables is simply whether there is any evidence of that condition in HES within the last three years (as at March 2020). However, if and when research suggests it could be related to outcome, then additional variables may be developed for each comorbidity to provide more information on how recently there is evidence of that condition. For example, evidence of that condition in HES within the last three months (as at March 2020), within the last 6 months, 12 months, 2 years, etc. This will always maintain at least a month’s separation.

3. Evidence of being hospitalised but then recovering from COVID-19 will be derived from the HES event based data for March 2020 onwards and added as person level variables to the linked research dataset. This will allow exploration of severe illness (but not death) as an outcome. For ONS statistical purposes conducted by ONS analysts on its secure systems (under its existing DSAs), then this will include the specific date of admission and date of discharge as these will be important for survival analyses. However, for any version of the linked data to be made available under this agreement, then ONS will engage with NHSD on the appropriate level of granularity to ensure the data remain functionally anonymous. For example, limiting the information to month of admission and month of discharge. Other derived person level COVID-19 hospitalisation information will include whether a patient was admitted to ITU during their stay and if so, for how long, and other treatments they received. As such NHS Digital data will be heavily derived before being made visible to researchers.

4. Similar person level comorbidity variables as described for HES above – i.e. ones relevant to risk from coronavirus – but derived from the GDPPR data. This will give a more complete picture of comorbidity, particularly where relevant conditions tend to be managed solely through primary care.

5. Vaccination data, test and trace data, and ONS COVID Infection Survey data. Of these, only the vaccination data will be NHSD owned data. The Test and trace data are owned by DHSC and the CIS data are owned by ONS.
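The date coarsening mentioned as an example in item 3 (limiting admission and discharge information to the month) could look roughly like the sketch below. The field names and the `YYYY-MM` output format are assumptions; the agreement leaves the final level of granularity to be agreed between ONS and NHSD.

```python
from datetime import date
from typing import Dict, Optional


def to_month(d: Optional[date]) -> Optional[str]:
    """Reduce a full date to year and month, dropping the day."""
    return None if d is None else f"{d.year:04d}-{d.month:02d}"


def coarsen_stay(
    admission: Optional[date], discharge: Optional[date]
) -> Dict[str, Optional[str]]:
    """Return month-level admission and discharge fields (hypothetical names)."""
    return {
        "admission_month": to_month(admission),
        "discharge_month": to_month(discharge),
    }


# Made-up hospital stay, for illustration only.
stay = coarsen_stay(date(2020, 4, 17), date(2020, 5, 2))
```

Coarsening in this way trades precision for reduced identifiability, which is why the text keeps exact dates for internal survival analyses only.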

The data will be processed by ONS under GDPR article 6 (1) (e) Public task: the processing is necessary for you to perform a task in the public interest or for your official functions, and the task or function has a clear basis in law and article 9 (2) (j): Archiving, research and statistics (with a basis in law). Processing includes creating the Public Health Research Database and ONS granting access to the Public Health Research Database.

Both ONS and NHS Digital recognise their role as joint data controllers (including for example the creation of relevant DPIA and joint controller arrangements which set out their respective responsibilities as joint controllers) and are committed to ensuring that appropriate transparency information is put in place.

Expected Benefits:

As with section 5c, it is not possible to be specific about the final public benefits until one or more projects are approved to use the sub-licensed, jointly controlled linked data in the ONS TRE. However, the design of the dataset itself is specifically aimed at enabling research into health in the powerful socio-economic context of the 2011 Census data. It is also specifically aimed at providing data that is of direct relevance to the COVID-19 pandemic. Project applications must have a specific health and COVID-19 purpose focus. As a result, the findings will be used to inform pandemic decision making and increase understanding of the COVID-19 pandemic and its effects over time.

Outputs:

The outputs/outcome of this application will simply be a sub-licensing agreement that sets up a framework within which ONS can, in theory, make an anonymised version of its Public health data asset available to approved projects run by approved researchers with the consent of NHS Digital.

Only once an approved researcher makes an application through the ONS approvals process agreed with NHS Digital, will it become clear what the research outputs that will make use of the data will be.

Processing:

As described in section 5a, ONS holds the following NHS Digital data and these have been used / will be used to derive person level health information that has been added / will be added to the research dataset that ONS are referring to as the ‘Public health data asset’.

As at December 2020 the datasets and periods available to ONS for the Public Health Asset are:

• HES APC - record level identifiable 2009/10 through to 2020/21.
• HES OP - record level identifiable 2009/10 through to 2020/21.
• HES A&E - record level identifiable 2009/10 through to 2019/20. (discontinued April 2020).

The Public health data asset is a person level dataset that includes information from over 50 million respondents to the 2011 Census, including their age, sex, ethnicity, occupation and disability status (as at 2011). It also includes their date and cause of death where relevant, and where successful linkage has been made to ONS mortality data that covers from 2011 Census date to virtually up to date. Once linked to derived person level health data (as described at 5a), the linked data are deidentified and for the purposes of this application, will be transferred to the ONS TRE.

All ONS processing of the data for the statistical purposes covered in other DSAs is performed in the ONS secure data platform that is for internal use only. Details of the transfer of the data from NHSD to ONS, the security features of the ONS internal and secure data platform, and how access to identifiable data by ONS analysts is minimised, are described in those DSAs.

Those flows and associated processing will not need to be repeated to achieve the objective of this application. All that is required is to further deidentify the linked data that ONS analysts are using (if indeed further deidentification beyond that already completed is required). This will then allow the linked data to be made available in the ONS TRE where it will be accessed by researchers as anonymous in context.

No attempt is made by ONS employees to re-identify subjects once the linked data have been deidentified; ONS analysts and processors are only interested in population level patterns and insights for the public good.

Once pseudonymised data have been securely transferred to the ONS TRE team by secure electronic data transfer, the data will be processed as follows by the ONS TRE team:

The ONS TRE is administered by a specialist team, and includes the following procedural and technical controls to ensure data are kept secure:

1. The data will be transferred to the TRE via Secure File Transfer. The methods selected for Secure File Transfer are in line with ONS guidelines for securely transferring data.

2. ONS will provide access to the de-identified linked data to approved researchers under s39(4)(i) of the SRSA. This included issuing a Research Code of Practice and Accreditation Criteria which sets out the criteria for the accreditation of researchers and established the Research Accreditation Panel (RAP) to independently accredit researchers. Each research application will be assessed and approved by RAP and ONS will require consent from NHS Digital under SRSA before it grants access to an approved researcher.

ONS will also refer all applications for access to linked data that includes HES data to NHSD for their consent under SRSA via an information governance approvals process agreed with them under the sub-licence agreed via this application.

External researchers are only allowed access to the data once their applications have been accredited by RAP, and NHSD have also given their consent. If/when any linked data are made available that includes the test and trace data controlled by DHSC, then ONS will agree an approvals and sign-off process, and make amendment to the data processing agreement with DHSC for those data, as required.

3. Access to the data will be limited to:

• Approved researchers under the approved researcher framework who are carrying out statistical research for COVID-19 purposes only.
• ONS support staff who have appropriate training and security clearance (at least ‘Security Clearance’) to access the data to review and prepare it before making it available to researchers. These support staff also check the research outputs to ensure that they are safe to publish.

4. All data being ingested into the ONS TRE undergoes registration as an information asset and is assigned a formal Information Asset Owner within ONS, even if it originates outside ONS. In addition, the data are assessed by the Information Asset Owner from a sensitivity perspective to ensure they are handled appropriately, in line with ONS Security Policy and Practice. The data sensitivity assessment is based on the content of the data and takes account of the amount of Personal Data contained within each dataset (including any Special Category Personal Data), as well as any conditions of data use specified by the supplier.

The servers used to store data, and to host the analysis environment, are located within a Pan-Government and National Cyber Security Centre (NCSC) Accredited (PGA) data centre, based in the UK. Data in use are stored in a file format compatible with most statistical software packages available in the ONS TRE. The data are stored on an encrypted drive before they are loaded, in line with government security standards. Once in the TRE they are stored in a data holding area accessible only to selected and security cleared ONS support staff. When placed in project folders, for access by the researchers, the data are made available as a read-only copy ensuring that researchers cannot edit or tamper with the original dataset in any way.

The ONS TRE does not, as a standard, provide a way for a researcher to ingest their data. The researcher must explicitly specify in their project application what data they wish to ingest for their specific project, and ONS will then assess the application and provide specific permission on a case-by-case basis with consent from NHS Digital. The frequency of this occurring is likely to be low based on other ONS TRE past usage.

5. Deletion: ONS will destroy the data in line with the NHS Digital standard for data destruction.
If NHS Digital or ONS wish to withdraw a dataset from the ONS TRE, ONS will delete the data and remove the data catalogue entry. ONS will jointly agree how to deal with active projects using their data and act accordingly. As soon as all data dependencies are addressed ONS will destroy the data.
For individual research projects, after the end of the research project, the specific data used in that research will be kept for a period of 2 years to allow validation of the research results and then it will be destroyed unless ONS specifies otherwise. At that time the project is moved to a data archive. ONS will destroy the data from the archive after 5 years. ONS will make use of the exemptions available for processing data for statistical purposes to allow ONS to use these retention periods.

6. Minimisation: The project accreditation process together with the technical controls in the system ensure that the minimum necessary personal data is made available by ONS to each researcher to achieve the stated research outcomes. This is achieved by restricting researcher access to their own project storage areas, and ensuring that only a limited number of ONS support staff are able to transfer pre-approved data into those folders.

7. Frequency of processing: Normally, ONS request data controllers re-confirm their approval for ONS to hold the data on an annual basis. At this stage ONS request any new iteration to the data which would require further processing. New versions of datasets are received from NHSD according to the related agreements. In the case of this Public health data asset, the linked data will be updated regularly; at least quarterly, and if deemed appropriate monthly, because of the fast moving nature of the pandemic. ONS will be updating the linked data monthly for its own statistical purposes anyway, because new mortality and HES data, in particular, become available this often.
If a data controller informs ONS that their data have to be corrected or amended ONS will respond to these requests as quickly as possible.

8. Technology used: ONS use well established statistical techniques based on advice from ONS experts to prepare the data. ONS only approve the use of new software after the Security team assesses the technological and technical risks and no software used in the ONS TRE is able to connect to the internet, and ONS will agree the use of such tools in relation to the jointly controlled data asset with NHS Digital.

Processing of data is done on secure infrastructure, which meets government security standards and complies with ISO27001.
In addition, an overall security management regime operates across the ONS TRE for risk assessment, tools management and data management. This is in accordance with the overall ONS Security Framework and supported by governance, policy, process and security operating procedures.

The ONS Security and Information Management team operate a continuous security assurance programme for security controls that are implemented within key business functions such as the processing of personal data. This programme covers information technology operations, corporate governance (including business security clearance implementation), physical security and information management.

Within the TRE ONS use the Five Safes approach to ensure safe processing of data. ONS and the TRE have been accredited under the Digital Economy Act:

Safe People:
Researchers who request access to data are vetted. Their experience and qualifications are scrutinised. Only those applicants that demonstrate their suitability to handle personal data then undergo a rigorous training course focusing on safe behaviours, attitudes, ethical considerations, their obligations within law and statistical methods to ensure research outcomes do not identify respondents within the data. After the training course, researchers also undergo an assessment test.
Researchers also, as part of their application, must be endorsed by their organisation. The organisation signs an agreement to support that each researcher will behave and adhere to the controls in place before access to data is granted.

Safe Projects:
Data is only made available for specific research purposes where data owners give their consent. At all times, a data owner can impose conditions of access, including location of access and how outputs will be checked. A robust and independent governance and scrutiny process is in place to ensure a clear public benefit from the research use of their data is demonstrated. Ethical aspects of the use of data are also considered with further scrutiny by the National Statistician’s Data Ethics Advisory Committee available if required. Research use of data will always adhere to the agreed purpose and controls are in place to ensure any deviation from the agreed purpose is dealt with through the SRS breaches policy.

Safe Settings:
The environment in which Approved Researchers gain access to data for their approved projects is a key element of ensuring safe and secure access to data. Security controls are built into the heart of the technology platform. A dedicated security team have tools to monitor all access to the environment in real time. Forensic controls and security applications record every mouse click, keyboard stroke and screen shot of all access to the system, from the researcher, right through to the administration team. Logs are captured that detail who has tried (or failed) to log on to the system and from where. All activity is recorded and checked.

With the expansion of ‘remote access’ to approved organisations across the secure internet, it is essential to ensure that access is only granted to those researchers from approved locations. All access is monitored and any suspicious activity will be immediately flagged for investigation by the security team. Every organisation is vetted to ensure appropriate security controls are in place before organisation connectivity is granted. Organisations must sign an organisational agreement detailing how access to the platform will be managed from their premises. Auditing and site visits will be conducted to ensure access is only granted from approved locations. Additional technical controls will be implemented to mitigate against the small risk of access from unapproved locations. Every researcher must sign an additional System Operating Procedure document, spelling out their obligations to only access the service from approved locations. Each researcher is given specific multifactor credentials to ensure only approved researchers can gain access to the platform. Realtime monitoring of malicious or intruder access will be implemented. No ability to access the internet outside of the TRE or any ability to remove data is a fundamental principle of the technology platform.

Safe Data:
At the point a research team is given access to the TRE technology platform – technology controls are in place to ensure only those data they are approved to use are made available. All data are deidentified and proportionate (minimised) to the agreed purpose, in support of the statement that the data is anonymised in the hands of the researcher.

Safe Outputs:
Any information removed from the TRE follows a strict governance process to ensure it is not possible to identify a respondent from the output. Specially trained statistical officers check and double check each output to give data owners the assurance that the use of their data is controlled and confidentiality of respondents is protected at all times.

Trusted Research Environments (TREs) such as the ONS Secure Research Service (SRS) have been used for many years, to enable researchers access to data that in any other setting may be considered personal, while ensuring the confidentiality of data subjects at all times, and full compliance with all relevant data protection legislation. This is done by ensuring that any Personal Data are Anonymised (and no longer considered personal data) when used by the researcher, by using the extensive controls within the Five Safes Framework to ensure it is not reasonably likely that any data subject will be identified during or after their analysis.


Use of PDS data to support the linkage of data required to inform statistical analysis of factors associated with the COVID-19 pandemic, by providing NHS number where the quality or completeness of personal identifiers are otherwise insufficient. — DARS-NIC-413717-C8Y6K

Type of data: information not disclosed for TRE projects

Opt outs honoured: Identifiable (Statutory exemption to flow confidential data without consent)

Legal basis: Health and Social Care Act 2012 – s261(2)(c), Health and Social Care Act 2012 – s261(7); Other-Section 45a of the Statistics and Registration Service Act (2007) as amended by the Digital Economy Act (2017)

Purposes: No (Agency/Public Body)

Sensitive: Non-Sensitive

When:DSA runs 2020-12-01 — 2021-06-30

Access method: One-Off

Data-controller type: OFFICE FOR NATIONAL STATISTICS (ONS)

Sublicensing allowed: No

Datasets:

  1. Personal Demographic Service

Objectives:

The Office for National Statistics currently receives data from the NHS Personal Demographics Service (PDS) under section 43 of the Statistics and Registration Service Act 2007. These data are used in the production and publication of statistics on the number and condition of the UK population.

Since the start of the COVID-19 pandemic, ONS has been frequently requested to provide urgent statistical support to contribute to the wider understanding of the virus, helping to inform a range of policy decisions taken by central government, health services and others. For this purpose, the ONS has relied on data either owned by ONS, acquired through its statutory powers, or provided to ONS under a COPI Notice. These data sources also include new data collection instruments launched to increase understanding of the virus, for example the recently launched Coronavirus Infection Survey, a random sample-based household study of COVID-19 incidence.

Because of the range of data being required to inform analysis on Coronavirus, ONS frequently needs to draw on multiple datasets in combination. In some circumstances this includes datasets where individual identifying details are incomplete or not of sufficient quality for data linkage. In the aforementioned Coronavirus Infection Survey (CIS), for example, identifying details are collected from all participants; however, the quality of linkage to other health sources is limited by the lack of NHS Numbers in the study. NHS Numbers are not sought from participants themselves as many people do not know them.

Participants in the CIS take part on a voluntary basis and are informed of the intent to link their data from the study with other data sources (see study protocols which can be found here: https://www.ndm.ox.ac.uk/covid-19/covid-19-infection-survey/protocol-and-information-sheets). Participants do not sign a consent form giving explicit permission for their survey responses to be linked with their health data, but the intention for such linkage is clearly described in the Participant Information Sheets and further information is provided when each participant is visited at home when the survey is carried out in person. Permission to link the data is therefore considered implicit in participants’ voluntary participation.

Under this Agreement, ONS is permitted to use the Personal Demographics Service (PDS) data to support linkage between the CIS dataset and other health data sources where common personal identifiers are incomplete or of low quality, in order to produce higher quality linkage outcomes specifically and only for COVID analytical studies which require the combination of data from multiple sources.

The use of PDS data for linkage purposes is approved only in circumstances where each of the following conditions apply:
1. The objective is to link CIS data with a health data source containing NHS Numbers and the PDS data will be linked with the CIS data to obtain a complete set of NHS Numbers to enable linkage of the CIS data with the health data source using the NHS Number;
2. The purpose of the linkage of the CIS data and other health data source(s) has been requested by SAGE for COVID-19 related purposes and the request was received via the National Statistician, a member of SAGE;
3. The PDS data will only be used for the purpose of facilitating the data linkage and will not be used in any subsequent analysis or outputs;
4. ONS will only link other health data sources with CIS data with the agreement of the supplier of that health data source and will ensure that all uses of the CIS data are transparent and in line with the reasonable expectations of the data subjects who participate in the CIS.

No clinical data is being requested under this Agreement and no additional supply is needed since a source of PDS data is already provided to ONS under an existing Data Sharing Agreement (reference: DARS-NIC-20951-D2K6S). The following list of PDS variables would be used to assist in COVID linkage work:

• NHS Number
• Family Name
• First Given Name
• Other Given Name
• Gender
• Date of Birth
• Type of Registration
• Postcode
• Address Line 1
• Address Line 2
• Address Line 3
• Address Line 4
• Address Line 5

ONS will also utilise administrative data such as the ‘Business Effective From Date’, which indicates when changes to the demographic data in PDS were made.

Data security for storage and linkage of the data will be provided with an assured ONS data analysis environment (DAP) that includes the following elements of security control:

• Need To Know applied through user account access and management
• Controlled ingest and export of data into/out from the environment
• Controlled account access using unique credentials based on job role
• Logged and monitored access of user activity within the environment
• Secure build configuration for infrastructure, including cloud services
• Vulnerability tested infrastructure with appropriate remediation and patching
• Compliance checks against security enforcing controls
• Architectural review against standards and best practice
• Staff security cleared to the appropriate level based on their supervised and/or unsupervised access to sensitive data in accordance with ONS clearance policies and data access processes
• Education and awareness of environment users covering security policies and secure working practices
• Operational support processes to securely manage the environment
• Risk assessment to identify security risks and mitigation actions to reduce this risk.

ONS employs rigorous disclosure controls and access restrictions ensuring that physical, technical, procedural and personnel security is kept to the highest and most up to date standards. Following policy specified by the ONS Chief Security Officer, users will be granted supervised or unsupervised access following a clearance application. The ONS Information Asset Owner (IAO) grants access and a list of all authorised users is available on request.

The Coronavirus (COVID-19) Infection Survey is carried out by ONS and sponsored by the University of Oxford. The study involved collaborations from other organisations including IQVIA, The National Biosample Centre, the University of Manchester, Public Health England (PHE) and Wellcome Trust. The University of Oxford and ONS are joint Data Controllers for any data collected as part of the CIS and will be jointly responsible for any decisions to link the CIS data with other health data sources.

However, ONS will be the sole organisation responsible for determining the need to use PDS data and the manner in which the PDS data will be used, and is therefore considered to be the sole data controller for the data under this Agreement.

This is a task in the public interest. This work is of critical priority across Government as part of the UK’s response to the COVID-19 pandemic. It will contribute to the wider understanding of the virus, helping to inform a range of policy decisions taken by central government, health services and others. Optimising such decision making could ultimately save lives.

ONS will rely on the following lawful basis from the GDPR for processing personal data including special category personal data:

• GDPR Article 6(1)(e) The processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the data controller. The authority for ONS to produce, promote and safeguard official statistics is found in the Statistics and Registration Service Act 2007.

• GDPR Article 9(2)(i) processing is necessary for reasons of public interest in the area of public health, such as protecting against serious cross-border threats to health or ensuring high standards of quality and safety of health care and of medicinal products or medical devices, on the basis of Union or Member State law which provides for suitable and specific measures to safeguard the rights and freedoms of the data subject, in particular professional secrecy;

Personal identifiers from the PDS would be used only during linkage work to join the information from necessary data sources and would be removed from the resulting analysis dataset after such linkage had taken place.

Expected Benefits:

The primary benefit of the processing described above will be robust, quality-assured data linkages involving CIS and other health datasets.

This will support secondary benefits of producing official statistics which SAGE will use to inform policy decisions taken by the government.

Outputs:

The primary outputs from the use of PDS data as described above will be datasets comprising CIS and other health datasets, available for analysis as pseudonymised linked data within ONS’ SRS.

Secondary outputs will result from the analysis of such linked datasets for purposes determined by SAGE.

Processing:

ONS receives Personal Demographics Service data from NHS Digital under a separate Data Sharing Agreement.

ONS will reuse that data as follows:

PDS data will be matched at record level with the COVID Infection Survey (CIS) data using an automated linkage algorithm to match individuals based on combinations of identifying details in each dataset. The NHS Number from the PDS data will be combined with the identifying details from the CIS dataset to enable linkage with health datasets containing the NHS Number variable.

Where required, the PDS data will be matched in the same way described above with sets of patient identifiers from other health datasets in preparation for linking those datasets with the CIS data. For example, the Test and Trace dataset contains NHS Numbers for ~90% of respondents so linkage with PDS will help populate the missing NHS Numbers before the dataset is linked with CIS data.
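
The matching step described above can be illustrated with a minimal sketch. The Agreement does not specify the linkage algorithm, so the example below substitutes a simple deterministic approach: exact matching on normalised combinations of identifying details ("match keys"), accepting only unambiguous matches. The field names, the choice and order of match keys, and all function names are assumptions made for illustration, not ONS's actual method.

```python
# Hypothetical match keys: combinations of identifying details, tried in order
# from strictest to loosest. Field names are illustrative assumptions.
MATCH_KEYS = [
    ("family_name", "first_given_name", "date_of_birth", "postcode"),
    ("family_name", "date_of_birth", "postcode"),
    ("family_name", "first_given_name", "date_of_birth"),
]


def normalise(value):
    """Upper-case and strip all whitespace so 'ab1 2cd' equals 'AB12CD'."""
    return "".join(str(value or "").split()).upper()


def match_key(record, fields):
    """Build a key from the given fields, or None if any field is missing."""
    values = tuple(normalise(record.get(field)) for field in fields)
    return values if all(values) else None


def build_pds_indexes(pds_records):
    """For each match key, map key values to the set of NHS Numbers seen."""
    indexes = []
    for fields in MATCH_KEYS:
        index = {}
        for record in pds_records:
            key = match_key(record, fields)
            if key is not None:
                index.setdefault(key, set()).add(record["nhs_number"])
        indexes.append(index)
    return indexes


def find_nhs_number(record, indexes):
    """Return the NHS Number of the first unambiguous match, else None."""
    for fields, index in zip(MATCH_KEYS, indexes):
        key = match_key(record, fields)
        candidates = index.get(key, set()) if key is not None else set()
        if len(candidates) == 1:
            return next(iter(candidates))
    return None


def add_nhs_numbers(records, pds_records):
    """Return copies of records with missing NHS Numbers filled where possible."""
    indexes = build_pds_indexes(pds_records)
    linked = []
    for record in records:
        enriched = dict(record)
        if not enriched.get("nhs_number"):
            enriched["nhs_number"] = find_nhs_number(record, indexes)
        linked.append(enriched)
    return linked
```

The same `add_nhs_numbers` call would cover both cases in the text: survey records that carry no NHS Number at all, and a health dataset in which only some records are missing it. Records with no unique match keep `None`, which leaves them visible for the quality assurance step.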

All data linkages will be quality assured.

Once the CIS data has been linked with the other health dataset, PDS data will be removed. The linked datasets will be made available for analysis as pseudonymised linked datasets in ONS’ Secure Research Service (SRS). No PDS data will be accessible within the SRS.


ONS CHRIS Replacement System - Provision of data via PDS to ONS — DARS-NIC-20951-D2K6S

Type of data: information not disclosed for TRE projects

Opt outs honoured: Identifiable (Statutory exemption to flow confidential data without consent)

Legal basis: Health and Social Care Act 2012 – s261(7); Other-Section 43 Statistics and Registration Service Act 2007

Purposes: No (Agency/Public Body)

Sensitive: Non-Sensitive

When:DSA runs 2019-07-01 — 2020-06-30

Access method: One-Off

Data-controller type: OFFICE FOR NATIONAL STATISTICS (ONS)

Sublicensing allowed: No, Yes

Datasets:

  1. Personal Demographic Service

Objectives:

The Data Recipient agrees to process the Data only for the following purposes agreed with NHS Digital:

Statutory purpose
Under section 20 of the Statistics and Registration Service Act 2007 the Statistics Board (UK Statistics Authority) may itself produce and publish statistics on any matter relating to the UK or any part of it. The Office for National Statistics as the Executive Office of The Board (ONS) carries out this function. Within ONS, Population Statistics Division are responsible for providing key data on the population of the United Kingdom and are comprised of several branches that undertake demographic analysis in migration, fertility, mortality, ageing and families which are used to create population estimates, population projections, the Migration Statistics Quarterly Report and other demographic analytical outputs.

Under section 5 of the Census Act 1920 the Registrar-General has a duty to from time to time collect and publish any statistical information with respect to the number and condition of the population in the interval between one census and another…and the Registrar-General may make arrangements with any Government Department for the purpose of acquiring any materials or information necessary for the purpose aforesaid. This duty was transferred from the Registrar-General to the Statistics Board under s.25(2)(a) of the 2007 Act.


ONS will rely on the following lawful basis from the GDPR for processing personal data including special category personal data:

GDPR Article 6(1)(e) The processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the data controller. The authority for ONS to produce, promote and safeguard official statistics is found in the Statistics and Registration Service Act 2007.

GDPR Article 9(2)(j) Processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) based on Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject.

ONS is the United Kingdom’s National Statistical Institute and largest independent producer of official statistics. It is responsible for producing statistics on a range of key economic, social and demographic topics in order to inform the needs of Government, society, academia and business to enable better decisions to be made. Using administrative data such as PDS allows ONS to produce statistics which are more granular and timely at a lower cost to the public, therefore enabling better decisions and resource allocation.

The specific necessity of processing these data is to develop census statistics and population and migration estimates: these justifications are outlined in greater depth below. Proportionality is considered by business areas, who are required to demonstrate that their goals cannot be achieved without the data. Demographic data (such as name, address and date of birth) are essential for these projects to ensure under coverage and over coverage within the individual data sources are dealt with appropriately. All requests for data are scrutinised to ensure the principle of data minimisation is applied and only data required for the purposes set out will be requested and processed.

ONS does not consider there to be any moral or ethical issues raised by the proposed dissemination. An internal Data Protection Impact Assessment (DPIA) indicates this process poses little risk of potential harm to the public. ONS considers the primary risk of any data share of this type to be the identification of individuals through either misuse by persons within the organisation or by a loss of data in transfer. Therefore, to protect against these risks, ONS employs rigorous disclosure controls and access restrictions, ensuring that physical, technical, procedural and personnel security is kept to the highest and most up to date standards. To reduce the risk of data loss, data will be transferred using an agreed, secure, electronic transfer facility which adheres to Government requirements and assurance. An in-depth description of the security capabilities of the environment in which the data is ingested, stored, linked and analysed is detailed below under the heading ‘Data Environment’.



Population Statistics Division

NHS Digital previously supplied CHRIS data and Patient Register (NHAIS) data to ONS for the purposes of estimating internal migration in England and Wales, and cross border flows to and from Scotland and Northern Ireland. Patient Register (NHAIS) data are due to be decommissioned and as such ONS need to assess its proposed replacement (PDS). The Patient Register data, in addition to internal migration estimates, are also used in the local authority distribution of international migrants for the mid-year population estimates and in the small area population estimates. Internal migration estimates are national statistics and a key component in population estimates and projections, also national statistics, which are used extensively throughout government to allocate funds. Under Section 5 of the Census Act 1920, ONS have a duty to ‘collect and publish any available statistical information with respect to the number and condition of the population in the interval between one census and another’ and ‘may make arrangements with any Government Department or local authority for the purposes of acquiring any materials or information necessary for the purposes’.

The data will be used in conjunction with other administrative data for estimating internal and international migration, the local authority distribution of international migrants component of change for the mid-year estimates and small area population estimates within England and Wales, and for estimating migration between England and Wales, Scotland and Northern Ireland. In doing so, it may be combined with other data sources to improve the accuracy and quality of estimates. For instance, higher education student records obtained from the Higher Education Statistics Agency (HESA) may be used to enhance information on the movement of students.



Census Transformation Programme

The system for providing population and socio-demographic statistics in England and Wales has been built around having a ten-yearly census. The census provides a population count at a point in time and enables a wide range of socio-demographic statistics to be produced for a range of small geographic areas.
The census provides three broad categories of information:
· Population and housing estimates: counts of the population by age and sex, and the number of housing units.
· Population structures: households, families and household relationships.
· Population characteristics: e.g. ethnic groups, health, qualifications and employment.

These statistics are produced for geographies ranging from England and Wales as a whole down to Output Areas (125 households on average) and for very small population groups, such as some ethnic groups and religious communities. As well as the direct census outputs, the census is the benchmark for the population estimates between censuses, for household and population projections and used as a denominator for numerous other statistics. This brings additional benefits which are delivered in the years between Censuses. NHS England currently uses population statistics to allocate billions of pounds to local areas using funding formulae.

In May 2010 and ahead of the census in 2011, the UK Statistics Authority asked ONS to begin a review of the future provision of population statistics in England and Wales in order to inform Government and Parliament about the options for the next census. ONS launched the Beyond 2011 Programme of research and reviews, including a major public consultation on the two leading options at the end of 2013.

In March 2014, the National Statistician recommended to the Board of the UK Statistics Authority that future provision of population statistics and the next census should be through:
· “An online census of all households and communal establishments in England and Wales in 2021 … recognis[ing] that special care would need to be taken to support those who are unable to complete the census online.
· Increased use of administrative data and surveys in order to enhance the statistics from the 2021 Census and improve annual statistics between censuses.”

This would make the best use of all available data to provide the population statistics which England and Wales require and offer a springboard to the greater use of administrative data and annual surveys in the future.
The Government welcomed the recommendation, stating its position for the future of the census in England and Wales:
‘Our ambition is that censuses after 2021 will be conducted using other sources of data and providing more timely statistical information. However, any final decision on moving to the use only of administrative data beyond 2021 will be dependent on the dual running sufficiently validating the perceived feasibility of that approach. In the period up to 2021 UKSA’s plans should include ensuring that adequate research into the use of administrative data and surveys is carried out to enable a decision about the future methodology for capturing population and census data.’ - Government’s response to the National Statistician’s recommendation, 18 July 2014.

The Census Transformation Programme (CTP) has continued research from the Beyond 2011 Programme into the use of administrative data and surveys to assess whether in ONS’ view the Government’s ambition can be realised. To that end, CTP has committed to publishing an annual assessment of progress towards the Government’s ambition and regular Administrative Data Research Outputs on the population, crucially including estimates of the size of the population produced using administrative data. The first such outputs were published in October 2015, based on pseudonymously linking multiple administrative datasets (including NHS Patient Registration Data, DWP/HMRC Customer Information System data and higher education data) to produce a Statistical Population Dataset (SPD). SPD population estimates have been produced for each local authority, by five-year age groups and sex for 2011, 2013 and 2014.

No one dataset in its own right can be used to produce a reliable estimate of the size of the usually resident population because, for example, the population registered with a GP is not the same as the usual resident population. The PDS data specified to replace the flow ONS currently receives would provide information about the timing and nature of an individual's interactions with this system. This 'activity' data can be used by ONS to determine whether a record appears to be 'active' or 'inactive' for statistical purposes. This can then be used to improve the quality of population statistics by helping to address definitional differences between administrative data and population definitions, and time lags. Research outputs produce aggregated statistics and are subject to the same disclosure control requirements as described by Population Statistics Division.
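
As a rough illustration of the 'active'/'inactive' determination described above, the sketch below flags a record by how recently its last interaction occurred relative to a reference date. The document does not state the rule ONS applies, so the 12-month window, the use of a single last-interaction date and the function name are all hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical threshold: the source does not say what window is used.
ACTIVITY_WINDOW = timedelta(days=365)


def activity_status(last_interaction: Optional[date], reference_date: date) -> str:
    """Classify a record as 'active' or 'inactive' at the reference date.

    A record counts as active only if its most recent interaction falls
    within ACTIVITY_WINDOW before (or on) the reference date.
    """
    if last_interaction is None:
        return "inactive"
    elapsed = reference_date - last_interaction
    if timedelta(0) <= elapsed <= ACTIVITY_WINDOW:
        return "active"
    return "inactive"
```

For example, with a reference date of 30 June 2018, a record last updated on 1 March 2018 would be flagged active, while one last updated in January 2016 would be flagged inactive and could be treated as a possible case of over-coverage.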

There is no common identifier that can be used to link individuals’ records across these datasets and therefore ONS need access to identifiable record level data (including NHS number, dates of birth, sex, full name, addresses and postcodes) to link the datasets and establish the aforementioned SPD. This allows ONS to draw strength across multiple datasets to improve quality and methods to estimate the size of the population and their characteristics. This linkage of identifiable data is required to support the 2021 Census, to provide evidence for the annual assessment of progress towards the Government's ambition beyond 2021, and to work towards fulfilling that ambition to conduct future censuses using other sources, while ensuring that population statistics in the future will continue to meet the needs of users e.g. for resource allocation, service provision and planning.

Identifiable data is required to be able to link to other admin data sources, however no clinical data is being requested.

Name variables have been included because they are key variables that enable the linking of PDS data to other administrative data sources to produce Admin Data Based Population Estimates. The Patient Register Extract is currently a key data source used by the Census Transformation Programme (CTP) to produce Admin Data Based Population Estimates. With this being shut down, a full understanding of the PDS Stocks data is required to assess its suitability as a replacement. If ONS cannot link administrative sources they cannot identify potential double counting or cases which do not meet their usually-resident population definition.

In addition to replacing the CHRIS dataset, PDS has been proposed by NHS Digital as a replacement dataset for the NHAIS Patient Register extract that ONS currently receive. PDS has been linked to NHAIS using the NHS number as the data currently held allows ONS to initially assess the data for this purpose and is in accordance with permitted use. However, the numbers of records on NHAIS and PDS datasets are very different and ONS need to understand why a record may appear on one dataset and not the other. Research into this has begun but ONS are hampered in their investigations due to the lack of name information (received on NHAIS), which prevents further data linkage that may uncover characteristics of those not present on both datasets. If ONS cannot understand the root causes of the differences then ONS cannot then take account of them in their methodologies. This could result in ONS not being able to adopt PDS as a fit for purpose replacement for NHAIS and subsequently have a need for NHAIS data in 2019 and beyond, due to the longitudinal nature of their methods requiring more than one year of data. It may also result in delays to planned changes in methodology as ONS would be unable to link from PDS to other sources where name is required, for example within a component of mid-year population estimates.



Data Environment

The protection of data received from partners is very important to ONS and significant investment, activity and assurance is undertaken to ensure this data is secured throughout the lifecycle from ingest to processing and export. ONS are developing advanced approaches that enable improved statistical analytical capabilities including the linking and matching of personally identifiable information, while maintaining and enhancing procedural, personnel, physical and technological security protections that provide assurance to internal stakeholders and external partners. A developing data analysis environment, the Data Access Platform (DAP), will support this as ONS moves away from the Statistical Research Environment for its analysis work.

Data security for storage and linkage of the data will be provided with an assured ONS data analysis environment (DAP) that includes the following elements of security control:

• Need To Know applied through user account access and management
• Controlled ingest and export of data into/out from the environment
• Controlled account access using unique credentials based on job role
• Logged and monitored access of user activity within the environment
• Secure build configuration for infrastructure, including cloud services
• Vulnerability tested infrastructure with appropriate remediation and patching
• Compliance checks against security enforcing controls
• Architectural review against standards and best practice
• Staff security cleared to the appropriate level based on their supervised and/or unsupervised access to sensitive data in accordance with ONS clearance policies and data access processes
• Education and awareness of environment users covering security policies and secure working practices
• Operational support processes to securely manage the environment
• Risk assessment to identify security risks and mitigation actions to reduce this risk.

Following policy specified by the ONS Chief Security Officer, users will be granted supervised or unsupervised access following a clearance application. The ONS Information Asset Owner (IAO) grants access and a list of all authorised users is available on request.

With reasonable notice, periodic written/verbal checks may be conducted by an authorised employee of NHS Digital to confirm compliance with this DSA.

ONS will keep a record of any processing of Personal Data and will provide a copy of such record to NHS Digital on request.

ONS will not transfer or permit the transfer of the Data to any territory outside the UK without the prior written consent of NHS Digital.


Migration – Business Case for 2018/19

The Centre for International Migration is leading on work which will begin to put administrative data at the core of migration statistics in 2019. This programme of work forms part of a larger Government Statistical Service (GSS) transformation plan which is recognised in the Home Affairs Select Committee’s report and government response.

The Personal Demographic Service (PDS) data would form an integral part of plans to put administrative data at the core of migration statistics. The data will help develop an understanding of international migration and improved migration estimates at local level. Improved statistics, at a more granular level, will enable more effective decision making, policy development and evaluation and resource allocation at local level. Receiving PDS data will allow migration statistics to:

1. Build on ONS' work identifying migrant interactions within the health system as a sign of activity and the impact of migration on the health sector. This will be achieved by linking to other health datasets including the Hospital Episode Statistics and the Improving Access to Psychological Therapies dataset.
2. Compare signs of arrival, activity and departures by linking PDS to other public-sector datasets.

Yielded Benefits:

Statistical Design and Research

Since the last Data Access Request Service agreement, the following developments have been made:
• ONS have conducted quality assurance analysis on the dataset, which has fed into an internal report on the dataset.
• ONS have received Name variables from NHS Digital as part of the extract received, which now feed into analysis.
• PDS has been used as a tool to quality assure the linkage of address data between PR and CIS in a previous SPD publication.
• PDS has been used for the quality assurance of Births and Deaths Registration data which will feed into the migration publication to be released by ONS in late autumn 2018.
• PDS has primarily been used to link to other datasets and develop an iterative methodology to produce population and migration estimates. Early parts of this work were presented at the British Society of Population Studies (BSPS) conference earlier this year.

Going forward, within SDR one of the key uses of PDS will be continuing to link the data to a range of other datasets at record level to develop a methodology to produce population and migration estimates by sex and age to the levels of geography currently published by the official Population Estimates. ONS have recently raised a request for a number of additional variables to be delivered; however, discussions surrounding the acquisition of these variables are ongoing so no further update can be provided on specific variables. The key variables that have been requested are those which provide some form of identification which is descriptive of a person and any variable which would highlight activity. Additionally, PDS is likely to be used to pick up the demographic information that is missing on other datasets, for example HES and IAPT, to allow for easier, more consistent linkage.

Census Statistical Outputs Design

Census Statistical Outputs Design are hoping to use the PDS to aid in the imputation and quality assurance of Census, particularly of people resident in communal establishments such as care homes and mental health establishments. This mirrors what was done in 2011 using the PR. Response to Census in communal establishments is variable, and it’s not always clear where non-respondents are located due to unreliable administrative data estimates about the care homes themselves (referring to total capacity rather than occupied capacity). It is hoped that if a person is usually resident in a care home, the chances of them not appearing on the PDS will be small. Therefore, estimates of currently occupied beds in care homes, derived from the PDS linked to a list of identified care homes and then compared to Census responses for that care home, will identify approximate numbers of people missing from that care home’s Census responses. Demographic information from the PDS for that care home can then be used to determine appropriate types of donor records to impute missing responses. Work to develop this approach is ongoing, with a test of the approach to be conducted during the 2019 Rehearsal.

Migration Statistics Division

The Centre for International Migration have linked the PDS to HMRC’s Migrant Worker Scan (MWS). This has enabled comparisons of potential arrival and registration dates between the two sources and in turn has tested assumptions on the lag between arrival and registration with a GP. Other quality work includes assessing how many ‘new migrants’ on the MWS are seen in PDS. This is part of ONS’ wider work to link different admin data sources together and test assumptions around putting administrative data at the core of migration statistics. This work is planned to continue.
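
The care home comparison described under Census Statistical Outputs Design above amounts to a simple subtraction per establishment, sketched below under stated assumptions. It assumes PDS records have already been linked to a list of identified care homes, treats the count of linked PDS records as the estimate of occupied beds, and floors the shortfall at zero. The function name and input shapes are hypothetical.

```python
from collections import Counter


def estimate_missing_responses(pds_care_home_ids, census_response_counts):
    """Approximate the number of residents missing from each care home's Census responses.

    pds_care_home_ids: one care home identifier per PDS record linked to an
        identified care home (assumed input from an earlier linkage step).
    census_response_counts: mapping of care home identifier to the number of
        Census responses received for that home.
    """
    occupied_beds = Counter(pds_care_home_ids)
    return {
        home: max(occupied - census_response_counts.get(home, 0), 0)
        for home, occupied in occupied_beds.items()
    }
```

A home with three linked PDS records and two Census responses would yield an approximate shortfall of one person, for whom a donor record could then be chosen using the PDS demographic information.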

Expected Benefits:

The Department of Health and their agencies specifically use both census and population statistics which are branded as national statistics in the planning and provision of health and social care services and funding allocations. They are almost certainly used as the denominator in any statistics that are published on a per capita basis.
The internal migration estimates are one of the components of the mid-year population estimates that are produced at various levels of geography (including clinical commissioning group). It is these mid-year estimates that are the denominator for many health statistics. For instance, any health data published per capita for a particular level of geography (national, region, local authority, clinical commissioning group, parliamentary constituency etc) is almost certain to have ONS estimated or projected population as the denominator. Estimates and projections are published by age and sex. This means that they can also be used to better target age and sex specific health and care services (e.g. maternity, aging populations etc.).

The population estimates and projections are also used extensively throughout government and specifically by the Department of Health and their agencies for the planning and provision of health and social care services and the distribution of funds. Throughout government, decisions on the distribution of billions of pounds of funds are made based on population estimates and projections.

In March 2014 the National Statistician recommended to the Board of the UK Statistics Authority that future provision of population statistics and the next census should be through “Increased use of administrative data and surveys in order to enhance the statistics from the 2021 Census and improve annual statistics between censuses.” CTP are therefore conducting research into this to produce accurate and more timely population statistics.

The research outputs are used by academics, local authorities, other Government Departments and other statistical institutions for the allocation of funding and resource, and for other ongoing research into the use of administrative data to produce information about the population. This work will enable efficient funding and resource allocation for a range of services, including health and social care.

Outputs:

Main publication dates are as follows:

Internal migration – Annually – June

Internal migration statistics are published at local authority level by age and sex. They have wide use by local authorities and academics, including feeding into alternative population projections models and subsequent resource allocation planning.

Mid-year estimates – Annually – June, with lower level geography releases after this until Small Area Population Estimates are released in November.


National Population projections – normally published at the end of October. Releases have traditionally been every other year. Following the national projections, there are sub-national releases between national releases.

Internal migration estimates are a key component of population estimates and population projections, which have wide and important use in the UK. Estimates are used in resource allocation from Central Government to local government and health authorities as well as in denominators for other statistics across government, which feed into policy decisions. Projections are used to determine future provision of public services such as schools and hospitals. At the local level these statistics are used to determine the types of services which are needed, for example a hospital may use them in determining whether or not to alter the level of provision of maternity or geriatric services.

Population projections and estimates are produced for CCGs and local authorities. Population estimates are also produced for output areas, parliamentary constituencies and other small area types.


Census Transformation Programme (CTP)

CTP has committed to publishing regular annual Administrative Data Research Outputs on the population, including estimates of the size of the population produced using administrative data. The first such outputs were published in October 2015 and included pseudonymously linked multiple datasets (including NHS Patient Register data, DWP/HMRC Customer Information System data and higher education data) to produce a Statistical Population Dataset (SPD). SPD population estimates have been produced for each local authority, by five-year age groups and sex for 2011, 2013 and 2014.

The aim of these research outputs is to:
• Update users on the progress with administrative data and to seek feedback with the aim of improving the methods used.
• Help the process of working with data suppliers to improve data quality for statistical purposes.
ONS will aim to develop them in both method and content in the run up to the 2021 Census, and ultimately compare them with results from the 2021 Census. ONS will also expand on the range of topics and granularity published each year.

ONS recognise the need for continued development of their methodology, so future releases will also include outputs based on future developments which will allow feedback to be taken on board from users, new data that becomes available, and alternative techniques to produce population statistics.

A new series was introduced in 2016 to include these developments which showed improvements to the methodology. Additionally, subject to satisfactory data access and quality, ONS also aim to release population statistics at more detailed levels, and outputs about other topics or characteristics of the population (other than age and sex) such as qualifications, personal or household income, and ethnicity, as well as on the number and the size and composition of households. ONS also plan to present their research into the development of a methodology to estimate and adjust for coverage error in the SPD.

In the spring of each year, starting in 2016, ONS will publish an assessment of progress towards the Government’s ambition to produce population statistics from administrative data after 2021.

The Census Transformation Programme (CTP) has continued research from the Beyond 2011 Programme into the use of administrative data and surveys to assess whether in ONS’ view the Government’s ambition can be realised. To that end, CTP has committed to publishing an annual assessment of progress towards the Government’s ambition and regular Administrative Data Research Outputs on the population, crucially including estimates of the size of the population produced using administrative data. The first such outputs were published in October 2015, based on pseudonymously linking multiple administrative datasets (including NHS Patient Registration Data, DWP/HMRC Customer Information System data and higher education data) to produce a Statistical Population Dataset (SPD). These outputs are produced annually, the most recent being October 2016. SPD population estimates have been produced for each local authority, by five-year age groups and sex for 2011, 2013, 2014 and 2015.

NHAIS feeds into annual releases for mid-year population estimates, internal migration estimates and small area population estimates (including clinical commissioning groups); through these estimates, NHAIS then indirectly feeds into the national population projections and the subnational population projections (including clinical commissioning groups) which are produced every two years. These statistics have onward use in resource allocation for example in the distribution of funds from central to local government.


Record level data will be held securely within ONS. The data will be held in the accredited Statistical Research Environment (SRE) located at ONS premises. The environment has been specifically designed to support the linkage of personal data and has security measures in place to mitigate the associated risks. NHS Digital were consulted during the design and build of this facility and have continued to be involved in its oversight since it became operational. Only named researchers have access to the SRE, and it is not possible to access a printer, disks, the internet or the rest of the ONS network whilst in this room. It is a secure room that requires additional security clearance to access. The approach taken to manage data is described in the ONS's safeguarding data for research policy.

Processing:

Population Statistics Division

Internal and international migration are key components of population estimates and population projections. In the majority of areas, migration is the largest component of change between one year and the next.

The data requested herein will form part of the internal migration method alongside other datasets, primarily HESA student data, replacing the Patient Register data from NHAIS. To distribute international migrants to local authorities, a registration type denoting new registrations is used in conjunction with data from other sources, such as HESA student data and data from HMRC and DWP, to apportion the national total of migrants to the local level.

ONS, National Records of Scotland (NRS) and Northern Ireland Statistics and Research Agency (NISRA) share aggregated information on the movements of patients between the four countries of the UK from the NHS systems in the production of mid-year population estimates and other statistical outputs. The sharing of information was previously covered by a data access agreement. The NHSCR system in England that provided this data was closed in February 2016.

The number of cross border moves between Scotland, England & Wales, and Northern Ireland is shared by the three statistical offices (NRS, ONS and NISRA). The total numbers are agreed, and the totals from the receiving countries are used for each flow because these are considered to be most accurate.

In addition, NRS received record level data to ensure a higher level of accuracy, because the number of moves between Scotland and England & Wales is large and makes up a large proportion of all moves for Scotland. The data from the receiving country is considered to be the most reliable, as this is based on persons registering with a GP in their new location, whereas estimates from the origin country are based on de-registration of persons who notify their GP that they are moving. Both registering and de-registering are voluntary, and people are more likely to re-register than to de-register.
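
The rule of taking each flow's total from the receiving country can be illustrated with a short sketch. The dictionary layout and the figures below are hypothetical and are not taken from any ONS, NRS or NISRA system.

```python
# Each flow has two candidate totals: one based on de-registrations in the
# origin country and one based on new GP registrations in the receiving country.
# All figures are made up for illustration.
candidate_totals = {
    ("Scotland", "England & Wales"): {"origin": 900, "receiving": 1000},
    ("England & Wales", "Scotland"): {"origin": 1100, "receiving": 1200},
    ("Northern Ireland", "England & Wales"): {"origin": 250, "receiving": 300},
}


def agreed_flows(candidates):
    """Keep the receiving-country total for every (origin, destination) flow."""
    return {flow: totals["receiving"] for flow, totals in candidates.items()}


agreed = agreed_flows(candidate_totals)
```

In these made-up figures the origin-based totals are lower than the receiving-based ones, consistent with the observation that people are more likely to re-register than to de-register.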

To continue with the cross border moves, aggregate totals and record level data are required from the PDS system. This data will be extracted from the PDS stocks/weekly movers file using a new approved methodology. The variables required to produce the data are:

1. Gender
2. Date of Birth
3. NHAIS Posting
4. NHAIS Posting Business Effective From Date
5. Previous Posting (from Registration Request)
6. Type of Registration
7. Reason for Removal Type
8. Reason for Removal Business Effective from date

In order to distribute local authority population estimates to lower levels of geography (the small area population estimates), a combination of ratio change and apportionment is used, based on Patient Register Data from NHAIS.

ONS is permitted to share the above data items with NRS on condition that ONS ensures there are adequate controls in place to ensure that NRS:
a. must not combine the data with other datasets which could potentially increase the risk of reidentification for individuals in the dataset;
b. must not attempt to re-identify individuals in the dataset;
c. must not onwardly share the dataset;
d. must use the dataset solely for the defined purpose of the production of cross-border flows between the four countries of the United Kingdom, using the movement of people, as part of population estimates for the United Kingdom for the Population Estimates Unit, and;
e. must not publish the data.

ONS must maintain its Memorandum of Understanding (MoU) with NRS as a means of applying the above controls.

ONS will share only aggregate totals of moves between England or Wales and Northern Ireland with NISRA. No record level data will be shared with NISRA.


Census Transformation Programme (CTP)

The matching methods ONS use are based on a privacy-preserving approach, where identifiable data (names, dates of birth, and addresses) are pseudonymised consistently using the same algorithm that has been applied to all record level datasets supplied to ONS. CTP plan to link PDS data to other identifiable datasets through linkage to NHAIS GP Registration data via NHS number. At present CTP use the following identifiable administrative data sources in the development of research outputs and to support the 2021 Census:
• NHS Patient Register
• DWP Customer Information System
• Higher Education Statistics Agency Student Record
• English and Welsh School Censuses
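
To illustrate why applying the same pseudonymisation consistently still allows records to be matched, here is a minimal sketch. The source does not say which algorithm ONS uses; the keyed hash (HMAC-SHA256), the normalisation step, the key and the sample records are all assumptions made for this example.

```python
import hashlib
import hmac


def pseudonymise(value: str, key: bytes) -> str:
    """Return a keyed hash of a normalised identifier (assumed scheme)."""
    # Normalise case and whitespace so trivially different spellings agree.
    normalised = " ".join(value.upper().split())
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


KEY = b"example-key-not-a-real-secret"  # hypothetical key


def match_key(record: dict) -> tuple:
    """Build a pseudonymised match key from name and date of birth."""
    return (pseudonymise(record["name"], KEY), pseudonymise(record["dob"], KEY))


# Made-up records standing in for two administrative sources.
patient_register = [{"name": "Jane  Smith", "dob": "1980-01-02"}]
student_record = [{"name": "jane smith", "dob": "1980-01-02"}]

register_keys = {match_key(r) for r in patient_register}
matches = [r for r in student_record if match_key(r) in register_keys]
```

Because every dataset passes through the same function with the same key, equal identifiers produce equal pseudonyms, so matching can be done on the pseudonyms rather than on the identifiable values.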

ONS are working with departments across government to gain access to administrative data to understand the feasibility of using these data sources to meet the Government’s ambition for census taking beyond 2021.

PDS data would undergo the same pre-processing and pseudonymisation steps as all other datasets used in the programme. The ONS policy on safeguarding data when linking multiple administrative datasets is set out in the previous Beyond 2011 paper Safeguarding Data.

If ONS cannot link administrative sources, they cannot identify potential double counting or cases which do not meet their usually-resident population definition. Previous research in the Beyond 2011 Programme explored how population statistics could be produced without linking. This so-called pre-aggregated approach (essentially using non-identifiable data) was unable to deliver statistics that could meet the required quality. The Beyond 2011 Programme concluded that linkage of administrative data would be required in an administrative data-based approach to the Census in the future.

Two ongoing supplies of data are requested from NHS Digital: the PDS Weekly Movers file and the annual Mid Year Stock Extract file. The specific data items required from the Personal Demographic Service are as follows:

PDS Periodic Movers Extract

Provided weekly to ONS via Messaging Exchange for Social Care and Health (MESH). This reflects patients who have moved, in order to assess population movements.

Variables

NHS Number
Gender
Date Of Birth
Postcode from usual Address
NHAIS Posting
NHAIS Posting Business Effective from date
Previous Posting
Type of Registration
Reason for Removal Type
Reason For Removal Business Effective from date


Annual Mid Year PDS Stock Extract File

Annual run to be initiated on the Saturday closest to 31st July each year, which is timed in line with the supply of the annual Patient Register, and supplied to ONS via MESH.
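
The run date rule above (the Saturday closest to 31st July) can be computed as in the sketch below. The function names are hypothetical; only the rule itself comes from the text.

```python
from datetime import date, timedelta


def closest_saturday(target: date) -> date:
    """Return the Saturday nearest to `target` (never more than 3 days away)."""
    # date.weekday(): Monday == 0, ..., Saturday == 5, Sunday == 6
    days_since_saturday = (target.weekday() - 5) % 7
    if days_since_saturday <= 3:
        return target - timedelta(days=days_since_saturday)
    return target + timedelta(days=7 - days_since_saturday)


def stock_extract_run_date(year: int) -> date:
    """Saturday closest to 31st July of the given year."""
    return closest_saturday(date(year, 7, 31))
```

A tie cannot occur, because a week has an odd number of days, so one neighbouring Saturday is always strictly closer than the other.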

Variables

NHS Number
Gender
Date of Birth
NHAIS Posting
NHAIS Posting Business Effective from date
Reason For Removal
Reason For Removal Business Effective from date
Type of Registration
Address
Postcode (usual Address)
Family Name
First given name
Other given names
Names Business Effective from date.

These are bespoke extracts from the Personal Demographic Service (PDS).

From February 2018 ONS moved data onto a new Data Access Platform (DAP). Both the PDS weekly Files and PDS Annual Stock Extracts will be moving to DAP.



Data Environment.

The protection of data received from partners is very important to ONS and significant investment, activity and assurance is undertaken to ensure this data is secured throughout the lifecycle from ingest to processing and export. ONS are developing advanced approaches that enable improved statistical analytical capabilities while maintaining and enhancing security protections that provide assurance to internal stakeholders and external partners. A developing data analysis environment, the Data Access Platform (DAP), will support this as ONS moves away from the Statistical Research Environment for analysis work.

1.1 Data security for storage and linkage of the data will be provided with an assured ONS data analysis environment (DAP) that includes the following elements of security control:
• Need To Know applied through user account access and management
• Controlled ingest and export of data into/out from the environment
• Controlled account access using unique credentials based on job role
• Logged and monitored access of user activity within the environment
• Secure build configuration for infrastructure, including cloud services
• Vulnerability tested infrastructure with appropriate remediation and patching
• Compliance checks against security enforcing controls
• Architectural review against standards and best practice
• Staff security cleared to the appropriate level based on their supervised and/or unsupervised access to sensitive data in accordance with ONS clearance policies and data access processes
• Education and awareness of environment users covering security policies and secure working practices
• Operational support processes to securely manage the environment
• Risk assessment to identify security risks and mitigation actions to reduce this risk.
1.2 Following policy specified by the ONS Chief Security Officer, users will be granted supervised or unsupervised access following a clearance application. The ONS Information Asset Owner (IAO) grants access and a list of all authorised users is available on request.
1.3 With reasonable notice, periodic written/verbal checks may be conducted by an authorised employee of NHS Digital to confirm compliance with this DSA.
1.4 ONS will keep a record of any processing of Personal Data and will provide a copy of such record to NHS Digital on request.
1.5 ONS will not transfer or permit the transfer of the Data to any territory outside the UK without the prior written consent of NHS Digital.


Request for remote access to GDPPR for linkage to HES (including APC, OP, A&E and Critical Care) and mortality data — DARS-NIC-388794-Z9P3J

Type of data: information not disclosed for TRE projects

Opt outs honoured: No - Statutory exemption to flow confidential data without consent, Identifiable, Anonymised - ICO Code Compliant (Statutory exemption to flow confidential data without consent, Does not include the flow of confidential data)

Legal basis: CV19: Regulation 3 (4) of the Health Service (Control of Patient Information) Regulations 2002, Other-COPI Regs 2020, CV19: Regulation 3 (4) of the Health Service (Control of Patient Information) Regulations 2002; Other-CV19: Regulation 3 (4) of the Health Service (Control of Patient Information) Regulations 2002, Other-CV19: Regulation 3 (4) of the Health Service (Control of Patient Information) Regulations 2002, Health and Social Care Act 2012 – s261(2)(a)

Purposes: No (Agency/Public Body)

Sensitive: Sensitive, and Non Sensitive, and Non-Sensitive

When:DSA runs 2020-07-13 — 2020-10-12 2020.10 — 2024.02.

Access method: System Access, One-Off
(System access exclusively means data was not disseminated, but was accessed under supervision on NHS Digital's systems)

Data-controller type: OFFICE FOR NATIONAL STATISTICS, OFFICE FOR NATIONAL STATISTICS (ONS)

Sublicensing allowed: No

Datasets:

  1. Civil Registration - Deaths
  2. GPES Data for Pandemic Planning and Research (COVID-19)
  3. Hospital Episode Statistics Accident and Emergency
  4. Hospital Episode Statistics Admitted Patient Care
  5. Hospital Episode Statistics Critical Care
  6. Hospital Episode Statistics Outpatients
  7. Civil Registration (Deaths) - Secondary Care Cut
  8. COVID-19 Second Generation Surveillance System
  9. Covid-19 UK Non-hospital Antigen Testing Results (pillar 2)
  10. Civil Registrations of Death - Secondary Care Cut
  11. COVID-19 General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR)
  12. Hospital Episode Statistics Accident and Emergency (HES A and E)
  13. Hospital Episode Statistics Admitted Patient Care (HES APC)
  14. Hospital Episode Statistics Critical Care (HES Critical Care)
  15. Hospital Episode Statistics Outpatients (HES OP)
  16. COVID-19 Second Generation Surveillance System (SGSS)
  17. COVID-19 UK Non-hospital Antigen Testing Results (Pillar 2)
  18. COVID-19 SGSS First Positives (Second Generation Surveillance System)
  19. Mental Health Services Data Set (MHSDS)

Objectives:

Under version 0.2 of this Data Sharing Agreement, analysts working for the Office for National Statistics (ONS) were granted remote access to linked GDPPR, HES and Mortality data within NHS Digital’s data environment.

This Agreement extends the period for which ONS analysts are permitted to access the data within NHS Digital’s environment.

Version 0.2 of this Agreement set out ONS’ intentions, prior to being granted data access, to review and analyse the data to determine if and how the data might be used to address gaps in ONS’ analyses of the risks associated with COVID-19. Since access was granted, the work undertaken by ONS analysts has led to subsets of GDPPR data being requested and approved for dissemination under a separate Agreement (DARS-NIC-400304-S1P1B).

Due to the size of the GDPPR data extract, it will take time before the dataset is fully transferred to ONS, ingested, data engineered and linked for analysis. In the meantime, ONS require uninterrupted access to linked GDPPR, HES and Mortality data to continue analyses in progress and to allow further investigations into the potential utilisation of the data to answer new questions in relation to COVID-19 which continue to be raised such as in relation to the phenomenon of ‘long-covid’.

The remainder of this section is unchanged from version 0.2 of this Data Sharing Agreement.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The Office for National Statistics (ONS) is working on urgent analysis to determine the population-level relative risk of hospitalisation or death that COVID-19 presents to different people. This is being achieved by linking information on outcomes with information on characteristics and underlying health conditions at a record level. The data being used so far are being processed and analysed on ONS’s secure data platform. These data are either owned by ONS or have been acquired through its statutory powers.

However, there are some important information gaps in the data ONS has linked and has analysed so far. Primarily, this is to do with comorbidities and primary care data. Without a complete picture of comorbidities, it is difficult to give a fuller account of the differences being found in COVID-19 related mortality and morbidity between different groups/characteristics (such as ethnicity).

The information that has already been linked and is being analysed by ONS is as follows:
• Information on the outcome of death comes from the death registration data ONS already processes and regularly publishes statistics on.
• Information on socio-demographic characteristics such as ethnicity comes from the Census 2011.
• Information on hospitalisation (serious illness) from COVID-19 as an outcome, and information on underlying conditions that result in hospital contact, come from Hospital Episodes Statistics (HES) data that ONS receives from NHS Digital under a separate Data Sharing Agreement. ONS compelled NHS Digital to share this HES data with ONS for the purposes of official statistics under Section 45C of the Statistics and Registration Services Act (2007), as amended by the Digital Economy Act (2017).


ONS is currently unable to control for all comorbidities in statistical models using only the HES data because hospital attendances will only represent the most serious cases, with minor illnesses or managed chronic conditions being handled in primary care. It is likely these missing comorbidities are mediating some of the differences that have been found between different groups/characteristics such as ethnicity.

ONS would also like to include primary care data for all or most of the population. Ideally, this would involve ONS acquiring and transferring the data onto its secure systems, where it can be linked at a record level to the other data sources described.

The government’s Scientific Advisory Group for Emergencies (SAGE) and the National Statistician are both keen that such an improvement to the project be enabled. As a result, both ONS and NHS Digital are coming under significant pressure to share relevant data quickly. More importantly, the insight gained from an improved study will inform decision making that could ultimately save lives.

However, ONS and NHS Digital have agreed that some groundwork is needed in advance of ONS potentially compelling NHS Digital to share the data by issuing a legal Notice under the same statutory powers used to acquire the HES data.

This involves ONS and NHS Digital collaborating to analyse the GPES Data for Pandemic Planning and Research (GDPPR) data on NHS Digital systems, where it can be linked to the HES and mortality data (i.e. the only data missing would be the ONS Census data).

This will allow ONS and NHS Digital analysts to explore and analyse the data in situ, with a view to:
a) confirming there is a strong enough ‘public good’ case for the data being transferred onto ONS systems, and if so
b) refining and minimising any GDPPR extract/specification which ONS then compels NHS Digital to share

This refinement is required because the ONS statutory power to compel the sharing of data is still subject to GDPR principles, including that data are minimised to that necessary to achieve the purpose. In addition, the powers are only applicable where ONS needs the information for its functions (essentially the production of official statistics for the public good).

The quickest way to ensure these conditions are met is for ONS to gain access to the data remotely and collaborate with NHS Digital analysts as described. Unlike any subsequent transfer of data to ONS under its statutory powers, this remote access by ONS analysts will be covered by the COPI regulation.

To be clear, the present application only covers this refinement stage. If/when ONS compels NHS Digital to share and transfer an extract onto ONS systems under its statutory powers, this will be supported by a separate DARS application and Data Sharing Agreement.

In preparation for access to GDPPR data, ONS has been working with NHS Digital’s analytical experts to better understand GDPPR data and whether it will be fit for the statistical purposes to which ONS wants to put it. The ONS analysts have been provided with a GDPPR user guide and a data specification and have been involved in discussions to plan the setup of the data environment in which the data would be accessed.

Once granted remote access to GDPPR data in a suitable and secure data environment within NHS Digital’s systems, ONS analysts will review the data items, coverage, quality and completeness of these data in line with ONS requirements for producing official statistics under the Code of Practice (particularly transparency, quality and improvement). The collaboration of ONS and NHS Digital analysts during this phase will support NHS Digital’s development and understanding of these new primary care data.

Having access to HES, mortality and GDPPR data containing identifying details will enable ONS analysts to link individuals across the datasets. ONS does not require identifying details for any reason other than for data linkage.

The linked dataset will not fully mirror the linked project dataset that ONS has already produced on its own systems because it will not include the Census 2011 data. However, risk modelling analysis that will inform decision making by bodies such as SAGE will still be possible by ONS and NHS Digital analysts working collaboratively.

This work will also allow further development of the collaborative ONS-NHS Digital view on the quality and utility of the data, and whether there is a ‘public good’ case for any data being transferred to ONS under its statutory powers. Logically, this decision will revolve around the importance of any analyses that are still not possible because the linked data at NHS Digital lacks the Census information, and because the linked data at ONS lacks the GDPPR information.

Note, the only way to bring all four sources together is at ONS. There is no legal gateway which allows the transfer of Census data to NHS Digital systems (whereas there is for the transfer of GDPPR data to ONS systems).

The HES and mortality data included within this linked dataset created on NHS Digital systems will mirror the HES and mortality data that the ONS analysts already have access to on ONS systems. These analysts are suitably security cleared and trained in working with such data.

It is important to note that assessing the quality of the data is a key requirement to produce official statistics so that the strengths and limitations of the different data can be understood and applied or mitigated as required. ONS has to undertake preliminary work to assess the appropriateness of datasets/data sources for use in the production of official statistics. As an example, noting that Type 1 Patient Objections will be applied to the GDPPR data, ONS will need to assess whether there is a need to make adjustments for that.

As has been made clear, much of the proposed purpose here is to develop an understanding of the data through remote access so that any full acquisition of the data by ONS onto its systems can be minimised appropriately. However, ONS are also keen to minimise the access that ONS analysts get to the GDPPR data on NHS Digital systems where possible.

Based on its current knowledge of the data, ONS considered how it could minimise the data requested as follows:
• Years of data – ideally full histories for patients would be required to ensure ONS captures the full history of comorbidities, but these could be minimised to data from March 2011 onwards, in line with the census year, to allow data on history of comorbidities to be applied to the study population appropriately. To note, this would be by date of activity, i.e. when the condition / event happened (not when the patient record was updated).
• Patient groups – the whole population is required because the data are used to predict outcomes associated with COVID-19, and as such ONS needs to ensure it has a big enough population to be able to do the analysis. The risk model aims to identify how COVID-19 patients are different to the rest of the population, so it needs details of all different population groups and needs data to be representative of the population. The linked study dataset ONS has produced includes over 50 million subjects for whom a full Census record is available and to whom it has been possible to accurately assign an NHS number (i.e. those who can then be more easily linked to other datasets that include NHS number, such as HES and mortality data).
• Cluster codes – it is currently difficult to specify any minimisation of the data based on cluster codes, as the information and guidance on the cluster codes and identification of specific conditions and diagnoses are only just becoming available. A key part of this project is to understand and assess the quality of the data in the context of producing official statistics, and as such this area of the data will be a key area requiring quality assessment (in terms of identifying all comorbidities). Minimising access to these data at this time may hamper a full assessment.
• Local area data – postcode data is not required but analysis at a local level is required for the model. Data to Lower Super Output Area (LSOA) is required, which will allow a link to socio-demographic variables as provided by the Index of Multiple Deprivation (IMD). LSOA is provided in the standard GDPPR dataset, so this requirement is already met.
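
The "years of data" minimisation above, which filters on the date of activity rather than the date the record was updated, might look like the following sketch. The field names, the sample rows and the choice of 1 March 2011 as the exact cut-off day are assumptions; the text only says "March 2011 onwards".

```python
from datetime import date

CUTOFF = date(2011, 3, 1)  # assumed exact day for "March 2011 onwards"

# Made-up rows; `activity_date` is when the condition/event happened,
# `record_updated` is when the patient record was last changed.
records = [
    {"patient": "p1", "activity_date": date(2009, 6, 1), "record_updated": date(2015, 1, 1)},
    {"patient": "p1", "activity_date": date(2012, 2, 10), "record_updated": date(2012, 2, 11)},
    {"patient": "p2", "activity_date": date(2011, 3, 1), "record_updated": date(2011, 3, 5)},
]


def minimise_by_activity_date(rows, cutoff=CUTOFF):
    """Keep rows whose activity happened on or after the cut-off date."""
    return [r for r in rows if r["activity_date"] >= cutoff]


kept = minimise_by_activity_date(records)
```

Note that the first row is dropped even though its record was updated in 2015, because the filter looks only at when the activity occurred.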

The primary care data to be accessed will be limited to information on:
• patient demographics
• diagnoses and findings
• medications and other prescribed items
• investigations, tests and results
• treatments and outcomes
• vaccinations and immunisations

ONS require patient-level data because ONS are interested in patient-level socio-demographics, clinical profiles and outcomes. Use of more aggregated (e.g. regional level) data would result in a lack of statistical precision and risks the analysis being subjected to the so-called ecological fallacy. At present the work ONS has done is only able to control for decade-old socio-demographic factors in its COVID-19 risk models. Up-to-date primary care data on clinical diagnoses, treatments and histories would allow ONS to substantially enhance the risk models, as comorbidities are likely to explain a relatively large proportion of the variability in COVID-19 mortality risk.

Whilst certain conditions are known to be risk factors for COVID-19 mortality/morbidity (e.g. patients with severe lung conditions or those on immunosuppressants), it is necessary to have access to the full range of diagnostic and treatment codes (including linkage to HES data) so patients’ comorbidity profiles can be fully explored and controlled for in the models.

Yielded Benefits:

ONS produced a briefing on its analysis of long-COVID symptoms and COVID-19 complications which was presented to the National Statistician and subsequently shared with NHS Digital's Profession Advisory Group. The findings have been published on ONS' website - see: https://www.ons.gov.uk/releases/estimatingtheprevalenceoflongcovidsymptomsandcovid19complications. The information produced by ONS has been briefed to ministers enabling them to reach appropriate decisions in their response to the pandemic.

Expected Benefits:

This analysis is of national public health importance and has been requested by the National Statistician, NHS Digital’s Chief Statistician and members of the Scientific Advisory Group for Emergencies (SAGE), so statistical accuracy and robustness are of utmost importance. The results of the analysis will be used to inform members of SAGE, Members of Parliament (MPs) and other government officials of the differing COVID-19 risk profiles experienced by UK citizens. This risk model will enable the government to refine its policy response to the pandemic using the best evidence available.

The analysis may also improve the public’s understanding of the risk faced by certain population groups, leading to more informed decision making, and add to the growing body of literature being produced and evaluated by the global academic community.

Ultimately this analysis has the potential to deliver public health benefit by reducing COVID-19 related mortality and morbidity in the UK.

A specific benefit of the processing described above will be that ONS analysts will understand how the GDPPR data can be refined and minimised should ONS subsequently use its statutory power to compel NHS Digital to transfer an extract of GDPPR to ONS. ONS’ statutory power to compel data sharing is still subject to GDPR principles, including that data are minimised to that necessary to achieve the purpose. Therefore, the initial access under this Agreement and the processing work described above are necessary to support potential further uses of the GDPPR data.

Outputs:

Analysis outputs will be shared with colleagues at the Office for National Statistics and NHS Digital for scrutiny and quality assurance. NHS Digital will support ONS in the production of any publications with a focus on statistical accuracy, quality assurance and robust peer review.

The use of the data will determine the viability of producing official statistics using the datasets. The processing outlined above may directly result in the production of official statistics or may inform a subsequent methodology which is then used to produce official statistics.

As part of this, a key output is that this work will inform what minimisation can be applied to the GDPPR data in the event that ONS subsequently compels NHS Digital to transfer an extract to ONS.

Any official statistics produced will be shared with MPs, members of SAGE and other government officials to inform the government’s response to the COVID-19 pandemic. Any official statistics produced will be published, for example on the Office for National Statistics website.

In the event that ONS determines that use of the GDPPR data is unsuitable for the purpose of producing official statistics or identifies issues of significance with the data, it is possible that ONS would publish its findings in the form of methodological reports. ONS’ work to develop new official statistics may involve testing to investigate whether statistics of sufficient quality can be produced and may also involve the production of statistics badged as ‘experimental’ while further work is done to improve quality aspects such as accuracy.

No patient-level data will be extracted from NHS systems. The expected data outputs are aggregate summary statistics, regression coefficients, and summary plots. All data outputs will be subject to any required disclosure control practices.

Processing:

NHS Digital will create a secure environment containing the data and provide restricted remote access to designated employees of the Office for National Statistics. Access to the data will be limited to only these individuals. No external data will be brought into NHS Digital’s data environment and linked with the data under this Agreement. Other than the datasets described in this Agreement no other data will be linked.

Any outputs exported from NHS Digital’s data environments will contain data only where that data is aggregated, with small numbers suppressed in line with the HES Analysis Guide.
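As an illustration only, small-number suppression of an aggregate table could be sketched as below. The threshold of 8, the `*` marker and the function name are assumptions made for this example; the actual rules are those set out in the HES Analysis Guide, which this entry does not reproduce.

```python
def suppress_small_counts(table, threshold, marker="*"):
    """Replace every non-zero count below `threshold` with `marker`.

    `threshold` and `marker` are illustrative assumptions, not values
    taken from the Agreement or the HES Analysis Guide.
    """
    return {
        key: (marker if 0 < count < threshold else count)
        for key, count in table.items()
    }


# Hypothetical aggregate counts by region.
counts = {"Region A": 120, "Region B": 3, "Region C": 0}
safe_counts = suppress_small_counts(counts, threshold=8)
```

Here only the count for the hypothetical "Region B" is masked; zero counts and counts at or above the chosen threshold pass through unchanged.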

No patient-level data will be extracted from NHS systems. The expected data outputs are aggregate summary statistics, regression coefficients, and summary plots. All data outputs will be subject to any required disclosure control practices.

All organisations party to this Agreement must comply with the Data Sharing Framework Contract requirements, including those regarding the use (and purposes of that use) by “Personnel” (as defined within the Data Sharing Framework Contract - i.e. employees, agents and contractors of the Data Recipient who may have access to that data).

NHS Digital’s Security Advisor has reviewed ONS’ access arrangements and is content.


Investigating COVID-19 - request for acquisition of GDPPR & ECDS data — DARS-NIC-400304-S1P1B

Type of data: information not disclosed for TRE projects

Opt outs honoured: No - Statutory exemption to flow confidential data without consent, Identifiable, Anonymised - ICO Code Compliant, No (Statutory exemption to flow confidential data without consent)

Legal basis: CV19: Regulation 3 (4) of the Health Service (Control of Patient Information) Regulations 2002, Health and Social Care Act 2012 – s261(7), Health and Social Care Act 2012 – s261(7); Other-Section 45a of the Statistics and Registration Service Act (2007) as amended by the Digital Economy Act (2017), Health and Social Care Act 2012 - s261(5)(d); Other-Section 45a of the Statistics and Registration Service Act (2007) as amended by the Digital Economy Act (2017), Health and Social Care Act 2012 - s261(5)(d); Other-Section 45a of the Statistics and Registration Service Act (2007) as amended by the Digital Economy Act (2017); Other-Section 45a of the Statistics and Registration Service Act (2007) as amended by the Digital Economy Act (2017)

Purposes: No, Yes (Agency/Public Body)

Sensitive: Sensitive, and Non Sensitive, and Non-Sensitive

When:DSA runs 2020-09-28 — 2021-03-31 2020.10 — 2024.02.

Access method: Ongoing, One-Off

Data-controller type: OFFICE FOR NATIONAL STATISTICS (ONS), OFFICE FOR NATIONAL STATISTICS

Sublicensing allowed: No

Datasets:

  1. Emergency Care Data Set (ECDS)
  2. GPES Data for Pandemic Planning and Research (COVID-19)
  3. Hospital Episode Statistics Critical Care
  4. Hospital Episode Statistics Accident and Emergency
  5. Hospital Episode Statistics Admitted Patient Care
  6. Hospital Episode Statistics Outpatients
  7. HES-ID to MPS-ID HES Accident and Emergency
  8. HES-ID to MPS-ID HES Admitted Patient Care
  9. HES-ID to MPS-ID HES Outpatients
  10. COVID-19 Ethnic Category Data Set
  11. Birth Notification Data
  12. Improving Access to Psychological Therapies Data Set_v1.5
  13. Personal Demographic Service
  14. COVID-19 General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR)
  15. Hospital Episode Statistics Accident and Emergency (HES A and E)
  16. Hospital Episode Statistics Admitted Patient Care (HES APC)
  17. Hospital Episode Statistics Outpatients (HES OP)
  18. Hospital Episode Statistics Critical Care (HES Critical Care)
  19. Improving Access to Psychological Therapies (IAPT) v1.5

Objectives:

The Office for National Statistics (ONS) is working on urgent analysis to determine the population-level relative risk of hospitalisation or death that COVID-19 presents to different people. This is being achieved by linking information on outcomes with information on characteristics and underlying health conditions at a record level. The data being used so far are being processed and analysed on ONS’s secure data platform. These data are either owned by ONS or have been acquired through its statutory powers.

However, there are some important information gaps in the data that ONS has linked and has analysed so far. Primarily, this is to do with comorbidities. Without a complete picture of comorbidities, it is not possible to explain all of the differences being found in COVID-19 related mortality and morbidity between different groups/characteristics (such as people of different ethnicity).

The information that has already been linked and is being analysed by ONS is as follows:
• Information on the outcome of death comes from the death registration data ONS already processes and regularly publishes statistics on.
• Information on socio-demographic characteristics such as ethnicity comes from the Census 2011.
• Information on hospitalisation (serious illness) from COVID-19 since February 2020 as an outcome, and information on underlying conditions that resulted in hospital contact since 2017/18, come from Hospital Episodes Statistics (HES) data.
ONS already receives this HES data from NHS Digital under a separate Data Sharing Agreement. ONS compelled NHS Digital to share this HES data with ONS for the purposes of official statistics under Section 45C of the Statistics and Registration Service Act (2007), as amended by the Digital Economy Act (2017), in January 2019.

ONS is currently unable to control for all comorbidities in statistical models using only the HES data. This is because hospital attendances will only represent the most serious cases, with minor illnesses or managed chronic conditions being handled solely in primary care. These missing comorbidities will be mediating some of the differences that have been found between different groups/characteristics.

To overcome this limitation, ONS is seeking to include primary care data for all or most of the population in its models. This will involve ONS acquiring and transferring the data onto its secure systems where it can be linked to the other data sources described at a record level.

In addition, a change in reporting and data collection by NHS Digital means that their A&E dataset within HES has been discontinued and the more detailed Emergency Care Dataset (ECDS) has been stood up to replace it. Therefore, ONS are also seeking access to an extract of the ECDS data, both to replace the A&E information on a like for like basis, and take advantage of some additional useful information that is new to ECDS compared with HES A&E.

The Government’s Scientific Advisory Group for Emergencies (SAGE) and the National Statistician are both clear that they want this improvement to the project to be enabled by NHS Digital. As a result, both ONS and NHS Digital are coming under significant pressure to share relevant data quickly. More importantly, the insight gained from an improved study will inform decision making that could ultimately save lives.

This is a task in the public interest. ONS is the sole data controller and will process personal data for this purpose under Articles 6(1)(e) and 9(2)(j) of the General Data Protection Regulation.

Aside from ONS, no other organisation will process the data under this Agreement.

This Agreement authorises ONS to receive extracts of General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR) data, and Emergency Care Data Set (ECDS) data. The ECDS and GDPPR data will be used solely for the purposes described in this Agreement.

The Agreement also authorises reuse of a subset of the HES data already supplied under a separate Agreement (ref: DARS-NIC-175120-W5G2X) for the purpose described in this Agreement.


More detail on purpose broken down by dataset:

The HES, ECDS and GDPPR data will be linked to data on Deaths and demographics (2011 Census) to produce statistics on the risk factors, including comorbidities, associated with COVID-19.

Linkage at a record level is a prerequisite to success for the proposed use, and therefore identifiers including postcode, date of birth, sex and NHS number are required. ONS does not require identifying details for any other reason than data linkage.

ONS require patient-level data because ONS are interested in patient-level socio-demographics, clinical profiles and outcomes. Use of more aggregated (e.g. regional level) data would result in a lack of statistical precision and risks the analysis being subjected to the so-called ecological fallacy.

At present the work ONS has done is only able to control for decade-old socio-demographic factors in its COVID-19 risk models. Up-to-date primary care data on clinical diagnoses, treatments and histories would allow ONS to substantially enhance the risk models, as comorbidities are likely to explain a relatively large proportion of the variability in COVID-19 mortality risk.

Whilst certain conditions are known to be risk factors for COVID-19 mortality/morbidity (e.g. patients with severe lung conditions or those on immunosuppressants), it is necessary to have access to the full range of diagnostic and treatment codes (ie both HES and GDPPR data) so patients’ comorbidity profiles can be fully explored and controlled for in the models.

Dataset 1: Hospital Episode Statistics (HES)

Hospital episodes data serves two purposes in the project:

Firstly, once linked to Census and mortality data at a record level, HES data (since 2017/18) provides insight into some of the pre-existing conditions (i.e. those that require hospital contact) for those whose death was caused by or involved COVID-19. ONS analysts can then produce statistics on differences by comorbidity, and control for comorbidity when modelling differences between socioeconomic groups.

Secondly, more recent HES data (since March 2020) allows ONS to identify incidences where people are hospitalised because of COVID-19 but then recover. ONS analysts can then look at ‘serious illness from COVID-19’ as an outcome within the population in addition to simply modelling the binary outcome of death (died / did not die).

At the start of the pandemic, ONS already held HES data covering from April 2010 to March 2019, and was to receive annual updates on an ongoing basis. This is covered under a separate Agreement, with the data being used for other statistical purposes. That Agreement has since been updated such that ONS now has HES data from April 2019 to the most recent HES data available, and will continue to receive further HES data on a monthly basis. The purpose section of that Agreement was also updated to reflect the uses described here.

Therefore, from a HES perspective, the only change permitted under this Agreement is that the HES data ONS holds will be linked to the ECDS and GDPPR data should ONS be granted access to the extracts being sought as per below.

Dataset 2: Emergency Care Dataset (ECDS)

These data are being requested for the same purpose and reasons as described for the HES data above.

The HES data ONS already receives includes the Accident and Emergency (A&E) portion of HES. However, HES A&E data has now been discontinued by NHS Digital and these data have been replaced by the more comprehensive ECDS which started its data collection in 2017. ONS holds HES A&E data covering April 2010 through to March 2020.

ONS requests access to an ECDS extract covering April 2020 onwards, to be updated monthly, that includes:
• Variables equivalent to those previously received from the HES A&E data
• Additional variables that were not available in HES A&E that ONS needs for the purposes of this project, specifically to support understanding of comorbidities and outcomes.

The specification being requested was developed in collaboration with NHS Digital data experts to ensure the data being shared are of sufficient quality (e.g. coverage, accuracy, relevance) to be likely to support the statistical purpose intended.

Dataset 3: General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR)

These data are new and clearly very sensitive. Therefore, the specification of the variables being requested has been developed in collaboration with NHS Digital data experts and through a separate data access request which allowed three ONS researchers remote access to the data on NHS Digital systems.

The ONS researchers used this access, and worked with NHS Digital analysts, to learn about the GP data and to run some analyses by combining the data with mortality and HES data which are also present on NHS Digital systems. This has ensured the data being shared under this Agreement are of sufficient quality (e.g. coverage, accuracy, relevance) to be likely to support the statistical purpose intended in line with ONS requirements for producing official statistics under the Code of Practice (particularly transparency, quality and improvement).

This work has helped ONS come to a decision about whether, and which parts of, the GP data need to be transferred to ONS systems to enable ONS’ COVID-19 risk modelling project (see proposed use below). ONS is therefore confident that the data specification being requested has been minimised to that absolutely necessary, in line with GDPR principles. More information on how the remote access to GDPPR was used to help confirm the need for GP data and minimise the request is detailed below.

The GDPPR data will be linked to data on Deaths, demographics (2011 Census) and hospital data (see dataset 2 above) to establish and assess comorbidities and risk factors associated with COVID-19.

The data are needed to understand the full range of comorbidities, patient history and risk factors which could influence COVID-19 outcomes. For example, diabetes and asthma sufferers may well be managing their condition in consultation with their GP, and have had no recent hospital contact (and therefore do not appear in HES). These missing comorbidities will mediate some of the variations in mortality that the project has found so far.

For example, ONS has found significant differences in mortality risk between different ethnic groups, and it is not yet fully understood what is causing this. Differences in the prevalence of comorbidities between ethnic groups are likely to be one reason, and ONS can only assess their contribution with complete comorbidity data.

This work is of critical priority across Government as part of the UK’s response to the COVID-19 pandemic. It will contribute to the wider understanding of the virus, helping to inform a range of policy decisions taken by central government, health services and others. Optimising such decision making could ultimately save lives.


Data Minimisation

Prior to ONS requesting extracts of the ECDS and GDPPR datasets, ONS and NHS Digital agreed that some groundwork was needed.

This involved ONS and NHS Digital collaborating to analyse the GPES Data for Pandemic Planning and Research (GDPPR) data on NHS Digital systems, where it was linked to the HES and mortality data (i.e. the only project data missing was the new ECDS and ONS Census data).

This allowed ONS and NHS Digital analysts to explore and analyse the data in situ, with a view to:
a) confirming there was a strong enough ‘public good’ case for the data being transferred onto ONS systems, and
b) refining and minimising any GDPPR extract/specification which ONS would subsequently want NHS Digital to share via this application

ONS have now confirmed that the data will be needed for the project. In reaching this conclusion, ONS:
• Documented how the GP data will improve the risk factor modelling for COVID-19 outcomes
• Considered which analyses can only be done through linkage to Census data (and therefore require data to be transferred to ONS), and how crucial these are
• Considered whether the Census data could be transferred to NHS Digital as a way to get all the data in the same data platform without the GP data needing to leave NHS Digital
• Investigated the general quality and coverage of the GDPPR data
• Carried out analysis to confirm if and how the GP data will improve understanding of comorbidities compared with HES data alone
• Carried out analysis to confirm if and how the data could be minimised, for example, in respect of years of data, patient groups, cluster codes and identifiable information

i) How the GP data will improve the risk factor modelling for COVID-19 outcomes
• Without the GP data, ONS analysts are constrained to using hospital based comorbidity and pre-existing conditions in their modelling of risk of COVID-19 deaths, which means they were failing to assign conditions such as diabetes and hypertension to those with such a condition who have either not had a hospital admission in the past three years or where such a condition may not be recorded on the hospital record for those with hospital contact.
• GP data and longer-term patient histories of comorbidities and conditions provide a more valid association between a given disease and risk of COVID-19 mortality than reliance on only hospital-based comorbidity. See section (v) below.
• GP data will also encompass mental health conditions which are obscured when relying on hospital only data. This will provide the opportunity to investigate links between mental illness and COVID-19 mortality.
• GP data will give ONS a more accurate measure of duration of pre-existing conditions and potentially in combination with hospital data, a measure of severity of diseases as a means by which vulnerable groups can be identified.
• GP data also benefits from risk factors for disease identified on the patient primary care record such as smoking status and obesity which will complement understanding of causal pathways and vulnerability once infected.
• As well as improving the way risk factors for COVID-19 outcomes are captured, the GP data may also be used to define the COVID-19 outcomes themselves. In the future there is likely to be policy interest in the impact of COVID-19 on the population’s physical and mental health. For example, is a COVID-19 diagnosis associated with elevated risk of subsequent respiratory illness, and are individuals in certain socio-economic groups more likely to experience post-lockdown deterioration in mental health.

ii) Which analyses can only be done through linkage to Census data, and how crucial these are
• The Census provides a population of interest at a point in time to follow-up people from – a list of everyone who was in the country on a particular day that can then be followed up through time. ONS cannot get a complete, point-in-time index population from administrative sources such as GDPPR; certain parts of the population are less likely to have frequent contact with a GP, and these individuals may also be at systematically greater/lesser risk of COVID-19 mortality, so their omission from a GP-based study population would bias estimates.
• Linkage to Census allows investigation of the risk of a hospital episode for COVID-19 treatment given a/set of pre-existing condition(s) and how this varies across Census characteristics such as ethnicity, socio-economic position and household type. Given a specific condition or combination of conditions and population density locale, are there differences in likelihood of hospital admission for COVID-19 by ethnic group? If population density is a good indicator of infection risk, and pre-existing conditions a good indicator of prognosis, then this will inform whether ethnic group differences persist given someone’s location and clinical history.
• How much further do population level measures of pre-existing conditions explain existing ethnic contrasts in COVID-19 mortality? Is there an interaction between comorbidity and ethnicity in risk of COVID-19 mortality?
• How much does Census based ethnic background assignment match results of previous studies using health record-based measures of ethnic background that adjusted for comorbidity? Ethnicity is well populated on the 2011 Census (the non-response rate was just 3%), but ONS has found the rate of missingness in the GP data to be much greater, reducing the utility of the ethnicity information available in the GDPPR dataset and increasing the need to link to Census.
• Establish whether ethnic background has an independent effect on risk of COVID-19 death when set against measures of comorbidity, hospital contact and socio-demographic characteristics available from the Census. Compared to a baseline model including Census data on only ethnicity, age, sex, region and IMD decile (the socio-demographic characteristics available in the GDPPR dataset), the excess risk of COVID-19 mortality amongst ethnic minority groups is attenuated by up to 30% when additional socio-economic, household and occupation Census variables are included in the model. This result indicates the importance of combining clinical variables from HES and GDPPR with the full range of socio-demographic characteristics collected by the Census.
• ONS is also interested in exploring the phenomenon of delayed access to hospital care given the presence of pre-existing conditions and what this means for future risk of COVID-19 and all-cause mortality.

iii) ONS and NHS Digital agreed that transfer of Census data to NHS Digital was not appropriate.

Firstly, ONS has never shared record level Census data for analysis, is committed to keeping all personal information collected in the Census safe and confidential, and follows a strict security regime to protect subjects’ data. The public are assured of this in Census publicity materials, and to transfer Census data to NHSD would go against these commitments. This could be detrimental to response rates and the success of Census. In turn, that would be detrimental to decision making based on the Census statistics, with the associated risk of significant human and financial costs.

Secondly, given the extent of the work already carried out within ONS systems to enable linkage of health data to census data there would be significant delay if Census data were transferred to NHS Digital systems and this work had to be repeated. This would be detrimental to pandemic response and decision-making for officials who have been calling for further analysis from ONS. Details of the linkage methods have been published in a technical document:
https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/methodologies/coronavirusrelateddeathsbyethnicgroupenglandandwalesmethodology

Thirdly, ONS already securely processes and analyses identifiable Hospital Episode Statistics data, and has been doing so since March 2019.

iv) Investigated the general quality and coverage of the GDPPR data

Through the remote access previously granted, ONS analysts have reviewed the data items, coverage, quality and completeness of GDPPR data in line with ONS requirements for producing official statistics under the Code of Practice (particularly transparency, quality and improvement). ONS have carried out basic quality assurance on the GDPPR data and are content that the data are of sufficient quality to support production of official statistics.

Analyses included a check on missingness for key variables, such as date of birth and NHS number, checks on population coverage and distributions for key variables, and a review of possible duplicate records. For example, based on a large sample of records there was only one instance of a single NHS Number being associated with multiple dates of birth, and on clerical examination these records appeared to belong to the same individual.
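The checks described above could look something like the following minimal sketch. It is not ONS' actual quality assurance code: the field names `nhs_number` and `date_of_birth`, the helper names and the sample records are all hypothetical.

```python
from collections import defaultdict


def missing_rate(records, field):
    """Share of records in which `field` is absent or empty."""
    if not records:
        return 0.0
    missing = sum(1 for record in records if not record.get(field))
    return missing / len(records)


def conflicting_dobs(records):
    """Map each NHS number seen with more than one distinct date of birth
    to the sorted list of those dates, for clerical review."""
    dobs = defaultdict(set)
    for record in records:
        nhs, dob = record.get("nhs_number"), record.get("date_of_birth")
        if nhs and dob:
            dobs[nhs].add(dob)
    return {nhs: sorted(values) for nhs, values in dobs.items() if len(values) > 1}


# Hypothetical sample records (identifiers are made up).
sample = [
    {"nhs_number": "A1", "date_of_birth": "1950-01-01"},
    {"nhs_number": "A1", "date_of_birth": "1950-01-10"},
    {"nhs_number": "B2", "date_of_birth": "1980-05-05"},
    {"nhs_number": "", "date_of_birth": "1990-07-07"},
]

nhs_missing = missing_rate(sample, "nhs_number")
flagged = conflicting_dobs(sample)
```

In this made-up sample one record in four lacks an NHS number, and the identifier "A1" is flagged because it appears with two dates of birth.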

It is important to note that assessing the quality of the data is a key requirement to produce official statistics so that the strengths and limitations of the different data items can be understood and applied or mitigated as required. ONS has to undertake preliminary work to assess the appropriateness of datasets/data sources for use in the production of official statistics.

v) Carried out analysis to confirm if and how the GP data will improve our understanding of comorbidities compared with HES data alone

ONS analysts aimed to carry out analysis to assess how underlying conditions or comorbidities are associated with outcomes. For example, investigating the association of a history of cardiovascular (CV) conditions on COVID-19 death controlling for age and sex. Based on a sample of patients with recent contact with the primary care system (to manage processing time) ONS compared a 5, 10, 15, 20 and 25-year patient history of CV conditions (based on diagnoses and medication codes) and found that longer (20-year) patient history data added more value to the analysis and strength to any associations than shorter (<20-year) patient histories.

ONS’s current access to 3 years of HES data which includes diagnosis codes cannot provide the same vital patient history data which will be used to help predict outcomes.

vi) Carried out analysis to confirm if and how we can minimise the data, for example, in respect of years of data, patient groups, cluster codes and identifiable information.

ONS considered how it could minimise the data requested as follows:

Years of data - Ideally full histories for patients would be required to ensure we capture the full history of comorbidities, but these could be minimised to a 20-year time period prior to the COVID-19 period (i.e. back to 1 January 2000) to improve the predictive value of these data on outcomes (see above) and to allow for a good history of pre-existing conditions for our study population (i.e. people alive on Census day 2011). To note this would be for a 20-year history of date of activity, based on the DATE field in the GDPPR dataset i.e. when the condition / event happened (not when the patient record was updated as per the RECORD_DATE field).
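A minimal sketch of this time-window rule is shown below, assuming each GDPPR event is represented as a dictionary carrying the `DATE` and `RECORD_DATE` fields named above. The function name and the sample events are hypothetical, and how records with a missing `DATE` are handled is not specified in this Agreement, so the sketch does not address it.

```python
from datetime import date

# Start of the 20-year history window stated in the text.
WINDOW_START = date(2000, 1, 1)


def within_history_window(event, start=WINDOW_START):
    """Keep an event if its date of activity (DATE) falls on or after `start`.

    RECORD_DATE (when the record was updated) is deliberately ignored.
    """
    return event["DATE"] >= start


# Hypothetical events.
events = [
    {"DATE": date(1998, 6, 1), "RECORD_DATE": date(2005, 3, 1)},
    {"DATE": date(2004, 2, 9), "RECORD_DATE": date(2004, 2, 10)},
    {"DATE": date(2000, 1, 1), "RECORD_DATE": date(2019, 11, 30)},
]

kept = [event for event in events if within_history_window(event)]
```

The first event is dropped even though its record was updated in 2005, because the activity itself predates the window.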

Patient groups - The whole population is required because the data are used to predict outcomes associated with COVID-19, which can be serious but remain relatively rare in the general population, and as such ONS needs to ensure it has a big enough population to be able to perform the analysis in a statistically robust and reliable manner. The risk model aims to identify how those experiencing adverse outcomes associated with COVID-19 are characteristically different to the rest of the population, so it requires data on risk factors across different population groups that are representative of the general population, rather than on any specific patient group. The linked study dataset ONS has produced includes approximately 50 million subjects for whom a full Census record is available and to whom it has been possible to accurately assign an NHS number (i.e. those who can then be more easily linked to other datasets that include NHS number such as HES and mortality data).

Cluster codes – ONS analysts have made an assessment of the high-level cluster code groups and identified 3 that are NOT required for our analysis:
• SNOMED codes clusters - Declines, contraindications and other exceptions
• SNOMED codes clusters - Review and monitoring
• SNOMED codes clusters - Vaccinations and immunisations

It is currently difficult to specify any further minimisation within the cluster groups requested until a full analysis of conditions and comorbidities, and any association with COVID-19 outcomes, has been made. A key part of this project is to understand and assess the quality of the data in the context of producing official statistics, and any further minimisation may hamper such an assessment. Furthermore, the pressing policy questions of today (primarily around risk factors for adverse COVID-19 outcomes) are likely to be different to those of the near future (such as how the prevalence of COVID-19 sequelae might vary between different population groups), and further minimisation at this stage may prevent ONS from being able to answer such questions in a timely and effective manner.

Personal identifiable information – ONS have considered how to minimise the requirement for identifiable data through a review of the data linkage and methodology required (i.e. linkage to the data currently held by ONS - HES, deaths and Census data). ONS receive NHS number, date of birth and postcode for the current HES data supply and request the same identifiers for ECDS and GDPPR (with the addition of date of death for GDPPR). Identifiable data will only be used for the specific purpose of data linkage and quality assurance of that linkage.

Further identifiable data such as full name and address would potentially allow ONS to refine the current linkage approach and develop a bespoke method, for example, to link people from the study population (Census base) who could not be identified in the patient register and given an NHS number. This could also allow for replenishment of the study population with post-2011 arrivals. However, at this time and in consideration of the specific purpose, urgency and needs of this project, and the sensitivity of these data, ONS feel it is sufficient not to include these more detailed identifiers in this request.


ONS analysts have worked on the NHSD remote access platform to understand the GDPPR data in more detail, and to refine and minimise any GDPPR extract/specification. The extent of this work has been balanced against the need for timely access to the GDPPR data in response to the urgency of this work to inform the public good. ONS have carried out a full and thorough minimisation exercise.


The data shared with ONS under this Agreement will not be onwardly disseminated or shared, except as disclosure controlled aggregate statistics and/or analysis as aggregated data with small numbers suppressed, in line with the Hospital Episode Statistics Analysis Guide.
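
A minimal sketch of what "aggregated data with small numbers suppressed" can look like in practice follows. The threshold value, the suppression marker and the choice to leave zero counts unsuppressed are all assumptions for illustration; the governing rules are those in the Hospital Episode Statistics Analysis Guide, which this sketch does not claim to reproduce.

```python
# Assumed threshold for illustration only; the governing value comes from
# the Hospital Episode Statistics Analysis Guide, not from this sketch.
SUPPRESSION_THRESHOLD = 8

def suppress_small_numbers(counts, threshold=SUPPRESSION_THRESHOLD, marker="*"):
    """Replace non-zero counts below the threshold with a marker.

    Zero counts are left as-is here, which is itself an assumption.
    """
    return {
        group: (marker if 0 < n < threshold else n)
        for group, n in counts.items()
    }

# Hypothetical aggregate table for illustration.
table = {"Group A": 1250, "Group B": 3, "Group C": 0, "Group D": 8}
published = suppress_small_numbers(table)
```

Only the aggregate counts enter the function; no record-level data is needed at the point of suppression.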

Yielded Benefits:

The data processed under this Agreement has already allowed work to extend initial analysis carried out by ONS on their linked Census and Mortality data looking at coronavirus related deaths by ethnic group, by linking in HES data specifically to investigate the explanatory power of hospital-based comorbidity on ethnic differences. This has been published on the ONS website and has been used to inform government policy and health campaign decisions:

https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/updatingethniccontrastsindeathsinvolvingthecoronaviruscovid19englandandwales/deathsoccurring2marchto28july2020

This analysis has been further extended to take advantage of the acquisition of the GDPPR data in controlling for health status and studying ethnic variations in the mortality risk in the first and second wave of the pandemic. This has been published as a pre-print article and can be used to inform government policy and health campaign decisions:

https://www.medrxiv.org/content/10.1101/2021.02.03.21251004v1.full.pdf

Though revisions may yet be required, initial validation of the QCOVID algorithm has been completed with results shared as agreed with the Scientific Advisory Group for Emergencies (SAGE) and with PHE and the University of Oxford.

Expected Benefits:

This analysis is of national public health importance and has been requested by the National Statistician, NHS Digital’s Chief Statistician and members of the Scientific Advisory Group for Emergencies (SAGE). The results of the analysis will be used to inform members of SAGE, Members of Parliament (MPs) and other government officials of the differing COVID-19 risk profiles experienced by UK citizens. These statistics will enable the government to refine its policy response to the pandemic using the best evidence available.

The analysis may also improve the public’s understanding of the risk faced by certain population groups, leading to more informed decision making, and add to the growing body of literature being produced and evaluated by the global academic community.

Ultimately this analysis has the potential to deliver public health benefit by reducing COVID-19 related mortality and morbidity in the UK, and potentially saving lives.

Outputs:

Statistics produced by the project will be published in the form of reports and aggregate data on the ONS website.

These will include analysis of COVID-19 outcomes by socio-demographics, comorbidities and risk factors associated with COVID-19.

This will extend the work already published by the analysis team such as:
https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/coronaviruscovid19relateddeathsbyethnicgroupenglandandwales/2march2020to15may2020

Any official statistics produced will be shared with MPs, members of SAGE and other government officials to inform the government’s response to the COVID-19 pandemic. Briefing specific to those users will be produced to accompany the published reports and statistics themselves.

Processing:

Dataset 1: HES data

ONS receives HES data under a separate Data Sharing Agreement (DARS-NIC-175120-W5G2X) which will be reused for this purpose.

Datasets 2 and 3: Emergency Care Dataset (ECDS) and General Practice Extraction Service (GPES) Data for pandemic planning and research (GDPPR)

As with the HES data which is transferred from NHS Digital to ONS on a monthly basis, these data will be transferred by Secure Electronic File Transfer.

Once received by ONS, data security for storage and linkage of the data will be provided within an assured ONS data analysis environment that includes the following elements of security control:
• Need To Access applied through user account access and management. Access to the data is restricted to individuals granted access on the basis of a justified need to access the data
• Controlled ingest and export of data into/out from the DAP environment
• Controlled account access using unique credentials based on job role
• Logged and monitored access of user activity within the DAP environment
• Secure build configuration for infrastructure
• Vulnerability tested infrastructure with appropriate remediation and patching
• Compliance checks against security enforcing controls
• Architectural review against standards and best practice
• Staff security cleared to the appropriate level based on their supervised and/or unsupervised access to sensitive data in accordance with ONS clearance policies and data access processes
• Education and awareness of environment users covering security policies and secure working practices
• Operational support processes to securely manage the environment
• Risk assessment to identify security risks and mitigation actions to reduce this risk.

Following policy specified by the ONS Chief Security Officer, ONS user access to the data environment is only after approval of an application by the Information Asset Owner including ethical assessment of proposed data use. A list of approved users is available on request.

With reasonable notice, periodic written/verbal checks may be conducted by an authorised employee of NHS Digital to confirm compliance with this Agreement.

ONS will keep a record of any processing of Personal Data and will provide a copy of such record to NHS Digital on request. ONS will not transfer or permit the transfer of the Data to any territory outside the UK without the prior written consent of NHS Digital.

As described in section 5a, the proposed purposes require linkage of records at the individual level. This is why personal identifiers such as date of birth, postcode and NHS number are required. However, ONS is only interested in producing aggregate statistics and using these to uncover trends and other useful insights based on the non-identifiable ‘attribute’ information.

ONS will keep the number of staff permitted to process identifiers to an absolute minimum, and these staff will have a higher level of clearance. All other staff will only be permitted to access non-identifying data.

Inadvertent re-identification is still a risk but ONS will never seek to intentionally re-identify this data. ONS staff are suitably trained; for example ONS’s health analysts in particular are experienced working with sensitive data about deaths (such as individual level data about suicides). Further, only statistical disclosure controlled aggregate outputs will be exportable from the secure data analysis environment. In other words, other than the initial transfer of the data from NHS Digital to ONS, the identifiable data will never be in transit and will always be protected by procedural and technical controls:

Access to data held within the Data Access Platform (DAP), which includes HES, ECDS and GDPPR data, is granted to users on a need-to-know basis depending on their role, through a request process which provides a business justification. Access is authorised on a case-by-case basis by the ONS Information Asset Owner (IAO) responsible for the data, with advice from Security and Information Management. Staff requesting access to sensitive data such as these must be cleared to the appropriate National Vetting level, which is higher than the standard basic clearance required for all ONS staff. Only authorised ONS staff with appropriate security clearance will have access to identifiable HES, ECDS and GDPPR data, with regular audit and monitoring in place to ensure compliance.

The necessity for the processing of the data for the purposes described in section 5a is largely to do with ensuring the quality, and therefore the value, of the statistics that can be produced using complete, record-level data rather than something less (for example, a subset, random sample, or aggregate data). There is more on statistical quality on the ONS website including the following:

‘The quality of a statistical product can be defined as the “fitness for purpose” of that product. More specifically, it is the fitness for purpose with regards to the European Statistical System dimensions of quality:

• relevance – is the degree to which a statistical product meets user needs in terms of content and coverage?
• accuracy and reliability – how close is the estimated value in the output is to the true result?
• timeliness and punctuality – describes the time between the date of publication and the date to which the data refers, and the time between the actual publication and the planned publication of a statistic
• accessibility and clarity – is the ease with which users can access data, and the quality and sufficiency of metadata, illustrations and accompanying advice?
• coherence and comparability – is the degree to which data derived from different sources or methods, but that refers to the same topic, is similar, and the degree to which data can be compared over time and domain, for example, geographic level?

There are additional characteristics that should be considered when thinking about quality. These include output quality trade-offs, user needs and perceptions, performance cost and respondent burden, and confidentiality, transparency and security.’

In the case of this project, it is crucial that the chances of drawing incorrect conclusions are kept to an absolute minimum, and that accurate statistics can be produced quickly. This could save lives. Receiving less data than is being requested, or using aggregate data, will lead to less accurate statistics with greater uncertainty. It will also take longer to produce the statistics, because methods will be needed to account for the missing data and/or the greater limitations of the data, than if ONS had access to everything that has been requested.


Office for National Statistics requirements for NHS-Digital data, for the purposes of Statistics and Statistical Research, under section 45 of the Statistics and Registration Services Act 2007 as amended by the Digital Economy Act 2017 — DARS-NIC-175120-W5G2X

Type of data: information not disclosed for TRE projects

Opt outs honoured: No - legal basis permits flow of identifiable data, No - Statutory exemption to flow confidential data without consent, Identifiable, Anonymised - ICO Code Compliant, No (Statutory exemption to flow confidential data without consent, Mixture of confidential data flow(s) with consent and flow(s) with support under section 251 NHS Act 2006)

Legal basis: Other - Data dissemination is mandated under section 45c of the Statistics and Registration Service Act (2007) as amended by the Digital Economy Act 2017, Other-Data dissemination is mandated under section 45c of the Statistics and Registration Service Act (2007) as amended by the Digital Economy Act 2017, Other-(Data dissemination is mandated under section 45c of the Statistics and Registration Service Act (2007) as amended by the Digital Economy Act 2017),

Purposes: No (Agency/Public Body)

Sensitive: Sensitive, and Non-Sensitive

When:DSA runs 2019-03-08 — 2022-03-07 2019.03 — 2024.02.

Access method: One-Off, Ongoing

Data-controller type: OFFICE FOR NATIONAL STATISTICS (ONS)

Sublicensing allowed: No

Datasets:

  1. Improving Access to Psychological Therapies Data Set
  2. Hospital Episode Statistics Admitted Patient Care
  3. Hospital Episode Statistics Outpatients
  4. Hospital Episode Statistics Accident and Emergency
  5. Birth Notification Data
  6. Emergency Care Data Set (ECDS)
  7. HES-ID to MPS-ID HES Accident and Emergency
  8. HES-ID to MPS-ID HES Admitted Patient Care
  9. HES-ID to MPS-ID HES Outpatients
  10. Improving Access to Psychological Therapies Data Set_v1.5
  11. Hospital Episode Statistics Accident and Emergency (HES A and E)
  12. Hospital Episode Statistics Admitted Patient Care (HES APC)
  13. Hospital Episode Statistics Outpatients (HES OP)
  14. Improving Access to Psychological Therapies (IAPT) v1.5
  15. Improving Access to Psychological Therapies (IAPT) v2

Objectives:

The Office for National Statistics (ONS), as the executive arm of the UK Statistics Authority (UKSA) requires access to administrative data held by NHS Digital, for the production of official statistics.

In the past it has been difficult for ONS to access administrative data controlled by other Government departments, information that could potentially transform official statistics and the impact they have on decision making for the better. Often, this has been caused by the lack of a clear legal basis under which the data can be shared with ONS. As a result, in 2016, ONS set out why legislation was needed for better access to data:

https://www.statisticsauthority.gov.uk/publication/delivering-better-statistics-for-better-decisions-data-access-legislation-march-2016/

As a result, the Digital Economy Act in April 2017 amended the Statistics and Registration Services Act (2007) (SRSA) such that ONS can require public authorities to share data with it. See the Digital Economy Act (chapter 7 of part 5):

http://www.legislation.gov.uk/ukpga/2017/30/part/5/chapter/7/enacted

More specifically, section 45c of the SRSA 2007 (as inserted by section 80 of the Digital Economy Act 2017) permits the Statistics Board (of which ONS is part) to serve a Notice on a public authority requiring it to disclose information it holds in connection with its functions:

http://www.legislation.gov.uk/ukpga/2007/18/section/45C

To do so, the information so disclosed must be required by the Statistics Board for one or more of its functions as set out in the SRSA 2007 and the Census Act 1920.

The SRSA (2007) states that the ONS’s objectives include ‘promoting and safeguarding the production and publication of official statistics that serve the public good, where serving public good includes informing the public about social and economic matters, and assisting in the development and evaluation of public policy’. It also sets out the Board’s functions, which are specifically referred to in section 45c of the amended SRSA. Notably they include, under section 20, that ONS ‘may produce and publish statistics relating to any matter relating to the United Kingdom or any part of it’.

Requirements made under section 45 must also be in line with a statistical statement of principles that has been approved by parliament:

https://www.gov.uk/government/publications/digital-economy-act-2017-part-5-codes-of-practice/statistics-statement-of-principles-and-code-of-practice-on-changes-to-data-systems

This states that ‘We will only seek access to data for the purposes of fulfilling one or more of our statutory functions, including to produce official statistics and undertake statistical research that meets identifiable user needs for the public good.’

The statement also sets out six principles to which ONS will adhere when requiring information under section 45; they state that ONS will:
• safeguard confidentiality
• be transparent about what data it is accessing and why
• ensure accessing the data is lawful and meets strict ethical standards
• ensure that accessing the data is in the public interest - for example that the data are fit for purpose for the statistical use which ONS intends
• ensure requiring that the data be supplied is proportionate – for example, ONS will have exhausted possible alternatives
• seek to collaborate with suppliers at all times

In addition, the following is a useful framework for categorizing ONS’s statistical uses for information such as that covered under this agreement. They are all ultimately related to ONS’s functions of producing Official Statistics mentioned earlier:

• Improvements to existing Official Statistics
• Development of new Official Statistics – this may involve testing to investigate whether statistics of sufficient quality can be produced, and may also involve the production of statistics badged as ‘experimental’ while further work is done to improve quality aspects such as accuracy
• Quality assurance of Official Statistics
• Development of commentary around Official Statistics
• Replacement of current survey questions – developing statistics from available data to directly replace the need to collect the information through survey questions
• Improving efficiency or accuracy of sampling – for example, ensuring that a representative sample of the target population is taken when conducting a survey of the public, such that the statistics produced from the survey are the best possible reflection of reality
• Research and development of methodology – for example, using data to develop and test linkage methodology that is ultimately used to help produce statistics based on other data rather than the original data source

Using robust information governance processes, ONS has determined that the conditions associated with requiring data under section 45c of the amended SRSA have been met for the information in this data sharing agreement. This process involved working closely with NHS Digital’s experts to help determine that the data would likely be of good enough quality to meet the proposed statistical purposes. This work guided ONS’s assessment against some of the principles underpinning its legal powers – for example whether sharing the data is in the public interest, and proportionate in terms of burden on the supplier. In addition, as part of its commitment to transparency, ONS will publish full details of the reasons for acquiring the information, and ONS notes that NHS Digital will also publish this data sharing agreement.

In terms of public interest, it is worth noting that the benefits gained from the statistics enabled by this data share do not need to be specific to health and social care when data are flowing under section 45 of the SRSA. For example, some of the data being required will help improve ONS’s population and economic statistics, and in these cases, the improved statistics may not benefit health and social care directly.

The data shared with ONS under this agreement will not be onwardly disseminated or shared, except as disclosure controlled aggregate statistics and/or analysis as aggregated data with small numbers suppressed, in line with the Hospital Episode Statistics Analysis Guide. Any exceptions to this would require additional NHS Digital approval. It would also require an appropriate alternative legal gateway, because section 45c of the SRSA as amended by the Digital Economy Act only enables data to be shared with ONS (not for example, other Government departments or academic researchers).

The rest of this section will set out the specific purposes for which ONS requires each dataset. Each purpose will be linked to the framework of statistical uses set out above.

In future, ONS may decide to put a dataset to new uses not explained below. In these cases, the new use will be in line with ONS’s legally defined functions. ONS will inform NHS Digital and enter into an amended Data Sharing Agreement before proceeding with that new purpose.

Dataset 1: Birth Notifications data

NHS Digital has disseminated birth notifications data to ONS since 2005. Support under section 251 of the NHS Act 2006 (reference PIAG 4-05(d)/2005) permitted this sharing but the legal gateway under which the data will continue to flow will change to section 45c of the amended SRSA 2007.

The birth notifications data contribute to ONS statistical analyses of births, maternities, and infant mortality outcomes. Analyses are made publicly available as aggregate National Statistics. These statistics help a range of public and other bodies make better decisions (see section 5d). They also feed into the Department of Health's NHS Outcomes Framework for monitoring low birthweight of term babies.

Birth registration data that ONS receives from the General Register Office (GRO) is the primary source for producing these statistics, and ONS become controllers of that data under Section 42 of the 2007 Statistics and Registration Services Act. However, there are some limitations with the GRO data, including a lack of medical information such as length of gestation, as well as some missing and implausible values in the fields that are available.

To mitigate these limitations, the NHS Digital birth notifications data are used to improve and validate the registration data. Before this can be done, the two datasets must be linked at an individual level. Several identifying variables such as NHS number are received to enable this linkage.

In terms of the statistical uses framework set out earlier, the data are used for:
• Improving official statistics – additional information not on the birth registrations data can be added at the record level once the two sources have been linked
• Quality assurance of official statistics – where information is on both sources, the birth notifications data can be used to validate the values contained in the birth registration data, and potentially edit (overwrite) the birth registrations data where that value is missing or implausible
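
The two uses listed above can be sketched as follows, purely as an illustration. The field names (`nhs_number`, `gestation_weeks`, `birthweight_g`), the plausibility range and the exact-match join on NHS number are all hypothetical choices for this sketch; the source does not describe ONS's actual linkage or editing rules.

```python
def is_plausible_birthweight(grams):
    # Hypothetical plausibility range, for illustration only.
    return grams is not None and 200 <= grams <= 7000

def enrich_registrations(registrations, notifications):
    """Link registrations to notifications and improve/validate the values."""
    by_nhs = {n["nhs_number"]: n for n in notifications}
    enriched = []
    for reg in registrations:
        out = dict(reg)  # copy so the input records are not modified
        note = by_nhs.get(reg["nhs_number"])
        if note is not None:
            # Improvement: add information the registration does not carry.
            out["gestation_weeks"] = note["gestation_weeks"]
            # Quality assurance: overwrite only if missing or implausible.
            if not is_plausible_birthweight(out.get("birthweight_g")):
                out["birthweight_g"] = note["birthweight_g"]
        enriched.append(out)
    return enriched

# Hypothetical records with made-up identifiers.
registrations = [
    {"nhs_number": "A1", "birthweight_g": None},
    {"nhs_number": "B2", "birthweight_g": 3400},
    {"nhs_number": "C3", "birthweight_g": 99999},
]
notifications = [
    {"nhs_number": "A1", "gestation_weeks": 39, "birthweight_g": 3100},
    {"nhs_number": "B2", "gestation_weeks": 40, "birthweight_g": 3350},
]
result = enrich_registrations(registrations, notifications)
```

A plausible registration value (record B2) is kept even when the notification disagrees, and an unlinked record (C3) passes through unchanged.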

ONS also plans to use birth notifications data to help develop and improve its data linkage methodology. For example, the birth notifications data allows ONS to link siblings born at different times (i.e. not twins) using the NHS number of the mother which is only available on the notification data. This provides a ‘gold standard’ linkage method.

ONS can then attempt to link siblings together using only the data available in the registration data – e.g. mother’s name and date of birth, but not NHS number. ONS can then assess how closely the latter linkage method matches the gold standard. This will inform the best matching methodology to use when seeking to link siblings if NHS number of mother is not available. This is needed to link pre-2005 birth registration data, a time when the birth notification data is not available to ONS. This purpose would fall under the Research and development of methodology category in the uses framework above.
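
One common way to quantify how closely a candidate linkage matches a gold standard is precision and recall over the linked pairs. The source does not say which measure ONS uses, so the sketch below is an assumed approach, with hypothetical birth identifiers and function name.

```python
def linkage_quality(gold_pairs, candidate_pairs):
    """Compare candidate sibling links with gold-standard links.

    Pairs are treated as unordered, so (a, b) and (b, a) are the same link.
    Returns (precision, recall).
    """
    gold = {frozenset(p) for p in gold_pairs}
    cand = {frozenset(p) for p in candidate_pairs}
    true_pos = len(gold & cand)
    precision = true_pos / len(cand) if cand else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall

# Gold standard: siblings linked via the mother's NHS number.
gold = [("b1", "b2"), ("b3", "b4"), ("b5", "b6")]
# Candidate: siblings linked via mother's name and date of birth only.
candidate = [("b2", "b1"), ("b3", "b4"), ("b7", "b8"), ("b9", "b10")]
precision, recall = linkage_quality(gold, candidate)
```

Here two of the four candidate links are correct (precision 0.5) and two of the three gold-standard links are recovered (recall two thirds).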

Dataset 2: Hospital Episode Statistics

There are a range of initial statistical uses to which ONS intends to put Hospital Episodes Statistics (HES) data.

Generally, linkage to other sources at a record level is a prerequisite to success for all proposed uses, and therefore identifiers including postcode, date of birth, sex and NHS number are required. The other HES information required varies by purpose, broken down below.

The specification of the variables being required has been developed in collaboration with NHS Digital data experts to ensure the data being shared are of sufficient quality (e.g. coverage, accuracy, relevance) to be likely to support the statistical purpose intended. The proposed uses of the HES data are as follows.

2.1. To enable ONS’s Administrative Data Census Project, including placing administrative data at the core of migration statistics, using ‘activity’ and characteristics data from HES

ONS’s Administrative Data Census Project (ADC) is assessing whether the Government’s ambition that ‘censuses after 2021 be conducted using other sources of data’ can be realized.

ONS aims to replicate the type of information collected through a census by using administrative data already held by government, supplemented by surveys. This can then be compared with the data collected by the 2021 census itself. This will allow ONS to determine whether this alternative approach can meet users’ needs.

In addition, ONS set out a cross-Government Statistical Service (GSS) programme working with the Home Office (the lead policy department), the devolved administrations and other government departments who have a strong interest in improving the migration evidence base. ONS aims to deliver improvements in migration statistics by putting administrative data at the core of migration statistics as part of the wider transformation to an administrative data-based population statistics system. The programme also recognises the changing demand from users of migration statistics and the need for more information on the impact migrants have while they are in the UK:

https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/internationalmigration/articles/migrationstatisticstransformationupdate/2018-05-24

There are two main types of information from the Hospital Episodes Statistics dataset that are needed for these projects; so called ‘activity data’, and characteristics data.

a. Activity data

ONS has access to administrative sources that include a large proportion of the population such as GP patient registration information and tax records. These provide evidence of how many people live in each area of the country. However, these sources often suffer from over coverage. This is because people may have left the country but still appear in the data, creating the risk that the size of the national population is overestimated. Even when someone is still in the country, they may move without updating their address information with relevant services – for example, they may not register with a new GP at their new location until they need to see a doctor. In this case, there is a risk of ONS including them as contributing to the resident population in the wrong part of the country.

ONS can mitigate these limitations using other sources such as HES. For example, where these other sources show that an individual is interacting with a service, it provides evidence that they are in the country, and indeed which address information is correct (if the main sources mentioned earlier do not agree on this). For this particular use, ONS only requires information about where and when individuals are interacting with hospital services, not why.
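
To make the idea concrete, the sketch below picks, from the addresses that disagree across the main sources, the one supported by the most recent hospital activity. This rule, the use of postcode as the address key, and all names in the code are assumptions for illustration; the source does not specify how ONS resolves such conflicts. Note that only where and when an interaction occurred is used, not why.

```python
from datetime import date

def resolve_address(candidate_postcodes, activity):
    """Pick the candidate postcode with the most recent recorded activity.

    `activity` is a list of (postcode, date) tuples. Returns None when no
    activity supports any of the candidates.
    """
    supported = [(when, pc) for pc, when in activity if pc in candidate_postcodes]
    if not supported:
        return None
    # max() compares the date first, so the latest interaction wins.
    return max(supported)[1]

# Hypothetical example: two sources disagree on the person's postcode.
candidates = {"AB1 2CD", "EF3 4GH"}
activity = [
    ("AB1 2CD", date(2016, 3, 1)),
    ("EF3 4GH", date(2018, 6, 12)),
    ("ZZ9 9ZZ", date(2019, 1, 1)),  # not a candidate, so ignored
]
chosen = resolve_address(candidates, activity)
```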

b. Characteristics data

Ethnicity and national identity received one of the highest user needs scores from the 2015 census topic consultation, and the census ethnicity information is used by national and local decision makers; for example, in equality impact assessments when local authorities make changes to service delivery. The traditional census includes questions on ethnicity but it is currently very difficult to estimate ethnicity at a local level between censuses. The feasibility of producing admin data based ethnicity estimates will be important when deciding whether to move to an admin data based census after 2021.

Very few administrative sources capture ethnicity at all, so including ethnicity on an administrative data census is challenging. However, HES is one of the few sources where ethnicity is captured. ONS has worked with NHS Digital data experts to understand the limitations of the HES ethnicity data and there are several; for example coverage and differences between the ethnicity categories used on HES vs on the Census. However, there are methodological approaches that can be used to mitigate these, and ONS is of the view that it is in the public interest this ethnicity information is acquired from HES.

In terms of the framework of statistical uses presented earlier in this section, the Administrative Data Census project work described (both a and b) falls into multiple categories:

• Improvements to existing Official Statistics - If an Administrative Data Census proves feasible, ONS will be able to produce census-type population and other statistics more often, in more granular detail, produce new analyses not possible using traditional census data, and reduce the cost and burden on the public by avoiding a traditional decennial census
• Development of new Official Statistics - In the short term, ‘activity data’ from HES may contribute to new admin data-based migration statistics
• Quality assurance of Official Statistics - ‘Activity data’ will help ONS quality assure presence and address information from other sources
• Development of commentary around Official Statistics - Identification of interaction by migrants with secondary care will allow ONS to expand on and increase the frequency of commentary on population changes and impacts, meeting user demand and providing better evidence to better inform policy-makers; for example, impact of migrants on public service demand
• Research and development of methodology - Estimating ethnicity at a population level by local area using an Administrative Data Census approach will be challenging. Using HES ethnicity data, methodological teams will gain experience of developing methods to mitigate the statistical weaknesses often found in administrative data. For example, how to adjust for bias in coverage, and also data being collected on a different statistical definition compared to the desired definition

2.2. To conduct a range of Statistical Research and Health Analyses using clinical data from HES

ONS’s health analysts will use information about why people have accessed hospital services, for example diagnosis, for a range of statistical purposes.

This information is clearly more sensitive, and the intended statistical uses will require testing to determine whether official statistics of sufficient quality can be produced using HES data. As such, the volume of this information is being minimised to that absolutely necessary to do this. In practice, this means fewer years’ worth of information about why people have accessed hospital services will be shared with ONS, compared with the information about when and where people have accessed services.

a. Exploring the feasibility of producing robust projections of the future health state of the nation.
The State pension age review, 2017, called for more work on healthy life expectancy projections to better inform future decisions about the state pension age. The review also noted their potential value in informing the planning of future health and social care provision at a local and national level.

These projections would need to take into account population projections, morbidity and mortality trends, and other characteristics, and HES could provide some of the information required. ONS recognises that there are serious limitations when using healthcare activity data, particularly hospital episodes, to make inferences about the health of the population. However, using the HES data experimentally will allow ONS to investigate the possibilities of this dataset contributing to more complete estimation of selected serious and acute illnesses, in combination with mortality data and other relevant sources.

It will be necessary to link the HES data with other data sources to prevent double counting of cases, to understand the relative completeness, coverage and quality of each data source, and to enable additional demographic variables to be applied to the HES data. Record-level identifiable data is therefore required.
In terms of the framework of statistical uses, this would be Research and Development of Methodology in the first instance, with the ultimate goal of Developing New National Statistics.

b. Exploring the use of linked morbidity, mortality, census, benefits and other data to produce more granular statistics on health inequalities and health state life expectancies.

ONS healthy life expectancy statistics are central amongst the public health indicators that help guide decisions by Local Authorities (LAs) about the distribution and prioritisation of services. More local level health expectancy statistics, and more breakdowns such as ethnicity, educational attainment and occupation based socioeconomic position to examine interactions, would provide insight allowing LAs to better target interventions to reduce health inequalities.

Researching the feasibility of meeting this need will involve linking the HES data to individuals’ self-assessments of their health and disability status as collected by the 2011 Census, the ONS annual population survey since 2011 (for those surveyed), and ultimately the 2021 Census once collected. ONS will explore the relationship between hospital admissions and self-reported health status at both individual and small area levels, and with reference to potentially mediating or confounding demographic and geographic variables. Therefore, identifiable record level data is required, including postcodes.

Research will include exploring the feasibility of using actual morbidity data such as HES to supplement or even replace survey data to produce healthy life expectancy estimates, potentially allowing more granular statistics.
In terms of the framework of statistical uses, this would be Developing New National Statistics and potentially Replacing current survey questions.

c. Exploring the completeness of death certification and patterns of comorbidities in specific population groups
ONS holds data from the compulsory registration of all deaths in England and Wales. The information recorded about causes of death is sometimes unclear or inadequate for the range of public health, monitoring and research purposes to which the data can be put. The majority of deaths occur in hospital, or following an illness for which the deceased had hospital treatment. Linking the diagnosis data in HES with the registered causes of death will allow exploration of the relationships between them, including:

(i) Understanding multi-morbidity and vulnerability in the elderly. It is well-known that deaths of elderly people tend to mention more health conditions, but also to be less specific in a way which makes identifying the factor(s) which contributed most to death difficult. Terms such as ‘old age’ and ‘frailty’ are often used on death certificates with no specific clinical cause of death. By examining the HES diagnoses and registered causes of death together, ONS will aim to throw more light on the combinations of health conditions in elderly people (multimorbidity), the role and frequency of key conditions such as pneumonia and sepsis in the causal pathways leading to death, and if possible to develop new measures of avoidable mortality in the elderly.

This use would require the linkage of HES to deaths at the individual record level. ONS would also link the data to the Census and/or survey data, so as to explore the role of social factors such as living alone in deaths of the elderly along with clinical factors, with the potential to identify at-risk groups and improve targeting of preventive interventions.

(ii) Understanding infant mortality. The causes of death recorded at registration of perinatal deaths in particular are often very broad and not clinically meaningful. ONS is discussing with clinical and scientific experts ways to improve this information and to determine the underlying cause of death. Linkage of the HES data to registered deaths will provide extra information on the factors underlying the recorded causes of death. ONS will aim to improve the accuracy and completeness of infant mortality statistics, potentially contributing to the government ambition to halve infant mortality by 2025.

In terms of the framework of statistical uses, these projects would contribute to Improvements to existing Official Statistics, Quality Assurance of Official Statistics and Developing New National Statistics.

2.3. Improving ONS’ Address Register

This project will investigate using HES data to identify and/or validate the addresses of communal establishments, and would require information including where individuals were admitted from and discharged to. Also:
• Length of stay information will provide evidence of how many people ONS would expect to be classed as usually resident (> 6 months stay) in hospital at any given time
• Sex information may assist with identifying communal establishments that are male or female only.

In terms of the framework of statistical uses, this research, if successful, would enable Quality Assurance of Official Statistics and Improved efficiency / accuracy of sampling.

2.4. Creating a better estimate of the UK household expenditure on hospital services (inpatient only) and medical and paramedical services (outpatient)

The ONS national accounts framework provides a simple and understandable description of national production, income, consumption, accumulation and wealth.

The national accounts research team will investigate whether HES data can improve estimates of revenue paid by patients, split into outpatient and inpatient activity, private patient episodes split by outpatient and inpatient activity, and outpatient activity split between medical services and paramedical services.

The data may also be used to improve the figures on UK healthcare resources, activity and expenditure which are provided regularly to the international institutions (Eurostat, OECD and WHO) for comparative purposes.

In terms of the framework of statistical uses, the ultimate aim would be to Improve an existing National Statistic – i.e. UK national accounts.

2.5. Enabling the UK to report data or proxy indicator data to measure its progress against the United Nations' Sustainable Development Goals (SDGs)

The UK is committed to reporting progress against all of the internationally agreed Sustainable Development Goals (SDGs), and ONS will lead on delivering this. In some cases, new indicators will need to be developed, and/or new uses made of existing data. Interest in HES is specifically around the feasibility of providing data for the following Sustainable Development indicators:
• Maternal mortality ratio
• Proportion of births attended by skilled health personnel
• Number of people requiring interventions against neglected tropical diseases
• Coverage of treatment interventions (pharmacological, psychosocial and rehabilitation and aftercare services) for substance use disorders
• Proportion of women of reproductive age (aged 15-49 years) who have their need for family planning satisfied with modern methods
• Coverage of essential health services (defined as the average coverage of essential services based on tracer interventions that include reproductive, maternal, newborn and child health, infectious diseases, non-communicable diseases and service capacity and access, among the general and the most disadvantaged population)

ONS’s SDGs team are working with NHS Digital and Public Health England (PHE) to produce these indicators without the need for data sharing. However, ONS also needs to disaggregate these headline indicators by ethnicity, age, sex, disability and geography. In some cases, NHS Digital / PHE will not hold data that would enable this, but linking HES data to ONS held data such as from Census 2011 at an individual level may fill this gap.

In terms of the framework of statistical uses, the ultimate aim would be to Develop a new National Statistic.

Dataset 3: Improving Access to Psychological Therapies (IAPT) Dataset

3.1. To enable research being conducted by ONS’ Administrative Data Census and Migration Statistics improvement projects using ‘activity’ and characteristics data from IAPT.
This first use is essentially the same as described for the uses of HES data within these projects: The IAPT data provides evidence of presence at a particular address, and it also includes information on characteristics including ethnicity. See section 2.1 above (within the HES section) for the full rationale for why this information is needed.

3.2. To conduct a range of Statistical Research and Health Analyses using IAPT data

a. Statistical Research to inform Primary Mental Health Service Policy Making

This project will focus on common mental health disorders (CMDs) such as anxiety and depression. Using a phased approach, ONS will look first at the mortality risk of people with CMDs, co-morbidities between mental and physical health problems, and investigate inequalities around mental health. In the second phase, ONS will investigate income and employment transitions for patients who have been through mental health treatment.

The first phase will address existing evidence gaps on co-morbidities between mental and physical health, improve understanding on the demographics of people with CMDs, and investigate whether some or all people with CMDs are more at risk of death than the general population.

The IAPT data for 2012 to 2017 will be linked to the 2011 Census to provide detailed demographic background, and to death registrations from 2012 to 2018. The mortality analysis will focus on specific causes of death which may be connected to mental health (suicide, alcohol and drug abuse) as well as overall risk. In addition, the causes of death will be compared to the distribution of causes in the general population to identify any common co-morbidity with life-threatening illnesses. This goes some way to provide insight into important issues raised by the NHS England Five Year Forward View on mental health:
“An important barrier to good care is the lack of appropriate data sharing to enable organisations to identify co-morbidities…People with poor mental health may require primary care, secondary physical care and social care, as well as mental health services, but the lack of linked datasets hinders effective provision.”

IAPT data is estimated to cover over 15% of people with CMDs in England. Because of the service’s large, national scale and focus on people with mild and moderate mental health conditions, it provides a reasonable proxy for patterns and trends in the population of people with diagnosable CMDs.

The IAPT data will be compared with the findings of the Adult Psychiatric Morbidity Survey (2007 and 2014) to assess likely issues of representativeness, such as the under-representation of specific population groups in the treatment cohort. People with severe mental health conditions are not typically treated in the IAPT programme.

The three-way linkage will provide an independent and more detailed demographic baseline than the IAPT data could do alone, and allow ONS to investigate whether there have been changes in people’s circumstances between the Census and treatment in IAPT (e.g. becoming disabled or living alone). Having the mortality data linked as well allows ONS to see the overall trends in mortality, and to see whether there is any relationship between changes in demographics and the cause of death outcomes.

The research is not aiming to look at individual level outcomes or to evaluate the IAPT treatment, but to look for trends in the aggregate data after linkage, to provide population level analysis to inform policy.

Entry into IAPT treatment will be used as the main indicator of having a diagnosable CMD. The clinical data will not be analysed except to:
• Group the cohort into broad types of CMD
• Potentially, link successful/unsuccessful treatment outcome to risk of subsequent death.

b. Exploring the feasibility of producing robust projections of the future health state of the nation.

The State pension age review, 2017, called for more work on healthy life expectancy projections to better inform future decisions about the state pension age. The review also noted their potential value in informing the planning of future health and social care provision at a local and national level.

These projections would need to take into account population projections, morbidity and mortality trends, and other characteristics, and IAPT data could provide some of the information required. ONS recognises that there are serious limitations when using healthcare activity data to make inferences about the health of the population. However, ONS will investigate the possibilities of IAPT data contributing to more complete estimation of morbidity, to then use alongside mortality data and other relevant sources in health projection modelling. This assessment of morbidity would be in conjunction with other sources such as NHS Digital’s Hospital Episode Statistics (HES) that the Board has already required be shared with ONS.

c. Exploring the use of linked morbidity, mortality, census, benefits and other data to produce more granular statistics on health inequalities and health state life expectancies.

ONS healthy life expectancy statistics are central amongst the public health indicators that help guide decisions by Local Authorities (LAs) about the distribution and prioritisation of services. More local level health expectancy statistics, and more breakdowns such as ethnicity, educational attainment and occupation based socioeconomic position to examine interactions, would provide insight allowing LAs to better target interventions to reduce health inequalities.

Researching the feasibility of meeting this need will involve linking IAPT data to individuals’ self-assessments of their health and disability status as collected by the 2011 Census, the ONS annual

Yielded Benefits:

...

Expected Benefits:

As per section 5a, the legal gateway under which data will flow from NHS Digital to ONS will be Section 45C of the SRSA 2007 (as amended by the Digital Economy Act 2017). This means ONS can require that data are shared as long as the data are required for its functions, and the share is in line with the statistical statement of principles that underpins these powers.

These considerations include that the purposes to which ONS puts the data must be in the public interest and serve the public good. However, for this legal gateway, the benefits do not need to be to health and social care specifically. This is unlike some other legal gateways under which NHS Digital data can be disseminated, for example section 251 of the NHS Act 2006, when research outcomes must benefit health and social care.

In the above context, the following will briefly cover all the potential benefits by dataset.

Dataset 1: Benefits of the ONS births statistics which depend on the Birth Notifications data

Local authorities and other government departments are important users of birth statistics and use the data for planning and resource allocation. For example, local authorities use birth statistics to decide how many school places will be needed in a given area. The Department for Work and Pensions uses detailed birth statistics to feed into statistical models they use for pensions and benefits. The Department of Health uses the data to plan maternity services and inform policy decisions.

Other users include academics, demographers and health researchers, who conduct research into trends and characteristics. Lobby groups use birth statistics for their cause, for example, campaigns against school closures or midwife shortages. Special interest groups, such as Birth Choice UK, make the data available to enable comparisons between maternity units to help women choose where they might like to give birth, and work closely with health professionals. Charities, such as the Twins and Multiple Births Association provide advice and support to multiple birth parents and use the data to monitor trends. Organisations such as Eurostat and the UN use ONS birth statistics for international comparison purposes. The media also report on trends and statistics.

In addition, ONS’ births data is used as a component of its population statistics. Population estimates and projections are also used extensively throughout government and specifically by the Department of Health and their agencies for the planning and provision of health and social care services, and the distribution of funds. Throughout government, decisions on the distribution of billions of pounds of funds are made based on population estimates and projections.

In addition, they are used as the denominator in any statistics that are published on a per capita basis. For example, any health data published per capita for a particular level of geography (national, regional, local authority, clinical commissioning group, parliamentary constituency etc.) is almost certain to use ONS estimated or projected population as the denominator. Population estimates and projections are published by age and sex. This means that they can also be used to better target age and sex specific health and care services (e.g. maternity, ageing populations etc.).

Benefits of Data Linkage Methodology Research that will use the birth notifications data: Any improvements in Data linkage expertise and methodology would be an enabler to other projects and therefore their benefits. This is because the more accurately data can be linked, the more accurate any statistics derived from the linked dataset will be. The benefits of these projects may or may not be relevant to health and social care – for example, many of the projects involving HES data will depend on accurate data linkage.

Dataset 2: Predicted Benefits of the uses for Hospital Episode Statistics data

2.1. Benefits of the Admin Data Census Project and Improved Migration Statistics

Population estimates and information on population characteristics are used by a wide range of national and local organisations for numerous purposes including resource and funding allocation for both local and central Government, service planning and delivery, policy development, monitoring and evaluation, and providing an accurate denominator for other statistics.

The Department of Health and their agencies use ONS’ population statistics for the planning and provision of health and social care services and the distribution of funds. Throughout government, decisions on the distribution of billions of pounds of funds are made based on population estimates and projections.

Respondents to the Census Topic Consultation conducted in June 2015 gave strong evidence of the need for high-quality and more timely population estimates. If it proves feasible, an Admin Data Census approach will deliver more timely statistics. It will potentially also deliver more accurate and timely population statistics, at least in inter-censal periods, if not in the traditional census year itself. An Admin Data Census approach will also reduce cost and respondent burden.

New and more accurate information on international and internal migration is needed to better inform migration system policy making in a post-Brexit era. Evidence of this includes the 2017 Migration Advisory Committee call for evidence on aspects of migration, in response to a Government commission to guide decisions on post-Brexit migration policy, and the cross-Government Statistical Service (GSS) migration transformation programme.

2.2. Benefits of the Health Analyses

Successful production of robust health projections would support better decision making around where to set the state pension age, and planning of health and social care services.

Evidence of this includes the Cridland report (2017), which was commissioned by government to independently review the state pension age and made the following statement:
(https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/611460/independent-review-of-the-state-pension-age-smoothing-the-transition.pdf)

“We believe more work is needed to understand healthy life expectancy, as it affects a range of policy areas. Projecting healthy life expectancy into the future is not currently possible, but would be valuable for future Reviews, as well as in work around health and caring.”
The report also notes:
• Developments in Healthy Life Expectancy and Health State Transitions will have a notable impact on the demand for social care and different types of medical care, for instance the number of trained dementia nurses required in 40 years’ time
• In order to manage budgets and allocate funding effectively, there is a need to understand what the main patterns of key diseases will be, and what the distribution of these illnesses across the population will look like
• It is likely that the prevalence of diseases which affect the oldest old such as cancer and dementia will increase
• If social care and health care provision needs to be increased, the national budget will need to be changed to reflect this which may result in other services seeing cuts.

Current healthy life expectancy estimates rely on ONS surveys; although the sample is large, its size still limits the number of possible breakdowns by geography and by characteristic. Current estimates also rely on aggregate figures, i.e. the prevalence of poor health / limiting long-term conditions and the mortality rates by age are calculated independently and then fed into the model. These factors limit the accuracy of the model.
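
As an illustration of the aggregate approach described above, the following is a minimal sketch of a Sullivan-style calculation, a widely used way of combining independently calculated age-specific mortality rates and poor-health prevalence into a healthy life expectancy figure. The source does not describe the ONS model in detail, so the function name, the age bands and every input figure below are hypothetical and chosen only for illustration; this is not the ONS implementation.

```python
def healthy_life_expectancy(widths, death_rates, poor_health_prev):
    """Sullivan-style abridged life table (illustrative sketch only).

    widths           -- width in years of each age band; the last band is
                        treated as open-ended and its width is ignored
    death_rates      -- deaths per person-year in each band
    poor_health_prev -- proportion of people in poor health in each band

    Returns (life expectancy, healthy life expectancy) at the start of the
    first band, for a notional cohort of size 1.
    """
    alive = 1.0
    total_years = 0.0
    healthy_years = 0.0
    last = len(widths) - 1

    for i, (n, m, p) in enumerate(zip(widths, death_rates, poor_health_prev)):
        if i == last:
            # Open-ended final band: remaining person-years = survivors / rate.
            years = alive / m
        else:
            # Probability of dying in the band, assuming deaths occur on
            # average half-way through it.
            q = n * m / (1 + 0.5 * n * m)
            deaths = alive * q
            years = n * (alive - deaths) + 0.5 * n * deaths
            alive -= deaths
        total_years += years
        # Aggregate prevalence is applied to person-years; health and
        # mortality are never linked for the same individual.
        healthy_years += years * (1 - p)

    return total_years, healthy_years


if __name__ == "__main__":
    # Hypothetical inputs: three bands (0-49, 50-74, 75+), invented rates.
    le, hle = healthy_life_expectancy(
        widths=[50, 25, 0],
        death_rates=[0.001, 0.01, 0.1],
        poor_health_prev=[0.05, 0.20, 0.45],
    )
    print(f"life expectancy: {le:.1f}, healthy life expectancy: {hle:.1f}")
```

Because the prevalence and mortality inputs are separate aggregates, the sketch cannot show how health state and death relate for the same person, and a finer breakdown is only possible if both inputs can be estimated reliably for each additional group.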

Linking health states and mortality at the individual level over time, and for a greater proportion of the population (which may be possible using HES data), will allow more granular analysis. Linking to Census and other sources to add in other characteristics could inform interventions to support tackling inequalities at the local level.

Improving understanding of causes of death in vulnerable population groups such as the elderly and infants, by using HES diagnostic data to supplement the registered causes of death, will improve mortality statistics which are currently relied on by government for a wide range of policy and resource allocation purposes and as indicators in the NHS outcomes frameworks.

Developing a better understanding of complex causes of death in the elderly will help to address an internationally acknowledged issue which is of growing importance as the average age of the population, and the proportion of deaths which are among elderly people, increases globally. There is international interest in developing new measures of avoidable death in the elderly, and the potential of studies on this to help identify those who are most at risk and target preventive interventions.

2.3. Benefits of Improving the ONS Address Register

This research will enhance the Address Register including the information held on communal establishments (CEs), for which there is currently a recognized data gap. A better Address Register will in turn benefit ONS’ other statistics, such as the population statistics described earlier. For example, it will allow ONS to quality assure its local level population statistics (whether from a traditional Census or other method) as local areas with CEs can have unusual demographic profiles, which can cause concern over the accuracy of the statistics unless the location and nature of the CE is known. It will also help with better planning of survey operations and sample design.

2.4. Benefits of Improving UK household Expenditure

Household Final Consumption Expenditure is a component of National Accounts; improvements to its accuracy therefore improve estimates of Gross Domestic Product (GDP). GDP is a key national economic indicator that drives national economic policy making, in turn potentially affecting the wellbeing (financial or otherwise) of everyone in the country.

2.5. Benefits of Improving ONS Sustainable Development Indicators

The UK was at the forefront of developing the United Nations recognized Sustainable Development Goals (SDGs). ONS aims to fully report UK progress against these goals (i.e. have data available for the SDG indicators that have been proposed), given the UK was heavily involved in SDG development, and wants to continue to show leadership in this space.

A key theme of the SDGs is to leave no one behind, and ONS needs to be able to disaggregate the headline indicators so that it can be sure progress occurs across all groups, regardless of ethnicity, age, sex, disability or geography. Subject to feasibility research, linking HES data to ONS held data such as from Census 2011 at an individual level may help to achieve this goal.

Reporting against the SDG indicators will not always enable better decision making on UK government policy, but it will encourage other nations to fully report against the indicators, and by extension enable better decision making in those nations.

Dataset 3: Benefits of the uses for IAPT data

3.1. Benefits of the Admin Data Census project and improved migration statistics

This is essentially the same as described for the uses of HES data within ONS’ Admin Data Census Project. See section 2.1 above (within the HES section) for benefits of these uses.

3.2. Benefits of Statistical Research and Health Analyses using IAPT data

Mental health is a high priority in government policy. Co-morbidities between mental and physical health, as well as inequalities in mental health, are of increasing interest within health policy.

In their response to the Five Year Forward View (FYFV), NHS England set an objective for the majority of new common mental health disorder (CMD) services to be integrated with physical healthcare by 2020/21. This is in line with a King’s Fund report which provided evidence for the strong links between mental and physical health.

This project will add to the evidence base by:
• providing information on many physical conditions (rather than a focus on only a few key health problems, as in the Adult Psychiatric Morbidity Survey)
• providing a detailed demographic context, including information such as ethnicity, sexual orientation, occupation, marital status
• Investigating inequalities

Investigating the links between mental health, mortality, and co-morbidity has clear benefits for the public. By determining physical and mental health conditions that commonly co-occur, the government can target its health services to better meet the needs of patients, resulting in a better patient experience, and could ultimately save lives. For example, it may be that a particular cause of death has an increased prevalence in patients with CMD compared to the general population; by ensuring policy makers and clinical staff are aware of this, prevention and intervention could be more targeted.

The second benefit of the project is analysis of inequalities in mental health, in line with the FYFV “focus on tackling inequalities. Mental health problems disproportionately affect people living in poverty, those who are unemployed and who already face discrimination”. The King’s Fund found that people with long term physical health problems and co-morbid mental illness disproportionately live in deprived areas. This analysis would allow detailed geographical mapping of those with a CMD who died from particular causes, and analysis by deprivation deciles.

Other demographic variables could also be used for inequalities analysis to investigate any difference in premature mortality in certain demographic groups of IAPT users (age, sex, or occupation) versus the general population. Obtaining this information will benefit the public by allowing healthcare providers to target groups who may be disproportionately affected by physical and mental health problems, and subsequently reduce premature mortality due to co-morbidities.

The benefits of the other health statistics mentioned in section 5a (3.2) to which IAPT data will contribute are already described under the benefits gained from ONS acquiring HES data.

Outputs:

Dataset 1: Birth Notifications

Official Birth Statistics

Annual birth outputs represent births occurring in England and Wales in a given year. A package containing summary tables for the previous calendar year is released in July, with supporting commentary in a statistical bulletin. More detailed figures are then released between August and December in a series of themed packages. Each package consists of a number of data tables; these are generally accompanied by a statistical bulletin. ONS’ tables provide the latest year’s figures with some also showing historical data for comparison, sometimes back to 1837. ONS publishes all its statistics on its website, and also extends its reach through social media, for example its twitter feed.

There are several published packages:
Birth summary tables: includes the number of live births and stillbirths, fertility rates, percentage of live births outside marriage and civil partnership, mean age of mother and percentage of live births to non-UK born mothers for England and Wales as a whole. Live births (number and rate) and the number of stillbirths are also provided down to local authority level. To aid with user interpretation, ONS also publishes an interactive fertility mapping tool, which enables users to analyse trends in fertility by county district and unitary authority; this is contained within the statistical bulletin.

Parents’ country of birth: includes births by country of birth of mother and total fertility rates for UK born and non-UK born women for England and Wales as a whole. Summary figures are also available down to local authority level. ONS publishes detailed analysis on parents’ country of birth because this information is collected at birth registration and does not change over time, while their nationality or ethnicity may change.

Birth characteristics and by area of usual residence: contains statistics on stillbirths and maternities for England and Wales, birthweight data for live and stillbirths by mother's region of usual residence, and live births and stillbirths in hospitals and communal establishments by region of occurrence. These tables also provide figures on month and quarter of occurrence, place of birth, ethnicity and gestational age and multiple births for England and Wales as a whole. Also provides summary data for live births down to local authority level, including figures by age of mother. Figures are published using boundaries in place during the year the birth occurred.

Births by parents’ characteristics: provides live birth, stillbirth and maternity statistics by age of mother and type of registration (within marriage and civil partnership, joint, sole). It also provides data on previous live-born children, National Statistics Socio-economic Classification (NS-SEC), median birth intervals, age-specific fertility rates for men and mean age of fathers. All tables are for England and Wales as a whole with no sub-national breakdown.

Childbearing for women born in different years (formerly known as Cohort fertility): presents data on fertility by year of birth of mother rather than the year of birth of child for England and Wales as a whole. This package includes the average number of live-born children and the proportion of women remaining childless for women born in different years.

Data Linkage Methodology Research: This will result in internal, and potentially external, ONS reports and presentations on how best to link siblings / family units together when linkage based on NHS number is not possible. Any reports or presentations would not include statistics derived from the birth notifications data. They would only include figures comparing the success of various matching strategies with that of one based on linking using mother’s NHS number.

Dataset 2 and dataset 3: Hospital Episode Statistics and Improving Access to Psychological Therapies data

The initial uses to which ONS will put HES and IAPT data are most commonly new or improved official statistics that will enable better decision making (see sections 5a and 5d). To reach this goal, a lot of development work, testing, and quality assurance will be required to determine whether official statistics of sufficient quality can be produced in each case.

Generally, this initial work will be disseminated through a range of products and channels, in particular research updates and research outputs. For example, the Admin Data Census project already publishes its research outputs and work involving HES will be reported in similar fashion on this section of the ONS website:

https://www.ons.gov.uk/census/censustransformationprogramme/administrativedatacensusproject/administrativedatacensusresearchoutputs

Subsequently, projects will move on to the production of experimental statistics and potentially in due course, National Statistics (a status that can only be gained once certain quality standards are met). Both types are released via the ONS website.

By way of illustration, a good example of an experimental statistic is here:

https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/estimatingsuicideamonghighereducationstudentsenglandandwalesexperimentalstatistics/2018-06-25

This release is based on a project linking information about suicides with information on higher education students to increase the evidence base on suicides by those in higher education.

No targets can be given as to if and when experimental or National Statistics will be produced using HES or IAPT data until the initial stage of any given project is complete. All ONS statistical teams engage regularly with users, and will seek to provide frequent updates on these projects during that first stage.

Processing:

Dataset 1: Birth Notifications data

ONS receives the data in real time through its Spine2 connection from NHS Digital. It arrives as XML files which are converted on a secure WebLogic server before being transferred to another secure server for processing. Here, it is processed ready for ONS use. This server is separate from those which are used for other datasets from NHS Digital, due to the long-standing nature of this data share.

The birth notifications data are linked with ONS’s birth registrations data at an individual level. Where possible, NHS number of baby and/or mother are used. In some cases this will fail, for example when the same NHS number is used twice in the registrations data in error. Therefore, other demographic variables are used for linkage when required. The majority of this is automated matching with no visual inspection of the identifiable data, but in a small number of cases, clerical matching is required. Only a small number of security cleared, trained, substantive ONS employees are involved with this part of the process.
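
The linkage order described above (NHS number first, demographic variables as a fallback, clerical matching for the remainder) can be illustrated with a minimal sketch. This is not ONS's actual linkage code: the field names (`nhs_number`, `date_of_birth`, `postcode`, `sex`, `reg_id`, `notif_id`) and the exact-match fallback rule are assumptions made purely for illustration.

```python
from collections import Counter


def demo_key(rec):
    # Hypothetical fallback key: date of birth, normalised postcode and sex.
    return (rec["date_of_birth"], rec["postcode"].replace(" ", "").upper(), rec["sex"])


def link_records(notifications, registrations):
    """Link each notification to at most one registration.

    Pass 1 uses NHS number, ignoring any number that appears more than once
    in the registrations. Pass 2 falls back to an exact match on the
    demographic key. Records still unmatched are returned for clerical review.
    """
    nhs_counts = Counter(r["nhs_number"] for r in registrations if r.get("nhs_number"))
    by_nhs = {
        r["nhs_number"]: r
        for r in registrations
        if r.get("nhs_number") and nhs_counts[r["nhs_number"]] == 1
    }
    demo_counts = Counter(demo_key(r) for r in registrations)
    by_demo = {demo_key(r): r for r in registrations if demo_counts[demo_key(r)] == 1}

    linked, clerical, used = [], [], set()
    for n in notifications:
        match, method = by_nhs.get(n.get("nhs_number")), "nhs_number"
        if match is None or match["reg_id"] in used:
            match, method = by_demo.get(demo_key(n)), "demographic"
        if match is None or match["reg_id"] in used:
            clerical.append(n)  # left for trained staff to resolve by hand
            continue
        used.add(match["reg_id"])
        linked.append((n["notif_id"], match["reg_id"], method))
    return linked, clerical
```

In this sketch a duplicated NHS number is never trusted for automatic matching, which mirrors the failure case mentioned in the text; a real system would likely use probabilistic or fuzzy comparison rather than exact demographic equality.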

Once linkage is complete, other variables from the birth notifications data are used to either enhance or validate the birth registrations data. Once the enhancement and validation are complete, all additional birth notification data that are not needed for the production of statistics, notably the identifiers, are removed before any more ONS staff can access the data. As suggested in 5a, the resulting de-identified, linked dataset produced includes additional variables from the birth notifications data that were not on the birth registrations data.

This linked de-identified dataset is transferred to another secure server where health analysts can produce the statistics listed in section 5c. No attempt is made to re-identify individuals; ONS is only interested in producing aggregate statistics for the public good.

ONS employs strict security procedures to protect confidentiality throughout processing. These include:
• Only a small number of substantive ONS employees can access the data and all ONS employees who have access to the data have contractual obligations of confidentiality, enforceable via disciplinary procedures, as set out in the ONS Code of Practice
• Relevant staff are Security Check cleared, and have undergone appropriate training and ongoing supervision to maintain confidentiality and integrity
• Data are held on secure servers with restricted access; the data are only held in an identifiable form for the shortest period necessary to enable the data to be used for the stated purposes

The complete Birth Notification dataset is required, rather than just a sample, because the birth registrations data, which is the primary source for ONS’s birth statistics, is (in theory) a Census of all births. This means the statistics are more accurate than statistics based on surveys that suffer from sampling error.

If the variables that are appended to the registrations data from the birth notifications were only a sample, this would reduce the accuracy of some of the birth statistics, limiting their use and impact. In addition, the difference in accuracy between statistics based on different variables within the same statistical release would be confusing for users.

Some variables that are on the birth registrations have their value overwritten with the equivalent value from the birth notifications data. In this case, only having a sample of birth notifications data, or certain geographic regions, would potentially introduce bias. For example, it would mean ONS’s birth statistics are more accurate for the regions where it has birth notifications data and is therefore able to improve on any implausible values in the birth registrations data, than for those where it would not have been able to do this.

Dataset 2 and 3: Hospital Episode Statistics (HES) and Improving Access to Psychological Therapies (IAPT) data

Data security for storage and linkage of the data will be provided with an assured ONS data analysis environment that includes the following elements of security control:
• Need To Access applied through user account access and management. Access to the data is restricted to individuals granted access on the basis of a justified need to access the data
• Controlled ingest and export of data into/out from the DAP environment
• Controlled account access using unique credentials based on job role
• Logged and monitored access of user activity within the DAP environment
• Secure build configuration for infrastructure
• Vulnerability tested infrastructure with appropriate remediation and patching
• Compliance checks against security enforcing controls
• Architectural review against standards and best practice
• Staff security cleared to the appropriate level based on their supervised and/or unsupervised access to sensitive data in accordance with ONS clearance policies and data access processes
• Education and awareness of environment users covering security policies and secure working practices
• Operational support processes to securely manage the environment
• Risk assessment to identify security risks and mitigation actions to reduce this risk.

Following policy specified by the ONS Chief Security Officer, ONS user access to the data environment is granted only after approval of an application by the Information Asset Owner, including an ethical assessment of the proposed data use. A list of approved users is available on request.

With reasonable notice, periodic written/verbal checks may be conducted by an authorised employee of NHS Digital to confirm compliance with this application.
ONS will keep a record of any processing of Personal Data and will provide a copy of such record to NHS Digital on request. ONS will not transfer or permit the transfer of the Data to any territory outside the UK without the prior written consent of NHS Digital.

As described in section 5a, the proposed purposes require linkage of records at the individual level. This is why personal identifiers such as date of birth, postcode and NHS number are required. However, ONS is only interested in producing aggregate statistics and using these to uncover trends and other useful insights based on the non-identifiable ‘attribute’ information.

ONS is in the process of enhancing the capabilities of its Data Access Platform to allow for variable-by-variable control of researcher access granting. Until this is completed, any analyst who is granted access to the dataset will technically have access to all variables and identifiers. However, users will not be permitted to access identifiers for the purposes of analysis. In addition to the system protocols above, ONS will therefore keep the number of staff permitted to process identifiers to an absolute minimum, and these staff will have a higher level of clearance. All other staff will only be permitted to access non-identifying data.

Inadvertent re-identification is still a risk but ONS will never seek to intentionally re-identify this data. ONS staff are suitably trained; for example ONS’s health analysts in particular are experienced working with sensitive data about deaths (such as individual level data about suicides). Further, only statistical disclosure controlled aggregate outputs will be exportable from the secure data analysis environment. In other words, other than the initial transfer of the data from NHS Digital to ONS, the identifiable data will never be in transit and will always be protected by procedural controls in place now and technical controls to be implemented by 31/12/2019 to enforce the controls as described above.

The reasons why complete information on who accessed hospital services, and when and where they did so, is needed for 2009/10 onwards vary across the multiple statistical purposes presented in section 5a. The drivers are largely to do with the quality, and therefore the value, of the statistics that can be produced using the full dataset compared with anything less, for example a subset or random sample. There is more on statistical quality on the ONS website including the following:

‘The quality of a statistical product can be defined as the “fitness for purpose” of that product. More specifically, it is the fitness for purpose with regards to the European Statistical System dimensions of quality:

• relevance – is the degree to which a statistical product meets user needs in terms of content and coverage
• accuracy and reliability – is how close the estimated value in the output is to the true result
• timeliness and punctuality – describes the time between the date of publication and the date to which the data refers, and the time between the actual publication and the planned publication of a statistic
• accessibility and clarity – is the ease with which users can access data, and the quality and sufficiency of metadata, illustrations and accompanying advice
• coherence and comparability – is the degree to which data derived from different sources or methods, but that refers to the same topic, is similar, and the degree to which data can be compared over time and domain, for example, geographic level

There are additional characteristics that should be considered when thinking about quality. These include output quality trade-offs, user needs and perceptions, performance cost and respondent burden, and confidentiality, transparency and security.’

The clearest example of the need for the information in this agreement is using so-called activity data to determine where in the country people were/are resident as part of the ONS Administrative Data Census project. This project is developing new population statistics methods and products that cover the whole of England (and beyond), so complete HES and IAPT coverage is required. In addition, a decision is required post-2021 Census about whether these new methods and statistics can replace the traditional Census. To determine this robustly requires that the new methods and statistics are produced for the whole of the 2011 to 2021 time period. This will allow a robust view to be taken of the level of error and drift of those new statistics during this period, comparing them to the gold standard Census figures available for 2011 and 2021.

For the health analysis purposes presented in section 5a that require information on why people interacted with hospital services, similar arguments around quality apply: health projections that rely on HES diagnosis information will require full coverage for an extended time period. However, this and the other health analysis uses presented are more complex than many of the other proposed uses and require more groundwork to determine whether statistics of sufficient quality can be produced. In addition, diagnosis information is clearly more sensitive. As a result, ONS has determined that it is proportionate and in the public interest that the number of years’ worth of HES diagnosis information required is minimised for now. It will still be possible to test statistical quality for these uses with this volume of information. However, ONS does expect this work to be successful and, if it is, ONS will require additional years of diagnosis information to be shared at a later date.

Access to data held within the Data Access Platform (DAP), which includes HES data, is granted to users on a need-to-know basis depending on their role, through a request process which provides a business justification. Access is authorised on a case-by-case basis by the ONS Information Asset Owner (IAO) responsible for HES data, with advice from Security and Information Management. Staff requesting access to HES data must be cleared to the appropriate National Vetting level, which is higher than the standard basic clearance required for all ONS staff. Only authorised ONS staff with appropriate security clearance will have access to identifiable HES data, with regular audit and monitoring in place to ensure compliance.


ONS Longitudinal Study — DARS-NIC-705741-K8K9G

Type of data: information not disclosed for TRE projects

Opt outs honoured: Identifiable, No (Statutory exemption to flow confidential data without consent)

Legal basis: Health and Social Care Act 2012 - s261(5)(d); Other-Section 45A of The Statistics and Registration Service Act 2007 (SRSA) as amended by the Digital Economy Act 2017

Purposes: No (Agency/Public Body)

Sensitive: Non-Sensitive

When:DSA runs 2023-07-01 — 2024-06-30 2023.08 — 2023.08.

Access method: One-Off

Data-controller type: OFFICE FOR NATIONAL STATISTICS (ONS)

Sublicensing allowed: Yes

Datasets:

  1. Demographics

Yielded Benefits:

1. Health and Mortality: How large are inequalities between people from different backgrounds?

When the ONS Longitudinal Study (LS) was established in 1974 a primary goal was to compile new information on differences in mortality between people in different occupations. Since then, it has been used to provide unique information to support a series of major reports for government on health and mortality:
• Inequalities In Health, 1980 (The Black Report) for the Department of Health and Social Security: https://pubmed.ncbi.nlm.nih.gov/7118327/
• The Health Divide: Inequalities In Health In The 1980s, 1987 (The Whitehead Report) for the Health Education Council:
• Independent Inquiry into Inequalities in Health Report, 1998 (The Acheson Report) for the Department of Health: www.archive.official-documents.co.uk/document/doh/ih/contents.htm
• Fair Society Healthy Lives (The Marmot Review) for the Department of Health: http://www.instituteofhealthequity.org/resources-reports/fair-society-healthy-lives-the-marmot-review

Each of these reports has used data that is only available from the LS, which is unique in both its large number of records, and its long timespan. Focusing on the most recent Marmot Review, its information includes:
• Standardised limiting illness rates in 2001 at ages 16–74, by education level recorded in 2001
• Life expectancy at birth by social class, a) males and b) females, England and Wales, 1972–2005
• Standardised limiting illness rates at ages 55 and over in 2001 by the educational level they had in 1971

2. Life Expectancy and Pensions: How long do people from different social backgrounds live?

Increasing life expectancy has become recognised as a major policy issue in recent years, with one particular concern being the implications for pensions. The Pensions Commission, led by Lord Turner, was appointed by the government in December 2002 with the remit of keeping under review the adequacy of private pension saving in the UK, and advising on appropriate policy changes, including on whether there is a need to “move beyond the voluntary approach”. National Statistics on ‘Trends in Life Expectancy at 65 by socio-economic position’ were supplied to the commission to demonstrate how life expectancy varies for different socio-economic groups. These National Statistics can only be produced and published using the LS: no other data source has the LS’s combination of a large sample size, its long time period, and such a very high retention of members. This series of National Statistics has been running since 1982. ONS is currently working on the next release which will include data covering 2012 to 2016. The previous release is published here: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/bulletins/trendinlifeexpectancyatbirthandatage65bysocioeconomicpositionbasedonthenationalstatisticssocioeconomicclassificationenglandandwales/2015-10-21

3. Adult social care: What is the extent of the need?

The need for, and cost of, adult social care has received increasing political attention for more than a decade. The Royal Commission on the Funding of the Long Term Care of the Elderly reported in 1999, and the Commission on Funding of Care and Support, chaired by Andrew Dilnot, was set up in 2010 with the task of making recommendations on how to achieve an affordable and sustainable funding system for care and support for all adults in England, both in the home and in other settings. Professor Emily Grundy was a member of the Dilnot Commission’s Academic Panel and submitted a paper in evidence: Survivorship 2001-2008 among residents of communal establishments in 2001 in England & Wales: Results from the Office for National Statistics Longitudinal Study. The submission used data from the LS on the survival of older people who in the 2001 Census were recorded as residents of residential care homes, nursing homes, or other types of communal establishment, and examined differentials in the survival of this population by characteristics including broad type of establishment (residential, nursing, or other); gender and marital status in 2001. It also used information on place of death to assess the assumption that residents in communal establishments of various types in 2001 remained in institutional care throughout the follow-up period (from the 2001 Census to the end of 2008).

4. Current Research

The LS continues to be widely used, helping health and social researchers investigate and analyse a wide range of topics. Current projects include:
• Understanding the impact of migration on population mortality dynamics and cancer risk – Ayse Arik (Heriot-Watt University)
• Airborne pollution and lifecycle population health - Gabriella Conti (UCL), Edward Pinchbeck (University of Birmingham), Sefi Roth (LSE) and Elisabetta De Cao (LSE)
• Understanding the social determinants of place of death in older adults - Joanna Davies, Katherine Sleeman and Matthew Maddocks (King's College London) and Fliss Murtagh (Hull York Medical School)
• Health and education outcomes of first and second generation migrant children in England and Wales - Pia Hardelid and Kate Lewis (UCL)
• Cancer diagnosis and outcomes amongst Pakistanis and their descendants in England and Wales - Joseph Harrison, Hill Kulu and Frank Sullivan (University of St Andrews)
• Early life and intergenerational transmission of health – Genevieve Jeffrey, Elisabetta De Cao and Alistair Mcguire (LSE)
• Assessing inequalities in health, wellbeing and social participation outcomes for young carers - Rebecca Lacey (UCL) and Lynne Forrest (University of Edinburgh)
• Variations in bowel cancer survival by individual characteristics and area type – Paul Norman and Charlotte Sturley (University of Leeds)
• Do maternal characteristics have an impact on birthweight of liveborn child? - Jitka Pikhartova (UCL)
• Workplace location deprivation: relationship with health, cancer and mortality - Nicola Shelton (UCL)

Expected Benefits:

Maintaining the LS Research Database ensures that the benefits of the LS are being utilized as a research resource. Specifically, it enables research into a wide range of topics including health inequality, limiting long term illness, ageing and caring. Research findings are regularly published in journals and presented at conferences.

Recent examples of published LS research include:
i) Association of childhood out-of-home care status with all-cause mortality up to 42 years later by Murray, Lacey, Maughan and Sacker published in BMC Public Health
bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-020-08867-3
ii) (Un-)healthy ageing in local areas: England and Wales 2001 to 2011 by Norman, Murray, Shelton and Head presented at the 2021 conference of the British Society for Population Studies
www.researchgate.net/publication/354601659_PN-HOPE-BSPS-Sep-2021?channel=doi&linkId=6141e9bc2db97e68051c00e2&showFulltext=true
iii) Associations between commute mode and cardiovascular disease, cancer, and all-cause mortality, and cancer incidence, using linked Census data over 25 years in England and Wales: a cohort study by Patterson, Panter, Vamos, Cummins, Millett and Laverty published in The Lancet Planetary Health
www.thelancet.com/journals/lanplh/article/PIIS2542-5196(20)30079-6/fulltext

The LS can be used for several types of analysis, over many different research areas. The studies that make best use of LS data are those that link social, occupational and demographic information to data on life events. Examples include studies of mortality, and fertility patterns. The individual-level data of the LS means that person-years at risk can be calculated for epidemiological studies.
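
As a rough illustration of the person-years-at-risk calculation mentioned above, the following sketch computes each person's contribution to a follow-up window and a crude event rate. The function names, the 365.25-day year and the example dates are assumptions for illustration only; they are not taken from LS documentation.

```python
from datetime import date


def person_years(entry, exit_, window_start, window_end):
    """Years at risk contributed by one person within the follow-up window.

    entry and exit_ are the dates the person came under and left observation
    (for example through death or embarkation); time outside the window is
    not counted.
    """
    begin = max(entry, window_start)
    finish = min(exit_, window_end)
    days = (finish - begin).days
    return max(days, 0) / 365.25


def rate_per_1000(events, total_person_years):
    """Crude event rate per 1,000 person-years at risk."""
    return 1000 * events / total_person_years


# Usage with made-up dates: two people followed within 2001 to 2011.
window = (date(2001, 1, 1), date(2011, 1, 1))
total = (
    person_years(date(2001, 1, 1), date(2003, 1, 1), *window)
    + person_years(date(1995, 6, 1), date(2020, 1, 1), *window)
)
```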

The ability to combine detailed personal characteristics with area characteristics has proved useful in many studies of health, for example, those looking at environmental effects on health, and those on inequalities in health.
Studies of social mobility have examined changing class position by age. Information on co-residents of LS sample members has been used to study inter-generational mobility.

The size of the LS makes it suitable for the study of ageing. Studies have used the information collected on the co-residents and family status of LS sample members to examine changes to household and family arrangements that come with age.

The LS is used by ONS to produce National Statistics on 'Trends in life expectancy by the National Statistics Socio-economic Classification'. These National Statistics are key measures used by the Department of Health and Social Care and agencies such as Public Health England to monitor progress in meeting legal duties on health inequalities.

Outputs:

The expected outputs of the processing will be:
• The maintenance of the LS Research Database. The LS Development Team operates an annual processing cycle resulting in a new version of the LS Research Database being released each year.
• The provision of access to the LS Research Database leading to individual project research outputs.

All individual project research outputs contain aggregated data that have been checked by ONS to ensure there is no risk of identifying any individual. When submitting proposed outputs to ONS for clearance, researchers must show all underlying, unweighted counts, which should adhere generally to a threshold of ten. Outputs with counts below the threshold may be considered in exceptional cases, where the researcher can demonstrate the necessity to their research and that the output is still safe. Research outputs are commonly journal papers, research reports and results, and presentations at conferences.
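
A minimal sketch of the kind of threshold check a researcher might run before submitting a table for clearance is shown below. It is not the ONS disclosure-checking procedure: the table layout, the function name and the choice to flag every count strictly below ten (including zeros) are assumptions for illustration.

```python
THRESHOLD = 10  # threshold of ten, as stated in the text above


def cells_below_threshold(table, threshold=THRESHOLD):
    """Return (row, column) labels of unweighted counts below the threshold.

    table is a mapping of row label to a mapping of column label to count.
    """
    flagged = []
    for row_label, row in table.items():
        for col_label, count in row.items():
            if count < threshold:
                flagged.append((row_label, col_label))
    return flagged


# Usage with invented counts: one cell falls below the threshold.
example = {
    "Male": {"Employed": 120, "Unemployed": 7},
    "Female": {"Employed": 98, "Unemployed": 10},
}
flagged = cells_below_threshold(example)
```

Flagged cells would then need to be suppressed, merged with neighbouring categories, or justified as one of the exceptional cases described above.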

A list of research areas in which the LS has been used can be accessed online. To date there have been in excess of 600 LS publications. See: https://www.ucl.ac.uk/epidemiology-health-care/research/epidemiology-and-public-health/research/celsius/research

Processing:

No data will flow to NHS England for the purposes of this Agreement.

NHS England will provide the legacy cohort file containing the following variables:
• NHS number
• LS Number
• An indicator of whether the LS member was matched in 2011 LS-Census link
• An indicator of whether the LS member was matched in 2021 LS-Census link
• An indicator of whether the LS member was identified by and added during the 2021 LS-Census link

NHS England supply Personal Demographics Service (PDS) data to ONS under a separate Data Sharing Agreement (ref: DARS-NIC-20951-D2K6S). Under this Data Sharing Agreement (ref: DARS-NIC-705741-K8K9G) ONS is permitted to reuse the PDS data for the purposes of maintaining the LS Research Database.

The PDS data contains directly identifying data items including Names, NHS Number, Date of Birth, Postcode, Gender. ONS only requires NHS Number for linkage purposes, Date of Birth for LS member identification purposes, and Postcode to derive Postcode Sector. The only other variables ONS will use from the PDS data are non-identifying.
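
Deriving a postcode sector from a full postcode can be sketched as follows. This assumes the standard UK postcode layout, in which the inward code is always the final three characters and the sector is the outward code plus the first character of the inward code; the function name and validation rules are illustrative, not ONS's implementation.

```python
def postcode_sector(postcode):
    """Return the postcode sector, e.g. 'SW1A 1AA' -> 'SW1A 1'."""
    compact = "".join(postcode.split()).upper()
    # Outward code is 2 to 4 characters, inward code is always 3.
    if not 5 <= len(compact) <= 7:
        raise ValueError(f"unexpected postcode length: {postcode!r}")
    outward, inward = compact[:-3], compact[-3:]
    if not inward[0].isdigit():
        raise ValueError(f"inward code should start with a digit: {postcode!r}")
    return f"{outward} {inward[0]}"
```

Keeping only the sector discards the final two letters of the postcode, so each derived value covers a much larger group of addresses than the full postcode does.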

Pseudonymised data will be integrated into the LS Research Database by the LSDT. Only members of the LSDT have access to the raw data. This processing activity takes place in a dedicated ONS environment provided by Amazon Web Services.

A new version of the LS Research Database is created on the completion of this processing activity and placed in the Secure Research Service (SRS).

The SRS is hosted on the Crown Cloud platform. The servers used to store data and to host the analysis environment are located within a Pan-Government and National Cyber Security Centre (NCSC) Accredited (PGA) data centre, provided by Crown Hosting Data Centres and based in England/Wales.

The data will remain in the SRS at all times. Access to the full LS Research Database in the SRS is restricted to authorised personnel from ONS and UCL/CeLSIUS. A bespoke subset of the data is created for each approved project and made available to the researcher in a dedicated SRS project area. The bespoke data extract will only contain the data subjects and variables required for the research approved by the UK Statistics Authority’s Research Accreditation Panel.

Personnel are prohibited from downloading or copying data to local devices. Researchers cannot download or copy data out of the SRS. All their work takes place in the SRS environment. Proposed outputs from their research are disclosure checked and cleared before being made available to them outside of the SRS.

The SRS can be accessed in a number of ways, summarised at www.ons.gov.uk/aboutus/whatwedo/statistics/requestingstatistics/secureresearchservice/accessthedatasecurely:
i) In a safe setting at an ONS location; or
ii) In a SafePod; or
iii) At their institution under an Assured Organisational Connectivity (AOC) agreement; or
iv) From home, only if the researcher’s institution has an AOC and they use an institutional device to connect via their institution’s VPN.

The data will not leave the SRS environment at any time. All of the SRS access routes detailed previously are limited to be within the United Kingdom. The SRS cannot be accessed from abroad.

Access is restricted to employees or agents of ONS and UCL who have authorisation from ONS.

All personnel accessing the data have been appropriately trained in data protection and confidentiality.

The data will be linked at person record level with census and life events data including births to sample mothers, deaths and cancer registrations.

No direct identifiers are included in the LS Research Database. Variables that present the highest risk of identity disclosure (e.g. date of birth, low level geography codes) are not available to researchers in their raw form, but can be used to derive or link variables that carry a low disclosure risk.

Members of the LSDT and CeLSIUS teams will process the data for the purposes described above.


CSDS- Legal Notice — DARS-NIC-633657-S7Z5R

Type of data: information not disclosed for TRE projects

Opt outs honoured: Identifiable, No (Statutory exemption to flow confidential data without consent)

Legal basis: Health and Social Care Act 2012 - s261(5)(d); Other-Statistics and Registration Service Act (2007) Section 45C

Purposes: No (Agency/Public Body)

Sensitive: Non-Sensitive

When:DSA runs 2022-11-28 — 2025-11-27 2023.02 — 2023.08.

Access method: Ongoing

Data-controller type: OFFICE FOR NATIONAL STATISTICS (ONS)

Sublicensing allowed: No

Datasets:

  1. Community Services Data Set
  2. Community Services Data Set (CSDS)

Objectives:

The Office for National Statistics (ONS) agrees to process the Data only for the purposes outlined below, as agreed with NHS Digital:

Statutory purpose

The ONS, as the executive arm of the UK Statistics Authority (UKSA), requires access to administrative data held by NHS Digital for the production of official statistics.

In the past it has been difficult for ONS to access administrative data controlled by other Government departments, information that could potentially transform official statistics and the impact they have on decision making for the better. Often, this has been caused by the lack of a clear legal basis under which the data can be shared with ONS. As a result, in 2016, ONS set out why legislation was needed for better access to data:

https://www.statisticsauthority.gov.uk/publication/delivering-better-statistics-for-better-decisions-data-access-legislation-march-2016/

As a result, the Digital Economy Act in April 2017 amended the Statistics and Registration Service Act (2007) (SRSA) such that ONS can require public authorities to share data with it. See the Digital Economy Act (chapter 7 of part 5):

http://www.legislation.gov.uk/ukpga/2017/30/part/5/chapter/7/enacted

More specifically, section 45C of the SRSA 2007 (as inserted by section 80 of the Digital Economy Act 2017) permits the Statistics Board (of which ONS is a part) to serve a Notice on a public authority requiring it to disclose information it holds in connection with its functions:

http://www.legislation.gov.uk/ukpga/2007/18/section/45C

To do so, the information so disclosed must be required by the Statistics Board for one or more of its functions as set out in the SRSA 2007 and the Census Act 1920.

The SRSA (2007) states that the ONS’ objectives ‘include promoting and safeguarding the production and publication of official statistics that serve the public good, where serving the public good includes informing the public about social and economic matters and assisting in the development and evaluation of public policy’. It also sets out the Board’s functions, which are specifically referred to in section 45C of the amended SRSA. Notably they include, under section 20, that ONS ‘may produce and publish statistics relating to any matter relating to the United Kingdom or any part of it’.

Requirements made under section 45 must also be in line with a statistical statement of principles that has been approved by Parliament:

https://www.gov.uk/government/publications/digital-economy-act-2017-part-5-codes-of-practice/statistics-statement-ofprinciples-and-code-of-practice-on-changes-to-data-systems

This states that ‘We will only seek access to data for the purposes of fulfilling one or more of our statutory functions, including to produce official statistics and undertake statistical research that meets identifiable user needs for the public good.’

The statement also sets out six principles which ONS will adhere to when requiring information under section 45. The principles state that ONS will:

• safeguard confidentiality
• be transparent about what data it is accessing and why
• ensure that accessing the data is lawful and meets strict ethical standards
• ensure that accessing the data is in the public interest - for example that the data are fit for purpose for the statistical use for which ONS intends
• ensure that requiring the data to be supplied is proportionate - for example, ONS will have exhausted possible alternatives
• seek to collaborate with suppliers at all times

In addition, the following is a useful framework for categorising ONS’ statistical uses for information such as that covered under this Agreement. All of these uses relate to ONS’ function of producing Official Statistics, as described above:

• Improvements to existing Official Statistics
• Development of new Official Statistics - this may involve testing to investigate whether statistics of sufficient quality can be produced, and may also involve the production of statistics badged as ‘experimental’ while further work is done to improve quality aspects such as accuracy
• Quality assurance of Official Statistics
• Development of commentary around Official Statistics
• Replacement of current survey questions - developing statistics from available data to directly replace the need to collect the information through survey questions
• Improving efficiency or accuracy of sampling - for example, ensuring that a representative sample of the target population is taken when conducting a survey of the public, such that the statistics produced from the survey are the best possible reflection of reality
• Research and development of methodology - for example, using data to develop and test linkage methodology that is used to help produce statistics based on other data rather than the original data source

Using robust information governance processes, ONS has determined that the conditions associated with requiring data under section 45C of the amended SRSA have been met for the information in this Data Sharing Agreement. This process involved working closely with NHS Digital’s experts to help determine that the data would be of a good enough quality to meet the proposed statistical purposes. This work guided ONS’ assessment against some of the principles underpinning its legal powers - for example whether sharing the data is in the public interest and proportionate in terms of the burden on the supplier. In addition, as part of its commitment to transparency, ONS will publish full details of the reasons for acquiring the information, and ONS notes that NHS Digital will also publish this Data Sharing Agreement on its data uses register.

In relation to the issue of public interest, it is worth noting that the benefits gained from the statistics enabled by this data share do not need to be specific to health and social care when data are flowing under section 45 of the SRSA. For example, some of the data required will help improve ONS’ population and economic statistics, and in these cases, the improved statistics may not always benefit health and social care directly. The data shared with ONS under this Agreement will not be onwardly disseminated or shared, except as disclosure controlled aggregate statistics and/or analysis as aggregated data with small numbers suppressed, in line with the Hospital Episode Statistics Analysis Guide. Any exceptions to this would require additional NHS Digital approval. They would also require an appropriate alternative legal gateway because section 45C of the SRSA as amended by the Digital Economy Act only enables data to be shared with ONS (not, for example, other Government departments or academic researchers).

The rest of this section will set out the specific purposes for which ONS requires the dataset. Each purpose will be linked to the framework of statistical uses set out above. In future, ONS may decide to put a dataset to new uses not explained below. In these cases, the new use will be in line with ONS’ legally defined functions. ONS will inform NHS Digital and enter into an amended Data Sharing Agreement before proceeding with that new purpose.

ONS will rely on the following lawful bases from the General Data Protection Regulation (GDPR) for processing personal data, inclusive of special category personal data:

• GDPR Article 6(1)(e) - The processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the data controller. The authority for ONS to produce, promote and safeguard official statistics is found in SRSA 2007.

• GDPR Article 9(2)(j) - The processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) based on Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and interests of the data subject.

Community Services Dataset (CSDS)

CSDS is a secondary uses data set that re-uses clinical and operational data to help improve the quality of publicly funded community services in England. These include mental health trusts, community healthcare trusts and care trusts. CSDS is required by ONS for research into the potential to model general health status and long-term health conditions from administrative data.

The data collected in CSDS covers all NHS-funded Community Health Services provided by Health Care Providers in England. This includes all services that have transitioned into new organisational forms because of the Transforming Community Services (TCS) programme. It also includes acute and independent sector healthcare providers that provide NHS-funded Community Health Services.

CSDS will be used to deliver longitudinal analysis capturing experiences and outcomes from childhood (from birth, early years education to higher education) to adulthood. Analyses will cover educational attainment, physical and mental health, poverty status, economic activity (and occupations held) and income, training and education, crime and homelessness. These variables will be set against characteristics, family relationships, parenthood, caring responsibilities and living environment (accommodation and place). This will give a better understanding of social mobility and of where support, funding and policy initiatives are needed across health, employment, education, transport, housing, and recreational services. ONS will further link the CSDS data to other data sources it holds to widen the scope of analysis. This may include, for example, linking to other health data to better understand health outcomes; linking to education, employment and income data to better understand social mobility; linking to Census and population data to improve population statistics; and linking to migration statistics and household survey data to refine and enhance survey statistics.

Social Statistics Admin First, Health and Special Projects (SSAF HASP)

CSDS will help with research into ethnicity, health, disability, language, gender identity and caring.** Of these, ethnicity and health have been identified by the director of Health, Population and Methods (HPM) Transformation as key topics that the ONS need to conduct research on.

** By the term ‘research’, ONS refers to the analysis of the data in order to produce official statistics, not ‘research’ as the term is generally understood. ONS uses the term ‘research’ to distinguish between regular statistical publications and more exploratory statistics (which are referred to as ‘research’). ONS recognises that, under section 45C of the SRSA 2007, it is not permitted to disclose the information it receives to researchers without the consent of the person who disclosed the information.

The research so far on ethnicity has used the admin-based population estimates (ABPE) dataset as the population base and linked on ethnicity data from Hospital Episode Statistics (HES), Improving Access to Psychological Therapies (IAPT) and English School Census (ESC) data. Admin based ethnicity statistics were then produced based on the proportion of people in each ethnic group. In the research so far, ONS was able to assign an ethnicity to 70% of the people in the 2016 admin-based population base using HES, ESC and IAPT data. ONS are now looking to link on ethnicity data from additional datasets, including CSDS, to increase the representativeness of the linked admin data and improve the accuracy of the resulting admin-based ethnicity statistics. Improved admin-based ethnicity statistics would enable ONS to produce statistics that are more likely to meet quality standards and increase the potential for producing multi-variate statistics.
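
As an illustration of the coverage figure described above, the following minimal Python sketch shows how the share of a population base with an assigned ethnicity could be computed once several sources have been linked. The record identifiers, the made-up source contents and the rule of taking the value from the first source that holds one are all hypothetical assumptions for illustration; the source does not describe ONS' actual assignment method.

```python
from typing import Dict, Iterable, List, Optional


def assign_ethnicity(
    population_ids: Iterable[str],
    sources: List[Dict[str, str]],
) -> Dict[str, Optional[str]]:
    """Give each person the value from the first source that holds one.

    The source ordering is an illustrative assumption, not a documented rule.
    """
    assigned: Dict[str, Optional[str]] = {}
    for person_id in population_ids:
        assigned[person_id] = next(
            (src[person_id] for src in sources if person_id in src), None
        )
    return assigned


def coverage(assigned: Dict[str, Optional[str]]) -> float:
    """Proportion of the population base with an assigned value."""
    if not assigned:
        return 0.0
    return sum(v is not None for v in assigned.values()) / len(assigned)


# Hypothetical, made-up records standing in for linked extracts.
population_base = ["p1", "p2", "p3", "p4", "p5"]
hes = {"p1": "A", "p2": "B"}
esc = {"p2": "C", "p3": "A"}
iapt = {"p4": "B"}

result = assign_ethnicity(population_base, [hes, esc, iapt])
share = coverage(result)
print(f"Ethnicity assigned for {share:.0%} of the population base")
```

Adding a further source to the list can only leave the coverage unchanged or raise it, which is the rationale given above for linking additional datasets such as CSDS.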

Annual statistics on the population by ethnic group would also enable better equality monitoring and therefore better informed policy and decision making to reduce inequality and improve lives. Despite ethnicity being a high priority topic for users, ONS do not currently produce annual statistics on the population by ethnic group and the last official statistics published were from the 2011 Census. This is due to a lack of data, with survey data not having a large enough sample size to produce estimates below national level for the 18 ethnic groups.

Improved admin-based ethnicity statistics would also enable ONS to explore the potential of providing users with more regular information about health outcomes and health inequalities and estimates of healthy life expectancy.

Data on the general health of the population at subnational level is only available every 10 years from the Census. This information is important for monitoring health inequality and developing health policy. The general health data from the census is also used in the production of healthy life expectancy statistics. If the ONS can develop a health measure from administrative data, this will enable more frequent statistics to be published on general health and a more timely, objective health measure to be used in the production of healthy life expectancy statistics. There is a desire for coherent, high-quality estimates (size and distribution) of population sub-groups, such as those defined by protected characteristics (disability, marriage and civil partnership, ethnicity, religion, sexual orientation, and gender identity).

The research so far on health has explored the potential to model general health status from HES data. ONS found they were unable to model general health status using HES data alone and are therefore now looking to incorporate additional datasets to explore whether this improves model accuracy.

Population and Migration Statistics Transformation (PMST)

ONS are looking to use CSDS as an activity source when further developing ABPE. PMST may also be able to use demographic and address variables from CSDS to validate information on other data sources. There are also ambitious user needs which aim for a fully inclusive statistical system that provides frequent and timely statistics for the entire population, not just those living in private households. Within the Living Arrangements team, the ONS are also working towards contributing evidence to the 2023 Recommendation. ONS’ focus is on producing timely estimates on the number of households in England and Wales and their size and composition (such as one-person households, families with children, lone parent families). As part of ONS’ work, they are also looking at populations living in communal establishments. ONS estimates help national and local government to understand public needs, plan services and direct funding. To produce these, ONS needs up to date address data as well as information on relationships (such as mother/child). Just as the Resident Population team would need to, PMST need to understand how CSDS relates to PDS data and whether CSDS is able to help ONS with coverage (by including people who may not be on other data sources such as PDS) and, if it can, provide ONS with additional data on relationships and more up to date activity data to ensure ONS have the correct address information.

The fully inclusive statistics informing the delivery of appropriate services and infrastructure at both national and local level will help meet the demands of the entire population by, for example, addressing intercensal gaps such as people living in communal establishments, travellers, circular migrants, homeless, illegal migrants, children, and young people, etc.

In the Longitudinal Linkage team, ONS’ goal, as part of the 2023 Recommendation, is to, as accurately as possible, replicate and reduce the need for the 10-year census. The 10-year census becomes increasingly outdated the further we move away from the census collection date. By longitudinally linking datasets together, ONS can understand the population over time, including how many people are active on a dataset, which people interact with healthcare services etc. This directly addresses intercensal gaps and will help better direct public funds to those who need them across the country. Certain datasets with address/location data further this goal by allowing a more targeted understanding of public needs.

The Resident Population Team's rationale is similar to that outlined above. As part of the 2023 Recommendation, the Resident Population Team aims to produce annual estimates of the size and demography of the resident population of England and Wales. ONS produce these estimates by linking various administrative data sources together. Statistics on the size (or stock) of the population contribute to ONS’ understanding of how it changes over time, both nationally and locally. This in turn can inform the direction of public funding. Adding the CSDS as an administrative data source could help to address age groups or geographical regions where there is coverage error (i.e., underestimation). This would help ONS to produce more complete admin-based population estimates. However, ONS would first need to understand the extent to which records on CSDS match with records on other health data sources that ONS currently uses e.g. PDS.

Health Analysis and Life Events (HALE) and Wider Surveillance Studies (WSS)

ONS' health analysts will use information about who has accessed community services as well as when, where, and why, for example diagnosis codes, for a range of statistical purposes in line with ONS' function to produce statistics for the public good. All use of CSDS for health and social care analysis will be to improve the availability and quality of statistics. Examples of the proposed uses include:

- Understanding health, morbidity and mortality.
- Understanding health inequalities, outcomes and care pathways.
- Understanding the effect of health conditions on socio-economic outcomes and life chances.
- Understanding the effect of health conditions and inequalities on educational outcomes.
- Understanding transitions into and out of health and social care services and the relationship with outcomes.
- Investigating the quality of characteristics data such as ethnicity data across different data sources.

Expected Benefits:

CSDS is needed by ONS for research into the potential to produce statistics on population characteristics from administrative data. The research will form part of the evidence base for the 2023 National Statistician’s Recommendation on the future of population and social statistics. The Recommendation will include a decision as to whether ONS has another traditional Census in 2031. As the Census is an important data source for many organisations and decision makers across the country, it is important that the Recommendation is based on the best possible evidence.

Health is a high priority topic for users. Therefore, in advance of the Recommendation, ONS needs to have fully explored the potential to produce health statistics without another Census. General health research so far has used PDS and HES data to model general health status for the population in England. The research found that more administrative data are required to model health status effectively as around half the population in England did not appear within the HES dataset. There was therefore no additional information, other than age and sex, to base the modelling on and thus the results were not of comparable quality to 2011 Census results. Obtaining additional health information from community services providers may improve the accuracy and coverage of the resulting admin-based health statistics.

Modelling general health status using administrative data may allow ONS to provide lower-level geographical outputs more frequently than once every ten years via the Census. If ONS can bring together as many administrative data sources as possible that contain health related information, ONS may be able to maximise population coverage and stand the best chance of producing health statistics that are of sufficient quality to meet user needs.

The Department of Health and Social Care and their agencies specifically use both census and population statistics (branded as national statistics) in the planning and provision of health and social care services and funding allocations. They are almost certainly used as the population baseline in any statistics that are published on a per capita basis.

The internal migration estimates are one of the components of the mid-year population estimates that are produced at various levels of geography (including clinical commissioning groups). It is these mid-year estimates that are the population baseline for a lot of health statistics. For instance, any health data published per capita for a particular level of geography (national, region, local authority, clinical commissioning group, parliamentary constituency etc) is almost certain to have ONS estimated or projected population statistics as the population baseline. Estimates and projections are published by age and sex. This means that they can also be used to better target age and sex specific health and care services (e.g., maternity, ageing populations etc.).

The population estimates and projections are also used extensively throughout government and specifically by the Department of Health and Social Care and their agencies for the planning and provision of health and social care services and the distribution of funds. Throughout government, decisions on the distribution of billions of pounds of funds are made based on population estimates and projections.

The research outputs are used by academics, local authorities, other Government Departments and other statistical institutions for the allocation of funding and resource and other ongoing research into the use of administrative data to produce information about the population. This work is anticipated to enable efficient funding and resource allocation for a range of services, including health and social care.

Information about general health enables the monitoring of general health over time, the tracking and assessment of fitness for work and helps to estimate the need for care and benefits. It also enables the calculation of healthy life expectancy. Data collected on health contributes to the funding, development and planning of health care and helps to reduce social inequalities in health. Disability data is essential for public health spending and resource allocation and for monitoring the quality and outcomes of policies for persons with disabilities. Central and local government users use long term health problem and disability data with a wide range of sub-topics to identify and understand associations to inform service planning and delivery.

One of the key uses of CSDS will be linking the data to a range of other datasets at record level to inform the planning and provision of health and social care services and funding allocations. For example, the work to define and quantify the social care population will inform the Health and Social Care Levy Bill and will be used in the evaluation of current and future policy.

Outputs:

The outputs expected from this work include statistics, statistical reports, presentations, methodology papers and analyses to help fulfil ONS’ role to produce statistics for the public good. Should outputs contain any data, this data will be in aggregated form with small numbers suppressed.
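
To illustrate what publishing aggregated data with small numbers suppressed can look like in practice, here is a minimal Python sketch of primary suppression on a table of counts. The threshold of 5, the choice to leave zero counts unsuppressed and the area names are hypothetical assumptions; the source does not state the rules ONS applies.

```python
from typing import Dict, Optional

# Hypothetical threshold: the source does not state the value ONS applies.
SUPPRESSION_THRESHOLD = 5


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Replace non-zero counts below the threshold with None (suppressed).

    Zero counts are left as they are; that choice is an assumption here.
    """
    return {
        group: (None if 0 < n < threshold else n)
        for group, n in counts.items()
    }


# Made-up aggregate table for illustration only.
table = {"Area A": 120, "Area B": 3, "Area C": 0, "Area D": 5}
published = suppress_small_counts(table)

for area, value in published.items():
    print(area, "suppressed" if value is None else value)
```

This sketch covers primary suppression only. It does not handle secondary suppression, where further cells are withheld so that a suppressed value cannot be recovered from row or column totals.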

Any official statistics produced will be published so that they are publicly available - for example on the Office for National Statistics website or in statistical bulletins, analysis articles or data dashboards.

Projects focused on new topics or new data sources are expected to include an exploration of the feasibility of using and/or linking the data as well as quality assessments of the data and linkage prior to full development of analysis and publication of statistics. Briefings for technical/expert audiences or methodology reports may be written for internal use in the development and assessment of new data sources and analysis. It is hoped that these reports will be published alongside experimental statistics or official statistics on the ONS website or in peer-reviewed journals as appropriate.

All ONS statistical teams engage regularly with users and will seek to provide frequent updates, through their regular stakeholder conversations, presentations, and events, on the production of official statistics which have been derived from the CSDS data required under this Agreement.

ONS has a publishing strategy and team which supports analytical teams to develop different outputs to suit a range of different audiences. ONS also has active communications, media, and social media teams to promote the dissemination of statistics to as wide-ranging an audience as possible.

The aim is for ONS to have received the data before the end of October 2022, so that the necessary data ingest, engineering and linkage can be completed. The further aim is for ONS to have completed the research by the end of February 2023 and to have it written up for the Recommendation by the end of April 2023.


ONS Longitudinal Study — DARS-NIC-194340-D6F3B

Type of data: information not disclosed for TRE projects

Opt outs honoured: No - Statutory exemption to flow confidential data without consent, Identifiable, No (Statutory exemption to flow confidential data without consent)

Legal basis: Health and Social Care Act 2012 - s261(5)(d); Other-Section 45A of The Statistics and Registration Service Act 2007 (SRSA) as amended by the Digital Economy Act 2017

Purposes: No (Agency/Public Body)

Sensitive: Non Sensitive, and Non-Sensitive

When:DSA runs 2020-10-15 — 2021-11-08 2020.11 — 2021.04.

Access method: One-Off

Data-controller type: OFFICE FOR NATIONAL STATISTICS (ONS)

Sublicensing allowed: Yes

Datasets:

  1. Demographics

Objectives:

The Office for National Statistics (ONS) Longitudinal Study (LS) is the largest longitudinal data resource in England and Wales. It contains linked census and life events data for an approximate 1% sample of the population of England and Wales.

The LS has linked records at each Census since 1971, for people born on 1 of 4 selected dates in a calendar year. These 4 dates were used to update the sample at the 1981, 1991, 2001 and 2011 Censuses. The LS is largely representative of the population as a whole.
At each Census point approximately 500,000 LS members usually resident in England and Wales are identified.
Over the 40 years of the longitudinal study, data on approximately 1.2 million members has been collected which includes members that have since died but are retained in the study.
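
The link between the four selected birth dates and the approximate 1% sample size can be checked with a short calculation. The Python sketch below is illustrative only: the four (month, day) pairs are hypothetical placeholders because the real LS dates are not given in the source, and the expected fraction assumes births are spread evenly across the year.

```python
from datetime import date

# Hypothetical (month, day) pairs; the real LS sample dates are not in the source.
SAMPLE_BIRTHDAYS = {(1, 15), (4, 15), (7, 15), (10, 15)}


def is_ls_member(date_of_birth: date) -> bool:
    """True if the birthday falls on one of the selected dates, in any year."""
    return (date_of_birth.month, date_of_birth.day) in SAMPLE_BIRTHDAYS


# Expected sampling fraction, assuming births are uniform over the year.
expected_fraction = len(SAMPLE_BIRTHDAYS) / 365.25

print(is_ls_member(date(1980, 4, 15)))
print(f"Expected sampling fraction: {expected_fraction:.2%}")
```

Under these assumptions, 4 dates out of about 365 give a sampling fraction of roughly 1.1%, which is consistent with the approximate 1% sample described above.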

Life events data are also linked for LS members including births to sample mothers, deaths and cancer registrations. New LS members enter the study through birth and immigration (if they are born on 1 of the 4 selected birth dates).

Purpose 1 – maintenance of the LS Research Database
The LS Research Database contains pseudonymised patient level data with unique LS Member IDs being used in place of identifying data. NHS Digital holds the master index and has the ability to link LS Member IDs to identifiable data. NHS Digital supports the LS by providing data – some of which are integrated into the LS Research Database and some of which are used to quality assure the LS dataset and resolve anomalies.
The data provided feeds into annual LS processing activity. Once processing is complete, a version of the database which is suitable for research is made available to authorised researchers in a controlled environment.
Purpose 2 – providing access to the LS Research database
The ONS actively promotes wide use of the LS Research Database, while maintaining the confidentiality of individuals in the LS sample. To ensure confidentiality the LS Research Database is held on an access-controlled SQL Server within the Secure Research Service (SRS) and can only be accessed via the Secure Research Service. Only statistical analysis or data tabulations are released to researchers through an output clearance process.
Researchers need to make an application to access the LS for research purposes. A user support service is available to help researchers. This includes:
• advice on sample sizes and the suitability of the LS for particular projects
• advice on data content and linkage issues
• helping applicants through the application procedure
• identifying the variables and the study population to be included in an extract
• making data extracts
• transforming data and producing the tables or files necessary for their analyses
• advising on clearance procedures and confidentiality rules
No direct identifiers are included in the LS Research database. Variables that present the highest risk of identity disclosure (e.g. date of birth, low level geography codes) are not available to researchers in their raw form, but can be used to derive or link variables that carry a low disclosure risk.
The LS Research database is made available for use by researchers under strictly controlled conditions. The controls in place are:
i) All LS project applications both from internal researchers and those applying for sub-license use of the data need to be approved by the Research Accreditation Panel (RAP).
The RAP was established by the UK Statistics Authority to oversee the independent accreditation of processors, researchers and research projects under the Digital Economy Act 2017 legislation. The Panel provides the governance of the accreditation of researchers and processors, through overseeing the training of researchers and the security standards, policies and procedures that processors must comply with. The RAP also assesses each project application to access de-identified data against the following criteria:

i. Is there public benefit?
ii. Is there demonstrable analytical merit?
iii. Is the project feasible?
iv. Are any relevant privacy implications sufficiently mitigated?
v. Has the project successfully completed a formal ethical review?

The RAP consists of independent members, representatives from government departments, and representatives from the devolved administrations.

ii) Researchers are only given access to a bespoke data extract as defined in their project application. This typically involves a subset of people from the LS sample, and only the variables that are needed for their research.
iii) The data can only be accessed through ONS’ Secure Research Service (SRS). SRS users have no means to import or export data, or to print or copy and paste the data they are using.
iv) When a researcher wishes to take outputs out of the SRS, they make a formal request and their outputs are assessed. They are only released from the SRS if they present no risk of the identification of an individual.
v) In order to work with the data in the SRS, a researcher must be accredited as an Accredited Researcher.
Any transgression by a researcher with regard to these data may lead to prosecution under the Statistics & Registration Service Act 2007 and could lead to a two-year prison sentence.

Yielded Benefits:

1. Health and Mortality: How large are inequalities between people from different backgrounds?
When the ONS Longitudinal Study (LS) was established in 1974 a primary goal was to compile new information on differences in mortality between people in different occupations. Since then, it has been used to provide unique information to support a series of major reports for government on health and mortality:
• Inequalities In Health, 1980 (The Black Report) for the Department of Health and Social Security: https://pubmed.ncbi.nlm.nih.gov/7118327/
• The Health Divide: Inequalities In Health In The 1980s, 1987 (The Whitehead Report) for the Health Education Council:
• Independent Inquiry into Inequalities in Health Report, 1998 (The Acheson Report) for the Department of Health: www.archive.official-documents.co.uk/document/doh/ih/contents.htm
• Fair Society Healthy Lives (The Marmot Review) for the Department of Health: http://www.instituteofhealthequity.org/resources-reports/fair-society-healthy-lives-the-marmot-review
Each of these reports has used data that is only available from the LS, which is unique in both its large number of records, and its long timespan. Focusing on the most recent Marmot Review, its information includes:
• Standardised limiting illness rates in 2001 at ages 16–74, by education level recorded in 2001
• Life expectancy at birth by social class, a) males and b) females, England and Wales, 1972–2005
• Standardised limiting illness rates at ages 55 and over in 2001 by the educational level they had in 1971

2. Life Expectancy and Pensions: How long do people from different social backgrounds live?
Increasing life expectancy has become recognised as a major policy issue in recent years, with one particular concern being the implications for pensions. The Pensions Commission, led by Lord Turner, was appointed by the government in December 2002 with the remit of keeping under review the adequacy of private pension saving in the UK, and advising on appropriate policy changes, including on whether there is a need to “move beyond the voluntary approach”. National Statistics on ‘Trends in Life Expectancy at 65 by socio-economic position’ were supplied to the commission to demonstrate how life expectancy varies for different socio-economic groups. These National Statistics can only be produced and published using the LS: no other data source has the LS’s combination of a large sample size, its long time period, and such a very high retention of members. This series of National Statistics has been running since 1982. ONS is currently working on the next release which will include data covering 2012 to 2016. The previous release is published here: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/bulletins/trendinlifeexpectancyatbirthandatage65bysocioeconomicpositionbasedonthenationalstatisticssocioeconomicclassificationenglandandwales/2015-10-21

3. Adult social care: What is the extent of the need?
The need for, and cost of, adult social care has received increasing political attention for more than a decade. The Royal Commission on the Funding of the Long Term Care of the Elderly reported in 1999, and the Commission on Funding of Care and Support, chaired by Andrew Dilnot, was set up in 2010 with the task of making recommendations on how to achieve an affordable and sustainable funding system for care and support for all adults in England, both in the home and in other settings. Professor Emily Grundy was a member of the Dilnot Commission’s Academic Panel and submitted a paper in evidence: Survivorship 2001-2008 among residents of communal establishments in 2001 in England & Wales: Results from the Office for National Statistics Longitudinal Study.
The submission used data from the LS on the survival of older people who in the 2001 Census were recorded as residents of residential care homes, nursing homes, or other types of communal establishment, and examined differentials in the survival of this population by characteristics including broad type of establishment (residential, nursing, or other); gender and marital status in 2001. It also used information on place of death to assess the assumption that residents in communal establishments of various types in 2001 remained in institutional care throughout the follow-up period (from the 2001 Census to the end of 2008).

4. Current Research
The LS continues to be widely used, helping health and social researchers investigate and analyse a wide range of topics. Current projects include:
• Mortality variations over the rural urban continuum: context, compositional or migratory - Rebecca Allan, Paul Williamson, Hill Kulu
• Inequalities in cumulative exposure to air pollution in the home and workplace, and their health impacts: exploring socio-economic and ethnic differences over the lifecourse - Gemma Catney, Fran Darlington-Pollock
• Care providers, care receivers: a longitudinal perspective - Emily Grundy
• Chronic health effects of air pollution on respiratory and cardiovascular mortality in the UK - Anna Hansell, Paul Elliott, David Strachan, Ravi Maheswaran, Marta Blangiardo, Rebecca Ghosh, Chloe Morris
• Early life and intergenerational transmission of health - Genevieve Jeffrey, Elisabetta De Cao, Alistair Mcguire
• The influence of early life health and nutritional environment on later life health and morbidity - Melanie Luhrmann, Tanya Wilson
• Mortality of immigrants and their descendants in Britain - Matthew Wallace, Hill Kulu, Paul Williamson, Gemma Catney
• Do maternal characteristics have an impact on birthweight of liveborn child? - Jitka Pikhartova

Expected Benefits:

Maintaining the LS Research Database ensures that the benefits of the LS are being utilized as a research resource. Specifically, it enables research into a wide range of topics including health inequality, limiting long term illness, ageing and caring. Research findings are regularly published in journals and presented at conferences.

The LS can be used for several types of analysis, over many different research areas. The studies that make best use of LS data are those that link social, occupational and demographic information to data on life events. Examples include studies of mortality, and fertility patterns. The individual-level data of the LS means that person-years at risk can be calculated for epidemiological studies.
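
As a rough illustration of the person-years calculation mentioned above, the following Python sketch sums each member's time under observation within a study window and counts deaths in that window, from which a crude rate could be derived. The record layout, field names and the 365.25-day year are assumptions made for this example and are not taken from the LS.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

DAYS_PER_YEAR = 365.25  # approximation assumed for this sketch


@dataclass
class Member:
    # Hypothetical record layout, not the LS schema.
    entry: date                   # date the member came under observation
    exit: Optional[date] = None   # date of death or embarkation, if any
    died: bool = False            # True when the exit event is a death


def person_years(members, start, end):
    """Return (person-years at risk, deaths) observed between start and end."""
    total_days = 0
    deaths = 0
    for m in members:
        begin = max(m.entry, start)
        finish = min(m.exit, end) if m.exit else end
        if finish <= begin:
            continue  # no time at risk inside the window
        total_days += (finish - begin).days
        if m.died and m.exit is not None and m.exit <= end:
            deaths += 1
    return total_days / DAYS_PER_YEAR, deaths
```

A crude mortality rate for the window is then the number of deaths divided by the person-years returned.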

The ability to combine detailed personal characteristics with area characteristics has proved useful in many studies of health, for example, those looking at environmental effects on health, and those on inequalities in health.
Studies of social mobility have examined changing class position by age. Information on co-residents of LS sample members has been used to study inter-generational mobility.

The size of the LS makes it suitable for the study of ageing. Studies have used the information collected on the co-residents and family status of LS sample members to examine changes to household and family arrangements that come with age.

The LS is used by ONS to produce National Statistics on 'Trends in life expectancy by the National Statistics Socio-economic Classification'. These National Statistics are key measures used by the Department of Health and Social Care and agencies such as Public Health England to monitor progress in meeting legal duties on health inequalities.

Outputs:

ONS Outputs:
Purpose 1 – maintenance of the LS Research Database
The output is the maintenance of the LS Research Database. The LS Development Team operates an annual processing cycle resulting in a new version of the LS Research Database being released each year.

Purpose 2 – providing access to the LS Research Database for research
The outputs of providing access to the LS Research Database for approved research projects are individual project research outputs. All such outputs would contain aggregated data with small numbers suppressed in line with the HES Analysis Guide. Research outputs are commonly journal papers, research reports and results, and presentations at conferences.
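
A minimal sketch of small-number suppression in an aggregated table is given below, in Python. The threshold of 5, the `*` marker and the choice to leave zero counts visible are placeholder assumptions for illustration; the rules that actually apply are those in the HES Analysis Guide and are not reproduced here.

```python
def suppress_small_numbers(counts, threshold=5, marker="*"):
    """Replace non-zero counts below `threshold` with `marker`.

    `threshold` and `marker` are placeholder values for this sketch;
    the applicable disclosure rules come from the relevant guidance.
    """
    return {
        key: (marker if 0 < value < threshold else value)
        for key, value in counts.items()
    }
```

For example, `suppress_small_numbers({"North": 3, "South": 12})` would mask the count for "North" and leave "South" unchanged.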

A list of research areas in which the LS has been used can be accessed online. To date there have been in excess of 600 LS publications. See: https://www.ucl.ac.uk/epidemiology-health-care/research/epidemiology-public-health/research/celsius/research/columns/research-projects-theme

Processing:

ONS will send files of births, deaths and census events for tracing or flagging at NHS Digital. NHS Digital will return files with LS numbers and success outcomes.

NHS Digital will provide event reports annually for LS members who have had death notifications, or who have embarkation or re-entrant postings. A person with an LS birth date who is new to the NHS will be added as an 'Immigrant' record.

Additionally, NHS Digital will provide Annual Reports for LS members, and for people with LS dates of birth who haven't been flagged as LS members. These reports include posting information, removal information, postcode sector information, and country of birth (where declared).

ONS will add these records to the ONS Longitudinal Study, which will be made available for researchers via the ONS Secure Research Service.
---------------------------------------------------------
Purpose 1 – maintenance of the LS Research Database
The following activities take place within an annual cycle:
1. Event Notification to the LS
Annually, NHS Digital generates a file of potential new members and events for LS members flagged on NHS Digital’s database (MIDAS). NHS Digital transfers this file to the LS Development Team (LSDT) for processing. Once processed, the data files are destroyed.
This file contains:
• all the death, enlistment, embarkation (to destinations abroad and to Northern Ireland) and re-entrant events recorded on PDS for individuals already flagged as LS members, and
• Immigrant events for potential new LS members (i.e. individuals who have an LS date of birth but are not already LS members).

2. Flagging New LS Members on MIDAS
At regular intervals, the LSDT sends NHS Digital listings of new LS members who are new births or immigrants with an LS date of birth, with LS numbers assigned by LSDT. NHS Digital flags the records on MIDAS (i.e. they store the LS numbers linked against the relevant patient records in MIDAS).
3. Tracing of probable LS members on MIDAS
a. Births to Sample Mothers and Widow(er)hoods
Annually, the LSDT sends separate listings to NHS Digital of births for which the mothers have LS dates of birth and deaths for which the spouses have LS dates of birth. NHS Digital traces the respective parents/spouses on MIDAS. The parents/spouses should be current members of the LS cohort because they each have an LS date of birth. Where the records of the parents/spouses are found and are linked to an LS flag, NHS Digital reports their LS numbers to the LSDT.
b. Deaths tracing
The LSDT sends annual listings to NHS Digital of death registrations of people with LS dates of birth that the LSDT has identified from data held by ONS and that have not previously been notified to the LSDT by NHS Digital. NHS Digital traces the records of the people on MIDAS. Where the relevant patient records are found and are linked to an LS flag, NHS Digital returns the corresponding LS numbers to the LSDT. Where the patient records are traced but found not to belong to a current LS member, NHS Digital notifies the LSDT.
4. Query Resolution
a. LS deaths
NHS Digital investigates cases notified by the LSDT where there are inconsistencies between the death records from ONS and NHS Digital.
5. Various
NHS Digital also investigates ad hoc queries from LSDT regarding data quality. These all relate to members of the LS cohort or individuals believed to have LS dates of birth.
6. LS Annual report
Annually (or as requested) NHS Digital provides the LSDT with the LS Annual Reports. These reports list patient registration data of LS Members and non-members with an LS date of birth. The files detail Posting Changes, Postcode sector changes or Removals on MIDAS.
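
The three possible results of the tracing described in step 3 (record found with an LS flag, record found without one, record not found) can be sketched as below, in Python. The register structure, the identifiers and the outcome names are hypothetical and do not describe how MIDAS stores or matches records.

```python
from enum import Enum


class Outcome(Enum):
    LS_MEMBER = "traced, LS flag found"
    NOT_LS_MEMBER = "traced, no LS flag"
    NOT_TRACED = "not traced"


def trace_listing(listing, register):
    """Trace each listed person against a register keyed by person identifier.

    `register` maps a person identifier to an LS number, or to None when the
    record exists but carries no LS flag. Both structures are hypothetical.
    """
    results = []
    for person_id in listing:
        if person_id not in register:
            results.append((person_id, Outcome.NOT_TRACED, None))
        elif register[person_id] is None:
            results.append((person_id, Outcome.NOT_LS_MEMBER, None))
        else:
            results.append((person_id, Outcome.LS_MEMBER, register[person_id]))
    return results
```

Only the `LS_MEMBER` rows carry an LS number back to the sender; the other two outcomes are reported without one.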

The data to be integrated into the LS Research Database are initially processed by the LS Development Team (LSDT) at ONS. Only members of the LSDT have access to the raw data. A new version of the LS Research Database is created each year on the completion of this processing activity.

Purpose 2 – providing access to the LS Research Database
When a new LS research project is approved and set up, a bespoke extract of data from the LS Research Database is created for the research team. They can only access these data via a dedicated project area set up for them within ONS's Secure Research Service (SRS). The SRS is hosted at a data centre within the UK mainland. The SRS is ONS and Pan Government Accredited to OFFICIAL (BIL 3,2,2 or BIL 4,2,2 by aggregation). A researcher must be accredited as an Approved Researcher before being allowed to work with data in the SRS.
Two groups of people have the ability to create bespoke data extracts from the LS Research Database for new research projects:
i) Five members of the Centre for Longitudinal Study Information & User Support (CeLSIUS) team.
- The Centre for Longitudinal Study Information & User Support (CeLSIUS) are funded by the Economic & Social Research Council to provide free guidance and support to UK-based users of the LS from the academic, public and voluntary sectors.
- Management/leadership and administrative support of the CeLSIUS team is provided by staff based at University College London (UCL).
- The five User Support Officers (USOs), i.e. those delivering the support service to users, are also UCL staff but are all based at ONS’s London office at Drummond Gate, Pimlico. (The current situation is that they have been issued with ONS laptops to be able to continue to provide this service working from home.)
- All of the CeLSIUS team's work at ONS takes place using ONS devices on ONS infrastructure.
- Regarding access to data - the CeLSIUS USOs do not have access to the full LS dataset.
- The version that the CeLSIUS team has access to is missing a number of key variables that present a raised disclosure risk, namely date of birth, full date of death and geography codes identifying small geographical areas (the lowest level they have access to is Local Authority level).
- CeLSIUS USOs only access these data via ONS’s Secure Research Service (SRS) environment. In the SRS, all data are held and processed on secure servers – no data is held on the device being used to access the SRS. CeLSIUS USOs do not have the ability to take any data out of the SRS.
- When a CeLSIUS-supported researcher and project have successfully passed all approvals processes, CeLSIUS USOs create a bespoke data extract for the researcher that only contains the data subjects and variables specified in their project application. This data extract is placed in the researcher’s dedicated project area within the SRS.
- CeLSIUS USOs are also involved in checking researchers’ outputs that they wish to have provided to them outside of the SRS. Each output is checked by two people – the first can be a CeLSIUS USO and the second will always be an ONS member of staff. These checks make sure that anything released from the SRS is absolutely safe and doesn’t present any risk of disclosing the identity of any individual data subject.
- All CeLSIUS USOs have been security cleared to Baseline Personnel Security Standard (BPSS).

ii) The LSDT provide an equivalent service for researchers from all other sectors, as well as researchers from ONS itself.


COVID-19 Vaccinations Survey – Over 80s — DARS-NIC-434738-K7Z9L

Type of data: information not disclosed for TRE projects

Opt outs honoured: Yes - patient objections upheld, Identifiable, Yes (Statutory exemption to flow confidential data without consent)

Legal basis: Other - COPI Notice - Regulation 3(3) and 3(1), Other-COPI Notice - Regulation 3(3) and 3(1)

Purposes: No (Research, Agency/Public Body)

Sensitive: Non Sensitive, and Non-Sensitive

When:DSA runs 2021-02-10 — 2021-09-30 2021.01 — 2021.02.

Access method: One-Off

Data-controller type: OFFICE FOR NATIONAL STATISTICS (ONS)

Sublicensing allowed: No

Datasets:

  1. Demographics

Objectives:

During the national COVID-19 - SARS-CoV-2 vaccinations programme, patients in England who are aged 80 years or over were offered a vaccination as part of cohort 2, as detailed in the JCVI Greenbook Chapter 14a.

The Office for National Statistics (ONS) would like to survey a sample of persons aged 80 years or over in order to gather behavioural insight and attitude information of individuals that are likely to have been offered the vaccines, to inform public messaging around vaccinations and public health strategies.

ONS are requesting demographics data from NHS Digital to enable sampling for the COVID-19 vaccination survey. ONS do not have patient contact details. The data provided to ONS will be used to invite individuals to take part in the survey. This is a non-direct care purpose, to inform vaccination policy which is an important part of the COVID-19 pandemic response.

The COVID-19 Vaccination Survey will collect behavioural insight and attitude information of individuals that are likely to have been offered the vaccines. There is a need for timely information research in this area to inform public messaging around vaccinations and public health strategies.

Policy questions:
• Has being vaccinated affected behaviours and adherence to guidance? For example, meeting grandchildren
• Attitudes to risk once vaccinated (control group being OPN)
• What are the reasons why a vaccine was not accepted?
• What are the characteristics of those accepting or rejecting the vaccine?

The purpose of the data collection is covered by the COVID-19 Public Health Directions 2020, 17th March 2020:
• understanding information about patient access to health services and adult social care services as a direct or indirect result of COVID-19 and the availability and capacity of those services
• monitoring and managing the response to COVID-19 by health and social care bodies and the Government including providing information to the public about COVID-19 and its effectiveness and information about capacity, medicines, equipment, supplies, services and the workforce within the health services and adult social care services
• delivering services to patients, clinicians, the health services and adult social care services workforce and the public about and in connection with COVID-19, including the provision of information, fit notes and the provision of health care and adult social care services

ONS are the data controller and will also process the data. The GDPR legal bases for dissemination are Article 6(1)(c) - legal obligation by virtue of COPI notice, Article 6(1)(e) - public task (statutory function for delivery of NHS Service Pharmacy, Services, Charges and Processing) and Article 9(2)(g) - substantial public interest (plus Part 2 Sched 1 DPA18, para 6 statutory and governmental purpose).

Data items and the number of records provided have been limited to the minimum required through consultation between NHS Digital and ONS.

Expected Benefits:

ONS will provide survey results to the Department of Health and Social Care who will use the survey outputs for policy planning/vaccination planning in relation to Covid-19.

There is a significant public benefit to this research. It will provide evidence of the opinions held and the current behaviours being exhibited by those in the over 80-year-old cohort. It will provide evidence of what this cohort is thinking and experiencing in relation to the vaccine, reasons for getting vaccinated or not getting vaccinated and whether they are continuing to comply with self-isolation guidance etc.

Understanding more about this demographic will allow policies to be created to support them further in staying safe during the Covid-19 pandemic. It will enable greater understanding of the reasons held by and the characteristics of those refusing the vaccine. This will help the government to understand what factors may be preventing individuals in this age group from obtaining a vaccination, who they need to target and how to increase uptake of the vaccine. This is all for the purpose of trying to reduce the transmission of the Covid-19 virus.

Outputs:

The NHS Digital data will be used to send surveys to participants.

Processing:

A subset of the PDS will be disseminated dependent on matching to the criteria provided by ONS. Data to be disseminated - for persons aged 80 and over only:
• Title
• First name
• Surname
• Age
• Gender
• Home address (including postcode)
• City
• Region
• Telephone number(s)

Exclusions: records with formal and informal deaths will be excluded from the sample provided to ONS.

Title, name and home address are required for the ONS to be able to send invitation letters and check that they are in contact with the correct person. Telephone number is required in order to contact the individual for the actual interview. Age is required to ensure that the individual is over 80 years of age. Gender, age and city/region are used to sort and stratify the sample to ensure that it is representative. Gender and age will also be used for weighting responses.
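
One way to stratify a sample so that it reflects the frame is proportional allocation, sketched below in Python. This is an illustration only: ONS's actual sampling methodology is not described in this agreement, and the field names, the stratum key and the fixed random seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(frame, sample_size, key, seed=0):
    """Draw a proportionally allocated random sample from each stratum.

    `key` maps a record to its stratum, e.g. (gender, age band, region).
    Rounding means the total drawn can differ slightly from `sample_size`.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in frame:
        strata[key(record)].append(record)
    sample = []
    for stratum in sorted(strata):
        members = strata[stratum]
        n = round(sample_size * len(members) / len(frame))
        sample.extend(rng.sample(members, min(n, len(members))))
    return sample
```

A call such as `stratified_sample(frame, 7000, key=lambda r: (r["gender"], r["region"]))` would draw from each gender and region combination in proportion to its share of the frame.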

In order to achieve a sample of 2,000-3,000 at a 30% response rate, 7,000 surveys will need to be issued. To ensure the survey delivers statistically viable results, ONS are preparing a suitable methodology and will advise on the number they require. To avoid strain on care staff and to maximise the share of the sample with good levels of mental capacity, care home residents will be avoided if possible. Therefore, records where the residence indicates a care home or the age is less than 80 will be removed from the frame before sampling.
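
The relationship between the number of surveys issued, the response rate and the achieved sample can be checked with a short Python sketch. It also shows the frame exclusion described above. The field names `age` and `care_home` are hypothetical, and the response rate is treated as a simple expected proportion.

```python
import math


def surveys_to_issue(target_responses, response_rate):
    """Smallest number of invitations expected to yield `target_responses`."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(target_responses / response_rate)


def expected_responses(issued, response_rate):
    """Expected number of completed surveys for `issued` invitations."""
    return issued * response_rate


def eligible(record, minimum_age=80):
    """Keep records aged `minimum_age` or over that are not flagged as care homes.

    `age` and `care_home` are hypothetical field names for this sketch.
    """
    return record["age"] >= minimum_age and not record["care_home"]
```

At a 30% response rate, 7,000 issued surveys correspond to roughly 2,100 expected responses, which sits at the lower end of the stated range; reaching 3,000 responses at the same rate would take about 10,000 surveys.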

This information relates to health only by virtue of the sampled persons being called for vaccination as part of the COVID19 vaccination programme.

Data subjects may be concerned about the risks of identification or the disclosure of information. There is a low risk of the misuse of data accessed by individuals working within ONS. To mitigate the risk of data misuse, ONS will:
• Limit access to personal data to members of ONS involved in the survey
• Limit personal data shared with telephone interviewers to names and contact details of participants
• Ensure all individuals with access are bound by their duty of confidentiality and have undergone relevant IG training
• Staff contracts and mandatory training
• All data processors have GDPR compliant DPAs or equivalent contractual arrangements with the data controller
• Contract/MOU in place regarding research study
• All ONS staff are cleared through advanced vetting, with Security Check (SC) level clearance for people regularly working with sensitive data, especially in bulk through large extracts or entire data sets.
• Mandatory Hut Six training on information security.

This record level data is being disseminated to the Office for National Statistics (ONS).

Data will be stored and processed in the UK only. There will be no onward sharing of the data by the recipient.

ONS will need to have access to a sample extract each wave. The frequency and number of waves are to be confirmed; however, the cohort specification and sample size will remain the same in each wave. Data will be disseminated to ONS as soon as approval is provided. Future requirements are to be confirmed, and the agreement will be amended to reflect any further releases of data.


Project 12 — DARS-NIC-177068-M1P0L

Type of data: information not disclosed for TRE projects

Opt outs honoured: No - data flow is not identifiable (Does not include the flow of confidential data)

Legal basis: Health and Social Care Act 2012 – s261(1) and s261(2)(b)(ii), Other - Statistics and Registration Service Act 2007 section 45(a)

Purposes: ()

Sensitive: Non Sensitive

When:2018.10 — 2019.04.

Access method: System Access
(System access exclusively means data was not disseminated, but was accessed under supervision on NHS Digital's systems)

Data-controller type:

Sublicensing allowed:

Datasets:

  1. Hospital Episode Statistics Accident and Emergency
  2. Hospital Episode Statistics Admitted Patient Care
  3. Hospital Episode Statistics Outpatients

Objectives:

The Digital Economy Act 2017 amended the Statistics and Registration Service Act (SRSA) 2007 such that ONS can request or require that information be shared by a crown body, other public authority, charity or undertaking, as long as it is for its functions (essentially statistics and statistical research).

Where a request is made, the data controller may disclose the information, and this permissive power overcomes any other duty of confidence the organisation may have, except if sharing the data would contravene the Data Protection Act, relevant parts of the Investigatory Powers Act, or relevant EU legislation. No other legislation is mentioned (e.g. the Care Act). The part of the legislation covering this permissive gateway has been live since July 2017.

The power to require that data are shared is not yet live. This part of the legislation requires a code of practice to underpin it, and this must be approved by Parliament. The code has been drafted and consulted upon. Parliamentary approval of the code is expected by the end of June 2018.

In discussions with NHS Digital (including its Caldicott Guardian), it was agreed that ONS would acquire data for its functions under this latter power to require. Therefore ONS cannot require that HES data are shared with it until this power is live. In the meantime, ONS is working with NHS Digital’s analytical experts to better understand HES data and whether it will be fit for the statistical purposes to which ONS wants to put it. Remote access to pseudonymised HES data will be significant in helping with this process.

This Data Sharing Agreement will permit ONS to access pseudonymised HES data via the HES Data Interrogation System (HDIS). ONS will use this data to assess the technical feasibility of using HES data for the purposes outlined below.

Getting this right ties in with the current wording of the code of practice that will underpin the power to require that data are shared. This states that ‘We will only seek access to data for the purposes of fulfilling one or more of our statutory functions, including to produce official statistics and undertake statistical research that meets identifiable user needs for the public good.’

The statement also sets out six principles to which ONS will adhere when requiring data. They state that ONS will:
• safeguard confidentiality
• be transparent about what data it is accessing and why
• ensure accessing the data is lawful and meets strict ethical standards
• ensure that accessing the data is in the public interest - for example that the data are fit for purpose for the statistical use to which ONS intends to put it
• ensure that requiring the data to be supplied is proportionate – for example, ONS will have exhausted possible alternatives
• seek to collaborate with suppliers at all times

The statistical purposes for which ONS will ultimately require that HES data are shared with it under the SRSA are outlined below. ONS currently believes these would be in the public interest, but the final application for identifiable HES data may not include all of these uses, subject to the data quality and feasibility work enabled by this remote access.

The interim use of HDIS will inform what data is necessary to perform the following purposes once identifiable HES data are shared later this year:

There are a range of initial statistical uses to which ONS intends to put identifiable Hospital Episodes Statistics (HES) data. Generally, linkage to other sources at a record level will be a prerequisite to success, and therefore identifiers including name, postcode, date of birth, sex and NHS number will be required at that point. The other HES information required varies by purpose and it is this other information which ONS employees can familiarize themselves with and assess the quality of, by gaining access to pseudonymised HES data remotely. The variables and time periods ONS have requested are those that they believe will potentially support these uses, and may be in the subsequent application.

1. To enable research being conducted by ONS' Administrative Data Census Project using ‘activity’ and characteristics data

ONS’ Administrative Data Census Project is assessing whether the Government’s stated ambition that ‘censuses after 2021 be conducted using other sources of data’ can be realized. ONS aims to replicate the type of information collected through a Census by using the administrative data already held by government, supplemented by surveys. ONS’ goal is to compare statistical outputs based on administrative data and surveys with the outputs possible using data from the planned traditional Census in 2021, to show that this alternative can meet users’ needs with high quality information at a lower cost, and more frequently.

There are two main types of information from the Hospital Episodes Statistics dataset that are needed for this project: ‘activity data’ and characteristics data. In both cases, the information will also help with ONS’ research into improving migration statistics, which has significant overlap with the Admin Data Census project:

a. Activity data

ONS already has access to administrative sources with high population coverage, such as GP patient registration and tax records, that provide evidence of how many people live in each area of the country. However, these sources often suffer from over-coverage, where people still appear on the source even though they have left the country. Address information can also be out of date, so that although a person is still in the UK, ONS would assign them to the wrong part of the country.

Evidence from other administrative sources that an individual is interacting with a service (‘activity’ data), even if these sources only cover a proportion of the population, will provide evidence that they are in the country. It may also help determine which address information recorded on the other high population coverage sources is the correct one, when those sources do not agree.

In addition to the Administrative Data census project work, ONS is also researching whether administrative sources can improve its migration statistics and ‘activity data’ in this context would be useful for the same reasons.

For this particular use, ONS only require information about where and when individuals are interacting with secondary care, not why.

b. Characteristics data

The traditional Census includes questions on ethnicity. It is currently very difficult to estimate ethnicity at a local level between Censuses. Also, very few administrative sources capture ethnicity at all, which would currently make including ethnicity on an Administrative Data Census challenging; Hospital Episode Statistics is one of the few datasets where ethnicity is captured. ONS has worked with NHS Digital data experts to understand the limitations of the ethnicity data, for example coverage and definitions used. ONS can research methodological approaches into mitigating these limitations.

Ethnicity and national identity received one of the highest user needs scores from the 2015 Census Topic Consultation, and Census ethnicity information is used by national and local decision makers, for example in equality impact assessments when local authorities make changes to service delivery. The feasibility of producing admin data based ethnicity estimates will be important when deciding whether to move to an Admin Data based Census after 2021.

In terms of the framework of uses presented earlier in this section, then the Administrative Data Census project work described falls into multiple categories:

2. To conduct a range of Statistical Research and Health Analyses using clinical data

ONS’ Health analysts will use clinical data from HES for a range of initial statistical purposes:

a. Exploring the feasibility of producing robust projections of the future health state of the nation.

These projections would need to take into account population projections, morbidity and mortality trends, and other characteristics.

It is likely HES can provide some of the information required, although there will be gaps in, for example, morbidity data. The different models that could produce projections (Bayesian modelling, Markov chain, microsimulation) would all rely on linked individual level datasets. HES may provide valuable information on the prevalence of conditions across the population but there are gaps even with HES – for example, those with conditions / disabilities only interacting with primary and community services, or private secondary care providers (there are / may be other datasets available that could fill these gaps).

The State pension age review, 2017, called for more work on healthy life expectancy projections to better inform future decisions about the state pension age. The review also noted their potential value in informing the planning of future health and social care provision at a local and national level.

b. Exploring the use of linked morbidity, mortality, Census, benefits and other data to produce more granular statistics on health inequalities and health state life expectancies.

This is potentially more straightforward than a), and involves, for example, linking individuals’ self-assessment of their health and disability state in the 2011 Census, the ONS annual population survey since 2011 (if they were surveyed), and ultimately the 2021 Census (once collected). Where an individual’s self-reported health has transitioned from good to poor, or they indicate for the first time that they have a long term limiting condition, HES data on actual morbidity can be linked in to compare this with perception of their own health.

There are limitations in using HES alone for morbidity information, not least that it only covers secondary care and many new conditions will be diagnosed through primary care only. Research would include what methodological techniques could be used to account for these limitations.

The ultimate goal would be producing healthy life expectancy estimates that do not rely on survey data, potentially allowing more granular statistics. A decision on the feasibility of removing self-reported health state questions from the Census and surveys may also lead to reductions in cost and respondent burden.

ONS healthy life expectancy statistics are central amongst the public health indicators that help guide local decisions by Local Authorities (LAs) about distribution and prioritisation of services. More local level health expectancy statistics, and more breakdowns by other characteristics, would provide insight allowing LAs to better target interventions to reduce health inequalities.

c. Exploring the care pathways in the run up to death.

This would allow ONS to add detail to avoidable mortality statistics, and explore any links between health care access and premature mortality, for example suicides and drug related deaths. Linking in Census 2011 data, ONS’ mortality data and HES, will also allow ONS to investigate health inequalities at a local level, bringing in information on characteristics such as ethnicity and occupation from the Census. A specific aspect to this research would be analysing inequalities in infant mortality, where ONS would collaborate with NHS Digital to ensure policy makers have evidence to help meet the Secretary of State’s target to halve infant mortality by 2030.

3. Statistical Research into whether ONS can improve its Address Register

This Statistical Research would have a particular focus on identification of communal establishments when using HES data, and would require information in HES about where individuals were admitted from and discharged to. Length of stay will also provide a picture of how many people ONS would expect to be classed as usually resident (> 6 months stay) in hospital at any given time. Sex information may assist with identifying communal establishments that are male or female only.

4. Statistical research to assess the feasibility of creating a better estimate of UK household expenditure on hospital services (inpatient only) and medical and paramedical services (outpatient).

The national accounts framework brings units and transactions together to provide a simple and understandable description of production, income, consumption, accumulation and wealth. The team will conduct Statistical Research into whether HES data can improve estimates of revenue paid by patients, split into outpatient and inpatient activity, private patient episodes split by outpatient and inpatient activity, and outpatient activity split between medical services and paramedical services.

5. To assess the feasibility of HES data enabling the UK to report data or proxy indicator data to measure its progress against the United Nations' Sustainable Development Goals (SDGs).

Interest in HES is specifically around the feasibility of better estimating the following Sustainable Development indicators:

3.1.1: Maternal mortality ratio
3.1.2: Proportion of births attended by skilled health personnel
3.3.5: Number of people requiring interventions against neglected tropical diseases
3.5.1: Coverage of treatment interventions (pharmacological, psychosocial and rehabilitation and aftercare services) for substance use disorders
3.7.1: Proportion of women of reproductive age (aged 15-49 years) who have their need for family planning satisfied with modern methods
3.8.1: Coverage of essential health services (defined as the average coverage of essential services based on tracer interventions that include reproductive, maternal, newborn and child health, infectious diseases, non-communicable diseases and service capacity and access, among the general and the most disadvantaged population)

While ONS’ SDG team will work with NHS Digital and Public Health England to produce these indicators without the need for data sharing, ONS needs to be able to disaggregate these headline indicators by ethnicity, age, sex, disability and geography. Linking HES data to ONS held data, such as from Census 2011, at an individual level may help ONS to achieve some of these breakdowns.

No sensitive data can be accessed through the HDIS. The data provided would include the standard non-sensitive HES fields.

Expected Benefits:

The main short term benefit is to support ONS in learning more about HES data quality and in determining which HES data it should subsequently require to be shared with it under the Statistics and Registration Service Act 2007, as amended by the Digital Economy Act 2017.

The proposed ultimate statistical uses for the data acquired under those powers / that application, are detailed in the objectives section. The potential benefits of that statistical research (which will not be possible based on remote access to HES alone) are:

1. Admin Data Census project and improved migration statistics

Population estimates and information on population characteristics are used by a wide range of national and local organisations for numerous purposes including resource and funding allocation for both local and central Government, service planning and delivery, policy development, monitoring and evaluation, and providing an accurate denominator for other statistics.

The Department of Health and their agencies use ONS' population statistics for the planning and provision of health and social care services and the distribution of funds. Throughout government, decisions on the distribution of billions of pounds of funds are made based on population estimates and projections.

Respondents to the Census Topic Consultation conducted in June 2015 gave strong evidence of the need for high-quality and more timely population estimates. If it proves feasible, an Admin Data Census approach will deliver more timely statistics. It will potentially also deliver more accurate population statistics, at least in inter-censal periods, if not in a traditional Census year itself. An Admin Data Census approach will also reduce cost and respondent burden.

New and more accurate information on international and internal migration is needed to better inform migration system policy making in a post-Brexit era. For example, note the 2017 Migration Advisory Committee call for evidence on aspects of migration in response to a Government commission to guide decisions on post-Brexit migration policy.

2. Health analyses

Successful production of robust health projections would support better decision making around where to set the state pension age, and planning of health and social care services:

The Cridland Report (2017) which was commissioned by government to independently review the state pension age made the following statement:

“We believe more work is needed to understand healthy life expectancy, as it affects a range of policy areas. Projecting healthy life expectancy into the future is not currently possible, but would be valuable for future Reviews, as well as in work around health and caring.”
Independent Review of State Pension Age: Smoothing the Transition, 2017, pg 35

The report also notes:
• Developments in Healthy Life Expectancy (HLE) and Health State Transitions (HST) will have a notable impact on the demand for social care and different types of medical care, for instance the number of trained dementia nurses required in 40 years’ time.
• In order to manage budgets and allocate funding effectively, there is a need to understand what the main patterns of key diseases will be, and what the distribution of these illnesses across the population will look like.
• It is likely that the prevalence of diseases which affect the oldest old such as cancer and dementia will increase.
• If social care and health care provision needs to be increased, the national budget will need to be changed to reflect this which may result in other services seeing cuts.

Current healthy life expectancy estimates rely on ONS surveys where, despite the large sample size, the number of possible breakdowns by geography and by characteristic is limited. Current estimates also rely on aggregate figures: the prevalence of poor health / limiting long term conditions and mortality rates by age are calculated independently and then fed into the model.
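
As an illustration only, the way aggregate prevalence and mortality rates can be combined into a healthy life expectancy figure is sketched below in Python. The source does not specify the model used; this sketch assumes a Sullivan-style calculation on an abridged life table, and every rate, band width and prevalence value is made up for the example.

```python
def life_table(mortality_rates, widths):
    """Person-years lived in each age band for a cohort with radix 1.

    mortality_rates[i] is the death rate in band i and widths[i] its width
    in years. The last band is open-ended, so its width is ignored.
    """
    survivors = 1.0
    person_years = []
    last = len(mortality_rates) - 1
    for i, (rate, width) in enumerate(zip(mortality_rates, widths)):
        if i == last:
            # Open-ended final band: everyone still alive dies at this rate.
            person_years.append(survivors / rate)
        else:
            # Probability of dying in the band, assuming deaths occur on
            # average at the midpoint of the band.
            q = width * rate / (1 + 0.5 * width * rate)
            next_survivors = survivors * (1 - q)
            person_years.append(width * (survivors + next_survivors) / 2)
            survivors = next_survivors
    return person_years


def expectancies_at_birth(mortality_rates, widths, poor_health_prevalence):
    """Return (life expectancy, healthy life expectancy) at birth."""
    person_years = life_table(mortality_rates, widths)
    life_expectancy = sum(person_years)
    # Sullivan-style step: weight each band's person-years by the share
    # of people in that band who are NOT in poor health.
    healthy = sum(
        years * (1 - prevalence)
        for years, prevalence in zip(person_years, poor_health_prevalence)
    )
    return life_expectancy, healthy


# Made-up inputs for bands 0-14, 15-44, 45-64, 65-84 and 85+.
rates = [0.0005, 0.001, 0.005, 0.03, 0.15]
widths = [15, 30, 20, 20, None]
prevalence = [0.02, 0.05, 0.15, 0.35, 0.60]

le, hle = expectancies_at_birth(rates, widths, prevalence)
print(f"life expectancy {le:.1f}, healthy life expectancy {hle:.1f}")
```

Because prevalence and mortality enter as separate aggregate inputs, a calculation of this kind cannot follow individuals between health states, which is the limitation that individual-level linkage is intended to address.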

Linking health states and mortality at the individual level over time, and for a greater proportion of the population (which may be possible using HES data) will allow more granular analysis. Linking to Census and other sources to add in other characteristics, could inform interventions to support tackling inequalities at the local level.

See here for PHE guidance to local authorities:
https://www.gov.uk/government/publications/reducing-health-inequalities-in-local-areas
See here for local authority profiles from PHE which rely on a lot of ONS data:
http://fingertipsreports.phe.org.uk/health-profiles/2017/e07000087.pdf

More accurate and new statistics on the characteristics and factors associated with suicides, drug related deaths and infant mortality may provide insight that leads to better targeting of interventions, or the development of new interventions, that could ultimately save lives. Similar to some of the above, the unique benefit ONS can bring in this space is the ability to link the health and mortality sources (which NHS Digital also own and can link / analyse), with other non-health sources such as Census and DWP/HMRC data on benefits and income. This can also be done for a large proportion of the population (although there will be gaps that need to be understood and assessed).

3. Address Register Research

Research will enhance the Address Register including the information held on communal establishments (CEs), for which there is currently a recognized data gap. A better Address Register will in turn benefit ONS' other statistics, such as the population statistics described above. For example, it will allow ONS to quality assure its local level population statistics (whether from a traditional Census or other method) as local areas with CEs can have unusual demographic profiles, which can cause concern over the accuracy of the statistics unless the location and nature of the CE is known. It will also help with better planning of survey operations and sample design.

4. UK household Expenditure Statistical Research

Household Final Consumption Expenditure is a component of National Accounts; improvements therefore affect estimates of Gross Domestic Product (GDP). This is a key national economic indicator that drives national economic policy making.

5. Sustainable Development Indicator Research

The UK was at the forefront of developing the UN recognized Sustainable Development Goals (SDGs), and ONS aims to report fully on an agenda that it pushed to develop, to continue to show leadership in this space. A key theme of the SDGs is to leave no one behind, and ONS needs to be able to disaggregate the headline indicators so that it can be sure progress occurs across all groups, regardless of ethnicity, age, sex, disability or geography. Subject to feasibility research, linking HES data to ONS held data, such as from Census 2011, at an individual level may help to achieve this goal.

Outputs:

The key output will be evidence to feed into ONS’ full DARS application for identifiable HES data which will follow later this year (summer 2018 at the earliest depending on when full Digital Economy Act powers come into force).

Other outputs will include internal statistical data quality reports and desk notes to guide ONS researchers working with HES data once an identifiable dataset has been acquired. These can be shared with NHS Digital analytical colleagues if useful.

No external publications or statistics are expected to be released based on ONS’ remote access alone - these will come later after the subsequent application, which will therefore detail the expected external outputs.

Processing:

This Agreement permits online access to the record level HES database via the HDIS system. The system is hosted and audited by NHS Digital, meaning that large transfers of data to on-site servers are reduced and NHS Digital has the ability to audit the use of and access to the data.

HDIS is accessed via a two-factor secure authentication method by approved users who are in receipt of an encryption token ID. Users have to attend training before the account is set up, and users are only permitted to access the datasets that are agreed within this agreement. Users log onto the HDIS system and are presented with a SAS software application called Enterprise Guide, which presents the users with a list of available data sets and available reference data tables so that they can return appropriate descriptions to the coded data.

The access and use of the system is fully auditable and all users have to comply with the use of the data as specified in this agreement. The software tool also provides users with the ability to perform full data minimisation and filtering of the HES data as part of processing activities. Users are not permitted to upload data into the system.

Users of HDIS are able to produce outputs from the system in a number of formats. The system can produce small row count extracts for local analysis in Excel or other local analysis software. Users are also able to produce tabulations, aggregations, reports, charts, graphs and statistical outputs for viewing on screen or export to a local system.

Only registered HDIS users will have access to record level data downloaded from the HDIS system. Following completion of the analysis the record level data will be securely destroyed.

In addition to those outlined elsewhere within this Agreement, the Office for National Statistics will:
1. only use the HES data for the purposes as outlined in this Agreement;
2. comply with the requirements of NHS Digital Code of Practice on Confidential Information, the Caldicott Principles and other relevant statutory requirements and guidance to protect confidentiality;
3. not publish the results of any analyses of the HES data unless safely de-identified in line with the anonymisation standard;
4. comply with the guidelines set out in the HES Analysis Guide; and
5. ensure role-based access control is in place to manage access to the HES data within the Office for National Statistics.

As this Agreement permits remote access to HES, this would be limited to analysing the data within the secure environment ONS is given access to, and potentially requesting export of aggregate tables.

Work would include:
-Gaining experience of wrangling such a large dataset - for example, linking records for the same individual across years
-Assessing the quality of key variables such as the ethnicity variable, for example assessing missingness and frequency of ethnic group by other characteristics
-Summarizing the number of hospital interactions by person, age, sex and geography to give an idea of what proportion of each age-sex group in each area are interacting, and therefore, for what proportion of the population HES will give ONS evidence of their presence in the country and up to date location
-Summarizing secondary care morbidity using the diagnosis variables to establish how useful this aggregate information would be for ONS’ proposed health statistics purposes - this will guide if and how much clinical data ONS seeks in the subsequent full application for identifiable HES data (it is expected that some uses will require record level linkage to other sources that only ONS hold and therefore aggregate data may well not suffice if the potential benefits are to be realized).
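
A minimal sketch of the kind of data quality check listed above (missingness of the ethnicity variable by other characteristics) is given here in Python. It is illustrative only: the field names, the toy records and the set of values treated as missing are assumptions made for the example, not real HES field names or codes.

```python
from collections import defaultdict

# Hypothetical toy records; field names are illustrative, not HES field names.
records = [
    {"age_band": "15-44", "sex": "F", "ethnicity": "group A"},
    {"age_band": "15-44", "sex": "F", "ethnicity": None},
    {"age_band": "65-74", "sex": "M", "ethnicity": "not stated"},
    {"age_band": "65-74", "sex": "M", "ethnicity": "group C"},
]

# Assumption: these values all count as "missing" for quality purposes.
MISSING_VALUES = {None, "", "not stated"}


def missingness_by_group(rows, field, group_keys, missing_values=MISSING_VALUES):
    """Share of rows in each group whose `field` is missing."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        totals[key] += 1
        if row.get(field) in missing_values:
            missing[key] += 1
    return {key: missing[key] / totals[key] for key in totals}


rates = missingness_by_group(records, "ethnicity", ("age_band", "sex"))
for group, rate in sorted(rates.items()):
    print(group, f"{rate:.0%}")
```

The same grouping pattern could be reused for the interaction counts by age, sex and geography, with a count in place of the missingness share.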

It may be useful for ONS to export some of these aggregate tables if possible. ONS already has good links into the NHS Digital secondary care information team who can help with queries about findings in the data and how best to use it.


Project 13 — DARS-NIC-57592-H7S8B

Type of data: information not disclosed for TRE projects

Opt outs honoured: N ()

Legal basis: Health and Social Care Act 2012, Section 42(4) of the Statistics and Registration Service Act (2007) as amended by section 287 of the Health and Social Care Act (2012), Approved researcher accreditation under section 39(4)(i) and 39(5) of the Statistics and Registration Service Act 2007

Purposes: ()

Sensitive: Sensitive, and Non Sensitive

When:2017.09 — 2017.05.

Access method: Ongoing

Data-controller type:

Sublicensing allowed:

Datasets:

  1. MRIS - Bespoke
  2. MRIS - Scottish NHS / Registration

Objectives:

The Longitudinal Study (LS) contains data on 1 per cent of the population of England and Wales. It is used for several types of analysis: for example, studies using registration event data as outcomes or studies using linked census data.

These studies include those that link social, occupational and demographic information to data on vital events, for example studies of mortality, cancer incidence and survival, and fertility patterns; those looking at environmental effects on health and inequalities in health; and those investigating social mobility and ageing.


Project 14 — DARS-NIC-48781-Z5C2L

Type of data: information not disclosed for TRE projects

Opt outs honoured: N ()

Legal basis: Health and Social Care Act 2012

Purposes: ()

Sensitive: Non Sensitive

When:2017.09 — 2017.02.

Access method: Ongoing

Data-controller type:

Sublicensing allowed:

Datasets:

  1. MRIS - Bespoke

Objectives:

Provide an assessment of the quality of the informal date of death contained within PDS death notifications, compared with formal date of death. Provide an indication as to whether the informal date of death is sufficiently accurate, and whether there would be benefit in providing this date to researchers in advance of the formal notification. To establish what advantages of timeliness this provision could bring, and whether there is any variation by age, gender and cause of death.

Expected Benefits:

Expected measurable benefits to health and/or social care including target date: More timely notification of deaths to medical researchers will benefit the health and social care system by substantially reducing delays in the discovery-potential from record-linkage studies. Currently, research-teams may observe that a subject’s series of court appearances or benefits claims has ceased but cannot know assuredly - due to lateness of notification of fDoD - whether the explanation is that the subject has died or that s/he has been rehabilitated/employed. The research team has to allow about 2-years to account for late registered deaths, which delays deriving new knowledge, in this example about criminal sanctions or benefits.

Outputs:

The outputs will inform an assessment of the potential advantages of using informal date of death (iDoD) to notify research studies of deaths in their study cohorts. At present, research teams need to delay their record-linkage requests [for follow-up to 31 December 2015, say] by at least two years to be almost sure that ONS has been notified of almost all deaths that actually occurred in England and Wales on or before 31 December 2015. If iDoD is substantially accurate, this undesirable delay to record-linkage studies could be avoided by making iDoD available to researchers. The research team will not know the ICD-10 chapter for cause of death but in many studies only fact-of-death is needed and, in others, imputation for likely cause of death may be technically possible. These are huge advantages for the discovery potential from approved record-linkage studies but depend on knowing how reliable iDoD is likely to be. Among those for whom iDoD exists in 2011 but no formal date of death (fDoD) was notified by 30 June 2016, there may be falsely assigned notifications (e.g. in terms of NHS number), so that the number (%) of C-differences which exceed 4 years provides an upper limit for this error rate.

Processing:

Compare PDS death notifications with GRO death registrations. NHS Digital will extract identifiable data (NHS Number, gender, date of birth, date of registration and ICD-10 primary cause of death code from ONS Mortality data) for persons with a date of death recorded between 1st Jan 2011 and 30th June 2016. NHS Digital will then link these data, using NHS Number, to data from the Personal Demographics Service (PDS). For linked records NHS Digital will extract the informal date of death and the PDS-system notification date, the formal date of death and the ONS date of registration.

NHS Digital will calculate age at formal date of death (using informal date if formal date not available) and stratify to the following age bands: < 5 years; 5-14 years; 15-44 years; 45-64 years; 65-74 years; 75-84 years; 85+ years.

NHS Digital will calculate:
A. the difference between the ONS date of death and the informal date of death from PDS [ONS-PDS], stratified as follows: zero days; 1-7 days; 8-14 days; 15-28 days; 29-90 days; 91-182 days; 183-365 days; 366-730 days; 731+ days; no ONS death recorded; no PDS death recorded.
B. the difference between the ONS date of registration and the PDS informal death system notification date [ONS registration-PDS notification], stratified as follows: zero days; 1-7 days; 8-14 days; 15-28 days; 29-90 days; 91-182 days; 183-365 days; 366-730 days; 731+ days; no ONS death recorded; no PDS death recorded.

The above difference-records (A and B separately) will be aggregated and tabulated by i) sex, ii) ICD-10 chapter, iii) age group and iv) year of death (from formal death date); and by all pairs of i) to iv). The difference-records (A and B) will also be cross-tabulated, separately for each covariate level of the following four covariates: i) sex, ii) ICD-10 chapter, iii) age group and iv) year of death (from formal death date).

Tabulations will compare the difference between the formal date of death (fDoD) and the informal date of death (iDoD), indicating the quality/accuracy of the informal death date; they will also compare the difference between the date of registration and the date the informal death was recorded in PDS, indicating the days gained in advance notification using the informal date vs the formal date.

In addition, we’d like to know, for each calendar year [2011 to 2015], how many iDoDs were notified in that calendar year for which no fDoD prior to 1 January 2016 had been notified by 30 June 2016. We’d like these counts, if possible, to be provided separately for each covariate level of the following two covariates: i) gender and iii) informal age group, where age group at death is based on iDoD (since, for these cases, fDoD has not been registered).

Moreover, C. we’d like the death registration delay to be computed as [31 December 2015 – iDoD] and stratified as follows: zero days; 1-7 days; 8-14 days; 15-28 days; 29-90 days; 91-182 days; 183-365 days; 366-730 days; 731-1096 days; 1097-1462 days; 1463-1827 days. The difference-records [C.] will be aggregated and tabulated by i) sex, iii) age group and iv) year of iDoD; and by all pairs of i) to iv).

Tabulated outputs, with small number suppression applied, will be provided to ONS. No record level or identifiable data will be released by NHS Digital.
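
The banding and tabulation steps described above can be sketched as follows. This is a hedged illustration in Python, not NHS Digital's actual processing: the rows are made up, the use of the absolute difference in days is an assumption (the text does not say how negative differences are handled), and the suppression threshold of 5 is a hypothetical value chosen for the example.

```python
from collections import Counter
from datetime import date

# Upper bound in days (inclusive) and label for each difference stratum.
DIFFERENCE_BANDS = [
    (0, "zero days"), (7, "1-7 days"), (14, "8-14 days"), (28, "15-28 days"),
    (90, "29-90 days"), (182, "91-182 days"), (365, "183-365 days"),
    (730, "366-730 days"),
]
# Lower bound in years and label for each age band, oldest first.
AGE_BANDS = [
    (85, "85+ years"), (75, "75-84 years"), (65, "65-74 years"),
    (45, "45-64 years"), (15, "15-44 years"), (5, "5-14 years"),
    (0, "< 5 years"),
]


def age_band(date_of_birth, date_of_death):
    """Age in completed years at death, mapped to its age band."""
    had_birthday = (date_of_death.month, date_of_death.day) >= (
        date_of_birth.month, date_of_birth.day)
    age = date_of_death.year - date_of_birth.year - (0 if had_birthday else 1)
    return next(label for lower, label in AGE_BANDS if age >= lower)


def difference_band(ons_date, pds_date):
    """Stratum for difference A (dates of death) or B (notification dates)."""
    if ons_date is None:
        return "no ONS death recorded"
    if pds_date is None:
        return "no PDS death recorded"
    days = abs((ons_date - pds_date).days)  # assumption: sign is ignored
    for upper, label in DIFFERENCE_BANDS:
        if days <= upper:
            return label
    return "731+ days"


def suppress(table, threshold=5):
    """Replace non-zero counts below a hypothetical threshold with '*'."""
    return {key: ("*" if 0 < count < threshold else count)
            for key, count in table.items()}


# Made-up rows: (sex, date of birth, formal date of death, informal date of death)
rows = [
    ("F", date(1940, 3, 1), date(2015, 2, 10), date(2015, 2, 10)),
    ("M", date(1950, 6, 15), date(2014, 6, 14), date(2014, 6, 10)),
    ("M", date(2012, 1, 1), None, date(2013, 5, 5)),
]

table_a = Counter()
for sex, dob, formal, informal in rows:
    # Age uses the formal date, falling back to the informal date.
    reference = formal if formal is not None else informal
    key = (sex, age_band(dob, reference), difference_band(formal, informal))
    table_a[key] += 1

suppressed_a = suppress(table_a)
```

Difference C could reuse the same pattern with the fixed reference date of 31 December 2015 and the longer list of strata given in the text.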