19 May 2023
At Oxford Insights, we believe in creating a positive vision for how AI can be used by governments to improve public services for the people using them. The release of a number of widely accessible and adaptable large language models (LLMs), such as ChatGPT, Bard and Sydney, in recent months prompted an internal discussion on what this type of model could mean for the public sector.
It might seem premature, or too rash, to be having this discussion. Jan Leike, Alignment Lead at OpenAI, for instance, called for caution in response to what looks like rapid adoption of these immature technologies across our economies. The technology governments use underpins the delivery of essential services, so the need for caution is arguably even greater in the public sector than elsewhere.
However, governments are already finding and implementing use cases of LLMs. Within the UK government, they are being used to spot trends in healthcare reports, for example.
So, we find ourselves at a point where there is excitement, early movement towards adoption, and a need for caution. We therefore felt it was a good time to bring together some of our own ideas and concerns about the technology, and held a group brainstorming session to do so (a neatened version of which can be found here).
Framing the discussion
First of all, were we talking about letting ChatGPT loose on government data? Short answer: no.
The long answer involves explaining what LLMs are. LLMs are deep learning algorithms that generate text based on predictions about which words are likely to occur next in a piece of text. The ‘large’ in LLM refers to the number of parameters the model has and the amount of data it has been trained on. For reference, 10 years ago, state-of-the-art algorithms were being trained on 150GB datasets; now, LLMs are trained on up to an estimated 10,000GB. Being trained on all this data means that the text an LLM generates can be very useful. Models can, among other things, summarise information, extract information from large sets of text, translate, answer questions, and follow your instructions.
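To make the "predict the next word" idea concrete, here is a deliberately tiny sketch. A real LLM uses a neural network with billions of parameters trained on vast text corpora; this toy version swaps that for a hand-written table of word-to-word probabilities, but the generation loop — repeatedly sample a likely next word and append it — is the same in spirit.

```python
import random

# Toy stand-in for an LLM: a hand-written bigram table mapping each word
# to the probabilities of the words that might follow it. A real model
# learns these probabilities (over far richer contexts) from training data.
bigram_probs = {
    "the":       {"minister": 0.5, "policy": 0.5},
    "minister":  {"announced": 1.0},
    "policy":    {"was": 1.0},
    "announced": {"the": 0.6, "funding": 0.4},
    "was":       {"published": 1.0},
}

def generate(start: str, max_words: int = 5, seed: int = 0) -> str:
    """Generate text by repeatedly sampling a likely next word."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(max_words):
        options = bigram_probs.get(words[-1])
        if not options:
            break  # no known continuation: stop generating
        # Sample the next word in proportion to its probability.
        next_word = rng.choices(list(options), weights=options.values())[0]
        words.append(next_word)
    return " ".join(words)

print(generate("the"))
```

The output is always a plausible-looking continuation of the table's probabilities — which also illustrates why a model's fluency is no guarantee of truth.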
This type of model is not limited to those built and run by large AI companies, such as OpenAI or Google. In fact, models run by large AI companies seem less promising for integrating into government. Given the legal and security issues—such as those around privacy and GDPR—it would be difficult for governments to make use of them as they stand at the moment. Instead, we saw the most promise in models that are open-source, procured domestically, or built by government, and that are run locally on government computers. This distinction helped scope our discussion of the risks involved and how the technology could be integrated into government operations.
The applications we discussed fell into five categories: user experience, policymaking, service design, data processing, and professional support. Each is expanded below.
User experience
Government services can be hard to find, and what a user needs to do to access a service can be unclear. LLMs could help users navigate government websites by offering tailored, real-time support.
Offering interactive guidance on regulation.
Having a digital assistant that surfaces services you are eligible for when you explain your situation.
Policymaking
Policy decisions should be based on the best evidence available. LLMs could help broaden the evidence base and support policymakers' analysis of it.
Identifying and summarising existing policy interventions in a policy area, including from documents in a foreign language that may otherwise be inaccessible.
Detecting trends in user feedback or in publicly available online discussion to evaluate policy outcomes.
Service design
Digital services benefit from being designed and built according to agile principles, which requires significant user engagement, testing and iteration. LLMs could support teams at each of these steps, and could be especially impactful for teams with limited design and software skills.
Constructing virtual users to supplement user research where getting engagement is a challenge.
Increasing the number of prototypes that can be tested by generating code for alternatives that may be too expensive to build currently.
Fixing bugs in software so that the teams can quickly respond to issues in early phases of roll out.
Data processing
Government teams have to process large amounts of unstructured data, often manually. LLMs could help automate parts of this work by extracting and structuring data that is currently collected in an unstructured format.
Unlocking useful text data that is held in legacy IT systems or auto-filling datasets from information in scans and pdfs.
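The extraction-and-structuring idea above can be sketched in a few lines. Note the hedges: `query_local_model` is a hypothetical stand-in for whatever locally hosted model a department has deployed (here it is stubbed with a canned response so the surrounding logic can run), and the field names are illustrative only. The point of the sketch is the pattern: prompt for machine-readable output, then validate it before anything downstream relies on it.

```python
import json
import re

PROMPT_TEMPLATE = (
    "Extract the applicant's name, date of birth and requested service "
    "from the note below. Reply with JSON only, using the keys "
    '"name", "date_of_birth" and "service".\n\nNote: {note}'
)

def query_local_model(prompt: str) -> str:
    # Hypothetical stand-in: a real deployment would call the department's
    # own locally run model here. Stubbed so the example is self-contained.
    return ('{"name": "J. Smith", "date_of_birth": "1980-03-12", '
            '"service": "housing support"}')

def extract_record(note: str) -> dict:
    """Ask the model for structured output and validate it before use."""
    raw = query_local_model(PROMPT_TEMPLATE.format(note=note))
    record = json.loads(raw)  # raises ValueError if the reply is not JSON
    # Never trust model output blindly: check that the fields we rely on
    # exist and that the date at least looks like a date.
    for key in ("name", "date_of_birth", "service"):
        if key not in record:
            raise ValueError(f"missing field: {key}")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record["date_of_birth"]):
        raise ValueError("date_of_birth is not in YYYY-MM-DD format")
    return record

print(extract_record(
    "Note from J. Smith, born 12 March 1980, asking about housing support."
))
```

The validation step matters: because models can produce plausible but wrong output, automated extraction like this should reject malformed records rather than silently pass them into official datasets.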
Professional support
Civil servants can face high caseloads or admin-heavy parts of their jobs, which can worsen their professional experience. LLMs could remove some of this burden.
Reducing duplicative work by creating templates for, and helping draft, common documents such as business cases, data sharing agreements, and memorandums of understanding.
Facilitating procurement by reviewing bids and conducting supplier research.
Risks and concerns
We tried to keep the discussion of risks and concerns to those we think arise specifically from using LLMs in government. Our thoughts covered how models are built and run, alongside how they are used in government.
The size of models, the lack of transparency about their training data, and the range of their applications present auditing challenges. Governments may find it hard to undertake necessary auditing activities, including robustness testing, bias checks, and risk assessments. This could be a particular challenge with LLMs procured from private providers and protected by IP.
Use of publicly available data as training data raises both data privacy and sovereignty concerns for governments. What data is used to train a model, where the data processing is done, and for what purpose, are all relevant to whether a government can and should use a model.
The environmental impact of running data centres includes a large carbon and water footprint. This needs to be addressed if governments are to align their use of LLMs with their environmental responsibilities and targets.
Service quality could suffer if LLMs become the primary point of communication with government. When people interact with government it may not be solely to receive information. People often want a human interaction as this provides them with additional reassurance.
The use of LLMs in policymaking could limit the new ideas we generate and test. Models are limited to predictions based on the data they have been trained on. Consequently, they will provide iterative ideas rather than revolutionary new ideas.
Current LLMs can convincingly present false information to users (hallucinate). Basing decisions on false information could have negative or harmful outcomes, for instance, if it is used to evaluate a user’s eligibility for a service. The need to ‘double check’ outputs could limit efficiency gains.
Staff could have a worse experience of working in government if models impede the way employees work or automate parts of a job that employees value, such as content creation. Staff may struggle to thrive as editors rather than creators.
There could be a slippery slope towards models making high-stakes decisions. If LLMs are integrated into a government's decision-making processes, they may start to be used for higher-stakes decisions. If we cannot explain why a model makes a decision, or confirm that its values are aligned with a government's aims, this could cause irreversible harm.
These concerns demand that governments think deeply about how they manage LLM use in the public sector. Immediate government responses to recent LLM releases range from encouraging development to disabling some models for breaching existing regulations. Meanwhile, practical research is underway into how potential harms can be mitigated — for example, into how these models should be audited — which should feed into governments' longer-term responses.
As we collectively make sense of these technologies, we need a public discussion that takes seriously both the valid concerns about LLMs and how they could contribute to the public good. We invite everyone to carry on our discussion of how LLMs could be used by governments to that end.