
Bloomers talk about Large Language Models (LLMs)
Insights about the strengths and weaknesses of LLMs from disciplines across Bloom
At Bloom we work to improve outcomes for people by empowering governments to deliver services better. As a team of just 45 people, we’re always looking for ways to scale the impact of our work beyond the projects our small but mighty team can take on.
So, over the past few months, we've deliberately engaged our teams to learn about the strengths and weaknesses of LLMs and establish some baseline principles for applying them. These experiments have led to fruitful discoveries, unexpected turns, and some dead ends. The lessons we're learning have helped us amplify our team's impact within and across our project work, and we want to share our learnings more widely!
In this post, we’ll share some of our team’s impressions from our experiments so far.
There is potential for LLM-enabled services but you have to understand what your audience is looking for.
—Paul Craig, technology strategist
Thoughts on LLMs from a content strategist
Chelsea Levinson is a content strategist at Bloom with previous experience using LLMs as part of a writing process. We sat down to ask Chelsea her thoughts on how LLMs will change her work.
How might LLMs change writing in government?
LLMs show a lot of promise for updating content to meet government plain language and design standards. For example, an older page about a government program might start off as a massive block of text with tons of jargon. You could run it through an LLM and get back easily scannable sections with descriptive headers, plus shorter and simpler sentences. You could even prompt the LLM to replace outdated jargon with updated plain language terminology, or to adhere to the rules of your department's style guide (a sketch of this follows below).
LLMs also have a lot of promise for language translation. I think they could even be used for information architecture, like creating ideal sitemaps based on user data.
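To make the style-guide idea concrete, here's a minimal sketch of the kind of rewrite pass Chelsea describes, assuming the OpenAI Python client. The model name, style rules, and input file are placeholders rather than recommendations, and the output is a draft for human review, not finished content.

```python
# A minimal sketch of an LLM-assisted plain-language rewrite, assuming the
# OpenAI Python client (openai>=1.0). The model name, style rules, and file
# name are placeholders, not recommendations.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STYLE_GUIDE = """Rewrite the text to meet plain-language standards:
- Break long blocks of text into short sections with descriptive headers.
- Use short, simple sentences.
- Replace jargon with everyday terms (e.g. "remuneration" -> "pay").
"""

def plain_language_draft(page_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": STYLE_GUIDE},
            {"role": "user", "content": page_text},
        ],
    )
    return response.choices[0].message.content

# The result is a jumping-off point: a skilled editor and a subject matter
# expert still need to review it for accuracy, as Chelsea notes below.
print(plain_language_draft(open("program_page.txt").read()))
```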
Are there times when using LLMs to write content is more or less risky?
There's a level of risk any time you use LLMs to generate or rewrite content. That's because they're not always accurate. So any time you use an LLM to generate or edit government content, it should be more of a jumping-off point. Everything will still need to be checked for accuracy by both a skilled editor and a subject matter expert. It's not a magic pill to instantly rewrite content. And without a properly prompted and trained LLM, it can sometimes take longer to edit generated content for accuracy than to write it from scratch. (Ask me how I know!)
That said, there are some areas where using LLMs for content is definitely riskier. That includes any subject with a high level of complexity or where compliance is a major factor: for example, communicating government loan programs, or healthcare or Social Security benefits. These are touchpoints where accuracy is critical, both from a user trust perspective and from a compliance standpoint. You don't want to be wrong when it comes to these subjects because there are heavy consequences. They really affect people's lives.
What’s your advice for content strategists trying LLMs?
Experiment, but don't lose the human touch! LLMs work best as collaborators. They can't do everything, and they shouldn't. As a content strategist, you're the one who knows the ins and outs of your work. You're aware of the intricacies of your programs that an LLM can't "understand." Always check and double-check any LLM-generated content. An LLM can help save you a lot of time and work. But ultimately, it's just another tool, not a replacement for human touch and expertise.
Experiment, but don’t lose the human touch! LLMs work best as collaborators. They can’t do everything, and they shouldn’t.
—Chelsea Levinson, content strategist
Thoughts on LLMs from a UX/UI designer
Jackie Chang is a UX/UI designer at Bloom. We sat down to ask Jackie how generative AI (often driven by LLMs) might change interface and visual design.
How might LLMs change designers’ work?
We're beginning to see that image-generating AI (much of it built on LLM-like components) will make it easier for more people to produce images and storyboards, and that can be very helpful during the design process. I think we'll start looking to source illustrations from these tools. I can also see a near-term world where generative AI helps us design interfaces, particularly common interface types.
Are there ways designers shouldn’t use LLMs?
Some people say that you can replace talking to users with AI tools that simulate what a user would do. That's obviously not true: we get a lot of richness and detail from talking to real people that AI can never replicate. But using these tools to draw illustrations faster, or to assemble other pieces of the design process, seems low risk.
One piece of advice for government designers thinking about LLMs?
Design and research are broad disciplines, and it's hard to make general statements about where LLMs will be more or less useful. But beyond changing our own work practice, I am cautiously hopeful that LLM-based tools will make it easier for people to work with governments, particularly by "translating" complicated government pages into something easier to understand. I hope we all keep our eyes out for ways LLMs can help people interact with the government, not just make our [designer] lives easier.
We get a lot of richness and detail from talking to real people that AI can never replicate.
—Jackie Chang, UX/UI designer
Thoughts on LLMs from a tech strategist
Paul Craig is a technology strategist at Bloom, where he helps government agencies understand which technologies match their (and their users’) needs. We sat down to talk with Paul about his outside-of-work experiences building LLM-driven products.
How might LLMs help government agencies?
I’ve worked for governments for a long time, and I really think LLMs that help answer people’s questions make sense for governments. LLMs are good at finding and summarizing content in long documents, and governments are great at producing long documents. Governments are (rightly) afraid of the risks of deploying an experimental technology like AI, but using an LLM to generate an answer for someone is potentially better than waiting on hold for 2 hours.
I've actually built a custom chatbot that does this: TaxGPT.ca gives tax-filing information to Canadians using public government data. Nobody has to use it, but it's a free way to get information quickly, making it a good option for some people. The challenge of building with an LLM, as with any technology, is how to leverage the good features (fast, cheap, responds like a human) and mitigate the bad ones (sometimes inaccurate, doesn't know when it doesn't know).
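To illustrate that trade-off, here's a minimal sketch of the pattern Paul describes: ground answers in retrieved documents and refuse when nothing relevant turns up. This is a generic illustration, not TaxGPT.ca's actual implementation; the corpus, model name, and keyword-overlap retrieval are stand-in assumptions.

```python
# An illustrative sketch of an LLM answering questions from government
# documents, with a refusal path when retrieval finds nothing relevant.
# NOT TaxGPT.ca's actual implementation: the corpus, model name, and naive
# keyword-overlap retrieval are stand-ins for a real search index.
from openai import OpenAI

client = OpenAI()

DOCUMENTS = [  # placeholder corpus of public guidance
    "The deadline to file your individual tax return is April 30.",
    "The GST/HST credit is paid quarterly to eligible filers.",
]

def retrieve(question: str) -> str | None:
    """Return the document sharing the most words with the question, if any."""
    words = set(question.lower().split())
    scored = [(len(words & set(d.lower().split())), d) for d in DOCUMENTS]
    score, best = max(scored)
    return best if score > 0 else None

def answer(question: str) -> str:
    source = retrieve(question)
    if source is None:
        # Mitigate "doesn't know when it doesn't know": refuse, don't guess.
        return "Sorry, I couldn't find that in the published guidance."
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": "Answer using ONLY this source; "
             "if it doesn't cover the question, say so.\n\nSource: " + source},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer("When is the deadline to file my taxes?"))
```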
Where should governments start?
Canada's Institute on Governance published a report on public-sector AI use that makes the case that AI in government should be "explainable" (you can explain how your system works) and "reversible" (automated actions can be undone if needed). Any technology is a risk if not applied judiciously and with forethought. Start with low-risk informational use cases and take on more functionality as your confidence grows.
One piece of advice for engineers building LLM-based tools?
The hype around LLMs has generated a lot of heat but not a lot of light: plenty of enthusiasm about AI's potential, but little information about actual usage or consumer preference. There are also plenty of models being released, making it seem very hard to keep up.
As a counterbalance to this, I recently published usage data from my custom chatbot and my takeaway is: there is potential for LLM-enabled services but you have to understand what your audience is looking for.
Given this, my one piece of advice would be: pick one model and one use case, build the cheapest possible prototype that you can, and get it into the hands of users to test with. A lot of the hype focuses on the technology, but you should be starting with the users.
AI in government should be "explainable" (you can explain how your system works) and "reversible" (automated actions can be undone if needed).
—Paul Craig, technology strategist
Thoughts on LLMs from a senior product and delivery manager
Shana Kimball leads interdisciplinary Bloom teams that serve government agencies. We spoke with Shana about how LLMs have changed her own work as a product and delivery manager, and the work she hopes to do with government clients.
How might LLMs help your field better meet people's needs?
One of my goals as a product manager is relentlessly removing distractions so people can focus on what they're good at. So, with LLMs (and other tools), I look to remove distractions, swat away tedium, and create the right "vibes" for the team. For example, I use ChatGPT to synthesize rambly writing into something more succinct, and to turn long texts we have to read into bullet points. I've also used LLMs for some quick data operations, like comparing two spreadsheets and looking for differences, or looking for patterns. Sometimes I'm not very impressed with it (as with an intern), but it often saves me a bit of time.
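As an aside, the spreadsheet comparison Shana mentions can also be done deterministically, which is a handy way to spot-check an LLM's answer. A minimal pandas sketch, with hypothetical file and column names:

```python
# A deterministic version of the spreadsheet comparison Shana describes,
# useful for spot-checking an LLM's answer. File, column, and key names
# are hypothetical; both sheets are assumed to share the same columns.
import pandas as pd

old = pd.read_csv("vendors_q1.csv").set_index("vendor_id")
new = pd.read_csv("vendors_q2.csv").set_index("vendor_id")

added = new.index.difference(old.index)    # rows only in the new sheet
removed = old.index.difference(new.index)  # rows only in the old sheet
shared = old.index.intersection(new.index)

# Cell-by-cell differences for rows present in both sheets
changed = old.loc[shared].compare(new.loc[shared])

print(f"{len(added)} rows added, {len(removed)} rows removed")
print(changed)
```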
What uses of LLMs seem low-risk for product managers? Which seem higher risk?
I think it's riskiest to completely offload decision-making to any technology, AIs included. As a PM, I could imagine a future situation where my client wants to bring LLMs to bear on their problems. I'd want to be a thought partner to unpack and manage the trade-offs and costs of various AI-based approaches. I love the idea of being a PM on a project that's research-focused in that way.
At the low-risk end: lots of the content analysis and text tasks I mentioned above. My big principle is making sure these tools don't displace my own judgment.
One piece of advice for product managers trying LLMs?
As product managers, we have the opportunity to experiment with the tools we use for our work, but also to model and frame how teams engage with them. In the Bloom context, that looks like the team knowing and adopting our principles, but also adapting them (and LLMs) to our problem space: What makes sense within our work? How can we use these principles and come up with our own recipes?
As product managers, we have the opportunity to experiment with the tools we use for our work, but also to model and frame how teams engage with them. In the Bloom context, that looks like the team knowing and adopting our principles, but also adapting them (and LLMs) to our problem space.
—Shana Kimball, product and delivery manager