Since GPT-4 was launched in March 2023, there's been huge interest in prompt engineering.
Get involved with our crowdsourced digital platform to deliver impact at scale
Stay up to date:
Listen to the article
- Albert Phelps joined Accenture as a Prompt Engineer in October 2021.
- Since GPT-4 was launched in March 2023, there's been huge interest in prompt engineering. AI and machine learning specialists are the world's fastest-growing jobs, according to the World Economic Forum's Future of Jobs Report 2023.
- Here, Phelps explains what being a prompt engineer means and shares his advice for employees and organizations looking to upskill.
Albert Phelps believes that one day we might all be doing some of his job.
"Pretty much everyone at some point will become a prompt engineer, because it will be incorporated into all of our workflows, I would expect."
Since the launch of Generative Pre-trained Transformer 4 (aka GPT-4) in March 2023, prompt engineering - essentially writing tasks to train AI tools - has been touted as one of the hottest new technology jobs, with headlines focusing on the six-figure salaries they pay.
Back in January, a session at the World Economic Forum's Annual Meeting in Davos was dedicated to the impact AI would have on jobs, and prompt engineering was highlighted as a role that would grow.
Now AI and machine learning specialists have topped the Forum's list of 10 fastest-growing jobs, in the Future of Jobs Report 2023, which also found that 75% of companies surveyed were likely or highly likely to adopt AI in the coming five years.
AI and big data skills rank third for businesses' reskilling priorities – 12 places higher than in their evaluation of core skills – and are due to invest 9% of their reskilling efforts in it.
Phelps has been a full-time prompt engineer for Accenture since October 2021.
In March, Accenture's research found that across all industries, 40% of all working hours could be impacted by Large Language Models (LLMs) like GPT-4. Language tasks make up 62% of the total time employees work – and 65% of that time can be made more productive through augmentation and automation, it says.
In this edited interview, Phelps explains exactly what the role involves, why it's crucial to keep humans in the loop and shares his advice for others looking to become the prompt engineers of the future.
What is a prompt engineer and what does your role involve day-to-day?
I spend my time thinking about business problems or challenges that our clients have and working out how to set that as a task for an LLM to try and help with.
So my day-to-day could be working out how to get an LLM to give a commentary on an earnings report in the same way that an analyst would. It could be working out how to build a tool to help people do regulatory compliance for financial services.
It's about understanding the task, the objective, then coming up with some conditions to test the model can do that correctly, and then writing and iterating prompts.
In a sense, it's similar to software engineering because you are trying to get to a point where you can prove something works and optimize it. One of the main things that we optimize for is token length, because tokens equal cost in terms of inference. We're trying to get as much useful meaning and instruction to the model as possible in the least amount of space. So I work with clients to understand some of the challenges and interests they have and then design prompts and applications built with those prompts to meet those problems and business objectives.
How is the World Economic Forum ensuring the responsible use of technology?
How does AI augment the work you do?
It's very much a human-in-the-loop design. It's a dialogue that you have between yourself and the LLM as you're trying to test it and give it suggestions.
There are different techniques for prompting, so I would start off with a "zero-shot prompt". That is essentially me distilling the task into a series of instructions that the model is then supposed to follow, without examples. Then we'll have a look and say, "Well, it was good at this part of the task, but it wasn't so good at this part of the task". So then you might give the model an example of that task being performed. Many of my use cases have been in financial services and banking because that's my background.
I know more than the LLM about a topic or at least have knowledge parity, which is important because you need to be able to spot where the model is getting the answers right and where it's getting them incorrect. Once you've written the prompts and incorporated that within an application, and the model's consistently following it, I would still want a human in the loop as that is going into production because this is a new technology with associated risks.
Also, from a perspective of enjoying work, I think an augmentation-type use case in general is a better thing for people that are doing their jobs. Some tasks are frustrating or difficult – I would love to have, for instance, a bot that could just manage my diary! I would say 95% of these cases that I work on, even once we've got to the point of developing that high-quality prompt, we're still going to be imagining a human in the loop when that moves into production just simply because it's a new technology.
What exactly are 'tokens'?
Tokens are part of words, essentially the charging model of closed-source LLMs like OpenAI's GPT-4 and Anthropic's Claude. Their charging structure is usually based on how many tokens you consume as you build out your application with that model. There are limitations in terms of how much you can put within what's called the context window of that LLM. So you have to be efficient in the way you want to query it, both from a cost perspective, but also in terms of giving the people using the thing you built the opportunity to be able to connect with it more and have more of an interaction with it before it runs out of memory. There are also technical mitigations for that.
A lot of the work I am now looking at is about how you combine multiple prompted LLMs. I would usually look at that more for a sort of production application than when I'm just noodling with prompts to find new use cases and proof points.
What makes a good prompt?
Every LLM will have a subtly different way of prompting it best. There are some common techniques, like zero-shot and few-shot prompting. But for a typical prompt, initially, I would try to instantiate within the model, not necessarily a personality, but a role that I wanted it to play that I would record within the system prompt.
So, for example, if I was looking at a compliance research assistant, I would probably go, "This is what you're good at. This is what you should be aware of as your limitations, your training data cut-off is X", and then you start to get a little bit more complicated within that system prompt if you want to.
I work a lot with selection inference frameworks, which means instructing the model, before it gives its answer, to select from the context that I give it. That could be within the prompt or calling out to an external database, to select the relevant information based on the question that I'm asking it. And then to make an inference from that information as it's giving its answer. So you're splitting up how the model is coming back to you with an answer. That's useful diagnostically because you can say, "The model got this answer incorrect. Is that because it selected the wrong information or because it made the wrong inference of that information?"
That type of prompt engineering is very exciting to me. It came out in 2022 with a paper from DeepMind and Antonia Creswell. What's cool about it, is it allows you to inspect how the model is going about doing its answers and you mitigate some of the challenges that everyone's probably aware of, which is the black-box nature of some of these models. We're trying to get it to show us how it's coming up with the answers and to give us a way of getting what's called a natural language causal reasoning trace so we can see: user asked this question, model selected this data, model made this inference, here's the answer.
What was your career path to becoming a prompt engineer?
My career has always been involved with words in financial services. I started on the front line of retail banking. I handled complaints, quality-checked compliance, and worked on banking remediations, which did involve data because you were thinking about the customer impact of something going wrong at a bank and how to ascribe a monetary value to that and get the apology to the customer.
Then I moved more into risk-based compliance and worked on data privacy and cybersecurity – so looking at what the regulations say and helping technical people comply with them. I started to move more into the technology space, looking at automating controls and making them more reliable.
I'd just read about GPT-3 at the end of 2020/start of 2021.
In "the playground", I started asking questions and prompts about my own expertise, in regulation and cybersecurity. It was very good at copying, if you spoke to it in a certain way, it would speak to you back in that way. That was impressive because it wasn't directly mapping my intents, which is how natural language processing previously worked. That was quite mechanistic, but this was naturally picking up on the way that I was talking to it and it was coming back to me. But it was much harder to prompt than it is today.
At that point, I thought, this is super exciting. I want to go work on this full-time. I was fortunate enough to come across Accenture. Since then, I've been working on building prompts and, beyond that, products for our clients as a result of that journey.
What skills do you need to be a prompt engineer?
Being clear and economical in the way that you write is important. English, history, and philosophy people rejoice because you can interact with these very advanced models just with words, which is awesome.
But in parallel to building those prompts just with words, I've also been on a journey of learning how to incorporate those into applications technically. So, for instance, the GPT line of models and GitHub CoPilot have together made me much better at coding. I'm now able to not just use and build prompts within the OpenAI playground or any of the other LLM providers our clients are interested in using, but I'm also moving to the command-line interface and building applications thanks to that technology.
So you don't need a computer science background?
There are some amazing data scientists I work with who are also fantastic prompt engineers. What's unique about it is you can approach it from either side and meet in the middle. Core software development practices are still going to be relevant to prompt engineering. But if you're able to express yourself with words, there's the Andrej Karpathy quote, "The hottest new programming language is English."
I would say that is being borne out in the way that we develop stuff with our clients. I've always been interested in the technical side of things as well, but there's not such a barrier to entry now if you don't have a completely technical background.
You've probably seen that with other prompt engineers out there who might often come from a humanities background, which is great. Diversity of viewpoints as well as diversity of experiences will only make the products that you build and the solutions you're able to work on better.
How do you navigate the risks associated with generative AI?
I think there will always be a balance between open- and closed-source models. Organizations will want to upskill themselves in terms of fine-tuning an LLM, and by that, I mean more examples that you're able to show to the model.
What I say to clients is to have a look at both open- and closed-source models, they both have value, and you need to develop skills in both. But I would say the performance differential at the moment between closed- and open-source models with regard to text is still quite vast.
That simply comes down to the number of parameters and the size of the models that an OpenAI or a Cohere or an Anthropic can create. And then also the expertise that they have within those research labs to do reinforcement learning from human feedback, which makes the models more useful, able to follow instructions better.
So be cautious with what you put in there and we can talk about all the different facets of the different model providers and the terms that they offer.
What has changed since prompt engineering came under the spotlight?
I had six months of doing this in blissful peace, just getting on writing prompts. The last six months since Chat-GPT went viral have been quite different because it's now much more about how you train loads of people to do this, how you talk about this.
But we have some keen, high-quality prompt engineers that are learning from myself, and all of the people around Accenture that have this insight. People like Noelle Russell, Global AI Solutions Lead for Accenture Cloud First’s Data & AI group, she's been running training for all of Accenture who want to understand it.
It's gone much more viral than perhaps I was expecting because when I used to interact with GPT-3, even back in the day, you could see at the time it was going to be transformational, but it bubbled along. It took a few years for the Open AI playground to get anywhere near a million users. With GPT-4, it happened over an incredibly short period of time. But it's all to the good because it just means new things to try and test the model with and new opportunities to go and build stuff.
What impact do you think AI will have on jobs?
I'm not a doomer at all in terms of jobs. I think it will be a positive thing. The most mature LLM product in the market is probably GitHub CoPilot, that's probably been around for almost two years.
If you speak to developers that use it, you look at GitHub and other people's studies, they love it because it makes their job a lot easier. The co-pilot will write the code, they can verify they're happy with it, and then they can run it.
I saw a study today from Stanford, where it talked about how a relatively inexperienced call handler could be assisted by generative AI as they were talking to a customer. So, this is the mood of the customer, this is maybe what you should think about saying next. They don't have to, but they can follow that. I think it improved their performance so much that they were performing more like somebody that had been there for six months or a year.
For people that are new in role, it's going to give them the ability to self-actualize and reduce that time to being productive. Everyone loves to feel like they're doing a good job – and that's where generative AI would come in.
My partner, Rikki Turner, is an artist and she is very much inspired by generative AI. She takes a painting she creates, runs that through an image-to text-model, and then gets a title for her painting that's been created by the AI's interpretation of what it looks like.
I think the people that think this is all going to be automation underestimate the power and flexibility of the human mind.
We love novel things, but then we adapt to them super quickly.”
What advice do you give people wanting to get into prompt engineering?
Read around it. There are a few papers, there are some classics from 2022 on chain-of-thought reasoning, zero-shot chain-of-thought reasoning, and then the selection inference framework. Those are the best technical introductions to prompt engineering and why it works. The older models, you could ask it a maths question, simple arithmetic, and it would get it wrong. If you just simply added the phrase, "Let's think step by step", the model would then go step-by-step through the problem and get the answer correct.
That's the power of prompt engineering and picking the right words because you're finding emergent capabilities within the model that you didn't necessarily know were there until you found the right words. So read up on the academic papers, but also just play with the models.
There are numerous APIs and applications available to test the latest foundation models, from companies like OpenAI, Anthropic and Cohere, including OpenAI Playground, Chat-GPT, Cohere Playground and Claude from Anthropic.
Go write some prompts, that's the best way to learn because you'll start to, by trial and error, realize your voice and understand how best to write prompts that suit you and what you're looking to do.”
How can we build resiliency into the role?
It's an interesting one because I saw a paper the other day talking about an automatic prompt writer for LLMs using LLMs. You may have seen some of the hype around Auto-GPT where you're giving it an instruction and then it will go off and do stuff itself.
I'm biased, but I think it will stick around because generative AI is relatively high-risk AI as we tell our clients, you always want a human in the loop, at least in the design phase and then a testing phase just to validate that it's right.
The way that you prompt changes pretty much every time you switch models, so being adaptable, having some core understanding of the underlying technology – you don't have to know the maths, but how does it work? What's the best way of eliciting the behaviours that I want? I do think it will be a role that sticks around in a specialist sense in terms of how I get LLMs to do these tasks effectively, safely and responsibly.
But I also think pretty much everyone at some point will become a prompt engineer because it will be incorporated into all of our workflows, I would expect that if you look at the direction of travel with Microsoft.
But we'll still have to know the right question to ask, right? So I feel like everyone will take on some of this at some point.
Don't miss any update on this topic
Create a free account and access your personalized content collection with our latest publications and analyses.
License and Republishing
The views expressed in this article are those of the author alone and not the World Economic Forum.
More on Davos AgendaSee all
March 1, 2024
February 28, 2024
Mattie Rodrigue and Diva Amon
February 23, 2024
February 22, 2024
Pasquale Frega and Katrine Luise DiBona
February 21, 2024
Ameya Hadap, Thibault Villien De Gabiole and Laia Barbarà
February 20, 2024