How to avoid common AI pitfalls in the workplace

Advice from our latest season of “Boss Class”

An illustration of an office worker holding a robot’s head, echoing Hamlet’s “to be or not to be” speech, as he contemplates AI’s role in the workplace.

Illustration: Paul Blow


THE PIZZA HUT in Plano, north of Dallas, looks much like any other. Cars draw up at the drive-through window. Inside the restaurant, staff slide pizzas out of the oven into cardboard boxes. But this restaurant is special: it is a laboratory for the chain’s new ideas. And that means it is a place where the worlds of melted cheese and artificial intelligence (AI) collide.

Customers place orders by talking to a voice-enabled AI model. Machine-learning algorithms work out which orders the kitchen should make first. A screen shows AI-synthesised customer feedback from review sites and social-media platforms. Fast-food restaurants tend to have high staff turnover: new joiners here can query a chatbot to see how much of each ingredient ought to go on a medium-size pizza.

The Plano Pizza Hut is a small parable of generative-AI adoption by firms. The technology is making its way into all corners of the workplace. But it still feels incremental, not transformative. AI boosters talk of superintelligence, the end of work and of data centres in space. Here on planet Earth, the technology merely increases the chances of having the right number of pepperoni slices on your next takeaway.

Humble experiments such as these raise important questions for companies trying to use generative AI in the workplace: are the benefits just incremental and, if so, what is holding up progress? Last, what should companies do to make the most of it? All these questions are tackled in the new season of “Boss Class”, our subscriber-only podcast on work and management, released on January 29th. It finds that although AI models are improving rapidly, adoption still takes time. Organisations and employees have to adjust to make the technology work.

On the first question, as to whether AI is worth its salt, boosters will rightly argue that a pizza restaurant is not the best test of the technology. But that is another way of saying that its impact is very unevenly distributed. A recent analysis by Indeed, a jobs site, found that a large majority of skills mentioned in a typical posting for a software-development role could be profoundly affected by AI; most of the listed skills in a typical nursing job are currently beyond the technology.

The firms behind the AI models point to rising volumes of activity and claim these lead to hefty productivity gains. In December OpenAI reported that ChatGPT Enterprise was saving users an average of 40-60 minutes on each day they used it. The newest models are approaching parity with industry experts on many real-world tasks, according to GDPval, an evaluation published by OpenAI in September. Their capabilities are improving all the time.

Chart: The Economist

But many firms are still waiting for the benefits to materialise. A big new survey of executives in America, Australia, Britain and Germany, conducted by researchers from the Federal Reserve Bank of Atlanta, Macquarie University, the Bank of England and the Bundesbank, shows that almost three-quarters of businesses are using AI in some way. Yet 86% of bosses across these four countries report the technology has had no impact on labour productivity over the past three years (see chart).

If this makes for a confusing picture, it echoes the experience of actually using the technology. An AI model can outperform the world’s best mathematicians while still being stumped by the number of “r”s in “strawberry”. Its confidence in asserting things that are completely wrong would make an economist proud. Working with AI involves a mixture of achievement, sycophancy and disappointment. This is a faithful reflection of office life, but not exactly what was promised.

As for the question of why progress is halting, the best answer is that general-purpose technologies, from electricity to the internet, all take time to have their full impact. The era of generative AI is still in its infancy. “It’s like we’re all accountants and Microsoft Excel was invented last weekend,” says Bret Taylor, the chairman of OpenAI and a co-founder of Sierra, a startup that builds customer-service AI agents (tools which act autonomously).

The firms behind the AI models, the likes of OpenAI, Anthropic and others, are all trying to make their products more useful to organisations. Mike Krieger, who works on new products at Anthropic, the firm behind Claude, makes a distinction between models’ horizontal and vertical capabilities. Horizontal capabilities are the kinds of generic activities that are useful to almost all white-collar workers: writing, conducting research, making a PowerPoint slide without becoming homicidal.

Vertical capabilities are harder to get right because they involve specific skills: building a cashflow model in banking, say. The big AI firms are trying to amass more industry expertise by hiring specialists, among other things. But working out what it is that people do all day is hard enough when you sit right next to them, let alone if you’re a software engineer with no experience of the outside world.

A host of AI startups is trying to plug gaps of this sort, but it also takes time for markets to mature. Mr Taylor recalls that in the early days of the internet, firms spent shedloads of money to make their websites work. Now they can get much of what they need off the shelf. In time, he says, the same will be true of AI agents. “I’m hopeful that five years from now, it’ll be a very mature landscape of vendors who sell agents as solutions to problems rather than people selling models and saying, ‘Here’s a bunch of wood, build a house.’”

In other words, companies are still having to make sense of the technology for themselves. And that leads to the third question: how to manage all the problems that generative AI throws up in firms. These problems are behavioural, technical and organisational.

Behavioural problems can affect the average worker and the corner office. Employees are best placed to come up with uses for AI, says Ethan Mollick, a professor at the Wharton School at the University of Pennsylvania. But they also have lots of reasons to avoid AI, or to keep quiet about using it. They might want to take credit for work done by machines, or avoid advertising that they have more free time. Above all, they might not want to signal that their jobs can be done by AI. (“Look, boss, I’m redundant.” “Yes, you are.”)

Wage against the machine

Firms encourage adoption in all sorts of ways. Some offer cash bonuses to employees who automate tasks. Some have dashboards that show how each department uses the technology. Performance reviews can specifically call out AI adoption.

But carrots and sticks of this sort get you only so far if trust is lacking between employees and executives. Being honest about the uncertainty that lies ahead might sound like a bromide, but it is vital. “There are jobs that are going to disappear,” says Nimish Panchmatia, the head of AI at DBS, South-East Asia’s largest bank. “But new jobs are going to get created as well.” The bank runs programmes to help its employees learn new skills that might, for instance, help turn a customer-service agent into a salesperson.

Often the behavioural problem to solve is not apprehension but convenience. Glowforge, a Seattle-based manufacturer of desktop laser-cutting machines, tried a third-party AI sales-coaching tool that emailed summaries of sales calls to its staff. “Every single sales rep had routed it directly into the bin,” says Dan Shapiro, its CEO. “It was too noisy and it didn’t have a place in the rhythm of the team.”

Glowforge has since built its own tool. It, too, automatically listens in to sales calls and emails its views on what went well and badly. But now the AI’s feedback forms part of a weekly discussion between the salesperson and their managers; the expectation that it will be talked about means people pay much more attention to the tool. “You can have a superior product, but if it doesn’t fit into somebody’s workflow, if it doesn’t fit into their day, it’s tough to get adoption,” agrees Cameron Davies, the head of AI for Yum! Brands, the owner of Pizza Hut, Taco Bell and other brands.

Overenthusiasm is another behavioural problem to solve. The unpredictability of AI’s strengths and weaknesses (what Mr Mollick and others have christened the “jagged frontier”) means that it takes time to develop intuition for how to use the technology. Painful lessons are learned in the initial rush to adopt AI. Last year the Australian arm of Deloitte, a consultancy, issued a partial refund to the federal government for writing a report littered with AI-generated errors. This month the West Midlands police force in Britain admitted that a decision to ban Israeli fans from a football match in Birmingham was partly based on an AI hallucination about a match that never took place.

Avoiding horror stories like these also means solving a variety of technical issues. Yet the hidden costs of doing so are easily overlooked, says Rama Ramakrishnan, a former tech executive who now teaches at the Massachusetts Institute of Technology. The first cost is to adapt the model to the specific use case. This means training it on the right data, fine-tuning it and driving hallucinations down. Mr Davies of Yum! Brands says that by drawing on small language models, which are trained on subsets of data and focused on specific tasks, voice-ordering applications at the firm’s restaurants have less scope to hallucinate. “I don’t need the model that you’re ordering a pizza from to be able to tell you about the most famous economist in the world.” (Nice idea, though.)

Still, sometimes even hallucinations can be valuable. Brice Challamel, the head of AI strategy at OpenAI, describes AI as a teammate capable of playing several different roles: an assistant that helps with repetitive tasks, an expert that explains complex concepts, a coach that provides feedback and a creative partner that comes up with ideas. What counts as a hallucination if it comes from the expert persona could count as imagination when the AI is being asked to brainstorm.

Glowforge’s sales-coach tool is a good example of how errors can be tolerated, or even turned to advantage. The AI often gets its feedback wrong—asserting that a sales opportunity has been missed, say, when the call was designed to tend a client relationship. But the tool has also been engineered to be “low conviction” in its judgments: its views are deliberately designed to be fodder for discussion.

Because generative AI works on the basis of probabilities, you can never know for sure what it is going to come up with. So the second hidden cost is to put safeguards in place for those use cases where errors matter. Sierra, for example, uses a “supervisor model” to monitor real-time interactions between customers and its AI agents, with humans on hand to step in if needed. Another model evaluates conversations after the fact and pushes tricky cases towards human reviewers.
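The layered safeguard described above can be sketched in a few lines. This is only an illustration, not Sierra’s actual system: the `risk_score` function is a hypothetical stand-in for a supervisor model, and the phrase list and thresholds are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical phrases that should make the supervisor wary (illustrative only).
RISKY_PHRASES = ("refund", "legal", "cancel my account")


@dataclass
class Turn:
    customer_message: str
    agent_reply: str


def risk_score(turn: Turn) -> float:
    """Stand-in for a supervisor model: the share of risky phrases present."""
    text = f"{turn.customer_message} {turn.agent_reply}".lower()
    hits = sum(phrase in text for phrase in RISKY_PHRASES)
    return hits / len(RISKY_PHRASES)


def route(turn: Turn, threshold: float = 0.5) -> str:
    """Real-time check: send the reply as it is, or hand over to a human."""
    return "escalate_to_human" if risk_score(turn) >= threshold else "send_reply"


def flag_for_review(conversation: list[Turn], threshold: float = 0.3) -> bool:
    """After-the-fact check: flag the conversation if any turn looked tricky."""
    return any(risk_score(turn) >= threshold for turn in conversation)
```

The point of the design is that the first check runs while the customer is still waiting, and the second, more sensitive one sweeps up cases the first let through.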

Problems become much more tractable when the tasks given to agents are narrow, says Mr Taylor. Retailers have standard criteria for returning items, for example, which means a customer-service agent can ask specific questions about when the item was bought and whether it has been used, before working out what to do.
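A narrow task of this kind reduces to a handful of explicit checks. The sketch below is illustrative only: the 30-day window and the outcome labels are assumptions, not any retailer’s actual policy.

```python
from datetime import date

RETURN_WINDOW_DAYS = 30  # assumed policy, for illustration only


def decide_return(purchased_on: date, used: bool, today: date) -> str:
    """Apply two standard criteria: when the item was bought and whether it has been used."""
    days_since_purchase = (today - purchased_on).days
    if days_since_purchase > RETURN_WINDOW_DAYS:
        return "decline"
    if used:
        return "refer_to_human"  # ambiguous case: leave the call to a person
    return "approve"
```

Because the agent only has to collect two facts, there is little room for it to improvise, which is what makes the problem tractable.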

The same kind of thinking is visible at Garfield, a British startup that was the first firm in the world to be regulated to provide AI legal services. Garfield helps creditors pursue small claims, defined as unpaid debts below £10,000 ($13,800). Taking people to court for unpaid bills is a daunting process for most people; if a lawyer gets involved, it quickly becomes uneconomic. Generative AI can make this much more affordable. Businesses can connect their accounting software to Garfield, which ingests invoices and tells them whether they have a valid claim; it can then send out letters for action, which are often enough to prompt debtors to cough up, and help claimants in court, too.

Philip Young, one of the firm’s co-founders and the only lawyer on the team, says that the idea works in part because the small-claims process has “relatively well-defined inputs and outputs and has a relatively finite universe of possibilities”. More complex litigation claims would have to cope with many more permutations, which would increase the potential for errors.

As well as behavioural and technical issues, firms must also solve a variety of organisational problems to make AI work for them. Finding the right talent is an obvious issue. Failing to give the machines access to the right data is another common pitfall.

Models also have to be evaluated to ensure that the output is high-quality. For some tasks, this is quite simple. Sarah Guo, an AI investor in Silicon Valley, says that one of the reasons software engineering is in the vanguard of AI adoption is that verifying whether a bit of code works is relatively easy. In other areas, evaluating whether something is up to scratch is much harder. Trying to make a model funny, she says, is a case in point, because funniness is “soft and fuzzy”.

Lots of corporate tasks fall into this fuzzier category. So human experts are needed to define what counts as good enough. They are also needed to supply unwritten knowledge about how to get stuff done (the GDPval evaluation which suggests that frontier models can rival industry experts excludes tasks that depend on tacit knowledge). Harnessing this kind of in-house expertise is, in part, an organisational challenge. Mr Mollick points to the example of one large firm in which senior engineers and subject-matter experts are being put into small cross-departmental teams to move fast on specific projects.

Moving faster in one area can cause bottlenecks in another, however. Vibe-coding, a slangy term for using natural-language prompts to get an AI to write a computer program, makes it much easier for novices to create apps and features. In one way, this approach is a boon. Coding tools like Claude Code and platforms like Lovable or Replit allow end users and product managers to show what it is they want to build, rather than wasting endless hours on PowerPoint decks and lengthy documents. The phrase “demo, don’t memo” is now circulating inside some tech firms.

But that leads to a new problem. “You’ve stopped having the bottleneck at how quickly can you write code, and now you’ve got the bottleneck at how quickly can you review the code,” says Hannah Calhoon, the head of AI at Indeed. Jim Swanson, the chief information officer of Johnson & Johnson, a pharmaceutical firm, says that he used to hear managers in different territories rave about how they had used AI to improve the invoicing process, forgetting that meant more work piling up for the finance team.

An illustration of a human and a robot making pizza side by side, working together in a shared kitchen as collaborators rather than competitors.

Illustration: Paul Blow

J&J is an example of how the early rush to experiment with AI has evolved into something more measured. The firm started off with a let-a-thousand-flowers-bloom ethos. That led to a lot of weeds, too. According to Mr Swanson, 85% of the value generated was attributable to just 15% of these applications. J&J has now switched to a more focused approach, in which a central AI council and a data council ensure that the most fruitful projects are being nodded through and that the right data are available to make them work.

Metrics are also maturing, away from crude targets for AI usage and towards things that matter to the business. “One of the most important things you can do…is specify a business outcome you’re trying to drive more than a technical outcome,” says Mr Taylor. His startup, Sierra, uses outcomes-based pricing, which means clients are charged only when the AI agent actually solves a customer’s problem; if a human has to get involved, it’s free.

None of this is to downplay how remarkable generative AI is, or how quickly it is advancing into the workplace. As it makes more technical advances, tasks that were beyond it will become feasible. New business models and organisational forms will follow. Bosses in America, Australia, Britain and Germany may not have seen much impact from AI yet but the new survey shows they expect large job losses and productivity gains in the next three years.

It also helps not to get too carried away by the idea of an alien intelligence. To make AI work within organisations, a prosaic set of management problems needs to be solved. These include well-designed incentives for adoption, guardrails to mitigate problems, and systems for choosing, measuring and implementing applications. You need a mixture of pragmatism and ambition, says Mr Swanson. You need to be “a cynical optimist”. ■

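Argument analysis

Type: diagnosis

Core claim

Despite rapid advances in the technology itself, the adoption of generative AI in the workplace is proceeding incrementally rather than transformatively because of behavioural, technical and organisational barriers. Companies need to recognise these pitfalls and respond to them actively.

Logical structure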

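Premise: As the Pizza Hut in Plano shows, generative AI is making its way into every corner of the workplace, but its effect feels incremental rather than transformative.
Diagnosis: The impact of AI is very unevenly distributed. According to an analysis by Indeed, a large majority of the skills in a software-development role could be profoundly affected by AI, whereas the skills in jobs such as nursing are currently beyond the technology.
Evidence: OpenAI reported that ChatGPT Enterprise saves users an average of 40-60 minutes a day, and the newest models are approaching expert level on many real-world tasks.
Counterargument: In a joint survey by the Federal Reserve Bank of Atlanta, Macquarie University, the Bank of England and the Bundesbank, 86% of executives said that AI had had no impact on labour productivity over the past three years.
Diagnosis: Progress is slow because generative AI, like electricity, the internet and other general-purpose technologies, needs time to have its full impact. Bret Taylor, the chairman of OpenAI, describes the AI era as still being in its early stages.
Diagnosis: AI firms such as OpenAI and Anthropic are trying to close the gap between horizontal (general-purpose) and vertical (industry-specific) capabilities, but vertical capabilities remain hard to develop because the firms lack expertise in the work itself.
Diagnosis: Behavioural problems: employees have incentives to hide or avoid their use of AI, because they do not want to reveal that their jobs could be done by it. Ethan Mollick, a professor at the Wharton School, explains that employees have reasons both to use AI and to shun it.
Prescription: Alongside incentives such as bonuses, dashboards and performance reviews, executives must communicate honestly about uncertainty. Nimish Panchmatia, the head of AI at DBS, stresses transparent communication about both the jobs that will disappear and those that will be created, as well as retraining programmes.
Diagnosis: Another side of the behavioural problem is not apprehension but inconvenience. As Glowforge’s experience shows, an AI tool that does not fit naturally into employees’ workflow gets ignored.
Prescription: As with Glowforge’s approach, in which AI feedback is folded into a weekly discussion with managers, AI tools must be designed to fit the rhythm of work and embedded in the organisation’s culture for adoption to succeed.
Diagnosis: Overenthusiasm is also a problem. Because of AI’s “jagged frontier”, serious mistakes occur in the early stages of adoption, such as the error-ridden report from Deloitte’s Australian arm and the AI hallucination incident at Britain’s West Midlands police force.
Diagnosis: Technical problems: the hidden costs of adapting AI to a specific use (training on data, fine-tuning, suppressing hallucinations) are easily overlooked. Cameron Davies of Yum! Brands explains that using small language models focused on specific tasks reduces the scope for hallucination.
Prescription: Where errors are critical, safeguards must be put in place. Sierra uses a “supervisor model” to monitor its AI agents’ real-time interactions, with humans stepping in when needed.
Prescription: The more narrowly the tasks given to agents are defined, the more tractable the problems become. Bret Taylor stresses that limiting the scope of tasks is key to the success of AI agents.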

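Conclusion

Realising generative AI’s potential in the workplace requires, alongside advances in the technology itself, a systematic effort to overcome behavioural, organisational and technical barriers: building employee trust, integrating tools into workflows, putting technical safeguards in place and defining the scope of tasks clearly.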
