Derek Leben, Professor and President of Ethical Algorithms, examines the rise of Shadow AI and how organizations can responsibly navigate Human-AI hybrid work.
Recently, I was speaking with a paralegal friend who had just spent four hours of her workday doing tedious paperwork for adoption cases. The paperwork involved taking the template for a court briefing and replacing all the relevant details, using a sheet with information about the adoptive parents and child. After a moment of hesitation, I asked: “What would you say if I told you that pretty much any frontier large language model could do this for you in a small fraction of the time?” Her first response was: “I would hope that my bosses don’t find out.”
This attitude is an increasingly common one. As workers realize that large parts of their work can be handled by AI, they are hiding it from their managers, a phenomenon with the ominous label: Shadow AI. The fears driving Shadow AI are that managers will respond by assigning workers additional tasks or even cutting entire jobs. Given the historical response of management to new technologies, these fears are justified. However, shifting entire jobs to full automation will be unwise in most cases. Instead of simply choosing between humans or AI, responsible organizations should prepare for the coming wave of Human-AI hybrid work (H-AI work) in a way that promotes more engagement with AI from workers, rather than less.
It’s becoming clear that a significant percentage of tasks in most jobs can be done more effectively by AI, and incorporating AI in even minimal ways can greatly increase productivity (see also reports from organizations like Brookings and McKinsey). For example, I sat down with my paralegal friend and uploaded two documents to GPT-4o: the court document template and a sample of fake information about adoptive parents and children. Then we gave it some careful instructions, and in less than thirty seconds, it generated a complete court filing. Looking through it, my friend was impressed with how it interpreted information rather than just filling in the blanks, recognizing things like “no specific religious affiliation” from the information sheet (even though those words did not explicitly appear) and correctly changing the adopted child’s last name to the parents’. There were different templates for adoptive parents with various backgrounds (with children, without children, etc.), but these weren’t necessary; any large foundation model at the level of GPT-4 or above is more than capable of adapting a template to different circumstances, which is what makes Generative AI such an amazing advance.
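For readers curious what this workflow looks like in practice, here is a minimal sketch using the OpenAI Python client, under the assumption that the template and the information sheet are plain text files; the file names, the prompt wording, and the fill_template helper are illustrative, not the exact setup we used.

```python
# A minimal sketch of the template-filling workflow described above.
# File names and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def fill_template(template_path: str, info_path: str) -> str:
    """Ask the model to adapt a court-filing template to one case's details."""
    with open(template_path, encoding="utf-8") as f:
        template = f.read()
    with open(info_path, encoding="utf-8") as f:
        case_info = f.read()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are assisting a paralegal. Fill in the court-filing "
                    "template using only the case information provided. Adapt "
                    "wording where the template does not match the family's "
                    "circumstances, and never invent details."
                ),
            },
            {
                "role": "user",
                "content": f"TEMPLATE:\n{template}\n\nCASE INFORMATION:\n{case_info}",
            },
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    draft = fill_template("adoption_template.txt", "case_info.txt")
    print(draft)  # a human reviews this draft before anything is filed
```

Because the case details travel in the prompt rather than requiring any retraining, a general-purpose model can adapt the same template to very different families, which is exactly the flexibility described above.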
On the other hand, there are also serious safety and liability risks that come with using Generative AI to perform tasks with minimal human supervision. This is mostly what my own teaching and research focus on. My paralegal friend noticed a few mistakes in the sample adoption briefing that GPT-4o generated, including writing the adoption date as the child’s birthdate. This is not a typical error; it’s a strange one that a human wouldn’t have made. It’s also a very serious error that could cause real harm if not flagged by a human. These errors can be minimized by more careful prompt engineering, like adding “remember that the birthdate of the child is different from the child’s adoption date,” but this is a clumsy whack-a-mole approach that will never replace human supervisors. When the inevitable errors do occur, the liability question of whom to blame also becomes more challenging, and this is another reason why it’s important to preserve a human agent as an identifiable responsible party.
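To make concrete why prompt-level guardrails help but cannot substitute for a human reviewer, here is a small hypothetical sketch of the kind of rule-based check an organization might run on a generated filing before it reaches a person; the field names and dates are invented for illustration.

```python
# Hypothetical sanity checks run on a model-generated adoption filing
# before it reaches a human reviewer. Field names are illustrative.
from datetime import date

def flag_for_review(draft_fields: dict, case_fields: dict) -> list[str]:
    """Return a list of issues a human must resolve before filing."""
    issues = []

    # The error described above: the adoption date copied into the birthdate field.
    if draft_fields.get("child_birthdate") == case_fields.get("adoption_date"):
        issues.append("Child's birthdate matches the adoption date; verify both.")

    # Any field the model filled in that differs from the source information sheet.
    for field, expected in case_fields.items():
        if field in draft_fields and draft_fields[field] != expected:
            issues.append(f"Field '{field}' differs from the information sheet.")

    return issues

# Example: a draft where the model repeated the adoption date as the birthdate.
case = {"child_birthdate": date(2019, 3, 4), "adoption_date": date(2024, 6, 1)}
draft = {"child_birthdate": date(2024, 6, 1), "adoption_date": date(2024, 6, 1)}
print(flag_for_review(draft, case))
```

A checklist like this only catches the errors someone thought to anticipate; the point of the final human review is precisely to catch the strange, unanticipated ones.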
As Ethan Mollick argues in his recent book, Co-Intelligence: Living and Working with AI, organizations should start categorizing some tasks as eligible for full automation, others as humans-only, and a third category as “delegated” to AI systems with a human in the loop. This approach to H-AI work seems to get us the best of all worlds: the productivity benefits of AI and the responsibility of humans. However, the devil is in the details, and “human-in-the-loop” covers a wide range of ways that human labor can engage with AI labor, from weak engagement to strong engagement. This is similar to the scale developed by the Society of Automotive Engineers for how engaged drivers should be when an AI system is driving the car:
Level 0, “No Assistance”: The driver performs all driving tasks.
Level 1, “Driver Assist”: The driver performs driving tasks with assistance from built-in safety features, and maintains control of the vehicle.
Level 2, “Partial Automation”: The driver must remain alert, engaged with the environment, and in control; the vehicle can perform one or more driving tasks simultaneously.
Level 3, “Conditional Automation”: The vehicle monitors the environment and can perform driving tasks. The driver is not required to remain constantly alert, but must be ready to take control with notice.
Level 4, “High Automation”: The vehicle performs all driving tasks and monitors the environment under limited conditions. The driver may still be required to intervene.
Level 5, “Full Automation”: The vehicle performs all driving functions under all conditions. No driver is required.
Using this analogy, strong engagement will look more like the lower levels on this scale (1-2): constant human monitoring and review for every decision made by the AI. Weak engagement might just look like the higher levels on this scale (3-4): humans checking in randomly to make sure that AI systems are meeting performance and safety metrics.
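To illustrate how an organization might turn this scale into an explicit policy rather than leaving the choice to individual workers, here is a brief sketch; the engagement levels mirror the scale above, while the task names and routing rules are purely hypothetical.

```python
# A sketch of how an organization might encode engagement levels as explicit policy.
# Task names and routing rules are hypothetical.
from enum import Enum

class Engagement(Enum):
    HUMANS_ONLY = "humans_only"          # no AI involvement
    STRONG_HYBRID = "strong_hybrid"      # AI drafts, a human reviews every output (levels 1-2)
    WEAK_HYBRID = "weak_hybrid"          # AI acts, humans spot-check periodically (levels 3-4)
    FULL_AUTOMATION = "full_automation"  # AI acts with no routine human review (level 5)

# Each task type is assigned an engagement level by policy, not by the worker ad hoc.
TASK_POLICY = {
    "draft_court_filing": Engagement.STRONG_HYBRID,
    "summarize_meeting_notes": Engagement.WEAK_HYBRID,
    "advise_client_on_settlement": Engagement.HUMANS_ONLY,
}

def route_task(task: str) -> str:
    """Decide what must happen to an AI output for this task before it is used."""
    level = TASK_POLICY.get(task, Engagement.HUMANS_ONLY)  # unknown tasks default to the safest category
    if level is Engagement.HUMANS_ONLY:
        return "perform manually; do not use AI output"
    if level is Engagement.STRONG_HYBRID:
        return "send every AI draft to a named human reviewer before use"
    if level is Engagement.WEAK_HYBRID:
        return "use AI output; audit a random sample on a schedule"
    return "use AI output; monitor aggregate performance and safety metrics"

print(route_task("draft_court_filing"))
```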
It may seem like the best balance between productivity and safety/liability is a weak hybrid approach (levels 3-4), and this is indeed what many organizations are currently moving towards in their corporate policies. However, I believe this is the wrong approach, because weak engagement in H-AI work introduces a new type of risk: a false sense of vigilance. Asking a human worker to monitor an AI system sounds good in theory, but research has shown that it does not work in practice. People tend to trust an AI system more and more over time, a phenomenon known as automation complacency. Drivers told to monitor the behavior of an autonomous vehicle often fall asleep. Radiologists miss more cases of cancer when they use an automated system than without one. Students using AI may perform worse on standardized tests compared with those who use no tools. And there is, of course, the most famous example of overreliance on automation: in 2009, the pilots of Air France 447 failed to take control in time from their automated system, leading to a fatal crash. Some research suggests that this is a systemic problem in weak hybrid approaches to automating tasks, and that AI systems pose an especially high risk of automation complacency. Because of this, I recommend that organizations move instead towards a strong hybrid approach (levels 1-2), rather than a weak one. As surprising as it sounds, it may even be safer to have a task fully automated than to use a weak hybrid approach, since at least a fully automated task will lead to more vigilance downstream!
In summary, the trade-offs among these approaches to H-AI work are straightforward: humans-only work forgoes the productivity gains of AI, full automation delivers productivity but raises liability questions, strong hybrid approaches combine productivity with human vigilance and accountability, and weak hybrid approaches lead to a loss of vigilance along with safety and liability risks.
Clearly, H-AI work might take us to either the best possible outcome or the worst possible outcome, depending on how we manage it. Organizations must find a path to strong engagement in H-AI work, while avoiding the deathtrap of false vigilance in weak engagement. This is one of the goals of current academic and industry management research, and a top priority among my colleagues at the Tepper School of Business (see recent examples here and here). Finding the right management tools to chart this path will be one of the key challenges faced by organizations in the coming decade.