December 29, 2023
Generative Agents to Simulate Human Decision-Making
Synthetic Humans?
In many cases, language models have proved effective at "reasoning" like humans. A natural
evolution of LLMs is generative agents: LLMs given a memory, a reasoning abstraction for
making decisions, and access to external tools they can use when needed.
In the last few months, an increasing number of researchers (notably Park, O'Brien, Cai,
et al., 2023; Vezhnevets, Agapiou, Aharon et al., 2023) popularized the idea of creating
virtual environments populated with generative agents.
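To make the memory/reasoning/tools decomposition concrete, here is a minimal toy sketch of that agent loop. Everything here is hypothetical and illustrative: the `stub_llm` function stands in for a real language-model call, and the class and tool names are invented for this example, not taken from any published framework.

```python
from dataclasses import dataclass, field

def stub_llm(prompt: str) -> str:
    # Hypothetical reasoning step: a real agent would send this prompt,
    # including retrieved memories, to a language model and parse its reply.
    return "check_calendar" if "meeting" in prompt else "reply"

@dataclass
class Agent:
    name: str
    memory: list = field(default_factory=list)   # episodic memory stream
    tools: dict = field(default_factory=dict)    # tool name -> callable

    def observe(self, event: str) -> None:
        self.memory.append(event)                # store every observation

    def act(self, situation: str) -> str:
        # Build a prompt from the situation plus recent memories,
        # let the (stubbed) model pick an action, then run the matching tool.
        prompt = situation + " | memories: " + "; ".join(self.memory[-3:])
        action = stub_llm(prompt)
        result = self.tools.get(action, lambda: "no-op")()
        self.observe(f"did {action} -> {result}")
        return action

agent = Agent("Ada", tools={"check_calendar": lambda: "free at 3pm"})
agent.observe("received a meeting request")
print(agent.act("a meeting was proposed"))  # -> check_calendar
```

Believability then hinges on how well the reasoning step and the memory retrieval approximate what a human in the same situation would do.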
If the agents are believable, i.e., their behavior is a reasonable representation of
human behavior, such simulations can allow us to make inferences about human
decision-making without conducting experiments on real humans. Several frameworks for
creating simulated environments have already been published, but there is not yet a
consensus on how to run systematic in-silico experiments and draw inferences from them.
The Epistemic Question
I want to digress about the epistemic question: "By what standard should we judge
whether (and in what ways, and under which conditions) the results of in-silico experiments
are likely to generalize to the real world?" (formulation taken from Vezhnevets,
Agapiou, Aharon, et al., 2023). We are at such an early stage of generative agents that
we have no means to answer this question fully.
Consider the tools we currently have to simulate human behavior:
- Economic models: agents are usually Homo economicus, rational actors
maximizing economic benefit. Valid for many scenarios, but often a poor representation
of actual human reasoning, especially in social situations.
- Sociological and psychological models: these focus on how humans behave
in social situations, but are constrained by the number of experiments you can run on
humans and by the challenge of generalizing findings to a larger population.
In both cases, the models were derived from observed past human behavior, with researchers
extrapolating rules and applying prior beliefs. If we build models on top of LLMs, we are
doing something analogous: we use past observed human behavior (LLMs are trained on human
artifacts from the web), the results will not always generalize, and the simulation's
design will introduce biases. These are all downsides we already accept in models we
trust, such as Supply and Demand or System 1 and System 2 thinking.
Where Simulations Could Be Used
Assuming a believable representation of human beings, here are scenarios where simulations
could eventually be used:
- Single individuals: before important decisions, simulate millions
of outcomes with different seed initializations to estimate probabilities. What is the
best strategy for a salary negotiation given my manager's profile?
- Smaller communities: what incentives build a stronger volunteer
community? What happens when you start hiring paid staff?
- Companies: before launching a feature or running A/B tests, use a
sandbox to form initial hypotheses. If I add more ads, will usage drop?
- Governments: if I am the mayor of a city, what happens if I change
immigration policy? What are the second-order effects on the local economy?
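The "simulate millions of outcomes with different seed initializations" idea in the first scenario is essentially Monte Carlo estimation. A minimal sketch, with a toy random model (`negotiation_sim`, invented for this example) standing in for a full generative-agent simulation:

```python
import random

def negotiation_sim(strategy: str, seed: int) -> bool:
    # Stand-in for one full agent-based simulation run under a given seed.
    # Hypothetical assumption for illustration: "anchor high" succeeds 60%
    # of the time, "wait for an offer" only 40%.
    rng = random.Random(seed)
    p_success = 0.6 if strategy == "anchor_high" else 0.4
    return rng.random() < p_success

def estimate_success(strategy: str, n_runs: int = 10_000) -> float:
    # Re-run the simulation under many seeds and report the success rate.
    wins = sum(negotiation_sim(strategy, seed) for seed in range(n_runs))
    return wins / n_runs

for strategy in ("anchor_high", "wait_for_offer"):
    print(strategy, round(estimate_success(strategy), 2))
```

The expensive part in practice is of course each simulation run, not the aggregation; the value comes from whether the per-run behavior is a believable stand-in for the humans involved.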
Experiments I Find Interesting
- Morality: how do group dynamics affect moral choices? What happens
when a moral agent is surrounded by self-interested agents? Does conformity emerge?
- External tools and APIs: how do agents interact with chat apps,
email, and calendars? This might reveal simulations' potential as a test bed for
software products.
- AGI scenarios: visualize what happens as an AI's capabilities are
gradually increased inside a simulation. Does it reach a point where it becomes
detrimental? Potentially useful for alignment research.
Next Steps
- Choose or build a framework for the simulated environment
- Choose an abstraction for the reasoning and memory of agents
- Create a strategy to assess the believability of the simulations
- Select a handful of scenarios that could yield interesting emergent behavior
- Evaluate: are the agents believable, and if so, what can we infer about human
decision-making from the results?