December 29, 2023
Generative Agents to Simulate Human Decision-Making
Synthetic Humans?
In many cases, language models have proved effective at "reasoning" like humans. A natural
evolution of LLMs is generative agents: LLMs given a memory, a reasoning abstraction for
making decisions, and access to external tools they can use when needed.
In the last few months, an increasing number of researchers (notably Park, O'Brien, Cai,
et al., 2023; Vezhnevets, Agapiou, Aharon et al., 2023) popularized the idea of creating
virtual environments populated with generative agents.
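To make the memory/reasoning/tools decomposition concrete, here is a minimal toy sketch of that agent loop. Everything here is hypothetical and illustrative: the `stub_llm` function stands in for a real language-model call, and the class and tool names are invented for this example, not taken from any published framework.

```python
from dataclasses import dataclass, field

def stub_llm(prompt: str) -> str:
    # Hypothetical reasoning step: a real agent would send this prompt,
    # including retrieved memories, to a language model and parse its reply.
    return "check_calendar" if "meeting" in prompt else "reply"

@dataclass
class Agent:
    name: str
    memory: list = field(default_factory=list)   # episodic memory stream
    tools: dict = field(default_factory=dict)    # tool name -> callable

    def observe(self, event: str) -> None:
        self.memory.append(event)                # store every observation

    def act(self, situation: str) -> str:
        # Build a prompt from the situation plus recent memories,
        # let the (stubbed) model pick an action, then run the matching tool.
        prompt = situation + " | memories: " + "; ".join(self.memory[-3:])
        action = stub_llm(prompt)
        result = self.tools.get(action, lambda: "no-op")()
        self.observe(f"did {action} -> {result}")
        return action

agent = Agent("Ada", tools={"check_calendar": lambda: "free at 3pm"})
agent.observe("received a meeting request")
print(agent.act("a meeting was proposed"))  # -> check_calendar
```

Believability then hinges on how well the reasoning step and the memory retrieval approximate what a human in the same situation would do.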
If the agents are believable, i.e., their behavior is a reasonable representation of
human behavior, such simulations can allow us to make inferences about human
decision-making without conducting experiments on real humans. Several frameworks for
creating simulated environments have already been published, but there is not yet a
consensus on how to run systematic in-silico experiments and draw inferences from them.
The Epistemic Question
I want to digress about the epistemic question: "By what standard should we judge
whether (and in what ways, and under which conditions) the results of in-silico experiments
are likely to generalize to the real world?" (formulation taken from Vezhnevets,
Agapiou, Aharon, et al., 2023). We are at such an early stage of generative agents that
we have no means to answer this question fully.
Consider the tools we currently have to simulate human behavior:
- Economic models: agents are usually Homo economicus, rational actors
maximizing economic benefit. Valid for many scenarios, but often a poor representation
of actual human reasoning, especially in social situations.
- Sociological and psychological models: these focus on how humans behave
in social situations, but are constrained by the number of experiments you can run on
humans and by the challenge of generalizing findings to a larger population.
In both cases, the models were derived from observed past human behavior, with researchers
extrapolating rules and applying prior beliefs. If we build models on top of LLMs, we are
doing something analogous: we use past observed human behavior (LLMs are trained on human
artifacts from the web), the results will not always generalize, and the simulation's
design will introduce biases. These are all downsides we already accept in models we
trust, such as Supply and Demand or System 1 and System 2 thinking.
Where Simulations Could Be Used
Assuming a believable representation of human beings, here are scenarios where simulations
could eventually be used:
- Single individuals: before important decisions, simulate millions
of outcomes with different seed initializations to estimate probabilities. What is the
best strategy for a salary negotiation given my manager's profile?
- Smaller communities: what incentives build a stronger volunteer
community? What happens when you start hiring paid staff?
- Companies: before launching a feature or running A/B tests, use a
sandbox to form initial hypotheses. If I add more ads, will usage drop?
- Governments: if I am the mayor of a city, what happens if I change
immigration policy? What are the second-order effects on the local economy?
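The "simulate millions of outcomes with different seed initializations" idea in the first scenario is essentially Monte Carlo estimation. A minimal sketch, with a toy random model (`negotiation_sim`, invented for this example) standing in for a full generative-agent simulation:

```python
import random

def negotiation_sim(strategy: str, seed: int) -> bool:
    # Stand-in for one full agent-based simulation run under a given seed.
    # Hypothetical assumption for illustration: "anchor high" succeeds 60%
    # of the time, "wait for an offer" only 40%.
    rng = random.Random(seed)
    p_success = 0.6 if strategy == "anchor_high" else 0.4
    return rng.random() < p_success

def estimate_success(strategy: str, n_runs: int = 10_000) -> float:
    # Re-run the simulation under many seeds and report the success rate.
    wins = sum(negotiation_sim(strategy, seed) for seed in range(n_runs))
    return wins / n_runs

for strategy in ("anchor_high", "wait_for_offer"):
    print(strategy, round(estimate_success(strategy), 2))
```

The expensive part in practice is of course each simulation run, not the aggregation; the value comes from whether the per-run behavior is a believable stand-in for the humans involved.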
Experiments I Find Interesting
- Morality: how do group dynamics affect moral choices? What happens
when a moral agent is surrounded by self-interested agents? Does conformity emerge?
- External tools and APIs: how do agents interact with chat apps,
email, and calendars? This might reveal simulations' potential as a test bed for
software products.
- AGI scenarios: visualize what happens as an AI's capabilities are
gradually increased inside a simulation. Does it reach a point where it becomes
detrimental? Potentially useful for alignment research.
Next Steps
- Choose or build a framework for the simulated environment
- Choose an abstraction for the reasoning and memory of agents
- Create a strategy to assess the believability of the simulations
- Select a handful of scenarios that could yield interesting emergent behavior
- Evaluate: are the agents believable, and if so, what can we infer about human
decision-making from the results?