Last week, I mentioned that I was working through and implementing the paper “Generative Agents: Interactive Simulacra of Human Behavior.” I’ve made some progress that I thought I would log here.
Background
The paper implements “generative agents”: believable simulations of humans living in a town called Smallville. These agents act autonomously, performing human-like activities such as brushing their teeth, going to work, or grabbing lunch.
The paper describes the architecture of these agents as containing three major components:
Memory
Reflection
Planning
Memory
Each agent has a memory stream: a running list of everything the agent has experienced. A retrieval function takes a particular situation and returns the subset of the agent’s memories that the LLM should use to decide the agent’s reaction to that situation.
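As a rough sketch, a memory record only needs a handful of fields. The names below are my own, not the paper’s, and they anticipate the three retrieval factors described next:

```python
from dataclasses import dataclass
import datetime

@dataclass
class Memory:
    description: str                  # natural-language record of what happened
    created_at: datetime.datetime     # when the memory was formed
    last_accessed: datetime.datetime  # updated on retrieval; used for recency
    importance: float                 # 1-10 rating assigned by the LLM
    embedding: list[float]            # embedding of the description; used for relevance
```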
The retrieval function scores each memory on three factors to decide whether it should be sent to the language model:
Recency: How old the memory is.
Relevance: How relevant the memory is to the current situation, measured as similarity between OpenAI embeddings of the memory and the situation.
Importance: The LLM rates each memory on a scale of 1 to 10, where 1 is purely mundane (e.g., brushing teeth, making the bed) and 10 is extremely poignant (e.g., a breakup, a college acceptance).
These three scores are then summed to produce an overall score for each memory (the paper also normalizes and weights each factor), and the top-scoring memories are retrieved; a sketch follows below.
The code for this is in memory.py.
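For illustration, here is a minimal version of that retrieval, building on the Memory sketch above. The function names, the top-k cutoff, and the plain unweighted sum are my simplifications, and the decay constant is a placeholder for the exponential recency decay the paper describes:

```python
import math
import datetime

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score_memory(memory: Memory, query_embedding: list[float],
                 now: datetime.datetime) -> float:
    # Recency: exponential decay in the hours since the memory was last accessed.
    hours = (now - memory.last_accessed).total_seconds() / 3600
    recency = 0.995 ** hours  # placeholder decay rate

    # Relevance: similarity between the situation and the memory.
    relevance = cosine_similarity(memory.embedding, query_embedding)

    # Importance: the LLM's 1-10 rating, rescaled to [0, 1].
    importance = memory.importance / 10

    # The paper normalizes and weights each factor; a plain sum is a simplification.
    return recency + relevance + importance

def retrieve(memories: list[Memory], query_embedding: list[float],
             now: datetime.datetime, k: int = 5) -> list[Memory]:
    # Return the k highest-scoring memories for the current situation.
    ranked = sorted(memories,
                    key=lambda m: score_memory(m, query_embedding, now),
                    reverse=True)
    return ranked[:k]
```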
Reflection
The agents also have the ability to reflect on their experiences. These reflections are saved back into the memory stream, and they help the agents make more realistic long-term choices, since the agents get time to “think” about what they have experienced.
The paper triggers a reflection whenever the summed importance of the agent’s recent memories exceeds a threshold; a minimal sketch follows.
The code for this is in agent.py.
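Building again on the Memory sketch above, the trigger can be as simple as the following; the threshold value and the llm_reflect call are hypothetical placeholders:

```python
REFLECTION_THRESHOLD = 150  # placeholder; tune to taste

def maybe_reflect(agent, recent_memories: list[Memory]) -> None:
    # Reflect once the accumulated importance of recent events crosses the threshold.
    if sum(m.importance for m in recent_memories) >= REFLECTION_THRESHOLD:
        # Ask the LLM to draw higher-level insights from the recent memories,
        # then store those insights back into the memory stream.
        insights = agent.llm_reflect(recent_memories)  # hypothetical LLM call
        agent.memory_stream.extend(insights)
```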
Planning
The paper uses the LLM to make the agents plan their future actions, and these plans can be decomposed into as much detail as needed. I’ve currently only added a high-level, broad-strokes planning component, but I intend to improve it as things go on; a sketch of my version follows.
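My broad-strokes version boils down to one prompt per day, roughly like this; the prompt wording and the llm callable are my own, not the paper’s:

```python
def plan_day(llm, agent_summary: str, today: str) -> list[str]:
    # Ask the LLM for a handful of broad-strokes activities for the day.
    prompt = (
        f"{agent_summary}\n"
        f"Today is {today}. In broad strokes, list 5 to 8 things this person "
        f"does today, one per line, each with a rough time of day."
    )
    response = llm(prompt)  # hypothetical LLM call that returns a string
    return [line.strip() for line in response.splitlines() if line.strip()]
```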
So, how does it work?
Not that well so far. I tried simulating a single agent, “Robinson Crusoe”, a castaway on a remote island, and the agent keeps doing the same thing over and over.
Next steps
I’m going to keep improving the prompts and see if there are obvious bugs to fix. After that, I want to build a small graphical interface so that the agents can move around and interact with their environment; the paper does this using Phaser.