SatisfIA

The SatisfIA project

We are an interdisciplinary research team developing aspiration-based designs for intelligent agents. The project is hosted by the FutureLab on Game Theory & Networks of Interacting Agents at PIK in collaboration with the AI Safety Camp, SPAR, and ENS Paris, led by Jobst Heitzig. We are currently looking for funding and are open for further collaborations.

Our original contribution to AI safety research is our focus on non-maximizing agents. The project’s approach diverges from traditional AI designs that are based on the idea of maximizing objective functions, which is unsafe if these objective functions are not perfectly aligned with actually desired outcomes.

Instead, SatisfIA’s AI agents are designed to fulfill goals specified through constraints known as aspirations, reducing the likelihood of extreme actions and increasing safety.

This is part of a broader agenda of designing agents in safer ways that you can learn about in this talk at the ENS Paris. There's also an earlier interview on Will Petillo's Y*uTube channel where Jobst talks about the rationale of non-maximizing (and also about “satisficing”, an alternative but related idea to this project's, see below).

Research focus

Our research explores various designs for non-maximizing agents that operate in complex environments under several forms on uncertainty. These environments are modelled as fully or partially observed Markov Decision Processes. We also develop corresponding planning and/or learning algorithms, primarily variations of model-based planning and reinforcement learning. Our first academic publication is about such an algorithm.

This involves the theoretical design of agents and algorithms, implementing them in software (mostly using Python and/or WebPPL), simulating their behavior in test environments such as AI safety gridworlds, and analyzing their behavior and safety implications. The goal is to provide numerical evidence and formal proofs where possible, and to contribute to the academic and non-academic discourse through publications such as this theory paper and a sequence of blog posts.

Motivation and background

The motivation for this research stems from a growing consensus among AI researchers and the public about the potential existential threat posed by powerful AI agents if not properly aligned with human values and safety concerns.
In the worst case scenario it won’t be possible to align AIs with human values, since

there might not be a set of universally acceptable values,
they might not be formalizable,
if they are, it might not be possible to discover or implement them before risk-factors start appearing.

By adopting a non-maximizing, aspiration-based approach, we aim to address these safety concerns directly by helping designing AI agents that operate in a safer and maybe also more predictable way.

Theoretical foundations

Aspiration-based goals

In contrast to traditional AI, we do not think that one should give a powerful AI system the goal or task to take that action which results in the largest possible value of some objective function (“maximizing”). Instead, we assume that goals and tasks for powerful AI systems should be formulated without any reference to such a function at all. Instead, goals should be specified by requesting that certain observables or variables should fall into certain desirable ranges.

E.g., a powerful climate-managing AI system should rather be given a task such as “keep global mean temperature in the range between zero and two degrees warming without decreasing global GDP”, rather than “maximize this or that complicated function of temperature and GDP”.

These desirability ranges are called aspirations in our project. Since aspiration-based goals are typically easier to fulfill than maximizing a certain objective function, the agent gains some freedom in choosing how exactly to achieve them. Hence we can design agents so that they use this freedom to fulfill the goals in ways that are generally safer than what a maximizing agent would do.

In order to find these safer ways of satisfying the aspirations, our designs make use of a number of safety criteria such as

how much the agent changes or disturbs its environment,
how predictable or conventional its actions are,
how much influence or resources it acquires or uses.

This approach is fundamentally different from the alternative approach of “satisficing”

In the satisficing approach, the agent does have an objective function, which has the general interpretation of “more is better”, but which might still not reflect all aspects of the agent’s preferences. In such a case, bringing about the maximum of that imperfect objective function would incur too large costs of a type not included in the specification of the objective function, and would thus not be optimal or even advisable.
A satisficing agent would thus choose to not go all the way towards the maximum of the imperfect objective function, but would rather stop searching for further increases in the objective function once a certain value of that function has been reached.

Goodhart’s law

Even though behavioral science suggests that human behavior can sometimes be well modeled as “satisficing” behavior, we do not adopt a “satisficing” approach here because it does not address another common safety risk from using objective functions: “Goodhart’s law.”

Goodhart’s law serves as a critical caution in the formulation and pursuit of goals, especially in complex systems such as those a general-purpose AI system will typically find itself in. The law consists in the finding that, usually,

when a measure becomes a target, it ceases to be a good measure.

This principle highlights a fundamental risk in setting objectives, particularly in AI-driven endeavors. By making a specific measure the objective function of an AI agent’s actions, we inadvertently shift its goals. The agent, in striving to maximize that objective function, may exploit loopholes or take shortcuts that align with the letter of the goal but deviate from its intended spirit. This will usually lead to unintended consequences and side-effects, where the pursuit of a narrowly defined objective overshadows broader, not explicitly specified considerations, such as ethical implications, societal impact, or long-term sustainability.

In the context of our research, Goodhart’s law underscores the importance of designing AI agents whose goals are not the full or partial maximization of some objective function. Instead, by embracing aspiration-based designs, we aim to create systems that are inherently more aligned with holistic and adaptable criteria.
This approach seeks to mitigate the risks associated with Goodhart’s law by ensuring that the metrics used to guide AI behavior are not fixed targets but rather flexible aspirations that encourage the agent to consider a wider range of outcomes and behaviors. Thus, our project recognizes and addresses the challenge posed by Goodhart’s law, advocating for a more nuanced and safety-conscious strategy in the development of AI systems.

Challenges

Dealing with different forms of uncertainty, incoming data, changing circumstances or goals, chance, and good or bad luck.
Adapting existing model-based planning and reinforcement learning algorithms from the maximizing context into the aspiration-based context.
Dealing with goals specified as constraints on several variables, such as temperature and precipitation, without merging them into one joint indicator.

Contributions and Impact

By advancing the understanding and implementation of non-maximizing, aspiration-based AI agents, the SatisfIA project aims to contribute significantly to the field of AI safety. The project’s outcomes will include well-documented software components, academic papers, and educational materials to disseminate our findings and encourage further research in this vital area.

In essence, SatisfIA is pioneering a shift towards safer AI by embedding the principles of aspiration-based decision making at the core of AI agent design. It can be seen as a form of AI that is “safe-by-design”. This research not only challenges conventional maximization paradigms but also opens new avenues for creating AI systems that are not contradicting human values and safety requirements. Not the least, this research poses interesting mathematical and engineering questions!

The Team

Our team currently consists of about 15 volunteers with various backgrounds, organized into small, partially overlapping sub-teams of two-to-three persons, working part-time on: