AI Safety Fundamentals: Alignment

Imitative Generalisation (AKA ‘Learning the Prior’)

June 17, 2024 BlueDot Impact Season 13
Imitative Generalisation (AKA ‘Learning the Prior’)
AI Safety Fundamentals: Alignment
More Info
AI Safety Fundamentals: Alignment
Imitative Generalisation (AKA ‘Learning the Prior’)
Jun 17, 2024 Season 13
BlueDot Impact

This post tries to explain a simplified version of Paul Christiano’s mechanism introduced here, (referred to there as ‘Learning the Prior’) and explain why a mechanism like this potentially addresses some of the safety problems with naïve approaches. First we’ll go through a simple example in a familiar domain, then explain the problems with the example. Then I’ll discuss the open questions for making Imitative Generalization actually work, and the connection with the Microscope AI idea. A more detailed explanation of exactly what the training objective is (with diagrams), and the correspondence with Bayesian inference, are in the appendix.


Source:

https://www.alignmentforum.org/posts/JKj5Krff5oKMb8TjT/imitative-generalisation-aka-learning-the-prior-1


Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.

Show Notes Chapter Markers

This post tries to explain a simplified version of Paul Christiano’s mechanism introduced here, (referred to there as ‘Learning the Prior’) and explain why a mechanism like this potentially addresses some of the safety problems with naïve approaches. First we’ll go through a simple example in a familiar domain, then explain the problems with the example. Then I’ll discuss the open questions for making Imitative Generalization actually work, and the connection with the Microscope AI idea. A more detailed explanation of exactly what the training objective is (with diagrams), and the correspondence with Bayesian inference, are in the appendix.


Source:

https://www.alignmentforum.org/posts/JKj5Krff5oKMb8TjT/imitative-generalisation-aka-learning-the-prior-1


Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.

TL;DR
Goals of this post
Example: using IG to avoid overfitting in image classification.
Key difficulties for IG
Relationship with Microscope AI