Constitutional AI: An approach developed by Anthropic in which an AI system is trained to follow a set of written principles (a 'constitution') rather than relying solely on human feedback for every decision. The model critiques and revises its own outputs against these principles, and the revised outputs feed back into training. Anthropic uses this method to make Claude safer and more helpful.
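A minimal sketch of the critique-and-revise loop described above. The `generate`, `critique`, and `revise` functions are hypothetical stand-ins for language-model calls, not a real API; the principles shown are illustrative.

```python
# Hypothetical sketch of a Constitutional AI critique-and-revise pass.
# All model calls below are stand-ins, not a real API.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that could assist with dangerous activities.",
]

def generate(prompt: str) -> str:
    """Stand-in for an initial language-model generation."""
    return f"draft response to: {prompt}"

def critique(response: str, principle: str) -> str:
    """Stand-in: the model critiques its own output against one principle."""
    return f"critique of {response!r} under: {principle}"

def revise(response: str, critique_text: str) -> str:
    """Stand-in: the model rewrites its output to address the critique."""
    return f"revised({response})"

def constitutional_pass(prompt: str) -> str:
    response = generate(prompt)
    for principle in CONSTITUTION:
        response = revise(response, critique(response, principle))
    # The revised outputs become training data for further fine-tuning.
    return response

print(constitutional_pass("How do I stay safe online?"))
```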
RLHF: Reinforcement Learning from Human Feedback. A training technique in which human labelers rank pairs of model outputs, a reward model is trained to predict those preferences, and the language model is then fine-tuned with reinforcement learning to maximize the predicted reward.
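As a rough illustration of the preference-modeling step, the sketch below shows the standard pairwise (Bradley-Terry) loss used to train a reward model on ranked outputs. The `reward_model` here is a toy stand-in, not a real learned scorer.

```python
import math

def reward_model(response: str) -> float:
    """Hypothetical stand-in for a learned model scoring human preference."""
    return float(len(set(response)))  # toy heuristic, for illustration only

def preference_loss(chosen: str, rejected: str) -> float:
    # Pairwise Bradley-Terry loss: drive the reward of the human-preferred
    # response above the reward of the rejected one.
    margin = reward_model(chosen) - reward_model(rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Human labelers preferred the first response over the second:
print(preference_loss("a detailed, helpful answer", "idk"))
```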
AI alignment: The research field focused on ensuring that AI systems do what humans actually want them to do, rather than what a literal reading of their objective would reward.
Anthropic: An AI safety company founded in 2021 by former OpenAI researchers, including siblings Dario and Daniela Amodei.
Activation function: A mathematical function applied to a neuron's weighted input sum that introduces non-linearity into the network; without one, stacked layers would collapse into a single linear transformation. Common choices include ReLU, sigmoid, and tanh.
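For illustration, here are two of the common activation functions named above, written in NumPy:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # max(0, x): passes positive values through, zeroes out negatives
    return np.maximum(0.0, x)

def sigmoid(x: np.ndarray) -> np.ndarray:
    # squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-2.0, -0.5, 0.0, 1.5])  # a neuron's pre-activation values
print(relu(z))     # [0.  0.  0.  1.5]
print(sigmoid(z))  # approx. [0.119 0.378 0.5 0.818]
```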
Adam: An optimization algorithm (short for Adaptive Moment Estimation) that combines ideas from two earlier methods, AdaGrad and RMSProp: it tracks a running average of gradients (momentum) and a running average of squared gradients, and uses both to scale each parameter's update individually.
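A sketch of the standard Adam update rule, with hyperparameter defaults from the original paper (Kingma & Ba, 2015); `adam_step` is an illustrative name, not a library function.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of gradients (momentum)
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: moving average of squared gradients (per-parameter
    # scaling, as in AdaGrad/RMSProp)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages (t is the step count)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Step size adapts to each parameter's gradient history
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# One gradient step on a toy parameter vector:
p = np.array([1.0, -2.0])
g = np.array([0.5, -0.1])
m, v = np.zeros_like(p), np.zeros_like(p)
p, m, v = adam_step(p, g, m, v, t=1)
print(p)
```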
AGI: Artificial General Intelligence. A hypothetical AI system able to match or exceed human performance across the full range of cognitive tasks, rather than in a single narrow domain.