Essential reading on AI alignment, fairness, interpretability and responsible development.
14 items
Comprehensive safety overview
RLHF methodology
Self-supervised alignment
Fairness evaluation
Scaling considerations
Stuart Russell's framing
Anthropic's research
Hallucination and truthfulness
Emergent behavioral risks
Safety benchmarks
Map of active research areas
Foundational AI safety paper by Amodei et al.
Research organization for reducing AI risks
Government AI risk standards