To create this cheatsheet, a variety of contributors provided resources, papers, and tools relevant to open foundation model development. These resources were grouped and curated by the team listed below, with the speech and vision modalities led by Nay San and Gabriel Ilharco, respectively.
Curators (Alphabetical):
- Data (Across Subcategories): David Adelani, Stella Biderman, Gabriel Ilharco, Kyle Lo, Shayne Longpre, Luca Soldaini, Nay San
- Data Cleaning, Filtering, & Mixing: Alon Albalak, Kyle Lo, Luca Soldaini
- Data Decontamination: Stella Biderman, Shayne Longpre
- Data Governance: Stella Biderman, Yacine Jernite, Sayash Kapoor
- Efficiency & Resource Allocation: Hailey Schoelkopf
- Environmental Impact: Peter Henderson, Sayash Kapoor, Sasha Luccioni
- General Capabilities: Rishi Bommasani, Shayne Longpre
- License Selection: Stella Biderman, Yacine Jernite, Kevin Klyman, Aviya Skowron, Daniel McDuff
- Model Documentation: Sayash Kapoor, Shayne Longpre
- Pretraining Repositories: Stella Biderman, Gabriel Ilharco, Nay San, Hailey Schoelkopf
- Reproducibility: Stella Biderman, Shayne Longpre
- Risks & Harms: Maribeth Rauh, Laura Weidinger, Bertie Vidgen
- Usage Monitoring: Kevin Klyman
- Website: Shayne Longpre, Luca Soldaini, Justin Riddiough
Advisors: Stella Biderman, Peter Henderson, Yacine Jernite, Sasha Luccioni, Percy Liang, Arvind Narayanan, Victor Sanh