To create this cheatsheet, a variety of contributors provided resources, papers, and tools relevant to open foundation model development. A team then grouped and curated these resources, with the speech and vision modalities led by Nay San and Gabriel Ilharco, respectively.

Curators (Alphabetical):

  • Data (Across Subcategories): David Adelani, Stella Biderman, Gabriel Ilharco, Kyle Lo, Shayne Longpre, Luca Soldaini, Nay San
  • Data Cleaning, Filtering, & Mixing: Alon Albalak, Kyle Lo, Luca Soldaini
  • Data Decontamination: Stella Biderman, Shayne Longpre
  • Data Governance: Stella Biderman, Yacine Jernite, Sayash Kapoor
  • Efficiency & Resource Allocation: Hailey Schoelkopf
  • Environmental Impact: Peter Henderson, Sayash Kapoor, Sasha Luccioni
  • General Capabilities: Rishi Bommasani, Shayne Longpre
  • License Selection: Stella Biderman, Yacine Jernite, Kevin Klyman, Aviya Skowron, Daniel McDuff
  • Model Documentation: Sayash Kapoor, Shayne Longpre
  • Pretraining Repositories: Stella Biderman, Gabriel Ilharco, Nay San, Hailey Schoelkopf
  • Reproducibility: Stella Biderman, Shayne Longpre
  • Risks & Harms: Maribeth Rauh, Laura Weidinger, Bertie Vidgen
  • Usage Monitoring: Kevin Klyman
  • Website: Shayne Longpre, Luca Soldaini, Justin Riddiough

Advisors: Stella Biderman, Peter Henderson, Yacine Jernite, Sasha Luccioni, Percy Liang, Arvind Narayanan, Victor Sanh