Practitioners should consider using already-optimized codebases, especially in the pretraining phase, to ensure effective use of computational resources, capital, power, and effort. Existing open-source codebases targeted at foundation model pretraining can make pretraining significantly more accessible to new practitioners and serve as a shared home for efficiency techniques in model training.
10 Pretraining Repositories for Foundation Model Training
Pretraining Repositories
Levanter
Levanter is a JAX-based framework for training large language models (LLMs) and other foundation models that strives for legibility, scalability, and reproducibility.
GPT-NeoX
A library for training large language models, built on Megatron-DeepSpeed and Megatron-LM with an easier user interface. It has been used at massive scale on a variety of clusters and hardware setups.
Megatron-DeepSpeed
A library for training large language models, built on Megatron-LM and extended by Microsoft to support features of its DeepSpeed library.
Megatron-LM
One of the earliest open-source pretraining codebases for large language models. It is still actively updated and has underpinned a number of landmark distributed-training and parallelism research papers from NVIDIA.
OpenLM
OpenLM is a minimal language modeling repository, aimed at facilitating research on medium-sized LMs. The developers have verified its performance at up to 7B parameters and 256 GPUs. It depends only on PyTorch, xFormers, and Triton.
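To illustrate what a minimal, PyTorch-only pretraining step involves, here is a generic, hypothetical sketch of next-token-prediction training. The toy model and random token batch are placeholders and do not reflect OpenLM's actual classes or command-line interface:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy dimensions; real runs use far larger models and real tokenized data.
vocab_size, d_model, seq_len, batch = 1000, 128, 64, 8

class TinyLM(nn.Module):
    """Minimal decoder-only LM: embedding -> causal transformer blocks -> LM head."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # Causal mask so each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.lm_head(h)

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# One next-token-prediction step on random tokens (a stand-in for a real data loader).
tokens = torch.randint(0, vocab_size, (batch, seq_len))
logits = model(tokens[:, :-1])  # predict token t+1 from tokens up to t
loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()
opt.step()
opt.zero_grad()
print(f"loss: {loss.item():.3f}")
```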
PyTorch Image Models (timm)
A hub for model architectures, training scripts, and pre-trained weights for image classification models.
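Creating and running a pretrained classifier takes only a few lines; the following is a minimal sketch, assuming timm and torch are installed and pretrained weights can be downloaded:

```python
import timm
import torch

# Discover available architectures; list_models accepts wildcard patterns.
print(timm.list_models("convnext*")[:5])

# Instantiate a pretrained backbone with a custom classification head (10 classes).
model = timm.create_model("resnet50", pretrained=True, num_classes=10)
model.eval()

# timm classification models take NCHW float tensors and return class logits.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 10])
```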