Default Profiles
As you have seen in the previous tutorials, your systems are fully customizable in classy.
While we strongly encourage you to create your own configurations, we also provide a set of predefined and well-established profiles that achieve competitive performance in almost all settings and scenarios.
To use a profile, you just have to pass its name to the --profile parameter at training time:
classy train <task> <dataset-path> -n <model-name> --profile <profile_name>
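For example, assuming a sequence classification dataset stored at data/sentiment.tsv (a hypothetical path, used here purely for illustration), you could train a DistilBERT-based model like this:
classy train sequence data/sentiment.tsv -n sentiment-distilbert --profile distilbert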
distilbert 🌳 🚀
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 4GB |
Model and Optimization
Model | Optimizer |
---|---|
DistilBERT (📄 Paper, 🔨 Implementation) | Adafactor (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile distilbert
When should I use this profile?
- You want blazing-fast training and inference
- You want a quick run to evaluate your dataset and check for possible flaws
- You don't have a GPU with more than 4GB of VRAM at your disposal
- You will use the model in low-energy-consumption scenarios
distilroberta 🌳 🚀
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 4GB |
Model and Optimization
Model | Optimizer |
---|---|
DistilRoBERTa (🔨 Implementation) | Adafactor (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile distilroberta
When should I use this profile?
- You want blazing-fast training and inference
- You want a quick run to evaluate your dataset and check for possible flaws
- You don't have a GPU with more than 4GB of VRAM at your disposal
- You will use the model in low-energy-consumption scenarios
squeezebert 🌳 🚀
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 4GB |
Model and Optimization
Model | Optimizer |
---|---|
SqueezeBERT (📄 Paper, 🔨 Implementation) | Adafactor (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile squeezebert
When should I use this profile?
- You want blazing-fast training and inference
- You want a quick run to evaluate your dataset and check for possible flaws
- You don't have a GPU with more than 4GB of VRAM at your disposal
- You will use the model in low-energy-consumption scenarios
bert-base 🌲 🚄
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
BERT-base (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile bert-base
When should I use this profile?
- You want a trade-off between training/inference speed and model performance
- You want a well-established model for everyday use
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
gpt2 🌲 🚄
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
generation | English | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
GPT2 (📄 Paper, 🔨 Implementation) | Adam (📄 Paper, 🔨 Implementation) |
Train command
classy train [generation] my_dataset_path -n my_model --profile gpt2
When should I use this profile?
- You want an affordable (decoder-only) generative model for English
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
roberta-base 🌲 🚄
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
RoBERTa-base (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile roberta-base
When should I use this profile?
- You want a trade-off between training/inference speed and model performance
- You want a well-established model for everyday use
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
deberta-base 🌲 🚄
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
DeBERTa-base (📄 Paper, 🔨 Implementation) | RAdam (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile deberta-base
When should I use this profile?
- You want a trade-off between training/inference speed and model performance
- You want a recently released model with state-of-the-art performance on several NLU benchmarks
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
bart-base 🌲 🚄
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa, generation | English | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
BART-base (📄 Paper, 🔨 Implementation) | RAdam (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa|generation] my_dataset_path -n my_model --profile bart-base
When should I use this profile?
- You want a trade-off between training/inference speed and model performance
- You want to tackle an English generation task with an affordable model
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
multilingual-bert 🌲 🚄 🌏
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | 104 (Complete List) | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
mBERT (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile multilingual-bert
When should I use this profile?
- You require a multilingual model covering languages other than English
- You want a trade-off between training/inference speed and model performance
- You want a well-established model for everyday use
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
xlm-roberta-base 🌲 🚄 🌏
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | 100 (Complete list in the reference paper) | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
XLM-RoBERTa-base (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile xlm-roberta-base
When should I use this profile?
- You require a state-of-the-art multilingual model covering languages other than English
- You want a trade-off between training/inference speed and model performance
- You want a well-established model for everyday use
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
bert-large 🌵 🚜
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 11GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
BERT-large (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile bert-large --fp16
Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.
When should I use this profile?
- You want state-of-the-art performance, no compromise!
- You want to show how far you can go with the proper infrastructure
- You want a well-established model used by thousands of users
- You have a GPU with at least 11GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
roberta-large 🌵 🚜
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 11GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
RoBERTa-large (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile roberta-large --fp16
Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.
When should I use this profile?
- You want state-of-the-art performance, no compromise!
- You want to show how far you can go with the proper infrastructure
- You want a well-established model used by thousands of users
- You have a GPU with at least 11GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
deberta-large 🌵 🚜
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 11GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
DeBERTa-large (📄 Paper, 🔨 Implementation) | RAdam (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile deberta-large --fp16
Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.
When should I use this profile?
- You want state-of-the-art performance, no compromise!
- You want to show how far you can go with the proper infrastructure
- You want one of the latest released SotA models
- You have a GPU with at least 11GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
xlm-roberta-large 🌵 🚜 🌏
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | 100 (Complete list in the reference paper) | < 16GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
XLM-RoBERTa-large (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile xlm-roberta-large --fp16
Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.
When should I use this profile?
- You require a state-of-the-art multilingual model covering languages other than English, with no compromise
- You want to show how far you can go with the proper infrastructure
- You want a well-established model used by thousands of users
- You have a GPU with at least 16GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
gpt2-medium 🌵 🚜
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
generation | English | < 11GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
GPT2 (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [generation] my_dataset_path -n my_model --profile gpt2-medium
When should I use this profile?
- You want a medium-sized (decoder-only) generative model for English
- You have a GPU with at least 11GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
bart-large 🌵 🚜
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa, generation | English | < 11GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
BART-large (📄 Paper, 🔨 Implementation) | RAdam (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa|generation] my_dataset_path -n my_model --profile bart-large
When should I use this profile?
- You want state-of-the-art performance, especially on English generation problems, with no compromise!
- You want to show how far you can go with the proper infrastructure
- You want a well-established model used by thousands of users
- You have a GPU with at least 11GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
mbart 🌵 🏗️ 🌏
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa, generation | 25 (Complete list in the reference paper) | < 24GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
mBART (📄 Paper, 🔨 Implementation) | RAdam (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa|generation] my_dataset_path -n my_model --profile mbart
When should I use this profile?
- You want a state-of-the-art multilingual model, covering 25 languages and particularly suited for generation tasks (e.g. machine translation), with no compromise
- You want to show how far you can go with the proper infrastructure
- You want a well-established model used by thousands of users
- You have a GPU with at least 24GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
gpt2-large 🌵 🏗️
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
generation | English | < 24GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
GPT2 (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [generation] my_dataset_path -n my_model --profile gpt2-large
When should I use this profile?
- You want a large (decoder-only) generative model for English
- You have a GPU with at least 24GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions