
Default Profiles

As you have seen in the previous tutorials, your systems are fully customizable in classy. While we strongly encourage you to create your own configurations, we also provide a set of predefined and well-established profiles that deliver competitive performance in almost all settings and scenarios.

tip

To use a profile, just pass its name to the --profile parameter at training time:

classy train <task> <dataset-path> -n <model-name> --profile <profile_name>
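
For instance, a concrete invocation could look like the following (the dataset path and model name here are placeholders for illustration, and any of the profiles listed below can be substituted):

classy train token data/my-token-dataset -n my-token-model --profile distilbert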

distilbert 🌳 🚀

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: English
Required VRAM: < 4GB
Model and Optimization
Model: DistilBERT (📄 Paper | 🔨 Implementation)
Optimizer: Adafactor (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile distilbert
When should I use this profile?
  • You want blazing-fast training and inference
  • You want a quick run to evaluate your dataset and check for possible flaws
  • You don't have a GPU with more than 4GB of VRAM at your disposal
  • You will use the model in low energy consumption scenarios

distilroberta 🌳 🚀

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: English
Required VRAM: < 4GB
Model and Optimization
Model: DistilRoBERTa (🔨 Implementation)
Optimizer: Adafactor (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile distilroberta
When should I use this profile?
  • You want blazing-fast training and inference
  • You want a quick run to evaluate your dataset and check for possible flaws
  • You don't have a GPU with more than 4GB of VRAM at your disposal
  • You will use the model in low energy consumption scenarios

squeezebert 🌳 🚀

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: English
Required VRAM: < 4GB
Model and Optimization
Model: SqueezeBERT (📄 Paper | 🔨 Implementation)
Optimizer: Adafactor (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile squeezebert
When should I use this profile?
  • You want blazing-fast training and inference
  • You want a quick run to evaluate your dataset and check for possible flaws
  • You don't have a GPU with more than 4GB of VRAM at your disposal
  • You will use the model in low energy consumption scenarios

bert-base 🌲 🚄

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: English
Required VRAM: < 8GB
Model and Optimization
Model: BERT-base (📄 Paper | 🔨 Implementation)
Optimizer: AdamW (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile bert-base
When should I use this profile?
  • You want a trade-off between training/inference speed and model performance
  • You want a well-established model for everyday use
  • You have at your disposal a GPU with at least 8GB of VRAM
  • You will use the model in moderate energy consumption scenarios

gpt2 🌲 🚄

General Info
Supported Tasks: generation
Supported Languages: English
Required VRAM: < 8GB
Model and Optimization
Model: GPT2 (📄 Paper | 🔨 Implementation)
Optimizer: Adam (📄 Paper | 🔨 Implementation)
Train command
classy train [generation] my_dataset_path -n my_model --profile gpt2
When should I use this profile?
  • You want an affordable (decoder-only) generative model for English
  • You have at your disposal a GPU with at least 8GB of VRAM
  • You will use the model in moderate energy consumption scenarios

roberta-base 🌲 🚄

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: English
Required VRAM: < 8GB
Model and Optimization
Model: RoBERTa-base (📄 Paper | 🔨 Implementation)
Optimizer: AdamW (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile roberta-base
When should I use this profile?
  • You want a trade-off between training/inference speed and model performance
  • You want a well-established model for everyday use
  • You have at your disposal a GPU with at least 8GB of VRAM
  • You will use the model in moderate energy consumption scenarios

deberta-base 🌲 🚄

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: English
Required VRAM: < 8GB
Model and Optimization
Model: DeBERTa-base (📄 Paper | 🔨 Implementation)
Optimizer: RAdam (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile deberta-base
When should I use this profile?
  • You want a trade-off between training/inference speed and model performance
  • You want a recently released model with state-of-the-art performance on several NLU benchmarks
  • You have at your disposal a GPU with at least 8GB of VRAM
  • You will use the model in moderate energy consumption scenarios

bart-base 🌲 🚄

General Info
Supported Tasks: sequence, sentence-pair, token, qa, generation
Supported Languages: English
Required VRAM: < 8GB
Model and Optimization
Model: Bart-base (📄 Paper | 🔨 Implementation)
Optimizer: RAdam (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa|generation] my_dataset_path -n my_model --profile bart-base
When should I use this profile?
  • You want a trade-off between training/inference speed and model performance
  • You want to tackle an English generation task with an affordable model
  • You have at your disposal a GPU with at least 8GB of VRAM
  • You will use the model in moderate energy consumption scenarios

multilingual-bert 🌲 🚄 🌏

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: 104 (Complete List)
Required VRAM: < 8GB
Model and Optimization
Model: mBERT (📄 Paper | 🔨 Implementation)
Optimizer: AdamW (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile multilingual-bert
When should I use this profile?
  • You require a multilingual model covering languages other than English
  • You want a trade-off between training/inference speed and model performance
  • You want a well-established model for everyday use
  • You have at your disposal a GPU with at least 8GB of VRAM
  • You will use the model in moderate energy consumption scenarios

xlm-roberta-base 🌲 🚄 🌏

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: 100 (Complete list in the reference paper)
Required VRAM: < 8GB
Model and Optimization
Model: XLM-RoBERTa-base (📄 Paper | 🔨 Implementation)
Optimizer: AdamW (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile xlm-roberta-base
When should I use this profile?
  • You require a state-of-the-art multilingual model covering languages other than English
  • You want a trade-off between training/inference speed and model performance
  • You want a well-established model for everyday use
  • You have at your disposal a GPU with at least 8GB of VRAM
  • You will use the model in moderate energy consumption scenarios

bert-large 🌵 🚜

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: English
Required VRAM: < 11GB (fp16)
Model and Optimization
Model: BERT-large (📄 Paper | 🔨 Implementation)
Optimizer: AdamW (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile bert-large --fp16
caution

Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.

When should I use this profile?
  • You want state-of-the-art performance, no compromises!
  • You want to show how far you can go with the proper infrastructure
  • You want a well-established model used by thousands of users
  • You have at your disposal a GPU with at least 11GB of VRAM that supports fp16 precision
  • You don't have any energy consumption restriction

roberta-large 🌵 🚜

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: English
Required VRAM: < 11GB (fp16)
Model and Optimization
Model: RoBERTa-large (📄 Paper | 🔨 Implementation)
Optimizer: AdamW (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile roberta-large --fp16
caution

Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.

When should I use this profile?
  • You want state-of-the-art performance, no compromises!
  • You want to show how far you can go with the proper infrastructure
  • You want a well-established model used by thousands of users
  • You have at your disposal a GPU with at least 11GB of VRAM that supports fp16 precision
  • You don't have any energy consumption restriction

deberta-large 🌵 🚜

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: English
Required VRAM: < 11GB (fp16)
Model and Optimization
Model: DeBERTa-large (📄 Paper | 🔨 Implementation)
Optimizer: RAdam (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile deberta-large --fp16
caution

Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.

When should I use this profile?
  • You want state-of-the-art performance, no compromises!
  • You want to show how far you can go with the proper infrastructure
  • You want one of the latest released SotA models
  • You have at your disposal a GPU with at least 11GB of VRAM that supports fp16 precision
  • You don't have any energy consumption restriction

xlm-roberta-large 🌵 🚜 🌏

General Info
Supported Tasks: sequence, sentence-pair, token, qa
Supported Languages: 100 (Complete list in the reference paper)
Required VRAM: < 16GB (fp16)
Model and Optimization
Model: XLM-RoBERTa-large (📄 Paper | 🔨 Implementation)
Optimizer: AdamW (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile xlm-roberta-large --fp16
caution

Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.

When should I use this profile?
  • You require a state-of-the-art multilingual model covering languages other than English, with no compromise
  • You want to show how far you can go with the proper infrastructure
  • You want a well-established model used by thousands of users
  • You have at your disposal a GPU with at least 16GB of VRAM that supports fp16 precision
  • You don't have any energy consumption restriction

gpt2-medium 🌵 🚜

General Info
Supported Tasks: generation
Supported Languages: English
Required VRAM: < 11GB (fp16)
Model and Optimization
Model: GPT2 (📄 Paper | 🔨 Implementation)
Optimizer: AdamW (📄 Paper | 🔨 Implementation)
Train command
classy train [generation] my_dataset_path -n my_model --profile gpt2-medium
When should I use this profile?
  • You want a medium (decoder-only) generative model for English
  • You have at your disposal a GPU with at least 11GB of VRAM that supports fp16 precision
  • You don't have any energy consumption restriction

bart-large 🌵 🚜

General Info
Supported Tasks: sequence, sentence-pair, token, qa, generation
Supported Languages: English
Required VRAM: < 11GB (fp16)
Model and Optimization
Model: Bart-large (📄 Paper | 🔨 Implementation)
Optimizer: RAdam (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa|generation] my_dataset_path -n my_model --profile bart-large
When should I use this profile?
  • You want state-of-the-art performance, especially on English generation problems, with no compromise!
  • You want to show how far you can go with the proper infrastructure
  • You want a well-established model used by thousands of users
  • You have at your disposal a GPU with at least 11GB of VRAM that supports fp16 precision
  • You don't have any energy consumption restriction

mbart 🌵 🏗️ 🌏

General Info
Supported Tasks: sequence, sentence-pair, token, qa, generation
Supported Languages: 25 (Complete list in the reference paper)
Required VRAM: < 24GB (fp16)
Model and Optimization
Model: mBART (📄 Paper | 🔨 Implementation)
Optimizer: RAdam (📄 Paper | 🔨 Implementation)
Train command
classy train [sequence|sentence-pair|token|qa|generation] my_dataset_path -n my_model --profile mbart
When should I use this profile?
  • You want a state-of-the-art multilingual model, covering 25 languages and particularly suited for generation tasks (e.g. machine translation), with no compromise
  • You want to show how far you can go with the proper infrastructure
  • You want a well-established model used by thousands of users
  • You have at your disposal a GPU with at least 24GB of VRAM that supports fp16 precision
  • You don't have any energy consumption restriction

gpt2-large 🌵 🏗️

General Info
Supported Tasks: generation
Supported Languages: English
Required VRAM: < 24GB (fp16)
Model and Optimization
Model: GPT2 (📄 Paper | 🔨 Implementation)
Optimizer: AdamW (📄 Paper | 🔨 Implementation)
Train command
classy train [generation] my_dataset_path -n my_model --profile gpt2-large
When should I use this profile?
  • You want a large (decoder-only) generative model for English
  • You have at your disposal a GPU with at least 24GB of VRAM that supports fp16 precision
  • You don't have any energy consumption restriction