Default Profiles
As you have seen in the previous tutorials, your systems are fully customizable in classy.
While we strongly encourage you to create your own configurations, we also provide a set of predefined and well-established profiles that achieve competitive performance in almost all settings and scenarios.
To use a profile, you just have to pass its name to the --profile parameter at training time:
classy train <task> <dataset-path> -n <model-name> --profile <profile_name>
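For example, assuming a sequence classification dataset stored at data/sentiment.tsv (a hypothetical path, used here purely for illustration), you could train a DistilBERT-based model like this:
classy train sequence data/sentiment.tsv -n sentiment-distilbert --profile distilbert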
distilbert 🌳 🚀
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 4GB |
Model and Optimization
Model | Optimizer |
---|---|
DistilBERT (📄 Paper, 🔨 Implementation) | Adafactor (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile distilbert
When should I use this profile?
- You want blazing-fast training and inference
- You want a quick run to evaluate your dataset and check for possible flaws
- You don't have a GPU with more than 4GB of VRAM at your disposal
- You will use the model in low-energy-consumption scenarios
distilroberta 🌳 🚀
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 4GB |
Model and Optimization
Model | Optimizer |
---|---|
DistilRoBERTa (🔨 Implementation) | Adafactor (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile distilroberta
When should I use this profile?
- You want blazing-fast training and inference
- You want a quick run to evaluate your dataset and check for possible flaws
- You don't have a GPU with more than 4GB of VRAM at your disposal
- You will use the model in low-energy-consumption scenarios
squeezebert 🌳 🚀
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 4GB |
Model and Optimization
Model | Optimizer |
---|---|
SqueezeBERT (📄 Paper, 🔨 Implementation) | Adafactor (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile squeezebert
When should I use this profile?
- You want blazing-fast training and inference
- You want a quick run to evaluate your dataset and check for possible flaws
- You don't have a GPU with more than 4GB of VRAM at your disposal
- You will use the model in low-energy-consumption scenarios
bert-base 🌲 🚄
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
BERT-base (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile bert-base
When should I use this profile?
- You want a trade-off between training/inference speed and model performance
- You want a well-established model for everyday use
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
gpt2 🌲 🚄
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
generation | English | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
GPT2 (📄 Paper, 🔨 Implementation) | Adam (📄 Paper, 🔨 Implementation) |
Train command
classy train [generation] my_dataset_path -n my_model --profile gpt2
When should I use this profile?
- You want an affordable (decoder-only) generative model for English
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
roberta-base 🌲 🚄
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
RoBERTa-base (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile roberta-base
When should I use this profile?
- You want a trade-off between training/inference speed and model performance
- You want a well-established model for everyday use
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
deberta-base 🌲 🚄
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
DeBERTa-base (📄 Paper, 🔨 Implementation) | RAdam (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile deberta-base
When should I use this profile?
- You want a trade-off between training/inference speed and model performance
- You want a recently released model with state-of-the-art performance on several NLU benchmarks
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
bart-base 🌲 🚄
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa, generation | English | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
BART-base (📄 Paper, 🔨 Implementation) | RAdam (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa|generation] my_dataset_path -n my_model --profile bart-base
When should I use this profile?
- You want a trade-off between training/inference speed and model performance
- You want to tackle an English generation task with an affordable model
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
multilingual-bert 🌲 🚄 🌏
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | 104 (Complete List) | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
mBERT (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile multilingual-bert
When should I use this profile?
- You require a multilingual model covering languages other than English
- You want a trade-off between training/inference speed and model performance
- You want a well-established model for everyday use
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
xlm-roberta-base 🌲 🚄 🌏
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | 100 (Complete list in the reference paper) | < 8GB |
Model and Optimization
Model | Optimizer |
---|---|
XLM-RoBERTa-base (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile xlm-roberta-base
When should I use this profile?
- You require a state-of-the-art multilingual model covering languages other than English
- You want a trade-off between training/inference speed and model performance
- You want a well-established model for everyday use
- You have a GPU with at least 8GB of VRAM at your disposal
- You will use the model in moderate-energy-consumption scenarios
bert-large 🌵 🚜
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 11GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
BERT-large (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile bert-large --fp16
Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.
When should I use this profile?
- You want state-of-the-art performance, no compromise!
- You want to show how far you can go with the proper infrastructure
- You want a well-established model used by thousands of users
- You have a GPU with at least 11GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
roberta-large 🌵 🚜
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 11GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
RoBERTa-large (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile roberta-large --fp16
Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.
When should I use this profile?
- You want state-of-the-art performance, no compromise!
- You want to show how far you can go with the proper infrastructure
- You want a well-established model used by thousands of users
- You have a GPU with at least 11GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
deberta-large 🌵 🚜
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | English | < 11GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
DeBERTa-large (📄 Paper, 🔨 Implementation) | RAdam (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile deberta-large --fp16
Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.
When should I use this profile?
- You want state-of-the-art performance, no compromise!
- You want to show how far you can go with the proper infrastructure
- You want one of the latest released SotA models
- You have a GPU with at least 11GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
xlm-roberta-large 🌵 🚜 🌏
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa | 100 (Complete list in the reference paper) | < 16GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
XLM-RoBERTa-large (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa] my_dataset_path -n my_model --profile xlm-roberta-large --fp16
Remember to use the --fp16 flag at training time, otherwise the model may not fit in memory.
When should I use this profile?
- You require a state-of-the-art multilingual model covering languages other than English, with no compromise
- You want to show how far you can go with the proper infrastructure
- You want a well-established model used by thousands of users
- You have a GPU with at least 16GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
gpt2-medium 🌵 🚜
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
generation | English | < 11GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
GPT2 (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [generation] my_dataset_path -n my_model --profile gpt2-medium
When should I use this profile?
- You want a medium-sized (decoder-only) generative model for English
- You have a GPU with at least 11GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
bart-large 🌵 🚜
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa, generation | English | < 11GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
BART-large (📄 Paper, 🔨 Implementation) | RAdam (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa|generation] my_dataset_path -n my_model --profile bart-large
When should I use this profile?
- You want state-of-the-art performance, especially on English generation problems, with no compromise!
- You want to show how far you can go with the proper infrastructure
- You want a well-established model used by thousands of users
- You have a GPU with at least 11GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
mbart 🌵 🏗️ 🌏
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
sequence, sentence-pair, token, qa, generation | 25 (Complete list in the reference paper) | < 24GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
mBART (📄 Paper, 🔨 Implementation) | RAdam (📄 Paper, 🔨 Implementation) |
Train command
classy train [sequence|sentence-pair|token|qa|generation] my_dataset_path -n my_model --profile mbart
When should I use this profile?
- You want a state-of-the-art multilingual model, covering 25 languages and particularly suited for generation tasks (e.g. machine translation), with no compromise
- You want to show how far you can go with the proper infrastructure
- You want a well-established model used by thousands of users
- You have a GPU with at least 24GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions
gpt2-large 🌵 🏗️
General Info
Supported Tasks | Supported Languages | Required VRAM |
---|---|---|
generation | English | < 24GB (fp16) |
Model and Optimization
Model | Optimizer |
---|---|
GPT2 (📄 Paper, 🔨 Implementation) | AdamW (📄 Paper, 🔨 Implementation) |
Train command
classy train [generation] my_dataset_path -n my_model --profile gpt2-large
When should I use this profile?
- You want a large (decoder-only) generative model for English
- You have a GPU with at least 24GB of VRAM and fp16 support at your disposal
- You don't have any energy consumption restrictions