Introduction
classy is a simple-to-use library for building high-performance Machine Learning models in NLP.
It wraps the best libraries around (PyTorch, PyTorch Lightning, Transformers, Hydra, Streamlit, ...)
and offers them to users through a simple CLI.
Installation
Example Walkthrough
You have the following data at your disposal:
| User | Product-ID | Review | Overall Judgement |
|---|---|---|---|
| U1 | P1 | I really like this headphones | positive |
| U2 | P1 | Sound is terrible! I am returning these headphones right away. | negative |
| ... | ... | ... | ... |
That is, a list of reviews written by users for a given product, along with their overall judgement. Now, say you want to train a classification model on this data so that, given a new review as input, it predicts whether the overall judgement is positive or negative.
This is a Sequence Classification problem, specifically Sentiment Classification.
In a nutshell, this is how you would do it with classy:
Data Organization
By default, classy is able to work with .tsv and .jsonl files. In the case of .tsv for Sequence Classification,
classy needs the file to have only two columns, one with the sequence and the other with the label.
Say your review data is already stored in a .tsv file (with four columns, as we saw above). All you need to do is:
$ cut -f3,4 data/raw.tsv > data/classy-sentiment.tsv
classy supports many tasks and input formats, and is independent of how you
convert your data to a classy-compatible format (e.g., sed, awk, or a Python script are all viable options).
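If you would rather do the conversion in Python than with cut, a minimal sketch could look like the following. The function name extract_columns and its parameters are illustrative, not part of classy. It assumes the review and the label sit in the third and fourth tab-separated columns, as in the table above. Unlike cut, it skips rows that have too few columns.

```python
def extract_columns(src, dst, columns=(2, 3)):
    """Copy the selected zero-based columns of a TSV file into a new TSV file.

    Hypothetical helper, not part of classy. Returns the number of rows written.
    """
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            # Split on tabs only, mirroring what `cut -f` does.
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= max(columns):
                # Skip malformed rows instead of guessing their content.
                continue
            fout.write("\t".join(fields[i] for i in columns) + "\n")
            kept += 1
    return kept
```

Calling extract_columns("data/raw.tsv", "data/classy-sentiment.tsv") would then produce the same two-column file as the cut command for well-formed input.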
Training
Now that our data is classy-compatible, we can train our model! Let us call it sequence-example:
sequence in the above command tells classy to train a Sequence Classification model. This is the only thing, besides
organizing data, that classy expects you to do. We'll come back to this later on.
Using your trained model
Now that we have trained our model, classy offers a number of commands that let you use it:
- classy demo spawns a Streamlit-based interface that lets you query your model visually;
- classy predict lets you choose whether to use your model for file prediction or in bash-interactive mode (similarly to demo, but in the terminal);
- classy serve spawns a REST API (through FastAPI) that you can query using any REST-compatible client (requests, curl, you name it);
- classy evaluate runs your model on a given test set and reports metrics on the task.