TF-IDF identifies the most significant words in a collection of documents. We used this method to represent the text as numeric features that can be fed into a model.
Term Frequency (TF) is the number of times a word or phrase appears in a document. When a search term has a high frequency value, the document is likely a good match because it uses that term often.
Formula for TF
$$ TF(t) = \frac{\text{number of times term } t \text{ appears in the document}}{\text{total number of terms in the document}} $$
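As a minimal sketch of this formula (the tokenized toy document is an assumption, not from the original text), the function below counts how often a term occurs in a document and divides by the document length:

```python
from collections import Counter

def term_frequency(term: str, document: list[str]) -> float:
    """Fraction of tokens in the document that match the term."""
    counts = Counter(document)
    return counts[term] / len(document)

# Toy tokenized document, purely for illustration.
doc = "the cat sat on the mat".split()
print(term_frequency("the", doc))  # 2 / 6 ≈ 0.333
```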
IDF (Inverse Document Frequency) downweights terms that appear in many documents: a term that is frequent in one document but also common across most other documents is probably not very informative. The IDF score measures how important a word is across the whole corpus.
$$ IDF(t) = \log\left(\frac{\text{total number of documents}}{\text{number of documents containing term } t}\right) $$
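A minimal sketch of the IDF computation, assuming a small tokenized toy corpus and a natural logarithm (libraries differ on the log base and often add smoothing to avoid division by zero):

```python
import math

def inverse_document_frequency(term: str, corpus: list[list[str]]) -> float:
    """log of (total documents / documents containing the term)."""
    docs_with_term = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / docs_with_term)

# Toy corpus of three tokenized documents, purely for illustration.
corpus = [
    "the cat sat on the mat".split(),
    "the dog barked".split(),
    "a cat chased a dog".split(),
]
print(inverse_document_frequency("cat", corpus))  # log(3 / 2) ≈ 0.405
```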
Then multiply:
$$ TFIDF(t) = TF(t) \times IDF(t) $$
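As a worked example with invented numbers (assuming a base-10 logarithm): if a term appears 3 times in a 100-word document and occurs in 10 out of 1,000 documents, then

$$ TF = \frac{3}{100} = 0.03, \qquad IDF = \log_{10}\left(\frac{1000}{10}\right) = 2, \qquad TFIDF = 0.03 \times 2 = 0.06 $$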
Scikit-learn's TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features.
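Before the example below, here is a minimal sketch of how the vectorizer might be called; the toy corpus is an assumption, and note that scikit-learn applies smoothing and L2 normalization by default, so its scores differ slightly from the plain formulas above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, purely for illustration.
documents = [
    "the cat sat on the mat",
    "the dog barked",
    "a cat chased a dog",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)  # sparse (n_docs, n_terms) matrix

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(tfidf_matrix.toarray())              # one row of TF-IDF weights per document
```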
Example