TF-IDF identifies the most significant words in a collection of documents. We used this method to represent the text as numeric features that can be fed into a model.
Term Frequency (TF) is the number of times a word or phrase appears in a document. When a search term has a high frequency value, the document is likely a good match because it uses that term often.
Formula for TF
$$ TF(t) = \frac{\text{number of times term } t \text{ appears in the document}}{\text{total number of terms in the document}} $$
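As a minimal sketch of this formula (the tokenized toy document is an assumption, not from the original text), the function below counts how often a term occurs in a document and divides by the document length:

```python
from collections import Counter

def term_frequency(term: str, document: list[str]) -> float:
    """Fraction of tokens in the document that match the term."""
    counts = Counter(document)
    return counts[term] / len(document)

# Toy tokenized document, purely for illustration.
doc = "the cat sat on the mat".split()
print(term_frequency("the", doc))  # 2 / 6 ≈ 0.333
```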
IDF (Inverse Document Frequency) downweights terms that appear in many documents: a term that is frequent in one document but also common across most other documents is probably not very informative. The IDF score measures how important a word is across the whole corpus.
$$ IDF(t) = \log\left(\frac{\text{total number of documents}}{\text{number of documents containing term } t}\right) $$
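A minimal sketch of the IDF computation, assuming a small tokenized toy corpus and a natural logarithm (libraries differ on the log base and often add smoothing to avoid division by zero):

```python
import math

def inverse_document_frequency(term: str, corpus: list[list[str]]) -> float:
    """log of (total documents / documents containing the term)."""
    docs_with_term = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / docs_with_term)

# Toy corpus of three tokenized documents, purely for illustration.
corpus = [
    "the cat sat on the mat".split(),
    "the dog barked".split(),
    "a cat chased a dog".split(),
]
print(inverse_document_frequency("cat", corpus))  # log(3 / 2) ≈ 0.405
```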
Then multiply:
$$ TFIDF(t) = TF(t) \times IDF(t) $$
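As a worked example with invented numbers (assuming a base-10 logarithm): if a term appears 3 times in a 100-word document and occurs in 10 out of 1,000 documents, then

$$ TF = \frac{3}{100} = 0.03, \qquad IDF = \log_{10}\left(\frac{1000}{10}\right) = 2, \qquad TFIDF = 0.03 \times 2 = 0.06 $$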
Scikit-learn's TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features.
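Before the example below, here is a minimal sketch of how the vectorizer might be called; the toy corpus is an assumption, and note that scikit-learn applies smoothing and L2 normalization by default, so its scores differ slightly from the plain formulas above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, purely for illustration.
documents = [
    "the cat sat on the mat",
    "the dog barked",
    "a cat chased a dog",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)  # sparse (n_docs, n_terms) matrix

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(tfidf_matrix.toarray())              # one row of TF-IDF weights per document
```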
Example