This project analyzes the latest headlines to surface common themes. The notebook that generates these results runs in four steps.
Each section below explains the step and presents its current output as an interactive table.
All stories are pulled from our headlines collection, ordered by publication date and stripped of duplicate titles.
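As a rough illustration of this step, the sketch below sorts a list of stories by date and keeps one story per title. The record layout (`title` and `published` keys), the newest-first order, and the case-insensitive title comparison are assumptions made for the example; the notebook's actual collection query is not shown here.

```python
from datetime import date


def load_unique_stories(stories):
    """Sort stories newest first and keep only the first story seen per title."""
    ordered = sorted(stories, key=lambda s: s["published"], reverse=True)
    seen = set()
    unique = []
    for story in ordered:
        # Assumption: titles that differ only by case or outer whitespace are duplicates.
        key = story["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(story)
    return unique


# Hypothetical sample records standing in for the headlines collection.
sample = [
    {"title": "Markets rally", "published": date(2024, 1, 2)},
    {"title": "Storm hits coast", "published": date(2024, 1, 3)},
    {"title": "Markets rally", "published": date(2024, 1, 1)},
]
unique_stories = load_unique_stories(sample)
```

Because the list is sorted before duplicates are dropped, the copy that survives is the most recent one for each title.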
The headline text is tokenized and every unique word is counted, ignoring the stop words listed in exclude.txt. The raw frequency of each word becomes its score.
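The counting step could look something like the sketch below. The tokenizer (lowercasing and a simple regular expression) and the one-word-per-line layout of exclude.txt are assumptions; the helper names `load_stop_words` and `word_scores` are hypothetical.

```python
import re
from collections import Counter


def load_stop_words(path="exclude.txt"):
    """Read stop words from a file, assuming one word per line."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def word_scores(headlines, stop_words):
    """Return a Counter mapping each non-stop word to its raw frequency."""
    counts = Counter()
    for headline in headlines:
        # Assumption: words are runs of letters, digits, or apostrophes.
        for word in re.findall(r"[a-z0-9']+", headline.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


# A small in-memory stop list stands in for the contents of exclude.txt.
stop_words = {"the", "a", "in", "of"}
headlines = [
    "Storm hits the coast",
    "Coast guard rescues crew in storm",
    "Markets rally",
]
scores = word_scores(headlines, stop_words)
```

In the notebook itself, `stop_words` would come from the file, for example via `load_stop_words()`, rather than from a literal set.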
For each headline we sum the scores of the words it contains and sort the list from highest to lowest.
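A minimal sketch of the headline scoring, under the assumption that each distinct word in a headline contributes its score once and that words missing from the score list (such as stop words) contribute zero. The function name `rank_headlines` and the sample scores are illustrative only.

```python
import re


def rank_headlines(headlines, scores):
    """Return (headline, total) pairs sorted from highest to lowest total."""
    ranked = []
    for headline in headlines:
        # Assumption: a word repeated within one headline is counted once.
        words = set(re.findall(r"[a-z0-9']+", headline.lower()))
        total = sum(scores.get(word, 0) for word in words)
        ranked.append((headline, total))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical word scores, as produced by the counting step.
scores = {
    "storm": 2, "coast": 2, "hits": 1, "guard": 1,
    "rescues": 1, "crew": 1, "markets": 1, "rally": 1,
}
headlines = [
    "Storm hits the coast",
    "Coast guard rescues crew in storm",
    "Markets rally",
]
ranked = rank_headlines(headlines, scores)
```

With these sample values the longer storm headline ranks first, which shows a side effect of summing raw scores: headlines with more scored words tend to rise.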
The most common word is then removed from the score list and the headlines are re-ranked. This step repeats ten times to highlight varied stories.
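One way to read this step is sketched below: each round drops the highest-scoring word, re-ranks the headlines with the remaining scores, and records the new top headline. What the notebook actually records per round is not stated, so the returned (removed word, top headline) pairs are an assumption, as are the helper names and sample data.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the distinct lowercase words in a headline."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def rank(headlines, scores):
    """Rank headlines by the summed scores of their words, highest first."""
    totals = [(h, sum(scores.get(w, 0) for w in tokenize(h))) for h in headlines]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


def diversify(headlines, scores, rounds=10):
    """Repeatedly drop the most common word and record the re-ranked top headline."""
    remaining = Counter(scores)  # copy, so the caller's scores are untouched
    results = []
    for _ in range(rounds):
        if not remaining or not headlines:
            break  # nothing left to remove or rank
        top_word = remaining.most_common(1)[0][0]
        del remaining[top_word]
        results.append((top_word, rank(headlines, remaining)[0][0]))
    return results


# Hypothetical scores and headlines for illustration.
scores = {"storm": 4, "markets": 3, "coast": 2, "rally": 1, "hits": 1}
headlines = ["Storm hits coast", "Markets rally", "Storm warning"]
highlights = diversify(headlines, scores, rounds=2)
```

Removing the dominant word each round keeps one theme from holding the top of the list, so different stories get a turn at the top.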