Analysis.CastroMedia.org

Top News Topics

This project analyzes the latest headlines to surface common themes. The notebook that generates these results runs in four steps:

1. Load and deduplicate headlines
2. Calculate word scores
3. Rank the headlines
4. Select top headlines

Each section below explains the step and presents its current output as an interactive table.

1. Load and deduplicate headlines

All stories are pulled from our headlines collection, ordered by publication date and stripped of duplicate titles.
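A minimal sketch of this step, assuming each story is a dictionary with hypothetical "title" and "published" keys. The notebook's actual storage layer is not shown here, so the sketch works on an in-memory list. The newest-first ordering and the case-insensitive title matching are also assumptions.

```python
from datetime import datetime


def load_unique_headlines(stories):
    """Order stories by publication date and drop repeated titles.

    `stories` is a list of dicts with hypothetical "title" and
    "published" keys. Newest-first order is an assumption.
    """
    ordered = sorted(stories, key=lambda s: s["published"], reverse=True)
    seen = set()
    unique = []
    for story in ordered:
        # Assumption: titles that differ only in case or surrounding
        # whitespace count as duplicates.
        key = story["title"].strip().lower()
        if key in seen:
            continue
        seen.add(key)
        unique.append(story)
    return unique
```

Because the list is sorted before deduplication, the copy that survives is the most recently published one.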

2. Calculate word scores

The headline text is tokenized and each unique word is counted, skipping the stop words listed in exclude.txt. Each word's raw frequency becomes its score.
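The scoring could look roughly like the sketch below. The function names, the tokenizing regex, and the one-word-per-line layout of exclude.txt are assumptions made for illustration, not details taken from the notebook.

```python
import re
from collections import Counter


def load_stop_words(path="exclude.txt"):
    """Read stop words from a file, assuming one word per line."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def word_scores(titles, stop_words):
    """Count every occurrence of each non-stop word across all titles."""
    counts = Counter()
    for title in titles:
        # Assumption: words are runs of letters, digits, or apostrophes,
        # compared in lowercase.
        for word in re.findall(r"[a-z0-9']+", title.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts
```

For example, word_scores(["Storm hits the coast", "Coast braces for storm surge"], {"the", "for"}) scores "storm" and "coast" at 2 and every other remaining word at 1.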

3. Rank the headlines

For each headline we sum the scores of the words it contains, then sort the headlines from highest total to lowest.
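One way to express the ranking, as a sketch: the helper names are hypothetical, and counting each distinct word once per headline is an assumption about how "the words it contains" is interpreted.

```python
import re


def tokenize(title):
    """Return the set of distinct lowercase words in a title."""
    return set(re.findall(r"[a-z0-9']+", title.lower()))


def rank_headlines(titles, scores):
    """Return (title, total) pairs sorted from highest total to lowest.

    `scores` maps each word to its score; words without a score
    (for example stop words) contribute 0.
    """
    totals = [
        (title, sum(scores.get(word, 0) for word in tokenize(title)))
        for title in titles
    ]
    # sorted() is stable, so tied headlines keep their input order.
    return sorted(totals, key=lambda pair: pair[1], reverse=True)
```

A side effect of summing raw scores is that longer headlines tend to rank higher, since they have more words to contribute.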

4. Select top headlines

This step removes the most common word from the score list, re-ranks the headlines, and repeats the process ten times so that the highlights cover varied stories. Any publisher listed in exclude_sources.txt is skipped during this pass so that only approved outlets appear in the highlights.
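The loop below is one possible reading of this step, not necessarily the notebook's exact logic. It assumes the top-ranked headline is picked in each round before the most common word is dropped. The "source" key and the function names are hypothetical.

```python
import re


def tokenize(title):
    """Return the set of distinct lowercase words in a title."""
    return set(re.findall(r"[a-z0-9']+", title.lower()))


def select_top_headlines(stories, scores, excluded_sources, rounds=10):
    """Pick up to `rounds` headlines, dropping the top word each round."""
    remaining = dict(scores)  # copy so the caller's scores are untouched
    # Skip any publisher named in the exclusion list.
    candidates = [s for s in stories if s["source"] not in excluded_sources]
    picks = []
    for _ in range(rounds):
        if not candidates:
            break
        # Assumption: each round keeps the current top-ranked headline.
        best = max(
            candidates,
            key=lambda s: sum(remaining.get(w, 0) for w in tokenize(s["title"])),
        )
        picks.append(best)
        candidates.remove(best)
        if remaining:
            # Remove the most common word so the next round favors
            # headlines about other topics.
            top_word = max(remaining, key=remaining.get)
            del remaining[top_word]
    return picks
```

Dropping the dominant word after each pick keeps a single heavily covered story from filling every slot in the highlights.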