A platform for primary sociological research into local issues, built on the latest data from reputable sources.
Here are the projects:
- Compute - Catalogues estimated int8 compute accessible to different actors.
- Data Source Dashboard - Summarizes all data sources and their current headline counts.
- Headline Analysis - Collects all news headlines and produces time-based summaries.
- Latest Headlines - All current headlines from every RSS feed we monitor, deduplicated and sorted by publication date.
- Top News Topics - Highlights news stories by scoring and ranking the latest headlines.
Here are the data sources we collect and curate:
About this project
This repository hosts datasets and Jupyter notebooks for the studies published at analysis.castromedia.org.
Over time the tooling has grown to ingest RSS and XML feeds with sentiment analysis and to auto-generate index pages complete with interactive charts and tables.
There is still no application code here—only the data, notebooks, and rendered artefacts that power the public site.
1. Repository structure at a glance
- data/ — holds every raw dataset.
- A single catalog.csv file lists where each dataset comes from, what file format it uses, how often it should be refreshed, when it was last fetched, and which sub-folder it lives in (sample rows appear after this list).
- Each dataset has its own folder (for example `data/xyz/`; see the layout sketch after this list). Inside that folder:
  - One file named with the exact download date (such as `2025-06-03.json`) preserves a permanent, timestamped snapshot. The extension matches the catalog’s declared file type.
  - A companion file called `latest.json` (or the matching extension) is simply a copy of the most recent snapshot (or a symbolic link to it), so analyses can always point at a stable filename.
- data/update.ipynb — the single Jupyter notebook that orchestrates everything: fetching new data, versioning it, and triggering re-analysis.
- analysis/ — contains one sub-directory per research project. Inside each project folder you keep the working notebook, a rendered Markdown version of that notebook, and any static figures (PNG, SVG) the analysis produces.
- `assets/js/` — custom scripts powering interactive DataTables and charts.
- Each dataset folder has an `index.md` page generated automatically with links to each snapshot and built-in visualizations when possible.
- Each dataset and analysis folder also contains a `metadata.md` file describing the columns and providing a short blurb. These metadata files drive the project and data lists at the top of this page.
- News feeds are organized by region under `data/news/<region>/<source>/`.
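As a concrete illustration of the layout above, a repository with one dataset and one news source might look like this (all names below are placeholders, not the repository's actual contents):

```
data/
  catalog.csv
  update.ipynb
  xyz/
    2025-06-03.json    # permanent, timestamped snapshot
    latest.json        # copy of the newest snapshot
    index.md           # auto-generated links and visualizations
    metadata.md        # column descriptions and blurb
  news/
    us/
      example-times/
analysis/
  headline-analysis/
    headline-analysis.ipynb
    headline-analysis.md
    figure-1.png
    metadata.md
assets/
  js/
```

And catalog.csv might carry rows along these lines (the column names are illustrative, not necessarily the file's actual schema):

```csv
dataset,url,format,cadence,last_fetched,folder
xyz,https://example.org/xyz.json,json,0 2 * * 1,2025-06-03T02:00:00Z,xyz
```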
2. How the data-refresh cycle works
- Scheduling
- The notebook is executed on a separate “dev” machine (your laptop, a lab workstation, or a cheap cloud VM).
- A local scheduler—cron, Windows Task Scheduler, or an equivalent—starts the notebook at whatever frequency you choose (an example crontab entry appears after this list).
- Checking the catalog
- The notebook opens `catalog.csv`, looks at the “cadence” field for each source (expressed as a standard cron pattern or any other schedule convention you prefer), and decides which datasets are due for an update (a Python sketch of this check-and-refresh step appears after this list).
- Downloading and versioning
- Every dataset that’s out of date is downloaded and saved in its folder under today’s date.
- The notebook also refreshes `latest.json` (or whatever extension is specified) so that analyses never need to guess which file is newest.
- The “last-fetched” timestamp in `catalog.csv` is updated so the next run knows the job is done.
- Opening a pull request for data changes
- Still within the notebook, a new Git branch is created, the modified files are committed, and a pull request (PR) titled something like “Data refresh – YYYY-MM-DD” is opened on the public repository (see the sketch after this list).
- Reviewers can inspect every byte that changed before merging.
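For the scheduling step, a crontab entry along these lines would execute the updater nightly at 02:00 (the repository path is hypothetical):

```
0 2 * * * cd ~/castromedia && jupyter nbconvert --to notebook --execute --inplace data/update.ipynb
```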
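The catalog check, download, and versioning steps can be condensed into a short loop. A minimal sketch, assuming `pandas`, `requests`, and `croniter` are installed and reusing the illustrative catalog columns from the sample above:

```python
# Minimal sketch of the catalog-driven refresh step. Column names
# (dataset, url, format, cadence, last_fetched, folder) are illustrative,
# not necessarily the repository's actual schema.
from datetime import datetime, timezone
from pathlib import Path
import shutil

import pandas as pd
import requests
from croniter import croniter

DATA = Path("data")

def refresh_due_datasets(catalog_path: Path = DATA / "catalog.csv") -> None:
    catalog = pd.read_csv(catalog_path)
    now = datetime.now(timezone.utc)

    for row in catalog.itertuples():
        last = pd.to_datetime(row.last_fetched, utc=True)
        # A source is due when the first scheduled run after its last
        # fetch has already passed.
        if croniter(row.cadence, last).get_next(datetime) > now:
            continue

        response = requests.get(row.url, timeout=60)
        response.raise_for_status()

        # Save a permanent snapshot named after today's date.
        folder = DATA / row.folder
        folder.mkdir(parents=True, exist_ok=True)
        snapshot = folder / f"{now:%Y-%m-%d}.{row.format}"
        snapshot.write_bytes(response.content)

        # latest.<ext> is a plain copy, so analyses can use a stable name.
        shutil.copyfile(snapshot, folder / f"latest.{row.format}")

        catalog.loc[catalog["dataset"] == row.dataset, "last_fetched"] = (
            now.isoformat()
        )

    catalog.to_csv(catalog_path, index=False)
```

Parsing the cadence with croniter is one choice among several; any schedule convention works as long as the notebook can compute the next due time from the last fetch.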
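The PR step can be driven from the same notebook by shelling out to Git and the GitHub CLI. A sketch, assuming `gh` is installed and authenticated:

```python
# Sketch: commit the refreshed data on a new branch and open a PR.
# Assumes git and the GitHub CLI (gh) are installed and authenticated.
import subprocess
from datetime import date

def open_data_pr() -> None:
    today = f"{date.today():%Y-%m-%d}"
    branch = f"data-refresh-{today}"
    title = f"Data refresh – {today}"

    for cmd in (
        ["git", "checkout", "-b", branch],
        ["git", "add", "data/"],
        ["git", "commit", "-m", title],
        ["git", "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--title", title,
         "--body", "Automated data refresh; see diff for changed snapshots."],
    ):
        subprocess.run(cmd, check=True)
```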
3. How analyses stay in sync
- Detecting staleness
- After updating any datasets that need it, the same `update.ipynb` notebook scans every analysis notebook in `analysis/`.
- For each notebook it asks: “Is the newest data file this notebook depends on more recent than the last time the notebook was executed?” The check is straightforward: every dependency is a reference to one of the catalogued data sources, and the updater notebook already holds that list (a sketch of this check appears after this list).
- Re-running and publishing
- Any notebook deemed stale is executed end-to-end in a headless manner.
- On completion it produces two kinds of artefacts stored alongside the notebook:
- A Markdown export of the fresh results, ready for Jekyll to turn into a web page.
- Static images (plots, diagrams) in PNG or SVG form that the Markdown file references.
- Each image and Markdown file is saved under its usual filename and also as a date-stamped copy (`filename_YYYY-MM-DD`) so that old versions remain viewable on the public website. A dated copy is written only when the output differs from the last saved version; a simple SHA comparison is enough.
- Opening a pull request for analysis updates
- If at least one notebook produced new outputs, the notebook creates a second Git branch, commits the changes inside `analysis/`, and opens another PR (often called “Analysis [NAME] refresh – YYYY-MM-DD”).
- Again, humans can review the exact diff—image hashes, Markdown updates, even regenerated notebooks—before approving.
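A minimal sketch of the staleness check and de-duplicated publishing described above, assuming `jupyter` is on the PATH; the regex-based dependency scan and the file layout are illustrative stand-ins, not the repository's actual code:

```python
# Sketch of staleness detection and dated publishing for one analysis
# notebook. The regex-based dependency scan is a stand-in for however
# the real updater maps notebooks to data files.
import hashlib
import re
import shutil
import subprocess
from datetime import date
from pathlib import Path

def data_dependencies(notebook: Path) -> list[Path]:
    # Treat every "data/..." path mentioned in the notebook as a dependency.
    text = notebook.read_text(encoding="utf-8")
    return [Path(p) for p in set(re.findall(r"data/[\w./-]+", text))]

def is_stale(notebook: Path) -> bool:
    rendered = notebook.with_suffix(".md")
    if not rendered.exists():
        return True
    last_run = rendered.stat().st_mtime
    return any(dep.stat().st_mtime > last_run
               for dep in data_dependencies(notebook) if dep.exists())

def rerun_and_publish(notebook: Path) -> None:
    # Execute headlessly and export Markdown next to the notebook.
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "markdown", "--execute",
         str(notebook)],
        check=True,
    )
    rendered = notebook.with_suffix(".md")

    # Keep a dated copy only when the content differs from the most
    # recent dated version (simple SHA comparison).
    previous = sorted(rendered.parent.glob(f"{rendered.stem}_*.md"))
    new_sha = hashlib.sha256(rendered.read_bytes()).hexdigest()
    old_sha = (hashlib.sha256(previous[-1].read_bytes()).hexdigest()
               if previous else None)
    if new_sha != old_sha:
        dated = rendered.with_name(
            f"{rendered.stem}_{date.today():%Y-%m-%d}.md")
        shutil.copyfile(rendered, dated)
```

The real updater would loop this over every project folder under `analysis/` and apply the same dated-copy rule to the generated figures.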
4. What happens after pull requests are merged
Once the two PRs are approved and merged into the main branch:
- GitHub Pages automatically rebuilds the site because Jekyll kicks in on every push to the default branch.
- Visitors immediately see the new tables, text, and figures corresponding to the freshly acquired data.
- The complete audit trail—datasets, notebook code, and rendered results—lives on in Git history and in the dated copies on the public website.