
AnyNews Republisher

The AnyNews Republisher is a tool for mirroring news content to alternative distribution points, either to circumvent censorship or to make content available to communities affected by high Internet costs, slow or limited access, or natural disasters.

The organization that produces the original news content is the "publisher".

The AnyNews Republisher is managed through a local web UI. Sources, schedules, and job executions are stored in SQLite. At a configured interval, the Republisher crawls the configured sources and mirrors the content (text and media) into an offline RSS feed.
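The persistence layer can be pictured as a small SQLite schema along these lines. This is an illustrative sketch only: the table and column names are assumptions, not the app's actual database layout.

```python
import sqlite3

# Illustrative schema sketch: names are assumptions, not the
# Republisher's real layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sources (
    id       INTEGER PRIMARY KEY,
    kind     TEXT NOT NULL,   -- e.g. 'feed' or 'pangea'
    url      TEXT NOT NULL,
    schedule TEXT             -- crawl interval for scheduled runs
);
CREATE TABLE IF NOT EXISTS job_runs (
    id         INTEGER PRIMARY KEY,
    source_id  INTEGER NOT NULL REFERENCES sources(id),
    started_at TEXT NOT NULL,
    status     TEXT NOT NULL  -- 'running', 'ok', 'failed'
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```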

The AnyNews app can then be configured to use this mirror (or more than one such mirror).

The Republisher currently accepts the following source input types:

  • RSS and Atom feeds
  • Pangea sources via pygea

Usage

Sync dependencies and start the admin UI:

uv sync --all-groups
uv run repub

With no arguments, uv run repub starts the web UI in local dev mode. The Python app serves published .rss files from /feeds/... out of out/feeds/..., and in dev mode it also serves non-RSS feed artifacts from the same tree.
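The mapping from request paths to files on disk can be sketched as below. The traversal guard is a reasonable assumption for any static-file mapping, not a description of the app's actual code.

```python
from pathlib import Path

# Output tree named in this README; resolved once at startup.
OUT_ROOT = Path("out/feeds").resolve()

def artifact_path(url_path: str) -> Path:
    """Map a /feeds/... request path to a file under out/feeds/...,
    rejecting anything that would escape the output tree."""
    rel = url_path.removeprefix("/feeds/")
    candidate = (OUT_ROOT / rel).resolve()
    if not candidate.is_relative_to(OUT_ROOT):
        raise ValueError("path escapes out/feeds/")
    return candidate
```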

By default the UI listens on 127.0.0.1:8080. You can override that with REPUBLISHER_HOST and REPUBLISHER_PORT, or with:

uv run repub serve --host 0.0.0.0 --port 8080

If you invoke the serve subcommand explicitly, use --dev-mode to expose non-RSS feed artifacts directly from the Quart app:

uv run repub serve --dev-mode

Requests for /feeds/**/*.rss are always handled by the Python app. It rewrites mirrored feed URLs on the fly by replacing the configured Feed URL origin with https://<Host header>.
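The rewrite amounts to a string substitution over the feed body. A minimal sketch, where the function name and arguments are illustrative rather than the app's API:

```python
def rewrite_feed_body(body: str, feed_url_origin: str, host_header: str) -> str:
    """Replace the configured Feed URL origin (Settings -> Feed URL,
    e.g. "https://mirror.example") with the origin implied by the
    incoming Host header."""
    return body.replace(feed_url_origin.rstrip("/"), f"https://{host_header}")
```

This is what lets the same mirrored feed be served under whatever hostname the request arrived on.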

In --dev-mode, non-RSS requests under /feeds/... are served from out/feeds/....

In production, keep /feeds/**/*.rss routed to the Python app. Non-RSS feed artifacts under out/feeds/... should still be served directly by the reverse proxy at /feeds/....
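One way to express that routing in nginx, assuming the app listens on 127.0.0.1:8080 and the output tree lives at /srv/republisher/out (both paths are illustrative). The regex location catches *.rss requests and proxies them, passing the Host header through so the app's URL rewriting works; everything else under /feeds/ is served from disk:

```nginx
location ~ ^/feeds/.*\.rss$ {
    proxy_pass http://127.0.0.1:8080;
    proxy_set_header Host $host;
}

location /feeds/ {
    alias /srv/republisher/out/feeds/;
}
```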

Important: the admin UI has no built-in authentication. Keep it bound to localhost or put it behind a trusted network layer such as Tailscale.

Once the UI is running:

  1. Open http://127.0.0.1:8080/.
  2. Create a source. Feed sources take a feed URL. Pangea sources take a domain plus category configuration.
  3. Open Settings and set Feed URL to the public origin that serves mirrored feeds, for example https://mirror.example.
  4. Configure the job schedule and any spider arguments.
  5. Use Run now to trigger an immediate crawl, or leave the job enabled for scheduled runs.
  6. Watch running jobs and logs live from the Runs pages.

Operational notes:

  • The default database path is republisher.db. Set REPUBLISHER_DB_PATH to use a different SQLite file.
  • Mirrored feeds are written under out/feeds/<slug>/. In production, route /feeds/**/*.rss to the Python app and expose the remaining out/feeds/ artifacts directly from the reverse proxy at /feeds/.
  • Feed URL is used to generate absolute media URLs and atom:link rel="self" in exported feeds.
  • Job logs and stats artifacts are written under out/logs/.
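For example, with Feed URL set to https://mirror.example, an exported feed would carry absolute URLs along these lines (the slug and file names are illustrative):

```xml
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <atom:link href="https://mirror.example/feeds/example-source/feed.rss"
               rel="self" type="application/rss+xml"/>
    <item>
      <enclosure url="https://mirror.example/feeds/example-source/media/ep1.mp3"
                 length="12345678" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
```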

The legacy one-shot config-driven crawler is still available:

uv run repub crawl -c repub.toml

For config-driven crawls, set the public feed origin in scrapy.settings.REPUBLISHER_FEED_URL:

[scrapy.settings]
REPUBLISHER_FEED_URL = "https://mirror.example"

Roadmap

  • Offline RSS feed XML
  • Download media and enclosures
  • Rewrite media URLs
  • Image normalization (JPG, RGB)
  • Audio transcoding
  • Video transcoding
  • Image compression - Do we want this? -> DEFERRED for now
  • Download and rewrite media embedded in content/CDATA fields
  • Config file to drive the program
  • Add sqlite database and simple admin UI to replace config
  • Integrate pygea as input source
  • Operationalize with metrics and error reporting

License

republisher, a tool to mirror RSS/Atom feeds completely offline

Copyright (C) 2024-2026 Abel Luck

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.