republisher/README.md
2026-03-30 11:54:28 +02:00

AnyNews Republisher

The AnyNews Republisher is a tool for mirroring news content to alternative distribution points to avoid censorship or make content available to communities suffering from high Internet cost, slow or limited access, or natural disaster.

The organization with the original news content is the "publisher".

The AnyNews Republisher can be configured with various publisher news sources. On a regular interval, the Republisher crawls the sources and mirrors their content (text and media) offline into an RSS feed.

The AnyNews app can then be configured to use this mirror (or more than one such mirror).

The Republisher currently accepts the following source input types:

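  • RSS Feeds

To try the Republisher, enter the Nix development shell, install the dependencies with uv, write a config file, and run it:

nix develop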
uv sync --all-groups
cat > repub.toml <<'EOF'
out_dir = "out"

[[feeds]]
name = "Guardian Project Podcast"
slug = "gp-pod"
url = "https://guardianproject.info/podcast/podcast.xml"

[[feeds]]
name = "NASA Breaking News"
slug = "nasa"
url = "https://www.nasa.gov/rss/dyn/breaking_news.rss"
EOF
uv run repub --config repub.toml

out_dir may be relative or absolute; relative paths are resolved against the directory containing the config file. Each feed requires a user-provided slug, which is used for output paths and filenames. Optional Scrapy runtime overrides can be set in the same file:

[scrapy.settings]
LOG_LEVEL = "DEBUG"
DOWNLOAD_TIMEOUT = 30

Additional feed definitions can also be imported from one or more TOML files, including a pygea-generated manifest.toml:

feed_config_files = ["/absolute/path/to/pygea/feed/manifest.toml"]

Imported files only need [[feeds]] entries with name, slug, and url.

See demo/README.md for a self-contained example config.

TODO

  • Offlines RSS feed XML
  • Downloads media and enclosures
  • Rewrites media URLs
  • Image normalization (JPG, RGB)
  • Audio transcoding
  • Video transcoding
  • Image compression - Do we want this? -> DEFERRED for now
  • Download and rewrite media embedded in content/CDATA fields
  • Config file to drive the program
  • Add SQLite database and simple admin UI to replace config
  • Integrate pygea as input source
  • Daemonize the program
  • Operationalize with metrics and error reporting

License

republisher, a tool to mirror RSS/ATOM feeds completely offline

Copyright (C) 2024-2026 Abel Luck

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.