# AnyNews Republisher
The AnyNews Republisher is a tool for mirroring news content to alternative distribution points to avoid censorship or make content available to communities suffering from high Internet cost, slow or limited access, or natural disaster.
The organization with the original news content is the "publisher".
The AnyNews Republisher can be configured with one or more publisher news sources. At a regular interval, the Republisher crawls those sources and mirrors their content (text and media) into an RSS feed that can be served offline.
The [AnyNews app][app] can then be configured to use this mirror (or more than one such mirror).
The Republisher currently accepts the following source input types:
- RSS feeds

[app]: https://gitlab.com/guardianproject/anynews/anynews-web-client
``` shell
nix develop
uv sync --all-groups
cat > repub.toml <<'EOF'
out_dir = "out"
[[feeds]]
name = "Guardian Project Podcast"
slug = "gp-pod"
url = "https://guardianproject.info/podcast/podcast.xml"
[[feeds]]
name = "NASA Breaking News"
slug = "nasa"
url = "https://www.nasa.gov/rss/dyn/breaking_news.rss"
EOF
uv run repub --config repub.toml
```
`out_dir` may be relative or absolute. Relative paths are resolved against the
directory containing the config file. Each feed requires a user-provided
`slug`, which is used for output paths and filenames. Optional Scrapy runtime
overrides can be set in the same file:
```toml
[scrapy.settings]
LOG_LEVEL = "DEBUG"
DOWNLOAD_TIMEOUT = 30
```
Additional feed definitions can also be imported from one or more TOML files,
including a `pygea`-generated `manifest.toml`:
```toml
feed_config_files = ["/absolute/path/to/pygea/feed/manifest.toml"]
```
Imported files only need `[[feeds]]` entries with `name`, `slug`, and `url`.
See [`demo/README.md`](demo/README.md) for a self-contained example config.
## TODO
- [x] Offlines RSS feed XML
- [x] Downloads media and enclosures
- [x] Rewrites media URLs
- [x] Image normalization (JPG, RGB)
- [x] Audio transcoding
- [x] Video transcoding
- [ ] Image compression - Do we want this? -> DEFERRED for now
- [x] Download and rewrite media embedded in content/CDATA fields
- [x] Config file to drive the program
- [ ] Add sqlite database and simple admin UI to replace config
- [ ] Integrate pygea as input source
- [ ] Daemonize the program
- [ ] Operationalize with metrics and error reporting
## License
republisher, a tool to mirror RSS/ATOM feeds completely offline
Copyright (C) 2024-2026 Abel Luck
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.