Fix feed validation output

Abel Luck 2026-03-31 12:14:47 +02:00
parent c834c3c254
commit db1d9b44b7
13 changed files with 477 additions and 54 deletions

@ -48,15 +48,17 @@ Once the UI is running:
1. Open `http://127.0.0.1:8080/`.
2. Create a source. Feed sources take a feed URL. Pangea sources take a domain plus category configuration.
-3. Configure the job schedule and any spider arguments.
-4. Use `Run now` to trigger an immediate crawl, or leave the job enabled for scheduled runs.
-5. Watch running jobs and logs live from the Runs pages.
+3. Open `Settings` and set `Feed URL` to the public origin that serves mirrored feeds, for example `https://mirror.example`.
+4. Configure the job schedule and any spider arguments.
+5. Use `Run now` to trigger an immediate crawl, or leave the job enabled for scheduled runs.
+6. Watch running jobs and logs live from the Runs pages.
Operational notes:
- The default database path is `republisher.db`. Set `REPUBLISHER_DB_PATH` to use a different SQLite file.
- Mirrored feeds are written under `out/feeds/<slug>/`.
In production, expose `out/feeds/` directly from the reverse proxy at `/feeds/`.
+- `Feed URL` is used to generate absolute media URLs and `atom:link rel="self"` in exported feeds.
- Job logs and stats artifacts are written under `out/logs/`.
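
As a rough illustration of the `Feed URL` note above, the sketch below shows one way a public origin could be combined with the `/feeds/<slug>/` path to produce an absolute URL and an `atom:link rel="self"` element. The function names and the `feed.rss` file name are hypothetical and are not taken from the republisher code.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def absolute_feed_url(feed_url: str, slug: str, filename: str = "feed.rss") -> str:
    """Join the public origin with the /feeds/<slug>/ path.

    `filename` is an assumed name; the real exporter may use another one.
    """
    # Normalize to exactly one trailing slash so urljoin keeps the origin intact.
    base = feed_url.rstrip("/") + "/"
    return urljoin(base, f"feeds/{slug}/{filename}")


def self_link(feed_url: str, slug: str) -> ET.Element:
    """Build an <atom:link rel="self"> element pointing at the mirrored feed."""
    link = ET.Element(f"{{{ATOM_NS}}}link")
    link.set("rel", "self")
    link.set("href", absolute_feed_url(feed_url, slug))
    return link
```

With `Feed URL` set to `https://mirror.example` and a slug of `news`, `absolute_feed_url` would return `https://mirror.example/feeds/news/feed.rss` under these assumptions.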
The legacy one-shot config-driven crawler is still available:
@ -65,6 +67,13 @@ The legacy one-shot config-driven crawler is still available:
uv run repub crawl -c repub.toml
```
For config-driven crawls, set the public feed origin in `scrapy.settings.REPUBLISHER_FEED_URL`:
```toml
[scrapy.settings]
REPUBLISHER_FEED_URL = "https://mirror.example"
```
## Roadmap
- [x] Offlines RSS feed xml