Fix feed validation output
parent c834c3c254
commit db1d9b44b7
13 changed files with 477 additions and 54 deletions
README.md
@@ -48,15 +48,17 @@ Once the UI is running:
1. Open `http://127.0.0.1:8080/`.
2. Create a source. Feed sources take a feed URL. Pangea sources take a domain plus category configuration.
-3. Configure the job schedule and any spider arguments.
-4. Use `Run now` to trigger an immediate crawl, or leave the job enabled for scheduled runs.
-5. Watch running jobs and logs live from the Runs pages.
+3. Open `Settings` and set `Feed URL` to the public origin that serves mirrored feeds, for example `https://mirror.example`.
+4. Configure the job schedule and any spider arguments.
+5. Use `Run now` to trigger an immediate crawl, or leave the job enabled for scheduled runs.
+6. Watch running jobs and logs live from the Runs pages.

Operational notes:

- The default database path is `republisher.db`. Set `REPUBLISHER_DB_PATH` to use a different SQLite file.
- Mirrored feeds are written under `out/feeds/<slug>/`. In production, expose `out/feeds/` directly from the reverse proxy at `/feeds/`.
- `Feed URL` is used to generate absolute media URLs and `atom:link rel="self"` in exported feeds.
- Job logs and stats artifacts are written under `out/logs/`.
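The `REPUBLISHER_DB_PATH` override above can be sketched as a small resolver. Only the environment variable name and the `republisher.db` default come from the README; the helper function itself is hypothetical.

```python
import os

def resolve_db_path() -> str:
    """Return the SQLite path, honoring the REPUBLISHER_DB_PATH override."""
    # Default path documented in the README; falls back when the env var is unset.
    return os.environ.get("REPUBLISHER_DB_PATH", "republisher.db")
```
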

The legacy one-shot config-driven crawler is still available:

@@ -65,6 +67,13 @@ The legacy one-shot config-driven crawler is still available:

```
uv run repub crawl -c repub.toml
```

For config-driven crawls, set the public feed origin in `scrapy.settings.REPUBLISHER_FEED_URL`:

```toml
[scrapy.settings]
REPUBLISHER_FEED_URL = "https://mirror.example"
```
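As a sketch of how the configured origin is used, absolute media URLs can be built by joining it with a feed-relative path. Only the setting name and the `https://mirror.example` origin come from the README; the helper and the slug in the example are hypothetical.

```python
from urllib.parse import urljoin

def absolute_media_url(feed_url: str, media_path: str) -> str:
    # Join the public origin (REPUBLISHER_FEED_URL) with a path under out/feeds/,
    # normalizing slashes so the result has exactly one separator.
    return urljoin(feed_url.rstrip("/") + "/", media_path.lstrip("/"))

print(absolute_media_url("https://mirror.example", "feeds/example-slug/media/1.jpg"))
# → https://mirror.example/feeds/example-slug/media/1.jpg
```
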
## Roadmap

- [x] Offline RSS feed XML