Add media retention cleanup command
parent 3b6503a6ed
commit 507074b80e
10 changed files with 722 additions and 52 deletions
README.md (21 lines changed)
@@ -72,6 +72,27 @@ Operational notes:
Reordering `REPUBLISHER_IMAGE` changes canonical feed image URLs.
- Job logs and stats artifacts are written under `out/logs/`.

Media cleanup:

- Published media can outlive the current feed when articles fall out of the
feed window. Use `cleanup-media` to delete old media files that are no longer
referenced by the latest published `feed.rss`.
- The default retention window is 25 days. Run a dry run first:

```sh
uv run repub cleanup-media --feeds-dir out/feeds --days 25 --dry-run
```

- Remove `--dry-run` to delete matching files. The command protects media
referenced by the latest published feed and uses a lock to avoid racing with
active crawls.
- For config-driven deployments, pass the runtime config so cleanup uses the
configured `out_dir` and media directory names:

```sh
uv run repub cleanup-media --config repub.toml --dry-run
```
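
The retention rule above can be sketched in plain shell. This is a minimal, hypothetical illustration, not how `repub cleanup-media` is implemented. It assumes that media files sit directly in one directory and that `feed.rss` refers to each file by its base name. The function name `list_cleanup_candidates` is made up for this example. Like `--dry-run`, the function only prints candidates. It does not take the crawl lock.

```sh
# Hypothetical sketch of the retention rule, NOT the repub implementation.
# Assumes media files live directly under MEDIA_DIR and that feed.rss refers
# to each one by its base name (e.g. inside an <enclosure url="..."> value).
# Needs a find(1) that supports -mmin (GNU and BSD find both do).

# list_cleanup_candidates MEDIA_DIR FEED_RSS DAYS
# Prints files last modified more than DAYS days ago whose base name does not
# appear anywhere in FEED_RSS. Nothing is deleted.
list_cleanup_candidates() {
  media_dir=$1
  feed_rss=$2
  days=$3

  # -mmin works in minutes, so convert the retention window from days.
  find "$media_dir" -type f -mmin +"$((days * 24 * 60))" -print |
    while IFS= read -r path; do
      name=$(basename "$path")
      # Fixed-string substring match: any mention of the name in the feed
      # keeps the file, so the check errs toward keeping media.
      if ! grep -qF -- "$name" "$feed_rss"; then
        printf '%s\n' "$path"
      fi
    done
}
```

For example, `list_cleanup_candidates out/feeds/example/media out/feeds/example/feed.rss 25` (a hypothetical layout) prints the files older than 25 days that the feed no longer mentions. Matching base names with `grep -F` is deliberately conservative. If a name appears anywhere in the feed, including inside a longer name, the file is protected.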

The legacy one-shot config-driven crawler is still available:

```sh