Compare commits
3 commits
939cd9ea5d
...
db1d9b44b7
| Author | SHA1 | Date | |
|---|---|---|---|
| db1d9b44b7 | |||
| c834c3c254 | |||
| dce67ea9e3 |
19 changed files with 1001 additions and 116 deletions
17
AGENTS.md
17
AGENTS.md
|
|
@ -27,7 +27,7 @@ The views are pure functions data in -> html out.
|
||||||
By only using data: mode morph and always targeting the main element of the document the API can be massively simplified. This avoids having the explosion of endpoints you get with HTMX and makes reasoning about your app much simpler.
|
By only using data: mode morph and always targeting the main element of the document the API can be massively simplified. This avoids having the explosion of endpoints you get with HTMX and makes reasoning about your app much simpler.
|
||||||
|
|
||||||
- we only have a single render function per page
|
- we only have a single render function per page
|
||||||
By having a single render function per page you can simplify the reasoning about your app to view = f(state). You can then reason about your pushed updates as a continuous signal rather than discrete event stream. The benefit of this is you don't have to handle missed events, disconnects and reconnects. When the state changes on the server you push down the latest view, not the delta between views. On the client idiomorph can translate that into fine grained dom updates.
|
By having a single render function per page you can simplify the reasoning about your app to view = f(state). In immediate-mode terms the server is re-running the whole page render against the latest state, like a game loop, rather than trying to patch the view incrementally by hand. You can then reason about your pushed updates as a continuous signal rather than discrete event stream. The benefit of this is you don't have to handle missed events, disconnects and reconnects. When the state changes on the server you push down the latest view, not the delta between views. On the client idiomorph can translate that into fine grained dom updates.
|
||||||
|
|
||||||
|
|
||||||
- any database change -> re render all connected users with 200ms throttle
|
- any database change -> re render all connected users with 200ms throttle
|
||||||
|
|
@ -52,11 +52,18 @@ The views are pure functions data in -> html out.
|
||||||
Actions should not update the view via patch elements. This is because the changes they make would get overwritten on the next render-fn that pushes a new view down the updates SSE connection. However, they can still be used to update signals as those won't be changed by elements patch. This allows you to do things like validation on the server.
|
Actions should not update the view via patch elements. This is because the changes they make would get overwritten on the next render-fn that pushes a new view down the updates SSE connection. However, they can still be used to update signals as those won't be changed by elements patch. This allows you to do things like validation on the server.
|
||||||
|
|
||||||
- Stateless views
|
- Stateless views
|
||||||
The only way for actions to affect the view returned by the render-fn running in a connection is via the database. The ensures CQRS. This means there is no connection state that needs to be persisted or maintained (so missed events and shutdowns/deploys will not lead to lost state). Even when you are running in a single process there is no way for an action (command) to communicate with/affect a view render (query) without going through the database.
|
The state passed to a render-fn should be thought of as `{persistent db state, ephemeral tab state}`. The database is the source of truth for durable application state. Ephemeral tab state is server-owned in-memory state keyed by tab id for non-persistent UI concerns like pagination, sort order, expanded panels, wizard step, etc.
|
||||||
|
|
||||||
|
This tab state is not a client signal and not a database row. It exists so that non-persistent actions can still participate in the same immediate-mode render model: an action mutates server-side tab state, then the render-fn re-runs with the new `{db state, tab state}` and sends the latest full page view.
|
||||||
|
|
||||||
|
Tab state must be scoped to a single tab/SSE connection, initialized when the long-lived page stream connects, cleaned up when that stream closes, and periodically reaped for stale entries so memory cannot grow without bound.
|
||||||
|
|
||||||
|
Nothing else should influence a render. Do not smuggle view state through ad hoc globals, connection-local mutable objects, or client-owned signals that the server "trusts". If state should survive reconnects, restarts, or be shared across users, it belongs in the database. If it is purely per-tab and ephemeral, it belongs in tab state.
|
||||||
|
|
||||||
- CQRS
|
- CQRS
|
||||||
Actions modify the database and return a 204 or a 200 if they patch-signals.
|
Persistent actions modify the database and return a 204 or a 200 if they patch-signals.
|
||||||
Render functions re-render when the database changes and send an update down the updates SSE connection.
|
Ephemeral actions modify tab state and return a 204 or a 200 if they patch-signals.
|
||||||
|
Render functions re-render from the combined `{db state, tab state}` and send an update down the updates SSE connection.
|
||||||
|
|
||||||
- Work sharing (caching)
|
- Work sharing (caching)
|
||||||
Work sharing is the term I'm using for sharing renders between connected users. This can be useful when a lot of connected users share the same view. For example a leader board, game board, presence indicator etc. It ensures the work (eg: query and html generation) for that view is only done once regardless of the number of connected users. The simplest way to do this is to recalculate and cache values after after a batch has been run.
|
Work sharing is the term I'm using for sharing renders between connected users. This can be useful when a lot of connected users share the same view. For example a leader board, game board, presence indicator etc. It ensures the work (eg: query and html generation) for that view is only done once regardless of the number of connected users. The simplest way to do this is to recalculate and cache values after after a batch has been run.
|
||||||
|
|
@ -81,12 +88,14 @@ The only way for actions to affect the view returned by the render-fn running in
|
||||||
- Sync Python dependencies with `uv sync --all-groups`.
|
- Sync Python dependencies with `uv sync --all-groups`.
|
||||||
- Run the app with `uv run repub`.
|
- Run the app with `uv run repub`.
|
||||||
- Generate CSS with `tailwindcss -i ./repub/static/app.tailwind.css -o ./repub/static/app.css` and add `--watch` when you need live rebuilds.
|
- Generate CSS with `tailwindcss -i ./repub/static/app.tailwind.css -o ./repub/static/app.css` and add `--watch` when you need live rebuilds.
|
||||||
|
- Validate a generated feed with `./scripts/validate-feed path/to/feed.rss`. This wraps the local checkout at `~/src/github.com/w3c/feedvalidator` and pages the validator output through `less` by default.
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
uv sync --all-groups
|
uv sync --all-groups
|
||||||
uv run pytest
|
uv run pytest
|
||||||
uv run flake8 repub/ tests/
|
uv run flake8 repub/ tests/
|
||||||
uv run pyright
|
uv run pyright
|
||||||
|
./scripts/validate-feed out/feeds/mn-cuba/feed.rss
|
||||||
nix fmt
|
nix fmt
|
||||||
nix flake check
|
nix flake check
|
||||||
uv run repub
|
uv run repub
|
||||||
|
|
|
||||||
15
README.md
15
README.md
|
|
@ -48,15 +48,17 @@ Once the UI is running:
|
||||||
|
|
||||||
1. Open `http://127.0.0.1:8080/`.
|
1. Open `http://127.0.0.1:8080/`.
|
||||||
2. Create a source. Feed sources take a feed URL. Pangea sources take a domain plus category configuration.
|
2. Create a source. Feed sources take a feed URL. Pangea sources take a domain plus category configuration.
|
||||||
3. Configure the job schedule and any spider arguments.
|
3. Open `Settings` and set `Feed URL` to the public origin that serves mirrored feeds, for example `https://mirror.example`.
|
||||||
4. Use `Run now` to trigger an immediate crawl, or leave the job enabled for scheduled runs.
|
4. Configure the job schedule and any spider arguments.
|
||||||
5. Watch running jobs and logs live from the Runs pages.
|
5. Use `Run now` to trigger an immediate crawl, or leave the job enabled for scheduled runs.
|
||||||
|
6. Watch running jobs and logs live from the Runs pages.
|
||||||
|
|
||||||
Operational notes:
|
Operational notes:
|
||||||
|
|
||||||
- The default database path is `republisher.db`. Set `REPUBLISHER_DB_PATH` to use a different SQLite file.
|
- The default database path is `republisher.db`. Set `REPUBLISHER_DB_PATH` to use a different SQLite file.
|
||||||
- Mirrored feeds are written under `out/feeds/<slug>/`.
|
- Mirrored feeds are written under `out/feeds/<slug>/`.
|
||||||
In production, expose `out/feeds/` directly from the reverse proxy at `/feeds/`.
|
In production, expose `out/feeds/` directly from the reverse proxy at `/feeds/`.
|
||||||
|
- `Feed URL` is used to generate absolute media URLs and `atom:link rel="self"` in exported feeds.
|
||||||
- Job logs and stats artifacts are written under `out/logs/`.
|
- Job logs and stats artifacts are written under `out/logs/`.
|
||||||
|
|
||||||
The legacy one-shot config-driven crawler is still available:
|
The legacy one-shot config-driven crawler is still available:
|
||||||
|
|
@ -65,6 +67,13 @@ The legacy one-shot config-driven crawler is still available:
|
||||||
uv run repub crawl -c repub.toml
|
uv run repub crawl -c repub.toml
|
||||||
```
|
```
|
||||||
|
|
||||||
|
For config-driven crawls, set the public feed origin in `scrapy.settings.REPUBLISHER_FEED_URL`:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[scrapy.settings]
|
||||||
|
REPUBLISHER_FEED_URL = "https://mirror.example"
|
||||||
|
```
|
||||||
|
|
||||||
## Roadmap
|
## Roadmap
|
||||||
|
|
||||||
- [x] Offlines RSS feed xml
|
- [x] Offlines RSS feed xml
|
||||||
|
|
|
||||||
|
|
@ -13,3 +13,4 @@ url = "https://www.nasa.gov/rss/dyn/breaking_news.rss"
|
||||||
[scrapy.settings]
|
[scrapy.settings]
|
||||||
LOG_LEVEL = "INFO"
|
LOG_LEVEL = "INFO"
|
||||||
DOWNLOAD_TIMEOUT = 30
|
DOWNLOAD_TIMEOUT = 30
|
||||||
|
REPUBLISHER_FEED_URL = "https://mirror.example"
|
||||||
|
|
|
||||||
|
|
@ -3,6 +3,8 @@ from __future__ import annotations
|
||||||
import asyncio
|
import asyncio
|
||||||
import hashlib
|
import hashlib
|
||||||
from collections.abc import AsyncGenerator, Awaitable, Callable
|
from collections.abc import AsyncGenerator, Awaitable, Callable
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from datetime import UTC, datetime, timedelta
|
||||||
from typing import Protocol
|
from typing import Protocol
|
||||||
|
|
||||||
from datastar_py import ServerSentEventGenerator as SSE
|
from datastar_py import ServerSentEventGenerator as SSE
|
||||||
|
|
@ -15,22 +17,102 @@ class HtmlRenderable(Protocol):
|
||||||
|
|
||||||
RenderResult = str | HtmlRenderable
|
RenderResult = str | HtmlRenderable
|
||||||
RenderFunction = Callable[[], Awaitable[RenderResult]]
|
RenderFunction = Callable[[], Awaitable[RenderResult]]
|
||||||
|
PageState = dict[str, object]
|
||||||
|
TabState = dict[str, PageState]
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class _TabSession:
|
||||||
|
data: TabState = field(default_factory=dict)
|
||||||
|
created_at: datetime = field(default_factory=lambda: datetime.now(UTC))
|
||||||
|
modified_at: datetime = field(default_factory=lambda: datetime.now(UTC))
|
||||||
|
connections: int = 0
|
||||||
|
|
||||||
|
|
||||||
|
class TabStateStore:
|
||||||
|
def __init__(self, *, clean_age_threshold: timedelta = timedelta(hours=24)) -> None:
|
||||||
|
self._sessions: dict[str, _TabSession] = {}
|
||||||
|
self.clean_age_threshold = clean_age_threshold
|
||||||
|
|
||||||
|
def connect(self, tab_id: str, *, now: datetime | None = None) -> TabState:
|
||||||
|
session = self._sessions.get(tab_id)
|
||||||
|
current_time = _now(now)
|
||||||
|
if session is None:
|
||||||
|
session = _TabSession(created_at=current_time, modified_at=current_time)
|
||||||
|
self._sessions[tab_id] = session
|
||||||
|
session.connections += 1
|
||||||
|
return session.data
|
||||||
|
|
||||||
|
def disconnect(self, tab_id: str) -> None:
|
||||||
|
session = self._sessions.get(tab_id)
|
||||||
|
if session is None:
|
||||||
|
return
|
||||||
|
session.connections = max(0, session.connections - 1)
|
||||||
|
if session.connections == 0:
|
||||||
|
self._sessions.pop(tab_id, None)
|
||||||
|
|
||||||
|
def get_tab_state(self, tab_id: str) -> TabState | None:
|
||||||
|
session = self._sessions.get(tab_id)
|
||||||
|
return None if session is None else session.data
|
||||||
|
|
||||||
|
def get_page_state(self, tab_id: str | None, page_key: str) -> PageState:
|
||||||
|
if tab_id is None:
|
||||||
|
return {}
|
||||||
|
session = self._sessions.get(tab_id)
|
||||||
|
if session is None:
|
||||||
|
return {}
|
||||||
|
return session.data.get(page_key, {})
|
||||||
|
|
||||||
|
def update_page_state(
|
||||||
|
self,
|
||||||
|
tab_id: str,
|
||||||
|
page_key: str,
|
||||||
|
update: Callable[[PageState], PageState],
|
||||||
|
*,
|
||||||
|
now: datetime | None = None,
|
||||||
|
) -> PageState:
|
||||||
|
current_time = _now(now)
|
||||||
|
session = self._sessions.get(tab_id)
|
||||||
|
if session is None:
|
||||||
|
session = _TabSession(created_at=current_time, modified_at=current_time)
|
||||||
|
self._sessions[tab_id] = session
|
||||||
|
|
||||||
|
page_state = dict(session.data.get(page_key, {}))
|
||||||
|
session.data[page_key] = update(page_state)
|
||||||
|
session.modified_at = current_time
|
||||||
|
return session.data[page_key]
|
||||||
|
|
||||||
|
def cleanup_stale(self, *, now: datetime | None = None) -> set[str]:
|
||||||
|
current_time = _now(now)
|
||||||
|
removed: set[str] = set()
|
||||||
|
for tab_id, session in tuple(self._sessions.items()):
|
||||||
|
if current_time - session.modified_at < self.clean_age_threshold:
|
||||||
|
continue
|
||||||
|
self._sessions.pop(tab_id, None)
|
||||||
|
removed.add(tab_id)
|
||||||
|
return removed
|
||||||
|
|
||||||
|
|
||||||
class RefreshBroker:
|
class RefreshBroker:
|
||||||
def __init__(self) -> None:
|
def __init__(self) -> None:
|
||||||
self._subscribers: dict[asyncio.Queue[object], asyncio.AbstractEventLoop] = {}
|
self._subscribers: dict[
|
||||||
|
asyncio.Queue[object], tuple[asyncio.AbstractEventLoop, str | None]
|
||||||
|
] = {}
|
||||||
|
|
||||||
def subscribe(self) -> asyncio.Queue[object]:
|
def subscribe(self, *, tab_id: str | None = None) -> asyncio.Queue[object]:
|
||||||
queue: asyncio.Queue[object] = asyncio.Queue(maxsize=1)
|
queue: asyncio.Queue[object] = asyncio.Queue(maxsize=1)
|
||||||
self._subscribers[queue] = asyncio.get_running_loop()
|
self._subscribers[queue] = (asyncio.get_running_loop(), tab_id)
|
||||||
return queue
|
return queue
|
||||||
|
|
||||||
def unsubscribe(self, queue: asyncio.Queue[object]) -> None:
|
def unsubscribe(self, queue: asyncio.Queue[object]) -> None:
|
||||||
self._subscribers.pop(queue, None)
|
self._subscribers.pop(queue, None)
|
||||||
|
|
||||||
def publish(self, event: object = "refresh-event") -> None:
|
def publish(
|
||||||
for queue, loop in tuple(self._subscribers.items()):
|
self, event: object = "refresh-event", *, tab_id: str | None = None
|
||||||
|
) -> None:
|
||||||
|
for queue, (loop, subscriber_tab_id) in tuple(self._subscribers.items()):
|
||||||
|
if tab_id is not None and subscriber_tab_id != tab_id:
|
||||||
|
continue
|
||||||
loop.call_soon_threadsafe(_publish_event, queue, event)
|
loop.call_soon_threadsafe(_publish_event, queue, event)
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -96,3 +178,7 @@ def _coerce_html(view: RenderResult) -> str:
|
||||||
|
|
||||||
def _render_hash(html: str) -> str:
|
def _render_hash(html: str) -> str:
|
||||||
return hashlib.blake2s(html.encode("utf-8"), digest_size=16).hexdigest()
|
return hashlib.blake2s(html.encode("utf-8"), digest_size=16).hexdigest()
|
||||||
|
|
||||||
|
|
||||||
|
def _now(now: datetime | None) -> datetime:
|
||||||
|
return now or datetime.now(UTC)
|
||||||
|
|
|
||||||
|
|
@ -29,6 +29,7 @@ from repub.model import (
|
||||||
SourcePangea,
|
SourcePangea,
|
||||||
database,
|
database,
|
||||||
initialize_database,
|
initialize_database,
|
||||||
|
load_feed_url,
|
||||||
)
|
)
|
||||||
from repub.spiders.rss_spider import RssFeedSpider
|
from repub.spiders.rss_spider import RssFeedSpider
|
||||||
|
|
||||||
|
|
@ -271,6 +272,7 @@ def main(argv: list[str] | None = None) -> int:
|
||||||
stats_path=stats_path,
|
stats_path=stats_path,
|
||||||
convert_images=source_config.convert_images,
|
convert_images=source_config.convert_images,
|
||||||
convert_video=source_config.convert_video,
|
convert_video=source_config.convert_video,
|
||||||
|
feed_url=load_feed_url(),
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
print(
|
print(
|
||||||
|
|
@ -424,7 +426,10 @@ def _build_crawl_settings(
|
||||||
stats_path: Path,
|
stats_path: Path,
|
||||||
convert_images: bool = True,
|
convert_images: bool = True,
|
||||||
convert_video: bool = True,
|
convert_video: bool = True,
|
||||||
|
feed_url: str | None = None,
|
||||||
):
|
):
|
||||||
|
if feed_url is None or feed_url.strip() == "":
|
||||||
|
raise ValueError("feed_url setting is required for job runs")
|
||||||
base_settings = build_base_settings(
|
base_settings = build_base_settings(
|
||||||
RepublisherConfig(
|
RepublisherConfig(
|
||||||
config_path=out_dir / "job-runner.toml",
|
config_path=out_dir / "job-runner.toml",
|
||||||
|
|
@ -448,6 +453,7 @@ def _build_crawl_settings(
|
||||||
priority="cmdline",
|
priority="cmdline",
|
||||||
)
|
)
|
||||||
settings.set("REPUB_JOB_STATS_PATH", str(stats_path), priority="cmdline")
|
settings.set("REPUB_JOB_STATS_PATH", str(stats_path), priority="cmdline")
|
||||||
|
settings.set("REPUBLISHER_FEED_URL", feed_url, priority="cmdline")
|
||||||
return settings
|
return settings
|
||||||
|
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -34,6 +34,8 @@ DATABASE_PRAGMAS = {
|
||||||
SCHEMA_GLOB = "*.sql"
|
SCHEMA_GLOB = "*.sql"
|
||||||
MAX_CONCURRENT_JOBS_SETTING_KEY = "max_concurrent_jobs"
|
MAX_CONCURRENT_JOBS_SETTING_KEY = "max_concurrent_jobs"
|
||||||
DEFAULT_MAX_CONCURRENT_JOBS = 1
|
DEFAULT_MAX_CONCURRENT_JOBS = 1
|
||||||
|
FEED_URL_SETTING_KEY = "feed_url"
|
||||||
|
DEFAULT_FEED_URL = ""
|
||||||
|
|
||||||
database = SqliteDatabase(None, pragmas=DATABASE_PRAGMAS)
|
database = SqliteDatabase(None, pragmas=DATABASE_PRAGMAS)
|
||||||
|
|
||||||
|
|
@ -163,8 +165,16 @@ def load_max_concurrent_jobs() -> int:
|
||||||
return parsed if parsed >= 1 else DEFAULT_MAX_CONCURRENT_JOBS
|
return parsed if parsed >= 1 else DEFAULT_MAX_CONCURRENT_JOBS
|
||||||
|
|
||||||
|
|
||||||
|
def load_feed_url() -> str:
|
||||||
|
value = load_setting(FEED_URL_SETTING_KEY, DEFAULT_FEED_URL)
|
||||||
|
return value if isinstance(value, str) else DEFAULT_FEED_URL
|
||||||
|
|
||||||
|
|
||||||
def load_settings_form() -> dict[str, object]:
|
def load_settings_form() -> dict[str, object]:
|
||||||
return {"max_concurrent_jobs": load_max_concurrent_jobs()}
|
return {
|
||||||
|
"max_concurrent_jobs": load_max_concurrent_jobs(),
|
||||||
|
"feed_url": load_feed_url(),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
def load_source_form(slug: str) -> dict[str, object] | None:
|
def load_source_form(slug: str) -> dict[str, object] | None:
|
||||||
|
|
|
||||||
|
|
@ -242,8 +242,27 @@ def _completed_row(execution: Mapping[str, object]) -> tuple[Node, ...]:
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def _completed_page_href(page: int) -> str:
|
def _completed_page_action_path(page: int) -> str:
|
||||||
return f"/runs?completed_page={page}"
|
return f"/actions/runs/completed-page/{page}"
|
||||||
|
|
||||||
|
|
||||||
|
def _pagination_button(
|
||||||
|
*,
|
||||||
|
label: str,
|
||||||
|
page: int,
|
||||||
|
current: bool = False,
|
||||||
|
class_name: str,
|
||||||
|
) -> Renderable:
|
||||||
|
attributes = {
|
||||||
|
"data-on:pointerdown": f"@post('{_completed_page_action_path(page)}')",
|
||||||
|
}
|
||||||
|
if current:
|
||||||
|
attributes["aria-current"] = "page"
|
||||||
|
return h.button(
|
||||||
|
attributes,
|
||||||
|
type="button",
|
||||||
|
class_=class_name,
|
||||||
|
)[label]
|
||||||
|
|
||||||
|
|
||||||
def _completed_history_pagination(
|
def _completed_history_pagination(
|
||||||
|
|
@ -258,8 +277,9 @@ def _completed_history_pagination(
|
||||||
|
|
||||||
start_result = ((completed_page - 1) * completed_page_size) + 1
|
start_result = ((completed_page - 1) * completed_page_size) + 1
|
||||||
end_result = min(completed_total_count, completed_page * completed_page_size)
|
end_result = min(completed_total_count, completed_page * completed_page_size)
|
||||||
link_class = (
|
button_class = (
|
||||||
"relative inline-flex items-center px-4 py-2 text-sm font-semibold text-slate-700 "
|
"relative inline-flex items-center px-4 py-2 text-sm font-semibold text-slate-700 "
|
||||||
|
"cursor-pointer "
|
||||||
"ring-1 ring-inset ring-slate-200 hover:bg-stone-50"
|
"ring-1 ring-inset ring-slate-200 hover:bg-stone-50"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
@ -267,16 +287,22 @@ def _completed_history_pagination(
|
||||||
class_="flex items-center justify-between border-t border-slate-200 bg-white px-4 py-3 sm:px-6"
|
class_="flex items-center justify-between border-t border-slate-200 bg-white px-4 py-3 sm:px-6"
|
||||||
)[
|
)[
|
||||||
h.div(class_="flex flex-1 justify-between sm:hidden")[
|
h.div(class_="flex flex-1 justify-between sm:hidden")[
|
||||||
h.a(
|
_pagination_button(
|
||||||
href=_completed_page_href(max(1, completed_page - 1)),
|
label="Previous",
|
||||||
class_="relative inline-flex items-center rounded-xl border border-slate-200 bg-white px-4 py-2 text-sm font-medium text-slate-700 hover:bg-stone-50",
|
page=max(1, completed_page - 1),
|
||||||
)["Previous"],
|
class_name=(
|
||||||
h.a(
|
"relative inline-flex items-center rounded-xl border border-slate-200 "
|
||||||
href=_completed_page_href(
|
"bg-white px-4 py-2 text-sm font-medium text-slate-700 hover:bg-stone-50"
|
||||||
min(completed_total_pages, completed_page + 1)
|
|
||||||
),
|
),
|
||||||
class_="relative ml-3 inline-flex items-center rounded-xl border border-slate-200 bg-white px-4 py-2 text-sm font-medium text-slate-700 hover:bg-stone-50",
|
),
|
||||||
)["Next"],
|
_pagination_button(
|
||||||
|
label="Next",
|
||||||
|
page=min(completed_total_pages, completed_page + 1),
|
||||||
|
class_name=(
|
||||||
|
"relative ml-3 inline-flex items-center rounded-xl border border-slate-200 "
|
||||||
|
"bg-white px-4 py-2 text-sm font-medium text-slate-700 hover:bg-stone-50"
|
||||||
|
),
|
||||||
|
),
|
||||||
],
|
],
|
||||||
h.div(class_="hidden sm:flex sm:flex-1 sm:items-center sm:justify-between")[
|
h.div(class_="hidden sm:flex sm:flex-1 sm:items-center sm:justify-between")[
|
||||||
h.p(class_="text-sm text-slate-600")[
|
h.p(class_="text-sm text-slate-600")[
|
||||||
|
|
@ -293,17 +319,16 @@ def _completed_history_pagination(
|
||||||
class_="isolate inline-flex -space-x-px rounded-xl shadow-xs",
|
class_="isolate inline-flex -space-x-px rounded-xl shadow-xs",
|
||||||
)[
|
)[
|
||||||
(
|
(
|
||||||
h.a(
|
_pagination_button(
|
||||||
href=_completed_page_href(page_number),
|
label=str(page_number),
|
||||||
aria_current=(
|
page=page_number,
|
||||||
"page" if page_number == completed_page else None
|
current=page_number == completed_page,
|
||||||
),
|
class_name=(
|
||||||
class_=(
|
|
||||||
"relative z-10 inline-flex items-center bg-amber-500 px-4 py-2 text-sm font-semibold text-slate-950"
|
"relative z-10 inline-flex items-center bg-amber-500 px-4 py-2 text-sm font-semibold text-slate-950"
|
||||||
if page_number == completed_page
|
if page_number == completed_page
|
||||||
else link_class
|
else button_class
|
||||||
),
|
),
|
||||||
)[str(page_number)]
|
)
|
||||||
for page_number in range(1, completed_total_pages + 1)
|
for page_number in range(1, completed_total_pages + 1)
|
||||||
)
|
)
|
||||||
],
|
],
|
||||||
|
|
|
||||||
|
|
@ -41,7 +41,8 @@ def settings_page(
|
||||||
"data-signals": "{_formError: '', _formSuccess: ''}",
|
"data-signals": "{_formError: '', _formSuccess: ''}",
|
||||||
"data-signals__ifmissing": (
|
"data-signals__ifmissing": (
|
||||||
"{"
|
"{"
|
||||||
f"maxConcurrentJobs: '{_value(settings, 'max_concurrent_jobs', '1')}'"
|
f"maxConcurrentJobs: '{_value(settings, 'max_concurrent_jobs', '1')}', "
|
||||||
|
f"feedUrl: '{_value(settings, 'feed_url')}'"
|
||||||
"}"
|
"}"
|
||||||
),
|
),
|
||||||
"data-on:submit": f"@post('{action_path}')",
|
"data-on:submit": f"@post('{action_path}')",
|
||||||
|
|
@ -74,6 +75,14 @@ def settings_page(
|
||||||
help_text="Must be an integer greater than or equal to 1.",
|
help_text="Must be an integer greater than or equal to 1.",
|
||||||
signal_name="maxConcurrentJobs",
|
signal_name="maxConcurrentJobs",
|
||||||
),
|
),
|
||||||
|
input_field(
|
||||||
|
label="Feed URL",
|
||||||
|
field_id="feed-domain",
|
||||||
|
value=_value(settings, "feed_url"),
|
||||||
|
placeholder="https://mirror.example",
|
||||||
|
help_text="Example: http://localhost:8080. Must include http:// or https:// and point at the public base URL that serves /feeds/.",
|
||||||
|
signal_name="feedUrl",
|
||||||
|
),
|
||||||
],
|
],
|
||||||
h.div(class_="flex flex-wrap justify-end gap-3 pt-2")[
|
h.div(class_="flex flex-wrap justify-end gap-3 pt-2")[
|
||||||
muted_action_link(href="/", label="Back to dashboard"),
|
muted_action_link(href="/", label="Back to dashboard"),
|
||||||
|
|
|
||||||
53
repub/rss.py
53
repub/rss.py
|
|
@ -1,5 +1,7 @@
|
||||||
from datetime import datetime
|
import re
|
||||||
from time import mktime
|
from calendar import timegm
|
||||||
|
from datetime import UTC, datetime
|
||||||
|
from email.utils import format_datetime
|
||||||
|
|
||||||
import lxml.etree as ET
|
import lxml.etree as ET
|
||||||
import lxml.html
|
import lxml.html
|
||||||
|
|
@ -93,20 +95,54 @@ def serialize(root):
|
||||||
|
|
||||||
def date_format(d):
|
def date_format(d):
|
||||||
if d:
|
if d:
|
||||||
return d.strftime("%a, %d %b %Y %H:%M:%S %z")
|
return format_datetime(d.astimezone(UTC))
|
||||||
|
|
||||||
|
|
||||||
def to_datetime(struct_time):
|
def to_datetime(struct_time):
|
||||||
if struct_time:
|
if struct_time:
|
||||||
return datetime.fromtimestamp(mktime(struct_time))
|
return datetime.fromtimestamp(timegm(struct_time), tz=UTC)
|
||||||
|
|
||||||
|
|
||||||
def normalize_date(struct_time):
|
def normalize_date(struct_time):
|
||||||
return date_format(to_datetime(struct_time))
|
return date_format(to_datetime(struct_time))
|
||||||
|
|
||||||
|
|
||||||
|
HTML_ATTRIBUTE_DENYLIST = frozenset({"contenteditable", "mode", "querystring"})
|
||||||
|
|
||||||
|
|
||||||
|
def parse_html_fragment(raw_html):
|
||||||
|
if raw_html.strip() == "":
|
||||||
|
return None
|
||||||
|
return lxml.html.fragment_fromstring(raw_html, create_parent=True)
|
||||||
|
|
||||||
|
|
||||||
|
def sanitize_html(raw_html: str) -> str:
|
||||||
|
fragment = parse_html_fragment(raw_html)
|
||||||
|
if fragment is None:
|
||||||
|
return raw_html
|
||||||
|
for el in fragment.iter():
|
||||||
|
for attr in HTML_ATTRIBUTE_DENYLIST:
|
||||||
|
el.attrib.pop(attr, None)
|
||||||
|
return (fragment.text or "") + "".join(
|
||||||
|
lxml.html.tostring(child, encoding="unicode") for child in fragment
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def plain_text_summary(raw_html: str | None, max_length: int = 4000) -> str | None:
|
||||||
|
if raw_html is None:
|
||||||
|
return None
|
||||||
|
fragment = parse_html_fragment(raw_html)
|
||||||
|
text = raw_html if fragment is None else fragment.text_content()
|
||||||
|
normalized = re.sub(r"\s+", " ", text).strip()
|
||||||
|
if normalized == "":
|
||||||
|
return None
|
||||||
|
return normalized[:max_length]
|
||||||
|
|
||||||
|
|
||||||
def munge_cdata_html(raw_html, replace_link_fn) -> str:
|
def munge_cdata_html(raw_html, replace_link_fn) -> str:
|
||||||
html = lxml.html.fromstring(raw_html)
|
html = parse_html_fragment(raw_html)
|
||||||
|
if html is None:
|
||||||
|
return raw_html
|
||||||
for el, attr, link, pos in html.iterlinks():
|
for el, attr, link, pos in html.iterlinks():
|
||||||
if attr == "srcset":
|
if attr == "srcset":
|
||||||
# these are a messy special case
|
# these are a messy special case
|
||||||
|
|
@ -133,4 +169,9 @@ def munge_cdata_html(raw_html, replace_link_fn) -> str:
|
||||||
else:
|
else:
|
||||||
new = cur[:pos] + new_link + cur[pos + len(link) :]
|
new = cur[:pos] + new_link + cur[pos + len(link) :]
|
||||||
el.set(attr, new)
|
el.set(attr, new)
|
||||||
return lxml.html.tostring(html, encoding="utf-8", pretty_print=True).decode("utf-8")
|
for el in html.iter():
|
||||||
|
for attr in HTML_ATTRIBUTE_DENYLIST:
|
||||||
|
el.attrib.pop(attr, None)
|
||||||
|
return (html.text or "") + "".join(
|
||||||
|
lxml.html.tostring(child, encoding="unicode") for child in html
|
||||||
|
)
|
||||||
|
|
|
||||||
|
|
@ -7,7 +7,18 @@ from scrapy.spiders import Spider
|
||||||
from scrapy.utils.spider import iterate_spider_output
|
from scrapy.utils.spider import iterate_spider_output
|
||||||
|
|
||||||
from repub.items import ChannelElementItem, ElementItem
|
from repub.items import ChannelElementItem, ElementItem
|
||||||
from repub.rss import CDATA, CONTENT, ITUNES, MEDIA, E, munge_cdata_html, normalize_date
|
from repub.rss import (
|
||||||
|
ATOM,
|
||||||
|
CDATA,
|
||||||
|
CONTENT,
|
||||||
|
ITUNES,
|
||||||
|
MEDIA,
|
||||||
|
E,
|
||||||
|
munge_cdata_html,
|
||||||
|
normalize_date,
|
||||||
|
plain_text_summary,
|
||||||
|
sanitize_html,
|
||||||
|
)
|
||||||
from repub.utils import FileType, determine_file_type, local_file_path, local_image_path
|
from repub.utils import FileType, determine_file_type, local_file_path, local_image_path
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -42,11 +53,57 @@ class BaseRssFeedSpider(Spider):
|
||||||
file_dir = self.settings["REPUBLISHER_VIDEO_DIR"]
|
file_dir = self.settings["REPUBLISHER_VIDEO_DIR"]
|
||||||
elif file_type == FileType.AUDIO:
|
elif file_type == FileType.AUDIO:
|
||||||
file_dir = self.settings["REPUBLISHER_AUDIO_DIR"]
|
file_dir = self.settings["REPUBLISHER_AUDIO_DIR"]
|
||||||
return f"{file_dir}/{local_path}"
|
relative_path = f"{file_dir}/{local_path}"
|
||||||
|
return self.absolute_feed_url(relative_path)
|
||||||
|
|
||||||
def rewrite_image_url(self, url):
|
def rewrite_image_url(self, url):
|
||||||
return self.rewrite_file_url(FileType.IMAGE, url)
|
return self.rewrite_file_url(FileType.IMAGE, url)
|
||||||
|
|
||||||
|
def absolute_feed_url(self, path: str) -> str:
|
||||||
|
feed_url = str(self.settings.get("REPUBLISHER_FEED_URL", "")).rstrip("/")
|
||||||
|
if feed_url == "":
|
||||||
|
return path
|
||||||
|
return f"{feed_url}/feeds/{self.feed_name}/{path.lstrip('/')}"
|
||||||
|
|
||||||
|
def compact_attrib(self, **attrib):
|
||||||
|
return {
|
||||||
|
key: str(value) for key, value in attrib.items() if value not in (None, "")
|
||||||
|
}
|
||||||
|
|
||||||
|
def itunes_explicit_value(self, value) -> str:
|
||||||
|
if isinstance(value, str):
|
||||||
|
return (
|
||||||
|
"true"
|
||||||
|
if value.strip().lower() in {"true", "yes", "explicit"}
|
||||||
|
else "false"
|
||||||
|
)
|
||||||
|
return "true" if bool(value) else "false"
|
||||||
|
|
||||||
|
def publisher_email(self, feed) -> str | None:
|
||||||
|
publisher_detail = feed.get("publisher_detail")
|
||||||
|
if publisher_detail and publisher_detail.get("email"):
|
||||||
|
return publisher_detail.get("email")
|
||||||
|
publisher = feed.get("publisher")
|
||||||
|
if isinstance(publisher, str) and "@" in publisher:
|
||||||
|
return publisher
|
||||||
|
return None
|
||||||
|
|
||||||
|
def itunes_category(self, feed) -> str:
|
||||||
|
del feed
|
||||||
|
return "News"
|
||||||
|
|
||||||
|
def latest_entry_date(self, feed) -> str | None:
|
||||||
|
published_dates = [
|
||||||
|
normalize_date(entry.get("published_parsed"))
|
||||||
|
for entry in feed.entries
|
||||||
|
if entry.get("published_parsed") is not None
|
||||||
|
]
|
||||||
|
if published_dates:
|
||||||
|
return max(published_dates)
|
||||||
|
return normalize_date(feed.feed.get("updated_parsed")) or normalize_date(
|
||||||
|
feed.feed.get("published_parsed")
|
||||||
|
)
|
||||||
|
|
||||||
def munge_cdata_html(self, html) -> Tuple[str, Dict[FileType, List[str]]]:
|
def munge_cdata_html(self, html) -> Tuple[str, Dict[FileType, List[str]]]:
|
||||||
urls = {FileType.IMAGE: [], FileType.VIDEO: [], FileType.AUDIO: []}
|
urls = {FileType.IMAGE: [], FileType.VIDEO: [], FileType.AUDIO: []}
|
||||||
|
|
||||||
|
|
@ -100,14 +157,31 @@ class BaseRssFeedSpider(Spider):
|
||||||
channel = E.channel(
|
channel = E.channel(
|
||||||
E.title(f.get("title")),
|
E.title(f.get("title")),
|
||||||
E.link(f.get("link")),
|
E.link(f.get("link")),
|
||||||
E.description(f.get("description")),
|
E.description(sanitize_html(f.get("description", ""))),
|
||||||
E.language(f.get("language")),
|
E.language(f.get("language")),
|
||||||
E.copyright(f.get("copyright")),
|
E.copyright(f.get("copyright")),
|
||||||
E.webMaster(f.get("publisher")),
|
E.webMaster(self.WEBMASTER_VALUE),
|
||||||
E.generator(f.get("generator")),
|
E.generator(f.get("generator")),
|
||||||
E.pubDate(normalize_date(f.get("published_parsed"))),
|
E.pubDate(normalize_date(f.get("published_parsed"))),
|
||||||
E.lastBuildDate(normalize_date(f.get("updated_parsed"))),
|
E.lastBuildDate(self.latest_entry_date(feed)),
|
||||||
ITUNES.explicit("yes" if f.get("itunes_explicit", False) else "no"),
|
ITUNES.explicit(
|
||||||
|
self.itunes_explicit_value(f.get("itunes_explicit", False))
|
||||||
|
),
|
||||||
|
ITUNES.category(text=self.itunes_category(f)),
|
||||||
|
(
|
||||||
|
ITUNES.owner(ITUNES.email(email))
|
||||||
|
if (email := self.publisher_email(f))
|
||||||
|
else None
|
||||||
|
),
|
||||||
|
(
|
||||||
|
ATOM.link(
|
||||||
|
rel="self",
|
||||||
|
href=self.absolute_feed_url("feed.rss"),
|
||||||
|
type="application/rss+xml",
|
||||||
|
)
|
||||||
|
if self.settings.get("REPUBLISHER_FEED_URL")
|
||||||
|
else None
|
||||||
|
),
|
||||||
)
|
)
|
||||||
for tag in f.get("tags", []):
|
for tag in f.get("tags", []):
|
||||||
channel.append(E.category(tag.term))
|
channel.append(E.category(tag.term))
|
||||||
|
|
@ -119,7 +193,7 @@ class BaseRssFeedSpider(Spider):
|
||||||
E.title(f.get("title")),
|
E.title(f.get("title")),
|
||||||
E.link(f.get("link")),
|
E.link(f.get("link")),
|
||||||
E.url(self.rewrite_image_url(f.image.get("href"))),
|
E.url(self.rewrite_image_url(f.image.get("href"))),
|
||||||
E.description(f.get("description")),
|
E.description(sanitize_html(f.get("description", ""))),
|
||||||
)
|
)
|
||||||
image_urls.append(f.image.get("href"))
|
image_urls.append(f.image.get("href"))
|
||||||
else:
|
else:
|
||||||
|
|
@ -127,7 +201,7 @@ class BaseRssFeedSpider(Spider):
|
||||||
E.title(f.image.get("title")),
|
E.title(f.image.get("title")),
|
||||||
E.link(f.image.get("link")),
|
E.link(f.image.get("link")),
|
||||||
E.url(self.rewrite_image_url(f.image.get("url"))),
|
E.url(self.rewrite_image_url(f.image.get("url"))),
|
||||||
E.description(f.image.get("description")),
|
E.description(sanitize_html(f.image.get("description", ""))),
|
||||||
E.width(f.image.get("width")),
|
E.width(f.image.get("width")),
|
||||||
E.height(f.image.get("height")),
|
E.height(f.image.get("height")),
|
||||||
)
|
)
|
||||||
|
|
@ -205,14 +279,14 @@ class RssFeedSpider(BaseRssFeedSpider):
|
||||||
item = E.item(
|
item = E.item(
|
||||||
E.title(entry.get("title")),
|
E.title(entry.get("title")),
|
||||||
E.link(entry.get("link")),
|
E.link(entry.get("link")),
|
||||||
E.description(entry.get("description")),
|
E.description(sanitize_html(entry.get("description", ""))),
|
||||||
E.guid(
|
E.guid(
|
||||||
entry.get("id"),
|
entry.get("id"),
|
||||||
{"isPermaLink": "true" if entry.guidislink else "false"},
|
{"isPermaLink": "true" if entry.guidislink else "false"},
|
||||||
),
|
),
|
||||||
E.pubDate(normalize_date(entry.get("published_parsed"))),
|
E.pubDate(normalize_date(entry.get("published_parsed"))),
|
||||||
E.author(entry.get("author")),
|
E.author(entry.get("author")),
|
||||||
ITUNES.summary(entry.get("summary")),
|
ITUNES.summary(plain_text_summary(entry.get("summary"))),
|
||||||
ITUNES.duration(entry.get("itunes_duration")),
|
ITUNES.duration(entry.get("itunes_duration")),
|
||||||
ITUNES.image(
|
ITUNES.image(
|
||||||
None,
|
None,
|
||||||
|
|
@ -230,9 +304,11 @@ class RssFeedSpider(BaseRssFeedSpider):
|
||||||
file_type = determine_file_type(url=url, mimetype=enc.get("type"))
|
file_type = determine_file_type(url=url, mimetype=enc.get("type"))
|
||||||
item.append(
|
item.append(
|
||||||
E.enclosure(
|
E.enclosure(
|
||||||
E.url(self.rewrite_file_url(file_type, url)),
|
**self.compact_attrib(
|
||||||
E.length(enc.get("length")),
|
url=self.rewrite_file_url(file_type, url),
|
||||||
E.type(enc.get("type")),
|
length=enc.get("length"),
|
||||||
|
type=enc.get("type"),
|
||||||
|
)
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
self.logger.debug(
|
self.logger.debug(
|
||||||
|
|
@ -261,19 +337,21 @@ class RssFeedSpider(BaseRssFeedSpider):
|
||||||
)
|
)
|
||||||
item.append(
|
item.append(
|
||||||
MEDIA.content(
|
MEDIA.content(
|
||||||
E.url(self.rewrite_file_url(file_type, media.get("url"))),
|
**self.compact_attrib(
|
||||||
E.type(media.get("type")),
|
url=self.rewrite_file_url(file_type, media.get("url")),
|
||||||
E.medium(media.get("medium")),
|
type=media.get("type"),
|
||||||
E.isDefault(media.get("isDefault")),
|
medium=media.get("medium"),
|
||||||
E.expression(media.get("expression")),
|
isDefault=media.get("isDefault"),
|
||||||
E.bitrate(media.get("bitrate")),
|
expression=media.get("expression"),
|
||||||
E.framerate(media.get("framerate")),
|
bitrate=media.get("bitrate"),
|
||||||
E.samplingrate(media.get("samplingrate")),
|
framerate=media.get("framerate"),
|
||||||
E.channels(media.get("channels")),
|
samplingrate=media.get("samplingrate"),
|
||||||
E.duration(media.get("duration")),
|
channels=media.get("channels"),
|
||||||
E.height(media.get("height")),
|
duration=media.get("duration"),
|
||||||
E.width(media.get("width")),
|
height=media.get("height"),
|
||||||
E.lang(media.get("lang")),
|
width=media.get("width"),
|
||||||
|
lang=media.get("lang"),
|
||||||
|
)
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
add_url(file_type, media.get("url"))
|
add_url(file_type, media.get("url"))
|
||||||
|
|
@ -289,3 +367,5 @@ class RssFeedSpider(BaseRssFeedSpider):
|
||||||
video_urls=video_urls,
|
video_urls=video_urls,
|
||||||
videos=[],
|
videos=[],
|
||||||
)
|
)
|
||||||
|
|
||||||
|
WEBMASTER_VALUE = "support@guardianproject.info (Guardian Project)"
|
||||||
|
|
|
||||||
155
repub/web.py
155
repub/web.py
|
|
@ -3,6 +3,8 @@ from __future__ import annotations
|
||||||
import asyncio
|
import asyncio
|
||||||
import hashlib
|
import hashlib
|
||||||
from collections.abc import AsyncGenerator, Awaitable, Callable
|
from collections.abc import AsyncGenerator, Awaitable, Callable
|
||||||
|
from contextlib import suppress
|
||||||
|
from datetime import timedelta
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import TypedDict, cast
|
from typing import TypedDict, cast
|
||||||
from urllib.parse import urlparse
|
from urllib.parse import urlparse
|
||||||
|
|
@ -13,16 +15,9 @@ from datastar_py.quart import DatastarResponse, read_signals
|
||||||
from datastar_py.sse import DatastarEvent
|
from datastar_py.sse import DatastarEvent
|
||||||
from htpy import Renderable
|
from htpy import Renderable
|
||||||
from peewee import IntegrityError
|
from peewee import IntegrityError
|
||||||
from quart import (
|
from quart import Quart, Response, request, send_from_directory, url_for
|
||||||
Quart,
|
|
||||||
Response,
|
|
||||||
has_request_context,
|
|
||||||
request,
|
|
||||||
send_from_directory,
|
|
||||||
url_for,
|
|
||||||
)
|
|
||||||
|
|
||||||
from repub.datastar import RefreshBroker, render_stream
|
from repub.datastar import RefreshBroker, TabStateStore, render_stream
|
||||||
from repub.jobs import (
|
from repub.jobs import (
|
||||||
COMPLETED_EXECUTION_PAGE_SIZE,
|
COMPLETED_EXECUTION_PAGE_SIZE,
|
||||||
JobRuntime,
|
JobRuntime,
|
||||||
|
|
@ -58,10 +53,14 @@ from repub.pages.sources import PANGEA_CONTENT_FORMATS, PANGEA_CONTENT_TYPES
|
||||||
|
|
||||||
REFRESH_BROKER_KEY = "repub.refresh_broker"
|
REFRESH_BROKER_KEY = "repub.refresh_broker"
|
||||||
JOB_RUNTIME_KEY = "repub.job_runtime"
|
JOB_RUNTIME_KEY = "repub.job_runtime"
|
||||||
|
TAB_STATE_STORE_KEY = "repub.tab_state_store"
|
||||||
|
TAB_STATE_CLEANER_TASK_KEY = "repub.tab_state_cleaner_task"
|
||||||
DEFAULT_LOG_DIR = Path("out/logs")
|
DEFAULT_LOG_DIR = Path("out/logs")
|
||||||
DEFAULT_FEEDS_DIR = Path("out/feeds")
|
DEFAULT_FEEDS_DIR = Path("out/feeds")
|
||||||
|
RUNS_TAB_STATE_KEY = "runs"
|
||||||
|
TAB_STATE_CLEAN_INTERVAL = timedelta(seconds=10)
|
||||||
|
|
||||||
RenderFunction = Callable[[], Awaitable[Renderable]]
|
PatchRenderFunction = Callable[[str | None], Awaitable[Renderable]]
|
||||||
|
|
||||||
|
|
||||||
class SourceFormData(TypedDict):
|
class SourceFormData(TypedDict):
|
||||||
|
|
@ -93,6 +92,7 @@ class SourceFormData(TypedDict):
|
||||||
|
|
||||||
class SettingsFormData(TypedDict):
|
class SettingsFormData(TypedDict):
|
||||||
max_concurrent_jobs: int
|
max_concurrent_jobs: int
|
||||||
|
feed_url: str
|
||||||
|
|
||||||
|
|
||||||
DEFAULT_PANGEA_CONTENT_FORMAT = "MOBILE_3"
|
DEFAULT_PANGEA_CONTENT_FORMAT = "MOBILE_3"
|
||||||
|
|
@ -144,6 +144,8 @@ def create_app(*, dev_mode: bool = False) -> Quart:
|
||||||
app.config["REPUB_DEV_MODE"] = dev_mode
|
app.config["REPUB_DEV_MODE"] = dev_mode
|
||||||
app.extensions[REFRESH_BROKER_KEY] = RefreshBroker()
|
app.extensions[REFRESH_BROKER_KEY] = RefreshBroker()
|
||||||
app.extensions[JOB_RUNTIME_KEY] = None
|
app.extensions[JOB_RUNTIME_KEY] = None
|
||||||
|
app.extensions[TAB_STATE_STORE_KEY] = TabStateStore()
|
||||||
|
app.extensions[TAB_STATE_CLEANER_TASK_KEY] = None
|
||||||
|
|
||||||
@app.get("/feeds/<path:feed_path>")
|
@app.get("/feeds/<path:feed_path>")
|
||||||
async def published_feed(feed_path: str) -> Response:
|
async def published_feed(feed_path: str) -> Response:
|
||||||
|
|
@ -202,23 +204,27 @@ def create_app(*, dev_mode: bool = False) -> Quart:
|
||||||
|
|
||||||
@app.post("/")
|
@app.post("/")
|
||||||
async def dashboard_patch() -> DatastarResponse:
|
async def dashboard_patch() -> DatastarResponse:
|
||||||
return _page_patch_response(app, lambda: render_dashboard(app))
|
return await _page_patch_response(app, lambda _tab_id: render_dashboard(app))
|
||||||
|
|
||||||
@app.post("/sources")
|
@app.post("/sources")
|
||||||
async def sources_patch() -> DatastarResponse:
|
async def sources_patch() -> DatastarResponse:
|
||||||
return _page_patch_response(app, lambda: render_sources(app))
|
return await _page_patch_response(app, lambda _tab_id: render_sources(app))
|
||||||
|
|
||||||
@app.post("/sources/create")
|
@app.post("/sources/create")
|
||||||
async def create_source_patch() -> DatastarResponse:
|
async def create_source_patch() -> DatastarResponse:
|
||||||
return _page_patch_response(app, lambda: render_create_source(app))
|
return await _page_patch_response(
|
||||||
|
app, lambda _tab_id: render_create_source(app)
|
||||||
|
)
|
||||||
|
|
||||||
@app.post("/sources/<string:slug>/edit")
|
@app.post("/sources/<string:slug>/edit")
|
||||||
async def edit_source_patch(slug: str) -> DatastarResponse:
|
async def edit_source_patch(slug: str) -> DatastarResponse:
|
||||||
return _page_patch_response(app, lambda: render_edit_source(slug, app))
|
return await _page_patch_response(
|
||||||
|
app, lambda _tab_id: render_edit_source(slug, app)
|
||||||
|
)
|
||||||
|
|
||||||
@app.post("/settings")
|
@app.post("/settings")
|
||||||
async def settings_patch() -> DatastarResponse:
|
async def settings_patch() -> DatastarResponse:
|
||||||
return _page_patch_response(app, lambda: render_settings(app))
|
return await _page_patch_response(app, lambda _tab_id: render_settings(app))
|
||||||
|
|
||||||
@app.post("/actions/sources/create")
|
@app.post("/actions/sources/create")
|
||||||
async def create_source_action() -> DatastarResponse:
|
async def create_source_action() -> DatastarResponse:
|
||||||
|
|
@ -288,12 +294,30 @@ def create_app(*, dev_mode: bool = False) -> Quart:
|
||||||
|
|
||||||
assert settings is not None
|
assert settings is not None
|
||||||
save_setting("max_concurrent_jobs", settings["max_concurrent_jobs"])
|
save_setting("max_concurrent_jobs", settings["max_concurrent_jobs"])
|
||||||
|
save_setting("feed_url", settings["feed_url"])
|
||||||
trigger_refresh(app)
|
trigger_refresh(app)
|
||||||
return DatastarResponse(SSE.redirect("/settings"))
|
return DatastarResponse(SSE.redirect("/settings"))
|
||||||
|
|
||||||
@app.post("/runs")
|
@app.post("/runs")
|
||||||
async def runs_patch() -> DatastarResponse:
|
async def runs_patch() -> DatastarResponse:
|
||||||
return _page_patch_response(app, lambda: render_runs(app))
|
return await _page_patch_response(
|
||||||
|
app,
|
||||||
|
lambda tab_id: render_runs(app, tab_id=tab_id),
|
||||||
|
)
|
||||||
|
|
||||||
|
@app.post("/actions/runs/completed-page/<int:page>")
|
||||||
|
async def set_completed_runs_page_action(page: int) -> Response:
|
||||||
|
signals = await _read_optional_signals()
|
||||||
|
tab_id = _read_tab_id(signals)
|
||||||
|
if tab_id is None:
|
||||||
|
return Response(status=400)
|
||||||
|
get_tab_state_store(app).update_page_state(
|
||||||
|
tab_id,
|
||||||
|
RUNS_TAB_STATE_KEY,
|
||||||
|
lambda state: {**state, "completed_page": max(1, page)},
|
||||||
|
)
|
||||||
|
trigger_refresh(app, tab_id=tab_id)
|
||||||
|
return Response(status=204)
|
||||||
|
|
||||||
@app.post("/actions/jobs/<int:job_id>/run-now")
|
@app.post("/actions/jobs/<int:job_id>/run-now")
|
||||||
async def run_job_now_action(job_id: int) -> Response:
|
async def run_job_now_action(job_id: int) -> Response:
|
||||||
|
|
@ -351,14 +375,24 @@ def create_app(*, dev_mode: bool = False) -> Quart:
|
||||||
app, job_id=job_id, execution_id=execution_id
|
app, job_id=job_id, execution_id=execution_id
|
||||||
)
|
)
|
||||||
|
|
||||||
return _page_patch_response(app, render)
|
return await _page_patch_response(app, lambda _tab_id: render())
|
||||||
|
|
||||||
@app.before_serving
|
@app.before_serving
|
||||||
async def start_runtime() -> None:
|
async def start_runtime() -> None:
|
||||||
get_job_runtime(app).start()
|
get_job_runtime(app).start()
|
||||||
|
app.extensions[TAB_STATE_CLEANER_TASK_KEY] = asyncio.create_task(
|
||||||
|
_clean_tab_state_periodically(app)
|
||||||
|
)
|
||||||
|
|
||||||
@app.after_serving
|
@app.after_serving
|
||||||
async def stop_runtime() -> None:
|
async def stop_runtime() -> None:
|
||||||
|
cleaner = cast(
|
||||||
|
asyncio.Task[None] | None, app.extensions.get(TAB_STATE_CLEANER_TASK_KEY)
|
||||||
|
)
|
||||||
|
if cleaner is not None:
|
||||||
|
cleaner.cancel()
|
||||||
|
with suppress(asyncio.CancelledError):
|
||||||
|
await cleaner
|
||||||
get_job_runtime(app).shutdown()
|
get_job_runtime(app).shutdown()
|
||||||
|
|
||||||
return app
|
return app
|
||||||
|
|
@ -368,6 +402,10 @@ def get_refresh_broker(app: Quart) -> RefreshBroker:
|
||||||
return cast(RefreshBroker, app.extensions[REFRESH_BROKER_KEY])
|
return cast(RefreshBroker, app.extensions[REFRESH_BROKER_KEY])
|
||||||
|
|
||||||
|
|
||||||
|
def get_tab_state_store(app: Quart) -> TabStateStore:
|
||||||
|
return cast(TabStateStore, app.extensions[TAB_STATE_STORE_KEY])
|
||||||
|
|
||||||
|
|
||||||
def get_job_runtime(app: Quart) -> JobRuntime:
|
def get_job_runtime(app: Quart) -> JobRuntime:
|
||||||
runtime = cast(JobRuntime | None, app.extensions.get(JOB_RUNTIME_KEY))
|
runtime = cast(JobRuntime | None, app.extensions.get(JOB_RUNTIME_KEY))
|
||||||
if runtime is None:
|
if runtime is None:
|
||||||
|
|
@ -379,8 +417,10 @@ def get_job_runtime(app: Quart) -> JobRuntime:
|
||||||
return runtime
|
return runtime
|
||||||
|
|
||||||
|
|
||||||
def trigger_refresh(app: Quart, event: object = "refresh-event") -> None:
|
def trigger_refresh(
|
||||||
get_refresh_broker(app).publish(event)
|
app: Quart, event: object = "refresh-event", *, tab_id: str | None = None
|
||||||
|
) -> None:
|
||||||
|
get_refresh_broker(app).publish(event, tab_id=tab_id)
|
||||||
|
|
||||||
|
|
||||||
async def render_dashboard(app: Quart | None = None) -> Renderable:
|
async def render_dashboard(app: Quart | None = None) -> Renderable:
|
||||||
|
|
@ -431,18 +471,17 @@ async def render_edit_source(slug: str, app: Quart | None = None) -> Renderable:
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
async def render_runs(app: Quart | None = None) -> Renderable:
|
async def render_runs(
|
||||||
|
app: Quart | None = None, *, tab_id: str | None = None
|
||||||
|
) -> Renderable:
|
||||||
if app is None:
|
if app is None:
|
||||||
return runs_page()
|
return runs_page()
|
||||||
|
|
||||||
completed_page = (
|
tab_state = get_tab_state_store(app).get_page_state(tab_id, RUNS_TAB_STATE_KEY)
|
||||||
max(1, request.args.get("completed_page", 1, type=int) or 1)
|
resolved_completed_page = max(1, _read_int(tab_state.get("completed_page"), 1))
|
||||||
if has_request_context()
|
|
||||||
else 1
|
|
||||||
)
|
|
||||||
view = load_runs_view(
|
view = load_runs_view(
|
||||||
log_dir=app.config["REPUB_LOG_DIR"],
|
log_dir=app.config["REPUB_LOG_DIR"],
|
||||||
completed_page=completed_page,
|
completed_page=resolved_completed_page,
|
||||||
completed_page_size=COMPLETED_EXECUTION_PAGE_SIZE,
|
completed_page_size=COMPLETED_EXECUTION_PAGE_SIZE,
|
||||||
)
|
)
|
||||||
return runs_page(
|
return runs_page(
|
||||||
|
|
@ -494,24 +533,36 @@ async def render_execution_logs(
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def _page_patch_response(app: Quart, render: RenderFunction) -> DatastarResponse:
|
async def _page_patch_response(
|
||||||
queue = get_refresh_broker(app).subscribe()
|
app: Quart, render: PatchRenderFunction
|
||||||
|
) -> DatastarResponse:
|
||||||
|
signals = await _read_optional_signals()
|
||||||
|
tab_id = _read_tab_id(signals)
|
||||||
|
if tab_id is not None:
|
||||||
|
get_tab_state_store(app).connect(tab_id)
|
||||||
|
queue = get_refresh_broker(app).subscribe(tab_id=tab_id)
|
||||||
stream = render_stream(
|
stream = render_stream(
|
||||||
queue,
|
queue,
|
||||||
render=render,
|
render=lambda: render(tab_id),
|
||||||
last_event_id=request.headers.get("last-event-id"),
|
last_event_id=request.headers.get("last-event-id"),
|
||||||
)
|
)
|
||||||
return DatastarResponse(_unsubscribe_on_close(queue, stream, app))
|
return DatastarResponse(_unsubscribe_on_close(queue, stream, app, tab_id=tab_id))
|
||||||
|
|
||||||
|
|
||||||
async def _unsubscribe_on_close(
|
async def _unsubscribe_on_close(
|
||||||
queue: object, stream: AsyncGenerator[DatastarEvent, None], app: Quart
|
queue: object,
|
||||||
|
stream: AsyncGenerator[DatastarEvent, None],
|
||||||
|
app: Quart,
|
||||||
|
*,
|
||||||
|
tab_id: str | None,
|
||||||
) -> AsyncGenerator[DatastarEvent, None]:
|
) -> AsyncGenerator[DatastarEvent, None]:
|
||||||
try:
|
try:
|
||||||
async for event in stream:
|
async for event in stream:
|
||||||
yield event
|
yield event
|
||||||
finally:
|
finally:
|
||||||
get_refresh_broker(app).unsubscribe(cast(asyncio.Queue[object], queue))
|
get_refresh_broker(app).unsubscribe(cast(asyncio.Queue[object], queue))
|
||||||
|
if tab_id is not None:
|
||||||
|
get_tab_state_store(app).disconnect(tab_id)
|
||||||
|
|
||||||
|
|
||||||
def _load_sidebar_counts(app: Quart) -> dict[str, int]:
|
def _load_sidebar_counts(app: Quart) -> dict[str, int]:
|
||||||
|
|
@ -523,6 +574,29 @@ def _load_sidebar_counts(app: Quart) -> dict[str, int]:
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
async def _clean_tab_state_periodically(app: Quart) -> None:
|
||||||
|
while True:
|
||||||
|
await asyncio.sleep(TAB_STATE_CLEAN_INTERVAL.total_seconds())
|
||||||
|
get_tab_state_store(app).cleanup_stale()
|
||||||
|
|
||||||
|
|
||||||
|
async def _read_optional_signals() -> dict[str, object] | None:
|
||||||
|
content_type = request.headers.get("Content-Type", "")
|
||||||
|
if request.content_length in (None, 0) and "application/json" not in content_type:
|
||||||
|
return None
|
||||||
|
try:
|
||||||
|
return cast(dict[str, object] | None, await read_signals())
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _read_tab_id(signals: dict[str, object] | None) -> str | None:
|
||||||
|
if signals is None:
|
||||||
|
return None
|
||||||
|
tab_id = _read_string(signals, "tabid")
|
||||||
|
return tab_id or None
|
||||||
|
|
||||||
|
|
||||||
def validate_source_form(
|
def validate_source_form(
|
||||||
signals: dict[str, object] | None,
|
signals: dict[str, object] | None,
|
||||||
*,
|
*,
|
||||||
|
|
@ -637,11 +711,17 @@ def validate_settings_form(
|
||||||
return None, "Missing form data."
|
return None, "Missing form data."
|
||||||
|
|
||||||
max_concurrent_jobs = _parse_int(_read_string(signals, "maxConcurrentJobs"))
|
max_concurrent_jobs = _parse_int(_read_string(signals, "maxConcurrentJobs"))
|
||||||
|
feed_url = _read_string(signals, "feedUrl").rstrip("/")
|
||||||
if max_concurrent_jobs is None:
|
if max_concurrent_jobs is None:
|
||||||
return None, "Max concurrent jobs must be an integer."
|
return None, "Max concurrent jobs must be an integer."
|
||||||
if max_concurrent_jobs < 1:
|
if max_concurrent_jobs < 1:
|
||||||
return None, "Max concurrent jobs must be at least 1."
|
return None, "Max concurrent jobs must be at least 1."
|
||||||
return {"max_concurrent_jobs": max_concurrent_jobs}, None
|
if feed_url != "" and not _is_valid_url(feed_url):
|
||||||
|
return None, "Feed URL must be a valid URL."
|
||||||
|
return {
|
||||||
|
"max_concurrent_jobs": max_concurrent_jobs,
|
||||||
|
"feed_url": feed_url,
|
||||||
|
}, None
|
||||||
|
|
||||||
|
|
||||||
def _read_string(signals: dict[str, object], key: str, *, strip: bool = True) -> str:
|
def _read_string(signals: dict[str, object], key: str, *, strip: bool = True) -> str:
|
||||||
|
|
@ -669,6 +749,17 @@ def _parse_int(value: str) -> int | None:
|
||||||
return None
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _read_int(value: object, default: int) -> int:
|
||||||
|
if isinstance(value, bool):
|
||||||
|
return int(value)
|
||||||
|
if isinstance(value, int):
|
||||||
|
return value
|
||||||
|
if isinstance(value, str):
|
||||||
|
parsed = _parse_int(value)
|
||||||
|
return default if parsed is None else parsed
|
||||||
|
return default
|
||||||
|
|
||||||
|
|
||||||
def _is_valid_url(value: str) -> bool:
|
def _is_valid_url(value: str) -> bool:
|
||||||
parsed = urlparse(value)
|
parsed = urlparse(value)
|
||||||
return parsed.scheme in {"http", "https"} and parsed.netloc != ""
|
return parsed.scheme in {"http", "https"} and parsed.netloc != ""
|
||||||
|
|
|
||||||
11
scripts/validate-feed
Executable file
11
scripts/validate-feed
Executable file
|
|
@ -0,0 +1,11 @@
|
||||||
|
#!/bin/sh
|
||||||
|
set -eu
|
||||||
|
|
||||||
|
feedvalidator_dir="${FEEDVALIDATOR_DIR:-$HOME/src/github.com/w3c/feedvalidator}"
|
||||||
|
pager="${PAGER:-less}"
|
||||||
|
|
||||||
|
feed_path="$(realpath "$1")"
|
||||||
|
shift
|
||||||
|
|
||||||
|
cd "$feedvalidator_dir"
|
||||||
|
uv run validate_feed.py "$feed_path" "$@" | "$pager"
|
||||||
171
tests/test_feed_validation.py
Normal file
171
tests/test_feed_validation.py
Normal file
|
|
@ -0,0 +1,171 @@
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import re
|
||||||
|
from email.utils import parsedate_to_datetime
|
||||||
|
from io import BytesIO
|
||||||
|
|
||||||
|
from lxml import etree
|
||||||
|
from scrapy.http import TextResponse
|
||||||
|
from scrapy.settings import Settings
|
||||||
|
|
||||||
|
from repub.exporters import RssExporter
|
||||||
|
from repub.rss import nsmap
|
||||||
|
from repub.spiders.rss_spider import RssFeedSpider
|
||||||
|
from repub.utils import local_audio_path, local_file_path, local_image_path
|
||||||
|
|
||||||
|
RSS_DATE_PATTERN = re.compile(
|
||||||
|
r"^[A-Z][a-z]{2}, \d{2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4}$"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _serialize_feed(*, feed_text: str, feed_url: str) -> tuple[str, etree._Element]:
|
||||||
|
spider = RssFeedSpider(feed_name="demo", url="https://source.example/feed.rss")
|
||||||
|
spider.settings = Settings(
|
||||||
|
values={
|
||||||
|
"REPUBLISHER_IMAGE_DIR": "images",
|
||||||
|
"REPUBLISHER_FILE_DIR": "files",
|
||||||
|
"REPUBLISHER_AUDIO_DIR": "audio",
|
||||||
|
"REPUBLISHER_VIDEO_DIR": "video",
|
||||||
|
"REPUBLISHER_FEED_URL": feed_url,
|
||||||
|
}
|
||||||
|
)
|
||||||
|
response = TextResponse(
|
||||||
|
url="https://source.example/feed.rss",
|
||||||
|
body=feed_text.encode("utf-8"),
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
|
||||||
|
output = BytesIO()
|
||||||
|
exporter = RssExporter(output)
|
||||||
|
exporter.start_exporting()
|
||||||
|
for item in list(spider._parse(response) or []):
|
||||||
|
exporter.export_item(item)
|
||||||
|
exporter.finish_exporting()
|
||||||
|
|
||||||
|
xml = output.getvalue().decode("utf-8")
|
||||||
|
return xml, etree.fromstring(output.getvalue())
|
||||||
|
|
||||||
|
|
||||||
|
def test_feed_generation_normalizes_dates_urls_and_xml_shapes() -> None:
|
||||||
|
long_summary = "<p>" + ("Long summary text " * 260) + "<b>tail</b></p>"
|
||||||
|
source_image = "https://source.example/media/photo.jpg"
|
||||||
|
source_audio = "https://source.example/media/audio.mp3"
|
||||||
|
source_video = "https://source.example/media/video.mp4"
|
||||||
|
channel_image = "https://source.example/media/channel.png"
|
||||||
|
item_image = "https://source.example/media/cover.jpg"
|
||||||
|
xml, root = _serialize_feed(
|
||||||
|
feed_url="https://mirror.example",
|
||||||
|
feed_text=f"""<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<rss version="2.0"
|
||||||
|
xmlns:content="http://purl.org/rss/1.0/modules/content/"
|
||||||
|
xmlns:media="http://search.yahoo.com/mrss/"
|
||||||
|
xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
|
||||||
|
<channel>
|
||||||
|
<title>Demo Feed</title>
|
||||||
|
<link>https://source.example/feed</link>
|
||||||
|
<description><![CDATA[<p mode="teaser" querystring="view=full">Channel description</p>]]></description>
|
||||||
|
<language>en-us</language>
|
||||||
|
<webMaster>support@guardianproject.info</webMaster>
|
||||||
|
<category>World</category>
|
||||||
|
<pubDate>Tue, 31 Mar 2026 08:31:50 +0000</pubDate>
|
||||||
|
<lastBuildDate>Tue, 31 Mar 2026 09:31:50 +0000</lastBuildDate>
|
||||||
|
<image>
|
||||||
|
<url>{channel_image}</url>
|
||||||
|
<title>Demo Feed</title>
|
||||||
|
<link>https://source.example/feed</link>
|
||||||
|
</image>
|
||||||
|
<item>
|
||||||
|
<title>Entry One</title>
|
||||||
|
<link>https://source.example/entry-1</link>
|
||||||
|
<description><![CDATA[<p mode="summary" querystring="foo=bar"><img src="{source_image}" contenteditable="true"></p>]]></description>
|
||||||
|
<guid isPermaLink="false">entry-1</guid>
|
||||||
|
<pubDate>Tue, 31 Mar 2026 10:31:50 +0000</pubDate>
|
||||||
|
<enclosure url="{source_audio}" length="123" type="audio/mpeg" />
|
||||||
|
<content:encoded><![CDATA[<div mode="body" querystring="x=1"><img src="{source_image}" contenteditable="true"></div>]]></content:encoded>
|
||||||
|
<media:content url="{source_video}" type="video/mp4" medium="video" expression="full" duration="60" width="640" height="360" lang="en" />
|
||||||
|
<itunes:summary><![CDATA[{long_summary}]]></itunes:summary>
|
||||||
|
<itunes:image href="{item_image}" />
|
||||||
|
</item>
|
||||||
|
</channel>
|
||||||
|
</rss>
|
||||||
|
""",
|
||||||
|
)
|
||||||
|
|
||||||
|
channel = root.find("channel")
|
||||||
|
assert channel is not None
|
||||||
|
|
||||||
|
last_build_date = channel.findtext("lastBuildDate")
|
||||||
|
item_pub_date = root.findtext("./channel/item/pubDate")
|
||||||
|
assert last_build_date is not None
|
||||||
|
assert item_pub_date is not None
|
||||||
|
assert RSS_DATE_PATTERN.fullmatch(last_build_date)
|
||||||
|
assert RSS_DATE_PATTERN.fullmatch(item_pub_date)
|
||||||
|
assert (
|
||||||
|
channel.findtext("webMaster")
|
||||||
|
== "support@guardianproject.info (Guardian Project)"
|
||||||
|
)
|
||||||
|
assert parsedate_to_datetime(last_build_date).tzinfo is not None
|
||||||
|
assert parsedate_to_datetime(item_pub_date).tzinfo is not None
|
||||||
|
assert last_build_date == item_pub_date
|
||||||
|
assert channel.findtext("itunes:explicit", namespaces=nsmap) == "false"
|
||||||
|
assert channel.findtext("./image/url") == (
|
||||||
|
f"https://mirror.example/feeds/demo/images/{local_image_path(channel_image)}"
|
||||||
|
)
|
||||||
|
|
||||||
|
atom_self = channel.find("atom:link", namespaces=nsmap)
|
||||||
|
assert atom_self is not None
|
||||||
|
assert atom_self.attrib == {
|
||||||
|
"rel": "self",
|
||||||
|
"href": "https://mirror.example/feeds/demo/feed.rss",
|
||||||
|
"type": "application/rss+xml",
|
||||||
|
}
|
||||||
|
itunes_category = channel.find("itunes:category", namespaces=nsmap)
|
||||||
|
assert itunes_category is not None
|
||||||
|
assert itunes_category.attrib == {"text": "News"}
|
||||||
|
assert (
|
||||||
|
channel.findtext("./itunes:owner/itunes:email", namespaces=nsmap)
|
||||||
|
== "support@guardianproject.info"
|
||||||
|
)
|
||||||
|
|
||||||
|
enclosure = root.find("./channel/item/enclosure")
|
||||||
|
assert enclosure is not None
|
||||||
|
assert enclosure.attrib == {
|
||||||
|
"url": f"https://mirror.example/feeds/demo/audio/{local_audio_path(source_audio)}",
|
||||||
|
"length": "123",
|
||||||
|
"type": "audio/mpeg",
|
||||||
|
}
|
||||||
|
assert len(enclosure) == 0
|
||||||
|
|
||||||
|
media_content = root.find("./channel/item/media:content", namespaces=nsmap)
|
||||||
|
assert media_content is not None
|
||||||
|
assert media_content.attrib == {
|
||||||
|
"url": f"https://mirror.example/feeds/demo/video/{local_file_path(source_video)}",
|
||||||
|
"type": "video/mp4",
|
||||||
|
"medium": "video",
|
||||||
|
"expression": "full",
|
||||||
|
"duration": "60",
|
||||||
|
"width": "640",
|
||||||
|
"height": "360",
|
||||||
|
"lang": "en",
|
||||||
|
}
|
||||||
|
assert len(media_content) == 0
|
||||||
|
|
||||||
|
itunes_image = root.find("./channel/item/itunes:image", namespaces=nsmap)
|
||||||
|
assert itunes_image is not None
|
||||||
|
assert itunes_image.attrib == {
|
||||||
|
"href": f"https://mirror.example/feeds/demo/images/{local_image_path(item_image)}"
|
||||||
|
}
|
||||||
|
|
||||||
|
itunes_summary = root.findtext("./channel/item/itunes:summary", namespaces=nsmap)
|
||||||
|
assert itunes_summary is not None
|
||||||
|
assert len(itunes_summary) <= 4000
|
||||||
|
assert "<" not in itunes_summary
|
||||||
|
assert ">" not in itunes_summary
|
||||||
|
|
||||||
|
assert "contenteditable=" not in xml
|
||||||
|
assert "mode=" not in xml
|
||||||
|
assert "querystring=" not in xml
|
||||||
|
assert (
|
||||||
|
f"https://mirror.example/feeds/demo/images/{local_image_path(source_image)}"
|
||||||
|
in xml
|
||||||
|
)
|
||||||
37
tests/test_job_runner.py
Normal file
37
tests/test_job_runner.py
Normal file
|
|
@ -0,0 +1,37 @@
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from repub.config import FeedConfig
|
||||||
|
from repub.job_runner import _build_crawl_settings
|
||||||
|
|
||||||
|
|
||||||
|
def test_build_crawl_settings_passes_feed_url_to_spider(tmp_path: Path) -> None:
|
||||||
|
settings = _build_crawl_settings(
|
||||||
|
out_dir=tmp_path / "out",
|
||||||
|
feed=FeedConfig(
|
||||||
|
name="Demo Feed",
|
||||||
|
slug="demo",
|
||||||
|
url="https://source.example/feed.rss",
|
||||||
|
),
|
||||||
|
stats_path=tmp_path / "stats.jsonl",
|
||||||
|
feed_url="https://mirror.example",
|
||||||
|
)
|
||||||
|
|
||||||
|
assert settings["REPUBLISHER_FEED_URL"] == "https://mirror.example"
|
||||||
|
|
||||||
|
|
||||||
|
def test_build_crawl_settings_requires_non_empty_feed_url(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
with pytest.raises(ValueError, match="feed_url setting is required"):
|
||||||
|
_build_crawl_settings(
|
||||||
|
out_dir=tmp_path / "out",
|
||||||
|
feed=FeedConfig(
|
||||||
|
name="Demo Feed",
|
||||||
|
slug="demo",
|
||||||
|
url="https://source.example/feed.rss",
|
||||||
|
),
|
||||||
|
stats_path=tmp_path / "stats.jsonl",
|
||||||
|
feed_url="",
|
||||||
|
)
|
||||||
|
|
@ -12,7 +12,9 @@ from repub.model import (
|
||||||
Source,
|
Source,
|
||||||
database,
|
database,
|
||||||
initialize_database,
|
initialize_database,
|
||||||
|
load_feed_url,
|
||||||
load_max_concurrent_jobs,
|
load_max_concurrent_jobs,
|
||||||
|
load_settings_form,
|
||||||
resolve_database_path,
|
resolve_database_path,
|
||||||
save_setting,
|
save_setting,
|
||||||
schema_paths,
|
schema_paths,
|
||||||
|
|
@ -250,3 +252,14 @@ def test_save_setting_persists_json_value(tmp_path: Path) -> None:
|
||||||
|
|
||||||
assert row.value == "4"
|
assert row.value == "4"
|
||||||
assert load_max_concurrent_jobs() == 4
|
assert load_max_concurrent_jobs() == 4
|
||||||
|
|
||||||
|
|
||||||
|
def test_load_settings_form_includes_feed_url(tmp_path: Path) -> None:
|
||||||
|
initialize_database(tmp_path / "settings-form.db")
|
||||||
|
save_setting("feed_url", "https://mirror.example")
|
||||||
|
|
||||||
|
assert load_feed_url() == "https://mirror.example"
|
||||||
|
assert load_settings_form() == {
|
||||||
|
"max_concurrent_jobs": 1,
|
||||||
|
"feed_url": "https://mirror.example",
|
||||||
|
}
|
||||||
|
|
|
||||||
|
|
@ -29,8 +29,13 @@ FIXTURE_FEED_PATH = (
|
||||||
).resolve()
|
).resolve()
|
||||||
|
|
||||||
|
|
||||||
|
def initialize_runtime_database(db_path: Path) -> None:
|
||||||
|
initialize_database(db_path)
|
||||||
|
save_setting("feed_url", "http://localhost:8080")
|
||||||
|
|
||||||
|
|
||||||
def test_job_runtime_syncs_enabled_jobs_into_apscheduler(tmp_path: Path) -> None:
|
def test_job_runtime_syncs_enabled_jobs_into_apscheduler(tmp_path: Path) -> None:
|
||||||
initialize_database(tmp_path / "scheduler.db")
|
initialize_runtime_database(tmp_path / "scheduler.db")
|
||||||
enabled_source = create_source(
|
enabled_source = create_source(
|
||||||
name="Enabled source",
|
name="Enabled source",
|
||||||
slug="enabled-source",
|
slug="enabled-source",
|
||||||
|
|
@ -85,7 +90,7 @@ def test_job_runtime_syncs_enabled_jobs_into_apscheduler(tmp_path: Path) -> None
|
||||||
def test_job_runtime_run_now_writes_log_and_stats_and_marks_success(
|
def test_job_runtime_run_now_writes_log_and_stats_and_marks_success(
|
||||||
tmp_path: Path,
|
tmp_path: Path,
|
||||||
) -> None:
|
) -> None:
|
||||||
initialize_database(tmp_path / "run-now.db")
|
initialize_runtime_database(tmp_path / "run-now.db")
|
||||||
source = create_source(
|
source = create_source(
|
||||||
name="Manual source",
|
name="Manual source",
|
||||||
slug="manual-source",
|
slug="manual-source",
|
||||||
|
|
@ -141,7 +146,7 @@ def test_job_runtime_run_now_writes_log_and_stats_and_marks_success(
|
||||||
def test_job_runtime_respects_max_concurrent_jobs_setting(tmp_path: Path) -> None:
|
def test_job_runtime_respects_max_concurrent_jobs_setting(tmp_path: Path) -> None:
|
||||||
db_path = tmp_path / "max-concurrency.db"
|
db_path = tmp_path / "max-concurrency.db"
|
||||||
log_dir = tmp_path / "out" / "logs"
|
log_dir = tmp_path / "out" / "logs"
|
||||||
initialize_database(db_path)
|
initialize_runtime_database(db_path)
|
||||||
save_setting("max_concurrent_jobs", 1)
|
save_setting("max_concurrent_jobs", 1)
|
||||||
|
|
||||||
with _slow_feed_server() as feed_url:
|
with _slow_feed_server() as feed_url:
|
||||||
|
|
@ -216,7 +221,7 @@ def test_job_runtime_starts_queued_execution_after_capacity_opens(
|
||||||
) -> None:
|
) -> None:
|
||||||
db_path = tmp_path / "drain-queue.db"
|
db_path = tmp_path / "drain-queue.db"
|
||||||
log_dir = tmp_path / "out" / "logs"
|
log_dir = tmp_path / "out" / "logs"
|
||||||
initialize_database(db_path)
|
initialize_runtime_database(db_path)
|
||||||
save_setting("max_concurrent_jobs", 1)
|
save_setting("max_concurrent_jobs", 1)
|
||||||
|
|
||||||
with _slow_feed_server() as feed_url:
|
with _slow_feed_server() as feed_url:
|
||||||
|
|
@ -277,7 +282,7 @@ def test_job_runtime_starts_queued_execution_after_capacity_opens(
|
||||||
def test_job_runtime_deduplicates_manual_queue_requests(tmp_path: Path) -> None:
|
def test_job_runtime_deduplicates_manual_queue_requests(tmp_path: Path) -> None:
|
||||||
db_path = tmp_path / "queue-dedup.db"
|
db_path = tmp_path / "queue-dedup.db"
|
||||||
log_dir = tmp_path / "out" / "logs"
|
log_dir = tmp_path / "out" / "logs"
|
||||||
initialize_database(db_path)
|
initialize_runtime_database(db_path)
|
||||||
save_setting("max_concurrent_jobs", 1)
|
save_setting("max_concurrent_jobs", 1)
|
||||||
|
|
||||||
with _slow_feed_server() as feed_url:
|
with _slow_feed_server() as feed_url:
|
||||||
|
|
@ -344,7 +349,7 @@ def test_job_runtime_allows_one_running_and_one_pending_per_job(
|
||||||
) -> None:
|
) -> None:
|
||||||
db_path = tmp_path / "running-plus-pending.db"
|
db_path = tmp_path / "running-plus-pending.db"
|
||||||
log_dir = tmp_path / "out" / "logs"
|
log_dir = tmp_path / "out" / "logs"
|
||||||
initialize_database(db_path)
|
initialize_runtime_database(db_path)
|
||||||
save_setting("max_concurrent_jobs", 1)
|
save_setting("max_concurrent_jobs", 1)
|
||||||
|
|
||||||
with _slow_feed_server() as feed_url:
|
with _slow_feed_server() as feed_url:
|
||||||
|
|
@ -400,7 +405,7 @@ def test_job_runtime_start_drains_pending_rows_created_before_start(
|
||||||
) -> None:
|
) -> None:
|
||||||
db_path = tmp_path / "startup-drain.db"
|
db_path = tmp_path / "startup-drain.db"
|
||||||
log_dir = tmp_path / "out" / "logs"
|
log_dir = tmp_path / "out" / "logs"
|
||||||
initialize_database(db_path)
|
initialize_runtime_database(db_path)
|
||||||
source = create_source(
|
source = create_source(
|
||||||
name="Queued source",
|
name="Queued source",
|
||||||
slug="queued-source",
|
slug="queued-source",
|
||||||
|
|
@ -440,7 +445,7 @@ def test_job_runtime_scheduled_runs_use_the_persistent_queue(
|
||||||
) -> None:
|
) -> None:
|
||||||
db_path = tmp_path / "scheduled-queue.db"
|
db_path = tmp_path / "scheduled-queue.db"
|
||||||
log_dir = tmp_path / "out" / "logs"
|
log_dir = tmp_path / "out" / "logs"
|
||||||
initialize_database(db_path)
|
initialize_runtime_database(db_path)
|
||||||
save_setting("max_concurrent_jobs", 1)
|
save_setting("max_concurrent_jobs", 1)
|
||||||
|
|
||||||
with _slow_feed_server() as feed_url:
|
with _slow_feed_server() as feed_url:
|
||||||
|
|
@ -496,7 +501,7 @@ def test_job_runtime_cancel_pending_follow_up_keeps_running_worker_alive(
|
||||||
) -> None:
|
) -> None:
|
||||||
db_path = tmp_path / "cancel-pending.db"
|
db_path = tmp_path / "cancel-pending.db"
|
||||||
log_dir = tmp_path / "out" / "logs"
|
log_dir = tmp_path / "out" / "logs"
|
||||||
initialize_database(db_path)
|
initialize_runtime_database(db_path)
|
||||||
save_setting("max_concurrent_jobs", 1)
|
save_setting("max_concurrent_jobs", 1)
|
||||||
|
|
||||||
with _slow_feed_server() as feed_url:
|
with _slow_feed_server() as feed_url:
|
||||||
|
|
@ -538,7 +543,7 @@ def test_job_runtime_cancel_pending_follow_up_keeps_running_worker_alive(
|
||||||
|
|
||||||
|
|
||||||
def test_job_runtime_cancel_marks_execution_canceled(tmp_path: Path) -> None:
|
def test_job_runtime_cancel_marks_execution_canceled(tmp_path: Path) -> None:
|
||||||
initialize_database(tmp_path / "cancel.db")
|
initialize_runtime_database(tmp_path / "cancel.db")
|
||||||
with _slow_feed_server() as feed_url:
|
with _slow_feed_server() as feed_url:
|
||||||
source = create_source(
|
source = create_source(
|
||||||
name="Cancelable source",
|
name="Cancelable source",
|
||||||
|
|
@ -582,7 +587,7 @@ def test_job_runtime_cancel_marks_execution_canceled(tmp_path: Path) -> None:
|
||||||
|
|
||||||
|
|
||||||
def test_job_runtime_start_reconciles_stale_running_execution(tmp_path: Path) -> None:
|
def test_job_runtime_start_reconciles_stale_running_execution(tmp_path: Path) -> None:
|
||||||
initialize_database(tmp_path / "stale-running.db")
|
initialize_runtime_database(tmp_path / "stale-running.db")
|
||||||
source = create_source(
|
source = create_source(
|
||||||
name="Stale source",
|
name="Stale source",
|
||||||
slug="stale-source",
|
slug="stale-source",
|
||||||
|
|
@ -629,7 +634,7 @@ def test_job_runtime_start_reconciles_stale_running_execution(tmp_path: Path) ->
|
||||||
|
|
||||||
|
|
||||||
def test_job_runtime_publishes_refresh_while_jobs_are_running(tmp_path: Path) -> None:
|
def test_job_runtime_publishes_refresh_while_jobs_are_running(tmp_path: Path) -> None:
|
||||||
initialize_database(tmp_path / "runtime-refresh.db")
|
initialize_runtime_database(tmp_path / "runtime-refresh.db")
|
||||||
source = create_source(
|
source = create_source(
|
||||||
name="Running source",
|
name="Running source",
|
||||||
slug="running-source",
|
slug="running-source",
|
||||||
|
|
@ -667,7 +672,7 @@ def test_job_runtime_start_reattaches_live_worker_after_app_restart(
|
||||||
) -> None:
|
) -> None:
|
||||||
db_path = tmp_path / "live-worker.db"
|
db_path = tmp_path / "live-worker.db"
|
||||||
log_dir = tmp_path / "out" / "logs"
|
log_dir = tmp_path / "out" / "logs"
|
||||||
initialize_database(db_path)
|
initialize_runtime_database(db_path)
|
||||||
with _slow_feed_server() as feed_url:
|
with _slow_feed_server() as feed_url:
|
||||||
source = create_source(
|
source = create_source(
|
||||||
name="Live worker source",
|
name="Live worker source",
|
||||||
|
|
@ -743,7 +748,7 @@ def test_job_runtime_start_restores_live_worker_marked_failed_by_restart_bug(
|
||||||
) -> None:
|
) -> None:
|
||||||
db_path = tmp_path / "restore-live-worker.db"
|
db_path = tmp_path / "restore-live-worker.db"
|
||||||
log_dir = tmp_path / "out" / "logs"
|
log_dir = tmp_path / "out" / "logs"
|
||||||
initialize_database(db_path)
|
initialize_runtime_database(db_path)
|
||||||
with _slow_feed_server() as feed_url:
|
with _slow_feed_server() as feed_url:
|
||||||
source = create_source(
|
source = create_source(
|
||||||
name="Recovered worker source",
|
name="Recovered worker source",
|
||||||
|
|
@ -915,6 +920,7 @@ def test_render_runs_uses_database_backed_jobs_and_executions(
|
||||||
|
|
||||||
app = create_app()
|
app = create_app()
|
||||||
app.config["REPUB_LOG_DIR"] = log_dir
|
app.config["REPUB_LOG_DIR"] = log_dir
|
||||||
|
save_setting("feed_url", "http://localhost:8080")
|
||||||
source = create_source(
|
source = create_source(
|
||||||
name="Runs page source",
|
name="Runs page source",
|
||||||
slug="runs-page-source",
|
slug="runs-page-source",
|
||||||
|
|
|
||||||
41
tests/test_tab_state.py
Normal file
41
tests/test_tab_state.py
Normal file
|
|
@ -0,0 +1,41 @@
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from datetime import UTC, datetime, timedelta
|
||||||
|
|
||||||
|
from repub.datastar import TabStateStore
|
||||||
|
|
||||||
|
|
||||||
|
def test_tab_state_store_tracks_page_state_per_tab_and_connection_count() -> None:
|
||||||
|
store = TabStateStore()
|
||||||
|
|
||||||
|
store.connect("tab-1")
|
||||||
|
store.connect("tab-1")
|
||||||
|
store.update_page_state(
|
||||||
|
"tab-1",
|
||||||
|
"runs",
|
||||||
|
lambda state: {**state, "completed_page": 2},
|
||||||
|
)
|
||||||
|
|
||||||
|
assert store.get_tab_state("tab-1") == {"runs": {"completed_page": 2}}
|
||||||
|
|
||||||
|
store.disconnect("tab-1")
|
||||||
|
|
||||||
|
assert store.get_tab_state("tab-1") == {"runs": {"completed_page": 2}}
|
||||||
|
|
||||||
|
store.disconnect("tab-1")
|
||||||
|
|
||||||
|
assert store.get_tab_state("tab-1") is None
|
||||||
|
|
||||||
|
|
||||||
|
def test_tab_state_store_cleans_only_stale_disconnected_tabs() -> None:
|
||||||
|
store = TabStateStore(clean_age_threshold=timedelta(hours=24))
|
||||||
|
now = datetime(2026, 3, 31, 12, 0, tzinfo=UTC)
|
||||||
|
|
||||||
|
store.connect("stale-tab", now=now - timedelta(days=2))
|
||||||
|
store.connect("fresh-tab", now=now)
|
||||||
|
|
||||||
|
removed = store.cleanup_stale(now=now)
|
||||||
|
|
||||||
|
assert removed == {"stale-tab"}
|
||||||
|
assert store.get_tab_state("stale-tab") is None
|
||||||
|
assert store.get_tab_state("fresh-tab") == {}
|
||||||
67
tests/test_validate_feed_script.py
Normal file
67
tests/test_validate_feed_script.py
Normal file
|
|
@ -0,0 +1,67 @@
|
||||||
|
import os
|
||||||
|
import stat
|
||||||
|
import subprocess
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
def test_validate_feed_script_changes_to_feedvalidator_and_forwards_args(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
script_path = Path(__file__).resolve().parents[1] / "scripts" / "validate-feed"
|
||||||
|
feedvalidator_dir = tmp_path / "feedvalidator"
|
||||||
|
bin_dir = tmp_path / "bin"
|
||||||
|
caller_dir = tmp_path / "caller"
|
||||||
|
feed_path = caller_dir / "path" / "to" / "feed.rss"
|
||||||
|
trace_dir = tmp_path / "trace"
|
||||||
|
|
||||||
|
feedvalidator_dir.mkdir()
|
||||||
|
bin_dir.mkdir()
|
||||||
|
feed_path.parent.mkdir(parents=True)
|
||||||
|
feed_path.write_text("<rss/>", encoding="utf-8")
|
||||||
|
trace_dir.mkdir()
|
||||||
|
|
||||||
|
uv_path = bin_dir / "uv"
|
||||||
|
uv_path.write_text(
|
||||||
|
"""#!/bin/sh
|
||||||
|
printf '%s\\n' "$PWD" > "$TRACE_DIR/cwd"
|
||||||
|
printf '%s\\n' "$@" > "$TRACE_DIR/args"
|
||||||
|
printf 'validator output\\n'
|
||||||
|
""",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
uv_path.chmod(uv_path.stat().st_mode | stat.S_IXUSR)
|
||||||
|
|
||||||
|
pager_path = bin_dir / "pager"
|
||||||
|
pager_path.write_text(
|
||||||
|
"""#!/bin/sh
|
||||||
|
cat > "$TRACE_DIR/pager-input"
|
||||||
|
""",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
pager_path.chmod(pager_path.stat().st_mode | stat.S_IXUSR)
|
||||||
|
|
||||||
|
env = os.environ.copy()
|
||||||
|
env["FEEDVALIDATOR_DIR"] = str(feedvalidator_dir)
|
||||||
|
env["PAGER"] = str(pager_path)
|
||||||
|
env["PATH"] = f"{bin_dir}:{env['PATH']}"
|
||||||
|
env["TRACE_DIR"] = str(trace_dir)
|
||||||
|
|
||||||
|
subprocess.run(
|
||||||
|
[str(script_path), "path/to/feed.rss", "--flag"],
|
||||||
|
check=True,
|
||||||
|
env=env,
|
||||||
|
cwd=caller_dir,
|
||||||
|
)
|
||||||
|
|
||||||
|
assert (trace_dir / "cwd").read_text(encoding="utf-8").strip() == str(
|
||||||
|
feedvalidator_dir
|
||||||
|
)
|
||||||
|
assert (trace_dir / "args").read_text(encoding="utf-8").splitlines() == [
|
||||||
|
"run",
|
||||||
|
"validate_feed.py",
|
||||||
|
str(feed_path.resolve()),
|
||||||
|
"--flag",
|
||||||
|
]
|
||||||
|
assert (trace_dir / "pager-input").read_text(
|
||||||
|
encoding="utf-8"
|
||||||
|
) == "validator output\n"
|
||||||
|
|
@ -1,12 +1,15 @@
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import asyncio
|
import asyncio
|
||||||
|
import json
|
||||||
import os
|
import os
|
||||||
import re
|
import re
|
||||||
from datetime import UTC, datetime, timedelta
|
from datetime import UTC, datetime, timedelta
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Any, cast
|
from typing import Any, cast
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
from repub.components import action_button, status_badge, toggle_field
|
from repub.components import action_button, status_badge, toggle_field
|
||||||
from repub.datastar import RefreshBroker, render_sse_event, render_stream
|
from repub.datastar import RefreshBroker, render_sse_event, render_stream
|
||||||
from repub.jobs import load_dashboard_view
|
from repub.jobs import load_dashboard_view
|
||||||
|
|
@ -19,6 +22,7 @@ from repub.model import (
|
||||||
SourcePangea,
|
SourcePangea,
|
||||||
create_source,
|
create_source,
|
||||||
load_max_concurrent_jobs,
|
load_max_concurrent_jobs,
|
||||||
|
load_settings_form,
|
||||||
save_setting,
|
save_setting,
|
||||||
)
|
)
|
||||||
from repub.pages.runs import runs_page
|
from repub.pages.runs import runs_page
|
||||||
|
|
@ -26,6 +30,7 @@ from repub.pages.sources import sources_page
|
||||||
from repub.web import (
|
from repub.web import (
|
||||||
create_app,
|
create_app,
|
||||||
get_refresh_broker,
|
get_refresh_broker,
|
||||||
|
get_tab_state_store,
|
||||||
render_create_source,
|
render_create_source,
|
||||||
render_dashboard,
|
render_dashboard,
|
||||||
render_edit_source,
|
render_edit_source,
|
||||||
|
|
@ -228,8 +233,8 @@ def test_runs_page_renders_clear_completed_button_and_pagination() -> None:
|
||||||
assert ">Clear history<" in body
|
assert ">Clear history<" in body
|
||||||
assert "Showing" in body
|
assert "Showing" in body
|
||||||
assert "21" in body
|
assert "21" in body
|
||||||
assert 'href="/runs?completed_page=1"' in body
|
assert "@post('/actions/runs/completed-page/1')" in body
|
||||||
assert 'href="/runs?completed_page=2"' in body
|
assert "@post('/actions/runs/completed-page/2')" in body
|
||||||
assert 'aria-current="page"' in body
|
assert 'aria-current="page"' in body
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -857,6 +862,7 @@ def test_render_settings_shows_current_max_concurrent_jobs(
|
||||||
monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
|
monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
|
||||||
create_app()
|
create_app()
|
||||||
save_setting("max_concurrent_jobs", 3)
|
save_setting("max_concurrent_jobs", 3)
|
||||||
|
save_setting("feed_url", "https://mirror.example")
|
||||||
|
|
||||||
async def run() -> None:
|
async def run() -> None:
|
||||||
app = create_app()
|
app = create_app()
|
||||||
|
|
@ -865,7 +871,11 @@ def test_render_settings_shows_current_max_concurrent_jobs(
|
||||||
assert ">Settings<" in body
|
assert ">Settings<" in body
|
||||||
assert "/actions/settings" in body
|
assert "/actions/settings" in body
|
||||||
assert 'value="3"' in body
|
assert 'value="3"' in body
|
||||||
|
assert 'value="https://mirror.example"' in body
|
||||||
assert "Max concurrent jobs" in body
|
assert "Max concurrent jobs" in body
|
||||||
|
assert "Feed URL" in body
|
||||||
|
assert "Example: http://localhost:8080" in body
|
||||||
|
assert "Must include http:// or https://" in body
|
||||||
assert 'type="submit"' in body
|
assert 'type="submit"' in body
|
||||||
assert "cursor-pointer" in body
|
assert "cursor-pointer" in body
|
||||||
|
|
||||||
|
|
@ -1204,13 +1214,17 @@ def test_settings_action_updates_max_concurrent_jobs(
|
||||||
response = await client.post(
|
response = await client.post(
|
||||||
"/actions/settings",
|
"/actions/settings",
|
||||||
headers={"Datastar-Request": "true"},
|
headers={"Datastar-Request": "true"},
|
||||||
json={"maxConcurrentJobs": "3"},
|
json={
|
||||||
|
"maxConcurrentJobs": "3",
|
||||||
|
"feedUrl": "https://mirror.example",
|
||||||
|
},
|
||||||
)
|
)
|
||||||
body = await response.get_data(as_text=True)
|
body = await response.get_data(as_text=True)
|
||||||
|
|
||||||
assert response.status_code == 200
|
assert response.status_code == 200
|
||||||
assert "window.location = '/settings'" in body
|
assert "window.location = '/settings'" in body
|
||||||
assert load_max_concurrent_jobs() == 3
|
assert load_max_concurrent_jobs() == 3
|
||||||
|
assert load_settings_form()["feed_url"] == "https://mirror.example"
|
||||||
assert 'value="3"' in str(await render_settings(app))
|
assert 'value="3"' in str(await render_settings(app))
|
||||||
|
|
||||||
asyncio.run(run())
|
asyncio.run(run())
|
||||||
|
|
@ -1229,7 +1243,7 @@ def test_settings_action_rejects_non_positive_max_concurrent_jobs(
|
||||||
response = await client.post(
|
response = await client.post(
|
||||||
"/actions/settings",
|
"/actions/settings",
|
||||||
headers={"Datastar-Request": "true"},
|
headers={"Datastar-Request": "true"},
|
||||||
json={"maxConcurrentJobs": "0"},
|
json={"maxConcurrentJobs": "0", "feedUrl": "https://mirror.example"},
|
||||||
)
|
)
|
||||||
body = await response.get_data(as_text=True)
|
body = await response.get_data(as_text=True)
|
||||||
|
|
||||||
|
|
@ -1240,6 +1254,28 @@ def test_settings_action_rejects_non_positive_max_concurrent_jobs(
|
||||||
asyncio.run(run())
|
asyncio.run(run())
|
||||||
|
|
||||||
|
|
||||||
|
def test_settings_action_rejects_invalid_feed_url(monkeypatch, tmp_path: Path) -> None:
|
||||||
|
db_path = tmp_path / "settings-invalid-url.db"
|
||||||
|
monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
|
||||||
|
|
||||||
|
async def run() -> None:
|
||||||
|
app = create_app()
|
||||||
|
client = app.test_client()
|
||||||
|
|
||||||
|
response = await client.post(
|
||||||
|
"/actions/settings",
|
||||||
|
headers={"Datastar-Request": "true"},
|
||||||
|
json={"maxConcurrentJobs": "2", "feedUrl": "mirror.example"},
|
||||||
|
)
|
||||||
|
body = await response.get_data(as_text=True)
|
||||||
|
|
||||||
|
assert response.status_code == 200
|
||||||
|
assert "Feed URL must be a valid URL." in body
|
||||||
|
assert load_settings_form()["feed_url"] == ""
|
||||||
|
|
||||||
|
asyncio.run(run())
|
||||||
|
|
||||||
|
|
||||||
def test_render_runs_shows_running_scheduled_and_completed_tables(
|
def test_render_runs_shows_running_scheduled_and_completed_tables(
|
||||||
monkeypatch, tmp_path: Path
|
monkeypatch, tmp_path: Path
|
||||||
) -> None:
|
) -> None:
|
||||||
|
|
@ -1314,6 +1350,142 @@ def test_render_runs_shows_empty_state_rows(monkeypatch, tmp_path: Path) -> None
|
||||||
asyncio.run(run())
|
asyncio.run(run())
|
||||||
|
|
||||||
|
|
||||||
|
def test_runs_pagination_action_updates_only_the_current_tab(
|
||||||
|
monkeypatch, tmp_path: Path
|
||||||
|
) -> None:
|
||||||
|
db_path = tmp_path / "runs-tab-pagination.db"
|
||||||
|
monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
|
||||||
|
|
||||||
|
async def run() -> None:
|
||||||
|
app = create_app()
|
||||||
|
client = app.test_client()
|
||||||
|
|
||||||
|
source = create_source(
|
||||||
|
name="Paged runs source",
|
||||||
|
slug="paged-runs-source",
|
||||||
|
source_type="feed",
|
||||||
|
notes="",
|
||||||
|
spider_arguments="",
|
||||||
|
enabled=True,
|
||||||
|
cron_minute="*/30",
|
||||||
|
cron_hour="*",
|
||||||
|
cron_day_of_month="*",
|
||||||
|
cron_day_of_week="*",
|
||||||
|
cron_month="*",
|
||||||
|
feed_url="https://example.com/paged-runs.xml",
|
||||||
|
)
|
||||||
|
job = Job.get(Job.source == source)
|
||||||
|
|
||||||
|
for minute in range(21):
|
||||||
|
JobExecution.create(
|
||||||
|
job=job,
|
||||||
|
ended_at=datetime(2026, 3, 30, 12, minute, tzinfo=UTC),
|
||||||
|
running_status=JobExecutionStatus.SUCCEEDED,
|
||||||
|
)
|
||||||
|
|
||||||
|
async with client.request(
|
||||||
|
"/runs?u=shim",
|
||||||
|
method="POST",
|
||||||
|
headers={
|
||||||
|
"Datastar-Request": "true",
|
||||||
|
"Content-Type": "application/json",
|
||||||
|
},
|
||||||
|
) as first_connection:
|
||||||
|
async with client.request(
|
||||||
|
"/runs?u=shim",
|
||||||
|
method="POST",
|
||||||
|
headers={
|
||||||
|
"Datastar-Request": "true",
|
||||||
|
"Content-Type": "application/json",
|
||||||
|
},
|
||||||
|
) as second_connection:
|
||||||
|
await first_connection.send(json.dumps({"tabid": "tab-1"}).encode())
|
||||||
|
await second_connection.send(json.dumps({"tabid": "tab-2"}).encode())
|
||||||
|
await first_connection.send_complete()
|
||||||
|
await second_connection.send_complete()
|
||||||
|
|
||||||
|
first_body = (
|
||||||
|
await asyncio.wait_for(first_connection.receive(), timeout=1)
|
||||||
|
).decode()
|
||||||
|
second_body = (
|
||||||
|
await asyncio.wait_for(second_connection.receive(), timeout=1)
|
||||||
|
).decode()
|
||||||
|
|
||||||
|
assert (
|
||||||
|
'href="/runs?completed_page=1" aria-current="page"'
|
||||||
|
not in first_body
|
||||||
|
)
|
||||||
|
assert (
|
||||||
|
'Showing <span class="font-medium text-slate-950">1</span> to '
|
||||||
|
'<span class="font-medium text-slate-950">20</span> of '
|
||||||
|
'<span class="font-medium text-slate-950">21</span> results'
|
||||||
|
) in first_body
|
||||||
|
assert (
|
||||||
|
'Showing <span class="font-medium text-slate-950">1</span> to '
|
||||||
|
'<span class="font-medium text-slate-950">20</span> of '
|
||||||
|
'<span class="font-medium text-slate-950">21</span> results'
|
||||||
|
) in second_body
|
||||||
|
|
||||||
|
response = await client.post(
|
||||||
|
"/actions/runs/completed-page/2",
|
||||||
|
headers={"Datastar-Request": "true"},
|
||||||
|
json={"tabid": "tab-1"},
|
||||||
|
)
|
||||||
|
|
||||||
|
assert response.status_code == 204
|
||||||
|
|
||||||
|
updated_first_body = (
|
||||||
|
await asyncio.wait_for(first_connection.receive(), timeout=1)
|
||||||
|
).decode()
|
||||||
|
|
||||||
|
assert (
|
||||||
|
'Showing <span class="font-medium text-slate-950">21</span> to '
|
||||||
|
'<span class="font-medium text-slate-950">21</span> of '
|
||||||
|
'<span class="font-medium text-slate-950">21</span> results'
|
||||||
|
) in updated_first_body
|
||||||
|
assert 'aria-current="page"' in updated_first_body
|
||||||
|
|
||||||
|
with pytest.raises(asyncio.TimeoutError):
|
||||||
|
await asyncio.wait_for(second_connection.receive(), timeout=0.2)
|
||||||
|
|
||||||
|
await second_connection.disconnect()
|
||||||
|
await first_connection.disconnect()
|
||||||
|
|
||||||
|
asyncio.run(run())
|
||||||
|
|
||||||
|
|
||||||
|
def test_runs_patch_creates_and_cleans_up_tab_state(
|
||||||
|
monkeypatch, tmp_path: Path
|
||||||
|
) -> None:
|
||||||
|
db_path = tmp_path / "runs-tab-state.db"
|
||||||
|
monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
|
||||||
|
|
||||||
|
async def run() -> None:
|
||||||
|
app = create_app()
|
||||||
|
client = app.test_client()
|
||||||
|
|
||||||
|
async with client.request(
|
||||||
|
"/runs?u=shim",
|
||||||
|
method="POST",
|
||||||
|
headers={
|
||||||
|
"Datastar-Request": "true",
|
||||||
|
"Content-Type": "application/json",
|
||||||
|
},
|
||||||
|
) as connection:
|
||||||
|
await connection.send(json.dumps({"tabid": "tab-1"}).encode())
|
||||||
|
await connection.send_complete()
|
||||||
|
await asyncio.wait_for(connection.receive(), timeout=1)
|
||||||
|
|
||||||
|
assert get_tab_state_store(app).get_tab_state("tab-1") == {}
|
||||||
|
|
||||||
|
await connection.disconnect()
|
||||||
|
await asyncio.sleep(0)
|
||||||
|
|
||||||
|
assert get_tab_state_store(app).get_tab_state("tab-1") is None
|
||||||
|
|
||||||
|
asyncio.run(run())
|
||||||
|
|
||||||
|
|
||||||
def test_render_runs_keeps_queued_execution_in_scheduled_jobs_table(
|
def test_render_runs_keeps_queued_execution_in_scheduled_jobs_table(
|
||||||
monkeypatch, tmp_path: Path
|
monkeypatch, tmp_path: Path
|
||||||
) -> None:
|
) -> None:
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue