add datastar SSE rendering

Abel Luck 2026-03-30 12:34:38 +02:00
parent 2accb26546
commit 33dbb143fd
5 changed files with 329 additions and 19 deletions


@@ -1,5 +1,7 @@
# republisher-redux
See @README.md
## Overview
- `republisher-redux` is a Scrapy-based tool that mirrors RSS and Atom feeds for offline use.
@@ -8,6 +10,71 @@
- Nix development and packaging use `flake.nix`.
- Formatting is managed through `treefmt-nix`, exposed via `nix fmt`.
- Prefer an immutable, functional programming style
- functions that operate on data over classes that encapsulate state
- No backwards-compatibility guarantees; prefer breaking changes over backwards compat and complexity.
- Think carefully and implement the most concise solution that changes as little code as possible.
## HTML/Datastar Rules
Very important rules for datastar usage.
The views are pure functions data in -> html out.
- we only use full page morph mode. no diffing
Why large/fat/main morphs (aka immediate mode)?
By only using data: mode morph and always targeting the main element of the document the API can be massively simplified. This avoids having the explosion of endpoints you get with HTMX and makes reasoning about your app much simpler.
- we only have a single render function per page
By having a single render function per page you can simplify the reasoning about your app to view = f(state). You can then reason about your pushed updates as a continuous signal rather than a discrete event stream. The benefit is that you don't have to handle missed events, disconnects, or reconnects. When the state changes on the server you push down the latest view, not the delta between views. On the client, idiomorph can translate that into fine-grained DOM updates.
- any database change -> re render all connected users with 200ms throttle
When your events are not homogeneous, you can't afford to miss any of them, so you cannot throttle them without losing data.
But wait! Won't that mean every change will cause all users to re-render? Yes, but at a maximum rate determined by the throttle. This might sound scary at first, but in practice:
The more shared views the users have, the more likely it is that most connected users will have to re-render when a change happens.
The more events that are happening, the more likely it is that most users will have to re-render.
This means you actually end up doing more work under heavy load with a non-homogeneous event system than with this simple, throttled homogeneous one (especially if there's any sort of common/shared view between users).
- Signals are only for ephemeral client side state
Signals should only be used for ephemeral client side state. Things like: the current value of a text input, whether a popover is visible, current csrf token, input validation errors. Signals can be controlled on the client via expressions, or from the backend via patch-signals.
- Signals in elements should be declared __ifmissing
Because signals are only being used to represent ephemeral client state that means they can only be initialised by elements and they can only be changed via expressions on the client or from the server via patch-signals in an action. Signals in elements should be declared __ifmissing unless they are "view only".
- View-only signals are signals that can only be changed by the server. These should not be declared __ifmissing; instead they should be made "local" by starting their key with an _, which prevents the client from sending them up to the server.
- Actions should not update the view themselves directly
Actions should not update the view via patch elements. This is because the changes they make would get overwritten on the next render-fn that pushes a new view down the updates SSE connection. However, they can still be used to update signals as those won't be changed by elements patch. This allows you to do things like validation on the server.
- Stateless views
The only way for actions to affect the view returned by the render-fn running in a connection is via the database. This ensures CQRS. It also means there is no connection state that needs to be persisted or maintained (so missed events and shutdowns/deploys will not lead to lost state). Even when you are running in a single process there is no way for an action (command) to communicate with or affect a view render (query) without going through the database.
- CQRS
Actions modify the database and return a 204, or a 200 if they patch signals.
Render functions re-render when the database changes and send an update down the updates SSE connection.
- Work sharing (caching)
Work sharing is the term I'm using for sharing renders between connected users. This can be useful when a lot of connected users share the same view, for example a leaderboard, game board, or presence indicator. It ensures the work (e.g. query and HTML generation) for that view is only done once regardless of the number of connected users. The simplest way to do this is to recalculate and cache values after a batch has been run.
- Use data-on:pointerdown/mousedown over data-on:click
This is a small one but can make even the slowest of networks feel much snappier.
- No CORS
By hosting all assets on the same origin we avoid the need for CORS. This avoids additional server round trips and helps reduce latency.
- Rendering an initial shim
Rather than returning the whole page on initial render and having two render paths (one for the initial render and one for subsequent rendering), a shell is rendered and then populated when the page connects to the updates endpoint for that page. This has a few advantages:
The page will only render dynamic content if the user has JavaScript and first-party cookies enabled.
The initial shell page can be generated and compressed once.
The server only does more work for actual users and less work for link preview crawlers and other bots (that don't support javascript or cookies).
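The "generated and compressed once" point for the initial shim could look roughly like the sketch below. It is not taken from this commit: `SHELL_HTML`, `compressed_shell`, and `respond` are hypothetical names, and the real shell markup comes from the project's page module.

```python
from __future__ import annotations

import gzip
import hashlib
from functools import lru_cache

# Hypothetical static shell; a stand-in for the real shim page markup.
SHELL_HTML = '<!doctype html><html><body><main id="morph"></main></body></html>'


@lru_cache(maxsize=1)
def compressed_shell() -> tuple[bytes, str]:
    """Compress the shell and derive its ETag once; later calls reuse the result."""
    raw = SHELL_HTML.encode("utf-8")
    etag = '"' + hashlib.blake2s(raw, digest_size=16).hexdigest() + '"'
    # mtime=0 keeps the gzip output stable across calls and restarts.
    return gzip.compress(raw, mtime=0), etag


def respond(if_none_match: str | None) -> tuple[int, bytes]:
    """Return (status, body): 304 with no body when the client's ETag matches."""
    body, etag = compressed_shell()
    if if_none_match == etag:
        return 304, b""
    return 200, body
```

Because the shell carries no per-user data, every visitor (and every crawler) can be served the same cached bytes.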
## Workflow
- Use Python 3.13.
@@ -44,3 +111,4 @@ uv run repub crawl -c repub.toml
- The console entrypoint is `repub`.
- Runtime ffmpeg availability is provided by the flake package and devshell.
- Tests live under `tests/`.
- `prompts/` is git ignored intentionally

repub/datastar.py (new file, 85 lines)

@@ -0,0 +1,85 @@
+from __future__ import annotations
+
+import asyncio
+import hashlib
+from collections.abc import AsyncGenerator, Awaitable, Callable
+from typing import Protocol
+
+from datastar_py import ServerSentEventGenerator as SSE
+from datastar_py.sse import DatastarEvent
+
+
+class HtmlRenderable(Protocol):
+    def __html__(self) -> str: ...
+
+
+RenderResult = str | HtmlRenderable
+RenderFunction = Callable[[], Awaitable[RenderResult]]
+
+
+class RefreshBroker:
+    def __init__(self) -> None:
+        self._subscribers: set[asyncio.Queue[object]] = set()
+
+    def subscribe(self) -> asyncio.Queue[object]:
+        queue: asyncio.Queue[object] = asyncio.Queue(maxsize=1)
+        self._subscribers.add(queue)
+        return queue
+
+    def unsubscribe(self, queue: asyncio.Queue[object]) -> None:
+        self._subscribers.discard(queue)
+
+    def publish(self, event: object = "refresh-event") -> None:
+        for queue in tuple(self._subscribers):
+            if queue.full():
+                try:
+                    queue.get_nowait()
+                except asyncio.QueueEmpty:
+                    pass
+            try:
+                queue.put_nowait(event)
+            except asyncio.QueueFull:
+                continue
+
+
+async def render_sse_event(
+    render: RenderFunction, *, last_event_id: str | None = None
+) -> tuple[str | None, DatastarEvent | None]:
+    html = _coerce_html(await render())
+    event_id = _render_hash(html)
+    if event_id == last_event_id:
+        return last_event_id, None
+    return event_id, SSE.patch_elements(html, event_id=event_id)
+
+
+async def render_stream(
+    queue: asyncio.Queue[object],
+    render: RenderFunction,
+    *,
+    last_event_id: str | None = None,
+    render_on_connect: bool = True,
+) -> AsyncGenerator[DatastarEvent, None]:
+    if render_on_connect:
+        last_event_id, event = await render_sse_event(
+            render, last_event_id=last_event_id
+        )
+        if event is not None:
+            yield event
+
+    while True:
+        await queue.get()
+        last_event_id, event = await render_sse_event(
+            render, last_event_id=last_event_id
+        )
+        if event is not None:
+            yield event
+
+
+def _coerce_html(view: RenderResult) -> str:
+    if isinstance(view, str):
+        return view
+    return view.__html__()
+
+
+def _render_hash(html: str) -> str:
+    return hashlib.blake2s(html.encode("utf-8"), digest_size=16).hexdigest()
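The work-sharing rule from the Datastar guidance could be layered on top of a render function like the ones this module accepts. The sketch below is an illustration, not part of the commit: `share_render` is a hypothetical helper, and `version` is an assumed callable returning a counter that the application bumps after each batch of database changes.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable

# Simplified stand-in for the module's RenderFunction (strings only).
RenderFunction = Callable[[], Awaitable[str]]


def share_render(render: RenderFunction, version: Callable[[], int]) -> RenderFunction:
    """Wrap `render` so all callers that see the same version share one render."""
    cached_version: int | None = None
    cached_task: asyncio.Future[str] | None = None

    async def shared() -> str:
        nonlocal cached_version, cached_task
        current = version()
        if cached_task is None or cached_version != current:
            # First caller for this version starts the render; others await it.
            cached_version = current
            cached_task = asyncio.ensure_future(render())
        # shield() keeps the shared render alive if one waiting caller is cancelled.
        return await asyncio.shield(cached_task)

    return shared
```

With something like this, each connection's stream would call the wrapped function, so a refresh fan-out to many connections triggers one query and one HTML build per version. A failed render stays cached until the version changes, which a real implementation would probably want to handle.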


@@ -91,7 +91,7 @@ def page_header() -> Renderable:
     ]
 
 
-def overview_section() -> Renderable:
+def overview_section(*, active_jobs: str) -> Renderable:
     return h.section[
         h.div(class_="mb-4 flex items-end justify-between")[
             h.div[
@@ -105,7 +105,11 @@ def overview_section() -> Renderable:
             h.p(class_="text-sm text-slate-500")["Updated from static fixture data"],
         ],
         h.dl(class_="grid gap-4 md:grid-cols-2 xl:grid-cols-4")[
-            stat_card(label="Active jobs", value="12", detail="9 scheduled, 3 paused"),
+            stat_card(
+                label="Active jobs",
+                value=active_jobs,
+                detail="Temporary live demo counter for Datastar refresh testing",
+            ),
             stat_card(label="Running now", value="2", detail="RSS and Pangea workers"),
             stat_card(
                 label="Completed today", value="34", detail="31 succeeded, 3 failed"
@@ -450,7 +454,7 @@ def settings_panel() -> Renderable:
     ]
 
 
-def admin_component() -> Renderable:
+def admin_component(*, active_jobs: str = "12") -> Renderable:
     running_rows = (
         (
             h.div(class_="font-semibold text-slate-950")["Pangea mobile articles"],
@@ -566,7 +570,7 @@ def admin_component() -> Renderable:
         h.div(class_="px-4 py-5 sm:px-6 lg:px-8 lg:py-8")[
             h.div(class_="mx-auto max-w-7xl space-y-6")[
                 page_header(),
-                overview_section(),
+                overview_section(active_jobs=active_jobs),
                 h.div(
                     class_="grid gap-6 xl:grid-cols-[minmax(0,1.35fr)_minmax(22rem,0.95fr)]"
                 )[


@@ -1,12 +1,24 @@
from __future__ import annotations
import asyncio
import hashlib
from collections.abc import AsyncGenerator
from contextlib import suppress
from typing import cast
import htpy as h
from datastar_py.quart import DatastarResponse
from datastar_py.sse import DatastarEvent
from htpy import Renderable
from quart import Quart, Response, request, url_for
from repub.datastar import RefreshBroker, render_stream
from repub.pages import admin_component, shim_page
REFRESH_BROKER_KEY = "repub.refresh_broker"
ACTIVE_JOBS_KEY = "repub.demo_active_jobs"
REFRESH_TASK_KEY = "repub.demo_refresh_task"
def _render_shim_page(*, stylesheet_href: str, datastar_src: str) -> tuple[str, str]:
head = (
@@ -18,8 +30,27 @@ def _render_shim_page(*, stylesheet_href: str, datastar_src: str) -> tuple[str,
     return body, etag
 
 
-def create_app() -> Quart:
+def create_app(*, enable_demo_refresh: bool = True) -> Quart:
     app = Quart(__name__)
+    app.extensions[REFRESH_BROKER_KEY] = RefreshBroker()
+    app.extensions[ACTIVE_JOBS_KEY] = 12
+
+    if enable_demo_refresh:
+
+        @app.before_serving
+        async def start_demo_refresh() -> None:
+            app.extensions[REFRESH_TASK_KEY] = asyncio.create_task(
+                _demo_refresh_loop(app)
+            )
+
+    @app.after_serving
+    async def stop_demo_refresh() -> None:
+        task = cast(asyncio.Task[None] | None, app.extensions.get(REFRESH_TASK_KEY))
+        if task is None:
+            return
+        task.cancel()
+        with suppress(asyncio.CancelledError):
+            await task
 
     @app.get("/")
     async def index() -> Response:
@@ -37,7 +68,50 @@ def create_app() -> Quart:
         return response
 
     @app.post("/")
-    async def index_patch() -> Response:
-        return Response(str(admin_component()), mimetype="text/html")
+    async def index_patch() -> DatastarResponse:
+        queue = get_refresh_broker(app).subscribe()
+        stream = render_stream(
+            queue,
+            render=lambda: render_dashboard(app),
+            last_event_id=request.headers.get("last-event-id"),
+        )
+        return DatastarResponse(_unsubscribe_on_close(queue, stream, app))
 
     return app
+
+
+def get_refresh_broker(app: Quart) -> RefreshBroker:
+    return cast(RefreshBroker, app.extensions[REFRESH_BROKER_KEY])
+
+
+def trigger_refresh(app: Quart, event: object = "refresh-event") -> None:
+    get_refresh_broker(app).publish(event)
+
+
+async def render_dashboard(app: Quart) -> Renderable:
+    return admin_component(active_jobs=str(get_active_jobs(app)))
+
+
+async def _unsubscribe_on_close(
+    queue: object, stream: AsyncGenerator[DatastarEvent, None], app: Quart
+) -> AsyncGenerator[DatastarEvent, None]:
+    try:
+        async for event in stream:
+            yield event
+    finally:
+        get_refresh_broker(app).unsubscribe(cast(asyncio.Queue[object], queue))
+
+
+def get_active_jobs(app: Quart) -> int:
+    return cast(int, app.extensions[ACTIVE_JOBS_KEY])
+
+
+def set_active_jobs(app: Quart, value: int) -> None:
+    app.extensions[ACTIVE_JOBS_KEY] = value
+
+
+async def _demo_refresh_loop(app: Quart) -> None:
+    while True:
+        await asyncio.sleep(1)
+        set_active_jobs(app, get_active_jobs(app) + 1)
+        trigger_refresh(app)
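The demo loop publishes at most once per second, so the 200ms throttle from the Datastar rules is not exercised here. If real database writes called `trigger_refresh` directly, a small throttle in front of it might look like the sketch below. `ThrottledTrigger` is a hypothetical name and is not part of this commit. The sketch is trailing-edge: the first change in a burst schedules one publish `interval` seconds later, and further changes inside that window are folded into it.

```python
from __future__ import annotations

import asyncio
from collections.abc import Callable


class ThrottledTrigger:
    """Coalesce bursts of change notifications into one publish per interval."""

    def __init__(self, publish: Callable[[], None], interval: float = 0.2) -> None:
        self._publish = publish
        self._interval = interval
        self._pending: asyncio.TimerHandle | None = None

    def notify(self) -> None:
        # A publish is already scheduled; it will cover this change too.
        if self._pending is not None:
            return
        loop = asyncio.get_running_loop()
        self._pending = loop.call_later(self._interval, self._fire)

    def _fire(self) -> None:
        # Clear the handle first so changes made during publish schedule a new one.
        self._pending = None
        self._publish()
```

Assuming the app object is in scope, it could be wired up as `ThrottledTrigger(lambda: trigger_refresh(app))`, with write paths calling `notify()` instead of `trigger_refresh`. Because every connection re-renders the full view, dropping intermediate notifications loses no data.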


@@ -1,13 +1,21 @@
 from __future__ import annotations
 
 import asyncio
+from typing import Any, cast
 
-from repub.web import create_app
+from repub.datastar import RefreshBroker, render_sse_event, render_stream
+from repub.web import (
+    create_app,
+    get_active_jobs,
+    get_refresh_broker,
+    render_dashboard,
+    set_active_jobs,
+)
 
 
 def test_root_get_serves_datastar_shim() -> None:
     async def run() -> None:
-        client = create_app().test_client()
+        client = create_app(enable_demo_refresh=False).test_client()
 
         response = await client.get("/")
         body = await response.get_data(as_text=True)
@@ -30,7 +38,7 @@ def test_root_get_serves_datastar_shim() -> None:
 
 def test_root_get_honors_if_none_match() -> None:
     async def run() -> None:
-        client = create_app().test_client()
+        client = create_app(enable_demo_refresh=False).test_client()
 
         initial = await client.get("/")
         etag = initial.headers["ETag"]
@@ -45,15 +53,86 @@ def test_root_get_honors_if_none_match() -> None:
 
 def test_root_post_serves_morph_component() -> None:
     async def run() -> None:
-        client = create_app().test_client()
+        client = create_app(enable_demo_refresh=False).test_client()
+
+        async with client.request("/?u=shim", method="POST") as connection:
+            await connection.send_complete()
+            chunk = await asyncio.wait_for(connection.receive(), timeout=1)
+            raw_connection = cast(Any, connection)
 
-        response = await client.post("/?u=shim")
-        body = await response.get_data(as_text=True)
-
-        assert response.status_code == 200
-        assert response.content_type == "text/html; charset=utf-8"
-        assert body.startswith('<main id="morph"')
-        assert "Admin UI" in body
-        assert "All on one page for the v1 spike" not in body
+            assert raw_connection.status_code == 200
+            assert raw_connection.headers["Content-Type"] == "text/event-stream"
+            assert b"event: datastar-patch-elements" in chunk
+            assert b"id: " in chunk
+            assert b'<main id="morph"' in chunk
+            await connection.disconnect()
 
     asyncio.run(run())
+
+
+def test_render_sse_event_skips_unchanged_view() -> None:
+    async def run() -> None:
+        async def render() -> str:
+            return '<main id="morph">same</main>'
+
+        event_id, event = await render_sse_event(render)
+        repeated_id, repeated_event = await render_sse_event(
+            render, last_event_id=event_id
+        )
+
+        assert repeated_id == event_id
+        assert event is not None
+        assert repeated_event is None
+
+    asyncio.run(run())
+
+
+def test_app_refresh_broker_publishes_events() -> None:
+    async def run() -> None:
+        app = create_app(enable_demo_refresh=False)
+        broker = get_refresh_broker(app)
+        queue = broker.subscribe()
+        broker.publish()
+        event = await asyncio.wait_for(queue.get(), timeout=1)
+
+        assert event == "refresh-event"
+        broker.unsubscribe(queue)
+
+    asyncio.run(run())
+
+
+def test_render_stream_yields_on_connect_and_refresh() -> None:
+    async def run() -> None:
+        queue = RefreshBroker().subscribe()
+        renders = 0
+
+        async def render() -> str:
+            nonlocal renders
+            renders += 1
+            return f'<main id="morph">{renders}</main>'
+
+        stream = render_stream(queue, render)
+        first = await anext(stream)
+        await queue.put("refresh-event")
+        second = await anext(stream)
+        await stream.aclose()
+
+        assert "1</main>" in first
+        assert "2</main>" in second
+
+    asyncio.run(run())
+
+
+def test_render_dashboard_uses_active_jobs_from_app_state() -> None:
+    async def run() -> None:
+        app = create_app(enable_demo_refresh=False)
+
+        assert get_active_jobs(app) == 12
+        set_active_jobs(app, 27)
+
+        async with app.app_context():
+            body = str(await render_dashboard(app))
+
+        assert "27" in body
+        assert "Temporary live demo counter for Datastar refresh testing" in body
+
+    asyncio.run(run())
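One behaviour the tests above do not cover is what happens when several publishes arrive before a subscriber reads its queue. The sketch below shows how such a test might look, in the same style. To stay self-contained it uses `CoalescingBroker`, a hypothetical stand-in that mirrors the drop-oldest logic of `RefreshBroker.publish`; a real test would import `RefreshBroker` from `repub.datastar` instead.

```python
from __future__ import annotations

import asyncio


class CoalescingBroker:
    """Stand-in mirroring the drop-oldest publish logic of RefreshBroker."""

    def __init__(self) -> None:
        self._subscribers: set[asyncio.Queue[object]] = set()

    def subscribe(self) -> asyncio.Queue[object]:
        queue: asyncio.Queue[object] = asyncio.Queue(maxsize=1)
        self._subscribers.add(queue)
        return queue

    def publish(self, event: object = "refresh-event") -> None:
        for queue in tuple(self._subscribers):
            if queue.full():
                # Drop the stale event so the newest one always fits.
                queue.get_nowait()
            queue.put_nowait(event)


def test_publish_keeps_only_latest_event() -> None:
    async def run() -> None:
        broker = CoalescingBroker()
        queue = broker.subscribe()

        for n in range(5):
            broker.publish(n)

        # Five publishes collapse into a single pending refresh.
        assert queue.qsize() == 1
        assert await queue.get() == 4

    asyncio.run(run())
```

This is the "continuous signal" property from the rules: a slow subscriber wakes once and re-renders the latest state instead of replaying every missed event.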