feat: initial import

This commit is contained in:
Iain Learmonth 2026-03-08 12:51:47 +00:00
commit 0f9c0d93d9
22 changed files with 3563 additions and 0 deletions

221
.gitignore vendored Normal file

@ -0,0 +1,221 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py.cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
# Pipfile.lock
# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# uv.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
# poetry.lock
# poetry.toml
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
# pdm.lock
# pdm.toml
.pdm-python
.pdm-build/
# pixi
# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
# pixi.lock
# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
# in the .venv directory. It is recommended not to include this directory in version control.
.pixi
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# Redis
*.rdb
*.aof
*.pid
# RabbitMQ
mnesia/
rabbitmq/
rabbitmq-data/
# ActiveMQ
activemq-data/
# SageMath parsed files
*.sage.py
# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
# Abstra
# Abstra is an AI-powered process automation framework.
# Ignore directories containing user credentials, local state, and settings.
# Learn more at https://abstra.io/docs
.abstra/
# Visual Studio Code
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
# .vscode/
# Ruff stuff:
.ruff_cache/
# PyPI configuration file
.pypirc
# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/
# Streamlit
.streamlit/secrets.toml
/assets
/config.yaml
/google-service.json

22
LICENCE Normal file

@ -0,0 +1,22 @@
Copyright 2021-2026 SR2 Communications Limited.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list
of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

22
README.md Normal file

@ -0,0 +1,22 @@
snapdirect
==========
[![Translation status](https://hosted.weblate.org/widget/sr2/snapdirect/svg-badge.svg)](https://hosted.weblate.org/engage/sr2/)
[![License](https://img.shields.io/badge/License-BSD_2--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)
A combined service for mirror link generation and article snapshots.
Translations
------------
Snapshot templates support localisation.
Translations of strings in the template are managed on Weblate.
<a href="https://hosted.weblate.org/engage/sr2/">
<img src="https://hosted.weblate.org/widget/sr2/snapdirect/multi-auto.svg" alt="Translation status" />
</a>
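
A minimal sketch of how a language fallback chain for the compiled catalogs could behave, using only the standard library's `gettext` module rather than the Babel loader the service itself uses. The `load_translations` helper, the `messages` domain name, and the `i18n` directory default are assumptions made for illustration.

```python
import gettext
from typing import Optional


def load_translations(
    language: Optional[str], localedir: str = "i18n"
) -> gettext.NullTranslations:
    """Return a catalog for `language`, then English, then the untranslated msgids."""
    # Drop a missing page language so the chain never contains None.
    languages = [lang for lang in (language, "en") if lang]
    # fallback=True yields NullTranslations (msgid passthrough) when no .mo file is found.
    return gettext.translation(
        "messages", localedir=localedir, languages=languages, fallback=True
    )


if __name__ == "__main__":
    catalog = load_translations("ru")
    print(catalog.gettext("Continue"))
```

With no compiled catalog on disk, the lookup simply returns the original English string, so a page in an untranslated language still renders.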
Licence & Copyright
-------------------
&copy; SR2 Communications Limited. See [LICENCE](./LICENCE) for details of the BSD 2-Clause licence.

4
babel.cfg Normal file

@ -0,0 +1,4 @@
[python: **.py]
[jinja2: **.j2]
encoding = utf-8


@ -0,0 +1,54 @@
# English translations for PROJECT.
# Copyright (C) 2026 ORGANIZATION
# This file is distributed under the same license as the PROJECT project.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2026.
#
msgid ""
msgstr ""
"Project-Id-Version: PROJECT VERSION\n"
"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
"POT-Creation-Date: 2026-03-07 17:05+0000\n"
"PO-Revision-Date: 2026-03-07 17:15+0000\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: en\n"
"Language-Team: en <LL@li.org>\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"
#: src/snapshots/templates/article-template.html.j2:66
msgid "How do I know that I can trust this page?"
msgstr "How do I know that I can trust this page?"
#: src/snapshots/templates/article-template.html.j2:73
msgid ""
"This story is a copy of an article from <a href=\"{site_url}\" class"
"=\"snap-trust-header__sitelink\">{site_title}</a>. It is delivered to "
"you from a trusted archive to assure its availability over time."
msgstr ""
"This story is a copy of an article from <a href=\"{site_url}\" class"
"=\"snap-trust-header__sitelink\">{site_title}</a>. It is delivered to "
"you from a trusted archive to assure its availability over time."
#: src/snapshots/templates/article-template.html.j2:76
msgid "View the article source"
msgstr "View the article source"
#: src/snapshots/templates/article-template.html.j2:143
msgid "You are leaving this page"
msgstr "You are leaving this page"
#: src/snapshots/templates/article-template.html.j2:163
msgid ""
"This link will redirect you to an external website. If its not available"
" in your region, you may not be able to access it."
msgstr ""
"This link will redirect you to an external website. If it's not available"
" in your region, you may not be able to access it."
#: src/snapshots/templates/article-template.html.j2:165
msgid "Continue"
msgstr "Continue"
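
The article template looks up the trust-header message with `gettext(...)` and then calls `.format(site_url=..., site_title=...)` on the result. A minimal sketch of that translate-then-format order is below, assuming the values should be HTML-escaped before they are substituted into markup. `MSGID` is an abbreviated stand-in for the real message, and `render_message` is a hypothetical helper, not part of the repository.

```python
import html

# Abbreviated stand-in for the catalog message; the named fields must survive translation.
MSGID = (
    'This story is a copy of an article from <a href="{site_url}" '
    'class="snap-trust-header__sitelink">{site_title}</a>.'
)


def render_message(translate, **params: str) -> str:
    """Translate first, then substitute HTML-escaped values into the named fields."""
    escaped = {key: html.escape(value, quote=True) for key, value in params.items()}
    return translate(MSGID).format(**escaped)


if __name__ == "__main__":
    # The identity function stands in for a real gettext callable.
    print(
        render_message(
            lambda message: message,
            site_url="https://example.org/?a=1&b=2",
            site_title="Example",
        )
    )
```

Formatting after translation means translators can reorder the `{site_url}` and `{site_title}` fields freely, as long as the names are left intact.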

41
justfile Normal file

@ -0,0 +1,41 @@
default:
    just --list

run *args:
    pybabel compile -d i18n
    poetry run uvicorn src.main:app --reload {{args}}

mm *args:
    poetry run alembic revision --autogenerate -m "{{args}}"

migrate:
    poetry run alembic upgrade head

downgrade *args:
    poetry run alembic downgrade {{args}}

black *args:
    poetry run black {{args}} src

ruff *args:
    poetry run ruff check {{args}} src

lint:
    poetry run ruff format src
    just ruff --fix

test:
    PYTHONPATH=. pytest tests

# docker
up:
    docker-compose up -d

kill *args:
    docker-compose kill {{args}}

build:
    docker-compose build

ps:
    docker-compose ps

2190
poetry.lock generated Normal file

File diff suppressed because it is too large

47
pyproject.toml Normal file

@ -0,0 +1,47 @@
[tool.poetry]
name = "snapdirect"
version = "0.0.0"
description = ""
authors = ["irl"]
readme = "README.md"
license = "BSD-2-Clause"
package-mode = false
[tool.poetry.dependencies]
python = "^3.12"
babel = "^2.17"
beautifulsoup4 = "^4.13"
fastapi = "^0.115.12"
google-cloud-storage = "^3.9.0"
jinja2 = "^3.1"
lxml = "^6.0"
requests = "^2.32"
pydantic = "^2.11"
pydantic-settings = "^2.10"
pyyaml = "^6.0"
tldextract = "^5"
uvicorn = {extras = ["standard"], version = "^0.30.6"}
[tool.poetry.group.dev.dependencies]
black = "^25.1.0"
ruff = "^0.12"
pytest = "^8.4"
[tool.poetry.group.prod.dependencies]
gunicorn = "^22.0.0"
python-json-logger = "^2.0.7"
prometheus-client = "^0.20.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
[tool.black]
line-length = 92
[tool.pytest.ini_options]
asyncio_default_fixture_loop_scope = "module"
[tool.ruff]
target-version = "py312"
line-length = 92

1
src/API.md Normal file

@ -0,0 +1 @@
Link Generation and Snapshots

0
src/__init__.py Normal file

61
src/config.py Normal file

@ -0,0 +1,61 @@
from os.path import abspath, dirname, join
from typing import Any

from pydantic_settings import (
    BaseSettings,
    SettingsConfigDict,
    YamlConfigSettingsSource,
    PydanticBaseSettingsSource,
)

from src.constants import Environment

API_README_PATH = abspath(join(dirname(__file__), "API.md"))

with open(API_README_PATH, "r", encoding="utf-8") as f:
    API_README_MD = f.read()


class CustomBaseSettings(BaseSettings):
    model_config = SettingsConfigDict(
        yaml_file="config.yaml", yaml_file_encoding="utf-8", extra="ignore"
    )

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        return (YamlConfigSettingsSource(settings_cls),)


class Config(CustomBaseSettings):
    # DATABASE_URL: PostgresDsn
    # DATABASE_ASYNC_URL: PostgresDsn
    # DATABASE_POOL_SIZE: int = 16
    # DATABASE_POOL_TTL: int = 60 * 20  # 20 minutes
    # DATABASE_POOL_PRE_PING: bool = True

    ENVIRONMENT: Environment = Environment.PRODUCTION

    CORS_ORIGINS: list[str] = ["*"]
    CORS_ORIGINS_REGEX: str | None = None
    CORS_HEADERS: list[str] = ["*"]

    APP_VERSION: str = "0.0.0"


settings = Config()

app_configs: dict[str, Any] = {
    "title": "snapdirect",
    "version": settings.APP_VERSION,
    "description": API_README_MD,
}

if not settings.ENVIRONMENT.is_debug:
    app_configs["openapi_url"] = None  # hide docs

24
src/constants.py Normal file

@ -0,0 +1,24 @@
from enum import Enum


class Environment(str, Enum):
    LOCAL = "LOCAL"
    TESTING = "TESTING"
    STAGING = "STAGING"
    PRODUCTION = "PRODUCTION"

    @property
    def is_debug(self):
        return self in (self.LOCAL, self.STAGING, self.TESTING)

    @property
    def is_local(self):
        return self is Environment.LOCAL

    @property
    def is_testing(self):
        return self == self.TESTING

    @property
    def is_deployed(self) -> bool:
        return self in (self.STAGING, self.PRODUCTION)

10
src/google/client.py Normal file

@ -0,0 +1,10 @@
from google.cloud import storage

from src.google.config import settings


def upload_blob(file_name: str, content: bytes, content_type: str) -> None:
    storage_client = storage.Client()
    bucket = storage_client.bucket(settings.BUCKET_NAME)
    blob = bucket.blob(file_name)
    blob.upload_from_string(content, content_type=content_type)

8
src/google/config.py Normal file

@ -0,0 +1,8 @@
from src.config import CustomBaseSettings


class GoogleConfig(CustomBaseSettings):
    BUCKET_NAME: str


settings = GoogleConfig()

24
src/main.py Normal file

@ -0,0 +1,24 @@
from contextlib import asynccontextmanager
from typing import AsyncGenerator

from fastapi import FastAPI

from src.config import app_configs
from src.snapshots.router import router as snapshots_router


@asynccontextmanager
async def lifespan(_application: FastAPI) -> AsyncGenerator:
    # Startup
    yield
    # Shutdown


app = FastAPI(**app_configs, lifespan=lifespan)

app.include_router(snapshots_router)


@app.get("/healthcheck", include_in_schema=False)
async def healthcheck() -> dict[str, str]:
    return {"status": "ok"}

249
src/snapshots/client.py Normal file

@ -0,0 +1,249 @@
import base64
import copy
import datetime
import logging
import mimetypes
from typing import Any
from urllib.parse import urlparse, urlunparse, urljoin

import requests
from babel.dates import format_date
from babel.support import Translations
from bs4 import BeautifulSoup
from jinja2 import Environment, PackageLoader, select_autoescape

from src.snapshots.config import SnapshotsConfig, config_for_url
from src.snapshots.schemas import SnapshotContext


class SnapshotParseError(RuntimeError):
    pass


ALLOWED_ASSET_TYPES = {
    "image/jpeg",
    "image/webp",
    "image/png",
    "image/gif",
    "image/svg+xml",
    "image/x-icon",
}


def encode_data_uri(content: bytes, content_type: str) -> str | None:
    # Drop parameters such as "; charset=..." before checking the allow-list.
    content_type = content_type.split(";")[0].strip().lower()
    if content_type not in ALLOWED_ASSET_TYPES:
        return None
    encoded = base64.b64encode(content).decode("utf-8")
    return f"data:{content_type};base64,{encoded}"


def fetch_file(filename: str) -> str | None:
    content_type = mimetypes.guess_type(filename)[0]
    if content_type not in ALLOWED_ASSET_TYPES:
        return None
    try:
        with open(filename, "rb") as f:
            return encode_data_uri(f.read(), content_type)
    except IOError:
        return None


def fetch_url(base: str, url: str | None) -> str | None:
    # Callers pass the result of optional lookups, so tolerate a missing URL.
    if not url:
        return None
    if url.startswith("data:"):
        return url
    url = urljoin(base, url)
    try:
        response = requests.get(url, stream=True, timeout=0.5)
        response.raise_for_status()
        content_length = response.headers.get("Content-Length")
        if content_length is not None:
            try:
                if int(content_length) > 500_000:
                    return None
            except ValueError:
                pass  # Invalid Content-Length format, proceed to stream
        content = b""
        for chunk in response.iter_content(chunk_size=1024):
            content += chunk
            if len(content) > 500_000:
                return None
        content_type = response.headers.get("Content-Type", "")
        return encode_data_uri(content, content_type)
    except requests.exceptions.RequestException:
        return None
class Snapshot:
    config: SnapshotsConfig | None = None
    context: SnapshotContext | None = None
    raw: bytes | None = None
    soup: BeautifulSoup | None = None

    def __init__(self, url: str) -> None:
        self.url = url
        self.config = config_for_url(url)
        if self.config is None:
            raise SnapshotParseError(f"No parser configuration matches URL: {url}")

    def get_content(self) -> None:
        self.raw = requests.get(self.url, timeout=1).content

    def _get_attribute_value(self, selector: str, attribute: str) -> str | None:
        element = self.soup.select_one(selector)
        if not element:
            return None
        try:
            return element[attribute]
        except KeyError:
            return None

    def get_attribute_value(
        self, selector: str | list[str] | None, attribute: str, optional: bool = False
    ) -> str | None:
        if not selector:
            if optional:
                return None
            raise SnapshotParseError("No selector specified for non-optional attribute")
        if isinstance(selector, str):
            selector = [selector]
        for s in selector:
            if result := self._get_attribute_value(s, attribute):
                return result
        if optional:
            return None
        raise SnapshotParseError("No element matched for non-optional attribute")

    def get_element_content(
        self, selector: str | None, optional: bool = False
    ) -> str | None:
        if not selector:
            if optional:
                return None
            raise SnapshotParseError("No selector specified for non-optional element")
        element = self.soup.select_one(selector)
        if not element:
            if not optional:
                raise SnapshotParseError(f"Missing element for selector: {selector}")
            return None
        return element.text

    def _get_opengraph_value(self, prop: str) -> str | None:
        # OpenGraph tags use the "property" attribute; some sites use "name" instead.
        element = self.soup.select_one(f'meta[property="{prop}"], meta[name="{prop}"]')
        if not element:
            return None
        try:
            return element["content"]
        except KeyError:
            return None

    def get_opengraph_value(
        self, prop: str | list[str] | None, optional: bool = False
    ) -> str | None:
        if not prop:
            if optional:
                return None
            raise SnapshotParseError("No property specified for non-optional property")
        if isinstance(prop, str):
            prop = [prop]
        for p in prop:
            if result := self._get_opengraph_value(p):
                return result
        if optional:
            return None
        raise SnapshotParseError("No property matched for non-optional property")

    def get_body(self) -> str:
        source = self.soup.select_one(self.config.article_body_selector)
        if source is None:
            raise SnapshotParseError("No element matched the article body selector")
        body = copy.copy(source)
        if self.config.article_body_remove_selector:
            for element in body.select(", ".join(self.config.article_body_remove_selector)):
                element.decompose()
        for image in body.select("img"):
            # Images may lack src or alt; avoid a KeyError and keep only safe attributes.
            src = image.get("src")
            image.attrs = {
                "src": (fetch_url(self.url, src) if src else None) or "",
                "alt": image.get("alt", ""),
            }
        return str(body)

    def preprocess(self) -> None:
        compound = ", ".join(
            self.config.pre_remove_selectors + ["form", "script", "style", "iframe"]
        )
        for element in self.soup.select(compound):
            element.decompose()
        for element in self.soup.select("[style]"):
            element.attrs.pop("style")

    def favicon(self) -> str | None:
        href = self.get_attribute_value('link[rel="icon"]', "href", optional=True)
        if href and (icon := fetch_url(self.url, href)):
            return icon
        # Fall back to the conventional /favicon.ico location.
        parsed = urlparse(self.url)
        icon_url = urlunparse((parsed.scheme, parsed.netloc, "/favicon.ico", "", "", ""))
        return fetch_url(self.url, icon_url)

    def published_time(self, locale: str = "en") -> str:
        if self.config.article_published_selector:
            if published := self.get_element_content(
                self.config.article_published_selector, optional=True
            ):
                return published
        ts = datetime.datetime.fromisoformat(
            self.get_opengraph_value("article:published_time")
        )
        return format_date(ts, locale=locale)

    def parse(self) -> None:
        self.soup = BeautifulSoup(self.raw, "lxml")
        self.preprocess()
        # The image selector is optional in SnapshotsConfig, so do not fail without it.
        article_image_source = self.get_attribute_value(
            self.config.article_image_selector, "src", optional=True
        )
        page_language = self.get_attribute_value(["html", "body"], "lang", optional=True)
        self.context = SnapshotContext(
            article_author=self.get_element_content(
                self.config.article_author_selector, optional=True
            ),
            article_body=self.get_body(),
            article_description=self.get_attribute_value(
                'meta[name="description"]', "content", optional=True
            ),
            article_image=(
                fetch_url(self.url, article_image_source) if article_image_source else None
            ),
            article_image_caption=self.get_element_content(
                self.config.article_image_caption_selector, optional=True
            ),
            article_image_source=article_image_source,
            article_published=self.published_time(page_language or "en"),
            article_title=self.get_element_content(self.config.article_title_selector),
            article_url=self.url,
            page_direction=self.get_attribute_value(["html", "body"], "dir", optional=True),
            page_language=page_language,
            site_favicon=self.favicon(),
            site_logo=fetch_file(self.config.site_logo),
            site_title=self.config.site_title,
        )

    def get_context(self) -> dict[str, Any]:
        logging.info("Get content")
        self.get_content()
        logging.info("Parse")
        self.parse()
        logging.info("Dump")
        return self.context.model_dump()

    def render(self) -> str:
        context = self.get_context()
        jinja_env = Environment(
            loader=PackageLoader(
                package_name="src.snapshots",
                package_path="templates",
            ),
            extensions=["jinja2.ext.i18n"],
            autoescape=select_autoescape(),
            trim_blocks=True,
            lstrip_blocks=True,
        )
        translations = Translations.load("i18n", [context["page_language"], "en"])
        jinja_env.install_gettext_translations(translations)
        template = jinja_env.get_template("article-template.html.j2")
        return template.render(**context)
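
The asset helpers in this file cap downloads at 500,000 bytes and inline accepted images as base64 data URIs. The two ideas can be sketched in isolation with the standard library only, assuming the chunks come from any iterable rather than a live HTTP response. `read_capped`, `to_data_uri`, and the shortened `ALLOWED_TYPES` set are illustrative names, not part of the repository.

```python
import base64
from typing import Iterable, Optional

ALLOWED_TYPES = {"image/png", "image/jpeg"}  # shortened allow-list for the example
MAX_BYTES = 500_000


def read_capped(chunks: Iterable[bytes], limit: int = MAX_BYTES) -> Optional[bytes]:
    """Join chunks, giving up as soon as the running total exceeds `limit`."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > limit:
            return None
    return bytes(buffer)


def to_data_uri(content: bytes, content_type: str) -> Optional[str]:
    """Build a base64 data URI, ignoring parameters such as '; charset=...'."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        return None
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{media_type};base64,{encoded}"
```

Checking the size while streaming matters because a `Content-Length` header can be absent or wrong; the header check alone would not bound memory use.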

37
src/snapshots/config.py Normal file

@ -0,0 +1,37 @@
import fnmatch

from pydantic import BaseModel, Field

from src.config import CustomBaseSettings


class SnapshotsConfig(BaseModel):
    article_author_selector: str | None = None
    article_image_selector: str | None = None
    article_image_caption_selector: str | None = None
    article_body_selector: str
    article_body_remove_selector: list[str] = []
    article_published_selector: str | None = Field(
        None,
        description="CSS selector for an element containing a localised publication datetime. By default, the publication date will be determined from the OpenGraph metadata.",
    )
    article_title_selector: str = "h1"
    match_urls: list[str]
    # Must be a list: it is concatenated with another list when preprocessing.
    pre_remove_selectors: list[str] = ["aside"]
    site_logo: str
    site_title: str


class Config(CustomBaseSettings):
    PARSER_CONFIGS: list[SnapshotsConfig]


settings = Config()


def config_for_url(url: str) -> SnapshotsConfig | None:
    for cfg in settings.PARSER_CONFIGS:
        for pattern in cfg.match_urls:
            if fnmatch.fnmatch(url, pattern):
                return cfg
    return None
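
`config_for_url` returns the first configuration whose `match_urls` pattern matches, using shell-style globbing. A minimal sketch of how such patterns behave on URLs follows; the `PATTERNS` mapping and `first_match` helper are hypothetical stand-ins for the values loaded from `config.yaml`. It uses `fnmatchcase`, which compares case-sensitively on every platform, whereas `fnmatch.fnmatch` first normalises case according to the operating system.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical patterns; real values would come from match_urls in config.yaml.
PATTERNS = {
    "example-news": ["https://news.example.org/*"],
    "example-blog": ["https://blog.example.org/post?id=*"],
}


def first_match(url: str) -> Optional[str]:
    """Return the name of the first entry with a matching pattern, in insertion order."""
    for name, patterns in PATTERNS.items():
        if any(fnmatchcase(url, pattern) for pattern in patterns):
            return name
    return None
```

Two properties are worth keeping in mind when writing patterns: `*` also matches `/`, so one trailing wildcard covers every path depth, and `?` is itself a wildcard for any single character, so a literal query-string `?` in a pattern matches more than intended.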

49
src/snapshots/router.py Normal file

@ -0,0 +1,49 @@
from fastapi import APIRouter, HTTPException, BackgroundTasks
from starlette import status
from starlette.responses import HTMLResponse

from src.config import settings
from src.google.config import settings as google_settings
from src.snapshots.client import Snapshot
from src.snapshots.schemas import SnapshotContext
from src.snapshots.tasks import upload_snapshot

router = APIRouter()


@router.get(
    "/debug/context",
    summary="Generate the context used by the snapshot template for debugging purposes. Endpoint disabled on production deployments.",
    response_model=SnapshotContext,
)
def context(url: str = "https://www.bbc.com/russian/articles/ckgeey4dqgxo"):
    if settings.ENVIRONMENT.is_debug:
        return Snapshot(url).get_context()
    raise HTTPException(status.HTTP_404_NOT_FOUND)


@router.get(
    "/debug/demo",
    summary="Generate a rendered snapshot template for debugging purposes. Endpoint disabled on production deployments.",
    response_class=HTMLResponse,
)
def parse(url: str = "https://www.bbc.com/russian/articles/ckgeey4dqgxo"):
    if settings.ENVIRONMENT.is_debug:
        return Snapshot(url).render()
    raise HTTPException(status.HTTP_404_NOT_FOUND)


@router.get(
    "/debug/upload",
    summary="Generate a rendered snapshot template for debugging purposes and upload to Google Cloud Storage. Endpoint disabled on production deployments.",
    response_class=HTMLResponse,
)
def upload(
    background_tasks: BackgroundTasks,
    url: str = "https://www.bbc.com/russian/articles/ckgeey4dqgxo",
):
    if settings.ENVIRONMENT.is_debug:
        rendered = Snapshot(url).render()
        background_tasks.add_task(upload_snapshot, "debug2.html", rendered)
        # Link to the object that was actually uploaded.
        return f'<a href="https://storage.googleapis.com/{google_settings.BUCKET_NAME}/debug2.html">Google Cloud Storage</a>'
    raise HTTPException(status.HTTP_404_NOT_FOUND)
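
The upload endpoint names the stored object once for the background task and once more in the returned link. A small sketch of deriving the link from a single object name, so the two cannot drift apart, is shown below. `public_object_url` is a hypothetical helper, and the sketch assumes the object is publicly readable at the `storage.googleapis.com/<bucket>/<object>` address the endpoint already links to.

```python
from urllib.parse import quote


def public_object_url(bucket: str, object_name: str) -> str:
    """Build the public link for an object, percent-encoding unsafe characters."""
    # Slashes in object names act as path separators, so they are left unescaped.
    return (
        "https://storage.googleapis.com/"
        f"{quote(bucket, safe='')}/{quote(object_name, safe='/')}"
    )


if __name__ == "__main__":
    name = "debug2.html"  # one name used for both the upload and the link
    print(public_object_url("example-bucket", name))
```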

18
src/snapshots/schemas.py Normal file

@ -0,0 +1,18 @@
from pydantic import BaseModel


class SnapshotContext(BaseModel):
    article_author: str | None = None
    article_body: str
    article_description: str | None = None
    article_image: str | None = None
    article_image_caption: str | None = None
    article_image_source: str | None = None
    article_published: str
    article_title: str
    article_url: str
    page_direction: str | None = None
    page_language: str | None = None
    site_favicon: str | None = None
    # The logo loader returns None when the file is missing or of an unsupported type.
    site_logo: str | None = None
    site_title: str

5
src/snapshots/tasks.py Normal file

@ -0,0 +1,5 @@
from src.google.client import upload_blob


def upload_snapshot(filename: str, content: str) -> None:
    upload_blob(filename, content.encode("utf-8"), "text/html")


@ -0,0 +1,165 @@
{% from "article.css.j2" import article_css %}<!DOCTYPE html>
<html{% if page_direction %} dir="{{ page_direction }}"{% endif %}{% if page_language %} lang="{{ page_language }}"{% endif %} prefix="og: https://ogp.me/ns#">
<head>
<meta charset="utf-8">
<title>{{ article_title }}</title>
<meta name="viewport"
content="width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no"/>
<meta name="format-detection" content="telephone=no"/>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<meta name="MobileOptimized" content="176"/>
<meta name="HandheldFriendly" content="True"/>
<meta property="og:type" content="article"/>
<meta property="og:title" content="{{ article_title }}"/>
<meta property="og:site_name" content="{{ site_title }}"/>
<meta property="og:url" content="{{ article_url }}"/>
{% if article_image_source %}
<meta property="og:image" content="{{ article_image_source }}"/>
{% endif %}
<meta name="twitter:card" content="summary_large_image"/>
{% if article_author %}<meta property="article:author" content="{{ article_author }}"/>{% endif %}
{% if noindex %}<meta name="robots" content="noindex" />{% endif %}
{% if site_favicon %}
<link rel="icon" href="{{ site_favicon }}" />
{% endif %}
<style>
{{ article_css() }}
</style>
<script type="text/javascript">
const cancelCurrentLink = function () {
delete document.body.dataset.currentLink;
};
const goToCurrentLink = function () {
const currentLinkHref = document.body.dataset.currentLink;
if (currentLinkHref) {
delete document.body.dataset.currentLink;
document.location.href = currentLinkHref;
}
};
document.addEventListener("click", (e) => {
let target = e.target.closest("a");
if (target) {
// if the click was on or within an <a>
if (!target.href.includes("cloudfront.net") &&
!target.href.includes("azureedge.net") &&
!target.href.includes("global.ssl.fastly.net")) {
e.preventDefault();
document.body.dataset.currentLink = target.href;
}
}
});
</script>
</head>
<body>
<div class="snap-wrapper">
<a href="#snap-main" class="snap-skip-link">Skip to main content</a>
<details class="snap-trust-header">
<summary class="snap-trust-header__header">
<div class="snap-trust-header__header-text">
{{ gettext("How do I know that I can trust this page?") }}
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="snap-trust-header__expand-icon">
<path d="M12.0132 15.5154C12.282 15.5133 12.5392 15.4067 12.7307 15.2181L17.6953 10.2535C17.8899 10.0598 17.9996 9.79673 18.0001 9.52216C18.0008 9.24758 17.8922 8.98401 17.6986 8.7894C17.5048 8.59479 17.2417 8.4851 16.9671 8.48455C16.6925 8.48391 16.429 8.59243 16.2343 8.78608L12.0001 13.0201L7.76585 8.78608H7.76596C7.50378 8.52519 7.12238 8.42384 6.76536 8.52038C6.40823 8.61693 6.12979 8.89664 6.03484 9.2541C5.93989 9.61155 6.04286 9.99261 6.30504 10.2535L11.2696 15.2181C11.4674 15.413 11.7354 15.5201 12.0129 15.5154H12.0132Z" fill="#333333"></path>
</svg>
</div>
</summary>
<div class="snap-trust-header__content">
{{ gettext('This story is a copy of an article from <a href="{site_url}" class="snap-trust-header__sitelink">{site_title}</a>. It is delivered to you from a trusted archive to assure its availability over time.').format(site_url=site_url, site_title=site_title) }} <p>
<a href="{{ article_url }}" class="snap-footer-link">
{{ gettext("View the article source") }}
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<path
d="M19.7212 13.0822C19.3072 13.0822 18.9712 13.4189 18.9712 13.8322V18.9712H5.02881V5.02881H10.167C10.5818 5.02881 10.917 4.69279 10.917 4.27881C10.917 3.86483 10.5818 3.52881 10.167 3.52881H4.27881C3.86405 3.52881 3.52881 3.86483 3.52881 4.27881V19.7212C3.52881 20.136 3.86405 20.4712 4.27881 20.4712H19.7212C20.136 20.4712 20.4712 20.136 20.4712 19.7212V13.8322C20.4712 13.4197 20.136 13.0822 19.7212 13.0822Z"
fill="#222F3A"></path>
<path
d="M19.7212 3.52881H14.1622C13.7474 3.52881 13.4122 3.86483 13.4122 4.27881C13.4122 4.69279 13.7474 5.02881 14.1622 5.02881H17.9108L11.9978 10.9418C11.7045 11.2351 11.7045 11.7091 11.9978 12.0023C12.144 12.1485 12.3361 12.222 12.528 12.222C12.7201 12.222 12.912 12.1485 13.0583 12.0023L18.9713 6.08927V9.83927C18.9713 10.2532 19.3073 10.5893 19.7213 10.5893C20.136 10.5893 20.4713 10.2532 20.4713 9.83927V4.27887C20.4713 3.86411 20.136 3.52887 19.7213 3.52887L19.7212 3.52881Z"
fill="#222F3A"></path>
</svg>
</a>
</p>
</div>
</details>
<header class="snap-page-header">
<nav class="snap-page-header-nav">
{% if article_mirror_url %}<a href="{{ article_mirror_url }}">{% endif %}
<img src="{{ site_logo }}" alt="{{ site_title }}" class="snap-page-header-logo">
{% if article_mirror_url %}</a>{% endif %}
</nav>
</header>
<main id="snap-main">
<header class="snap-article-header">
<h1>{{ article_title }}</h1>
<div class="snap-byline">
{{ article_published }} - {{ site_title }}
</div>
</header>
{% if article_image %}
<figure>
<img src="{{ article_image }}">
{% if article_image_caption %}
<figcaption>{{ article_image_caption }}</figcaption>
{% endif %}
</figure>
{% endif %}
<div class="snap-content">
{{ article_body }}
{% if article_mirror_url %}
<p>
<a href="{{ article_mirror_url }}" class="snap-footer-link">
View the original article
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M19.7212 13.0822C19.3072 13.0822 18.9712 13.4189 18.9712 13.8322V18.9712H5.02881V5.02881H10.167C10.5818 5.02881 10.917 4.69279 10.917 4.27881C10.917 3.86483 10.5818 3.52881 10.167 3.52881H4.27881C3.86405 3.52881 3.52881 3.86483 3.52881 4.27881V19.7212C3.52881 20.136 3.86405 20.4712 4.27881 20.4712H19.7212C20.136 20.4712 20.4712 20.136 20.4712 19.7212V13.8322C20.4712 13.4197 20.136 13.0822 19.7212 13.0822Z"
fill="#222F3A"></path>
<path d="M19.7212 3.52881H14.1622C13.7474 3.52881 13.4122 3.86483 13.4122 4.27881C13.4122 4.69279 13.7474 5.02881 14.1622 5.02881H17.9108L11.9978 10.9418C11.7045 11.2351 11.7045 11.7091 11.9978 12.0023C12.144 12.1485 12.3361 12.222 12.528 12.222C12.7201 12.222 12.912 12.1485 13.0583 12.0023L18.9713 6.08927V9.83927C18.9713 10.2532 19.3073 10.5893 19.7213 10.5893C20.136 10.5893 20.4713 10.2532 20.4713 9.83927V4.27887C20.4713 3.86411 20.136 3.52887 19.7213 3.52887L19.7212 3.52881Z"
fill="#222F3A"></path>
</svg>
</a>
</p>
{% endif %}
</div>
</main>
<footer class="snap-footer">
<div>
{% if site_mirror_url %}<a href="{{ site_mirror_url }}">{% endif %}
<img src="{{ site_logo }}" alt="{{ site_title }} logo">
{% if site_mirror_url %}</a>{% endif %}
</div>
<p>© {{ site_title }}</p>
</footer>
</div>
<div class="snap-link-warning-popup">
<div class="snap-link-warning-popup__wrapper">
<div class="snap-link-warning-popup__header">
<div class="snap-link-warning-popup__title">{{ gettext("You are leaving this page") }}</div>
<div class="snap-link-warning-popup__icon" onclick="cancelCurrentLink()">
<svg width="16" height="16" viewBox="0 0 16 16" fill="none" xmlns="http://www.w3.org/2000/svg">
<g clip-path="url(#clip0_235_140)">
<path
fill-rule="evenodd"
clip-rule="evenodd"
d="M2.94975 3.05025C3.34027 2.65973 3.97344 2.65973 4.36396 3.05025L7.89949 6.58579L11.435 3.05025C11.8256 2.65973 12.4587 2.65973 12.8492 3.05025C13.2398 3.44078 13.2398 4.07394 12.8492 4.46447L9.31371 8L12.8492 11.5355C13.2398 11.9261 13.2398 12.5592 12.8492 12.9497C12.4587 13.3403 11.8256 13.3403 11.435 12.9497L7.89949 9.41421L4.36396 12.9497C3.97344 13.3403 3.34027 13.3403 2.94975 12.9497C2.55922 12.5592 2.55922 11.9261 2.94975 11.5355L6.48528 8L2.94975 4.46447C2.55922 4.07394 2.55922 3.44078 2.94975 3.05025Z"
fill="#303E4F"></path>
</g>
<defs>
<clipPath id="clip0_235_140">
<rect width="16" height="16" fill="white"></rect>
</clipPath>
</defs>
</svg>
</div>
</div>
<div class="snap-link-warning-popup__content">
{{ gettext("This link will redirect you to an external website. If it's not available in your region, you may not be able to access it.") }}
</div>
<div class="snap-link-warning-popup__button" onclick="goToCurrentLink()">{{ gettext("Continue") }}</div>
</div>
</div>
</body>
</html>

@ -0,0 +1,311 @@
{% macro article_css() %}
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans:ital,wght@0,100..900;1,100..900&display=swap');
html, body {
font-size: 18px;
font-family: "Noto Sans", Arial, sans-serif;
font-weight: 400;
color: #333;
margin: 0;
}
a:focus {
background-color: #ffc800;
}
figure {
width: 100%;
margin: 0;
}
figure img {
width: 100%;
}
figcaption {
font-size: 12px;
color: #4f4f4f;
padding-top: 8px;
padding-bottom: 24px;
}
#snap-main {
max-width: 600px;
margin: 0 auto;
padding: 0;
}
.snap-page-header {
background: none !important;
margin-top: 24px;
margin-bottom: 48px;
text-align: center;
}
.snap-page-header img {
width: 396px;
}
/* Skip link: collapsed to a 1px box until it receives keyboard focus */
.snap-skip-link {
display: block;
width: 1px;
height: 1px;
overflow: hidden;
}
.snap-skip-link:focus {
width: 100%;
height: auto;
color: #000;
overflow: auto;
}
.snap-trust-header {
background: rgba(249, 248, 246, 1);
overflow: hidden;
padding: 0 16px;
}
.snap-trust-header__header {
box-sizing: border-box;
height: 48px;
padding: 0px;
cursor: pointer;
display: flex;
align-items: center;
}
.snap-trust-header__header-text {
margin: 0 auto;
width: 100%;
display: flex;
align-items: center;
justify-content: space-between;
max-width: 600px;
font-weight: 400;
font-size: 12px;
line-height: 16px;
-webkit-user-select: none;
-moz-user-select: none;
-ms-user-select: none;
user-select: none;
}
.snap-trust-header__header::-webkit-details-marker {
display: none;
}
.snap-trust-header__header:hover {
opacity: 0.7;
}
.snap-trust-header__content {
font-weight: 400;
font-size: 18px;
line-height: 27px;
max-width: 600px;
margin: 0 auto;
}
.snap-trust-header[open] .snap-trust-header__expand-icon {
transform: rotate(180deg);
}
.snap-trust-header__sitelink {
color: #333;
}
.snap-article-header h1 {
font-weight: 500;
font-size: 40px;
line-height: 48px;
padding-bottom: 24px;
margin: 0;
}
.snap-byline {
font-size: 16px;
line-height: 24px;
padding-bottom: 20px;
color: #4f4f4f;
}
#snap-main p, #snap-main ul, #snap-main ol {
padding-top: 16px;
}
.snap-content a {
font-weight: 600;
color: #333;
text-decoration: underline;
}
.snap-footer-link {
box-sizing: border-box;
color: #333;
display: block;
max-width: 335px;
width: 100%;
border: 1px solid #e0dfdd;
border-radius: 4px;
padding: 16px 24px;
text-decoration: none;
background-color: #fff;
}
.snap-footer-link:hover {
background-color: #F9F8F6;
}
.snap-footer-link:focus {
background-color: #fff;
border: 2px solid #222F3A;
}
.snap-footer-link:active {
background-color: rgba(34, 47, 58, 0.24);
border-color: #e0dfdd;
}
.snap-footer-link svg {
float: right;
}
.snap-footer-link--disabled {
color: rgba(51, 51, 51, 0.5);
border-color: rgba(51, 51, 51, 0.5);
background-color: transparent; /* Ensure no background on hover/focus */
cursor: not-allowed;
pointer-events: none; /* Disable link interaction */
}
.snap-footer-link--disabled svg path {
fill: rgba(51, 51, 51, 0.5);
}
.snap-footer {
text-align: center;
display: block;
color: #4f4f4f;
background-color: #f8f9f6;
border-bottom: 8px solid #222f3a;
margin-top: 36px;
padding: 83px 10px 24px 10px;
font-size: 12px;
line-height: 16px;
}
.snap-footer img {
max-width: 237px;
margin-bottom: 36px;
}
@media (max-width: 640px) {
#snap-main {
padding-left: 20px;
padding-right: 20px;
}
.snap-page-header img {
width: 198px;
}
.snap-article-header h1 {
font-size: 32px;
line-height: 40px;
}
}
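/* External-link warning overlay: transparent and non-interactive by default;
   shown while <body> carries a data-current-link attribute (see rule below) */
.snap-link-warning-popup {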
pointer-events: none;
user-select: none;
background-color: rgba(0, 0, 0, 0.24);
position: fixed;
top: 0;
left: 0;
bottom: 0;
right: 0;
opacity: 0;
transition: opacity 0.3s linear;
display: flex;
align-items: center;
justify-content: center;
}
body[data-current-link] .snap-link-warning-popup {
pointer-events: initial;
opacity: 1;
}
.snap-link-warning-popup__wrapper {
width: 375px;
background-color: white;
padding: 24px;
border-radius: 12px;
}
.snap-link-warning-popup__header {
display: flex;
justify-content: space-between;
margin-bottom: 16px;
}
.snap-link-warning-popup__title {
font-size: 16px;
font-weight: 700;
line-height: 20px;
text-align: left;
}
.snap-link-warning-popup__icon:hover,
.snap-link-warning-popup__button:hover {
cursor: pointer;
opacity: 0.8;
}
.snap-link-warning-popup__content {
font-size: 16px;
font-weight: 400;
line-height: 24px;
text-align: left;
margin-bottom: 32px;
}
.snap-link-warning-popup__button {
display: flex;
align-items: center;
justify-content: center;
padding: 15px 16px;
border-radius: 4px;
border: 1px solid rgba(0, 0, 0, 0.1);
font-size: 16px;
font-weight: 400;
line-height: 24px;
text-align: center;
color: rgba(51, 51, 51, 1);
}
/* Narrow screens: dock the popup to the bottom edge and slide it up when shown */
@media (max-width: 500px) {
.snap-link-warning-popup {
display: block;
}
.snap-link-warning-popup__wrapper {
width: unset;
position: absolute;
bottom: 0;
left: 0;
right: 0;
border-radius: 12px 12px 0 0;
}
.snap-link-warning-popup .snap-link-warning-popup__wrapper {
transform: translateY(50%);
transition: transform 0.3s linear;
}
body[data-current-link] .snap-link-warning-popup .snap-link-warning-popup__wrapper {
transform: translateY(0);
}
}
{% endmacro %}