feat: rebuild rss-news backend, admin ui, and legal extraction pipeline
parent d65c55d315
commit 2c331d683b
43 changed files with 3463 additions and 73 deletions
6
.github/workflows/deploy.yml
vendored
@@ -19,9 +19,15 @@ jobs:
          username: oliver
          key: ${{ secrets.HETZNER_SSH_KEY }}
          port: 22
          envs: APP_ADMIN_USERNAME,APP_ADMIN_PASSWORD
          script: |
            cd rss-news
            git pull origin main
            source .venv/bin/activate
            pip install -r requirements.txt
            pip install -r backend/requirements.txt || true
            sudo systemctl restart rss-app
            BASE_URL="https://news.vanityontour.de" APP_ADMIN_USERNAME="${APP_ADMIN_USERNAME}" APP_ADMIN_PASSWORD="${APP_ADMIN_PASSWORD}" bash scripts/smoke_backend.sh
        env:
          APP_ADMIN_USERNAME: ${{ secrets.NEWS_APP_ADMIN_USERNAME }}
          APP_ADMIN_PASSWORD: ${{ secrets.NEWS_APP_ADMIN_PASSWORD }}
39
.github/workflows/test.yml
vendored
Normal file
@@ -0,0 +1,39 @@
name: Backend Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  backend-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r backend/requirements.txt
          pip install -r backend/requirements-test.txt

      - name: Run tests with coverage
        env:
          APP_DB_PATH: /tmp/rss_news_test.db
        run: |
          pytest backend/tests --cov=backend/app --cov-report=term-missing --cov-report=xml

      - name: Upload coverage artifact
        uses: actions/upload-artifact@v4
        with:
          name: coverage-xml
          path: coverage.xml
38
CHANGELOG.md
@@ -1,10 +1,42 @@
## [1.7.1] - 2025-08-28
## [1.7.1] - 2025-08-24

- Description...
### ✨ Security adjusted
- Moved all credentials into the .env file
- On app start, the credentials are checked and a corresponding message is shown if any are missing

---

## [1.7.0] - 2025-08-24

- Description...
### Multi-select & bulk operations:
- ✅ Checkboxes for article selection in the "Manage articles" area
- ✅ "Select all" / "Clear selection" buttons
- ✅ Bulk operations for selected articles:
  - Bulk status change for several articles at once
  - Bulk article rewriting with automatic status management
  - Bulk WordPress upload only for "Process" articles
  - Bulk trash function

### Quick actions integration:
- ✅ Feed refresh available directly in the articles tab
- ✅ All dashboard quick actions integrated into article management
- ✅ Smart display of relevant operations only (e.g. WordPress upload only for Process articles)

### 🔧 Improvements

- UI/UX: Improved article card layouts with checkbox integration
- Workflow: Streamlined article management with no tab switching needed
- Feedback: Detailed status messages for bulk operations
- Performance: Optimized session state handling for article selection

### 🏗️ Technical changes

- Session state extended with a selected_articles set
- New bulk operation functions in app.py:326-467
- Reworked article card layout with a 3-column design
- Integration of the existing WordPress upload and rewrite functions

---

## [1.6.3] - 2025-08-18

114
README.md
@@ -1,89 +1,63 @@
# 📰 RSS News Bot
# rss-news (Rebuild)

An intelligent tool for importing, rewriting, and publishing articles from RSS feeds, with automatic tag detection, AI-assisted rewriting via GPT-4, image extraction from original articles, and optional DALL·E image generation.
`rss-news` continues as the existing repository and is being rebuilt step by step into a robust, legally sound news pipeline.

Current status:
- The old Streamlit app is not used in production.
- `news.vanityontour.de` redirects to `https://vanityontour.de` until the new app goes live.
- Planning, docs, and wiki are maintained as the basis for the rebuild.

---
## Goals
- RSS-based article processing with clear source rules
- Legally sound usage (sources, attribution, license information)
- Reliable automation on Hetzner
- Publishing to WordPress (currently on IONOS, open later)
- Access only after login (user/password for now)

## 🚀 Features
## Architecture direction (MVP)
- Backend: `Python + FastAPI`
- Jobs: queue worker (e.g. Redis + RQ/Celery)
- Data: SQLite for the MVP, optionally PostgreSQL later
- Auth: session login with a single admin user
- Publishing: WordPress REST API (status `pending` for now)

- 📡 **Manage RSS feeds** (add, refresh)
- ✍️ **Rewrite articles automatically** with GPT-4
- 🏷️ **Generate tags automatically**
- 🖼️ **Extract images from original articles**
- 🪄 **Generate an optional DALL·E image**
- 🔧 **Edit image metadata**
- 🗂️ **Article status management (New, Rewrite, Process, etc.)**
- 📜 **Integrated log viewer page**
- 📥 **Export prepared for publishing to WordPress**
- 📋 Article table with status filter
- 🔍 Article expander with rewrite, tags & images
- 🪄 Button for AI image generation
Details: `docs/PROJECT_PLAN.md`

## Project management
- GitHub Project: `https://github.com/users/OliverGiertz/projects/3/views/1`
- This board is the central place for to-dos, bugs, and improvements.
- The wiki structure lives under `docs/wiki/`.

---
## Documentation
- Project plan: `docs/PROJECT_PLAN.md`
- To-do list: `docs/TODO.md`
- Source and license policy: `docs/SOURCE_POLICY.md`
- Wiki home: `docs/wiki/Home.md`

## 🧱 Project structure

rss-news/
├── app.py # Main UI with Streamlit
├── main.py # Logic for feed import and processing
├── utils/
│ └── image_extractor.py # Extract images from original articles
│ └── dalle_generator.py # DALL·E integration (AI image)
├── pages/
│ └── log_viewer.py # UI for viewing the logs
├── data/
│ └── articles.json # Stored articles
│ └── feeds.json # Stored feed URLs
├── logs/
│ └── rss_tool.log # Processing log
├── versioning.py # CLI tool for versioning & release
├── TEST-CHECKLIST.md # Manual checklist for releases
├── version.py # Current version
└── CHANGELOG.md # Changelog


---

## ⚙️ Installation
## Local development (legacy code)
The existing legacy code can still be started locally:

```bash
git clone https://github.com/OliverGiertz/rss-news.git
cd rss-news
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

---

## Update
An update script is available here: https://gist.github.com/OliverGiertz/ad33ae3de9aa1c1163dad5fe8affb6ca

```bash
bash update.sh
```


## ▶️ Starting the app

```bash
streamlit run app.py
```

---
Note: This app is functionally legacy and is being replaced by the new architecture.

## 🔐 Configuration (.env)
## Deployment target
- Hosted on Hetzner
- Reverse proxy via CloudPanel/Nginx
- Production domain: `news.vanityontour.de`
- Until completion: redirect to `https://vanityontour.de`

Create a `.env` in the project (see `.env.example`). Required variables:
## Security
- No secrets in the repository
- `.env` stays local/on the server, never commit it
- Authentication is mandatory for the new web app
- Optional later: passkeys/WebAuthn

- `WP_BASE_URL`: Base URL of your WordPress site (e.g. https://example.com)
- Authentication (choose one option):
  - `WP_AUTH_BASE64`: Preferred. Base64 of `username:application_password`
  - or `WP_USERNAME` and `WP_PASSWORD`: user + application password
- Optional: `OPENAI_API_KEY` for rewriting articles
## Legal notice
This project only processes sources with a documented basis for use. Before production use, a final legal review of the selected feeds is required.

Note: The code reads exclusively from `.env`. There are no hard-coded default credentials.
10
backend/.env.example
Normal file
@@ -0,0 +1,10 @@
APP_ENV=development
APP_NAME=rss-news-backend
APP_SECRET_KEY=replace-with-a-long-random-secret
APP_DB_PATH=backend/data/rss_news.db

APP_ADMIN_USERNAME=admin
APP_ADMIN_PASSWORD=change-me

SESSION_COOKIE_NAME=rss_news_session
SESSION_MAX_AGE_SECONDS=28800
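The placeholder `APP_SECRET_KEY=replace-with-a-long-random-secret` has to be replaced before deployment. One possible way to produce a value is sketched below; `generate_secret_key` is a hypothetical helper for illustration and is not part of this commit.

```python
import secrets


def generate_secret_key(num_bytes: int = 48) -> str:
    """Return a URL-safe random string suitable as a signing secret."""
    # token_urlsafe() draws from the OS CSPRNG and base64url-encodes the bytes.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed line into backend/.env (never commit that file).
    print(f"APP_SECRET_KEY={generate_secret_key()}")
```

The same approach could be used for `APP_ADMIN_PASSWORD`, since the defaults in the template are only meant as placeholders.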
82
backend/README.md
Normal file
@@ -0,0 +1,82 @@
# Backend Skeleton (FastAPI)

This directory contains the technical skeleton for the `rss-news` rebuild.

## Start (local)

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
uvicorn backend.app.main:app --reload --port 8501
```

## Admin UI
- Login: `http://127.0.0.1:8501/admin/login`
- Dashboard: `http://127.0.0.1:8501/admin/dashboard`

## Environment
- File: `backend/.env`
- Template: `backend/.env.example`

## Endpoints
- `GET /health` - Health check
- `POST /auth/login` - Log in as the admin user
- `POST /auth/logout` - Log out
- `GET /auth/me` - Current user
- `GET /api/protected` - Protected test endpoint
- `GET /api/pipeline/status` - Basic status incl. record counts
- `GET /api/sources` - List sources
- `POST /api/sources` - Create a source
- `GET /api/sources/{source_id}/policy-check` - Policy check for a source
- `GET /api/feeds` - List feeds
- `POST /api/feeds` - Create a feed
- `GET /api/feeds/{feed_id}/policy-check` - Policy check for a feed
- `GET /api/runs` - List import/job runs
- `GET /api/runs/{run_id}` - Details of a run
- `POST /api/runs` - Start a run
- `POST /api/runs/{run_id}/finish` - Finish a run
- `GET /api/articles` - List articles
- `GET /api/articles/{article_id}` - Article details
- `POST /api/articles/upsert` - Idempotently create/update an article
- `POST /api/articles/{article_id}/transition` - Status transition according to the workflow rules
- `POST /api/articles/{article_id}/review` - Review decision (approve/reject)
- `POST /api/ingestion/run` - Start feed ingestion (optionally per feed)

## Database
- SQLite file at `backend/data/rss_news.db`
- Tables are initialized at app start.
- Tables: `sources`, `feeds`, `runs`, `articles`
- Article dedupe strategy: `source_url` -> `(feed_id, source_article_id)` -> `source_hash`

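The dedupe order listed in the database section can be read as a lookup that tries each key in turn and stops at the first hit. The following is a minimal sketch of that idea, not the repository's implementation: `find_existing_article_id` and the trimmed-down `articles` table are assumptions made for this example.

```python
import sqlite3


def find_existing_article_id(conn, source_url, feed_id=None, source_article_id=None, source_hash=None):
    """Return the id of a matching article, trying the dedupe keys in priority order."""
    # 1) Exact source URL match.
    row = conn.execute("SELECT id FROM articles WHERE source_url = ?", (source_url,)).fetchone()
    if row:
        return row[0]
    # 2) Same feed and same feed-provided article id.
    if feed_id is not None and source_article_id:
        row = conn.execute(
            "SELECT id FROM articles WHERE feed_id = ? AND source_article_id = ?",
            (feed_id, source_article_id),
        ).fetchone()
        if row:
            return row[0]
    # 3) Content fingerprint as the last fallback.
    if source_hash:
        row = conn.execute("SELECT id FROM articles WHERE source_hash = ?", (source_hash,)).fetchone()
        if row:
            return row[0]
    return None


# Hypothetical, reduced schema for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, feed_id INTEGER, "
    "source_article_id TEXT, source_hash TEXT, source_url TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO articles (feed_id, source_article_id, source_hash, source_url) VALUES (?, ?, ?, ?)",
    (1, "guid-1", "abc", "https://example.com/a"),
)
```

With this ordering, an article whose URL changed can still be matched through its feed-level id or its fingerprint.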
## Policy enforcement
- Ingestion automatically blocks feeds whose assigned source is not policy-compliant.
- Minimum requirements: `risk_level=green`, `terms_url`, `license_name`, `last_reviewed_at`, `is_enabled=1`.
- For each imported article, an `attribution` block is stored in `meta_json`.

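The minimum requirements for a source could be checked roughly as follows. This is a sketch under assumptions: `check_source_policy` and its messages are hypothetical and do not claim to mirror the `evaluate_source_policy` function that the commit imports from `policy.py`.

```python
REQUIRED_FIELDS = ("terms_url", "license_name", "last_reviewed_at")


def check_source_policy(source: dict) -> list[str]:
    """Return a list of policy violations; an empty list means ingestion may proceed."""
    problems = []
    if source.get("risk_level") != "green":
        problems.append("risk_level must be green")
    for field in REQUIRED_FIELDS:
        # Empty strings and None both count as missing.
        if not source.get(field):
            problems.append(f"{field} is missing")
    if int(source.get("is_enabled") or 0) != 1:
        problems.append("source is disabled")
    return problems
```

Returning the full list of violations, rather than stopping at the first one, makes it easier to show an operator everything that still needs review.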
## Review workflow
- Status chain: `new -> review -> approved -> published`
- Rejection in review sets the status to `rewrite`
- Invalid status transitions are blocked by the API

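The workflow rules amount to a small state machine. The sketch below copies the transition table that this commit defines as `ALLOWED_TRANSITIONS` in `backend/app/admin_ui.py`; the helper names `can_transition` and `review_target` are hypothetical and only illustrate how such a table can be queried.

```python
ALLOWED_TRANSITIONS = {
    "new": ("review", "rewrite", "error"),
    "rewrite": ("review", "error"),
    "review": ("approved", "rewrite", "error"),
    "approved": ("published", "error"),
    "published": ("error",),
    "error": ("review", "rewrite"),
}


def can_transition(current: str, target: str) -> bool:
    """True if the workflow allows moving from `current` to `target`."""
    # Unknown current states have no allowed targets.
    return target in ALLOWED_TRANSITIONS.get(current, ())


def review_target(decision: str) -> str:
    """Map a review decision to the resulting article status."""
    return {"approve": "approved", "reject": "rewrite"}[decision]
```

Keeping the rules in a single mapping means the API and the admin UI can validate against the same source of truth.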
## Verification
```bash
python -m unittest backend.tests.test_db_repositories
python -m unittest backend.tests.test_ingestion
python -m unittest backend.tests.test_api_auth
```

## CI / online evaluation
- GitHub Actions workflow: `.github/workflows/test.yml`
- Runs tests incl. coverage on push/PR against `main`.

## Hetzner smoke test
```bash
BASE_URL="https://news.vanityontour.de" \
APP_ADMIN_USERNAME="admin" \
APP_ADMIN_PASSWORD="..." \
bash scripts/smoke_backend.sh
```

## Note
Password hashing and CSRF/rate limiting are planned for the next stage.
1
backend/__init__.py
Normal file
@@ -0,0 +1 @@
"""Backend package for rss-news rebuild."""
1
backend/app/__init__.py
Normal file
@@ -0,0 +1 @@
"""Application package."""
265
backend/app/admin_ui.py
Normal file
@@ -0,0 +1,265 @@
from __future__ import annotations

import json
from pathlib import Path
from urllib.parse import urlencode

from fastapi import APIRouter, Form, Request
from fastapi.responses import HTMLResponse, RedirectResponse
from fastapi.templating import Jinja2Templates

from .auth import create_session_token, verify_credentials, verify_session_token
from .config import get_settings
from .ingestion import run_ingestion
from .policy import evaluate_source_policy
from .repositories import (
    FeedCreate,
    SourceCreate,
    create_feed,
    create_source,
    get_article_by_id,
    list_articles,
    list_feeds,
    list_runs,
    list_sources,
    update_article_status,
)

settings = get_settings()
router = APIRouter(tags=["admin-ui"])
templates = Jinja2Templates(directory=str(Path(__file__).resolve().parent.parent / "templates"))
ALLOWED_TRANSITIONS: dict[str, tuple[str, ...]] = {
    "new": ("review", "rewrite", "error"),
    "rewrite": ("review", "error"),
    "review": ("approved", "rewrite", "error"),
    "approved": ("published", "error"),
    "published": ("error",),
    "error": ("review", "rewrite"),
}


def _admin_user(request: Request) -> str | None:
    token = request.cookies.get(settings.session_cookie_name)
    if not token:
        return None
    return verify_session_token(token)


def _to_optional_int(raw: str | None) -> int | None:
    if raw is None:
        return None
    value = raw.strip()
    if value == "":
        return None
    return int(value)


def _dashboard_redirect(
    *,
    msg: str | None = None,
    msg_type: str = "success",
    status_filter: str | None = None,
) -> RedirectResponse:
    query: dict[str, str] = {}
    if msg:
        query["msg"] = msg
        query["type"] = msg_type
    if status_filter:
        query["status_filter"] = status_filter
    suffix = f"?{urlencode(query)}" if query else ""
    return RedirectResponse(url=f"/admin/dashboard{suffix}", status_code=303)


def _parse_meta_json(raw: str | None) -> dict:
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
        return parsed if isinstance(parsed, dict) else {}
    except (TypeError, ValueError):
        # json.JSONDecodeError is a ValueError; malformed meta_json is treated as empty.
        return {}


@router.get("/admin", response_class=HTMLResponse)
def admin_index(request: Request):
    user = _admin_user(request)
    if not user:
        return RedirectResponse(url="/admin/login", status_code=303)
    return RedirectResponse(url="/admin/dashboard", status_code=303)


@router.get("/admin/login", response_class=HTMLResponse)
def admin_login_page(request: Request):
    return templates.TemplateResponse(
        request,
        "admin_login.html",
        {"request": request, "title": "Admin Login", "error": request.query_params.get("error")},
    )


@router.post("/admin/login")
def admin_login(request: Request, username: str = Form(...), password: str = Form(...)):
    if not verify_credentials(username, password):
        return RedirectResponse(url="/admin/login?error=1", status_code=303)

    token = create_session_token(username)
    response = RedirectResponse(url="/admin/dashboard", status_code=303)
    response.set_cookie(
        key=settings.session_cookie_name,
        value=token,
        max_age=settings.session_max_age_seconds,
        httponly=True,
        secure=False,
        samesite="lax",
    )
    return response


@router.post("/admin/logout")
def admin_logout():
    response = RedirectResponse(url="/admin/login", status_code=303)
    response.delete_cookie(settings.session_cookie_name)
    return response


@router.get("/admin/dashboard", response_class=HTMLResponse)
def admin_dashboard(request: Request):
    user = _admin_user(request)
    if not user:
        return RedirectResponse(url="/admin/login", status_code=303)

    sources = list_sources()
    source_policy = {s["id"]: evaluate_source_policy(s) for s in sources}
    feeds = list_feeds()
    runs = list_runs(limit=30)
    status_filter = request.query_params.get("status_filter")
    if status_filter in {"new", "rewrite", "review", "approved", "published", "error"}:
        articles = list_articles(limit=100, status_filter=status_filter)
    else:
        status_filter = ""
        articles = list_articles(limit=100)
    for article in articles:
        meta = _parse_meta_json(article.get("meta_json"))
        extraction = meta.get("extraction") if isinstance(meta.get("extraction"), dict) else {}
        article["meta"] = meta
        article["extracted_images"] = extraction.get("images") if isinstance(extraction.get("images"), list) else []
        article["press_contact"] = extraction.get("press_contact") if isinstance(extraction.get("press_contact"), str) else None
        article["extraction_error"] = extraction.get("extraction_error") if isinstance(extraction.get("extraction_error"), str) else None

    return templates.TemplateResponse(
        request,
        "admin_dashboard.html",
        {
            "request": request,
            "title": "Admin Dashboard",
            "user": user,
            "sources": sources,
            "source_policy": source_policy,
            "feeds": feeds,
            "runs": runs,
            "articles": articles,
            "status_options": ["new", "rewrite", "review", "approved", "published", "error"],
            "allowed_transitions": ALLOWED_TRANSITIONS,
            "status_filter": status_filter,
            "flash_msg": request.query_params.get("msg", ""),
            "flash_type": request.query_params.get("type", "success"),
        },
    )


@router.post("/admin/sources/create")
def admin_create_source(
    request: Request,
    name: str = Form(...),
    base_url: str = Form(""),
    terms_url: str = Form(""),
    license_name: str = Form(""),
    risk_level: str = Form("yellow"),
    last_reviewed_at: str = Form(""),
):
    user = _admin_user(request)
    if not user:
        return RedirectResponse(url="/admin/login", status_code=303)

    try:
        create_source(
            SourceCreate(
                name=name,
                base_url=base_url or None,
                terms_url=terms_url or None,
                license_name=license_name or None,
                risk_level=risk_level,
                is_enabled=True,
                notes=None,
                last_reviewed_at=last_reviewed_at or None,
            )
        )
    except Exception as exc:
        return _dashboard_redirect(msg=f"Quelle konnte nicht gespeichert werden: {exc}", msg_type="error")
    return _dashboard_redirect(msg="Quelle gespeichert")


@router.post("/admin/feeds/create")
def admin_create_feed(
    request: Request,
    name: str = Form(...),
    url: str = Form(...),
    source_id: str = Form(""),
):
    user = _admin_user(request)
    if not user:
        return RedirectResponse(url="/admin/login", status_code=303)

    try:
        create_feed(
            FeedCreate(
                name=name,
                url=url,
                source_id=_to_optional_int(source_id),
                is_enabled=True,
            )
        )
    except Exception as exc:
        return _dashboard_redirect(msg=f"Feed konnte nicht gespeichert werden: {exc}", msg_type="error")
    return _dashboard_redirect(msg="Feed gespeichert")


@router.post("/admin/ingestion/run")
def admin_run_ingestion(request: Request, feed_id: str = Form("")):
    user = _admin_user(request)
    if not user:
        return RedirectResponse(url="/admin/login", status_code=303)
    try:
        stats = run_ingestion(feed_id=_to_optional_int(feed_id))
    except Exception as exc:
        return _dashboard_redirect(msg=f"Ingestion fehlgeschlagen: {exc}", msg_type="error")
    return _dashboard_redirect(msg=f"Ingestion: {stats.status}, upserts={stats.articles_upserted}")


@router.post("/admin/articles/{article_id}/review")
def admin_review_article(request: Request, article_id: int, decision: str = Form(...), note: str = Form("")):
    user = _admin_user(request)
    if not user:
        return RedirectResponse(url="/admin/login", status_code=303)

    article = get_article_by_id(article_id)
    if article and article.get("status") == "review" and decision in {"approve", "reject"}:
        target = "approved" if decision == "approve" else "rewrite"
        update_article_status(article_id, target, actor=user, note=note or None, decision=decision)
        return _dashboard_redirect(msg=f"Artikel #{article_id}: {decision}")
    return _dashboard_redirect(msg=f"Review-Aktion ungueltig fuer Artikel #{article_id}", msg_type="error")


@router.post("/admin/articles/{article_id}/transition")
def admin_transition_article(request: Request, article_id: int, target_status: str = Form(...), note: str = Form("")):
    user = _admin_user(request)
    if not user:
        return RedirectResponse(url="/admin/login", status_code=303)

    article = get_article_by_id(article_id)
    if article:
        current = article.get("status")
        if target_status in ALLOWED_TRANSITIONS.get(current, ()):
            update_article_status(article_id, target_status, actor=user, note=note or None)
            return _dashboard_redirect(msg=f"Artikel #{article_id}: {current} -> {target_status}")
    return _dashboard_redirect(msg=f"Ungueltiger Statuswechsel fuer Artikel #{article_id}", msg_type="error")
31
backend/app/auth.py
Normal file
@@ -0,0 +1,31 @@
import hmac
from typing import Optional

from itsdangerous import URLSafeTimedSerializer, BadSignature, SignatureExpired

from .config import get_settings


def _serializer() -> URLSafeTimedSerializer:
    settings = get_settings()
    return URLSafeTimedSerializer(settings.app_secret_key, salt="rss-news-session")


def verify_credentials(username: str, password: str) -> bool:
    settings = get_settings()
    # Compare as UTF-8 bytes: hmac.compare_digest() raises TypeError for non-ASCII str input.
    user_ok = hmac.compare_digest(username.encode("utf-8"), settings.app_admin_username.encode("utf-8"))
    pw_ok = hmac.compare_digest(password.encode("utf-8"), settings.app_admin_password.encode("utf-8"))
    return user_ok and pw_ok


def create_session_token(username: str) -> str:
    return _serializer().dumps({"username": username})


def verify_session_token(token: str) -> Optional[str]:
    settings = get_settings()
    try:
        payload = _serializer().loads(token, max_age=settings.session_max_age_seconds)
    except (BadSignature, SignatureExpired):
        return None
    return payload.get("username")
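A note on the credential check in `auth.py`: `hmac.compare_digest` accepts `str` arguments only when they are ASCII, so a login form value containing, for example, an umlaut would raise `TypeError` if compared as `str`. A minimal sketch of a comparison that works on UTF-8 bytes follows; `constant_time_equals` is a hypothetical helper name used only for this illustration.

```python
import hmac


def constant_time_equals(left: str, right: str) -> bool:
    """Compare two strings in constant time, including non-ASCII input."""
    # Encoding to bytes avoids the ASCII-only restriction on str arguments.
    return hmac.compare_digest(left.encode("utf-8"), right.encode("utf-8"))
```

This keeps the timing-attack resistance of `compare_digest` while turning unexpected characters into an ordinary failed login instead of a server error.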
29
backend/app/config.py
Normal file
@@ -0,0 +1,29 @@
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Prefer backend-specific env file to avoid collisions with legacy root .env
    model_config = SettingsConfigDict(
        env_file=("backend/.env", ".env"),
        env_file_encoding="utf-8",
        extra="ignore",
    )

    app_env: str = "development"
    app_name: str = "rss-news-backend"
    app_secret_key: str = "replace-with-a-long-random-secret"

    app_admin_username: str = "admin"
    app_admin_password: str = "change-me"

    session_cookie_name: str = "rss_news_session"
    session_max_age_seconds: int = 28800

    app_db_path: str = "backend/data/rss_news.db"


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    return Settings()
138
backend/app/db.py
Normal file
@@ -0,0 +1,138 @@
import sqlite3
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Iterator

from .config import get_settings


def _db_path() -> Path:
    settings = get_settings()
    path = Path(settings.app_db_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


@contextmanager
def get_conn() -> Iterator[sqlite3.Connection]:
    conn = sqlite3.connect(_db_path())
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA foreign_keys=ON;")
    try:
        yield conn
        conn.commit()
    finally:
        conn.close()


def init_db() -> None:
    with get_conn() as conn:
        # Create the tables first; indexes and triggers follow after the column migration.
        conn.executescript(
            """
            PRAGMA journal_mode=WAL;

            CREATE TABLE IF NOT EXISTS sources (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                name TEXT NOT NULL,
                base_url TEXT,
                terms_url TEXT,
                license_name TEXT,
                risk_level TEXT NOT NULL DEFAULT 'yellow' CHECK (risk_level IN ('green', 'yellow', 'red')),
                is_enabled INTEGER NOT NULL DEFAULT 0,
                notes TEXT,
                last_reviewed_at TEXT,
                created_at TEXT NOT NULL DEFAULT (datetime('now')),
                updated_at TEXT NOT NULL DEFAULT (datetime('now'))
            );

            CREATE TABLE IF NOT EXISTS feeds (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                source_id INTEGER,
                name TEXT NOT NULL,
                url TEXT NOT NULL UNIQUE,
                is_enabled INTEGER NOT NULL DEFAULT 1,
                etag TEXT,
                last_modified TEXT,
                last_checked_at TEXT,
                created_at TEXT NOT NULL DEFAULT (datetime('now')),
                updated_at TEXT NOT NULL DEFAULT (datetime('now')),
                FOREIGN KEY(source_id) REFERENCES sources(id) ON DELETE SET NULL
            );

            CREATE TABLE IF NOT EXISTS runs (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                run_type TEXT NOT NULL,
                status TEXT NOT NULL CHECK (status IN ('queued', 'running', 'success', 'failed')),
                started_at TEXT NOT NULL DEFAULT (datetime('now')),
                finished_at TEXT,
                details TEXT
            );

            CREATE TABLE IF NOT EXISTS articles (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                feed_id INTEGER,
                source_article_id TEXT,
                source_hash TEXT,
                title TEXT NOT NULL,
                source_url TEXT NOT NULL,
                canonical_url TEXT,
                published_at TEXT,
                author TEXT,
                summary TEXT,
                content_raw TEXT,
                content_rewritten TEXT,
                word_count INTEGER DEFAULT 0,
                status TEXT NOT NULL DEFAULT 'new' CHECK (status IN ('new', 'rewrite', 'review', 'approved', 'published', 'error')),
                meta_json TEXT,
                created_at TEXT NOT NULL DEFAULT (datetime('now')),
                updated_at TEXT NOT NULL DEFAULT (datetime('now')),
                FOREIGN KEY(feed_id) REFERENCES feeds(id) ON DELETE SET NULL,
                UNIQUE(source_url)
            );
            """
        )

        # Lightweight migration for existing DBs created before source_hash was introduced.
        # It runs before the index creation below, which references source_hash.
        existing_columns = {
            row["name"] for row in conn.execute("PRAGMA table_info(articles)").fetchall()
        }
        if "source_hash" not in existing_columns:
            conn.execute("ALTER TABLE articles ADD COLUMN source_hash TEXT")

        conn.executescript(
            """
            CREATE INDEX IF NOT EXISTS idx_articles_source_article_id ON articles(source_article_id);
            CREATE INDEX IF NOT EXISTS idx_articles_source_hash ON articles(source_hash);
            CREATE UNIQUE INDEX IF NOT EXISTS uq_articles_feed_source_article_id
                ON articles(feed_id, source_article_id)
                WHERE source_article_id IS NOT NULL;
            CREATE UNIQUE INDEX IF NOT EXISTS uq_articles_source_hash
                ON articles(source_hash)
                WHERE source_hash IS NOT NULL;
            CREATE INDEX IF NOT EXISTS idx_articles_status ON articles(status);
            CREATE INDEX IF NOT EXISTS idx_feeds_source_id ON feeds(source_id);
            CREATE INDEX IF NOT EXISTS idx_runs_started_at ON runs(started_at);
            CREATE INDEX IF NOT EXISTS idx_articles_published_at ON articles(published_at);

            CREATE TRIGGER IF NOT EXISTS trg_sources_updated_at
            AFTER UPDATE ON sources
            FOR EACH ROW
            BEGIN
                UPDATE sources SET updated_at = datetime('now') WHERE id = OLD.id;
            END;

            CREATE TRIGGER IF NOT EXISTS trg_feeds_updated_at
            AFTER UPDATE ON feeds
            FOR EACH ROW
            BEGIN
                UPDATE feeds SET updated_at = datetime('now') WHERE id = OLD.id;
            END;

            CREATE TRIGGER IF NOT EXISTS trg_articles_updated_at
            AFTER UPDATE ON articles
            FOR EACH ROW
            BEGIN
                UPDATE articles SET updated_at = datetime('now') WHERE id = OLD.id;
            END;
            """
        )


def rows_to_dicts(rows: list[sqlite3.Row]) -> list[dict[str, Any]]:
    return [dict(r) for r in rows]
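The schema in `db.py` declares `uq_articles_source_hash` as a partial unique index (`WHERE source_hash IS NOT NULL`). The effect can be tried out against an in-memory database: rows without a hash stay outside the index, while a repeated hash is rejected. The reduced table and the helper `try_insert` below are assumptions made for this sketch and are not part of the commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        source_url TEXT NOT NULL UNIQUE,
        source_hash TEXT
    );
    CREATE UNIQUE INDEX uq_articles_source_hash
        ON articles(source_hash)
        WHERE source_hash IS NOT NULL;
    """
)


def try_insert(url, source_hash):
    """Insert a row and report whether the uniqueness constraints accepted it."""
    try:
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO articles (source_url, source_hash) VALUES (?, ?)",
                (url, source_hash),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

An upsert routine would typically look up the existing row first and only rely on the constraint as a safety net against concurrent writers.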
253
backend/app/ingestion.py
Normal file
253
backend/app/ingestion.py
Normal file
|
|
@ -0,0 +1,253 @@
|
|||
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib
import json
import time
from typing import Any

import feedparser

from .policy import evaluate_source_policy
from .repositories import (
    ArticleUpsert,
    RunCreate,
    create_run,
    finish_run,
    get_feed_by_id,
    list_enabled_feeds,
    update_feed_fetch_state,
    upsert_article,
)
from .source_extraction import extract_article, extracted_article_to_meta


@dataclass(frozen=True)
class IngestionStats:
    run_id: int
    feeds_processed: int
    entries_seen: int
    articles_upserted: int
    status: str
    message: str


MAX_FEED_FETCH_RETRIES = 3


def _entry_published_iso(entry: dict) -> str | None:
    """Return the entry's published (or updated) time as an ISO 8601 UTC string, if present."""
    published = entry.get("published_parsed") or entry.get("updated_parsed")
    if not published:
        return None
    return datetime(*published[:6], tzinfo=timezone.utc).isoformat()


def _entry_text(entry: dict) -> tuple[str, str]:
    """Return (summary, content); content falls back to the summary when the entry has none."""
    summary = entry.get("summary", "") or ""
    content = ""
    if entry.get("content") and isinstance(entry.get("content"), list):
        first = entry["content"][0]
        content = first.get("value", "") if isinstance(first, dict) else ""
    if not content:
        content = summary
    return summary, content


def _entry_hash(entry: dict, feed_id: int, link: str, title: str, summary: str) -> str:
    """Build a SHA-256 fingerprint of the entry's identifying fields for deduplication."""
    source_id = entry.get("id") or entry.get("guid") or ""
    published = _entry_published_iso(entry) or ""
    fingerprint = f"{feed_id}|{source_id}|{link}|{title.strip()}|{summary.strip()}|{published}"
    return hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()


def _parsed_get(parsed: object, key: str, default: object = None) -> object:
    """Read key from a parse result, whether it is a plain dict or an attribute-style object."""
    if isinstance(parsed, dict):
        return parsed.get(key, default)
    return getattr(parsed, key, default)
|
||||
|
||||
def run_ingestion(feed_id: int | None = None) -> IngestionStats:
    run_id = create_run(RunCreate(run_type="ingestion", status="running", details="started"))
    feeds_processed = 0
    entries_seen = 0
    articles_upserted = 0
    feed_results: list[dict[str, object]] = []

    try:
        if feed_id is not None:
            feed = get_feed_by_id(feed_id)
            feeds = [feed] if feed and int(feed.get("is_enabled", 0)) == 1 else []
        else:
            feeds = list_enabled_feeds()

        for feed in feeds:
            if not feed:
                continue
            feeds_processed += 1

            source_snapshot = {
                "id": feed.get("source_id"),
                "name": feed.get("source_name"),
                "base_url": feed.get("source_base_url"),
                "terms_url": feed.get("source_terms_url"),
                "license_name": feed.get("source_license_name"),
                "risk_level": feed.get("source_risk_level"),
                "last_reviewed_at": feed.get("source_last_reviewed_at"),
                "is_enabled": feed.get("source_is_enabled"),
            }
            policy_issues = evaluate_source_policy(source_snapshot)
            if policy_issues:
                feed_results.append(
                    {
                        "feed_id": int(feed["id"]),
                        "feed_url": feed["url"],
                        "status": "blocked",
                        "policy_issues": policy_issues,
                        "entries_seen": 0,
                        "upserts": 0,
                    }
                )
                continue

            parsed = None
            feed_error = None
            for attempt in range(1, MAX_FEED_FETCH_RETRIES + 1):
                try:
                    parsed = feedparser.parse(
                        feed["url"],
                        etag=feed.get("etag"),
                        modified=feed.get("last_modified"),
                    )
                    break
                except Exception as exc:
                    feed_error = str(exc)
                    if attempt < MAX_FEED_FETCH_RETRIES:
                        time.sleep(0.5 * attempt)

            if parsed is None:
                feed_results.append(
                    {
                        "feed_id": int(feed["id"]),
                        "feed_url": feed["url"],
                        "status": "failed",
                        "error": feed_error or "unknown",
                        "entries_seen": 0,
                        "upserts": 0,
                    }
                )
                continue

            # Persist ETag/Last-Modified for conditional requests.
            parsed_etag = _parsed_get(parsed, "etag")
            parsed_modified = _parsed_get(parsed, "modified")
            if parsed_modified and not isinstance(parsed_modified, str):
                parsed_modified = str(parsed_modified)
            update_feed_fetch_state(
                feed_id=int(feed["id"]),
                etag=parsed_etag if isinstance(parsed_etag, str) else None,
                last_modified=parsed_modified if isinstance(parsed_modified, str) else None,
            )

            feed_entries_seen = 0
            feed_upserts = 0
            for entry in _parsed_get(parsed, "entries", []):
                entries_seen += 1
                feed_entries_seen += 1
                link = entry.get("link")
                if not link:
                    continue

                summary, content_raw = _entry_text(entry)
                title = entry.get("title") or "Ohne Titel"
                extracted = extract_article(link)

                final_title = extracted.title or title
                final_author = extracted.author or entry.get("author")
                final_summary = extracted.summary or (summary[:1000] if summary else None)
                final_content_raw = extracted.content_text or content_raw
                final_canonical = extracted.canonical_url or entry.get("link")

                source_hash = _entry_hash(
                    entry,
                    int(feed["id"]),
                    link,
                    final_title,
                    final_summary or "",
                )
                attribution = {
                    "source_name": feed.get("source_name"),
                    "source_base_url": feed.get("source_base_url"),
                    "source_terms_url": feed.get("source_terms_url"),
                    "source_license_name": feed.get("source_license_name"),
                    "source_risk_level": feed.get("source_risk_level"),
                    "original_link": link,
                    "feed_name": feed.get("name"),
                    "feed_id": int(feed["id"]),
                    "imported_at": datetime.now(timezone.utc).isoformat(),
                }
                extraction_meta: dict[str, Any] = extracted_article_to_meta(extracted)
                extraction_meta["fetched_from"] = link
                article_id = upsert_article(
                    ArticleUpsert(
                        feed_id=int(feed["id"]),
                        source_article_id=entry.get("id") or entry.get("guid"),
                        source_hash=source_hash,
                        title=final_title,
                        source_url=link,
                        canonical_url=final_canonical,
                        published_at=_entry_published_iso(entry),
                        author=final_author,
                        summary=final_summary,
                        content_raw=final_content_raw,
                        content_rewritten=None,
                        word_count=len((final_content_raw or "").split()),
                        status="new",
                        meta_json=json.dumps({"attribution": attribution, "extraction": extraction_meta}, ensure_ascii=False),
                    )
                )
                if article_id:
                    articles_upserted += 1
                    feed_upserts += 1

            feed_results.append(
                {
                    "feed_id": int(feed["id"]),
                    "feed_url": feed["url"],
                    "status": "success",
                    "entries_seen": feed_entries_seen,
                    "upserts": feed_upserts,
                }
            )

        finish_run(
            run_id=run_id,
            status="success",
            details=json.dumps(
                {
                    "feeds_processed": feeds_processed,
                    "entries_seen": entries_seen,
                    "upserts": articles_upserted,
                    "feeds": feed_results,
                },
                ensure_ascii=False,
            ),
        )
        return IngestionStats(
            run_id=run_id,
            feeds_processed=feeds_processed,
            entries_seen=entries_seen,
            articles_upserted=articles_upserted,
            status="success",
            message="Ingestion abgeschlossen",
        )
    except Exception as exc:
        finish_run(run_id=run_id, status="failed", details=str(exc))
        return IngestionStats(
            run_id=run_id,
            feeds_processed=feeds_processed,
            entries_seen=entries_seen,
            articles_upserted=articles_upserted,
            status="failed",
            message=str(exc),
        )
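Note on the deduplication fingerprint in backend/app/ingestion.py: the sketch below rebuilds the same pipe-separated field order that `_entry_hash` hashes, so its behaviour can be checked in isolation. The `fingerprint` helper and all sample values are hypothetical; it takes plain values instead of a feedparser entry.

```python
import hashlib


def fingerprint(feed_id: int, source_id: str, link: str, title: str, summary: str, published: str) -> str:
    """Hypothetical stand-in for _entry_hash that takes plain values."""
    # Same field order as the ingestion code: feed, guid, link, title, summary, published.
    raw = f"{feed_id}|{source_id}|{link}|{title.strip()}|{summary.strip()}|{published}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


published = "2024-01-01T00:00:00+00:00"
a = fingerprint(1, "guid-1", "https://example.org/a", " Title ", "Summary", published)
b = fingerprint(1, "guid-1", "https://example.org/a", "Title", "Summary ", published)
c = fingerprint(2, "guid-1", "https://example.org/a", "Title", "Summary", published)

print(a == b)  # surrounding whitespace in title/summary is stripped before hashing
print(a == c)  # a different feed_id changes the fingerprint
```

Because `feed_id` is part of the fingerprint, the same article arriving through two feeds gets two different hashes; in that case deduplication would have to rely on the `source_url` lookup rather than on the hash.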
404
backend/app/main.py
Normal file
@@ -0,0 +1,404 @@
from contextlib import asynccontextmanager
from pathlib import Path

from fastapi import Depends, FastAPI, HTTPException, Request, Response, status
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel, Field

from .admin_ui import router as admin_router
from .auth import create_session_token, verify_credentials, verify_session_token
from .config import get_settings
from .db import init_db
from .ingestion import run_ingestion
from .policy import evaluate_source_policy, is_source_allowed
from .repositories import (
    ArticleUpsert,
    FeedCreate,
    RunCreate,
    SourceCreate,
    create_feed as repo_create_feed,
    create_run,
    create_source as repo_create_source,
    finish_run,
    get_article_by_id,
    get_feed_by_id,
    get_run_by_id,
    get_source_by_id,
    list_articles as repo_list_articles,
    list_feeds as repo_list_feeds,
    list_runs,
    list_sources as repo_list_sources,
    update_article_status,
    upsert_article as repo_upsert_article,
)

settings = get_settings()


@asynccontextmanager
async def app_lifespan(_: FastAPI):
    init_db()
    yield


app = FastAPI(title=settings.app_name, lifespan=app_lifespan)
app.include_router(admin_router)
app.mount(
    "/admin/static",
    StaticFiles(directory=str(Path(__file__).resolve().parent.parent / "static")),
    name="admin-static",
)
|
||||
|
||||
class LoginRequest(BaseModel):
    username: str
    password: str


class SourceCreateRequest(BaseModel):
    name: str = Field(min_length=1, max_length=200)
    base_url: str | None = None
    terms_url: str | None = None
    license_name: str | None = None
    risk_level: str = Field(default="yellow", pattern="^(green|yellow|red)$")
    is_enabled: bool = False
    notes: str | None = None
    last_reviewed_at: str | None = None


class FeedCreateRequest(BaseModel):
    name: str = Field(min_length=1, max_length=200)
    url: str = Field(min_length=5, max_length=1000)
    source_id: int | None = None
    is_enabled: bool = True


class RunCreateRequest(BaseModel):
    run_type: str = Field(min_length=2, max_length=100)
    status: str = Field(default="queued", pattern="^(queued|running|success|failed)$")
    details: str | None = None


class RunFinishRequest(BaseModel):
    status: str = Field(pattern="^(success|failed)$")
    details: str | None = None


class ArticleUpsertRequest(BaseModel):
    feed_id: int | None = None
    source_article_id: str | None = None
    source_hash: str | None = None
    title: str = Field(min_length=1, max_length=500)
    source_url: str = Field(min_length=5, max_length=2000)
    canonical_url: str | None = None
    published_at: str | None = None
    author: str | None = None
    summary: str | None = None
    content_raw: str | None = None
    content_rewritten: str | None = None
    word_count: int = 0
    status: str = Field(default="new", pattern="^(new|rewrite|review|approved|published|error)$")
    meta_json: str | None = None


class IngestionRunRequest(BaseModel):
    feed_id: int | None = None


class ArticleTransitionRequest(BaseModel):
    target_status: str = Field(pattern="^(new|rewrite|review|approved|published|error)$")
    note: str | None = None


class ArticleReviewRequest(BaseModel):
    decision: str = Field(pattern="^(approve|reject)$")
    note: str | None = None


ALLOWED_ARTICLE_TRANSITIONS: dict[str, set[str]] = {
    "new": {"review", "rewrite", "error"},
    "rewrite": {"review", "error"},
    "review": {"approved", "rewrite", "error"},
    "approved": {"published", "error"},
    "published": {"error"},
    "error": {"review", "rewrite"},
}
|
||||
|
||||
def require_auth(request: Request) -> str:
    token = request.cookies.get(settings.session_cookie_name)
    if not token:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Nicht angemeldet")

    username = verify_session_token(token)
    if not username:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Session ungueltig oder abgelaufen")

    return username


@app.get("/health")
def health() -> dict:
    return {"status": "ok", "service": settings.app_name, "db_path": settings.app_db_path}


@app.post("/auth/login")
def login(payload: LoginRequest, response: Response) -> dict:
    if not verify_credentials(payload.username, payload.password):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Ungueltige Zugangsdaten")

    token = create_session_token(payload.username)
    response.set_cookie(
        key=settings.session_cookie_name,
        value=token,
        max_age=settings.session_max_age_seconds,
        httponly=True,
        secure=False,
        samesite="lax",
    )
    return {"ok": True, "username": payload.username}


@app.post("/auth/logout")
def logout(response: Response) -> dict:
    response.delete_cookie(settings.session_cookie_name)
    return {"ok": True}


@app.get("/auth/me")
def me(username: str = Depends(require_auth)) -> dict:
    return {"authenticated": True, "username": username}


@app.get("/api/protected")
def protected(username: str = Depends(require_auth)) -> dict:
    return {"ok": True, "message": "Protected endpoint", "username": username}
|
||||
|
||||
@app.get("/api/pipeline/status")
def pipeline_status(username: str = Depends(require_auth)) -> dict:
    feeds_total = len(repo_list_feeds())
    sources_total = len(repo_list_sources())
    articles_total = len(repo_list_articles(limit=500))
    return {
        "ok": True,
        "stage": "skeleton+db",
        "requested_by": username,
        "counts": {
            "sources": sources_total,
            "feeds": feeds_total,
            "articles": articles_total,
        },
    }


@app.get("/api/sources")
def list_sources(username: str = Depends(require_auth)) -> dict:
    return {"ok": True, "items": repo_list_sources(), "requested_by": username}


@app.get("/api/sources/{source_id}/policy-check")
def source_policy_check(source_id: int, username: str = Depends(require_auth)) -> dict:
    source = get_source_by_id(source_id)
    if not source:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Quelle nicht gefunden")
    issues = evaluate_source_policy(source)
    return {
        "ok": True,
        "source_id": source_id,
        "allowed": is_source_allowed(source),
        "issues": issues,
        "requested_by": username,
    }


@app.post("/api/sources")
def create_source(payload: SourceCreateRequest, username: str = Depends(require_auth)) -> dict:
    source_id = repo_create_source(
        SourceCreate(
            name=payload.name,
            base_url=payload.base_url,
            terms_url=payload.terms_url,
            license_name=payload.license_name,
            risk_level=payload.risk_level,
            is_enabled=payload.is_enabled,
            notes=payload.notes,
            last_reviewed_at=payload.last_reviewed_at,
        )
    )
    return {"ok": True, "id": source_id, "requested_by": username}


@app.get("/api/feeds")
def list_feeds(username: str = Depends(require_auth)) -> dict:
    return {"ok": True, "items": repo_list_feeds(), "requested_by": username}


@app.get("/api/feeds/{feed_id}/policy-check")
def feed_policy_check(feed_id: int, username: str = Depends(require_auth)) -> dict:
    feed = get_feed_by_id(feed_id)
    if not feed:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Feed nicht gefunden")

    source_snapshot = {
        "id": feed.get("source_id"),
        "name": feed.get("source_name"),
        "base_url": feed.get("source_base_url"),
        "terms_url": feed.get("source_terms_url"),
        "license_name": feed.get("source_license_name"),
        "risk_level": feed.get("source_risk_level"),
        "last_reviewed_at": feed.get("source_last_reviewed_at"),
        "is_enabled": feed.get("source_is_enabled"),
    }
    issues = evaluate_source_policy(source_snapshot)
    return {
        "ok": True,
        "feed_id": feed_id,
        "allowed": len(issues) == 0,
        "issues": issues,
        "requested_by": username,
    }


@app.post("/api/feeds")
def create_feed(payload: FeedCreateRequest, username: str = Depends(require_auth)) -> dict:
    try:
        feed_id = repo_create_feed(
            FeedCreate(
                name=payload.name,
                url=payload.url,
                source_id=payload.source_id,
                is_enabled=payload.is_enabled,
            )
        )
    except Exception as exc:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=f"Feed konnte nicht angelegt werden: {exc}") from exc

    return {"ok": True, "id": feed_id, "requested_by": username}
|
||||
|
||||
@app.get("/api/runs")
def api_list_runs(limit: int = 50, username: str = Depends(require_auth)) -> dict:
    return {"ok": True, "items": list_runs(limit=limit), "requested_by": username}


@app.get("/api/runs/{run_id}")
def api_get_run(run_id: int, username: str = Depends(require_auth)) -> dict:
    run = get_run_by_id(run_id)
    if not run:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Run nicht gefunden")
    return {"ok": True, "item": run, "requested_by": username}


@app.post("/api/runs")
def api_create_run(payload: RunCreateRequest, username: str = Depends(require_auth)) -> dict:
    run_id = create_run(RunCreate(run_type=payload.run_type, status=payload.status, details=payload.details))
    return {"ok": True, "id": run_id, "requested_by": username}


@app.post("/api/runs/{run_id}/finish")
def api_finish_run(run_id: int, payload: RunFinishRequest, username: str = Depends(require_auth)) -> dict:
    finish_run(run_id=run_id, status=payload.status, details=payload.details)
    return {"ok": True, "id": run_id, "requested_by": username}


@app.get("/api/articles")
def api_list_articles(limit: int = 100, status_filter: str | None = None, username: str = Depends(require_auth)) -> dict:
    return {"ok": True, "items": repo_list_articles(limit=limit, status_filter=status_filter), "requested_by": username}


@app.get("/api/articles/{article_id}")
def api_get_article(article_id: int, username: str = Depends(require_auth)) -> dict:
    article = get_article_by_id(article_id)
    if not article:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Artikel nicht gefunden")
    return {"ok": True, "item": article, "requested_by": username}


@app.post("/api/articles/upsert")
def api_upsert_article(payload: ArticleUpsertRequest, username: str = Depends(require_auth)) -> dict:
    article_id = repo_upsert_article(
        ArticleUpsert(
            feed_id=payload.feed_id,
            source_article_id=payload.source_article_id,
            source_hash=payload.source_hash,
            title=payload.title,
            source_url=payload.source_url,
            canonical_url=payload.canonical_url,
            published_at=payload.published_at,
            author=payload.author,
            summary=payload.summary,
            content_raw=payload.content_raw,
            content_rewritten=payload.content_rewritten,
            word_count=payload.word_count,
            status=payload.status,
            meta_json=payload.meta_json,
        )
    )
    return {"ok": True, "id": article_id, "requested_by": username}
|
||||
|
||||
@app.post("/api/articles/{article_id}/transition")
def api_article_transition(article_id: int, payload: ArticleTransitionRequest, username: str = Depends(require_auth)) -> dict:
    article = get_article_by_id(article_id)
    if not article:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Artikel nicht gefunden")

    current_status = article.get("status")
    allowed_targets = ALLOWED_ARTICLE_TRANSITIONS.get(current_status, set())
    if payload.target_status not in allowed_targets:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Ungueltiger Statuswechsel: {current_status} -> {payload.target_status}",
        )

    updated = update_article_status(article_id, payload.target_status, actor=username, note=payload.note)
    if not updated:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Artikel nicht gefunden")
    return {"ok": True, "id": article_id, "from_status": current_status, "to_status": payload.target_status}


@app.post("/api/articles/{article_id}/review")
def api_article_review(article_id: int, payload: ArticleReviewRequest, username: str = Depends(require_auth)) -> dict:
    article = get_article_by_id(article_id)
    if not article:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Artikel nicht gefunden")
    if article.get("status") != "review":
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Review nur fuer Status 'review' erlaubt (aktuell: {article.get('status')})",
        )

    target_status = "approved" if payload.decision == "approve" else "rewrite"
    updated = update_article_status(
        article_id,
        target_status,
        actor=username,
        note=payload.note,
        decision=payload.decision,
    )
    if not updated:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Artikel nicht gefunden")
    return {
        "ok": True,
        "id": article_id,
        "decision": payload.decision,
        "to_status": target_status,
    }


@app.post("/api/ingestion/run")
def api_run_ingestion(payload: IngestionRunRequest, username: str = Depends(require_auth)) -> dict:
    stats = run_ingestion(feed_id=payload.feed_id)
    return {
        "ok": stats.status == "success",
        "run_id": stats.run_id,
        "status": stats.status,
        "message": stats.message,
        "stats": {
            "feeds_processed": stats.feeds_processed,
            "entries_seen": stats.entries_seen,
            "articles_upserted": stats.articles_upserted,
        },
        "requested_by": username,
    }
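Note on the article workflow in backend/app/main.py: the transition endpoint is a lookup in a status table. The sketch below copies that table and wraps the check in a hypothetical `can_transition` helper (not part of the commit) to show which moves pass.

```python
# Copy of the transition table from main.py; keys are current statuses.
ALLOWED_ARTICLE_TRANSITIONS = {
    "new": {"review", "rewrite", "error"},
    "rewrite": {"review", "error"},
    "review": {"approved", "rewrite", "error"},
    "approved": {"published", "error"},
    "published": {"error"},
    "error": {"review", "rewrite"},
}


def can_transition(current: str, target: str) -> bool:
    """Hypothetical helper: True when the table allows current -> target."""
    # Unknown current statuses map to an empty set, so every target is rejected.
    return target in ALLOWED_ARTICLE_TRANSITIONS.get(current, set())


print(can_transition("new", "review"))
print(can_transition("new", "published"))
print(can_transition("published", "new"))
```

Under this table an article cannot skip review on its way to `published`, and a published article can only move to `error`.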
35
backend/app/policy.py
Normal file
@@ -0,0 +1,35 @@
from __future__ import annotations

from typing import Any


def evaluate_source_policy(source: dict[str, Any] | None) -> list[str]:
    """Return the policy issues for a source; an empty list means the source may be ingested."""
    issues: list[str] = []
    if not source:
        issues.append("Keine Quelle zugeordnet")
        return issues

    risk_level = (source.get("risk_level") or "").strip().lower()
    if risk_level != "green":
        issues.append(f"Quelle nicht freigegeben (risk_level={risk_level or 'unset'})")

    terms_url = (source.get("terms_url") or "").strip()
    if not terms_url:
        issues.append("terms_url fehlt")

    license_name = (source.get("license_name") or "").strip()
    if not license_name:
        issues.append("license_name fehlt")

    last_reviewed_at = (source.get("last_reviewed_at") or "").strip()
    if not last_reviewed_at:
        issues.append("last_reviewed_at fehlt")

    if int(source.get("is_enabled", 0) or 0) != 1:
        issues.append("Quelle ist deaktiviert")

    return issues


def is_source_allowed(source: dict[str, Any] | None) -> bool:
    """Return True when evaluate_source_policy reports no issues."""
    return len(evaluate_source_policy(source)) == 0
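Note on backend/app/policy.py: the checks are all-or-nothing, so a source is ingested only when every field is in place. The sketch below is a condensed, hypothetical mirror of `evaluate_source_policy` (the `policy_issues` name and the short English issue codes are assumptions, not the commit's messages), run against two made-up sources.

```python
from typing import Any, Optional


def policy_issues(source: Optional[dict[str, Any]]) -> list[str]:
    """Hypothetical condensed mirror of evaluate_source_policy with short issue codes."""
    if not source:
        return ["no_source"]
    issues: list[str] = []
    # Only sources explicitly rated "green" pass the risk check.
    if (source.get("risk_level") or "").strip().lower() != "green":
        issues.append("risk_level_not_green")
    # Each legal-review field must be a non-empty string.
    for field in ("terms_url", "license_name", "last_reviewed_at"):
        if not (source.get(field) or "").strip():
            issues.append(f"{field}_missing")
    if int(source.get("is_enabled", 0) or 0) != 1:
        issues.append("source_disabled")
    return issues


reviewed = {
    "risk_level": "green",
    "terms_url": "https://example.org/terms",
    "license_name": "CC BY 4.0",
    "last_reviewed_at": "2024-01-01",
    "is_enabled": 1,
}
unreviewed = {**reviewed, "risk_level": "yellow", "last_reviewed_at": None}

print(policy_issues(reviewed))
print(policy_issues(unreviewed))
```

A feed without a linked source reaches the check as a snapshot of `None` values, which fails every field test, so unlinked feeds are blocked as well.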
416
backend/app/repositories.py
Normal file
@@ -0,0 +1,416 @@
from __future__ import annotations

from dataclasses import dataclass
import json
from datetime import datetime, timezone
from typing import Any

from .db import get_conn, rows_to_dicts


@dataclass(frozen=True)
class SourceCreate:
    name: str
    base_url: str | None
    terms_url: str | None
    license_name: str | None
    risk_level: str
    is_enabled: bool
    notes: str | None
    last_reviewed_at: str | None


@dataclass(frozen=True)
class FeedCreate:
    name: str
    url: str
    source_id: int | None
    is_enabled: bool


@dataclass(frozen=True)
class RunCreate:
    run_type: str
    status: str
    details: str | None = None


@dataclass(frozen=True)
class ArticleUpsert:
    feed_id: int | None
    source_article_id: str | None
    source_hash: str | None
    title: str
    source_url: str
    canonical_url: str | None
    published_at: str | None
    author: str | None
    summary: str | None
    content_raw: str | None
    content_rewritten: str | None
    word_count: int
    status: str
    meta_json: str | None
|
||||
|
||||
def create_source(payload: SourceCreate) -> int:
    with get_conn() as conn:
        cur = conn.execute(
            """
            INSERT INTO sources (name, base_url, terms_url, license_name, risk_level, is_enabled, notes, last_reviewed_at)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?)
            """,
            (
                payload.name.strip(),
                payload.base_url,
                payload.terms_url,
                payload.license_name,
                payload.risk_level,
                1 if payload.is_enabled else 0,
                payload.notes,
                payload.last_reviewed_at,
            ),
        )
        return int(cur.lastrowid)


def list_sources() -> list[dict[str, Any]]:
    with get_conn() as conn:
        rows = conn.execute(
            """
            SELECT id, name, base_url, terms_url, license_name, risk_level, is_enabled, notes, last_reviewed_at, created_at, updated_at
            FROM sources
            ORDER BY id DESC
            """
        ).fetchall()
        return rows_to_dicts(rows)


def get_source_by_id(source_id: int) -> dict[str, Any] | None:
    with get_conn() as conn:
        row = conn.execute(
            """
            SELECT id, name, base_url, terms_url, license_name, risk_level, is_enabled, notes, last_reviewed_at, created_at, updated_at
            FROM sources
            WHERE id = ?
            """,
            (source_id,),
        ).fetchone()
        return dict(row) if row else None


def create_feed(payload: FeedCreate) -> int:
    with get_conn() as conn:
        cur = conn.execute(
            "INSERT INTO feeds (name, url, source_id, is_enabled) VALUES (?, ?, ?, ?)",
            (payload.name.strip(), payload.url.strip(), payload.source_id, 1 if payload.is_enabled else 0),
        )
        return int(cur.lastrowid)


def list_feeds() -> list[dict[str, Any]]:
    with get_conn() as conn:
        rows = conn.execute(
            """
            SELECT f.id, f.name, f.url, f.source_id, f.is_enabled, f.etag, f.last_modified, f.last_checked_at,
                   f.created_at, f.updated_at, s.name AS source_name, s.license_name AS source_license_name,
                   s.terms_url AS source_terms_url, s.risk_level AS source_risk_level, s.base_url AS source_base_url,
                   s.last_reviewed_at AS source_last_reviewed_at, s.is_enabled AS source_is_enabled
            FROM feeds f
            LEFT JOIN sources s ON s.id = f.source_id
            ORDER BY f.id DESC
            """
        ).fetchall()
        return rows_to_dicts(rows)


def list_enabled_feeds() -> list[dict[str, Any]]:
    with get_conn() as conn:
        rows = conn.execute(
            """
            SELECT f.id, f.name, f.url, f.source_id, f.is_enabled, f.etag, f.last_modified, f.last_checked_at,
                   s.name AS source_name, s.license_name AS source_license_name, s.terms_url AS source_terms_url,
                   s.risk_level AS source_risk_level, s.base_url AS source_base_url,
                   s.last_reviewed_at AS source_last_reviewed_at, s.is_enabled AS source_is_enabled
            FROM feeds f
            LEFT JOIN sources s ON s.id = f.source_id
            WHERE f.is_enabled = 1
            ORDER BY f.id ASC
            """
        ).fetchall()
        return rows_to_dicts(rows)


def get_feed_by_id(feed_id: int) -> dict[str, Any] | None:
    with get_conn() as conn:
        row = conn.execute(
            """
            SELECT f.id, f.name, f.url, f.source_id, f.is_enabled, f.etag, f.last_modified, f.last_checked_at,
                   s.name AS source_name, s.license_name AS source_license_name, s.terms_url AS source_terms_url,
                   s.risk_level AS source_risk_level, s.base_url AS source_base_url,
                   s.last_reviewed_at AS source_last_reviewed_at, s.is_enabled AS source_is_enabled
            FROM feeds f
            LEFT JOIN sources s ON s.id = f.source_id
            WHERE f.id = ?
            """,
            (feed_id,),
        ).fetchone()
        return dict(row) if row else None
|
||||
|
||||
def update_feed_fetch_state(feed_id: int, etag: str | None, last_modified: str | None) -> None:
    with get_conn() as conn:
        conn.execute(
            """
            UPDATE feeds
            SET etag = ?, last_modified = ?, last_checked_at = datetime('now')
            WHERE id = ?
            """,
            (etag, last_modified, feed_id),
        )


def create_run(payload: RunCreate) -> int:
    with get_conn() as conn:
        cur = conn.execute(
            "INSERT INTO runs (run_type, status, details) VALUES (?, ?, ?)",
            (payload.run_type, payload.status, payload.details),
        )
        return int(cur.lastrowid)


def finish_run(run_id: int, status: str, details: str | None = None) -> None:
    with get_conn() as conn:
        conn.execute(
            """
            UPDATE runs
            SET status = ?, details = ?, finished_at = datetime('now')
            WHERE id = ?
            """,
            (status, details, run_id),
        )


def list_runs(limit: int = 50) -> list[dict[str, Any]]:
    safe_limit = max(1, min(limit, 500))
    with get_conn() as conn:
        rows = conn.execute(
            """
            SELECT id, run_type, status, started_at, finished_at, details
            FROM runs
            ORDER BY id DESC
            LIMIT ?
            """,
            (safe_limit,),
        ).fetchall()
        return rows_to_dicts(rows)


def get_run_by_id(run_id: int) -> dict[str, Any] | None:
    with get_conn() as conn:
        row = conn.execute(
            """
            SELECT id, run_type, status, started_at, finished_at, details
            FROM runs
            WHERE id = ?
            """,
            (run_id,),
        ).fetchone()
        return dict(row) if row else None


def get_article_by_id(article_id: int) -> dict[str, Any] | None:
    with get_conn() as conn:
        row = conn.execute(
            """
            SELECT a.id, a.feed_id, a.source_article_id, a.source_hash, a.title, a.source_url, a.canonical_url, a.published_at, a.author,
                   a.summary, a.content_raw, a.content_rewritten, a.word_count, a.status, a.meta_json, a.created_at, a.updated_at
            FROM articles a
            WHERE a.id = ?
            """,
            (article_id,),
        ).fetchone()
        return dict(row) if row else None
|
||||
|
||||
def _merge_review_event(meta_json: str | None, event: dict[str, Any]) -> str:
|
||||
meta: dict[str, Any] = {}
|
||||
if meta_json:
|
||||
try:
|
||||
meta = json.loads(meta_json)
|
||||
if not isinstance(meta, dict):
|
||||
meta = {}
|
||||
except Exception:
|
||||
meta = {}
|
||||
|
||||
events = meta.get("review_events")
|
||||
if not isinstance(events, list):
|
||||
events = []
|
||||
events.append(event)
|
||||
meta["review_events"] = events
|
||||
return json.dumps(meta, ensure_ascii=False)
|
||||
|
||||
|
||||
def update_article_status(
    article_id: int,
    new_status: str,
    *,
    actor: str | None = None,
    note: str | None = None,
    decision: str | None = None,
) -> bool:
    article = get_article_by_id(article_id)
    if not article:
        return False

    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "from_status": article.get("status"),
        "to_status": new_status,
        "actor": actor or "system",
        "note": note,
        "decision": decision,
    }
    merged_meta = _merge_review_event(article.get("meta_json"), event)

    with get_conn() as conn:
        conn.execute(
            "UPDATE articles SET status = ?, meta_json = ? WHERE id = ?",
            (new_status, merged_meta, article_id),
        )
    return True
|
||||
|
||||
|
||||
def _resolve_existing_article_id(payload: ArticleUpsert) -> int | None:
    with get_conn() as conn:
        # 1) strongest key: source_url
        row = conn.execute(
            "SELECT id FROM articles WHERE source_url = ?",
            (payload.source_url.strip(),),
        ).fetchone()
        if row:
            return int(row["id"])

        # 2) stable feed+guid combo
        if payload.feed_id is not None and payload.source_article_id:
            row = conn.execute(
                "SELECT id FROM articles WHERE feed_id = ? AND source_article_id = ?",
                (payload.feed_id, payload.source_article_id),
            ).fetchone()
            if row:
                return int(row["id"])

        # 3) content hash fallback
        if payload.source_hash:
            row = conn.execute(
                "SELECT id FROM articles WHERE source_hash = ?",
                (payload.source_hash,),
            ).fetchone()
            if row:
                return int(row["id"])

    return None
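
The lookup order (source URL, then feed plus GUID, then content hash) can be tried in isolation. The following is a minimal sketch against an in-memory SQLite table; the `resolve_article_id` helper and the reduced schema are assumptions made for illustration and are not part of this commit.

```python
import sqlite3


def resolve_article_id(conn, source_url, feed_id=None, guid=None, source_hash=None):
    """Return the id of an existing row, trying the strongest key first."""
    # Hypothetical helper: mirrors the three-step fallback on a reduced schema.
    lookups = [("source_url = ?", (source_url.strip(),))]
    if feed_id is not None and guid:
        lookups.append(("feed_id = ? AND source_article_id = ?", (feed_id, guid)))
    if source_hash:
        lookups.append(("source_hash = ?", (source_hash,)))
    for where, params in lookups:
        # Only fixed SQL fragments are interpolated; values stay bound parameters.
        row = conn.execute(f"SELECT id FROM articles WHERE {where}", params).fetchone()
        if row:
            return int(row[0])
    return None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, feed_id INTEGER, "
    "source_article_id TEXT, source_hash TEXT, source_url TEXT)"
)
conn.execute(
    "INSERT INTO articles (feed_id, source_article_id, source_hash, source_url) "
    "VALUES (1, 'guid-1', 'abc', 'https://example.org/a')"
)

by_url = resolve_article_id(conn, " https://example.org/a ")
by_guid = resolve_article_id(conn, "https://example.org/moved", feed_id=1, guid="guid-1")
by_hash = resolve_article_id(conn, "https://example.org/other", source_hash="abc")
missing = resolve_article_id(conn, "https://example.org/none")
```

All three keys resolve to the same row here, which is the case the fallback chain is meant to cover: an article whose URL changed is still recognised through its GUID or its content hash.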
|
||||
|
||||
|
||||
def upsert_article(payload: ArticleUpsert) -> int:
    existing_id = _resolve_existing_article_id(payload)
    with get_conn() as conn:
        if existing_id is None:
            conn.execute(
                """
                INSERT INTO articles (
                    feed_id, source_article_id, source_hash, title, source_url, canonical_url, published_at, author,
                    summary, content_raw, content_rewritten, word_count, status, meta_json
                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                """,
                (
                    payload.feed_id,
                    payload.source_article_id,
                    payload.source_hash,
                    payload.title.strip(),
                    payload.source_url.strip(),
                    payload.canonical_url,
                    payload.published_at,
                    payload.author,
                    payload.summary,
                    payload.content_raw,
                    payload.content_rewritten,
                    payload.word_count,
                    payload.status,
                    payload.meta_json,
                ),
            )
        else:
            conn.execute(
                """
                UPDATE articles
                SET
                    feed_id = ?,
                    source_article_id = ?,
                    source_hash = ?,
                    title = ?,
                    source_url = ?,
                    canonical_url = ?,
                    published_at = ?,
                    author = ?,
                    summary = ?,
                    content_raw = ?,
                    content_rewritten = ?,
                    word_count = ?,
                    status = ?,
                    meta_json = ?
                WHERE id = ?
                """,
                (
                    payload.feed_id,
                    payload.source_article_id,
                    payload.source_hash,
                    payload.title.strip(),
                    payload.source_url.strip(),
                    payload.canonical_url,
                    payload.published_at,
                    payload.author,
                    payload.summary,
                    payload.content_raw,
                    payload.content_rewritten,
                    payload.word_count,
                    payload.status,
                    payload.meta_json,
                    existing_id,
                ),
            )
        row = conn.execute("SELECT id FROM articles WHERE source_url = ?", (payload.source_url.strip(),)).fetchone()
        if row:
            return int(row["id"])
    return int(existing_id) if existing_id else 0
|
||||
|
||||
|
||||
def list_articles(limit: int = 100, status_filter: str | None = None) -> list[dict[str, Any]]:
    safe_limit = max(1, min(limit, 500))
    with get_conn() as conn:
        if status_filter:
            rows = conn.execute(
                """
                SELECT a.id, a.feed_id, a.source_article_id, a.source_hash, a.title, a.source_url, a.canonical_url, a.published_at, a.author,
                       a.summary, a.content_raw, a.word_count, a.status, a.meta_json, a.created_at, a.updated_at, f.name AS feed_name
                FROM articles a
                LEFT JOIN feeds f ON f.id = a.feed_id
                WHERE a.status = ?
                ORDER BY a.id DESC
                LIMIT ?
                """,
                (status_filter, safe_limit),
            ).fetchall()
        else:
            rows = conn.execute(
                """
                SELECT a.id, a.feed_id, a.source_article_id, a.source_hash, a.title, a.source_url, a.canonical_url, a.published_at, a.author,
                       a.summary, a.content_raw, a.word_count, a.status, a.meta_json, a.created_at, a.updated_at, f.name AS feed_name
                FROM articles a
                LEFT JOIN feeds f ON f.id = a.feed_id
                ORDER BY a.id DESC
                LIMIT ?
                """,
                (safe_limit,),
            ).fetchall()
    return rows_to_dicts(rows)
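
The two listing branches differ only in the `WHERE` clause. One possible way to express the optional filter once is sketched below, assuming a reduced two-column table; `list_rows` is a hypothetical name and the commit itself keeps the two explicit queries.

```python
import sqlite3


def list_rows(conn, limit=100, status_filter=None):
    """List newest rows first, optionally restricted to one status."""
    safe_limit = max(1, min(limit, 500))  # same clamp as the listing helpers
    sql = "SELECT id, status FROM articles"
    params = []
    if status_filter:
        # Only a fixed SQL fragment is appended; the value stays a bound parameter.
        sql += " WHERE status = ?"
        params.append(status_filter)
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(safe_limit)
    return [dict(row) for row in conn.execute(sql, params).fetchall()]


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO articles (status) VALUES (?)", [("new",), ("review",), ("new",)])

all_rows = list_rows(conn, limit=0)  # limit is clamped up to 1
new_rows = list_rows(conn, status_filter="new")
```

The trade-off is readability of the literal SQL versus a single place to change the column list.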
|
||||
257
backend/app/source_extraction.py
Normal file
|
|
@ -0,0 +1,257 @@
|
|||
from __future__ import annotations

from dataclasses import dataclass
from html import unescape
import re
from typing import Any
from urllib.parse import urljoin
from urllib.request import Request, urlopen

DEFAULT_TIMEOUT_SECONDS = 10
DEFAULT_USER_AGENT = "rss-news-bot/1.0 (+https://news.vanityontour.de)"


@dataclass(frozen=True)
class ExtractedArticle:
    title: str | None
    author: str | None
    canonical_url: str | None
    summary: str | None
    content_text: str | None
    images: list[str]
    press_contact: str | None
    extraction_error: str | None = None
|
||||
|
||||
|
||||
def _clean_text(raw: str | None) -> str | None:
    if not raw:
        return None
    text = unescape(raw)
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text or None
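
A short worked example of the cleaning steps (unescape entities, strip tags, collapse whitespace). This is a sketch using a standalone copy named `clean_text` and a made-up sample string; it is meant to show one consequence of the step order, namely that entity-encoded markup is unescaped first and then stripped like any other tag.

```python
import re
from html import unescape


def clean_text(raw):
    """Unescape entities, drop tags, collapse whitespace; None for empty input."""
    # Standalone copy for illustration only.
    if not raw:
        return None
    text = unescape(raw)
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text or None


sample = "<p>Preis &amp; Leistung</p>\n<p>  &lt;b&gt;fett&lt;/b&gt; </p>"
cleaned = clean_text(sample)
only_markup = clean_text("<br>")
```

Input that consists only of markup comes back as `None`, which is why callers can treat the result as "text or nothing".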
|
||||
|
||||
|
||||
def _strip_noise(html: str) -> str:
    html = re.sub(r"<script[\s\S]*?</script>", " ", html, flags=re.IGNORECASE)
    html = re.sub(r"<style[\s\S]*?</style>", " ", html, flags=re.IGNORECASE)
    html = re.sub(r"<noscript[\s\S]*?</noscript>", " ", html, flags=re.IGNORECASE)
    return html


def _meta_content(html: str, attr: str, value: str) -> str | None:
    pattern = re.compile(
        rf"<meta[^>]+{attr}\s*=\s*[\"']{re.escape(value)}[\"'][^>]*content\s*=\s*[\"']([^\"']+)[\"'][^>]*>",
        re.IGNORECASE,
    )
    match = pattern.search(html)
    if match:
        return _clean_text(match.group(1))

    # handle reversed attribute order
    pattern_rev = re.compile(
        rf"<meta[^>]+content\s*=\s*[\"']([^\"']+)[\"'][^>]*{attr}\s*=\s*[\"']{re.escape(value)}[\"'][^>]*>",
        re.IGNORECASE,
    )
    match = pattern_rev.search(html)
    if match:
        return _clean_text(match.group(1))
    return None
|
||||
|
||||
|
||||
def _extract_title(html: str) -> str | None:
    title = _meta_content(html, "property", "og:title")
    if title:
        return title

    match = re.search(r"<title[^>]*>([\s\S]*?)</title>", html, re.IGNORECASE)
    if match:
        cleaned = _clean_text(match.group(1))
        if cleaned:
            return cleaned

    match = re.search(r"<h1[^>]*>([\s\S]*?)</h1>", html, re.IGNORECASE)
    if match:
        return _clean_text(match.group(1))
    return None


def _extract_canonical(html: str) -> str | None:
    match = re.search(
        r"<link[^>]+rel\s*=\s*[\"']canonical[\"'][^>]*href\s*=\s*[\"']([^\"']+)[\"'][^>]*>",
        html,
        re.IGNORECASE,
    )
    if match:
        return _clean_text(match.group(1))

    match = re.search(
        r"<link[^>]+href\s*=\s*[\"']([^\"']+)[\"'][^>]*rel\s*=\s*[\"']canonical[\"'][^>]*>",
        html,
        re.IGNORECASE,
    )
    if match:
        return _clean_text(match.group(1))
    return None
|
||||
|
||||
|
||||
def _extract_author(html: str) -> str | None:
    for attr, value in (("name", "author"), ("property", "article:author"), ("property", "og:article:author")):
        author = _meta_content(html, attr, value)
        if author:
            return author

    for pattern in (
        r"(?:Von|Autor(?:in)?)\s*[:\-]\s*([^<\n\r]{3,120})",
        r"class=[\"'][^\"']*(?:author|byline)[^\"']*[\"'][^>]*>([\s\S]{1,180})<",
    ):
        match = re.search(pattern, html, re.IGNORECASE)
        if match:
            author = _clean_text(match.group(1))
            if author:
                return author
    return None
|
||||
|
||||
|
||||
def _extract_images(html: str, page_url: str) -> list[str]:
    images: list[str] = []
    seen: set[str] = set()

    for prop in ("og:image", "twitter:image"):
        pattern = re.compile(
            # og:image is declared via property=, twitter:image usually via name=
            rf"<meta[^>]+(?:property|name)\s*=\s*[\"']{re.escape(prop)}[\"'][^>]*content\s*=\s*[\"']([^\"']+)[\"'][^>]*>",
            re.IGNORECASE,
        )
        for match in pattern.finditer(html):
            src = match.group(1).strip()
            abs_src = urljoin(page_url, src)
            if abs_src not in seen:
                seen.add(abs_src)
                images.append(abs_src)

    for match in re.finditer(r"<img[^>]+src\s*=\s*[\"']([^\"']+)[\"'][^>]*>", html, re.IGNORECASE):
        src = match.group(1).strip()
        abs_src = urljoin(page_url, src)
        if abs_src not in seen:
            seen.add(abs_src)
            images.append(abs_src)

    return images
|
||||
|
||||
|
||||
def _extract_content_text(html: str) -> str | None:
    section = None
    for pattern in (
        r"<article[^>]*>([\s\S]*?)</article>",
        r"<main[^>]*>([\s\S]*?)</main>",
        r"<body[^>]*>([\s\S]*?)</body>",
    ):
        match = re.search(pattern, html, re.IGNORECASE)
        if match:
            section = match.group(1)
            break

    if not section:
        section = html

    # Walk headings and paragraphs in document order so that a "Pressekontakt"
    # heading stays directly above the contact paragraphs that follow it.
    paragraphs = []
    for match in re.finditer(r"<(h[2-4]|p)\b[^>]*>([\s\S]*?)</\1\s*>", section, re.IGNORECASE):
        tag = match.group(1).lower()
        text = _clean_text(match.group(2))
        if not text:
            continue
        if tag == "p":
            if len(text) > 2:
                paragraphs.append(text)
        elif re.search(r"\b(pressekontakt|press contact|kontakt)\b", text, re.IGNORECASE):
            paragraphs.append(text)

    if paragraphs:
        return "\n".join(paragraphs)

    stripped = _clean_text(section)
    return stripped
|
||||
|
||||
|
||||
def _extract_press_contact(content_text: str | None) -> str | None:
    if not content_text:
        return None

    lines = [line.strip() for line in content_text.split("\n") if line.strip()]
    marker_re = re.compile(r"\b(pressekontakt|press contact|presse\-kontakt)\b", re.IGNORECASE)
    for idx, line in enumerate(lines):
        if marker_re.search(line):
            chunk = [line]
            for nxt in lines[idx + 1 : idx + 6]:
                # No trailing \b here: after ":" it would only match when a word
                # character follows directly, so "OTS: ..." would be missed.
                if re.search(r"\b(?:original\-content von|ots:|newsroom:)", nxt, re.IGNORECASE):
                    break
                chunk.append(nxt)
            return _clean_text("\n".join(chunk))

    match = re.search(
        r"(Pressekontakt[\s\S]{0,1200}?)(?:Original-Content von|OTS:|newsroom:|$)",
        content_text,
        re.IGNORECASE,
    )
    if match:
        return _clean_text(match.group(1))
    return None
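
A note on word boundaries for stop markers that end in a colon, such as `OTS:` or `newsroom:`. In Python's `re`, `\b` only matches between a word character and a non-word character, so a `\b` placed after `:` fails whenever a space follows. The sketch below compares both variants on a made-up sample line; the pattern names are chosen for illustration.

```python
import re

# A trailing \b after ":" only matches when a word character follows directly.
with_boundary = re.compile(r"\b(ots:|newsroom:)\b", re.IGNORECASE)
without_boundary = re.compile(r"\b(?:ots:|newsroom:)", re.IGNORECASE)

line = "OTS: Beispiel GmbH"  # made-up sample line

boundary_hit = with_boundary.search(line) is not None
plain_hit = without_boundary.search(line) is not None
```

With the sample line, only the variant without the trailing boundary reports a hit.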
|
||||
|
||||
|
||||
def extract_article(url: str, timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS) -> ExtractedArticle:
    try:
        req = Request(
            url=url,
            headers={
                "User-Agent": DEFAULT_USER_AGENT,
                "Accept-Language": "de-DE,de;q=0.9,en;q=0.8",
            },
        )
        with urlopen(req, timeout=timeout_seconds) as resp:
            raw = resp.read()
            charset = resp.headers.get_content_charset() or "utf-8"
            html = raw.decode(charset, errors="replace")
    except Exception as exc:
        return ExtractedArticle(
            title=None,
            author=None,
            canonical_url=None,
            summary=None,
            content_text=None,
            images=[],
            press_contact=None,
            extraction_error=str(exc),
        )

    html = _strip_noise(html)
    title = _extract_title(html)
    author = _extract_author(html)
    canonical_url = _extract_canonical(html)
    summary = _meta_content(html, "name", "description")
    content_text = _extract_content_text(html)
    if not summary and content_text:
        summary = _clean_text(content_text[:320])
    images = _extract_images(html, url)
    press_contact = _extract_press_contact(content_text)

    return ExtractedArticle(
        title=title,
        author=author,
        canonical_url=canonical_url,
        summary=summary,
        content_text=content_text,
        images=images,
        press_contact=press_contact,
        extraction_error=None,
    )


def extracted_article_to_meta(article: ExtractedArticle) -> dict[str, Any]:
    return {
        "title": article.title,
        "author": article.author,
        "canonical_url": article.canonical_url,
        "summary": article.summary,
        "images": article.images,
        "press_contact": article.press_contact,
        "extraction_error": article.extraction_error,
    }
|
||||
BIN
backend/data/rss_news.db
Normal file
Binary file not shown.
3
backend/requirements-test.txt
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
pytest==8.3.5
pytest-cov==6.0.0
httpx==0.28.1
|
||||
8
backend/requirements.txt
Normal file
|
|
@ -0,0 +1,8 @@
|
|||
fastapi==0.116.1
uvicorn[standard]==0.35.0
itsdangerous==2.2.0
pydantic-settings==2.10.1
python-dotenv==1.1.1
feedparser==6.0.11
jinja2==3.1.4
python-multipart==0.0.20
|
||||
189
backend/static/admin.css
Normal file
|
|
@ -0,0 +1,189 @@
|
|||
body {
  margin: 0;
  font-family: "Segoe UI", Tahoma, Geneva, Verdana, sans-serif;
  background: #f4f6f8;
  color: #1f2937;
}

.topbar {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 20px 28px;
  background: #0f172a;
  color: #f8fafc;
}

.container {
  padding: 20px 28px 28px 28px;
}

.login {
  max-width: 520px;
  margin: 60px auto;
}

.card {
  background: #ffffff;
  border-radius: 10px;
  box-shadow: 0 1px 3px rgba(0, 0, 0, 0.12);
  padding: 16px;
  margin-bottom: 16px;
}

.stats {
  display: grid;
  grid-template-columns: repeat(4, minmax(0, 1fr));
  gap: 12px;
  margin-bottom: 16px;
}

.stat {
  background: #ffffff;
  border-radius: 10px;
  padding: 12px;
  box-shadow: 0 1px 3px rgba(0, 0, 0, 0.12);
}

.stat .label {
  font-size: 12px;
  color: #64748b;
}

.stat .value {
  font-size: 24px;
  font-weight: 700;
}

.grid.two {
  display: grid;
  grid-template-columns: 1fr 1fr;
  gap: 16px;
}

.stack {
  display: grid;
  gap: 10px;
}

.row {
  display: flex;
  gap: 8px;
  align-items: center;
}

.filter-row {
  margin-bottom: 10px;
}

.inline {
  display: flex;
  gap: 6px;
  align-items: center;
}

table {
  width: 100%;
  border-collapse: collapse;
}

th, td {
  text-align: left;
  padding: 8px;
  border-bottom: 1px solid #e5e7eb;
  vertical-align: top;
}

input, select, button {
  padding: 8px;
  border-radius: 6px;
  border: 1px solid #cbd5e1;
  font: inherit;
}

button {
  background: #0ea5e9;
  border-color: #0ea5e9;
  color: white;
  cursor: pointer;
}

button.secondary {
  background: #64748b;
  border-color: #64748b;
}

.badge {
  display: inline-block;
  padding: 2px 8px;
  border-radius: 999px;
  background: #e2e8f0;
  font-size: 12px;
}

.badge.ok {
  background: #dcfce7;
  color: #166534;
}

.badge.bad {
  background: #fee2e2;
  color: #991b1b;
}

.alert {
  margin-bottom: 12px;
  padding: 10px;
  border-radius: 8px;
  background: #fee2e2;
  color: #991b1b;
}

.flash {
  font-weight: 600;
}

.flash-success {
  border-left: 4px solid #10b981;
}

.flash-error {
  border-left: 4px solid #ef4444;
}

.subtle {
  color: #64748b;
  font-size: 12px;
  margin-top: 4px;
}

.pre {
  white-space: pre-wrap;
  line-height: 1.35;
  max-height: 220px;
  overflow: auto;
  background: #f8fafc;
  border: 1px solid #e2e8f0;
  border-radius: 8px;
  padding: 8px;
  margin-top: 6px;
}

.linkbtn {
  display: inline-block;
  padding: 8px 10px;
  border-radius: 6px;
  text-decoration: none;
  border: 1px solid #cbd5e1;
  color: #334155;
  background: #f8fafc;
}

@media (max-width: 920px) {
  .stats {
    grid-template-columns: repeat(2, minmax(0, 1fr));
  }
  .grid.two {
    grid-template-columns: 1fr;
  }
}
|
||||
235
backend/templates/admin_dashboard.html
Normal file
|
|
@ -0,0 +1,235 @@
|
|||
<!doctype html>
<html lang="de">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>{{ title }}</title>
  <link rel="stylesheet" href="/admin/static/admin.css" />
</head>
<body>
  <header class="topbar">
    <div>
      <h1>rss-news Admin Dashboard</h1>
      <p>Angemeldet als <strong>{{ user }}</strong></p>
    </div>
    <form method="post" action="/admin/logout">
      <button type="submit" class="secondary">Logout</button>
    </form>
  </header>

  <main class="container">
    {% if flash_msg %}
    <section class="card flash {{ 'flash-error' if flash_type == 'error' else 'flash-success' }}">
      {{ flash_msg }}
    </section>
    {% endif %}

    <section class="stats">
      <article class="stat">
        <div class="label">Quellen</div>
        <div class="value">{{ sources|length }}</div>
      </article>
      <article class="stat">
        <div class="label">Feeds</div>
        <div class="value">{{ feeds|length }}</div>
      </article>
      <article class="stat">
        <div class="label">Artikel</div>
        <div class="value">{{ articles|length }}</div>
      </article>
      <article class="stat">
        <div class="label">Runs</div>
        <div class="value">{{ runs|length }}</div>
      </article>
    </section>

    <section class="grid two">
      <article class="card">
        <h2>Quelle anlegen</h2>
        <form method="post" action="/admin/sources/create" class="stack">
          <input name="name" placeholder="Name" required />
          <input name="base_url" placeholder="Base URL" />
          <input name="terms_url" placeholder="Terms URL" />
          <input name="license_name" placeholder="Lizenzname" />
          <select name="risk_level">
            <option value="green">green</option>
            <option value="yellow" selected>yellow</option>
            <option value="red">red</option>
          </select>
          <input name="last_reviewed_at" placeholder="last_reviewed_at (ISO)" />
          <button type="submit">Quelle speichern</button>
        </form>
      </article>

      <article class="card">
        <h2>Feed anlegen</h2>
        <form method="post" action="/admin/feeds/create" class="stack">
          <input name="name" placeholder="Feed Name" required />
          <input name="url" placeholder="https://..." required />
          <label>Quelle</label>
          <select name="source_id">
            <option value="">-- keine --</option>
            {% for s in sources %}
            <option value="{{ s.id }}">{{ s.name }} (#{{ s.id }})</option>
            {% endfor %}
          </select>
          <button type="submit">Feed speichern</button>
        </form>
      </article>
    </section>

    <section class="card">
      <h2>Ingestion starten</h2>
      <form method="post" action="/admin/ingestion/run" class="row">
        <select name="feed_id">
          <option value="">Alle aktivierten Feeds</option>
          {% for f in feeds %}
          <option value="{{ f.id }}">{{ f.name }} (#{{ f.id }})</option>
          {% endfor %}
        </select>
        <button type="submit">Ingestion starten</button>
      </form>
    </section>

    <section class="card">
      <h2>Quellen + Policy</h2>
      <table>
        <thead>
          <tr><th>ID</th><th>Name</th><th>Risk</th><th>Lizenz</th><th>Terms</th><th>Policy</th></tr>
        </thead>
        <tbody>
          {% for s in sources %}
          <tr>
            <td>{{ s.id }}</td>
            <td>{{ s.name }}</td>
            <td>{{ s.risk_level }}</td>
            <td>{{ s.license_name or "-" }}</td>
            <td>{{ s.terms_url or "-" }}</td>
            <td>
              {% if source_policy[s.id] %}
              <span class="badge bad">BLOCKED ({{ source_policy[s.id]|length }})</span>
              <div class="subtle">{{ source_policy[s.id]|join(", ") }}</div>
              {% else %}
              <span class="badge ok">OK</span>
              {% endif %}
            </td>
          </tr>
          {% endfor %}
        </tbody>
      </table>
    </section>

    <section class="card">
      <h2>Artikel (Review)</h2>
      <form method="get" action="/admin/dashboard" class="row filter-row">
        <label>Status-Filter</label>
        <select name="status_filter">
          <option value="" {% if not status_filter %}selected{% endif %}>alle</option>
          {% for s in status_options %}
          <option value="{{ s }}" {% if status_filter == s %}selected{% endif %}>{{ s }}</option>
          {% endfor %}
        </select>
        <button type="submit" class="secondary">Filtern</button>
        <a href="/admin/dashboard" class="linkbtn">Reset</a>
      </form>
      <table>
        <thead>
          <tr><th>ID</th><th>Artikel</th><th>Status</th><th>Details</th><th>Review</th><th>Transition</th></tr>
        </thead>
        <tbody>
          {% for a in articles %}
          <tr>
            <td>{{ a.id }}</td>
            <td>
              <strong>{{ a.title }}</strong><br />
              <span class="subtle">Autor: {{ a.author or "-" }}</span><br />
              <a href="{{ a.source_url }}" target="_blank" rel="noopener">Original öffnen</a>
              {% if a.canonical_url and a.canonical_url != a.source_url %}
              <br /><a href="{{ a.canonical_url }}" target="_blank" rel="noopener">Canonical öffnen</a>
              {% endif %}
            </td>
            <td><span class="badge">{{ a.status }}</span></td>
            <td>
              {% if a.summary %}
              <div><strong>Summary:</strong> {{ a.summary }}</div>
              {% endif %}
              {% if a.content_raw %}
              <details>
                <summary>Volltext anzeigen</summary>
                <div class="pre">{{ a.content_raw }}</div>
              </details>
              {% endif %}
              <div class="subtle">Bilder: {{ a.extracted_images|length }}</div>
              {% if a.extracted_images %}
              <details>
                <summary>Bild-URLs</summary>
                <ul>
                  {% for img in a.extracted_images %}
                  <li><a href="{{ img }}" target="_blank" rel="noopener">{{ img }}</a></li>
                  {% endfor %}
                </ul>
              </details>
              {% endif %}
              {% if a.press_contact %}
              <details>
                <summary>Pressekontakt</summary>
                <div class="pre">{{ a.press_contact }}</div>
              </details>
              {% endif %}
              {% if a.extraction_error %}
              <div class="subtle">Extraktionsfehler: {{ a.extraction_error }}</div>
              {% endif %}
            </td>
            <td>
              {% if a.status == "review" %}
              <form method="post" action="/admin/articles/{{ a.id }}/review" class="inline">
                <input name="note" placeholder="Notiz" />
                <button name="decision" value="approve" type="submit">Approve</button>
                <button name="decision" value="reject" type="submit" class="secondary">Reject</button>
              </form>
              {% else %}
              -
              {% endif %}
            </td>
            <td>
              <form method="post" action="/admin/articles/{{ a.id }}/transition" class="inline">
                <select name="target_status">
                  {% for s in allowed_transitions.get(a.status, []) %}
                  <option value="{{ s }}">{{ s }}</option>
                  {% endfor %}
                </select>
                {% if allowed_transitions.get(a.status, []) %}
                <button type="submit" class="secondary">Setzen</button>
                {% else %}
                <span class="subtle">keine Aktion</span>
                {% endif %}
              </form>
            </td>
          </tr>
          {% endfor %}
        </tbody>
      </table>
    </section>

    <section class="card">
      <h2>Runs</h2>
      <table>
        <thead>
          <tr><th>ID</th><th>Typ</th><th>Status</th><th>Start</th><th>Ende</th></tr>
        </thead>
        <tbody>
          {% for r in runs %}
          <tr>
            <td>{{ r.id }}</td>
            <td>{{ r.run_type }}</td>
            <td>{{ r.status }}</td>
            <td>{{ r.started_at }}</td>
            <td>{{ r.finished_at or "-" }}</td>
          </tr>
          {% endfor %}
        </tbody>
      </table>
    </section>
  </main>
</body>
</html>
|
||||
27
backend/templates/admin_login.html
Normal file
|
|
@ -0,0 +1,27 @@
|
|||
<!doctype html>
<html lang="de">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>{{ title }}</title>
  <link rel="stylesheet" href="/admin/static/admin.css" />
</head>
<body>
  <main class="container login">
    <h1>rss-news Admin</h1>
    <p>Bitte anmelden, um das Tool zu verwalten.</p>
    {% if error %}
    <div class="alert">Login fehlgeschlagen. Bitte pruefen.</div>
    {% endif %}
    <form method="post" action="/admin/login" class="card">
      <label>Benutzername
        <input type="text" name="username" required />
      </label>
      <label>Passwort
        <input type="password" name="password" required />
      </label>
      <button type="submit">Anmelden</button>
    </form>
  </main>
</body>
</html>
|
||||
1
backend/tests/__init__.py
Normal file
|
|
@ -0,0 +1 @@
|
|||
"""Tests package."""
|
||||
65
backend/tests/test_admin_ui.py
Normal file
|
|
@ -0,0 +1,65 @@
|
|||
import os
import tempfile
import unittest
from pathlib import Path

from fastapi.testclient import TestClient

from backend.app import config as config_module
from backend.app.db import init_db
from backend.app.main import app


class TestAdminUi(unittest.TestCase):
    def setUp(self) -> None:
        self.tmp_dir = tempfile.TemporaryDirectory()
        os.environ["APP_DB_PATH"] = str(Path(self.tmp_dir.name) / "admin_ui.db")
        os.environ["APP_ADMIN_USERNAME"] = "admin"
        os.environ["APP_ADMIN_PASSWORD"] = "secret"
        config_module.get_settings.cache_clear()
        init_db()
        self.client = TestClient(app)

    def tearDown(self) -> None:
        config_module.get_settings.cache_clear()
        os.environ.pop("APP_DB_PATH", None)
        os.environ.pop("APP_ADMIN_USERNAME", None)
        os.environ.pop("APP_ADMIN_PASSWORD", None)
        self.tmp_dir.cleanup()

    def test_admin_login_and_dashboard(self) -> None:
        login_page = self.client.get("/admin/login")
        self.assertEqual(login_page.status_code, 200)
        self.assertIn("rss-news Admin", login_page.text)

        login = self.client.post(
            "/admin/login",
            data={"username": "admin", "password": "secret"},
            follow_redirects=True,
        )
        self.assertEqual(login.status_code, 200)
        self.assertIn("Admin Dashboard", login.text)

    def test_dashboard_redirects_if_not_logged_in(self) -> None:
        res = self.client.get("/admin/dashboard", follow_redirects=False)
        self.assertEqual(res.status_code, 303)
        self.assertEqual(res.headers.get("location"), "/admin/login")

    def test_create_feed_with_empty_source_id_does_not_error(self) -> None:
        self.client.post(
            "/admin/login",
            data={"username": "admin", "password": "secret"},
            follow_redirects=True,
        )
        # empty source_id used to cause validation issues in form parsing
        res = self.client.post(
            "/admin/feeds/create",
            data={"name": "Feed X", "url": "https://example.org/feed.xml", "source_id": ""},
            follow_redirects=False,
        )
        self.assertEqual(res.status_code, 303)
        self.assertTrue(res.headers.get("location", "").startswith("/admin/dashboard"))


if __name__ == "__main__":
    unittest.main()
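
The test classes in this commit set environment variables in `setUp` and pop them in `tearDown`. A possible variation, sketched here with the standard library only, uses `unittest.mock.patch.dict` so that values that existed before the test are restored as well. `EnvIsolatedTest` is a hypothetical class name, and the sketch leaves out the settings-cache reset and database setup that the real tests perform.

```python
import os
import unittest
from unittest import mock


class EnvIsolatedTest(unittest.TestCase):
    def setUp(self):
        # patch.dict puts os.environ back into its previous state on stop(),
        # including variables that were already set before the test ran.
        patcher = mock.patch.dict(
            os.environ,
            {"APP_ADMIN_USERNAME": "admin", "APP_ADMIN_PASSWORD": "secret"},
        )
        patcher.start()
        self.addCleanup(patcher.stop)

    def test_env_is_visible_inside_test(self):
        self.assertEqual(os.environ["APP_ADMIN_USERNAME"], "admin")


before = os.environ.get("APP_ADMIN_USERNAME")
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EnvIsolatedTest)
result = unittest.TestResult()
suite.run(result)
after = os.environ.get("APP_ADMIN_USERNAME")
```

Registering the stop call through `addCleanup` means the environment is restored even when `setUp` fails partway through.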
|
||||
77
backend/tests/test_api_auth.py
Normal file
|
|
@ -0,0 +1,77 @@
|
|||
import os
import tempfile
import unittest
from pathlib import Path

from fastapi.testclient import TestClient

from backend.app import config as config_module
from backend.app.db import init_db
from backend.app.main import app


class TestApiAuth(unittest.TestCase):
    def setUp(self) -> None:
        self.tmp_dir = tempfile.TemporaryDirectory()
        os.environ["APP_DB_PATH"] = str(Path(self.tmp_dir.name) / "api.db")
        os.environ["APP_ADMIN_USERNAME"] = "admin"
        os.environ["APP_ADMIN_PASSWORD"] = "secret"
        config_module.get_settings.cache_clear()
        init_db()
        self.client = TestClient(app)

    def tearDown(self) -> None:
        config_module.get_settings.cache_clear()
        os.environ.pop("APP_DB_PATH", None)
        os.environ.pop("APP_ADMIN_USERNAME", None)
        os.environ.pop("APP_ADMIN_PASSWORD", None)
        self.tmp_dir.cleanup()

    def test_login_and_protected_endpoint(self) -> None:
        r = self.client.post("/auth/login", json={"username": "admin", "password": "secret"})
        self.assertEqual(r.status_code, 200)

        p = self.client.get("/api/protected")
        self.assertEqual(p.status_code, 200)
        self.assertTrue(p.json().get("ok"))

    def test_protected_requires_auth(self) -> None:
        r = self.client.get("/api/protected")
        self.assertEqual(r.status_code, 401)

    def test_run_detail_endpoint(self) -> None:
        login = self.client.post("/auth/login", json={"username": "admin", "password": "secret"})
        self.assertEqual(login.status_code, 200)

        created = self.client.post("/api/runs", json={"run_type": "ingestion", "status": "running"})
        self.assertEqual(created.status_code, 200)
        run_id = created.json()["id"]

        detail = self.client.get(f"/api/runs/{run_id}")
        self.assertEqual(detail.status_code, 200)
        self.assertEqual(detail.json()["item"]["id"], run_id)

    def test_source_policy_check_endpoint(self) -> None:
        login = self.client.post("/auth/login", json={"username": "admin", "password": "secret"})
        self.assertEqual(login.status_code, 200)

        created = self.client.post(
            "/api/sources",
            json={
                "name": "Policy Source",
                "risk_level": "yellow",
                "is_enabled": True,
            },
        )
        self.assertEqual(created.status_code, 200)
        source_id = created.json()["id"]

        check = self.client.get(f"/api/sources/{source_id}/policy-check")
        self.assertEqual(check.status_code, 200)
        body = check.json()
        self.assertFalse(body["allowed"])
        self.assertGreaterEqual(len(body["issues"]), 1)


if __name__ == "__main__":
    unittest.main()
|
||||
95
backend/tests/test_article_workflow.py
Normal file
@ -0,0 +1,95 @@
import os
import tempfile
import unittest
from pathlib import Path

from fastapi.testclient import TestClient

from backend.app import config as config_module
from backend.app.db import init_db
from backend.app.main import app


class TestArticleWorkflow(unittest.TestCase):
    def setUp(self) -> None:
        self.tmp_dir = tempfile.TemporaryDirectory()
        os.environ["APP_DB_PATH"] = str(Path(self.tmp_dir.name) / "workflow.db")
        os.environ["APP_ADMIN_USERNAME"] = "admin"
        os.environ["APP_ADMIN_PASSWORD"] = "secret"
        config_module.get_settings.cache_clear()
        init_db()
        self.client = TestClient(app)
        self.client.post("/auth/login", json={"username": "admin", "password": "secret"})

    def tearDown(self) -> None:
        config_module.get_settings.cache_clear()
        os.environ.pop("APP_DB_PATH", None)
        os.environ.pop("APP_ADMIN_USERNAME", None)
        os.environ.pop("APP_ADMIN_PASSWORD", None)
        self.tmp_dir.cleanup()

    def _create_article(self) -> int:
        source = self.client.post(
            "/api/sources",
            json={
                "name": "Workflow Source",
                "base_url": "https://example.org",
                "terms_url": "https://example.org/terms",
                "license_name": "cc-by",
                "risk_level": "green",
                "is_enabled": True,
                "last_reviewed_at": "2026-02-18T00:00:00Z",
            },
        )
        source_id = source.json()["id"]

        feed = self.client.post(
            "/api/feeds",
            json={"name": "Workflow Feed", "url": "https://example.org/feed.xml", "source_id": source_id, "is_enabled": True},
        )
        feed_id = feed.json()["id"]

        article = self.client.post(
            "/api/articles/upsert",
            json={
                "feed_id": feed_id,
                "source_article_id": "wf-1",
                "source_url": "https://example.org/a1",
                "title": "Workflow Artikel",
                "summary": "s",
                "content_raw": "c",
                "status": "new",
            },
        )
        return article.json()["id"]

    def test_valid_transition_chain(self) -> None:
        article_id = self._create_article()

        t1 = self.client.post(f"/api/articles/{article_id}/transition", json={"target_status": "review"})
        self.assertEqual(t1.status_code, 200)

        r1 = self.client.post(f"/api/articles/{article_id}/review", json={"decision": "approve", "note": "ok"})
        self.assertEqual(r1.status_code, 200)
        self.assertEqual(r1.json()["to_status"], "approved")

        t2 = self.client.post(f"/api/articles/{article_id}/transition", json={"target_status": "published"})
        self.assertEqual(t2.status_code, 200)

        final = self.client.get(f"/api/articles/{article_id}")
        self.assertEqual(final.status_code, 200)
        self.assertEqual(final.json()["item"]["status"], "published")

    def test_invalid_transition_rejected(self) -> None:
        article_id = self._create_article()
        bad = self.client.post(f"/api/articles/{article_id}/transition", json={"target_status": "published"})
        self.assertEqual(bad.status_code, 400)

    def test_review_only_allowed_in_review_status(self) -> None:
        article_id = self._create_article()
        bad = self.client.post(f"/api/articles/{article_id}/review", json={"decision": "approve"})
        self.assertEqual(bad.status_code, 400)


if __name__ == "__main__":
    unittest.main()
119
backend/tests/test_db_repositories.py
Normal file
@ -0,0 +1,119 @@
import os
import tempfile
import unittest
from pathlib import Path

from backend.app import config as config_module
from backend.app.db import init_db
from backend.app.repositories import (
    ArticleUpsert,
    FeedCreate,
    RunCreate,
    SourceCreate,
    create_feed,
    create_run,
    create_source,
    finish_run,
    list_articles,
    list_feeds,
    list_runs,
    list_sources,
    upsert_article,
)


class TestSQLiteRepositories(unittest.TestCase):
    def setUp(self) -> None:
        self.tmp_dir = tempfile.TemporaryDirectory()
        self.db_path = str(Path(self.tmp_dir.name) / "test.db")
        os.environ["APP_DB_PATH"] = self.db_path
        config_module.get_settings.cache_clear()
        init_db()

    def tearDown(self) -> None:
        config_module.get_settings.cache_clear()
        os.environ.pop("APP_DB_PATH", None)
        self.tmp_dir.cleanup()

    def test_end_to_end_basic_crud(self) -> None:
        source_id = create_source(
            SourceCreate(
                name="GovData",
                base_url="https://data.gov.de",
                terms_url="https://www.govdata.de/dl-de/by-2-0",
                license_name="dl-de/by-2-0",
                risk_level="green",
                is_enabled=True,
                notes="test source",
                last_reviewed_at="2026-02-18T00:00:00Z",
            )
        )
        self.assertGreater(source_id, 0)

        feed_id = create_feed(
            FeedCreate(
                name="GovData RSS",
                url="https://example.org/feed.xml",
                source_id=source_id,
                is_enabled=True,
            )
        )
        self.assertGreater(feed_id, 0)

        run_id = create_run(RunCreate(run_type="ingest", status="running", details="start"))
        self.assertGreater(run_id, 0)
        finish_run(run_id=run_id, status="success", details="ok")

        article_id = upsert_article(
            ArticleUpsert(
                feed_id=feed_id,
                source_article_id="abc-1",
                source_hash="hash-abc-1",
                title="Beispielartikel",
                source_url="https://example.org/articles/1",
                canonical_url="https://example.org/articles/1",
                published_at="2026-02-18T00:00:00Z",
                author="Max Mustermann",
                summary="Kurzfassung",
                content_raw="Originaltext",
                content_rewritten="Umschreibung",
                word_count=120,
                status="review",
                meta_json='{"lang":"de"}',
            )
        )
        self.assertGreater(article_id, 0)

        # Upsert with same source_url updates same row
        article_id_2 = upsert_article(
            ArticleUpsert(
                feed_id=feed_id,
                source_article_id="abc-1",
                source_hash="hash-abc-1",
                title="Beispielartikel aktualisiert",
                source_url="https://example.org/articles/1",
                canonical_url="https://example.org/articles/1",
                published_at="2026-02-18T00:00:00Z",
                author="Max Mustermann",
                summary="Kurzfassung 2",
                content_raw="Originaltext 2",
                content_rewritten="Umschreibung 2",
                word_count=140,
                status="approved",
                meta_json='{"lang":"de","v":2}',
            )
        )
        self.assertEqual(article_id, article_id_2)

        self.assertEqual(len(list_sources()), 1)
        self.assertEqual(len(list_feeds()), 1)
        self.assertEqual(len(list_runs()), 1)

        articles = list_articles()
        self.assertEqual(len(articles), 1)
        self.assertEqual(articles[0]["title"], "Beispielartikel aktualisiert")
        self.assertEqual(articles[0]["status"], "approved")


if __name__ == "__main__":
    unittest.main()
122
backend/tests/test_ingestion.py
Normal file
@ -0,0 +1,122 @@
import os
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch

from backend.app import config as config_module
from backend.app.db import init_db
from backend.app.ingestion import run_ingestion
from backend.app.repositories import FeedCreate, SourceCreate, create_feed, create_source, list_articles
from backend.app.source_extraction import ExtractedArticle


class TestIngestion(unittest.TestCase):
    def setUp(self) -> None:
        self.tmp_dir = tempfile.TemporaryDirectory()
        os.environ["APP_DB_PATH"] = str(Path(self.tmp_dir.name) / "ingestion.db")
        config_module.get_settings.cache_clear()
        init_db()

        source_id = create_source(
            SourceCreate(
                name="Test Source",
                base_url="https://example.org",
                terms_url="https://example.org/terms",
                license_name="cc-by",
                risk_level="green",
                is_enabled=True,
                notes=None,
                last_reviewed_at="2026-02-18T00:00:00Z",
            )
        )
        self.feed_id = create_feed(
            FeedCreate(
                name="Test Feed",
                url="https://example.org/feed.xml",
                source_id=source_id,
                is_enabled=True,
            )
        )

    def tearDown(self) -> None:
        config_module.get_settings.cache_clear()
        os.environ.pop("APP_DB_PATH", None)
        self.tmp_dir.cleanup()

    @patch("backend.app.ingestion.extract_article")
    @patch("backend.app.ingestion.feedparser.parse")
    def test_ingestion_deduplicates_by_feed_and_guid(self, mock_parse, mock_extract_article) -> None:
        mock_extract_article.return_value = ExtractedArticle(
            title="Artikel 1 original",
            author="Autorin A",
            canonical_url="https://example.org/article/1",
            summary="Original Summary",
            content_text="Original Volltext",
            images=["https://example.org/a.jpg"],
            press_contact="Pressekontakt: Team A",
            extraction_error=None,
        )
        mock_parse.return_value = {
            "etag": "etag-1",
            "modified": "Tue, 18 Feb 2026 10:00:00 GMT",
            "entries": [
                {
                    "id": "item-1",
                    "title": "Artikel 1",
                    "link": "https://example.org/article/1",
                    "summary": "A",
                },
                {
                    "id": "item-1",
                    "title": "Artikel 1 aktualisiert",
                    "link": "https://example.org/article/1-neu",
                    "summary": "B",
                },
            ],
        }

        stats = run_ingestion(feed_id=self.feed_id)
        self.assertEqual(stats.status, "success")
        self.assertEqual(stats.entries_seen, 2)
        self.assertEqual(len(list_articles()), 1)
        article = list_articles()[0]
        self.assertEqual(article["title"], "Artikel 1 original")
        self.assertEqual(article["author"], "Autorin A")
        self.assertIn("Original Volltext", article["content_raw"] or "")
        self.assertIn("Pressekontakt", article["meta_json"] or "")

    @patch("backend.app.ingestion.extract_article")
    @patch("backend.app.ingestion.feedparser.parse")
    def test_ingestion_blocks_non_green_source(self, mock_parse, mock_extract_article) -> None:
        # Re-create source/feed with yellow risk to verify enforcement
        source_id = create_source(
            SourceCreate(
                name="Blocked Source",
                base_url="https://example.net",
                terms_url="https://example.net/terms",
                license_name="custom",
                risk_level="yellow",
                is_enabled=True,
                notes=None,
                last_reviewed_at="2026-02-18T00:00:00Z",
            )
        )
        blocked_feed_id = create_feed(
            FeedCreate(
                name="Blocked Feed",
                url="https://example.net/feed.xml",
                source_id=source_id,
                is_enabled=True,
            )
        )

        stats = run_ingestion(feed_id=blocked_feed_id)
        self.assertEqual(stats.status, "success")
        self.assertEqual(stats.articles_upserted, 0)
        mock_parse.assert_not_called()
        mock_extract_article.assert_not_called()


if __name__ == "__main__":
    unittest.main()
69
backend/tests/test_source_extraction.py
Normal file
@ -0,0 +1,69 @@
import unittest
from unittest.mock import patch

from backend.app.source_extraction import extract_article


SAMPLE_HTML = """
<!doctype html>
<html lang="de">
  <head>
    <meta charset="utf-8" />
    <meta property="og:title" content="Demo Meldung von Presseportal" />
    <meta name="author" content="Max Mustermann" />
    <meta name="description" content="Kurzbeschreibung aus der Originalseite" />
    <meta property="og:image" content="/images/demo.jpg" />
    <link rel="canonical" href="https://www.presseportal.de/pm/118273/6158137" />
  </head>
  <body>
    <article>
      <p>Dies ist der vollstaendige Inhalt des Artikels.</p>
      <p>Weitere relevante Informationen fuer die Meldung.</p>
      <h3>Pressekontakt</h3>
      <p>Musterfirma GmbH, Kontakt: presse@example.org</p>
    </article>
  </body>
</html>
"""


class _FakeHeaders:
    @staticmethod
    def get_content_charset():
        return "utf-8"


class _FakeResponse:
    headers = _FakeHeaders()

    def __init__(self, body: str):
        self._body = body.encode("utf-8")

    def read(self):
        return self._body

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        return False


class TestSourceExtraction(unittest.TestCase):
    @patch("backend.app.source_extraction.urlopen")
    def test_extract_article_parses_author_images_and_press_contact(self, mock_urlopen) -> None:
        mock_urlopen.return_value = _FakeResponse(SAMPLE_HTML)

        extracted = extract_article("https://www.presseportal.de/pm/118273/6158137")
        self.assertEqual(extracted.title, "Demo Meldung von Presseportal")
        self.assertEqual(extracted.author, "Max Mustermann")
        self.assertEqual(extracted.canonical_url, "https://www.presseportal.de/pm/118273/6158137")
        self.assertIn("vollstaendige Inhalt", extracted.content_text or "")
        self.assertIn("Kurzbeschreibung", extracted.summary or "")
        self.assertIn("https://www.presseportal.de/images/demo.jpg", extracted.images)
        self.assertIn("Pressekontakt", extracted.press_contact or "")
        self.assertIsNone(extracted.extraction_error)


if __name__ == "__main__":
    unittest.main()
67
docs/PROJECT_PLAN.md
Normal file
@ -0,0 +1,67 @@
# Project Plan (Restart)

## Guiding Decisions
- The existing repository continues to be used.
- No hard deadline: get it running first, then improve iteratively.
- Hetzner remains the runtime platform.
- WordPress (IONOS) remains the publication target for now.
- Auth initially uses a single user/password only.

## Target Picture
A modular news pipeline with clear stages:
1. Feed ingestion
2. Content analysis and normalization
3. Rewrite/enrichment
4. Legal and quality checks
5. WordPress publication (`pending`)
6. Monitoring/logging

## Rough Schedule (no fixed dates)
- Phase 0: approx. 1 week
- Phase 1: approx. 2-4 weeks
- Phase 2: approx. 2-3 weeks
- Phase 3: ongoing

## Phases

### Phase 0 - Foundations (now)
- Structure docs and wiki
- Define the source policy
- Set up the redirect for `news.vanityontour.de`
- Get the GitHub Project ready as the central planning tool

### Phase 1 - MVP Core
- New FastAPI project skeleton
- SQLite data model (feeds, articles, runs, source_policy)
- Feed import with duplicate detection
- Admin login (one user)
- Manual review before publish

### Phase 2 - Automation
- Job queue (asynchronous)
- Rule-based scheduler
- Retry/dead-letter handling
- Robust error reporting

### Phase 3 - Compliance and Scaling
- Source whitelisting with mandatory fields
- Mandatory attribution per article
- Quality metrics and audit logs
- Optional: Passkey/WebAuthn

## Architecture Principles
- Idempotent jobs
- Separation of UI, API, worker
- Strict validation of source/license data
- Explicit publish step, no blind autoposting
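
The explicit publish step could be enforced with a small transition table. This is a minimal sketch, not the project's implementation: the status names `new`, `review`, `approved`, and `published` appear in the tests of this commit, while the `rejected` status, the exact set of allowed transitions, and the helper names `can_transition` and `transition` are assumptions for illustration.

```python
# Assumed transition map: each status lists the statuses it may move to.
ALLOWED_TRANSITIONS = {
    "new": {"review"},
    "review": {"approved", "rejected"},  # "rejected" is a hypothetical status
    "approved": {"published"},
    "rejected": set(),
    "published": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def transition(current: str, target: str) -> str:
    """Return the new status or raise ValueError for a forbidden move."""
    if not can_transition(current, target):
        raise ValueError(f"transition {current!r} -> {target!r} is not allowed")
    return target
```

With a table like this, an article can never jump from `new` straight to `published`; the API layer would translate the `ValueError` into a client error.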

## Risks
- License terms and terms of use vary widely between sources
- Feeds change structure/availability
- WordPress API and auth can be prone to regressions

## Success Metrics
- Time from feed arrival to review-ready
- Share of cleanly attributed articles
- Error rate per pipeline stage
- Number of manual interventions per week
81
docs/SOURCE_POLICY.md
Normal file
@ -0,0 +1,81 @@
# Source Policy and Feed Suggestions

## Principle
Only sources are used whose terms of use permit the planned use or for which explicit permission has been granted.

## Mandatory Data per Source
- Source name
- Feed URL
- Original article URL
- Author/publisher (if available)
- License/basis for use
- Restrictions (commercial, modification, image rights, archiving)
- Date of last review
- Link to terms of use

## Classification (Traffic Light)
- Green: use for the planned model is clearly permitted
- Yellow: partly clear/with restrictions, manual review required
- Red: not suitable for the model without an additional agreement

## Binding Rules
- No new source without an entry in the source register
- No automatic publish for yellow/red
- Check images separately (text rights != image rights)
- Quarterly re-check of the terms
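
The rules above could be checked in code roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the `Source` dataclass, the function `check_source_policy`, and the 92-day approximation of "quarterly" are illustrative; only the field names (`terms_url`, `license_name`, `risk_level`, `last_reviewed_at`) and the `allowed`/`issues` result shape mirror what the tests in this commit use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "quarterly" is approximated as 92 days.
REVIEW_MAX_AGE = timedelta(days=92)


@dataclass
class Source:
    """Hypothetical in-memory view of one source register entry."""
    name: str
    risk_level: str  # "green", "yellow" or "red"
    terms_url: Optional[str] = None
    license_name: Optional[str] = None
    last_reviewed_at: Optional[str] = None  # ISO 8601, e.g. "2026-02-18T00:00:00Z"


def check_source_policy(source: Source, now: Optional[datetime] = None) -> dict:
    """Return {"allowed": bool, "issues": [...]} for automated processing."""
    now = now or datetime.now(timezone.utc)
    issues = []

    if source.risk_level != "green":
        issues.append(f"risk level is {source.risk_level!r}, manual review required")
    if not source.terms_url:
        issues.append("terms link missing")
    if not source.license_name:
        issues.append("license missing")

    if not source.last_reviewed_at:
        issues.append("no review date recorded")
    else:
        # fromisoformat() before Python 3.11 rejects a trailing "Z".
        reviewed = datetime.fromisoformat(source.last_reviewed_at.replace("Z", "+00:00"))
        if reviewed.tzinfo is None:
            # Assumption: timestamps without an offset are UTC.
            reviewed = reviewed.replace(tzinfo=timezone.utc)
        if now - reviewed > REVIEW_MAX_AGE:
            issues.append("terms review is older than one quarter")

    return {"allowed": not issues, "issues": issues}
```

Collecting every issue instead of stopping at the first one gives an admin UI enough detail to show what is still missing before a source can be enabled.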

## Initial Assessment (as of 16 Feb 2026)

### Red
1. Reuters / Thomson Reuters
   - Reason: Content is protected by copyright; per the terms, reproduction/distribution is allowed only with prior consent.
   - Consequence: Only with an explicit contract/license.
   - Reference:
     - https://www.thomsonreuters.com/en/terms-of-use

2. tagesschau.de RSS
   - Reason: Content is for private/non-commercial use only; publication is generally not permitted (unless explicitly CC-licensed).
   - Consequence: Not suitable for the planned model.
   - Reference:
     - https://www.tagesschau.de/infoservices/rssfeeds

### Yellow
1. Presseportal / ots
   - Reason: Editorial use is generally possible, but responsibility lies with the user; business use beyond that requires permission.
   - Consequence: Only with strict case-by-case review per release (esp. image/third-party rights).
   - Reference:
     - https://www.presseportal.de/nutzungsbedingungen
     - https://www.presseportal.de/feeds/

2. Federal agency RSS feeds without an explicit free-reuse license
   - Reason: RSS is provided, but not always worded as an open license for commercial reuse.
   - Consequence: Review and document each agency individually.
   - Examples:
     - https://www.bundesfinanzministerium.de/Content/DE/Standardartikel/Service/rss_base.html
     - https://bmas.bund.de/EN/Services/RSS/rss.html

### Green (with correct attribution)
1. GovData / open data portals with `dl-de/by-2-0`, `dl-de/zero-2-0`, `CC BY 4.0` or `CC0`
   - Reason: These licenses generally also permit commercial reuse (subject to the license conditions).
   - Consequence: Very well suited for stable automation.
   - Reference:
     - https://www.govdata.de/dl-de/by-2-0
     - https://data.gov.de/informationen/lizenzen
     - https://www.dcat-ap.de/def/licenses/dl-zero-de/2.0

2. EU sources with an explicit `CC BY 4.0` reuse rule
   - Reason: EU content is often reusable under CC BY 4.0 unless marked otherwise.
   - Consequence: Suitable if third-party content is excluded.
   - Reference:
     - https://commission.europa.eu/legal-notice_en
     - https://eur-lex.europa.eu/content/help/content/legal-notice/legal-notice.html

## Enabling a Source in the Register (Definition of Done)
- Terms link recorded
- License class (green/yellow/red) set
- Mandatory attribution documented
- Image rights rule documented
- Last review and responsible person maintained

## Note
Not legal advice. For unclear or commercially critical sources, a legal review is advisable.
33
docs/TODO.md
Normal file
@ -0,0 +1,33 @@
# ToDo (Single-Developer Setup)

## Now
- [ ] Unify GitHub Project #3 fields/views for the restart
- [ ] Flag old/obsolete issues (e.g. user management)
- [ ] Keep the redirect `news.vanityontour.de -> vanityontour.de` active
- [ ] Finish and link the wiki base

## MVP
- [x] Set up the new backend skeleton (`backend/`) (FastAPI)
- [x] Create the data model in SQLite
- [x] Build the feed ingestion service (ETag/Last-Modified)
- [x] Duplicate detection via `source_url`, `guid`, hash
- [x] Implement login with 1 admin account
- [ ] Article review screen with status workflow
- [ ] Implement the WordPress publisher as a separate service

## Legal/Quality
- [ ] Map the source policy into DB + admin UI
- [ ] Enforce mandatory fields per source (author, URL, license, notes)
- [ ] Auto-block when license info is missing
- [ ] Generate an attribution block per article

## Operations
- [ ] Create systemd service(s) for API/worker
- [ ] Set up Nginx routing for the new app
- [ ] Set up health check endpoints + monitoring
- [ ] Document backup/restore for the DB

## Later
- [ ] Evaluate Passkey/WebAuthn and introduce it optionally
- [ ] Assess migration to PostgreSQL
- [ ] Define semi-automatic approval rules
29
docs/wiki/Architektur.md
Normal file
@ -0,0 +1,29 @@
# Architecture

## Target Architecture
- API: FastAPI
- Worker: queue-based background jobs
- DB: SQLite (MVP), optionally PostgreSQL later
- Publisher: WordPress REST API
- Frontend/Admin: lean web UI with login

## Pipeline
1. Feed Fetch
2. Parse + Normalize
3. Deduplicate
4. Enrichment (Rewrite/Tags)
5. Legal/Policy Check
6. Publish (pending)
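
The deduplication step could derive one stable key per feed entry. The following is a minimal sketch, not the code in `backend/app/ingestion.py`: the function name `dedup_key` and the priority order (GUID scoped to the feed, then source URL, then a content hash) are assumptions. The ingestion test in this commit only confirms that two entries with the same feed and GUID end up as one article.

```python
import hashlib
from typing import Optional


def dedup_key(
    feed_id: int,
    guid: Optional[str],
    source_url: Optional[str],
    title: str,
    summary: str,
) -> str:
    """Build a deduplication key for one feed entry (hypothetical helper)."""
    guid = (guid or "").strip()
    if guid:
        # GUIDs are only unique within a feed, so the feed id is part of the key.
        return f"guid:{feed_id}:{guid}"

    source_url = (source_url or "").strip()
    if source_url:
        return f"url:{source_url}"

    # Last resort: hash the visible content.
    digest = hashlib.sha256(f"{title}\n{summary}".encode("utf-8")).hexdigest()
    return f"hash:{digest}"
```

Two entries that share a GUID but differ in link or title map to the same key, which is what makes a repeated import idempotent.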

## Data Objects (MVP)
- `sources`
- `feeds`
- `articles`
- `article_versions`
- `runs`
- `policy_checks`

## Non-Goals (MVP)
- Multi-user and roles
- Fully automatic approval without review
- Complex external SSO integration
20
docs/wiki/Deployment.md
Normal file
@ -0,0 +1,20 @@
# Deployment (Hetzner + CloudPanel)

## Environment
- Host: Hetzner
- Reverse proxy: Nginx via CloudPanel
- Target domain: `news.vanityontour.de`

## Current State
- The domain redirects to `https://vanityontour.de` until go-live.

## Target State
- `news.vanityontour.de` points to the new app (internal port, e.g. `127.0.0.1:8501`)
- API/worker run as systemd services
- TLS stays with CloudPanel/Nginx

## Minimum Checks after Deployment
- `curl -I https://news.vanityontour.de`
- Login reachable
- Feed import runs
- WordPress test publication (pending) succeeds
19
docs/wiki/Home.md
Normal file
@ -0,0 +1,19 @@
# Wiki Home

## Purpose
This wiki documents the architecture, operations, security, legal aspects, and roadmap of the `rss-news` rebuild.

## Contents
- `Architektur.md`
- `Deployment.md`
- `Security-Auth.md`
- `Recht-Quellen.md`
- `Operations-Runbook.md`
- `Roadmap.md`
- `Project-Board.md`

## Project Management
- GitHub Project #3: https://github.com/users/OliverGiertz/projects/3/views/1

## Principle
Documentation is updated in the same pull request as every relevant change.
23
docs/wiki/Operations-Runbook.md
Normal file
@ -0,0 +1,23 @@
# Operations Runbook

## Daily Checks
- App reachable
- Queue/worker active
- Latest feed runs successful
- No conspicuous errors in the log

## Incident: Feed Import Fails
1. RSS source reachable?
2. Parser errors in the log?
3. Rate limits or blocks?
4. Check the retry queue

## Incident: WordPress Publish Fails
1. WP API reachable?
2. Credentials valid?
3. Payload validation/tag errors?
4. If unclear, mark the article as `pending` rather than `failed`

## Backups
- Daily SQLite dump
- Back up configuration and `.env` securely
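
A daily backup could be scripted with Python's standard `sqlite3` module. Note that this sketch uses SQLite's online backup API (`Connection.backup`), which produces a binary copy of the database rather than a textual SQL dump; the function name `backup_sqlite`, the file naming scheme, and the directory layout are assumptions for illustration.

```python
import sqlite3
from datetime import date
from pathlib import Path


def backup_sqlite(db_path: str, backup_dir: str) -> Path:
    """Copy the database at `db_path` into `backup_dir`, one file per day."""
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: <db name>-<YYYY-MM-DD>.db
    target = target_dir / f"{Path(db_path).stem}-{date.today().isoformat()}.db"

    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        # The backup API copies a consistent snapshot even while the app is running.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target
```

A cron job or systemd timer could call `backup_sqlite` once a day with the path from `APP_DB_PATH`; pruning old files and the restore procedure would still need to be documented separately.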
28
docs/wiki/Project-Board.md
Normal file
@ -0,0 +1,28 @@
# Project Board Workflow

## Central Control
- Board: https://github.com/users/OliverGiertz/projects/3/views/1
- The board is the single source of truth for planning status.

## Working Mode (1 developer)
- Always create new work as an issue
- Add the issue to the project right away
- Maintain status only in the project
- Reference the issue in the PR/commit

## Recommended Status Discipline
- `Todo`: not started yet
- `In Progress`: actively being worked on
- `Done`: implemented and documented

## Conventions for Issues
- Prefix for clarity:
  - `[MVP]`
  - `[Infra]`
  - `[Legal]`
  - `[Bug]`
- Note the Definition of Done in every issue

## Current Backlog Note
- User management is obsolete for the MVP (one admin user).
- Mark the corresponding issues as `deferred` or `closed`.
35
docs/wiki/Recht-Quellen.md
Normal file
@ -0,0 +1,35 @@
# Legal and Sources

## Ground Rules
- Only approved sources from the source register
- Mandatory attribution per article
- Check image rights separately
- No autopublish when the license is unclear

## Assessment Model
- Green: free reuse clearly permitted
- Yellow: use with restrictions/case-by-case review
- Red: not suitable without an additional license

## Current References
- Reuters/Thomson Reuters Terms: https://www.thomsonreuters.com/en/terms-of-use
- Presseportal terms of use: https://www.presseportal.de/nutzungsbedingungen
- tagesschau RSS notes: https://www.tagesschau.de/infoservices/rssfeeds
- Datenlizenz Deutschland BY 2.0: https://www.govdata.de/dl-de/by-2-0
- GovData licenses: https://data.gov.de/informationen/lizenzen
- EU Legal Notice (CC BY 4.0): https://commission.europa.eu/legal-notice_en

## Review Checklist per Source
1. Are modification and publication permitted?
2. Is commercial use permitted?
3. Are there separate image rights?
4. Is attribution required?
5. Are there archiving or redistribution restrictions?

## Operational Safeguards
- Source register entry required before feed activation
- Auto-block when license data is missing
- Quarterly terms re-check

## Note
Not legal advice. Have the final approval of critical sources legally validated where needed.
19
docs/wiki/Roadmap.md
Normal file
@ -0,0 +1,19 @@
# Roadmap

## Now
- Clean up docs and project structure
- Redirect active
- Align the backlog with the restart

## Next Step
- Implement the FastAPI MVP
- Login + feed ingestion + review + WordPress pending

## After That
- Worker/queue
- Source policy enforcement
- Monitoring/reporting
- Optional passkey

## Management
All work items live in GitHub Project #3.
16
docs/wiki/Security-Auth.md
Normal file
@ -0,0 +1,16 @@
# Security and Auth

## Minimum Requirements
- Access to the web app only with login
- One active admin user (no role model in the MVP)
- Password not in the repo, only as a secret on the server

## Recommended Implementation
- Session-based auth (HTTP-only cookies)
- Password hashed (Argon2 or bcrypt)
- Rate limiting on the login endpoint
- CSRF protection for form actions
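
Salted password hashing with constant-time verification can be sketched with the standard library alone. Argon2 and bcrypt both need third-party packages, so this example swaps in `hashlib.scrypt` as a stand-in; the helper names, the cost parameters, and the `salt$digest` storage format are assumptions, not what the backend in this commit does.

```python
import hashlib
import hmac
import os

# Assumed scrypt cost parameters (about 16 MiB of memory per hash).
_N, _R, _P = 2**14, 8, 1


def hash_password(password: str) -> str:
    """Return "<salt hex>$<digest hex>" for storage."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=_N, r=_R, p=_P)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Check a login attempt against a stored hash in constant time."""
    salt_hex, digest_hex = stored.split("$", 1)
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=bytes.fromhex(salt_hex), n=_N, r=_R, p=_P
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Only the output of `hash_password` would be kept as a server-side secret; the plain password never needs to be stored.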

## Later (optional)
- Passkey/WebAuthn as an additional login factor
- IP allowlist for admin access
4
pytest.ini
Normal file
@ -0,0 +1,4 @@
[pytest]
testpaths = backend/tests
python_files = test_*.py
addopts = -q --maxfail=1
33
scripts/smoke_backend.sh
Executable file
@ -0,0 +1,33 @@
#!/usr/bin/env bash
set -euo pipefail

if [[ -z "${BASE_URL:-}" ]]; then
  echo "BASE_URL is missing (e.g. https://news.vanityontour.de)"
  exit 1
fi

if [[ -z "${APP_ADMIN_USERNAME:-}" || -z "${APP_ADMIN_PASSWORD:-}" ]]; then
  echo "APP_ADMIN_USERNAME/APP_ADMIN_PASSWORD are missing"
  exit 1
fi

cookie_file="$(mktemp)"
trap 'rm -f "$cookie_file"' EXIT

echo "[1/4] Healthcheck"
curl -fsS "${BASE_URL}/health" | grep -q '"status":"ok"'

echo "[2/4] Login"
curl -fsS -c "$cookie_file" \
  -H "Content-Type: application/json" \
  -X POST "${BASE_URL}/auth/login" \
  -d "{\"username\":\"${APP_ADMIN_USERNAME}\",\"password\":\"${APP_ADMIN_PASSWORD}\"}" \
  | grep -q '"ok":true'

echo "[3/4] Protected Endpoint"
curl -fsS -b "$cookie_file" "${BASE_URL}/api/protected" | grep -q '"ok":true'

echo "[4/4] Pipeline Status"
curl -fsS -b "$cookie_file" "${BASE_URL}/api/pipeline/status" | grep -q '"stage":"skeleton+db"'

echo "Smoke test succeeded."