Commit graph

92 commits

Author SHA1 Message Date
OliverGiertz
d1cb809852 fix(wordpress): fix attribution block source name and image credit lookup
- Derive real source hostname from canonical URL when feed name is generic
  (e.g. "Google Alerts"), so the link shows "moin.de" instead of "Google Alerts"
- Use _get_image_meta_for_url() (fuzzy URL matching) for image credit lookup
- Use caption field for Bildnachweis (image credit), since it already contains embedded credits

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 08:28:44 +00:00
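The hostname fallback described above can be sketched as follows. This is a minimal illustration, not the project's code: `derive_source_name()` and the `GENERIC_FEED_NAMES` set are hypothetical, and stripping a leading "www." is an assumption inferred from the "moin.de" example.

```python
from urllib.parse import urlsplit

# Hypothetical set of aggregator feed names treated as "generic";
# the project's actual list is not shown in the commit.
GENERIC_FEED_NAMES = {"google alerts"}


def derive_source_name(feed_name: str, canonical_url: str) -> str:
    """Return the display name for the attribution link.

    If the feed name is a generic aggregator label, fall back to the
    hostname of the article's canonical URL without a leading "www.".
    """
    if feed_name.strip().lower() not in GENERIC_FEED_NAMES:
        return feed_name
    host = urlsplit(canonical_url).hostname or ""  # hostname is lowercased
    if host.startswith("www."):
        host = host[len("www."):]
    return host or feed_name  # keep the feed name if the URL has no host
```

For example, `derive_source_name("Google Alerts", "https://www.moin.de/artikel/123")` yields `"moin.de"`, while a specific feed name such as "PiNCAMP" is returned unchanged.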
OliverGiertz
82f2df610d fix(wordpress): fuzzy URL match for image metadata and simplify caption builder
Image metadata keys may have query params (e.g. ?w=1200) that differ from
the selected_url stored in image_review. Fall back to comparing URLs without
query string so the figcaption text is correctly found.

Also simplified _build_image_caption: figcaption text already contains the
credit info, so just use caption directly instead of appending the redundant
credit prefix marker.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 08:24:40 +00:00
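The query-string fallback can be illustrated as follows. This is a minimal sketch assuming image metadata is a dict keyed by image URL. `lookup_image_meta()` and `strip_query()` are hypothetical names and are not the project's `_get_image_meta_for_url()`.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def lookup_image_meta(image_metadata: dict[str, dict], url: str) -> dict | None:
    # 1) Exact key match first.
    if url in image_metadata:
        return image_metadata[url]
    # 2) Fallback: compare URLs with query string (e.g. ?w=1200) removed.
    target = strip_query(url)
    for key, meta in image_metadata.items():
        if strip_query(key) == target:
            return meta
    return None
```

The exact match runs first, so a key that matches precisely always wins over a query-stripped match.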
OliverGiertz
8e65485f0c fix(ingestion): strip HTML tags from feed entry titles
Google Alerts wraps matched keywords in <b>...</b> tags.
Strip all HTML tags from the title before storing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 08:08:07 +00:00
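Tag stripping for feed titles might look like the following sketch. `clean_title()` is a hypothetical helper. The entity decoding and whitespace collapsing are assumptions beyond what the commit states.

```python
import html
import re

# Matches any tag such as <b> or </b>; good enough for short feed titles.
_TAG_RE = re.compile(r"<[^>]+>")


def clean_title(raw_title: str) -> str:
    # Strip tags first, then decode entities (&amp; -> &), so that escaped
    # text like "&lt;b&gt;" survives as literal characters.
    no_tags = _TAG_RE.sub("", raw_title)
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join(html.unescape(no_tags).split())
```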
OliverGiertz
0d07a9804d fix(ingestion): resolve Google Alerts redirect URLs before article fetch
Google Alerts feed entries use google.com/url?...&url=<encoded_real_url>&...
tracking links. The extractor was fetching the Google redirect page instead
of the actual article, resulting in empty content and no images.

_resolve_google_redirect() extracts the real URL from the 'url' query
parameter before passing it to extract_article(). Non-Google URLs are
returned unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 07:10:30 +00:00
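A minimal sketch of the redirect resolution described above, using only `urllib.parse`. The function name and the host check (only `google.com` and its subdomains, path `/url`) are assumptions, not the project's exact logic.

```python
from urllib.parse import parse_qs, urlsplit


def resolve_google_redirect(url: str) -> str:
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    is_google = host == "google.com" or host.endswith(".google.com")
    if is_google and parts.path == "/url":
        target = parse_qs(parts.query).get("url")
        if target:
            return target[0]  # parse_qs already percent-decodes the value
    return url  # non-Google URLs are returned unchanged
```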
OliverGiertz
aaac5def27 feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution
source_extraction.py:
- New _extract_image_metadata(): extracts figcaption text + copyright/credit
  per image URL using 3 strategies (figure+figcaption, data-* attributes,
  adjacent credit spans)
- ExtractedArticle gets new image_metadata field
- extracted_article_to_meta() includes image_metadata in stored JSON

pipeline.py:
- After auto image selection, check if selected_url is set
- Articles without usable image → status "no_image" (excluded with Telegram notice)
- PipelineStats and summary report include no_image counter

db.py:
- Add "no_image" to articles status CHECK constraint
- Migration: recreates articles table with updated constraint on existing DBs

workflow.py / main.py:
- Map no_image as own UI status with rewrite/close transitions

wordpress.py:
- _upload_featured_media() accepts image_caption param, sends to WP media
- _get_image_meta_for_url() / _build_image_caption() helpers
- _build_attribution_block(): separator + attribution paragraph at article end
  (original link, author, Bildnachweis/credit)
- _build_post_content() appends attribution block

telegram_bot.py:
- notify_pipeline_done() shows 🖼️ no-image count

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 07:08:48 +00:00
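The db.py migration works this way because SQLite's ALTER TABLE cannot change a CHECK constraint, so adding "no_image" means rebuilding the table. The sketch below uses the standard-library `sqlite3` module. The three-column schema and `migrate_status_check()` are hypothetical and heavily reduced from whatever the real articles table contains.

```python
import sqlite3

# Hypothetical, heavily reduced status list; the project's real articles
# table has more columns and statuses than the ones listed here.
ALLOWED_STATUSES = ("new", "review", "no_image")


def migrate_status_check(conn: sqlite3.Connection) -> None:
    """Rebuild `articles` so its CHECK constraint also allows 'no_image'.

    Steps: create new table, copy rows, drop old table, rename new table.
    """
    allowed = ", ".join(f"'{s}'" for s in ALLOWED_STATUSES)
    # Explicit BEGIN/COMMIT so all four steps, DDL included, succeed or
    # fail together. The caller must not have a transaction open.
    conn.execute("BEGIN")
    try:
        conn.execute(
            "CREATE TABLE articles_new ("
            " id INTEGER PRIMARY KEY,"
            " title TEXT NOT NULL,"
            f" status TEXT NOT NULL CHECK (status IN ({allowed}))"
            ")"
        )
        conn.execute(
            "INSERT INTO articles_new (id, title, status) "
            "SELECT id, title, status FROM articles"
        )
        conn.execute("DROP TABLE articles")
        conn.execute("ALTER TABLE articles_new RENAME TO articles")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```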
OliverGiertz
1963e32ab4 fix(rewrite): make image upload non-fatal and add rewrite tracing logs
- wordpress.py: catch image download/upload failures and skip image
  instead of aborting the entire WP draft update
- pipeline.py: add INFO logs at each step of _do_rewrite_and_draft
  to trace OpenAI call, tag generation, and WP API call
- telegram_bot.py: add INFO logs around rewrite execution + exc_info
  on error for full traceback in logs
- repositories.py: include scheduled_publish_at in get_article_by_id

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 07:45:55 +00:00
OliverGiertz
12932bca90 fix(rewrite): attribute claims to source instead of using first-person 'wir'
Rewrites must not use 'wir haben erforscht/berechnet' ("we researched/calculated"),
since the content comes from a third-party source. The prompt now passes the
source name and instructs GPT to attribute all claims to the original publisher
(e.g. 'laut PiNCAMP' ("according to PiNCAMP"), 'die Auswertung zeigt' ("the analysis shows")).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 07:36:09 +00:00
OliverGiertz
013af2ab62 fix(pipeline): set warning-zone articles to review status to prevent re-warnings
Articles scoring between the warn and auto thresholds stayed in "new" status,
causing repeated warning notifications on every /run call. Now they are
set to "review" status after the first warning is sent.

The override callback already resets status to "new" before processing,
so the existing flow works correctly. Also include "review" articles in
/rejected command output so they can be acted on.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 07:22:47 +00:00
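The fix can be sketched as a single pass that warns only about "new" articles in the warning zone and then moves them to "review". Because the article has left "new", a second /run skips it. The `Article` dataclass, `warn_pass()`, and the inclusive/exclusive bounds are hypothetical. Auto-processing and below-warn handling are deliberately left out because the commit does not describe them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Article:
    id: int
    relevance_score: int
    status: str = "new"


def warn_pass(articles: List[Article], warn: int, auto: int,
              notify: Callable[[Article], None]) -> None:
    for article in articles:
        if article.status != "new":
            continue  # only unprocessed articles are evaluated on each run
        if warn <= article.relevance_score < auto:
            notify(article)            # e.g. Telegram warning with override button
            article.status = "review"  # leave "new" so later runs skip it
```

With thresholds of 50 and 80 (chosen only for illustration), calling `warn_pass` twice on articles scoring 60, 90 and 10 notifies once, for the article scoring 60.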
OliverGiertz
a64bf31ff6 fix(telegram): restore webhook to RSS-News backend and forward app-release commands
The N8N App Release Telegram Trigger had overwritten the webhook
registration, pointing it to N8N instead of the RSS-News backend.
This caused all callback_query events (inline buttons) to be lost,
breaking the override/rewrite/discard buttons.

Changes:
- Re-register webhook to https://news.vanityontour.de/telegram/webhook
  with both message and callback_query in allowed_updates
- Add _forward_to_n8n_app_release() to proxy unknown bot commands
  (e.g. /release) to the N8N App Release webhook, keeping that
  workflow functional without needing its own Telegram Trigger

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 06:34:49 +00:00
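The command forwarding can be reduced to a routing decision: handle the bot's own commands locally and forward everything else. In this sketch `route_command()` is hypothetical, and the handlers are passed in as callables so no HTTP call is made. The local command set is taken from the bot's documented commands (/run, /rejected, /status). Ignoring plain text is an assumption.

```python
from typing import Callable, Optional

LOCAL_COMMANDS = {"/run", "/rejected", "/status"}


def route_command(text: str,
                  handle_local: Callable[[str], str],
                  forward: Callable[[str], str]) -> Optional[str]:
    if not text.startswith("/"):
        return None  # assumption: non-command messages are not routed
    # "/status@SomeBot args" -> "/status" (Telegram may append @botname)
    command = text.split()[0].split("@", 1)[0]
    if command in LOCAL_COMMANDS:
        return handle_local(command)
    return forward(text)  # e.g. /release -> N8N App Release webhook
```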
OliverGiertz
970f509ad4 feat(wordpress): store suggested publish date directly in WP draft
Reserve the publish slot before creating the WP draft so the
scheduled_publish_at timestamp is available when building the post
payload. WordPress receives the `date` field (e.g. 2026-03-24T09:00:00),
which sets the scheduled publish time on the draft.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 11:15:39 +00:00
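A sketch of a draft payload carrying the reserved slot. In the WordPress REST API, the post `date` field is interpreted in the site's timezone, so the timestamp is sent without an offset. `build_post_payload()` is a hypothetical helper, and the real payload certainly carries more fields (tags, excerpt, featured media).

```python
from datetime import datetime


def build_post_payload(title: str, content: str,
                       scheduled_publish_at: datetime) -> dict:
    return {
        "title": title,
        "content": content,
        "status": "draft",
        # Site-local wall-clock time without offset, e.g. 2026-03-24T09:00:00
        "date": scheduled_publish_at.strftime("%Y-%m-%dT%H:%M:%S"),
    }
```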
OliverGiertz
e9c472b722 fix(telegram): async webhook handler + deduplicate callback responses
- Webhook returns 200 immediately, processing runs in background task
  → Telegram no longer retries, eliminates duplicate callbacks and 400 errors
- Consolidate answer_callback_query call to top of handler (before heavy work)
- Add logger.info/error for callback actions to aid debugging

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 11:08:32 +00:00
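The "acknowledge first, process later" pattern can be shown framework-agnostically with `asyncio.create_task`. This is a sketch with hypothetical `handle_webhook()` and `process_update()` functions. In the actual service the handler presumably sits behind a web framework route that returns HTTP 200.

```python
import asyncio

# Strong references to running tasks: the event loop keeps only weak
# references, so an unreferenced task could be garbage-collected mid-run.
_background_tasks = set()


async def process_update(update: dict, log: list) -> None:
    await asyncio.sleep(0.01)  # stands in for slow work (OpenAI, WordPress API)
    log.append(("processed", update["update_id"]))


async def handle_webhook(update: dict, log: list) -> int:
    task = asyncio.create_task(process_update(update, log))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    log.append(("acked", update["update_id"]))
    return 200  # returned before process_update has even started
```

Since Telegram sees the 200 immediately, it has no reason to retry, which is what removed the duplicate callbacks.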
OliverGiertz
1020526e76 fix(pipeline): run N8N pipeline endpoint async to avoid HTTP timeout
Pipeline runs in background via asyncio. Endpoint returns immediately,
results arrive via Telegram notifications.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 10:03:13 +00:00
OliverGiertz
d9ab599466 fix(deploy): correct service name and app path for Hetzner
Service is rss-news-api (not rss-app), app lives at /opt/rss-news.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 09:43:55 +00:00
OliverGiertz
0a9c0b10d6 test(ingestion): update test for removed Ampel risk-level check
Ampel (traffic-light) system removed: all enabled feeds are now processed regardless
of risk_level. Updated test to verify feeds with any risk_level are
processed instead of blocked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 09:41:34 +00:00
OliverGiertz
6192f8e527 feat(automation): autonomous pipeline with Telegram bot and N8N integration
- Add full auto pipeline: RSS ingest → GPT relevance score → AI rewrite → WP draft
- Add Telegram bot with inline buttons (rewrite/discard/override) and commands (/run, /rejected, /status)
- Add smart publish scheduler: max 2 drafts/day, spread over week (09:00 & 14:00 CET)
- Add N8N API endpoints (/api/n8n/pipeline, /api/n8n/ingest) with X-API-Key auth
- Add GPT-based relevance scoring (0-100) for VanLife/Camping/Outdoor topics
- Remove Ampel risk-level policy check from ingestion (all enabled feeds are used)
- Add Telegram webhook endpoint and setup endpoint
- Add delete_wp_post() for Telegram discard action
- Add DB migrations for relevance_score and scheduled_publish_at columns
- Update .env.example with all new configuration variables
- Add docs/AUTOMATION.md with full setup and usage documentation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 09:40:15 +00:00
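The slot logic of the publish scheduler can be sketched as an earliest-free-slot search over the two daily times (09:00 and 14:00), which caps drafts at two per day. This is a simplified illustration. `next_free_slot()` and the 14-day horizon are hypothetical. Times are naive local datetimes (assumed to be German wall-clock time), and the "spread over week" weighting is not modeled.

```python
from __future__ import annotations

from datetime import datetime, time, timedelta

SLOT_TIMES = (time(9, 0), time(14, 0))  # two publish slots per day


def next_free_slot(now: datetime, reserved: set[datetime],
                   horizon_days: int = 14) -> datetime | None:
    for day_offset in range(horizon_days):
        day = (now + timedelta(days=day_offset)).date()
        for slot_time in SLOT_TIMES:
            candidate = datetime.combine(day, slot_time)
            # Skip slots in the past and slots already taken by other drafts.
            if candidate > now and candidate not in reserved:
                return candidate
    return None  # no free slot within the horizon
```

Reserving the returned slot before creating the WP draft (as in commit 970f509ad4) keeps concurrent runs from picking the same time, assuming reservations are persisted atomically.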
6332a9a399 feat(wordpress): publish true Gutenberg blocks and remove auto summary/details sections 2026-02-21 14:55:20 +01:00
93f52f72b9 fix(ingestion): preserve article workflow data and skip closed items on re-import 2026-02-21 14:51:36 +01:00
b0f995d5c9 feat(rewrite): add batch rewrite run, AI tags for WP, and agency (Agentur) contact detection 2026-02-21 14:39:47 +01:00
da269d08f1 chore(admin): remove legal approval step from UI workflow 2026-02-21 14:11:03 +01:00
88b2ee1d01 feat(admin): add feed/source management, rewrite editor, reopen flow, and WP block output 2026-02-21 14:03:49 +01:00
50f737f434 feat(admin): add connectivity diagnostics page for domains and endpoints 2026-02-21 13:58:40 +01:00
35ccceb260 feat(workflow): simplify article flow and add automated rewrite step 2026-02-21 13:43:22 +01:00
8d7375c99f feat(ui): classify publisher errors with actionable hints 2026-02-21 13:11:43 +01:00
24d8e5ad0f feat(wordpress): improve post html structure and excerpt generation 2026-02-21 13:09:00 +01:00
e68b6a41fd feat(wordpress): upload selected image and set featured_media on draft publish 2026-02-21 13:07:08 +01:00
ba83b24510 chore: finalize current state and prepare next wordpress-focused roadmap 2026-02-18 11:11:49 +01:00
fee5e76842 feat(ui): add publish readiness indicators and WP env key aliases 2026-02-18 11:03:53 +01:00
592d699166 chore(config): load shared rss-news .env for wordpress and keys 2026-02-18 11:00:57 +01:00
1cee56205e feat(publisher): add wordpress draft queue with retry and admin controls 2026-02-18 10:49:43 +01:00
dcdf4d954a feat(ui): show auto image ranking reasons in article detail 2026-02-18 10:43:17 +01:00
26e3d26b93 feat(images): auto-select relevant article images and tidy detail header 2026-02-18 10:40:39 +01:00
fb3465fb10 fix(images): add proxy fallback to direct source url rendering 2026-02-18 10:20:47 +01:00
910ca72c81 fix(ui): render article images via authenticated proxy thumbnails 2026-02-18 10:16:30 +01:00
efaf132936 feat(images): add thumbnail gallery with select/exclude workflow 2026-02-18 10:11:22 +01:00
6691db8051 feat(export): add csv/json article export with date relevance scoring 2026-02-18 10:04:38 +01:00
5159a6e3b4 feat(legal): add structured attribution fields and publish legal gate 2026-02-18 10:02:19 +01:00
c52363f1a7 feat(admin): add article detail page with legal checklist 2026-02-18 09:52:36 +01:00
2c331d683b feat: rebuild rss-news backend, admin ui, and legal extraction pipeline 2026-02-18 09:52:36 +01:00
d65c55d315 Bump version to v1.7.1 2025-08-28 11:18:30 +02:00
a46d919118 Bump version to v1.7.0 2025-08-24 14:59:32 +02:00
46e0b98928 Update CHANGELOG.md 2025-08-18 10:37:27 +02:00
0bb7d246c1 Bump version to v1.6.3 2025-08-18 10:33:27 +02:00
a02f825274 Image duplicate check 2025-08-18 07:37:32 +02:00
0cfbb6c37f Image duplicate check 2025-08-18 07:36:48 +02:00
777c770142 Update .gitignore 2025-08-17 18:01:33 +02:00
beac96095e Update requirements.txt
Extended the dependency list
2025-08-17 17:58:18 +02:00
ed91864eda Create image_deduper.py
Features:
- Scan: recursively scan directories, compute sha256 + pHash
- Report: CSV + human-readable summary with groups
- Apply: redirect duplicates to the canonical file (hardlink/delete)
- Optional: update DB references (SQLite/SQLModel compatible)
2025-08-17 17:56:04 +02:00
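The exact-duplicate half of the scan step (sha256) can be sketched with the standard library alone. `find_exact_duplicates()` and the extension list are hypothetical. The pHash part, which catches visually similar but byte-different images, needs a third-party imaging library and is omitted here.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large images are not loaded into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path,
                          exts=(".jpg", ".jpeg", ".png", ".webp")) -> list[list[Path]]:
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):  # recursive scan, stable order
        if path.is_file() and path.suffix.lower() in exts:
            groups[sha256_of(path)].append(path)
    # Only groups with more than one file are duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

In each group, the first path could serve as the canonical file for the "Apply" step, though how image_deduper.py actually picks it is not stated.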
759a313f31 Create roadmap-image-dedup.md 2025-08-17 17:54:09 +02:00
d6ab09226a Bump version to v1.6.2 2025-08-16 13:39:10 +02:00
808a39dfc9 Bump version to v1.6.2 2025-08-16 13:33:02 +02:00