- Derive real source hostname from canonical URL when feed name is generic
(e.g. "Google Alerts"), so the link shows "moin.de" instead of "Google Alerts"
- Use _get_image_meta_for_url() (fuzzy URL matching) for image credit lookup
- Use the caption field for the Bildnachweis (image credit) line, since it
already contains the embedded credits
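A minimal sketch of the hostname fallback described in the first item. The
function name display_source_name() and the GENERIC_FEED_NAMES set are
hypothetical; only the "Google Alerts" to "moin.de" behaviour comes from this
commit.

```python
from urllib.parse import urlparse

# Hypothetical set of feed names considered too generic to show as a source.
GENERIC_FEED_NAMES = {"google alerts"}


def display_source_name(feed_name: str, canonical_url: str) -> str:
    """Return the feed name, or the article hostname if the name is generic."""
    if feed_name.strip().lower() not in GENERIC_FEED_NAMES:
        return feed_name
    host = urlparse(canonical_url).hostname or ""
    # Drop a leading "www." so the link reads "moin.de", not "www.moin.de".
    if host.startswith("www."):
        host = host[4:]
    # Fall back to the feed name when the URL has no usable hostname.
    return host or feed_name
```

For example, display_source_name("Google Alerts", "https://www.moin.de/x")
yields "moin.de", while a non-generic feed name is returned unchanged.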
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Image metadata keys may have query params (e.g. ?w=1200) that differ from
the selected_url stored in image_review. Fall back to comparing URLs without
the query string so that the figcaption text is found.
Also simplify _build_image_caption(): the figcaption text already contains
the credit info, so use the caption directly instead of appending a
redundant credit prefix.
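The fallback lookup could look roughly like the sketch below. It assumes
image_metadata is a dict keyed by image URL; the helper strip_query() and the
exact signature are illustrative, not the actual code in wordpress.py.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def get_image_meta_for_url(image_metadata: dict, selected_url: str) -> Optional[dict]:
    """Look up metadata by exact URL first, then by URL without query string."""
    if selected_url in image_metadata:
        return image_metadata[selected_url]
    target = strip_query(selected_url)
    for key, meta in image_metadata.items():
        if strip_query(key) == target:
            return meta
    return None
```

With this approach a key ending in ?w=1200 still matches a selected_url
ending in ?w=800, because both reduce to the same scheme, host, and path.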
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Google Alerts wraps matched keywords in <b>...</b> tags.
Strip all HTML tags from the title before storing.
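A small sketch of the tag stripping, using a regular expression. The name
clean_title() is hypothetical, and the html.unescape() step is an extra
assumption that this commit does not mention.

```python
import html
import re

# Matches any HTML tag such as <b> or </b>.
_TAG_RE = re.compile(r"<[^>]+>")


def clean_title(raw_title: str) -> str:
    """Remove HTML tags from a feed title and trim surrounding whitespace."""
    text = _TAG_RE.sub("", raw_title)
    # Assumption: entities like &amp; should also be decoded before storing.
    return html.unescape(text).strip()
```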
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Google Alerts feed entries use google.com/url?...&url=<encoded_real_url>&...
tracking links. The extractor was fetching the Google redirect page instead
of the actual article, resulting in empty content and no images.
_resolve_google_redirect() extracts the real URL from the 'url' query
parameter before passing it to extract_article(). Non-Google URLs are
returned unchanged.
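A sketch of how the redirect could be resolved with the standard library.
The host check and path check are assumptions about what counts as a Google
redirect; the real _resolve_google_redirect() may differ.

```python
from urllib.parse import parse_qs, urlparse


def resolve_google_redirect(url: str) -> str:
    """Return the target of a google.com/url tracking link, else the URL itself."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Assumption: only google.com/url links carry the real URL in 'url'.
    if host in ("google.com", "www.google.com") and parsed.path == "/url":
        targets = parse_qs(parsed.query).get("url")
        if targets:
            # parse_qs already percent-decodes the value.
            return targets[0]
    return url
```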
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
source_extraction.py:
- New _extract_image_metadata(): extracts figcaption text + copyright/credit
per image URL using 3 strategies (figure+figcaption, data-* attributes,
adjacent credit spans)
- ExtractedArticle gets new image_metadata field
- extracted_article_to_meta() includes image_metadata in stored JSON
pipeline.py:
- After auto image selection, check if selected_url is set
- Articles without a usable image → status "no_image" (excluded with Telegram notice)
- PipelineStats and summary report include no_image counter
db.py:
- Add "no_image" to articles status CHECK constraint
- Migration: recreates articles table with updated constraint on existing DBs
workflow.py / main.py:
- Map no_image to its own UI status with rewrite/close transitions
wordpress.py:
- _upload_featured_media() accepts image_caption param, sends to WP media
- _get_image_meta_for_url() / _build_image_caption() helpers
- _build_attribution_block(): separator + attribution paragraph at article end
(original link, author, Bildnachweis/credit)
- _build_post_content() appends attribution block
telegram_bot.py:
- notify_pipeline_done() shows 🖼️ no-image count
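The db.py migration above recreates the table because a CHECK constraint
cannot be altered in place in SQLite. The sketch below assumes SQLite and
uses a hypothetical three-column schema and status list; the real articles
table has more columns and statuses.

```python
import sqlite3

# Hypothetical status list; only "new", "review" and "no_image" appear in this log.
STATUSES = ("new", "review", "rejected", "published", "no_image")


def migrate_add_no_image(conn: sqlite3.Connection) -> None:
    """Recreate the articles table so its CHECK constraint allows 'no_image'."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND name='articles'"
    ).fetchone()
    if row is None or "'no_image'" in row[0]:
        return  # table missing or already migrated
    allowed = ", ".join(f"'{s}'" for s in STATUSES)
    # Copy rows into a table with the new constraint, then swap the tables.
    conn.executescript(f"""
        BEGIN;
        CREATE TABLE articles_new (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            status TEXT NOT NULL CHECK (status IN ({allowed}))
        );
        INSERT INTO articles_new (id, title, status)
            SELECT id, title, status FROM articles;
        DROP TABLE articles;
        ALTER TABLE articles_new RENAME TO articles;
        COMMIT;
    """)
```

Checking the stored CREATE statement first keeps the migration idempotent,
so it can run on every startup without touching already-migrated databases.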
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- wordpress.py: catch image download/upload failures and skip image
instead of aborting the entire WP draft update
- pipeline.py: add INFO logs at each step of _do_rewrite_and_draft
to trace OpenAI call, tag generation, and WP API call
- telegram_bot.py: add INFO logs around rewrite execution + exc_info
on error for full traceback in logs
- repositories.py: include scheduled_publish_at in get_article_by_id
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrites must not use 'wir haben erforscht/berechnet' ("we researched/
calculated") since the content comes from a third-party source. The prompt
now passes the source name and instructs GPT to attribute all claims to the
original publisher (e.g. 'laut PiNCAMP' ("according to PiNCAMP"),
'die Auswertung zeigt' ("the analysis shows")).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Articles scoring between the warn and auto thresholds stayed in "new" status,
causing repeated warning notifications on every /run call. They are now
set to "review" status after the first warning is sent.
The override callback already resets status to "new" before processing,
so the existing flow works correctly. Also include "review" articles in
/rejected command output so they can be acted on.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The N8N App Release Telegram Trigger had overwritten the webhook
registration, pointing it to N8N instead of the RSS-News backend.
This caused all callback_query events (inline buttons) to be lost,
breaking the override/rewrite/discard buttons.
Changes:
- Re-register webhook to https://news.vanityontour.de/telegram/webhook
with both message and callback_query in allowed_updates
- Add _forward_to_n8n_app_release() to proxy unknown bot commands
(e.g. /release) to the N8N App Release webhook, keeping that
workflow functional without needing its own Telegram Trigger
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reserve the publish slot before creating the WP draft so the
scheduled_publish_at timestamp is available when building the post
payload. WordPress receives the `date` field (e.g. 2026-03-24T09:00:00)
which sets the scheduled publish time on the draft.
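A sketch of how the payload could carry the reserved slot. The function
build_post_payload() and its parameters are hypothetical; "title", "content",
"status" and "date" are standard fields of the WordPress REST API posts
endpoint.

```python
from datetime import datetime
from typing import Optional


def build_post_payload(
    title: str, content: str, scheduled_publish_at: Optional[datetime]
) -> dict:
    """Build a draft payload, adding 'date' when a publish slot is reserved."""
    payload = {"title": title, "content": content, "status": "draft"}
    if scheduled_publish_at is not None:
        # WordPress expects an ISO 8601 timestamp, e.g. 2026-03-24T09:00:00.
        payload["date"] = scheduled_publish_at.replace(microsecond=0).isoformat()
    return payload
```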
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Webhook returns 200 immediately, processing runs in background task
→ Telegram no longer retries, eliminates duplicate callbacks and 400 errors
- Consolidate answer_callback_query call to top of handler (before heavy work)
- Add logger.info/error for callback actions to aid debugging
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pipeline runs in background via asyncio. Endpoint returns immediately,
results arrive via Telegram notifications.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ampel (traffic-light) system removed: all enabled feeds are now processed
regardless of risk_level. Updated the test to verify that feeds with any
risk_level are processed instead of blocked.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>