feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution

source_extraction.py:
- New _extract_image_metadata(): extracts figcaption text + copyright/credit
  per image URL using 3 strategies (figure+figcaption, data-* attributes,
  adjacent credit spans)
- ExtractedArticle gets new image_metadata field
- extracted_article_to_meta() includes image_metadata in stored JSON
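
The first strategy (figure + figcaption) could look roughly like the sketch below. It is not the code from this commit: `FigureCaptionParser` and `extract_figure_captions` are hypothetical names, the `{url: {"caption": ...}}` shape is an assumption, and only the standard-library `html.parser` is used.

```python
from html.parser import HTMLParser


class FigureCaptionParser(HTMLParser):
    """Collect {image_url: {"caption": text}} for <img> tags inside <figure>."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: dict[str, dict[str, str]] = {}
        self._in_figure = False
        self._in_caption = False
        self._figure_images: list[str] = []
        self._caption_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "figure":
            # Start a fresh figure: reset collected images and caption text.
            self._in_figure = True
            self._figure_images = []
            self._caption_parts = []
        elif tag == "img" and self._in_figure and attr_map.get("src"):
            self._figure_images.append(attr_map["src"])
        elif tag == "figcaption" and self._in_figure:
            self._in_caption = True

    def handle_data(self, data):
        if self._in_caption:
            self._caption_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False
        elif tag == "figure" and self._in_figure:
            # Collapse whitespace and attach the caption to every image seen.
            caption = " ".join("".join(self._caption_parts).split())
            if caption:
                for url in self._figure_images:
                    self.metadata[url] = {"caption": caption}
            self._in_figure = False


def extract_figure_captions(html: str) -> dict[str, dict[str, str]]:
    parser = FigureCaptionParser()
    parser.feed(html)
    parser.close()
    return parser.metadata
```

Images outside a `<figure>` are ignored here; in the commit they are presumably covered by the other two strategies (data-* attributes, adjacent credit spans).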

pipeline.py:
- After auto image selection, check if selected_url is set
- Articles without usable image → status "no_image" (excluded with Telegram notice)
- PipelineStats and summary report include no_image counter

db.py:
- Add "no_image" to articles status CHECK constraint
- Migration: recreates articles table with updated constraint on existing DBs
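
A minimal sketch of such a table-recreating migration, assuming the database is SQLite (which cannot alter a CHECK constraint in place). The column set, the status list, and the name `migrate_add_no_image` are hypothetical; the real articles table has more columns.

```python
import sqlite3

# Hypothetical status list; the real constraint contains more values.
STATUSES = ("new", "rejected", "no_image")


def migrate_add_no_image(conn: sqlite3.Connection) -> bool:
    """Recreate `articles` with a CHECK that allows 'no_image'.

    Returns True if the table was rebuilt, False if nothing had to be done.
    """
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND name='articles'"
    ).fetchone()
    if row is None or "no_image" in row[0]:
        return False  # table missing or constraint already up to date

    allowed = ", ".join(f"'{s}'" for s in STATUSES)
    # Copy rows into a new table, then swap it in under the old name.
    conn.executescript(
        f"""
        BEGIN;
        CREATE TABLE articles_new (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            status TEXT NOT NULL CHECK (status IN ({allowed}))
        );
        INSERT INTO articles_new (id, title, status)
            SELECT id, title, status FROM articles;
        DROP TABLE articles;
        ALTER TABLE articles_new RENAME TO articles;
        COMMIT;
        """
    )
    return True
```

Checking the stored `CREATE TABLE` text in `sqlite_master` makes the migration safe to run on every startup.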

workflow.py / main.py:
- Map no_image to its own UI status with rewrite/close transitions

wordpress.py:
- _upload_featured_media() accepts image_caption param, sends to WP media
- _get_image_meta_for_url() / _build_image_caption() helpers
- _build_attribution_block(): separator + attribution paragraph at article end
  (original link, author, Bildnachweis/credit)
- _build_post_content() appends attribution block
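
For illustration, an attribution block of this kind might be assembled as follows. This is a sketch under assumptions, not the body of `_build_attribution_block()`: the function name `build_attribution_block`, its parameters, the labels, and the exact HTML are all hypothetical.

```python
from html import escape


def build_attribution_block(
    original_url: str, author: str = "", image_credit: str = ""
) -> str:
    """Return a separator plus one attribution paragraph as HTML."""
    safe_url = escape(original_url)
    parts = [f'Original: <a href="{safe_url}">{safe_url}</a>']
    if author:
        parts.append(f"Author: {escape(author)}")
    if image_credit:
        # "Bildnachweis" is the German label for an image credit.
        parts.append(f"Bildnachweis: {escape(image_credit)}")
    return "<hr />\n<p>" + " | ".join(parts) + "</p>"
```

Escaping every value keeps scraped author names or credits from injecting markup into the post content.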

telegram_bot.py:
- notify_pipeline_done() shows 🖼️ no-image count

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OliverGiertz 2026-03-27 07:08:48 +00:00
parent 1963e32ab4
commit aaac5def27
7 changed files with 381 additions and 10 deletions


@@ -289,6 +289,7 @@ def notify_pipeline_done(stats: dict[str, Any]) -> None:
    processed = stats.get("processed", 0)
    drafts = stats.get("drafts_created", 0)
    rejected = stats.get("rejected", 0)
    no_image = stats.get("no_image", 0)
    warnings = stats.get("warnings", 0)
    errors = stats.get("errors", 0)
@@ -300,6 +301,8 @@ def notify_pipeline_done(stats: dict[str, Any]) -> None:
    ]
    if rejected:
        lines.append(f"🚫 Abgelehnt: {rejected}")
    if no_image:
        lines.append(f"🖼️ Kein Bild: {no_image}")
    if warnings:
        lines.append(f"⚠️ Warnungen: {warnings}")
    if errors: