rss-news/backend/app
OliverGiertz aaac5def27 feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution
source_extraction.py:
- New _extract_image_metadata(): extracts figcaption text + copyright/credit
  per image URL using 3 strategies (figure+figcaption, data-* attributes,
  adjacent credit spans)
- ExtractedArticle gets new image_metadata field
- extracted_article_to_meta() includes image_metadata in stored JSON
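
For illustration, the first strategy (figure+figcaption) could look roughly like the sketch below. It is not the code from source_extraction.py: the class name FigureCaptionParser and the function extract_figure_captions are hypothetical, and it uses only the standard-library html.parser, which the real module may or may not use.

```python
from html.parser import HTMLParser


class FigureCaptionParser(HTMLParser):
    """Hypothetical helper: maps image URLs to the caption of their <figure>."""

    def __init__(self):
        super().__init__()
        self.captions = {}        # image URL -> caption text
        self._in_figure = False
        self._in_caption = False
        self._figure_images = []  # image URLs seen in the current <figure>
        self._caption_parts = []  # text fragments of the current <figcaption>

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self._in_figure = True
            self._figure_images = []
            self._caption_parts = []
        elif tag == "img" and self._in_figure:
            src = dict(attrs).get("src")
            if src:
                self._figure_images.append(src)
        elif tag == "figcaption" and self._in_figure:
            self._in_caption = True

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False
        elif tag == "figure" and self._in_figure:
            # Collapse whitespace and attach the caption to every image in the figure.
            caption = " ".join("".join(self._caption_parts).split())
            if caption:
                for url in self._figure_images:
                    self.captions[url] = caption
            self._in_figure = False

    def handle_data(self, data):
        if self._in_caption:
            self._caption_parts.append(data)


def extract_figure_captions(html):
    """Return {image URL: caption} for images wrapped in <figure> elements."""
    parser = FigureCaptionParser()
    parser.feed(html)
    parser.close()
    return parser.captions
```

Images outside a figure are ignored here; in the commit they would be covered by the other two strategies (data-* attributes, adjacent credit spans).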

pipeline.py:
- After auto image selection, check if selected_url is set
- Articles without usable image → status "no_image" (excluded with Telegram notice)
- PipelineStats and summary report include no_image counter
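
A minimal sketch of the no-image check described above. Only the names PipelineStats, selected_url and the status "no_image" come from this commit; the stats fields, the function name and the "ready" status are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PipelineStats:
    # Field names are assumed; the real PipelineStats likely tracks more counters.
    processed: int = 0
    no_image: int = 0


def status_after_image_selection(selected_url, stats):
    """Return the next article status once auto image selection has run."""
    stats.processed += 1
    if not selected_url:
        # No usable image: count it and exclude the article from publishing.
        stats.no_image += 1
        return "no_image"
    return "ready"  # hypothetical follow-up status
```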

db.py:
- Add "no_image" to articles status CHECK constraint
- Migration: recreates articles table with updated constraint on existing DBs
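
SQLite cannot alter a CHECK constraint in place, which is presumably why the table is recreated. The sketch below shows the general pattern under stated assumptions: the three-column schema, the status values other than "no_image", and the function name migrate_articles_status are all hypothetical and much simpler than the real articles table.

```python
import sqlite3

# Assumed status set; only "no_image" is taken from this commit.
STATUSES = ("new", "published", "no_image")


def migrate_articles_status(conn):
    """Recreate `articles` with a CHECK constraint that allows 'no_image'.

    Returns True if a migration was performed, False if nothing had to change.
    """
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'articles'"
    ).fetchone()
    if row is None or "'no_image'" in row[0]:
        return False  # table missing or constraint already up to date

    allowed = ", ".join(f"'{s}'" for s in STATUSES)
    with conn:  # commits on success, rolls back the open transaction on error
        conn.execute(
            "CREATE TABLE articles_new ("
            "id INTEGER PRIMARY KEY, "
            "title TEXT NOT NULL, "
            f"status TEXT NOT NULL CHECK (status IN ({allowed})))"
        )
        # Copy existing rows, then swap the new table into place.
        conn.execute(
            "INSERT INTO articles_new (id, title, status) "
            "SELECT id, title, status FROM articles"
        )
        conn.execute("DROP TABLE articles")
        conn.execute("ALTER TABLE articles_new RENAME TO articles")
    return True
```

Checking the stored CREATE statement first keeps the migration idempotent, so it can run on every startup.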

workflow.py / main.py:
- Map no_image to its own UI status with rewrite/close transitions

wordpress.py:
- _upload_featured_media() accepts image_caption param, sends to WP media
- _get_image_meta_for_url() / _build_image_caption() helpers
- _build_attribution_block(): separator + attribution paragraph at article end
  (original link, author, Bildnachweis/credit)
- _build_post_content() appends attribution block
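
A rough sketch of how an attribution block could be assembled and appended. The function names mirror the helpers listed above without the leading underscore, but the signatures, the HTML markup (<hr> separator, single paragraph) and the label texts other than "Bildnachweis" are assumptions, not the actual wordpress.py implementation.

```python
from html import escape


def build_attribution_block(source_url, author=None, image_credit=None):
    """Return a separator plus one attribution paragraph as HTML."""
    link = escape(source_url, quote=True)
    parts = [f'Original article: <a href="{link}">{link}</a>']
    if author:
        parts.append(f"Author: {escape(author)}")
    if image_credit:
        parts.append(f"Bildnachweis: {escape(image_credit)}")
    return "<hr>\n<p>" + " | ".join(parts) + "</p>"


def build_post_content(body_html, source_url, author=None, image_credit=None):
    """Append the attribution block to the end of the article body."""
    block = build_attribution_block(source_url, author, image_credit)
    return body_html.rstrip() + "\n" + block
```

Escaping the URL, author and credit keeps scraped text from breaking the post markup.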

telegram_bot.py:
- notify_pipeline_done() shows 🖼️ no-image count

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 07:08:48 +00:00
__init__.py feat: rebuild rss-news backend, admin ui, and legal extraction pipeline 2026-02-18 09:52:36 +01:00
admin_ui.py feat(rewrite): add batch rewrite run, AI tags for WP, and agentur contact detection 2026-02-21 14:39:47 +01:00
auth.py feat: rebuild rss-news backend, admin ui, and legal extraction pipeline 2026-02-18 09:52:36 +01:00
config.py feat(automation): autonomous pipeline with Telegram bot and N8N integration 2026-03-21 09:40:15 +00:00
db.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00
ingestion.py feat(automation): autonomous pipeline with Telegram bot and N8N integration 2026-03-21 09:40:15 +00:00
main.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00
pipeline.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00
policy.py feat: rebuild rss-news backend, admin ui, and legal extraction pipeline 2026-02-18 09:52:36 +01:00
publisher.py feat(workflow): simplify article flow and add automated rewrite step 2026-02-21 13:43:22 +01:00
relevance.py feat(export): add csv/json article export with date relevance scoring 2026-02-18 10:04:38 +01:00
repositories.py fix(rewrite): make image upload non-fatal and add rewrite tracing logs 2026-03-26 07:45:55 +00:00
rewrite.py fix(rewrite): attribute claims to source instead of using first-person 'wir' 2026-03-26 07:36:09 +00:00
scheduler.py feat(automation): autonomous pipeline with Telegram bot and N8N integration 2026-03-21 09:40:15 +00:00
source_extraction.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00
telegram_bot.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00
wordpress.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00
workflow.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00