Commit graph

12 commits

Author SHA1 Message Date
OliverGiertz
f710141828 fix(scheduler): prevent duplicate slot assignment from concurrent pipeline runs
Two bugs caused multiple articles to land on the same publish slot:

1. main.py: asyncio.create_task() returned immediately, allowing a second
   pipeline trigger (N8N + Telegram /run or two N8N calls) to start a
   second concurrent run. Added asyncio.Lock (_pipeline_lock) so any
   second trigger while the pipeline is running is rejected immediately.

2. scheduler.py: reserve_publish_slot() read the list of occupied slots
   and wrote the new slot in two separate DB connections. Concurrent threads
   could both see the same "free" slot before either committed its write.
   Fixed by wrapping the entire read-find-write cycle in a threading.Lock
   (_slot_lock) and a single DB connection, so the slot check and the
   slot assignment are atomic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 05:03:13 +00:00
OliverGiertz
aaac5def27 feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution
source_extraction.py:
- New _extract_image_metadata(): extracts figcaption text + copyright/credit
  per image URL using 3 strategies (figure+figcaption, data-* attributes,
  adjacent credit spans)
- ExtractedArticle gets new image_metadata field
- extracted_article_to_meta() includes image_metadata in stored JSON

pipeline.py:
- After auto image selection, check if selected_url is set
- Articles without usable image → status "no_image" (excluded with Telegram notice)
- PipelineStats and summary report include no_image counter

db.py:
- Add "no_image" to articles status CHECK constraint
- Migration: recreates articles table with updated constraint on existing DBs

workflow.py / main.py:
- Map no_image as own UI status with rewrite/close transitions

wordpress.py:
- _upload_featured_media() accepts image_caption param, sends to WP media
- _get_image_meta_for_url() / _build_image_caption() helpers
- _build_attribution_block(): separator + attribution paragraph at article end
  (original link, author, Bildnachweis/credit)
- _build_post_content() appends attribution block

telegram_bot.py:
- notify_pipeline_done() shows 🖼️ no-image count

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 07:08:48 +00:00
OliverGiertz
e9c472b722 fix(telegram): async webhook handler + deduplicate callback responses
- Webhook returns 200 immediately, processing runs in background task
  → Telegram no longer retries, eliminates duplicate callbacks and 400 errors
- Consolidate answer_callback_query call to top of handler (before heavy work)
- Add logger.info/error for callback actions to aid debugging

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 11:08:32 +00:00
OliverGiertz
1020526e76 fix(pipeline): run N8N pipeline endpoint async to avoid HTTP timeout
Pipeline runs in background via asyncio. Endpoint returns immediately,
results arrive via Telegram notifications.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 10:03:13 +00:00
OliverGiertz
6192f8e527 feat(automation): autonomous pipeline with Telegram bot and N8N integration
- Add full auto pipeline: RSS ingest → GPT relevance score → AI rewrite → WP draft
- Add Telegram bot with inline buttons (rewrite/discard/override) and commands (/run, /rejected, /status)
- Add smart publish scheduler: max 2 drafts/day, spread over week (09:00 & 14:00 CET)
- Add N8N API endpoints (/api/n8n/pipeline, /api/n8n/ingest) with X-API-Key auth
- Add GPT-based relevance scoring (0-100) for VanLife/Camping/Outdoor topics
- Remove Ampel risk-level policy check from ingestion (all enabled feeds are used)
- Add Telegram webhook endpoint and setup endpoint
- Add delete_wp_post() for Telegram discard action
- Add DB migrations for relevance_score and scheduled_publish_at columns
- Update .env.example with all new configuration variables
- Add docs/AUTOMATION.md with full setup and usage documentation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 09:40:15 +00:00
b0f995d5c9
feat(rewrite): add batch rewrite run, AI tags for WP, and agentur contact detection 2026-02-21 14:39:47 +01:00
35ccceb260
feat(workflow): simplify article flow and add automated rewrite step 2026-02-21 13:43:22 +01:00
1cee56205e feat(publisher): add wordpress draft queue with retry and admin controls 2026-02-18 10:49:43 +01:00
efaf132936 feat(images): add thumbnail gallery with select/exclude workflow 2026-02-18 10:11:22 +01:00
6691db8051 feat(export): add csv/json article export with date relevance scoring 2026-02-18 10:04:38 +01:00
5159a6e3b4 feat(legal): add structured attribution fields and publish legal gate 2026-02-18 10:02:19 +01:00
2c331d683b
feat: rebuild rss-news backend, admin ui, and legal extraction pipeline 2026-02-18 09:52:36 +01:00