rss-news

Author	SHA1	Message	Date
OliverGiertz	f710141828	fix(scheduler): prevent duplicate slot assignment from concurrent pipeline runs Two bugs caused multiple articles to land on the same publish slot: 1. main.py: asyncio.create_task() returned immediately, allowing a second pipeline trigger (N8N + Telegram /run or two N8N calls) to start a second concurrent run. Added asyncio.Lock (_pipeline_lock) so any second trigger while the pipeline is running is rejected immediately. 2. scheduler.py: reserve_publish_slot() read the list of occupied slots and wrote the new slot in two separate DB connections. Concurrent threads could both see the same "free" slot before either committed its write. Fixed by wrapping the entire read-find-write cycle in a threading.Lock (_slot_lock) and a single DB connection, so the slot check and the slot assignment are atomic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-08 05:03:13 +00:00
OliverGiertz	cdcf441daf	feat(admin): bulk-editable article list with WP ID inline editing - New /admin/article-list: paginated (50/page) table with thumbnail, title, excerpt (120 chars), status, scheduled date, and WP ID input - Sticky save bar with live change counter (JS tracks modified inputs, highlights changed cells in amber, disables save when nothing changed) - POST /admin/article-list/update: saves only changed WP IDs in one request; clears stale wp_post_url so WP-Sync repopulates it cleanly - Filter by status + free-text search (title or article ID) - Pagination with page/filter state preserved through save redirects - repositories: add list_articles_page() (offset + search) and bulk_update_wp_post_ids() - Dashboard nav: add Artikelliste link Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 09:00:25 +00:00
OliverGiertz	2d02b56b65	feat(admin): WordPress→DB sync for scheduled slots Adds sync_db_from_wordpress() that treats WordPress as source of truth: - future posts: update scheduled_publish_at to WP's actual date - draft posts: clear scheduled_publish_at (not yet scheduled) - published posts: mark article as 'published' in DB - trashed/deleted posts: clear wp_post_id + wp_post_url + slot so article can be re-processed Exposed via POST /admin/wp-sync with a sync button on the schedule page. Run after any manual rescheduling in WordPress to bring DB back in sync. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 08:53:44 +00:00
OliverGiertz	8676ace102	feat(pipeline): article age filter, image URL validation, schedule UI, retry button 1. Article age filter (ingestion.py + config.py): - New setting pipeline_max_article_age_days=7 (0 = no limit) - Skip RSS entries older than N days before expensive extract_article() - Prevents old articles from Google Alerts re-entering pipeline 2. Image URL pre-validation (ingestion.py): - HEAD request probe for each primary image candidate during ingestion - Falls back to next-best candidate if primary returns 4xx - Network errors treated as OK to avoid false negatives on flaky servers 3. Stale WP draft cleanup (pipeline.py): - Quality gate rejections now delete any pre-existing WP draft (wp_post_id) - Prevents orphaned drafts when re-running articles that previously had drafts 4. Schedule overview UI (scheduler.py + admin_ui.py + admin_schedule.html): - New /admin/schedule page showing calendar grid of all booked slots - Distinguishes Pipeline-DB slots from WordPress-only slots - Link added to dashboard navigation 5. Retry for failed articles (admin_ui.py + admin_dashboard.html): - New POST /admin/articles/{id}/retry endpoint: resets to 'new', releases slot - '🔄 Wiederholen' button shown in dashboard for all 'close' (error) articles Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 08:44:28 +00:00
OliverGiertz	cf2d826c8a	fix(scheduler,pipeline): fix WP auth attribute name and release slot on hard errors - scheduler: use wordpress_app_password (not wordpress_password) so _fetch_wp_occupied_slots() can actually authenticate against the WP REST API — previously always returned empty set silently - pipeline: release reserved publish slot when draft creation fails with a non-ValueError exception (e.g. WP API error), preventing permanently blocked slots on failed articles Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 08:22:26 +00:00
OliverGiertz	2d1dd14e45	fix(pipeline): send individual Telegram notifications for quality gate rejections - Add individual Telegram message when an article is rejected by quality gate (too short raw content or rewritten text), so users see each rejection in real time instead of only in the bulk summary - Add quality_gate_rejected counter to PipelineStats and result dict - Show quality gate rejections separately in pipeline-done summary (✂️ Qualitätsprüfung: N) distinct from score-based rejections Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 07:02:03 +00:00
OliverGiertz	09dcf6ce36	feat(pipeline): add two-stage article quality gate (min word count) Stage 1 (before OpenAI rewrite): reject if raw content < pipeline_min_words_raw (default 120) Stage 2 (after rewrite): reject if rewritten text < pipeline_min_words_rewritten (default 150) Both stages set status='error' with a descriptive note and skip WP draft creation. The reserved publish slot is released so it stays available for the next article. Quality rejections don't abort the pipeline — processing continues with the next article. New config settings (overridable via .env): PIPELINE_MIN_WORDS_RAW=120 PIPELINE_MIN_WORDS_REWRITTEN=150 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 09:42:02 +00:00
OliverGiertz	94bd93a18a	fix(scheduler): fill schedule gaps instead of always appending to end Previously the scheduler started searching from the last scheduled post date, skipping all free slots in between (e.g. a free slot on Apr 20 would be ignored if the last post was on May 18). Now starts scanning from tomorrow, finding the first available slot regardless of whether earlier dates have gaps — fills the calendar naturally. Also extended lookahead from 30 to 60 days. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 09:34:27 +00:00
OliverGiertz	8fa46312e8	fix(scheduler): query WordPress future posts to avoid double-booking slots The scheduler previously only checked the local SQLite DB for occupied slots. Posts created outside the pipeline (e.g. recovery scripts) were invisible, causing newly scheduled articles to land on already-taken WP dates. _fetch_wp_occupied_slots() now queries WP /wp/v2/posts?status=future before each slot assignment. All scheduling functions accept a wp_occupied set. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 09:29:24 +00:00
OliverGiertz	764e7bff6a	fix(ingestion): skip data: URIs and known placeholder images - ingestion.py: filter out data:image/... inline URIs before ranking - ingestion.py: penalise (-300) known placeholder paths (some-default.jpg etc.) - wordpress.py: _is_usable_image_url rejects data: URIs and placeholder paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 09:09:44 +00:00
OliverGiertz	426a799371	fix(wordpress): use status=future for posts with a future scheduled_publish_at WordPress ignores the date field for draft posts and shows "Sofort veröffentlichen" instead. Setting status=future causes WP to display and honour the scheduled date, auto-publishing the post at the given time as intended. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-29 14:29:25 +00:00
OliverGiertz	8c6022fead	fix(pipeline): always reserve publish slot before WP draft creation If scheduled_publish_at is not set when _do_rewrite_and_draft runs (e.g. rewrite_and_update_draft called on a review article), reserve a slot now so the WP draft always receives a future date. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-29 14:14:03 +00:00
OliverGiertz	1a8d0775c7	fix(wordpress): correctly detect bare credit marker prefix before caption fallback Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 08:47:09 +00:00
OliverGiertz	45c533c674	fix(wordpress): extract credit portion from caption for attribution block When the credit field only captured a marker prefix (e.g. "Foto:") due to CSS-class-based extraction picking up only the label element, fall back to regex-extracting the credit line from the full figcaption caption text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 08:41:28 +00:00
OliverGiertz	d1cb809852	fix(wordpress): fix attribution block source name and image credit lookup - Derive real source hostname from canonical URL when feed name is generic (e.g. "Google Alerts"), so the link shows "moin.de" instead of "Google Alerts" - Use _get_image_meta_for_url() (fuzzy URL matching) for image credit lookup - Use caption field for Bildnachweis since it already contains embedded credits Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 08:28:44 +00:00
OliverGiertz	82f2df610d	fix(wordpress): fuzzy URL match for image metadata and simplify caption builder Image metadata keys may have query params (e.g. ?w=1200) that differ from the selected_url stored in image_review. Fall back to comparing URLs without query string so the figcaption text is correctly found. Also simplified _build_image_caption: figcaption text already contains the credit info, so just use caption directly instead of appending the redundant credit prefix marker. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 08:24:40 +00:00
OliverGiertz	8e65485f0c	fix(ingestion): strip HTML tags from feed entry titles Google Alerts wraps matched keywords in <b>...</b> tags. Strip all HTML tags from the title before storing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 08:08:07 +00:00
OliverGiertz	0d07a9804d	fix(ingestion): resolve Google Alerts redirect URLs before article fetch Google Alerts feed entries use google.com/url?...&url=<encoded_real_url>&... tracking links. The extractor was fetching the Google redirect page instead of the actual article, resulting in empty content and no images. _resolve_google_redirect() extracts the real URL from the 'url' query parameter before passing it to extract_article(). Non-Google URLs are returned unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 07:10:30 +00:00
OliverGiertz	aaac5def27	feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution source_extraction.py: - New _extract_image_metadata(): extracts figcaption text + copyright/credit per image URL using 3 strategies (figure+figcaption, data-* attributes, adjacent credit spans) - ExtractedArticle gets new image_metadata field - extracted_article_to_meta() includes image_metadata in stored JSON pipeline.py: - After auto image selection, check if selected_url is set - Articles without usable image → status "no_image" (excluded with Telegram notice) - PipelineStats and summary report include no_image counter db.py: - Add "no_image" to articles status CHECK constraint - Migration: recreates articles table with updated constraint on existing DBs workflow.py / main.py: - Map no_image as own UI status with rewrite/close transitions wordpress.py: - _upload_featured_media() accepts image_caption param, sends to WP media - _get_image_meta_for_url() / _build_image_caption() helpers - _build_attribution_block(): separator + attribution paragraph at article end (original link, author, Bildnachweis/credit) - _build_post_content() appends attribution block telegram_bot.py: - notify_pipeline_done() shows 🖼️ no-image count Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 07:08:48 +00:00
OliverGiertz	1963e32ab4	fix(rewrite): make image upload non-fatal and add rewrite tracing logs - wordpress.py: catch image download/upload failures and skip image instead of aborting the entire WP draft update - pipeline.py: add INFO logs at each step of _do_rewrite_and_draft to trace OpenAI call, tag generation, and WP API call - telegram_bot.py: add INFO logs around rewrite execution + exc_info on error for full traceback in logs - repositories.py: include scheduled_publish_at in get_article_by_id Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 07:45:55 +00:00
OliverGiertz	12932bca90	fix(rewrite): attribute claims to source instead of using first-person 'wir' Rewrites must not use 'wir haben erforscht/berechnet' since the content comes from a third-party source. The prompt now passes the source name and instructs GPT to attribute all claims to the original publisher (e.g. 'laut PiNCAMP', 'die Auswertung zeigt'). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 07:36:09 +00:00
OliverGiertz	013af2ab62	fix(pipeline): set warning-zone articles to review status to prevent re-warnings Articles scoring between warn and auto threshold stayed in "new" status, causing repeated warning notifications on every /run call. Now they are set to "review" status after the first warning is sent. The override callback already resets status to "new" before processing, so the existing flow works correctly. Also include "review" articles in /rejected command output so they can be acted on. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 07:22:47 +00:00
OliverGiertz	a64bf31ff6	fix(telegram): restore webhook to RSS-News backend and forward app-release commands The N8N App Release Telegram Trigger had overwritten the webhook registration, pointing it to N8N instead of the RSS-News backend. This caused all callback_query events (inline buttons) to be lost, breaking the override/rewrite/discard buttons. Changes: - Re-register webhook to https://news.vanityontour.de/telegram/webhook with both message and callback_query in allowed_updates - Add _forward_to_n8n_app_release() to proxy unknown bot commands (e.g. /release) to the N8N App Release webhook, keeping that workflow functional without needing its own Telegram Trigger Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 06:34:49 +00:00
OliverGiertz	970f509ad4	feat(wordpress): store suggested publish date directly in WP draft Reserve the publish slot before creating the WP draft so the scheduled_publish_at timestamp is available when building the post payload. WordPress receives the `date` field (e.g. 2026-03-24T09:00:00) which sets the scheduled publish time on the draft. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-21 11:15:39 +00:00
OliverGiertz	e9c472b722	fix(telegram): async webhook handler + deduplicate callback responses - Webhook returns 200 immediately, processing runs in background task → Telegram no longer retries, eliminates duplicate callbacks and 400 errors - Consolidate answer_callback_query call to top of handler (before heavy work) - Add logger.info/error for callback actions to aid debugging Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-21 11:08:32 +00:00
OliverGiertz	1020526e76	fix(pipeline): run N8N pipeline endpoint async to avoid HTTP timeout Pipeline runs in background via asyncio. Endpoint returns immediately, results arrive via Telegram notifications. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-21 10:03:13 +00:00
OliverGiertz	0a9c0b10d6	test(ingestion): update test for removed Ampel risk-level check Ampel system removed – all enabled feeds are now processed regardless of risk_level. Updated test to verify feeds with any risk_level are processed instead of blocked. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-21 09:41:34 +00:00
OliverGiertz	6192f8e527	feat(automation): autonomous pipeline with Telegram bot and N8N integration - Add full auto pipeline: RSS ingest → GPT relevance score → AI rewrite → WP draft - Add Telegram bot with inline buttons (rewrite/discard/override) and commands (/run, /rejected, /status) - Add smart publish scheduler: max 2 drafts/day, spread over week (09:00 & 14:00 CET) - Add N8N API endpoints (/api/n8n/pipeline, /api/n8n/ingest) with X-API-Key auth - Add GPT-based relevance scoring (0-100) for VanLife/Camping/Outdoor topics - Remove Ampel risk-level policy check from ingestion (all enabled feeds are used) - Add Telegram webhook endpoint and setup endpoint - Add delete_wp_post() for Telegram discard action - Add DB migrations for relevance_score and scheduled_publish_at columns - Update .env.example with all new configuration variables - Add docs/AUTOMATION.md with full setup and usage documentation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-21 09:40:15 +00:00
Oliver G	6332a9a399	feat(wordpress): publish true Gutenberg blocks and remove auto summary/details sections	2026-02-21 14:55:20 +01:00
Oliver G	93f52f72b9	fix(ingestion): preserve article workflow data and skip closed items on re-import	2026-02-21 14:51:36 +01:00
Oliver G	b0f995d5c9	feat(rewrite): add batch rewrite run, AI tags for WP, and agentur contact detection	2026-02-21 14:39:47 +01:00
Oliver G	da269d08f1	chore(admin): remove legal approval step from UI workflow	2026-02-21 14:11:03 +01:00
Oliver G	88b2ee1d01	feat(admin): add feed/source management, rewrite editor, reopen flow, and WP block output	2026-02-21 14:03:49 +01:00
Oliver G	50f737f434	feat(admin): add connectivity diagnostics page for domains and endpoints	2026-02-21 13:58:40 +01:00
Oliver G	35ccceb260	feat(workflow): simplify article flow and add automated rewrite step	2026-02-21 13:43:22 +01:00
Oliver G	8d7375c99f	feat(ui): classify publisher errors with actionable hints	2026-02-21 13:11:43 +01:00
Oliver G	24d8e5ad0f	feat(wordpress): improve post html structure and excerpt generation	2026-02-21 13:09:00 +01:00
Oliver G	e68b6a41fd	feat(wordpress): upload selected image and set featured_media on draft publish	2026-02-21 13:07:08 +01:00
Oliver G	ba83b24510	chore: finalize current state and prepare next wordpress-focused roadmap	2026-02-18 11:11:49 +01:00
Oliver G	fee5e76842	feat(ui): add publish readiness indicators and WP env key aliases	2026-02-18 11:03:53 +01:00
Oliver G	592d699166	chore(config): load shared rss-news .env for wordpress and keys	2026-02-18 11:00:57 +01:00
Oliver G	1cee56205e	feat(publisher): add wordpress draft queue with retry and admin controls	2026-02-18 10:49:43 +01:00
Oliver G	dcdf4d954a	feat(ui): show auto image ranking reasons in article detail	2026-02-18 10:43:17 +01:00
Oliver G	26e3d26b93	feat(images): auto-select relevant article images and tidy detail header	2026-02-18 10:40:39 +01:00
Oliver G	fb3465fb10	fix(images): add proxy fallback to direct source url rendering	2026-02-18 10:20:47 +01:00
Oliver G	910ca72c81	fix(ui): render article images via authenticated proxy thumbnails	2026-02-18 10:16:30 +01:00
Oliver G	efaf132936	feat(images): add thumbnail gallery with select/exclude workflow	2026-02-18 10:11:22 +01:00
Oliver G	6691db8051	feat(export): add csv/json article export with date relevance scoring	2026-02-18 10:04:38 +01:00
Oliver G	5159a6e3b4	feat(legal): add structured attribution fields and publish legal gate	2026-02-18 10:02:19 +01:00
Oliver G	c52363f1a7	feat(admin): add article detail page with legal checklist	2026-02-18 09:52:36 +01:00
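
Note on f710141828 (atomic slot reservation): the message describes wrapping the whole read-find-write cycle of reserve_publish_slot() in a threading.Lock. The following is only a minimal sketch of that idea, not the repository's code. It assumes an in-memory set in place of the DB connection, two slots per day at 09:00 and 14:00 (the times named in 6192f8e527), scanning from tomorrow with a 60-day lookahead (as in 94bd93a18a), and a hypothetical function signature.

```python
import threading
from datetime import date, datetime, time, timedelta

# Assumption: one module-level lock, as the commit message describes (_slot_lock).
_slot_lock = threading.Lock()

# Assumption: two slots per day, taken from the 6192f8e527 message.
SLOT_TIMES = (time(9, 0), time(14, 0))


def reserve_publish_slot(occupied, today, lookahead_days=60):
    """Return the first free slot after `today` and mark it occupied.

    `occupied` is a plain set standing in for the database table; the
    signature is hypothetical. Holding the lock across the check and the
    write means no two threads can both see the same slot as free.
    """
    with _slot_lock:
        for offset in range(1, lookahead_days + 1):  # start scanning tomorrow
            day = today + timedelta(days=offset)
            for slot_time in SLOT_TIMES:
                candidate = datetime.combine(day, slot_time)
                if candidate not in occupied:
                    occupied.add(candidate)  # write happens under the same lock
                    return candidate
    return None  # no free slot within the lookahead window


if __name__ == "__main__":
    taken = set()
    print(reserve_publish_slot(taken, date(2026, 5, 8)))
    print(reserve_publish_slot(taken, date(2026, 5, 8)))
```

With a real database the same shape applies: open one connection, take the lock, read the occupied slots, pick one, write it, commit, then release the lock.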

51 commits
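
Note on 0d07a9804d (Google Alerts redirects): the message says the real article URL is taken from the 'url' query parameter of google.com/url?... links, and that non-Google URLs are returned unchanged. A minimal sketch of that behaviour, using only the standard library, might look like the following. The function name and the exact host check are assumptions for illustration; the repository's _resolve_google_redirect() may differ.

```python
from urllib.parse import parse_qs, urlparse


def resolve_google_redirect_sketch(url: str) -> str:
    """Return the target of a Google redirect link, or `url` unchanged.

    Hypothetical helper: treats only google.com/url and www.google.com/url
    as redirect links (an assumption, not confirmed by the commit log).
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in ("google.com", "www.google.com") and parsed.path == "/url":
        # parse_qs percent-decodes values, so the target comes back readable.
        targets = parse_qs(parsed.query).get("url")
        if targets:
            return targets[0]
    return url


if __name__ == "__main__":
    link = "https://www.google.com/url?rct=j&sa=t&url=https%3A%2F%2Fexample.com%2Fa%3Fx%3D1&ct=ga"
    print(resolve_google_redirect_sketch(link))
```

Falling back to the input URL when the parameter is missing keeps the helper safe to call on every feed entry.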