1. Article age filter (ingestion.py + config.py):
- New setting pipeline_max_article_age_days=7 (0 = no limit)
- Skip RSS entries older than N days before expensive extract_article()
- Prevents old articles from Google Alerts re-entering pipeline
2. Image URL pre-validation (ingestion.py):
- HEAD request probe for each primary image candidate during ingestion
- Falls back to next-best candidate if primary returns 4xx
- Network errors treated as OK to avoid false negatives on flaky servers
3. Stale WP draft cleanup (pipeline.py):
- Quality gate rejections now delete any pre-existing WP draft (wp_post_id)
- Prevents orphaned drafts when re-running articles that previously had drafts
4. Schedule overview UI (scheduler.py + admin_ui.py + admin_schedule.html):
- New /admin/schedule page showing calendar grid of all booked slots
- Distinguishes Pipeline-DB slots from WordPress-only slots
- Link added to dashboard navigation
5. Retry for failed articles (admin_ui.py + admin_dashboard.html):
- New POST /admin/articles/{id}/retry endpoint: resets to 'new', releases slot
- '🔄 Wiederholen' ("Retry") button shown in dashboard for all 'close' (error) articles
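
A minimal sketch of items 1 and 2, not the actual ingestion.py code. The function names (is_too_old, head_status, pick_image) are hypothetical, and returning None when every candidate answers 4xx is an assumption; the commit only states that the primary falls back to the next-best candidate.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Optional
import urllib.error
import urllib.request


def is_too_old(published: datetime, max_age_days: int,
               now: Optional[datetime] = None) -> bool:
    """Return True if an RSS entry should be skipped; 0 disables the limit."""
    if max_age_days <= 0:
        return False
    now = now or datetime.now(timezone.utc)
    return now - published > timedelta(days=max_age_days)


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """HEAD-probe a URL; return the HTTP status, or None on a network error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as HTTPError; the status code is still useful.
        return exc.code
    except OSError:
        # URLError and timeouts are OSError subclasses: treat as "unknown".
        return None


def pick_image(candidates: Iterable[str],
               probe: Callable[[str], Optional[int]] = head_status) -> Optional[str]:
    """Return the first candidate that does not answer with a 4xx status."""
    for url in candidates:
        status = probe(url)
        # Network errors (None) count as OK to avoid false negatives.
        if status is None or not (400 <= status < 500):
            return url
    return None  # assumption: no usable image if every candidate is 4xx
```

Calling is_too_old() before extract_article() keeps the expensive extraction off entries that would be discarded anyway. The probe argument of pick_image() is injectable so the fallback logic can be exercised without network access.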
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- scheduler: use wordpress_app_password (not wordpress_password) so
_fetch_wp_occupied_slots() can actually authenticate against the WP
REST API; previously it silently returned an empty set every time
- pipeline: release reserved publish slot when draft creation fails with
a non-ValueError exception (e.g. WP API error), preventing permanently
blocked slots on failed articles
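
A rough sketch of the slot-release fix, assuming the reserved slots live in a set and draft creation is a callable. The names create_draft_or_release, reserved_slots and create_draft are hypothetical. The commit does not describe the ValueError branch, so the sketch simply propagates it unchanged.

```python
from typing import Any, Callable, Hashable, Set


def create_draft_or_release(slot: Hashable,
                            reserved_slots: Set[Hashable],
                            create_draft: Callable[[Hashable], Any]) -> Any:
    """Create a draft for a reserved slot; free the slot if creation fails."""
    try:
        return create_draft(slot)
    except ValueError:
        # Pre-existing path, not covered by this commit: propagate as-is.
        raise
    except Exception:
        # e.g. a WP API error: release the slot so it is not blocked forever.
        reserved_slots.discard(slot)
        raise
```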
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stage 1 (before OpenAI rewrite): reject if raw content < pipeline_min_words_raw (default 120)
Stage 2 (after rewrite): reject if rewritten text < pipeline_min_words_rewritten (default 150)
Both stages set status='error' with a descriptive note and skip WP draft creation.
The reserved publish slot is released so it stays available for the next article.
Quality rejections don't abort the pipeline; processing continues with the next article.
New config settings (overridable via .env):
PIPELINE_MIN_WORDS_RAW=120
PIPELINE_MIN_WORDS_REWRITTEN=150
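
A minimal sketch of the two-stage check, assuming words are counted by whitespace splitting (the actual counting method is not stated here). The helper names word_count and quality_gate are hypothetical; the environment variable names and defaults are the ones listed above.

```python
import os
from typing import Optional

# Defaults from this commit, overridable via the environment / .env.
MIN_WORDS_RAW = int(os.getenv("PIPELINE_MIN_WORDS_RAW", "120"))
MIN_WORDS_REWRITTEN = int(os.getenv("PIPELINE_MIN_WORDS_REWRITTEN", "150"))


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (assumed counting method)."""
    return len(text.split())


def quality_gate(text: str, min_words: int, stage: str) -> Optional[str]:
    """Return a rejection note if the text is too short, else None."""
    count = word_count(text)
    if count < min_words:
        return f"{stage}: {count} words < minimum {min_words}"
    return None
```

A caller would run quality_gate(raw, MIN_WORDS_RAW, "raw") before the rewrite and quality_gate(rewritten, MIN_WORDS_REWRITTEN, "rewritten") after it, storing any returned note alongside status='error'.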
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the scheduler started searching from the last scheduled post date,
skipping all free slots in between (e.g. a free slot on Apr 20 would be ignored
if the last post was on May 18).
The scheduler now starts scanning from tomorrow and takes the first available
slot, so gaps before the last scheduled post are filled first.
Also extended lookahead from 30 to 60 days.
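
The new scan order can be sketched as follows. This is a simplified illustration that assumes one slot per day. The name first_free_day is hypothetical, and the real scheduler may work with finer-grained slots.

```python
from datetime import date, timedelta
from typing import Optional, Set


def first_free_day(today: date, occupied: Set[date],
                   lookahead_days: int = 60) -> Optional[date]:
    """Return the first unoccupied day, scanning from tomorrow onward."""
    for offset in range(1, lookahead_days + 1):
        candidate = today + timedelta(days=offset)
        if candidate not in occupied:
            return candidate
    return None  # nothing free within the lookahead window
```

With today on Apr 18, and Apr 19 and May 18 occupied, the scan returns Apr 20 instead of skipping ahead past May 18.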
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The scheduler previously only checked the local SQLite DB for occupied slots.
Posts created outside the pipeline (e.g. recovery scripts) were invisible,
causing newly scheduled articles to land on already-taken WP dates.
_fetch_wp_occupied_slots() now queries WP /wp/v2/posts?status=future before
each slot assignment. All scheduling functions accept a wp_occupied set.
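
A hedged sketch of such a lookup, using only the standard library. The function names, the base_url parameter, the per_page and _fields query parameters, and the use of HTTP Basic auth with an application password are assumptions for illustration, not the actual scheduler.py code.

```python
import base64
import json
import urllib.request
from datetime import datetime
from typing import Iterable, Mapping, Set


def basic_auth_header(user: str, app_password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    return f"Basic {token}"


def parse_occupied(posts: Iterable[Mapping[str, str]]) -> Set[datetime]:
    """Collect the scheduled publish times from a WP posts response."""
    return {datetime.fromisoformat(p["date"]) for p in posts if p.get("date")}


def fetch_wp_occupied_slots(base_url: str, user: str, app_password: str,
                            timeout: float = 10.0) -> Set[datetime]:
    """Query scheduled (status=future) posts and return their dates."""
    url = (f"{base_url.rstrip('/')}/wp-json/wp/v2/posts"
           "?status=future&per_page=100&_fields=date")
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(user, app_password)})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_occupied(json.load(response))
```

The returned set can be passed as wp_occupied and merged with the slots from the local DB. This sketch reads only the first page of results, so a site with more than 100 scheduled posts would need pagination.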
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>