Adds sync_db_from_wordpress() that treats WordPress as source of truth:
- future posts: update scheduled_publish_at to WP's actual date
- draft posts: clear scheduled_publish_at (not yet scheduled)
- published posts: mark article as 'published' in DB
- trashed/deleted posts: clear wp_post_id + wp_post_url + slot so article
can be re-processed
Exposed via POST /admin/wp-sync with a sync button on the schedule page.
Run after any manual rescheduling in WordPress to bring DB back in sync.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WordPress ignores the date field for draft posts and shows "Sofort veröffentlichen"
instead. Setting status=future causes WP to display and honour the scheduled date,
auto-publishing the post at the given time as intended.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the credit field only captured a marker prefix (e.g. "Foto:") due to
CSS-class-based extraction picking up only the label element, fall back to
regex-extracting the credit line from the full figcaption caption text.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Derive real source hostname from canonical URL when feed name is generic
(e.g. "Google Alerts"), so the link shows "moin.de" instead of "Google Alerts"
- Use _get_image_meta_for_url() (fuzzy URL matching) for image credit lookup
- Use caption field for Bildnachweis since it already contains embedded credits
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Image metadata keys may have query params (e.g. ?w=1200) that differ from
the selected_url stored in image_review. Fall back to comparing URLs without
query string so the figcaption text is correctly found.
Also simplified _build_image_caption: figcaption text already contains the
credit info, so just use caption directly instead of appending the redundant
credit prefix marker.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
source_extraction.py:
- New _extract_image_metadata(): extracts figcaption text + copyright/credit
per image URL using 3 strategies (figure+figcaption, data-* attributes,
adjacent credit spans)
- ExtractedArticle gets new image_metadata field
- extracted_article_to_meta() includes image_metadata in stored JSON
pipeline.py:
- After auto image selection, check if selected_url is set
- Articles without usable image → status "no_image" (excluded with Telegram notice)
- PipelineStats and summary report include no_image counter
db.py:
- Add "no_image" to articles status CHECK constraint
- Migration: recreates articles table with updated constraint on existing DBs
workflow.py / main.py:
- Map no_image as own UI status with rewrite/close transitions
wordpress.py:
- _upload_featured_media() accepts image_caption param, sends to WP media
- _get_image_meta_for_url() / _build_image_caption() helpers
- _build_attribution_block(): separator + attribution paragraph at article end
(original link, author, Bildnachweis/credit)
- _build_post_content() appends attribution block
telegram_bot.py:
- notify_pipeline_done() shows 🖼️ no-image count
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- wordpress.py: catch image download/upload failures and skip image
instead of aborting the entire WP draft update
- pipeline.py: add INFO logs at each step of _do_rewrite_and_draft
to trace OpenAI call, tag generation, and WP API call
- telegram_bot.py: add INFO logs around rewrite execution + exc_info
on error for full traceback in logs
- repositories.py: include scheduled_publish_at in get_article_by_id
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reserve the publish slot before creating the WP draft so the
scheduled_publish_at timestamp is available when building the post
payload. WordPress receives the `date` field (e.g. 2026-03-24T09:00:00)
which sets the scheduled publish time on the draft.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>