rss-news/backend/app
OliverGiertz 09dcf6ce36 feat(pipeline): add two-stage article quality gate (min word count)
Stage 1 (before OpenAI rewrite): reject if raw content < pipeline_min_words_raw (default 120)
Stage 2 (after rewrite): reject if rewritten text < pipeline_min_words_rewritten (default 150)

Both stages set status='error' with a descriptive note and skip WP draft creation.
The reserved publish slot is released so it stays available for the next article.
Quality rejections don't abort the pipeline — processing continues with the next article.
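
The two-stage gate described above could be sketched roughly as follows. This is a minimal illustration, not the code in pipeline.py: the names `GateResult`, `count_words`, `check_min_words`, and `process_article` are hypothetical, as are the whitespace-based word count and the `'ok'` status value. Only the thresholds (120 and 150) and the `'error'` status come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GateResult:
    passed: bool
    note: Optional[str] = None  # descriptive note, set only on rejection


def count_words(text: str) -> int:
    # Assumption: words are whitespace-separated tokens.
    # The real pipeline may count words differently.
    return len(text.split())


def check_min_words(text: str, minimum: int, stage: str) -> GateResult:
    words = count_words(text)
    if words < minimum:
        return GateResult(False, f"{stage}: {words} words < minimum {minimum}")
    return GateResult(True)


def process_article(
    raw: str,
    rewrite: Callable[[str], str],
    min_raw: int = 120,
    min_rewritten: int = 150,
) -> dict:
    # Stage 1: reject short raw content before paying for a rewrite.
    gate = check_min_words(raw, min_raw, "raw content")
    if not gate.passed:
        return {"status": "error", "note": gate.note, "draft_created": False}

    rewritten = rewrite(raw)

    # Stage 2: reject a rewrite that came back too short.
    gate = check_min_words(rewritten, min_rewritten, "rewritten text")
    if not gate.passed:
        return {"status": "error", "note": gate.note, "draft_created": False}

    # Hypothetical success value; the real status names are not shown here.
    return {"status": "ok", "note": None, "draft_created": True}
```

Because a rejection is returned as a result rather than raised, a caller looping over articles can record the note, release the reserved publish slot, and move on to the next article.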

New config settings (overridable via .env):
  PIPELINE_MIN_WORDS_RAW=120
  PIPELINE_MIN_WORDS_REWRITTEN=150
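
As an illustration of how these two settings might be read with their defaults, here is a small standard-library sketch. It assumes the .env values have already been exported into the process environment; how config.py actually loads them is not shown in this listing, and `QualityGateSettings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class QualityGateSettings:
    pipeline_min_words_raw: int = 120
    pipeline_min_words_rewritten: int = 150


def load_settings(env: Optional[Mapping[str, str]] = None) -> QualityGateSettings:
    # Fall back to the real process environment when no mapping is passed.
    env = os.environ if env is None else env
    return QualityGateSettings(
        pipeline_min_words_raw=int(env.get("PIPELINE_MIN_WORDS_RAW", 120)),
        pipeline_min_words_rewritten=int(env.get("PIPELINE_MIN_WORDS_REWRITTEN", 150)),
    )
```

Passing an explicit mapping, for example `load_settings({"PIPELINE_MIN_WORDS_RAW": "200"})`, makes the override behaviour easy to check without touching the real environment.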

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 09:42:02 +00:00
__init__.py feat: rebuild rss-news backend, admin ui, and legal extraction pipeline 2026-02-18 09:52:36 +01:00
admin_ui.py feat(rewrite): add batch rewrite run, AI tags for WP, and agentur contact detection 2026-02-21 14:39:47 +01:00
auth.py feat: rebuild rss-news backend, admin ui, and legal extraction pipeline 2026-02-18 09:52:36 +01:00
config.py feat(pipeline): add two-stage article quality gate (min word count) 2026-04-08 09:42:02 +00:00
db.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00
ingestion.py fix(ingestion): skip data: URIs and known placeholder images 2026-04-07 09:09:44 +00:00
main.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00
pipeline.py feat(pipeline): add two-stage article quality gate (min word count) 2026-04-08 09:42:02 +00:00
policy.py feat: rebuild rss-news backend, admin ui, and legal extraction pipeline 2026-02-18 09:52:36 +01:00
publisher.py feat(workflow): simplify article flow and add automated rewrite step 2026-02-21 13:43:22 +01:00
relevance.py feat(export): add csv/json article export with date relevance scoring 2026-02-18 10:04:38 +01:00
repositories.py fix(rewrite): make image upload non-fatal and add rewrite tracing logs 2026-03-26 07:45:55 +00:00
rewrite.py fix(rewrite): attribute claims to source instead of using first-person 'wir' 2026-03-26 07:36:09 +00:00
scheduler.py feat(pipeline): add two-stage article quality gate (min word count) 2026-04-08 09:42:02 +00:00
source_extraction.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00
telegram_bot.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00
wordpress.py fix(ingestion): skip data: URIs and known placeholder images 2026-04-07 09:09:44 +00:00
workflow.py feat(pipeline): image caption/credit extraction, no-image exclusion, WP attribution 2026-03-27 07:08:48 +00:00