Table view & status management integrated, rewrite batch function added (v1.3.1)
This commit is contained in:
parent
2f7f2a1eb7
commit
fe2191e6c8
9 changed files with 281 additions and 361 deletions
19
CHANGELOG.md
|
|
@@ -1,5 +1,22 @@
|
||||||
# CHANGELOG.md
|
# CHANGELOG.md
|
||||||
|
|
||||||
|
## [1.2.0] - 2025-07-04
|
||||||
|
### Added
|
||||||
|
- Automatic image detection when importing articles
|
||||||
|
- Extraction of images from the original article (up to 3 images)
|
||||||
|
- Storage of image URLs, alt texts (image descriptions), and copyright notices
|
||||||
|
- Error handling for unreachable pages
|
||||||
|
- Display of images (incl. description & copyright) in the article view
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- Images are processed and saved directly when an RSS article is imported
|
||||||
|
- `app.py` now also shows image information within the article detail view
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- None
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## [1.1.0] - 2025-07-04
|
## [1.1.0] - 2025-07-04
|
||||||
### Added
|
### Added
|
||||||
- Visually enhanced box for displaying an article with:
|
- Visually enhanced box for displaying an article with:
|
||||||
|
|
@@ -8,7 +25,7 @@
|
||||||
- Copy button for tags
|
- Copy button for tags
|
||||||
- Button to open the original article in a new tab
|
- Button to open the original article in a new tab
|
||||||
- Article view is now encapsulated in a gray, rounded box
|
- Article view is now encapsulated in a gray, rounded box
|
||||||
- Icons support visual orientation (📝, 📋, 📎 etc.)
|
- Icons support visual orientation (📝, 🗌, 📌 etc.)
|
||||||
|
|
||||||
### Changed
|
### Changed
|
||||||
- Article copy function for WordPress is now interactive via buttons
|
- Article copy function for WordPress is now interactive via buttons
|
||||||
|
|
|
||||||
36
README.md
|
|
@@ -1,25 +1,27 @@
|
||||||
# 📰 RSS Artikel Manager
|
# 📰 RSS Article Manager
|
||||||
|
|
||||||
A Python web tool that automatically imports RSS articles, rewrites them via ChatGPT, and tags them. Articles can be filtered by status and managed through a tabular Streamlit interface.
|
A simple, modular web tool based on Streamlit that automatically imports, rewrites, summarizes, and tags RSS articles, ready for publishing on WordPress.
|
||||||
|
|
||||||
---
|
## ✨ Features
|
||||||
|
|
||||||
## 🚀 Features
|
- 📥 Add and manage RSS feeds directly through the interface
|
||||||
|
- 📝 Rewrite articles automatically with the help of ChatGPT
|
||||||
|
- 🏷️ Generate tags and summaries automatically
|
||||||
|
- 🗂️ Tabular overview with filtering by status
|
||||||
|
- 📋 Copyable content for manual pasting into WordPress
|
||||||
|
- 📎 Link to the original article for easy image reuse
|
||||||
|
- 💾 Storage in a local JSON file (SQLite possible later)
|
||||||
|
- 📦 Versioning incl. CHANGELOG and GitHub Releases
|
||||||
|
|
||||||
- Manage RSS feeds directly in the web interface
|
## 🔐 Requirements
|
||||||
- Load articles, store them without duplicates, and display them
|
|
||||||
- Article status: `New`, `Rewrite`, `Process`, `Online`, `On Hold`, `Trash`
|
|
||||||
- Rewrite articles via ChatGPT and tag them automatically
|
|
||||||
- Filterable and editable article overview in table form
|
|
||||||
- Storage in `articles.json` (local JSON file)
|
|
||||||
|
|
||||||
---
|
- Python 3.8+
|
||||||
|
- OpenAI API key (loaded via `.env`)
|
||||||
|
|
||||||
## 🛠️ Installation
|
## 🚀 Getting Started
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
git clone https://github.com/dein-user/rss-artikel-manager.git
|
# Setup
|
||||||
cd rss-artikel-manager
|
git clone https://github.com/dein-benutzername/rss-article-manager.git
|
||||||
python -m venv .venv
|
cd rss-article-manager
|
||||||
source .venv/bin/activate # or .venv\\Scripts\\activate on Windows
|
bash start.sh
|
||||||
pip install -r requirements.txt
|
|
||||||
|
|
|
||||||
1
app.log
|
|
@@ -2,3 +2,4 @@
|
||||||
2025-07-04 09:30:17,000 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
|
2025-07-04 09:30:17,000 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
|
||||||
2025-07-04 09:30:17,010 - INFO - ✅ Artikel umgeschrieben: Das weltweit größte Caravaning-Erlebnis
|
2025-07-04 09:30:17,010 - INFO - ✅ Artikel umgeschrieben: Das weltweit größte Caravaning-Erlebnis
|
||||||
2025-07-04 09:46:03,001 - INFO - Status von 1 Artikel(n) auf 'Online' gesetzt.
|
2025-07-04 09:46:03,001 - INFO - Status von 1 Artikel(n) auf 'Online' gesetzt.
|
||||||
|
2025-07-04 19:36:22,449 - INFO - 1 neue Artikel geladen.
|
||||||
|
|
|
||||||
297
app.py
|
|
@@ -1,241 +1,92 @@
|
||||||
|
# app.py
|
||||||
|
|
||||||
import streamlit as st
|
import streamlit as st
|
||||||
import feedparser
|
|
||||||
import json
|
import json
|
||||||
import uuid
|
|
||||||
import os
|
import os
|
||||||
from datetime import datetime
|
from datetime import datetime
|
||||||
import pandas as pd
|
|
||||||
import openai
|
|
||||||
from openai import OpenAI
|
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
import logging
|
from email.utils import parsedate_to_datetime
|
||||||
|
from main import load_articles, save_articles, process_articles, fetch_and_process_feed, rewrite_articles
|
||||||
|
|
||||||
# ==== Version ====
|
|
||||||
APP_VERSION = "1.1.0"
|
|
||||||
|
|
||||||
# ==== Configure logging ====
|
|
||||||
LOG_FILE = "app.log"
|
|
||||||
logging.basicConfig(
|
|
||||||
filename=LOG_FILE,
|
|
||||||
level=logging.INFO,
|
|
||||||
format="%(asctime)s - %(levelname)s - %(message)s",
|
|
||||||
)
|
|
||||||
|
|
||||||
# ==== Configuration ====
|
|
||||||
ARTICLES_FILE = "articles.json"
|
|
||||||
FEEDS_FILE = "feeds.json"
|
|
||||||
DEFAULT_STATUS = "New"
|
|
||||||
ALL_STATUSES = ["New", "Rewrite", "Process", "Online", "On Hold", "Trash"]
|
|
||||||
|
|
||||||
# ==== Load API key ====
|
|
||||||
load_dotenv()
|
load_dotenv()
|
||||||
api_key = os.getenv("OPENAI_API_KEY")
|
|
||||||
client = OpenAI(api_key=api_key)
|
|
||||||
|
|
||||||
# ==== Helper functions ====
|
st.set_page_config(layout="wide", page_title="RSS Article Manager")
|
||||||
def load_articles():
|
|
||||||
if not os.path.exists(ARTICLES_FILE):
|
|
||||||
return []
|
|
||||||
try:
|
|
||||||
with open(ARTICLES_FILE, "r") as f:
|
|
||||||
return json.load(f)
|
|
||||||
except json.JSONDecodeError:
|
|
||||||
logging.error("Fehler beim Laden von articles.json")
|
|
||||||
return []
|
|
||||||
|
|
||||||
def save_articles(articles):
|
# Article status filter
|
||||||
with open(ARTICLES_FILE, "w") as f:
|
status_filter = st.sidebar.selectbox("🔍 Artikelstatus filtern", ["Alle", "New", "Rewrite", "Process", "Online", "On Hold", "Trash"])
|
||||||
json.dump(articles, f, indent=2)
|
|
||||||
|
|
||||||
def load_feeds():
|
# Add a new feed
|
||||||
if not os.path.exists(FEEDS_FILE):
|
st.sidebar.markdown("---")
|
||||||
return []
|
st.sidebar.header("➕ RSS Feed hinzufügen")
|
||||||
try:
|
new_feed_url = st.sidebar.text_input("Feed URL")
|
||||||
with open(FEEDS_FILE, "r") as f:
|
if st.sidebar.button("Feed hinzufügen") and new_feed_url:
|
||||||
return json.load(f)
|
fetch_and_process_feed(new_feed_url)
|
||||||
except json.JSONDecodeError:
|
|
||||||
logging.error("Fehler beim Laden von feeds.json")
|
|
||||||
return []
|
|
||||||
|
|
||||||
def save_feeds(feeds):
|
|
||||||
with open(FEEDS_FILE, "w") as f:
|
|
||||||
json.dump(feeds, f, indent=2)
|
|
||||||
|
|
||||||
def fetch_articles_from_feeds(feeds):
|
|
||||||
new_articles = []
|
|
||||||
existing_links = {a['link'] for a in load_articles()}
|
|
||||||
for feed_url in feeds:
|
|
||||||
parsed = feedparser.parse(feed_url)
|
|
||||||
for entry in parsed.entries:
|
|
||||||
if entry.link in existing_links:
|
|
||||||
continue
|
|
||||||
content = ""
|
|
||||||
if 'content' in entry:
|
|
||||||
content = entry.content[0].value
|
|
||||||
elif 'summary' in entry:
|
|
||||||
content = entry.summary
|
|
||||||
article = {
|
|
||||||
"id": str(uuid.uuid4()),
|
|
||||||
"date": entry.get("published", datetime.now().isoformat()),
|
|
||||||
"title": entry.get("title", "(kein Titel)"),
|
|
||||||
"summary": content[:150],
|
|
||||||
"content": content,
|
|
||||||
"word_count": len(content.split()),
|
|
||||||
"tags": [],
|
|
||||||
"status": DEFAULT_STATUS,
|
|
||||||
"link": entry.link
|
|
||||||
}
|
|
||||||
new_articles.append(article)
|
|
||||||
return new_articles
|
|
||||||
|
|
||||||
def format_date(date_str):
|
|
||||||
try:
|
|
||||||
return datetime.fromisoformat(date_str).strftime("%d.%m.%y")
|
|
||||||
except Exception:
|
|
||||||
try:
|
|
||||||
return datetime.strptime(date_str[:25], "%a, %d %b %Y %H:%M:%S").strftime("%d.%m.%y")
|
|
||||||
except Exception:
|
|
||||||
return date_str
|
|
||||||
|
|
||||||
def rewrite_article_with_gpt(original_text, title):
|
|
||||||
prompt = (
|
|
||||||
"Schreibe folgenden Artikel um und formuliere ihn in journalistischem Stil neu. "
|
|
||||||
"Füge am Ende eine Liste von 2–3 passenden Tags hinzu (nur Schlagwörter, keine Hashtags):\n"
|
|
||||||
f"{original_text}"
|
|
||||||
)
|
|
||||||
try:
|
|
||||||
response = client.chat.completions.create(
|
|
||||||
model="gpt-4",
|
|
||||||
messages=[{"role": "user", "content": prompt}],
|
|
||||||
temperature=0.7
|
|
||||||
)
|
|
||||||
result = response.choices[0].message.content
|
|
||||||
logging.info(f"✅ Artikel umgeschrieben: {title}")
|
|
||||||
return result
|
|
||||||
except Exception as e:
|
|
||||||
logging.error(f"❌ Fehler beim Umschreiben von '{title}': {e}")
|
|
||||||
return f"FEHLER: {e}"
|
|
||||||
|
|
||||||
# ==== UI ====
|
|
||||||
st.set_page_config(page_title="RSS Artikel Manager", layout="wide")
|
|
||||||
st.title("📰 RSS Artikel Manager")
|
|
||||||
st.sidebar.markdown(f"🧩 Version: `{APP_VERSION}`")
|
|
||||||
|
|
||||||
# Section: feed management
|
|
||||||
feeds = load_feeds()
|
|
||||||
new_feed = st.sidebar.text_input("Neuen Feed hinzufügen")
|
|
||||||
if st.sidebar.button("➕ Feed hinzufügen") and new_feed:
|
|
||||||
feeds.append(new_feed)
|
|
||||||
save_feeds(feeds)
|
|
||||||
st.rerun()
|
st.rerun()
|
||||||
|
|
||||||
if feeds:
|
# Reload all feeds
|
||||||
remove_feed = st.sidebar.selectbox("Feed entfernen", [""] + feeds)
|
if st.sidebar.button("Alle Feeds neu laden"):
|
||||||
if st.sidebar.button("🗑️ Entfernen") and remove_feed:
|
process_articles()
|
||||||
feeds.remove(remove_feed)
|
st.rerun()
|
||||||
save_feeds(feeds)
|
|
||||||
st.rerun()
|
# Load articles
|
||||||
|
try:
|
||||||
|
articles = load_articles()
|
||||||
|
except json.decoder.JSONDecodeError:
|
||||||
|
articles = []
|
||||||
|
|
||||||
|
# Filter articles by status
|
||||||
|
if status_filter != "Alle":
|
||||||
|
articles = [a for a in articles if a.get("status") == status_filter]
|
||||||
|
|
||||||
|
# Article overview
|
||||||
|
st.title("📰 RSS Artikel Übersicht")
|
||||||
|
st.markdown("---")
|
||||||
|
|
||||||
|
if not articles:
|
||||||
|
st.info("Keine Artikel gefunden.")
|
||||||
else:
|
else:
|
||||||
st.sidebar.info("Noch keine Feeds hinzugefügt")
|
st.markdown("### 📄 Artikelliste")
|
||||||
|
|
||||||
# Section: load articles
|
|
||||||
if st.button("🔄 Artikel aus Feeds laden"):
|
|
||||||
new = fetch_articles_from_feeds(feeds)
|
|
||||||
if new:
|
|
||||||
all_articles = load_articles() + new
|
|
||||||
save_articles(all_articles)
|
|
||||||
st.success(f"{len(new)} neue Artikel geladen.")
|
|
||||||
logging.info(f"{len(new)} neue Artikel geladen.")
|
|
||||||
else:
|
|
||||||
st.info("Keine neuen Artikel gefunden.")
|
|
||||||
|
|
||||||
# Button to rewrite all articles with status "Rewrite"
|
|
||||||
rewrite_articles = [a for a in load_articles() if a["status"] == "Rewrite"]
|
|
||||||
if rewrite_articles:
|
|
||||||
if st.button("✍️ Alle Artikel mit Status 'Rewrite' umschreiben"):
|
|
||||||
all_articles = load_articles()
|
|
||||||
progress_text = st.empty()
|
|
||||||
with st.spinner("Artikel werden umgeschrieben..."):
|
|
||||||
total = len([a for a in all_articles if a["status"] == "Rewrite"])
|
|
||||||
count = 0
|
|
||||||
for a in all_articles:
|
|
||||||
if a["status"] == "Rewrite":
|
|
||||||
count += 1
|
|
||||||
progress_text.markdown(f"➡️ Umschreibe Artikel {count} von {total}: **{a['title']}**")
|
|
||||||
result = rewrite_article_with_gpt(a["content"], a["title"])
|
|
||||||
if "FEHLER:" not in result:
|
|
||||||
if "Tags:" in result:
|
|
||||||
rewritten, tags = result.rsplit("Tags:", 1)
|
|
||||||
a["content"] = rewritten.strip()
|
|
||||||
a["tags"] = [t.strip() for t in tags.split(",")][:3]
|
|
||||||
else:
|
|
||||||
a["content"] = result.strip()
|
|
||||||
a["summary"] = a["content"][:150]
|
|
||||||
a["word_count"] = len(a["content"].split())
|
|
||||||
a["status"] = "Process"
|
|
||||||
save_articles(all_articles)
|
|
||||||
st.success("Artikel erfolgreich umgeschrieben und aktualisiert.")
|
|
||||||
st.rerun()
|
|
||||||
|
|
||||||
# Section: article table
|
|
||||||
status_filter = st.selectbox("Status filtern", ALL_STATUSES, index=ALL_STATUSES.index(DEFAULT_STATUS))
|
|
||||||
articles = [a for a in load_articles() if a["status"] == status_filter]
|
|
||||||
|
|
||||||
if articles:
|
|
||||||
st.markdown("---")
|
|
||||||
st.subheader(f"Artikel mit Status '{status_filter}'")
|
|
||||||
|
|
||||||
selected_ids = []
|
selected_ids = []
|
||||||
for i, article in enumerate(articles):
|
all_statuses = ["New", "Rewrite", "Process", "Online", "On Hold", "Trash"]
|
||||||
cols = st.columns([0.5, 1.5, 3, 4, 1, 2, 1])
|
|
||||||
with cols[0]:
|
for article in articles:
|
||||||
if st.checkbox("Auswählen", key=article["id"], label_visibility="collapsed"):
|
col1, col2, col3, col4, col5, col6, col7 = st.columns([0.5, 1.2, 2.5, 2, 1, 2, 1.2])
|
||||||
selected_ids.append(article["id"])
|
with col1:
|
||||||
cols[1].markdown(format_date(article["date"]))
|
if st.checkbox("", key=f"select_{article['id']}"):
|
||||||
cols[2].markdown(f"**{article['title']}**")
|
selected_ids.append(article['id'])
|
||||||
cols[3].markdown(article["summary"])
|
with col2:
|
||||||
cols[4].markdown(str(article["word_count"]))
|
date = parsedate_to_datetime(article['date']).strftime("%d.%m.%y")
|
||||||
cols[5].markdown(", ".join(article["tags"]) if article["tags"] else "")
|
st.markdown(date)
|
||||||
cols[6].markdown(article["status"])
|
with col3:
|
||||||
|
st.markdown(article['title'])
|
||||||
|
with col4:
|
||||||
|
st.markdown(article['summary'][:150] + ("..." if len(article['summary']) > 150 else ""))
|
||||||
|
with col5:
|
||||||
|
word_count = len(article['text'].split())
|
||||||
|
st.markdown(str(word_count))
|
||||||
|
with col6:
|
||||||
|
st.markdown(", ".join(article.get("tags", [])))
|
||||||
|
with col7:
|
||||||
|
status = st.selectbox("", all_statuses, index=all_statuses.index(article.get("status", "New")), key=f"status_{article['id']}")
|
||||||
|
if status != article.get("status"):
|
||||||
|
article["status"] = status
|
||||||
|
save_articles(articles)
|
||||||
|
st.rerun()
|
||||||
|
|
||||||
|
if selected_ids:
|
||||||
|
new_status = st.selectbox("Status für ausgewählte Artikel setzen", all_statuses)
|
||||||
|
if st.button("✅ Status aktualisieren"):
|
||||||
|
for article in articles:
|
||||||
|
if article['id'] in selected_ids:
|
||||||
|
article['status'] = new_status
|
||||||
|
save_articles(articles)
|
||||||
|
st.success("Status aktualisiert.")
|
||||||
|
st.rerun()
|
||||||
|
|
||||||
st.markdown("---")
|
st.markdown("---")
|
||||||
if selected_ids:
|
|
||||||
all_articles = load_articles()
|
|
||||||
selected_articles = [a for a in all_articles if a["id"] in selected_ids]
|
|
||||||
|
|
||||||
|
if st.button("✍️ Artikel mit Status 'Rewrite' umschreiben"):
|
||||||
with st.expander("📋 Inhalte für WordPress kopieren"):
|
rewrite_articles()
|
||||||
for a in selected_articles:
|
st.rerun()
|
||||||
with st.container():
|
|
||||||
st.markdown("""
|
|
||||||
<div style="border: 1px solid #CCC; padding: 1rem; border-radius: 10px; background-color: #F9F9F9;">
|
|
||||||
""", unsafe_allow_html=True)
|
|
||||||
|
|
||||||
st.markdown(f"### ✏️ {a['title']}")
|
st.markdown("---")
|
||||||
|
|
||||||
st.markdown(f"<button style='margin-bottom:0.5rem;' onclick=\"navigator.clipboard.writeText('{a['title']}')\">🔗 Titel kopieren</button>", unsafe_allow_html=True)
|
|
||||||
|
|
||||||
st.text_area("📝 Artikeltext", value=a["content"], height=300, key=f"content_{a['id']}", help="CMD+C zum Kopieren")
|
|
||||||
|
|
||||||
st.markdown(f"<button style='margin-bottom:0.5rem;' onclick=\"navigator.clipboard.writeText(`{a['content']}`)\">📋 Artikeltext kopieren</button>", unsafe_allow_html=True)
|
|
||||||
|
|
||||||
st.text_input("🏷️ Tags", value=", ".join(a["tags"]), key=f"tags_{a['id']}", help="CMD+C zum Kopieren")
|
|
||||||
|
|
||||||
st.markdown(f"<button style='margin-bottom:0.5rem;' onclick=\"navigator.clipboard.writeText('{', '.join(a['tags'])}')\">📎 Tags kopieren</button>", unsafe_allow_html=True)
|
|
||||||
|
|
||||||
st.markdown(f"<a href='{a['link']}' target='_blank' style='text-decoration: none;'><button style='background-color:#e8f0fe; border:none; padding:0.5rem 1rem; border-radius:5px;'>🔗 Zum Originalartikel</button></a>", unsafe_allow_html=True)
|
|
||||||
|
|
||||||
st.markdown("</div>", unsafe_allow_html=True)
|
|
||||||
|
|
||||||
|
|
||||||
new_status = st.selectbox("Neuen Status setzen für ausgewählte Artikel", ALL_STATUSES)
|
|
||||||
if st.button("✅ Status ändern"):
|
|
||||||
for a in all_articles:
|
|
||||||
if a["id"] in selected_ids:
|
|
||||||
a["status"] = new_status
|
|
||||||
save_articles(all_articles)
|
|
||||||
st.success("Status aktualisiert.")
|
|
||||||
logging.info(f"Status von {len(selected_ids)} Artikel(n) auf '{new_status}' gesetzt.")
|
|
||||||
st.rerun()
|
|
||||||
else:
|
|
||||||
st.warning(f"Keine Artikel mit Status '{status_filter}' vorhanden.")
|
|
||||||
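In the new `app.py`, `articles` is reassigned to the status-filtered list before `save_articles(articles)` is called, so saving while a filter is active appears to write only the filtered subset back to disk. A minimal sketch of one way around this is to apply the change to the full, unfiltered list and persist that list. The helper name `apply_status` and the sample data are hypothetical; the status values are taken from the diff.

```python
from typing import Dict, Iterable, List

# Status values as listed in app.py.
ALL_STATUSES = ["New", "Rewrite", "Process", "Online", "On Hold", "Trash"]


def apply_status(all_articles: List[Dict], selected_ids: Iterable[str], new_status: str) -> List[Dict]:
    """Set new_status on the selected articles inside the full, unfiltered list."""
    if new_status not in ALL_STATUSES:
        raise ValueError(f"Unknown status: {new_status!r}")
    wanted = set(selected_ids)
    for article in all_articles:
        if article.get("id") in wanted:
            article["status"] = new_status
    return all_articles


# Usage: pass the complete list (not the filtered view), then save that same list.
articles = [
    {"id": "a", "status": "New"},
    {"id": "b", "status": "Online"},
]
apply_status(articles, ["a"], "Rewrite")
```

In the Streamlit script this would presumably mean calling `load_articles()` again inside the button handler, applying the change, and passing the result to `save_articles`, while the filtered list is used only for display.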
121
articles.json
File diff suppressed because one or more lines are too long
7
logs/rss_tool.log
Normal file
|
|
@@ -0,0 +1,7 @@
|
||||||
|
INFO:root:Abrufen von Feed: https://www.camping-news.de/rss/
|
||||||
|
INFO:root:Abrufen von Feed: https://www.camping-news.de/rss/
|
||||||
|
INFO:root:Starte Umschreiben von Artikeln mit Status 'Rewrite' ...
|
||||||
|
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
|
||||||
|
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
|
||||||
|
INFO:root:✅ Artikel 'Abenteuer für die Kleinen, Entspannung für
|
||||||
|
die Großen' umgeschrieben.
|
||||||
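The two log excerpts in this commit differ in shape: `app.log` lines carry a timestamp and level (the old `app.py` passed a `format=` argument), while `logs/rss_tool.log` uses the default `INFO:root:` layout because `main.py` calls `logging.basicConfig` without a format. If the timestamped layout is wanted in both places, a minimal sketch could look like the following. `build_logger` and the logger name `rss_tool` are hypothetical, and a `logging.FileHandler("logs/rss_tool.log")` could replace the stream handler used here to keep the example self-contained.

```python
import io
import logging

# Same layout as the format string in the old app.py.
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"


def build_logger(stream, name: str = "rss_tool") -> logging.Logger:
    """Return a logger that writes timestamped records to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger's handlers
    return logger


# Usage: write one record into an in-memory buffer.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("Abrufen von Feed: %s", "https://www.camping-news.de/rss/")
```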
108
main.py
Normal file
|
|
@@ -0,0 +1,108 @@
|
||||||
|
# main.py
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
from datetime import datetime
|
||||||
|
import feedparser
|
||||||
|
from utils.image_extractor import extract_images_with_metadata
|
||||||
|
from openai import OpenAI
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
import logging
|
||||||
|
|
||||||
|
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
# Make sure the log directory exists
|
||||||
|
os.makedirs("logs", exist_ok=True)
|
||||||
|
|
||||||
|
|
||||||
|
client = OpenAI()
|
||||||
|
# 📝 Configure logging
|
||||||
|
logging.basicConfig(filename='logs/rss_tool.log', level=logging.INFO)
|
||||||
|
|
||||||
|
ARTICLES_FILE = "processed_articles.json"
|
||||||
|
FEEDS_FILE = "feeds.json"
|
||||||
|
|
||||||
|
def load_articles():
|
||||||
|
if os.path.exists(ARTICLES_FILE):
|
||||||
|
with open(ARTICLES_FILE, "r") as f:
|
||||||
|
return json.load(f)
|
||||||
|
return []
|
||||||
|
|
||||||
|
def save_articles(articles):
|
||||||
|
with open(ARTICLES_FILE, "w") as f:
|
||||||
|
json.dump(articles, f, indent=2)
|
||||||
|
|
||||||
|
def load_feeds():
|
||||||
|
if os.path.exists(FEEDS_FILE):
|
||||||
|
with open(FEEDS_FILE, "r") as f:
|
||||||
|
return json.load(f)
|
||||||
|
return []
|
||||||
|
|
||||||
|
def save_feeds(feeds):
|
||||||
|
with open(FEEDS_FILE, "w") as f:
|
||||||
|
json.dump(feeds, f, indent=2)
|
||||||
|
|
||||||
|
def fetch_and_process_feed(url):
|
||||||
|
logging.info(f"Abrufen von Feed: {url}")
|
||||||
|
feed = feedparser.parse(url)
|
||||||
|
articles = load_articles()
|
||||||
|
for entry in feed.entries:
|
||||||
|
if any(a["link"] == entry.link for a in articles):
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
images = extract_images_with_metadata(entry.link)
|
||||||
|
except Exception as e:
|
||||||
|
logging.warning(f"Fehler beim Bildextrakt: {e}")
|
||||||
|
images = []
|
||||||
|
article = {
|
||||||
|
"id": f"{entry.link}",
|
||||||
|
"title": entry.title,
|
||||||
|
"summary": entry.summary,
|
||||||
|
"link": entry.link,
|
||||||
|
"date": entry.get("published", datetime.now().isoformat()),
|
||||||
|
"text": entry.summary,
|
||||||
|
"status": "New",
|
||||||
|
"images": images,
|
||||||
|
"tags": []
|
||||||
|
}
|
||||||
|
articles.append(article)
|
||||||
|
save_articles(articles)
|
||||||
|
|
||||||
|
def process_articles():
|
||||||
|
feeds = load_feeds()
|
||||||
|
for url in feeds:
|
||||||
|
fetch_and_process_feed(url)
|
||||||
|
|
||||||
|
def rewrite_articles():
|
||||||
|
logging.info("Starte Umschreiben von Artikeln mit Status 'Rewrite' ...")
|
||||||
|
articles = load_articles()
|
||||||
|
updated = False
|
||||||
|
for article in articles:
|
||||||
|
if article["status"] != "Rewrite":
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
prompt = f"Fasse diesen Text neu und interessant zusammen:\n\n{article['summary']}"
|
||||||
|
response = client.chat.completions.create(
|
||||||
|
model="gpt-4",
|
||||||
|
messages=[{"role": "user", "content": prompt}]
|
||||||
|
)
|
||||||
|
rewritten = response.choices[0].message.content.strip()
|
||||||
|
article["text"] = rewritten
|
||||||
|
article["status"] = "Process"  # "Done" is not among the statuses app.py offers
|
||||||
|
|
||||||
|
# Generate tags
|
||||||
|
tag_prompt = f"Erstelle passende 3-5 Tags für diesen Text:\n\n{rewritten}"
|
||||||
|
tag_response = client.chat.completions.create(
|
||||||
|
model="gpt-4",
|
||||||
|
messages=[{"role": "user", "content": tag_prompt}]
|
||||||
|
)
|
||||||
|
tags = [tag.strip() for tag in tag_response.choices[0].message.content.split(",")]
|
||||||
|
article["tags"] = tags
|
||||||
|
|
||||||
|
updated = True
|
||||||
|
logging.info(f"✅ Artikel '{article['title']}' umgeschrieben.")
|
||||||
|
except Exception as e:
|
||||||
|
logging.error(f"❌ Fehler beim Umschreiben von '{article['title']}':\n{e}")
|
||||||
|
if updated:
|
||||||
|
save_articles(articles)
|
||||||
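`fetch_and_process_feed` stores either the feed's `published` string (typically RFC 2822) or an ISO 8601 fallback from `datetime.now().isoformat()`, while the new `app.py` formats the field with `parsedate_to_datetime` only. An ISO value would make that call raise. A minimal sketch of a formatter that accepts both shapes follows; the function name `format_article_date` is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def format_article_date(raw: str, fmt: str = "%d.%m.%y") -> str:
    """Format an RFC 2822 or ISO 8601 date string; return the input unchanged if neither parses."""
    try:
        return parsedate_to_datetime(raw).strftime(fmt)
    except (TypeError, ValueError):
        # Not RFC 2822 (TypeError on older Python versions, ValueError on newer ones).
        pass
    try:
        return datetime.fromisoformat(raw).strftime(fmt)
    except ValueError:
        return raw


# Usage with the two shapes the "date" field can take.
rfc_date = format_article_date("Fri, 04 Jul 2025 19:36:22 +0200")
iso_date = format_article_date("2025-07-04T19:36:22.449000")
```

Falling back to the raw string mirrors what the removed `format_date` helper in the old `app.py` did when parsing failed.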
|
|
@ -2,6 +2,7 @@ altair==5.5.0
|
||||||
annotated-types==0.7.0
|
annotated-types==0.7.0
|
||||||
anyio==4.9.0
|
anyio==4.9.0
|
||||||
attrs==25.3.0
|
attrs==25.3.0
|
||||||
|
beautifulsoup4==4.13.4
|
||||||
blinker==1.9.0
|
blinker==1.9.0
|
||||||
cachetools==6.1.0
|
cachetools==6.1.0
|
||||||
certifi==2025.6.15
|
certifi==2025.6.15
|
||||||
|
|
@@ -41,6 +42,7 @@ sgmllib3k==1.0.0
|
||||||
six==1.17.0
|
six==1.17.0
|
||||||
smmap==5.0.2
|
smmap==5.0.2
|
||||||
sniffio==1.3.1
|
sniffio==1.3.1
|
||||||
|
soupsieve==2.7
|
||||||
streamlit==1.46.1
|
streamlit==1.46.1
|
||||||
tenacity==9.1.2
|
tenacity==9.1.2
|
||||||
toml==0.10.2
|
toml==0.10.2
|
||||||
|
|
|
||||||
51
utils/image_extractor.py
Normal file
|
|
@@ -0,0 +1,51 @@
|
||||||
|
import requests
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
from urllib.parse import urljoin
|
||||||
|
|
||||||
|
def extract_images_with_metadata(article_url):
|
||||||
|
"""
|
||||||
|
Tries to extract images with caption and copyright from the original article.
|
||||||
|
Returns a list of dictionaries: {url, alt, copyright_text, copyright_link}
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
response = requests.get(article_url, timeout=10)
|
||||||
|
if response.status_code != 200:
|
||||||
|
return []
|
||||||
|
|
||||||
|
soup = BeautifulSoup(response.content, "html.parser")
|
||||||
|
images = []
|
||||||
|
|
||||||
|
for img_tag in soup.find_all("img"):
|
||||||
|
src = img_tag.get("src")
|
||||||
|
if not src:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Build the absolute URL
|
||||||
|
img_url = urljoin(article_url, src)
|
||||||
|
alt_text = img_tag.get("alt", "").strip()
|
||||||
|
|
||||||
|
# Look for a copyright notice, e.g. in the enclosing <figure> or <div>
|
||||||
|
copyright_text = ""
|
||||||
|
copyright_link = ""
|
||||||
|
|
||||||
|
parent = img_tag.find_parent(["figure", "div"])
|
||||||
|
if parent:
|
||||||
|
caption = parent.find("figcaption")
|
||||||
|
if caption:
|
||||||
|
copyright_text = caption.get_text(strip=True)
|
||||||
|
link_tag = caption.find("a")
|
||||||
|
if link_tag and link_tag.has_attr("href"):
|
||||||
|
copyright_link = link_tag["href"]
|
||||||
|
|
||||||
|
images.append({
|
||||||
|
"url": img_url,
|
||||||
|
"alt": alt_text or "Bild aus Originalartikel",
|
||||||
|
"copyright_text": copyright_text or "Unbekannt",
|
||||||
|
"copyright_link": copyright_link or article_url
|
||||||
|
})
|
||||||
|
|
||||||
|
return images
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[extract_images_with_metadata] Fehler bei {article_url}: {e}")
|
||||||
|
return []
|
||||||
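The 1.2.0 changelog entry mentions extracting up to 3 images, but `extract_images_with_metadata` as shown appends every `<img>` it finds. A minimal sketch of a cap that could be applied to its return value is below. `limit_images` is a hypothetical helper, and dropping repeated URLs is an added assumption rather than something the commit describes.

```python
from typing import Dict, List


def limit_images(images: List[Dict[str, str]], max_images: int = 3) -> List[Dict[str, str]]:
    """Keep at most max_images entries, skipping entries without a URL and repeated URLs."""
    if max_images <= 0:
        return []
    seen = set()
    kept = []
    for image in images:
        url = image.get("url")
        if not url or url in seen:
            continue
        seen.add(url)
        kept.append(image)
        if len(kept) == max_images:
            break
    return kept


# Usage: five entries, one of them a duplicate URL.
sample = [
    {"url": "https://example.com/a.jpg", "alt": "A"},
    {"url": "https://example.com/a.jpg", "alt": "A again"},
    {"url": "https://example.com/b.jpg", "alt": "B"},
    {"url": "https://example.com/c.jpg", "alt": "C"},
    {"url": "https://example.com/d.jpg", "alt": "D"},
]
limited = limit_images(sample)
```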