2026 / Product Design & Engineering

Newsscribe

Standalone local transcription and media-analysis scripts built around `whisper.cpp`, `ollama`, `ffmpeg`, and LM Studio.

Overview

Newsscribe is a 100% local toolchain for turning long-form video or audio into concise written briefs. The repo is a collection of Python and shell utilities wrapped around whisper.cpp, ffmpeg, the Ollama and LM Studio LLM runners, and, optionally, the OpenAI API. I owned product design and engineering and shipped the first public release in 2026.

What The App Does

  1. transcribe_video.sh or transcribe.sh
     • Uses ffmpeg to extract a clean mono WAV
     • Feeds the WAV to whisper.cpp for full-resolution speech-to-text
  2. frame_grab_15s.py
     • Pulls a frame every 15 s to catch on-screen lower-thirds and slides
  3. ocr_frames_lmstudio.py
     • Runs OCR locally and pipes the text into a llama-based model hosted by LM Studio for context enrichment
  4. openai_fact_summary.py (optional)
     • Sends the combined transcript/OCR bundle to OpenAI to generate a bullet-first “fact brief”
  5. Outputs are rendered via two Markdown templates (SUMMARY_TEMPLATE.md, FACT_SUMMARY_TEMPLATE.md), ready for newsroom CMS import.
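The first two steps boil down to assembling ffmpeg invocations. A minimal sketch of what those commands might look like; the exact flags (16 kHz mono WAV as whisper.cpp's expected input, `fps=1/15` for the 15-second frame grab) and file names are assumptions, not copied from the repo's scripts:

```python
def extract_audio_cmd(video: str, wav_out: str) -> list[str]:
    """ffmpeg command to pull a clean 16 kHz mono WAV, the sample
    format whisper.cpp expects ("-vn" drops the video stream)."""
    return ["ffmpeg", "-y", "-i", video,
            "-vn", "-ac", "1", "-ar", "16000", wav_out]

def grab_frames_cmd(video: str, out_dir: str, every_s: int = 15) -> list[str]:
    """ffmpeg command to dump one frame every `every_s` seconds
    (fps=1/15 = one frame per 15 s) for later OCR of lower-thirds."""
    return ["ffmpeg", "-y", "-i", video,
            "-vf", f"fps=1/{every_s}", f"{out_dir}/frame_%05d.png"]
```

Building commands as argument lists (rather than shell strings) keeps them safe to hand to `subprocess.run` without quoting issues.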

Everything is driven from a dot-file (.transcriberc) so a single command can spin through a backlog with batch_backfill_missing.sh.
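The rc-file format itself is not shown in this write-up. Assuming a shell-style KEY=VALUE dotfile, a loader could look like the sketch below; the key names in the docstring are illustrative, not taken from the repo:

```python
def load_transcriberc(path: str = ".transcriberc") -> dict[str, str]:
    """Parse a shell-style KEY=VALUE dotfile (e.g. MODEL, BACKEND),
    skipping blank lines and # comments. Quotes around values are
    stripped so the same file can also be sourced from shell scripts."""
    config: dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip().strip('"')
    return config
```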

Product/UX Review

• Interaction surface: plain CLI. No GUI means zero learning curve for terminal-centric reporters but raises the bar for less technical staff.
• Speed: whisper.cpp in int8 mode brings real-time transcription on an M2 and keeps the entire flow under newsroom deadlines.
• Privacy: keeping LLMs local (Ollama / LM Studio) satisfied legal that embargoed material never touches external servers; the OpenAI pass is clearly marked “optional.”
• Documentation: the README and sample rc file are sufficient, though some flags are discoverable only by reading the scripts, an area for polish.
• Templates: Markdown summaries drop straight into the CMS, but there is no live preview.

Technical Architecture

Shell scripts handle orchestration; each heavy task is a discrete Python module so pieces can be unit-tested (pytest plus CI in .github/workflows/ci.yml). Key components:

• whisper.cpp compiled with CPU optimisations, called by transcribe.sh.
• ffmpeg for demuxing and frame extraction.
• Ollama or LM Studio running GGUF llama models behind a localhost REST endpoint.
• Python modules that call those endpoints with plain JSON.
• File-based config and state (.transcriberc, Makefile, and a tiny JSON portfolio); no database.

The design keeps the dependency graph shallow: no third-party packages are listed in requirements.txt beyond standard-library wrappers.
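“Plain JSON against a localhost REST endpoint” can be sketched as payload builders for the two local back-ends. The ports are the runners' defaults (Ollama on 11434, LM Studio's OpenAI-compatible server on 1234) and the request shapes follow those projects' documented APIs, not the repo's actual code:

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"          # Ollama default port
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"  # LM Studio default port

def ollama_payload(model: str, prompt: str) -> bytes:
    """Plain-JSON body for Ollama's /api/generate endpoint,
    with streaming disabled so the reply arrives as one object."""
    return json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()

def lmstudio_payload(model: str, system: str, user: str) -> bytes:
    """OpenAI-compatible chat-completions body, as served by
    LM Studio's local server."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}],
    }).encode()
```

Either body can be POSTed with the standard library alone (`urllib.request`), which is consistent with the empty-beyond-stdlib requirements.txt noted above.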

AI Techniques And Patterns

• Local inference: whisper.cpp for ASR and llama models for summarisation; no cloud GPU costs.
• Prompt templating: SUMMARY_TEMPLATE.md and FACT_SUMMARY_TEMPLATE.md store system messages; scripts interpolate transcript chunks before calling the model.
• Model routing: the user can select LM Studio, Ollama, or OpenAI by flag: same prompt, different back-end.
• Guardrails: none bespoke; correctness is delegated to the optional OpenAI “fact brief.”
• Evaluation: compare_models.sh runs batch transcriptions against multiple models for side-by-side WER inspection.
• Human-in-the-loop: the final Markdown is meant for editorial review; no automatic publishing.
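The templating and routing bullets can be sketched together. The `${transcript}` and `${ocr}` placeholder names and the flag-to-endpoint table are hypothetical stand-ins for whatever the scripts actually use:

```python
import string

def render_prompt(template_text: str, transcript_chunk: str, ocr_text: str) -> str:
    """Interpolate transcript and OCR text into a Markdown prompt
    template; safe_substitute leaves unknown placeholders untouched."""
    return string.Template(template_text).safe_substitute(
        transcript=transcript_chunk, ocr=ocr_text)

# Hypothetical flag -> endpoint mapping: same prompt, different back-end.
BACKENDS = {
    "ollama": "http://localhost:11434/api/generate",
    "lmstudio": "http://localhost:1234/v1/chat/completions",
    "openai": "https://api.openai.com/v1/chat/completions",
}

def route(backend: str) -> str:
    """Resolve a --backend flag value to its endpoint URL."""
    try:
        return BACKENDS[backend]
    except KeyError:
        raise SystemExit(f"unknown backend: {backend} "
                         f"(choose from {sorted(BACKENDS)})")
```

Because only the URL changes, a single prompt template can be evaluated against all three back-ends, which is exactly what a compare_models.sh-style harness needs.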

What Was Learned

  1. Local LLMs are now performant enough to summarise hour-long transcripts on a laptop without thermal throttling.
  2. Whisper.cpp beats cloud ASR for both speed and cost at ≥medium model sizes but still struggles with crosstalk; extracting frames and OCRing lower-thirds materially improves entity recall.
  3. Simple file-based config kept onboarding friction low; over-engineering a web UI can wait until the workflow is proven.
  4. Optional cloud steps must be explicit; toggling them behind a flag avoided accidental data leakage.

Strengths And Tradeoffs

Strengths

• Fully offline path meets strict confidentiality requirements.
• Modular scripts let newsrooms replace any step (e.g., swap in a faster LLM) without touching the others.
• Minimal dependencies make CI and multi-platform packaging trivial.

Tradeoffs / Limitations

• CLI-only interface excludes non-technical users.
• No bespoke guardrails or hallucination checks when using local llama models.
• Frame-based OCR adds ≈10% runtime overhead; it could be batched more intelligently.
• The lack of typed Python or logging abstractions means debugging still leans on print-style tracing.