Relevance 10/10 | Importance 9/10
Mistral has shipped OCR 4, a document-extraction model that returns text alongside bounding boxes, block classifications, and confidence scores. Mistral claims a roughly 72% average win rate over rival systems, and the model supports 170 languages and can run self-hosted in a single container for privacy-sensitive workloads. At $4 per 1,000 pages via API and wired into Mistral's Search Toolkit for RAG, it's a direct shot at the enterprise document-AI market.
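To put the quoted API price in perspective, here is a back-of-the-envelope sketch in Python. It assumes billing scales linearly with page count at the stated $4 per 1,000 pages, with no minimums, volume tiers, or rounding rules; those are assumptions, not details from the announcement, and the function name is made up for illustration.

```python
# Quoted API price from the item above: $4 per 1,000 pages.
PRICE_PER_1000_PAGES_USD = 4.00


def estimate_ocr_cost(pages: int, price_per_1000: float = PRICE_PER_1000_PAGES_USD) -> float:
    """Estimate the API cost in USD for a given number of pages.

    Assumes strictly linear per-page pricing (an assumption, not a
    documented billing rule).
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages / 1000 * price_per_1000


if __name__ == "__main__":
    # A hypothetical archive of 250,000 pages.
    print(f"${estimate_ocr_cost(250_000):,.2f}")
```

Under those assumptions, digitizing a 250,000-page archive would cost about $1,000 in API fees, before any storage or downstream RAG costs.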
Relevance 9/10 | Importance 8/10
Baidu released Unlimited OCR, a document-parsing system that builds on DeepSeek-OCR to do "one-shot long-horizon parsing": extracting text across multi-page PDFs in a single pass. It supports both single-image and multi-page inference, with deployment paths for Transformers and SGLang. It's a notable step toward large-scale document digitization that doesn't choke on length.
Relevance 9/10 | Importance 8/10
David Rosenthal argues the big AI platforms have been heavily subsidizing access to manufacture demand, and that token-based pricing ahead of IPOs is about to expose the real costs. He cites figures suggesting platforms spend $8–14 to generate $1 of revenue, with data-center buildouts demanding implausible future returns. The uncomfortable conclusion: using AI currently costs more than hiring people while delivering only limited productivity gains, and the accumulated debt would require displacing millions of jobs just to service the interest.
Relevance 7/10 | Importance 8/10
Madison Square Garden compiled a file literally named "Facial Recognition Activists.docx" on three privacy advocates who criticized the venue's face-scanning tech, including contact details, social handles, and quotes. The document sat on SharePoint accessible to multiple employees and surfaced in a 45GB cache that hackers stole and published this month. It's a stark example of surveillance tooling being turned on a company's own critics.
Relevance 6/10 | Importance 8/10
Cory Doctorow argues that age-verification laws meant to protect children are counterproductive, amounting to mass surveillance more invasive than commercial ad-tech tracking. He warns these mandates push toward VPN bans and build exactly the control infrastructure authoritarian regimes crave. The net effect, he contends, harms the very kids the laws claim to protect.
Relevance 8/10 | Importance 5/10
Lift4D is a test-time optimization framework that reconstructs full 4D dynamic objects — including unobserved regions — from a single monocular video. It pairs causal latent conditioning from a single-view 3D model with deformable 3D Gaussian Splatting, then refines with an occlusion-aware, view-conditioned diffusion prior. It handles messy, in-the-wild footage with heavy occlusion and non-rigid motion better than prior 4D methods.
Relevance 6/10 | Importance 4/10
bun-sqlgen is a TypeScript code generator that gives you type-safe raw SQL on the Bun runtime without an ORM. It validates queries against your real database schema at build time and generates fully-typed rows right at the call site, with null-safety enforced at compile time. It supports both PostgreSQL and SQLite while keeping the runtime fully native to Bun.
Relevance 5/10 | Importance 5/10
Plotnine is a Python data-visualization package built on the grammar of graphics, bringing R's ggplot2-style layered, declarative syntax to Python. It lets you compose publication-quality charts from quick exploratory plots up to polished, themed figures. It's a staple for data scientists who want fine-grained control without dropping to low-level plotting.
Relevance 4/10 | Importance 4/10
This Show HN is a WYSIWYG editor for building TikZ figures in LaTeX, letting you design graphics visually rather than hand-coding them. It targets one of TikZ's biggest pain points — no live preview and slow recompile cycles — by generating the code from what you draw. It's a quality-of-life win for anyone who's wrestled with LaTeX diagrams.
Relevance 3/10 | Importance 4/10
IBM's May 2026 report documents validation work for open-source software running on the s390x architecture behind IBM Z and LinuxONE. The team verified 27 projects spanning databases, web servers, and monitoring tools, while 10-plus community projects added s390x support that month. Developers can test their own apps via IBM's LinuxONE Community Cloud.