The latest developments in AI from around the world
not much happened today
not much happened today
not much happened today
not much happened today
Anthropic @ $30B ARR, Project GlassWing and Claude Mythos Preview — first model too dangerous to release since GPT-2
not much happened today
not much happened today
not much happened today
Gemma 4
not much happened today
not much happened today
not much happened today
not much happened today
not much happened today
not much happened today
The Claude Code Source Leak
not much happened today
not much happened today
not much happened today
MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model
not much happened today
not much happened today
not much happened today
not much happened today
not much happened today
Yann LeCun’s AMI Labs launches with a $1.03B seed to build world models around JEPA
Autoresearch: Sparks of Recursive Self Improvement
not much happened today
GPT 5.4: SOTA Knowledge Work -and- Coding -and- CUA Model, OpenAI is so very back
not much happened today
not much happened today
not much happened today
OpenAI closes $110B raise from Amazon, NVIDIA, SoftBank in largest startup fundraise in history @ $840B post-money
Nano Banana 2 aka Gemini 3.1 Flash Image Preview: the new SOTA Imagegen model
Agentic Engineering: WTF Happened in December 2025?
Claude Code Anniversary + Launches from: Qwen 3.5, Cursor Demos, Cognition Devin 2.2, Inception Mercury 2
Anthropic accuses DeepSeek, Moonshot, and MiniMax of "industrial-scale distillation attacks".
not much happened today
Gemini 3.1 Pro: 2x 3.0 on ARC-AGI 2
not much happened today
Claude Sonnet 4.6: clean upgrade of 4.5, mostly better with some caveats
Qwen3.5-397B-A17B: the smallest Open-Opus class, very efficient model
MiniMax-M2.5: SOTA coding, search, toolcalls, $1/hour
new Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5
Z.ai GLM-5: New SOTA Open Weights LLM
Qwen-Image 2.0 and Seedance 2.0
not much happened today
not much happened today
OpenAI and Anthropic go to war: Claude Opus 4.6 vs GPT 5.3 Codex
ElevenLabs $500m Series D at $11B, Cerebras $1B Series H at $23B, Vibe Coding -> Agentic Engineering
Context Graphs: Hype or actually Trillion-dollar opportunity?
OpenAI Codex App: death of the VSCode fork, multitasking worktrees, Skills Automations
MoltBook takes over the timeline
xAI Grok Imagine API - the #1 Video Model, Best Pricing and Latency - and merging with SpaceX
not much happened today
Moonshot Kimi K2.5 - Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager
Anthropic launches the MCP Apps open spec, in Claude.ai
not much happened today
OpenEvidence, the ‘ChatGPT for doctors,’ raises $250m at $12B valuation, 12x from $1b last Feb
not much happened today
not much happened today
ChatGPT starts testing ads on free tier + new $8/mo Go plan in the US
Open Responses: explicit spec for OpenAI's Responses API supported by OpenRouter, Ollama, Huggingface, vLLM, et al
not much happened today
Anthropic Labs: Cowork, Claude Code, MCP, Skills incubator led by Mike Krieger and Ben Mann
Apple picks Google's Gemini to power Siri's next generation
not much happened today
not much happened today
not much happened today
xAI raises $20B Series E at ~$230B valuation
not much happened today
not much happened today
not much happened today
not much happened today
Meta Superintelligence Labs acquires Manus AI for over $2B, at $100M ARR, 9 months after launch
not much happened today
Nvidia buys (most of) Groq for $20B cash; largest execuhire ever
not much happened today
not much happened today
not much happened today
Claude Skills grows: Open Standard, Directory, Org Admin
Gemini 3.0 Flash Preview: 1/4 cost of Pro, but ~as smart, retakes Pareto Frontier
OpenAI GPT Image-1.5 claims to beat Nano Banana Pro, #1 across all Arenas, but completely fails Vibe Checks
NVIDIA Nemotron 3: hybrid Mamba-Transformer completely open source models from 30B to 500B
not much happened today
GPT-5.2 (Instant/Thinking/Pro): 74% on GDPVal, 1.4x cost of GPT 5.1, on 10 Year OpenAI Anniversary
not much happened today
MCP -> Agentic AI Foundation, Mistral Devstral 2
not much happened today
not much happened today
OpenRouter's State of AI - An Empirical 100 Trillion Token Study
not much happened today
DeepSeek V3.2 & 3.2-Speciale: GPT5-High Open Weights, Context Management, Plans for Compute Scaling
Mistral 3: Mistral Large 3 + Ministral 3B/8B/14B open weights models
not much happened today
Black Forest Labs FLUX.2 [pro|flex|dev|klein]: near-Nano Banana quality but Open Weights
Claude Opus 4.5: 3rd new SOTA coding model in past week, 1/3 the price of Opus
AI Engineer Code Summit
Nano Banana Pro (Gemini Image Pro) solves text-in-images, infographic generation, 2-4k resolution, and Google Search grounding
OpenAI fires back: GPT-5.1-Codex-Max (API) and GPT 5.1 Pro (ChatGPT)
Gemini 3 Pro — new GDM frontier model 6, Gemini 3 Deep Think, and Antigravity IDE
xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing
not much happened today
minor updates to GPT 5.1 and SIMA 2
GPT 5.1 in ChatGPT: No evals, but adaptive thinking and instruction following
not much happened today
not much happened today
Terminal-Bench 2.0 and Harbor
Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves PyTorch
not much happened today
not much happened today
not much happened today
not much happened today
not much happened today
Cursor 2.0 & Composer-1: Fast Models and New Agents UI
OpenAI completes Microsoft + For-profit restructuring + announces 2028 AI Researcher timeline + Platform / AI cloud product direction + next $1T of compute
MiniMax M2 230BA10B — 8% of Claude Sonnet's price, ~2x faster, new SOTA open model
not much happened today
not much happened today
not much happened today
ChatGPT Atlas: OpenAI's AI Browser
DeepSeek-OCR finds vision models can decode 10x more efficiently with ~97% accuracy of text-only, 33/200k pages/day/A100
The Karpathy-Dwarkesh Interview delays AGI timelines
Claude Agent Skills - glorified AGENTS.md? or MCP killer?
Claude Haiku 4.5
not much happened today
OpenAI Titan XPU: 10GW of self-designed chips with Broadcom
not much happened today
Air Street's State of AI 2025 Report
not much happened today
Gemini 2.5 Computer Use preview beats Sonnet 4.5 and OAI CUA
OpenAI Dev Day: Apps SDK, AgentKit, Codex GA, GPT‑5 Pro and Sora 2 APIs
not much happened today
not much happened today
Thinking Machines' Tinker: LoRA based LLM fine-tuning API
Sora 2: new video+audio model and OpenAI's first Social Network
Anthropic Claude Sonnet 4.5, Claude Code 2.0, new VS Code Extensions
not much happened today
GDPVal finding: Claude Opus 4.1 within 95% of AGI (human experts in top 44 white collar jobs)
not much happened today
Alibaba Yunqi: 7 models released in 4 days (Qwen3-Max, Qwen3-Omni, Qwen3-VL) and $52B roadmap
NVIDIA to invest $100B in OpenAI for 10GW of Vera Rubin rollout
Grok 4 Fast: xAI's distilled, 40% more token-efficient, 2M context, 344 tok/s frontier model
Softbank, NVIDIA and US Govt take 2%, 5% and 10% of Intel, will develop Intel x86 RTX SOCs for consumer & datacenters
not much happened today
not much happened today
GPT-5 Codex launch and OpenAI's quiet rise in Agentic Coding
not much happened today
Qwen3-Next-80B-A3B-Base: Towards Ultimate Training & Inference Efficiency
Oracle jumps +36% in a day after winning $300B OpenAI contract
not much happened today
Cognition's $10b Series C; Smol AI updates
Kimi K2‑0905 and Qwen3‑Max preview: two 1T open weights models launched
not much happened today
not much happened today
Anthropic raises $13B at $183B Series F
not much happened today
not much happened today
OpenAI Realtime API GA and new `gpt-realtime` model, 20% cheaper than 4o
OpenAI updates Codex, VSCode Extension that can sync tasks with Codex Cloud
nano-banana is Gemini‑2.5‑Flash‑Image, beating Flux Kontext by 170 Elo with SOTA Consistency, Editing, and Multi-Image Fusion
not much happened today
not much happened today
Cohere Command A Reasoning beats GPT-OSS-120B and DeepSeek R1 0528
DeepSeek V3.1: 840B token continued pretrain, beating Claude 4 Sonnet at 11% of its cost
Databricks' $100B Series K
not much happened today
not much happened today
Western Open Models get Funding: Cohere $500m @ 6.8B, AI2 gets $152m NSF+NVIDIA grants
not much happened today
not much happened today
OpenAI's IMO Gold model also wins IOI Gold
not much happened today
OpenAI rolls out GPT-5 and GPT-5 Thinking to >1B users worldwide; -mini and -nano help claim Pareto Frontier
not much happened today
OpenAI's gpt-oss 20B and 120B, Claude Opus 4.1, DeepMind Genie 3
Qwen-Image: SOTA text rendering + 4o-imagegen-level Editing Open Weights MMDiT
Gemini 2.5 Deep Think finally ships
Figma's $50+b IPO
not much happened today
not much happened today
GLM-4.5: Deeper, Headier, & better than Kimi/Qwen/DeepSeek (SOTA China LLM?)
not much happened today
3x in 3 months: Cursor @ $28b, Cognition + Windsurf @ $10b
not much happened today
not much happened today
OAI and GDM announce IMO Gold-level results with natural language reasoning, no specialized training or tools, under human time limits
ChatGPT Agent: new o* model + unified Deep Research browser + Operator computer use + Code Interpreter terminal
not much happened today
Voxtral - Mistral's SOTA ASR model in 3B (mini) and 24B ("small") sizes beats OpenAI Whisper large-v3
not much happened today
Kimi K2 - SOTA Open MoE proves that Muon can scale to 15T tokens/1T params
Grok 4: xAI succeeds in going from 0 to new SOTA LLM in 2 years
not much happened today
SmolLM3: the SOTA 3B reasoning open source LLM
not much happened today
not much happened today
not much happened today
not much happened today
not much happened today
not much happened today
OpenAI releases Deep Research API (o3/o4-mini)
Context Engineering: Much More than Prompts
Bartz v. Anthropic PBC — "Training use is Fair Use"
Not much happened today
The Quiet Rise of Claude Code vs Codex
minor ai followups: MultiAgents, Meta-SSI-Scale, Karpathy, AI Engineer
Zuck goes Superintelligence Founder Mode: $100M bonuses + $100M+ salaries + NFDG Buyout?
Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in Preview
Chinese Models Launch - MiniMax-M1, Hailuo 2 "Kangaroo", Moonshot Kimi-Dev-72B
Cognition vs Anthropic: Don't Build Multi-Agents/How to Build Multi-Agents
not much happened today
Execuhires Round 2: Scale-Meta, Lamini-AMD, and Instacart-OpenAI
Reasoning Price War 2: Mistral Magistral + o3's 80% price cut + o3-pro
Apple exposes Foundation Models API and... no new Siri
not much happened today
Gemini 2.5 Pro (06-05) launched at AI Engineer World's Fair
AI Engineer World's Fair Talks Day 1
not much happened today
not much happened today
Mary Meeker is so back: BOND Capital AI Trends report
DeepSeek-R1-0528 - Gemini 2.5 Pro-level model, SOTA Open Weights release
not much happened today
Mistral's Agents API and the 2025 LLM OS
not much happened today
not much happened today
Anthropic releases Claude 4 Sonnet and Opus: Memory, Agent Capabilities, Claude Code, Redteam Drama
OpenAI buys Jony Ive's io for $6.5b, LMArena lands $100m seed from a16z
Google I/O: new Gemini native voice, Flash, DeepThink, AI Mode (DeepSearch+Mariner+Astra)
not much happened today
ChatGPT Codex, OpenAI's first cloud SWE agent
Gemini's AlphaEvolve agent uses Gemini 2.0 to find new Math and cuts Gemini cost 1% — without RL
Granola launches team notes, while Notion launches meeting transcription
not much happened today
Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning
not much happened today
not much happened today
AI Engineer World's Fair: Second Run, Twice The Fun
Gemini 2.5 Pro Preview 05-06 (I/O edition) - the SOTA vision+coding model
Cursor @ $9b, OpenAI Buys Windsurf @ $3b
not much happened today
not much happened today
ChatGPT responds to GlazeGate + LMArena responds to Cohere
LlamaCon: Meta AI gets into the Llama API platform business
Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1
Cognition's DeepWiki, a free encyclopedia of all GitHub repos
not much happened today
gpt-image-1 - ChatGPT's imagegen model, confusingly NOT 4o, now available in API
not much happened today
not much happened today; New email provider for AINews
Grok 3 & 3-mini now API Available
Gemini 2.5 Flash completes the total domination of the Pareto Frontier
OpenAI o3, o4-mini, and Codex CLI
QwQ-32B claims to match DeepSeek R1-671B
SOTA Video Gen: Veo 2 and Kling 2 are GA for developers
GPT 4.1: The New OpenAI Workhorse
not much happened today
not much happened today
Google's Agent2Agent Protocol (A2A)
DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level
Llama 4's Controversial Weekend Release
not much happened today
not much happened today
not much happened today
>$41B raised today (OpenAI @ 300b, Cursor @ 9.5b, Etched @ 1.5b)
not much happened today
not much happened today
OpenAI adopts MCP
Gemini 2.5 Pro + 4o Native Image Gen
Halfmoon is Reve Image: a new SOTA Image Model from ex-Adobe/Stability trio
lots of little things happened this week
Promptable Prosody, SOTA ASR, and Semantic VAD: OpenAI revamps Voice AI
Every 7 Months: The Moore's Law for Agent Autonomy
not much happened today
Cohere's Command A claims #3 open model spot (after DeepSeek and Gemma)
not much happened today
not much happened today
Gemma 3 beats DeepSeek V3 in Elo, 2.0 Flash beats GPT4o with Native Image Gen
The new OpenAI Agents Platform
not much happened today
DeepSeek's Open Source Stack
not much happened today
not much happened today
Anthropic's $61.5B Series E
not much happened today
GPT 4.5 — Chonky Orion ships!
lots of small launches
not much happened today
Claude 3.7 Sonnet
AI Engineer Summit Day 1
not much happened today
The Ultra-Scale Playbook: Training LLMs on GPU Clusters
xAI Grok 3 and Mira Murati's Thinking Machines
LLaDA: Large Language Diffusion Models
not much happened today
Reasoning Models are Near-Superhuman Coders (OpenAI IOI, Nvidia Kernels)
small news items
not much happened today
not much happened today
not much happened today
s1: Simple test-time scaling (and Kyutai Hibiki)
Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking
How To Scale Your Model, by DeepMind
OpenAI takes on Gemini's Deep Research
o3-mini launches, OpenAI on "wrong side of history"
Mistral Small 3 24B and Tulu 3 405B
not much happened today
not much happened today
DeepSeek #1 on US App Store, Nvidia stock tanks -17%
TinyZero: Reproduce DeepSeek R1-Zero for $30
OpenAI launches Operator, its first Agent
Bespoke-Stratos + Sky-T1: The Vicuna+Alpaca moment for reasoning
Project Stargate: $500b datacenter (1.7% of US GDP) and Gemini 2 Flash Thinking 2
DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level
not much happened today
not much happened today
Titans: Learning to Memorize at Test Time
small little news items
not much happened today
Moondream 2025.1.9: Structured Text, Enhanced OCR, Gaze Detection in a 2B Model
not much happened today
not much happened today
not much happened today
PRIME: Process Reinforcement through Implicit Rewards
not much happened today
not much happened to end the year
not much happened today
not much happened today
DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens
not much happened today
not much happened this weekend
o3 solves AIME, GPQA, Codeforces, makes 11 years of progress in ARC-AGI and 25% in FrontierMath
ModernBERT: small new Retriever/Classifier workhorse, 8k context, 2T tokens
Genesis: Generative Physics Engine for Robotics (o1-mini version)
Genesis: Generative Physics Engine for Robotics (o1-2024-12-17)
OpenAI Voice Mode Can See Now - After Gemini Does
o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning
Meta Apollo - Video Understanding up to 1 hour, SOTA Open Weights
Meta BLT: Tokenizer-free, Byte-level LLM
Google wakes up: Gemini 2.0 et al
ChatGPT Canvas GA
OpenAI Sora Turbo and Sora.com
Meta Llama 3.3: 405B/Nova Pro performance at 70B price
$200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews
not much happened today
Olympus has dropped (aka, Amazon Nova Micro|Lite|Pro|Premier|Canvas|Reel)
not much happened today
not much happened to end the week
Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/Math500
OLMo 2 - new SOTA Fully Open LLM
Anthropic launches the Model Context Protocol
Vision Everywhere: Apple AIMv2 and Jina CLIP v2
LMSys killed Model Versioning (gpt 4o 1120, gemini exp 1121)
DeepSeek-R1 claims to beat o1-preview AND will be open sourced
Perplexity starts Shopping for you
Pixtral Large (124B) beats Llama 3.2 90B with updated Mistral Large 24.11
Stripe lets Agents spend money with StripeAgentToolkit
Gemini (Experimental-1114) retakes #1 LLM rank with 1344 Elo
Common Corpus: 2T Open Tokens with Provenance
BitNet was a lie?
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
not much happened today
not much happened today
Not much happened today
Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
OpenAI beats Anthropic to releasing Speculative Decoding
not much happened today
The AI Search Wars Have Begun — SearchGPT, Gemini Grounding, and more
Creating a LLM-as-a-Judge
GitHub Copilot Strikes Back
not much happened this weekend
not much happened today
s{imple|table|calable} Consistency Models
not much happened today
Claude 3.5 Sonnet (New) gets Computer Use
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality
not much happened today
Did Nvidia's Nemotron 70B train on test?
not much happened today
Not much (in AI) happened this weekend
not much happened today
State of AI 2024
not much happened today
The AI Nobel Prize
not much happened this weekend
Contextual Document Embeddings: `cde-small-v1`
Canvas: OpenAI's answer to Claude Artifacts
Not much technical happened today
OpenAI Realtime API and other Dev Day Goodies
Liquid Foundation Models: A New Transformers alternative + AINews Pod 2
not much happened today
not much happened today
Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)
ChatGPT Advanced Voice Mode
a calm before the storm
not much happened today
not much happened today
o1 destroys Lmsys Arena, Qwen 2.5, Kyutai Moshi release
nothing much happened today
a quiet weekend
Learnings from o1 AMA
o1: OpenAI's new general reasoning models
Pixtral 12B: Mistral beats Llama to Multimodality
not much happened today + AINews Podcast?
AIPhone 16: the Visual Intelligence Phone
Reflection 70B, by Matt from IT Department
Replit Agent - How did everybody beat Devin to market?
$1150m for SSI, Sakana, You.com + Claude 500m context
Everybody shipped small things this holiday weekend
not much happened today
Summer of Code AI: $1.6b raised, 1 usable product
Cerebras Inference: Faster, Better, AND Cheaper
CogVideoX: Zhipu's Open Source Sora
not much happened this weekend
Nvidia Minitron: LLM Pruning and Distillation updated for Llama 3.1
super quiet day
Ideogram 2 + Berkeley Function Calling Leaderboard V2
not much happened today
The DSPy Roadmap
not much happened today
not much happened today
Grok 2! and ChatGPT-4o-latest confuses everybody
Gemini Live
a quiet weekend
not much happened today
Too Cheap To Meter: AI prices cut 50-70% in last 30 days
not much happened today
GPT4o August + 100% Structured Outputs for All (GPT4o mini edition)
GPT4o August + 100% Structured Outputs for All (GPT4o August edition)
How Carlini Uses AI
Execuhires: Tempting The Wrath of Khan
Rombach et al: FLUX.1 [pro|dev|schnell], $31m seed for Black Forest Labs
Gemma 2 2B + Scope + Shield
not much happened today
Apple Intelligence Beta + Segment Anything Model 2
AlphaProof + AlphaGeometry2 reach 1 point short of IMO Gold
Mistral Large 2 + RIP Mistral 7B, 8x7B, 8x22B
Llama 3.1: The Synthetic Data Model
Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model
DataComp-LM: the best open-data 7B model/benchmark/dataset
Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o-mini version)
Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o version)
Gemma 2 tops /r/LocalLlama vibe check
SciCode: HumanEval gets a STEM PhD upgrade
Microsoft AgentInstruct + Orca 3
We Solved Hallucinations
FlashAttention 3, PaliGemma, OpenAI's 5 Levels to Superintelligence
Nothing much happened today
Test-Time Training, MobileLLM, Lilian Weng on Hallucination (Plus: Turbopuffer)
Problems with MMLU-Pro
Qdrant's BM42: "Please don't trust us"
Not much happened today.
GraphRAG: The Marriage of Knowledge Graphs and RAG
RouteLLM: RIP Martian? (Plus: AINews Structured Summaries update)
That GPT-4o Demo
Gemma 2: The Open Model for Everyone
Mozilla's AI Second Act
Shall I compare thee to a Sonnet's day?
Gemini Nano: 50-90% of Gemini Pro, <100ms inference, on device, in Chrome Canary
Shazeer et al (2024): you are overpaying for inference >13x
Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts
There's Ilya!
Gemini launches context caching... or does it?
Is this... OpenQ*?
Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata
Hybrid SSM/Transformers > Pure SSMs/Pure Transformers
The Last Hurrah of Stable Diffusion?
Francois Chollet launches $1m ARC Prize
Talaria: Apple's new MLOps Superweapon
HippoRAG: First, do know(ledge) Graph
Qwen 2 beats Llama 3 (and we don't know how)
5 small news items
Not much happened today
Mamba-2: State Space Duality
Ways to use Anthropic's Tool Use GA
Contextual Position Encoding (CoPE)
1 TRILLION token context, real time, on device?
Somebody give Andrej some H100s already
Life after DPO (RewardBench)
Ten Commandments for Deploying Fine-Tuned Models
Clémentine Fourrier on LLM evals
ALL of AI Engineering in One Place
Anthropic's "LLM Genome Project": learning & clamping 34m features on Claude Sonnet
Skyfall
Chameleon: Meta's (unreleased) GPT4o-like Omnimodal Model
Cursor reaches >1000 tok/s finetuning Llama3-70b for fast file editing
Not much happened today
Google I/O in 60 seconds
GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)
GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4O version)
Quis promptum ipso promptiet?
LMSys advances Llama 3 eval analysis
OpenAI's PR Campaign?
Kolmogorov-Arnold Networks: MLP killers or just spicy MLPs?
DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost
$100k to predict LMSYS human preferences in a Kaggle contest
Evals: The Next Generation
Not much happened today
LLMs-as-Juries
A quiet weekend
Apple's OpenELM beats OLMo with 50% of its dataset, using DeLighT
Snowflake Arctic: Fully Open 10B+128x4B Dense-MoE Hybrid LLM
OpenAI's Instruction Hierarchy for the LLM OS
Perplexity, the newest AI unicorn
FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you're welcome)
Llama-3-70b is GPT-4-level Open Model
Meta Llama 3 (8B, 70B)
Mixtral 8x22B Instruct sparks efficiency memes
Lilian Weng on Video Diffusion
Multi-modal, Multi-Aspect, Multi-Form-Factor AI
Zero to GPT in 1 Year
Mergestral, Meta MTIAv2, Cohere Rerank 3, Google Infini-Attention
Music's Dall-E moment
Gemini Pro and GPT4T Vision go GA on the same day by complete coincidence
Anime pfp anon eclipses $10k A::B prompting challenge
Mixture of Depths: Dynamically allocating compute in transformer-based language models
Cohere Command R+, Anthropic Claude Tool Use, OpenAI Finetuning
ReALM: Reference Resolution As Language Modeling
Not much happened today
AdamW -> AaronD?
Evals-based AI Engineering
Jamba: Mixture of Architectures dethrones Mixtral
DBRX: Best open model (just not most efficient)
Claude 3 is officially America's Next Top Model
Andrew likes Agents
Astro Nano
not much happened today
Welcome /r/LocalLlama!
Shipping and Dipping: Inflection + Stability edition
World_sim.exe
Grok-1 in Bio
Astro Sphere
MM1: Apple's first Large Multimodal Model
Not much happened piday
DeepMind SIMA: one AI, 9 games, 600 tasks, vision+language ONLY
The world's first fully autonomous AI Engineer
Fixing Gemma
FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs
Inflection-2.5 at 94% of GPT4, and Pi at 6m MAU
Not much happened today
Stable Diffusion 3 — Rombach & Esser did it again!
Claude 3 just destroyed GPT 4 (see for yourself)
The Era of 1-bit LLMs
Dia de las Secuelas (StarCoder, The Stack, Dune, SemiAnalysis)
... and welcome AI Twitter!
Welcome Interconnects and OpenRouter
Mistral Large disappoints
One Year of Latent Space
Ring Attention for >1M Context
Google AI: Win some (Gemma, 1.5 Pro), Lose some (Image gen)
Karpathy emerges from stealth?
Companies liable for AI hallucination is Good Actually for AI Engineers
Sora pushes SOTA
AI gets Memory
The Dissection of Smaug (72B)
Gemini Ultra is out, to mixed reviews
MetaVoice & RIP Bard
Qwen 1.5 Released
Less Lazy AI
The Core Skills of AI Engineering
AI2 releases OLMo - the 4th open-everything LLM
Trust in GPTs at all time low
Miqu confirmed to be an early Mistral-medium checkpoint
Code Llama 70B beats GPT4 on HumanEval
RWKV "Eagle" v5: Your move, Mamba
GPT4Turbo A/B Test: gpt-4-0125-preview
GPT4Turbo A/B Test: gpt-4-1106-preview
Adept Fuyu-Heavy: Multimodal model for Agents
Google Solves Text to Video
RIP Latent Diffusion, Hello Hourglass Diffusion
Nightshade poisons AI art... kinda?
Sama says: GPT-5 soon
1/17/2024: Help crowdsource function calling datasets
1/16/2024: ArtificialAnalysis - a new model/host benchmark site
1/16/2024: TIES-Merging
1/13-14/2024: Don't sleep on #prompt-engineering
1/12/2024: Anthropic coins Sleeper Agents
1/11/2024: Mixing Experts vs Merging Models
1/10/2024: All the best papers for AI Engineers
1/9/2024: Nous Research lands $5m for Open Source AI
1/8/2024: The Four Wars of the AI Stack
1/6-7/2024: LLaMA Pro - an alternative to PEFT/RAG??
1/4/2024: Jeff Bezos backs Perplexity's $520m Series B.
1/3/2024: RIP Coqui
1/2/2024: Smol tweaks to Smol Talk
1/1/2024: How to start with Open Source AI
12/31/2023: Happy New Year
12/30/2023: Mega List of all LLMs
12/29/2023: TinyLlama on the way
12/28/2023: Smol Talk updates
12/27/2023: NYT vs OpenAI
12/26/2023: not much happened today
12/25/2023: Nous Hermes 2 Yi 34B for Christmas
12/24/2023: Dolphin Mixtral 8x7b is wild
12/23/2023: NeurIPS Best Papers of 2023
12/22/2023: Anyscale's Benchmark Criticisms
12/21/2023: The State of AI (according to LangChain)
12/20/2023: Project Obsidian - Multimodal Mistral 7B from Nous
12/19/2023: Everybody Loves OpenRouter
12/18/2023: Gaslighting Mistral for fun and profit
12/16/2023: ByteDance suspended by OpenAI
12/15/2023: Mixtral-Instruct beats Gemini Pro (and matches GPT3.5)
12/14/2023: $1e7 for Superalignment
12/13/2023: SOLAR10.7B upstages Mistral7B?
12/12/2023: Towards LangChain 0.1
12/11/2023: Mixtral beats GPT3.5 and Llama2-70B
12/10/2023: not much happened today
12/9/2023: The Mixtral Rush
12/8/2023: Mamba v Mistral v Hyena
12/7/2023: Anthropic says "skill issue"
Is Google's Gemini... legit?