ThursdAI - Feb 5 - Opus 4.6 was #1 for ONE HOUR before GPT 5.3 Codex, Voxtral transcription, Codex app, Qwen Coder Next & the Agentic Internet
Fri Feb 06 2026
Hey, Alex from W&B here! Let me catch you up!
The most important news in AI this week dropped today: Anthropic updated Opus to 4.6 with a 1M context window, and held the crown for literally one hour before OpenAI released GPT 5.3 Codex, also today, with 25% faster speed and lower token utilization.
"GPT-5.3-Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results."
We had VB from OpenAI jump on to tell us about the cool features in Codex, so don't miss that part. And that's just the icing on an otherwise insane week of AI news: we also had a SOTA transcription release from Mistral, both Grok and Kling released incredible audio-native video models with near-perfect lip-sync, and ACE-Step 1.5 dropped a fully open-source music generator you can run on your Mac!
Also, the internet all but lost it after Clawdbot was rebranded to Moltbot and then to OpenClaw, and... an entire internet popped up... built for agents!
Yeah... a huge week, so let's break it down. (P.S. this week's episode was edited by Voxtral, Claude and Codex, nearly automatically, so please forgive the rough cuts.)
ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Anthropic & OpenAI are neck and neck
Claude Opus 4.6: 1M context, native compaction, adaptive thinking and agent teams
Opus is by far the most preferred model in terms of personality for many folks (many ThursdAI panelists included), and this breaking news, live on the show, was met with so much enthusiasm! A new Opus upgrade, now with a LOT more context, is as welcome as it can ever get! Not only is it a 4x increase in context window (though the pricing nearly doubles past the 200K-token mark, from $5/$25 to $10/$37.50 input/output, so use caching!), it also scores very high on the MRCR long-context benchmark, at 76% vs Sonnet 4.5 at just 18%. This means significantly better memory over long sessions.
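If you do plan to push past that 200K mark, prompt caching is the obvious lever. Here's a minimal sketch of marking a large, reused prefix as cacheable with Anthropic's Python SDK; the model ID and the file name below are illustrative assumptions, so check Anthropic's docs for the exact Opus 4.6 identifier and current caching pricing.

```python
# Minimal sketch: Anthropic prompt caching for a large, reused prefix.
# The model ID and repo_dump.txt are illustrative assumptions, not official names.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The big prefix you plan to reuse across many requests (docs, a repo dump, etc.)
long_context = open("repo_dump.txt").read()

response = client.messages.create(
    model="claude-opus-4-6",  # assumed identifier; verify against the API docs
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": long_context,
            "cache_control": {"type": "ephemeral"},  # mark the prefix as cacheable
        }
    ],
    messages=[{"role": "user", "content": "Summarize the open TODOs in this codebase."}],
)
print(response.content[0].text)
```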
Adaptive thinking, which auto-calibrates how many tokens the model needs to spend per query, is interesting, but it remains to be seen how well it will work.
Looking at the benchmarks (a SOTA 64.4% on Terminal Bench 2 and 81% on SWE-bench), this is a coding model with a great personality and the ability to natively compact context to better serve you as a user! The model is now available (and is the default) on Claude, Claude Code, and in the API! Go play!
One funny (concerning?) tidbit: on Vending-Bench, Opus 4.6 earned $8,000 vs Gemini 3 Pro's $5,500, but Andon Labs, who run the vending machines, noticed that Opus achieved SOTA via "collusion, exploitation, and deception tactics", including lying to suppliers.
Agent Teams - Anthropic's built-in Ralph?
Together with the new Opus release, Anthropic dropped a Claude Code update that could mean big things for folks running swarms of coding agents. Agent Teams is a new way to spin up multiple agents, each with its own context window and the ability to execute tasks, and you can talk to each agent directly instead of only through a manager agent as before.
OpenAI drops GPT 5.3 Codex update: 25% faster, more token efficient, 77% on Terminal Bench and mid-task steering
OpenAI didn't wait long after Opus; in fact, they didn't wait at all! Announcing a huge release (for a .1 upgrade), GPT 5.3 Codex is claimed to be the best coding model in the world, taking the lead on Terminal Bench with 77% (a 12-point lead on the newly released Opus!) while running 25% faster AND using less than half the tokens to achieve the same results as before.
But the most interesting part to me is the new mid-task steerability feature: you don't have to hit the "stop" button, you can just tell the model to adjust on the fly!
The most notable jump on benchmarks is OSWorld-Verified, the computer-use bench. Though there's not a straightforward way to use the model attached to a browser yet, going from 38% on 5.2 to 64.7% on the new one is a big one!
One thing to note: this model is not YET available via the API, so if you want to try it out, the Codex apps (including the native one) are the way to go!
Codex app - a native way to run the best coding intelligence on your Mac (download)
Earlier this week, OpenAI launched the native Codex Mac app, which has a few interesting features (and now, with 5.3 Codex, it's that much more powerful).
Given the excitement many people had about OpenClaw bots, and the recent CoWork release from Anthropic, OpenAI decided to answer with Codex UI, and people loved it: over 1M users in the first week, and 500K downloads in just two days!
It has built-in voice dictation, slash commands, a new skills marketplace (last month we told you why skills are important, and now they are everywhere!) and built-in git and worktrees support. And while it cannot run a browser yet (I'm sure that's coming as well), it can do automations!
This is a huge unlock for developers: imagine setting Codex up to do a repeat task, like summarization or extraction of anything on your Mac, every hour or every day. In our interview, VB showed us that commenting on an individual code line is also built in, and that switching to "steer" vs queue for new messages while Codex runs is immensely helpful.
One more reason I saw people switch is that the Codex app can natively preview files like images, whereas the CLI cannot, and it's right now the best way to use the new GPT 5.3 Codex model that was just released! It's now also available to Free users, and regular folks get 2x the limits for the next two months.
In other big company news:
OpenAI also launched Frontier, a platform for enterprises to build, deploy, and manage "AI coworkers", while Anthropic is going after OpenAI with Super Bowl ads that make fun of OpenAI's ads strategy. Sam Altman really didn't like the depiction that ads will be part of LLM replies.
Open Source AI
Alibaba drops Qwen Coder Next, 80B with only 3B active, scoring 70% on SWE-bench (X, Blog, HF)
Shoutout to the Qwen folks, this is a massive release: when surveyed on the "one thing from this week you must not miss", 2 out of 6 co-hosts pointed a finger at this model.
Built on their "next" hybrid architecture, Qwen Coder Next is specifically designed for agentic coding workflows. And yes, I know, we're coding-heavy this week! It was trained on over 800K verifiable agentic tasks in executable environments for long-horizon reasoning, and supports 256K context with a potential 1M YaRN extension. If you don't want to rely on the big guys and send them your tokens, this model seems to be a good contender for local coding!
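If you want to kick the tires locally, the usual pattern is to serve the weights behind an OpenAI-compatible endpoint (vLLM, llama.cpp server, LM Studio, etc.) and point your coding tool or script at it. Here's a minimal sketch under that assumption; the base URL, port, and model name are placeholders, not official values.

```python
# Minimal sketch: talking to a locally served Qwen Coder Next through an
# OpenAI-compatible endpoint. The base_url, port, and model name are assumptions;
# use whatever your local server (vLLM, llama.cpp, etc.) actually exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen-coder-next",  # placeholder model ID; match your server's registered name
    messages=[
        {"role": "system", "content": "You are a careful coding agent. Keep diffs minimal."},
        {"role": "user", "content": "Write a Python function that flattens a nested list."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```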
Mistral launches Voxtral Transcribe 2: SOTA speech-to-text with sub-200ms latency
This one may have surprised and delighted me the most. ASR (automatic speech recognition) has been a personal favorite of mine since the Whisper days, and seeing Mistral release an incredible near-real-time transcription model, which we demoed live on the show, was awesome!
Released under an Apache 2.0 license and significantly faster than Whisper (though 2x larger, at 4B parameters), Voxtral shows a 4% word error rate on the FLEURS dataset, and the real-time model is Apache 2.0 as well, so you can BUILD your agents with it!
The highest praise? Speaker diarization: being able to tell who is speaking when is a great addition. The model also outperforms Gemini Flash and GPT transcribe, and is 3x faster than ElevenLabs Scribe at one-fifth the cost!
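If you want to try it yourself, here's a minimal local sketch using the Hugging Face transformers ASR pipeline. The checkpoint ID is a placeholder and this assumes the released weights plug into the standard pipeline; check Mistral's Hub page for the actual model card and recommended loading code.

```python
# Minimal sketch: local transcription with the transformers ASR pipeline.
# The model ID is a placeholder assumption; check Mistral's Hugging Face release
# for the real checkpoint name and whether it needs a dedicated loading path.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="mistralai/Voxtral-Transcribe-2",  # placeholder checkpoint ID
    torch_dtype=torch.float16,
    device_map="auto",
)

result = asr("thursdai_clip.wav")  # any local audio file
print(result["text"])
```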
ACE-Step 1.5: Open-source AI music generator runs full songs in under 10 seconds on consumer GPUs with MIT license (X, GitHub, HF, Blog)
This open-source release surprised me the most, as I didn't expect we'd be having Suno at home any time soon. I've generated multiple rock tracks with custom lyrics on my Mac (though slower than 10 seconds, as I don't have a beefy home GPU) and they sound great!
This week's buzz - Weights & Biases update
Folks who follow the newsletter know that we hosted a hackathon, so here's a small recap from last weekend! Over 180 folks attended our hackathon (a very decent 40% show-up rate for SF). The winning team was composed of 15-year-old Savir and his friends, his third time at the hackathon! They built a self-improving agent that navigates cloud providers' UIs for you!
With huge thanks to our sponsors, particularly Cursor, who gave every hacker $50 of credits on the Cursor platform; one guy used over 400M tokens and shipped fractal.surf from the hackathon! If you'd like a short video recap, Ryan posted one here, and a huge shoutout to the many fans of ThursdAI who showed up to support!
Vision, Video and AI Art
Grok Imagine 1.0 takes over video charts with native audio, lip-sync and 10-second generations
We told you about Grok Imagine in the API last week, but this week it was officially launched as a product, and the results are quite beautiful. It's also climbing to the top of the charts on the Artificial Analysis and Design Arena leaderboards.
Kling 3.0 is here with native multimodal, multi-shot sequences (X, Announcement)
This is definitely a hot moment for video models, as Kling shows some crazy 15-second multi-shot realistic footage with near-perfect character consistency!
The rise of the agentic (clawgentic?) internet, a.k.a. ClankerNet
Last week we told you that ClawdBot changed its name to Moltbot (I then had to update the blog post because, that same day, Peter rebranded again to OpenClaw, which is a MUCH better name).
But the "molt" thing took hold, and an "AI-native Reddit" called MoltBook exploded in virality. It is supposedly a completely agentic Reddit-like forum, with sub-reddits and agents verifying themselves through their humans on X.
Even Andrej Karpathy sent his bot in there (though admittedly it posted just once) and called this the closest to