    A new AI coding challenge just published its first results – and they aren’t pretty

    By spicycreatortips_18q76a | July 24, 2025

    A new AI coding challenge has revealed its first winner, and set a new bar for AI-powered software engineers.

    On Wednesday at 5pm PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

    “We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I love that. It levels the playing field.”

    Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

    Like the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub to measure how well they can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.
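The timed entry idea can be illustrated with a minimal Python sketch. It is not the K Prize organizers' actual pipeline: the `Issue` record, the `select_uncontaminated` helper, the sample issues, and the year attached to the March 12th deadline are all assumptions made for illustration. The sketch only shows the core rule, which is that an issue can enter the test set only if it was created after the submission deadline, so no submitted model could have trained on it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a flagged GitHub issue (not the K Prize schema)."""
    repo: str
    number: int
    created_at: datetime


def select_uncontaminated(issues, submission_deadline):
    """Keep only issues created strictly after the submission deadline."""
    return [issue for issue in issues if issue.created_at > submission_deadline]


# The article gives March 12th as the round-one deadline; the year and the
# end-of-day UTC cutoff are assumptions for this example.
DEADLINE = datetime(2025, 3, 12, 23, 59, 59, tzinfo=timezone.utc)

# Made-up sample issues, one before the deadline and two after it.
candidates = [
    Issue("example/repo-a", 101, datetime(2025, 2, 20, tzinfo=timezone.utc)),
    Issue("example/repo-a", 117, datetime(2025, 3, 30, tzinfo=timezone.utc)),
    Issue("example/repo-b", 8, datetime(2025, 4, 2, tzinfo=timezone.utc)),
]

test_set = select_uncontaminated(candidates, DEADLINE)
print([(issue.repo, issue.number) for issue in test_set])
```

A fixed benchmark has no such cutoff, which is why its problems can leak into training data over time.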

    The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

    “As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

    It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available, but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

    “I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

    For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination free SWE-Bench, that’s the reality check for me.”
