
Reflections from the First Agentic Bio Hackathon

By Kat Yenko

Last Thursday, we hosted the first (to my knowledge) agentic bio hackathon. Scientists, engineers, and ML folks were in the same room working on a real problem: design a GLP-1 receptor agonist that meaningfully improves on the current landscape, pick a specific design, and justify your answer.

For this first hackathon, there was no experimental validation, no wet lab follow-up, and no claim that anything designed that night would become a drug. The point wasn't to ship a new peptide or small molecule, but rather to give scientists direct access to state-of-the-art tools and agents.

The problem I chose, and why

GLP-1 agonists are a good stress test. The space is clinically relevant, commercially saturated, and full of real constraints. There are known tradeoffs around efficacy, tolerability, oral bioavailability, and developability, so instead of an open-ended "design whatever" problem, teams had to reason within constraints.

These constraints were intentional, but we were also interested in something more fundamental: how would scientists, working with agentic tools, reason about a hard problem? How would they survey the landscape, iterate with tools to refine hypotheses, choose a direction, and defend it?

Initial hesitance about hosting a hackathon

If you're a scientist, there are a lot of valid reasons to be skeptical of hackathons:

  • Science takes longer than 90 minutes
  • These tools come with real learning curves
  • Hackathons are often optimized for engineers
  • It can feel like "theory", not real science

Early reactions to the idea reflected that skepticism - polite but hesitant, with a sense that this might be nudging scientists into tech territory rather than meeting them where they already are. That skepticism turned out to be important context for what happened during the event.

What actually happened

People broke into small teams: some were familiar with the tools, some were bench scientists, some were engineers with little biological intuition, and most teams were mixed.

Everyone worked for about 90 minutes, and what stood out was where teams struggled. Engineers often hit walls without biological context, while bench scientists - once they realized they could interact with agents in natural language - moved quickly through the problem.

In one case, a team began optimizing a particular direction, only to realize halfway through that they should have chosen another path. The tools made it easy to probe and go deeper, yet it was intuition built from years at the bench that helped them recognize when a path no longer made biological sense.

And the novel thing wasn't that agents helped people work faster; it was that scientists didn't need to work through tickets, specs, or package installs before engaging with the tools, using context they already had. In most settings today, scientists hand off ideas to computational counterparts and re-engage later. Here, scientific judgment showed up before anything was handed off.

Why this matters now

We're in an era of constant releases: scientific models, agents, and tools. But more tools don't necessarily mean more effective science.

Most scientists don't want to learn ten new frameworks or become engineers. They want to explore ideas, iterate through hypotheses, and understand where tools help and where they don't. And this is not about replacing computational scientists; if anything, the event made clear that technical expertise in those areas is increasingly important. What it did show is that scientists can engage earlier in the process instead of just reacting to outputs later on.

What this suggests about the future

Hackathons are not the answer to the next blockbuster drug, but challenges like these will be an important entry point: a way of giving scientists early, low-friction exposure to the newest models.

These settings make it possible to engage with agentic systems without committing to a stack, workflow, or weeks of setup - scientists can try things, push on assumptions, and see what actually helps before deciding what's worth adopting. In the process, tools stop feeling like scary, abstract, mono-font packages to install, and start showing how they fit (or don't) into a natural reasoning flow.

They also give model builders a chance to see how their tools are used in practice: where they help and where addressable gaps or bugs show up.

What this changes

Formats like these challenge-based hackathons will shift how progress is measured. Instead of benchmaxxing or scoring well on predetermined exams, they surface how tools are actually used when scientists reason through real problems. This will be particularly helpful in the earlier, more ambiguous stages of research, where hypotheses are still forming and exploration and direction matter more than precision.