“We need better data!” – a distressed call from a data scientist, presented with the latest CSV from the plate reader. “There’s too much noise in these replicates! What are these controls? Why do they look different from last week?”
These rumblings are growing louder around machine learning in bio. We have AlphaFold, sure, but protein folding is as close to physics as biology gets. What about the mess of interacting pathways that give us actual living systems? To model them, we need data that accurately reflects that biology, and getting that data is hard.
So much of the data we generate to explain biology is made up of proxies, even proxies of proxies. Extracting meaning from this data means controlling for the noise in the biological system, but it also means deeply understanding the method we use to generate that data. Without that understanding, the data is just as likely telling us something about our method as about the biology itself.
Most wet lab biologists understand this intuitively. It gets to the root of why running these experiments is so flipping difficult. One week your experiment is working a treat, the next you’re bashing your head against the gel imager. Something must have changed in the method. But what?! There are so many details to keep track of. Did I make a mistake? Why didn’t I think of that control? Maybe my enzyme went off? When I was in the lab, I always blamed the enzyme... it was never the enzyme.
Figuring out, planning, and tracking the how of an experiment – the steps to follow in the lab, the calculations, the controls… all this stuff really matters. It matters both to the grand future of biotech that we’re all excited about, and also to helping free scientists from the cycle of failed experiments and inconclusive results.
And yet, it feels as though very little is being done to help. Automation is often held up as the answer, but there is still so much science that is out of the reach of robots for now. Automation requires a known working protocol, but a lot of science is in flux – too changeable, too complex, too ill-defined.
And what about software? We have tailor-made tools to help designers prototype and collaborate, developer tools that superpower engineers while they code… not to mention all the software built to help wrangle the data coming out of these experiments. Why not something to help design and capture those experiments? A word processor and a spreadsheet hardly feel like the answer.
We need to recognise that getting ‘better data’ means working together with scientists. Let’s take a fresh look at the process of designing and running experiments. How can we be more efficient, more effective, more reproducible? What tools could help us? What can we learn from other fields? Maybe there’s a world where waiting for the gel imager to load doesn’t have to be an emotional rollercoaster.
At Briefly, we’re building software for scientists to help tackle this problem; but we won’t get there with software alone. There are questions around lab tooling, process optimisation, statistics, collaboration… If you’re an experimental biologist who obsesses over the details of doing your science, we’re building a community of people like you so we can figure this out together. If you’re interested in joining us, head here and let’s chat.