Unfair Advantage by Parul Singh

The next $10B AI company will own the weirdest data

Why the most defensible moats aren’t in LLMs—they’re in forgotten workflows and friction-filled datasets

Parul Singh
Apr 22, 2025
Cross-post from Unfair Advantage by Parul Singh
We totally agree with this thesis, and we're personally watching weird data in:

  • Industrial: steel, concrete, niche part manufacturing, supply chains no one's heard of, etc.

  • Nature data: ocean ecological state, coral photogrammetry, high quality soil ground truthing

  • Energy: electrical transmission data, grid utilization, interconnection queue and usage, historical production and consumption for DERs, etc.

There are so many opportunities, and we're excited to find the best founders building in these areas. We'll post more about this soon, but this piece from Parul Singh hits!

- Cerulean Ventures
variations of weird data

Everyone’s chasing the next foundational model.
But the AI company that builds a true monopoly?
It won’t win with scale.
It’ll win with strange, high-friction, nobody-else-wants-it data.

Weird data is hard to clean. Hard to label. Hard to get access to.
That’s what makes it a moat.


Why weird data wins

In AI, defensibility doesn’t come from technical novelty anymore. It comes from what nobody else can see.

  • Tesla’s lead in autonomous driving isn’t from algorithms—it’s from billions of real-world miles.

  • Medivis builds AR-guided surgical tools trained on proprietary 3D medical imaging datasets, painstakingly curated from hospitals and surgeons over years.

  • Channel19 optimizes trucking logistics with a digital marketplace fueled by real-time freight and carrier data, scraped from small operators others overlook.

  • Klaviyo powers e-commerce marketing with a proprietary dataset of customer behavior—clickstreams, purchase histories, and engagement signals—collected from 143,000 merchants, giving it an edge no generic AI can replicate.

These are messy. Expensive. Domain-specific.
And that’s the point.


The secret: friction is the moat

The more annoying the data is to collect, the more likely it is to be valuable and defensible.

  • Healthcare transcripts full of acronyms, hesitations, and human nuance

  • Factory sensor logs with no consistent format

  • Field notes from oil rigs, classrooms, or farming tools

  • Customer support voice calls that require redaction, labeling, and industry context

You can’t buy this off the shelf. You have to earn it—through workflow, access, and patience.

“The deepest AI moats are built on data no one else wanted badly enough to collect.”


How to build a weird data moat

🛠️ Founder playbook:

  • Build for data capture: Your product should generate proprietary signal by solving a real pain point.

  • Go vertical first: Focus on one narrow niche where the data is high value and underutilized.

  • Own the loop: Make the data useful in product, improving retention and performance.

  • Treat ops like IP: Labeling, cleaning, structuring—it’s not overhead. It’s defensibility.

“10 golden samples are more valuable than 10,000 public ones.”


What I’m watching

The most exciting AI startups today aren’t flexing 100B parameters. They’re quietly dominating:

  • Payer-side claims workflows in healthcare

  • Campaign orchestration and attribution in martech

  • Real-time workflow automation in enterprise software

  • Sensor + voice fusion data in elder care

These aren’t just models—they’re monopolies on niche, critical, impossible-to-replicate data, solving real customer pain.


The investor takeaway

The sharp investors I know are asking a different set of questions:

  • What feedback loop are you capturing?

  • What signal improves with every user interaction?

  • What would a competitor have to do to replicate your dataset?

Because the next $10B outcome?
It probably won’t come from another chatbot.

It’ll come from a startup quietly logging overlooked signals from workflows you’ve never seen, until suddenly they’re the only ones with the data that matters.

What's the weirdest dataset you wish you could own?

(I'd love to hear your answer—just hit reply.)
