The Default Data Rail for World-Models
DePIN Network Subsidizing Cloud Gaming Through World-Model Data Sales
The Trillion Dollar Opportunity
"Most Mind-Blowing Tech Ever"
Investor Quick Reference
Key numbers for your model
Market Landscape at a Glance
Why Decentralized Networks Win This Market
Why Centralized Efforts Fail
- Cost-prohibitive: paying human laborers $2-3/hour for gameplay data
- Low signal: Google's SIMA lacks player diversity and the irrationality and emotion of real human play
- Scale limitations: Cannot coordinate thousands of simultaneous players
- Rights complexity: Individual licensing across multiple game publishers
Shaga's DePIN Solution
- Free gameplay: Players get free cloud gaming—data sales cover costs
- Authentic behavior: Real players making genuine decisions, not performing tasks
- Massive scale: ~1M waitlisted gamers, 7k+ gamers (invite-only), 1,140 nodes
- Network-level licensing: Bulk deals with publishers for entire ecosystem
The Scaling-Law Urgency: Genie 3 Now = GPT-3 in 2020
Video and world models now show power-law scaling, as LLMs did in 2020: loss falls predictably as model and data size grow. Labs that acquire the most interactive data the fastest will lead the next wave of AI.
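As an illustrative form only (the constants below are generic, not Shaga measurements), the LLM-style scaling relation ties model loss to interactive data volume:

$$L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}$$

where D is hours of interactive data and D_c, α_D are fitted constants; each further step down in loss demands a multiplicative increase in D, which is why labs move early to lock up supply.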
The Unreplicable Moat (Timing)
Labs need this data now. Frontier training runs are active; they can't wait 12–24 months for a new network to bootstrap. We're ready now—the chicken‑and‑egg is already solved on Shaga.
- 1,140 active nodes; nearly 1M waitlisted gamers waiting for supply to unlock
- Streamers + gaming communities are in place; buyers can turn on with minimal lead time
Executive Summary: Why Shaga Wins
The Market Signal
General Intuition raised a $134M seed (Khosla Ventures, General Catalyst) using Medal's gaming clips. OpenAI offered $500M for Medal. DeepMind, Microsoft, and xAI are building world models.
The bottleneck is causality data: how inputs change frames.
Why Shaga's Data Wins
- Cloud gaming captures better data: Long sessions (10-60+ min), pure native interactions, 120Hz synchronized controls
- Uncapped scale: 1 PC = many gamers (vs. Medal's 1 PC = 1 gamer). Massive throughput per node
- Diverse distributions: Casual gamers, not just pro streamers. Richer behavioral patterns
The Strategy:
[DATA] Build relationships with labs selling data & bootstrap network growth today.
[COMPUTE] Become the rails for world model edge-computing tomorrow.
Economic Advantage: The Resale Multiplier
Key Insight: Data production costs are paid once; the resale opportunity is effectively unbounded.
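A minimal sketch of that multiplier, using the $1/hr production cost and $2/hr price quoted elsewhere in this deck; the buyer count is a hypothetical input:

```python
# Illustrative resale economics: produce an hour once, sell it to N buyers.
# Cost and price defaults mirror figures quoted in this deck; buyer count is hypothetical.
def resale_margin(n_buyers: int, cogs_per_hr: float = 1.0, price_per_hr: float = 2.0) -> float:
    """Gross margin per produced hour when the same hour is resold to n_buyers."""
    return price_per_hr * n_buyers - cogs_per_hr

print(resale_margin(n_buyers=1))  # $1 margin: single-buyer economics
print(resale_margin(n_buyers=5))  # $9 margin: same hour, five buyers
```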
Traction & Production Run‑Rate
Network Metrics
+60% MoM growth.
Note: Metrics reconciled monthly.
$SHAG Token: The Data Acceleration Engine
We Have Demand
- Token rewards accelerate node onboarding
- Higher rewards for rare mechanics & quality data
- Network effects: more creators → better coverage
Revenue Acceleration
- Faster data collection → faster revenue scaling
- Supply/demand matching through token incentives
- Virtuous cycle: more data sales → bigger token rewards
Phase-1 Target: Data Persistence Layer
Goal: Convert existing network activity into AI-training datasets. Target ≥1M hrs/mo with $SHAG token incentives driving creator participation.
Product & Pricing Primitives
The "Premium Hour" Standard
Shaga's billable unit isn't raw footage; it's a rigorously defined data product (a minimal schema sketch follows the list below). Each "premium hour" represents:
- Training-quality footage (AFK/menu segments filtered, QA-validated)
- Video + native player controls (time-synced)
- Clean IP rights for commercial AI training
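A minimal sketch of what one premium-hour record could look like; field names and types are illustrative, not Shaga's production schema:

```python
# Hypothetical shape of a "premium hour" record; names are illustrative only.
from dataclasses import dataclass

@dataclass
class ControlEvent:
    t_ms: float   # timestamp relative to session start
    device: str   # "keyboard" | "mouse" | "gamepad"
    code: str     # key/button identifier
    state: float  # 0/1 for buttons, analog value for axes

@dataclass
class PremiumHour:
    session_id: str
    game_title: str
    video_uri: str                # 720p60 H.264 segment
    controls: list[ControlEvent]  # 120Hz native capture, time-synced to video
    qa_passed: bool               # AFK/menu filtering + QA validation
    rights_cleared: bool          # licensed for commercial AI training
```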
Current Production SKU
Core
$22M/3yr deal • $2/hr avg • Production-grade interactive data (a sync-validation sketch follows the spec list)
- • Video: 720p60 MP4 (H.264, 6–12 Mbps CBR)
- • Controls: 120Hz native capture (not inferred)
- • Sync: ≤4ms p95 video↔controls alignment
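A hedged sketch of how the ≤4ms p95 alignment target could be validated for a batch of matched video/control timestamps; only the 4ms threshold comes from the spec above, everything else is illustrative:

```python
# Sync QA sketch: assumes paired arrays of matched video-frame and control-event
# timestamps (same length); the 4ms threshold comes from the SKU spec.
import numpy as np

def p95_sync_offset_ms(video_ts_ms: np.ndarray, control_ts_ms: np.ndarray) -> float:
    """95th percentile of absolute video<->control timestamp misalignment."""
    offsets = np.abs(video_ts_ms - control_ts_ms)
    return float(np.percentile(offsets, 95))

def passes_sku(video_ts_ms, control_ts_ms, threshold_ms: float = 4.0) -> bool:
    return p95_sync_offset_ms(np.asarray(video_ts_ms), np.asarray(control_ts_ms)) <= threshold_ms
```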
What AI Labs Need in Training Data
Broad coverage of first-person titles across genres—shooters, RPGs, simulations—to capture varied physics, lighting, and interaction mechanics.
Full spectrum of player abilities, including valuable mistakes—missed shots, car crashes, failed jumps. Low-skill play enriches behavioral distributions.
Extended gameplay sessions (10-30+ minutes) to capture temporal dependencies, multi-step decision chains, and evolving strategy—not just isolated moments.
Future Data Enrichment Roadmap
As AI labs evolve from pre-training to fine-tuning and RL phases, Shaga can systematically enrich data without rebuilding supply:
Engagement Signals
- • Key-hold duration & mouse velocity
- • Player skill brackets & outcomes
- • Decision-making moments
Higher Fidelity
- • 1080p60+ resolution
- • 240Hz control sync
- • Ray-tracing & ultra settings
3D Geometric Data
- • Camera pose extraction
- • Depth maps (RGB-D)
- • Object segmentation masks
Buyer Economics & Shaga's Cost Advantage
The "DIY" Approach
For labs considering building their own datasets:
Shaga's Model
"Produce Once, Resell Many" advantage:
The Cloud Gaming Multiplier: 1 Node → Many Gamers
DIY Data Production (Labs' Alternative)
To capture 1,000 gamer-hours in parallel, a lab must provision 1,000 gaming PCs, each tied to a single local gamer: fixed hardware costs, geographic constraints, and idle time all add waste.
Shaga's Cloud Gaming Model
A single gaming node streams to multiple gamers remotely, across regions and time zones. Same hardware, continuous utilization, and far higher data throughput per dollar of infrastructure.
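A back-of-the-envelope comparison of data captured per hardware dollar; the daily-hours and hardware-cost figures are assumptions for illustration, not measured network statistics:

```python
# Data captured per dollar of hardware per year. All inputs are illustrative assumptions.
def hours_per_dollar(daily_active_hours: float, hardware_cost: float, days: int = 365) -> float:
    return daily_active_hours * days / hardware_cost

diy   = hours_per_dollar(daily_active_hours=3.0,  hardware_cost=1500.0)  # 1 PC tied to 1 local gamer
shaga = hours_per_dollar(daily_active_hours=14.0, hardware_cost=1500.0)  # same PC, remote gamers across time zones
print(diy, shaga)  # roughly 0.73 vs 3.4 gamer-hours per hardware dollar per year (~4-5x)
```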
The Resale Moat in Practice
The "Gold Rush" Buyer Map
The world model ecosystem spans hyperscalers and startups—all racing to solve the same data bottleneck. Shaga's buyer map categorizes these customers by their strategic importance, budget size, and specific data requirements.
Tier A: Strategic Hyperscalers (Biggest Budgets)
- Google DeepMind • Project: Genie 2/3, GameNGen • Data Need: High-fps gameplay with synchronized actions, camera-rich 3D scenes
- Microsoft Research • Project: MineWorld, Muse/WHAM • Data Need: Minecraft-style frames + controls, minutes-long FPS trajectories
- OpenAI • Project: Sora & Sora 2 • Data Need: Interactive gameplay data; offered $500M for Medal last year
- xAI (Elon Musk) • Project: World models for gaming & simulation • Data Need: Interactive gameplay data with causal physics understanding
- Alibaba • Project: The Matrix • Data Need: AAA game footage + real-world video with player actions
Tier B: Frontier Startups (High Growth)
Incomplete list; new entrants are funded under the radar. Focus areas: 3D spatial world models, photorealistic worlds, interactive models, Matrix-Game 2.0.
Data Broker Outreach
We're engaging with established data brokers who already serve frontier AI labs. These partnerships provide immediate access to buyer networks while we build direct relationships.
Enterprise Brokers
- • Defined.ai - Multimodal datasets
- • Appen - Enterprise programs
- • Sama - Curated datasets
- • TELUS International - Scale programs
Specialized Brokers
- • iMerit - Video/robotics focus
- • Toloka - Custom collections
- • TransPerfect - Global reach
Data Marketplaces
- • AWS Data Exchange
- • Snowflake Marketplace
- • Datarade
- • Narrative I/O
Lighthouse Customer: Proof of Demand
Wayfarer Labs has signed as our first enterprise customer, validating real market demand for interactive gaming data in world model training.
Pricing Strategy: Price Maker → Price Weapon
Phase I: Price Maker
As the dominant supplier, Shaga sets market prices. With limited competition and high demand, pricing anchors at premium levels to maximize revenue capture.
Extract maximum value while supply remains constrained and buyer urgency is high.
Phase II: Price Weapon
When competitors emerge, Shaga can weaponize its resale model. Because production costs (COGS) are paid once and each hour is resold to N buyers, Shaga can price aggressively to starve competitors who lack multi-buyer depth.
Result: Shaga can undercut competitors 5:1 on price and still stay profitable through resale depth. Competitors without distributed buyer relationships get priced out.
Aggressive Defense Playbook
In contested market segments, Shaga can temporarily price at $0.20-0.50/hr while securing 4-8 buyers per hour. This pricing is below any competitor's break-even unless they also have deep resale networks—which takes years to build.
Strategic pricing becomes a moat: competitors without multi-buyer infrastructure cannot survive a price war, even if they match Shaga's production costs.
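An illustrative pass at the defense-pricing math, combining the $1/hr production cost cited later in this deck with the contested-price and buyer-depth ranges above; the competitor's cost is a stand-in drawn from the QA-house labor range:

```python
# Defense-pricing sketch. Figures mirror ranges quoted in this deck; the specific
# combinations below are illustrative, not committed price points.
def gross_margin_per_hr(price: float, buyers_per_hr: int, cogs_per_hr: float) -> float:
    return price * buyers_per_hr - cogs_per_hr

print(gross_margin_per_hr(price=0.50, buyers_per_hr=6, cogs_per_hr=1.0))  # +$2.00/hr for a multi-buyer seller
print(gross_margin_per_hr(price=0.50, buyers_per_hr=1, cogs_per_hr=2.0))  # -$1.50/hr for a single-buyer competitor
```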
Competitive Model Generations Drive Exponential Demand
The Competitive Dynamic: Industry rumors suggest Genie 3 trained on ~1M hours. As labs race to outperform each other, each model generation requires 10-100× more data. Every competing lab needs this data to stay relevant.
Scaling Law Context: The GPT-3→GPT-5 Parallel
Gen3 models (now) → Gen4 models (2025-26) → Gen5 models (2027+)
Key Insight: Demand is driven by competitive pressure, not just model performance. Every lab needs Gen4 data to compete with Gen4 models. Labs that fall behind a generation lose market relevance. This creates sustained, exponential demand independent of any single lab's training schedule.
Supply Engine: Frictionless P2P Scale
The Critical Innovation: Nodes Don't Play
Every gaming PC in the world can become a data-producing node—without the owner playing.
Node owners simply leave their PC online. Shaga's cloud gaming infrastructure streams to remote gamers from different regions and time zones. The node owner collects $SHAG tokens while sleeping, working, or traveling. The remote gamer gets free cloud gaming. Labs get data.
Frictionless onboarding = exponential supply growth. No competitor can replicate this P2P model.
What if GPUs designed for gaming were actually used for gaming? (Shocking.)
Millions of consumer gaming GPUs sit idle because they can't compete in AI DePIN networks. RTX 3060s, 3070s, 4060s, 4080s, 4090s—the entire consumer GPU stack from NVIDIA and AMD—are poorly suited for AI training workloads compared to datacenter GPUs (A100, H100, B200).
Consumer GPUs: Bad for AI Compute
- • Lower FP16/INT8 throughput vs datacenter cards
- • Limited VRAM (8-24GB vs 80-192GB for H100/B200)
- • Poor performance/watt for training workloads
- • Can't compete in AI DePIN networks (Render, Akash, io.net)
Result: Idle consumer GPUs earn minimal returns in AI networks
Consumer GPUs: Perfect for Cloud Gaming
- • Designed for real-time rendering at 60-120+ FPS
- • NVENC/AMD VCE hardware encoding built-in
- • 8-16GB VRAM sufficient for AAA games at 1080p-1440p
- • Power efficiency optimized for sustained gameplay
Result: Shaga monetizes GPU supply other networks can't use
The Supply Unlock
Competitive Advantage: While AI DePIN networks compete for scarce datacenter GPUs, Shaga taps into hundreds of millions of consumer gaming GPUs sitting idle worldwide. These cards can't earn meaningful returns in AI compute markets but are perfectly suited for high-quality cloud gaming and data generation.
Streamer-Led GTM: Supply + Distribution in One
Streamers are both suppliers AND go-to-market channels. A single partnership unlocks:
- • Supply: Streamer's PC becomes a node (passive income)
- • Distribution: Their audience onboards as nodes or gamers (viral loop)
- • Content: Data collection becomes monetizable content (dual revenue)
Scale Physics: 100M Hour Capacity
Scale is technically feasible. Constraints are economic (token incentives, node onboarding), not physical infrastructure.
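One rough way to sanity-check the 100M-hour figure; node count and streaming hours are hypothetical planning inputs, not current network metrics:

```python
# Capacity sanity check: nodes x streaming hours per day x days per year.
# Inputs are hypothetical planning numbers, not reported network metrics.
def annual_capacity_hours(nodes: int, streaming_hours_per_day: float) -> float:
    return nodes * streaming_hours_per_day * 365

print(f"{annual_capacity_hours(nodes=25_000, streaming_hours_per_day=12):,.0f}")  # ~109,500,000 gamer-hours/yr
```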
What Convinced Wayfarer Labs: Unprecedented Data Diversity
No other supplier offers this breadth + depth combination. Supports all PC games.
Go-to-Market: Demand & Supply Channels
Supply Acquisition: Data-Driven Emissions (100M PCs)
TAM & Regional Economics
268M PC gamers mapped by region with latency acceptance bands, ARPU ranges ($1.49-$36/mo), and infrastructure costs.
Node Economics
Operator payback periods, regional cost structures, and token emission schedules. Shows why nodes activate in premium markets.
5-Year Projections
Conservative/Base/Aggressive scenarios with subscriber growth, utilization targets, and revenue scaling by market.
Tier A: Strategic Hyperscalers (6-8 labs)
Offering: Core 720p60 @ 120Hz sync product. Gen4 model training volumes (10M+ hrs/yr).
Pricing: $2-5/hr (price maker). Premium pricing for rush delivery or title prioritization.
Motion: Direct BD led by founders. Wayfarer Labs as lighthouse customer ($22M/3yr).
Tier B: Frontier Startups (7+ labs)
Incomplete list; new entrants are funded under the radar: World Labs, Odyssey, Decart, Skywork AI, etc.
Offering: Same Core product. Gen3-Gen4 training volumes (1M+ hrs/yr).
Pricing: $2-3/hr volume pricing. Pay-as-you-go or quarterly contracts.
Motion: Direct founder BD plus data broker partnerships for distribution.
Channel Partners: Brokers & Marketplaces (Low-Volume)
Target Channels: Enterprise brokers (Defined.ai, Appen, Sama), specialized brokers (iMerit, Toloka), data marketplaces (AWS Data Exchange, Snowflake).
Use Case: Handle low-volume purchases (10k-50k hrs/yr): robotics labs, academic groups, or smaller buyers exploring world model data.
Strategy: Brokers provide distribution without Shaga's direct sales effort. High-volume buyers graduate to the direct channel.
Defensible Advantages
1. Only Scaled Supplier of Full-Stack Training Data
Wayfarer Labs signed $22M/3yr because Shaga is the ONLY supplier with: pixels + synchronized controls (120Hz) + clean rights + train-ready format + proven diversity (1,800 games). Medal has clips but no controls. Google SIMA uses bots (expensive, low signal). Replay sites have controls but no pixels. Shaga solves the full stack.
Data quality differentiation + production readiness = switching costs once labs integrate Shaga pipelines into training workflows.
2. Impossible-to-Replicate Timing Window
2+ years of network operations. Chicken-and-egg (gamers need nodes, nodes need gamers) is SOLVED. Labs need data NOW for Gen3/Gen4 models—they can't wait 12-24 months for competitors to bootstrap networks. This timing window is worth billions if executed correctly.
Move fast: sign 3-5 more labs in 6 months. Become THE default supplier while competitors are still onboarding their first nodes.
3. DePIN Cost Structure (50-75% Cheaper)
Labs' alternatives all fail on economics: DIY (hire employees @ $50+/hr, unscalable). Medal: 1 PC = 1 gamer (linear, short clips). QA houses: $2-3/hr labor + build-to-order, no inventory. Shaga: $1/hr COGS + 1 node = many gamers + resale to N buyers.
Cost advantage widens with scale. At 10k nodes, per-hour cost goes DOWN (token rewards + network effects) while centralized costs stay flat.
Bottom Line: Only scaled supplier of bottleneck resource in the biggest AI race since LLMs, with 2-year head start and 50-75% cost advantage nobody can replicate.
Risk Assessment & Competitive Defense
Market Catalyst: Medal's Vertical Integration Creates Supply Gap
General Intuition raised a $134M seed (Khosla Ventures, General Catalyst) using Medal.tv's gaming data, then vertically integrated. Medal is now a lab, not an open data supplier.
Impact: This removes 10M+ gamers from the open market. Every other lab (OpenAI, DeepMind, xAI, Microsoft, Anthropic) now needs alternative sources for interactive gaming data.
Our Position: We're the only scaled alternative with pixels + synchronized controls + multi-game breadth ready for immediate delivery. Labs that were evaluating Medal now have one option: us.
Competitive Landscape: Why Alternatives Can't Scale
Real competition comes from alternative data suppliers trying to serve the same 100M+ hour demand. Brokers (Appen, Scale AI, Defined.ai) are partners who provide distribution, not competitors building supply.
Telemetry/Replay-Only Platforms
Examples: ballchasing.com, OpenDota, HSReplay, PureSkill.gg, PUBG API, Overwolf Game Events
They have: Controls/telemetry for single games
They lack: Video frames. Controls without synchronized pixels = useless for world model training (can't learn visual prediction from historical control logs)
Risk Level: LOW - fundamentally wrong data type
IDM/LAM-Derived Actions (Model-Labeled)
Approach: Labs infer actions from frame pairs (inverse dynamics) or learn latent action mappings
Why it breaks: ~10% prediction error per step, and the problem is non-identifiable (many different actions can produce the same frame transition), which yields biased labels that underpredict high-frequency corrections.
Result: Error compounds at scale (illustrated below) → model collapse. Models learn "smooth" actions that fail in fast FPS dynamics.
Risk Level: MEDIUM - Labs try this, hit scaling limits, then buy Ground-Truth data (us)
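A one-line illustration of the compounding (the 10% per-step error comes from the estimate above; the trajectory length is illustrative):

$$P(\text{all labels correct over } n \text{ steps}) = (1-\epsilon)^n, \qquad (0.9)^{60} \approx 0.002$$

so even a one-second window at 60 steps per second is unlikely to be labeled cleanly end to end, and the bias only grows over long rollouts.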
QA/Playtesting Houses (Build-to-Order)
Examples: Keywords Studios, PlaytestCloud, custom capture programs
They have: Can produce custom datasets with right format (pixels + controls)
They lack: Linear cost structure ($2-5/hr labor), no resale economics, build-to-order bottleneck
Risk Level: HIGH - Most credible alternative format-wise
Our advantage: DePIN + $SHAG unlock exponential supply ($1/hr COGS) vs their linear hiring constraints
Cloud Gaming Platforms
Examples: GeForce NOW, Xbox Cloud, Luna
They have: Massive infrastructure, millions of users, publisher relationships
They lack: (1) Speed (12-18 mo approval/build cycles), (2) Brand risk tolerance (gamers wouldn't want gameplay sold)
Risk Level: MEDIUM - biggest long-term threat, but slow-moving and risk-averse
Our advantage: 18-24 month speed advantage + neutral third-party positioning (explicit opt-in + token rewards)
Shaga's Differentiation
- ✅ Only supplier with breadth + pixels + synchronized controls + clean rights
- ✅ Train-ready packaging: Data schema, loaders, QA pipelines, ShagaScore validation
- ✅ Cross-title normalization at scale: 1,800 games, consistent quality
- ✅ DePIN cost structure: $1/hr COGS vs $2-5/hr centralized alternatives
- ✅ Resale economics: Produce once, sell to N buyers (margin scales with buyer depth)
IP & Publisher Relationships: Three-Layer Defense
Layer 1: DePIN Regulatory Arbitrage
- • Decentralized protocol structure with token-based payments
- • Individual node operators make autonomous data decisions
- • Jurisdictional complexity across global node network
- Result: Creates operational flexibility during rapid scaling phase
Layer 2: Publisher Partnership Model - "Infinite DLC" Strategy
Publishers can't build data infrastructure but want AI monetization. We convert potential IP risk into revenue partnerships:
- • Publishers license titles to us; we handle data ops; they receive a 20-30% revenue share
- • They receive fine-tuned world models for their games = new revenue stream (DLC, UGC, infinite content)
- Target: Sign 3-5 major publishers (Riot, Epic, indies) in next 12 months with explicit licensing deals
Layer 3: Legal Framework
- • $2-3M allocated for direct licensing deals and legal infrastructure
- • Title whitelist of confirmed-safe games
- • Portfolio approach: build publisher partnerships to legitimize business model
- • Focus on publishers who benefit from AI ecosystem growth (indie devs, catalog monetization)
Big Tech Competition: Speed as Moat
Microsoft (Azure + Xbox), NVIDIA (GeForce NOW), Amazon (AWS + Luna) have resources but face constraints:
Our Advantages:
- • Speed: We're live with 2+ years of infrastructure. They need 12-18 months for approvals and builds.
- • Platform Risk: They face brand backlash if users discover gameplay is sold. We're neutral third-party with explicit opt-in.
- • Focus: They're distracted by $B-scale core businesses. We're 100% focused on this market.
Data Quality & Commoditization Defense
Mitigation Strategy:
- • Prove Quality Delta: Publish benchmarks showing training performance advantages
- • Build Switching Costs: Deep pipeline integration with labs' training infrastructure (6-12 months engineering work to replicate)
- • Exclusive Relationships: Custom data collection via publisher partnerships (priority access to new game launches)
- • Tooling Ecosystem: Train-ready loaders, quality validation, continuous monitoring → we become infrastructure, not just data (a loader sketch follows this list)
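A hedged sketch of the train-ready-loader idea; the class, file layout, and field names are hypothetical, not Shaga's actual SDK. It assumes each delivered premium hour ships as a video file plus a control-event log keyed by timestamp:

```python
# Hypothetical loader for one delivered premium hour (video + control log).
import json
from pathlib import Path
from typing import Iterator

class PremiumHourDataset:
    """Iterates time-aligned control-event windows for one premium hour."""

    def __init__(self, root: Path):
        self.root = Path(root)
        with open(self.root / "controls.json") as f:
            self.controls = json.load(f)  # list of {"t_ms": ..., "code": ..., "state": ...}

    def events_between(self, start_ms: float, end_ms: float) -> list[dict]:
        return [e for e in self.controls if start_ms <= e["t_ms"] < end_ms]

    def windows(self, window_ms: float = 1000.0) -> Iterator[tuple[float, list[dict]]]:
        """Yield (window_start_ms, control events) chunks aligned to the video timeline."""
        if not self.controls:
            return
        t_end = max(e["t_ms"] for e in self.controls)
        t = 0.0
        while t < t_end:
            yield t, self.events_between(t, t + window_ms)
            t += window_ms
```

A lab-side pipeline could feed windows() directly into its frame-conditioning step; that integration work is where the switching cost accrues.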
Operational Risks (LOW)
Wayfarer Labs Dependency
- • Sign 2-3 additional lab partners for publisher lab-as-a-service
- • Position as neutral data infrastructure layer that ANY lab can plug into
Privacy & Compliance
- • Comprehensive consent flows, age verification (COPPA)
- • On-device content redaction for sensitive data
- • Geo-fenced storage (GDPR, CCPA compliance)
The 18-24 Month Window: First-Mover Becomes Default Infrastructure
Three converging tailwinds create our land-grab moment:
1. Big tech is 12-18 months behind
(approvals, builds, bureaucracy)
2. Legal clarity is 18-24 months away
(time to build publisher legitimacy)
3. Medal's exit leaves market undersupplied RIGHT NOW
(labs need data today)
Critical Mass = Irreversible Moat:
- • 5-8 major lab contracts → switching costs lock us in
- • 100M+ hours delivered → we ARE the training data standard
- • 3-5 publisher partnerships → IP risk converts to revenue
- • 10k+ nodes → supply becomes unreplicable
This is a land-grab moment. We're the only scaled supplier ready now. By the time big tech builds or legal challenges emerge, we'll be embedded in every major lab's training infrastructure.
The window is open. We're moving.
The Investment Opportunity
Market Timing: Three Converging Forces
Supply Shock
Medal (10M+ gamers) vertically integrated with General Intuition. Open market supply disappeared overnight. Every other lab needs alternative sources NOW.
Scaling Law Breakthrough
World models show power-law scaling like LLMs in 2020. Labs that secure data supply first dominate Gen4/Gen5 generations.
Speed Advantage
We're live with 2+ years of infrastructure. Big tech needs 12-18 months to build. This is a land-grab moment.
The Path: Data Supplier → Infrastructure Standard
TAM Evolution:
$500M data market → $5B infrastructure → $50B+ compute layer
Why We Win
Unreplicable Timing
2-year head start. Labs can't wait for competitors to bootstrap.
DePIN Economics
50-75% cost advantage. Every dollar generates $6-15 revenue through resale.
Three-Sided Moat
Gamer distribution + lab relationships + compute rails
The Opportunity
18 months to embed in every major lab's infrastructure. Too expensive for big tech to displace. Too late for startups to catch up.