Introduction

Coming Soon — Fleek is launching soon. Join the waitlist to get early access.

Fleek is an AI inference optimization platform that delivers 3x faster inference at 70% lower cost without sacrificing quality. We optimize two layers: the model (NVFP4 quantization, custom kernels) and the GPU (microVM infrastructure, 95%+ utilization). Research from Weyl, our internal AI lab, powers everything on Fleek.

What We're Building

Our optimization engine takes any AI model and makes it run dramatically faster. We're starting with cloud APIs for the best generative models, then expanding to universal PyTorch optimization and embedded AI for edge devices.

Key capabilities:

3x faster inference — NVFP4 optimization on Blackwell with custom CUDA kernels
70% lower cost — Full stack efficiency: model optimization + 95%+ GPU utilization
Zero configuration — Paste a HuggingFace URL, get an optimized API
Zero quality loss — Precision-aware quantization preserves model accuracy

Custom Model Optimization (Coming Soon)

Fleek won't be limited to the models we showcase. Soon, we'll optimize any model:

Open-source models — FLUX, DeepSeek, Llama, etc. (available now)
Fine-tuned models — Your custom checkpoints (coming soon)
Proprietary models — Private weights you've trained (coming soon)
Any PyTorch model — If it runs on PyTorch, we can optimize it (coming soon)

Same process. Same $0.0025/GPU-sec pricing. No "custom model premium."
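To make the per-second rate concrete, here is a minimal cost-estimate sketch. Only the $0.0025/GPU-sec figure comes from this page; the function name `job_cost` and its parameters are hypothetical, and actual billing granularity (rounding, minimum charges) is not specified here.

```python
from decimal import Decimal

# Rate quoted on this page; everything else in this sketch is an assumption.
PRICE_PER_GPU_SECOND = Decimal("0.0025")


def job_cost(gpu_seconds, gpu_count=1):
    """Estimate cost in USD for a job using `gpu_count` GPUs for `gpu_seconds` each.

    Hypothetical helper: it multiplies the quoted rate by total GPU-seconds
    and does not model rounding or minimum charges.
    """
    # Convert via str() so float inputs do not carry binary rounding noise.
    return PRICE_PER_GPU_SECOND * Decimal(str(gpu_seconds)) * Decimal(gpu_count)


if __name__ == "__main__":
    print(job_cost(3600))   # one GPU for one hour
    print(job_cost(2, 4))   # four GPUs for two seconds each
```

At this rate, one GPU-hour (3,600 GPU-seconds) works out to $9.00.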

Verified Foundations

Fleek's infrastructure isn't just optimized; it's formally verified. Core components are proven correct in Lean 4, with cryptographic attestation at every layer. This means:

Proven correctness — Critical paths verified with mathematical proofs
Cryptographic attestation — SHA256, ed25519, git-based audit trail
Resolution soundness — Every operation completes cleanly or rolls back
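As an illustration of the general idea behind a hash-linked audit trail, the sketch below chains log entries with SHA-256 so that altering any earlier entry invalidates the chain. This is not Fleek's implementation: the entry layout and the names `append_entry` and `verify_log` are hypothetical, and Ed25519 signing is left out because it is not part of the Python standard library.

```python
import hashlib
import json

# Placeholder "previous digest" for the first entry (an assumption of this sketch).
GENESIS = "0" * 64


def entry_digest(prev_digest, payload):
    """SHA-256 over the previous digest plus a canonical JSON form of the payload."""
    body = json.dumps({"prev": prev_digest, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Append a payload to the log, linking it to the digest of the last entry."""
    prev = log[-1]["digest"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "digest": entry_digest(prev, payload)})
    return log


def verify_log(log):
    """Recompute every digest in order; return False on the first mismatch."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["digest"] != entry_digest(prev, entry["payload"]):
            return False
        prev = entry["digest"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"op": "build"})
    append_entry(log, {"op": "deploy"})
    print(verify_log(log))             # chain is intact
    log[0]["payload"]["op"] = "tampered"
    print(verify_log(log))             # tampering is detected
```

Git's commit history relies on the same linking principle, which is why a git-based trail lends itself to this kind of after-the-fact verification.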

Resources

Code & Community

GitHub — Open source tools and examples
Discord — Join the community
X / Twitter — Follow for updates
YouTube — Video content and tutorials

Research

Weyl is Fleek's internal AI research lab, responsible for the R&D powering efficient inference optimization. Explore the technical foundations:

Research Overview — Papers and technical articles
Benchmark Methodology — How we measure performance

Token

FLK Token — Tokenomics and utility
Token Page — Live stats and community

Content

Blog — Company updates and technical deep-dives

What's Next

We're preparing to launch our inference platform. In the meantime:

Join the waitlist — Get early access
Read the research — Understand our approach
Join Discord — Connect with the team

This documentation is pre-production and will be updated as we release features.