Introduction
Coming Soon — Fleek is launching soon. Join the waitlist to get early access.
Fleek is an AI inference optimization platform that delivers 3x faster inference at 70% lower cost without sacrificing quality. We optimize at two layers: the model layer (NVFP4 quantization, custom kernels) and the GPU layer (microVM infrastructure, 95%+ utilization). The research from Weyl, our internal AI lab, powers everything on Fleek.
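To make the model-layer idea concrete, here is a minimal PyTorch sketch that simulates NVFP4-style block-scaled 4-bit quantization. It illustrates the general technique only, not Fleek's kernels: real NVFP4 packs values as FP4 (E2M1) with FP8 (E4M3) scales per 16-value block on Blackwell hardware, while this simulation keeps everything in full precision and merely rounds values to the FP4 grid.

```python
# Minimal sketch (not Fleek's implementation): simulate NVFP4-style
# block-scaled 4-bit quantization by rounding to the FP4 (E2M1) grid.
import torch

# Magnitudes representable in FP4 (E2M1); sign is handled separately.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_nvfp4(x: torch.Tensor, block: int = 16) -> torch.Tensor:
    """Quantize-dequantize x, emulating per-block scaled FP4 rounding."""
    flat = x.reshape(-1, block)                         # 16-value micro-blocks
    scale = flat.abs().amax(dim=1, keepdim=True) / 6.0  # block max -> FP4 max
    scale = scale.clamp(min=1e-12)                      # guard all-zero blocks
    scaled = flat / scale
    # Snap each value to the nearest FP4 magnitude, preserving its sign.
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    deq = FP4_GRID[idx] * scaled.sign() * scale
    return deq.reshape(x.shape)

x = torch.randn(4, 64)
print("mean abs error:", (x - fake_quantize_nvfp4(x)).abs().mean().item())
```

The point of the shared per-block scale is that 4 bits can cover each block's local dynamic range; production pipelines additionally calibrate scales per layer to preserve accuracy.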
What We're Building
Our optimization engine takes any AI model and makes it run dramatically faster. We're starting with cloud APIs for the best generative models, then expanding to universal PyTorch optimization and embedded AI for edge devices.
Key capabilities:
- 3x faster inference — NVFP4 optimization on NVIDIA Blackwell GPUs with custom CUDA kernels
- 70% lower cost — Full-stack efficiency: model optimization plus 95%+ GPU utilization
- Zero configuration — Paste a HuggingFace URL, get an optimized API (see the sketch after this list)
- Zero quality loss — Precision-aware quantization preserves model accuracy
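Fleek has not launched yet, so there is no published API to show. The sketch below is purely hypothetical: the base URL, endpoint paths, request fields, and response shape are placeholder assumptions meant only to illustrate the paste-a-URL, get-an-endpoint flow.

```python
# Hypothetical sketch of the zero-configuration flow. The endpoint URL,
# request fields, and response shape are illustrative assumptions,
# not Fleek's published API.
import requests

API = "https://api.fleek.example"  # placeholder base URL
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Step 1: point the optimizer at a model by its Hugging Face URL.
deploy = requests.post(
    f"{API}/v1/deployments",
    headers=headers,
    json={"model": "https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct"},
).json()

# Step 2: call the optimized inference endpoint it returns.
out = requests.post(
    f"{API}/v1/completions",
    headers=headers,
    json={"deployment": deploy["id"], "prompt": "Hello", "max_tokens": 32},
).json()
print(out)
```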
Resources
Code & Community
- GitHub — Open source tools and examples
- Discord — Join the community
- X / Twitter — Follow for updates
- YouTube — Video content and tutorials
Research
Weyl is Fleek's internal AI research lab, responsible for the R&D behind our efficient inference optimization. Explore the technical foundations:
- Research Overview — Papers and technical articles
- Benchmark Methodology — How we measure performance
Token
- FLK Token — Tokenomics and utility
- Token Page — Live stats and community
Content
- Blog — Company updates and technical deep-dives
What's Next
We're preparing to launch our inference platform. In the meantime:
- Join the waitlist — Get early access
- Read the research — Understand our approach
- Join Discord — Connect with the team
This documentation is pre-production and will be updated as we release features.