Save up to 70% on AI inference. Pay only for the GPU time you actually use.
Our optimizations = your savings.
Free (one-time)
Pro: Pay only for what you use
Enterprise: For teams at scale
GPU-second pricing is transparent. When Fleek optimizes a model to run faster, your effective cost per token drops automatically.
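To make the link between throughput and per-token cost concrete, here is a minimal sketch of the conversion. It assumes the $0.0025 per GPU-second rate quoted on this page and a single sustained throughput figure; the function name and the example throughput values are illustrative, and the actual conversion (which yields separate input and output rates) may differ.

```python
GPU_SECOND_PRICE_USD = 0.0025  # per-GPU-second rate quoted on this page


def effective_price_per_million_tokens(
    tokens_per_second: float,
    gpu_second_price: float = GPU_SECOND_PRICE_USD,
) -> float:
    """Return dollars per one million tokens at a sustained throughput.

    Hypothetical helper: assumes billing is simply GPU-seconds consumed
    multiplied by the GPU-second price.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    gpu_seconds_per_million = 1_000_000 / tokens_per_second
    return gpu_seconds_per_million * gpu_second_price


if __name__ == "__main__":
    # Illustrative throughputs: doubling throughput halves the effective rate.
    for tps in (10_000, 20_000):
        rate = effective_price_per_million_tokens(tps)
        print(f"{tps:,} tokens/sec -> ${rate:.4f} per 1M tokens")
```

Under these assumptions the effective rate is inversely proportional to throughput, which is why a faster model costs less per token without any change to the GPU-second price.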
Fleek effective pricing vs competitors
All prices per million tokens. Fleek uses GPU-second billing converted to effective token rates.
| Model | Fleek In ($/M) | Fleek Out ($/M) | Fireworks In | Fireworks Out | Together AI In | Together AI Out | Baseten In | Baseten Out | Savings* |
|---|---|---|---|---|---|---|---|---|---|
| DeepSeek R1 | $0.25 | $1.00 | $1.35 | $5.40 | $3.00 | $7.00 | — | — | 50% |
| gpt-oss-120b | $0.02 | $0.09 | $0.15 | $0.60 | $0.15 | $0.60 | $0.10 | $0.50 | 53% |
| Llama 70B | $0.05 | $0.21 | $0.90 | $0.90 | $0.88 | $0.88 | — | — | 65% |
| Kimi K2.5 | $0.12 | $0.47 | $0.60 | $2.50 | $1.00 | $3.00 | $0.60 | $2.50 | 48% |
| GLM 4.7 | $0.14 | $0.54 | $0.60 | $2.20 | $0.45 | $2.00 | $0.60 | $2.20 | 26% |
| Qwen3 Coder 480B | $0.07 | $0.29 | $0.45 | $1.80 | $2.00 | $2.00 | $0.38 | $1.53 | 41% |
| Qwen3-235B | $0.11 | $0.45 | $0.22 | $0.88 | — | — | — | — | Up to 18% |
* Savings compared to lowest listed competitor price.
12.0K-20.0K tokens/sec throughput
100M tokens ÷ 20.0K tokens/sec × $0.0025/sec
= 5,000 GPU-seconds × $0.0025/sec
= $12.50/mo
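The calculator formula above (tokens ÷ throughput × GPU-second price) can be sketched as a small function that evaluates both ends of the throughput range. This is an illustration only: the function name is hypothetical, and it assumes the whole workload runs at a constant throughput with no idle or startup time billed.

```python
def monthly_cost_range_usd(
    tokens_per_month: float,
    min_tokens_per_second: float,
    max_tokens_per_second: float,
    gpu_second_price: float = 0.0025,  # rate quoted on this page
) -> tuple[float, float]:
    """Return (lowest, highest) monthly cost in dollars.

    The lowest cost corresponds to the highest throughput, because fewer
    GPU-seconds are needed to process the same number of tokens.
    """
    if min(min_tokens_per_second, max_tokens_per_second) <= 0:
        raise ValueError("throughput must be positive")
    lowest = tokens_per_month / max_tokens_per_second * gpu_second_price
    highest = tokens_per_month / min_tokens_per_second * gpu_second_price
    return lowest, highest


if __name__ == "__main__":
    # Inputs taken from the calculator example: 100M tokens, 12.0K-20.0K tokens/sec.
    low, high = monthly_cost_range_usd(100_000_000, 12_000, 20_000)
    print(f"Estimated monthly cost: ${low:.2f} to ${high:.2f}")
```

At 20.0K tokens/sec, 100M tokens take 5,000 GPU-seconds; at 12.0K tokens/sec the same workload takes about 8,333 GPU-seconds.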
A GPU-second is one second of GPU compute time. Pricing is simple and transparent: $0.0025 per GPU-second.
GPU-seconds are transparent and pass our optimizations directly to you. When we make models run faster, your costs drop automatically. Token pricing hides these savings.
Use our calculator above for model-specific estimates. Costs vary by model based on throughput—faster models cost less per token.
We built our own inference stack with s4 codegen triggering native Blackwell FP4 tactics. We achieve industry-leading throughput, and our GPU-second pricing passes those gains directly to you.
Every account starts with $5 in free credits. No credit card required. That's enough for thousands of API calls to test your integration.
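As a rough sense of scale, the free credits can be converted into GPU-seconds and tokens. The sketch below assumes the $0.0025 per GPU-second rate and the 12.0K-20.0K tokens/sec throughput range shown on this page; the helper name is hypothetical, and actual coverage depends on the model and request pattern.

```python
FREE_CREDIT_USD = 5.00         # free credits per account, as stated on this page
GPU_SECOND_PRICE_USD = 0.0025  # rate quoted on this page


def tokens_covered(
    credit_usd: float,
    tokens_per_second: float,
    gpu_second_price: float = GPU_SECOND_PRICE_USD,
) -> float:
    """Return the approximate number of tokens a credit balance can pay for."""
    if gpu_second_price <= 0 or tokens_per_second <= 0:
        raise ValueError("price and throughput must be positive")
    gpu_seconds = credit_usd / gpu_second_price
    return gpu_seconds * tokens_per_second


if __name__ == "__main__":
    gpu_seconds = FREE_CREDIT_USD / GPU_SECOND_PRICE_USD
    print(f"${FREE_CREDIT_USD:.2f} buys about {gpu_seconds:,.0f} GPU-seconds")
    for tps in (12_000, 20_000):  # throughput range from the calculator
        millions = tokens_covered(FREE_CREDIT_USD, tps) / 1e6
        print(f"At {tps:,} tokens/sec: about {millions:.0f}M tokens")
```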