
Llama 70B

by Meta

Sep 15, 2025 · 128K context · $0.05/M input · $0.21/M output

Llama 70B is Meta's refined 70B-parameter model, with improved instruction following and reduced toxicity. It is the workhorse for production deployments that require reliability.

Fleek Pricing

Rate: $0.0025/GPU-second
Context: 128K tokens

Estimated Token Cost
Input: $0.05/M
Output: $0.21/M
Based on 10,000 tokens/sec
vs Competitors: Save 26%

Overview

Parameters: 70B
Architecture: Dense Transformer
Context: 128K
Provider: Meta
Best For: Production workloads, Enterprise deployment, Content moderation, General AI

OpenAI Compatible

Drop-in replacement for OpenAI API. Just change the base URL.
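As a sketch of what "drop-in" means here: an OpenAI-style chat-completions request body stays the same, and only the base URL (and API key) you send it to changes. The Fleek endpoint and model identifier below are illustrative assumptions, not documented values.

```python
import json

# The OpenAI URL is the real public endpoint; the Fleek one is a
# placeholder stand-in, not a documented address.
OPENAI_URL = "https://api.openai.com/v1/chat/completions"
FLEEK_URL = "https://api.fleek.example/v1/chat/completions"  # assumed

def build_request(base_url: str, model: str) -> dict:
    # The JSON body follows the OpenAI chat-completions schema either way;
    # switching providers only changes the URL you POST to.
    return {
        "url": base_url,
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": "Hello, Llama!"}],
        }),
    }

openai_req = build_request(OPENAI_URL, "gpt-4o")
fleek_req = build_request(FLEEK_URL, "llama-70b")  # assumed model id

# Identical schema, different destination:
assert openai_req["body"].replace("gpt-4o", "llama-70b") == fleek_req["body"]
```

In practice an OpenAI SDK client would be pointed at the new endpoint via its base-URL setting, with the rest of the application code unchanged.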

Pay Per Second

Only pay for actual GPU compute time. No idle costs.

Enterprise Ready

99.9% uptime SLA, SOC 2 compliant, dedicated support.

Auto Scaling

Scales from zero to thousands of requests automatically.

Compare Pricing

Provider     Input ($/M)   Output ($/M)   Savings
Fleek        $0.05         $0.21
Fireworks    $0.90         $0.90          70%
Together     $0.88         $0.88          70%
Baseten

Prices are per million tokens. Fleek pricing based on $0.0025/GPU-second.

Calculate Your Savings

See how much you'd save running Llama 70B on Fleek.

Your Fleek Cost: $21-31/mo (8.3K-12.5K GPU-sec × $0.0025)
Fireworks AI: $90/mo
Your Savings: 70%
Annual Savings: $768
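The arithmetic behind those figures can be sketched directly from the listed $0.0025/GPU-second rate and the $90/mo Fireworks figure; the workload range (8.3K-12.5K GPU-seconds) is taken from the estimate above.

```python
# Savings math from the figures on this page.
GPU_SECOND_RATE = 0.0025   # USD per GPU-second (Fleek rate)
COMPETITOR_MONTHLY = 90.0  # USD/month (Fireworks AI figure)

def monthly_cost(gpu_seconds: float) -> float:
    """Fleek cost: pay only for GPU compute actually used."""
    return gpu_seconds * GPU_SECOND_RATE

low = monthly_cost(8_300)     # ≈ $20.75/mo
high = monthly_cost(12_500)   # ≈ $31.25/mo
monthly_savings = COMPETITOR_MONTHLY - (low + high) / 2  # ≈ $64/mo at the midpoint
annual_savings = monthly_savings * 12  # $768/yr, matching the figure above

print(f"Fleek: ${low:.2f}-${high:.2f}/mo, about ${annual_savings:.0f}/yr saved")
```

Note the headline "70%" compares list token prices; the dollar figures compare the midpoint of the GPU-second estimate against the competitor's monthly price.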

Technical Specifications

Model Name: Llama 70B
Total Parameters: 70B
Active Parameters: N/A
Architecture: Dense Transformer
Context Length: 128K tokens
Inference Speed: 10,000 tokens/sec
Provider: Meta
Release Date: Sep 15, 2025
License: Llama Community
HuggingFace: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct

Ready to run Llama 70B?

Join the waitlist for early access. Start free with $5 in credits.
