LLM · MIT

GLM 4.7

by Z.ai (Zhipu)

Dec 22, 2025 · 200K context · $0.14/M input · $0.54/M output

GLM 4.7 scored 73.8% on SWE-bench Verified, the highest among open-source models, and 84.9% on LiveCodeBench, ahead of Claude. Preserved Thinking carries reasoning state across conversation turns.

View on HuggingFace
Fleek Pricing
$0.0025/GPU-second
Context: 200K tokens

Estimated Token Cost
Input: $0.14/M
Output: $0.54/M
Based on 29,500 tokens/sec

vs Competitors: Save 9%

Overview

Parameters: 355B (MoE)
Architecture: Mixture of Experts
Context: 200K
Provider: Z.ai (Zhipu)
Best For: Code generation · Multi-step reasoning · Agentic coding · Tool use

OpenAI Compatible

Drop-in replacement for the OpenAI API. Just change the base URL.
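
A minimal sketch of what that looks like with the official OpenAI Python client. The base URL and model id below are illustrative placeholders, not confirmed Fleek values.

```python
# Minimal sketch of calling an OpenAI-compatible endpoint.
# NOTE: base_url and model id are illustrative placeholders,
# not confirmed Fleek values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fleek.example/v1",  # hypothetical endpoint
    api_key="YOUR_FLEEK_API_KEY",
)

response = client.chat.completions.create(
    model="glm-4.7",  # hypothetical model id
    messages=[
        {"role": "user", "content": "Write a function that reverses a linked list."}
    ],
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the same protocol, existing OpenAI SDK code, tooling, and framework integrations keep working unchanged apart from the two client arguments above.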

Pay Per Second

Only pay for actual GPU compute time. No idle costs.

Enterprise Ready

99.9% uptime SLA, SOC 2 compliant, dedicated support.

Auto Scaling

Scales from zero to thousands of requests automatically.

Compare Pricing

          Fleek   Fireworks  Together  Baseten
Input     $0.14   $0.60      $0.45     $0.60
Output    $0.54   $2.20      $2.00     $2.20
Savings   -       70%        70%       70%

Prices are per million tokens. Fleek pricing is based on $0.0025/GPU-second.
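
As a quick sketch using only the per-token prices from the table: the exact savings percentage depends on how your traffic splits between input and output tokens, so treat the flat 70% figures as approximate. The traffic volumes below are made up for illustration.

```python
# Savings vs. a competitor depends on your input/output token mix.
# Prices are $ per million tokens, taken from the table above.
FLEEK = {"input": 0.14, "output": 0.54}
FIREWORKS = {"input": 0.60, "output": 2.20}

def monthly_cost(prices, input_mtok, output_mtok):
    """Dollar cost for a month of traffic, measured in millions of tokens."""
    return prices["input"] * input_mtok + prices["output"] * output_mtok

# Example mix: 100M input tokens and 20M output tokens per month.
fleek = monthly_cost(FLEEK, 100, 20)          # $24.80
fireworks = monthly_cost(FIREWORKS, 100, 20)  # $104.00
print(f"Fleek: ${fleek:.2f}  Fireworks: ${fireworks:.2f}")
print(f"Savings: {1 - fleek / fireworks:.0%}")  # ~76% for this mix
```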

Calculate Your Savings

See how much you'd save running GLM 4.7 on Fleek

GLM 4.7
Your Fleek Cost: $54-91/mo (21.6K-36.4K GPU-sec × $0.0025)
Fireworks AI: $140/mo
Your Savings: 48%
Annual Savings: $810
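
The arithmetic behind the calculator is simple enough to check by hand; a worked sketch using only the figures quoted above:

```python
# Worked version of the calculator above, using only the quoted figures.
GPU_SECOND_PRICE = 0.0025        # $ per GPU-second on Fleek
gpu_seconds = (21_600, 36_400)   # estimated monthly usage range

low, high = (s * GPU_SECOND_PRICE for s in gpu_seconds)
print(f"Fleek: ${low:.0f}-{high:.0f}/mo")  # $54-91/mo

fireworks = 140.0                # quoted Fireworks AI cost, $/mo
midpoint = (low + high) / 2      # $72.50/mo
print(f"Savings: {1 - midpoint / fireworks:.0%}")             # ~48%
print(f"Annual savings: ${(fireworks - midpoint) * 12:.0f}")  # ~$810
```

The quoted 48% and $810 figures fall out of comparing the midpoint of the Fleek range against the Fireworks price.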

Technical Specifications

Model Name: GLM 4.7
Total Parameters: 355B (MoE)
Active Parameters: 32B
Architecture: Mixture of Experts
Context Length: 200K tokens
Inference Speed: 29,500 tokens/sec
Provider: Z.ai (Zhipu)
Release Date: Dec 22, 2025
License: MIT
HuggingFace: https://huggingface.co/THUDM/GLM-4.7

Ready to run GLM 4.7?

Join the waitlist for early access. Start free with $5 in credits.

View Pricing