Performance Benchmarks
Results from 100 requests with a context length of 800 tokens
Scenario      Request pattern            Prefill (tok/s)  Decode (tok/s)
Burst Load    All requests simultaneous  49,498           1,002
Low Traffic   10 requests per second     4,518            91
High Traffic  50 requests per second     17,579           355
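Request rate strongly affects token throughput because of server-side batching: prefill rises from 4,518 tok/s at 10 requests per second to 17,579 tok/s at 50 requests per second, and to 49,498 tok/s when all requests arrive at once. Decode throughput follows the same pattern (91, 355, and 1,002 tok/s).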
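To make the effect of request rate easier to compare, the short Python sketch below expresses each scenario's throughput relative to the Low Traffic run. The numbers are copied from the results above; the dictionary layout and the `relative_throughput` helper are illustrative names chosen for this example, not part of any benchmark tool.

```python
# Throughput figures copied from the benchmark results (tokens per second).
results = {
    "Burst Load": {"prefill": 49_498, "decode": 1_002},
    "Low Traffic": {"prefill": 4_518, "decode": 91},
    "High Traffic": {"prefill": 17_579, "decode": 355},
}


def relative_throughput(results, baseline="Low Traffic"):
    """Return each scenario's throughput divided by the baseline scenario's.

    `baseline` is a hypothetical parameter for this sketch; it names the
    scenario that every other scenario is compared against.
    """
    base = results[baseline]
    return {
        name: {phase: value / base[phase] for phase, value in phases.items()}
        for name, phases in results.items()
    }


ratios = relative_throughput(results)
for name, phases in ratios.items():
    print(f"{name}: prefill x{phases['prefill']:.1f}, decode x{phases['decode']:.1f}")
# Burst Load: prefill x11.0, decode x11.0
# Low Traffic: prefill x1.0, decode x1.0
# High Traffic: prefill x3.9, decode x3.9
```

Read this way, raising the request rate from 10 to 50 per second (5x) yields roughly 3.9x the throughput, and sending all 100 requests at once yields roughly 11x, for both prefill and decode.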