Model Comparison

Context Length: 3K

A 7.3B-parameter model that outperforms Llama 2 13B on all benchmarks, optimized for speed and context length.

Provider: OpenRouter

Pricing

Input: $0.11 / M tokens
Output: $0.19 / M tokens
Images: – –
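
For a rough sense of what these rates mean per request, the sketch below estimates cost from prompt and completion token counts using the input and output prices listed above. The token counts in the example are hypothetical.

```python
# Rough per-request cost estimate from the listed pricing.
# Prices are in USD per million tokens; token counts are hypothetical.

INPUT_PRICE_PER_M = 0.11   # $ per 1M input (prompt) tokens
OUTPUT_PRICE_PER_M = 0.19  # $ per 1M output (completion) tokens

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    return (prompt_tokens * INPUT_PRICE_PER_M
            + completion_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion
# costs about $0.00022 + $0.000095 ≈ $0.0003 (and stays within the 3K combined limit).
print(f"${request_cost(2_000, 500):.6f}")
```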

Endpoint Features

Quantization: – –
Max Tokens (input + output): 3K
Max Output Tokens: – –
Stream Cancellation
Supports Tools: – –
No Prompt Training
Reasoning: – –
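
As a minimal sketch of how the limits above shape a request, the example below calls OpenRouter's OpenAI-compatible chat completions endpoint with a max_tokens value chosen so that prompt plus completion stays inside the 3K combined budget. The model slug is a placeholder (the card does not name one here), and the OPENROUTER_API_KEY environment variable is an assumed convention for supplying the API key.

```python
# Minimal request sketch against OpenRouter's OpenAI-compatible
# chat completions endpoint. The model slug is a placeholder, and
# max_tokens is sized to keep prompt + completion within the 3K
# combined token limit listed above.
import os
import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "<model-slug>",  # placeholder; not specified in this card
        "messages": [
            {"role": "user", "content": "Summarize the benefits of a 7B model."}
        ],
        "max_tokens": 1024,  # leaves roughly 2K of the 3K budget for the prompt
        "stream": False,     # set True for streamed responses
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```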