Inference Engine

High-performance serving infrastructure for open source language models.

A unified API gateway for serving open source language models through interchangeable inference backends. The current implementation targets Ollama, with support planned for vLLM, TGI, llama.cpp, and TensorRT-LLM to cover different deployment scenarios.
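
To make the gateway idea concrete, here is a minimal sketch of the forwarding path: a single chat endpoint that checks an API key, proxies the request to a local Ollama server, and records basic usage data. The `/v1/chat` path, the `x-api-key` header, and the usage-record shape are illustrative assumptions, not the project's actual API; only the Ollama endpoint and response fields are real.

```typescript
import http from "node:http";

const OLLAMA_URL = "http://localhost:11434/api/chat"; // Ollama's chat endpoint

interface UsageRecord {
  apiKey: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
}

const usageLog: UsageRecord[] = [];

const server = http.createServer(async (req, res) => {
  if (req.method !== "POST" || req.url !== "/v1/chat") {
    res.writeHead(404).end();
    return;
  }

  // Authentication: require an API key header (hypothetical scheme).
  const apiKey = req.headers["x-api-key"];
  if (typeof apiKey !== "string" || apiKey.length === 0) {
    res.writeHead(401).end(JSON.stringify({ error: "missing API key" }));
    return;
  }

  // Read the client's request body (expected: { model, messages }).
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);
  const body = JSON.parse(Buffer.concat(chunks).toString());

  // Forward to Ollama with streaming disabled for simplicity.
  const start = Date.now();
  const upstream = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ model: body.model, messages: body.messages, stream: false }),
  });
  const result = await upstream.json();

  // Usage tracking: Ollama reports token counts in its non-streaming response.
  usageLog.push({
    apiKey,
    model: body.model,
    promptTokens: result.prompt_eval_count ?? 0,
    completionTokens: result.eval_count ?? 0,
    latencyMs: Date.now() - start,
  });

  res.writeHead(upstream.status, { "content-type": "application/json" });
  res.end(JSON.stringify(result));
});

server.listen(8080);
```

Swapping in another backend would mean replacing only the forwarding step, which is why the multi-backend support listed below can be layered on later without changing the client-facing API.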

Key Features:

  • Authentication and usage tracking
  • Request tracing and cost estimation
  • Multi-backend support (planned)
  • TypeScript and Python SDKs (see the client sketch after this list)
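
The snippet below shows how a client might call the gateway sketched above, combining authentication with the usage data returned for cost estimation. It reuses the same assumed `/v1/chat` path and `x-api-key` header; it is not the published SDK interface.

```typescript
// Send an authenticated chat request to the gateway sketch above.
const response = await fetch("http://localhost:8080/v1/chat", {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "x-api-key": "my-test-key", // authentication (assumed header name)
  },
  body: JSON.stringify({
    model: "llama3.1:8b",
    messages: [{ role: "user", content: "What does an API gateway do?" }],
  }),
});

const data = await response.json();
console.log(data.message?.content); // Ollama's chat reply, passed through by the gateway
console.log(data.eval_count);       // completion token count, usable for cost estimation
```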

Learn more about Inference Engine →