A unified API gateway for serving open source language models through interchangeable inference backends. Ollama is the only backend implemented so far; support for vLLM, TGI, llama.cpp, and TensorRT-LLM is planned to cover different deployment scenarios.
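The planned multi-backend support suggests a backend-agnostic adapter interface behind the gateway. A minimal Python sketch of that idea follows; all class, method, and field names here are illustrative assumptions, not the project's actual API, and the echo backend is a stand-in so the example runs without a model server:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Completion:
    """Result shape a gateway might normalize every backend's response into."""
    text: str
    prompt_tokens: int
    completion_tokens: int


class InferenceBackend(Protocol):
    """Common surface each adapter (Ollama, vLLM, TGI, ...) would implement."""
    def generate(self, model: str, prompt: str) -> Completion: ...


class EchoBackend:
    """Stand-in backend: echoes the prompt so the sketch runs offline."""
    def generate(self, model: str, prompt: str) -> Completion:
        text = f"[{model}] {prompt}"
        return Completion(
            text=text,
            prompt_tokens=len(prompt.split()),
            completion_tokens=len(text.split()),
        )


def route(backends: dict[str, InferenceBackend], backend_name: str,
          model: str, prompt: str) -> Completion:
    # The gateway would pick the adapter from config or a request header.
    return backends[backend_name].generate(model, prompt)
```

With this shape, adding a new backend means writing one adapter that maps its native API onto `generate`, leaving routing, auth, and tracing untouched.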
Key Features:
- Authentication and usage tracking
- Request tracing and cost estimation
- Multi-backend support (planned)
- TypeScript and Python SDKs
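Cost estimation for a request typically multiplies token counts by per-token rates. A minimal sketch, where the function name and rates are invented for illustration and not taken from the project:

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      prompt_rate_per_1k: float,
                      completion_rate_per_1k: float) -> float:
    """Estimate request cost from token counts and per-1K-token USD rates."""
    return (prompt_tokens / 1000) * prompt_rate_per_1k \
         + (completion_tokens / 1000) * completion_rate_per_1k
```

For example, 500 prompt tokens and 200 completion tokens at hypothetical rates of $0.10 and $0.30 per 1K tokens come to roughly $0.11. The gateway could attach such an estimate to each traced request for usage tracking.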