A unified API gateway for serving open source language models through interchangeable inference backends. Ollama is the only backend implemented so far; support for vLLM, TGI, llama.cpp, and TensorRT-LLM is planned to cover different deployment scenarios.
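The planned multi-backend support suggests a backend-agnostic adapter interface behind the gateway. A minimal Python sketch of that idea follows; all class, method, and field names here are illustrative assumptions, not the project's actual API, and the echo backend is a stand-in so the example runs without a model server:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Completion:
    """Result shape a gateway might normalize every backend's response into."""
    text: str
    prompt_tokens: int
    completion_tokens: int


class InferenceBackend(Protocol):
    """Common surface each adapter (Ollama, vLLM, TGI, ...) would implement."""
    def generate(self, model: str, prompt: str) -> Completion: ...


class EchoBackend:
    """Stand-in backend: echoes the prompt so the sketch runs offline."""
    def generate(self, model: str, prompt: str) -> Completion:
        text = f"[{model}] {prompt}"
        return Completion(
            text=text,
            prompt_tokens=len(prompt.split()),
            completion_tokens=len(text.split()),
        )


def route(backends: dict[str, InferenceBackend], backend_name: str,
          model: str, prompt: str) -> Completion:
    # The gateway would pick the adapter from config or a request header.
    return backends[backend_name].generate(model, prompt)
```

With this shape, adding a new backend means writing one adapter that maps its native API onto `generate`, leaving routing, auth, and tracing untouched.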
Key Features:
- Authentication and usage tracking
- Request tracing and cost estimation
- Multi-backend support (planned)
- TypeScript and Python SDKs
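Cost estimation for a request typically multiplies token counts by per-token rates. A minimal sketch, where the function name and rates are invented for illustration and not taken from the project:

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      prompt_rate_per_1k: float,
                      completion_rate_per_1k: float) -> float:
    """Estimate request cost from token counts and per-1K-token USD rates."""
    return (prompt_tokens / 1000) * prompt_rate_per_1k \
         + (completion_tokens / 1000) * completion_rate_per_1k
```

For example, 500 prompt tokens and 200 completion tokens at hypothetical rates of $0.10 and $0.30 per 1K tokens come to roughly $0.11. The gateway could attach such an estimate to each traced request for usage tracking.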