AI Gateway lets developers expose multiple models through specific URL paths, simplifying the integration of AI models into applications. It provides proxy LLMs for different use cases: A/B testing between LLMs, querying multiple LLMs simultaneously, load balancing across LLMs, and selecting the best response from multiple LLMs. It also secures these LLMs with access rules and supports rate limiting at the user, team, or route level, enabling efficient management of AI model usage within applications.
Model Providers
Model providers give access to foundational and private models over a REST API. They abstract away the details of model location, security tokens, and availability.
- Proxy LLM which splits (load-balances) requests across multiple LLMs. Supported models: all of the above.
- multi-llm: Proxy LLM which sends the same query to multiple LLMs, for critical use cases. Supported models: all of the above.
- ab-llm: Proxy LLM which splits queries between multiple models in a given ratio. Supported models: all of the above.
- best-llm: Proxy LLM which sends the same query to multiple models and returns the best answer as scored by a different LLM. Supported models: all of the above.
Examples
Model providers are defined in the ai/routes/ folder of a product, in YAML files under the key model-providers. There can be multiple YAML files in this folder; the AI Gateway service coalesces all of them and resolves all providers.
Open AI, LLaMa 3 Providers
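A provider definition might look like the sketch below. All field names (name, type, models, api-key, endpoint) are assumptions about the schema, not the gateway's documented keys, and the endpoint URL is hypothetical:

```yaml
model-providers:
  - name: open-ai
    type: openai
    models: [gpt-4o]
    api-key: ${OPENAI_API_KEY}   # resolved by the gateway; never exposed to callers
  - name: local-llama3
    type: local
    model: llama3-8b
    endpoint: http://llama3.internal:8080   # hypothetical internal endpoint
```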
Load-balancing LLM Provider
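A load-balancing provider could be composed from base providers such as the two above. The type identifier and field names here are placeholders, since the actual identifiers are not shown in this document:

```yaml
model-providers:
  - name: balanced-llm
    type: lb-llm          # placeholder type name for the load-balancing proxy
    providers: [open-ai, local-llama3]
```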
Multi LLM Provider
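A multi-llm provider queries every listed provider with the same prompt. The field names in this sketch are assumed:

```yaml
model-providers:
  - name: critical-llm
    type: multi-llm       # sends the same query to all listed providers
    providers: [open-ai, local-llama3]
```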
A/B testing with LLM Providers
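An ab-llm provider splits traffic between models in a given ratio. The field names and ratios below are illustrative assumptions:

```yaml
model-providers:
  - name: experiment-llm
    type: ab-llm
    providers:
      - provider: open-ai
        ratio: 0.9        # 90% of queries
      - provider: local-llama3
        ratio: 0.1        # 10% of queries
```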
Best LLM Provider
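A best-llm provider queries multiple models and returns the answer scored highest by a separate LLM. The scorer field and other keys in this sketch are assumptions:

```yaml
model-providers:
  - name: best-answer-llm
    type: best-llm
    providers: [open-ai, local-llama3]
    scorer: open-ai       # the separate LLM that scores the candidate answers
```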
Routes
Routes are defined by developers in the ai/routes/ folder of a product, in YAML files under the key routes. There can be multiple YAML files in this folder; the AI Gateway service coalesces all of them and serves the combined set. The routes YAML holds three key pieces of information for each route:
Model Provider and configuration for that provider
Access Rules on who can access this route
Rate Limits defined at user, group and route level
Examples
Access Open AI gpt-4o
In the example route below, the developer has created a new route /ai/tasks which gives access to Open AI models without exposing or sharing the token. A user is allowed to make calls only if they have the llm-open-ai-allowed group in their JWT token claims. The developer has also applied additional usage limits to keep costs under control.
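A sketch of such a route definition follows. Only the path, the group name, and the limit values come from this document; every YAML key is an assumed field name, not the gateway's documented schema:

```yaml
routes:
  - path: /ai/tasks
    model-provider: open-ai
    config:
      model: gpt-4o
    access:
      groups: [llm-open-ai-allowed]   # matched against JWT token claims
    rate-limits:
      user:
        requests-per-hour: 60
      group:
        department: Sales
        requests-per-hour: 600
      route:
        requests-per-hour: 1000
      # token limits would be declared analogously (e.g. tokens-per-hour)
```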
On the above endpoint, a user can make 60 calls per hour, all users in a given department ('Sales') can make 600 requests per hour collectively, and the entire user base is limited to 1000 requests per hour. Token limits are applied similarly.
If any of the rate-limit thresholds is exceeded, the gateway returns a 429 HTTP status code.
Access local llama3-8b
Here the developer is exposing a local llama3-8b over a route /ai/know
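Such a route might be declared as in this sketch, again with assumed field names (only the path and model name come from the text):

```yaml
routes:
  - path: /ai/know
    model-provider: local-llama3
    config:
      model: llama3-8b
```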
Accessing the models
With the model-providers and routes configuration in place, the AI Gateway provides unified API access to all models. The response format matches the Open AI chat completion response format.
Examples
Here the developer calls the /tasks route with the POST verb and a JSON payload containing the prompt name and the parameters required to expand the prompt. Refer to Prompts to learn more about defining prompts.
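A minimal client-side sketch of such a call. The payload field names ("prompt", "params"), the prompt name, and the parameter values are all assumptions about the request schema; consult the Prompts documentation for the actual shape:

```python
import json

def build_request_body(prompt_name, params):
    """Build the JSON body sent with POST to the route (e.g. /ai/tasks)."""
    # Hypothetical schema: a prompt name plus the parameters used to expand it.
    return json.dumps({"prompt": prompt_name, "params": params})

body = build_request_body("summarize-tasks", {"project": "apollo"})
print(body)

# The HTTP call itself would look roughly like:
#   POST /ai/tasks
#   Authorization: Bearer <user JWT>
#   Content-Type: application/json
#   <body>
```

The gateway would return a response in the Open AI chat completion format, as described above.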