
Sovereign AI, Simplified: OVHcloud on Hugging Face Inference Providers
If you’re looking for the speed and simplicity of serverless AI combined with the governance of a trusted European cloud, OVHcloud on Hugging Face Inference Providers offers an outstanding solution. Within minutes, you can access popular open models using an OpenAI-compatible API, route your traffic to OVHcloud for data sovereignty, and only pay for what you use. This guide will walk you through the benefits of this integration, its significance, and how to quickly get started with straightforward, copy-paste examples.
What You Get at a Glance
- One API, Many Providers: Seamlessly switch between models and providers without the need to rewrite your application.
- OpenAI-Compatible: Direct your OpenAI client to a single base URL and append a provider selector.
- OVHcloud Routing: Run models on OVHcloud AI Endpoints to ensure robust regional control and GDPR-compliant processing.
- Simplified Billing: Enjoy monthly credits on Hugging Face and pay-as-you-go usage at the provider's rates with no markup.
- Modern Model Catalog: Access a range of high-quality LLMs and VLMs, including open-weight options for better portability.
OVHcloud AI Endpoints appears in the provider list for Hugging Face Inference Providers, so you can select it directly for supported models.
Quick Refresher: What Are Hugging Face Inference Providers?
Inference Providers offer a single, consistent interface to interact with models hosted by various partners. Rather than juggling multiple SDKs and billing systems, you make standardized API calls through the Hugging Face router. Here’s what you can do:
- Use the OpenAI-compatible Chat Completions API.
- Specify different providers for each model by adding a suffix.
- Continue using the same client code across Python, JavaScript, or raw HTTP.
This unified interface makes it easy to explore alternatives, optimize for speed or cost, and reduce vendor lock-in. It’s integrated into the Hugging Face website, SDKs, and documentation.
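For illustration, the provider-suffix pattern behind this interface can be sketched as a tiny helper. The helper name is made up for this example; the model ID is the one used throughout this guide:

```python
def routed_model(model_id: str, provider: str = "") -> str:
    """Build a router model string, optionally pinning a provider via suffix."""
    return f"{model_id}:{provider}" if provider else model_id

# Pin to OVHcloud, or omit the provider to let the router choose.
print(routed_model("openai/gpt-oss-20b", "ovhcloud"))  # openai/gpt-oss-20b:ovhcloud
print(routed_model("openai/gpt-oss-20b"))              # openai/gpt-oss-20b
```

Switching providers then comes down to changing a single string, with the rest of your client code untouched.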
Discover OVHcloud AI Endpoints on the Hub
OVHcloud AI Endpoints is OVHcloud’s managed inference service that hosts open models utilizing OVHcloud infrastructure. This service prioritizes data sovereignty and privacy, which are crucial for organizations bound by European regulations. Hugging Face integrates OVHcloud as a primary provider within the same API framework used for other providers.
OVHcloud positions AI Endpoints as a serverless, pay-as-you-go solution geared toward production workloads across Europe, Canada, and the APAC region. OVHcloud processes requests in trusted environments, with a catalog that includes LLMs, multimodal models, code models, speech, and more.
Why Choose OVHcloud as Your Provider?
- Data Sovereignty and GDPR Compliance: OVHcloud is a European cloud provider, with infrastructure and policies tailored to meet regional compliance requirements.
- Consistency in Performance and Scale: Serverless endpoints are optimized for interactive applications and production traffic, with plans for evolving performance tiers.
- Open Ecosystem: Since the endpoints support open-weight models, you maintain the flexibility to utilize similar models elsewhere when necessary.
- Focus on Growing European Infrastructure: Partnerships and investments signify a commitment to providing sovereign compute and cloud options.
Supported Tasks and Model Examples
Currently, OVHcloud supports Chat Completion for both LLMs and VLMs on Hugging Face. Example model families typically include Meta Llama, Mistral, Qwen, and others spanning text, code, and vision-language tasks, depending on what is available in OVHcloud’s catalog at any time. Always check the model page to confirm availability with the provider.
Pricing and Billing Overview
You have two options:
- Routed by Hugging Face: Use your Hugging Face token and be billed at the provider’s standard rates through Hugging Face. No provider account is necessary, monthly credits apply, and there is no added markup.
- Custom Provider Key: Use your own key for a provider for direct billing from that provider; Hugging Face credits do not apply in this scenario.
Free tiers and credits are available, with subsequent usage being pay-as-you-go. Check the live documentation for current details.
OpenAI-Compatible by Design
Hugging Face Inference Providers fully support the OpenAI-compatible Chat Completions API. You can specify the provider directly in the model path to ensure your requests are directed to OVHcloud while utilizing your existing OpenAI client and tools.
Quickstart: 5-Minute Setup
Before getting started:
- Create a Hugging Face token with the appropriate scope for Inference Providers.
- Select a model that lists OVHcloud as a provider on its model page.
- Decide whether you wish to route through Hugging Face billing or use a custom provider key.
Consult Hugging Face’s documentation for details on tokens and settings if you want to adjust provider preferences or add a custom key.
Python with the OpenAI Client
Here’s how to call a conversational model via Hugging Face’s router while explicitly targeting OVHcloud using a suffix in the model string.
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b:ovhcloud",
    messages=[{"role": "user", "content": "Summarize the benefits of sovereign AI in 3 bullets."}],
)
print(resp.choices[0].message)
```
This is the same client you’d use for other providers, with only the model string differing.
Python with huggingface_hub
If you prefer using the Hugging Face client, you can also explicitly select OVHcloud by appending the provider suffix.
```python
from huggingface_hub import InferenceClient

client = InferenceClient()  # Uses the HF token from your environment or configuration

chat = client.chat.completions.create(
    model="openai/gpt-oss-20b:ovhcloud",
    messages=[{"role": "user", "content": "Give me a one-paragraph overview of OVHcloud AI Endpoints."}],
)
print(chat.choices[0].message)
```
You can also utilize automatic routing or choose dynamic options like fastest or cheapest by appending a selector (e.g., :fastest) to models that support it.
JavaScript Example
```javascript
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const res = await client.chatCompletion({
  model: "openai/gpt-oss-20b:ovhcloud",
  messages: [
    { role: "user", content: "Draft a 2-sentence product pitch for a GDPR-first AI assistant." }
  ],
});
console.log(res.choices[0].message);
```
cURL Example
```bash
curl https://router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b:ovhcloud",
    "messages": [{"role": "user", "content": "Name three use cases for VLMs in retail."}]
  }'
```
All approaches target the same Hugging Face router and provider suffix, ensuring consistent behavior.
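If you prefer to avoid SDKs entirely, the same request can be built with nothing but the Python standard library. This is a sketch of the cURL call above; as written, it only performs the network call when an HF_TOKEN is present in the environment:

```python
import json
import os
import urllib.request

# Build the same chat completion request as the cURL example.
payload = {
    "model": "openai/gpt-oss-20b:ovhcloud",
    "messages": [{"role": "user", "content": "Name three use cases for VLMs in retail."}],
}
req = urllib.request.Request(
    "https://router.huggingface.co/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('HF_TOKEN', '')}",
        "Content-Type": "application/json",
    },
)

if os.environ.get("HF_TOKEN"):  # skip the call when no token is configured
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```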
Multimodal Example (VLM)
OVHcloud also accommodates visual chat models. Here’s a Python example that prompts a VLM to describe an image.
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct:ovhcloud",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
            ]
        }
    ],
)
print(resp.choices[0].message)
```
This demonstrates the same provider suffix pattern for a VLM.
When to Choose OVHcloud
Select OVHcloud routing when any of the following apply:
- You need to maintain data processing and storage within specific jurisdictions.
- Your organization prefers European cloud partners.
- You’re developing applications for regulated sectors that prioritize transparent hosting locations and GDPR compliance.
- You desire serverless convenience alongside the flexibility of open-weight model options.
For performance-sensitive or latency-critical workloads, consult the available product flavors and tiers. OVHcloud outlines a Base API that is generally available and an upcoming Fast API tier tailored for stricter Service Level Objectives (SLOs).
Sample Use Cases to Kickstart Your Projects
- Customer Support Copilots: Keep data processing within designated regions.
- Knowledge Assistants: Enhance internal documentation using retrieval-augmented generation.
- Developer Tools: Automate code suggestions, linting, or code reviews.
- Multimodal Retail Assistants: Analyze images and respond to visual inquiries.
- Voice-Enabled Interfaces: Utilize automatic speech recognition (ASR) and text-to-speech (TTS) models from the catalog.
These scenarios leverage standard chat APIs, ensuring a familiar development flow even when changing models or providers.
Tips for Smooth Integration
- Start in a Sandbox: Use the Inference Playground on the model page to prototype, then transition to code.
- Use Explicit Provider Suffixes: Lock in :ovhcloud in production to avoid surprises if your provider preference changes.
- Monitor Billing: Check usage in your Hugging Face settings to ensure the correct organization is billed for team projects.
- Stay Informed on Quotas and Credits: PRO and enterprise subscriptions include monthly credits for routing through Hugging Face.
- Consider Portability: Favor open-weight models to maintain migration paths if needed.
Detailed guidance and settings can be found in the Inference Providers documentation, which includes billing breakdowns and organization billing headers.
Security and Governance Notes
- Authentication: Employ fine-grained tokens for Inference Providers and scope them appropriately.
- Data Handling: Requests are routed through Hugging Face to the chosen provider; always review the provider-specific terms and data processing policies.
- Regionality: Verify where a chosen model is hosted when opting for OVHcloud routing and check the OVHcloud catalog for region-specific options.
- Change Management: As models and provider catalogs evolve, be sure to pin your model names and versions to ensure reproducibility.
Stay updated with the latest supported tasks, models, and options by consulting the Hugging Face and OVHcloud documentation.
Troubleshooting Quick Checks
- Authentication Errors: Confirm your HF token scopes and ensure you’re hitting the correct router base URL.
- Model Not Found with :ovhcloud: Make sure the model page lists OVHcloud as a supported provider.
- Credit Depletion: Switch to pay-as-you-go or utilize a custom provider key.
- Latency Issues: Test with a smaller model, use the :fastest option for dynamic selection, or evaluate the OVHcloud performance tiers.
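For transient failures such as rate limits or momentary provider hiccups, a small retry wrapper around any of the client calls in this guide can help. This is a generic sketch, not a feature of the Hugging Face SDKs:

```python
import time

def with_retries(call, attempts=3, base_delay=1.0):
    """Run a callable, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * 2 ** attempt)

# Usage, e.g.: with_retries(lambda: client.chat.completions.create(...))
```

In production you would typically narrow the `except` clause to the rate-limit and server-error exceptions your client library raises, rather than retrying on everything.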
Conclusion
Hugging Face Inference Providers simplify the process of adopting a multi-provider strategy without necessitating changes to your application stack. Integrating OVHcloud provides the opportunity to run open models on a European cloud with a strong focus on data sovereignty and transparent pricing. If your team requires a production-ready pathway that effectively balances governance, performance, and development speed, routing your workloads to OVHcloud via the Hugging Face API is an efficient and practical choice.
Explore the examples in this guide, pin your models to :ovhcloud, and ship something valuable this week!
FAQs
1) Can I Continue Using My OpenAI Client Libraries?
Yes, you can direct your client to https://router.huggingface.co/v1 and append the provider suffix to the model string, for instance, openai/gpt-oss-20b:ovhcloud.
2) How Does Billing Work When I Route Through Hugging Face?
You will be charged the provider’s standard rates, without markup from Hugging Face. Monthly credits apply to eligible accounts for routed requests.
3) Does OVHcloud Support Multimodal Chat Models?
Yes, OVHcloud supports VLM chat completion in addition to LLM chat completion when such models are available. Please check the provider page and model listings.
4) Why Would I Choose OVHcloud Over Another Provider?
To ensure data processing remains within European jurisdictions, align with GDPR regulations, and utilize a European cloud while maintaining the flexibility of open-weight models and Hugging Face’s multi-provider API.
5) How Can I Ensure My Organization Is Billed Instead of My User Account?
Pass the appropriate billing header or set up organization billing in your settings, as per the instructions in the pricing and billing guide.
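As a sketch, organization billing comes down to one extra request header. The header name below follows the Inference Providers billing docs, but verify it against the current documentation; "acme-corp" is a placeholder organization name:

```python
import os

# "X-HF-Bill-To" attributes routed usage to an organization instead of your
# user account; "acme-corp" is a placeholder.
billing_headers = {
    "Authorization": f"Bearer {os.environ.get('HF_TOKEN', '')}",
    "X-HF-Bill-To": "acme-corp",
}

# Attach the headers to any HTTP client, for example:
# OpenAI(base_url="https://router.huggingface.co/v1",
#        api_key=os.environ["HF_TOKEN"], default_headers=billing_headers)
print(sorted(billing_headers))
```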