Configure LLM Providers
Configure your Large Language Model (LLM) API provider in Data Wizard to connect to leading providers such as OpenAI, Anthropic, Google AI, Mistral AI, and more.
Data Wizard is designed to be LLM provider-agnostic, allowing you to choose from a variety of Large Language Models to power your data extraction tasks. To use Data Wizard, you need to configure your preferred LLM API provider and provide the necessary API keys.
Under the hood, Data Wizard uses the LLM Magic PHP package to interact with almost any LLM provider, so you can switch between providers easily.
Supported LLM Providers
Data Wizard currently supports integration with the following leading LLM providers:
- OpenAI: Access models like GPT-4, GPT-3.5 Turbo, and more.
- Anthropic: Utilize Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku, and other Claude models.
- Google AI: Leverage Gemini models (Gemini Pro, Gemini Flash).
- Mistral AI: Access Mistral Large, Mistral Small, Mistral 7B, and other Mistral models.
- OpenRouter: A proxy service that provides access to various LLMs through a single API, including models from OpenAI, Anthropic, Google, Mistral, and open-source models.
- Ollama: For running and using open-source LLMs locally on your machine.
Configuring API Keys
You need to obtain API keys from your chosen LLM provider and configure them in Data Wizard.
1. Obtain API Keys:
- OpenAI: Create an account at OpenAI Platform and generate API keys in your account settings.
- Anthropic: Sign up for an Anthropic account and obtain your API key from the Anthropic Console.
- Google AI (Gemini): Set up a Google Cloud project and enable the Gemini API. Generate API keys in the Google Cloud Console.
- Mistral AI: Create an account at Mistral AI and find your API key in your account dashboard.
- OpenRouter: Sign up at OpenRouter and get your API key from your OpenRouter account.
- Ollama: Ollama serves open-source models locally, so no API key is required; you just need Ollama installed and running with the desired models downloaded (see the sketch after these steps).
2. Configure API Keys in Data Wizard:
- Access Settings: Log in to your Data Wizard backend as a superadmin and navigate to the “Settings” menu item.
- LLM Provider Settings: Find the section for “LLM Providers”. You will see settings for each supported provider.
- Enter API Keys: For your chosen provider, enter your API key in the designated field (e.g., “OpenAI API Key”).
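If you plan to use Ollama, a minimal local setup might look like the following. This is only a sketch: the model name is just an example, and on desktop installs the Ollama app usually starts the server for you, in which case `ollama serve` is unnecessary.

```bash
# Download a model to run locally (model name is just an example)
ollama pull llama3.1

# Start the Ollama server if it is not already running
# (by default it listens on http://localhost:11434)
ollama serve
```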
Configuration via Environment Variables (Docker):
When running Data Wizard in Docker, you can also configure API keys using environment variables. This is often a more secure and convenient way to manage secrets in containerized environments.
- Example Docker Command with OpenAI API Key:
  Replace `<YOUR_OPENAI_API_KEY>` in the command below with your actual OpenAI API key. Environment variables for other providers follow a similar pattern (e.g., `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `MISTRAL_API_KEY`, `OPENROUTER_API_KEY`).
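  A minimal sketch of such a command, assuming the variable is named `OPENAI_API_KEY` (following the `<PROVIDER>_API_KEY` pattern above) and that the application image is published as `datawizard/datawizard` serving HTTP on port 80 inside the container. Check the installation guide for the actual image name, ports, and volumes.

  ```bash
  # Image name, tag, and port mapping are placeholders – adjust to your deployment
  docker run -d \
    --name data-wizard \
    -p 8080:80 \
    -e OPENAI_API_KEY=<YOUR_OPENAI_API_KEY> \
    datawizard/datawizard:latest
  ```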
- Example Docker Compose File with OpenAI API Key:
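  A corresponding Compose sketch under the same assumptions (image name and port mapping are placeholders); using an `env_file` or a `.env` file keeps the key out of the compose file itself.

  ```yaml
  # docker-compose.yml – image name and ports are placeholders, adjust to your deployment
  services:
    data-wizard:
      image: datawizard/datawizard:latest
      ports:
        - "8080:80"
      environment:
        OPENAI_API_KEY: <YOUR_OPENAI_API_KEY>
        # Other providers follow the same pattern, e.g.:
        # ANTHROPIC_API_KEY: <YOUR_ANTHROPIC_API_KEY>
  ```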
Selecting an LLM in Extractors
Once you have configured your API keys, you can select specific LLMs within each Extractor.
- Edit Extractor: Open the Extractor you want to configure.
- Select Model: In the Extractor settings, find the “Model” dropdown menu.
- Choose a Model: Select your desired LLM from the list. The list is populated based on the providers you have configured and the models supported by Data Wizard.
Model String Format
You can also use model strings in the format `<provider>/<model>` to specify models. For example:
- `openai/gpt-4o`
- `anthropic/claude-3.5-sonnet`
- `google/gemini-2.0-flash-lite`
- `mistral/mistral-large`
Choosing the Right LLM
The choice of LLM impacts:
- Performance: Different models have varying strengths in different tasks (e.g., reasoning, code generation, text generation). Refer to benchmark data and provider documentation to understand model capabilities.
- Context Length: Models have different context window sizes, limiting the amount of text and images they can process per inference call.
- Cost: Pricing varies significantly between models and providers. Consider the cost per token for input and output tokens, especially for large-scale data extraction.
Refer to up-to-date benchmarks like Livebench for current performance metrics across different model families.
Consider these factors when selecting an LLM for your specific extraction tasks and budget. Data Wizard’s evaluation features can help you compare the performance of different models.
Next Steps
Learn how to extract some data
Step-by-step guide to extracting data from documents using Data Wizard.