Extracting Data
Getting started with extracting structured data from unstructured documents using Data Wizard.
Prepare your extraction task
Before you can extract any data, you need to tell the wizard what data you want to extract and how to extract it.
Create an extractor for your extraction task
Describe the shape of the data you want to extract, and an AI will generate an initial draft of the schema for you.
Refine your JSON Schema and output instructions
Edit the generated schema to your liking and add other instructions for the AI to follow. Read more in the Extractors section.
Extractors
Extractors are the core configuration objects in Data Wizard.
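To make the schema step concrete, here is a minimal sketch of what a JSON Schema for an extraction task might look like, written as a Python dictionary so it can be printed as JSON. The invoice fields, the `missing_required` helper, and the sample result are all hypothetical and not taken from Data Wizard. The sketch only illustrates the general idea of describing the shape of the output and marking which fields must be present.

```python
import json

# Hypothetical schema for an invoice extraction task. The field names are
# illustrative only; adapt them to the data you actually want to extract.
invoice_schema = {
    "title": "Invoice",
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "The invoice identifier as printed on the document",
        },
        "issue_date": {"type": "string", "format": "date"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                },
                "required": ["description", "quantity", "unit_price"],
            },
        },
    },
    "required": ["invoice_number", "total"],
}


def missing_required(schema, data):
    """Return the top-level required keys that are absent from data."""
    return [key for key in schema.get("required", []) if key not in data]


# Hypothetical extraction result that lacks the required "total" field.
sample_result = {"invoice_number": "INV-001", "line_items": []}

if __name__ == "__main__":
    # Print the schema as JSON text, ready to paste into a schema editor.
    print(json.dumps(invoice_schema, indent=2))
    print("Missing required fields:", missing_required(invoice_schema, sample_result))
```

The helper checks only top-level `required` keys. It is not a full JSON Schema validator, so use a dedicated validation library if you need complete checks.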
Select the LLM to use
You can select from a large number of LLMs thanks to the LLM Magic PHP package. You will need to add your API keys in the LLM settings before you can use them in an extractor. Find out more in the LLM Provider Configuration section.
LLM Provider Configuration
Configure your Large Language Model (LLM) API provider in Data Wizard to connect to leading LLMs like OpenAI, Anthropic, Google AI, Mistral AI, and more.
Select the extraction strategy to use
There are multiple built-in strategies to choose from, or you can create your own custom strategy.
Extraction Strategies
Learn about the built-in and custom extraction strategies available in Data Wizard.
Run the extractor
After you have configured your extractor, you can run it to extract data from your documents.
You can either use the built-in UI to do this, or you can integrate the feature into an existing application using the iFrame and HTTP API.
Run inside Data Wizard
Start an extraction through a bucket or an extractor
Launch button in bucket and extractor views
View the extracted data in the GUI or as JSON
All of the data that Data Wizard generates is viewable on the run page. This includes in-progress data as it is being generated, as well as the final extracted data. You can view the data either as raw JSON or in the GUI derived from the JSON schema.
You can use the built-in UI to create and configure your extractor.
Inspect each extraction step
You can inspect each step of the extraction process to see how the AI is interpreting your instructions and what data it is returning.
Download the data as JSON, XML or CSV
Restart the extraction with different parameters
Modify the data using AI
Analyze extraction costs
Run inside your own application
Next Steps
Learn how to extract some data
Step by step guide to extract data from documents using Data Wizard.