# Extracting Data

> Getting started with extracting structured data from unstructured documents using Data Wizard.

## Prepare your extraction task

Before you can extract data, you need to tell Data Wizard what data to extract and how to extract it.

<Steps>
  <Step title="Create an extractor for your extraction task">
    Describe the shape of the data you want to extract, and an AI will generate an initial draft for you.

    <img src="https://mintcdn.com/datawizard/0y9q_8kiZAcsvFxg/images/screenshots/setup/quick-create-extractor.png?fit=max&auto=format&n=0y9q_8kiZAcsvFxg&q=85&s=8a79c4fb79bdc2e44a8bc98aaccdea36" alt="Create extractor" width="2987" height="2188" data-path="images/screenshots/setup/quick-create-extractor.png" />

    <br />

    <br />
  </Step>

  <Step title="Refine your JSON Schema and output instructions">
    Edit the generated schema to your liking and add other instructions for the AI to follow. Read more in the [Extractors](./extractors) section.

    <img src="https://mintcdn.com/datawizard/0y9q_8kiZAcsvFxg/images/screenshots/setup/edit-extractor.png?fit=max&auto=format&n=0y9q_8kiZAcsvFxg&q=85&s=4116564f9e2bd2715d8cb07078b305cc" alt="Define JSON Schema" width="2987" height="2188" data-path="images/screenshots/setup/edit-extractor.png" />

    <Card title="Extractors" icon="laptop-code" href="./extractors">
      Extractors are the core configuration objects in Data Wizard
    </Card>

    <br />

    <br />
  </Step>

  <Step title="Select the LLM to use">
    You can select from a large number of LLMs thanks to the [LLM Magic](https://github.com/Capevace/llm-magic) PHP package.
    You will need to add your API keys in the LLM settings before you can use them in an extractor. Find out more in the [LLM Provider Configuration](./configure-llm) section.

    <img src="https://mintcdn.com/datawizard/0y9q_8kiZAcsvFxg/images/screenshots/setup/select-model.png?fit=max&auto=format&n=0y9q_8kiZAcsvFxg&q=85&s=a7b578033e21f135918c539e7982b891" alt="Select the LLM" width="672" height="640" data-path="images/screenshots/setup/select-model.png" />

    <Card title="LLM Provider Configuration" icon="sliders" href="./configure-llm">
      Configure your Large Language Model (LLM) API provider in Data Wizard to connect to leading LLMs like OpenAI, Anthropic, Google AI, Mistral AI, and more.
    </Card>

    <br />

    <br />
  </Step>

  <Step title="Select the extraction strategy to use">
    There are multiple [built-in strategies](./strategies) to choose from, or you can create your own custom strategy.

    <img src="https://mintcdn.com/datawizard/0y9q_8kiZAcsvFxg/images/screenshots/setup/select-strategy.png?fit=max&auto=format&n=0y9q_8kiZAcsvFxg&q=85&s=b24944e4f1eb0cc2c5bfb5ca4fd7b412" alt="Select the extraction strategy" width="678" height="550" data-path="images/screenshots/setup/select-strategy.png" />

    <Card title="Extraction Strategies" icon="code-branch" href="./strategies">
      Learn about the built-in and custom extraction strategies available in Data Wizard.
    </Card>

    <br />

    <br />
  </Step>

  <Step title="Run the extractor">
    After you have configured your extractor, you can run it to extract data from your documents.

    You can either use the built-in UI to do this, or you can integrate the feature into an existing application using the iFrame and HTTP API.

    <CardGroup cols={2}>
      <Card title="Run inside Data Wizard" icon="hat-wizard" href="#run-inside-data-wizard">
        Via Data Wizard's backend UI
      </Card>

      <Card title="Run inside your own application" icon="server" href="#run-inside-your-own-application">
        Via the embedded iFrame UI
      </Card>
    </CardGroup>
  </Step>
</Steps>
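
To make the schema refinement step more concrete, here is a minimal sketch of what a JSON Schema for an extraction task might look like, written as a Python dictionary. The invoice fields and the `missing_required` helper are hypothetical and only for illustration; they are not part of Data Wizard, and your own schema will depend on the documents you process.

```python
import json

# Hypothetical schema for invoice documents; all field names are illustrative.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total"],
}


def missing_required(schema, data):
    """Return the top-level required keys that are absent from data."""
    return [key for key in schema.get("required", []) if key not in data]


if __name__ == "__main__":
    # Print the schema as JSON so it can be pasted into a schema editor.
    print(json.dumps(INVOICE_SCHEMA, indent=2))
```

The helper only checks top-level required keys. It can serve as a quick sanity check on extracted results, but it is not a full JSON Schema validator.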

## Run inside Data Wizard

<Steps>
  <Step title="Start an Extraction through a Bucket or an Extractor">
    Start an extraction from either a bucket or an extractor, using the launch button available in both views.

    <p align="center">
      <Frame caption="Launch button in bucket and extractor views">
        <img alt="Launch from Extractor" src="https://mintcdn.com/datawizard/IGMRXf370I-t8vt1/images/screenshots/extractors/start-menu.png?fit=max&auto=format&n=IGMRXf370I-t8vt1&q=85&s=7b3c5c0eade44d6fa3afc187d26194ca" width="537" height="304" data-path="images/screenshots/extractors/start-menu.png" />
      </Frame>
    </p>

    <Tabs>
      <Tab title="Launch from Extractor">
        <img src="https://mintcdn.com/datawizard/IGMRXf370I-t8vt1/images/screenshots/extractors/start.png?fit=max&auto=format&n=IGMRXf370I-t8vt1&q=85&s=cbf95b64c79da5337947103c5e274f16" alt="Launch from Extractor" width="581" height="930" data-path="images/screenshots/extractors/start.png" />
      </Tab>

      <Tab title="Launch from Bucket">
        <img src="https://mintcdn.com/datawizard/IGMRXf370I-t8vt1/images/screenshots/buckets/start.png?fit=max&auto=format&n=IGMRXf370I-t8vt1&q=85&s=ef7b88972e3085d263070095025124a8" alt="Launch from Bucket" width="581" height="930" data-path="images/screenshots/buckets/start.png" />
      </Tab>
    </Tabs>
  </Step>

  <Step title="View the extracted data in the GUI or as JSON">
    All of the data that Data Wizard generates is viewable on the run page. This includes in-progress data as it's being generated, as well as the final extracted data.
    You can view the data either as raw JSON or in the GUI derived from the JSON schema.

    <Tabs>
      <Tab title="View Data in GUI">
        The GUI view presents the extracted data in an interface derived from the JSON schema.

        <img src="https://mintcdn.com/datawizard/0y9q_8kiZAcsvFxg/images/screenshots/run/run-gui.png?fit=max&auto=format&n=0y9q_8kiZAcsvFxg&q=85&s=4c81723c558f84c5118bf334cb6b1851" alt="View Data in GUI" width="2988" height="2188" data-path="images/screenshots/run/run-gui.png" />
      </Tab>

      <Tab title="View Data as JSON">
        The JSON view shows the extracted data as raw JSON.

        <img src="https://mintcdn.com/datawizard/0y9q_8kiZAcsvFxg/images/screenshots/run/run-json.png?fit=max&auto=format&n=0y9q_8kiZAcsvFxg&q=85&s=3ddbdd05ce1dc07d9ec387483759b424" alt="View Data as JSON" width="2988" height="2188" data-path="images/screenshots/run/run-json.png" />
      </Tab>
    </Tabs>
  </Step>

  <Step title="Inspect each extraction step">
    You can inspect each step of the extraction process to see how the AI is interpreting your instructions and what data it is returning.

    <Tabs>
      <Tab title="LLM Prompt and Responses">
        Each step shows the prompt sent to the LLM and the responses it returned.

        <img src="https://mintcdn.com/datawizard/0y9q_8kiZAcsvFxg/images/screenshots/run/run-chat-1.png?fit=max&auto=format&n=0y9q_8kiZAcsvFxg&q=85&s=a964292d19064306e23b2ccfecc080ab" alt="LLM prompt and responses" width="2988" height="2188" data-path="images/screenshots/run/run-chat-1.png" />
      </Tab>

      <Tab title="Supports Images and Tool Calls">
        The step view also displays images and tool calls that are part of the conversation.

        <img src="https://mintcdn.com/datawizard/0y9q_8kiZAcsvFxg/images/screenshots/run/run-chat-2.png?fit=max&auto=format&n=0y9q_8kiZAcsvFxg&q=85&s=88648d8fe1c16caba297824b2570c4da" alt="Images and tool calls" width="2988" height="2188" data-path="images/screenshots/run/run-chat-2.png" />
      </Tab>
    </Tabs>
  </Step>

  <Step title="Download the Data as JSON, XML or CSV">
    You can download the extracted data as JSON, XML or CSV.
  </Step>

  <Step title="Restart the extraction with different parameters">
    You can restart an extraction with different parameters.
  </Step>

  <Step title="Modify the data using AI">
    You can modify the extracted data using AI.
  </Step>

  <Step title="Analyze extraction costs">
    You can analyze the costs of an extraction.
  </Step>
</Steps>
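
If you download the data as JSON and want to post-process it yourself, the following is a minimal sketch of flattening nested records into CSV rows. The record shape and the `flatten` and `records_to_csv` helpers are hypothetical; Data Wizard's own CSV export may lay out columns differently.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def records_to_csv(records):
    """Convert a list of (possibly nested) dicts into CSV text."""
    rows = [flatten(record) for record in records]
    # Collect every column name, preserving first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # columns missing from a row are left empty
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical extracted records, as they might appear in a JSON download.
    sample = [
        {"invoice_number": "A-1", "customer": {"name": "Acme"}},
        {"invoice_number": "A-2", "customer": {"name": "Globex"}},
    ]
    print(records_to_csv(sample))
```

Lists are written as-is in this sketch. If your schema contains arrays of objects, you would likely want one CSV row per array item instead.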

## Run inside your own application

You can also run extractors from within an existing application, either through the embedded iFrame UI or through the HTTP API. See the [Integration](./integrate) section for details.

<br />

<br />

<br />

**Next Steps**

<CardGroup>
  <Card title="Extractors" icon="laptop-code" href="./extractors">
    Learn how to define and configure data extraction tasks.
  </Card>

  <Card title="Strategies" icon="code-branch" href="./strategies">
    Understand different data processing strategies.
  </Card>

  <Card title="LLM Provider Configuration" icon="sliders" href="./configure-llm">
    Set up your Large Language Model API keys.
  </Card>

  <Card title="Integration" icon="code" href="./integrate">
    Embed Data Wizard into other applications using iFrames or APIs.
  </Card>
</CardGroup>
