# Built-in Strategies

> Learn about the built-in and custom extraction strategies available in Data Wizard.

Strategies determine how Data Wizard processes documents and interacts with the LLM. Data Wizard provides several built-in strategies, and you can also create custom strategies for specific needs.

## Built-in Strategies

* **Simple**: Sends as much of the document as fits within the token limit to the LLM in a single call. Suitable for small documents.
* **Sequential**: Splits the document into smaller parts (based on the chunk size), processes each part in order, and includes the results of the previous extraction in the prompt for the next part. This maintains contextual continuity across parts.
* **Parallel**: Splits the document into independent parts and processes each part in isolation. Suitable for extracting independent data points that are not interconnected across pages.
* **Auto-Merging**: Works like the sequential and parallel strategies, but additionally concatenates the items of the top-level properties and then runs a final LLM call to deduplicate the combined results. This helps the model forget fewer entities when the extraction requires multiple calls.
* **Double-Pass**: Processes the document twice. The first pass uses the parallel strategy, and the second pass reviews and refines the first-pass results with the sequential strategy, combining the benefits of both for increased accuracy and efficiency. This strategy also supports auto-merging.
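
The difference between the sequential and parallel strategies can be illustrated with a minimal Python sketch. This is not Data Wizard's implementation: `run_sequential`, `run_parallel`, and `fake_extract` are hypothetical names, and `fake_extract` is a local stand-in for an LLM call that simply collects capitalized words.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical signature: (chunk_text, previous_result_or_None) -> extracted data.
Extract = Callable[[str, Optional[dict]], dict]


def run_sequential(chunks: list[str], extract: Extract) -> dict:
    """Process chunks in order, passing each result into the next call."""
    result: Optional[dict] = None
    for chunk in chunks:
        result = extract(chunk, result)
    return result or {}


def run_parallel(chunks: list[str], extract: Extract) -> list[dict]:
    """Process every chunk in isolation; no call sees another chunk's result."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda chunk: extract(chunk, None), chunks))


def fake_extract(chunk: str, previous: Optional[dict]) -> dict:
    """Stand-in for an LLM call: collects capitalized words as 'names'."""
    names = list(previous["names"]) if previous else []
    for word in chunk.split():
        if word.istitle() and word not in names:
            names.append(word)
    return {"names": names}


chunks = ["Alice met Bob", "Bob called Carol"]
sequential_result = run_sequential(chunks, fake_extract)
parallel_results = run_parallel(chunks, fake_extract)

print(sequential_result)  # one result that accumulated context across chunks
print(parallel_results)   # one independent result per chunk
```

In this sketch the sequential run returns a single accumulated result, while the parallel run returns one result per chunk, and the same entity can appear in more than one of them. That overlap is the kind of duplication the auto-merging variants are meant to resolve.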

<AccordionGroup>
  <Accordion title="Simple">
    Sends as much of the document as fits within the token limit to the LLM in a single call. Suitable for small documents.

    <img src="https://mintcdn.com/datawizard/IGMRXf370I-t8vt1/images/graphs/strategy-simple.png?fit=max&auto=format&n=IGMRXf370I-t8vt1&q=85&s=81b65013450e1c15d7636c54627d60b6" alt="Simple Strategy" width="3024" height="3840" data-path="images/graphs/strategy-simple.png" />
  </Accordion>

  <Accordion title="Sequential">
    Splits the document into smaller parts (based on the chunk size), processes each part in order, and includes the results of the previous extraction in the prompt for the next part. This maintains contextual continuity across parts.

    <img src="https://mintcdn.com/datawizard/IGMRXf370I-t8vt1/images/graphs/strategy-sequential.png?fit=max&auto=format&n=IGMRXf370I-t8vt1&q=85&s=8f66c10583601c2fe63b42e4ec0d644b" alt="Sequential Strategy" width="1201" height="3840" data-path="images/graphs/strategy-sequential.png" />
  </Accordion>

  <Accordion title="Parallel">
    Splits the document into independent parts and processes each part in isolation. Suitable for extracting independent data points that are not interconnected across pages.

    <img src="https://mintcdn.com/datawizard/IGMRXf370I-t8vt1/images/graphs/strategy-parallel.png?fit=max&auto=format&n=IGMRXf370I-t8vt1&q=85&s=e89d17e463b8dd64b0f891038207288a" alt="Parallel Strategy" width="1242" height="3840" data-path="images/graphs/strategy-parallel.png" />
  </Accordion>

  <Accordion title="Sequential with Auto-Merging">
    Splits the document into smaller parts (based on the chunk size), processes each part in order, and includes the results of the previous extraction in the prompt for the next part. This maintains contextual continuity across parts. In addition, auto-merging concatenates the items of the top-level properties and runs a final LLM call to deduplicate the results.
  </Accordion>

  <Accordion title="Parallel with Auto-Merging">
    Splits the document into independent parts and processes each part in isolation. Suitable for extracting independent data points that are not interconnected across pages. In addition, auto-merging concatenates the items of the top-level properties and runs a final LLM call to deduplicate the results.
  </Accordion>

  <Accordion title="Double-Pass">
    Processes the document twice. The first pass uses the parallel strategy, and the second pass reviews and refines the first-pass results with the sequential strategy, combining the benefits of both for increased accuracy and efficiency. This strategy also supports auto-merging.

    <img src="https://mintcdn.com/datawizard/IGMRXf370I-t8vt1/images/graphs/strategy-double-pass.png?fit=max&auto=format&n=IGMRXf370I-t8vt1&q=85&s=0cd36fe6089136285744aef7e296528f" alt="Double-Pass Strategy" width="2270" height="3840" data-path="images/graphs/strategy-double-pass.png" />
  </Accordion>
</AccordionGroup>
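
The auto-merging step described above (concatenate the items of the top-level properties, then deduplicate) can be sketched as follows. This is an illustration under stated assumptions, not Data Wizard's code: `concatenate_top_level` and `deduplicate` are hypothetical helpers, scalar properties keep the first value seen, and the final LLM deduplication call is replaced by a local exact-match filter so the example runs offline.

```python
import json
from typing import Any


def concatenate_top_level(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Concatenate list-valued top-level properties across chunk results."""
    merged: dict[str, Any] = {}
    for result in results:
        for key, value in result.items():
            if isinstance(value, list):
                bucket = merged.setdefault(key, [])
                if isinstance(bucket, list):
                    bucket.extend(value)
            elif key not in merged:
                # Assumption: for scalar properties, keep the first value seen.
                merged[key] = value
    return merged


def deduplicate(merged: dict[str, Any]) -> dict[str, Any]:
    """Local stand-in for the final LLM call: drops exact duplicate items."""
    cleaned: dict[str, Any] = {}
    for key, value in merged.items():
        if not isinstance(value, list):
            cleaned[key] = value
            continue
        seen: set[str] = set()
        unique = []
        for item in value:
            # Serialize with sorted keys so equal dicts compare equal.
            fingerprint = json.dumps(item, sort_keys=True)
            if fingerprint not in seen:
                seen.add(fingerprint)
                unique.append(item)
        cleaned[key] = unique
    return cleaned


# Example chunk results with one overlapping item (illustrative data only).
chunk_results = [
    {"invoice_number": "INV-1", "items": [{"sku": "A"}, {"sku": "B"}]},
    {"invoice_number": "INV-1", "items": [{"sku": "B"}, {"sku": "C"}]},
]
merged = concatenate_top_level(chunk_results)
final = deduplicate(merged)
print(final)
```

An exact-match filter only catches identical items. An LLM-based deduplication pass can also reconcile near-duplicates, such as the same entity extracted with slightly different spelling.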

## Strategy Options

Each strategy can be configured with a set of options:

* **Chunk Size:** The maximum number of tokens to include in each LLM call.
* **Include Text:** Whether to include the raw text content of the document in the prompt.
* **Include Embedded Images:** Whether to include embedded images from the document in the prompt.
* **Mark Embedded Images:** Whether to mark embedded images in the document with identifiers.
* **Include Page Images:** Whether to include screenshots of the document pages in the prompt.
* **Mark Page Images:** Whether to mark page images with identifiers.
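
To make the options above concrete, here is a small Python sketch of how they might be grouped and how the chunk size could drive splitting. The `StrategyOptions` class, its field names, and its default values are hypothetical and do not reflect Data Wizard's API. Whitespace-separated words stand in for tokens, which real tokenizers count differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyOptions:
    """Hypothetical container mirroring the options listed above."""

    chunk_size: int = 2000  # assumed default, not taken from Data Wizard
    include_text: bool = True
    include_embedded_images: bool = False
    mark_embedded_images: bool = False
    include_page_images: bool = False
    mark_page_images: bool = False


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


options = StrategyOptions(chunk_size=4)
chunks = split_into_chunks(
    "one two three four five six seven eight nine", options.chunk_size
)
print(chunks)
```

A smaller chunk size produces more LLM calls with less content each, while a larger one produces fewer calls that carry more of the document per prompt.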

## Custom Strategies

<Card title="Learn how to build custom strategies for more control" icon="gear" iconSize={200} horizontal href="./custom-strategies">
  You can create custom strategies to tailor the extraction process to your specific needs. Custom strategies allow you to define how the document is processed, how the LLM is interacted with, and how the results are merged.
</Card>

<br />

<br />

<br />

**Next Steps**

<Card title="Learn how to extract some data" icon="list-ol" href="./extracting-data">
  Step-by-step guide to extracting data from documents using Data Wizard.
</Card>

<CardGroup>
  <Card title="Extractors" icon="laptop-code" href="./extractors">
    Learn how to define and configure data extraction tasks.
  </Card>

  <Card title="Strategies" icon="code-branch" href="./strategies">
    Understand different data processing strategies.
  </Card>

  <Card title="LLM Provider Configuration" icon="sliders" href="./configure-llm">
    Set up your Large Language Model API keys.
  </Card>

  <Card title="Integration" icon="code" href="./integrate">
    Embed Data Wizard into other applications using iFrames or APIs.
  </Card>
</CardGroup>
