DSP

Demonstrate-Search-Predict (DSP) is a framework that lets users build pipelines across retrieval models (RMs) and LLMs. It is a powerful technique for answering multi-hop questions, which require intermediate questions to be answered before the final answer can be reached. Existing RAG pipelines work in one shot: they retrieve from the RM once and synthesize an answer with the LLM. In contrast, the DSP framework lets you build multi-step pipelines in which the RM and the LLM pass information back and forth. For more information on DSP, please refer to the original repo.

DSP Framework

In NEXTgpt, we have implemented a multi-hop Q&A system using the DSP library.

The DSP framework has three major stages.

Demonstrate Stage
In this stage, you provide a training set of sample questions and answers drawn from the same datasource that the Q&A tasks will run against.
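As a rough illustration of what the Demonstrate stage consumes, the plain-Python sketch below samples `train_k` examples from a training set and formats them as few-shot demonstrations ahead of a new question. The `Example` class and both helper functions are hypothetical stand-ins written for this sketch, not DSP's API; the real stage can also bootstrap intermediate annotations for its demonstrations, which is omitted here.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    """A hypothetical question/answer pair from the training set."""
    question: str
    answer: str


def sample_demos(train: list[Example], train_k: int, seed: int = 0) -> list[Example]:
    """Pick up to train_k distinct demonstrations; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    return rng.sample(train, k=min(train_k, len(train)))


def build_demo_prompt(demos: list[Example], question: str) -> str:
    """Render the demonstrations as few-shot text, then leave the answer slot open."""
    blocks = [f"Question: {d.question}\nAnswer: {d.answer}" for d in demos]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Entries taken from the sample training set later on this page.
train = [
    Example("What is the Profit before tax of Singtel in 2021?", "S$m 754"),
    Example("Did Profit before tax increase from 2022 to 2023?", "No"),
    Example("How many shares of Singtel does Lim Swee Say have in 2023?", "1490"),
]

demos = sample_demos(train, train_k=2)
print(build_demo_prompt(demos, "Which directors are seeking re-election in 2022?"))
```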
Search Stage
In this stage, the framework searches the datasource for content relevant to the input question. This stage involves LLM calls that generate the search queries used to retrieve relevant content from the datasource.
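The retrieval half of the Search stage can be sketched as follows, assuming the LLM has already produced several candidate queries (compare the `num_queries` and `topk` attributes under How to use) and that each query returns scored passages. The toy corpus, the word-overlap retriever, and the score-summing fusion are illustrative choices made for this sketch; they are not how DSP or NEXTgpt actually rank passages.

```python
from collections import defaultdict

# Toy corpus standing in for datasource chunks (hypothetical content).
CORPUS = [
    "Directors retiring by rotation and seeking re-election at the 2022 AGM.",
    "Profit before tax for the financial year.",
    "Directors' interests in shares of Singtel and related corporations.",
]


def retrieve(query: str, k: int) -> list[tuple[str, float]]:
    """Score passages by the fraction of query words they contain (toy retriever)."""
    words = set(query.lower().split())
    scored = []
    for passage in CORPUS:
        overlap = len(words & set(passage.lower().split()))
        if overlap:
            scored.append((passage, overlap / len(words)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def retrieve_ensemble(queries: list[str], topk: int) -> list[str]:
    """Run every candidate query, sum each passage's scores, keep the topk best."""
    totals: dict[str, float] = defaultdict(float)
    for query in queries:
        for passage, score in retrieve(query, k=topk):
            totals[passage] += score
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:topk]


queries = ["directors seeking re-election", "re-election 2022 agm"]
print(retrieve_ensemble(queries, topk=2))
```

Summing scores rewards passages that several differently worded queries agree on, which is the intuition behind generating more than one query per hop.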

Below is an example of the prompt definitions for the search stage of DSP:

import dsp

# Generate a search query using the LLM
SearchQuery = dsp.Type(
    prefix="Search Query:",
    desc="${a simple question for seeking the missing information}",
)

# Define the search rationale; you can customize this based on your use case
SearchRationale = dsp.Type(
    prefix="Rationale: Let's think step by step. To answer this question, "
    "we first need to find out",
    desc="${the missing information}",
)

# To answer complex questions, break the question down into sub-questions.
# Question and Context are the dsp.Type definitions shown in the Predict Stage.
rewrite_template = dsp.Template(
    instructions="Write a search query that will help answer a complex question.",
    question=Question(),
    rationale=SearchRationale(),
    query=SearchQuery(),
)

# Hop template to use in each search hop of a complex question
CondenseRationale = dsp.Type(
    prefix="Rationale: Let's think step by step. Based on the given references, "
    "we have learned the following.",
    desc="${reference names and information from the references that "
    "provide useful clues}",
)

hop_template = dsp.Template(
    instructions=rewrite_template.instructions,
    context=Context(),
    question=Question(),
    rationale=CondenseRationale(),
    query=SearchQuery(),
)
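To see what a template such as `rewrite_template` asks the LLM to do, here is a simplified, hypothetical renderer: it prints the instructions, then each filled field as `prefix value`, and stops at the first unfilled field so the LLM continues from that prefix. DSP's real template rendering also includes field descriptions and demonstrations, so treat this only as an illustration of the field/prefix idea; `Field` and `render` are not DSP names.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical stand-in for a dsp.Type: a prompt prefix plus a description."""
    prefix: str
    desc: str  # kept for reference; this simplified renderer does not print it


def render(instructions: str, fields: list[Field], values: list[str]) -> str:
    """Fill fields in order; the first field without a value is left open for the LLM."""
    lines = [instructions, ""]
    for i, field in enumerate(fields):
        if i < len(values):
            lines.append(f"{field.prefix} {values[i]}")
        else:
            lines.append(field.prefix)
            break
    return "\n".join(lines)


fields = [
    Field("Question:", "${the question to be answered}"),
    Field(
        "Rationale: Let's think step by step. To answer this question, "
        "we first need to find out",
        "${the missing information}",
    ),
    Field("Search Query:", "${a simple question for seeking the missing information}"),
]
prompt = render(
    "Write a search query that will help answer a complex question.",
    fields,
    ["Which directors are seeking re-election in 2022?"],
)
print(prompt)
```

Here the LLM would complete the rationale and then, following the template's format, emit the `Search Query:` line that the Search stage sends to the datasource.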

Predict Stage

This is the final stage, which produces the answer from the information gathered in the previous stages.

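import dsp

# `formatters` (used by Context below) must also be imported; its module path is not shown here.
Question = dsp.Type(prefix="Question:", desc="${the question to be answered}")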

Answer = dsp.Type(
    prefix="Answer:",
    desc="${the comprehensive answer with the reference names and reasoning.}",
    format=dsp.format_answers,
)

Context = dsp.Type(
    prefix="References:\n",
    desc="${references that may contain relevant content}",
    format=formatters.passages2text,
)

Rationale = dsp.Type(
    prefix="Rationale: Let's think step by step.",
    # Adjacent string literals join without leaking indentation into the prompt
    desc="${a step-by-step deduction that identifies the correct response, "
    "which will be provided below}",
)

qa_template = dsp.Template(
    instructions="Answer questions by citing the reference names. "
    "Explain your reasoning.",
    question=Question(),
    answer=Answer(),
)

qa_template_with_CoT = dsp.Template(
    instructions=qa_template.instructions,
    context=Context(),
    question=Question(),
    rationale=Rationale(),
    answer=Answer(),
)
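The `Context` type formats retrieved passages with `formatters.passages2text`, whose exact output is not shown in this document. A plausible sketch of such a formatter numbers each passage so the LLM can cite it by reference, which is what `qa_template` asks for; the function name, the `(name, text)` input shape, and the `[n] name: text` layout are all assumptions made for illustration.

```python
def passages_to_text(passages: list[tuple[str, str]]) -> str:
    """Number (name, text) passages from 1 so the LLM can cite them as references."""
    return "\n".join(
        f"[{i}] {name}: {text}" for i, (name, text) in enumerate(passages, start=1)
    )


# Hypothetical passages; real ones would come from the Search stage.
refs = passages_to_text([
    ("doc_a", "First passage text."),
    ("doc_b", "Second passage text."),
])
print("References:\n" + refs)
```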

DSP Data Flow

flowchart LR
    A[Input Prompt] -->B[DSP Framework]
    B --> |Demonstrate|C[LLM] --> |LLM Response| B
    B --> |Search|D[Datasource] --> |Relevant Documents|B
    B --> |Predict|E[LLM] --> |LLM Response|B
    B --> |Post Processing|F[Answer]
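The data flow above can be sketched as a single function. Everything here is hypothetical: `generate_query`, `retrieve`, and `answer` are callables you would back with the LLM and the datasource, demonstrations are left out to keep it short, and the loop simply runs `max_hops` hops. The majority vote over several predicted answers (compare `num_preds`) is one common post-processing choice, not necessarily what NEXTgpt does.

```python
from collections import Counter
from typing import Callable


def run_pipeline(
    question: str,
    generate_query: Callable[[str, list[str]], str],
    retrieve: Callable[[str, int], list[str]],
    answer: Callable[[str, list[str]], list[str]],
    max_hops: int = 3,
    topk: int = 3,
) -> str:
    """Search for max_hops hops, then predict and majority-vote the answer."""
    context: list[str] = []
    for _ in range(max_hops):
        # Search: the LLM writes a query from the question plus what is known so far.
        query = generate_query(question, context)
        for passage in retrieve(query, topk):
            if passage not in context:  # skip passages gathered in earlier hops
                context.append(passage)
    # Predict: the LLM returns one or more candidate answers.
    candidates = answer(question, context)
    if not candidates:
        return ""
    # Post-processing: keep the most frequent candidate (ties go to the first seen).
    return Counter(candidates).most_common(1)[0][0]


# Stub callables standing in for the LLM and the datasource.
def fake_query(question: str, context: list[str]) -> str:
    # Hop 1 uses the question itself; later hops follow up on the newest passage.
    return question if not context else context[-1]


def fake_retrieve(query: str, k: int) -> list[str]:
    return [f"passage about {query}"][:k]


def fake_answer(question: str, context: list[str]) -> list[str]:
    # Two of the three "predictions" agree, so the vote picks them.
    return [f"{len(context)} passages", f"{len(context)} passages", "other"]


print(run_pipeline("Who chairs the board?", fake_query, fake_retrieve, fake_answer))
```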

How to use

Usecase Creation

You can create a use case from the developer portal under the Usecase Creation tab, or by calling the POST API. Follow these steps:

1. Create a datasource first by uploading the relevant files (refer to Datasource for further details). We provide a SingtelDemo datasource as an example.
2. Create the DSP use case by providing the attributes listed in the API section.

Here are sample attribute values:

  datasource: SingtelDemo
  train_k: 2
  topk: 3
  max_hops: 3
  num_queries: 3
  num_preds: 1
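The sample attributes can be captured in a small typed config before they are sent to the POST API. This is a sketch under assumptions: the class name is hypothetical, the comments describe what each attribute name suggests rather than documented behaviour, and because the endpoint URL and exact request schema are not given here, the example only builds the JSON body.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DSPUsecaseConfig:
    """Hypothetical container for the sample DSP use-case attributes."""
    datasource: str
    train_k: int = 2      # presumably: training examples used as demonstrations
    topk: int = 3         # presumably: passages retrieved per search
    max_hops: int = 3     # presumably: upper bound on search hops
    num_queries: int = 3  # presumably: candidate search queries per hop
    num_preds: int = 1    # presumably: number of answer predictions

    def __post_init__(self) -> None:
        # Every numeric attribute should be a positive integer.
        for name in ("train_k", "topk", "max_hops", "num_queries", "num_preds"):
            value = getattr(self, name)
            if not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")


config = DSPUsecaseConfig(datasource="SingtelDemo")
payload = json.dumps(asdict(config))
print(payload)
```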

Here is a sample training set for the SingtelFYQA use case, built on the SingtelDemo datasource:

    - question: What is the Profit before tax of Singtel in 2021?
      answer: S$m 754
    - question: How many shares of Singtel does Lim Swee Say have in 2023?
      answer: 1490
    - question: Did Profit before tax increase from 2022 to 2023?
      answer: "No"
    - question: How many shares of Singtel and Starhub does Wee Siew Kim have in 2022?
      answer: Singtel, 532278, Starhub, 72600
    - question: How many shares of Mapletree Industrial Trust Management Ltd. and Mapletree Commercial Trust Management Ltd. does Wee Siew Kim have in 2023?
      answer: Industrial, 169101, Commercial, 45312

The training questions above pertain to the SingtelDemo datasource we created, which contains information about the Singtel company.

Note: When preparing a training set like the one above, verify manually that each question and its answer are correct.
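Correctness still has to be checked by hand, but a quick structural check can catch formatting slips before the training set is uploaded: empty fields, duplicate questions, and unquoted yes/no answers, which a YAML 1.1 loader turns into booleans (this is why `"No"` is quoted above). The sketch assumes the YAML has already been parsed into a list of dicts; `check_training_set` is a hypothetical helper.

```python
def check_training_set(items: list[dict]) -> list[str]:
    """Return human-readable problems found in a parsed training set."""
    problems = []
    seen = set()
    for i, item in enumerate(items, start=1):
        question = item.get("question")
        answer = item.get("answer")
        if not isinstance(question, str) or not question.strip():
            problems.append(f"item {i}: missing question")
        elif question.strip() in seen:
            problems.append(f"item {i}: duplicate question")
        else:
            seen.add(question.strip())
        # YAML 1.1 reads unquoted No/Yes as booleans, so flag them explicitly.
        if isinstance(answer, bool):
            problems.append(f'item {i}: boolean answer; quote it, e.g. "No"')
        elif answer is None or not str(answer).strip():
            problems.append(f"item {i}: missing answer")
    return problems


items = [
    {"question": "Did Profit before tax increase from 2022 to 2023?", "answer": "No"},
    {"question": "Did Profit before tax increase from 2022 to 2023?", "answer": False},
    {"question": "", "answer": 1490},
]
for problem in check_training_set(items):
    print(problem)
```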

Usecase Test/Debug

You can test the use case through the frontend, or debug it in the developer portal under the Debug tab.

Type your query in the message field; the framework will consult the assigned datasource and run the DSP pipeline to answer it.

How to use the SingtelFYQA usecase

You can ask questions about Singtel, whose information has been uploaded as a datasource.

Sample input:

List the names of directors that are seeking re-election in 2022, then give the reference link

Sample response:

The names of directors seeking re-election in 2022 are Yong Hsin Yue, Rajeev Suri, and Wee Siew Kim. The reference link is 1 for Yong Hsin Yue and 2 for Rajeev Suri and Wee Siew Kim.

Complicated questions usually require the LLM to consult multiple pages or sections of the datasource and derive the answer from them. The DSP framework is useful for this kind of multi-hop operation.

In the example above, two references are given. The numbers 1 and 2 are clickable links that take you to the documents in the datasource from which the LLM drew the answer.

Note: Because this framework makes multiple calls to the LLM, expect a longer wait for the response.

The wording of the response may vary between runs, but the content should remain the same.