Still learning LangChain here myself, but I will share the answers I've come up with in my own search.
Notes:
- OP questions edited lightly for clarity.
- Each of these questions is probably better as its own separate post, but I did appreciate having them all together as it pushed me to connect the dots between them. So here's hoping this is useful to others as well.
Question 1
In load_qa_with_sources_chain(), PROMPT is defined as:
PROMPT = PromptTemplate(template=template, input_variables=["summaries", "question"])
which expects two inputs, 'summaries' and 'question'.
However, what is passed in is only question=query and NOT 'summaries'.
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff", prompt=PROMPT)
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)
How does input_documents map to summaries?
Answer 1
First, the stuff chain loader takes the prompt we pass in and defines an LLMChain with that prompt. Then you can see that llm_chain is used to initialize a StuffDocumentsChain.
def _load_stuff_chain(
llm: BaseLanguageModel,
prompt: BasePromptTemplate = stuff_prompt.PROMPT,
document_prompt: BasePromptTemplate = stuff_prompt.EXAMPLE_PROMPT,
document_variable_name: str = "summaries",
verbose: Optional[bool] = None,
**kwargs: Any,
) -> StuffDocumentsChain:
llm_chain = LLMChain(llm=llm, prompt=prompt, verbose=verbose)
return StuffDocumentsChain(
llm_chain=llm_chain,
document_variable_name=document_variable_name,
document_prompt=document_prompt,
verbose=verbose,
**kwargs,
)
But also notice that there are two other arguments to _load_stuff_chain(): document_prompt and document_variable_name.
document_prompt: If we do not pass in a custom document_prompt, it falls back to EXAMPLE_PROMPT, which is quite specific. (It is long, so I won't repost it here.)
document_variable_name: Here you can see where 'summaries' first appears, as a default value. The docstring describes it as
>`the variable name in the llm_chain to put the documents in`
In that same stuff.py script there is a _get_inputs() method that collects all of the inputs that will go into the LLM for evaluation. One of those inputs is
inputs[self.document_variable_name] = self.document_separator.join(doc_strings)
So now we know this is actually inputs['summaries'] by default. (Side note: doc_strings is each doc in docs formatted using document_prompt, via format_document().)
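To make that concrete, here is a toy version of those mechanics in plain Python (my own sketch, not the library code; the real default document_prompt also formats source metadata, and I've hardcoded the default "\n\n" separator):

from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate

# Simplified stand-in for the default document_prompt.
document_prompt = PromptTemplate(
    template="Content: {page_content}",
    input_variables=["page_content"],
)

docs = [Document(page_content="First passage."), Document(page_content="Second passage.")]

# Roughly what _get_inputs() does: format each doc with document_prompt,
# then join the results into one string under inputs['summaries'].
doc_strings = [document_prompt.format(page_content=d.page_content) for d in docs]
inputs = {"summaries": "\n\n".join(doc_strings)}  # default document_separator is "\n\n"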
OK, we are almost there. The final step in the stuff system is to send all of the docs, formatted into document_prompts, to the llm_chain for evaluation. That is done in combine_docs(), ending in this call to llm_chain.predict():
return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
Remember, we initialized llm_chain with the original PROMPT we passed in, and now it is clear that it is expecting both 'question' AND 'summaries' as input variables.
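One way to convince yourself of this mapping: document_variable_name looks like it is forwarded through load_qa_with_sources_chain() to _load_stuff_chain(), so (if I'm reading the loader correctly; untested sketch, and the prompt wording is mine) you should be able to rename the slot and adjust the prompt to match:

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.qa_with_sources import load_qa_with_sources_chain

# Hypothetical prompt that expects 'context' instead of 'summaries'.
template = """Use the following extracts to answer the question.
{context}
Question: {question}
Answer:"""
PROMPT = PromptTemplate(template=template, input_variables=["context", "question"])

chain = load_qa_with_sources_chain(
    OpenAI(temperature=0),
    chain_type="stuff",
    prompt=PROMPT,
    document_variable_name="context",  # the joined docs now land here
)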
Question 2
In the summarize_chain example:
prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY IN ITALIAN:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(
    OpenAI(temperature=0),
    chain_type="map_reduce",
    return_intermediate_steps=True,
    map_prompt=PROMPT,
    combine_prompt=PROMPT,
)
chain({"input_documents": docs}, return_only_outputs=True)
How does docs map to text?
Answer 2
This gets easier from here, as a lot of the summarize chain code follows similar patterns to the qa chain.
We can see in _load_map_reduce_chain() there's a default value, 'text', which gets assigned to document_variable_name in the MapReduceDocumentsChain that is initialized and returned.
Also note L52 and L54 of that file, where two different LLMChain objects are initialized: one for map (takes map_prompt) and one for reduce (takes combine_prompt).
# L52
map_chain = LLMChain(llm=llm, prompt=map_prompt, verbose=verbose)
# L54
reduce_chain = LLMChain(llm=_reduce_llm, prompt=combine_prompt, verbose=verbose)
And then reduce_chain is built into combine_document_chain, which is where we first see the connection between 'text' (the default value for combine_document_variable_name) and PROMPT (now baked into reduce_chain).
combine_document_chain = StuffDocumentsChain(
llm_chain=reduce_chain,
document_variable_name=combine_document_variable_name,
verbose=verbose,
)
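Putting Answer 2 together, here is a toy sketch of the whole flow in plain Python (my own stand-in, not the library code):

def fake_llm(prompt: str) -> str:
    """Stand-in for the OpenAI call."""
    return f"<summary of {len(prompt)} chars>"

prompt_template = "Write a concise summary of the following:\n{text}\nCONCISE SUMMARY:"
docs = ["page_content of doc 1", "page_content of doc 2"]

# Map step: document_variable_name='text', so each doc's page_content
# fills the {text} slot, one LLM call per doc.
map_results = [fake_llm(prompt_template.format(text=d)) for d in docs]

# Combine step: combine_document_variable_name='text', so the joined map
# outputs fill {text} of the combine prompt in one final call.
final_summary = fake_llm(prompt_template.format(text="\n\n".join(map_results)))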
Question 3
How does it work with map_prompt and combine_prompt being the same?
Answer 3
The fact that both prompts are the same here looks like it may be for the convenience of the example, as the suggested prompt is generic:
>`"Write a concise summary of the following: {text} #..."`
The user can enter different values for map_prompt and combine_prompt; the map step applies a prompt to each document, and the combine step applies one prompt to bring the map results together.
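For example (the prompt wording here is mine, not from the docs), you could summarize each document in English at the map step and only ask for Italian at the combine step:

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain

map_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY:"""

combine_template = """Combine these partial summaries into one final summary:
{text}
FINAL SUMMARY IN ITALIAN:"""

chain = load_summarize_chain(
    OpenAI(temperature=0),
    chain_type="map_reduce",
    map_prompt=PromptTemplate(template=map_template, input_variables=["text"]),
    combine_prompt=PromptTemplate(template=combine_template, input_variables=["text"]),
)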
You can see where these steps occur in the code:
The LLM chain for map is applied in the combine_docs() step of the MapReduceDocumentsChain:
# combine_docs() docstring:
"""Combine documents in a map reduce manner.

Combine by mapping first chain over all documents, then reducing the results.
This reducing can be done recursively if needed (if there are many documents).
"""

# self.document_variable_name = 'text'
# d.page_content is the text content of each doc in docs
# L144
results = self.llm_chain.apply(
    # FYI - this call is parallelized, so it is fast.
    [{self.document_variable_name: d.page_content, **kwargs} for d in docs],
    callbacks=callbacks,
)
And then the reduce steps are called in the _process_results() method, specifically in the _collapse_chain() and combine_documents_chain() sections.
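Here is a toy sketch of that recursive reduce (assumed mechanics in plain Python; the real code decides when to collapse based on the prompt's token length, not character counts):

def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM call."""
    return prompt[:100]  # pretend this is a shorter summary

def reduce_results(map_results, max_chars=4000):
    texts = list(map_results)
    # Collapse step: while the joined map outputs are too long for a
    # single combine call, re-summarize them in pairs and repeat.
    while len("\n\n".join(texts)) > max_chars and len(texts) > 1:
        texts = [fake_llm("Summarize:\n" + "\n\n".join(texts[i:i + 2]))
                 for i in range(0, len(texts), 2)]
    # Final combine: one call with everything stuffed into {text}.
    return fake_llm("Write a concise summary of the following:\n" + "\n\n".join(texts))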
Question 4
Where can I see the parameters of the load_summarize_chain() function?
Answer 4
All of the summarize variants (stuff, map_reduce, etc.) are defined in summarize/__init__.py. In this particular example, the map_reduce chain parameters are on L40-51 of that file.
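If you'd rather not open the source, you can also check the top-level signature from a REPL (though note the chain-specific kwargs live in the private _load_* helpers, so they won't show up here):

import inspect
from langchain.chains.summarize import load_summarize_chain

print(inspect.signature(load_summarize_chain))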