
LangChain is a bridge between developers and large language models (LLMs).

Question Answering

import textwrap
from pathlib import Path

import bs4
from langchain import hub
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.chains import LLMChain
from langchain.document_loaders import PyPDFLoader, WebBaseLoader, YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_community.llms import CTransformers, LlamaCpp
from langchain_community.vectorstores import FAISS, Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnablePick
from langchain_openai import ChatOpenAI, OpenAI, OpenAIEmbeddings

ROOT = Path.cwd().parent.parent


def print_wrapped(text: str, width: int = 80):
    print(textwrap.fill(text, width))

Basic Usage

def generate_pet_name(animal_type, pet_colour):
    llm = OpenAI(temperature=0.7)
    # The template placeholders must match the keys passed to invoke().
    prompt_template_name = PromptTemplate(
        input_variables=["animal_type", "pet_colour"],
        template="I have a pet {animal_type} and I want a cool name for it, it is {pet_colour} in colour. Suggest 5 cool names for my pet",
    )
    name_chain = LLMChain(llm=llm, prompt=prompt_template_name, output_key="animal_name")
    response = name_chain.invoke({"animal_type": animal_type, "pet_colour": pet_colour})
    return response


pet_name_response = generate_pet_name("dog", "brown")
# The generated text is stored under the chain's output_key.
print(pet_name_response["animal_name"])
1. Copper
2. Bruno
3. Hazel
4. Rusty
5. Chestnut


llm = OpenAI(temperature=0.5)
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
Answer the following questions as best you can. You have access to the following tools:

Wikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.
Calculator: Useful for when you need to answer questions about math.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Wikipedia, Calculator]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question


Question: {input}
result = agent.invoke(
    """
    What is the average age of a dog?
    Look it up if you don't know it.
    The answer should be an integer.
    Multiply the age by 3
    """
)
print(result["output"])

> Entering new AgentExecutor chain...
 I should use the Calculator tool to calculate the average age of a dog
Action: Calculator
Action Input: (15 + 9 + 12 + 18 + 5) / 5
Observation: Answer: 11.8
Thought: I should multiply the age by 3 to get the answer in dog years
Action: Calculator
Action Input: 11.8 * 3
Observation: Answer: 35.400000000000006
Thought: I now know the final answer
Final Answer: The average age of a dog is approximately 35 years in dog years.

> Finished chain.
The average age of a dog is approximately 35 years in dog years.
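
The Thought/Action/Action Input/Final Answer format in the prompt above is what lets the agent loop decide whether to call a tool or stop. As a rough illustration only, the following plain-Python sketch reads one model reply and extracts that decision. The function name `parse_step` and the returned dictionary keys are hypothetical, and this is not LangChain's own output parser.

```python
def parse_step(llm_text: str) -> dict:
    """Read one reply in the Thought/Action format and decide what to do next."""
    fields = {}
    for line in llm_text.splitlines():
        # Split "Key: value" lines; lines without a colon are free-form thoughts.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    if "Final Answer" in fields:
        return {"type": "finish", "output": fields["Final Answer"]}
    if "Action" in fields and "Action Input" in fields:
        return {
            "type": "tool_call",
            "tool": fields["Action"],
            "tool_input": fields["Action Input"],
        }
    raise ValueError("Reply does not follow the expected format")


# Two replies copied from the agent trace above.
first_reply = (
    " I should use the Calculator tool to calculate the average age of a dog\n"
    "Action: Calculator\n"
    "Action Input: (15 + 9 + 12 + 18 + 5) / 5"
)
last_reply = (
    " I now know the final answer\n"
    "Final Answer: The average age of a dog is approximately 35 years in dog years."
)

first_step = parse_step(first_reply)
last_step = parse_step(last_reply)
```

In this sketch a `tool_call` result would be routed to the named tool and its return value appended to the prompt as an Observation, while a `finish` result ends the loop.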

Vector DBs

def create_vector_db_from_youtube_url(video_url: str) -> FAISS:
    loader = YoutubeLoader.from_youtube_url(video_url)
    transcript = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    docs = text_splitter.split_documents(transcript)

    embeddings = OpenAIEmbeddings()

    db = FAISS.from_documents(docs, embeddings)

    return db

def get_response_from_query(db, query, k=4):
    docs = db.similarity_search(query, k=k)
    docs_page_content = " ".join([doc.page_content for doc in docs])

    llm = OpenAI()

    prompt = PromptTemplate(
        input_variables=["question", "docs"],
        template="""
        You are a helpful assistant that can answer questions about youtube videos
        based on the video's transcript.
        Answer the following question: {question}
        By searching the following video transcript: {docs}
        Only use the factual information from the transcript to answer the question.
        If you feel like you don't have enough information to answer the question, say "I don't know".
        Your answers should be verbose and detailed.
        """,
    )

    chain = LLMChain(llm=llm, prompt=prompt)

    response = chain.invoke({"question": query, "docs": docs_page_content})
    answer = response["text"]

    return answer, docs
youtube_url = ""
youtube_query = "What is a prompt template?"

db = create_vector_db_from_youtube_url(youtube_url)
response, docs = get_response_from_query(db, youtube_query)
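response_lines = response.split("\n\n")
for line in response_lines:
    # Wrap each paragraph to 80 columns and separate paragraphs with a blank line.
    print_wrapped(line)
    print()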
A prompt template refers to a standardized format or structure for a prompt,
which is used to provide instructions or indicate what is expected for a
specific task or activity. In the context of the video transcript, the speaker
is discussing the use of neural networks and how they can be trained to perform
various tasks. The prompt template is an important aspect of this process, as it
provides a clear and consistent structure for the training data.

The speaker explains that neural networks are made up of a large number of
parameters, or "neurons", which work together to solve complex problems. These
neurons are organized in a structure that simulates neural tissue, and can be
trained using data from the internet. In order to effectively train a neural
network, the data must be presented in a standardized format, which is where the
prompt template comes in.

The prompt template is used to provide a consistent structure for the data,
which allows the neural network to learn and make predictions based on patterns
within the data. This is important because it allows the network to make
connections and recognize patterns across different datasets, which ultimately
leads to more accurate predictions.

The prompt template also plays a role in the training process by providing a
baseline for the network to compare against. The speaker explains that when
training a neural network, a small
loader = WebBaseLoader(
    web_paths=("",),  # URL of the blog post to load
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))),
)
docs = loader.load()

Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. Even for those models that could fit the full post in their context window, models can struggle to find information in very long inputs.

To handle this we’ll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.

In this case we’ll split our documents into chunks of 500 characters with no overlap between chunks. A non-zero overlap, set through chunk_overlap, helps mitigate the possibility of separating a statement from important context related to it. We use the RecursiveCharacterTextSplitter, which recursively splits the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.

Passing add_start_index=True to the splitter preserves the character index at which each split Document starts within the initial Document as the metadata attribute “start_index”. The splitter call below does not set it.
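
To make the roles of chunk size, overlap, and start index concrete, here is a minimal sketch in plain Python. It cuts fixed-size character windows and is not how RecursiveCharacterTextSplitter works internally, since that splitter prefers natural separators such as new lines. The function name `split_with_overlap` and the sample string are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        # Record where the chunk begins, similar in spirit to "start_index" metadata.
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
overlapping = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
disjoint = split_with_overlap(sample, chunk_size=10, chunk_overlap=0)
```

With an overlap of 3, the last three characters of each chunk reappear at the start of the next one, which is what keeps a statement near a chunk boundary attached to some of its context.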

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(docs)

Now we need to index our text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of “similarity” search to identify the stored splits with the most similar embeddings to our query embedding. The simplest similarity measure is cosine similarity: we measure the cosine of the angle between each pair of embeddings (which are high-dimensional vectors).
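
As a small illustration of the similarity search described above, the sketch below ranks a few stored vectors against a query vector by cosine similarity. The three-dimensional vectors, the chunk labels, and the helper names `cosine_similarity` and `top_k` are made up for the example. Real embeddings have hundreds of dimensions, and a vector store would normally use an optimized index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vector), text) for text, vector in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "vector store": each entry pairs a chunk with a hand-written embedding.
toy_store = [
    ("chunk about task decomposition", [0.9, 0.1, 0.0]),
    ("chunk about memory", [0.1, 0.9, 0.0]),
    ("chunk about tool use", [0.0, 0.2, 0.9]),
]
query_embedding = [1.0, 0.2, 0.0]
results = top_k(query_embedding, toy_store, k=2)
```

Here the query points mostly along the first axis, so the chunk whose vector points the same way ranks first.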

We can embed and store all of our document splits in a single command using the Chroma vector store and the GPT4AllEmbeddings model.

vectorstore = Chroma.from_documents(documents=splits, embedding=GPT4AllEmbeddings())
ggml_metal_free: deallocating
llm.invoke("Simulate a rap battle between Stephen Colbert and John Oliver")

llama_print_timings:        load time =    4476.29 ms
llama_print_timings:      sample time =      91.55 ms /   256 runs   (    0.36 ms per token,  2796.41 tokens per second)
llama_print_timings: prompt eval time =    4476.25 ms /    13 tokens (  344.33 ms per token,     2.90 tokens per second)
llama_print_timings:        eval time =   11734.44 ms /   255 runs   (   46.02 ms per token,    21.73 tokens per second)
llama_print_timings:       total time =   17380.26 ms
".\n\n[INTRODUCTION]\n\nStephen Colbert: (Entering the stage, microphone in hand) Ladies and gentlemen, boys and girls, welcome back to The Late Show! Tonight, we have a very special guest. He's an incredibly talented comedian who hosts one of the most brilliant satirical news shows on television. Please give it up for my friend, John Oliver!\n\n[AUDIENCE APPLAUSE]\n\nJohn Oliver: (Walking onto the stage with his signature deadpan expression) Thank you, Stephen. It's great to be here. I must say, your audience is quite... passionate.\n\nStephen Colbert: Well, they are indeed! But enough about me. Let's get down to business. You know what we do here at The Late Show - we engage in friendly rap battles, pitting two of the wittiest comedians against each other in a battle of rhymes and wit. Are you ready for this, John?\n\nJohn Oliver: (Pulling out a notepad) Alright, let's do this!\n\n[BATTLE BEGINS]\n\nStephen Colbert"
# llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
# llm = CTransformers(
#     **{
#         "model": "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
#         "model_file": "mistral-7b-instruct-v0.1.Q4_K_M.gguf",
#     }
# )

We’ll use the LangChain Expression Language (LCEL) Runnable protocol to define the chain, allowing us to:

- pipe together components and functions in a transparent way
- automatically trace our chain in LangSmith
- get streaming, async, and batched calling out of the box
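
The piping idea can be sketched in a few lines of plain Python by overloading the `|` operator. This is a toy model of the concept, not LangChain's Runnable implementation. The `Step` class, the `fake_llm` stand-in, and the sample input are all hypothetical.

```python
class Step:
    """Wraps a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Accept either another Step or a plain callable on the right-hand side.
        nxt = other if isinstance(other, Step) else Step(other)
        # The composed step feeds this step's output into the next one.
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

    def batch(self, values):
        # Batched calling falls out of invoke for free in this toy version.
        return [self.invoke(value) for value in values]


build_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: prompt.upper())  # stand-in for a real model call
parse = Step(str.strip)

toy_chain = build_prompt | fake_llm | parse
answer = toy_chain.invoke({"context": "dogs are pets", "question": "what are dogs?"})
```

Because every composed chain is itself a `Step`, it exposes the same `invoke` and `batch` methods as its parts, which is the property that makes a shared protocol convenient.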

rag_prompt = hub.pull("rlm/rag-prompt-mistral")
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    RunnablePassthrough.assign(context=RunnablePick("context") | format_docs) | rag_prompt | llm | StrOutputParser()
)
question = "What are the approaches to Task Decomposition?"
chain.invoke({"context": docs, "question": question})
Llama.generate: prefix-match hit

llama_print_timings:        load time =    4476.29 ms
llama_print_timings:      sample time =      31.65 ms /   103 runs   (    0.31 ms per token,  3254.55 tokens per second)
llama_print_timings: prompt eval time =    1054.67 ms /   258 tokens (    4.09 ms per token,   244.63 tokens per second)
llama_print_timings:        eval time =    4707.73 ms /   102 runs   (   46.15 ms per token,    21.67 tokens per second)
llama_print_timings:       total time =    6180.33 ms
' The approaches to task decomposition are (1) using simple prompting by LLM, (2) providing task-specific instructions for humans or LLMs to follow, and (3) utilizing expert models that execute specific tasks and log results. The challenges in long-term planning and task decomposition include adjusting plans in response to unexpected errors, making LLMs less robust than humans who learn from trial and error. Judging the correctness of task results involves evaluating the accuracy and completeness of the output.'
retriever = vectorstore.as_retriever()
qa_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()} | rag_prompt | llm | StrOutputParser()
)
question = "What are the approaches to Task Decomposition?"
qa_chain.invoke(question)
Llama.generate: prefix-match hit

llama_print_timings:        load time =    4476.29 ms
llama_print_timings:      sample time =      14.65 ms /    74 runs   (    0.20 ms per token,  5050.51 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    3464.94 ms /    74 runs   (   46.82 ms per token,    21.36 tokens per second)
llama_print_timings:       total time =    3674.59 ms
' There are three approaches to task decomposition: LLM with simple prompting, using task-specific instructions, or with human inputs. Long-term planning and task decomposition can be challenging, especially when exploring solution space and adjusting plans with unexpected errors. Task execution involves expert models executing specific tasks and logging results, which can then be judged for correctness.'