正課第二節:設置開發環境 - RAG技術: 智能助手開發實戰

接下來課程的所有程式碼我都會上傳到我自己的github: https://github.com/kevin801221/Langchain_course_code.git 內容介紹：使用Dotenv設置LangChain、Pinecone、OpenAI和Google's Gemini的開發環境。目標：學生能夠設置並配置開發環境。 VScode for LLMs system application 安裝 VS Code 以及實用插件 - 編輯器有很多種，那我們的教學都會以 Visual Studio Code ( 簡稱 VS Code ) 做為主要的開發工具。 VS Code 是目前最熱門的編輯器之一，同時擁有了諸多的優點，以下簡單列出新手可以參考的優點。編輯器輕巧擁有自訂和擴充功能介面乾淨簡潔、容易使用假如覺得 VS Code 不適合你，想要用其他工具的話也可以，像是 WebStorm 或是 Sublime text。透過 VS Code 的官網直接下載，官網會直接根據你的作業系統給予下載位置，如果不正確的話也可以透過選單自己選擇系統。下載位置：官網安裝過程中可以把下方四個選項都打勾，這樣在安裝完成以後，可以透過右鍵的選單把資料夾以 VS Code 打開，會方便許多。安裝完成以後，我們就可以把它打開來，進行一些簡單的設定以及安裝插件開啟自動格式化 (可選) 自動格式化的用意會讓你的程式碼更好閱讀，之後還可以搭配一些插件，讓你的程式碼在格式化以後符合規範。點擊左下角的齒輪後，選擇 Settings 在輸入框打上 format 後，勾選範例圖中三個藍色選項 VS Code 有許多插件可以使我們的開發流程更加順利，接下來推薦一些在剛開始進行網頁前端開發時會使用到的插件。 (以下只會簡單敘述套件功用及附上連結，更詳細的可以觀看官方說明) 左側選擇插件，可以查看目前安裝了那些插件，上方的搜尋框可以尋找想安裝的插件語言插件 Chinese (Traditional) Language Pack for Visual Studio Code (推薦還是多看看英文，除非真的完全不行，否則不推薦安裝😓) 中文化介面接下來是AI輔助工具，在VScode中如果安裝可以輔助自己的coding路上平平安安順順利利(如果好好利用的話) 第一個介紹的是可以根本機端 "ollama" 裡面所支援的大型語言模型串連的工具: Continue - codestral, Cluade, and more ... 第二個是:chatGPT Copilot 可以無痛讓你在VScode中透過openai_api_key跟openai中提供的商業化模型完美溝通，大袋輔助coding工作。再來就是我們必須創在"虛擬環境" 來分割各種工作狀態，將每個不同任務會用到的依賴庫分割開來，大大程度地解決了版本的問題。你哪些方法創造虛擬環境呢? 這裡列出三種常見的方法: 1. `conda` 使用方法 1. 安裝 Anaconda 或 Miniconda 2. 創建虛擬環境： ```bash conda create --name myenv python=3.8 ``` 3. 激活虛擬環境： ```bash conda activate myenv ``` 4. 安裝包： ```bash conda install numpy pandas ``` 5. 停用虛擬環境： ```bash conda deactivate ``` 優點 - 跨平台支持：支持 Windows, macOS 和 Linux。 - 包管理：除了 Python 包，還可以管理非 Python 包（例如，R，C++ 庫）。 - 環境隔離：能有效隔離不同項目的依賴包。 - 方便：自帶大多數常用科學計算包。缺點 - 體積大：Anaconda 佔用磁碟空間較大。 - 速度慢：有時候安裝和更新包的速度較慢。 2. `venv` 使用方法 1. 創建虛擬環境： ```bash python -m venv myenv ``` 2. 激活虛擬環境： - Windows: ```bash .\myenv\Scripts\activate ``` - macOS 和 Linux: ```bash source myenv/bin/activate ``` 3. 安裝包： ```bash pip install numpy pandas ``` 4. 停用虛擬環境： ```bash deactivate ``` 優點 - 內置工具：Python 3.3+ 內置，無需額外安裝。 - 輕量級：不佔用太多磁碟空間。 - 簡單易用：適合簡單的項目環境管理。缺點 - 功能有限：僅管理 Python 包，不能管理非 Python 包。 - 依賴管理較弱：需要手動管理依賴包版本。 3. `poetry` 使用方法 1. 安裝 Poetry： ```bash curl -sSL https://install.python-poetry.org | python3 - ``` 2. 創建虛擬環境和項目： ```bash poetry new myproject cd myproject ``` 3. 安裝依賴包： ```bash poetry add numpy pandas ``` 4. 激活虛擬環境： ```bash poetry shell ``` 5. 停用虛擬環境： ```bash exit ``` 優點 - 依賴管理：自動解決依賴衝突，生成 `poetry.lock` 文件。 - 包發布：支持發布 Python 包到 PyPI。 - 簡單配置：使用 `pyproject.toml` 文件進行配置，統一管理項目依賴和元數據。缺點 - 學習曲線：相對於 `venv` 和 `conda`，需要學習新的命令和配置文件。 - 性能問題：在某些情況下，依賴解析和安裝速度可能較慢。總結根據你的需求和項目複雜度，可以選擇合適的工具來管理虛擬環境。 #必要模組包 openai langchain langchain_openai langchain_experimental langchainhub pinecone-client python-dotenv tiktoken docx2txt pypdf requests numpy pandas 以上可包裝成一個"requirements.txt"檔案, 可以在終端機之下直接執行pip install -r requirements.txt 它就可以直接安裝寫在requirements.txt檔中的所有模組包。設置開發環境：打造你的AI應用基礎設施在這節課中，我們將深入了解如何使用python-dotenv(環境變數導入)設置LangChain、FAISS, Pinecone、OpenAI和Google's Gemini的開發環境。無論你是剛入門的AI開發者，還是經驗豐富的老手，這個過程將幫助你打造穩定的開發基礎，為你的AI應用奠定良好的基礎。內容介紹我們將一步一步地教你如何配置開發環境，確保你能夠順利地使用LangChain、FAISS、Pinecone、OpenAI和Google's Gemini這些工具。這些工具是現代AI開發中不可或缺的一部分，掌握它們的配置方法將使你的開發工作更加高效。課程目標學生能夠設置並配置開發環境。熟悉使用python-dotenv管理環境變量。 LangChain、FAISS、Pinecone、OpenAI和Google's Gemini的基本配置。 (Ref 課程開始！ 1. Dotenv 介紹 Dotenv 是一個方便的工具，用於將環境變量存儲在 .env 文件中。這樣可以讓你輕鬆地管理和使用環境變量，而不必每次都手動設置它們。 2. 設置 LangChain LangChain 是一個強大的框架，旨在簡化自然語言處理（NLP）應用的開發。 # 安裝 LangChain pip install langchain # 在 .env 文件中設置 LangChain 相關的環境變量 LANGCHAIN_API_KEY="your_langchain_api_key" (這會是連接到langsmith) OPENAI_API_KEY = "" GOOGLE_API_KEY = "" 在你的 Python 文件中，可以使用以下方式讀取環境變量： from dotenv import load_dotenv import os load_dotenv() #langchain_api_key = os.getenv("LANGCHAIN_API_KEY") #openai = os. 3. 設置 Faiss Facebook AI 相似性搜索（FAISS）是一個用於高效相似性搜索和密集向量聚類的庫。它包含的演算法可以在任何大小的向量集中進行搜索，直到可能不適合 RAM 的向量集。它還包含用於評估和參數調整的支援代碼。 Setup 設置集成存在於langchain-community包中。我們還需要安裝 faiss 包本身。我們可以使用以下工具安裝它們： pip install -qU langchain-community faiss-cpu 如果你想獲得一流的模型調用自動跟蹤，你也可以通過取消下面的註釋來設置你的LangSmith API密鑰。 # os.environ["LANGCHAIN_TRACING_V2"] = "true" # os.environ["LANGCHAIN_API_KEY"] = getpass.getpass() 初始化 !pip install -qU langchain-openai #openAI import getpass os.environ["OPENAI_API_KEY"] = getpass.getpass() from langchain_openai import OpenAIEmbeddings embeddings = OpenAIEmbeddings(model="text-embedding-3-large") import faiss from langchain_community.docstore.in_memory import InMemoryDocstore from langchain_community.vectorstores import FAISS index = faiss.IndexFlatL2(len(embeddings.embed_query("hello world"))) vector_store = FAISS( embedding_function=embeddings, index=index, docstore=InMemoryDocstore(), index_to_docstore_id={}, ) 管理向量存儲將專案添加到向量存儲: from uuid import uuid4 from langchain_core.documents import Document document_1 = Document( page_content="I had chocalate chip pancakes and scrambled eggs for breakfast this morning.", metadata={"source": "tweet"}, ) document_2 = Document( page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.", metadata={"source": "news"}, ) document_3 = Document( page_content="Building an exciting new project with LangChain - come check it out!", metadata={"source": "tweet"}, ) document_4 = Document( page_content="Robbers broke into the city bank and stole $1 million in cash.", metadata={"source": "news"}, ) document_5 = Document( page_content="Wow! That was an amazing movie. I can't wait to see it again.", metadata={"source": "tweet"}, ) document_6 = Document( page_content="Is the new iPhone worth the price? Read this review to find out.", metadata={"source": "website"}, ) document_7 = Document( page_content="The top 10 soccer players in the world right now.", metadata={"source": "website"}, ) document_8 = Document( page_content="LangGraph is the best framework for building stateful, agentic applications!", metadata={"source": "tweet"}, ) document_9 = Document( page_content="The stock market is down 500 points today due to fears of a recession.", metadata={"source": "news"}, ) document_10 = Document( page_content="I have a bad feeling I am going to get deleted :(", metadata={"source": "tweet"}, ) documents = [ document_1, document_2, document_3, document_4, document_5, document_6, document_7, document_8, document_9, document_10, ] uuids = [str(uuid4()) for _ in range(len(documents))] vector_store.add_documents(documents=documents, ids=uuids) 從向量存儲中刪除專案: vector_store.delete(ids=[uuids[-1]]) 查詢向量存儲一旦創建了您的向量存儲並添加了相關文檔，您很可能希望在鏈或代理運行期間查詢它。相似性搜索可以按以下步驟執行簡單的相似性搜索並對元數據進行過濾：帶分數的相似性搜索 results = vector_store.similarity_search_with_score( "Will it be hot tomorrow?", k=1, filter={"source": "news"} ) for res, score in results: print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]") 合併:您還可以合併兩個 FAISS 向量庫 db1 = FAISS.from_texts(["foo"], embeddings) db2 = FAISS.from_texts(["bar"], embeddings) db1.docstore._dict db2.docstore._dict db1.merge_from(db2) db1.docstore._dict Reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.faiss.FAISS.html 3. 設置 Pinecone Pinecone 是一個向量數據庫，適用於大規模的相似性搜索和推薦系統。 # 安裝 Pinecone pip install pinecone-client # 在 .env 文件中設置 Pinecone 相關的環境變量 PINECONE_API_KEY="your_pinecone_api_key" 在你的 Python 文件中，可以使用以下方式讀取環境變量： pinecone_api_key = os.getenv("PINECONE_API_KEY") #Pinecone 索引周圍有一個包裝器，允許您將其用作向量存儲，無論是用於語義搜索還是示例選擇。 from langchain_pinecone import PineconeVectorStore 有關 Pinecone 向量庫的更詳細演練，請參閱此筆記本 %pip install --upgrade --quiet lark %pip install --upgrade --quiet pinecone-notebooks pinecone-client==3.2.2 # Connect to Pinecone and get an API key. from pinecone_notebooks.colab import Authenticate Authenticate() import os api_key = os.environ["PINECONE_API_KEY"] import getpass os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:") from pinecone import Pinecone, ServerlessSpec api_key = os.getenv("PINECONE_API_KEY") or "PINECONE_API_KEY" index_name = "langchain-self-retriever-demo" pc = Pinecone(api_key=api_key) from langchain_core.documents import Document from langchain_openai import OpenAIEmbeddings from langchain_pinecone import PineconeVectorStore embeddings = OpenAIEmbeddings() # create new index if index_name not in pc.list_indexes().names(): pc.create_index( name=index_name, dimension=1536, metric="cosine", spec=ServerlessSpec(cloud="aws", region="us-east-1"), ) #檢索用Docs 設定 docs = [ Document( page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose", metadata={"year": 1993, "rating": 7.7, "genre": ["action", "science fiction"]}, ), Document( page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...", metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2}, ), Document( page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea", metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6}, ), Document( page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them", metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3}, ), Document( page_content="Toys come alive and have a blast doing so", metadata={"year": 1995, "genre": "animated"}, ), Document( page_content="Three men walk into the Zone, three men walk out of the Zone", metadata={ "year": 1979, "director": "Andrei Tarkovsky", "genre": ["science fiction", "thriller"], "rating": 9.9, }, ), ] vectorstore = PineconeVectorStore.from_documents( docs, embeddings, index_name="langchain-self-retriever-demo" ) 創建我們的自查詢檢索器:現在我們可以實例化我們的檢索器。為此，我們需要預先提供一些有關文檔支援的元數據欄位的資訊，以及文檔內容的簡短描述。 from langchain.chains.query_constructor.base import AttributeInfo from langchain.retrievers.self_query.base import SelfQueryRetriever from langchain_openai import OpenAI metadata_field_info = [ AttributeInfo( name="genre", description="The genre of the movie", type="string or list[string]", ), AttributeInfo( name="year", description="The year the movie was released", type="integer", ), AttributeInfo( name="director", description="The name of the movie director", type="string", ), AttributeInfo( name="rating", description="A 1-10 rating for the movie", type="float" ), ] document_content_description = "Brief summary of a movie" llm = OpenAI(temperature=0) retriever = SelfQueryRetriever.from_llm( llm, vectorstore, document_content_description, metadata_field_info, verbose=True ) 測試一下: # This example only specifies a relevant query retriever.invoke("What are some movies about dinosaurs") 4. 設置 OpenAI OpenAI 提供了強大的 GPT 模型，適用於各種自然語言處理任務。 # 安裝 OpenAI pip install openai # 在 .env 文件中設置 OpenAI 相關的環境變量 OPENAI_API_KEY="your_openai_api_key" 在你的 Python 文件中，可以使用以下方式讀取環境變量： openai_api_key = os.getenv("OPENAI_API_KEY") 5. 設置 Google's Gemini Gemini (google.com) (Gemini playground) 至於Gemini的開發用接口Api 要到自己的google開發者平台取得, "Google AI Gemini API | Gemma open models | Google for Developers | Google AI for Developers" Google's Gemini 是一個先進的人工智慧模型，由Google開發並推出。它是一個多模態AI系統，能夠處理文本、圖像、視訊和音頻等多種類型的輸入。Gemini的強大之處在於其靈活性和廣泛的應用範圍，從自然語言處理到複雜的推理任務，都能表現出色。在開發環境中整合Gemini，可以為您的項目帶來強大的AI能力。尤其是它可以應付非常長的上下文輸入輸出，表現優異，價錢便宜, 一半的模型呼叫甚至免費，很適合學生族群。 # 在 .env 文件中設置 Gemini 相關的環境變量 GEMINI_API_KEY="your_gemini_api_key" 在你的 Python 文件中，可以使用以下方式讀取環境變量： gemini_api_key = os.getenv("GEMINI_API_KEY") 第二部分：整合現在我們已經了解了如何設置這些工具，接下來讓我們進行實際操作。以下是步驟指南：步驟1：創建並配置 .env 文件打開你的代碼編輯器，創建一個新的 .env 文件，並將你的API密鑰和其他配置項目填入其中： # .env 文件 LANGCHAIN_API_KEY="your_langchain_api_key" PINECONE_API_KEY="your_pinecone_api_key" OPENAI_API_KEY="your_openai_api_key" GEMINI_API_KEY="your_gemini_api_key" 步驟2：編寫 Python 代碼來讀取這些環境變量創建一個新的 Python 文件，例如 config.py，並使用 python-dotenv 讀取 .env 文件中的變量： # config.py from dotenv import load_dotenv import os load_dotenv() langchain_api_key = os.getenv("LANGCHAIN_API_KEY") pinecone_api_key = os.getenv("PINECONE_API_KEY") openai_api_key = os.getenv("OPENAI_API_KEY") gemini_api_key = os.getenv("GEMINI_API_KEY") print(f"LangChain API Key: {langchain_api_key}") print(f"Pinecone API Key: {pinecone_api_key}") print(f"OpenAI API Key: {openai_api_key}") print(f"Gemini API Key: {gemini_api_key}") 運行 config.py 文件，確認你的環境變量已正確讀取： python config.py 步驟3：使用這些API進行基本操作接下來，你可以嘗試使用這些API進行一些基本操作，以確保它們已正確配置。例如，使用OpenAI的GPT-3生成一些文本： # 使用 OpenAI API import openai openai.api_key = openai_api_key response = openai.Completion.create( engine="davinci", prompt="Hello, world!", max_tokens=50 ) print(response.choices[0].text.strip()) 總結恭喜你！現在你已經成功設置並配置了你的開發環境，並且能夠使用LangChain、Pinecone、OpenAI和Google's Gemini進行開發。這只是開始，接下來你可以進一步探索這些工具的強大功能，打造出色的AI應用。如果你在設置過程中遇到任何問題，不要猶豫，隨時提問。我們在這裡為你提供支持，確保你能夠順利完成這個過程。實作小提示定期更新你的API密鑰，確保安全性。使用虛擬環境（如venv或conda）來管理你的開發環境，避免依賴衝突。熟悉各個工具的官方文檔，這樣你可以更靈活地運用它們。