## 4.4.1 Using OpenAI embeddings directly

Let's jump straight into the code.

**Embedding Techniques: converting text into vectors**

```python
import os
from dotenv import load_dotenv

load_dotenv()  # load all the environment variables
```

Alternatively, you can load the key the following way, though this is not recommended:

```python
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
```

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
embeddings
```

Simply instantiate `OpenAIEmbeddings()` and specify the embedding model. You don't necessarily have to pick the large model; choose one according to your use case. Reference: Embeddings - OpenAI Chinese documentation (openaicto.com)

```python
text = "This is a tutorial on OPENAI embedding"
query_result = embeddings.embed_query(text)
query_result
```

```python
from langchain_community.document_loaders import TextLoader

loader = TextLoader('speech.txt')
docs = loader.load()
docs
```

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
final_documents = text_splitter.split_documents(docs)
final_documents
```

## Vector Embedding And Vector StoreDB

```python
from langchain_community.vectorstores import Chroma

db = Chroma.from_documents(final_documents, embeddings)
db
```

### Retrieve the results from query vectorstore db

```python
query = "It will be all the easier for us to conduct ourselves as belligerents"
retrieved_results = db.similarity_search(query)
print(retrieved_results)
```

## 4.4.2 Using the very handy embeddings in Ollama!

[Ollama](https://ollama.com) supports embedding models, making it possible to build retrieval augmented generation (RAG) applications that combine text prompts with existing documents or other data.

As before, we use LangChain modules to call Ollama's resources and communicate with it!! This time let's try `gemma:2b` embeddings:

```python
# The original snippet omits the setup; instantiate the embedder like this
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="gemma:2b")
```

```python
# Take a look at what the result looks like
r1 = embeddings.embed_documents(
    [
        "Alpha is the first letter of Greek alphabet",
        "Beta is the second letter of Greek alphabet",
    ]
)
len(r1[0])
```

```python
r1[1]
```

```python
embeddings.embed_query("What is the second letter of Greek alphabet ")
```

### Other Embedding Models

https://ollama.com/blog/embedding-models

```python
embeddings = OllamaEmbeddings(model="mxbai-embed-large")
text = "This is a test document."
```
```python
query_result = embeddings.embed_query(text)
query_result
len(query_result)
```

## 4.4.3 Using embeddings from the even more popular Hugging Face

To me, Hugging Face is something of an article of faith for NLP and large language model developers.

**Embedding Techniques Using HuggingFace**

```python
import os
from dotenv import load_dotenv

load_dotenv()  # load all the environment variables
os.environ['HF_TOKEN'] = os.getenv("HF_TOKEN")
```

**Sentence Transformers on Hugging Face**

Hugging Face sentence-transformers is a Python framework for state-of-the-art sentence, text and image embeddings. One of the embedding models is used in the HuggingFaceEmbeddings class. We have also added an alias for SentenceTransformerEmbeddings for users who are more familiar with directly using that package.

```python
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
text = "This is a test document"
query_result = embeddings.embed_query(text)
query_result
```

Let's check how long this vector is:

```python
len(query_result)
```

```python
doc_result = embeddings.embed_documents([text, "This is not a test document."])
doc_result[0]
```
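To make the `chunk_size=500, chunk_overlap=50` parameters used with `RecursiveCharacterTextSplitter` in section 4.4.1 concrete, here is a deliberately simplified character-window sketch. It is not LangChain's actual algorithm (which also splits recursively on separators like paragraphs and sentences); `naive_split` is a hypothetical helper that only illustrates how overlap makes consecutive chunks share context.

```python
def naive_split(text, chunk_size, chunk_overlap):
    # Slide a fixed window over the text: each chunk starts
    # (chunk_size - chunk_overlap) characters after the previous one,
    # so consecutive chunks share chunk_overlap characters of context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_split("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap is why a sentence cut at a chunk boundary still appears whole in at least one chunk, which improves retrieval quality at the cost of some index redundancy.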
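Finally, a note on what `db.similarity_search(query)` is doing conceptually: the vector store embeds the query and returns the stored documents whose vectors are most similar, typically by cosine similarity. The sketch below shows that idea with hand-made 3-dimensional toy vectors instead of real model outputs (real embeddings have hundreds to thousands of dimensions, and Chroma uses an index rather than this brute-force scan); the function names and vectors here are illustrative, not Chroma's API.

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def similarity_search(query_vec, store, k=2):
    # store: list of (text, vector) pairs; return the k most similar texts
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "embeddings": similar meanings point in similar directions
store = [
    ("Alpha is the first letter",  [0.9, 0.1, 0.0]),
    ("Beta is the second letter",  [0.1, 0.9, 0.0]),
    ("Unrelated cooking recipe",   [0.0, 0.0, 1.0]),
]
query = [0.2, 0.8, 0.1]  # pretend this is embed_query("second letter?")
print(similarity_search(query, store, k=1))  # → ['Beta is the second letter']
```

Every embedding backend in this chapter (OpenAI, Ollama, Hugging Face) plugs into the same retrieval pattern; only the vectors change.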