commit f4c20270be: init

@@ -0,0 +1,9 @@
# Docker ignore
.git
.venv
__pycache__
*.pyc
.env
index_data/
logs/
*.log

@@ -0,0 +1,39 @@
# App Configuration
APP_TITLE="LightRAG Knowledge Base API"
APP_VERSION="1.0"
HOST="0.0.0.0"
PORT=9600

# LLM(Text) Configuration
LLM_BINDING=vllm                                # ollama, vllm, openai
LLM_BINDING_HOST=http://192.168.6.115:8002/v1   # vLLM OpenAI API base
LLM_MODEL=qwen3-8b-fp8
LLM_KEY=EMPTY                                   # vLLM default key
LLM_MODEL_MAX_ASYNC=4                           # vLLM handles concurrency well; can be raised

# LLM(Vision) Configuration
VL_BINDING=vllm                                 # ollama, vllm, openai
VL_BINDING_HOST=http://192.168.6.115:8001/v1
VL_MODEL=qwen2.5-vl-3b-awq
VL_KEY=EMPTY

# Embedding Configuration
EMBEDDING_BINDING=tei                             # ollama, tei, openai
EMBEDDING_BINDING_HOST=http://192.168.6.115:8003  # TEI usually exposes /embed
EMBEDDING_MODEL=BAAI/bge-m3                       # model id in TEI
EMBEDDING_KEY=EMPTY

# Rerank - TEI
RERANK_ENABLED=True
RERANK_BINDING_HOST=http://192.168.6.115:8004
RERANK_MODEL=BAAI/bge-reranker-v2-m3
RERANK_KEY=EMPTY

# Storage
DATA_DIR=./index_data

# RAG Configuration
EMBEDDING_DIM=1024
MAX_TOKEN_SIZE=8192
MAX_RAG_INSTANCES=5           # maximum number of active RAG instances
COSINE_THRESHOLD=0.6          # cosine similarity threshold

@@ -0,0 +1,7 @@
.venv/
__pycache__/
*.pyc
.env
index_data/
logs/
*.log

@@ -0,0 +1,29 @@
# Use the official Python 3.11 runtime as the base image
FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Install system dependencies
# build-essential provides gcc and other build tools required by some Python packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency file into the container
COPY requirements.txt .

# Install the Python packages listed in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Copy the current directory into /app inside the container
COPY . .

# Expose port 9600 for external access
EXPOSE 9600

# Set environment variable so Python output goes straight to the console
ENV PYTHONUNBUFFERED=1

# Run uvicorn when the container starts
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "9600"]

@@ -0,0 +1,112 @@
# LightRAG Knowledge Base Service

A knowledge base microservice built on HKUDS/LightRAG, optimized for Chinese-language scenarios and supporting hybrid "facts + knowledge graph" retrieval.

## 🚀 Quick Start

### 1. Prerequisites

- **Ollama** (if using the Ollama backend): make sure the Ollama service is running and the following models are pulled:
  - LLM: `deepseek-v3.2:cloud` (or your own choice)
  - Embedding: `bge-m3`
- **Python**: 3.10+ (`uv` is recommended for environment management)

### 2. Run Locally

#### Option A: Standard pip (recommended)

```bash
# 1. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Start the service
python3 -m uvicorn app.main:app --host 0.0.0.0 --port 9600 --reload
```

#### Option B: uv (fast)

```bash
# 1. Initialize the project
uv venv
source .venv/bin/activate

# 2. Install dependencies
uv pip install -r requirements.txt

# 3. Start the service
python3 -m uvicorn app.main:app --host 0.0.0.0 --port 9600 --reload
```

Service URL: <http://localhost:9600>
API docs: <http://localhost:9600/docs>

### 3. Run with Docker

```bash
docker build -t lightrag-api .
docker run -p 9600:9600 --env-file .env lightrag-api
```

## 📚 API Reference (Core)

The endpoints are fully compatible with the OpenAI response format and support both streaming and non-streaming output.

| Endpoint | Method | Description | Example |
| :--- | :--- | :--- | :--- |
| `/query` | POST | Knowledge retrieval | `{"query": "question", "mode": "hybrid", "stream": true, "think": true, "only_rag": false}` |
| `/ingest/file` | POST | Upload a file | `multipart/form-data`, file=@doc.pdf |
| `/ingest/text` | POST | Ingest plain text | `{"text": "text content"}` |
| `/ingest/batch_qa` | POST | Batch-ingest QA pairs | `[{"question": "Q1", "answer": "A1"}, ...]` |
| `/documents` | GET | List documents | View indexed documents and their status |
| `/docs/{id}` | DELETE | Delete a document | Delete a document and its associated graph data by ID |

**`/query` parameters** (see the example request below):

- `query`: the user question.
- `mode`: retrieval mode (`hybrid`, `naive`, `local`, `global`). `hybrid` is recommended.
- `stream`: whether to stream the output (OpenAI-compatible chunk format).
- `think`: whether to enable thinking mode (DeepSeek style; returns `reasoning_content`).
- `only_rag`: **strict mode**. If `true`, the service refuses to answer when nothing is retrieved from the knowledge base instead of falling back to the LLM's general knowledge.
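
For reference, a minimal request sketch using the fields above (the host, port, and tenant header assume the default local setup; the question text is a placeholder):

```bash
# Streaming query against a local instance, default tenant
curl -N http://localhost:9600/query \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: default" \
  -d '{"query": "your question here", "mode": "hybrid", "stream": true, "think": true, "only_rag": false}'
```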

**Response fields (streaming)** (see the sample chunk below):

- `delta.content`: the answer body.
- `delta.reasoning_content`: the thinking process (DeepSeek style).
- `delta.x_rag_status`: retrieval hit status (`hit` or `miss`).
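
As an illustration, a single streamed SSE chunk carries these fields roughly as follows (the id, timestamp, and text values are placeholders; the layout mirrors the OpenAI-compatible chunk format produced by the service):

```text
data: {"id": "chatcmpl-...", "object": "chat.completion.chunk", "created": 1700000000, "model": "qwen3-8b-fp8", "choices": [{"index": 0, "delta": {"reasoning_content": "1. 上下文已检索 ...", "x_rag_status": "hit"}, "finish_reason": null}]}
```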

**Tenant management**:

Tenants are isolated via the `X-Tenant-ID` header; each tenant gets its own storage space.

```bash
curl -H "X-Tenant-ID: my_tenant" http://localhost:9600/query -d '{"query": "..."}'
```

## 🛠️ Project Structure

```text
/
├── app/
│   ├── api/            # Route definitions (OpenAI-compatible streaming implementation)
│   ├── core/           # Core logic (RAG manager, multi-tenancy, PDF text/image parsing)
│   ├── config.py       # Pydantic-settings configuration management
│   └── main.py         # FastAPI entry point
├── index_data/         # [Important] Root directory for persisted knowledge base data
│   └── {tenant_id}/    # One folder per tenant
│       ├── graph_*.graphml   # Knowledge graph structure
│       ├── kv_store_*.json   # Key-value stores (text chunks, entity descriptions, etc.)
│       └── vdb_*.json        # Vector database
├── requirements.txt    # Dependency list (includes Pillow, PyPDF, etc.)
├── Dockerfile          # Container build file
├── deploy.sh           # One-click deployment script (supports host-gateway access to the host's Ollama)
└── .env                # Environment variable configuration
```

## ⚠️ Notes

1. **Chinese optimization**: Prompts tuned for Chinese are built in; the hard dependency on the upstream `{language}` variable has been removed, and mixed Chinese/English queries are recognized automatically.
2. **Write locking**: The underlying storage is file-based (NanoVectorDB + NetworkX) and **does not support concurrent multi-process writes**.
3. **Edit semantics**: "Editing" in RAG is really "delete the old document, then re-ingest the new one". Modifying text chunks in place corrupts the graph relations. See the sketch below.
4. **Initialization**: On first startup, or when ingesting a large amount of data, the graph index has to be built; this is CPU-intensive, so please be patient.
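
A minimal sketch of that delete-then-re-ingest edit flow, using the endpoints listed above (the document ID and text are placeholders):

```bash
# 1. Remove the outdated document and its associated graph data
curl -X DELETE -H "X-Tenant-ID: default" http://localhost:9600/docs/doc-xxxxx

# 2. Re-ingest the corrected content as a new document
curl -X POST http://localhost:9600/ingest/text \
  -H "Content-Type: application/json" -H "X-Tenant-ID: default" \
  -d '{"text": "corrected document content"}'
```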

@@ -0,0 +1,491 @@
import json
import logging
import os
import secrets
import string
import time
from fastapi import APIRouter, UploadFile, File, HTTPException, Body, Header, Depends
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from lightrag import LightRAG, QueryParam
from app.core.rag import get_rag_manager, llm_func
from app.core.prompts import CUSTOM_RAG_RESPONSE_PROMPT
from app.config import settings
from app.core.ingest import process_pdf_with_images

router = APIRouter()

# ==========================================
# Dependency injection
# ==========================================
async def get_current_rag(
    x_tenant_id: str = Header("default", alias="X-Tenant-ID")
) -> LightRAG:
    """
    Dependency: get the LightRAG instance for the current tenant.
    Reads X-Tenant-ID from the request headers; defaults to 'default'.
    """
    manager = get_rag_manager()
    return await manager.get_rag(x_tenant_id)

# ==========================================
# Data models
# ==========================================
class QueryRequest(BaseModel):
    query: str
    mode: str = "hybrid"  # options: naive, local, global, hybrid
    top_k: int = 5
    stream: bool = False
    think: bool = False
    only_rag: bool = False  # if True, use only RAG retrieval results, with no LLM fallback

class IngestResponse(BaseModel):
    filename: str
    status: str
    message: str

class QAPair(BaseModel):
    question: str
    answer: str

# ==========================================
# Endpoints
# ==========================================

@router.get("/admin/tenants")
async def list_tenants(token: str):
    """
    Admin endpoint: list tenants.
    """
    if token != settings.ADMIN_TOKEN:
        raise HTTPException(status_code=403, detail="Invalid admin token")

    try:
        if not os.path.exists(settings.DATA_DIR):
            return {"tenants": []}

        tenants = []
        for entry in os.scandir(settings.DATA_DIR):
            if entry.is_dir() and not entry.name.startswith("."):
                tenant_id = entry.name
                secret_file = os.path.join(entry.path, ".secret")

                # Read or generate the tenant-specific secret
                if os.path.exists(secret_file):
                    with open(secret_file, "r") as f:
                        secret = f.read().strip()
                else:
                    # Generate a 16-character random string
                    alphabet = string.ascii_letters + string.digits
                    secret = ''.join(secrets.choice(alphabet) for i in range(16))
                    try:
                        with open(secret_file, "w") as f:
                            f.write(secret)
                    except Exception as e:
                        logging.error(f"Failed to write secret for tenant {tenant_id}: {e}")
                        continue

                # Build the tenant access token (tenant ID + random string)
                tenant_token = f"{tenant_id}_{secret}"
                tenants.append({
                    "id": tenant_id,
                    "token": tenant_token
                })
        return {"tenants": tenants}
    except Exception as e:
        logging.error(f"Failed to list tenants: {e}")
        raise HTTPException(status_code=500, detail=str(e))

@router.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "ok", "llm": settings.LLM_MODEL}

@router.post("/query")
async def query_knowledge_base(
    request: QueryRequest,
    rag: LightRAG = Depends(get_current_rag)
):
    """
    Query endpoint.
    - query: user question
    - mode: retrieval mode (hybrid recommended for factual queries)
    - stream: stream the output (default False)
    - think: enable thinking mode (default False)
    """
    try:
        # Build the query parameters
        context_param = QueryParam(
            mode=request.mode,
            top_k=request.top_k,
            only_need_context=True,
            enable_rerank=settings.RERANK_ENABLED
        )

        # Run context retrieval
        context_resp = await rag.aquery(request.query, param=context_param)

        logging.info(f"Context response: {context_resp}")

        # Decide whether retrieval hit anything
        has_context = False

        # 1. Basic check: rule out empty strings and explicit no-context markers
        if context_resp and "[no-context]" not in context_resp and "None" not in context_resp:
            # 2. Strict check: only treat the result as a hit when it contains concrete
            #    Document Chunks (original text excerpts). Entities alone tend to match
            #    noisily on generic words and are not sufficient evidence on their own.
            if "Document Chunks" in context_resp:
                chunks_part = context_resp.split("Document Chunks")[1]
                # Check whether the chunks section contains JSON-style content fields
                if '"content":' in chunks_part or '"text":' in chunks_part:
                    has_context = True

        if has_context:
            rag_status = "hit"
        else:
            rag_status = "miss"

        # Streaming output (SSE protocol, OpenAI-compatible format)
        if request.stream:
            async def stream_generator():
                chat_id = f"chatcmpl-{secrets.token_hex(12)}"
                created_time = int(time.time())
                model_name = settings.LLM_MODEL

                # Helper: build an OpenAI-compatible chunk
                def openai_chunk(content=None, reasoning_content=None, finish_reason=None, extra_delta=None):
                    delta = {}
                    if content:
                        delta["content"] = content
                    if reasoning_content:
                        delta["reasoning_content"] = reasoning_content
                    if extra_delta:
                        delta.update(extra_delta)

                    chunk = {
                        "id": chat_id,
                        "object": "chat.completion.chunk",
                        "created": created_time,
                        "model": model_name,
                        "choices": [
                            {
                                "index": 0,
                                "delta": delta,
                                "finish_reason": finish_reason
                            }
                        ]
                    }
                    return f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n"

                if has_context:
                    yield openai_chunk(
                        reasoning_content=f"1. 上下文已检索 (长度: {len(context_resp)} 字符).\n",
                        extra_delta={"x_rag_status": rag_status}
                    )
                else:
                    yield openai_chunk(
                        reasoning_content="未找到相关上下文\n",
                        extra_delta={"x_rag_status": rag_status}
                    )

                    # In RAG-only mode with no context, stop immediately
                    if request.only_rag:
                        yield openai_chunk(content="未找到相关知识库内容。", finish_reason="stop")
                        yield "data: [DONE]\n\n"
                        return

                    yield openai_chunk(reasoning_content=" (将依赖 LLM 自身知识)\n")

                    # No context found; disable thinking mode
                    request.think = False

                yield openai_chunk(reasoning_content="2. 答案生成中...\n")

                # 2. Generate the answer
                sys_prompt = CUSTOM_RAG_RESPONSE_PROMPT.format(
                    context_data=context_resp,
                    response_type="Multiple Paragraphs",
                    user_prompt=""
                )

                # Call the LLM (streaming)
                stream_resp = await llm_func(
                    request.query,
                    system_prompt=sys_prompt,
                    stream=True,
                    think=request.think,
                    hashing_kv=rag.llm_response_cache
                )

                async for chunk in stream_resp:
                    if isinstance(chunk, dict):
                        if chunk.get("type") == "thinking":
                            yield openai_chunk(reasoning_content=chunk["content"])
                        elif chunk.get("type") == "content":
                            yield openai_chunk(content=chunk["content"])
                    elif chunk:
                        yield openai_chunk(content=chunk)

                # Send the end-of-stream markers
                yield openai_chunk(finish_reason="stop")
                yield "data: [DONE]\n\n"

            # Use the text/event-stream content type
            return StreamingResponse(stream_generator(), media_type="text/event-stream")

        # Non-streaming output (OpenAI-compatible format)
        # Generate the answer according to the selected strategy
        final_answer = ""

        if not has_context and request.only_rag:
            # Strict mode with nothing retrieved: stop here
            final_answer = "未找到相关知识库内容。"
        else:
            # Normal generation
            sys_prompt = CUSTOM_RAG_RESPONSE_PROMPT.format(
                context_data=context_resp,
                response_type="Multiple Paragraphs",
                user_prompt=""
            )

            # Call the LLM
            stream_resp = await llm_func(
                request.query,
                system_prompt=sys_prompt,
                stream=False,
                think=request.think,
                hashing_kv=rag.llm_response_cache
            )

            # A non-streaming LLM call returns the result directly
            final_answer = stream_resp

        return {
            "id": f"chatcmpl-{secrets.token_hex(12)}",
            "object": "chat.completion",
            "created": int(time.time()),
            "model": settings.LLM_MODEL,
            "choices": [
                {
                    "index": 0,
                    "message": {
                        "role": "assistant",
                        "content": final_answer,
                        # Extension field: non-standard, but useful to some clients;
                        # could alternatively live in usage/metadata
                        "x_rag_status": rag_status
                    },
                    "finish_reason": "stop"
                }
            ],
            "usage": {
                "prompt_tokens": -1,
                "completion_tokens": len(final_answer),
                "total_tokens": -1
            }
        }

    except Exception as e:
        logging.error(f"查询失败: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

@router.post("/ingest/text")
async def ingest_text(
    text: str = Body(..., embed=True),
    rag: LightRAG = Depends(get_current_rag)
):
    """Ingest raw text directly."""
    try:
        # Use the async insert method
        await rag.ainsert(text)
        return {"status": "success", "message": "Text ingested successfully"}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@router.post("/ingest/batch_qa")
async def ingest_batch_qa(
    qa_list: list[QAPair],
    rag: LightRAG = Depends(get_current_rag)
):
    """
    Batch-ingest QA pairs.
    - Formats each QA pair into a semantically coherent text block
    - Merges short QA pairs automatically to improve indexing efficiency
    """
    if not qa_list:
        return {"status": "skipped", "message": "Empty QA list"}

    try:
        # 1. Format and merge the text
        batch_text = ""
        current_batch_size = 0
        MAX_BATCH_CHARS = 2000  # roughly 1000 tokens, a conservative estimate

        inserted_count = 0

        for qa in qa_list:
            # Format a single QA pair
            entry = f"--- Q&A Entry ---\nQuestion: {qa.question}\nAnswer: {qa.answer}\n\n"
            entry_len = len(entry)

            # If the current batch would grow too large, flush it first
            # (the batch_text check avoids inserting an empty string)
            if batch_text and current_batch_size + entry_len > MAX_BATCH_CHARS:
                await rag.ainsert(batch_text)
                batch_text = ""
                current_batch_size = 0
                inserted_count += 1

            batch_text += entry
            current_batch_size += entry_len

        # Flush the remaining text
        if batch_text:
            await rag.ainsert(batch_text)
            inserted_count += 1

        return {
            "status": "success",
            "message": f"Successfully processed {len(qa_list)} QA pairs into {inserted_count} text chunks."
        }
    except Exception as e:
        logging.error(f"QA批量导入失败: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

@router.post("/ingest/file", response_model=IngestResponse)
async def upload_file(
    file: UploadFile = File(...),
    rag: LightRAG = Depends(get_current_rag)
):
    """
    File upload and indexing endpoint.
    Supports .txt, .md, .pdf
    """
    try:
        content = ""
        filename = file.filename

        # Read the file contents
        file_bytes = await file.read()

        # Parse according to file type
        if filename.endswith(".pdf"):
            # PDF parsing (text and images)
            content = await process_pdf_with_images(file_bytes)
        elif filename.endswith(".txt") or filename.endswith(".md"):
            # Plain text files are decoded directly
            content = file_bytes.decode("utf-8")
        else:
            return IngestResponse(
                filename=filename,
                status="skipped",
                message="Unsupported file format. Only PDF, TXT, MD supported."
            )

        if not content.strip():
            return IngestResponse(
                filename=filename,
                status="failed",
                message="Empty content extracted."
            )

        # Insert into the LightRAG index (this is a slow operation)
        # Use the async insert method
        await rag.ainsert(content)

        return IngestResponse(
            filename=filename,
            status="success",
            message=f"Successfully indexed {len(content)} characters."
        )

    except Exception as e:
        logging.error(f"处理文件 {file.filename} 失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Processing failed: {str(e)}")

@router.get("/documents")
async def list_documents(
    rag: LightRAG = Depends(get_current_rag)
):
    """
    List documents.
    Returns all indexed documents in the current knowledge base with their status.
    """
    try:
        # Get the working directory from the rag instance
        doc_status_path = os.path.join(rag.working_dir, "kv_store_doc_status.json")
        if not os.path.exists(doc_status_path):
            return {"count": 0, "docs": []}

        with open(doc_status_path, "r", encoding="utf-8") as f:
            data = json.load(f)

        # Format the result
        docs = []
        for doc_id, info in data.items():
            docs.append({
                "id": doc_id,
                "summary": info.get("content_summary", "")[:100] + "...",  # truncated summary
                "length": info.get("content_length", 0),
                "created_at": info.get("created_at"),
                "status": info.get("status")
            })

        return {
            "count": len(docs),
            "docs": docs
        }
    except Exception as e:
        logging.error(f"获取文档列表失败: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

@router.get("/documents/{doc_id}")
async def get_document_detail(
    doc_id: str,
    rag: LightRAG = Depends(get_current_rag)
):
    """
    Get document details.
    Returns the full content of the specified document.
    """
    try:
        # Read kv_store_full_docs.json
        full_docs_path = os.path.join(rag.working_dir, "kv_store_full_docs.json")
        if not os.path.exists(full_docs_path):
            raise HTTPException(status_code=404, detail="Document store not found")

        with open(full_docs_path, "r", encoding="utf-8") as f:
            data = json.load(f)

        if doc_id not in data:
            raise HTTPException(status_code=404, detail="Document not found")

        doc_info = data[doc_id]
        return {
            "id": doc_id,
            "content": doc_info.get("content", ""),
            "create_time": doc_info.get("create_time"),
            "file_path": doc_info.get("file_path", "unknown")
        }
    except HTTPException:
        raise
    except Exception as e:
        logging.error(f"获取文档详情失败: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

@router.delete("/docs/{doc_id}")
async def delete_document(
    doc_id: str,
    rag: LightRAG = Depends(get_current_rag)
):
    """
    Delete the specified document.
    - doc_id: document ID (e.g. doc-xxxxx)
    """
    try:
        logging.info(f"正在删除文档: {doc_id}")
        # Call LightRAG's delete method
        await rag.adelete_by_doc_id(doc_id)
        return {"status": "success", "message": f"Document {doc_id} deleted successfully"}
    except Exception as e:
        logging.error(f"删除文档 {doc_id} 失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Delete failed: {str(e)}")

@@ -0,0 +1,53 @@
import os
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # App
    APP_TITLE: str = "LightRAG Knowledge Base API"
    APP_VERSION: str = "1.0"
    HOST: str = "0.0.0.0"
    PORT: int = 9600

    # Data
    DATA_DIR: str = "./index_data"

    # LLM (Text) - vLLM
    LLM_BINDING: str = "vllm"  # ollama, vllm, openai
    LLM_BINDING_HOST: str = "http://192.168.6.115:8002/v1"  # vLLM OpenAI API base
    LLM_MODEL: str = "qwen2.5-7b-awq"
    LLM_KEY: str = "EMPTY"  # vLLM default key
    LLM_MODEL_MAX_ASYNC: int = 4  # vLLM handles concurrency well; can be raised

    # LLM (Vision) - vLLM
    VL_BINDING: str = "vllm"  # ollama, vllm, openai
    VL_BINDING_HOST: str = "http://192.168.6.115:8001/v1"
    VL_MODEL: str = "qwen2.5-vl-3b-awq"
    VL_KEY: str = "EMPTY"

    # Embedding - TEI
    EMBEDDING_BINDING: str = "tei"  # ollama, tei, openai
    EMBEDDING_BINDING_HOST: str = "http://192.168.6.115:8003"  # TEI usually exposes /embed
    EMBEDDING_MODEL: str = "BAAI/bge-m3"  # model id in TEI
    EMBEDDING_KEY: str = "EMPTY"

    # Rerank - TEI
    RERANK_ENABLED: bool = True
    RERANK_BINDING_HOST: str = "http://192.168.6.115:8004"
    RERANK_MODEL: str = "BAAI/bge-reranker-v2-m3"
    RERANK_KEY: str = "EMPTY"

    # RAG Config
    EMBEDDING_DIM: int = 1024
    MAX_TOKEN_SIZE: int = 8192
    MAX_RAG_INSTANCES: int = 3  # maximum number of active RAG instances
    COSINE_THRESHOLD: float = 0.4  # vector retrieval similarity threshold

    # Admin & Security
    ADMIN_TOKEN: str = "fzy"

    class Config:
        env_file = ".env"
        env_file_encoding = 'utf-8'
        extra = "ignore"  # ignore extra environment variables

settings = Settings()

@@ -0,0 +1,104 @@
import base64
import logging
import httpx
from io import BytesIO
from app.config import settings

async def vl_image_caption_func(image_data: bytes, prompt: str = "请详细描述这张图片") -> str:
    """
    Generate an image caption with the VL model.
    Supports the ollama and openai/vllm protocols.
    """
    if not settings.VL_BINDING_HOST:
        return "[Image Processing Skipped: No VL Model Configured]"

    try:
        # 1. Encode the image as Base64
        base64_image = base64.b64encode(image_data).decode('utf-8')

        async with httpx.AsyncClient(timeout=30.0) as client:
            if settings.VL_BINDING == "ollama":
                # Ollama protocol
                url = f"{settings.VL_BINDING_HOST}/api/generate"
                payload = {
                    "model": settings.VL_MODEL,
                    "prompt": prompt,
                    "images": [base64_image],
                    "stream": False
                }
                response = await client.post(url, json=payload)
                response.raise_for_status()
                result = response.json()
                description = result.get('response', '')

            else:
                # OpenAI / vLLM protocol
                url = f"{settings.VL_BINDING_HOST}/chat/completions"
                headers = {
                    "Content-Type": "application/json",
                    "Authorization": f"Bearer {settings.VL_KEY}"
                }

                payload = {
                    "model": settings.VL_MODEL,
                    "messages": [
                        {
                            "role": "user",
                            "content": [
                                {"type": "text", "text": prompt},
                                {
                                    "type": "image_url",
                                    "image_url": {
                                        "url": f"data:image/jpeg;base64,{base64_image}"
                                    }
                                }
                            ]
                        }
                    ],
                    "max_tokens": 300
                }

                response = await client.post(url, headers=headers, json=payload)
                response.raise_for_status()
                result = response.json()
                description = result['choices'][0]['message']['content']

        return f"[Image Description: {description}]"

    except Exception as e:
        logging.error(f"VL Caption failed: {str(e)}")
        return f"[Image Processing Failed: {str(e)}]"

async def process_pdf_with_images(file_bytes: bytes) -> str:
    """
    Parse a PDF: extract the text and caption the images.
    """
    import pypdf
    from PIL import Image

    text_content = ""
    pdf_file = BytesIO(file_bytes)
    reader = pypdf.PdfReader(pdf_file)

    for page_num, page in enumerate(reader.pages):
        # 1. Extract the text
        page_text = page.extract_text()
        text_content += f"--- Page {page_num + 1} Text ---\n{page_text}\n\n"

        # 2. Extract the images
        if settings.VL_BINDING_HOST:
            for count, image_file_object in enumerate(page.images):
                try:
                    # Get the image bytes
                    image_data = image_file_object.data

                    # Basic validity check on the image
                    Image.open(BytesIO(image_data)).verify()

                    # Call the VL model
                    caption = await vl_image_caption_func(image_data)
                    text_content += f"--- Page {page_num + 1} Image {count + 1} ---\n{caption}\n\n"
                except Exception as e:
                    logging.warning(f"Failed to process image {count} on page {page_num}: {e}")

    return text_content

@@ -0,0 +1,86 @@
from lightrag.prompt import PROMPTS

# Custom prompt template: renames the References heading to Chinese and tightens the Markdown formatting requirements
CUSTOM_RAG_RESPONSE_PROMPT = """---Role---

You are an expert AI assistant specializing in synthesizing information from a provided knowledge base. Your primary function is to answer user queries accurately by ONLY using the information within the provided **Context**.

---Goal---

Generate a comprehensive, well-structured answer to the user query.
The answer must integrate relevant facts from the Knowledge Graph and Document Chunks found in the **Context**.
Consider the conversation history if provided to maintain conversational flow and avoid repeating information.

---Instructions---

1. Step-by-Step Instruction:
  - Carefully determine the user's query intent in the context of the conversation history to fully understand the user's information need.
  - Scrutinize both `Knowledge Graph Data` and `Document Chunks` in the **Context**. Identify and extract all pieces of information that are directly relevant to answering the user query.
  - Weave the extracted facts into a coherent and logical response. Your own knowledge must ONLY be used to formulate fluent sentences and connect ideas, NOT to introduce any external information.
  - Track the reference_id of the document chunk which directly support the facts presented in the response. Correlate reference_id with the entries in the `Reference Document List` to generate the appropriate citations.
  - Generate a references section at the end of the response. Each reference document must directly support the facts presented in the response.
  - Do not generate anything after the reference section.

2. Content & Grounding:
  - Strictly adhere to the provided context from the **Context**; DO NOT invent, assume, or infer any information not explicitly stated.
  - If the answer cannot be found in the **Context**, state that you do not have enough information to answer. Do not attempt to guess.

3. Formatting & Language:
  - The response MUST be in the same language as the user query.
  - The response MUST utilize Markdown formatting for enhanced clarity and structure (e.g., headings, bold text, bullet points).
  - The response should be presented in {response_type}.

4. References Section Format:
  - The References section should be under heading: `### 参考文献`
  - Reference list entries should adhere to the format: `* [n] Document Title`. Do not include a caret (`^`) after opening square bracket (`[`).
  - The Document Title in the citation must retain its original language.
  - Output each citation on an individual line
  - Deduplicate references: If multiple citations point to the same Document Title, list the document title only once.
  - Provide maximum of 5 most relevant citations.
  - Do not generate footnotes section or any comment, summary, or explanation after the references.

5. Reference Section Example:
```
### 参考文献

- [1] Document Title One
- [2] Document Title Two
- [3] Document Title Three
```

6. Additional Instructions: {user_prompt}


---Context---

{context_data}
"""

def patch_prompts():
    """
    Patch LightRAG's default prompts with versions tuned for Chinese-language use.
    """
    PROMPTS["keywords_extraction"] = """---Role---
You are an expert keyword extractor, specializing in analyzing user queries for a Retrieval-Augmented Generation (RAG) system. Your purpose is to identify both high-level and low-level keywords in the user's query that will be used for effective document retrieval.

---Goal---
Given a user query, your task is to extract two distinct types of keywords:
1. **high_level_keywords**: for overarching concepts or themes, capturing user's core intent, the subject area, or the type of question being asked.
2. **low_level_keywords**: for specific entities or details, identifying the specific entities, proper nouns, technical jargon, product names, or concrete items.

---Instructions & Constraints---
1. **Output Format**: Your output MUST be a valid JSON object and nothing else. Do not include any explanatory text, markdown code fences (like ```json), or any other text before or after the JSON. It will be parsed directly by a JSON parser.
2. **Source of Truth**: All keywords must be explicitly derived from the user query, with both high-level and low-level keyword categories are required to contain content.
3. **Concise & Meaningful**: Keywords should be concise words or meaningful phrases. Prioritize multi-word phrases when they represent a single concept. For example, from "latest financial report of Apple Inc.", you should extract "latest financial report" and "Apple Inc." rather than "latest", "financial", "report", and "Apple".
4. **Handle Edge Cases**: For queries that are too simple, vague, or nonsensical (e.g., "hello", "ok", "asdfghjkl"), you must return a JSON object with empty lists for both keyword types.
5. **Language**: All extracted keywords MUST be in the SAME LANGUAGE as the user query. For example, if the query is in Chinese, keywords MUST be in Chinese. Proper nouns (e.g., personal names, place names, organization names) should be kept in their original language.

---Examples---
{examples}

---Real Data---
User Query: {query}

---Output---
Output:
"""

@@ -0,0 +1,281 @@
import json
import logging
import os
import threading
import httpx
import numpy as np
import ollama
from collections import OrderedDict
from typing import Optional, List, Union

from lightrag import LightRAG
from lightrag.utils import EmbeddingFunc
from lightrag.llm.ollama import ollama_embed
from app.config import settings

# Global RAG manager
rag_manager = None

# ==============================================================================
# LLM Functions
# ==============================================================================

async def ollama_llm_func(prompt, system_prompt=None, history_messages=[], **kwargs) -> str:
    """Ollama LLM implementation."""
    # Clean up the arguments
    kwargs.pop('model', None)
    kwargs.pop('hashing_kv', None)
    kwargs.pop('enable_cot', None)
    keyword_extraction = kwargs.pop("keyword_extraction", False)
    if keyword_extraction:
        kwargs["format"] = "json"

    stream = kwargs.pop("stream", False)
    think = kwargs.pop("think", False)

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})

    client = ollama.AsyncClient(host=settings.LLM_BINDING_HOST)
    if stream:
        async def inner():
            response = await client.chat(model=settings.LLM_MODEL, messages=messages, stream=True, think=think, **kwargs)
            async for chunk in response:
                msg = chunk.get("message", {})
                if "thinking" in msg and msg["thinking"]:
                    yield {"type": "thinking", "content": msg["thinking"]}
                if "content" in msg and msg["content"]:
                    yield {"type": "content", "content": msg["content"]}
        return inner()
    else:
        response = await client.chat(model=settings.LLM_MODEL, messages=messages, stream=False, think=False, **kwargs)
        return response["message"]["content"]

async def openai_llm_func(prompt, system_prompt=None, history_messages=[], **kwargs) -> str:
    """OpenAI-compatible LLM implementation (works with vLLM)."""
    # Clean up the arguments
    kwargs.pop('model', None)
    kwargs.pop('hashing_kv', None)
    kwargs.pop('enable_cot', None)
    keyword_extraction = kwargs.pop("keyword_extraction", False)

    # vLLM/OpenAI does not accept format="json" directly; JSON output usually has to be
    # requested via the prompt or via response_format (LightRAG's prompts already
    # include a JSON instruction)
    if keyword_extraction:
        kwargs["response_format"] = {"type": "json_object"}

    stream = kwargs.pop("stream", False)
    # The think flag is DeepSeek-specific and not part of the standard OpenAI API
    think = kwargs.pop("think", None)
    # Use Qwen3's chat_template_kwargs to enable/disable thinking mode
    kwargs["chat_template_kwargs"] = {"enable_thinking": think}

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})

    url = f"{settings.LLM_BINDING_HOST}/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {settings.LLM_KEY}"
    }

    payload = {
        "model": settings.LLM_MODEL,
        "messages": messages,
        "stream": stream,
        **kwargs
    }

    if stream:
        async def inner():
            async with httpx.AsyncClient(timeout=120.0) as client:
                async with client.stream("POST", url, headers=headers, json=payload) as response:
                    response.raise_for_status()
                    async for line in response.aiter_lines():
                        if line.startswith("data: "):
                            data_str = line[6:]
                            if data_str.strip() == "[DONE]":
                                break
                            try:
                                chunk = json.loads(data_str)
                                delta = chunk['choices'][0]['delta']
                                if 'content' in delta:
                                    yield {"type": "content", "content": delta['content']}
                                # vLLM may not return a thinking field unless the model and
                                # server are configured for it (e.g. DeepSeek-style models)
                            except (json.JSONDecodeError, KeyError, IndexError):
                                pass
        return inner()
    else:
        async with httpx.AsyncClient(timeout=120.0) as client:
            response = await client.post(url, headers=headers, json=payload)
            response.raise_for_status()
            result = response.json()
            return result['choices'][0]['message']['content']

async def llm_func(prompt, system_prompt=None, history_messages=[], **kwargs) -> str:
    """LLM dispatch function."""
    if settings.LLM_BINDING == "ollama":
        return await ollama_llm_func(prompt, system_prompt, history_messages, **kwargs)
    elif settings.LLM_BINDING in ["vllm", "openai", "custom"]:
        return await openai_llm_func(prompt, system_prompt, history_messages, **kwargs)
    else:
        raise ValueError(f"Unsupported LLM_BINDING: {settings.LLM_BINDING}")

# ==============================================================================
# Embedding Functions
# ==============================================================================

async def ollama_embedding_func(texts: list[str]) -> np.ndarray:
    return await ollama_embed(
        texts,
        embed_model=settings.EMBEDDING_MODEL,
        host=settings.EMBEDDING_BINDING_HOST
    )

async def tei_embedding_func(texts: list[str]) -> np.ndarray:
    """TEI (Text Embeddings Inference) embedding implementation."""
    url = f"{settings.EMBEDDING_BINDING_HOST}/embed"  # TEI standard endpoint
    headers = {"Content-Type": "application/json"}
    if settings.EMBEDDING_KEY and settings.EMBEDDING_KEY != "EMPTY":
        headers["Authorization"] = f"Bearer {settings.EMBEDDING_KEY}"

    payload = {
        "inputs": texts,
        "truncate": True  # TEI option that prevents errors on over-long inputs
    }

    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(url, headers=headers, json=payload)
        response.raise_for_status()
        # TEI returns: [[0.1, ...], [0.2, ...]]
        embeddings = response.json()
        return np.array(embeddings)

async def embedding_func(texts: list[str]) -> np.ndarray:
    """Embedding dispatch function."""
    if settings.EMBEDDING_BINDING == "ollama":
        return await ollama_embedding_func(texts)
    elif settings.EMBEDDING_BINDING == "tei":
        return await tei_embedding_func(texts)
    else:
        # Unsupported bindings raise immediately
        raise ValueError(f"Unsupported EMBEDDING_BINDING: {settings.EMBEDDING_BINDING}")

# ==============================================================================
# Rerank Functions
# ==============================================================================

async def tei_rerank_func(query: str, documents: list[str], top_n: int = 10) -> list[dict]:
    """TEI rerank implementation."""
    if not documents:
        return []

    url = f"{settings.RERANK_BINDING_HOST}/rerank"
    headers = {"Content-Type": "application/json"}
    if settings.RERANK_KEY and settings.RERANK_KEY != "EMPTY":
        headers["Authorization"] = f"Bearer {settings.RERANK_KEY}"

    # TEI does not support a top_n parameter; it is either ignored or applied by truncating
    payload = {
        "query": query,
        "texts": documents,
        "return_text": False
    }

    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.post(url, headers=headers, json=payload)
        response.raise_for_status()
        # TEI returns: [{"index": 0, "score": 0.99}, {"index": 1, "score": 0.5}]
        results = response.json()

        # LightRAG expects a list of dicts with index and relevance_score so it can
        # map results back to the original documents and sort them
        formatted_results = []
        for res in results:
            formatted_results.append({
                "index": res['index'],
                "relevance_score": res['score']
            })

        return formatted_results

# ==============================================================================
# RAG Manager
# ==============================================================================

class RAGManager:
    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.lock = threading.Lock()

    async def get_rag(self, user_id: str) -> LightRAG:
        """Get the LightRAG instance for a given user (LRU cache)."""
        with self.lock:
            if user_id in self.cache:
                self.cache.move_to_end(user_id)
                logging.debug(f"Cache hit for user: {user_id}")
                return self.cache[user_id]

        logging.info(f"Initializing RAG instance for user: {user_id}")
        user_data_dir = os.path.join(settings.DATA_DIR, user_id)

        if not os.path.exists(user_data_dir):
            os.makedirs(user_data_dir)

        # Prepare the constructor arguments
        rag_params = {
            "working_dir": user_data_dir,
            "llm_model_func": llm_func,
            "llm_model_name": settings.LLM_MODEL,
            "llm_model_max_async": settings.LLM_MODEL_MAX_ASYNC,  # vLLM handles concurrency well; can be raised
            "max_parallel_insert": 1,
            "embedding_func": EmbeddingFunc(
                embedding_dim=settings.EMBEDDING_DIM,
                max_token_size=settings.MAX_TOKEN_SIZE,
                func=embedding_func
            ),
            "embedding_func_max_async": 8,  # TEI handles high concurrency
            "enable_llm_cache": True,
            "cosine_threshold": settings.COSINE_THRESHOLD
        }

        # If rerank is enabled, inject rerank_model_func
        if settings.RERANK_ENABLED:
            logging.info("Rerank enabled for RAG instance")
            rag_params["rerank_model_func"] = tei_rerank_func

        rag = LightRAG(**rag_params)

        await rag.initialize_storages()

        with self.lock:
            if user_id in self.cache:
                self.cache.move_to_end(user_id)
                return self.cache[user_id]

            self.cache[user_id] = rag
            self.cache.move_to_end(user_id)

            while len(self.cache) > self.capacity:
                oldest_user, _ = self.cache.popitem(last=False)
                logging.info(f"Evicting RAG instance for user: {oldest_user}")

        return rag

def initialize_rag_manager():
    global rag_manager
    rag_manager = RAGManager(capacity=settings.MAX_RAG_INSTANCES)
    logging.info(f"RAG Manager initialized with capacity: {settings.MAX_RAG_INSTANCES}")
    return rag_manager

def get_rag_manager() -> RAGManager:
    if rag_manager is None:
        return initialize_rag_manager()
    return rag_manager

@@ -0,0 +1,43 @@
import logging
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from contextlib import asynccontextmanager
from app.config import settings
from app.core.rag import initialize_rag_manager
from app.core.prompts import patch_prompts
from app.api.routes import router
import os

# Configure logging
logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO)

@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifespan management."""
    # 1. Patch Prompts
    patch_prompts()

    # 2. Init RAG Manager
    initialize_rag_manager()

    yield

app = FastAPI(
    title=settings.APP_TITLE,
    version=settings.APP_VERSION,
    lifespan=lifespan
)

# Make sure the static directory exists
static_dir = os.path.join(os.path.dirname(__file__), "static")
if not os.path.exists(static_dir):
    os.makedirs(static_dir)

# Mount static files
app.mount("/static", StaticFiles(directory=static_dir), name="static")

app.include_router(router)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run("app.main:app", host=settings.HOST, port=settings.PORT, reload=True)

@@ -0,0 +1,505 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>LightRAG 管理后台</title>
  <!-- Vue 3 -->
  <script src="https://cdn.jsdelivr.net/npm/vue@3/dist/vue.global.js"></script>
  <!-- Element Plus -->
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/element-plus/dist/index.css" />
  <script src="https://cdn.jsdelivr.net/npm/element-plus"></script>
  <!-- Element Plus Icons -->
  <script src="https://cdn.jsdelivr.net/npm/@element-plus/icons-vue"></script>
  <!-- Axios -->
  <script src="https://cdn.jsdelivr.net/npm/axios/dist/axios.min.js"></script>
  <!-- Marked -->
  <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
  <style>
    body { margin: 0; padding: 0; font-family: 'Helvetica Neue', Helvetica, 'PingFang SC', 'Hiragino Sans GB', 'Microsoft YaHei', '微软雅黑', Arial, sans-serif; background-color: #f5f7fa; }
    .container { max-width: 1200px; margin: 0 auto; padding: 20px; }
    .header { background: #fff; padding: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.1); margin-bottom: 20px; display: flex; justify-content: space-between; align-items: center; }
    .header h1 { margin: 0; font-size: 24px; color: #409EFF; }
    .card { background: #fff; border-radius: 4px; padding: 20px; margin-bottom: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.05); }
    .chat-box { height: 500px; overflow-y: auto; border: 1px solid #EBEEF5; padding: 20px; border-radius: 4px; margin-bottom: 20px; background: #fafafa; }
    .message { margin-bottom: 15px; display: flex; }
    .message.user { justify-content: flex-end; }
    .message.assistant { justify-content: flex-start; }
    .message-content { max-width: 80%; padding: 10px 15px; border-radius: 4px; font-size: 14px; line-height: 1.6; }
    .message.user .message-content { background: #409EFF; color: #fff; }
    .message.assistant .message-content { background: #fff; border: 1px solid #EBEEF5; }
    .markdown-body { font-size: 14px; }
    .markdown-body pre { background: #f6f8fa; padding: 10px; border-radius: 4px; overflow-x: auto; }
  </style>
</head>
<body>
<div id="app">
  <div class="header">
    <h1 @click="goHome" style="cursor: pointer;">LightRAG Admin</h1>
    <div v-if="currentPage === 'tenant'">
      <el-tag type="success" size="large">当前租户: {{ currentTenantId }}</el-tag>
      <el-button v-if="isAdmin" type="primary" link @click="goHome">返回首页</el-button>
    </div>
  </div>

  <div class="container">
    <!-- Home page: tenant list -->
    <div v-if="currentPage === 'home'">
      <el-row :gutter="20">
        <el-col :span="24">
          <div class="card">
            <div style="display: flex; justify-content: space-between; margin-bottom: 20px;">
              <h3>租户列表</h3>
              <el-button type="primary" @click="refreshTenants">刷新</el-button>
            </div>
            <el-table :data="tenants" stripe style="width: 100%">
              <el-table-column prop="id" label="租户ID"></el-table-column>
              <el-table-column label="操作" width="180">
                <template #default="scope">
                  <el-button type="primary" size="small" @click="enterTenant(scope.row)">管理文档</el-button>
                </template>
              </el-table-column>
            </el-table>
          </div>
        </el-col>
      </el-row>
    </div>

    <!-- Tenant page: document management and Q&A -->
    <div v-if="currentPage === 'tenant'">
      <el-tabs v-model="activeTab" class="card">
        <!-- Document management tab -->
        <el-tab-pane label="文档管理" name="docs">
          <div style="margin-bottom: 20px;">
            <el-button type="primary" @click="showImportDialog = true">导入文档/QA</el-button>
            <el-button @click="fetchDocuments">刷新列表</el-button>
          </div>
          <el-table :data="documents" v-loading="loadingDocs" style="width: 100%">
            <el-table-column prop="id" label="文档ID" width="220"></el-table-column>
            <el-table-column prop="summary" label="摘要" show-overflow-tooltip></el-table-column>
            <el-table-column prop="length" label="长度" width="100"></el-table-column>
            <el-table-column prop="created_at" label="创建时间" width="180">
              <template #default="scope">
                {{ formatDate(scope.row.created_at) }}
              </template>
            </el-table-column>
            <el-table-column label="操作" width="180">
              <template #default="scope">
                <el-button size="small" @click="viewDocument(scope.row.id)">详情</el-button>
                <el-popconfirm title="确定删除吗?" @confirm="deleteDocument(scope.row.id)">
                  <template #reference>
                    <el-button size="small" type="danger">删除</el-button>
                  </template>
                </el-popconfirm>
              </template>
            </el-table-column>
          </el-table>
        </el-tab-pane>

        <!-- Knowledge retrieval tab -->
        <el-tab-pane label="知识检索" name="chat">
          <div class="chat-box" ref="chatBox">
            <div v-for="(msg, index) in chatHistory" :key="index" :class="['message', msg.role]">
              <div class="message-content markdown-body" v-html="renderMarkdown(msg.content, msg.thinking, msg.retrievalStatus)"></div>
            </div>
          </div>
          <div style="display: flex; gap: 10px; align-items: center;">
            <el-checkbox v-model="onlyRag" label="仅使用知识库" border></el-checkbox>
            <el-input v-model="queryInput" placeholder="请输入问题..." @keyup.enter="sendQuery" style="flex: 1;"></el-input>
            <el-button type="primary" :loading="chatLoading" @click="sendQuery">发送</el-button>
          </div>
        </el-tab-pane>
      </el-tabs>
    </div>
  </div>

  <!-- Import dialog -->
  <el-dialog v-model="showImportDialog" title="导入知识" width="600px" :close-on-click-modal="!importing" :close-on-press-escape="!importing" :show-close="!importing">
    <div v-loading="importing" element-loading-text="正在导入中,请稍候...(大文件可能需要较长时间)">
      <el-tabs v-model="importType">
        <el-tab-pane label="文件上传" name="file">
          <el-upload
            class="upload-demo"
            drag
            action="#"
            :http-request="uploadFile"
            :show-file-list="false"
            :disabled="importing"
          >
            <el-icon class="el-icon--upload"><upload-filled /></el-icon>
            <div class="el-upload__text">拖拽文件到此处或 <em>点击上传</em></div>
            <template #tip>
              <div class="el-upload__tip">支持 .txt, .md, .pdf</div>
            </template>
          </el-upload>
        </el-tab-pane>
        <el-tab-pane label="纯文本" name="text">
          <el-input v-model="importText" type="textarea" rows="10" placeholder="请输入文本内容..." :disabled="importing"></el-input>
          <div style="margin-top: 10px; text-align: right;">
            <el-button type="primary" @click="uploadText" :loading="importing">提交</el-button>
          </div>
        </el-tab-pane>
        <el-tab-pane label="QA 问答" name="qa">
          <div v-for="(qa, index) in qaList" :key="index" style="margin-bottom: 15px; border-bottom: 1px solid #eee; padding-bottom: 15px;">
            <el-input v-model="qa.question" placeholder="问题 (Question)" style="margin-bottom: 5px;" :disabled="importing"></el-input>
            <el-input v-model="qa.answer" type="textarea" placeholder="回答 (Answer)" :disabled="importing"></el-input>
            <el-button type="danger" link size="small" @click="removeQA(index)" v-if="qaList.length > 1" :disabled="importing">删除</el-button>
          </div>
          <el-button type="primary" plain style="width: 100%; margin-bottom: 10px;" @click="addQA" :disabled="importing">添加一组 QA</el-button>
          <div style="text-align: right;">
            <el-button type="primary" @click="uploadQA" :loading="importing">提交所有 QA</el-button>
          </div>
        </el-tab-pane>
      </el-tabs>
    </div>
  </el-dialog>

  <!-- Document detail dialog -->
  <el-dialog v-model="showDocDialog" title="文档详情" width="800px">
    <div v-loading="docDetailLoading">
      <pre style="white-space: pre-wrap; background: #f5f7fa; padding: 15px; border-radius: 4px; max-height: 500px; overflow-y: auto;">{{ currentDocContent }}</pre>
    </div>
    <template #footer>
      <span class="dialog-footer">
        <el-popconfirm title="确定删除此文档吗?" @confirm="deleteCurrentDoc">
          <template #reference>
            <el-button type="danger">删除文档</el-button>
          </template>
        </el-popconfirm>
        <el-button @click="showDocDialog = false">关闭</el-button>
      </span>
    </template>
  </el-dialog>
</div>

<script>
  const { createApp, ref, onMounted, computed } = Vue;
  const { ElMessage } = ElementPlus;

  const app = createApp({
    setup() {
      // State
      const currentPage = ref('home');
      const tenants = ref([]);
      const currentTenantId = ref('');
      const currentToken = ref('');

      // Document management
      const activeTab = ref('docs');
      const documents = ref([]);
      const loadingDocs = ref(false);
      const showImportDialog = ref(false);
      const importing = ref(false);
      const importType = ref('file');
      const importText = ref('');
      const qaList = ref([{ question: '', answer: '' }]);

      // Document detail
      const showDocDialog = ref(false);
      const currentDocId = ref('');
      const currentDocContent = ref('');
      const docDetailLoading = ref(false);

      // Chat
      const queryInput = ref('');
      const onlyRag = ref(false);
      const chatHistory = ref([]);
      const chatLoading = ref(false);
      const chatBox = ref(null);

      // API base
      const api = axios.create({ baseURL: '/' });

      // Parse URL parameters
      const urlParams = new URLSearchParams(window.location.search);
      const tokenParam = urlParams.get('token');
      const tenantParam = urlParams.get('tenant_id');
      const isAdmin = ref(false);

      // Initialization
      onMounted(() => {
        if (tenantParam && tokenParam) {
          // Tenant mode
          if (!tokenParam.startsWith(tenantParam + '_')) {
            ElMessage.error('Token 不匹配');
            return;
          }
          enterTenant({ id: tenantParam, token: tokenParam }, false);
        } else if (tokenParam === 'fzy') {
          // Admin mode
          currentPage.value = 'home';
          isAdmin.value = true;
          refreshTenants();
        } else {
          ElMessage.warning('请在 URL 中提供有效的 token');
        }
      });

      // Methods
      const goHome = () => {
        window.location.href = '?token=fzy';
      };

      const refreshTenants = async () => {
        try {
          const res = await api.get('/admin/tenants', { params: { token: 'fzy' } });
          tenants.value = res.data.tenants;
        } catch (e) {
          ElMessage.error('获取租户列表失败');
        }
      };

      const enterTenant = (tenant, updateUrl = true) => {
        currentTenantId.value = tenant.id;
        currentToken.value = tenant.token;
        currentPage.value = 'tenant';

        // Set the tenant header
        api.defaults.headers.common['X-Tenant-ID'] = tenant.id;

        if (updateUrl) {
          const newUrl = `${window.location.pathname}?page=tenant&tenant_id=${tenant.id}&token=${tenant.token}`;
          window.history.pushState({}, '', newUrl);
        }

        fetchDocuments();
      };

      const fetchDocuments = async () => {
        loadingDocs.value = true;
        try {
          const res = await api.get('/documents');
          documents.value = res.data.docs;
        } catch (e) {
          ElMessage.error('获取文档失败');
        } finally {
          loadingDocs.value = false;
        }
      };

      const deleteDocument = async (id) => {
        try {
          await api.delete(`/docs/${id}`);
          ElMessage.success('删除成功');
          fetchDocuments();
          showDocDialog.value = false;
        } catch (e) {
          ElMessage.error('删除失败');
        }
      };

      const viewDocument = async (id) => {
        currentDocId.value = id;
        showDocDialog.value = true;
        docDetailLoading.value = true;
        try {
          const res = await api.get(`/documents/${id}`);
          currentDocContent.value = res.data.content;
        } catch (e) {
          currentDocContent.value = '加载失败';
        } finally {
          docDetailLoading.value = false;
        }
      };

      const deleteCurrentDoc = () => {
        deleteDocument(currentDocId.value);
      };

      // Import logic
      const uploadFile = async (param) => {
        importing.value = true;
        const formData = new FormData();
        formData.append('file', param.file);
        try {
          await api.post('/ingest/file', formData);
          ElMessage.success('上传成功');
          showImportDialog.value = false;
          fetchDocuments();
        } catch (e) {
          ElMessage.error('上传失败');
        } finally {
          importing.value = false;
        }
      };

      const uploadText = async () => {
        if (!importText.value) return;
        importing.value = true;
        try {
          await api.post('/ingest/text', { text: importText.value });
          ElMessage.success('导入成功');
          showImportDialog.value = false;
          importText.value = '';
          fetchDocuments();
        } catch (e) {
          ElMessage.error('导入失败');
        } finally {
          importing.value = false;
        }
      };

      const addQA = () => qaList.value.push({ question: '', answer: '' });
      const removeQA = (idx) => qaList.value.splice(idx, 1);

      const uploadQA = async () => {
        const validQA = qaList.value.filter(q => q.question && q.answer);
        if (validQA.length === 0) return ElMessage.warning('请填写 QA');

        importing.value = true;
        try {
          await api.post('/ingest/batch_qa', validQA);
          ElMessage.success(`成功导入 ${validQA.length} 条 QA`);
          showImportDialog.value = false;
          qaList.value = [{ question: '', answer: '' }];
          fetchDocuments();
        } catch (e) {
          ElMessage.error('导入失败');
        } finally {
          importing.value = false;
        }
      };

      // Chat logic
      const sendQuery = async () => {
        if (!queryInput.value.trim()) return;

        const q = queryInput.value;
        chatHistory.value.push({ role: 'user', content: q });
        queryInput.value = '';
        chatLoading.value = true;

        try {
          // Use streaming
          const response = await fetch('/query', {
            method: 'POST',
            headers: {
              'Content-Type': 'application/json',
              'X-Tenant-ID': currentTenantId.value
            },
            body: JSON.stringify({
              query: q,
              stream: true,
              mode: 'mix',
              think: true,
              only_rag: onlyRag.value
            })
          });

          const reader = response.body.getReader();
          const decoder = new TextDecoder();

          // Create the message object and push it into the array
          chatHistory.value.push({ role: 'assistant', content: '', thinking: '', retrievalStatus: null });
          // Grab the reactive proxy so updates trigger re-rendering
          const assistantMsg = chatHistory.value[chatHistory.value.length - 1];

          let buffer = '';

          while (true) {
            const { done, value } = await reader.read();
            if (done) break;

            buffer += decoder.decode(value, { stream: true });
            const blocks = buffer.split('\n\n');
            buffer = blocks.pop(); // keep the last, possibly incomplete, block

            for (const block of blocks) {
              if (!block.trim() || block.trim() === 'data: [DONE]') continue;

              const lines = block.split('\n');
              for (const line of lines) {
                if (line.startsWith('data: ')) {
                  try {
                    const jsonStr = line.slice(6);
                    const chunk = JSON.parse(jsonStr);

                    // Parse the OpenAI-compatible format
                    if (chunk.choices && chunk.choices[0].delta) {
                      const delta = chunk.choices[0].delta;

                      // Handle x_rag_status
                      if (delta.x_rag_status) {
                        assistantMsg.retrievalStatus = delta.x_rag_status;
                      }

                      // Handle the thinking stream
                      if (delta.reasoning_content) {
                        assistantMsg.thinking += delta.reasoning_content;
                      }

                      // Handle the answer body
                      if (delta.content) {
                        assistantMsg.content += delta.content;
                      }
                    }

                    // Scroll to the bottom
                    if (chatBox.value) chatBox.value.scrollTop = chatBox.value.scrollHeight;

                  } catch (e) {
                    console.error('JSON parse error:', e);
                  }
                }
              }
            }
          }
        } catch (e) {
          chatHistory.value.push({ role: 'assistant', content: '❌ 请求出错: ' + e.message });
        } finally {
          chatLoading.value = false;
        }
      };

      const renderMarkdown = (content, thinking, retrievalStatus) => {
        let html = '';

        if (retrievalStatus) {
          const color = retrievalStatus === 'hit' ? '#67c23a' : '#e6a23c';
          const text = retrievalStatus === 'hit' ? '已检索到相关知识' : '未检索到相关知识';
          const icon = retrievalStatus === 'hit' ? '✔️' : '⚠️';
          html += `<div style="margin-bottom: 8px; font-size: 12px; color: ${color}; font-weight: bold;">${icon} ${text}</div>`;
        }

        if (thinking) {
          html += `<div class="thinking-box" style="background: #f0f9eb; padding: 10px; border-radius: 4px; margin-bottom: 10px; border: 1px solid #e1f3d8; color: #67c23a; white-space: pre-wrap; font-family: monospace; font-size: 12px;">${thinking}</div>`;
|
||||
}
|
||||
html += marked.parse(content || '');
|
||||
return html;
|
||||
};
|
||||
|
||||
const formatDate = (dateStr) => {
|
||||
if (!dateStr) return '-';
|
||||
try {
|
||||
const date = new Date(dateStr);
|
||||
return isNaN(date.getTime()) ? dateStr : date.toLocaleString();
|
||||
} catch (e) {
|
||||
return dateStr;
|
||||
}
|
||||
};
|
||||
|
||||
return {
|
||||
currentPage, tenants, currentTenantId,
|
||||
activeTab, documents, loadingDocs,
|
||||
showImportDialog, importType, importText, qaList, importing,
|
||||
showDocDialog, currentDocId, currentDocContent, docDetailLoading,
|
||||
queryInput, chatHistory, chatLoading, chatBox,
|
||||
goHome, refreshTenants, enterTenant, fetchDocuments,
|
||||
viewDocument, deleteDocument, deleteCurrentDoc,
|
||||
uploadFile, uploadText, addQA, removeQA, uploadQA,
|
||||
sendQuery, renderMarkdown, formatDate, isAdmin, onlyRag
|
||||
};
|
||||
}
|
||||
});
|
||||
|
||||
// 注册图标
|
||||
for (const [key, component] of Object.entries(ElementPlusIconsVue)) {
|
||||
app.component(key, component)
|
||||
}
|
||||
app.use(ElementPlus);
|
||||
app.mount('#app');
|
||||
</script>
|
||||
<!-- 引入 Element Plus 图标 -->
|
||||
<script src="https://unpkg.com/@element-plus/icons-vue"></script>
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,39 @@
|
|||
#!/bin/bash

# Variables
IMAGE_NAME="lightrag-service"
CONTAINER_NAME="lightrag-container"
PORT=9600
DATA_DIR="$(pwd)/index_data"

# Make sure the data directory exists
mkdir -p "$DATA_DIR"

# 1. Build the image
echo "正在构建 Docker 镜像..."
docker build -t $IMAGE_NAME .

# 2. Stop and remove the old container if present
if [ "$(docker ps -aq -f name=$CONTAINER_NAME)" ]; then
    echo "停止并删除旧容器..."
    docker stop $CONTAINER_NAME
    docker rm $CONTAINER_NAME
fi

# 3. Start the new container
echo "启动服务..."
# Notes:
# 1. --env-file injects the environment variables from .env
# 2. -v mounts the data volume so the index persists across restarts
# 3. --add-host lets the container reach Ollama/vLLM on the host via host.docker.internal
docker run -d \
    --name $CONTAINER_NAME \
    -p $PORT:$PORT \
    --env-file .env \
    -v "$DATA_DIR":/app/index_data \
    --add-host=host.docker.internal:host-gateway \
    --restart unless-stopped \
    $IMAGE_NAME

echo "服务已启动: http://localhost:$PORT"
echo "查看日志: docker logs -f $CONTAINER_NAME"
|
|
@ -0,0 +1,171 @@
|
|||
# LightRAG in Plain Language: A Guide to Next-Generation Knowledge-Graph-Augmented Retrieval

> This document is meant to help developers understand the evolution of RAG from the ground up and master LightRAG's core principles and engineering practice.

## 1. Fundamentals: Starting from RAG

### 1.1 What is RAG?

- **Core pain points**: LLM hallucinations and stale knowledge.
- **Solution**: Retrieval-Augmented Generation.
- **Analogy**: being allowed to "open the book" during an exam instead of relying on rote memorization.

### 1.2 Why LightRAG?

- **Limitations of traditional RAG**:
  - **Fragmentation**: it sees the trees but not the forest, so it struggles with high-level summary questions (e.g. "What is the main point of this article?").
  - **Weak associations**: it cannot effectively handle logical links that span paragraphs or documents.
- **The rise of Graph RAG**: introduce a knowledge graph so the model understands relationships between entities.
- **Where LightRAG fits**:
  - **Fast and light**: compared with Microsoft GraphRAG, whose indexing can take hours, LightRAG is far more lightweight and efficient.
  - **Dual-stream mechanism**: it keeps both "vector retrieval" (immediacy) and "graph retrieval" (structure).

### 1.3 Side-by-side: LightRAG vs Other Frameworks

| Feature | Naive RAG (LangChain, etc.) | Microsoft GraphRAG | LightRAG |
| :--- | :--- | :--- | :--- |
| **Core mechanism** | Pure vector similarity | Community clustering + graph | Dual-level retrieval (graph + vector) |
| **Indexing speed** | Very fast (seconds) | Very slow (hours) | Fast (minutes) |
| **Query cost** | Low | Very high (heavy token usage) | Medium |
| **Best for** | Simple factual Q&A | Macro analysis of complex corpora | General use that needs both detail and overview |

---

## 2. Core Principles: How Does LightRAG Work?

### 2.1 Architecture Overview

The heart of LightRAG is a **dual-level index** that can see the "details" while keeping the "big picture" in view:

```mermaid
graph TD
    UserQuery[User query] --> KW[Keyword extraction]
    KW -->|High-Level Keywords| Global[Global retrieval: relations and macro topics]
    KW -->|Low-Level Keywords| Local[Local retrieval: entities and specific details]
    UserQuery -->|Vector Search| Naive[Vector retrieval: text chunks]

    Global --> Context[Context fusion]
    Local --> Context
    Naive --> Context

    Context --> LLM[LLM answer generation]
```

- **Low-level**: concrete entities and concepts, e.g. "iPhone 16", "A18 chip".
- **High-level**: relations between entities and broader topics, e.g. "Apple's product line strategy", "the impact of high-performance chips on battery life" (an example of the extracted keyword structure follows this list).
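
As a rough illustration of the dual-level split, the keyword-extraction step for a question such as "What impact does a high-performance chip have on iPhone 16 battery life?" would produce something shaped like the snippet below. The field names mirror the high-level/low-level convention described above; the exact JSON schema depends on the LightRAG prompt version in use.

```python
# Illustrative only: the shape of a dual-level keyword extraction result.
# Field names mirror the high-level / low-level split described above;
# the exact schema depends on the LightRAG prompt version in use.
keywords = {
    "high_level_keywords": ["chip performance vs battery life", "product design trade-offs"],
    "low_level_keywords": ["iPhone 16", "A18 chip", "battery life"],
}
```
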
### 2.2 Data Processing Pipeline

Turning a document into a knowledge base involves the following steps (a chunking sketch follows this list):

1. **Chunking**:
   - Split long documents into fixed-size text blocks.
   - **Default strategy**: LightRAG splits with a window of **1200 tokens** and an overlap of **100 tokens** to keep context coherent across boundaries.

2. **Extraction**:
   - Use the LLM to analyze each chunk in parallel.
   - **Extraction targets**:
     - **Entities**: people, places, proper nouns.
     - **Relations**: entity A <-> relation description <-> entity B.
   - *Prompt tweak*: for Chinese deployments we removed the `{language}` variable dependency from the original prompt and force the LLM to output keywords in the same language as the query, which avoids the "ask in Chinese, search in English" mismatch.

3. **Storage**:
   - **Vector store (VectorDB)**: stores embeddings of the text chunks, used for similarity retrieval.
   - **Graph store (GraphDB)**: stores nodes (entities) and edges (relations), forming the knowledge network.
   - **KV store**: stores the raw text, used to back-fill context when the final answer is generated.
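
The 1200/100 default amounts to a sliding token window. The sketch below illustrates the idea using `tiktoken` as a stand-in tokenizer; it is not LightRAG's internal splitter, just a minimal reference implementation of the same windowing scheme.

```python
# Minimal sketch of sliding-window chunking with overlap (1200 / 100).
# tiktoken is used as a stand-in tokenizer; LightRAG's own splitter and
# tokenizer may differ.
import tiktoken


def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 100) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(enc.decode(window))
        if start + chunk_size >= len(tokens):
            break  # the last window already covers the tail of the document
    return chunks
```
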
### 2.3 Retrieval Modes Explained

LightRAG supports the following six retrieval modes (see the definitions in `base.py`); a minimal usage sketch follows the list:

1. **Local mode**
   - **How it works**: uses the low-level keywords (entities) to find 1-hop neighbors in the graph.
   - **Use case**: entity-level detail, e.g. "What is Zhang San's job title?"
2. **Global mode**
   - **How it works**: uses the high-level keywords (relations) to traverse global relation paths in the graph.
   - **Use case**: macro-level relations, e.g. "What does this company's management structure look like?"
3. **Hybrid mode**
   - **How it works**: Local + Global combined, covering both details and the big picture.
   - **Use case**: general-purpose queries; the most balanced results.
4. **Naive mode**
   - **How it works**: pure vector retrieval; no keyword extraction, the query embedding is matched directly against the chunk index.
   - **Use case**: simple fact matching, or when the graph has not been built yet.
5. **Mix mode** [default]
   - **How it works**: Hybrid (graph) plus Naive (vector), fused together.
   - **Trait**: the most complete context coverage, and also the most token-hungry.
6. **Bypass mode**
   - **How it works**: skips RAG retrieval entirely and sends the query straight to the LLM.
   - **Use case**: small talk, greetings, or general questions that do not need the knowledge base.
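
When calling the library directly rather than through this service, the mode is picked via `QueryParam`. The snippet below is a simplified sketch: the constructor normally also needs `llm_model_func` / `embedding_func`, and the exact initialization steps (e.g. async storage setup) depend on the installed lightrag-hku version.

```python
# Simplified sketch of choosing a retrieval mode with lightrag-hku.
# The LightRAG constructor is abbreviated; a real setup also wires in
# llm_model_func / embedding_func and any required storage initialization.
from lightrag import LightRAG, QueryParam

rag = LightRAG(working_dir="./index_data")

rag.insert("Zhang San is the CTO of ACME Corp and owns the platform architecture.")

# mode can be "local", "global", "hybrid", "naive", "mix" or "bypass"
answer = rag.query("What is Zhang San's job title?", param=QueryParam(mode="local"))
print(answer)
```
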
---

## 3. Engineering: Code and Details

### 3.1 Core Class Walkthrough (pseudocode)

```python
class LightRAG:
    def insert(self, text):
        # 1. Split the text into chunks (default 1200 tokens)
        chunks = split_text(text, chunk_size=1200)

        # 2. Call the LLM in parallel to extract (entity, relation) pairs
        #    An LLM such as DeepSeek performs the entity extraction here
        entities, relations = llm_extract(chunks)

        # 3. Update the graph and the vector DB
        graph.add_nodes(entities)
        graph.add_edges(relations)
        vector_db.add(chunks)

    def query(self, question, mode="hybrid"):
        # 1. Pre-processing
        #    - bypass: return llm(question) directly
        #    - naive: run vector_search(question) directly

        # 2. Keyword extraction (local / global / hybrid / mix)
        #    Ask the LLM to analyze the query and extract keywords:
        #    extract_keywords(question) -> {high_level, low_level}

        # 3. Pick the retrieval strategy based on mode
        #    - local: graph.get_neighbors(low_level_keywords)
        #    - global: graph.traverse(high_level_keywords)
        #    - hybrid: local + global
        #    - mix: knowledge_graph + vector_retrieval

        # 4. Assemble the context and generate the answer
        #    Concatenate everything retrieved into the system prompt
        #    and call the LLM for the final response
        pass
```

### 3.2 Storage Files Explained (index_data)

Inside the `index_data` directory you will find the following core files, which together make up LightRAG's "brain" (a quick inspection snippet follows the table):

| File | Type | Purpose |
| :--- | :--- | :--- |
| `kv_store_text_chunks.json` | KV store | **Raw chunks.** The original content of the split text blocks. |
| `kv_store_full_docs.json` | KV store | **Full documents.** The original uploaded document content. |
| `graph_chunk_entity_relation.graphml` | Graph | **Knowledge graph.** The entity (node) and relation (edge) topology, stored in NetworkX GraphML format. |
| `vdb_entities.json` | Vector | **Entity vectors.** Used to find relevant entities by vector similarity. |
| `vdb_chunks.json` | Vector | **Chunk vectors.** Used for text similarity retrieval in naive mode. |
| `vdb_relationships.json` | Vector | **Relation vectors.** Used to find relevant relation descriptions. |
| `lightrag_cache.json` | Cache | **LLM cache.** Stores past LLM responses so identical requests do not hit the API again (a real money saver). |
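
For a quick sanity check of what has actually been indexed, the JSON-file backends can be read directly. This assumes the default file-based storage under `DATA_DIR` (`./index_data`); the record layout noted in the comment is typical but may vary between versions.

```python
# Peek at the indexed chunks, assuming the default JSON-file storage backend
# writes kv_store_text_chunks.json as {chunk_id: record}, where each record
# typically carries a "content" field.
import json
from pathlib import Path

data_dir = Path("./index_data")
chunks = json.loads((data_dir / "kv_store_text_chunks.json").read_text(encoding="utf-8"))

print(f"{len(chunks)} chunks indexed")
for chunk_id, record in list(chunks.items())[:3]:
    print(chunk_id, str(record.get("content", ""))[:80])
```
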
### 3.3 Performance Optimization Strategies

- **Async concurrency**: LightRAG uses `async/await` heavily, especially during entity extraction, where LLM requests run concurrently and greatly shorten indexing time.
- **Caching**: `lightrag_cache.json` implements request-level caching. If the prompt and input have not changed, the stored result is returned directly: zero latency, zero cost.
- **Incremental updates**: RAG does not support "editing" a document in place. Our strategy is `Delete-then-Insert` (remove first, then re-add), which keeps the graph structure consistent across updates (a hedged sketch follows this list).
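
The Delete-then-Insert flow can be outlined as below. `adelete_by_doc_id` and `ainsert` are assumed from recent lightrag-hku releases; deletion APIs have changed across versions, so verify against the installed release before relying on this.

```python
# Outline of the Delete-then-Insert update strategy described above.
# adelete_by_doc_id / ainsert are assumed from recent lightrag-hku releases;
# check the installed version before relying on this exact API.
async def update_document(rag, doc_id: str, new_text: str) -> None:
    await rag.adelete_by_doc_id(doc_id)  # drop the old chunks, entities and relations
    await rag.ainsert(new_text)          # re-index the new content
```
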
---

## 4. Practical Guide: Things to Watch

- **LLM choice**: LightRAG depends heavily on the LLM's instruction-following ability (used for entity extraction). Strong models such as **DeepSeek-V3** or **Qwen-2.5** are recommended; small models (<7B) may fail at extraction.
- **Cost control**:
  - **Indexing cost**: relatively high, because the full text goes through fine-grained entity extraction.
  - **Query cost**: medium. In hybrid mode the context can get long, so keep an eye on the `top_k` parameter (see the sketch after this list).
- **Deployment advice**: Docker is recommended to smooth over environment differences. The API service wraps writes in a queue, but the underlying write path (indexing) holds a single lock, so **do not mount the same directory from multiple instances and write concurrently**.
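
A rough illustration of capping retrieval size to rein in query cost; it assumes `top_k` is exposed by `QueryParam` in the installed lightrag-hku version and reuses the `rag` instance from the earlier sketch.

```python
# Cap the number of retrieved candidates to keep the context (and cost) down.
# Assumes QueryParam exposes top_k in this lightrag-hku version and that
# `rag` is the instance created in the earlier sketch.
from lightrag import QueryParam

param = QueryParam(mode="hybrid", top_k=20)  # fewer candidates -> shorter context -> cheaper calls
answer = rag.query("What does this company's management structure look like?", param=param)
print(answer)
```
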
|
|
@ -0,0 +1,11 @@
|
|||
lightrag-hku==1.4.9.10
fastapi
uvicorn
python-multipart
pypdf
ollama
numpy
httpx
pydantic-settings
Pillow
pydantic