Document `Doc`

Every `Doc` passed to `upsert()` must:

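Have an `id`. If a Document with the same `id` already exists, it is replaced.
Provide data that conforms to the Collection Schema:
1. Scalar fields: key-value pairs in `fields`, keyed by scalar field name.
2. Vector embeddings: key-value pairs in `vectors`, keyed by vector name.
Nullable scalar fields may be omitted.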
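The rules above can be mirrored client-side as a pre-flight check before calling `upsert()`. The sketch below is a minimal illustration, not zvec code: `SCHEMA`, `check_doc`, and the nullable field `note` are hypothetical, the schema is a plain dict rather than zvec's schema API, and it only checks dense vector dimensions (sparse vectors have no fixed dimension). It also assumes that non-nullable scalar fields may not be omitted.

```python
# Hypothetical schema description, for illustration only: zvec defines
# Collection schemas through its own API, which this sketch does not model.
SCHEMA = {
    "fields": {
        "text": {"nullable": False},
        "note": {"nullable": True},  # hypothetical nullable field
    },
    "vectors": {
        "text_embedding": {"dim": 4},
    },
}


def check_doc(doc_id: str, vectors: dict, fields: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the document looks valid."""
    problems = []
    if not doc_id:
        problems.append("missing id")
    # Each key in `fields` must name a scalar field in the schema.
    for name in fields:
        if name not in schema["fields"]:
            problems.append(f"unknown field: {name}")
    # Assumption: only nullable scalar fields may be omitted.
    for name, spec in schema["fields"].items():
        if not spec["nullable"] and name not in fields:
            problems.append(f"missing field: {name}")
    # Each key in `vectors` must name a vector; dense vectors must match its dimension.
    for name, value in vectors.items():
        spec = schema["vectors"].get(name)
        if spec is None:
            problems.append(f"unknown vector: {name}")
        elif len(value) != spec["dim"]:
            problems.append(f"{name}: expected {spec['dim']} dims, got {len(value)}")
    return problems


print(check_doc("text_1", {"text_embedding": [0.1, 0.2, 0.3, 0.4]},
                {"text": "This is a sample text."}, SCHEMA))  # []
print(check_doc("text_1", {"text_embedding": [0.1, 0.2, 0.3]},
                {"txt": "typo"}, SCHEMA))
# ['unknown field: txt', 'missing field: text', 'text_embedding: expected 4 dims, got 3']
```

The nullable `note` field is never reported as missing, which matches the rule that nullable scalar fields may be omitted.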

Upsert a single Document

Suppose you already have a Collection with the following Schema:

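One scalar field: `text` (string)
One dense vector embedding: `text_embedding` (4-dimensional FP32 vector)
The 4-dimensional vector is for demonstration only; real embeddings usually have many more dimensions.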

You have opened the Collection and have a `collection` object ready.

Upsert a Document as follows:


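```python
import zvec

# Create the Document
doc = zvec.Doc(
    id="text_1",  # ← unique identifier; upserting an existing id replaces that Document
    vectors={
        # ← key must match the vector name; value is a list of floats
        #   whose length equals the dimension (4)
        "text_embedding": [0.1, 0.2, 0.3, 0.4],
    },
    fields={
        "text": "This is a sample text.",  # ← key must match the scalar field name
    },
)

# Upsert the Document
result = collection.upsert(doc)
print(result)  # {"code": 0} means success
```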

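The `upsert()` method first validates the Document:

Misuse, such as an unknown field or a vector dimension mismatch, raises an error.
If validation passes, the method performs the upsert and returns a Status object indicating success or failure (for example, insufficient disk space).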

A successfully upserted Document is immediately queryable 🚀.
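The two-step behavior described above (validate and raise on misuse, then write and report a status) can be modeled with a toy in-memory store. This is a sketch of the semantics only, not how zvec is implemented: `InMemoryCollection` is hypothetical, a document is reduced to one dense vector plus a fields dict, and the integer `0` stands in for a successful Status.

```python
class InMemoryCollection:
    """Toy model of validate-then-upsert semantics (not zvec's implementation)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.docs = {}

    def upsert(self, doc_id: str, vector: list, fields: dict) -> int:
        # Validation step: misuse raises an exception instead of returning a status.
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dims, got {len(vector)}")
        # Write step: insert the document, or replace it entirely if the id exists.
        self.docs[doc_id] = {"vector": list(vector), "fields": dict(fields)}
        return 0  # 0 stands for success in this sketch


coll = InMemoryCollection(dim=4)
coll.upsert("text_1", [0.1, 0.2, 0.3, 0.4], {"text": "first version"})
coll.upsert("text_1", [0.4, 0.3, 0.2, 0.1], {"text": "second version"})
print(len(coll.docs), coll.docs["text_1"]["fields"]["text"])  # 1 second version

try:
    coll.upsert("text_2", [0.1, 0.2, 0.3], {"text": "wrong dimension"})
except ValueError as err:
    print("rejected:", err)  # rejected: expected 4 dims, got 3
```

Upserting the same `id` twice leaves one Document holding the second version, and the rejected call writes nothing.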

Batch upsert Documents

Pass a list of `Doc` objects to upsert multiple Documents at once. Each `Doc` is processed independently, and the method returns a list of Status objects, one per Document.


```python
import zvec

result = collection.upsert(
    [
        zvec.Doc(
            id="text_1",
            vectors={"text_embedding": [0.1, 0.2, 0.3, 0.4]},
            fields={"text": "This is a sample text."},
        ),
        zvec.Doc(
            id="text_2",
            vectors={"text_embedding": [0.4, 0.3, 0.2, 0.1]},
            fields={"text": "This is another sample text."},
        ),
        zvec.Doc(
            id="text_3",
            vectors={"text_embedding": [-0.1, -0.2, -0.3, -0.4]},
            fields={"text": "One more sample text."},
        ),
    ]
)

print(result)  # [{"code":0}, {"code":0}, {"code":0}]
```

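If any Document in the batch is misused (for example, it has an unknown field or a vector dimension mismatch), the method raises an exception and upserts none of the Documents.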

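If all Documents pass validation, the method attempts to upsert them one by one. A failure on one Document (for example, insufficient disk space) does not prevent the others from being upserted.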

🔍 Always check every Status in the result list.
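The batch rules above (all-or-nothing validation, then independent per-document writes) and the advice to check every Status can be sketched with plain Python. This is a toy model, not zvec's implementation: `batch_upsert` and `failed_ids` are hypothetical helpers, statuses are plain integers where `0` means success, and `fail_ids` exists only to simulate a write failure such as a full disk.

```python
def batch_upsert(store: dict, docs: list, dim: int,
                 fail_ids: frozenset = frozenset()) -> list:
    """Toy model of batch upsert semantics (not zvec's implementation).

    `docs` is a list of (doc_id, vector) pairs. `fail_ids` lets the caller
    simulate a write failure for specific ids.
    """
    # Phase 1: validate everything first; one bad document rejects the batch
    # before anything is written.
    for doc_id, vector in docs:
        if len(vector) != dim:
            raise ValueError(f"{doc_id}: expected {dim} dims, got {len(vector)}")
    # Phase 2: write documents one by one; a failed write does not stop the rest.
    statuses = []
    for doc_id, vector in docs:
        if doc_id in fail_ids:
            statuses.append(1)  # nonzero stands for failure in this sketch
        else:
            store[doc_id] = vector
            statuses.append(0)
    return statuses


def failed_ids(docs: list, statuses: list) -> list:
    """Pair each status with its document id and keep the failures."""
    return [doc_id for (doc_id, _), code in zip(docs, statuses) if code != 0]


store = {}
docs = [("text_1", [0.1] * 4), ("text_2", [0.2] * 4), ("text_3", [0.3] * 4)]
statuses = batch_upsert(store, docs, dim=4, fail_ids=frozenset({"text_2"}))
print(statuses)                    # [0, 1, 0]
print(failed_ids(docs, statuses))  # ['text_2']
print(sorted(store))               # ['text_1', 'text_3']
```

Because statuses come back in input order, zipping them with the input list is enough to find which Documents to retry.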

Upsert a Document with a sparse vector

Suppose your Collection contains a sparse vector named `sparse_embedding`.

Upsert a Document with a sparse vector as follows:


```python
import zvec

result = collection.upsert(
    zvec.Doc(
        id="text_1",
        vectors={
            "sparse_embedding": {
                42: 1.25,  # ← dimension 42 has weight 1.25
                1337: 0.8,  # ← dimension 1337 has weight 0.8
                2999: 0.63,  # ← dimension 2999 has weight 0.63
            }
        },
    )
)

print(result)  # {"code":0}
```

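A sparse vector is represented as a mapping from dimension index (integer) to value (float). There is no fixed dimension size; include only the non-zero dimensions.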
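To see why only non-zero dimensions need to be stored, the sketch below converts a dense list into the index-to-value mapping used above and computes an inner product directly on two such mappings. The helper names `dense_to_sparse` and `sparse_dot` are hypothetical, and the inner product is shown only as a familiar similarity measure; which metric zvec applies to a given sparse index is not stated on this page.

```python
def dense_to_sparse(dense: list) -> dict:
    """Keep only non-zero dimensions, keyed by their index."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def sparse_dot(a: dict, b: dict) -> float:
    """Inner product: only dimensions present in both vectors contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller mapping
    return sum(v * b[i] for i, v in a.items() if i in b)


dense = [0.0] * 3000
dense[42], dense[1337], dense[2999] = 1.25, 0.8, 0.63
sparse = dense_to_sparse(dense)
print(sparse)                                # {42: 1.25, 1337: 0.8, 2999: 0.63}
print(sparse_dot(sparse, {42: 2.0, 7: 5.0}))  # 2.5
```

A zero in either vector contributes nothing to the sum, so dropping zero dimensions loses no information for this kind of scoring.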

Upsert a Document with multiple fields and vectors

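Real applications usually need Collections with several scalar fields and vector embeddings. In this example, assume your Collection has the following Schema:

Scalar fields:
1. `book_title` (string)
2. `category` (array of strings)
3. `publish_year` (32-bit integer)
Vector embeddings:
1. `dense_embedding`: 768-dimensional dense vector
2. `sparse_embedding`: sparse vector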

Upsert a Document with multiple fields and vectors as follows:


```python
import zvec

# Create the Document
doc = zvec.Doc(
    id="book_1",
    vectors={
        "dense_embedding": [0.1 for _ in range(768)],  # ← replace with a real embedding in practice
        "sparse_embedding": {42: 1.25, 1337: 0.8, 1999: 0.64},  # ← replace with a real embedding in practice
    },
    fields={
        "book_title": "Gone with the Wind",  # ← string
        "category": ["Romance", "Classic Literature"],  # ← array of strings
        "publish_year": 1936,  # ← integer
    },
)

# Upsert the Document
result = collection.upsert(doc)
print(result)  # {"code": 0} means success
```
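With several scalar types in one Schema, it can help to check values on the client before upserting. The sketch below compares Python values against illustrative type tags for the book Schema above, including the signed 32-bit range for `publish_year`. `BOOK_FIELD_TYPES`, `value_matches`, and `mismatched_fields` are hypothetical names, and the tag strings are labels invented for this sketch, not zvec data type names.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

# Hypothetical type tags mirroring the book schema above; these strings are
# illustrative labels, not zvec data type names.
BOOK_FIELD_TYPES = {
    "book_title": "string",
    "category": "array<string>",
    "publish_year": "int32",
}


def value_matches(type_tag: str, value: object) -> bool:
    """Check a Python value against one of the illustrative type tags."""
    if type_tag == "string":
        return isinstance(value, str)
    if type_tag == "array<string>":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    if type_tag == "int32":
        # bool is a subclass of int in Python, so rule it out explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and INT32_MIN <= value <= INT32_MAX)
    raise ValueError(f"unknown type tag: {type_tag}")


def mismatched_fields(fields: dict, types: dict) -> list:
    """Names of fields whose values do not match their declared type."""
    return [name for name, value in fields.items()
            if name in types and not value_matches(types[name], value)]


fields = {
    "book_title": "Gone with the Wind",
    "category": ["Romance", "Classic Literature"],
    "publish_year": 1936,
}
print(mismatched_fields(fields, BOOK_FIELD_TYPES))  # []
print(mismatched_fields({"publish_year": "1936", "category": "Romance"},
                        BOOK_FIELD_TYPES))  # ['publish_year', 'category']
```

A bare string passed for `category` is flagged because the field expects a list of strings, a mistake that is easy to make when a book has only one category.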
