Understanding Retrieval Augmented Generation in Ultravox
/api/corpora (note: ‘corpora’ is the plural of ‘corpus’).
A Corpus can contain one or more sources. Each Source contributes one or more documents, which are single files (HTML page, PDF, DOC, etc.). Each Document is broken up into one or more chunks, which are the units that may be returned for a query. For each Chunk, several vectors are produced for similarity search.
The API exposes metadata about documents as well as the number of chunks and vectors in a corpus. However, there are no APIs provided to directly manipulate or edit documents, chunks, or vectors. If you need to update or delete documents, you must update or delete the source.
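The hierarchy and its source-level update model can be pictured with a small in-memory sketch. This is only an illustration: the class names mirror the terms used above, but every field and method name (name, num_chunks, num_vectors, delete_source) is hypothetical and does not reflect the actual Ultravox API schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    """The unit that may be returned for a query."""
    text: str
    # Several vectors are produced per chunk; the values used here are placeholders.
    vectors: List[List[float]] = field(default_factory=list)


@dataclass
class Document:
    """A single file (HTML page, PDF, DOC, etc.) split into chunks."""
    name: str
    chunks: List[Chunk] = field(default_factory=list)


@dataclass
class Source:
    """Contributes one or more documents to a corpus."""
    name: str
    documents: List[Document] = field(default_factory=list)


@dataclass
class Corpus:
    """Top-level container holding one or more sources."""
    name: str
    sources: List[Source] = field(default_factory=list)

    def num_chunks(self) -> int:
        # Count chunks across every document of every source.
        return sum(len(d.chunks) for s in self.sources for d in s.documents)

    def num_vectors(self) -> int:
        # Count vectors across every chunk in the corpus.
        return sum(
            len(c.vectors)
            for s in self.sources
            for d in s.documents
            for c in d.chunks
        )

    def delete_source(self, name: str) -> None:
        # Hypothetical helper: documents, chunks, and vectors are never edited
        # directly; they disappear only when their source is removed.
        self.sources = [s for s in self.sources if s.name != name]
```

In this sketch, the only way to drop a document is to remove the source that contributed it, which matches the constraint described above.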
queryCorpus tool instead.
startUrls will trigger crawling on anything in the same domain (subdomains must be specified separately).
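The same-domain rule for startUrls can be sketched as a simple scope check. This is an assumption-laden illustration, not the crawler's actual logic: the function name in_crawl_scope is hypothetical, and "same domain" is modeled here as an exact hostname match, which may differ from how Ultravox treats cases such as a leading "www".

```python
from typing import Iterable
from urllib.parse import urlparse


def in_crawl_scope(url: str, start_urls: Iterable[str]) -> bool:
    """Return True if `url` shares a hostname with any start URL.

    Assumption: "same domain" means an exact hostname match, so
    docs.example.com is NOT covered by a start URL on example.com.
    """
    host = (urlparse(url).hostname or "").lower()
    allowed = {(urlparse(s).hostname or "").lower() for s in start_urls}
    allowed.discard("")  # ignore start URLs that have no parseable host
    return host in allowed


# A page on the same host as a start URL is in scope.
print(in_crawl_scope("https://example.com/pricing", ["https://example.com/docs"]))
# A subdomain is out of scope unless it is listed as its own start URL.
print(in_crawl_scope("https://docs.example.com/guide", ["https://example.com"]))
print(in_crawl_scope(
    "https://docs.example.com/guide",
    ["https://example.com", "https://docs.example.com"],
))
```

Under this model, covering both a main site and its documentation subdomain means listing each host in startUrls.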