Vector / Embeddings
A vector is an ordered collection of numbers. In programming it is commonly stored as an array, but unlike a general-purpose array, whose elements may have different data types, a vector's components all share a single numeric type.
Mathematically, a real vector of length (or dimension) $n$ is represented as:

$$\mathbf{v} = (v_1, v_2, \ldots, v_n) \in \mathbb{R}^n$$
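As a minimal sketch, a real vector can be represented as a plain Python list of floats; its dimension is the number of components, and its Euclidean (L2) norm follows directly from the formula above:

```python
import math

# A 3-dimensional real vector as a plain Python list.
v = [0.5, -1.2, 3.0]

# Its dimension is simply the number of components.
dim = len(v)

# Euclidean (L2) norm: sqrt(v1^2 + v2^2 + ... + vn^2).
norm = math.sqrt(sum(x * x for x in v))

print(dim)  # 3
print(norm)
```

In practice, libraries such as NumPy provide optimized versions of these operations, but the underlying idea is the same.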
In vector databases, we primarily work with real numerical vectors. This is especially true in AI, where models use numeric data for predictions. Large Language Models (LLMs) are no exception, even though we interact with them using natural language.
LLMs don't process text directly. Instead, they use a mathematical representation called embeddings, which represent text as numeric vectors. The attention mechanism in transformer models allows these embeddings to preserve important semantic and contextual information. This enables LLMs to effectively "understand" human inputs. When we interact with LLMs, our input is encoded into embeddings, which the models actually process.
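To make "text as numeric vectors" concrete, here is a deliberately toy illustration using a bag-of-words count vector. Real LLM embeddings are dense vectors learned by neural networks, not word counts; this sketch only shows the basic idea of mapping text to numbers:

```python
# Toy text-to-vector mapping: count how often each vocabulary
# word appears in the text. This is NOT how LLM embeddings work;
# it only illustrates the concept of representing text numerically.

def bag_of_words(text: str, vocabulary: list[str]) -> list[float]:
    """Return a count vector over the given vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]

vocab = ["vector", "database", "llm", "embedding"]
vec = bag_of_words(
    "an embedding is a vector and a vector database stores each embedding",
    vocab,
)
print(vec)  # [2.0, 1.0, 0.0, 2.0]
```

Learned embeddings improve on this by placing texts with similar meanings close together in the vector space, even when they share no words.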
LLMs often suffer from hallucinations, so grounding them with factual knowledge is crucial. Rather than writing lengthy prompts that consume most of the model's context window, we store relevant information in a database for later retrieval. Because similarity between texts is easiest to compute numerically, we store each piece of text alongside its embedding. This lets us mathematically compare the embedding of a request with each stored embedding and retrieve only the most relevant context.
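The retrieval step above can be sketched with cosine similarity. The tiny 3-dimensional vectors and the document texts below are made up for illustration; a real system would use embeddings of hundreds or thousands of dimensions produced by an embedding model:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical store mapping each document's embedding to its text.
store = {
    (1.0, 0.0, 0.2): "Shapelets is a vector database.",
    (0.0, 1.0, 0.1): "Bananas are rich in potassium.",
    (0.9, 0.1, 0.3): "Vector databases power RAG systems.",
}

# Embedding of the user's question (made up for this sketch).
query = [1.0, 0.05, 0.25]

# Retrieve the document whose embedding is most similar to the query.
best = max(store, key=lambda emb: cosine_similarity(list(emb), query))
print(store[best])
```

Production vector databases replace this linear scan with approximate nearest-neighbor indexes so that retrieval stays fast even over millions of embeddings.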
Vector databases are useful well beyond generative AI: they are also critical in information retrieval, data governance, and other areas that demand efficient storage and search of high-dimensional data.
Shapelets provides a simple and efficient vector database. Take your RAG application to the next level with ShapeletsVecDB.