Shapelets VectorDB vs. Qdrant:

Which Vector Database performs best?

08 October 2024 | 4 minutes


by Adrián Carrio, Lead Data Scientist
Shapelets VectorDB is a commercial vector database known for its ultra-low memory footprint and exceptional ingestion speed.

Qdrant vs. Shapelets | Which Vector Database Performs Best?

In the fast-changing world of machine learning and artificial intelligence, scalable and efficient vector databases are crucial for handling massive volumes of unstructured data, particularly for tasks like similarity search and recommendation systems. Two leading options in this space are Shapelets VectorDB and Qdrant, each with distinct strengths in vector storage, retrieval, and indexing. To provide a clear comparison, we conducted a detailed benchmark of these two databases using VectorDBBench, a specialized tool that evaluates vector database performance across key metrics. This article explores the findings, offering insights into each database’s capabilities, strengths, and ideal use cases.

Qdrant is an open-source vector database designed for storing, searching, and managing high-dimensional vector data, commonly applied in machine learning and AI projects. Shapelets VectorDB, on the other hand, is a commercial option with an ultra-low memory footprint and exceptional ingestion speed, available as both a C++ and Python library.
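For readers who want a concrete picture of this kind of workload, the sketch below shows the basic create/insert/search cycle using the open-source qdrant-client; the collection name, vector count, and local endpoint are illustrative assumptions, not the benchmark configuration. Shapelets VectorDB offers its own Python library for the same operations.

```python
# Minimal sketch of a vector workload with the open-source qdrant-client.
# Collection name, endpoint, and data are illustrative only.
import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")  # assumes a local Qdrant instance

client.create_collection(
    collection_name="cohere_768",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

# Insert a small batch of random 768-dimensional vectors.
vectors = np.random.rand(1_000, 768).astype(np.float32)
client.upsert(
    collection_name="cohere_768",
    points=[PointStruct(id=i, vector=v.tolist()) for i, v in enumerate(vectors)],
)

# Nearest-neighbor query for a single vector.
hits = client.search(collection_name="cohere_768", query_vector=vectors[0].tolist(), limit=10)
print([hit.id for hit in hits])
```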

When choosing a vector database, different applications benefit from different performance aspects, such as storage capacity, search speed, or filtered-search efficiency, depending on the primary use case. This benchmark focuses on search performance, especially the speed of nearest-neighbor search. Fast search capabilities are vital for real-time personalization and authentication applications, where quick response times enable dynamic ad targeting, content recommendations, and identity verification through voice or facial recognition, delivering instant, personalized results.

The dataset

Let’s talk about the dataset chosen for this benchmark. The COHERE dataset is a large-scale collection of high-quality text embeddings generated by Cohere’s language models, designed to support a variety of natural language processing (NLP) tasks. It provides embeddings for diverse text inputs, enabling efficient semantic search, text classification, recommendation, and question-answering applications. This dataset is often used in vector databases and machine learning models to enhance understanding of semantic relationships within massive text corpora, making it a valuable resource for benchmarking vector database performance and for developing NLP solutions that require robust text comprehension and similarity matching.

The actual dataset used is available for download from Hugging Face.
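For illustration, the snippet below streams one of Cohere's publicly hosted embedding datasets with the Hugging Face datasets library; the repository id and field name are assumptions and may differ from the exact files used in this benchmark.

```python
# Sketch: stream Cohere text embeddings from Hugging Face with the `datasets`
# library. The repository id and the "emb" field name are assumptions.
from datasets import load_dataset

ds = load_dataset(
    "Cohere/wikipedia-22-12-en-embeddings",  # assumed 768-dim Cohere embeddings
    split="train",
    streaming=True,  # avoid downloading the full corpus up front
)

for i, row in enumerate(ds):
    print(len(row["emb"]))  # expect 768
    if i == 2:
        break
```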

In these experiments, we used a typical scenario: indexing and querying 1 million embeddings with 768 dimensions. Nearest-neighbor search performance was measured across multiple concurrency levels, from 1 to 100 concurrent threads, with each level running for 30 seconds. All tests were executed on CPU only, on a laptop equipped with an AMD Ryzen™ 7 7435HS Mobile Processor with a 3.1 GHz base clock (20 MB cache, up to 4.5 GHz, 8 cores, 16 threads) and 32 GB of RAM.
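A simplified sketch of that measurement loop is shown below: for each concurrency level, worker threads issue queries against the database for a fixed 30-second window while per-query latencies are recorded. The `search_one` callable and the query set are placeholders for whichever client is being tested; VectorDBBench's own harness is more elaborate.

```python
# Simplified load-generation sketch: N worker threads issue queries for a
# fixed window; throughput and per-query latencies are recorded.
# `search_one(query)` is a placeholder for a single nearest-neighbor search.
import threading
import time

def run_level(search_one, queries, concurrency, duration_s=30):
    latencies = []
    lock = threading.Lock()
    stop_at = time.monotonic() + duration_s

    def worker():
        local, i = [], 0
        while time.monotonic() < stop_at:
            t0 = time.perf_counter()
            search_one(queries[i % len(queries)])
            local.append(time.perf_counter() - t0)
            i += 1
        with lock:
            latencies.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    return {"concurrency": concurrency, "qps": len(latencies) / elapsed, "latencies": latencies}

# Sweep concurrency levels from 1 to 100, as in the benchmark set-up:
# results = [run_level(search_one, queries, c) for c in (1, 5, 10, 20, 50, 100)]
```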

Metrics and results

Now, let’s dive into each of the metrics evaluated in this benchmark:

| Index Building Time

This metric measures the time the database takes to build an index from a set of vectors. The index organizes vector embeddings, optimizing them for efficient similarity searches. A shorter index building time means the data is ready to query sooner, which is particularly important when dealing with large or frequently updated datasets, such as the massive data ingestions required in disaster recovery scenarios.

Index Building Time Benchmark
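As a rough illustration, one way to time index readiness with Qdrant is to upload the vectors and then poll the collection status until background indexing finishes. This sketch approximates, rather than reproduces, VectorDBBench's measurement; the equivalent Shapelets VectorDB calls are not shown.

```python
# Hedged sketch: time how long Qdrant takes to ingest vectors and finish
# building its index (the collection status turns "green" when indexing is done).
import time
from qdrant_client.models import CollectionStatus

def time_index_build(client, collection_name, upload_batches):
    start = time.monotonic()
    for ids, vectors in upload_batches:  # iterable of (ids, vectors) chunks
        client.upload_collection(collection_name=collection_name, ids=ids, vectors=vectors)
    # Wait for background indexing to complete.
    while client.get_collection(collection_name).status != CollectionStatus.GREEN:
        time.sleep(1)
    return time.monotonic() - start
```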

| Recall

Recall is a measure of search accuracy, reflecting the percentage of relevant items retrieved from the database. In vector databases, it is especially important when finding the closest matches or "nearest neighbors" for a query vector. A higher recall rate signifies more accurate results, which is critical for applications where precision in similarity or relevance is essential, such as recommendation systems and search engines. With Shapelets VectorDB, all relevant items were returned, yielding a recall of 100%, while Qdrant returned "noisier" results, with a recall of 97%.

Recall Benchmark
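For reference, recall@k can be checked against a brute-force ground truth computed in NumPy, as in the sketch below; it assumes normalized embeddings, so cosine similarity reduces to a dot product.

```python
# Sketch of recall@k: compare the ids returned by the database with a
# brute-force ground truth. Assumes embeddings are L2-normalized.
import numpy as np

def ground_truth_topk(corpus, queries, k=10):
    scores = queries @ corpus.T                 # (n_queries, n_vectors) similarities
    return np.argsort(-scores, axis=1)[:, :k]   # ids of the true top-k per query

def recall_at_k(returned_ids, truth_ids):
    hits = sum(len(set(r) & set(t)) for r, t in zip(returned_ids, truth_ids))
    return hits / truth_ids.size

# truth = ground_truth_topk(corpus, queries, k=10)
# print(recall_at_k(ann_results, truth))
```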

| Latency

Latency is the time taken to complete a search query, from initiation to result. In vector databases, low latency is crucial for applications that require real-time or near-real-time responses, such as live recommendations, personalization, or identity verification. Lower latency translates to faster response times, enhancing user experience in applications with time-sensitive demands. In these experiments, both databases were run locally; Shapelets VectorDB averaged close to 3 ms per query, while Qdrant more than doubled this figure at over 6 ms.

Latency Benchmark
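Average and tail latency can be summarized from the per-query timings collected by the load-generation sketch above, for example:

```python
# Summarize per-query latencies (in seconds) into average and tail figures (ms).
import statistics

def latency_summary(latencies_s):
    ms = sorted(t * 1000 for t in latencies_s)
    return {
        "avg_ms": statistics.fmean(ms),
        "p95_ms": ms[int(0.95 * (len(ms) - 1))],
        "p99_ms": ms[int(0.99 * (len(ms) - 1))],
    }
```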

| Maximum QPS (Queries Per Second)

Maximum QPS indicates the maximum number of queries the database can handle per second without degradation in performance. This metric reflects the scalability and robustness of a vector database under high query loads, making it essential for applications that need to handle large volumes of simultaneous requests, such as large-scale recommendation systems or interactive search applications. In our experiments, Shapelets VectorDB reached 2,127 QPS, more than three times Qdrant's 618.4 QPS.

Maximum QPS Benchmark
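With the load-generation sketch from earlier, the reported maximum QPS corresponds to the best throughput observed across the concurrency sweep, for example:

```python
# Pick the highest throughput across the concurrency sweep.
def max_qps(results):
    best = max(results, key=lambda r: r["qps"])
    return best["concurrency"], best["qps"]

# level, qps = max_qps(results)
```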

Conclusion

In conclusion, Shapelets VectorDB’s superior performance in both efficiency and accuracy underscores its capability to handle intensive vector search workloads effectively, making it a strong choice for applications requiring high-speed, high-precision vector retrieval. As the landscape of vector databases continues to evolve, we plan to expand our analyses with additional comparisons against other leading vector databases. Stay tuned for more insights as we dive deeper into performance benchmarks and the specific strengths of each solution.

Want to apply Shapelets to your projects?

Contact us and let’s study your case.
