Highly efficient data storage and search system
Indexing and Vector Database
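# Imports for the LangChain helpers used below
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_text_splitters import CharacterTextSplitter
# NOTE: `sh` is assumed to be the Shapelets vector store class, imported elsewhere;
# the original snippet does not show that import.

# Load your documents
loader = TextLoader("../state_of_the_union.txt")
documents = loader.load()
# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
# Create an embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
# Convert the chunks into embeddings and load them into the vectorDB
db = sh.from_documents(docs, embedding_function)
# Run a similarity query
query = "What did the president say about Ketanji Brown Jackson?"
docs = db.similarity_search(query)
# Print the best-matching chunk
print(docs[0].page_content)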
Challenges
Are you having issues using context to find relevant data or searching in large vector stores?
Newly acquired data points must be indexed before they become available to search processes.
Use various index types to contextualize your data and speed up geospatial and date/time-based searches.
Many systems cannot cope with large streams of data while keeping a low memory footprint.
Speed usually comes at the cost of lower accuracy. Not with Shapelets Core.
Pure SaaS offerings are not suitable for everyone.
ARCHITECTURE
Achieve not only fast query responses but also indexing times on the order of milliseconds.
Obtain excellent recall metrics for both exact and approximate similarity searches.
Index all kinds of vectors and scalar data, including dates, times, durations and geospatial data.
Compatible with any LLM/model that produces embeddings/vectors and integrated with popular frameworks such as LangChain.
Indexing
Shapelets Core provides millisecond-scale indexing and querying, with minimal computational requirements.
Efficient indexing is crucial for accelerating similarity searches, but it is often neglected in favour of fast query responses. Indexing can be distributed across multiple nodes for horizontal scaling, which allows real-time indexing operations. Accelerate contextualized searches by using not just vector indices but also indices for scalar, date/time and geospatial data.
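The idea of combining a scalar index with a vector search can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not Shapelets Core's implementation or API: the toy records, the sorted date index and the helper names (`ids_in_range`, `contextual_search`) are all assumptions made for the example.

```python
import bisect
import math
from datetime import date

# Toy corpus of (embedding, publication date); values are made up for illustration.
records = [
    ([1.0, 0.0], date(2023, 1, 10)),
    ([0.9, 0.1], date(2023, 6, 5)),
    ([0.0, 1.0], date(2023, 6, 20)),
    ([0.8, 0.2], date(2024, 2, 1)),
]

# Scalar index: record ids sorted by date, so a date range costs two binary searches.
date_index = sorted((d, i) for i, (_, d) in enumerate(records))
sorted_dates = [d for d, _ in date_index]


def ids_in_range(start, end):
    """Return ids of records whose date lies in [start, end]."""
    lo = bisect.bisect_left(sorted_dates, start)
    hi = bisect.bisect_right(sorted_dates, end)
    return [i for _, i in date_index[lo:hi]]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def contextual_search(query, start, end, k=1):
    """Rank only the records that pass the date filter."""
    candidates = ids_in_range(start, end)
    ranked = sorted(candidates, key=lambda i: cosine(query, records[i][0]), reverse=True)
    return ranked[:k]


# Most similar record to the query among those dated in the second half of 2023.
result = contextual_search([1.0, 0.0], date(2023, 6, 1), date(2023, 12, 31), k=1)
print(result)
```

Filtering first means the similarity computation only touches records that already satisfy the date constraint, which is where the speed-up for contextualized searches comes from.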
Store
Depending on requirements, data can be stored in memory, on local disk or in the cloud to optimize performance and costs.
The use of optimizations like cache line alignment reduces cache misses and improves overall efficiency. Furthermore, indices are based on compressible bitmaps with minimal size and always stored in memory, making search processes extremely efficient.
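As a rough illustration of why bitmap indices make filtering cheap, the sketch below uses Python integers as bitmaps. It is a generic teaching example under stated assumptions: the `BitmapIndex` class and the sample rows are hypothetical, and the compression step (for example run-length or Roaring-style encoding) is left out, since the page does not say which scheme Shapelets Core uses.

```python
from collections import defaultdict


class BitmapIndex:
    """Maps each distinct value to a bitmap whose set bits are the matching row ids."""

    def __init__(self):
        self.bitmaps = defaultdict(int)

    def add(self, row_id, value):
        # Set bit number `row_id` in the bitmap for this value.
        self.bitmaps[value] |= 1 << row_id

    def lookup(self, value):
        # Unknown values map to the empty bitmap.
        return self.bitmaps.get(value, 0)


def rows(bitmap):
    """Expand a bitmap into a sorted list of row ids."""
    out, row_id = [], 0
    while bitmap:
        if bitmap & 1:
            out.append(row_id)
        bitmap >>= 1
        row_id += 1
    return out


# Hypothetical rows: (country, year).
data = [("ES", 2023), ("US", 2023), ("ES", 2024), ("ES", 2023)]
country, year = BitmapIndex(), BitmapIndex()
for row_id, (c, y) in enumerate(data):
    country.add(row_id, c)
    year.add(row_id, y)

# A two-condition filter is a single bitwise AND over the two bitmaps.
matches = rows(country.lookup("ES") & year.lookup(2023))
print(matches)
```

Because each condition resolves to one bitmap, combining filters reduces to bitwise operations on data that is already in memory.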
Search
Shapelets Core provides a range of search algorithms to serve different use cases.
It returns both approximate and exact results, and it can report how relevant the stored data is for a given query through distance histograms. Its API is also compatible with LangChain, which makes it easy to build retrieval augmented generation (RAG) applications.
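To make exact search and distance histograms concrete, here is a minimal pure-Python sketch. It assumes Euclidean distance and fixed-width bins; the function names `exact_search` and `distance_histogram` are hypothetical and do not come from the Shapelets API.

```python
import math


def exact_search(query, vectors, k):
    """Brute-force exact k-nearest neighbours, returned as (distance, id) pairs."""
    scored = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return scored[:k]


def distance_histogram(query, vectors, bin_width, n_bins):
    """Count stored vectors per distance bin; the last bin collects the overflow."""
    counts = [0] * n_bins
    for v in vectors:
        b = min(int(math.dist(query, v) / bin_width), n_bins - 1)
        counts[b] += 1
    return counts


# Toy data, made up for illustration.
vectors = [[0.0, 0.0], [0.3, 0.4], [3.0, 4.0], [6.0, 8.0]]
query = [0.0, 0.0]

nearest = exact_search(query, vectors, k=2)
hist = distance_histogram(query, vectors, bin_width=1.0, n_bins=3)
print(nearest, hist)
```

A histogram with most of its mass in the first bins suggests the store holds many vectors close to the query. A histogram concentrated in the last bin suggests that even the top results may be only weakly related.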
Use Case / RAG
Shapelets Vector DB is perfect for retrieval augmented generation (RAG) applications based on sets of documents that grow periodically.
Using standard databases for feeding dashboards and information systems that require multiple queries/views on big data usually involves high TCOs and causes slow responsiveness.
Building a system in which users interact with a corpus of legal documents is hard when new documents are continuously added, because the computing cost of re-indexing rises sharply.
Shapelets Core uses highly efficient algorithms for indexing, offering real-time indexing capabilities with minimal CPU and memory requirements.
Use it as a server-based vector DB or integrate it into your projects as a Python library.
Indexing and Vector Database.
A scalable and multidimensional indexing solution
Shapelets Vector DB offers both efficient storage and indexing capabilities.
Just Storage…
- ‘Archive and Move On’ scenarios (Compliance, Regulations, Proof of Record)
- Deferred Processing scenarios (Backtesting, System Of Record)
Just Index…
- Great for complementing your existing storage solutions.
- Integration with LLM / ML solutions
- Complex IoT and metric scenarios.
Combined Solution
Remove the need to integrate and maintain multiple systems with an all-in-one solution.
Accelerate your data access today
Shapelets Core helps data scientists and data engineers with their daily big data tasks. Contact us today for a free demo.