Unlimited data availability
Data Access
# Create a sandbox
from shapelets.data import sandbox
sb = sandbox()
# Load data from multiple parquet files at once
my_data = sb.from_parquet('my_data',['my_files/**/*.parquet'])
# Export the result
my_data.to_csv('my_data.csv')  # output path shown for illustration only
Challenges
Are you having trouble accessing your data?
Added up, a significant amount of time is lost waiting for files to load and query results to arrive
Current tools for handling data at scale require large and expensive computing infrastructures
Many tools cannot handle data unless it fits in memory
Dealing with data stored in various formats and locations adds a lot of complexity
Most data tools require environments that are difficult to set up and configure
ARCHITECTURE
Load and query data files 2 to 10x faster than Pandas, PyArrow and Polars
Run queries efficiently on datasets larger than your RAM
Import and export files in multiple formats (e.g. Parquet, Arrow, Feather, DataFrames) with a single Python library
Compatible with any LLM/model that produces embeddings/vectors and integrated with popular frameworks such as LangChain.
Loading
Shapelets Core provides immediate file loading from heterogeneous data sources with minimal computational requirements.
Loading data stored in one or multiple files is a daily duty for data scientists. Save plenty of time by using Shapelets Core to load Parquet, CSV and Feather files instantly into lazy tables.
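The idea of loading many files lazily can be illustrated with a minimal sketch that uses only the Python standard library. This is not the Shapelets Core API: the `iter_rows` and `write_demo_files` helpers and the demo folder layout are assumptions made for illustration. The sketch shows how a glob pattern such as `**/*.csv` can select files in nested folders while rows are only read when they are consumed.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterator


def iter_rows(root: Path, pattern: str) -> Iterator[Dict[str, str]]:
    """Lazily yield rows from every CSV file under `root` matching `pattern`.

    Hypothetical helper: no file is opened until the caller starts
    iterating, and rows are produced one at a time.
    """
    for path in sorted(root.glob(pattern)):
        with path.open(newline="") as handle:
            yield from csv.DictReader(handle)


def write_demo_files(root: Path) -> None:
    """Create two small CSV files in nested folders (demo data only)."""
    for folder, rows in (("2023", [("a", 1), ("b", 2)]), ("2024", [("c", 3)])):
        target = root / folder
        target.mkdir(parents=True, exist_ok=True)
        with (target / "part.csv").open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["name", "value"])
            writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    write_demo_files(demo_root)
    rows = iter_rows(demo_root, "**/*.csv")  # nothing has been read yet
    total = sum(int(row["value"]) for row in rows)  # files are read here
```

Because `iter_rows` is a generator, creating `rows` costs almost nothing; the files are only opened once the sum starts consuming them.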
Query
Use standard SQL to efficiently query data coming from multiple sources, loading into memory only what is strictly necessary
Lazy evaluation lets you chain multiple queries instantly. When the execute method is called, the query is planned, and only the data needed to produce the result is gathered from the files and loaded into memory. This greatly reduces query response times and memory footprint.
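The general pattern of lazy evaluation can be sketched in plain Python. The `LazyQuery` class below is hypothetical and is not how Shapelets Core is implemented; it only illustrates the principle that chaining steps records a plan, and that the data source is touched for the first time when `execute()` is called.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


class LazyQuery:
    """Hypothetical lazy query: records steps, does no work until execute()."""

    def __init__(self, source: Callable[[], Iterable[Row]]) -> None:
        self._source = source  # called only inside execute()
        self._filters: List[Callable[[Row], bool]] = []
        self._columns: Optional[List[str]] = None

    def where(self, predicate: Callable[[Row], bool]) -> "LazyQuery":
        """Record a row filter; no data is read."""
        self._filters.append(predicate)
        return self

    def select(self, *columns: str) -> "LazyQuery":
        """Record the columns to keep; no data is read."""
        self._columns = list(columns)
        return self

    def execute(self) -> List[Row]:
        """Read the source once, keeping only matching rows and columns."""
        result: List[Row] = []
        for row in self._source():
            if all(check(row) for check in self._filters):
                if self._columns is not None:
                    row = {name: row[name] for name in self._columns}
                result.append(row)
        return result


reads = {"count": 0}  # counts how many times the source is actually read


def sensor_rows() -> Iterable[Row]:
    """Demo data source standing in for files on disk."""
    reads["count"] += 1
    yield {"sensor": "a", "value": 10, "unit": "C"}
    yield {"sensor": "b", "value": 20, "unit": "C"}
    yield {"sensor": "c", "value": 30, "unit": "C"}


query = LazyQuery(sensor_rows).where(lambda r: r["value"] > 15).select("sensor", "value")
reads_before = reads["count"]  # the source has not been read yet
result = query.execute()       # the source is read exactly once here
```

Rows that fail the filter and columns that were not selected never reach the result list, which is the same reason a lazily planned query can keep its memory footprint small.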
Export
Shapelets Core provides simple data conversion to multiple formats
Convert your data files easily into Parquet, CSV or Feather and export query results into other formats, such as pandas DataFrames, so you can use them with third-party libraries directly from Python.
Benchmarks
Incredible speed with minimal memory footprint
In this benchmark we compared the loading times of a CSV file with 5M rows (600 MB) and Parquet files with 170M rows (5 GB).
Use Case / ETL
Shapelets Core is perfect for running ETL processes to feed dashboards and information systems
Using standard databases to feed dashboards and information systems that require multiple queries or views on big data usually involves a high total cost of ownership (TCO) and slow response times.
Shapelets Core allows you to efficiently query and integrate data stored in multiple heterogeneous data sources, including file systems and databases (e.g. Oracle, PostgreSQL), and to store it in multiple formats or use it directly for visualization.
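The extract, query and export steps of such an ETL process can be sketched with the Python standard library. The example below uses SQLite from the `sqlite3` module as a stand-in SQL engine, not Shapelets Core; the `sales` table, its columns and the sample data are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Demo input standing in for a file on disk.
RAW_CSV = "region,amount\nnorth,10\nsouth,5\nnorth,7\n"

# Extract: parse the CSV rows and convert amounts to integers.
rows = [
    (record["region"], int(record["amount"]))
    for record in csv.DictReader(io.StringIO(RAW_CSV))
]

# Load: place the rows in an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Transform: aggregate with standard SQL, as a dashboard view might.
summary = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

# Export: write the aggregated result back out as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["region", "total"])
writer.writerows(summary)
exported = buffer.getvalue()
```

The same three stages apply whatever engine runs the SQL; only the aggregated result, not the raw data, needs to reach the dashboard.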
The ultimate tool for big data access in a single Python package
Shapelets Core offers both efficient I/O and a powerful SQL relational database management system (RDBMS).
Accelerate your data access today
Shapelets Core helps data scientists and data engineers with their daily big data tasks. Contact us today for a free demo.