Unlimited data availability

Shapelets Core

Data Access

The most comprehensive toolset

Unprecedented speed for data I/O from files and databases, combined with a cutting-edge relational SQL engine

Read, query and write data at scale from filesystems and databases, 2 to 10 times faster.

Python


# Create a sandbox

from shapelets.data import sandbox
sb = sandbox()
    

# Load data from multiple Parquet files at once

my_data = sb.from_parquet('my_data', ['my_files/**/*.parquet'])
    

# Export the result

my_data.to_csv('my_data.csv')  # output path is illustrative

Challenges

Are you having trouble accessing your data?

Slow queries

Added up, a significant amount of time is lost waiting for files to load and for query results to arrive

High TCO

Current tools for handling data at scale require large, expensive computing infrastructure

Scalability

Most tools are unable to handle data unless it fits in memory

Data harmonization

Dealing with data stored in various formats and locations adds significant complexity.

Setup and configuration issues

Most data tools require environments that are difficult to set up and configure

ARCHITECTURE

Data Access Architecture

Fast

Load and query data files 2 to 10x faster than Pandas, PyArrow and Polars

Low memory footprint

Run queries efficiently on datasets larger than your available RAM

Unmatched compatibility

Import and export files in multiple formats (e.g. Parquet, Arrow, Feather, Pandas DataFrames) with a single Python library

Versatile

Compatible with any LLM/model that produces embeddings/vectors and integrated with popular frameworks such as LangChain.

Loading

Shapelets Core provides immediate file loading from heterogeneous data sources with minimal computational requirements.

Loading data stored in one or multiple files is a daily task for data scientists. Save time by using Shapelets Core to load Parquet, CSV and Feather files instantly into lazy tables.
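
As a sketch, registering CSV and Feather files as lazy tables could look like the snippet below; the from_csv and from_feather names mirror the from_parquet call shown above and are assumptions about the exact API.

# Register CSV and Feather files as lazy tables (nothing is read yet)
from shapelets.data import sandbox

sb = sandbox()
events = sb.from_csv('events', ['my_files/events_*.csv'])           # assumed name
metrics = sb.from_feather('metrics', ['my_files/metrics.feather'])  # assumed name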

Query

Use standard SQL to efficiently query data from multiple sources, loading into memory only what is strictly necessary.

Lazy evaluation allows multiple queries to be chained instantly. When the execute method is called, the query is planned and only the data needed to produce the result is gathered from the files and loaded into memory. This greatly reduces query response times and memory footprint.
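
As an illustration, the sketch below defines a SQL query over the lazy table registered earlier; the from_sql entry point is an assumption, while execute is the method described above.

# Define a SQL query over the lazy table; nothing runs until execute is called
result = sb.from_sql('''
    SELECT device_id, AVG(value) AS avg_value
    FROM my_data
    GROUP BY device_id
''')  # from_sql is an assumed entry point

rows = result.execute()  # the query is planned and only the needed data is loaded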

Export

Shapelets Core provides simple data conversion to multiple formats

Easily convert your data files to Parquet, CSV or Feather, and export query results to other formats, such as Pandas DataFrames, so you can use them with third-party libraries directly from Python.
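
For example, exporting the query result from the sketch above might look as follows; to_csv appears earlier on this page, while to_parquet and to_pandas are assumed method names.

result.to_csv('avg_values.csv')          # write the result to CSV
result.to_parquet('avg_values.parquet')  # assumed name, mirrors to_csv

df = result.to_pandas()  # assumed name; hands the result to Pandas
print(df.head())         # from here, any third-party library applies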

Benchmarks

Incredible speed with minimal memory footprint

In this benchmark we compared loading times for a CSV file with 5M rows (600 MB) and for Parquet files with 170M rows (5 GB).
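
To reproduce this kind of comparison on your own files, a simple wall-clock harness is enough; the sketch below is illustrative, and because Shapelets Core loads lazily, a fair comparison should also execute a query that touches the data.

import time
import pandas as pd
from shapelets.data import sandbox

PATH = 'my_files/big.csv'  # placeholder: point this at a large CSV of your own

def timed(label, fn):
    # Run fn once and print the wall-clock time it took
    t0 = time.perf_counter()
    fn()
    print(f'{label}: {time.perf_counter() - t0:.2f} s')

timed('Pandas', lambda: pd.read_csv(PATH))

sb = sandbox()
timed('Shapelets Core', lambda: sb.from_csv('big', [PATH]))  # assumed name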

Competitive benchmarks

Use Case / ETL

Shapelets Core is perfect for running ETL processes to feed dashboards and information systems

Using standard databases to feed dashboards and information systems that require multiple queries/views on big data usually involves a high TCO and slow responsiveness.
Shapelets Core lets you efficiently query and integrate data stored in multiple heterogeneous sources, including file systems and databases (e.g. Oracle, PostgreSQL), and store it in multiple formats or use it directly for visualization.
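
Put together, a minimal ETL sketch might look like the following; the table name, file glob and SQL are illustrative, and from_sql is an assumed entry point.

from shapelets.data import sandbox

sb = sandbox()

# Extract: register raw Parquet files as a lazy table (nothing is loaded yet)
orders = sb.from_parquet('orders', ['data/orders/**/*.parquet'])

# Transform: aggregate with standard SQL (from_sql is an assumed entry point)
daily = sb.from_sql('''
    SELECT order_date, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date
''')

# Load: write the aggregate out for the dashboard to consume
daily.to_csv('out/daily_revenue.csv')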

The ultimate tool for big data access in a single Python package

Shapelets Core offers both efficient I/O and a powerful SQL relational database management system (RDBMS).

Accelerate your data access today

Shapelets Core helps data scientists and data engineers with their daily big data tasks. Contact us today for a free demo.
