
What is Shapelets Core?

Shapelets Core is a Python library, offering several inherent advantages:

  • Deployment in serverless environments, such as AWS Lambda
  • Flexibility to run on your own servers without any restrictions
  • Easy integration into existing projects and workflows
  • High customizability to fit specific needs
  • Simpler development, with low-level details abstracted behind a Python interface

Shapelets Core provides two sets of functionality, Data Indexing and Data Access:

Data Indexing

shapelets.indices is a Python module within the shapelets-core library that provides data indexing for several data types: vector data (embeddings) and scalar data such as dates, timestamps and geospatial points.

shapelets.indices can also be used as a real-time vector database for storing, indexing and retrieving vector-like information. ShapeletsVecDB, our own vector database, is compatible with LangChain.

Instead of relying on the traditional algorithms used in vector databases, we introduce an approach based on Time Series Quantization, which is designed for fast indexing. Memory usage is also kept low by representing indices as compressible bitmaps.
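To make the idea of representing an index as bitmaps more concrete, here is a minimal sketch in plain Python. It is not the shapelets.indices implementation and does not model Time Series Quantization. The BitmapIndex class and the sample records are hypothetical, and Python integers are used as uncompressed bitsets where a production system would use a compressed bitmap format.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index (hypothetical): one bitmap per distinct value.

    Bit i of a value's bitmap is set when row i holds that value.
    """

    def __init__(self):
        self._bitmaps = defaultdict(int)  # value -> int used as a bitset
        self._size = 0                    # number of rows indexed so far

    def add(self, value):
        """Index `value` as the next row and return its row id."""
        row_id = self._size
        self._bitmaps[value] |= 1 << row_id
        self._size += 1
        return row_id

    def bitmap(self, value):
        """Return the bitmap for `value` (0 if the value was never seen)."""
        return self._bitmaps.get(value, 0)

    @staticmethod
    def rows(bitmap):
        """Expand a bitmap into a sorted list of row ids."""
        result = []
        row_id = 0
        while bitmap:
            if bitmap & 1:
                result.append(row_id)
            bitmap >>= 1
            row_id += 1
        return result


# Hypothetical sample data: (country, year) per row.
records = [("ES", 2021), ("FR", 2021), ("ES", 2022), ("ES", 2021)]

countries = BitmapIndex()
years = BitmapIndex()
for country, year in records:
    countries.add(country)
    years.add(year)

# Combining predicates is a bitwise operation on the bitmaps.
es_and_2021 = countries.bitmap("ES") & years.bitmap(2021)
es_or_fr = countries.bitmap("ES") | countries.bitmap("FR")

print(BitmapIndex.rows(es_and_2021))
print(BitmapIndex.rows(es_or_fr))
```

Because each predicate reduces to a bitmap, compound filters become bitwise AND and OR operations, and runs of identical bits are what make such bitmaps compress well.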

Data Access

shapelets.data is a Python module within the shapelets-core library, designed to give Data Scientists and AI professionals efficient and straightforward ways to access, read, transform and load data.

You can access data structures and common file formats (e.g., Parquet, CSV) using the shapelets.data module through the Sandbox class. Once loaded into a sandbox, the data can be queried with SQL-like logic. You can also export data to various formats such as Python lists, dictionaries, PyArrow tables, Pandas DataFrames, Polars DataFrames, JSON, CSV, and Parquet.
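The Sandbox API itself is not reproduced here. As a rough analogy for the load, query and export workflow described above, the following sketch uses only Python's standard csv, sqlite3 and json modules. The table name, column names and sample data are assumptions made for illustration, and none of these calls belong to shapelets.data.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = "city,temp\nMadrid,31\nOslo,17\nLima,22\n"

# Load: parse the CSV into dictionaries with typed values.
rows = [
    {"city": record["city"], "temp": int(record["temp"])}
    for record in csv.DictReader(io.StringIO(csv_text))
]

# An in-memory SQLite database plays the role of the sandbox here.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE weather (city TEXT, temp INTEGER)")
conn.executemany("INSERT INTO weather (city, temp) VALUES (:city, :temp)", rows)

# Query: filter and sort with SQL.
cursor = conn.execute(
    "SELECT city, temp FROM weather WHERE temp > ? ORDER BY temp DESC", (20,)
)

# Export: a list of Python dictionaries, then JSON.
warm = [dict(row) for row in cursor]
warm_json = json.dumps(warm)
conn.close()

print(warm)
print(warm_json)
```

The same three stages (load a file, query it with SQL, export the result to another structure) are what the Sandbox class is described as providing for a wider range of formats.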

The interface of Data Access simplifies the data manipulation process, enabling seamless integration into existing workflows.

Key features of shapelets.data include:

  • Parallel Data Processing: Executes data operations in parallel, reducing processing time.

  • Multi-format Support: Reads a wide variety of data formats, including:

    • Parquet
    • CSV
    • JSON
    • Spatial data
    • Manually created data for testing
    • SQL query outputs
  • Flexible Data Transformation: The result of any operation can be converted to a variety of dataframe and array formats, such as:

    • Pandas
    • PyArrow:
      • Tensor
      • Batch
      • StructArray
    • Python Data Structures:
      • Dictionary
      • List
    • Polars
    • Awkward
    • Modin
    • Vaex
    • NumPy
    • Any format that supports the Python dataframe interchange protocol (__dataframe__)

  • Lazy Evaluation: Queries can be chained and are executed only when the user decides to run them. This postpones the cost of execution until it is necessary and avoids repeating an execution if its result is needed again later. It also removes the need to load all the data into memory: only the fields the query planner deems essential are processed, which keeps the memory footprint small.
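The deferred execution described above can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the shapelets.data implementation: the LazyQuery class, its where, select and collect methods, and the scan_table data source are all hypothetical names, and the sketch does not model a query planner.

```python
def _keep(rows, predicate):
    """Yield only the rows for which predicate(row) is true."""
    for row in rows:
        if predicate(row):
            yield row


def _project(rows, columns):
    """Yield rows reduced to the requested columns."""
    for row in rows:
        yield {column: row[column] for column in columns}


class LazyQuery:
    """Records chained steps and runs them only when collect() is called."""

    def __init__(self, source, steps=()):
        self._source = source        # callable returning an iterable of dict rows
        self._steps = tuple(steps)   # pending operations, not yet executed
        self._cache = None           # result of the first execution, reused later

    def where(self, predicate):
        return LazyQuery(self._source, self._steps + ((_keep, predicate),))

    def select(self, *columns):
        return LazyQuery(self._source, self._steps + ((_project, columns),))

    def collect(self):
        if self._cache is None:
            rows = iter(self._source())
            for operation, argument in self._steps:
                rows = operation(rows, argument)
            self._cache = list(rows)
        return self._cache


scan_log = []  # records every time the underlying data is actually read


def scan_table():
    """Hypothetical data source; stands in for reading a file."""
    scan_log.append("scan")
    return [
        {"id": 1, "city": "Madrid", "temp": 31},
        {"id": 2, "city": "Oslo", "temp": 17},
        {"id": 3, "city": "Lima", "temp": 22},
    ]


# Building the query reads nothing.
query = LazyQuery(scan_table).where(lambda row: row["temp"] > 20).select("city")
scans_before_collect = len(scan_log)

first = query.collect()   # the data is read here, once
second = query.collect()  # served from the cached result

print(scans_before_collect, len(scan_log), first)
```

Each chained call only extends the recorded list of steps, so building a query is cheap, and the cached result means a second collect() does not read the source again.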

Data Access is implemented as a Python library, allowing Data Scientists to easily manage their data!