
What is Shapelets Core?

Shapelets Core incorporates Data Access and Data Indexing functionalities:

Data Access

shapelets.data is a Python module within the shapelets-core library, designed to give Data Scientists and AI professionals efficient and straightforward methods for accessing, reading, transforming, and loading data.

Data structures and data files in common formats (e.g. Parquet, CSV) can be loaded into shapelets.data through the Sandbox class. Once loaded into a sandbox, the data can be queried with SQL-like operations and then exported.

This familiar interface simplifies the data manipulation process, enabling seamless integration into existing workflows.
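
The load-then-query workflow can be illustrated without Shapelets itself. The following is a minimal sketch that uses Python's standard csv and sqlite3 modules as a stand-in. It does not use the Sandbox class or any shapelets.data API, and the sample data, table name, and helper functions are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; in practice this would come from a CSV or Parquet file.
CSV_TEXT = """city,temperature
Madrid,31.5
Oslo,18.0
Madrid,29.5
"""


def load_csv_into_table(conn, text, table):
    """Load CSV text into a new SQLite table (a stand-in for loading into a sandbox)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    conn.execute(f"CREATE TABLE {table} (city TEXT, temperature REAL)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?)",
        [(row["city"], float(row["temperature"])) for row in rows],
    )


def average_by_city(conn, table):
    """Query the loaded data with SQL and export the result as a dictionary."""
    cursor = conn.execute(
        f"SELECT city, AVG(temperature) FROM {table} GROUP BY city ORDER BY city"
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")
load_csv_into_table(conn, CSV_TEXT, "readings")
averages = average_by_city(conn, "readings")
print(averages)
```

The shape of the workflow is the point here: data is loaded once, queried declaratively, and the result is exported into an ordinary Python structure.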

Key features of shapelets.data include:

  • Parallel Data Processing: Executes data operations in parallel, significantly improving performance and reducing processing time.

  • Multi-format Support: Reads a wide variety of data formats, including:

    • Parquet
    • CSV
    • JSON
    • Spatial data
    • Manually created data for testing
    • SQL query outputs
  • Flexible Data Transformation: The result of any operation can be converted into a range of dataframe and in-memory formats, such as:

    • Pandas
    • PyArrow:
      • Tensor
      • Batch
      • StructArray
    • Python Data Structures:
      • Dictionary
      • List
    • Polars
    • Awkward
    • Modin
    • Vaex
    • NumPy
    • Any format that implements the Python dataframe interchange protocol (__dataframe__)

  • Lazy Evaluation: Queries can be chained and are executed only when the user requests the result. This defers the cost of execution until it is actually needed and avoids repeating executions whose results are needed again later. It also removes the need to load all the data into memory: only the fields that the query planner identifies as strictly required are loaded, which keeps the memory footprint minimal.
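
To make the idea of lazy evaluation concrete, here is a minimal sketch in plain Python. The LazyQuery class, its where, select, and collect methods, and the sample rows are all hypothetical and are not part of the shapelets.data API. The sketch only shows that chaining records steps without touching the data, and that the source is read once when the result is requested. It does not implement a query planner or column pruning.

```python
def _filter_rows(rows, predicate):
    """Yield only the rows for which predicate(row) is true."""
    for row in rows:
        if predicate(row):
            yield row


def _project_rows(rows, fields):
    """Yield rows reduced to the requested fields."""
    for row in rows:
        yield {field: row[field] for field in fields}


class LazyQuery:
    """Hypothetical lazy query: records steps and runs them only on collect()."""

    def __init__(self, source, steps=()):
        self._source = source       # callable that returns an iterable of dict rows
        self._steps = tuple(steps)  # recorded steps, not yet executed

    def where(self, predicate):
        return LazyQuery(self._source, self._steps + ((_filter_rows, predicate),))

    def select(self, *fields):
        return LazyQuery(self._source, self._steps + ((_project_rows, fields),))

    def collect(self):
        """Read the source and apply every recorded step, in order."""
        rows = iter(self._source())
        for step, argument in self._steps:
            rows = step(rows, argument)
        return list(rows)


reads = []  # records every time the data source is actually read


def source():
    reads.append("scan")
    return [
        {"name": "a", "price": 5},
        {"name": "b", "price": 20},
        {"name": "c", "price": 35},
    ]


query = LazyQuery(source).where(lambda row: row["price"] > 10).select("name")
reads_before_collect = list(reads)  # still empty: nothing has been executed yet
result = query.collect()            # the source is read here, once
print(reads_before_collect, reads, result)
```

Each chained call returns a new query object, so a partially built query can be reused as the starting point for several different queries without executing anything.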

Data Access is implemented as a Python library, allowing Data Scientists to easily manage their data!

Data Indexing

shapelets.indices is a Python module within the shapelets-core library, offering powerful data indexing capabilities for multiple data types: scalar and vector data, dates and times, and geospatial data.

shapelets.indices can also be used as a real-time vector database, offering seamless storage, indexing, and retrieval of vector-like information.

Instead of relying on the traditional algorithms used in vector databases, we introduce an innovative approach based on Time Series Quantization, which results in unparalleled indexing speed. Memory usage is also kept minimal by representing indices as compressible bitmaps.
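
As a rough illustration of how bitmaps can represent an index, the sketch below builds one bitmap per distinct value, with bit i set when row i holds that value. It is a generic, uncompressed bitmap index written with Python integers. The BitmapIndex class and the sample columns are hypothetical, and the sketch does not reflect the internals of shapelets.indices or its quantization scheme.

```python
from collections import defaultdict


class BitmapIndex:
    """Hypothetical bitmap index: one integer bitmap per distinct column value."""

    def __init__(self, values):
        self._bitmaps = defaultdict(int)
        for row_id, value in enumerate(values):
            # Set bit row_id in the bitmap that belongs to this value.
            self._bitmaps[value] |= 1 << row_id

    def bitmap(self, value):
        """Return the bitmap for a value, or 0 if the value never occurs."""
        return self._bitmaps.get(value, 0)

    @staticmethod
    def rows(bitmap):
        """Decode a bitmap into the sorted list of row ids whose bits are set."""
        row_ids = []
        row_id = 0
        while bitmap:
            if bitmap & 1:
                row_ids.append(row_id)
            bitmap >>= 1
            row_id += 1
        return row_ids


colors = ["red", "blue", "red", "green", "blue", "red"]
sizes = ["S", "M", "M", "S", "M", "S"]

color_index = BitmapIndex(colors)
size_index = BitmapIndex(sizes)

# Combining conditions becomes bitwise arithmetic on the bitmaps.
red_and_small = BitmapIndex.rows(color_index.bitmap("red") & size_index.bitmap("S"))
red_or_green = BitmapIndex.rows(color_index.bitmap("red") | color_index.bitmap("green"))
print(red_and_small, red_or_green)
```

Bitmaps like these often contain long runs of identical bits, which is what makes them good candidates for compression. The compression step itself is not shown here.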

Shapelets Core is a Python library, offering several inherent advantages:

  • Deployment in serverless environments, such as AWS Lambda
  • Flexibility to run on your own servers without any restrictions
  • Easy integration into existing projects and workflows
  • High customizability to fit specific needs
  • Simplified development, by abstracting low-level details and leveraging Python's simplicity and efficiency