shapelets.DataSet

class shapelets.DataSet(rel: Relation, _DataSet__map_fn: Callable[[Relation], type], isColAttr: bool = False)
Attributes:
alias
columns
schema

Methods

add_column(colname, *genExpr)

Adds a new column (colname) to the DataSet and returns the new DataSet.

count()

Returns the number of rows in this DataSet.

distinct([cols])

Returns the distinct rows found in this DataSet.

drop_columns([cols, pattern, full_match, flags])

Drops columns from a DataSet.

filter(func)

Returns the DataSet filtered according to the conditions set by a lambda function (func).

head([n])

Returns the first n rows.

limit([n, offset])

Returns a new DataSet of n rows.
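
The summary for limit does not say how n and offset interact. The plain-Python sketch below assumes SQL-style LIMIT/OFFSET behaviour (skip offset rows, then keep at most n), which this page does not confirm. The limit_rows helper and the sample rows are hypothetical stand-ins, not part of the Shapelets API.

```python
from itertools import islice


def limit_rows(rows, n, offset=0):
    """Hypothetical stand-in: skip `offset` rows, then keep at most `n`."""
    return list(islice(rows, offset, offset + n))


# Ten made-up rows, used only to illustrate the assumed semantics.
rows = [{"id": i} for i in range(10)]

# Skip the first four rows and take the next three.
page = limit_rows(rows, n=3, offset=4)
```

Under this assumption, asking for more rows than remain after the offset simply returns the rows that are left.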

rename_columns(new_names)

Renames the columns in a DataSet.

rewrite_col(idx, col)

Takes idx (index) and col (column name) as parameters and returns a tuple (idx, new_col).

select_columns([cols, pattern, full_match, ...])

Selects or reorganises columns in a DataSet.
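
Both drop_columns and select_columns accept pattern and full_match arguments. One plausible reading is that they map onto Python's re module, with full_match switching between matching the whole column name and matching any part of it. The sketch below illustrates that reading with the standard library only. The match_columns helper, the column names, and the use of re.search for the non-full case are assumptions, not documented Shapelets behaviour.

```python
import re


def match_columns(columns, pattern, full_match=False, flags=0):
    """Hypothetical helper: return the column names that match `pattern`."""
    regex = re.compile(pattern, flags)
    # Assumption: full_match requires the whole name to match,
    # otherwise a match anywhere in the name is enough.
    matcher = regex.fullmatch if full_match else regex.search
    return [name for name in columns if matcher(name)]


columns = ["sensor_1", "sensor_10", "Sensor_total", "timestamp"]

partial = match_columns(columns, r"sensor_\d")
exact = match_columns(columns, r"sensor_\d", full_match=True)
ignore_case = match_columns(columns, r"sensor", flags=re.IGNORECASE)
```

Here the partial match keeps sensor_10 because the pattern occurs inside the name, while the full match rejects it because of the trailing digit.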

shape()

Returns the shape of this DataSet, as a tuple containing the number of rows and the column count.

sort_by(cols[, ascending])

Sets the sorting criteria.

split_by_column(colname)

Returns a dictionary of DataSets, one for each distinct value found in the given column (colname).
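
To make the shape of the result of split_by_column concrete, the following sketch groups plain Python rows by the values of one column and returns a dictionary keyed by those values. It is only an analogue: split_rows_by_column and the sample data are hypothetical, and the real method presumably returns DataSet objects rather than lists.

```python
from collections import defaultdict


def split_rows_by_column(rows, colname):
    """Hypothetical analogue: group rows by the value found in `colname`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[colname]].append(row)
    return dict(groups)


# Made-up rows used only for illustration.
rows = [
    {"city": "Madrid", "temp": 31},
    {"city": "Oslo", "temp": 12},
    {"city": "Madrid", "temp": 29},
]

by_city = split_rows_by_column(rows, "city")
```

Each key of by_city is one distinct value of the city column, and each value holds only the rows carrying that value.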

tail([n])

Returns the last n rows.

to_arrow_record_batch_reader(blocks)

Returns an object that can be iterated to consume data in blocks.
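
Consuming data in blocks keeps memory use bounded, since only one block is held at a time. The standard-library sketch below shows that consumption pattern with a generator. It is not the Arrow reader itself: iter_blocks is hypothetical, and treating the block argument as a number of rows per block is an assumption about what the blocks parameter means.

```python
from itertools import islice


def iter_blocks(rows, block_size):
    """Hypothetical analogue: yield successive lists of up to `block_size` rows."""
    iterator = iter(rows)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


# Aggregate incrementally without ever holding all rows at once.
total = 0
block_sizes = []
for block in iter_blocks(range(10), block_size=4):
    block_sizes.append(len(block))
    total += sum(block)
```

The last block may be shorter than the others when the row count is not a multiple of the block size.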

to_arrow_table(blocks)

Returns the full result as a table made of chunks of approximately rowsInBatch rows each.

to_csv(file[, delimiter, escape, ...])

Materializes a relation and exports the results to a CSV file.

cross_product

describe

intersect

minus

printSchema

sample

to_numpy

to_numpy_batch

to_pandas

to_pandas_batch

to_parquet

union