shapelets.DataSet
- class shapelets.DataSet(rel: Relation, _DataSet__map_fn: Callable[[Relation], type], isColAttr: bool = False)
- Attributes:
- alias
- columns
- schema
Methods
add_column(colname, *genExpr): Adds a new column (colname) to the DataSet and returns the new DataSet.
count(): Returns the number of rows in this DataSet.
distinct([cols]): Returns the distinct row values found in this DataSet.
drop_columns([cols, pattern, full_match, flags]): Drops columns from a DataSet.
filter(func): Returns the DataSet filtered according to the conditions set by a lambda function (func).
head([n]): Returns the first n rows.
limit([n, offset]): Returns a new DataSet of n rows.
rename_columns(new_names): Renames the columns in a DataSet.
rewrite_col(idx, col): Takes idx (index) and col (colname) as parameters and returns a tuple (idx, new_col) where:
select_columns([cols, pattern, full_match, ...]): Selects or reorganises columns in a DataSet.
shape(): Returns the shape of this DataSet as a tuple containing the number of rows and the number of columns.
sort_by(cols[, ascending]): Sets the sorting criteria.
split_by_column(colname): Returns a dictionary of DataSets, one for each distinct entry of a specific column (colname).
tail([n]): Returns the last n rows.
to_arrow_record_batch_reader(blocks): Returns an object that can be iterated to consume data in blocks.
to_arrow_table(blocks): Returns the full result as a table made of chunks of size approximately rowsInBatch.
to_csv(file[, delimiter, escape, ...]): Materializes a relation and exports the results to a CSV file.
cross_product
describe
intersect
minus
printSchema
sample
to_numpy
to_numpy_batch
to_pandas
to_pandas_batch
to_parquet
union
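The return shapes described for shape, head, tail, limit and split_by_column can be illustrated with a small plain-Python stand-in. This sketch does not call shapelets: the list-of-dicts table and every helper function below are hypothetical, and the meaning of offset in limit (skip that many leading rows) is an assumption, since the listing does not define it.

```python
from collections import defaultdict

# Hypothetical stand-in for a small table: a list of row dicts.
# This is NOT the shapelets API; it only mirrors the documented return shapes.
rows = [
    {"sensor": "a", "value": 1.0},
    {"sensor": "b", "value": 2.5},
    {"sensor": "a", "value": 3.0},
    {"sensor": "c", "value": 0.5},
]


def shape(rows):
    """Return (row count, column count), as described for shape()."""
    return (len(rows), len(rows[0]) if rows else 0)


def head(rows, n):
    """Return the first n rows."""
    return rows[:n]


def tail(rows, n):
    """Return the last n rows."""
    return rows[-n:] if n > 0 else []


def limit(rows, n, offset=0):
    """Return n rows. Assumption: offset skips that many leading rows first."""
    return rows[offset:offset + n]


def split_by_column(rows, colname):
    """Return a dict mapping each distinct value of colname to its rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[colname]].append(row)
    return dict(groups)


parts = split_by_column(rows, "sensor")
print(shape(rows))               # (4, 2)
print(sorted(parts))             # ['a', 'b', 'c']
print(len(parts["a"]))           # 2
print(limit(rows, 2, offset=1))  # the second and third rows
```

With a real DataSet, the corresponding methods would be called on the object itself, and split_by_column would return DataSet values rather than lists.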