shapelets.SandBox.from_parquet

SandBox.from_parquet(paths: Union[str, Path, List[Union[str, Path]]], *, binary_string: Optional[bool] = False, hive_partitioning: Optional[bool] = False, include_filename: Optional[bool] = False) → DataSet

Mounts one or more Parquet files and returns them as a DataSet.

Parameters:

paths: str or Path or a list of str or Paths, required

Paths to the Parquet files to load. Accepts either a single string or Path object, or a list of them.

Use a string value when your path contains wildcards (*) to match a directory tree structure. String paths may contain references to environment variables ($var or ${var}) and home directory expressions (~).

Use Path objects to specify valid, resolvable paths.

Paths, whether given as strings or Path objects, should include the file pattern to load (e.g., *.parquet).
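The following sketch shows how a string path containing an environment variable, a home directory expression and wildcards can be resolved with the Python standard library. It only illustrates the path semantics described above and is not Shapelets' own resolution logic; the helper expand_pattern and the DATA_DIR variable are hypothetical.

```python
import glob
import os
from typing import List


def expand_pattern(pattern: str) -> List[str]:
    """Resolve a path pattern: expand $var / ${var} and ~, then match wildcards."""
    # Substitute environment variables first, then the home directory (~)
    expanded = os.path.expanduser(os.path.expandvars(pattern))
    # recursive=True lets '**' match zero or more nested directories
    return sorted(glob.glob(expanded, recursive=True))
```

For example, expand_pattern("${DATA_DIR}/**/*.parquet") would list every .parquet file under the directory named by DATA_DIR, including files in nested subdirectories.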

binary_string: boolean, optional, defaults to False

Treat binary data as strings

hive_partitioning: boolean, optional, defaults to False

When set to True, the directories in paths are interpreted as Hive partitions (key=value segments, such as year=2023), and the partition values are incorporated into the loaded dataset as columns.
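To illustrate the Hive convention, the sketch below extracts the key=value partition values encoded in a file's directory path using only the standard library. It is not how Shapelets implements hive_partitioning, and the function name hive_partition_values is hypothetical. Real readers may also infer column types (for example, reading year as an integer), whereas this sketch keeps the raw strings.

```python
from pathlib import PurePosixPath
from typing import Dict


def hive_partition_values(file_path: str) -> Dict[str, str]:
    """Extract key=value directory segments from a Hive-style file path."""
    values: Dict[str, str] = {}
    # Only directory components carry partition keys; skip the file name itself
    for part in PurePosixPath(file_path).parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values
```

For a file such as sales/year=2023/month=01/part-0.parquet, the sketch yields {"year": "2023", "month": "01"}; each of these keys would become a column whose value is shared by every row read from that file.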

include_filename: boolean, optional, defaults to False

When set to True, an additional column is included in the loaded dataset containing the path of the file from which each row was loaded.
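Conceptually, include_filename tags every row with its source file. The minimal sketch below models this with plain dictionaries instead of Parquet data; the helper concat_with_filename and the column name "filename" are hypothetical assumptions, since the name of the column Shapelets actually adds is not stated here.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def concat_with_filename(
    sources: Iterable[Tuple[str, List[Row]]], column: str = "filename"
) -> List[Row]:
    """Combine rows from several files, tagging each row with its source path."""
    combined: List[Row] = []
    for path, rows in sources:
        for row in rows:
            # Build a new dict so the caller's rows are not mutated
            combined.append({**row, column: path})
    return combined
```

Rows from a.parquet and b.parquet end up in a single result, and the extra column records which file each row came from.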

Examples

>>> df = session.from_parquet("my_data.parquet")
