shapelets.SandBox.from_parquet
- SandBox.from_parquet(paths: Union[str, Path, List[Union[str, Path]]], *, binary_string: Optional[bool] = False, hive_partitioning: Optional[bool] = False, include_filename: Optional[bool] = False) -> DataSet
Mounts one or more Parquet files and returns them as a DataSet.
- Parameters:
- paths: str, Path, or a list of str or Path, required
Paths to the Parquet files to load. Accepts a single string, a single Path object, or a list of either.
Use a string value when the path contains wildcards (*) to match a directory tree structure. These paths may contain references to environment variables ($var or ${var}) and home directory expressions (~).
Use Path objects to specify valid, resolvable paths.
Paths, in either string or Path object form, should include the file pattern to load (e.g. *.parquet).
- binary_string: boolean, optional, defaults to False
When set to True, binary data is treated as strings.
- hive_partitioning: boolean, optional, defaults to False
When set to True, the directories in paths are expected to contain Hive partition expressions (directory names of the form key=value), which are incorporated into the loaded dataset.
- include_filename: boolean, optional, defaults to False
When set to True, the loaded dataset includes an additional column holding the path of the file from which the data was loaded.
Examples
>>> df = session.from_parquet("my_data.parquet")
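The path rules described for paths and hive_partitioning can be illustrated with the standard library alone. The sketch below is not the library's implementation and does not call shapelets; it only shows, under the assumption that expansion follows the usual shell-like conventions, how a string pattern with an environment variable and wildcards resolves to files, and which key=value pairs a Hive-style directory layout carries. The helper names expand_pattern and hive_columns and the variable DEMO_DATA_DIR are hypothetical.

```python
import glob
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def expand_pattern(pattern: str) -> List[str]:
    """Expand $var / ${var}, ~ and * wildcards into concrete file paths."""
    expanded = os.path.expanduser(os.path.expandvars(pattern))
    return sorted(glob.glob(expanded, recursive=True))


def hive_columns(path: str) -> Dict[str, str]:
    """Collect key=value directory names (the Hive partitioning convention)."""
    columns: Dict[str, str] = {}
    for part in Path(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            columns[key] = value
    return columns


# Build a small throwaway directory tree with Hive-style folders.
with tempfile.TemporaryDirectory() as root:
    for year in ("2022", "2023"):
        folder = Path(root, f"year={year}")
        folder.mkdir()
        # Empty placeholder files; they are not real Parquet data.
        (folder / "part-0.parquet").touch()

    os.environ["DEMO_DATA_DIR"] = root  # hypothetical variable name
    try:
        files = expand_pattern("$DEMO_DATA_DIR/year=*/*.parquet")
        partitions = [hive_columns(f) for f in files]
    finally:
        del os.environ["DEMO_DATA_DIR"]
```

A pattern of the same shape would be passed to the method together with the keyword-only options from the signature above, for example session.from_parquet("$DATA_DIR/year=*/*.parquet", hive_partitioning=True, include_filename=True), where session is the same object used in the example above and DATA_DIR is an assumed environment variable.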