hats.io.file_io
Submodules
Functions
| Function | Description |
| --- | --- |
| delete_file | Deletes file from filesystem. |
| load_csv_to_pandas | Load a csv file to a pandas dataframe |
| load_csv_to_pandas_generator | Load a csv file to a pandas dataframe |
| load_text_file | Load a text file content to a list of strings. |
| make_directory | Make a directory at a given file pointer |
| read_fits_image | Read the object spatial distribution information from a healpix FITS file. |
| read_parquet_dataset | Read parquet dataset from directory pointer or list of files. |
| read_parquet_file | Read single parquet file. |
| read_parquet_file_to_pandas | Reads parquet file(s) to a pandas DataFrame |
| read_parquet_metadata | Read FileMetaData from footer of a single Parquet file. |
| remove_directory | Remove a directory, and all contents, recursively. |
| write_dataframe_to_csv | Write a pandas DataFrame to a CSV file |
| write_dataframe_to_parquet | Write a pandas DataFrame to a parquet file |
| write_fits_image | Write the object spatial distribution information to a healpix FITS file. |
| write_parquet_metadata | Write a metadata only parquet file from a schema |
| write_string_to_file | Write a string to a text file |
| append_paths_to_pointer | Append directories and/or a file name to a specified file pointer. |
| directory_has_contents | Checks if a directory already has some contents (any files or subdirectories) |
| does_file_or_directory_exist | Checks if a file or directory exists for a given file pointer |
| find_files_matching_path | Find files or directories matching the provided path parts. |
| get_upath | Returns a UPath file pointer from a path string or other path-like type. |
| get_upath_for_protocol | Create UPath with protocol-specific configurations. |
| is_regular_file | Checks if a regular file (NOT a directory) exists for a given file pointer. |
Package Contents
- delete_file(file_handle: str | pathlib.Path | upath.UPath)
Deletes file from filesystem.
- Parameters:
- file_handle: str | Path | UPath
location of file pointer
- load_csv_to_pandas(file_pointer: str | pathlib.Path | upath.UPath, **kwargs) -> pandas.DataFrame
Load a csv file to a pandas dataframe
- Parameters:
- file_pointer: str | Path | UPath
location of csv file to load
- **kwargs
arguments to pass to the pandas read_csv loading method
- Returns:
- pd.DataFrame
contents of the CSV file, as a dataframe.
- load_csv_to_pandas_generator(file_pointer: str | pathlib.Path | upath.UPath, *, chunksize=10000, compression=None, **kwargs) -> collections.abc.Generator[pandas.DataFrame]
Load a csv file to pandas dataframes, one chunk at a time
- Parameters:
- file_pointer: str | Path | UPath
location of csv file to load
- chunksize: int
(Default value = 10_000) number of rows to load per chunk
- compression: str
(Default value = None) for compressed CSVs, the manner of compression. e.g. ‘gz’, ‘bzip’.
- **kwargs
arguments to pass to the pandas read_csv loading method
- Yields:
- pd.DataFrame
chunked contents of the CSV file, as a dataframe.
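The chunking contract (each yielded chunk holds at most chunksize rows, and the final chunk may be shorter) can be illustrated without pandas. This is a stdlib sketch built around a hypothetical helper, iter_csv_chunks, that yields lists of row dictionaries instead of DataFrames; it is not how hats implements the function.

```python
import csv
import io
from collections.abc import Iterator


def iter_csv_chunks(text_stream, chunksize: int = 10_000) -> Iterator[list[dict]]:
    """Hypothetical helper: yield lists of at most `chunksize` CSV rows."""
    reader = csv.DictReader(text_stream)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunksize:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


# Five data rows with chunksize=2 produce chunks of 2, 2 and 1 rows.
data = io.StringIO("id,ra\n1,10.0\n2,11.0\n3,12.0\n4,13.0\n5,14.0\n")
chunks = list(iter_csv_chunks(data, chunksize=2))
print([len(chunk) for chunk in chunks])  # [2, 2, 1]
```

Because only one chunk is held in memory at a time, a generator of this shape suits CSV files too large to load in one piece.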
- load_text_file(file_pointer: str | pathlib.Path | upath.UPath, encoding: str = 'utf-8')
Load the contents of a text file into a list of strings.
- Parameters:
- file_pointer: str | Path | UPath
location of file to read
- encoding: str
(Default value = “utf-8”) string encoding method used by the file
- Returns:
- list[str]
contents of the file as a list of strings, one per line.
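Assuming the lines are read the way Python's file readlines() reads them, each list entry keeps its trailing newline. A stdlib sketch of that behavior; the file name is hypothetical:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "properties.txt"  # hypothetical file name
    path.write_text("line one\nline two\n", encoding="utf-8")

    # readlines() returns one string per line, keeping the trailing newline
    with path.open("r", encoding="utf-8") as handle:
        lines = handle.readlines()

print(lines)  # ['line one\n', 'line two\n']
```

If the newlines are unwanted, strip them afterwards, for example with [line.rstrip("\n") for line in lines].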
- make_directory(file_pointer: str | pathlib.Path | upath.UPath, exist_ok: bool = False)
Make a directory at a given file pointer
Raises an error if the directory already exists, unless exist_ok is True, in which case existing directories are left unmodified.
- Parameters:
- file_pointer: str | Path | UPath
location in file system to make directory
- exist_ok: bool
(Default value = False) If False, raise an error if the directory exists. If True, existing directories are ignored and not modified.
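The exist_ok semantics appear to mirror pathlib.Path.mkdir, which UPath also exposes. A local stdlib sketch of that behavior (the catalog directory name is hypothetical, and the exact exception hats raises on remote stores may differ):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "catalog"  # hypothetical directory name

    target.mkdir(exist_ok=False)  # first call creates the directory
    try:
        target.mkdir(exist_ok=False)  # second call fails: directory exists
        raised = False
    except FileExistsError:
        raised = True

    target.mkdir(exist_ok=True)  # tolerated; the existing directory is left as-is
    print(raised, target.is_dir())  # True True
```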
- read_fits_image(map_file_pointer: str | pathlib.Path | upath.UPath) -> numpy.ndarray
Read the object spatial distribution information from a healpix FITS file.
- Parameters:
- map_file_pointer: str | Path | UPath
location of file to be read
- Returns:
- np.ndarray
one-dimensional numpy array of integers where the value at each index is the number of objects found in the healpix pixel with that index.
- read_parquet_dataset(source: str | pathlib.Path | upath.UPath | list[str | pathlib.Path | upath.UPath], **kwargs) -> tuple[str | list[str], pyarrow.dataset.Dataset]
Read parquet dataset from directory pointer or list of files.
Note that pyarrow.dataset reads require that directory pointers don’t contain a leading slash, and the protocol prefix may additionally be removed. As such, we also return the directory path that is formatted for pyarrow ingestion for follow-up.
See more info on source specification and possible kwargs at https://arrow.apache.org/docs/python/generated/pyarrow.dataset.dataset.html
- Parameters:
- source: str | Path | UPath | list[str | Path | UPath]
directory, path, or list of paths to read data from
- **kwargs
additional arguments passed to pyarrow.dataset.dataset
- Returns:
- tuple[str | list[str], Dataset]
Tuple containing a path to the dataset (that is formatted for pyarrow ingestion) and the dataset read from disk.
- read_parquet_file(file_pointer: str | pathlib.Path | upath.UPath, **kwargs) -> pyarrow.parquet.ParquetFile
Read single parquet file.
- Parameters:
- file_pointer: str | Path | UPath
location of parquet file
- **kwargs
additional arguments to be passed to pyarrow.parquet.ParquetFile
- Returns:
- pq.ParquetFile
ParquetFile reader object for accessing the file's metadata and contents
- read_parquet_file_to_pandas(file_pointer: str | pathlib.Path | upath.UPath, is_dir: bool | None = None, **kwargs) -> nested_pandas.NestedFrame
Reads parquet file(s) to a pandas DataFrame
- Parameters:
- file_pointer: str | Path | UPath
File Pointer to a parquet file or a directory containing parquet files
- is_dir: bool | None
If True, the pointer represents a pixel directory; otherwise, it represents a file. In both cases there is no need to check the pointer’s content type. If is_dir is None (default), this method falls back to upath.is_dir() to identify the type of pointer. Inferring the type over HTTP is particularly expensive because it requires downloading the entire contents of the pointer.
- **kwargs
Additional arguments to pass to pandas read_parquet method
- Returns:
- NestedFrame
NestedFrame (a pandas DataFrame subclass) with the data from the parquet file(s)
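Why passing is_dir helps can be seen in a sketch of the dispatch step alone. The helper resolve_parquet_paths below is hypothetical, as is the assumption that a directory's files are found with a "*.parquet" pattern; it shows only that the filesystem probe is skipped when the caller supplies is_dir, and it does not read any data.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_parquet_paths(pointer: Path, is_dir: bool | None = None) -> list[Path]:
    """Hypothetical helper: decide which parquet files a pointer refers to."""
    if is_dir is None:
        # Only probe the filesystem when the caller did not say; on remote
        # stores (e.g. HTTP) this check can be expensive.
        is_dir = pointer.is_dir()
    if is_dir:
        return sorted(pointer.glob("*.parquet"))  # assumed file pattern
    return [pointer]


with tempfile.TemporaryDirectory() as tmp:
    pixel_dir = Path(tmp) / "pixel"  # hypothetical directory name
    pixel_dir.mkdir()
    for name in ("part0.parquet", "part1.parquet"):
        (pixel_dir / name).touch()

    as_dir = [p.name for p in resolve_parquet_paths(pixel_dir, is_dir=True)]
    inferred = [p.name for p in resolve_parquet_paths(pixel_dir)]
    print(as_dir, inferred)  # ['part0.parquet', 'part1.parquet'] ['part0.parquet', 'part1.parquet']
```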
- read_parquet_metadata(file_pointer: str | pathlib.Path | upath.UPath, **kwargs) -> pyarrow.parquet.FileMetaData
Read FileMetaData from footer of a single Parquet file.
- Parameters:
- file_pointer: str | Path | UPath
location of file to read metadata from
- **kwargs
additional arguments to be passed to pyarrow.parquet.read_metadata
- Returns:
- pq.FileMetaData
parquet file metadata (includes schema)
- remove_directory(file_pointer: str | pathlib.Path | upath.UPath, ignore_errors=False)
Remove a directory, and all contents, recursively.
- Parameters:
- file_pointer: str | Path | UPath
directory in file system to remove
- ignore_errors: bool
(Default value = False) if True, errors resulting from failed removals will be ignored
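For local paths, the behavior corresponds to the standard library's shutil.rmtree, whose ignore_errors flag plays the same role. A sketch with hypothetical directory and file names (this is the stdlib analogue, not hats's implementation):

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "old_catalog"  # hypothetical directory name
    (root / "subdir").mkdir(parents=True)
    (root / "subdir" / "data.parquet").touch()

    shutil.rmtree(root)  # removes the directory and everything beneath it
    removed = not root.exists()

    # Removing a missing directory would raise; ignore_errors=True suppresses that
    shutil.rmtree(root, ignore_errors=True)

print(removed)  # True
```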
- write_dataframe_to_csv(dataframe: pandas.DataFrame, file_pointer: str | pathlib.Path | upath.UPath, **kwargs)
Write a pandas DataFrame to a CSV file
- Parameters:
- dataframe: pd.DataFrame
DataFrame to write
- file_pointer: str | Path | UPath
location of file to write to
- **kwargs
arguments to pass to the pandas to_csv method
- write_dataframe_to_parquet(dataframe: pandas.DataFrame, file_pointer)
Write a pandas DataFrame to a parquet file
- Parameters:
- dataframe: pd.DataFrame
DataFrame to write
- file_pointer: str | Path | UPath
location of file to write to
- write_fits_image(histogram: numpy.ndarray, map_file_pointer: str | pathlib.Path | upath.UPath)
Write the object spatial distribution information to a healpix FITS file.
- Parameters:
- histogram: np.ndarray
one-dimensional numpy array of long integers where the value at each index corresponds to the number of objects found at the healpix pixel.
- map_file_pointer: str | Path | UPath
location of file to be written
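The histogram argument is indexed by HEALPix pixel number, so at a given order its length is 12 * nside**2, with nside = 2**order. A plain-Python sketch of building one from per-object pixel indices; the helper build_histogram and its inputs are hypothetical, and in practice you would produce an integer numpy array (for example with numpy.bincount(pixels, minlength=npix)).

```python
def build_histogram(pixel_indices, order):
    """Hypothetical helper: count objects per HEALPix pixel at `order`."""
    npix = 12 * 4**order  # 12 * nside**2 pixels, with nside = 2**order
    counts = [0] * npix
    for pixel in pixel_indices:
        counts[pixel] += 1
    return counts


histogram = build_histogram([0, 0, 5, 47], order=1)
print(len(histogram), histogram[0], histogram[5], histogram[47])  # 48 2 1 1
```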
- write_parquet_metadata(schema, file_pointer: str | pathlib.Path | upath.UPath, metadata_collector: list | None = None, **kwargs)
Write a metadata-only parquet file from a schema
- Parameters:
- schema: pa.Schema
pyarrow schema to be written
- file_pointer: str | Path | UPath
location of file to be written to
- metadata_collector: list | None
(Default value = None) where to collect metadata information
- **kwargs
additional arguments to be passed to pyarrow.parquet.write_metadata
- write_string_to_file(file_pointer: str | pathlib.Path | upath.UPath, string: str, encoding: str = 'utf-8')
Write a string to a text file
- Parameters:
- file_pointer: str | Path | UPath
file location to write file to
- string: str
string to write to file
- encoding: str
(Default value = “utf-8”) encoding method to write to file with
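The encoding parameter decides which bytes end up on disk for the same string. A stdlib sketch (hypothetical file name) showing that with the default "utf-8" each non-ASCII character takes two bytes here, while "latin-1" would store one:

```python
import tempfile
from pathlib import Path

text = "Ångström"  # non-ASCII text is where the encoding matters

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"  # hypothetical file name
    with path.open("w", encoding="utf-8") as handle:
        handle.write(text)
    raw = path.read_bytes()

latin1 = text.encode("latin-1")  # same text, different bytes
print(raw)  # b'\xc3\x85ngstr\xc3\xb6m'
print(latin1)  # b'\xc5ngstr\xf6m'
```

Reading the file back requires the same encoding that was used to write it.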
- append_paths_to_pointer(pointer: str | pathlib.Path | upath.UPath, *paths: str) -> upath.UPath
Append directories and/or a file name to a specified file pointer.
- Parameters:
- pointer: str | Path | UPath
FilePointer object to add path to
- *paths: str
any number of directory names optionally followed by a file name to append to the pointer
- Returns:
- UPath
New file pointer to path given by joining given pointer and path names
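UPath follows the pathlib interface, so the joining behaves like pathlib's joinpath. A sketch using PurePosixPath so it runs without touching any filesystem; the catalog root and the Norder/Dir/Npix names are illustrative only:

```python
from pathlib import PurePosixPath

base = PurePosixPath("/data/catalogs/my_catalog")  # hypothetical catalog root
# Directory names followed by an optional file name, joined in order
pointer = base.joinpath("dataset", "Norder=0", "Dir=0", "Npix=1.parquet")
print(pointer)  # /data/catalogs/my_catalog/dataset/Norder=0/Dir=0/Npix=1.parquet
```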
- directory_has_contents(pointer: str | pathlib.Path | upath.UPath) -> bool
Checks if a directory already has some contents (any files or subdirectories)
- Parameters:
- pointer: str | Path | UPath
File Pointer to check for existing contents
- Returns:
- bool
True if there are any files or subdirectories below this directory.
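For a local directory, the check can be sketched with pathlib. The helper has_contents is hypothetical and is not necessarily how hats implements the check for remote stores:

```python
import tempfile
from pathlib import Path


def has_contents(directory: Path) -> bool:
    """Hypothetical helper: True if the directory holds any file or subdirectory."""
    # any() returns as soon as it sees a single entry
    return any(directory.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = has_contents(root)
    (root / "subdir").mkdir()  # an empty subdirectory still counts as contents
    after = has_contents(root)

print(before, after)  # False True
```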
- does_file_or_directory_exist(pointer: str | pathlib.Path | upath.UPath) -> bool
Checks if a file or directory exists for a given file pointer
- Parameters:
- pointer: str | Path | UPath
File Pointer to check if file or directory exists at
- Returns:
- bool
True if file or directory at pointer exists, False if not
- find_files_matching_path(pointer: str | pathlib.Path | upath.UPath, *paths: str) -> list[upath.UPath]
Find files or directories matching the provided path parts.
- Parameters:
- pointer: str | Path | UPath
base File Pointer in which to find contents
- *paths: str
any number of directory names optionally followed by a file name. Directory or file names may be replaced with * as a wildcard.
- Returns:
- list[UPath]
New file pointers to files found matching the path
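Assuming the path parts are joined into a glob pattern, matching behaves like pathlib's glob, where * matches within a single path component. A local sketch with a hypothetical directory layout:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical layout: dataset/Norder=<order>/Npix=<pixel>.parquet
    for order, pixel in [(0, 1), (0, 2), (1, 7)]:
        directory = base / "dataset" / f"Norder={order}"
        directory.mkdir(parents=True, exist_ok=True)
        (directory / f"Npix={pixel}.parquet").touch()

    # Path parts joined with "/" form a glob pattern; "*" matches within one part
    parts = ("dataset", "Norder=*", "*.parquet")
    matches = sorted(p.relative_to(base).as_posix() for p in base.glob("/".join(parts)))

print(matches)
```

All three parquet files are matched, one level below each Norder directory.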
- get_upath(path: str | pathlib.Path | upath.UPath) -> upath.UPath
Returns a UPath file pointer from a path string or other path-like type.
- Parameters:
- path: str | Path | UPath
base file path to be normalized to UPath
- Returns:
- UPath
Instance of UPath.
- get_upath_for_protocol(path: str | pathlib.Path) -> upath.UPath
Create UPath with protocol-specific configurations.
If we access pointers on S3 and credentials are not found, we assume anonymous access, i.e., that the bucket is public.
- Parameters:
- path: str | Path | UPath
base file path to be normalized to UPath
- Returns:
- UPath
Instance of UPath.
- is_regular_file(pointer: str | pathlib.Path | upath.UPath) -> bool
Checks if a regular file (NOT a directory) exists for a given file pointer.
- Parameters:
- pointer: str | Path | UPath
File Pointer to check for a regular file
- Returns:
- bool
True if a regular file exists at the pointer; False if nothing exists there or it is a directory
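The difference between does_file_or_directory_exist and is_regular_file parallels pathlib's exists() and is_file(). A local sketch with hypothetical file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    file_path = directory / "catalog_info.json"  # hypothetical file name
    file_path.write_text("{}", encoding="utf-8")
    missing = directory / "missing.txt"  # never created

    # exists() is true for files and directories; is_file() only for regular files
    results = {
        "dir": (directory.exists(), directory.is_file()),
        "file": (file_path.exists(), file_path.is_file()),
        "missing": (missing.exists(), missing.is_file()),
    }

print(results)
# {'dir': (True, False), 'file': (True, True), 'missing': (False, False)}
```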