hats.io

Utilities for reading and writing catalog files

Functions

write_parquet_metadata(catalog_path[, ...])

Write Parquet dataset-level metadata files (and optional thumbnail) for a catalog.

get_common_metadata_pointer(→ upath.UPath)

Get file pointer to _common_metadata parquet metadata file

get_parquet_metadata_pointer(→ upath.UPath)

Get file pointer to _metadata parquet metadata file

get_partition_info_pointer(→ upath.UPath)

Get file pointer to partition_info.csv metadata file

get_point_map_file_pointer(→ upath.UPath)

Get file pointer to point_map.fits FITS image file.

get_skymap_file_pointer(→ upath.UPath)

Get file pointer to skymap.fits or skymap.K.fits FITS image file.

pixel_catalog_file(→ upath.UPath)

Create path pointer for a pixel catalog file. This will not create the directory or file.

pixel_directory(→ upath.UPath)

Create path pointer for a pixel directory. This will not create the directory.

Package Contents

write_parquet_metadata(catalog_path: str | pathlib.Path | upath.UPath, order_by_healpix=True, output_path: str | pathlib.Path | upath.UPath | None = None, create_thumbnail: bool = False, thumbnail_threshold: int = 1000000, create_metadata: bool = True)

Write Parquet dataset-level metadata files (and optional thumbnail) for a catalog.

Creates files:

catalog/
├── data_thumbnail.parquet    (only if create_thumbnail=True)
├── ...
└── dataset/
    ├── _common_metadata      (always written)
    ├── _metadata             (only if create_metadata=True)
    └──  ...

data_thumbnail.parquet gives the user a quick overview of the whole dataset. It is a compact file containing one row from each data partition, up to a maximum of thumbnail_threshold rows.

dataset/_common_metadata contains the full schema of the dataset: all of the columns and their types, as well as any file-level key-value metadata associated with the full Parquet dataset.

dataset/_metadata contains the combined row group footers from all Parquet files in the dataset, which allows readers to read the entire dataset without having to open each individual Parquet file. This file can be large for datasets with many files, so users may choose to omit it by setting create_metadata=False.

Parameters:
catalog_path: str | Path | UPath

Base path for the catalog root.

order_by_healpix: bool, default=True

If True, reorder the combined metadata by breadth-first Healpix pixel ordering. Set False for datasets that should not be reordered (e.g., secondary indexes). Does not modify dataset files on disk.

output_path: str | Path | UPath | None, default=None

Base path to write metadata files. If None, uses catalog_path.

create_thumbnail: bool, default=False

If True, writes a compact data_thumbnail.parquet containing one row per sampled file.

thumbnail_threshold: int, default=1_000_000

Maximum number of rows in the thumbnail. The thumbnail takes one row per partition, so it contains fewer rows than thumbnail_threshold when the dataset has fewer partition files than that.

create_metadata: bool, default=True

If True, writes dataset/_metadata combining row group footers.

Returns:
int

Total number of rows across all parquet files in the dataset.

Notes

For more information on the general Parquet metadata files, and why we write them, see https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-metadata-files

For more information on HATS-specific metadata files and conventions, see https://www.ivoa.net/documents/Notes/HATS/
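
A minimal usage sketch, based only on the signature and file layout documented above. The catalog path is hypothetical, and expected_metadata_files is an illustrative helper (not part of hats) that restates the "Creates files" listing for the default case where output_path is None. The hats import is deferred so the helper runs without the package installed.

```python
from pathlib import Path


def expected_metadata_files(catalog_path, create_thumbnail=False, create_metadata=True):
    """Illustrative helper (not part of hats): files the docs above say are written."""
    root = Path(catalog_path)
    files = [root / "dataset" / "_common_metadata"]  # always written
    if create_metadata:
        files.append(root / "dataset" / "_metadata")
    if create_thumbnail:
        files.append(root / "data_thumbnail.parquet")
    return files


def regenerate_metadata(catalog_path):
    """Rewrite the schema file and thumbnail, skipping the possibly large _metadata."""
    from hats.io import write_parquet_metadata  # deferred: requires hats to be installed

    total_rows = write_parquet_metadata(
        catalog_path,
        create_thumbnail=True,
        create_metadata=False,
    )
    return total_rows  # documented as the total row count across all parquet files


# Hypothetical catalog location, used only to show the expected layout.
for path in expected_metadata_files(
    "/data/my_catalog", create_thumbnail=True, create_metadata=False
):
    print(path)
```

Calling regenerate_metadata on a real catalog would write the listed files; the helper alone touches nothing on disk.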

get_common_metadata_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) → upath.UPath

Get file pointer to _common_metadata parquet metadata file

Parameters:
catalog_base_dir: str | Path | UPath

base directory of the catalog (includes catalog name)

Returns:
UPath

File Pointer to the catalog’s _common_metadata file

get_parquet_metadata_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) → upath.UPath

Get file pointer to _metadata parquet metadata file

Parameters:
catalog_base_dir: str | Path | UPath

base directory of the catalog (includes catalog name)

Returns:
UPath

File Pointer to the catalog’s _metadata file

get_partition_info_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) → upath.UPath

Get file pointer to partition_info.csv metadata file

Parameters:
catalog_base_dir: str | Path | UPath

base directory of the catalog (includes catalog name)

Returns:
UPath

File Pointer to the catalog’s partition_info.csv file

get_point_map_file_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) → upath.UPath

Get file pointer to point_map.fits FITS image file.

Parameters:
catalog_base_dir: str | Path | UPath

base directory of the catalog (includes catalog name)

Returns:
UPath

File Pointer to the catalog’s point_map.fits FITS image file.

get_skymap_file_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath, order: int | None = None) → upath.UPath

Get file pointer to skymap.fits or skymap.K.fits FITS image file.

Parameters:
catalog_base_dir: str | Path | UPath

base directory of the catalog (includes catalog name)

order: int | None

(Default value = None) desired order for the map, if looking for a down-sampled map.

Returns:
UPath

File Pointer to the FITS image file.
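
The choice between skymap.fits and skymap.K.fits can be sketched as follows. This is an illustration of the naming described above, not the hats source: skymap_file_name is a hypothetical helper, and it assumes that K is the requested order.

```python
def skymap_file_name(order=None):
    """Illustrative only: file name implied by the docs above, not the hats source."""
    if order is None:
        # No order requested: the default sky map.
        return "skymap.fits"
    # Assumption: the K in skymap.K.fits is the requested (down-sampled) order.
    return f"skymap.{int(order)}.fits"


print(skymap_file_name())
print(skymap_file_name(order=4))
```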

pixel_catalog_file(catalog_base_dir: str | pathlib.Path | upath.UPath | None, pixel: hats.pixel_math.healpix_pixel.HealpixPixel, query_params: dict | None = None, npix_suffix: str = '.parquet') → upath.UPath

Create path pointer for a pixel catalog file. This will not create the directory or file.

The catalog file name will take the HiPS standard form of:

<catalog_base_dir>/Norder=<pixel_order>/Dir=<directory number>/Npix=<pixel_number>.parquet

Where the directory number is calculated using integer division as:

(pixel_number // 10000) * 10000

Parameters:
catalog_base_dir: str | Path | UPath | None

base directory of the catalog (includes catalog name)

pixel: HealpixPixel

the healpix pixel to create path to

query_params: dict | None

(Default value = None) Params to append to URL. Ex:

{'cols': ['ra', 'dec'], 'fltrs': ['r>=10', 'g<18']}

npix_suffix: str

(Default value = ".parquet") suffix appended to the Npix path component: the file extension for a parquet file, or "/" if the pixel is stored as a directory

Returns:
UPath

catalog file name
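
As a worked example of the naming rule above, the sketch below builds only the part of the path that follows the catalog base directory. pixel_file_relative_path is a hypothetical helper written for illustration; it does not call hats and ignores query_params.

```python
def pixel_file_relative_path(order, pixel_number, npix_suffix=".parquet"):
    """Illustrative only: Norder/Dir/Npix components as described in the docs above."""
    # Integer division drops the last four digits, so pixels are grouped
    # into directories of 10,000 consecutive pixel numbers.
    directory_number = (pixel_number // 10000) * 10000
    return f"Norder={order}/Dir={directory_number}/Npix={pixel_number}{npix_suffix}"


print(pixel_file_relative_path(3, 45))
print(pixel_file_relative_path(7, 123456))
print(pixel_file_relative_path(7, 123456, npix_suffix="/"))
```

For pixel 123456 the directory number is (123456 // 10000) * 10000 = 120000, while any pixel number below 10000 falls into Dir=0.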

pixel_directory(catalog_base_dir: str | pathlib.Path | upath.UPath | None, pixel_order: int, pixel_number: int | None = None, directory_number: int | None = None) → upath.UPath

Create path pointer for a pixel directory. This will not create the directory.

One of pixel_number or directory_number is required. The directory name will take the HiPS standard form of:

<catalog_base_dir>/dataset/Norder=<pixel_order>/Dir=<directory number>

Where the directory number is calculated using integer division as:

(pixel_number // 10000) * 10000

Parameters:
catalog_base_dir: str | Path | UPath | None

base directory of the catalog (includes catalog name)

pixel_order: int

the healpix order of the pixel

pixel_number: int | None

the number of the healpix pixel at pixel_order

directory_number: int | None

directory number (or inferred from pixel number)

Returns:
UPath

directory name
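
The rule that one of pixel_number or directory_number is required can be sketched as below. Both functions are hypothetical helpers, not the hats implementation. Two details are assumptions that the text above does not state: an explicit directory_number takes precedence over pixel_number, and a ValueError is raised when neither is given.

```python
def resolve_directory_number(pixel_number=None, directory_number=None):
    """Illustrative only: pick or derive the Dir= value for a pixel directory."""
    if directory_number is not None:
        # Assumption: an explicit directory number takes precedence.
        return directory_number
    if pixel_number is not None:
        return (int(pixel_number) // 10000) * 10000
    # Assumption: the exception type is chosen for this sketch.
    raise ValueError("One of pixel_number or directory_number is required")


def pixel_directory_name(base, pixel_order, pixel_number=None, directory_number=None):
    """Build the documented <base>/dataset/Norder=<order>/Dir=<number> form as a string."""
    ndir = resolve_directory_number(pixel_number, directory_number)
    return f"{base}/dataset/Norder={pixel_order}/Dir={ndir}"


# Hypothetical catalog location.
print(pixel_directory_name("/data/my_catalog", 7, pixel_number=123456))
print(pixel_directory_name("/data/my_catalog", 7, directory_number=50000))
```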