hats.io
Utilities for reading and writing catalog files
Submodules
Functions
| write_parquet_metadata | Write Parquet dataset-level metadata files (and optional thumbnail) for a catalog. |
| get_common_metadata_pointer | Get file pointer to _common_metadata parquet metadata file |
| get_parquet_metadata_pointer | Get file pointer to _metadata parquet metadata file |
| get_partition_info_pointer | Get file pointer to partition_info.csv metadata file |
| get_point_map_file_pointer | Get file pointer to point_map.fits FITS image file. |
| get_skymap_file_pointer | Get file pointer to skymap.fits or skymap.K.fits FITS image file. |
| pixel_catalog_file | Create path pointer for a pixel catalog file. This will not create the directory or file. |
| pixel_directory | Create path pointer for a pixel directory. This will not create the directory. |
Package Contents
- write_parquet_metadata(catalog_path: str | pathlib.Path | upath.UPath, order_by_healpix=True, output_path: str | pathlib.Path | upath.UPath | None = None, create_thumbnail: bool = False, thumbnail_threshold: int = 1000000, create_metadata: bool = True)
Write Parquet dataset-level metadata files (and optional thumbnail) for a catalog.
Creates files:
catalog/
├── data_thumbnail.parquet (only if create_thumbnail=True)
├── ...
└── dataset/
    ├── _common_metadata (always written)
    ├── _metadata (only if create_metadata=True)
    └── ...
data_thumbnail.parquet gives the user a quick overview of the whole dataset. It is a compact file containing one row from each data partition, up to a maximum of thumbnail_threshold rows.
dataset/_common_metadata contains the full schema of the dataset. This file will know all of the columns and their types, as well as any file-level key-value metadata associated with the full Parquet dataset.
dataset/_metadata contains the combined row group footers from all Parquet files in the dataset, which allows readers to read the entire dataset without having to open each individual Parquet file. This file can be large for datasets with many files, so users may choose to omit it by setting create_metadata=False.
- Parameters:
- catalog_path: str | Path | UPath
Base path for the catalog root.
- order_by_healpix: bool, default=True
If True, reorder combined metadata by breadth-first Healpix pixel ordering. Set to False for datasets that should not be reordered (e.g., secondary indexes). Does not modify dataset files on disk.
- output_path: str | Path | UPath | None, default=None
Base path to write metadata files. If None, uses catalog_path.
- create_thumbnail: bool, default=False
If True, writes a compact data_thumbnail.parquet containing one row per sampled file.
- thumbnail_threshold: int, default=1_000_000
Maximum number of rows in the thumbnail. One row is taken per partition, so when thumbnail_threshold exceeds the number of files, the thumbnail holds at most one row per file.
- create_metadata: bool, default=True
If True, writes dataset/_metadata combining row group footers.
- Returns:
- int
Total number of rows across all parquet files in the dataset.
Notes
For more information on the general Parquet metadata files, and why we write them, see https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-metadata-files
For more information on HATS-specific metadata files and conventions, see https://www.ivoa.net/documents/Notes/HATS/
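The file layout described above can be sketched as a small helper that lists which metadata files a given combination of flags would be expected to produce. This is only an illustration built on `pathlib`: `expected_metadata_files` is a hypothetical name, not part of `hats.io`, and it mirrors the documented layout rather than calling the library or writing anything.

```python
from pathlib import Path


def expected_metadata_files(
    catalog_path,
    output_path=None,
    create_thumbnail=False,
    create_metadata=True,
):
    """List the metadata files described above (hypothetical helper, not hats API)."""
    # If output_path is None, the metadata files are placed under catalog_path.
    base = Path(output_path) if output_path is not None else Path(catalog_path)
    files = [base / "dataset" / "_common_metadata"]  # always written
    if create_metadata:
        files.append(base / "dataset" / "_metadata")
    if create_thumbnail:
        files.append(base / "data_thumbnail.parquet")
    return files


# Example: a catalog at "my_catalog" (placeholder path) with a thumbnail requested.
for path in expected_metadata_files("my_catalog", create_thumbnail=True):
    print(path.as_posix())
```

A helper like this can be handy in tests that check which files appeared after a metadata-writing step.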
- get_common_metadata_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) -> upath.UPath
Get file pointer to _common_metadata parquet metadata file
- Parameters:
- catalog_base_dir: str | Path | UPath
base directory of the catalog (includes catalog name)
- Returns:
- UPath
File Pointer to the catalog’s _common_metadata file
- get_parquet_metadata_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) -> upath.UPath
Get file pointer to _metadata parquet metadata file
- Parameters:
- catalog_base_dir: str | Path | UPath
base directory of the catalog (includes catalog name)
- Returns:
- UPath
File Pointer to the catalog’s _metadata file
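Because `_metadata` is optional (it is skipped when `create_metadata=False`), a caller may want to check which of the two Parquet metadata files are present before relying on them. The sketch below assumes a local catalog and the `dataset/` layout described for `write_parquet_metadata`; `metadata_file_status` is a hypothetical helper, and it uses `pathlib.Path` rather than the `UPath` objects the real functions return.

```python
from pathlib import Path


def metadata_file_status(catalog_base_dir):
    """Report which Parquet metadata files exist (hypothetical helper, not hats API)."""
    # Assumption: both files live under <catalog_base_dir>/dataset/.
    dataset_dir = Path(catalog_base_dir) / "dataset"
    return {
        "_common_metadata": (dataset_dir / "_common_metadata").is_file(),
        "_metadata": (dataset_dir / "_metadata").is_file(),
    }


# "my_catalog" is a placeholder path; both entries are False if it does not exist.
print(metadata_file_status("my_catalog"))
```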
- get_partition_info_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) -> upath.UPath
Get file pointer to partition_info.csv metadata file
- Parameters:
- catalog_base_dir: str | Path | UPath
base directory of the catalog (includes catalog name)
- Returns:
- UPath
File Pointer to the catalog’s partition_info.csv file
- get_point_map_file_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) -> upath.UPath
Get file pointer to point_map.fits FITS image file.
- Parameters:
- catalog_base_dir: str | Path | UPath
base directory of the catalog (includes catalog name)
- Returns:
- UPath
File Pointer to the catalog’s point_map.fits FITS image file.
- get_skymap_file_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath, order: int | None = None) -> upath.UPath
Get file pointer to skymap.fits or skymap.K.fits FITS image file.
- Parameters:
- catalog_base_dir: str | Path | UPath
base directory of the catalog (includes catalog name)
- order: int | None
(Default value = None) desired order for the map, if looking for a down-sampled map.
- Returns:
- UPath
File Pointer to the FITS image file.
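The choice between `skymap.fits` and `skymap.K.fits` can be illustrated with a short sketch. It assumes that `K` is the value of `order` and that the file sits directly under the catalog base directory; neither detail is spelled out above, so treat both as assumptions. `skymap_path` is a hypothetical stand-in that returns a `pathlib.Path`, not the library function.

```python
from pathlib import Path


def skymap_path(catalog_base_dir, order=None):
    """Hypothetical stand-in illustrating the skymap file naming."""
    # Assumption: no order means the default map; an order K selects skymap.K.fits.
    name = "skymap.fits" if order is None else f"skymap.{order}.fits"
    # Assumption: the file is located at the catalog root.
    return Path(catalog_base_dir) / name


print(skymap_path("my_catalog").as_posix())
print(skymap_path("my_catalog", order=3).as_posix())
```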
- pixel_catalog_file(catalog_base_dir: str | pathlib.Path | upath.UPath | None, pixel: hats.pixel_math.healpix_pixel.HealpixPixel, query_params: dict | None = None, npix_suffix: str = '.parquet') -> upath.UPath
Create path pointer for a pixel catalog file. This will not create the directory or file.
The catalog file name will take the HiPS standard form of:
<catalog_base_dir>/dataset/Norder=<pixel_order>/Dir=<directory number>/Npix=<pixel_number>.parquet
Where the directory number is calculated using integer division as:
(pixel_number // 10000) * 10000
- Parameters:
- catalog_base_dir: str | Path | UPath | None
base directory of the catalog (includes catalog name)
- pixel: HealpixPixel
the healpix pixel to create path to
- query_params: dict | None
(Default value = None) Params to append to URL. Ex:
{'cols': ['ra', 'dec'], 'fltrs': ['r>=10', 'g<18']}
- npix_suffix: str
(Default value = “.parquet”) extension for the parquet file (or / if a directory)
- Returns:
- UPath
catalog file name
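As a worked example of the directory-number rule, the sketch below computes the directory number with integer division and assembles the `Norder=/Dir=/Npix=` portion of the file name. It builds only that portion, relative to wherever the catalog data lives, and it ignores `query_params`. Both function names are hypothetical and are not part of `hats.io`.

```python
def directory_number(pixel_number):
    """Round the pixel number down to a multiple of 10000 using integer division."""
    return (pixel_number // 10000) * 10000


def pixel_file_relative_path(pixel_order, pixel_number, npix_suffix=".parquet"):
    """Build the Norder/Dir/Npix portion of a pixel file path (illustrative only)."""
    return (
        f"Norder={pixel_order}"
        f"/Dir={directory_number(pixel_number)}"
        f"/Npix={pixel_number}{npix_suffix}"
    )


print(pixel_file_relative_path(7, 34567))  # Norder=7/Dir=30000/Npix=34567.parquet
print(pixel_file_relative_path(0, 11))     # Norder=0/Dir=0/Npix=11.parquet
```

Every pixel number below 10000 falls in `Dir=0`, so low orders keep all of their files in a single directory.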
- pixel_directory(catalog_base_dir: str | pathlib.Path | upath.UPath | None, pixel_order: int, pixel_number: int | None = None, directory_number: int | None = None) -> upath.UPath
Create path pointer for a pixel directory. This will not create the directory.
One of pixel_number or directory_number is required. The directory name will take the HiPS standard form of:
<catalog_base_dir>/dataset/Norder=<pixel_order>/Dir=<directory number>
Where the directory number is calculated using integer division as:
(pixel_number // 10000) * 10000
- Parameters:
- catalog_base_dir: str | Path | UPath | None
base directory of the catalog (includes catalog name)
- pixel_order: int
the healpix order of the pixel
- pixel_number: int | None
the number of the healpix pixel at pixel_order
- directory_number: int | None
directory number (or inferred from pixel number)
- Returns:
- UPath
directory name
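The rule that one of `pixel_number` or `directory_number` is required can be sketched as follows. This is an approximation, not the library code: it uses `pathlib.Path` instead of `UPath`, it requires a non-None base directory, and it assumes that an explicit `directory_number` takes precedence when both arguments are given. `pixel_directory_sketch` is a hypothetical name.

```python
from pathlib import Path


def pixel_directory_sketch(
    catalog_base_dir, pixel_order, pixel_number=None, directory_number=None
):
    """Illustrative re-creation of the documented directory naming."""
    if directory_number is None:
        if pixel_number is None:
            raise ValueError("One of pixel_number or directory_number is required")
        # Infer the directory number from the pixel number by integer division.
        directory_number = (pixel_number // 10000) * 10000
    # Assumption: an explicitly passed directory_number is used as-is.
    return (
        Path(catalog_base_dir)
        / "dataset"
        / f"Norder={pixel_order}"
        / f"Dir={directory_number}"
    )


print(pixel_directory_sketch("my_catalog", 7, pixel_number=34567).as_posix())
# my_catalog/dataset/Norder=7/Dir=30000
```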