API#
Opening Measurement Sets#
The standard xarray.open_datatree() method should
be used to open a DataTree interface
to the underlying Measurement Set data.
>>> import xarray
>>> datatree = xarray.open_datatree("/data/data.ms", partition_schema=["FIELD_ID"])
These methods defer to the relevant methods on the Entrypoint Class. Consult the method signatures for information on extra arguments that can be passed.
Entrypoint Class#
Entrypoint class for the MSv2 backend.
- class xarray_ms.backend.msv2.entrypoint.MSv2EntryPoint#
- open_dataset(filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, *, drop_variables: str | Iterable[str] | None = None, partition_schema: List[str] | None = None, partition_key: PartitionKeyT | None = None, preferred_chunks: Dict[str, int] | None = None, auto_corrs: bool = False, ninstances: int = 8, epoch: str | None = None, structure_factory: MSv2StructureFactory | None = None) Dataset#
Create a Dataset presenting an MSv4 view over a partition of an MSv2 CASA Measurement Set.
- Parameters:
filename_or_obj – The path to the MSv2 CASA Measurement Set file.
drop_variables – Variables to drop from the dataset.
partition_schema – The columns to use for partitioning the Measurement Set. Defaults to ['OBSERVATION_ID', 'PROCESSOR_ID', 'DATA_DESC_ID', 'OBS_MODE']. See Partitioning Guide for further information.
partition_key – A key corresponding to an individual partition. For example (('DATA_DESC_ID', 0), ('FIELD_ID', 0)). If None, the first partition will be opened.
preferred_chunks – The preferred chunks for each partition.
auto_corrs – Whether to include auto-correlations. Defaults to False.
ninstances – The number of Measurement Set instances to open for parallel I/O.
epoch – A unique string identifying the creation of this Dataset. This should not normally need to be set by the user.
structure_factory – A factory for creating MSv2Structure objects. This should not normally need to be set by the user.
- Returns:
A Dataset referring to the unique partition specified by partition_schema and partition_key.
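As a rough illustration of how partition_schema and partition_key fit together, the sketch below builds a key in the tuple-of-pairs form shown above and passes it through xarray.open_dataset(). The names make_partition_key and open_partition are hypothetical helpers, not part of xarray-ms, and the call itself assumes the MSv2 backend is installed and that a Measurement Set exists at the given path.

```python
from typing import Any, Dict, List, Tuple

PartitionKey = Tuple[Tuple[str, int], ...]


def make_partition_key(ids: Dict[str, int]) -> PartitionKey:
    """Hypothetical helper: turn {"DATA_DESC_ID": 0, ...} into a tuple of pairs."""
    # Pairs keep the insertion order of the dictionary.
    return tuple((column, int(value)) for column, value in ids.items())


def open_partition(path: str, schema: List[str], ids: Dict[str, int]) -> Any:
    """Hypothetical wrapper: open the single partition identified by ``ids``."""
    # Imported lazily so make_partition_key can be used without xarray installed.
    import xarray

    return xarray.open_dataset(
        path,
        partition_schema=schema,
        partition_key=make_partition_key(ids),
    )


key = make_partition_key({"DATA_DESC_ID": 0, "FIELD_ID": 0})
print(key)  # (('DATA_DESC_ID', 0), ('FIELD_ID', 0))
```

A call such as open_partition("/data/data.ms", ["FIELD_ID"], {"DATA_DESC_ID": 0, "FIELD_ID": 0}) would then be expected to return the Dataset for that one partition, provided such a partition exists in the Measurement Set.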
- open_datatree(filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, *, preferred_chunks: Dict[str, Any] | None = None, drop_variables: str | Iterable[str] | None = None, partition_schema: List[str] | None = None, auto_corrs: bool = False, ninstances: int = 8, epoch: str | None = None, **kwargs) DataTree#
Create a DataTree presenting an MSv4 view over multiple partitions of an MSv2 CASA Measurement Set.
- Parameters:
filename_or_obj – The path to the MSv2 CASA Measurement Set file.
preferred_chunks – Chunk sizes along each dimension, e.g. {"time": 10, "frequency": 16}. Individual partitions can be chunked differently by partially (or fully) specifying a partition key, e.g.

    {
        # Applies to all partitions with the relevant DATA_DESC_ID
        (("DATA_DESC_ID", 0),): {"time": 10, "frequency": 16},
        (("DATA_DESC_ID", 1),): {"time": 20, "frequency": 32},
    }

    {
        # Applies to all partitions with the relevant DATA_DESC_ID and FIELD_ID
        (("DATA_DESC_ID", 0), ("FIELD_ID", 1)): {"time": 10, "frequency": 16},
        (("DATA_DESC_ID", 1), ("FIELD_ID", 0)): {"time": 20, "frequency": 32},
    }

    {
        # String variants
        "DATA_DESC_ID=0,FIELD_ID=0": {"time": 10, "frequency": 16},
        "D=0,F=1": {"time": 20, "frequency": 32},
    }

Note
xarray’s reserved chunks argument must be specified to enable this functionality and fine-grained chunking in Datasets and DataTrees. See xarray’s backend documentation on Preferred chunk sizes for more information.
drop_variables – Variables to drop from the dataset.
partition_schema – The columns to use for partitioning the Measurement Set. Defaults to ['OBSERVATION_ID', 'PROCESSOR_ID', 'DATA_DESC_ID', 'OBS_MODE']. See Partitioning Guide for further information.
auto_corrs – Whether to include auto-correlations. Defaults to False.
ninstances – The number of Measurement Set instances to open for parallel I/O.
epoch – A string uniquely identifying this Dataset. This should not normally be set by the user.
- Returns:
An xarray DataTree.
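The partial-key matching described for preferred_chunks can be pictured with the sketch below. It illustrates the documented semantics only and is not the backend's implementation: parse_string_key and chunks_for_partition are hypothetical names, the first-match rule and the empty fallback are assumptions, and only the full-name string variant ("DATA_DESC_ID=0,FIELD_ID=0") is handled, since the abbreviation rules behind forms like "D=0,F=1" are not specified here.

```python
from typing import Dict, Tuple, Union

PartitionKey = Tuple[Tuple[str, int], ...]
Chunks = Dict[str, int]


def parse_string_key(text: str) -> PartitionKey:
    """Parse "DATA_DESC_ID=0,FIELD_ID=0" into a tuple of (column, value) pairs."""
    pairs = []
    for item in text.split(","):
        column, value = item.split("=")
        pairs.append((column.strip(), int(value)))
    return tuple(pairs)


def chunks_for_partition(
    partition_key: PartitionKey,
    preferred_chunks: Dict[Union[str, PartitionKey], Chunks],
) -> Chunks:
    """Return the chunks of the first entry whose pairs all occur in partition_key."""
    full = set(partition_key)
    for partial, chunks in preferred_chunks.items():
        pairs = parse_string_key(partial) if isinstance(partial, str) else partial
        if set(pairs) <= full:
            return chunks
    return {}  # assumption: no matching entry means no preferred chunks


preferred_chunks = {
    # Partial key: applies to every partition with DATA_DESC_ID == 0
    (("DATA_DESC_ID", 0),): {"time": 10, "frequency": 16},
    # String variant of a fully specified key
    "DATA_DESC_ID=1,FIELD_ID=0": {"time": 20, "frequency": 32},
}

partition = (("DATA_DESC_ID", 1), ("FIELD_ID", 0))
print(chunks_for_partition(partition, preferred_chunks))  # {'time': 20, 'frequency': 32}
```

When passing such a mapping to the backend, the note above still applies: xarray's chunks argument has to be given as well, for example xarray.open_datatree("/data/data.ms", chunks={}, preferred_chunks=preferred_chunks), assuming an xarray version whose open_datatree accepts chunks.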