Tutorial: Understand the RF Xarray Structure
Use this notebook when you want to answer:
- what are the dimensions of the processed RF dataset?
- what are the named coordinates and xarray indexes?
- why is cycle_id a shared, sparse axis across experiments?
- what changes when you select one experiment, one cycle, or one hostname?
The goal is to make the NetCDF file feel like a physical measurement table rather than an abstract tensor.
import sys
!{sys.executable} -m pip install matplotlib numpy requests xarray pyyaml
from pathlib import Path
import importlib.util
import sys
from IPython.display import Markdown, display

# Locate csi_plot_utils.py whether the notebook runs from the tutorials folder,
# from its parent (processing), or from the repository root.
NOTEBOOK_DIR = Path.cwd().resolve()
for candidate_dir in (
    NOTEBOOK_DIR,
    NOTEBOOK_DIR / "tutorials",
    NOTEBOOK_DIR / "processing" / "tutorials",
):
    if (candidate_dir / "csi_plot_utils.py").exists():
        NOTEBOOK_DIR = candidate_dir.resolve()
        break
else:
    # The for/else branch runs only when no candidate directory matched.
    raise ImportError(f"Could not locate csi_plot_utils.py from {Path.cwd().resolve()}")

UTILS_PATH = NOTEBOOK_DIR / "csi_plot_utils.py"
PROCESSING_DIR = NOTEBOOK_DIR.parent
PROJECT_ROOT = PROCESSING_DIR.parent

# Import the helper module from its file path and expose it as `csi`.
spec = importlib.util.spec_from_file_location("csi_plot_utils", UTILS_PATH)
if spec is None or spec.loader is None:
    raise ImportError(f"Could not load csi_plot_utils from {UTILS_PATH}")
csi = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = csi
spec.loader.exec_module(csi)
EXPERIMENT_ID = "EXP003"
DATASET_PATH = None  # Set this to a specific .nc file when you do not want the newest match.
MAX_COORD_PREVIEW = 6
MAX_CYCLE_ROWS = 12

ds, dataset_path = csi.open_dataset(experiment_id=EXPERIMENT_ID, dataset_path=DATASET_PATH)

# Pick the first cycle of this experiment that carries CSI...
available_cycles = csi.available_cycle_ids(ds, EXPERIMENT_ID)
if available_cycles.size == 0:
    raise ValueError(f"No CSI cycles available for experiment {EXPERIMENT_ID}")
SELECTED_CYCLE_ID = int(available_cycles[0])

# ...and the first hostname that reported CSI in that cycle.
experiment = ds.sel(experiment_id=EXPERIMENT_ID)
host_mask = experiment["csi_available"].sel(cycle_id=SELECTED_CYCLE_ID).values > 0
SELECTED_HOSTNAME = str(experiment["hostname"].values[host_mask][0])

print(f"Loaded dataset: {dataset_path}")
print(f"Selected experiment: {EXPERIMENT_ID}")
print(f"Example cycle for the walkthrough: {SELECTED_CYCLE_ID}")
print(f"Example hostname for the walkthrough: {SELECTED_HOSTNAME}")
csi.print_dataset_overview(ds)
Loaded dataset: C:\Users\Calle\OneDrive\Documenten\GitHub\ELLIIIT-dataset-26\results\csi_EXP003__EXP005__EXP006__EXP007__EXP008__EXP009__EXP010__EXP011__EXP012.nc
Selected experiment: EXP003
Example cycle for the walkthrough: 1
Example hostname for the walkthrough: A05
Experiments: ['EXP003', 'EXP005', 'EXP006', 'EXP007', 'EXP008', 'EXP009', 'EXP010', 'EXP011', 'EXP012']
Dataset shape: experiment_id=9, cycle_id=1355, hostname=42
Cycle ID range: 1 .. 1356
Last measurement timestamp: 2026-04-04T07:54:35+02:00 (source file mtime)
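When DATASET_PATH is None, csi.open_dataset picks the newest matching file. This notebook does not show how it does that. The following is only a sketch of one way to do it. It assumes that all files live in one results directory and are named like the combined file above, with a csi_ prefix followed by experiment IDs joined by double underscores. The name find_newest_dataset is hypothetical.

```python
from pathlib import Path


def find_newest_dataset(results_dir, experiment_id):
    """Return the most recently modified csi_*.nc file whose name lists experiment_id.

    Hypothetical helper. It assumes file names such as csi_EXP003__EXP005.nc,
    i.e. experiment IDs joined by double underscores after a "csi_" prefix.
    """
    candidates = []
    for path in Path(results_dir).glob("csi_*.nc"):
        # Drop the "csi_" prefix (the suffix is already gone in .stem), then split the ID list.
        ids = path.stem[len("csi_"):].split("__")
        if experiment_id in ids:
            candidates.append(path)
    if not candidates:
        raise FileNotFoundError(f"No csi_*.nc file in {results_dir} contains {experiment_id}")
    # Newest by file modification time, in line with the "source file mtime" note above.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The result can then be opened with xarray.open_dataset(path). Matching on the split ID list, not on a plain substring, keeps a hypothetical EXP0031 from matching EXP003.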
1. The Named Axes, Coordinates, and Variables
The helper below summarizes the xarray layout directly from the dataset. This is the quickest way to see which names are dimensions, which names are coordinate indexes, and which names are real data variables.
display(Markdown(csi.xarray_structure_markdown(ds, max_coord_preview=MAX_COORD_PREVIEW)))
ds.indexes
Dataset Axes
| Dimension | Size | Meaning |
|---|---|---|
| cycle_id | 1355 | Shared orchestrator cycle axis across the dataset. Not every experiment uses every listed cycle. |
| hostname | 42 | One RF receiver host or tile. |
| experiment_id | 9 | One logical measurement run such as EXP003 or EXP005. |
Coordinate Indexes
| Coordinate | Index type | Preview |
|---|---|---|
| cycle_id | Index | 1, 2, 3, 4, 5, 6, ... (1355 total) |
| hostname | Index | A05, A06, A07, A08, A09, A10, ... (42 total) |
| experiment_id | Index | EXP003, EXP005, EXP006, EXP007, EXP008, EXP009, ... (9 total) |
Data Variables
| Variable | Dims | Shape | Meaning |
|---|---|---|---|
| csi_real | experiment_id, cycle_id, hostname | (9, 1355, 42) | Real part of the cable-corrected complex CSI value. |
| csi_imag | experiment_id, cycle_id, hostname | (9, 1355, 42) | Imaginary part of the cable-corrected complex CSI value. |
| csi_available | experiment_id, cycle_id, hostname | (9, 1355, 42) | Boolean-like mask that marks whether a host contributed CSI for that experiment/cycle. |
| rover_x | experiment_id, cycle_id | (9, 1355) | Rover X coordinate in meters for one experiment/cycle pair. |
| rover_y | experiment_id, cycle_id | (9, 1355) | Rover Y coordinate in meters for one experiment/cycle pair. |
| rover_z | experiment_id, cycle_id | (9, 1355) | Rover Z coordinate in meters for one experiment/cycle pair. |
| position_available | experiment_id, cycle_id | (9, 1355) | Boolean-like mask that marks whether the rover position is valid for that cycle. |
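The table stores each complex CSI value as two real variables, csi_real and csi_imag. Below is a minimal NumPy sketch that recombines them and derives amplitude, power, and phase. The definitions used here are our assumption: amplitude = |z|, power = 20·log10|z| in dB, and phase = angle of z in degrees. They do reproduce the csi_amplitude, csi_power_db, and csi_phase_deg values shown for host A05 in the section 4 snapshot. The function name csi_features is hypothetical.

```python
import numpy as np


def csi_features(csi_real, csi_imag):
    """Combine real/imag parts into complex CSI and derive amplitude, power (dB), phase (deg).

    Assumed definitions: amplitude = |z|, power_db = 20*log10(|z|), phase_deg = angle(z).
    Only arithmetic and NumPy ufuncs are used, so scalars, ndarrays, and xarray
    DataArrays all work, and DataArrays keep their dims and coordinates.
    """
    z = csi_real + 1j * csi_imag                            # complex CSI value(s)
    amplitude = np.abs(z)
    power_db = 20.0 * np.log10(amplitude)                   # -inf (with a warning) where amplitude == 0
    phase_deg = np.degrees(np.arctan2(csi_imag, csi_real))  # range (-180, 180]
    return amplitude, power_db, phase_deg


# Values shown for EXP003, cycle 1, host A05 in the snapshot output.
amp, power, phase = csi_features(0.009814, -0.001862)
print(round(float(amp), 6), round(float(power), 2), round(float(phase), 2))  # 0.009989 -40.01 -10.74

# On the full dataset (not run here), masking hosts without CSI first:
# valid = ds["csi_available"] > 0
# amp, power, phase = csi_features(ds["csi_real"].where(valid), ds["csi_imag"].where(valid))
```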
Think of the dataset as one stack of experiment slices.
- The full dataset uses (experiment_id, cycle_id, hostname) as its named axes.
- Selecting one experiment_id removes that outer axis and leaves a cycle_id x hostname slice.
- Within a slice, rover variables live on the cycle_id axis only, because one rover pose belongs to one cycle.
- CSI variables live on cycle_id x hostname, because one cycle can contain many host measurements.
Indexes:
cycle_id Index([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
...
1347, 1348, 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356],
dtype='int32', name='cycle_id', length=1355)
hostname Index(['A05', 'A06', 'A07', 'A08', 'A09', 'A10', 'B05', 'B06', 'B07', 'B08',
'B09', 'B10', 'C05', 'C06', 'C07', 'C08', 'C09', 'C10', 'D05', 'D06',
'D07', 'D08', 'D09', 'D10', 'E05', 'E06', 'E07', 'E08', 'E09', 'E10',
'F05', 'F06', 'F07', 'F08', 'F09', 'F10', 'G05', 'G06', 'G07', 'G08',
'G09', 'G10'],
dtype='object', name='hostname')
experiment_id Index(['EXP003', 'EXP005', 'EXP006', 'EXP007', 'EXP008', 'EXP009', 'EXP010',
'EXP011', 'EXP012'],
dtype='object', name='experiment_id')
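The cycle_id index above has length 1355, but its labels run from 1 to 1356. That means at least one cycle ID in that range has no entry on the shared axis. Here is a small sketch that lists such gaps. The helper missing_cycle_ids is hypothetical, and it looks only at the index labels, not at which experiment used which cycle.

```python
def missing_cycle_ids(cycle_ids):
    """Return the integer cycle IDs between min and max that are absent from cycle_ids.

    Accepts any iterable of integer labels: a list, a NumPy array, or the pandas
    Index returned by ds.indexes["cycle_id"].
    """
    present = {int(c) for c in cycle_ids}
    if not present:
        return []
    full_range = set(range(min(present), max(present) + 1))
    return sorted(full_range - present)


print(missing_cycle_ids([1, 2, 3, 5, 6, 9]))  # [4, 7, 8]

# With the notebook's dataset (not run here):
# gaps = missing_cycle_ids(ds.indexes["cycle_id"])
```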
2. Make The Shared cycle_id Axis Tangible
This table shows one experiment slice as a per-cycle view. Each row is one rover stop. csi_host_count tells you how many hostnames reported CSI for that cycle. position_valid tells you whether the rover pose is usable for that cycle.
This is the key mental model: rover variables are per-cycle values, while CSI is a vector attached to that same cycle.
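This notebook does not show how csi.experiment_cycle_table builds these columns. If we assume csi_available is 1 where a host reported and 0 otherwise, similar per-cycle columns can be sketched with plain xarray reductions over hostname. The mask below is a hypothetical example for one experiment.

```python
import numpy as np
import xarray as xr

# Hypothetical availability mask for one experiment: 4 cycles x 3 hosts (1 = host reported CSI).
csi_available = xr.DataArray(
    np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 0, 0],
              [1, 1, 0]], dtype=float),
    dims=("cycle_id", "hostname"),
    coords={"cycle_id": [1, 2, 4, 5], "hostname": ["A05", "A06", "A07"]},
)

# Reduce over the hostname axis: one value per cycle, like the cycle table above.
csi_host_count = (csi_available > 0).sum("hostname")
has_any_csi = (csi_available > 0).any("hostname")

# Keep only cycles where at least one host reported (compare only_cycles_with_csi=True).
keep = np.flatnonzero(has_any_csi.values)
cycles_with_csi = csi_available.isel(cycle_id=keep)["cycle_id"].values

print(csi_host_count.values.tolist())  # [3, 2, 0, 2]
print(cycles_with_csi.tolist())        # [1, 2, 5]
```

The reduced arrays keep cycle_id as their only dim. This is the per-cycle shape the rover variables already have, so the two can be lined up directly.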
cycle_table = csi.experiment_cycle_table(
    ds,
    EXPERIMENT_ID,
    max_rows=MAX_CYCLE_ROWS,
    only_cycles_with_csi=True,
)
print(f"Showing the first {cycle_table.sizes['cycle_id']} cycles with CSI for {EXPERIMENT_ID}")
cycle_table
Showing the first 12 cycles with CSI for EXP003
<xarray.Dataset> Size: 648B
Dimensions: (cycle_id: 12)
Coordinates:
* cycle_id (cycle_id) int32 48B 1 2 3 4 5 6 7 8 9 10 11 12
experiment_id <U6 24B 'EXP003'
Data variables:
has_any_csi (cycle_id) float64 96B 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
csi_host_count (cycle_id) float64 96B 42.0 42.0 42.0 ... 42.0 42.0 42.0
position_valid (cycle_id) float64 96B 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
rover_x (cycle_id) float64 96B 1.918 2.541 2.536 ... 2.535 2.535
rover_y (cycle_id) float64 96B 2.865 2.254 2.375 ... 3.333 3.453
rover_z (cycle_id) float64 96B 0.7394 0.7401 ... 0.7434 0.7443
Attributes:
experiment_id: EXP003
3. What Changes When You Index Into The Dataset
xarray keeps the data readable because you select by dimension name and coordinate label (for example cycle_id=1) instead of by integer position. The walkthrough below shows how the remaining dimensions shrink as you move from the full dataset to one experiment, then one cycle, then one hostname.
display(
    Markdown(
        csi.selection_walkthrough_markdown(
            ds,
            EXPERIMENT_ID,
            SELECTED_CYCLE_ID,
            SELECTED_HOSTNAME,
        )
    )
)
Selection Walkthrough
| Selection | Remaining dims | Meaning |
|---|---|---|
| ds | cycle_id=1355, hostname=42, experiment_id=9 | The complete dataset. |
| ds.sel(experiment_id="EXP003") | cycle_id=1355, hostname=42 | One experiment slice. Rover variables are vectors over cycle_id; CSI variables are a cycle_id x hostname matrix. |
| ds.sel(experiment_id="EXP003", cycle_id=1) | hostname=42 | One physical rover stop. Rover variables become scalars; CSI becomes a vector over hostnames. |
| ...sel(hostname="A05") | (scalar) | One host in one cycle. CSI variables become scalars. |
Use .sel(...) for named coordinates such as experiment_id, cycle_id, and hostname.
Use .isel(...) only when you intentionally want integer positions instead of coordinate labels.
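Because cycle_id is sparse, a cycle's label and its position on the axis often differ. The snippet below is a minimal sketch on a hypothetical one-experiment slice whose cycle IDs skip 3. It shows .sel and .isel picking different stops, a missing label raising an error instead of silently shifting, and chained selections shrinking the dims as in the walkthrough table.

```python
import numpy as np
import xarray as xr

# Hypothetical one-experiment slice: cycle IDs skip 3, as the shared axis can.
rover_x = xr.DataArray(
    np.array([1.9, 2.5, 2.6, 2.7]),
    dims="cycle_id",
    coords={"cycle_id": [1, 2, 4, 5]},
)
csi_real = xr.DataArray(
    np.arange(8.0).reshape(4, 2),
    dims=("cycle_id", "hostname"),
    coords={"cycle_id": [1, 2, 4, 5], "hostname": ["A05", "A06"]},
)

by_label = rover_x.sel(cycle_id=4)      # label lookup: the stop whose cycle_id is 4
by_position = rover_x.isel(cycle_id=3)  # position lookup: the 4th stop, i.e. cycle_id 5
print(int(by_label["cycle_id"]), float(by_label))        # 4 2.6
print(int(by_position["cycle_id"]), float(by_position))  # 5 2.7

# A label that is not on the axis raises KeyError.
try:
    rover_x.sel(cycle_id=3)
except KeyError:
    print("cycle_id=3 is not present")

# Chained selection shrinks dims: (cycle_id, hostname) -> (hostname,) -> scalar.
one_stop = csi_real.sel(cycle_id=4)
one_value = one_stop.sel(hostname="A06")
print(one_stop.dims, one_value.dims)  # ('hostname',) ()
```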
4. One Physical Measurement = One Rover Position + One CSI Vector
A physical RF measurement point is identified by the pair (experiment_id, cycle_id). That pair gives you:
- one rover position
- one vector of CSI values across the active hostnames
The two helper calls below make that explicit.
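This notebook does not show the implementations of csi.cycle_position and csi.extract_csi_snapshot. Below is a sketch of the same idea in plain xarray. It assumes csi_available > 0 marks the active hosts and position_available > 0 marks a valid pose. The function measurement_at is hypothetical and the toy data are synthetic. Its return values are deliberately not named csi or position, so running it in this notebook does not overwrite the helper module or the variables above.

```python
import numpy as np
import xarray as xr


def measurement_at(ds, experiment_id, cycle_id):
    """Return (rover position or None, complex CSI over active hosts) for one stop."""
    stop = ds.sel(experiment_id=experiment_id, cycle_id=cycle_id)  # only hostname is left
    pose = None
    if bool(stop["position_available"] > 0):
        pose = tuple(float(stop[name]) for name in ("rover_x", "rover_y", "rover_z"))
    # Keep only the hosts that contributed CSI in this cycle.
    active = np.flatnonzero(stop["csi_available"].values > 0)
    vector = (stop["csi_real"] + 1j * stop["csi_imag"]).isel(hostname=active)
    return pose, vector


# Hypothetical stand-in: 1 experiment, 2 cycles, 3 hosts.
CSI_DIMS = ("experiment_id", "cycle_id", "hostname")
ROVER_DIMS = ("experiment_id", "cycle_id")
toy = xr.Dataset(
    data_vars={
        "csi_real": (CSI_DIMS, [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]),
        "csi_imag": (CSI_DIMS, [[[0.0, -0.1, 0.1], [0.0, 0.0, 0.0]]]),
        "csi_available": (CSI_DIMS, [[[1, 0, 1], [1, 1, 1]]]),
        "rover_x": (ROVER_DIMS, [[1.9, 2.5]]),
        "rover_y": (ROVER_DIMS, [[2.9, 2.3]]),
        "rover_z": (ROVER_DIMS, [[0.74, 0.74]]),
        "position_available": (ROVER_DIMS, [[1, 0]]),
    },
    coords={"experiment_id": ["EXP003"], "cycle_id": [1, 2], "hostname": ["A05", "A06", "A07"]},
)

toy_pose, toy_vector = measurement_at(toy, "EXP003", 1)
print(toy_pose)                                # (1.9, 2.9, 0.74)
print(toy_vector["hostname"].values.tolist())  # ['A05', 'A07']
```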
position = csi.cycle_position(ds, EXPERIMENT_ID, SELECTED_CYCLE_ID)
snapshot = csi.extract_csi_snapshot(ds, EXPERIMENT_ID, SELECTED_CYCLE_ID)
position
{'experiment_id': 'EXP003',
'cycle_id': 1,
'position_available': True,
'rover_x': 1.9183037109375,
'rover_y': 2.865016845703125,
'rover_z': 0.7394267578125,
'csi_host_count': 42}
print(
f"CSI snapshot rows: {snapshot.sizes['hostname']} hostnames for "
f"experiment {EXPERIMENT_ID}, cycle {SELECTED_CYCLE_ID}"
)
snapshot.isel(hostname=slice(0, min(8, snapshot.sizes['hostname'])))
CSI snapshot rows: 42 hostnames for experiment EXP003, cycle 1
<xarray.Dataset> Size: 608B
Dimensions: (hostname: 8)
Coordinates:
* hostname (hostname) <U3 96B 'A05' 'A06' 'A07' ... 'A10' 'B05' 'B06'
Data variables:
csi_real (hostname) float64 64B 0.009814 0.006442 ... 0.0009353
csi_imag (hostname) float64 64B -0.001862 -0.003845 ... 0.003907
csi_amplitude (hostname) float64 64B 0.009989 0.007503 ... 0.004017
csi_power_db (hostname) float64 64B -40.01 -42.5 -43.87 ... -46.01 -47.92
csi_phase_deg (hostname) float64 64B -10.74 -30.83 -40.84 ... -48.13 76.54
antenna_x (hostname) float64 64B nan nan nan nan nan nan nan nan
antenna_y (hostname) float64 64B nan nan nan nan nan nan nan nan
antenna_z (hostname) float64 64B nan nan nan nan nan nan nan nan
Attributes:
experiment_id: EXP003
cycle_id: 1
position_available: True
rover_x: 1.9183037109375
rover_y: 2.865016845703125
rover_z: 0.7394267578125
csi_host_count: 42
5. Flatten The Sparse Dataset Into A Simple Measurement Table
The helper below converts the valid rover rows into a flat measurement_index table. This is often the easiest representation when you want to iterate over physical positions without thinking about the shared cycle_id axis.
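You can also build a similar table yourself. The sketch below assumes that position_available > 0 marks the valid rows; the real helper may apply other filters too. It stacks experiment_id and cycle_id into one dimension and keeps only the valid rows. The function flatten_valid_positions is hypothetical and the toy data are synthetic. The helper's output, by contrast, uses a plain integer measurement_index and stores experiment_id and cycle_id as data variables.

```python
import numpy as np
import xarray as xr


def flatten_valid_positions(ds):
    """Stack (experiment_id, cycle_id) into one 'measurement' dim and keep valid rover poses."""
    rover = ds[["rover_x", "rover_y", "rover_z", "position_available"]]
    stacked = rover.stack(measurement=("experiment_id", "cycle_id"))  # row-major order
    valid = np.flatnonzero(stacked["position_available"].values > 0)
    return stacked.isel(measurement=valid).drop_vars("position_available")


# Hypothetical stand-in: 2 experiments sharing a sparse cycle axis [1, 2, 4].
DIMS = ("experiment_id", "cycle_id")
toy = xr.Dataset(
    data_vars={
        "rover_x": (DIMS, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
        "rover_y": (DIMS, [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]),
        "rover_z": (DIMS, [[0.7, 0.7, 0.7], [0.7, 0.7, 0.7]]),
        "position_available": (DIMS, [[1, 0, 1], [0, 0, 1]]),
    },
    coords={"experiment_id": ["EXP003", "EXP005"], "cycle_id": [1, 2, 4]},
)

table = flatten_valid_positions(toy)
print(table.sizes["measurement"])              # 3
print(table["experiment_id"].values.tolist())  # ['EXP003', 'EXP003', 'EXP005']
print(table["cycle_id"].values.tolist())       # [1, 4, 4]
```

The stacked dimension carries a pandas MultiIndex, so every row still records its (experiment_id, cycle_id) pair as coordinates.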
measurements = csi.positions_for_experiments(ds, [EXPERIMENT_ID])
print(f"Flattened valid measurement rows: {measurements.sizes['measurement_index']}")
measurements.isel(measurement_index=slice(0, min(10, measurements.sizes['measurement_index'])))
Flattened valid measurement rows: 529
<xarray.Dataset> Size: 720B
Dimensions: (measurement_index: 10)
Coordinates:
* measurement_index (measurement_index) int64 80B 0 1 2 3 4 5 6 7 8 9
Data variables:
experiment_id (measurement_index) <U6 240B 'EXP003' ... 'EXP003'
cycle_id (measurement_index) int64 80B 1 2 3 4 5 6 7 8 9 10
rover_x (measurement_index) float64 80B 1.918 2.541 ... 2.534
rover_y (measurement_index) float64 80B 2.865 2.254 ... 3.213
rover_z (measurement_index) float64 80B 0.7394 0.7401 ... 0.7428
csi_host_count (measurement_index) int64 80B 42 42 42 42 ... 42 42 42 42
Attributes:
experiment_ids: ['EXP003']
Next Steps
After you are comfortable with the dataset layout, continue with:
- plot_csi_positions.ipynb for a short overview of the trajectory and heatmaps
- tutorial_rover_positions.ipynb for the rover geometry and the active antenna layout
- tutorial_csi_per_position.ipynb for extracting one CSI vector from one measurement point