Tutorial: Understand the RF Xarray Structure

Use this notebook when you want to answer:

  • what are the dimensions of the processed RF dataset?
  • what are the named coordinates and xarray indexes?
  • why is cycle_id a shared sparse axis across experiments?
  • what changes when you select one experiment, one cycle, or one hostname?

The goal is to make the NetCDF feel like a physical measurement table rather than an abstract tensor.

import sys
!{sys.executable} -m pip install matplotlib numpy requests xarray pyyaml
from pathlib import Path
import importlib.util
import sys

from IPython.display import Markdown, display

NOTEBOOK_DIR = Path.cwd().resolve()
for candidate_dir in (
    NOTEBOOK_DIR,
    NOTEBOOK_DIR / "tutorials",
    NOTEBOOK_DIR / "processing" / "tutorials",
):
    if (candidate_dir / "csi_plot_utils.py").exists():
        NOTEBOOK_DIR = candidate_dir.resolve()
        break
else:
    raise ImportError(f"Could not locate csi_plot_utils.py from {Path.cwd().resolve()}")

UTILS_PATH = NOTEBOOK_DIR / "csi_plot_utils.py"
PROCESSING_DIR = NOTEBOOK_DIR.parent
PROJECT_ROOT = PROCESSING_DIR.parent
spec = importlib.util.spec_from_file_location("csi_plot_utils", UTILS_PATH)
if spec is None or spec.loader is None:
    raise ImportError(f"Could not load csi_plot_utils from {UTILS_PATH}")
csi = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = csi
spec.loader.exec_module(csi)
EXPERIMENT_ID = "EXP003"
DATASET_PATH = None  # Set this to a specific .nc file when you do not want the newest match.
MAX_COORD_PREVIEW = 6
MAX_CYCLE_ROWS = 12
ds, dataset_path = csi.open_dataset(experiment_id=EXPERIMENT_ID, dataset_path=DATASET_PATH)
available_cycles = csi.available_cycle_ids(ds, EXPERIMENT_ID)
if available_cycles.size == 0:
    raise ValueError(f"No CSI cycles available for experiment {EXPERIMENT_ID}")

SELECTED_CYCLE_ID = int(available_cycles[0])
experiment = ds.sel(experiment_id=EXPERIMENT_ID)
host_mask = experiment["csi_available"].sel(cycle_id=SELECTED_CYCLE_ID).values > 0
SELECTED_HOSTNAME = str(experiment["hostname"].values[host_mask][0])

print(f"Loaded dataset: {dataset_path}")
print(f"Selected experiment: {EXPERIMENT_ID}")
print(f"Example cycle for the walkthrough: {SELECTED_CYCLE_ID}")
print(f"Example hostname for the walkthrough: {SELECTED_HOSTNAME}")
csi.print_dataset_overview(ds)
Loaded dataset: C:\Users\Calle\OneDrive\Documenten\GitHub\ELLIIIT-dataset-26\results\csi_EXP003__EXP005__EXP006__EXP007__EXP008__EXP009__EXP010__EXP011__EXP012.nc
Selected experiment: EXP003
Example cycle for the walkthrough: 1
Example hostname for the walkthrough: A05
Experiments: ['EXP003', 'EXP005', 'EXP006', 'EXP007', 'EXP008', 'EXP009', 'EXP010', 'EXP011', 'EXP012']
Dataset shape: experiment_id=9, cycle_id=1355, hostname=42
Cycle ID range: 1 .. 1356
Last measurement timestamp: 2026-04-04T07:54:35+02:00 (source file mtime)

1. The Named Axes, Coordinates, and Variables

The helper below summarizes the xarray layout directly from the dataset. This is the quickest way to see which names are dimensions, which names are coordinate indexes, and which names are real data variables.

display(Markdown(csi.xarray_structure_markdown(ds, max_coord_preview=MAX_COORD_PREVIEW)))
ds.indexes

Dataset Axes

| Dimension | Size | Meaning |
| --- | --- | --- |
| cycle_id | 1355 | Shared orchestrator cycle axis across the dataset. Not every experiment uses every listed cycle. |
| hostname | 42 | One RF receiver host or tile. |
| experiment_id | 9 | One logical measurement run such as EXP003 or EXP005. |

Coordinate Indexes

| Coordinate | Index type | Preview |
| --- | --- | --- |
| cycle_id | Index | 1, 2, 3, 4, 5, 6, ... (1355 total) |
| hostname | Index | A05, A06, A07, A08, A09, A10, ... (42 total) |
| experiment_id | Index | EXP003, EXP005, EXP006, EXP007, EXP008, EXP009, ... (9 total) |

Data Variables

| Variable | Dims | Shape | Meaning |
| --- | --- | --- | --- |
| csi_real | experiment_id, cycle_id, hostname | (9, 1355, 42) | Real part of the cable-corrected complex CSI value. |
| csi_imag | experiment_id, cycle_id, hostname | (9, 1355, 42) | Imaginary part of the cable-corrected complex CSI value. |
| csi_available | experiment_id, cycle_id, hostname | (9, 1355, 42) | Boolean-like mask that marks whether a host contributed CSI for that experiment/cycle. |
| rover_x | experiment_id, cycle_id | (9, 1355) | Rover X coordinate in meters for one experiment/cycle pair. |
| rover_y | experiment_id, cycle_id | (9, 1355) | Rover Y coordinate in meters for one experiment/cycle pair. |
| rover_z | experiment_id, cycle_id | (9, 1355) | Rover Z coordinate in meters for one experiment/cycle pair. |
| position_available | experiment_id, cycle_id | (9, 1355) | Boolean-like mask that marks whether the rover position is valid for that cycle. |

Think of the dataset as one stack of experiment slices.

  • A full dataset uses (experiment_id, cycle_id, hostname) as its named axes.
  • Selecting one experiment_id removes the outer axis and leaves a cycle_id x hostname slice.
  • Within one experiment slice, rover variables live on the cycle_id axis only, because one rover pose belongs to one cycle.
  • CSI variables live on cycle_id x hostname, because one cycle can contain many host measurements.
Indexes:
    cycle_id       Index([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,
       ...
       1347, 1348, 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356],
      dtype='int32', name='cycle_id', length=1355)
    hostname       Index(['A05', 'A06', 'A07', 'A08', 'A09', 'A10', 'B05', 'B06', 'B07', 'B08',
       'B09', 'B10', 'C05', 'C06', 'C07', 'C08', 'C09', 'C10', 'D05', 'D06',
       'D07', 'D08', 'D09', 'D10', 'E05', 'E06', 'E07', 'E08', 'E09', 'E10',
       'F05', 'F06', 'F07', 'F08', 'F09', 'F10', 'G05', 'G06', 'G07', 'G08',
       'G09', 'G10'],
      dtype='object', name='hostname')
    experiment_id  Index(['EXP003', 'EXP005', 'EXP006', 'EXP007', 'EXP008', 'EXP009', 'EXP010',
       'EXP011', 'EXP012'],
      dtype='object', name='experiment_id')
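
The same layout can be reproduced on a toy dataset, which makes it easier to see how dimensions, coordinate indexes, and data variables relate. This is only a sketch with plain xarray: the experiment names EXP_A and EXP_B, the sizes, and the random values are made up and do not come from the real NetCDF file.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the processed RF dataset (hypothetical names and values).
experiment_ids = ["EXP_A", "EXP_B"]
cycle_ids = np.array([1, 2, 3, 5], dtype="int32")  # cycle 4 is absent on purpose
hostnames = ["A05", "A06", "B05"]

rng = np.random.default_rng(0)
shape_csi = (len(experiment_ids), len(cycle_ids), len(hostnames))
shape_rover = shape_csi[:2]

toy = xr.Dataset(
    data_vars={
        # CSI variables carry all three named axes.
        "csi_real": (("experiment_id", "cycle_id", "hostname"), rng.normal(size=shape_csi)),
        "csi_imag": (("experiment_id", "cycle_id", "hostname"), rng.normal(size=shape_csi)),
        # Rover variables have no hostname axis.
        "rover_x": (("experiment_id", "cycle_id"), rng.uniform(0.0, 4.0, size=shape_rover)),
    },
    coords={
        "experiment_id": experiment_ids,
        "cycle_id": cycle_ids,
        "hostname": hostnames,
    },
)

print(dict(toy.sizes))                 # dimension name -> length
for name, var in toy.data_vars.items():
    print(name, var.dims, var.shape)   # which axes each variable lives on
print(list(toy.indexes))               # coordinates that act as label indexes
```

Each coordinate that shares its name with a dimension gets an index, which is what later allows label-based selection with .sel(...).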

2. Make The Shared cycle_id Axis Tangible

This table shows one experiment slice as a per-cycle view. Each row is one rover stop. csi_host_count tells you how many hostnames reported CSI for that cycle. position_valid tells you whether the rover pose is usable for that cycle.

This is the key mental model: rover variables are per-cycle values, while CSI is a vector attached to that same cycle.

cycle_table = csi.experiment_cycle_table(
    ds,
    EXPERIMENT_ID,
    max_rows=MAX_CYCLE_ROWS,
    only_cycles_with_csi=True,
)
print(f"Showing the first {cycle_table.sizes['cycle_id']} cycles with CSI for {EXPERIMENT_ID}")
cycle_table
Showing the first 12 cycles with CSI for EXP003
<xarray.Dataset> Size: 648B
Dimensions:         (cycle_id: 12)
Coordinates:
  * cycle_id        (cycle_id) int32 48B 1 2 3 4 5 6 7 8 9 10 11 12
    experiment_id   <U6 24B 'EXP003'
Data variables:
    has_any_csi     (cycle_id) float64 96B 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    csi_host_count  (cycle_id) float64 96B 42.0 42.0 42.0 ... 42.0 42.0 42.0
    position_valid  (cycle_id) float64 96B 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    rover_x         (cycle_id) float64 96B 1.918 2.541 2.536 ... 2.535 2.535
    rover_y         (cycle_id) float64 96B 2.865 2.254 2.375 ... 3.333 3.453
    rover_z         (cycle_id) float64 96B 0.7394 0.7401 ... 0.7434 0.7443
Attributes:
    experiment_id:  EXP003
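
Why the cycle_id axis ends up shared and sparse can be illustrated with two toy runs that cover different cycles. The sketch below is an assumption about the general mechanism, not the pipeline's actual code: make_run, EXP_A, and EXP_B are hypothetical, and the per-cycle host count is computed by summing the availability mask over hostname, which is one plausible way to obtain a column like csi_host_count.

```python
import numpy as np
import xarray as xr


def make_run(experiment_id, cycle_ids, hostnames=("A05", "A06", "B05")):
    """Build one toy experiment that only knows about its own cycles."""
    shape = (1, len(cycle_ids), len(hostnames))
    return xr.Dataset(
        {"csi_available": (("experiment_id", "cycle_id", "hostname"), np.ones(shape))},
        coords={
            "experiment_id": [experiment_id],
            "cycle_id": list(cycle_ids),
            "hostname": list(hostnames),
        },
    )


run_a = make_run("EXP_A", [1, 2, 3])
run_b = make_run("EXP_B", [2, 3, 5])

# An outer join puts both runs on the union of their cycle IDs.
combined = xr.concat([run_a, run_b], dim="experiment_id", join="outer")

# Cycles a run never used come back as NaN; treat them as "no CSI".
combined["csi_available"] = combined["csi_available"].fillna(0.0)

# Per-cycle summary: how many hosts reported, and whether any did.
csi_host_count = combined["csi_available"].sum("hostname")
has_any_csi = csi_host_count > 0

print(combined["cycle_id"].values)
print(csi_host_count.to_pandas())
```

In this toy case the shared axis holds cycles 1, 2, 3, and 5, while each run fills only three of those four slots. The availability mask is what separates a cycle that was not part of a run from a cycle that was measured.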

3. What Changes When You Index Into The Dataset

xarray keeps the data readable because you select by coordinate names instead of integer positions. The walkthrough below shows how the remaining dimensions shrink as you move from the full dataset to one experiment, then one cycle, then one hostname.

display(
    Markdown(
        csi.selection_walkthrough_markdown(
            ds,
            EXPERIMENT_ID,
            SELECTED_CYCLE_ID,
            SELECTED_HOSTNAME,
        )
    )
)

Selection Walkthrough

| Selection | Remaining dims | Meaning |
| --- | --- | --- |
| ds | cycle_id=1355, hostname=42, experiment_id=9 | The complete dataset. |
| ds.sel(experiment_id="EXP003") | cycle_id=1355, hostname=42 | One experiment slice. Rover variables are vectors over cycle_id; CSI variables are a cycle_id x hostname matrix. |
| ds.sel(experiment_id="EXP003", cycle_id=1) | hostname=42 | One physical rover stop. Rover variables become scalars, CSI becomes a vector over hostnames. |
| ...sel(hostname="A05") | (scalar) | One host in one cycle. CSI variables become scalars. |

Use .sel(...) for named coordinates such as experiment_id, cycle_id, and hostname. Use .isel(...) only when you intentionally want integer positions instead of coordinate labels.
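
The difference between labels and positions matters most on an axis with gaps. The sketch below uses a small toy dataset (hypothetical names and values, with cycle 4 missing) to show how each .sel(...) call removes one dimension and why .isel(...) can point at a different cycle than the number suggests.

```python
import numpy as np
import xarray as xr

# Toy dataset: hypothetical experiments, cycle 4 intentionally missing.
toy = xr.Dataset(
    {
        "csi_real": (
            ("experiment_id", "cycle_id", "hostname"),
            np.arange(24.0).reshape(2, 4, 3),
        ),
        "rover_x": (("experiment_id", "cycle_id"), np.arange(8.0).reshape(2, 4)),
    },
    coords={
        "experiment_id": ["EXP_A", "EXP_B"],
        "cycle_id": [1, 2, 3, 5],
        "hostname": ["A05", "A06", "B05"],
    },
)

# Each label-based selection drops exactly one dimension.
one_experiment = toy.sel(experiment_id="EXP_A")
one_cycle = one_experiment.sel(cycle_id=5)
one_host = one_cycle.sel(hostname="A06")

for label, obj in [
    ("full", toy),
    ("one experiment", one_experiment),
    ("one cycle", one_cycle),
    ("one host", one_host),
]:
    print(f"{label:>15}: {dict(obj.sizes)}")

# Label 5 sits at integer position 3 because cycle 4 does not exist.
by_label = one_experiment["rover_x"].sel(cycle_id=5)
by_position = one_experiment["rover_x"].isel(cycle_id=3)

# Position 2 is the third entry on the axis, which carries the label 3.
label_at_position_2 = one_experiment["cycle_id"].isel(cycle_id=2).item()
```

On the real dataset the same caution applies: the cycle ID range runs to 1356 while the axis has 1355 entries, so positions and labels do not line up everywhere.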

4. One Physical Measurement = One Rover Position + One CSI Vector

A physical RF measurement point is identified by the pair (experiment_id, cycle_id). That pair gives you:

  • one rover position
  • one vector of CSI values across the active hostnames

The two helper calls below make that explicit.

position = csi.cycle_position(ds, EXPERIMENT_ID, SELECTED_CYCLE_ID)
snapshot = csi.extract_csi_snapshot(ds, EXPERIMENT_ID, SELECTED_CYCLE_ID)

position
{'experiment_id': 'EXP003',
 'cycle_id': 1,
 'position_available': True,
 'rover_x': 1.9183037109375,
 'rover_y': 2.865016845703125,
 'rover_z': 0.7394267578125,
 'csi_host_count': 42}
print(
    f"CSI snapshot rows: {snapshot.sizes['hostname']} hostnames for "
    f"experiment {EXPERIMENT_ID}, cycle {SELECTED_CYCLE_ID}"
)
snapshot.isel(hostname=slice(0, min(8, snapshot.sizes['hostname'])))
CSI snapshot rows: 42 hostnames for experiment EXP003, cycle 1
<xarray.Dataset> Size: 608B
Dimensions:        (hostname: 8)
Coordinates:
  * hostname       (hostname) <U3 96B 'A05' 'A06' 'A07' ... 'A10' 'B05' 'B06'
Data variables:
    csi_real       (hostname) float64 64B 0.009814 0.006442 ... 0.0009353
    csi_imag       (hostname) float64 64B -0.001862 -0.003845 ... 0.003907
    csi_amplitude  (hostname) float64 64B 0.009989 0.007503 ... 0.004017
    csi_power_db   (hostname) float64 64B -40.01 -42.5 -43.87 ... -46.01 -47.92
    csi_phase_deg  (hostname) float64 64B -10.74 -30.83 -40.84 ... -48.13 76.54
    antenna_x      (hostname) float64 64B nan nan nan nan nan nan nan nan
    antenna_y      (hostname) float64 64B nan nan nan nan nan nan nan nan
    antenna_z      (hostname) float64 64B nan nan nan nan nan nan nan nan
Attributes:
    experiment_id:       EXP003
    cycle_id:            1
    position_available:  True
    rover_x:             1.9183037109375
    rover_y:             2.865016845703125
    rover_z:             0.7394267578125
    csi_host_count:      42
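
The derived snapshot columns (csi_amplitude, csi_power_db, csi_phase_deg) are consistent with the usual polar decomposition of a complex value. The sketch below assumes that relationship; the helper's actual implementation is not shown in this notebook. It uses the rounded csi_real and csi_imag values printed above for the first two hosts, A05 and A06.

```python
import numpy as np

# Rounded values for A05 and A06 as printed in the snapshot above.
csi_real = np.array([0.009814, 0.006442])
csi_imag = np.array([-0.001862, -0.003845])

# Combine the two real-valued variables into one complex CSI vector.
csi = csi_real + 1j * csi_imag

# Assumed definitions of the derived columns.
csi_amplitude = np.abs(csi)                    # |h|
csi_power_db = 20.0 * np.log10(csi_amplitude)  # 20*log10|h|, same as 10*log10(|h|^2)
csi_phase_deg = np.degrees(np.angle(csi))      # atan2(imag, real) in degrees

for host, amp, power, phase in zip(["A05", "A06"], csi_amplitude, csi_power_db, csi_phase_deg):
    print(f"{host}: amplitude={amp:.6f}  power={power:.2f} dB  phase={phase:.2f} deg")
```

Under these assumptions the results match the printed snapshot values to within the rounding of the inputs, which suggests csi_power_db is expressed as 20*log10 of the amplitude rather than 10*log10.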

5. Flatten The Sparse Dataset Into A Simple Measurement Table

The helper below converts the valid rover rows into a flat measurement_index table. This is often the easiest representation when you want to iterate over physical positions without thinking about the shared cycle_id axis.

measurements = csi.positions_for_experiments(ds, [EXPERIMENT_ID])
print(f"Flattened valid measurement rows: {measurements.sizes['measurement_index']}")
measurements.isel(measurement_index=slice(0, min(10, measurements.sizes['measurement_index'])))
Flattened valid measurement rows: 529
<xarray.Dataset> Size: 720B
Dimensions:            (measurement_index: 10)
Coordinates:
  * measurement_index  (measurement_index) int64 80B 0 1 2 3 4 5 6 7 8 9
Data variables:
    experiment_id      (measurement_index) <U6 240B 'EXP003' ... 'EXP003'
    cycle_id           (measurement_index) int64 80B 1 2 3 4 5 6 7 8 9 10
    rover_x            (measurement_index) float64 80B 1.918 2.541 ... 2.534
    rover_y            (measurement_index) float64 80B 2.865 2.254 ... 3.213
    rover_z            (measurement_index) float64 80B 0.7394 0.7401 ... 0.7428
    csi_host_count     (measurement_index) int64 80B 42 42 42 42 ... 42 42 42 42
Attributes:
    experiment_ids:  ['EXP003']
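
One way to picture the flattening is to keep only the (experiment_id, cycle_id) pairs whose position mask is set and to number them consecutively. The sketch below does that on a toy dataset; the data and the masking rule are assumptions for illustration and may differ from what csi.positions_for_experiments does internally.

```python
import numpy as np
import xarray as xr

# Toy mask: True where a rover position exists for that experiment/cycle pair.
position_available = np.array(
    [[True, True, False, True],
     [False, True, True, False]]
)
rover_x = np.where(position_available, np.arange(8.0).reshape(2, 4), np.nan)

toy = xr.Dataset(
    {
        "position_available": (("experiment_id", "cycle_id"), position_available),
        "rover_x": (("experiment_id", "cycle_id"), rover_x),
    },
    coords={"experiment_id": ["EXP_A", "EXP_B"], "cycle_id": [1, 2, 3, 5]},
)

# Integer positions of every valid pair, in row-major order.
exp_idx, cyc_idx = np.nonzero(toy["position_available"].values)

# One row per valid measurement, indexed by a plain running counter.
flat = xr.Dataset(
    {
        "experiment_id": ("measurement_index", toy["experiment_id"].values[exp_idx]),
        "cycle_id": ("measurement_index", toy["cycle_id"].values[cyc_idx]),
        "rover_x": ("measurement_index", toy["rover_x"].values[exp_idx, cyc_idx]),
    },
    coords={"measurement_index": np.arange(exp_idx.size)},
)

print(flat.to_dataframe())
```

The flat table has no gaps: each row keeps its original experiment_id and cycle_id as ordinary columns, so a row can still be traced back to the sparse dataset with .sel(...).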

Next Steps

After you are comfortable with the dataset layout, continue with:

  • plot_csi_positions.ipynb for a short overview of trajectory and heatmaps
  • tutorial_rover_positions.ipynb for the rover geometry and active antenna layout
  • tutorial_csi_per_position.ipynb for extracting one CSI vector from one measurement point