The H5 Data Store

Introduction

ArC TWO Control Panel uses an HDF5-based file format to store all data. HDF5 is a scalable, strongly typed format with a filesystem-like hierarchy, suitable for large datasets. The H5DataStore API defines a file format on top of HDF5 with some provisions for crossbar-oriented experiments. For all modules, internal or external, as well as the corresponding background operations, the active datastore is exposed via the datastore property, so there should be no reason to create a new datastore in an active ArC2Control session.

The file format

Data in HDF5 is organised in groups and datasets in a filesystem-like hierarchy. The ArC TWO Control Panel API defines a few specific groups and datasets that are guaranteed to be always available. Every item in an HDF5 file can contain additional metadata, or attributes in HDF5 lingo. Some attributes are always defined but arbitrary attributes can be attached to a dataset. This can be experiment-specific data or just additional metadata for bookkeeping purposes.

A dataset or group in HDF5 is identified by a directory-like path such as /data/timeseries/alpha. From this path we can tell that dataset alpha is a member of group timeseries, which is itself a member of group data, which is a toplevel group. All toplevel groups are implicitly members of the root node, which is not typically named. ArC2Control defines the following toplevel groups with their respective attributes.
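The path decomposition described above can be sketched with the standard library alone. The helper name split_h5_path is hypothetical and not part of ArC2Control; it only illustrates how a path maps to nested group and dataset names.

```python
import posixpath


def split_h5_path(path):
    """Split an absolute HDF5-style path into its component names."""
    # normpath collapses repeated or trailing slashes
    normalised = posixpath.normpath(path)
    # drop the empty strings produced by the leading '/'
    return [part for part in normalised.split('/') if part]


parts = split_h5_path('/data/timeseries/alpha')
toplevel_group = parts[0]   # the group directly under the root node
dataset_name = parts[-1]    # the last component names the dataset
parent = posixpath.dirname('/data/timeseries/alpha')  # path of the enclosing group
```

HDF5 paths always use forward slashes, which is why posixpath is used here rather than os.path.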

Toplevel HDF5 groups and attributes
Group	Attribute	Type	Required?
root (hidden)	H5DS_VERSION_MAJOR	int64	Y
	H5DS_VERSION_MINOR	int64	Y
	PYTABLES_FORMAT_VERSION	str128	N
/synthetics	No attributes defined	N/A	N/A
/crosspoints	No attributes defined	N/A	N/A
/crossbar	words	int64	Y
	bits	int64	Y

The /synthetics group holds experiments that can span more than one device; the /crosspoints group holds one group per crosspoint, named in the format W00B00, which collects the experiments run on that crosspoint. A crosspoint group can hold either a group with experiment datasets or just a single dataset. Group /crossbar contains a current and voltage view of the entire crossbar array. Since the crossbar size is configurable the words and bits attributes must be defined.
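As a rough illustration of the crosspoint naming scheme, the sketch below builds and parses names of the form W00B00. It assumes two-digit, zero-padded decimal indices, which is inferred only from the W00B00 example; both helper names are hypothetical and not part of the ArC2Control API.

```python
import re

# Assumption: at least two decimal digits per index, as in 'W00B00'
_CROSSPOINT_RE = re.compile(r'^W(\d{2,})B(\d{2,})$')


def crosspoint_name(word, bit):
    """Build a crosspoint group name from a wordline and a bitline."""
    return f'W{word:02d}B{bit:02d}'


def parse_crosspoint_name(name):
    """Recover (word, bit) from a crosspoint group name."""
    match = _CROSSPOINT_RE.match(name)
    if match is None:
        raise ValueError(f'not a crosspoint name: {name!r}')
    return int(match.group(1)), int(match.group(2))


name = crosspoint_name(5, 7)
word, bit = parse_crosspoint_name(name)
```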

Below is an example structure of a hypothetical data file. G denotes a group and D a dataset.

[G] / # root node
 │
 ├── [G] synthetics # tests with more than one crosspoint, always present
 │    │
 │    ├─ [D] test00 # data, shape depending on experiment
 │    ├─ [D] test01 # data, shape depending on experiment
 │    └─ [G] test02 # experiment with more than one table
 │        │
 │        └─ [D] test02a # experiment data
 │
 ├── [G] crosspoints # data tied to a single device
 │    │
 │    └── [G] W00B00 # crosspoint
 │         │
 │         ├─ [D] timeseries # history of device biasing
 │         │                 # current, voltage, pulse_width, read_voltage, type
 │         │                 # 5 columns, expandable length, always present
 │         │
 │         └─ [G] experiments
 │             │
 │             ├─ [D] test00 # data, shape depending on experiment
 │             ├─ [D] test01 # data, shape depending on experiment
 │             └─ [G] test02 # experiment with more than one table
 │                 │
 │                 └─ [D] test02a # experiment data
 │
 │
 │
 └── [G] crossbar # crossbar raster view, always present
      │           # this only holds the last crossbar status
      │           # individual device history is covered by
      │           # crosspoints/WXXBYY/timeseries
      │
      ├─ [D] voltage # shape = (bits × words), always present
      └─ [D] current # shape = (bits × words), always present

The size and data type of each individual dataset are completely up to the developer. ArC2Control does not assume anything about the type of the contained data as long as its position in the file conforms to the above specification. You should not need to create the structure manually as there are functions that take care of the naming and structure of datasets. Datasets have strict datatype requirements and as such the datatype must be known at creation time. The datatype is specified as a numpy structured array dtype. Below is an example of interacting with an H5DataStore.

from arc2control.h5utils import H5DataStore, OpType  # < This is done
import numpy as np                                   # < automatically by
                                                     # < ArC2Control
datastore = H5DataStore('fname.h5', shape=(32, 32))  # <

# Add a reading to a specific crosspoint, structure will be
# created automatically
datastore.update_status(5, 7, 1.0e-6, 0.5, 100e-6, 0.5, OpType.PULSEREAD)

# let's create some dummy data, the columns must be equally sized
dsetlen = 1000
current = np.random.normal(size=(dsetlen,))
voltage = np.random.normal(size=(dsetlen,))

# H5DataStore uses numpy dtypes to describe datasets
dtype = [('voltage', '<f4'), ('current', '<f4')]
# Create a new dataset for an experiment with identifier 'RET'
dset = datastore.make_wb_table(5, 7, 'RET', (dsetlen, ), dtype)

# write the data, one column at a time
dset[:, 'voltage'] = voltage
dset[:, 'current'] = current

# data is now saved in the dataset
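The dtype list in the example above is an ordinary numpy structured dtype. The sketch below shows what such a dtype describes using a plain in-memory array instead of a dataset, so nothing here touches H5DataStore; the values are made up for illustration.

```python
import numpy as np

# Two named columns, each a little-endian 32-bit float
dtype = np.dtype([('voltage', '<f4'), ('current', '<f4')])

# One record per row; every record holds one value per named field
table = np.zeros((1000,), dtype=dtype)

# Columns are addressed by field name and must match the table length
table['voltage'] = np.linspace(0.0, 1.0, 1000)
table['current'] = table['voltage'] * 1.0e-6

field_names = table.dtype.names   # names of the columns, in order
row_size = table.dtype.itemsize   # bytes per record (two 4-byte floats)
first_row = table[0]              # a single record with both fields
```

Because the record layout is fixed by the dtype, it has to be decided before the dataset is created.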

A note on expandable datasets

Datasets can be created in the backing store as appendable datasets. This is not a list in the Python sense but a series of chunked tables tied together efficiently. All expandable datasets created with this class, including the built-in timeseries, MUST have an NROWS attribute that signifies the next available index. This is done automatically by the methods make_wb_table() and make_synthetic_table(), as well as for the datasets returned by the dataset() method.
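The NROWS bookkeeping can be illustrated with an in-memory stand-in. The AppendableTable class below is hypothetical: it uses a numpy array in place of a chunked HDF5 dataset and a dict in place of HDF5 attributes, and it is not how H5DataStore is implemented. It only shows why a "next available index" is needed when storage grows in chunks.

```python
import numpy as np


class AppendableTable:
    """In-memory stand-in for an expandable dataset (illustration only)."""

    def __init__(self, dtype, chunk=4):
        self.chunk = chunk
        # storage is allocated in whole chunks, so it may hold unused rows
        self.data = np.zeros((chunk,), dtype=dtype)
        # NROWS marks the next free row, not the allocated length
        self.attrs = {'NROWS': 0}

    def append(self, rows):
        rows = np.asarray(rows, dtype=self.data.dtype)
        start = self.attrs['NROWS']
        end = start + len(rows)
        # grow by whole chunks until the new rows fit
        while end > len(self.data):
            extra = np.zeros((self.chunk,), dtype=self.data.dtype)
            self.data = np.concatenate([self.data, extra])
        self.data[start:end] = rows
        self.attrs['NROWS'] = end

    def valid(self):
        """Rows written so far, excluding unused preallocated space."""
        return self.data[:self.attrs['NROWS']]


table = AppendableTable([('voltage', '<f4'), ('current', '<f4')])
table.append([(0.5, 1.0e-6), (0.5, 1.1e-6)])
table.append([(1.0, 2.0e-6)] * 3)
```

After the two appends the allocated length is larger than the number of rows written, so a reader that ignores NROWS would pick up trailing zero-filled rows.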

API Reference

exception arc2control.h5utils.H5AccessError: Thrown when trying to write to a file opened read-only.

class arc2control.h5utils.H5DataStore(fname, name=None, mode=H5Mode.APPEND, shape=(32, 32))

This is the toplevel class that interacts with an HDF5 datastore suitable for storing arc2control data.

A name can be provided but will default to basename(fname) if none is provided. When creating a new file with H5Mode.WRITE the crossbar dimensions must be specified and they default to 32×32. In append and read modes the size is picked up from the metadata of the file itself.

An H5DataStore can also be used as a context manager for brief interactions with data files:

>>> from arc2control.h5utils import H5DataStore
>>> with H5DataStore('/path/to/store', 'dataset') as ds:
...     ds.update_status(0, 0, 10e-6, 1.0, 100e-6, 0.2)
>>> # file is saved here
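For readers unfamiliar with the protocol behind the with statement, here is a minimal sketch using a stand-in class. FakeStore is hypothetical and keeps entries in a list; it is not the real H5DataStore, whose internals are not described here.

```python
class FakeStore:
    """Hypothetical stand-in for a data store (illustration only)."""

    def __init__(self, fname):
        self.fname = fname
        self.closed = False
        self.entries = []

    def update_status(self, word, bit, current, voltage, pulse, read_voltage):
        if self.closed:
            raise RuntimeError('store is closed')
        self.entries.append((word, bit, current, voltage, pulse, read_voltage))

    def close(self):
        self.closed = True

    def __enter__(self):
        # the value bound by 'as ds'
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # runs when the block exits, even if the body raised
        self.close()
        # returning False lets any exception propagate
        return False


with FakeStore('/path/to/store') as ds:
    ds.update_status(0, 0, 10e-6, 1.0, 100e-6, 0.2)
```

The practical benefit is that the resource is released on every exit path without an explicit close() call.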

close(): Close the file. It needs to be reopened again for any other interaction.

property conductance: Conductance view of the crossbar raster

property current: Current view of the crossbar raster

dataset(name): Return the HDF5 dataset specified by name

property fname: The filename associated with this data store

keys()

Top-level keys of this dataset

Returns: Top-level keys for this dataset (excluding the root node)

make_sequence_group(name, datasets=[], tstamp=True)

Create a new sequence pseudo-group used to organise many existing experiments into a logical sequence. Argument datasets is a list of existing datasets. HDF5 soft links to the datasets will be appended to the members of this group, and the additional attributes sequence (the name of the sequence they are a member of) and seqno (position in the sequence) will be added.

Parameters:

name (str) – Identifier for this sequence
datasets – List of strings to existing datasets
tstamp (bool) – Whether the current timestamp should be appended to the dataset name

Returns:

The HDF5 group corresponding to the newly created sequence
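The sequence and seqno attributes can be pictured as a small mapping from dataset path to attributes. The sketch below is an illustration only: the helper name, the dataset paths and the zero-based numbering are all assumptions, since the text above does not say where seqno starts.

```python
def sequence_attributes(sequence_name, dataset_paths):
    """Return {path: attributes} for every dataset in a sequence."""
    return {
        # assumption: positions are numbered from zero
        path: {'sequence': sequence_name, 'seqno': seqno}
        for seqno, path in enumerate(dataset_paths)
    }


# Hypothetical dataset paths, used only for illustration
paths = [
    '/crosspoints/W05B07/experiments/form',
    '/crosspoints/W05B07/experiments/retention',
]
attrs = sequence_attributes('characterisation', paths)
```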

make_synthetic_group(crosspoints, name, tstamp=True)

Create a new synthetic experiment group. This can be used to group multiple data tables under a single experimental node. This will return the underlying HDF5 group. Unless tstamp is set to False the current timestamp with ns precision will be added to the group name.

Parameters:

crosspoints – An array of (wordline, bitline) tuples with all the crosspoints involved
name (str) – The identifier of this group
tstamp (bool) – Whether the current timestamp should be appended to the group name

Returns:

A reference to the newly created HDF5 group
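A name carrying a nanosecond-precision timestamp could be built roughly as follows. The underscore separator and the use of nanoseconds since the epoch are assumptions made for this sketch; the exact format ArC2Control uses is not specified above, and timestamped_name is a hypothetical helper.

```python
import time


def timestamped_name(name, tstamp=True):
    """Append a nanosecond timestamp to name unless tstamp is False."""
    if not tstamp:
        return name
    # time.time_ns() returns integer nanoseconds since the epoch
    return f'{name}_{time.time_ns()}'


plain = timestamped_name('RET', tstamp=False)
stamped = timestamped_name('RET')
```

A timestamp suffix keeps repeated runs of the same experiment from colliding on one group name.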

make_synthetic_table(crosspoints, name, shape, dtype, grp=None, maxshape=None, tstamp=True)

Create a new experiment table encompassing many crosspoints. Arguments shape and dtype follow numpy conventions. This will return the underlying HDF5 dataset. If maxshape is None (the default) the dataset will be chunked and will allow appends. Unless tstamp is set to False the current timestamp with ns precision will be added to the dataset name. If grp is specified then the table will be created as a child of the specified experiment group. The group name can be either relative (no leading ‘/’) or absolute. In the latter case the parent path must match the correct word/bit coordinate, otherwise an exception will be raised. The group can be either an instance of h5py.Group or a str.

Parameters:

crosspoints – An array of (wordline, bitline) tuples with all the crosspoints involved
name (str) – The identifier of this dataset
shape – A numpy shape for this dataset
dtype – The numpy dtype of this dataset
grp – Path of the group this table belongs to or None if it’s a singular dataset. This can also be an instance of h5py.Group.
maxshape – A maximum numpy shape for this dataset; if None an expandable chunked dataset will be created instead
tstamp (bool) – Whether the current timestamp should be appended to the dataset name

Returns:

A newly created HDF5 dataset

make_wb_group(word, bit, name, tstamp=True)

Create a new experiment group tied to a specific crosspoint. This can be used to group multiple data tables under a single experimental node. This will return the underlying HDF5 group. Unless tstamp is set to False the current timestamp with ns precision will be added to the group name.

Parameters:

word (int) – The wordline of the crosspoint
bit (int) – The bitline of the crosspoint
name (str) – The identifier of this group
tstamp (bool) – Whether the current timestamp should be appended to the group name

Returns:

A reference to the newly created HDF5 group

make_wb_table(word, bit, name, shape, dtype, grp=None, maxshape=None, tstamp=True)

Create a new experiment table tied to a specific crosspoint. Arguments shape and dtype follow numpy conventions. This will return the underlying HDF5 dataset. If maxshape is None (the default) the dataset will be chunked and will allow appends. Unless tstamp is set to False the current timestamp with ns precision will be added to the dataset name. If grp is specified then the table will be created as a child of the specified experiment group. The group name can be either relative (no leading ‘/’) or absolute. In the latter case the parent path must match the correct word/bit coordinate, otherwise an exception will be raised. The group can be either an instance of h5py.Group or a str.

Parameters:

word (int) – The wordline of the crosspoint
bit (int) – The bitline of the crosspoint
name (str) – The identifier of this dataset
shape – A numpy shape for this dataset
dtype – The numpy dtype of this dataset
grp – Path of the group this table belongs to or None if it’s a singular dataset. This can also be an instance of h5py.Group.
maxshape – A maximum numpy shape for this dataset; if None an expandable chunked dataset will be created instead
tstamp (bool) – Whether the current timestamp should be appended to the dataset name

Returns:

A newly created HDF5 dataset
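The relative versus absolute handling of grp can be sketched as a path check. This is an illustration under stated assumptions: the /crosspoints/WXXBYY/experiments layout is taken from the example tree earlier in this document, ValueError stands in for the unspecified exception, and resolve_wb_group is a hypothetical helper, not part of the API.

```python
import posixpath


def resolve_wb_group(word, bit, grp):
    """Resolve a group name to an absolute path for one crosspoint."""
    # assumed layout, following the example file structure
    base = f'/crosspoints/W{word:02d}B{bit:02d}/experiments'
    if not grp.startswith('/'):
        # relative name: place it under this crosspoint
        return posixpath.join(base, grp)
    absolute = posixpath.normpath(grp)
    # absolute name: its parent must be this crosspoint's experiments group
    if posixpath.dirname(absolute) != base:
        raise ValueError(f'{grp!r} does not belong to word {word}, bit {bit}')
    return absolute


relative = resolve_wb_group(5, 7, 'test02')
absolute = resolve_wb_group(5, 7, '/crosspoints/W05B07/experiments/test02')
```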

property name: The name associated with this data store

property resistance: Resistance view of the crossbar raster

property shape: Size of the crossbar stored in this data store

timeseries(word, bit)

Complete biasing history of specified crosspoint

Parameters:

word (int) – The wordline of the crosspoint
bit (int) – The bitline of the crosspoint

Returns:

A structured numpy array containing the biasing history

update_status(word, bit, current, voltage, pulse, read_voltage, optype=OpType.READ)

Add a new biasing history entry for the specified crosspoint.

Parameters:

word (int) – The wordline of the crosspoint
bit (int) – The bitline of the crosspoint
current (float) – The measured current of the crosspoint
voltage (float) – The voltage applied to this crosspoint
pulse (float) – The pulsewidth, if any, of the applied pulse
read_voltage (float) – The voltage used to read the device
optype – An instance of OpType indicating the type of the operation associated with this entry

update_status_bulk(word, bit, currents, voltages, pulses, read_voltages, optypes)

Similar to update_status() but with bulk insertion of values. All array parameters must be equally sized numpy arrays. Arguments read_voltages and optypes can be scalar, in which case their values will be broadcast over the relevant rows.

Parameters:

word (int) – The wordline of the crosspoint
bit (int) – The bitline of the crosspoint
currents – An ndarray containing a series of measured currents
voltages – An ndarray containing a series of applied voltages
pulses – An ndarray containing a series of applied pulse widths
read_voltages – An ndarray or single float value that corresponds to the voltage used to read back the crosspoint
optypes – An array or single instance of arc2control.h5utils.OpType indicating the type of the operations applied to the crosspoint.
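The scalar handling described for read_voltages and optypes can be illustrated with numpy broadcasting. The helper below is a hypothetical sketch of the argument normalisation only; it does not write anything and is not taken from the ArC2Control source.

```python
import numpy as np


def broadcast_bulk_args(currents, voltages, pulses, read_voltages, optypes):
    """Return five equally sized arrays, expanding scalars where allowed."""
    currents = np.asarray(currents, dtype=float)
    voltages = np.asarray(voltages, dtype=float)
    pulses = np.asarray(pulses, dtype=float)
    if not (currents.shape == voltages.shape == pulses.shape):
        raise ValueError('currents, voltages and pulses must be equally sized')
    # a scalar is repeated for every row; a full-length array passes through
    read_voltages = np.broadcast_to(
        np.asarray(read_voltages, dtype=float), currents.shape)
    optypes = np.broadcast_to(np.asarray(optypes, dtype=int), currents.shape)
    return currents, voltages, pulses, read_voltages, optypes


cols = broadcast_bulk_args(
    [1e-6, 2e-6, 3e-6],   # currents
    [0.5, 1.0, 1.5],      # voltages
    [100e-6] * 3,         # pulse widths
    0.2,                  # one read voltage shared by all rows
    1,                    # one operation type shared by all rows
)
```

np.broadcast_to returns a read-only view, which is sufficient here because the values are only read when rows are written.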

property voltage: Voltage view of the crossbar raster

exception arc2control.h5utils.H5DimsError: Thrown when trying to save data to a dataset with incompatible size.

exception arc2control.h5utils.H5FormatError: The HDF5 file is not compatible with the current file format.

class arc2control.h5utils.H5Mode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None): HDF5 access mode when opening or creating files.

class arc2control.h5utils.OpType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Operation type. This is essentially a 2-bit bitmask.

PULSE = 2: Bit 1 raised means a pulse operation.

PULSEREAD = 3: Both bits are raised

READ = 1: Bit 0 raised means a read operation.