The H5 Data Store
Introduction
ArC TWO Control Panel uses an HDF5-based file format to store all data. It’s
a scalable, strongly-typed format with a filesystem-like hierarchy, suitable
for large datasets. The H5DataStore API defines a file format on top of HDF5
with some provisions for crossbar-oriented experiments. For all modules,
either internal or external, as well as the corresponding background
operations, the active datastore is exposed via the datastore property, so
there should be no reason to create a new datastore in an active ArC2Control
session.
The file format
Data in HDF5 is organised in groups and datasets in a filesystem-like hierarchy. The ArC TWO Control Panel API defines a few specific groups and datasets that are guaranteed to be always available. Every item in an HDF5 file can contain additional metadata, or attributes in HDF5 lingo. Some attributes are always defined but arbitrary attributes can be attached to a dataset. This can be experiment-specific data or just additional metadata for bookkeeping purposes.
A dataset or group in HDF5 is identified by a directory-like path such as
/data/timeseries/alpha. From this path we can understand that dataset alpha
is a member of group timeseries, which is itself a member of group data,
which is a toplevel group. All toplevel groups are implicitly members of the
root node, which is not typically named. ArC2Control defines the following
toplevel groups with their respective attributes.
Group | Attribute | Type | Required? |
---|---|---|---|
root (hidden) | H5DS_VERSION_MAJOR | int64 | Y |
 | H5DS_VERSION_MINOR | int64 | Y |
 | PYTABLES_FORMAT_VERSION | str128 | N |
/synthetics | No attributes defined | N/A | N/A |
/crosspoints | No attributes defined | N/A | N/A |
/crossbar | words | int64 | Y |
 | bits | int64 | Y |
The /synthetics group holds experiments that can span more than one device;
the /crosspoints group holds groups of crosspoint experiments, named in the
format W00B00. A crosspoint group can hold either a group with experiment
datasets or just a single dataset. Group /crossbar contains a current and
voltage view of the entire crossbar array. Since the crossbar size is
configurable, the words and bits attributes must be defined.
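The path and naming conventions described above can be sketched with two small helpers. This is only an illustration and not part of the arc2control API: the function names `crosspoint_group` and `split_h5_path` are hypothetical, and the two-digit zero padding is an assumption inferred from the W00B00 example.

```python
import posixpath


def crosspoint_group(word, bit):
    """Build the path of a crosspoint group, e.g. /crosspoints/W05B07.

    Assumption: indices are zero-padded to two digits, as the W00B00
    example suggests.
    """
    return posixpath.join('/crosspoints', 'W%02dB%02d' % (word, bit))


def split_h5_path(path):
    """Split an absolute HDF5 path into its components, outermost first."""
    return [part for part in path.split('/') if part]


print(crosspoint_group(5, 7))
print(split_h5_path('/data/timeseries/alpha'))
```

The second call returns the groups from the outermost inwards, with the dataset name as the last element.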
Below is an example structure of a hypothetical data file. G denotes a group and D a dataset.
[G] / # root node
│
├── [G] synthetics # tests with more than one crosspoint, always present
│ │
│ ├─ [D] test00 # data, shape depending on experiment
│ ├─ [D] test01 # data, shape depending on experiment
│ └─ [G] test02 # experiment with more than one table
│ │
│ └─ [D] test02a # experiment data
│
├── [G] crosspoints # data tied to a single device
│ │
│ └── [G] W00B00 # crosspoint
│ │
│ ├─ [D] timeseries # history of device biasing
│ │ # current, voltage, pulse_width, read_voltage, type
│ │ # 5 columns, expandable length, always present
│ │
│ └─ [G] experiments
│ │
│ ├─ [D] test00 # data, shape depending on experiment
│ ├─ [D] test01 # data, shape depending on experiment
│ └─ [G] test02 # experiment with more than one table
│ │
│ └─ [D] test02a # experiment data
│
│
│
└── [G] crossbar # crossbar raster view, always present
│ # this only holds the last crossbar status
│ # individual device history is covered by
│ # crosspoints/WXXBYY/timeseries
│
├─ [D] voltage # shape = (bits × words), always present
└─ [D] current # shape = (bits × words), always present
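As a rough way to reason about the layout, the required toplevel groups and attributes from the table can be written down as a checklist. The sketch below uses plain dictionaries as a stand-in for an opened file, so it runs without any HDF5 library. The name `missing_items` and the version numbers in the sample are placeholders chosen for illustration, and treating all three toplevel groups as mandatory is an assumption.

```python
# Toplevel groups and their required attributes, taken from the table in
# this section. The unnamed root node is keyed as '/'.
REQUIRED_ATTRS = {
    '/': ('H5DS_VERSION_MAJOR', 'H5DS_VERSION_MINOR'),
    '/synthetics': (),
    '/crosspoints': (),
    '/crossbar': ('words', 'bits'),
}


def missing_items(attrs_by_group):
    """Return a sorted list of missing groups and attributes.

    `attrs_by_group` maps a group path to a dict of its attributes; it is
    a plain-Python stand-in for an opened HDF5 file.
    """
    problems = []
    for group, required in REQUIRED_ATTRS.items():
        if group not in attrs_by_group:
            problems.append('missing group %s' % group)
            continue
        for attr in required:
            if attr not in attrs_by_group[group]:
                problems.append('%s lacks attribute %s' % (group, attr))
    return sorted(problems)


# Sample layout; the version numbers are placeholders
layout = {
    '/': {'H5DS_VERSION_MAJOR': 0, 'H5DS_VERSION_MINOR': 1},
    '/synthetics': {},
    '/crosspoints': {},
    '/crossbar': {'words': 32},   # 'bits' deliberately left out
}
print(missing_items(layout))
```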
The size and data type of each individual dataset is completely up to the
developer to decide. ArC2Control does not assume anything about the type of
the contained data as long as its position in the file conforms to the above
specification. You should not need to create the structure manually as there
are functions that take care of the naming and structure of datasets. Below
is an example of interacting with an H5DataStore. Datasets have strict
datatype requirements and as such the datatype must be known at creation
time. The datatype is specified as a numpy structured array dtype.
# Within ArC2Control the imports and the datastore below are set up
# automatically; they are spelled out here for completeness
from arc2control.h5utils import H5DataStore, OpType
import numpy as np

datastore = H5DataStore('fname.h5', shape=(32, 32))

# Add a reading to a specific crosspoint, structure will be
# created automatically. Arguments: word, bit, current, voltage,
# pulse width, read voltage, operation type
datastore.update_status(5, 7, 1.0e-6, 0.5, 100e-6, 0.5, OpType.PULSEREAD)

# let's create some dummy data, the columns must be equally sized
dsetlen = 1000
current = np.random.normal(size=(dsetlen,))
voltage = np.random.normal(size=(dsetlen,))

# H5DataStore uses numpy dtypes to describe datasets
dtype = [('voltage', '<f4'), ('current', '<f4')]

# Create a new dataset for an experiment with identifier 'RET'
dset = datastore.make_wb_table(5, 7, 'RET', (dsetlen, ), dtype)

# broadcast the data
dset[:, 'voltage'] = voltage
dset[:, 'current'] = current
# data is now saved in the dataset
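To see what a structured dtype looks like before it reaches a datastore, the same column layout can be tried on an in-memory numpy array. This sketch uses only numpy and does not touch ArC2Control or an HDF5 file; the values are made up.

```python
import numpy as np

# Same column layout as the dtype used in the example above
dtype = [('voltage', '<f4'), ('current', '<f4')]

dsetlen = 1000
table = np.zeros((dsetlen,), dtype=dtype)

# Columns are addressed by field name and must match the table length
table['voltage'] = np.linspace(0.0, 1.0, dsetlen)
table['current'] = table['voltage'] * 1.0e-6

print(table.dtype.names)   # ('voltage', 'current')
print(table.shape)         # (1000,)
```

Each row occupies 8 bytes here, two little-endian 32-bit floats, which is why the datatype has to be fixed when the dataset is created.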
A note on expandable datasets
Datasets can be created in the backing store as appendable datasets. This is
not a list in the Python sense but many chunked tables tied together
efficiently (hopefully). All expandable datasets created with this class,
including the built-in timeseries, MUST have an NROWS attribute that
signifies the next available index. This is done automatically by the
methods make_wb_table() and make_synthetic_table(), as well as for the
datasets returned by the dataset() method.
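The role of NROWS can be illustrated with a toy model in plain Python. This is not how H5DataStore is implemented; the class name `AppendableTable` and the chunk size are hypothetical. It only shows why a "next available index" is needed when storage grows in chunks and the allocated length exceeds the number of rows written.

```python
class AppendableTable:
    """Toy in-memory model of an expandable dataset with an NROWS attribute.

    Storage grows in fixed-size chunks; attrs['NROWS'] always points to
    the next free row, so rows beyond it are unused padding.
    """

    def __init__(self, chunk=4):
        self.chunk = chunk
        self.rows = [None] * chunk      # allocated storage
        self.attrs = {'NROWS': 0}       # next available index

    def append(self, row):
        idx = self.attrs['NROWS']
        if idx == len(self.rows):
            # Storage is full: extend it by one more chunk
            self.rows.extend([None] * self.chunk)
        self.rows[idx] = row
        self.attrs['NROWS'] = idx + 1

    def valid(self):
        """Return only the rows that have actually been written."""
        return self.rows[:self.attrs['NROWS']]


table = AppendableTable(chunk=4)
for i in range(6):
    table.append((i, i * 0.5))

print(table.attrs['NROWS'])   # 6
print(len(table.rows))        # 8
print(table.valid()[-1])      # (5, 2.5)
```

A reader of such a table should slice up to NROWS rather than trust the allocated length.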
API Reference
- exception arc2control.h5utils.H5AccessError
Thrown when trying to write to a file opened read-only.
- class arc2control.h5utils.H5DataStore(fname, name=None, mode=H5Mode.APPEND, shape=(32, 32))
This is the toplevel class that interacts with an HDF5 datastore suitable for storing arc2control data.
A name can be provided but will default to basename(fname) if none is provided. When creating a new file with H5Mode.WRITE the crossbar dimensions are set through shape, which defaults to 32×32. In append and read modes the size is picked up from the metadata of the file itself.
An H5DataStore can also be used as a context manager for brief interactions with data files:
>>> from arc2control.h5utils import H5DataStore
>>> with H5DataStore('/path/to/store', 'dataset') as ds:
...     ds.update_status(0, 0, 10e-6, 1.0, 100e-6, 0.2)
>>> # file is saved here
- close()
Close the file. It needs to be reopened again for any other interaction.
- property conductance
Conductance view of the crossbar raster
- property current
Current view of the crossbar raster
- dataset(name)
Return the HDF5 dataset specified by name
- property fname
The filename associated with this data store
- keys()
Top-level keys of this dataset
- Returns:
Top-level keys for this dataset (excluding the root node)
- make_sequence_group(name, datasets=[], tstamp=True)
Create a new sequence pseudo-group used to organise many existing experiments into a logical sequence. Argument datasets is a list of existing datasets. HDF5 soft links to the datasets will be appended to the members of this group and the additional attributes sequence (the name of the sequence they are members of) and seqno (position in the sequence) will be added.
- Parameters:
name (str) – Identifier for this sequence
datasets – List of strings to existing datasets
tstamp (bool) – Whether the current timestamp should be appended to the dataset name
- Returns:
The HDF5 group corresponding to the newly created sequence
- make_synthetic_group(crosspoints, name, tstamp=True)
Create a new synthetic experiment group. This can be used to group multiple data tables under a single experimental node. This will return the underlying HDF5 group. Unless tstamp is set to False the current timestamp with ns precision will be added to the group name.
- Parameters:
crosspoints – An array of (wordline, bitline) tuples with all the crosspoints involved
name (str) – The identifier of this group
tstamp (bool) – Whether the current timestamp should be appended to the group name
- Returns:
A reference to the newly created HDF5 group
- make_synthetic_table(crosspoints, name, shape, dtype, grp=None, maxshape=None, tstamp=True)
Create a new experiment table encompassing many crosspoints. Arguments shape and dtype follow numpy conventions. This will return the underlying HDF5 dataset. If maxshape is None (default) the dataset will be chunked and will allow appends. Unless tstamp is set to False the current timestamp with ns precision will be added to the dataset name. If grp is specified then the table will be created as a child of the specified experiment group. Group name can be either relative (no leading ‘/’) or absolute. In the latter case the parent path must match the correct word/bit coordinate, otherwise an exception will be raised. Group can either be an instance of h5py.Group or str.
- Parameters:
crosspoints – An array of (wordline, bitline) tuples with all the crosspoints involved
name (str) – The identifier of this dataset
shape – A numpy shape for this dataset
dtype – The numpy dtype of this dataset
grp – Path of the group this table belongs to or None if it’s a singular dataset. This can also be an instance of h5py.Group.
maxshape – A maximum numpy shape for this dataset; if None an expandable chunked dataset will be created instead
tstamp (bool) – Whether the current timestamp should be appended to the dataset name
- Returns:
A newly created HDF5 dataset
- make_wb_group(word, bit, name, tstamp=True)
Create a new experiment group tied to a specific crosspoint. This can be used to group multiple data tables under a single experimental node. This will return the underlying HDF5 group. Unless tstamp is set to False the current timestamp with ns precision will be added to the group name.
- Parameters:
word (int) – The wordline of the crosspoint
bit (int) – The bitline of the crosspoint
name (str) – The identifier of this group
tstamp (bool) – Whether the current timestamp should be appended to the group name
- Returns:
A reference to the newly created HDF5 group
- make_wb_table(word, bit, name, shape, dtype, grp=None, maxshape=None, tstamp=True)
Create a new experiment table tied to a specific crosspoint. Arguments shape and dtype follow numpy conventions. This will return the underlying HDF5 dataset. If maxshape is None (default) the dataset will be chunked and will allow appends. Unless tstamp is set to False the current timestamp with ns precision will be added to the dataset name. If grp is specified then the table will be created as a child of the specified experiment group. Group name can be either relative (no leading ‘/’) or absolute. In the latter case the parent path must match the correct word/bit coordinate, otherwise an exception will be raised. Group can either be an instance of h5py.Group or str.
- Parameters:
word (int) – The wordline of the crosspoint
bit (int) – The bitline of the crosspoint
name (str) – The identifier of this dataset
shape – A numpy shape for this dataset
dtype – The numpy dtype of this dataset
grp – Path of the group this table belongs to or None if it’s a singular dataset. This can also be an instance of h5py.Group.
maxshape – A maximum numpy shape for this dataset; if None an expandable chunked dataset will be created instead
tstamp (bool) – Whether the current timestamp should be appended to the dataset name
- Returns:
A newly created HDF5 dataset
- property name
The name associated with this data store
- property resistance
Resistance view of the crossbar raster
- property shape
Size of the crossbar stored in this data store
- timeseries(word, bit)
Complete biasing history of specified crosspoint
- Parameters:
word (int) – The wordline of the crosspoint
bit (int) – The bitline of the crosspoint
- Returns:
A structured numpy array containing the biasing history
- update_status(word, bit, current, voltage, pulse, read_voltage, optype=OpType.READ)
Add a new biasing history entry for the specified crosspoint.
- Parameters:
word (int) – The wordline of the crosspoint
bit (int) – The bitline of the crosspoint
current (float) – The measured current of the crosspoint
voltage (float) – The voltage applied to this crosspoint
pulse (float) – The pulsewidth, if any, of the applied pulse
read_voltage (float) – The voltage used to read the device
optype – An instance of OpType indicating the type of the operation associated with this entry
- update_status_bulk(word, bit, currents, voltages, pulses, read_voltages, optypes)
Similar to update_status() but with bulk insertion of values. All value arguments must be equally sized numpy arrays. Arguments read_voltages and optypes can be scalar, in which case their values will be broadcast over the relevant rows.
- Parameters:
word (int) – The wordline of the crosspoint
bit (int) – The bitline of the crosspoint
currents – An ndarray containing a series of measured currents
voltages – An ndarray containing a series of applied voltages
pulses – An ndarray containing a series of applied pulse widths
read_voltages – An ndarray or single float value that corresponds to the voltage used to read back the crosspoint
optypes – An array or single instance of arc2control.h5utils.OpType indicating the type of the operations applied to the crosspoint.
- property voltage
Voltage view of the crossbar raster
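The scalar broadcasting described for update_status_bulk() can be pictured with plain numpy. The helper below is a hypothetical sketch, not the library's code: the name `prepare_bulk` is invented here, and the use of `np.broadcast_to` is an assumption about one reasonable way to expand scalar read_voltages and optypes to the length of the other columns.

```python
import numpy as np


def prepare_bulk(currents, voltages, pulses, read_voltages, optypes):
    """Return five equally sized arrays, broadcasting scalar inputs."""
    currents = np.asarray(currents, dtype=float)
    voltages = np.asarray(voltages, dtype=float)
    pulses = np.asarray(pulses, dtype=float)
    if not (currents.shape == voltages.shape == pulses.shape):
        raise ValueError('currents, voltages and pulses must match in size')
    # np.broadcast_to repeats a scalar over every row and leaves an
    # already matching array unchanged
    read_voltages = np.broadcast_to(
        np.asarray(read_voltages, dtype=float), currents.shape)
    optypes = np.broadcast_to(
        np.asarray(optypes, dtype=int), currents.shape)
    return currents, voltages, pulses, read_voltages, optypes


cols = prepare_bulk([1e-6, 2e-6, 3e-6], [0.5, 1.0, 1.5], [100e-6] * 3,
                    0.2, 3)
print(cols[3])   # read voltage repeated for each of the three rows
print(cols[4])   # operation type repeated for each of the three rows
```

Arrays returned by `np.broadcast_to` are read-only views, so a real implementation would copy them before writing them anywhere.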
- exception arc2control.h5utils.H5DimsError
Thrown when trying to save data to a dataset with incompatible size.
- exception arc2control.h5utils.H5FormatError
Thrown when the HDF5 file is not compatible with the current file format.
- class arc2control.h5utils.H5Mode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
HDF5 access mode when opening or creating files.
- class arc2control.h5utils.OpType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Operation type. This is essentially a 2-bit bitmask.
- PULSE = 2
Bit 1 raised means a pulse operation.
- PULSEREAD = 3
Both bits are raised
- READ = 1
Bit 0 raised means a read operation.
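The bitmask reading of these values can be tried out with a stand-in enumeration. Whether the real OpType is an IntFlag, an IntEnum or a plain Enum is not stated here, so `OpTypeSketch` and the two helper functions below are assumptions made purely for illustration; only the three values come from the reference above.

```python
import enum


class OpTypeSketch(enum.IntFlag):
    """Stand-in with the same values as the documented OpType."""
    READ = 1        # bit 0
    PULSE = 2       # bit 1
    PULSEREAD = 3   # both bits


def involves_read(optype):
    """True if bit 0 is raised."""
    return bool(optype & OpTypeSketch.READ)


def involves_pulse(optype):
    """True if bit 1 is raised."""
    return bool(optype & OpTypeSketch.PULSE)


combined = OpTypeSketch.READ | OpTypeSketch.PULSE
print(combined == OpTypeSketch.PULSEREAD)
print(involves_read(OpTypeSketch.PULSEREAD), involves_pulse(OpTypeSketch.READ))
```

Testing individual bits this way lets a timeseries be filtered for every entry that included a read, regardless of whether a pulse preceded it.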