py-h5py
Overview
HDF5 for Python (h5py) is a general-purpose Python interface to the Hierarchical Data Format library, version 5. HDF5 is a versatile, mature scientific software library designed for the fast, flexible storage of enormous amounts of data.
From a Python programmer's perspective, HDF5 provides a robust way to store data, organized by name in a tree-like fashion. You can create datasets (arrays on disk) hundreds of gigabytes in size, and perform random-access I/O on desired sections. Datasets are organized in a filesystem-like hierarchy using containers called "groups", and accessed using the traditional POSIX /path/to/resource syntax.
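The hierarchy and the random-access I/O described above can be illustrated with a minimal sketch. The file name, the group names ("experiment", "run1"), and the dataset name ("samples") are hypothetical and chosen only for this illustration; the file is written to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and object names, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "hierarchy_demo.hdf5")

with h5py.File(path, "w") as f:
    # Intermediate groups ("experiment", "run1") are created automatically.
    dset = f.create_dataset("experiment/run1/samples", shape=(1000,), dtype="i8")
    dset[...] = np.arange(1000)

with h5py.File(path, "r") as f:
    # Groups behave like dictionaries keyed by member name.
    group_names = list(f["experiment"].keys())
    # Datasets are addressed with a POSIX-style path; slicing reads
    # only the requested section from disk.
    window = f["/experiment/run1/samples"][10:15]

print(group_names)
print(window)
```

Only the five requested elements are read from the file in the last step, which is what makes partial access to very large datasets practical.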
Usage
The following modules must be loaded into the environment:

    $ module load python
    $ module load python_h5py

Using h5py with Python is straightforward. Import the h5py and numpy Python modules. Because h5py datasets behave much like NumPy arrays, the two modules work very well together.

    import h5py
    import numpy as np

    # Create (or overwrite) the file in write mode.
    f = h5py.File("testfile.hdf5", "w")
    # One-dimensional dataset of 100 32-bit integers.
    dset = f.create_dataset("dset1", (100,), dtype='i')
    dset[...] = np.arange(100)
    # Close the file so that all data is flushed to disk.
    f.close()

An h5py script is run the same way as any other Python script. To verify that it worked, load the HDF5 module and use h5dump to display the contents of the newly created HDF5 data file.

    $ module load cray-hdf5
    $ h5dump testfile.hdf5
    HDF5 "testfile.hdf5" {
    GROUP "/" {
       DATASET "dset1" {
          DATATYPE  H5T_STD_I32LE
          DATASPACE  SIMPLE { ( 100 ) / ( 100 ) }
          DATA {
          (0): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
          (19): 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
          (35): 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
          (51): 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
          (67): 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82,
          (83): 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,
          (99): 99
          }
       }
    }
    }

For a more in-depth tutorial, see http://docs.h5py.org/en/latest/quick.html
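The contents can also be checked from Python itself rather than with h5dump. The following is a minimal sketch, not part of the original instructions: it recreates the same dataset in a temporary directory (an assumption made so that the sketch is self-contained and does not overwrite an existing testfile.hdf5), then reopens the file read-only and inspects it.

```python
import os
import tempfile

import h5py
import numpy as np

# Assumption: write to a temporary directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "testfile.hdf5")

# Recreate the file from the usage example; the context manager
# closes the file when the block exits.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("dset1", (100,), dtype="i")
    dset[...] = np.arange(100)

# Reopen read-only and inspect the dataset.
with h5py.File(path, "r") as f:
    dset = f["dset1"]
    shape = dset.shape
    dtype = dset.dtype
    values = dset[...]  # read the whole dataset into a NumPy array

print(shape, dtype)
print(values[:5], values[-1])
```

Using the file object as a context manager is a design choice here: it guarantees the file is closed, and the data flushed, even if an exception occurs while writing.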
SUMMIT
- py-h5py@2.6.0%gcc@4.8.5+mpi
- py-h5py@2.6.0%gcc@4.8.5~mpi ^hdf5~mpi
- py-h5py@2.6.0%gcc@4.8.5~mpi ^python@3.5.2 ^hdf5~mpi
RHEA
- 2.5.0
- 2.2.0
TITAN
- 2.4.0
- 2.5.0
- 2.6.0
- 2.6.0_parallel