With your dtype I can create an array:
In [37]: np.array([_],dtype=SPECIAL_TYPE)
Out[37]:
array([ (array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]], dtype=uint8), 1, 'a', 1, 1, list([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]), 1)],
dtype=[('arr', 'O'), ('int1', 'u1'), ('str', 'O'), ('int2', 'u1'), ('int3', 'u1'), ('list', 'O'), ('int4', 'u1')])
But trying to create a dataset with it, even a 1d one, dumps me out of the interpreter:
In [38]: f=h5py.File('vlentest.h5','w')
In [39]: db = f.create_dataset('db',(10,), dtype=SPECIAL_TYPE)
In [40]: db[:]
Segmentation fault (core dumped)
There are two issues: does vlen work in a 2d array, and does it work in a compound dtype? You are pushing the bounds by using multiple vlen fields in a compound dtype in a 2d array.
Have you seen documentation or examples using vlen in a compound dtype?
Notice how h5py implements the vlen in numpy: it defines those fields as 'O' object dtype. That stores a pointer in the array, not the variable-length object itself. Normally object dtype arrays cannot be saved with h5py, so these fields must have some added annotation that h5py uses to translate the pointer into the kind of structure that HDF5 accepts.
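To see that annotation, here is a small sketch. It assumes a reasonably recent h5py, where (as far as I know) the base type is kept in the dtype's `metadata` and `h5py.check_dtype` reads it back; older releases may have stored it differently. The variable names are mine.

```python
import numpy as np
import h5py

# Variable-length string and integer dtypes, as h5py represents them in numpy.
vlen_str = h5py.special_dtype(vlen=str)
vlen_int = h5py.special_dtype(vlen=np.int32)

# To numpy, both are ordinary object dtypes.
print(vlen_str, vlen_str.kind)
print(vlen_int, vlen_int.kind)

# The extra annotation rides along with the dtype; check_dtype recovers it.
print(vlen_str.metadata)
print(h5py.check_dtype(vlen=vlen_str))
print(h5py.check_dtype(vlen=vlen_int))

# A plain object dtype carries no annotation, so h5py has nothing to map it to.
plain_obj = np.dtype('O')
print(h5py.check_dtype(vlen=plain_obj))
```

So the 'O' shown in the dtype repr hides the part that matters to h5py.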
"Storing string datasets in hdf5 with unicode" explores how a vlen str is stored.
See also "Storing multidimensional variable length array with h5py".
Experimenting, starting with something small:
In [14]: f = h5py.File('temp.h5')
In [15]: db1 = f.create_dataset('db1',(5,), dtype=dt1)
In [16]: db2 = f.create_dataset('db2',(5,), dtype=dt2)
In [17]: db1[:]
Out[17]:
array([('',), ('',), ('',), ('',), ('',)],
dtype=[('str', 'O')])
In [18]: db2[:]
Out[18]:
array([('', 0), ('', 0), ('', 0), ('', 0), ('', 0)],
dtype=[('str', 'O'), ('int4', '<i4')])
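The definitions of `dt1` and `dt2` aren't shown in this session. Judging from the dtypes printed above, they were presumably something like the following; treat these definitions as my reconstruction, not the original input. The sketch writes to a temporary file and opens it with an explicit `'w'` mode.

```python
import os
import tempfile

import numpy as np
import h5py

str_vlen = h5py.special_dtype(vlen=str)

# Assumed definitions, inferred from the dtypes printed in the session.
dt1 = np.dtype([('str', str_vlen)])
dt2 = np.dtype([('str', str_vlen), ('int4', 'i4')])

path = os.path.join(tempfile.mkdtemp(), 'temp.h5')
with h5py.File(path, 'w') as f:
    db1 = f.create_dataset('db1', (5,), dtype=dt1)
    db2 = f.create_dataset('db2', (5,), dtype=dt2)
    db2[0] = ('abc', 10)   # one record: vlen string plus a plain int
    empty = db1[:]         # read everything back while the file is open
    filled = db2[:]

print(empty.dtype)
print(filled['int4'])
```

Depending on the h5py version, the string fields may come back as `bytes` rather than `str`.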
Setting some db1 values:
In [24]: db1[0]=('a',)
In [25]: db1[1]=('ab',)
In [26]: db1[:]
Out[26]:
array([('a',), ('ab',), ('',), ('',), ('',)],
dtype=[('str', 'O')])
db2 works the same way:
In [30]: db2[0]=('abc',10)
In [31]: db2[1]=('abcde',6)
In [32]: db2[:]
Out[32]:
array([('abc', 10), ('abcde', 6), ('', 0), ('', 0), ('', 0)],
dtype=[('str', 'O'), ('int4', '<i4')])
2 vlen strings also work:
In [34]: dt3 = np.dtype([("str1", h5py.special_dtype(vlen=str)),("str2", h5py.special_dtype(vlen=str))])
In [35]: db3 = f.create_dataset('db3',(3,), dtype=dt3)
In [36]: db3[:]
Out[36]:
array([('', ''), ('', ''), ('', '')],
dtype=[('str1', 'O'), ('str2', 'O')])
In [37]: db3[0] = ('abc','defg')
In [38]: db3[1] = ('abcd','de')
In [39]: db3[:]
Out[39]:
array([('abc', 'defg'), ('abcd', 'de'), ('', '')],
dtype=[('str1', 'O'), ('str2', 'O')])
and with an array vlen
In [41]: dt4 = np.dtype([("str1", h5py.special_dtype(vlen=str)),("list", h5py.special_dtype(vlen=np.int))])
In [42]: dt4
Out[42]: dtype([('str1', 'O'), ('list', 'O')])
In [43]: db4 = f.create_dataset('db4',(3,), dtype=dt4)
In [47]: db4[0]=('abcdef',np.arange(5))
In [48]: db4[1]=('abc',np.arange(3))
In [49]: db4[:]
Out[49]:
array([('abcdef', array([0, 1, 2, 3, 4])), ('abc', array([0, 1, 2])),
('', array([], dtype=int32))],
dtype=[('str1', 'O'), ('list', 'O')])
but I can't use a list
In [50]: db4[2]=('abc',[1,2,3,4])
--------------------------------------------------------------------------
AttributeError: 'list' object has no attribute 'dtype'
h5py saves arrays, not lists, and apparently that applies to these nested values as well. http://docs.h5py.org/en/latest/special.html has examples of setting a vlen from a list, but there the list is first converted to an array.
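A simple way around the list error is to convert the list yourself before assigning. This is only a sketch: `as_record` is a hypothetical helper of mine, and I use `np.int32` for the vlen base type because `np.int` has been removed from recent numpy releases.

```python
import os
import tempfile

import numpy as np
import h5py

dt4 = np.dtype([('str1', h5py.special_dtype(vlen=str)),
                ('list', h5py.special_dtype(vlen=np.int32))])

def as_record(text, values):
    """Hypothetical helper: turn the list into a 1d int32 array before writing."""
    return (text, np.asarray(values, dtype=np.int32))

path = os.path.join(tempfile.mkdtemp(), 'vlen_list.h5')
with h5py.File(path, 'w') as f:
    db4 = f.create_dataset('db4', (3,), dtype=dt4)
    db4[2] = as_record('abc', [1, 2, 3, 4])
    stored = db4[2]['list']   # read the vlen field of that record back

print(stored)
```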
If I try to save a 2d array, it only writes a 1d array:
In [59]: db4[2]=('abc',np.ones((2,2),int))
In [60]: db4[:]
Out[60]:
array([('abcdef', array([0, 1, 2, 3, 4])), ('abc', array([0, 1, 2])),
('abc', array([1, 1]))],
dtype=[('str1', 'O'), ('list', 'O')])
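Since only 1d data survives in a vlen element, one workaround is to store the flattened array and keep its shape separately, then reshape on reading. The sketch below uses two plain datasets rather than a compound dtype. The dataset names `flat` and `shapes` are my own choice, and this is just one possible layout, not something h5py prescribes.

```python
import os
import tempfile

import numpy as np
import h5py

arr2d = np.ones((2, 2), dtype=np.int32)

path = os.path.join(tempfile.mkdtemp(), 'vlen_2d.h5')
with h5py.File(path, 'w') as f:
    # One vlen element per record for the flattened values ...
    flat = f.create_dataset('flat', (3,),
                            dtype=h5py.special_dtype(vlen=np.int32))
    # ... and a fixed-size row per record for the original 2d shape.
    shapes = f.create_dataset('shapes', (3, 2), dtype='i8')

    flat[0] = arr2d.ravel()
    shapes[0] = arr2d.shape

    # Rebuild the 2d array from the stored pieces.
    restored = flat[0].reshape(tuple(shapes[0]))

print(restored)
```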
This dtype works:
In [21]: dt1 = np.dtype([("str1", h5py.special_dtype(vlen=str)),('f1',int),("list", h5py.special_dtype(vlen=np.int))])
This one causes the core dump:
In [30]: dt1 = np.dtype([("f0", h5py.special_dtype(vlen=np.uint8)),('f1',int),("f2", h5py.special_dtype(vlen=np.int))])
Is this a vlen uint8 problem, or a problem with a vlen field being first?