Discussion:
problem loading hdf5 file
E. Joshua Rigler
2004-04-30 17:35:11 UTC
I am having trouble loading a supposedly valid hdf5 data file (it loads
OK with the java-based hdfview program). I get the message "error:
load: error while reading hdf5 item ...", and nothing is loaded. Is
there a good way to get a little more info for debugging this problem?
Is there a lower-level interface in Octave for manipulating
sophisticated (or poorly designed) hdf files?

It seems to me the file shouldn't be SO non-standard that Octave can't
load a single variable or structure, but maybe so. If someone wants to
check, and if the server is even up, a test file can be downloaded from:

ftp://g0dps01u.ecs.nasa.gov/SORCE/SOR3SSID.001/2003.11.20/

(there's only two files in the directory, and only one is *.h5)

-EJR



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
David Bateman
2004-05-03 13:19:38 UTC
Ok, the problem is simple to state, but a bit harder to fix... Octave
only knows two compound datatypes, complex and range, with the forms

Type: struct {
  "real" +0 native double
  "imag" +8 native double
} 16 bytes

Type: struct {
  "base" +0 native double
  "limit" +8 native double
  "increment" +16 native double
} 24 bytes

Your files have compound datatypes, containing arbitrary values
representing a structure... Octave assumes that such data will be
passed to it as an HDF5 group, rather than a compound type of a
dataset. "h5ls -v <file>" identifies the following compound types
for your file

Type: struct {
  "instrumentModeId" +16 32-bit big-endian integer
  "julianTimetag" +0 IEEE 64-bit big-endian float
  "version" +12 32-bit big-endian integer
  "minWavelength" +24 IEEE 64-bit big-endian float
  "maxWavelength" +32 IEEE 64-bit big-endian float
  "irradiance" +40 IEEE 64-bit big-endian float
  "irradianceUncertainty" +48 IEEE 64-bit big-endian float
  "quality" +56 IEEE 64-bit big-endian float
} 64 bytes

Type: struct {
  "julianTimetag" +0 IEEE 64-bit big-endian float
  "parameterName" +32 39-byte null-terminated ASCII string
  "value" +16 IEEE 64-bit big-endian float
  "uncertainty" +24 IEEE 64-bit big-endian float
} 72 bytes

Type: struct {
  "julianTimetag" +0 IEEE 64-bit big-endian float
  "averageJulianTime" +16 IEEE 64-bit big-endian float
  "averageJulianTimeStdev" +24 IEEE 64-bit big-endian float
  "correctedIrradiance" +32 IEEE 64-bit big-endian float
  "correctedIrradianceUncertainty" +40 IEEE 64-bit big-endian float
  "correctedIrradianceStdev" +48 IEEE 64-bit big-endian float
  "trueEarthIrradiance" +56 IEEE 64-bit big-endian float
  "trueEarthIrradianceUncertainty" +64 IEEE 64-bit big-endian float
  "trueEarthIrradianceStdev" +72 IEEE 64-bit big-endian float
} 80 bytes

Type: struct {
  "julianTimetag" +0 IEEE 64-bit big-endian float
  "averageJulianTimetag" +8 IEEE 64-bit big-endian float
  "averageJulianTimetagStDev" +16 IEEE 64-bit big-endian float
  "timeSpanInHours" +24 8-bit integer
  "diodeNumber" +28 32-bit big-endian integer
  "version" +32 16-bit big-endian integer
  "minWavelengthInBandpass" +36 IEEE 32-bit big-endian float
  "maxWavelengthInBandpass" +40 IEEE 32-bit big-endian float
  "medianIrradiance" +48 IEEE 64-bit big-endian float
  "averageIrradiance" +56 IEEE 64-bit big-endian float
  "absoluteUncertainty" +64 IEEE 64-bit big-endian float
  "measurementPrecision" +72 IEEE 64-bit big-endian float
  "calculationPrecision" +80 IEEE 64-bit big-endian float
  "degradationModel" +88 8-bit integer
  "numberOfPoints" +92 32-bit big-endian integer
} 96 bytes
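For anyone wanting a stopgap before Octave understands these files, a compound record with a known layout can be decoded by hand once the raw bytes are obtained (e.g. via "h5dump -b" or the HDF5 low-level API). A minimal Python sketch of decoding the first 64-byte layout above, using only the standard struct module; the sample record here is fabricated for illustration:

```python
import struct

# Field offsets and formats copied from the h5ls output above
# ('>d' = IEEE 64-bit big-endian float, '>i' = 32-bit big-endian integer).
LAYOUT = [
    ("julianTimetag",          0, ">d"),
    ("version",               12, ">i"),
    ("instrumentModeId",      16, ">i"),
    ("minWavelength",         24, ">d"),
    ("maxWavelength",         32, ">d"),
    ("irradiance",            40, ">d"),
    ("irradianceUncertainty", 48, ">d"),
    ("quality",               56, ">d"),
]
RECORD_SIZE = 64  # total compound size reported by h5ls

def decode_record(buf, index=0):
    """Decode one 64-byte compound record from a raw byte buffer."""
    base = index * RECORD_SIZE
    return {name: struct.unpack_from(fmt, buf, base + off)[0]
            for name, off, fmt in LAYOUT}

# Fabricated example record, packed with the same layout:
raw = bytearray(RECORD_SIZE)
struct.pack_into(">d", raw, 0, 2452963.5)   # julianTimetag
struct.pack_into(">i", raw, 12, 1)          # version
struct.pack_into(">i", raw, 16, 41)         # instrumentModeId
struct.pack_into(">d", raw, 40, 1361.0)     # irradiance

rec = decode_record(bytes(raw))
print(rec["julianTimetag"], rec["instrumentModeId"], rec["irradiance"])
```

The key point is that the layout must be known a priori; nothing in the raw bytes names the fields, which is exactly why Octave's generic loader balks.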
E. Joshua Rigler
2004-05-03 15:36:20 UTC
The following quote is from the website used to download these hdf data
files:

All SORCE standard data product files are written using the NCSA
HDF5 file format. HDF5 is a hierarchical data format which
consists of two object types: groups and datasets. Groups are
analogous to directories or folders (all files have at least a
root or "/" group), whereas datasets may contain arrays, images
or tables.

I suspect they just lifted the description from the NCSA folks, because
as you (David) pointed out, these data are not actually being saved as
ordinary hdf5 "groups". I work fairly closely with at least one of the
PIs on this project, so I'll see how likely it is that they'd clean up
their data files, but somehow I doubt they'll be very interested.

I don't really know how important it is to "fix" Octave to handle poorly
designed data files. It's probably just easier for me to write some
h5dump scripts for now. Thanks again for your feedback.

-EJR
Post by David Bateman
Ok, the problem is simple to state, but a bit harder to fix... Octave
only knows two compound datatypes, complex and range, with the forms
[complex and range type definitions snipped]
Your files have compound datatypes, containing arbitrary values
representing a structure... Octave assumes that such data will be
passed to it as an HDF5 group, rather than a compound type of a
dataset. "h5ls -v <file>" identifies the following compound types
for your file
[compound type listings snipped]
John W. Eaton
2004-05-04 04:48:11 UTC
On 3-May-2004, E. Joshua Rigler <***@colorado.edu> wrote:

| I don't really know how important it is to "fix" Octave to handle poorly
| designed data files. It's probably just easier for me to write some
| h5dump scripts for now. Thanks again for your feedback.

I don't think Octave's load command should be fixed to handle
arbitrary HDF5 files. Instead, it would probably be better to have a
thin wrapper around the HDF5 library that would allow you to easily
write an Octave script to read whatever HDF5 files you have (still
assuming that you can represent the data in some way inside Octave).
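To make the idea concrete, here is one possible shape for such a thin wrapper's user-facing API: open, list by path, read by path. This is a hypothetical sketch in Python with a nested dict standing in for the actual HDF5 library; none of these names come from an existing interface:

```python
# Hypothetical thin-wrapper API: open a file, list a group, read a dataset.
# A nested dict stands in for the HDF5 library in this sketch.

class H5Wrapper:
    """Toy stand-in: paths like '/group/dataset' resolve into a tree."""
    def __init__(self, tree):
        self.tree = tree

    def ls(self, path="/"):
        """List the names inside a group (empty list for a dataset)."""
        node = self._resolve(path)
        return sorted(node) if isinstance(node, dict) else []

    def read(self, path):
        """Read one dataset; groups must be traversed, not read."""
        node = self._resolve(path)
        if isinstance(node, dict):
            raise TypeError(path + " is a group, not a dataset")
        return node

    def _resolve(self, path):
        node = self.tree
        for part in path.strip("/").split("/"):
            if part:
                node = node[part]
        return node

# Fabricated contents loosely modeled on the SORCE file:
f = H5Wrapper({"SolarSpectrum": {"irradiance": [1360.8, 1361.1]}})
print(f.ls("/"))
print(f.read("/SolarSpectrum/irradiance"))
```

An Octave script over primitives like these could walk any file's hierarchy without the load command needing to understand every vendor's layout.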

jwe



David Bateman
2004-05-04 09:10:10 UTC
Post by John W. Eaton
| I don't really know how important it is to "fix" Octave to handle poorly
| designed data files. It's probably just easier for me to write some
| h5dump scripts for now. Thanks again for your feedback.
I don't think Octave's load command should be fixed to handle
arbitrary HDF5 files. Instead, it would probably be better to have a
thin wrapper around the HDF5 library that would allow you to easily
write an Octave script to read whatever HDF5 files you have (still
assuming that you can represent the data in some way inside Octave).
I cringe... Not saying this is impossible, but you'd basically have to
write a version of ls-hdf5.cc for this wrapper rather than reusing what
is already there.

Looking at the suggestion to use "h5dump", I think a combination of this,
a good XML DTD, and xmlread from octave-forge might do the trick. Though
it seems the DTD delivered with h5dump can't handle arbitrary compound
datatypes either, so something new would have to be written...

Cheers
David
--
David Bateman ***@motorola.com
Motorola CRM +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary



W.J. Atsma
2004-05-04 17:40:49 UTC
Why not export all the data in a single struct when the file is not an
octave-created one, replacing the file-like hierarchy "/" with "."? Then
you can look at the complete structure in octave and reinterpret it as
you like.
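The path-to-struct mapping described here is mechanical for the group/dataset part of a file. A small Python sketch of the idea, with a nested dict standing in for an Octave struct and fabricated paths:

```python
def paths_to_struct(flat):
    """Turn {'/a/b/c': value} into nested dicts, i.e. '/' becomes '.'."""
    root = {}
    for path, value in flat.items():
        parts = [p for p in path.split("/") if p]
        node = root
        for part in parts[:-1]:            # descend, creating groups
            node = node.setdefault(part, {})
        node[parts[-1]] = value            # leaf dataset
    return root

# Fabricated paths loosely modeled on the SORCE file:
s = paths_to_struct({
    "/SolarSpectrum/irradiance": [1361.0],
    "/SolarSpectrum/version": 1,
    "/Telemetry/quality": 0.0,
})
print(s["SolarSpectrum"]["irradiance"])
```

In Octave the result would be reached as s.SolarSpectrum.irradiance; the sticking point, as discussed below, is producing leaf values from compound datatypes in the first place.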

Willem
Post by David Bateman
Post by John W. Eaton
| I don't really know how important it is to "fix" Octave to handle
| poorly designed data files. It's probably just easier for me to write
| some h5dump scripts for now. Thanks again for your feedback.
I don't think Octave's load command should be fixed to handle
arbitrary HDF5 files. Instead, it would probably be better to have a
thin wrapper around the HDF5 library that would allow you to easily
write an Octave script to read whatever HDF5 files you have (still
assuming that you can represent the data in some way inside Octave).
I cringe..... Not saying this is impossible, but you'd basically have to
write a version of ls-hdf5.cc for this wrapper rather than reusing what
is already there.
Looking at the suggestion to use "h5dump", I think a combination of this,
a good xml DTD and xmlread from octave-forge might do the trick. Though
it seems the DTD delivered with h5dump can't handle arbitrary compound
datatypes either, so something new would have to be written....
Cheers
David
David Bateman
2004-05-05 08:30:33 UTC
Post by W.J. Atsma
Why not export all the data in a single struct if the file is not an
octave-created one, replacing the file-like structure "/" with ".". Then you
can look at the complete structure in octave and reinterpret them as you
like.
HDF5 doesn't have the concept of a structure. Octave expects a
structure in an imported file to be represented as an HDF5 GROUP. The
problem is that the example file instead uses an HDF5 DATASET with a
compound datatype to hold the structure. That is much harder to
interpret if you don't know the form of the structure a priori, since
you have to construct a matching HDF5 compound type internally before
you can import the values. It is also more limiting, as every element
of the dataset must share the same fixed-size layout.

Regards
David
--
David Bateman ***@motorola.com
Motorola CRM +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE


Paul Kienzle
2004-05-06 04:18:50 UTC
Post by David Bateman
Post by W.J. Atsma
Why not export all the data in a single struct if the file is not an
octave-created one, replacing the file-like structure "/" with ".". Then you
can look at the complete structure in octave and reinterpret them as you
like.
HDF5 doesn't have the concept of a structure. Octave expects a
structure in an imported file to be represented as an HDF5 GROUP. The
problem is that the example file instead uses an HDF5 DATASET with a
compound datatype to hold the structure. That is much harder to
interpret if you don't know the form of the structure a priori, since
you have to construct a matching HDF5 compound type internally before
you can import the values. It is also more limiting, as every element
of the dataset must share the same fixed-size layout.
The matlab function hdfinfo seems to load in a complete
description of the dataset using a tree of structures, with
structure arrays for the repeating parts. It doesn't read
in the actual data which is good because the datasets can
be very large and the user may only want to read in a part
of one. The function hdfread takes one of the nodes of
the structure tree and reads it in. With the appropriate
options, it can read only part of the dataset. A similar
interface could be set up for writing to an HDF file.

Matlab also provides a direct mapping of the hdf C APIs.
It's my belief that higher level languages should hide
the details whenever they can, so I wouldn't bother with
this except if I needed it for compatibility.

Paul Kienzle
***@users.sf.net

PS, I didn't see a simple way to examine the tree
structure returned by hdfinfo from the matlab
command line, so I wrote my own. Something similar
should work for octave, but with deblank(argn(1,:))
instead of inputname(1).

%SHOWSTRUCT display structure tree
% showstruct(S[,n]) displays the contents of structure S as
% well as any substructures. The first bit of the data is displayed
% for each field. If n is supplied, only print the first n levels.

function showstruct(s,maxlevel,indent,name)
  if nargin<2, maxlevel=inf; end
  if nargin<3, indent=0; end
  if nargin<4, name=inputname(1); end
  fields = fieldnames(s);
  n = prod(size(s));
  if n==1
    fprintf(1,'%s--- %s ---\n',blanks(indent),name);
    for i=1:length(fields)
      fprintf(1,'%s%s: \t', blanks(indent), fields{i});
      showfield(s.(fields{i}),maxlevel-1,indent+2,[name,'.',fields{i}]);
    end
  else
    for j=1:n
      fprintf(1,'%s--- %s(%d) ---\n',blanks(indent),name,j);
      for i=1:length(fields)
        fprintf(1,'%s%s: \t', blanks(indent), fields{i});
        showfield(s(j).(fields{i}),maxlevel-1,indent+2,[name,'.',fields{i}]);
      end
    end
  end

function showfield(f,maxlevel,indent,name)
  % Print the class and dimensions of a field, then a short preview.
  fprintf(1,'%s ',class(f));
  showdims(size(f));
  if isstruct(f)
    fprintf(1,'\n');
    if maxlevel>0, showstruct(f,maxlevel,indent,name); end
  elseif isstr(f)
    fprintf(' ''%.30s''\n',f(1,:));
  elseif isreal(f)
    if isempty(f)
      t='';
    elseif prod(size(f))<8
      t=sprintf('%g, ',double(f(1:end-1)));
      t=[t,sprintf('%g',double(f(end)))];
    else
      t=sprintf('%g, ',double(f(1:4)));
      t=[t,'...'];
    end
    fprintf(1,' [%s]\n',t);
  else
    fprintf(1,'\n');
  end

function showdims(d)
  fprintf(1,'%d',d(1));
  fprintf(1,'x%d',d(2:end));



