HDF maximum chunk size

We are using fesapi v1.2.1. In an earlier version we used to be able to set the max chunk size in HdfProxy. Sometime prior to fesapi v0.16 this call disappeared, and it seems we are now reliant on fesapi to set the max chunk size, which appears to be a constant 4 GB.

We are writing very large HDF files, with an example property being 1.2 GB, all in one chunk. Our Resqml/HDF consuming code cannot handle chunks this large: it reads properties in parts, so every partial read forces the HDF5 library to decompress the entire chunk, discard most of it, and repeat (the chunk is also far too large for the chunk cache). This is infeasible.
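To give a rough sense of scale for that amplification (plain Python arithmetic; the 1 MiB request size is a hypothetical figure, the 1.2 GB single-chunk property is the example above):

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

chunk_bytes = 1.2 * GiB    # one property stored as a single chunk (example above)
request_bytes = 1 * MiB    # hypothetical size of one partial read

# Each partial read forces the HDF5 filter pipeline to decompress the whole
# chunk (it does not fit in the chunk cache), so the decompression work
# multiplies by the number of requests:
requests = chunk_bytes / request_bytes
decompressed_total = requests * chunk_bytes

print(f"{requests:.0f} partial reads, "
      f"~{decompressed_total / 1024**4:.1f} TiB decompressed in total")
```

Over a terabyte of decompression work just to stream a single 1.2 GB property once.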

I am a dev on the writing side. Our consuming developers recommend that we write with a max chunk size of 1MB. Is there any way that we can control the max HDF5 chunk size when writing? Does fesapi v2.0 support this?

Hi Keith,

Short answer: FESAPI does not let you control chunking so far, not even in v2.0.0.0.

The initial wish was to keep FESAPI simple, and chunking was seen as too complex without clear added value. In hindsight that was a misjudgement, and chunking should now be included in FESAPI, especially given how simple it looks to integrate. Sorry about that.

This development seems possible and does not look really hard to implement. I created a GitHub feature request to track this demand, which will have to be prioritized (either when I have time or when the FESAPI Initiative votes for it): https://github.com/F2I-Consulting/fesapi/issues/279

I confirm that the chunk size is, for now, arbitrarily set to 4 GB by FESAPI, and only for compressed datasets. Uncompressed datasets are not chunked at all.

A lot of adopters do not compress data when they use FESAPI. Compression can even be a problem for some readers: I have had comments to that effect from readers using HDF5 parallel reading (I have not verified this myself, I just believed them).
Those readers then read the dataset contiguously by parts, with no performance penalty but a storage penalty. Disabling compression (and reading in contiguous order) might be a workaround for your use case while waiting for a chunk setter in a future FESAPI version.

Thanks for the prompt response. I have added a comment to the Github feature request.

Are you considering this feature for v1.2 as well as v2.0? We have just upgraded to v1.2.1 and could not realistically consider another breaking upgrade (which I suspect v2.0.0.0 will be) any time soon.

I am considering it first for v2.0, which is indeed breaking.
Depending on how easy the development turns out to be (and/or on the FESAPI Initiative vote), it could be backported to v1.2.

As a quick view of how to enable chunk size in FESAPI (essentially wrapping the HDF5 chunk setter in writeArrayNd, createArrayNd and writeItemizedListOfList), I guess it could be easy and consequently backported.
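For a sense of what that wrapping would involve: the writer derives a chunk shape from a byte cap and hands it to HDF5's chunk setter (H5Pset_chunk in the C API, the `chunks=` argument in h5py) on the dataset creation property list. The shape derivation might look like this (a plain Python sketch with no HDF5 dependency; the function name, the 1 MiB cap, and the example shapes are illustrative, not fesapi API):

```python
def choose_chunk_shape(shape, itemsize, max_bytes=1 << 20):
    """Shrink the slowest-varying dimensions of a C-order dataset shape
    until one chunk fits within max_bytes (default 1 MiB).

    The returned tuple is what would then be passed to the HDF5 chunk
    setter (H5Pset_chunk in the C API, `chunks=` in h5py).
    """
    chunk = list(shape)
    for i in range(len(chunk)):
        # bytes taken by one "row" of the faster-varying dimensions
        row_bytes = itemsize
        for extent in chunk[i + 1:]:
            row_bytes *= extent
        # largest extent of dimension i that keeps the chunk under the cap
        chunk[i] = max(1, min(chunk[i], max_bytes // row_bytes))
    return tuple(chunk)

# A 1000 x 1000 x 300 array of 8-byte doubles (~2.4 GB) capped at 1 MiB:
print(choose_chunk_shape((1000, 1000, 300), 8))  # -> (1, 436, 300)
```

Shrinking the slowest-varying dimension first keeps each chunk contiguous on disk for C-order data, so partial reads along the fast axes stay cheap.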

However, I also expect it would be backported to a v1.3 (not a v1.2), which would also be breaking, rigorously speaking, but only very slightly.
By breaking I just mean that a few method signatures would change (in order to optionally pass the chunk size as a new parameter of existing methods).

Reminder of versioning system

  • first digit: change in architecture
  • second digit: breaking change (method signature change)
  • third digit: non-breaking change (change only at the cpp/implementation level)
  • fourth digit: tiny fix such as a typo, documentation, etc.