Python Swig wrapper

Hi, I would like to know whether the Python SWIG wrapper for fesapi offers a way to vectorize array operations (for example methods like “setitems” and “getitems”, besides “setitem” and “getitem”) so that array conversion stays efficient for very large models. I couldn’t find this in example.py and I am not sure where exactly to look for it. I would appreciate any ideas on this topic.

I think there are actually two questions!
A) If you have very large models, I think you don’t really want “setitems” and “getitems” methods on your SWIG arrays. I think you would rather write to and read from HDF5 by chunk, so that you never have to hold a giant array in memory.
In FESAPI, some methods allow chunking and some do not. One that does is setValuesOfInt64Hdf5Array3dOfValues(): Fesapi: resqml2::AbstractValuesProperty Class Reference
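To make the idea of writing by chunk concrete, here is a minimal sketch in plain Python that only computes which slabs of a 3D property to write, one after the other, along the slowest dimension. The helper name `slabs_along_slowest_dim` is hypothetical, and the actual FESAPI write call is deliberately left out because its exact arguments should be taken from the class reference above.

```python
from typing import Iterator, Tuple


def slabs_along_slowest_dim(n_slow: int, slab_thickness: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, count) pairs that cover range(n_slow) in order."""
    if slab_thickness <= 0:
        raise ValueError("slab_thickness must be positive")
    for offset in range(0, n_slow, slab_thickness):
        # The last slab may be thinner than the others.
        yield offset, min(slab_thickness, n_slow - offset)


# Example: a property with 10 layers, written 4 layers at a time.
for offset, count in slabs_along_slowest_dim(10, 4):
    # A real program would fill a buffer holding `count` layers here and hand
    # it, together with `offset`, to a chunk-capable write method.
    print(offset, count)
```

Only one slab has to be held in memory at any time, which is the whole point of option A.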

B) You can hold your very large model in memory, and you just want to be more efficient by setting the in-memory array by chunk instead of one item at a time.
This is a fairly pure SWIG question. Looking at the SWIG documentation, there are several ways to achieve that. The main one looks to be letting the FESAPI C (SWIG) arrays accept a NumPy array instead. This looks possible using numpy.i, but it will take me some time to work on it…


Thank you. I am not sure I understood the context of part A of your response. Could you elaborate on that? Did you mean a use case where the large model already exists in memory, and you are pointing out ways to optimize the HDF5 writing of the bulk data? Does it also apply to reading bulk data? In that case, though, the model would already exist on disk.

I think my question falls under part B of your response. I tried a multi-threaded approach with a chunk size, using setitem and getitem, but it didn’t make any difference, likely because of the Python GIL. I heard that Python 3.13 and later offer an optional free-threaded build in which the GIL can be disabled. Would you recommend switching to such a Python version and rebuilding the FESAPI Python bindings against it to enable multi-threading?
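For reference, whether a given interpreter actually runs without the GIL can be checked before rebuilding anything. Below is a minimal sketch using only the standard library; the function name `gil_is_active` is hypothetical. Note that on a free-threaded build, importing an extension module that does not declare free-threading support makes the interpreter turn the GIL back on, so the check is most informative after the bindings have been imported. Whether the SWIG-generated FESAPI module declares that support is not something this thread establishes.

```python
import sys
import sysconfig


def gil_is_active() -> bool:
    """Return True if this interpreter is currently running with the GIL."""
    # Py_GIL_DISABLED is set only for free-threaded builds (3.13 and later).
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if not free_threaded_build:
        return True
    # Even on a free-threaded build the GIL can be re-enabled at runtime,
    # so ask the interpreter when the helper is available.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


print("GIL active:", gil_is_active())
```

If this prints `GIL active: True`, threads that spend their time in per-item setitem and getitem calls are not expected to run in parallel.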

Yes. Or you receive a property live, but you cannot wait until you have received all of it before writing it to disk, because it does not fit in your available memory. You are then forced to write it to disk by chunk.
In this scenario you still use “setitem” and “getitem” with FESAPI (in memory), but you write the property to disk by chunk instead of all at once.
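The live scenario can be sketched as follows, in plain Python and under stated assumptions: values arrive one at a time from some iterable, are collected in a bounded buffer, and the buffer is flushed whenever it is full. `write_stream_by_chunk` and the `write_chunk` callback are hypothetical names; in a real program the callback would wrap a chunk-capable write to HDF5.

```python
from array import array
from typing import Callable, Iterable


def write_stream_by_chunk(
    values: Iterable[int],
    chunk_len: int,
    write_chunk: Callable[[int, array], None],
) -> int:
    """Flush incoming 64-bit integers in chunks; return the total count written."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    buffer = array("q")  # signed 64-bit integers
    offset = 0
    for value in values:
        buffer.append(value)
        if len(buffer) == chunk_len:
            write_chunk(offset, buffer)
            offset += len(buffer)
            buffer = array("q")
    if buffer:
        # Flush the last, possibly shorter, chunk.
        write_chunk(offset, buffer)
        offset += len(buffer)
    return offset


# Example with a stand-in callback that only records what it was given.
written = []
total = write_stream_by_chunk(range(7), 3, lambda off, buf: written.append((off, list(buf))))
print(total, written)
```

Memory use is bounded by `chunk_len` values regardless of how long the stream is.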

Yes

Ok I think so. Note that answers A and B are not mutually exclusive. You could write to FESAPI (in memory) with “setitems” and “getitems” and still decide to write to disk by chunk.
Indeed, “setitem” and “getitem” only write to memory, not to disk.

I have seen a lot of discussions about the Python GIL and SWIG on some forums, but I really have not had enough time to look into it deeply. I am sorry, but I cannot give you advice about that for now. Maybe someone else in this forum could…
The Python wrapper of FESAPI is quite new compared to the Java and C# ones. Maybe as a consequence, and maybe because some people use resqpy instead, I have not received much feedback about it, and I don’t personally use it a lot. So, I am sorry, but my expertise on this topic is quite low.
I am really interested in improving it but, for now, I cannot find the time to work on it in depth.


FYI, for the Python SWIG wrapper I tested v2.12.2.1 and also installed and reviewed the latest package, 2.14.1.0. In all of them, the C++ classes SubRepresentation and GridConnectionSetRepresentation have the topological assignment functions pushBackRefToExistingDataset and setCellIndexPairsUsingExistingDataset, but the Python module is missing them, even in the latest version. Is there any workaround for these? Any chance they can be included in the upcoming version?

Thanks.

Edit - I thought about it some more after carefully reviewing the implementation of these functions (which use H5S_ALL, so we can’t declare the actual size and write only tiny arrays). There seems to be a way in Python to work around both the missing functions that reference an existing dataset and the bottleneck of calling setitem on FESAPI arrays such as Int64Array: initialize the arrays with the right size and write them as they are (all zeroes) by calling setCellIndexPairs and pushBackSubrepresentationPatch. That only calls new or malloc with the size, which shouldn’t impact performance. The actual values can then be written to the HDF5 file separately with h5py after the EPC is closed. Let me know if that sounds reasonable.
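The last step of that workaround could look like the minimal sketch below, assuming h5py and NumPy are installed. The function name `overwrite_dataset_in_place` is hypothetical, and so are the file name and dataset path in the commented usage: the real path of the zero-filled dataset has to be looked up in the HDF5 file that FESAPI produced.

```python
import h5py
import numpy as np


def overwrite_dataset_in_place(h5_path, dataset_path, values, chunk_len=1_000_000):
    """Overwrite an existing HDF5 dataset with `values`, slab by slab."""
    values = np.asarray(values)
    # "r+" opens an existing file for reading and writing without truncating it.
    with h5py.File(h5_path, "r+") as f:
        dset = f[dataset_path]
        if dset.shape != values.shape:
            raise ValueError(
                f"shape mismatch: dataset {dset.shape} vs values {values.shape}"
            )
        # Slice along the first axis so only one slab is transferred at a time.
        for start in range(0, values.shape[0], chunk_len):
            stop = min(start + chunk_len, values.shape[0])
            dset[start:stop] = values[start:stop]


# Hypothetical usage, once the EPC document has been closed:
# pairs = np.load("cell_index_pairs.npy")  # shape must match the dataset
# overwrite_dataset_in_place("model.h5", "/RESQML/<uuid>/<dataset>", pairs)
```

The shape check is there because the dataset was sized when it was first written with zeroes; a mismatch would mean the wrong dataset or the wrong array.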

I can certainly wrap SubRepresentation::pushBackRefToExistingDataset and GridConnectionSetRepresentation::setCellIndexPairsUsingExistingDataset for languages other than C++ (including Python) in the next release.

Then you would be free to use h5py to write your HDF5 file directly. It would certainly work around your issue quite nicely. It is not ideal, since this issue should really be solved in FESAPI, but it is a nice workaround imo.


Thanks, that would be greatly appreciated. I also separately tested the workaround I described in the edit above, and I can confirm that it works correctly and gave me an overall performance improvement by a factor of 4.25 for a model of ~5M cells.

I think I have already ported the missing methods in the dev branch: [SWIG] Port all possible methods from Subrepresentation and GridConne… · F2I-Consulting/fesapi@ab7af2b · GitHub
I think it should be testable from the build artifacts: [SWIG] Port all possible methods from Subrepresentation and GridConne… · F2I-Consulting/fesapi@ab7af2b · GitHub (bottom of the page)
However, this is a dev version. Use with caution!
