Q: In general, will SmartSim ever become available on Summit?
A: We're exploring that, but Summit is also going away "soon", so it would only be available for a limited time. We're looking to offer an official OLCF SmartSim module after the workshop; at the very least, we hope to have an official one on Frontier relatively soon. It's been known to work on Summit in the past, so definitely: https://github.com/CrayLabs/SmartSim-Zoo/tree/master/summit
A: Yes, we plan to support Summit and the LSF launcher as long as the machine is still operational.

Q: Any DFT packages? I am particularly interested in VASP, but it is not open-source :(.
A: I'm not aware of any DFT packages as open collaborations. Is there a particular package you are thinking of?
A: We are happy to consult on SmartSim integrations, and the open-source license permits embedding into any application, but we have not worked directly with VASP. As long as VASP is coded in Fortran/C/C++/Python, the integration should be fairly easy.

Q: Is it one co-located DB instance per MPI task or per node?
A: It is per node. All of the MPI ranks on a compute node communicate with the on-node in-memory database. It is worth noting that the co-located DB can leverage all of the GPUs on the compute node, and there are parameters in the API functions to efficiently distribute inferences across those GPUs. Also, the Client API functions are largely the same between co-located and clustered deployments, so the integration is portable between deployments and machines.

Q: I may have missed your comment: on the second point, would low-res climate models benefit from turbulence or more high-res?
A: The goal of the NCAR collaboration was to embed high-fidelity (small grid size) turbulence behavior into a low-fidelity (large grid size) climate model that would traditionally have incorrect turbulence parameters.
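The per-node fan-in described above (many MPI ranks, one on-node database, several GPUs) is easy to picture as a round-robin mapping. The sketch below is a hypothetical pure-Python helper that illustrates the idea of spreading per-rank inference requests across a node's GPUs; `assign_gpu` is not a SmartSim API call.

```python
def assign_gpu(rank_offset: int, first_gpu: int, num_gpus: int) -> int:
    """Round-robin a per-rank offset onto a contiguous block of GPUs,
    illustrating how closely-timed inference requests from many MPI
    ranks can be spread across the GPUs visible to the on-node DB."""
    return first_gpu + (rank_offset % num_gpus)

# 8 MPI ranks on a node with 4 GPUs: each GPU ends up serving 2 ranks.
assignments = [assign_gpu(rank, first_gpu=0, num_gpus=4) for rank in range(8)]
print(assignments)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

In practice the offset could simply be the node-local MPI rank, so the mapping stays stable from one timestep to the next.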
This was done by calling a machine learning model from within the turbulence model calculation in the ocean model, using SmartSim.

Q: I have a JAX-trained ML model. I want to make use of predictions from the model in a CFD code. Can I do that through SmartSim?
A: See recording (around 01:11:25).

Q: Where can I find information on the potential overhead introduced by integrating a scientific application with SmartSim?
A: See recording (around 01:15:40).

Q: Does SmartSim work with AMD GPUs?
A: Yes, SmartSim works with AMD GPUs. A small note on AMD GPUs: we are working on releasing Spack recipes to help users and sysadmins get shared installations on their machines, but we also have instructions for building SmartSim from source for AMD GPUs on Frontier.

Q: If you have a GPU-capable simulation model, how do you distribute GPUs on-node?
A: When you use the API function to set a model in the database, you can specify the subset of GPUs to use. So you can tell SmartSim and the clients to use only 2 out of 4 GPUs, for example.

Q: Is there any GPU-GPU communication within SmartSim?
A: We haven't investigated that, but we would be interested in hearing more about the use case. Currently we would have to transfer data back to the host. I'm not sure there is an easy solution, since we require all data to go through the in-memory database.

Q: How would this module work integrated into a conda env, where you may need to build a package that uses TF or PT plus a simulation engine? Often conda will ignore paths set by Lmod.
A: At the moment the Spack package will only set your `PYTHONPATH`, which most virtualenv solutions (conda included, iirc) should respect and preserve!

Misc note from chat: Optimizing inference performance is somewhat application-specific. For example, there are API function parameters that specify batch size, which lets you process closely-timed requests more efficiently.
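A minimal sketch of the GPU-subset idea above, assuming the SmartRedis Python client's multi-GPU model upload (`set_model_from_file_multigpu` with `first_gpu`/`num_gpus`); the exact name and signature should be checked against the SmartRedis docs for your version, and the model name and file path are placeholders.

```python
def upload_model_gpu_subset(client, model_path: str) -> None:
    """Sketch: upload a TorchScript model to the database, restricted
    to 2 of the node's 4 GPUs. `client` is assumed to be a connected
    smartredis.Client; the *_multigpu call is hedged from the SmartRedis
    client API and may differ between versions."""
    client.set_model_from_file_multigpu(
        "turbulence_net",   # placeholder model key
        model_path,         # path to a serialized TorchScript file
        "TORCH",            # backend
        first_gpu=0,        # start at GPU 0...
        num_gpus=2,         # ...and use only GPUs 0 and 1 of the 4 on-node
    )
```

Because the client API is the same for co-located and clustered deployments, a call like this would not need to change when moving between the two.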
We often parameterize that call based on the number of MPI ranks in the simulation or on the compute node, because simulations often make their requests almost simultaneously.
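That parameterization can be sketched as a simple heuristic. The helper below is an assumption for illustration, not SmartSim policy: if every rank on a node issues a request at nearly the same time, batching roughly `ranks_per_node / num_gpus` requests per GPU lets the database process them together rather than one by one.

```python
def choose_batch_size(ranks_per_node: int, num_gpus: int) -> int:
    """Heuristic sketch (an assumption, not a SmartSim default): size the
    inference batch so that the near-simultaneous requests from all ranks
    on a node fill the available GPUs evenly."""
    return max(1, ranks_per_node // max(1, num_gpus))

# e.g. 32 ranks sharing 4 GPUs -> batches of 8 requests per GPU
print(choose_batch_size(32, 4))  # 8
```

The resulting value would then be passed as the batch-size parameter when setting the model in the database, alongside a minimum batch size if the application's request timing is less regular.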