The Row Workflow
Now, let’s put this all together and automate the submission process with row.
workflow.toml
This row workflow.toml configuration will run the simulate action on
all the directories in the workspace:
[workspace]
value_file = "signac_statepoint.json"
[default.action]
command = "target/release/action $ACTION_NAME {directories}"
launchers = ["rayon"]
[[action]]
name = "simulate"
products = ["trajectory.gsd"]
resources.walltime.per_directory = "01:00:00"
resources.threads_per_process = 1
group.maximum_size = 1
Command
command = "target/release/action $ACTION_NAME {directories}"
The command tells row how to launch the action binary. Use the
$ACTION_NAME environment variable instead of hard-coding simulate
so that the command is ready for use when you add more action subcommands
in the future.
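To illustrate why passing $ACTION_NAME pays off, here is a minimal sketch of how an action binary could dispatch on its first argument. It is not the template's actual code: the Action enum and the parse_args function are hypothetical names, and the sketch assumes only what the command line above shows, namely that the action name arrives first and the directories follow.

```rust
use std::path::PathBuf;

/// Actions this hypothetical binary knows how to run.
#[derive(Debug, PartialEq)]
enum Action {
    Simulate,
}

/// Parse `<action name> <directory>...`, the arguments that follow the
/// program name in `target/release/action $ACTION_NAME {directories}`.
fn parse_args(args: &[String]) -> Result<(Action, Vec<PathBuf>), String> {
    let (name, directories) = args
        .split_first()
        .ok_or_else(|| String::from("usage: action <name> <directory>..."))?;

    let action = match name.as_str() {
        "simulate" => Action::Simulate,
        // Add one match arm here for each new [[action]] in workflow.toml.
        other => return Err(format!("unknown action: {other}")),
    };

    Ok((action, directories.iter().map(PathBuf::from).collect()))
}
```

A main function would collect std::env::args(), skip the program name, and pass the rest to parse_args. Adding a second action then means adding one enum variant and one match arm, with no change to the command line in workflow.toml.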
Launcher
hoomd-rs uses rayon for thread-level parallelism. Set the rayon launcher to correctly configure the number of threads:
launchers = ["rayon"]
Also set threads_per_process explicitly rather than leaving it at its default:
resources.threads_per_process = 1
Warning
Skip either of these settings and hoomd-rs will attempt to use all cores on the compute node (e.g. 128) even if SLURM has locked your job to 1 core. The resulting resource contention will cause your simulations to run extremely slowly.
How should you choose threads_per_process? Base it on how you configure
your simulation model. Most of hoomd-rs uses only 1 thread, so 1 is the
right default. In the current release, only ParallelSweep uses multiple
threads. Benchmark your model to see how it scales before submitting a set
of jobs to a cluster. Requesting threads_per_process=32 wastes both your
time and cluster resources if your model runs even faster with
threads_per_process=8.
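Once you have benchmark results, picking the value is a simple comparison. The sketch below is one possible way to do it, not part of hoomd-rs: best_thread_count is a hypothetical helper that takes (thread count, steps per second) pairs that you measured yourself and returns the thread count with the highest throughput, preferring fewer threads on a tie.

```rust
/// Return the thread count with the highest measured throughput.
///
/// Each entry is `(threads, steps_per_second)` from your own benchmark runs.
/// On a tie, the smaller thread count wins because it costs fewer cores.
/// Returns `None` when there are no measurements.
fn best_thread_count(measurements: &[(usize, f64)]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;

    for &(threads, rate) in measurements {
        let better = match best {
            None => true,
            Some((best_threads, best_rate)) => {
                rate > best_rate || (rate == best_rate && threads < best_threads)
            }
        };
        if better {
            best = Some((threads, rate));
        }
    }

    best.map(|(threads, _)| threads)
}
```

Highest throughput is only one criterion. If doubling the threads gains only a few percent, the smaller value may still be the better use of your allocation.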
Maximum Size
At this time, you must set
group.maximum_size=1
A future version of this template will support bundling actions on many directories in a single cluster job.
Execute the Workflow
Run:
row submit -n 1
to execute the simulate action on one of the eligible directories.
When it completes, you will find log.parquet.0, trajectory.gsd, and
model.postcard in the directory.
To see how the action can continue from where it left off, set
resources.walltime.per_directory = "00:06:00"
in workflow.toml and then run
row submit -n 1
again.
This time, the action should quit after about 1 minute (by default it reserves a 5-minute wall time buffer, which leaves 1 of the 6 requested minutes) and will print:
...
[INFO hoomd_workflow::simulate] Step 15000 / 100000 (15%)
[INFO hoomd_workflow::simulate] Step 16000 / 100000 (16%)
[INFO hoomd_workflow::simulate] Step 17000 / 100000 (17%)
[INFO hoomd_workflow::simulate] Step 18000 / 100000 (18%)
[INFO hoomd_workflow::simulate] Stopping simulation, wall time limit reached.
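The stopping time follows from subtracting the buffer from the requested wall time. The sketch below reproduces that arithmetic. It is an illustration only: parse_walltime and run_budget are hypothetical names, the parser handles just the HH:MM:SS form used in this workflow.toml, and the 5-minute buffer is the default stated above.

```rust
use std::time::Duration;

/// Parse an "HH:MM:SS" walltime string into a Duration.
/// Returns `None` for any other format.
fn parse_walltime(text: &str) -> Option<Duration> {
    let parts: Vec<&str> = text.split(':').collect();
    if parts.len() != 3 {
        return None;
    }
    let hours: u64 = parts[0].parse().ok()?;
    let minutes: u64 = parts[1].parse().ok()?;
    let seconds: u64 = parts[2].parse().ok()?;
    Some(Duration::from_secs(hours * 3600 + minutes * 60 + seconds))
}

/// Time available for simulation steps: the requested walltime minus the
/// buffer, clamped at zero when the buffer exceeds the walltime.
fn run_budget(walltime: Duration, buffer: Duration) -> Duration {
    walltime.saturating_sub(buffer)
}
```

With a walltime of "00:06:00" and a 300 second buffer, run_budget returns 60 seconds. With the original "01:00:00" it returns 55 minutes.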
Now, the job directory will contain log.parquet.0, trajectory.in-progress.gsd,
and model.postcard. Submit again and the action will pick up right where
it left off. Each subsequent submission will generate a new log.parquet.N file.
When it reaches step 100,000, trajectory.in-progress.gsd will be renamed to
trajectory.gsd.
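The log numbering can be pictured with a short sketch. This is an assumption about the naming scheme, not the hoomd-rs implementation: next_log_name is a hypothetical helper that supposes each submission writes to the lowest index above every existing log.parquet.N file.

```rust
/// Given the file names already present in a job directory, return the
/// name the next submission would write to under the assumed scheme:
/// one more than the largest existing `log.parquet.N`, or index 0 when
/// no log file exists yet.
fn next_log_name(existing: &[&str]) -> String {
    let next = existing
        .iter()
        .filter_map(|name| name.strip_prefix("log.parquet."))
        .filter_map(|suffix| suffix.parse::<u32>().ok())
        .max()
        .map_or(0, |n| n + 1);

    format!("log.parquet.{next}")
}
```

Under this assumption, a directory that holds log.parquet.0, trajectory.in-progress.gsd, and model.postcard yields log.parquet.1 for the next submission.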
Development of hoomd-rs is led by the Glotzer Group at the University of Michigan.
Copyright © 2024-2026 The Regents of the University of Michigan.