Kelvin Oluwada Milare Obuneme Olaiya
Taking advantage of multiple computing resources can reduce the time needed to execute a simulation batch.
Main actors:
Cluster represents the collection of nodes that are currently connected. Through the cluster it is possible to obtain a Dispatcher by specifying the complexity that the nodes in the dispatcher must be able to handle.
ClusterNode represents a server node to which jobs can be distributed.
Dispatcher contains a subset of the nodes in the cluster. It is responsible for accepting SimulationBatches and distributing them across its subset of nodes. Distribution is made according to a DispatchStrategy.
DispatchStrategy models the strategy by which the workload is distributed to a collection of nodes (e.g. round-robin).
Complexity describes the complexity, in terms of RAM usage and memory occupation, of the simulations in a batch.
SimulationBatch represents a simulation batch with its complexity. It is composed of a simulation configuration and a collection of simulation initializers.
SimulationConfig contains the general batch information such as the end step and end time of the simulations and a loader from which simulation instances will be created. Dependencies are files that must be made available to all servers in order to execute the simulation correctly.
SimulationInitializer contains a combination of variable values that will be used to create a simulation instance. Each simulation initializer in a simulation batch corresponds to one job for a node in the cluster.
BatchResult models the result of a simulation batch that has been submitted via a Dispatcher. It reports the total number of errors, if any, that occurred while executing the simulation batch, and provides a utility method to save all the distributed export files locally.
SimulationResult models the result of a single job.
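The relationships among these actors can be sketched in a few lines of Kotlin. This is a minimal, hypothetical model, not the project's actual API: the type names mirror the list above, but every signature, the two numeric fields of Complexity, the canHandle check, and the dispatcherFor and dispatch functions are assumptions made for illustration. It shows a cluster selecting the nodes able to handle a given complexity and a round-robin strategy assigning one job per initializer.

```kotlin
// Hypothetical, simplified model: names follow the text, signatures are assumptions.
data class Complexity(val ram: Double, val memory: Double)

data class ClusterNode(val id: String, val capacity: Complexity) {
    // Assumed rule: a node can handle a workload that fits within its capacity.
    fun canHandle(c: Complexity): Boolean = capacity.ram >= c.ram && capacity.memory >= c.memory
}

data class SimulationInitializer(val variables: Map<String, Any>)

data class SimulationConfig(val endStep: Long, val endTime: Double)

data class SimulationBatch(
    val config: SimulationConfig,
    val initializers: List<SimulationInitializer>,
    val complexity: Complexity,
)

fun interface DispatchStrategy {
    fun assign(
        jobs: List<SimulationInitializer>,
        nodes: List<ClusterNode>,
    ): Map<ClusterNode, List<SimulationInitializer>>
}

// Round-robin: job i goes to node (i mod number of nodes).
val roundRobin = DispatchStrategy { jobs, nodes ->
    require(nodes.isNotEmpty()) { "No node can handle this batch" }
    jobs.withIndex().groupBy({ nodes[it.index % nodes.size] }, { it.value })
}

class Dispatcher(private val nodes: List<ClusterNode>, private val strategy: DispatchStrategy) {
    // One job per initializer, spread over the dispatcher's nodes.
    fun dispatch(batch: SimulationBatch): Map<ClusterNode, List<SimulationInitializer>> =
        strategy.assign(batch.initializers, nodes)
}

class Cluster(private val nodes: List<ClusterNode>) {
    // Only nodes able to handle the requested complexity end up in the dispatcher.
    fun dispatcherFor(complexity: Complexity, strategy: DispatchStrategy = roundRobin): Dispatcher =
        Dispatcher(nodes.filter { it.canHandle(complexity) }, strategy)
}

fun main() {
    val cluster = Cluster(
        listOf(
            ClusterNode("a", Complexity(8.0, 8.0)),
            ClusterNode("b", Complexity(8.0, 8.0)),
            ClusterNode("c", Complexity(1.0, 1.0)), // too small for the batch below
        ),
    )
    val batch = SimulationBatch(
        config = SimulationConfig(endStep = 1000, endTime = 60.0),
        initializers = (1..5).map { SimulationInitializer(mapOf("seed" to it)) },
        complexity = Complexity(4.0, 4.0),
    )
    val plan = cluster.dispatcherFor(batch.complexity).dispatch(batch)
    plan.forEach { (node, jobs) -> println("${node.id} -> ${jobs.size} jobs") }
    // prints "a -> 3 jobs" then "b -> 2 jobs"
}
```

Modelling the strategy as a single-method interface keeps the Dispatcher independent of the distribution policy, so a different policy could be swapped in without touching the other types.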
message Simulation {
    string simulationID = 1;
    bytes environment = 2;
    bytes exports = 3;
    string jobDescriptor = 4;
}
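A test example:
"Simulation are correctly distributed" {
    // Launch the server nodes; `use` closes the returned resource when the block exits.
    startServers(serverConfigFile, SERVERS_TO_LAUNCH).use {
        val cluster = ClusterImpl(registry)
        // Wait until every server has joined the cluster.
        awaitServerJoin(cluster, SERVERS_TO_LAUNCH, 10.seconds)
        startClient(clientConfigFile).use {
            // Poll until the submitted batch appears in the registry.
            until(20.seconds) {
                registry.simulations().size == 1
            }
            val simulationID = registry.simulations().first()
            // One job is expected for each initializer in the batch.
            registry.simulationJobs(simulationID) shouldHaveSize SIMULATION_BATCH_SIZE
        }
    }
}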
Kotest functions for non-determinism:
eventually
continually
until
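The intent of these three functions can be illustrated with hand-written stand-ins. The sketch below does not use Kotest and is not its implementation: eventuallySketch, continuallySketch and untilSketch are hypothetical names, and the blocking, millisecond-based signatures are assumptions chosen to keep the example self-contained. Roughly, eventually retries a block until it passes within a time limit, continually requires a block to keep passing for a whole duration, and until polls a predicate until it becomes true.

```kotlin
// Illustrative stand-ins only: NOT the Kotest implementations.

/** Retries [block] until it stops throwing AssertionError or the timeout elapses. */
fun <T> eventuallySketch(timeoutMs: Long, intervalMs: Long = 10, block: () -> T): T {
    val deadline = System.nanoTime() + timeoutMs * 1_000_000
    while (true) {
        try {
            return block()
        } catch (e: AssertionError) {
            // Give up and rethrow the last failure once the time budget is spent.
            if (System.nanoTime() >= deadline) throw e
            Thread.sleep(intervalMs)
        }
    }
}

/** Re-runs [block] for the whole duration; the first failure propagates immediately. */
fun continuallySketch(durationMs: Long, intervalMs: Long = 10, block: () -> Unit) {
    val deadline = System.nanoTime() + durationMs * 1_000_000
    while (System.nanoTime() < deadline) {
        block()
        Thread.sleep(intervalMs)
    }
}

/** Polls [predicate] until it returns true, failing if the timeout elapses first. */
fun untilSketch(timeoutMs: Long, intervalMs: Long = 10, predicate: () -> Boolean) {
    val deadline = System.nanoTime() + timeoutMs * 1_000_000
    while (!predicate()) {
        if (System.nanoTime() >= deadline) {
            throw AssertionError("Condition not met within $timeoutMs ms")
        }
        Thread.sleep(intervalMs)
    }
}

fun main() {
    var calls = 0

    // The predicate becomes true on its third evaluation.
    untilSketch(1_000) { ++calls >= 3 }
    println(calls) // prints 3

    // The block fails once (calls == 4), then passes and returns its value.
    val value = eventuallySketch(1_000) {
        if (++calls < 5) throw AssertionError("not yet")
        calls
    }
    println(value) // prints 5

    // The condition must hold on every evaluation for 50 ms.
    continuallySketch(50) { check(calls == 5) }
}
```

In the test example above, until plays the polling role: the test waits up to 20 seconds for the registry to contain exactly one simulation before asserting on its jobs.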