headerphoto

Big Picture

parallelization big picture

One use case for the sambamba framework is fully automatic parallelization. A rough sketch of the idea is depicted in the figure to the right:

[A] Long running, yet still conservative analyses are performed at compile time. The result is a set of candidates for speculative parallel execution as well as already parallelized versions of statically parallelizable methods. The candidates are potentially guarded by statically determined preconditions. One example would be that method M is parallelizable by executing calls to N and O in parallel iff the parameters of M are pairwise disjoint.

[P] The statically identified parallelization candidates are taken in order of the expected profit and binary code is created. This speculatively parallelized code is guarded by a software transactional memory system if necessary. Code for which parallel candidates guarded by statically derived preconditions exist are dealt with by creating an instrumented version of the code in order to observe if these preconditions are meet frequently. If this is the case, the parallel version is generated and installed together with a check if the corresponding precondition is meet. Note that if the dispatch mechanism uses the statically derived precondition checks in order to decide which version to execute, there is no need to guard that particular version by a full fledged STM system for example.

[X] The parallelized code is executed. Every misspeculation is detected by the STM system and kept track of.

[C] The module registering a new speculatively parallelized version of a method is informed about misspeculation if it occurs. Additionally the runtime of parallelized methods can be observed and compared to the runtime of the sequential version of that method. In case misspeculation occurs too often or the runtime of a method drops significantly, the corresponding parallel version is dropped. This event is logged in the dynamic data store such that the particular parallel version is not considered in future runs. If parallelization candidates exist which could be further parallelized if certain dependences did not exist, these dependences are tracked. If further parallelization depends on control flow or value speculation, the corresponding branches and values are observed.

[S] If certain tracked dependences do not exist, (abstract) values are frequently the same or the same paths are taken most of the time, specialized versions of the corresponding code are created. Non occurring dependences are speculated away, the values are marked constant and the paths not taken are removed. The specialized versions created in such a way are then fed back into the loop for further parallelization.