The OSCAR (Optimally SCheduled Advanced multiprocessoR) architecture is a high performance multiprocessor in which SCMs (Single Chip Multiprocessors) or RISC processors, each with local memory, distributed shared memory and a data transfer unit for hiding data transfer overhead, are connected by a network with variable barrier synchronization hardware. The OSCAR architecture has been developed to support multigrain parallel processing using static and dynamic task scheduling.
The OSCAR SCM has been designed to realize cost-effective multigrain parallel processing on a chip. It integrates multiple simple RISC processors, each with a local memory, a distributed shared memory and a data transfer unit, connected by multiple buses and global registers. The hierarchical memories, including the global registers, are used optimally by the OSCAR multigrain parallelizing compiler through its automatic data distribution (Data Localization) function and its overlapping data transfer scheduling function.
The OSCAR multigrain parallelizing compiler hierarchically uses coarse grain parallelism among loops and subroutines, loop parallelism among loop iterations, and (near) fine grain parallelism among instructions or statements. Coarse grain parallelism is automatically exploited by earliest executable condition analysis, which considers control and data dependencies. Depending on the characteristics of the generated macrotask graph, coarse grain tasks are either assigned to processors or processor clusters at run time by a centralized or distributed dynamic scheduling routine generated by the compiler, or assigned statically by the compiler. The performance of the OSCAR multigrain compiler has been evaluated on the OSCAR multiprocessor with the OSCAR machine code backend, on the IBM RS6000 SP 604e high node with the OpenMP backend, on the Fujitsu VPP 300 with the VPP backend, and on other machines.
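As a rough illustration of how coarse grain tasks become executable, the following minimal Python sketch groups the macrotasks of a small graph into successive sets of ready tasks. It is a simplification and not the compiler's actual analysis: it considers data dependencies only (the earliest executable condition analysis described above also handles control dependencies), and the graph, the task names and the functions `ready_macrotasks` and `dispatch_waves` are hypothetical.

```python
def ready_macrotasks(preds, finished):
    """Return the unfinished macrotasks whose predecessors have all finished."""
    return sorted(
        task for task, deps in preds.items()
        if task not in finished and deps <= finished
    )


def dispatch_waves(preds):
    """Group macrotasks into successive sets that could run in parallel."""
    finished = set()
    waves = []
    while len(finished) < len(preds):
        wave = ready_macrotasks(preds, finished)
        if not wave:
            raise ValueError("dependency cycle: no macrotask is ready")
        waves.append(wave)
        finished.update(wave)
    return waves


# Hypothetical macrotask graph: task -> set of tasks it depends on
# (data dependencies only; control dependencies are ignored here).
preds = {
    "MT1": set(),
    "MT2": {"MT1"},
    "MT3": {"MT1"},
    "MT4": {"MT2", "MT3"},
}

print(dispatch_waves(preds))
```

In this example MT2 and MT3 fall into the same set, which is the kind of coarse grain parallelism a scheduler could exploit by assigning them to different processors or processor clusters.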
An automatic data distribution technique (Data Localization) for loops in a coarse grain parallel processing environment with static and dynamic task scheduling. The technique mainly consists of Loop Aligned Decomposition, Task-Fusion and Partial Static Task Assignment.
A practical parallelized optimization algorithm for the "strong" NP-hard minimum execution time multiprocessor scheduling problem. PDF/IHS allows us to solve large-scale problems with several thousand tasks in a few minutes on SMP servers such as the SUN Enterprise 3000.
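To make the scheduling problem concrete, the sketch below assigns the tasks of a small precedence graph to a fixed number of identical processors and reports the resulting execution time (makespan). It uses a classic critical-path list-scheduling heuristic, not PDF/IHS, and it ignores communication costs; the task graph and the function names are assumptions made for illustration only.

```python
import heapq


def critical_path_lengths(cost, succs):
    """Longest path from each task to an exit task, including its own cost."""
    memo = {}

    def level(task):
        if task not in memo:
            tail = max((level(s) for s in succs.get(task, ())), default=0)
            memo[task] = cost[task] + tail
        return memo[task]

    for task in cost:
        level(task)
    return memo


def list_schedule(cost, succs, num_procs):
    """Return the makespan of a critical-path list schedule on num_procs processors."""
    levels = critical_path_lengths(cost, succs)
    n_preds = {task: 0 for task in cost}
    for successors in succs.values():
        for s in successors:
            n_preds[s] += 1

    # Ready tasks, highest critical-path length first (heapq is a min-heap).
    ready = [(-levels[t], t) for t in cost if n_preds[t] == 0]
    heapq.heapify(ready)
    running = []  # min-heap of (finish_time, task)
    idle, now = num_procs, 0

    while ready or running:
        # Start as many ready tasks as there are idle processors.
        while ready and idle > 0:
            _, task = heapq.heappop(ready)
            heapq.heappush(running, (now + cost[task], task))
            idle -= 1
        # Advance time to the next task completion.
        now, done = heapq.heappop(running)
        idle += 1
        for s in succs.get(done, ()):
            n_preds[s] -= 1
            if n_preds[s] == 0:
                heapq.heappush(ready, (-levels[s], s))
    return now


# Hypothetical task graph: processing times and successor lists.
cost = {"A": 2, "B": 3, "C": 1, "D": 2}
succs = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

print(list_schedule(cost, succs, num_procs=1))  # 8
print(list_schedule(cost, succs, num_procs=2))  # 7
```

A heuristic like this gives a feasible schedule quickly but carries no optimality guarantee, which is the gap an optimization algorithm for the problem aims to close.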
Scheduling algorithms, to be embedded into the OSCAR multigrain compiler, that overlap data transfer with coarse grain task execution. The performance of the scheduler has been evaluated on Fujitsu VPP supercomputers with the VPP Fortran backend.
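The benefit of overlapping can be seen with a simple timing model, sketched below in Python. It assumes a single processor executing tasks in a fixed order, one data transfer unit that preloads each task's data back to back, and enough local memory to buffer preloaded data; these assumptions and the function names are illustrative and do not describe the actual scheduler.

```python
def serial_time(transfers, computes):
    """Total time when every transfer must finish before its task runs, with no overlap."""
    return sum(transfers) + sum(computes)


def overlapped_time(transfers, computes):
    """Total time when the transfer unit preloads data while earlier tasks execute."""
    transfer_done = 0  # time at which the current task's data has arrived
    compute_done = 0   # time at which the processor finishes the previous task
    for t, c in zip(transfers, computes):
        transfer_done += t  # transfers run back to back on the transfer unit
        # A task starts once its data has arrived and the processor is free.
        compute_done = max(compute_done, transfer_done) + c
    return compute_done


# Hypothetical per-task transfer and execution times.
transfers = [2, 2, 2]
computes = [3, 3, 3]

print(serial_time(transfers, computes))      # 15
print(overlapped_time(transfers, computes))  # 11
```

In this model only the first transfer remains exposed, because each later transfer is hidden behind the execution of the preceding task.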
A benchmark set for evaluating multiprocessor scheduling algorithms. This standard task graph set is available from this home page so that various heuristic and optimization algorithms can be evaluated fairly on the same task graphs.
OSCAR Meta-scheduling realizes automatic coarse grain parallel processing of a single job on a heterogeneous supercomputer cluster such as COMPACS, which is composed of Hitachi SR2201, NEC SX4, Fujitsu VPP300, Cray T94 and IBM SP2 machines at STA (Science and Technology Agency) JAERI CCSE, by using the OSCAR multigrain compiler with the STA-MPI backend.
Multigrain parallel processing has been applied to various applications including the following.