Summary:
The rapid revolution in microprocessor chip architecture brought about by multi-core technology presents unprecedented challenges to application developers as well as system software designers: how can the parallelism potential of such multi-core architectures best be exploited? In this paper, we report an in-depth study of these challenges based on our experience of optimizing the Fast Fourier Transform (FFT) on the IBM Cyclops-64 (C64) chip architecture. The C64 is a large-scale multi-core chip architecture consisting of 160 thread units, associated memory banks, and an interconnection network that connects them together in a shared-memory organization. We demonstrate how multi-core architectures like the C64 can be used to achieve a high-performance implementation of FFT in both the 1D and 2D cases. We analyze the optimization challenges and opportunities, including problem decomposition, load balancing, work distribution, and data reuse, together with the exploitation of C64 architecture features such as the multi-level memory hierarchy and large register files. Furthermore, the experience gained during the hand-tuned optimization process has provided valuable guidance for our compiler optimization design and implementation.
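As a rough illustration of the problem decomposition and work distribution mentioned above, here is a minimal Python sketch of a 1D and a 2D FFT. It is not the C64 implementation from the paper. The names `fft_1d` and `fft_2d` are hypothetical. The choice of an iterative radix-2 Cooley-Tukey algorithm and of a row-column decomposition for the 2D case are assumptions about a reasonable approach. A `ThreadPoolExecutor` merely stands in for the C64 thread units. Because of CPython's global interpreter lock, the threads will not actually speed up this pure-Python code. The point is only to show how the 2D transform splits into independent 1D tasks that can be distributed among workers.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def fft_1d(x):
    """Iterative radix-2 Cooley-Tukey FFT (sketch); len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)
    # Bit-reversal permutation so the butterflies can run in place.
    rev = 0
    for i in range(1, n):
        bit = n >> 1
        while rev & bit:
            rev ^= bit
            bit >>= 1
        rev ^= bit
        if i < rev:
            a[i], a[rev] = a[rev], a[i]
    # log2(n) butterfly stages, doubling the sub-transform size each time.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        size *= 2
    return a


def fft_2d(matrix, workers=4):
    """Row-column 2D FFT (sketch): every row, then every column, is an
    independent 1D FFT, so each phase can be spread across workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: 1D FFT of each row, distributed over the pool.
        rows_done = list(pool.map(fft_1d, matrix))
        # Transpose so that columns become rows.
        cols = [list(c) for c in zip(*rows_done)]
        # Phase 2: 1D FFT of each (former) column.
        cols_done = list(pool.map(fft_1d, cols))
    # Transpose back to the original row-major layout.
    return [list(r) for r in zip(*cols_done)]
```

For example, `fft_2d([[1, 1, 1, 1], [1, 1, 1, 1]])` puts all the energy, 8, in element `[0][0]`, and the remaining entries are (numerically) zero. Submitting one row per task lets idle workers pick up the next pending row, which is a simple form of dynamic load balancing. A static block of rows per thread would be the obvious alternative when all rows cost the same.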
Technology Use: .Net Or Java Or Python
Modules:
Algorithm Use: Not Defined
