Roberto Giorgi, ACM International Conference on Computing Frontiers 2012

Roberto Giorgi presented the TERAFLUX project at the ACM International Conference on Computing Frontiers (Cagliari, Italy, May 15-17, 2012).

Here are the slides of the presentation: TERAFLUX, Exploiting Dataflow Parallelism in Teradevices

Abstract: The TERAFLUX project is a Future and Emerging Technologies (FET) Large-Scale Project funded by the European Union. TERAFLUX is at the forefront of major research challenges such as programmability, manageable architecture design, and reliability of many-core (1000+ core) chips. In the near future, new computing systems will consist of a huge number of transistors, probably 1 Tera (1000 billion) by 2020; we call such systems "Teradevices".

Recent developments worldwide include the availability of a new type of transistor (the 3D transistor), which marks the biggest change in the semiconductor industry since the introduction of the transistor itself in 1948. New materials such as graphene may allow even greater power savings. Technology-node scaling has reached 22nm, with 14nm silicon foundries expected to be operational by 2013, and it seems the pace will continue at least until 8nm. 3D layering also gives new life to Moore's law.

In this scenario, the TERAFLUX project brings together 10 industrial and academic partners to find common ground for addressing all three of the above challenges at once. The research in this project is inspired by the dataflow principle. As recalled by Jack Dennis, dataflow is "a Scheme of Computation in which an activity is initiated by presence of the data it needs to perform its function". We believe that, if properly exploited, dataflow can enable parallelism that is orders of magnitude greater than what is achievable by control-flow dominated execution models. To investigate our concepts, we are studying dataflow principles at every level of a complete transformation hierarchy, starting from general and complex applications able to properly load a Teradevice, through programming models, compilation tools, reliability techniques and architecture.

Another key point is the evaluation of this system: our choice has been to rely on an existing simulation infrastructure (HP Labs COTSon) that immediately enabled us to start from a present-day Teradevice (i.e., a 1000+ cluster of nodes, where each node consists of tens of cores) and progressively evolve it into a more ambitious system in which we can gradually remove major bottlenecks. While relying on solid and well-known reference points such as the x86-64 ISA, GCC tools, and the StarSs programming model and applications, we wish to demonstrate the validity of our research on such a common evaluation infrastructure.

The system is not forced to follow the dataflow paradigm entirely: in fact, we distinguish between legacy and system threads (L- and S-threads) and dataflow threads (DF-threads). This will allow for a progressive migration of programs to the new "dataflow paradigm", while accelerating the available DF-threads on the more dataflow-friendly cores. Another important choice is the exploration of synchronization mechanisms such as transactional memory, and the re-execution of threads that were running on failing cores by using dataflow principles. We can currently run parallel, scalable, full-system simulations (with unmodified Linux) of 1000+ x86-64 cores with acceptable slowdown and accuracy, while experimenting with very ambitious changes to the execution model. Supporting an execution model based on dataflow threads implies a major effort, especially from the compiler point of view.
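As a rough illustration of the dataflow principle quoted in the abstract (an activity starts only when the data it needs is present), the following minimal Python sketch models a dataflow thread with a counter of missing inputs. All names here (`DFThread`, `Scheduler`, `write`, `demo`) are hypothetical and chosen for illustration only; this is not the TERAFLUX API or execution model, just one simple way to picture the firing rule.

```python
from collections import deque


class DFThread:
    """Hypothetical dataflow thread: eligible to run only once all inputs arrived."""

    def __init__(self, name, n_inputs, body):
        self.name = name
        self.inputs = [None] * n_inputs
        self.pending = n_inputs  # number of inputs still missing
        self.body = body         # called as body(scheduler, *inputs)


class Scheduler:
    """Toy single-queue scheduler (an assumption, not the project's design)."""

    def __init__(self):
        self.ready = deque()
        self.trace = []          # order in which threads actually fired

    def write(self, thread, slot, value):
        # Deliver one input. This sketch assumes each slot is written exactly once.
        thread.inputs[slot] = value
        thread.pending -= 1
        if thread.pending == 0:  # firing rule: all data present
            self.ready.append(thread)

    def run(self):
        while self.ready:
            t = self.ready.popleft()
            self.trace.append(t.name)
            t.body(self, *t.inputs)


def demo():
    sched = Scheduler()
    result = []
    # "add" needs two inputs, each produced by a different thread.
    total = DFThread("add", 2, lambda s, a, b: result.append(a + b))
    square = DFThread("square", 1, lambda s, x: s.write(total, 0, x * x))
    double = DFThread("double", 1, lambda s, x: s.write(total, 1, 2 * x))
    # Inputs may arrive in any order; execution follows data availability.
    sched.write(double, 0, 5)
    sched.write(square, 0, 3)
    sched.run()
    return result[0], sched.trace
```

In this sketch no thread is started by program order: `add` fires only after both `square` and `double` have delivered their results, which is the property that lets independent threads run in parallel on a real machine.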

Attachment: giorgi120516-CF12-TERAFLUX.pdf (2.34 MB)