TERAFLUX on the web
TERAFLUX related links
- The COTSon simulator for full-system simulation of many-cores
- OPENMP-SS (OmpSs): A Programming Model
- OPENSTREAM: a data-flow streaming extension of OpenMP for the C language, implemented as a patch to GCC and a dedicated runtime system for data-flow tasks
- The Manchester University Transactions for Scala (MUTS)
- DFScala: a library for using the dataflow model of parallelism in the Scala programming language
- DRT (Dataflow Run Time): run and test DF-Threads-based programs on standard x86_64 machines
- The TERAFLUX applications (SVN repository; to check out the code, run:
svn co https://svn.teraflux.eu/svnpub/apps)
- The GCC Graphite
- The NANOS++: a runtime designed to serve as runtime support in parallel environments
- The Open-HMPP: a superset of the OpenACC API
European Research Links
- EC Research Participants Portal
- EC Information Society portal
- FP7 ICT portal
- FET Proactive initiative
- FET Open initiative
EURETILE investigates and implements brain-inspired foundational innovations for the system architecture of massively parallel tiled computer architectures and for the corresponding programming paradigm. The execution target is a many-tile HW platform, complemented by a many-tile simulator. The holistic SW tool-chain generates a set of candidate mappings of SW processes to HW tiles, using a combination of analytic and bio-inspired methods.
The hardware-dependent software is then generated, providing OS services with maximum efficiency and minimal overhead. The many-tile simulator collects profiling data, closing the loop of the SW tool-chain. Fine-grain parallelism inside processes is exploited by optimized intra-tile compilation techniques. The elementary HW tile is a multi-processor that includes a Distributed Network Processor (for inter-tile communication), a floating-point VLIW processor (for numerically intensive computations), and a RISC processor (for control, user interface and sequential computations). Furthermore, EURETILE investigates and implements the innovations needed to equip the existing, fully European elementary HW tile with high-bandwidth, low-latency, brain-like inter-tile communication (emulating three levels of connection hierarchy, namely neural columns, cortical areas and cortex). These innovations are intended to secure a 15+ year HW roadmap of low-power and fault-tolerant excellence. EURETILE builds on the working SW and HW prototypes of the innovative multi-tile HW paradigm and SW tool-chain developed by the FET-ACA SHAPES Integrated Project (2006-2009). This background includes working tile silicon and boards, a multi-tile simulator (running up to eight tiles), and a complete SW tool-chain comprising a parallel programming and automatic mapping/optimization environment (the Distributed Operation Layer), a specialized OS (DNA-OS, automatically generated for both RISC and VLIW) integrated with Linux RT, and an optimizing compiler co-designed with the HW tile.
Processor and network architectures are making rapid progress, with more and more cores being integrated into single processors and more and more machines being connected with ever-increasing bandwidth. Processors are becoming heterogeneous and reconfigurable, allowing dynamic adaptation to specialised needs. In the future, trillions of devices may be connected to form a single computing unit.
No current programming model can cope with this development, because existing models are too tightly coupled to the underlying device structure. Furthermore, complex, non-aligned middleware and operating systems make the programming model unnecessarily inefficient. Making terascale devices efficiently programmable by experts and average developers alike requires a completely new approach to handling these devices across all layers:
The S(o)OS project will address future distributed systems at the level of a holistic operating system architecture, drawing on Service-Oriented Architectures and the strengths of Grids. This will decouple the OS from the underlying resource infrastructure, making execution possible across an almost unlimited number of varying devices, independent of the actual hardware.
S(o)OS will enable automatic distribution of code parts across such a resource fabric by investigating ways to execute processes, threads and parallel applications across resources so that both code requirements and resource availability are taken into account, thereby improving overall performance.
S(o)OS aims to enable even average developers to cope with large, widely distributed infrastructures. The project therefore examines means for run-time code assessment, segmentation and distribution across the infrastructure. This will range from fully automated assessment to a powerful extension that lets experienced developers specify, e.g., communication and relationship requirements.
Technology projections indicate that future electronic devices will keep shrinking, becoming faster and consuming less energy per operation. In the next decade, a single chip will be able to perform trillions of operations per second and provide trillions of bytes per second of off-chip bandwidth. This is the so-called Terascale Computing era, in which terascale performance will be mainstream, available in personal computers, and serve as the building block of large data centers with petascale computing capabilities. However, these smaller devices will be much more susceptible to faults, and their performance will exhibit a significant degree of variability. Consequently, unleashing these impressive computing capabilities requires overcoming a major reliability hurdle. The TRAMS project is the bridge to reliable, energy-efficient and cost-effective computing in the era of nanoscale challenges and teraflop opportunities.
The International Technology Roadmap for Semiconductors (ITRS) indicates that Metal Oxide Semiconductor devices (MOS or MOS-like devices) will ultimately be scaled below 10nm within several years. The CMOS technologies after the 16nm technology generation are called Late CMOS technologies and will include novel multi-gate device architectures and novel channel and gate stack materials. Reliability issues are expected to be exacerbated in sub-10nm CMOS technology.
Emerging Beyond-CMOS technologies will reduce device dimensions below 5nm using, among others, nanowire transistors, quantum devices, carbon nanotubes, graphene, or molecular electronics. Both Late CMOS and Beyond-CMOS technologies promise a significant increase in device integration density, accompanied by an increase in system performance and functionality. However, a dramatic reduction in single-device quality is also expected, together with increased statistical variability, a severely reduced signal-to-noise ratio, and severe reliability problems. Therefore, alternative device solutions and computation paradigms need to be investigated to keep pace with technology evolution in such a challenging scenario. Memory cells and, more generally, system architectures intended for nanotechnologies (both Late CMOS and emerging devices) need to address the variability and reliability problem and should be capable of solving it, or at least largely alleviating it.
To build reliable nanosystems, the TRAMS project addresses a dedicated variability- and reliability-aware analysis and design flow as well as a hierarchical tolerance design. In such a tera-device multi-core system, the main idea is to define countermeasure techniques at the circuit and architecture design levels. The objective of the project is to investigate in depth new design alternatives and paradigms capable of providing reliable memory systems built from highly unreliable nanodevices at a reasonable cost and design effort.