In the past few years, computing has moved from single-core to multi-core processors, and from multi-core to heterogeneous systems. Much changed along the way. Single-core processing meant a discrete CPU, often executing a single instruction per clock cycle, used in less sophisticated applications such as pocket calculators and simple control applications.
In the single-core era, a discrete CPU served as the only processing unit. As technology advanced, additional mathematical compute functions, such as floating-point units, were added as separate devices.
Later, rapid progress in lithography and manufacturing technology reduced the size of features on the silicon die, making it possible to double the number of transistors in a given device area every 18 to 24 months. This trend is referred to as Moore’s Law. The early years of processor development were directed towards increasing register width, adding more sophisticated instructions (MMX, SSE, etc.) and integrating the floating-point unit into the processor.
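To make the doubling rate concrete, here is a small arithmetic sketch. The function name `transistor_growth` and the ten-year horizon are illustrative choices, not figures from any particular processor roadmap.

```python
def transistor_growth(years: float, doubling_months: float) -> float:
    """Growth factor after `years` if the count doubles every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return 2.0 ** doublings


# Compare the two ends of the 18-to-24-month range over an assumed 10 years.
for months in (18, 24):
    factor = transistor_growth(10, months)
    print(f"doubling every {months} months: about x{factor:.0f} in 10 years")
```

Under these assumptions, a 24-month doubling period gives a 32-fold increase over ten years, while an 18-month period gives roughly a 100-fold increase, so the exact doubling period matters a great deal.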
Instruction-level parallelism: early processors executed instructions in the order in which their programmers had placed them. However, it was noted that most programs contain instruction sequences with no direct dependencies on each other. In such cases, performance can be increased by executing these instructions in parallel. This led to more sophisticated microarchitectures with parallel execution units that exploit instruction-level parallelism (ILP). Combined with out-of-order execution and superpipelining, this increased not only the number of instructions executed per cycle (IPC) but also the clock rate of the processor.
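The idea of independent instructions can be sketched with a toy scheduler. This is a simplified model, not how any real microarchitecture is implemented: it assumes one-cycle latency, unlimited execution units, and only read-after-write dependencies. The `schedule` function and the four-instruction `program` are hypothetical.

```python
def schedule(instructions):
    """Assign each instruction the earliest cycle at which its inputs are ready.

    Each instruction is a tuple (dest, src1, src2, ...). Only read-after-write
    dependencies are modeled; every instruction takes one cycle and the number
    of execution units is unlimited (both are simplifying assumptions).
    """
    ready_at = {}  # register name -> first cycle in which its value can be read
    cycles = []
    for dest, *sources in instructions:
        # Registers never written by the program are treated as ready at cycle 0.
        cycle = max((ready_at.get(s, 0) for s in sources), default=0)
        cycles.append(cycle)
        ready_at[dest] = cycle + 1
    return cycles


program = [
    ("a", "x", "y"),  # a = x + y
    ("b", "z", "w"),  # b = z + w   (independent of a)
    ("c", "a", "b"),  # c = a * b   (needs both a and b)
    ("d", "x", "z"),  # d = x - z   (independent of the others)
]

cycles = schedule(program)
print("issue cycle per instruction:", cycles)
print("strictly sequential cycles:", len(program))
print("cycles with ILP:", max(cycles) + 1)
```

In this model three of the four instructions can issue in the same cycle, so the sequence finishes in two cycles instead of four. Real processors are limited by the number of execution units, instruction latencies and other hazards that the sketch ignores.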
The next units added to the microprocessor were Single Instruction Multiple Data (SIMD) vector units. Such a unit applies a single instruction to multiple data elements in parallel, which significantly improves the throughput of the processor core.
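The throughput benefit can be illustrated by counting instructions in a purely conceptual model. Python does not issue vector instructions here; the sketch only models the bookkeeping. The lane count of 4 and the function names `scalar_add` and `simd_add` are assumptions made for illustration.

```python
LANES = 4  # assumed vector width, e.g. four 32-bit values in a 128-bit register


def scalar_add(a, b):
    """Model a scalar core: one add instruction per element."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def simd_add(a, b, lanes=LANES):
    """Model a SIMD unit: one vector instruction per group of `lanes` elements."""
    out, instructions = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


a = list(range(16))
b = [10 * v for v in a]

scalar_result, scalar_count = scalar_add(a, b)
simd_result, simd_count = simd_add(a, b)

print("results match:", scalar_result == simd_result)
print(scalar_count, "scalar instructions vs", simd_count, "vector instructions")
```

For 16 elements and 4 lanes, the modeled SIMD unit needs a quarter of the instructions to produce the same result, which is the source of the throughput gain described above.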
By the mid-2000s, single-core performance had reached its peak, and it became impractical to extract further gains from a single core. Operating systems became more sophisticated, enabling more than one application to run simultaneously. They also allowed applications to be broken down into threads that could work together in parallel. These threads could be distributed across the discrete microprocessor cores in a system, exploiting thread-level parallelism (TLP) to gain performance. To obtain consistent operating results, the processors deployed were expected to form a symmetric multiprocessor (SMP) system. As technology advanced, more cores were added to a single multi-core device, and this generation of processors came to be classified by its number of cores (core count).
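Breaking an application into threads can be sketched as follows, using the standard `concurrent.futures` module. The workload (a sum of squares), the helper names and the default of four workers are illustrative assumptions. Note that in CPython the global interpreter lock generally prevents CPU-bound threads from running truly in parallel, so this shows the structure of the decomposition rather than a guaranteed speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one thread: sum of squares over its own chunk."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Split `data` into roughly equal chunks and process each in its own thread."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is handed to a worker thread; partial results are combined.
        return sum(pool.map(partial_sum, chunks))


data = list(range(1000))
total = parallel_sum_of_squares(data)
print("sum of squares:", total)
```

The same pattern of splitting the data, processing the pieces independently and combining the results is what allows an operating system to spread the threads of one application over several cores.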
Besides integer and floating-point units, each core contained parallel vector SIMD execution units, such as those implementing Advanced Vector Extensions (AVX). Including these units helped to increase performance and efficiency.
However, unleashing this performance was a tedious job for the programmer because of the limited working register resources and the load/store programming model of the CPU.
With the increasing demand for graphics-intensive content in the mid-1990s, a specialized graphics processing unit (GPU) had to be added to the system. The GPU was tailored to process images in image buffers. Graphics processing then evolved to include 3D, texture mapping, etc., and the GPU architecture evolved to handle different types of data and processing flexibly, with high throughput efficiency. Because of their parallel execution behavior, GPUs were also recognized as useful for non-graphics applications.
Hence, by combining these GPUs with CPUs, heterogeneous computing came into existence.
The increasing number of transistors provided an opportunity to integrate the CPU and GPU into a single device, which also allows their memory to be arranged so that data structures can be shared directly. The limits of silicon scaling (maintaining Moore’s Law) and the driving need to improve performance and efficiency pushed this transition to heterogeneous computing, in which the various execution units are tightly integrated and share system resources. The critical part was to elevate the GPU to the level of the CPU for memory access, queuing and execution. In other words, rather than a CPU and various co-processors, these processing elements can be referred to collectively as Compute Cores (CC).
Heterogeneous computing has changed the way we look at the processor: CPU cores and GPU cores are integrated, share physical and virtual memory, and still maintain cache coherency between the processing elements.