After recently giving a presentation as part of my PhD, I am left with a large swath of data. This is data that, to anyone in the field, is not particularly surprising, but is also damn near impossible to find gathered in one place. If you're not studying Computer Architecture, you might find these trends interesting.
The first thing to cover is Moore's Law, a self-fulfilling trend in which the number of transistors on a die (the bit of silicon inside a computer processor package) doubles roughly every two years. Most of the following data comes from CPU DB by Stanford's VLSI group. Keep in mind that the following are log plots, which is why the trends look linear.
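To get a feel for what that doubling rate implies, here is a quick back-of-the-envelope projection. It's a sketch, not CPU DB data: the starting point is the Intel 4004 (1971, roughly 2,300 transistors), and it assumes the commonly quoted two-year doubling period.

```python
# Back-of-the-envelope Moore's Law projection. Illustrative only:
# starts from the Intel 4004 (~2,300 transistors in 1971) and assumes
# a clean doubling every two years.
def transistors(year, base_year=1971, base_count=2300, doubling_years=2):
    """Project transistor count for a given year under exponential doubling."""
    return base_count * 2 ** ((year - base_year) / doubling_years)

# 17 doublings between 1971 and 2005:
print(f"{transistors(2005):,.0f}")  # 301,465,600
```

That lands in the hundreds of millions, which is the right order of magnitude for high-end chips of the mid-2000s.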
From the formation of the industry until about 2005, this also mapped to an increase in performance. In 2005, the industry encountered three different barriers.
The first barrier is known as the Power Wall. Before 2005, the primary driver of the performance trend was frequency scaling. As the number of transistors on a die increased, they also got smaller, giving them a higher maximum switching frequency. Increasing the switching frequency of the whole chip yielded equivalent increases in performance.
However, as frequency scaled up, so did power, until the chip was producing more heat than a fan-cooled heat sink could disperse. As a chip heats up, the maximum switching frequency of its transistors drops. Heat also adds wear and tear by exacerbating electromigration: effectively, electrons in a wire pick up metal atoms in one place and deposit them in another, much like water erodes away rocks in a river. So to increase the frequency further, we would have to switch to some other cooling medium, like water, which is ultimately uneconomic. Also, the formation of the mobile market with the introduction of the iPhone in 2007 dramatically increased the demand for low-power processors.
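The link between frequency and power falls out of the standard first-order model for dynamic CMOS power, P ≈ α·C·V²·f. A minimal sketch (the capacitance and voltage figures below are illustrative, not measurements from any real chip):

```python
# First-order dynamic CMOS power model: P ~ activity * C * V^2 * f.
# The constants here are illustrative, not taken from a real processor.
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """Dynamic switching power in watts (C in farads, V in volts, f in Hz)."""
    return activity * capacitance * voltage ** 2 * frequency

base = dynamic_power(capacitance=1e-9, voltage=1.2, frequency=2e9)
# Higher frequencies typically require higher voltage to keep transistors
# switching reliably, so power grows faster than linearly with frequency.
faster = dynamic_power(capacitance=1e-9, voltage=1.4, frequency=4e9)
print(f"{faster / base:.2f}x")  # 2.72x the power for 2x the frequency
```

The quadratic voltage term is what makes the wall so steep: doubling frequency alone doubles power, but the voltage bump needed to sustain it pushes the cost well past 2x.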
So, to keep scaling performance while holding frequency, and therefore power, steady, the industry turned to parallelism: increasing the number of cores on a die and the number of assembly instructions each core could schedule per clock cycle (instruction-level parallelism, or ILP). The ILP graph data is from Wikipedia's Instructions Per Second page.
However, according to Amdahl's Law, the speedup attainable from parallelism depends on the type of workload. If the workload isn't parallelizable, then extra parallelism doesn't help. Worse, for any workload there is an asymptotic upper bound on the attainable speedup, set by the fraction of the work that must run sequentially. This limit is known as the ILP Wall. Jyotsna Sabarinathan at the University of Texas did a study of the available parallelism in a few workloads in 1999.
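Amdahl's Law itself is a one-line formula: if a fraction p of the work can be parallelized across n processors, the overall speedup is 1 / ((1 − p) + p/n). A quick sketch of what that bound looks like in practice:

```python
# Amdahl's Law: speedup from parallelizing fraction p of a workload
# across n processors. The sequential fraction (1 - p) caps the gain.
def amdahl_speedup(p, n):
    """Overall speedup given parallel fraction p (0..1) and n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelizable, the speedup can never
# exceed 1 / (1 - 0.95) = 20x, no matter how many cores you add.
for n in (2, 16, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

Note how quickly the returns diminish: going from 16 to 1024 cores buys barely a 2x improvement when 5% of the work is sequential.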
The third barrier is called the Memory Wall. The throughput of memory systems is increasing at a much slower rate than the throughput of computing systems. This creates a bottleneck between the memory hierarchy and the computing fabric that slows the whole system. This data comes from the STREAM Benchmark by John D. McCalpin.
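STREAM measures sustained memory bandwidth using a handful of simple array kernels; the best known is the "triad", a[i] = b[i] + scalar·c[i]. The real benchmark is written in C and Fortran, so the NumPy sketch below is only an illustration of the idea, not a calibrated measurement:

```python
# A minimal sketch of the STREAM "triad" kernel: a = b + scalar * c.
# Illustrative only; the real STREAM benchmark is C/Fortran and controls
# for caching, timer resolution, and compiler effects.
import time
import numpy as np

N = 10_000_000          # ~80 MB per array, large enough to exceed cache
b = np.ones(N)
c = np.ones(N)
scalar = 3.0

start = time.perf_counter()
a = b + scalar * c      # the triad kernel
elapsed = time.perf_counter() - start

# Triad touches three arrays of 8-byte doubles: read b, read c, write a.
bytes_moved = 3 * N * 8
print(f"{bytes_moved / elapsed / 1e9:.1f} GB/s")
```

The point of kernels this simple is that they do almost no arithmetic per byte moved, so the measured rate reflects the memory system rather than the compute fabric.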