# CIS 371 Computer Organization and Design

Unit 14: (Low) Power and Energy

CIS 371 (Martin): Power

#### **Energy & Power**

- Energy: measured in Joules or Watt-seconds
  - Total amount of energy stored/used
  - Battery life, electric bill, environmental impact
  - Instructions per Joule (car analogy: miles per gallon)
- Power: energy per unit time (measured in Watts)
  - Related to "performance" (which is also a "per unit time" metric)
  - Power impacts power supply and cooling requirements (cost)
    - Power-density (Watt/mm²): important related metric
  - Peak power vs average power
    - E.g., camera, power "spikes" when you actually take a picture
  - Joules per second (car analogy: gallons per hour)
- Two sources:
  - Dynamic power: active switching of transistors
  - Static power: leakage of transistors even while inactive
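
The relationship between the two metrics can be sketched numerically. This is a minimal illustration: the function names and the workload numbers (2 W, 10 s, 20 billion instructions) are assumptions chosen for round results, not measurements.

```python
def energy_joules(avg_power_watts, seconds):
    """Energy (J) = average power (W) * time (s)."""
    return avg_power_watts * seconds


def instructions_per_joule(instructions, joules):
    """Energy efficiency, the 'miles per gallon' of the car analogy."""
    return instructions / joules


# Hypothetical workload: 20 billion instructions in 10 s at 2 W average power.
energy = energy_joules(2.0, 10.0)
efficiency = instructions_per_joule(20e9, energy)

print(f"energy = {energy} J, efficiency = {efficiency:.3g} instructions/J")
```

Power alone says nothing about battery drain until it is multiplied by how long the work takes, which is why the two metrics are tracked separately.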

## Power/Energy Are Increasingly Important

- Battery life for mobile devices
  - Laptops, phones, cameras
- Tolerable temperature for devices without active cooling
  - Power means temperature, active cooling means cost
  - No room for a fan in a cell phone, no market for a hot cell phone
- **Electric bill** for compute/data centers
  - Pay for power twice: once in, once out (to cool)
- Environmental concerns
  - Electronics account for growing fraction of energy consumption


## Energy Data from Homework 1 (SAXPY)




#### Power Data from Homework 1 (SAXPY)



# **Reducing Dynamic Power**

- Target each component: P<sub>dynamic</sub> ≈ N \* C \* V<sup>2</sup> \* f \* A
- Reduce number of transistors (N)
  - Use fewer transistors/gates
- Reduce capacitance (C)
  - Smaller transistors (Moore's law)
- Reduce voltage (V)
  - Quadratic reduction in energy consumption!
  - But also slows transistors (transistor speed is roughly ∝ V)
- Reduce frequency (f)
  - Slower clock frequency (reduces power but not energy). Why?
- Reduce activity (A)
  - "Clock gating": disable clocks to unused parts of chip
  - Don't switch gates unnecessarily
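
One way to see why lowering only the frequency cuts power but not energy: a fixed amount of work takes a fixed number of cycles, so halving f halves the power but doubles the runtime. The sketch below uses the slide's proportional model with every constant set to 1; the chip parameters (`N`, `C`, `A`, `CYCLES`) are hypothetical values picked for illustration.

```python
def dynamic_power(n, c, v, f, a):
    """P_dynamic ~ N * C * V^2 * f * A (proportional model from the slide)."""
    return n * c * v**2 * f * a


def dynamic_energy(n, c, v, f, a, cycles):
    """Energy for a fixed number of cycles: power * runtime, runtime = cycles / f."""
    return dynamic_power(n, c, v, f, a) * (cycles / f)


# Hypothetical chip and workload parameters (illustrative only).
N, C, A, CYCLES = 1e6, 1e-15, 0.1, 1e9

p_fast = dynamic_power(N, C, 1.0, 2e9, A)           # 2 GHz at 1.0 V
p_slow = dynamic_power(N, C, 1.0, 1e9, A)           # 1 GHz at 1.0 V
e_fast = dynamic_energy(N, C, 1.0, 2e9, A, CYCLES)
e_slow = dynamic_energy(N, C, 1.0, 1e9, A, CYCLES)
e_low_v = dynamic_energy(N, C, 0.5, 1e9, A, CYCLES)  # same clock, half the voltage

print(f"power:  {p_fast:.3f} W at 2 GHz vs {p_slow:.3f} W at 1 GHz")
print(f"energy: {e_fast:.3f} J at 2 GHz vs {e_slow:.3f} J at 1 GHz")
print(f"energy at half voltage: {e_low_v:.3f} J")
```

In this model f cancels out of the energy expression, while V enters squared, so only the voltage reduction lowers the energy per task.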

#### **Dynamic Power**

- Dynamic power (P<sub>dynamic</sub>): aka switching or active power
  - Energy to switch a gate (0 to 1, 1 to 0)
  - Each gate has capacitance (C)
    - Charge stored is ∝ C \* V
    - Energy to charge/discharge a capacitor is ∝ C \* V<sup>2</sup>
    - Time to charge/discharge a capacitor is roughly ∝ 1/V
    - Result: frequency is roughly ∝ V
- $P_{dynamic} \approx N * C * V^2 * f * A$
  - N: number of transistors
  - C: capacitance per transistor (size of transistors)
  - V: voltage (supply voltage for gate)
  - f: frequency (transistor switching frequency is ∝ clock frequency)
  - A: activity factor (not all transistors may switch this cycle)
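
The formula can be reconstructed from the per-gate quantities above. This is a short sketch that treats every constant of proportionality as 1, an assumption made only to keep the algebra clean:

$$
\begin{aligned}
E_{switch} &\propto C \cdot V^2 \\
\text{switches per second} &\approx N \cdot A \cdot f \\
P_{dynamic} &= E_{switch} \times \text{switches per second} \approx N \cdot C \cdot V^2 \cdot f \cdot A
\end{aligned}
$$

Each factor in the product is therefore a separate lever for reducing dynamic power.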


#### **Static Power**

- Static power (P<sub>static</sub>): aka idle or leakage power
  - Transistors don't turn off all the way
  - Transistors "leak"
- $P_{static} \approx N * V * e^{-V_t}$
  - N: number of transistors
  - V: voltage
  - V<sub>t</sub> (threshold voltage): voltage at which transistor conducts (begins to switch)
- Switching speed vs leakage trade-off
  - freq $\propto (V - V_t)^2 / V$
  - The lower the V<sub>t</sub>:
    - Good: Faster transistors (linear)
    - Bad: Leakier transistors (exponential!)
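
The trade-off can be made concrete with the two proportional models above. This is a rough sketch, not a device model: the slide writes the leakage term as e^(-Vt) without a scale, so the `SLOPE` constant (volts per factor of e) is a hypothetical value introduced here, and the voltages are illustrative.

```python
import math


def relative_leakage(vt, slope):
    """Leakage factor e^(-Vt / slope); `slope` (in volts) is an assumed scale."""
    return math.exp(-vt / slope)


def relative_frequency(v, vt):
    """Switching speed model from the slide: freq ~ (V - Vt)^2 / V."""
    return (v - vt) ** 2 / v


V = 1.0        # supply voltage (illustrative)
SLOPE = 0.04   # hypothetical scale for the exponential, not from the slides

# Effect of lowering the threshold voltage from 0.4 V to 0.3 V.
leak_ratio = relative_leakage(0.3, SLOPE) / relative_leakage(0.4, SLOPE)
speed_ratio = relative_frequency(V, 0.3) / relative_frequency(V, 0.4)

print(f"speed improves by {speed_ratio:.2f}x, leakage grows by {leak_ratio:.1f}x")
```

Under these assumptions a modest speed gain costs a much larger increase in leakage, which is the reason for using low-V<sub>t</sub> transistors only where speed matters.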



## **Reducing Static Power**

- Target each component: P<sub>static</sub> ≈ N \* V \* e<sup>-Vt</sup>
- Reduce number of transistors (N)
  - Use fewer transistors/gates
- **Disable transistors** (also targets N)
  - "Power gating": disable power to unused parts (long latency to power up)
  - Power down units (or entire cores) not being used
- Reduce voltage (V)
  - Linear reduction in static energy consumption
  - But also slows transistors (transistor speed is roughly ∝ V)
- Dual V<sub>t</sub>: use a mixture of high and low V<sub>t</sub> transistors
  - Use slow, low-leak transistors in SRAM arrays
  - Requires extra fabrication steps (cost)
- Low-leakage transistors
  - High-K/Metal-Gates in Intel's 45nm process, "tri-gate" in Intel's 22nm


## Dynamic Voltage/Frequency Scaling

|            | Mobile Pentium III<br>"SpeedStep" | Transmeta 5400<br>"LongRun" | Intel XScale<br>(StrongARM2) |
|------------|-----------------------------------|-----------------------------|------------------------------|
| f (MHz)    | 300-1000 (step=50)                | 200-700 (step=33)           | 50-800 (step=50)             |
| V (V)      | 0.9-1.7 (step=0.1)                | 1.1-1.6 (cont)              | 0.7-1.65 (cont)              |
| High-speed | 3400 MIPS @ 34 W                  | 1600 MIPS @ 2 W             | 800 MIPS @ 0.9 W             |
| Low-power  | 1100 MIPS @ 4.5 W                 | 300 MIPS @ 0.25 W           | 62 MIPS @ 0.01 W             |

- Dynamic voltage/frequency scaling
  - Favors parallelism
- Example: Intel XScale
  - 1 GHz  $\rightarrow$  200 MHz reduces power by 30x
    - But around 5x slower
  - 5 x 200 MHz in parallel, use 1/6th the energy
  - Power is driving the trend toward multi-core
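
The arithmetic behind the XScale example can be checked directly. The sketch reads the 30x figure as a reduction in power (an assumption, but the one under which the 1/6 figure works out) and assumes the work parallelizes perfectly across cores; the variable names are illustrative.

```python
from fractions import Fraction

power_ratio = Fraction(1, 30)  # one 200 MHz core's power relative to 1 GHz
speed_ratio = Fraction(1, 5)   # one 200 MHz core's speed relative to 1 GHz
cores = 5

# Five slow cores in parallel (assumes perfectly parallel work).
total_speed = cores * speed_ratio        # matches the single fast core
total_power = cores * power_ratio
energy_ratio = total_power / total_speed  # energy per task relative to 1 GHz

# A single slow core uses the same energy per task, but takes 5x longer.
single_core_energy = power_ratio / speed_ratio

print(total_speed, total_power, energy_ratio, single_core_energy)
```

The parallel configuration recovers the original performance while keeping the energy saving, which is the sense in which voltage/frequency scaling favors parallelism.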

## **Dynamic Voltage/Frequency Scaling**

- Dynamically trade off power for performance
  - Change the voltage and frequency at runtime
  - Under control of operating system
- Recall: P<sub>dynamic</sub> ≈ N \* C \* V<sup>2</sup> \* f \* A
  - Because frequency is ∝ V...
  - P<sub>dynamic</sub> is ∝ V<sup>3</sup>
- Reduce both V and f linearly
  - Cubic decrease in dynamic power
  - Linear decrease in performance (actually sub-linear)
    - Thus, only about quadratic in energy
  - Linear decrease in static power
    - Thus, static energy can become dominant
- Newer chips can do this on a per-core basis
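
The scaling relationships above can be tabulated with a small helper. This is a sketch under simplifying assumptions: V and f are scaled by the same factor `s`, performance tracks f exactly (the slide notes the real loss is sub-linear), and V<sub>t</sub> stays fixed so static power is linear in V. The function name `dvfs_scaling` is made up for this example.

```python
def dvfs_scaling(s):
    """Relative metrics when V and f are both scaled by s (0 < s <= 1)."""
    dynamic_power = s**3   # V^2 * f, with V and f both scaled by s
    static_power = s       # linear in V
    runtime = 1 / s        # assumes performance is exactly proportional to f
    return {
        "dynamic_power": dynamic_power,
        "static_power": static_power,
        "runtime": runtime,
        "dynamic_energy": dynamic_power * runtime,  # s^2
        "static_energy": static_power * runtime,    # unchanged
    }


half = dvfs_scaling(0.5)
for name, value in half.items():
    print(f"{name}: {value}")
```

At half voltage and frequency the dynamic energy drops to a quarter while the static energy per task does not change, which is why static energy can become the dominant term.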


#### Trends in Power

|                      | 386  | 486  | Pentium | Pentium II | Pentium 4 | Core 2 | Core i7 |
|----------------------|------|------|---------|------------|-----------|--------|---------|
| Year                 | 1985 | 1989 | 1993    | 1998       | 2001      | 2006   | 2009    |
| Technology node (nm) | 1500 | 800  | 350     | 180        | 130       | 65     | 45      |
| Transistors (M)      | 0.3  | 1.2  | 3.1     | 5.5        | 42        | 291    | 731     |
| Voltage (V)          | 5    | 5    | 3.3     | 2.9        | 1.7       | 1.3    | 1.2     |
| Clock (MHz)          | 16   | 25   | 66      | 200        | 1500      | 3000   | 3300    |
| Power (W)            | 1    | 5    | 16      | 35         | 80        | 75     | 130     |
| Peak MIPS            | 6    | 25   | 132     | 600        | 4500      | 24000  | 52800   |
| MIPS/W               | 6    | 5    | 8       | 17         | 56        | 320    | 406     |

- Supply voltage decreasing over time
  - But "voltage scaling" is (perhaps) reaching its limits
- Emphasis on power starting around 2000
  - Resulting in slower frequency increases
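
The MIPS/W row follows from the power and peak MIPS rows of the table. A small sketch that recomputes it (the dictionary layout and variable names are just for illustration; the numbers are copied from the table):

```python
# chip name -> (power in W, peak MIPS), copied from the table above
chips = {
    "386": (1, 6),
    "486": (5, 25),
    "Pentium": (16, 132),
    "Pentium II": (35, 600),
    "Pentium 4": (80, 4500),
    "Core 2": (75, 24000),
    "Core i7": (130, 52800),
}

# Efficiency = peak MIPS divided by power, rounded to the nearest integer.
mips_per_watt = {name: round(mips / watts) for name, (watts, mips) in chips.items()}

for name, value in mips_per_watt.items():
    print(f"{name}: {value} MIPS/W")
```

The rounded results match the table's last row, and show efficiency staying nearly flat through the Pentium before rising sharply afterwards.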


# Processor Power Breakdown

- Power breakdown for IBM POWER4
  - Two 4-way superscalar, 2-way multi-threaded cores, 1.5MB L2
  - Big power components are L2, D\$, out-of-order logic, clock, I/O
  - Implications on out-of-order vs in-order






# Implications on Software

- Software-controlled dynamic voltage/frequency scaling
  - OS? Application?
  - Example: video decoding
    - Too high a clock frequency: wasted energy (battery life)
    - Too low a clock frequency: quality of video suffers
- Managing low-power modes
  - Don't want to "wake up" the processor every millisecond
  - Slow/fast cores: 1 slow low-energy core, N fast high-energy cores
  - "Race to sleep" versus "slow and steady" approaches
- Tuning software
  - Faster algorithms can be converted to lower-power algorithms
  - Via dynamic voltage/frequency scaling
- Exploiting parallelism
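
The "race to sleep" versus "slow and steady" choice can be explored with a toy energy model. Everything here is an assumption for illustration: the function names, the wattages, and in particular the `p_fixed` term, which stands for power drawn whenever the system is awake and which does not scale with voltage or frequency. Dynamic power is scaled as s^3 and static power as s, following the scaling rules from the dynamic voltage/frequency scaling slide.

```python
def race_to_sleep_energy(p_dyn, p_static, p_fixed, p_sleep, t_work, deadline):
    """Run at full speed for t_work seconds, then sleep until the deadline."""
    active = (p_dyn + p_static + p_fixed) * t_work
    asleep = p_sleep * (deadline - t_work)
    return active + asleep


def slow_and_steady_energy(p_dyn, p_static, p_fixed, t_work, deadline):
    """Scale V and f by s so the work finishes exactly at the deadline."""
    s = t_work / deadline
    return (p_dyn * s**3 + p_static * s + p_fixed) * deadline


# Hypothetical numbers: 1 s of full-speed work, 2 s deadline, powers in watts.
P_DYN, P_STATIC, P_SLEEP, T_WORK, DEADLINE = 1.0, 0.5, 0.05, 1.0, 2.0

# Case 1: no fixed awake power. Slowing down wins.
race_a = race_to_sleep_energy(P_DYN, P_STATIC, 0.0, P_SLEEP, T_WORK, DEADLINE)
slow_a = slow_and_steady_energy(P_DYN, P_STATIC, 0.0, T_WORK, DEADLINE)

# Case 2: 1 W of fixed awake power. Racing to sleep wins.
race_b = race_to_sleep_energy(P_DYN, P_STATIC, 1.0, P_SLEEP, T_WORK, DEADLINE)
slow_b = slow_and_steady_energy(P_DYN, P_STATIC, 1.0, T_WORK, DEADLINE)

print(f"no fixed power:   race={race_a:.2f} J, slow={slow_a:.2f} J")
print(f"1 W fixed power:  race={race_b:.2f} J, slow={slow_b:.2f} J")
```

In this model the better strategy depends on how much of the awake power cannot be scaled down, so neither approach is a universal answer.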
