## ESE3700: Circuit-Level Modeling, Design, and Optimization for Digital Systems

#### Lec 18 (April 9, 2025): Memory Overview and Periphery





- Memory
  - Overview
  - Periphery
- Project 2 is on this

#### Memory Overview





## Semiconductor Memory Classification

| Read-Write Memory (RWM): Random Access | RWM: Non-Random Access                | Non-Volatile Read-Write Memory (NVRWM) | Read-Only Memory (ROM)                 |
|----------------------------------------|---------------------------------------|----------------------------------------|----------------------------------------|
| SRAM<br>DRAM                           | FIFO<br>LIFO<br>Shift Register<br>CAM | EPROM<br>E<sup>2</sup>PROM<br>FLASH    | Mask-Programmed<br>Programmable (PROM) |





N words => N select signals: too many select signals

## Memory Architecture: Decoders



Penn ESE 3700 Spring 2025 - Li



#### **Problem: ASPECT RATIO or HEIGHT >> WIDTH**



## Latches/Register – Can Store a State

- Build register from pair of latches
- Control with non-overlapping clocks



### Memory Periphery





- Decoders
- Column Circuitry
  - Bit-line Conditioning
  - Sense Amplifiers
  - Input/Output Buffers
- Control/Timing Circuitry








#### Array Architecture

- 2<sup>n</sup> words of 2<sup>m</sup> bits each
- Good regularity: easy to design
- Very high density if good cells are used



#### Decoders










- An $n:2^n$ decoder consists of $2^n$ $n$-input AND gates
  - One needed for each row of memory
  - Build AND from NAND or NOR gates
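
As a behavioral sketch of the structure above (not a circuit netlist), each decoder output can be modeled as an n-input AND over the address bits or their complements. The function name `decode` and the LSB-first bit ordering are assumptions made for illustration.

```python
def decode(address_bits):
    """Return the 2**n one-hot word-line values for an n-bit address.

    address_bits[0] is the least significant bit (an assumption of this sketch).
    """
    n = len(address_bits)
    outputs = []
    for row in range(2 ** n):
        # Row `row` ANDs together, for each bit position, either the true
        # or the complemented address bit, depending on the row's index.
        literals = [
            bit if (row >> i) & 1 else 1 - bit
            for i, bit in enumerate(address_bits)
        ]
        outputs.append(int(all(literals)))
    return outputs


print(decode([1, 0, 1]))  # address 0b101 = 5, given LSB first
```

Exactly one output is high for any address, which is what lets each output drive one row of the array.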

Static CMOS






- For n > 4, NAND gates become slow
  - Break large gates into multiple smaller gates










#### Many of these gates are redundant

- Factor out common gates into predecoder
- Saves area
- Same path effort
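
A minimal behavioral sketch of predecoding, assuming the address is split into groups of `group_size` bits (LSB first). Each group is decoded once into shared one-hot lines, and every row then ANDs one line from each group. The name `predecode` and the grouping choice are hypothetical.

```python
from itertools import product


def predecode(address_bits, group_size=2):
    """Two-stage decode: one-hot predecode per group, then AND across groups."""
    groups = [
        address_bits[i:i + group_size]
        for i in range(0, len(address_bits), group_size)
    ]
    # Stage 1: each group of k bits becomes 2**k shared one-hot lines.
    one_hot_groups = []
    for group in groups:
        value = sum(bit << i for i, bit in enumerate(group))
        one_hot_groups.append(
            [int(value == code) for code in range(2 ** len(group))]
        )
    # Stage 2: every row ANDs one predecoded line from each group.
    # Reversing makes the least significant group vary fastest, so the
    # outputs come out in row-index order.
    rows = []
    for lines in product(*reversed(one_hot_groups)):
        rows.append(int(all(lines)))
    return rows


word_lines = predecode([1, 0, 1, 1])  # address 0b1101 = 13, given LSB first
print(word_lines.index(1))
```

The stage 1 lines are computed once and reused by every row, which is where the area saving comes from.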



#### Row Select: Precharge NAND







### Column Circuitry & Bit-line Conditioning










- Cell size accounts for most of array size
  - Reduce cell size at expense of complexity
- 6T SRAM Cell
  - Used in most commercial chips
  - Data stored in cross-coupled inverters
- **Read:**
  - Precharge BL, BL'
  - Raise WL
- **Write:**
  - Drive data onto BL, BL'
  - Raise WL





- Some circuitry is required for each column
  - Required: Bitline conditioning
    - Precharging
    - Driving input data to bitline
  - Increased speed: Sense amplifiers
  - Aspect ratio (square memory): Column multiplexing (AKA Column Decoders)
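
To make the aspect-ratio point concrete, here is a small sketch assuming 2<sup>n</sup> words of 2<sup>m</sup> bits are folded so that each physical row holds 2<sup>k</sup> words. The symbol `k` and both function names are assumptions introduced for illustration.

```python
def array_shape(n, m, k):
    """Rows and columns for 2**n words of 2**m bits with 2**k words per row."""
    rows = 2 ** (n - k)
    cols = 2 ** (m + k)
    return rows, cols


def squarest_k(n, m):
    """Pick the column-mux exponent k that makes rows and cols closest."""
    return min(range(n + 1), key=lambda k: abs((n - k) - (m + k)))


n, m = 10, 3                      # 1024 words of 8 bits
print(array_shape(n, m, 0))       # no column multiplexing: tall and narrow
k = squarest_k(n, m)
print(k, array_shape(n, m, k))    # folded array, closer to square
```

With column multiplexing, n - k address bits feed the row decoder and the remaining k bits select one word out of each row.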








#### Precharge bitlines high before reads










- What if precharged to $V_{dd}/2$?
  - Pros: reduces read upset
  - Challenge: generating a $V_{dd}/2$ voltage on chip

## Column Capacitance Consequence

- Preclass 1: What is the capacitance of a bitline?
  - $W_{access}$ (pass transistor size), $d$ rows, $\gamma = C_{diff0} / C_0$
- Preclass 2: What is the delay for the cell to drive the bitline during a read?
  - $W_{buf}$ (inverter size in cell), $R_0$
- Preclass 3: $W_{access} = W_{buf} = 1$, $\gamma = 1/2$
  - Delay for $d = 32$, $512$?
- **Conclude:** Can't size up cell $\rightarrow$ driving bitline will be slow
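
One way to work the preclass questions above is a first-order RC estimate. This sketch assumes each of the $d$ cells loads the bitline with one access-transistor diffusion ($\gamma W_{access} C_0$), that wire and sense-amplifier capacitance are ignored, and that the read current flows through one cell inverter transistor ($R_0 / W_{buf}$) in series with the access transistor ($R_0 / W_{access}$). These modeling choices are assumptions and may differ from the intended preclass solution.

```python
def bitline_capacitance(d, w_access, gamma, c0=1.0):
    """Diffusion load of d access transistors on one bitline (units of C_0)."""
    return d * gamma * w_access * c0


def bitline_read_delay(d, w_access, w_buf, gamma, r0=1.0, c0=1.0):
    """First-order RC delay: cell pull-down in series with the access device."""
    r_path = r0 / w_buf + r0 / w_access
    return r_path * bitline_capacitance(d, w_access, gamma, c0)


# Preclass 3 values, with delay reported in units of R_0 * C_0.
for d in (32, 512):
    print(d, bitline_read_delay(d, w_access=1, w_buf=1, gamma=0.5))
```

Under these assumptions the delay grows linearly with the number of rows, and widening the access transistor adds as much bitline capacitance as it removes resistance.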



## Sense Amplifiers

- Bitlines have many cells attached
  - Ex: 32-kbit SRAM has 128 rows x 256 cols
  - 128 cells on each bitline
- $t_{pd} \propto (C/I) \Delta V$
  - Even with shared diffusion contacts, 64C of diffusion capacitance (big C)
  - Discharged slowly through small transistors in each memory cell (small I)
- Sense amplifiers are triggered on small voltage swing  $(\Delta V)$
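
The benefit of a small swing follows directly from $t_{pd} \propto (C/I) \Delta V$. The sketch below assumes a constant cell current, and the capacitance, current, and voltage values are hypothetical numbers chosen only for illustration.

```python
def discharge_time(c_bitline, i_cell, delta_v):
    """Time for a constant cell current to move the bitline by delta_v."""
    return c_bitline * delta_v / i_cell


# Hypothetical numbers for illustration only.
C_BL = 100e-15   # 100 fF bitline capacitance
I_CELL = 50e-6   # 50 uA cell read current
full_swing = discharge_time(C_BL, I_CELL, delta_v=1.0)
small_swing = discharge_time(C_BL, I_CELL, delta_v=0.1)
print(full_swing / small_swing)  # speedup from sensing a 10x smaller swing
```

Since C and I are fixed by the array and the cell, shrinking the swing the sense amplifier needs is the remaining lever on read time.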





- Differential pair requires no clock
- But always dissipates static power





- Clocked sense amp saves power
- Requires sense\_clk after enough bitline swing
- Isolation transistors cut off large bitline capacitance



## Word Line Capacitance

- Preclass 4: What is the capacitance of the word line (row)?
  - $W_{access}$: transistor width of column device
  - $w$ columns
  - $\gamma = C_{diff0} / C_0$
- Preclass 5: Delay driving word line?
  - $W_{wldrive}$: drive inverter size
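
A hedged first-order sketch for the word line questions above. It assumes each column places `gates_per_cell` access-transistor gates of capacitance $W_{access} C_0$ on the word line (1 by default, 2 for a differential 6T cell), that the driver has resistance $R_0 / W_{wldrive}$, and that the driver's own output diffusion is $\gamma W_{wldrive} C_0$. Wire resistance and capacitance are ignored. All of these are assumptions and may not match the intended preclass model.

```python
def wordline_capacitance(w, w_access, c0=1.0, gates_per_cell=1):
    """Total access-transistor gate load across w columns (units of C_0)."""
    return w * gates_per_cell * w_access * c0


def wordline_delay(w, w_access, w_wldrive, gamma,
                   r0=1.0, c0=1.0, gates_per_cell=1):
    """RC delay of the driver charging its own diffusion plus the word line."""
    r_drive = r0 / w_wldrive
    c_self = gamma * w_wldrive * c0      # assumed driver self-load
    c_line = wordline_capacitance(w, w_access, c0, gates_per_cell)
    return r_drive * (c_self + c_line)


print(wordline_delay(w=64, w_access=1, w_wldrive=8, gamma=0.5))
```

Unlike the cell driving the bitline, the word line driver sits outside the array, so it can be sized up to cut this delay.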



Column Drivers: Memory Bank






## Tristate Buffer

- Typically used for signals traveling on shared wires, e.g., a bus
- Ideally, all devices connected to a bus should be disconnected except for the active device reading from or writing to the bus
- Use the high-impedance state to simulate disconnecting



| Input | En | Output |
|-------|----|--------|
| 0     | 0  | Z      |
| 1     | 0  | Z      |
| 0     | 1  | 0      |
| 1     | 1  | 1      |
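
The truth table can be modeled behaviorally as follows. This is a minimal sketch: the string `"Z"` stands in for the high-impedance state, and `resolve_bus` is a hypothetical helper showing why only one driver should be enabled at a time.

```python
def tristate(data, enable):
    """Return the driven value when enabled, else 'Z' (high impedance)."""
    return data if enable else "Z"


def resolve_bus(drivers):
    """Combine (data, enable) pairs that share one wire."""
    driven = [tristate(data, en) for data, en in drivers if en]
    if not driven:
        return "Z"  # nobody drives: the bus floats
    if len(set(driven)) > 1:
        raise ValueError("bus contention: enabled drivers disagree")
    return driven[0]


# Three devices on one bus; only the second is enabled.
print(resolve_bus([(1, 0), (0, 1), (1, 0)]))
```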







Memory with column decoder






- Memory for compact state storage
- Share circuitry across many bits
  - Minimize area per bit  $\rightarrow$  maximize density
- Aggressively use:
  - Pass transistors, Ratioing
  - Precharge, Amplifiers to keep area down



#### Project 2 out

- Work in teams of up to two
- Final report due Wednesday 4/30
- Wednesday 4/16: Midterm 2 (next week)
  - 1:45pm-3:45pm **in class**
  - Midterm 2 Review session (4/16) in class
  - Lectures 11-18
  - Closed note, calculator allowed
  - All old exams online
    - 2015-2024

#### Do review your preclass!!



Prof. André DeHon (University of Pennsylvania)
 Prof. Tania Khanna (University of Pennsylvania)



## Additional Reading (Optional)

#### **ROM** Memories
























































## Serial Access Memories

Serial access memories do not use an address

- Serial In Parallel Out (SIPO)
- Parallel In Serial Out (PISO)
- Shift Registers
- Queues (FIFO, LIFO)

![](_page_65_Picture_0.jpeg)

## Serial In Parallel Out

- 1-bit shift register reads in serial data
- After N steps, presents N-bit parallel output
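
A behavioral sketch of serial-in parallel-out operation. The function name `sipo` and the choice to enter new bits at position 0 are assumptions made for illustration.

```python
def sipo(serial_bits):
    """Shift bits in one per step; return the register after the last step."""
    register = [0] * len(serial_bits)
    for bit in serial_bits:
        # The new bit enters at position 0; older bits move one stage down.
        register = [bit] + register[:-1]
    return register


print(sipo([1, 0, 1, 1]))  # first bit in ends up in the last stage
```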

![](_page_65_Figure_3.jpeg)

![](_page_66_Picture_0.jpeg)

## Parallel In Serial Out

- Load all N bits in parallel when shift = 0
- Then shift one bit out per cycle
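
A matching behavioral sketch for parallel-in serial-out. The class name `PISO`, the `clock` method, and the choice to shift out from the last stage first are assumptions made for illustration.

```python
class PISO:
    """N-bit register that loads in parallel and shifts out serially."""

    def __init__(self, n):
        self.register = [0] * n

    def clock(self, shift, parallel_in=None):
        """One clock edge: load when shift == 0, otherwise shift one bit out."""
        if shift == 0:
            self.register = list(parallel_in)
            return None
        out = self.register[-1]
        # Every stage takes its neighbor's value; a 0 fills the first stage.
        self.register = [0] + self.register[:-1]
        return out


piso = PISO(4)
piso.clock(shift=0, parallel_in=[1, 0, 1, 1])   # parallel load
serial_out = [piso.clock(shift=1) for _ in range(4)]
print(serial_out)
```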

![](_page_66_Figure_4.jpeg)

![](_page_67_Picture_0.jpeg)

- *Shift registers* store and delay data
- Simple design: cascade of registers

![](_page_67_Figure_2.jpeg)

![](_page_68_Picture_0.jpeg)

- Flip-flops aren't very area-efficient
- For large shift registers, keep data in SRAM instead
- Move the read/write pointers within the RAM rather than moving the data
  - Initialize read address to first entry, write address to last

![](_page_68_Figure_6.jpeg)

![](_page_69_Picture_0.jpeg)

- *Queues* allow data to be read and written at different rates.
- Read and write each use their own clock, data
- Queue indicates whether it is full or empty
- Build with SRAM and read/write counters (pointers) storing read/write address

![](_page_69_Figure_5.jpeg)

![](_page_70_Picture_0.jpeg)

- First In First Out (FIFO)
  - Initialize read and write pointers to first element
  - Queue is EMPTY
  - On write, increment write pointer
  - If write almost catches read, Queue is FULL
  - On read, increment read pointer
  - If read catches write, Queue is EMPTY
- Last In First Out (LIFO)
  - Also called a *stack*
  - Use a single *stack pointer* for read and write
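
The FIFO pointer rules above can be sketched as a circular buffer. This is a minimal single-clock model, assuming one slot is left unused so that "write almost catches read" means FULL and equal pointers mean EMPTY. The class name `Fifo` and its method names are hypothetical, and a Python list stands in for the SRAM.

```python
class Fifo:
    """Queue built from a memory array plus read and write pointers."""

    def __init__(self, depth):
        self.mem = [None] * depth   # stands in for the SRAM array
        self.read_ptr = 0
        self.write_ptr = 0          # equal pointers mean EMPTY

    def empty(self):
        return self.read_ptr == self.write_ptr

    def full(self):
        # FULL when the write pointer is one step behind the read pointer.
        return (self.write_ptr + 1) % len(self.mem) == self.read_ptr

    def write(self, value):
        if self.full():
            raise OverflowError("queue is FULL")
        self.mem[self.write_ptr] = value
        self.write_ptr = (self.write_ptr + 1) % len(self.mem)

    def read(self):
        if self.empty():
            raise IndexError("queue is EMPTY")
        value = self.mem[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.mem)
        return value


q = Fifo(4)
for item in (1, 2, 3):
    q.write(item)
print(q.full(), q.read(), q.full())
```

Only the pointers move; the stored data stays in place. A LIFO would replace the two pointers with a single stack pointer that increments on write and decrements on read.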