Benjamin Gojman

Email: First letter of first name full last name at seas dot upenn dot edu
Moore 315
3330 Walnut St.
Philadelphia, PA 19104

Research

Conferences

  • GROK-INT: Generating Real On-chip Knowledge for Interconnect Delays Using Timing Extraction
    Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '14), May 11-13, 2014

    With continued scaling, all transistors are no longer created equal. The delay of a length 4 horizontal routing segment at coordinates (23,17) will differ from one at (12,14) in the same FPGA and from the same segment in another FPGA. The vendor tools give conservative values for these delays, but knowing exactly what these delays are can be invaluable. In this paper, we show how to obtain this information, inexpensively, using only components that already exist on the FPGA (configurable PLLs, registers, logic, and interconnect). The techniques we present are general and can be used to measure the delays of any resource on any FPGA with these components. We provide general algorithms for identifying the set of useful delay components, the set of measurements necessary to compute these delay components, and the calculations necessary to perform the computation. We demonstrate our techniques on the interconnect for an Altera Cyclone-III (65nm). As a result, we are able to quantify over a 100ps spread in delays for nominally identical routing segments on a single FPGA.

  • GROK-LAB: Generating Real On-chip Knowledge for Intra-cluster Delays using Timing Extraction
    Benjamin Gojman, Sirisha Nalmela, Nikil Mehta, Nicholas Howarth and André DeHon
    21st ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '13), 11-13 February 2013

    Timing Extraction identifies the delay of fine-grained components within an FPGA. From these computed delays, the delay of any path can be calculated. Moreover, a comparison of the fine-grained delays allows a detailed understanding of the amount and type of process variation that exists in the FPGA. To obtain these delays, Timing Extraction measures, using only resources already available in the FPGA, the delay of a small subset of the total paths in the FPGA. We apply Timing Extraction to the Logic Array Block (LAB) on an Altera Cyclone III FPGA to obtain a view of the delay down to near individual LUT granularity, characterizing components with delays on the order of a few hundred picoseconds with a resolution of ±3.2 ps. This information reveals that the 65nm process used has, on average, random variation of σ/μ=4.0% with components having an average maximum spread of 83ps. Timing Extraction also shows that as VDD decreases from 1.2V to 0.9V in a Cyclone IV 60nm FPGA, paths slow down and variation increases from σ/μ=4.3% to σ/μ=5.8%, a clear indication that lowering VDD magnifies the impact of random variation.

  • VMATCH: Using Logical Variation to Counteract Physical Variation in Bottom-Up, Nanoscale Systems
    IEEE International Conference on Field-Programmable Technology (FPT '09), 09-11 December 2009

    Nanowire building blocks provide a promising path to small feature size and thus the ability to more densely pack logic. However, the small feature size and novel, bottom-up manufacturing process will exhibit extreme variation and produce many devices that operate outside acceptable operating ranges. One-mapping-fits-all, prefabrication assignment of logical functions to physical transistors that exhibit high threshold variation will not work: combining the wide range of physical variation in transistor threshold voltage with the wide range of fanouts in the design produces an unworkably large composite range of possible delays. Nonetheless, by carefully matching the fanout of each net to the physical threshold voltages of devices after fabrication, it is possible to reduce the net range of path delays sufficiently to achieve high system yield. By adding a modest amount of extra resources, we achieve 100% yield for systems built out of devices with 38% variation, the ITRS prediction for threshold variation in 5 nm transistors. Moreover, for these systems, we maintain delay, energy and area close to the variation-free nominal case.

  • 3D Nanowire-Based Programmable Logic
    Benjamin Gojman, Raphael Rubin, Concetta Pilotto, Tetsufumi Tanamoto and André DeHon
    International Conference on Nano-Networks (NanoNet '06), 14-16 September 2006
    Best Paper

    In nanowire-based logic, the semiconducting material (e.g., Si, GaN, SiGe) is grown into individual nanowires rather than being part of the substrate. This offers us the opportunity to stack multiple layers of nanowires to create a three-dimensional logic structure which has high quality semiconductors in all vertical layers. The authors detail a feasible three-dimensional programmable logic architecture which can plausibly be realized from layers of semiconducting nanowires, making only modest assumptions about the control and placement of individual nanowires in the assembly. This shows a natural path for continuing to scale areal logic density once nanowire pitches approach fundamental limits. The authors show that the three-dimensional systems are volumetrically efficient, with the surface area reducing roughly in proportion to the number of vertical layers. The authors further show that, on average, delay is reduced 18% from compact layout in three dimensions. For only a 20% area impact, the authors show how to avoid adding any manufacturing steps to physically isolate portions of nanowire layers.

  • Analysis of a Mask-based Nanowire Decoder
    Eric Rachlin, John E. Savage and Benjamin Gojman
    IEEE Computer Society Annual Symposium on VLSI (ISVLSI '05), 11-12 May 2005

    A key challenge facing nanotechnologies will be controlling nanoarrays, two orthogonal sets of nanowires that form a crossbar, using a moderate number of mesoscale wires. Three methods have been proposed to use mesoscale wires to control individual nanowires. The first is based on nanowire differentiation during manufacture, the second makes random doped connections between nanowires and mesoscale wires, and the third, a mask-based approach, interposes high-K dielectric regions between nanowires and mesoscale wires. All three addressing schemes involve a stochastic step in their implementation. In this paper, we analyze the mask-based approach and show that a large number of mesoscale control wires is necessary for its realization.

  • Decoding of Stochastically Assembled Nanoarrays
    Benjamin Gojman, Eric Rachlin and John E. Savage
    IEEE Computer Society Annual Symposium on VLSI (ISVLSI '04), 19-20 Feb 2004

    A key challenge facing nanotechnologies is controlling the uncertainty introduced by stochastic self-assembly. In this paper, we explore architectural and manufacturing strategies to cope with this uncertainty when assembling nanoarrays, crossbars composed of two orthogonal sets of coded parallel nanowires. Because the encodings of nanowires that are assembled into a nanoarray cannot be predicted in advance, a discovery process is needed and specialized decoding circuitry must be employed. We have developed a probabilistic method of analysis so that various design strategies can be evaluated.

Journals

  • GROK-LAB: Generating Real On-chip Knowledge for Intra-cluster Delays using Timing Extraction
    Benjamin Gojman, Sirisha Nalmela, Nikil Mehta, Nicholas Howarth and André DeHon
    To appear in ACM Transactions on Reconfigurable Technology and Systems (TRETS)

    Timing Extraction identifies the delay of fine-grained components within an FPGA. From these computed delays, the delay of any path can be calculated. Moreover, a comparison of the fine-grained delays allows a detailed understanding of the amount and type of process variation that exists in the FPGA. To obtain these delays, Timing Extraction measures, using only resources already available in the FPGA, the delay of a small subset of the total paths in the FPGA. We apply Timing Extraction to the Logic Array Block (LAB) on an Altera Cyclone III FPGA to obtain a view of the delay down to near individual LUT SRAM cell granularity, characterizing components with delays on the order of tens to a few hundred picoseconds with a resolution of ±3.2 ps, matching the expected error bounds. This information reveals that the 65 nm process used has, on average, random variation of σ/μ = 4.0% with components having an average maximum spread of 83 ps. Timing Extraction also shows that as VDD decreases from 1.2 V to 0.9 V in a Cyclone IV 60 nm FPGA, paths slow down and variation increases from σ/μ = 4.3% to σ/μ = 5.8%, a clear indication that lowering VDD magnifies the impact of random variation.

  • Crystals and Snowflakes: Building Computation from Nanowire Crossbars
    IEEE Computer, Volume 44, Issue 2, February, 2011

    Suitable architectures and paradigm shifts in assembly and usage models will make it possible to exploit the compactness and energy benefits of single-nanometer dimension devices and allow extending these structures into the third dimension without depending on top-down lithography to define the smallest feature sizes in a system.

  • Inversion Schemes for Sublithographic Programmable Logic Arrays
    IET Computers and Digital Techniques, Volume 3, Number 6, November, 2009.

    A programmable logic array (PLA) needs its inputs available in both the positive and negative polarities. In lithographic-scale VLSI PLAs, programmable array logics (PALs) and programmable logic devices (PLDs), a buffer and inverter at the PLA input typically produce both polarities from a single polarity input. However, the extreme regularity required for sublithographic designs has driven nanoscale architectures to consider alternate solutions. Consequently, the authors compare three schemes: one based on producing both polarities in a restoration stage (selective inversion), one based on a local inversion stage and one based on a full dual-rail logic implementation. The authors develop a mapping flow for the dual-rail logic and quantify its cost in both logical product terms and physical implementation area and also develop area and timing models for all three schemes. Mapping benchmarks from the Toronto 20 set, the authors are able to show that the local inversion scheme is faster (less than one-fifth the latency), lower energy (one-half the energy) and comparable size to the selective inversion scheme and faster (less than half the latency), smaller (one-third of the area) and lower energy (one-ninth the energy) than the dual-rail scheme.

  • Evaluation of Design Strategies for Stochastically Assembled Nanoarray Memories
    Benjamin Gojman, Eric Rachlin and John E. Savage
    ACM Journal on Emerging Technologies in Computing Systems (JETC), Volume 1 Issue 2, July 2005

    A key challenge facing nanotechnologies is learning to control uncertainty introduced by stochastic self-assembly. In this article, we explore architectural and manufacturing strategies to cope with this uncertainty when assembling nanoarrays, crossbars composed of two orthogonal sets of parallel nanowires (NWs) that are differentiated at their time of manufacture. NW deposition is a stochastic process and the NW encodings present in an array cannot be known in advance. We explore the reliable construction of memories from stochastically assembled arrays. This is accomplished by describing several families of NW encodings and developing strategies to map external binary addresses onto internal NW encodings using programmable circuitry. We explore a variety of different mapping strategies and develop probabilistic methods of analysis. This is the first article that makes clear the wide range of choices that are available.

Workshops

  • Techniques for Fault Reduction in Out-of-Order Microprocessors
    International Workshop on Logic and Synthesis (IWLS '05), 8-10 June 2005

    This paper addresses the issue of reducing transient faults that affect instructions while they are in the instruction queue waiting to be executed. Previous work has shown that for an in-order processor, squashing instructions triggered by a cache miss can reduce the number of transient faults. This paper shows that for an out-of-order processor, reducing the size of the instruction queue can have a bigger impact than more adaptive techniques such as fetch halting. Ongoing work will explore more effective techniques for selective fetch halting to provide a reduction in faults committed while having a minimal impact on performance.

Edited Book Chapters

  • Component-Specific Mapping for Low-Power Operation in the Presence of Variation and Aging
    Low-Power Variation-Tolerant Design in Nanometer Silicon
    Editors: Swarup Bhunia and Saibal Mukhopadhyay, Springer US, pp. 381-432, 2011

    Traditional solutions to variation and aging cost energy. Adding static margins to tolerate high device variance and potential device degradation prevents aggressive voltage scaling to reduce energy. Post-fabrication configuration, as we have in FPGAs, provides an opportunity to avoid the high costs of static margins. Rather than assuming worst-case device characteristics, we can deploy devices based on their fabricated or aged characteristics. This allows us to place the high-speed/leaky devices as needed on critical paths and slower/less-leaky devices on non-critical paths. As a result, it becomes possible to meet system timing requirements at lower voltages than conservative margins. To exploit this post-fabrication configurability, we must customize the assignment of logical functions to resources based on the resource characteristics of a particular component after it has been fabricated and the resource characteristics have been determined; that is, component-specific mapping. When we perform this component-specific mapping, we can accommodate extremely high defect rates (e.g., 10%), high variation (e.g., σ_Vt = 38%), as well as lifetime aging effects with low overhead. As the magnitude of aging effects increases, the mapping of functions to resources becomes an adaptive process that is continually refined in-system, throughout the lifetime of the component.