Duke Hound
Statistical Learning for Straggler Diagnosis

Statistical machine learning framework for diagnosing performance stragglers from datacenter traces. The analysis is deployed atop Spark for distributed computation and is demonstrated on a production Google datacenter and a Lenovo experimental system.

[Zip] Hound Repository

The analysis framework accompanies the paper "Hound: Causal learning for datacenter-scale straggler diagnosis" in the Proceedings of the International Conference on Measurement and Modeling of Computer Systems [SIGMETRICS'18].



Duke ActionBench
Mobile Benchmarks for Gem5

ActionBench provides APK files of the mobile benchmarks in an ISPASS 2016 paper. These files can be placed in a mounted Gem5 image and can be installed inside the simulator. The repository includes benchmark source code, written in Java, and Gem5 simulation scripts.

[GitHub] ActionBench Repository
[Zip] ActionBench Repository

The benchmark suite accompanies the paper "Evaluating Asymmetric Multiprocessing for Mobile Applications" in the proceedings of the International Symposium on Performance Analysis of Systems and Software [ISPASS'16].



Duke DSM
Datacenter Simulation Methodologies

DSM is a tutorial on datacenter simulation methodologies. In an era of big data, datacenters comprise the essential infrastructure for cloud computing. Yet simulation and evaluation methodologies remain a challenge as computer architects seek to improve datacenter performance and efficiency. This tutorial demonstrates the tools for datacenter research and walks participants through the approach taken at Duke. At the end of the tutorial, participants will be able to (1) deploy a full-system, cycle-accurate simulator, (2) simulate datacenter workloads, and (3) explore new design spaces.

[Website] ISCA 2015 Tutorial Webpage
[Website] MICRO 2014 Tutorial Webpage
[Lecture] Datacenter design and management



Harvard CORE
Comprehensive Optimization via Regression Estimates

CORE is a collection of example R scripts that construct microarchitectural performance and power regression models. These models are based on restricted cubic splines. The derivation process includes correlation, association, clustering, and significance analyses. The current scripts illustrate model construction for out-of-order, superscalar architectures using data from the IBM Turandot/PowerTimer simulation infrastructure, which simulates a POWER4/5-like architecture.

The code implements statistical techniques for exploratory data analysis (correlation, association, clustering, significance testing) using the open-source R statistical computing package. For R downloads and installation instructions, refer to the project website.
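
For readers unfamiliar with restricted cubic splines, the following is a minimal Python sketch of the basis such models are built on. It is not taken from the CORE scripts, which are written in R. It assumes the common truncated-power formulation, in which k knots yield one linear term plus k-2 nonlinear terms and the fitted curve is constrained to be linear beyond the outermost knots. The function names, the knot locations, and the "pipeline depth" predictor are hypothetical and chosen only for illustration.

```python
def rcs_basis(x, knots):
    """Return one row of a restricted cubic spline basis for scalar x.

    Assumed formulation: truncated-power terms combined so that the
    spline is linear below the first knot and above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least three knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise zero.
        return u ** 3 if u > 0 else 0.0

    scale = (t[-1] - t[0]) ** 2   # keeps nonlinear terms on the scale of x
    gap = t[k - 1] - t[k - 2]     # distance between the last two knots
    row = [x]                     # the linear term
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[k - 2]) * (t[k - 1] - t[j]) / gap
            + cube_plus(x - t[k - 1]) * (t[k - 2] - t[j]) / gap
        )
        row.append(term / scale)
    return row


def rcs_design_matrix(xs, knots):
    """Stack basis rows for many observations (intercept not included)."""
    return [rcs_basis(x, knots) for x in xs]


if __name__ == "__main__":
    # Hypothetical predictor values, e.g. pipeline depth in FO4 delays.
    depths = [9, 12, 15, 18, 21, 24, 27, 30, 33, 36]
    for row in rcs_design_matrix(depths, knots=[12, 21, 33]):
        print(row)
```

The resulting columns can be passed to any ordinary least-squares routine alongside the other predictors. Keeping the tails linear limits the wild extrapolation that unrestricted cubic terms tend to produce at the edges of the sampled design space.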

[Download] Data for formulating regression models
[Download] Code for data analysis and regression modeling
[Website] The R Project for Statistical Computing

The data and code accompany the paper "Accurate and efficient regression modeling for microarchitectural performance and power prediction" in the proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems [ASPLOS'06]. For a detailed explanation of the code and analysis, please refer to the tutorial [IEEE'07]. Further information may be found in the technical report that led up to the ASPLOS 2006 paper [TR'06].



Berkeley OSKI
Optimized Sparse Kernel Interface

The Optimized Sparse Kernel Interface (OSKI) Library is a collection of low-level C primitives that provide automatically tuned computational kernels on sparse matrices, for use in solver libraries and applications. OSKI has a BLAS-style interface, providing basic kernels like sparse matrix-vector multiply and sparse triangular solve, among others.
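
To make the two kernels named above concrete, here is a minimal reference sketch in Python of sparse matrix-vector multiply and sparse lower-triangular solve. It is not the OSKI API, which is a C library, and it performs no automatic tuning. The compressed sparse row (CSR) layout and the function names are assumptions made for illustration.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A * x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each nonzero
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[p] * x[col_idx[p]]
        y[i] = acc
    return y


def csr_lower_solve(values, col_idx, row_ptr, b):
    """Solve L * x = b by forward substitution, L lower triangular in CSR."""
    n = len(row_ptr) - 1
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            if j == i:
                diag = values[p]
            elif j < i:
                acc -= values[p] * x[j]   # x[j] is already known
            else:
                raise ValueError("matrix is not lower triangular")
        if diag is None or diag == 0:
            raise ValueError("missing or zero diagonal in row %d" % i)
        x[i] = acc / diag
    return x


if __name__ == "__main__":
    # L = [[2, 0, 0],
    #      [1, 3, 0],
    #      [0, 4, 5]]
    values = [2.0, 1.0, 3.0, 4.0, 5.0]
    col_idx = [0, 0, 1, 1, 2]
    row_ptr = [0, 1, 3, 5]
    b = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
    print(b)
    print(csr_lower_solve(values, col_idx, row_ptr, b))
```

Both loops are dominated by irregular, indirect memory accesses through col_idx, which is why the best data structure and loop ordering depend on the matrix and the machine. A tuned library selects these at run time, whereas this sketch fixes one straightforward choice.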

[Website] BeBOP Optimized Sparse Kernel Interface