ESE 302 Home Page


Instructor: Tony E. Smith
274 Towne (898-9647)
tesmith@seas.upenn.edu
Office Hours: By Appointment

Teaching Assistant: Yuxi Song
songyuxi@seas.upenn.edu
Office Hours: Wednesdays, 1-2 PM, Towne Library (Group Study Area)



ANNOUNCEMENTS

TABLE OF CONTENTS

  • Course Description
  • Prerequisites
  • Required Materials
  • Recommended Materials
  • Course Topics
  • Course Grading
  • Time and Location
  • CETS Labs
  • Lecture Schedule
  • Homework Assignments
  • Homework Policy
  • Practice Exam Problems
  • Exam Policy
  • Exam Solutions
  • Class Data Sets
  • Regression Material
  • Project Material
  • Example Projects
  • Lecture Slides


    COURSE DESCRIPTION

    This course builds on ESE 301 (Engineering Probability), and introduces students to the basic methods of statistical estimation, hypothesis testing, and regression. The emphasis is on practical applications of these tools, including the analysis of a variety of real-world data sets using standard statistical software. The capstone of the course is a small-team project, typically involving pairs of individuals. Each team is expected to formulate a problem of interest, gather relevant data pertaining to the problem, and analyze this data using multiple regression techniques. The project culminates in a written report that is designed to strengthen the students’ technical writing skills.

    EDUCATIONAL OBJECTIVES.   This course will introduce students to:

    1. Formal statistical methods for engineering applications.
    2. Practical analysis and interpretation of engineering data.
    3. Methods for formulating problems in statistical terms.
    4. Use of JMP software for regression and tests of hypotheses.
    5. Technical writing and communication of analytical results.  
    (return to contents)


    PREREQUISITES

      ESE 301 (or comparable course in probability)

    (return to contents)


    REQUIRED MATERIALS

    (return to contents)


    RECOMMENDED MATERIALS

    (return to contents)


    COURSE TOPICS

    Data Representations     Ch.1 [D] plus material in Ch.5 [JMP]
    Review of Probability    Chs.2-5 [D] plus JMP-Examples
    Random Sampling          Ch.5 [D] plus JMP-Examples
    Statistical Estimation   Ch.6 [D] plus JMP-Examples
    Confidence Intervals     Ch.7 [D] plus JMP-Examples
    Hypothesis Testing       Chs.8-9 [D] plus JMP-Examples
    Regression Analysis      Chs.12-13 [D] plus supplementary materials and JMP-Examples

    (return to contents)


    COURSE GRADING

    Homework 10%
    First Exam 25%
    Second Exam 25%
    Project 40%

    Code of Academic Integrity

    Using or attempting to use unauthorized assistance, material, lab results, or solutions (in part or whole) is a violation of the Code of Academic Integrity and will result in a zero grade for the course.


    (return to contents)


    COURSE TIME AND LOCATION
     


    (return to contents)


    CETS Labs
     


    (return to contents)


     


    TENTATIVE SCHEDULE FOR FALL 2014

    Lecture  Day/Date    Topic                           Homework
    INTRO    Th/Aug.28   Introduction
    1        Tu/Sep.2    Data Representations
    2        Th/Sep.4    Discrete Random Variables
    3        Tu/Sep.9    Sums of Random Variables
    4        Th/Sep.11   Continuous Random Variables     PS1 due
    5        Tu/Sep.16   Random Sampling
    6        Th/Sep.18   Central Limit Theorem
    7        Tu/Sep.23   Estimation
    8        Th/Sep.25   Regression Model                PS2 due
    9        Tu/Sep.30   Regression Analysis
    10       Th/Oct.2    Multiple Regression Model
    11       Tu/Oct.7    Multiple Regression Analysis    PS3 due
             Th/Oct.9    FALL BREAK
             Tu/Oct.14   EXAM 1
    12       Th/Oct.16   Simple Confidence Intervals
    13       Tu/Oct.21   One-Sided Confidence Intervals
    14       Th/Oct.23   General Confidence Intervals
    15       Tu/Oct.28   Regression Applications         Project Proposal due
    16       Th/Oct.30   Regression Applications
    17       Tu/Nov.4    Simple Tests of Hypotheses      PS4 due
    18       Th/Nov.6    General Tests of Hypotheses
    19       Tu/Nov.11   Two-Sample Tests
             Th/Nov.13   NO CLASS
    20       Tu/Nov.18   Regression Applications
    21       Th/Nov.20   Regression Applications
    22       Tu/Nov.25   Regression Applications         PS5 due
    23       Th/Nov.27   THANKSGIVING BREAK
             Tu/Dec.2    EXAM 2
    24       Th/Dec.4    Additional Regression Topics
    25       Tu/Dec.9    Additional Regression Topics
             Mon/Dec.15  Final Projects Due (5PM)

     

    (return to contents)


    HOMEWORK ASSIGNMENTS

    PROBLEMS   SOLUTIONS
    PS1        S1
    PS2        S2
    PS3        S3
    PS4        S4
    PS5        S5


    HOMEWORK POLICY:


    (return to contents)
     


    PRACTICE EXAM PROBLEMS

    PROBLEMS   SOLUTIONS
    PE1.1      PS1.1
    PE1.2      PS1.2
    PE2.1      PS2.1
    PE2.2      PS2.2

    (return to contents)
     

    EXAM POLICY:


    (return to contents)
     


    EXAM SOLUTIONS

    PROBLEMS   SOLUTIONS
    E1         ES1
    E2         ES2

    (return to contents)
     
     


    DATA SETS FOR CLASS

    All class data sets can be downloaded from the web site: http://www.seas.upenn.edu/~ese302/lab-content/

    In addition, the following homework data sets can be accessed directly:

     

    (return to contents)
     



    REGRESSION MATERIAL
      (return to contents)


    PROJECT MATERIAL

    1. PROJECT DESCRIPTION

    During the first few weeks of class, you should choose a partner to work with. Projects are expected to involve teams of two individuals, although individual projects are permitted. Teams of three are also permitted but not encouraged, and are expected to do more work.

    Each team is expected to undertake a case study involving a statistical analysis of some data set. The only substantive requirement is that your analysis should focus on multiple regression and should demonstrate a sound statistical knowledge of regression (including goodness of fit and significance tests of coefficients). The report is to be typed double-spaced and is expected to be on the order of 15 to 20 pages in length (this is not a rigid requirement). The first page should contain an introduction that (i) motivates the problem, (ii) states all of the main assumptions (without mathematics), and (iii) briefly summarizes your findings. The main body of the report should contain a detailed development of your statistical analysis, including a mathematical formulation of both the problem studied and the analytical methods employed. Use plots and graphs wherever possible to illustrate your results (preferably in JMP), but be sure to back these up with appropriate discussion and analysis. Do not include graphs or tables that are not discussed in the text. All source material (including software packages used) should be cited explicitly. The last page should summarize your findings and conclusions in detail. Finally, be sure to include page numbers in your report. (I write comments on every project, and am very unhappy when I have no page numbers to refer to!)
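    For teams that want to check their understanding of what "goodness of fit" and "significance tests of coefficients" refer to, the quantities can be sketched by hand in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a substitute for the JMP output used in class: the data are synthetic, and the names invert and ols_summary are hypothetical helpers written for this example. It fits a multiple regression by least squares and reports R-squared together with the standard error and t-ratio of each coefficient.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment each row of m with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (perfectly collinear variables?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_summary(rows, y):
    """Least-squares fit of y on the explanatory variables in rows (intercept added)."""
    n = len(y)
    A = [[1.0] + list(r) for r in rows]          # design matrix with intercept column
    p = len(A[0])                                # number of beta parameters
    xtx = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    xty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, a)) for a in A]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst                         # goodness of fit
    sigma2 = sse / (n - p)                       # residual variance estimate
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(p)]
    t = [b / s for b, s in zip(beta, se)]        # t-ratio for H0: beta_i = 0
    return {"beta": beta, "se": se, "t": t, "r2": r2, "df": n - p}


# Synthetic data (an assumption for illustration): y depends on x1 but not on x2.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(40)]
x2 = [random.gauss(0, 1) for _ in range(40)]
y = [1.0 + 2.0 * a + random.gauss(0, 0.5) for a in x1]

summary = ols_summary(list(zip(x1, x2)), y)
for name, b, s, t in zip(["intercept", "x1", "x2"],
                         summary["beta"], summary["se"], summary["t"]):
    print(f"{name:10s} estimate={b:8.3f}  std.err={s:6.3f}  t={t:8.2f}")
print(f"R-squared = {summary['r2']:.3f} on {summary['df']} residual degrees of freedom")
```

    Each t-ratio would be compared against a t distribution with the reported residual degrees of freedom. A coefficient such as the one on x2 here, which has no true effect in the synthetic model, is the kind of term a significance test is meant to flag.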

    Along with the hard copy of your project, you must send me an email attachment (preferably on the same day you turn in your project) including the following items:

     It is strongly recommended that you include all files in a single ZIP file with one of your names (or initials) in the title. If you send separate files, be sure to put a name (or initials) in each of the file names. If I get 10 files all called "ese302.jmp" for example, then they will get jumbled (or destroyed) when they are downloaded.

    There are no constraints on the subject of your case study. You might start by looking through the set of projects that are included in this web page. (These projects are presented in their original form, including possible errors, so don't assume that everything in them is correct. They are intended mainly to suggest possible topic areas and data sources.) With respect to data, it is preferable to use real data from an experiment or survey that you or someone else has performed. For example, sports fans may wish to consider published data on their favorite players or teams. (A variety of interesting data sources can also be found by 'web surfing'.) In any case, you must clearly specify the source of your data.

    Students often find it difficult to obtain the data sets they want to study. So it is advisable to start looking as soon as possible. There is a list of web sites given below where you can start to search for existing data.

    The final grade will be based on several factors: the appropriateness and sophistication of the analytical methods employed, the correctness of the analysis carried out, the logic and perceptiveness of the conclusions drawn, and the overall clarity of the presentation.

    2. DATA FOR REGRESSIONS

    One key point to remember in gathering data is that your data must involve properties of well-defined sampling units. For example, to study the relation between income and years of education, it would be ideal to have data on individual workers (the sampling units) with both the income and years of education for each individual. This is usually not possible. But often such data exists at, say, the state level. So here you could do a regression by taking states as sampling units and regressing per capita income of states against average years of education of state residents.

    A particularly vexing problem here is the preponderance of data in the form of summary tables. For example, if you only have a summary table listing average income for various education categories in the US, it is very difficult to run a regression on this data, because there is no clear sampling unit. Most data you find will be in the form of summary tables, and such data is very difficult to use in regressions (without a host of additional assumptions). However, if you were able to find summary tables for each of a number of countries, then you could use 'country' as a meaningful sampling unit in regression, and examine the relation between education and income across countries.

    So in short, you should try to find data for which the sampling unit is well defined, and hopefully for which there are sufficiently many samples to allow an interesting regression. A common rule-of-thumb here is to have at least 10 samples for every beta parameter estimated. So a simple regression (two beta parameters) should ideally have at least 20 samples. This does not mean that you shouldn't consider a wide range of possible explanatory variables. It only means that your final regression should have enough samples to allow reasonable estimation of each parameter. (See the notes on Stepwise Regression above for further discussion.)
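    The rule of thumb above reduces to simple arithmetic, sketched here in Python. This is a minimal illustration, assuming the regression has one intercept plus one slope per explanatory variable; the function names and the default of 10 samples per parameter are taken from the rule as stated, not from any required procedure.

```python
def min_samples(num_explanatory, per_parameter=10):
    """Smallest sample size suggested by the rule of thumb.

    Beta parameters = one intercept + one slope per explanatory variable.
    """
    num_betas = num_explanatory + 1
    return per_parameter * num_betas


def max_explanatory(num_samples, per_parameter=10):
    """Largest number of explanatory variables the rule of thumb supports."""
    num_betas = num_samples // per_parameter
    return max(num_betas - 1, 0)


print(min_samples(1))        # simple regression: two beta parameters
print(min_samples(4))        # four explanatory variables plus intercept
print(max_explanatory(50))   # e.g. 50 states as sampling units
```

    For example, a data set with the 50 states as sampling units would support a final regression with about four explanatory variables under this rule, even if many more candidate variables were screened along the way.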
     

    3. SELECTED WEB SITES FOR DATA SOURCES
     

    PENN CAMPUS RESOURCES

    http://guides.library.upenn.edu/data/
    http://www.cml.upenn.edu/
     

    GENERAL DATASET COLLECTIONS

    http://lib.stat.cmu.edu/datasets/
    http://www.stat.ucla.edu/cases/
    https://www.cia.gov/library/publications/the-world-factbook/index.html/

    http://genderstats.worldbank.org   
    http://www.icpsr.umich.edu/

    http://web.lexis-nexis.com/statuniv/
    http://www.lib.umich.edu/libhome/Documents.center/stats.html
     

    CENSUS DATA

    http://www.census.gov
    http://www.census.gov/DES/www/welcome.html
    http://dataferrett.census.gov/TheDataWeb/
    http://www.census.gov/apsd/www/statbrief/
     

    COMMODITIES

    http://www.carprices.com/
    http://www.consumerreports.org/
    http://www.diamonds.com/
    http://www.diamondfinder.com/
     

    CRIME

    http://www.albany.edu/sourcebook/
    http://www.ojp.usdoj.gov/bjs/dtdata.htm#crime
    http://bjsdata.ojp.usdoj.gov/dataonline/
    http://www.fbi.gov/ucr/ucr.htm#nibrs
     

    ENVIRONMENTAL

    http://www.epa.gov/enviro/html/ef_overview.html
    http://www.eia.doe.gov/
    http://www.pasda.psu.edu/
     

    INTERNATIONAL DATA

    http://www.geographic.org/
    http://www.un.org/databases/
    http://unstats.un.org/unsd/default.htm
    http://www.worldbank.org/data/

     

    MEDICAL

    http://www.cdc.gov/nchs/fastats/
    http://www.cdc.gov/scientific.htm
    http://www.nci.nih.gov/public/factbk95/index.htm
    http://www.who.ch/hst/hsp/a/country.htm
    http://www.lungusa.org/
    http://seer.cancer.gov/

    http://www.cdc.gov/brfss/smart/2002/summary_matrix_02.htm
     

    NATIONAL DATA

    http://www.bls.gov
    http://www.stat-usa.gov/econtest.nsf
     

    SPORTS

    http://www.sportstalk.com
    http://sportsillustrated.cnn.com/
    http://www.baseballprospectus.com/
    http://www.hockeyguide.com/
    http://www.nhl.com
    http://www.pgatour.com
     

    SURVEYS AND POLLS

    http://www.nua.ie/surveys/
    http://www.gallup.com/poll/releases/
    http://www.cnn.com/ALLPOLITICS/
     

    TRANSPORTATION

    http://www.bts.gov/ntda/
    http://www.apta.com/research/stats/

    http://www.njtide.org/links/index.html
    http://www.nhtsa.dot.gov/people/ncsa/

    http://www.ntsb.gov/Aviation/Stats.htm
    http://www.nhtsa.dot.gov/people/ncsa/fars.html
     

    (return to contents)


    EXAMPLE PROJECTS

      The following projects have been selected as examples of the level and quality of analysis that I am looking for. However, please be aware that none of these projects is free of errors (i.e., none has been "corrected"). So please don't assume that something is automatically "OK" because it appears in an example project. If you are not sure about something, please ask me.

     



    (return to contents)


    LECTURE SLIDES

    (return to contents)
     


    Last modified: Jan.11, 2014