FAQ for JMP

%%%%%%%%%%%%%%%

Q: Should we run the decision tree & regression model on the entire data set?

A: There should always be a training and a testing data set. For convenience, we have put them in the same file for now. You should 'train' the decision tree and regression model on the training set (e.g. the first 3,000 rows), and then measure the prediction error (RMSE) on both the training and the testing data.

%%%%%%%%%%%%%%%

Q: What is RMSE?

A: Root Mean Squared Error

RMSE = sqrt(Sum((Residual)^2) / (# observations))
     = sqrt(Sum_i (y(i) - y-hat(i))^2 / n)

where Residual = actual - predicted = y - y-hat

Some people prefer an "unbiased" estimator using (# observations - # of features selected) in the denominator, but the first formula I gave is much more standard.

%%%%%%%%%%%%%%%

Q: What are the steps in JMP to separate the data into training and testing?

A: There is an "exclude" option under "rows", which will exclude whatever rows are currently selected (i.e. not use them for building the model). For example, select the testing rows and exclude them before fitting, so that the model is built on the training rows only.

%%%%%%%%%%%%%%%

Q: How do I get predictions from the decision trees or regressions back into the main data table?

A: Under the magic red triangle (in the case of regression or neural nets) is an option "save model" which will add a column with the model. This column will have "predictions" for both the training and test set. Decision trees are uglier - see next question.

%%%%%%%%%%%%%%%

Q: I am having pretty good success looking at Decision Trees, but I do not see RMSE. I have run through all the options that are available and I don't see an Error anywhere. They were pretty easy to find on the Regression and Neural Net screens. Is it maybe called something else on this screen?

A: Annoyingly, the decision tree does not compute the RMSE, although it will save the "residuals" (true - predicted) under "save columns" (under the magic red triangle).
We'll see later that if you run "k-fold cross-validation" (also under the magic red triangle), it computes the "SSE" (Sum of Squares Error = Sum_i (residual_i)^2), so RMSE = sqrt(SSE/n).

%%%%%%%%%%%%%%%

Q: Why is XXX.jmp on the web?

A: I have put up several other data sets which are not used in the homework.

%%%%%%%%%%%%%%%

Q: How do you save the entire model with coefficients for stepwise regression?

A: When you do a stepwise regression, there are two parts. First, you do the feature selection. Then you select "make model" and actually fit the model. Once you have fit the model, you can go to the magic little red triangle in the upper left corner of the screen and write the model (and equation) to your spreadsheet.

If you are doing logistic regression, select "save probability", which will create three columns, including the most likely label.

If you are doing linear regression, under the same magic triangle, select "save columns" and then "save prediction formula" to have your formula added to the spreadsheet.
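
%%%%%%%%%%%%%%%

If you want to double-check the RMSE numbers by hand once the actual and predicted values are out of JMP, the formulas given earlier in this FAQ can be sketched in Python roughly as follows. This is only an illustration, not something JMP provides: the function names (rmse, rmse_from_sse, split_rmse) and the n_features argument are made up for this example, and it assumes the training rows come first in the table, as in the "first 3,000 rows" example.

```python
import math


def rmse(actual, predicted, n_features=0):
    """RMSE = sqrt(Sum_i (y(i) - y-hat(i))^2 / n).

    n_features is a hypothetical argument for this sketch: leaving it at 0
    gives the standard formula, while a positive value uses the
    (# observations - # of features selected) denominator instead.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Residual = actual - predicted
    residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
    sse = sum(r * r for r in residuals)  # Sum of Squares Error
    denominator = len(residuals) - n_features
    if denominator <= 0:
        raise ValueError("need more observations than features")
    return math.sqrt(sse / denominator)


def rmse_from_sse(sse, n):
    """Convert a reported SSE over n observations into an RMSE."""
    return math.sqrt(sse / n)


def split_rmse(actual, predicted, n_train):
    """Return (training RMSE, testing RMSE).

    Assumes the first n_train rows are the training set and the
    remaining rows are the testing set.
    """
    train = rmse(actual[:n_train], predicted[:n_train])
    test = rmse(actual[n_train:], predicted[n_train:])
    return train, test


if __name__ == "__main__":
    # Tiny made-up example: 3 training rows followed by 1 testing row.
    y = [3.0, 5.0, 7.0, 9.0]
    y_hat = [2.0, 5.0, 8.0, 11.0]
    train_rmse, test_rmse = split_rmse(y, y_hat, n_train=3)
    print("training RMSE:", train_rmse)
    print("testing RMSE: ", test_rmse)
```

A testing RMSE that is much larger than the training RMSE is the usual sign that the model has been fit too closely to the training rows.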