A00-240 - SAS Certified Statistical Business Analyst Using SAS 9: Regression and Modeling Credential Exam
In partitioning data for model assessment, which sampling methods are acceptable? (Choose two.)
Simple random sampling without replacement
Stratified random sampling without replacement
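As a rough illustration of stratified random sampling without replacement, the Python sketch below splits a toy data set so that each target level keeps the same proportion in both partitions. The function name, the 70/30 split, and the toy rows are assumptions made for this example, not part of the exam material, and this is not SAS syntax.

```python
import random
from collections import defaultdict

def stratified_split(rows, target_key, train_fraction=0.7, seed=42):
    """Split rows into (train, valid) without replacement, stratified on target_key."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[target_key]].append(row)

    train, valid = [], []
    for level_rows in by_level.values():
        shuffled = level_rows[:]          # copy so the input is not mutated
        rng.shuffle(shuffled)             # random order within the stratum
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])      # each row lands in exactly one partition
        valid.extend(shuffled[cut:])
    return train, valid

# Hypothetical toy data: 20 events and 80 non-events.
rows = [{"id": i, "target": 1 if i < 20 else 0} for i in range(100)]
train, valid = stratified_split(rows, "target")
```

Because every row is assigned to exactly one partition, no observation is drawn twice, and the event rate is 20% in both the training and validation sets.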
When mean imputation is performed on data after the data is partitioned for honest assessment, what is the most appropriate method for handling the mean imputation?
The sample means from the training data set are applied to the validation and test data sets.
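The idea can be sketched in a few lines of Python. The helper names and the toy "income" column are hypothetical; the point is only that the means are computed once, from the training rows, and then reused unchanged for the other partitions.

```python
def column_means(rows, columns):
    """Mean of the non-missing values of each column, computed on the given rows."""
    means = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        means[col] = sum(observed) / len(observed)
    return means

def impute(rows, means):
    """Replace missing values (None) with the supplied means; leave other values alone."""
    return [
        {k: (means[k] if v is None and k in means else v) for k, v in r.items()}
        for r in rows
    ]

# Hypothetical toy partitions.
train = [{"income": 40.0}, {"income": None}, {"income": 60.0}]
valid = [{"income": None}, {"income": 90.0}]

train_means = column_means(train, ["income"])   # learned from training data only
train_imputed = impute(train, train_means)
valid_imputed = impute(valid, train_means)      # validation reuses the training means
```

Here the missing validation value is filled with 50.0, the training mean, even though the validation partition's own observed mean is 90.0.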
An analyst generates a model using the LOGISTIC procedure. They are now interested in getting the sensitivity and specificity statistics on a validation data set for a variety of cutoff values. Which statement and option combination will generate these statistics?
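For reference, the two statistics named in this question can be computed by hand at any cutoff. The Python sketch below is only an illustration of the definitions, using made-up validation outcomes and predicted probabilities; it does not show the SAS syntax the question asks about.

```python
def sensitivity_specificity(y_true, p_hat, cutoff):
    """Classify as an event when p_hat >= cutoff, then return (sensitivity, specificity)."""
    tp = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p < cutoff)
    tn = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p < cutoff)
    fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical validation outcomes and predicted probabilities.
y_valid = [1, 1, 1, 0, 0, 0]
p_valid = [0.9, 0.6, 0.3, 0.7, 0.4, 0.1]

by_cutoff = {c: sensitivity_specificity(y_valid, p_valid, c) for c in (0.2, 0.5, 0.8)}
```

Lowering the cutoff raises sensitivity and lowers specificity, which is why the statistics are usually examined over a range of cutoffs.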
Which method is NOT an appropriate way to score new observations with a known target in a logistic regression model?
Augment the training data set with new observations and rerun the LOGISTIC procedure.
An analyst investigates Region (A, B, or C) as an input variable in a logistic regression model. The analyst discovers that the probability of purchasing a certain item when Region = A is 1. What problem does this illustrate?
The total modeling data has been split into training, validation, and test data. What is the best data to use for model assessment?
Including redundant input variables in a regression model can:
Destabilize parameter estimates and increase the risk of overfitting.
A non-contributing predictor variable (Pr > |t| = 0.658) is added to an existing multiple linear regression model. What will be the result?
An increase in R-Square
A company has branch offices in eight regions. Customers within each region are classified as either "High Value" or "Medium Value" and are coded using the variable name VALUE. In the last year, the total amount of purchases per customer is used as the response variable. Suppose there is a significant interaction between REGION and VALUE. What can you conclude?
The difference between average purchases for medium and high value customers depends on the region.
There are missing values in the input variables for a regression application. Which SAS procedure provides a viable solution?
Screening for non-linearity in binary logistic regression can be achieved by visualizing:
A trend plot of empirical logit versus a predictor variable.
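A minimal sketch of how the points for such a plot might be computed: bin the predictor into rank-ordered groups and take the log odds of the event within each bin. The 0.5 continuity correction, the number of bins, and the toy data are assumptions chosen for this example; other corrections are also in use.

```python
import math

def empirical_logits(x, y, n_bins=4):
    """Return (mean x, empirical logit) for each rank-ordered bin of x."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    points = []
    for b in range(n_bins):
        start = b * size
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)  # last bin takes the remainder
        chunk = pairs[start:end]
        n = len(chunk)
        events = sum(target for _, target in chunk)
        mean_x = sum(value for value, _ in chunk) / n
        # 0.5 is added to both counts so bins with 0% or 100% events stay finite.
        elogit = math.log((events + 0.5) / (n - events + 0.5))
        points.append((mean_x, elogit))
    return points

# Hypothetical predictor and binary target.
x = list(range(1, 21))
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]
points = empirical_logits(x, y)
```

Plotting the second element of each pair against the first gives the trend; a roughly straight line suggests the linear-in-the-logit assumption is reasonable for that predictor.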
A predictive model uses a data set that has several variables with missing values. What two problems can arise with this model? (Choose two.)
Complete case analysis means that fewer observations will be used in the model building process.
New cases with missing values on input variables cannot be scored without extra data processing.
In order to perform honest assessment on a predictive model, what is an acceptable division between training, validation, and testing data?
Training: 50% Validation: 50% Testing: 0%
Spearman statistics in the CORR procedure are useful for screening for irrelevant variables by investigating the association between which function of the input variables?
Rank-ordered values of the variables
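To make the answer concrete, the Spearman statistic can be viewed as the ordinary Pearson correlation applied to ranks. The Python sketch below uses hypothetical helper names and toy data, and assigns average ranks to ties.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank-ordered values."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical input with a monotonic but non-linear relationship to the target.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman(x, y)
r = pearson(x, y)
```

For this monotonic but curved relationship the Spearman statistic is 1 while the Pearson correlation is below 1, which is why rank-based screening is less likely to discard inputs that are related to the target non-linearly.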
An analyst knows that the categorical predictor, store_Id, is an important predictor of the target. However, store_Id has too many levels to be a feasible predictor in the model. The analyst wants to combine stores and treat them as members of the same class level. What are the two most effective ways to address the problem? (Choose two.)
Cluster by using Greenacre's method to combine stores that are similar.
Use subject matter expertise to combine stores that are similar.
One common approach for predicting rare events in the LOGISTIC procedure is to build a model that disproportionately over-represents those cases with an event occurring (e.g. a 50-50 event/non-event split). What problem does this present?
Only the intercept estimate is biased.
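Because only the intercept is affected, predictions from the oversampled model can be corrected by subtracting a constant offset on the logit scale. The Python sketch below shows that arithmetic under the assumption that the true population event rate is known; the function names and the 2% population rate are hypothetical.

```python
import math

def oversampling_offset(pop_event_rate, sample_event_rate):
    """Offset ln((rho1 * pi0) / (rho0 * pi1)) that oversampling adds to the intercept.

    pi1 / pi0: event / non-event proportions in the population.
    rho1 / rho0: event / non-event proportions in the modeling sample.
    """
    pi1, rho1 = pop_event_rate, sample_event_rate
    return math.log((rho1 * (1 - pi1)) / ((1 - rho1) * pi1))

def corrected_probability(p_sample, pop_event_rate, sample_event_rate):
    """Move a probability from the oversampled scale back to the population scale."""
    logit = math.log(p_sample / (1 - p_sample))
    logit -= oversampling_offset(pop_event_rate, sample_event_rate)
    return 1 / (1 + math.exp(-logit))

# Assumed example: events are 2% of the population but 50% of the modeling sample.
p_corrected = corrected_probability(0.5, pop_event_rate=0.02, sample_event_rate=0.5)
```

A case scored at 0.5 by the model built on the 50-50 sample corresponds to a probability of 0.02 in the population, matching the assumed base rate. The slope estimates need no adjustment.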
What is a drawback to performing data cleansing (imputation, transformations, etc.) on raw data prior to partitioning the data for honest assessment as opposed to performing the data cleansing after partitioning the data?
There is no ability to compare the effectiveness of different cleansing methods.
Which statistic, calculated from a validation sample, can help decide which model to use for prediction of a binary target variable?
Average Squared Error
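As a small illustration, average squared error on a validation sample is the mean squared difference between the observed 0/1 target and the predicted probability. The two sets of predictions below are made up for the example.

```python
def average_squared_error(y_true, p_hat):
    """Mean of (observed target - predicted probability) squared."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

# Hypothetical validation targets and predictions from two candidate models.
y_valid = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.7, 0.1]
model_b = [0.6, 0.5, 0.5, 0.4]

ase_a = average_squared_error(y_valid, model_a)
ase_b = average_squared_error(y_valid, model_b)
```

The model with the smaller validation value (model_a in this toy case) would be preferred on this criterion.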
This question asks you to provide a missing option. Complete the following syntax to test the homogeneity of variance assumption in the GLM procedure: Means Region / <insert option here>=levene;