Data analysis settings

Updated on July 5, 2022

You can change these settings to affect the Data analysis step of the predictive model configuration process that is described in Analyzing data. The settings include the names of the sections that are displayed for the step and the default values for particular options.

Setting	Description
Label
Wide of scheme	Change the label for cases not found in the development sample.
Missing	Change the label for missing values.
Residual group	Change the label for the intervals that are so small that their behavior is not a reliable basis for grouping them in another interval.
Remaining symbols	Change the label for the intervals that are so small that their behavior is not a reliable basis for grouping them in another category.
Ignored	Change the label for fields that are excluded from subsequent analysis and modeling.
Binning and grouping settings
Number of bins for numeric fields	Set the initial number of bins used to analyze the values of each numeric field.
Number of bins for symbolic fields	Set the initial number of bins used to analyze the symbols of each symbolic field.
Create equal width intervals	Select this option to create equal width intervals by default.
Ignore ordering	This option applies to symbolic predictors only and is enabled by default. Select this option to combine a category with the categories that are most similar in behavior, regardless of their order. When this option is disabled, the order of the symbolic categories is assumed to have some meaning and only neighboring categories are grouped.
Use z-score instead of student's test	The z-score and Student's t-test methods determine whether the behavior in different bins is similar. Student's t-test is the most widely used statistical method for checking whether two sets of data differ significantly. Select this option to use the z-score instead, for compatibility with previous Prediction Studio versions.
Auto grouping	Select this option to set auto grouping as a default setting. For more information, see Auto grouping option for predictors.
Granularity	Set the highest acceptable probability that the difference in behavior between two adjacent intervals is spurious. Reducing the granularity reduces the number of intervals.
Minimum size (% of the sample)	Set the minimum number of sample cases in each interval, expressed as a percentage of the sample. Use this setting to ensure that each interval contains enough cases for its behavior to be a reliable basis for grouping. Intervals with too few cases are combined with their nearest neighbor.
Merge bins below minimum size in one residual bin	This option is for symbolic predictors only. Bins below the minimum size are combined into a residual bin on the assumption that there are insufficient cases for their behavior to be a basis for predictor grouping.
Deselect predictors with performance below	Set the minimum level of predictive power for a field to continue as a predictor.
Display settings
Use scientific notation	Select this option to display values in scientific notation.
Real value precision	Set the number of decimal places used to display real values.
Performance difference threshold	Set the maximum value for the Performance difference column in the Data analysis step. When you change a predictor's role and its performance difference value is higher than the threshold, the value is highlighted in red. This setting applies to the samples constructed with a validation set.
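
The following Python sketch illustrates how the Number of bins for numeric fields, Create equal width intervals, and Minimum size (% of the sample) settings might interact. It is not Prediction Studio code. The function names (equal_width_bins, bin_counts, merge_small_bins) are hypothetical, and merging an undersized interval with its smaller adjacent neighbor is an assumption made only for illustration; the product may choose the nearest neighbor differently.

```python
from bisect import bisect_right


def equal_width_bins(values, n_bins):
    """Return n_bins + 1 edges that split the value range into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    edges[-1] = hi  # avoid floating-point drift on the last edge
    return edges


def bin_counts(values, edges):
    """Count the values in each interval; the maximum goes in the last one."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


def merge_small_bins(edges, counts, min_pct):
    """Merge intervals holding less than min_pct percent of the sample.

    Assumption: an undersized interval is merged with whichever adjacent
    interval currently holds fewer cases.
    """
    edges, counts = list(edges), list(counts)
    total = sum(counts)
    while len(counts) > 1:
        smallest = min(range(len(counts)), key=lambda i: counts[i])
        if 100.0 * counts[smallest] / total >= min_pct:
            break  # every remaining interval meets the minimum size
        if smallest == 0:
            neighbor = 1
        elif smallest == len(counts) - 1:
            neighbor = smallest - 1
        elif counts[smallest - 1] <= counts[smallest + 1]:
            neighbor = smallest - 1
        else:
            neighbor = smallest + 1
        left = min(smallest, neighbor)
        counts[left:left + 2] = [counts[left] + counts[left + 1]]
        del edges[left + 1]  # drop the edge between the two merged intervals
    return edges, counts


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
edges = equal_width_bins(values, 4)
counts = bin_counts(values, edges)
merged_edges, merged_counts = merge_small_bins(edges, counts, min_pct=10)
print(counts, merged_counts)
```

In this example, one outlier stretches the equal width intervals so that the initial counts are 9, 0, 0, and 1. With a minimum size of 10%, the two empty intervals are merged away and two intervals remain, with 9 cases and 1 case.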

