The Data preparation step begins when you connect to a database or upload your data from a data set or a CSV file. The columns in the data source are initially used as predictors, but you can change their roles later. For more information, see Defining the predictor role.
You need this data to create a statistically relevant sample of customer details, which can then be divided into development, validation, and test data sets. The customer data in the development sample is used to develop predictive models. The data in the validation and test samples is used to validate and test model accuracy.
The data source contains information about customers and their previous behavior. It should contain one record per customer, with every record in the same structure. Ideally, data is present for all fields and all customers, but in most circumstances some missing data can be tolerated.
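The requirements above (one record per customer, a consistent structure, and a tolerable amount of missing data) can be checked before you upload a file. The following is a minimal sketch in plain Python, not part of Prediction Studio. The function name `check_data_source`, the `customer_id` field, and the sample records are hypothetical and are used only for illustration.

```python
from collections import Counter


def check_data_source(records, id_field="customer_id"):
    """Summarize duplicates, structure mismatches, and missing values.

    `records` is a list of dicts, one dict per row of the data source.
    `id_field` is an assumed name for the column that identifies a customer.
    """
    # Customers that appear in more than one record.
    id_counts = Counter(r.get(id_field) for r in records)
    duplicates = [cid for cid, n in id_counts.items() if n > 1]

    # Records whose set of fields differs from the first record's fields.
    expected = set(records[0]) if records else set()
    mismatched = sum(1 for r in records if set(r) != expected)

    # Share of records in which each field is absent, None, or empty.
    fields = sorted({f for r in records for f in r})
    missing_share = {
        f: sum(1 for r in records if r.get(f) in (None, "")) / len(records)
        for f in fields
    }
    return {
        "duplicates": duplicates,
        "inconsistent_structure": mismatched,
        "missing_share": missing_share,
    }


if __name__ == "__main__":
    # Hypothetical rows, as they might be read from a CSV file.
    rows = [
        {"customer_id": "C1", "age": 34, "churned": "yes"},
        {"customer_id": "C2", "age": None, "churned": "no"},
        {"customer_id": "C2", "age": 51, "churned": "no"},
    ]
    print(check_data_source(rows))
```

How much missing data is acceptable depends on the field and the model, so the sketch only reports the shares and does not apply a threshold.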
Based on your model selection and outcome field categorization, Prediction Studio generates data that you can view on the Graphical view and Tabular view tabs. For more information, see Defining an outcome.
- Selecting a data source
Select a data source for creating predictive models. Before you select the input for development, validation, and testing, make sure that these resources are available to you.
- Constructing a sample
A sample is a subset of historical data that you extract by applying a selection or sampling method to the data source. Constructing a sample produces the development, validation, and test data sets for analysis and modeling.
- Defining an outcome
Select a model type and define the outcome field representing the behavior that you want to predict in the model.
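
To illustrate the idea behind constructing a sample, the following sketch splits a list of customer records into development, validation, and test sets by random sampling. It is written in plain Python and does not show how Prediction Studio builds samples internally. The function name `construct_samples`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for the example.

```python
import random


def construct_samples(records, dev_share=0.6, val_share=0.2, seed=42):
    """Randomly split records into development, validation, and test sets.

    The shares are illustrative assumptions; whatever remains after the
    development and validation sets are filled goes to the test set.
    """
    if dev_share <= 0 or val_share <= 0 or dev_share + val_share >= 1:
        raise ValueError("shares must be positive and sum to less than 1")

    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)

    n_dev = round(len(shuffled) * dev_share)
    n_val = round(len(shuffled) * val_share)
    return {
        "development": shuffled[:n_dev],
        "validation": shuffled[n_dev:n_dev + n_val],
        "test": shuffled[n_dev + n_val:],
    }


if __name__ == "__main__":
    # Hypothetical customer identifiers standing in for full records.
    customers = [f"C{i}" for i in range(1, 11)]
    samples = construct_samples(customers)
    for name, subset in samples.items():
        print(name, len(subset))
```

Fixing the seed makes the split reproducible, so the same customers land in the same set each time the sketch runs on the same input.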