Model construction
Before we start thinking about the model itself, we first need to select the features and, where necessary, prepare them. We will focus only on product features, because we want to check whether there are relationships among them that determine high or low value for customers. Below I have selected the features that may be relevant to our model.
features = ['product_group', 'product_category', 'product_type', 'unit_of_measure', 'tax_exempt_yn', 'promo_yn', 'new_product_yn']
df_products[features].head()
Features can be of different types: they can be categorical or continuous. Depending on the type of model we are using, some preprocessing may be necessary, such as converting values to numbers and normalizing them. For the model we will use, EBM (Explainable Boosting Machine), this is not necessarily the case.
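To make "converting to numeric values and normalization" concrete, here is a minimal pure-Python sketch of what models that do need it would typically expect: a categorical column is one-hot encoded into 0/1 indicators, and a continuous column such as price is min-max scaled to the [0, 1] range. The helper names `one_hot` and `min_max` and the toy values are hypothetical, not taken from our dataset. EBM does not need this step, and in practice pandas or scikit-learn would do it for us.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str]) -> List[Dict[str, int]]:
    """Encode a categorical column as one 0/1 indicator per distinct category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale a continuous column to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy values, not taken from the real dataset.
product_group = ["Coffee", "Tea", "Coffee", "Bakery"]
price = [2.5, 3.0, 4.5, 2.0]

encoded_group = one_hot(product_group)
scaled_price = min_max(price)

for row, p in zip(encoded_group, scaled_price):
    print(row, round(p, 2))
```

One-hot encoding is used here because it imposes no artificial order on the categories; ordinal codes would suggest that, for example, "Tea" is somehow greater than "Coffee".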
The algorithm itself will determine the appropriate type of each feature and handle it accordingly. The price is a number, and the units of measure have been broken down into values and categories.
The next very important step is to properly divide the dataset into a training set and a validation set. How we do this is key: the two sets must not overlap, and the choice of the dividing point also matters, depending on whether the data is time-sensitive. For time-sensitive data, the split is usually made at a point in time, so that the model is validated on records later than those it was trained on. In the simplest case, the set is randomly divided into parts, of which the validation set is usually the smaller one.
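Both splitting strategies can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the method used in this article: in practice scikit-learn's `train_test_split` (from `sklearn.model_selection`) handles the random case. The function names `random_split` and `time_split`, the 20% validation fraction, the seed and the dates below are all hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(rows: Sequence[T], val_fraction: float = 0.2,
                 seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly assign each row to exactly one of two disjoint sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_val = round(len(rows) * val_fraction)  # validation set is the smaller part
    val_idx = set(indices[:n_val])
    train = [row for i, row in enumerate(rows) if i not in val_idx]
    val = [row for i, row in enumerate(rows) if i in val_idx]
    return train, val


def time_split(rows: Sequence[Tuple[str, T]],
               cutoff: str) -> Tuple[List[Tuple[str, T]], List[Tuple[str, T]]]:
    """Split time-stamped rows at a cutoff: earlier rows train, later rows validate.

    Dates are ISO 'YYYY-MM-DD' strings, which sort correctly as plain text.
    """
    train = [row for row in rows if row[0] < cutoff]
    val = [row for row in rows if row[0] >= cutoff]
    return train, val


# Random split of 100 hypothetical row ids.
rows = list(range(100))
train, val = random_split(rows)
print(len(train), len(val))  # 80 20

# Time-based split of hypothetical daily records (date, value).
daily = [("2019-04-01", 10), ("2019-04-15", 12),
         ("2019-04-20", 9), ("2019-04-28", 14)]
train_t, val_t = time_split(daily, cutoff="2019-04-20")
```

Splitting on shuffled indices rather than on values guarantees that no row lands in both sets, even when two rows happen to be identical.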
We will use a method from the scikit-learn library for this.
Perhaps it is related to our outliers, or perhaps we have too little data overall, or the data lacks information that is essential to the model. There may also be an anomaly that, for some reason, causes customers within a sub product group to behave differently. Overall, however, the quality of our model is high enough that we can confidently proceed to further analysis.
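One way to test the suspicion that customers behave differently within a particular sub product group is to break the validation metric down by group: a group with noticeably lower accuracy shows where outliers, thin data or an anomaly might be hiding. The sketch below assumes we already have, for each validation row, a group label, the true class and the model's predicted class; the function name `accuracy_by_group` and all values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy_by_group(groups: Sequence[str], y_true: Sequence[int],
                      y_pred: Sequence[int]) -> Dict[str, float]:
    """Return the share of correct predictions within each group."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical validation rows: group label, true class (1 = high value), predicted class.
groups = ["Coffee", "Coffee", "Tea", "Tea", "Bakery", "Bakery"]
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

scores = accuracy_by_group(groups, y_true, y_pred)
# Print the weakest groups first.
for group, acc in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{group}: {acc:.2f}")
```

A low score for a group with only a handful of rows says more about too little data than about customer behaviour, so it is worth printing the group sizes alongside the scores.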