Online Shopping Website Analysis for Marketing Strategy Using Clickstream Data and Extra Trees Classifier Algorithm

Diah Prastiwi


On an online shopping website, the platform may provide a service to shop owners by suggesting which items to promote. One possible consideration is price. If an item is priced higher than the average price of other items in the same category, then the item should be advertised more intensively or repriced. Because the number of products and categories grows quickly, calculating the average price in real time can be difficult or slow. Alternatively, one may employ machine learning algorithms. In this study, we use the Extra Trees Classifier on clickstream data, which is a record of user activity. We demonstrate the algorithm on the clickstream data of an online shopping website for pregnant women, retrieved from the UCI Machine Learning Repository. The data has 14 attributes and 165474 entries. The model is trained on 75% of the data and tested on the remaining 25%, with an observed accuracy of 99%.


Clickstream Data; Extra Trees Classifier; Online Shopping Website
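
The workflow described in the abstract can be sketched in a few lines of Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic, the column names (`category`, `colour`, `price`, `above_avg`) are hypothetical, and the hyperparameters are arbitrary defaults. It labels each item by whether its price exceeds the mean price of its category, then trains an Extra Trees Classifier on a 75%/25% train-test split.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for clickstream records (assumption: the real data
# has 14 attributes; only three hypothetical ones are simulated here).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "category": rng.integers(1, 5, size=n),   # product category id
    "colour": rng.integers(1, 15, size=n),    # product colour id
    "price": rng.integers(18, 83, size=n),    # item price
})

# Target: 1 if the item is priced above the mean price of its category.
category_mean = df.groupby("category")["price"].transform("mean")
df["above_avg"] = (df["price"] > category_mean).astype(int)

X = df[["category", "colour", "price"]]
y = df["above_avg"]

# 75% of the rows for training, 25% for testing, as in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Extra Trees: an ensemble of randomized decision trees.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Once trained, the classifier can flag a new item as priced above its category average from its attributes alone, without recomputing category means at query time. The accuracy obtained on this synthetic data says nothing about the figure reported in the study.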






