I developed this prototype after my workplace supervisor learned that I was pursuing a master’s degree related to data science and asked me to explore whether student data could be classified in a meaningful way with machine learning. The prototype later evolved to predict whether a student would obtain an award on time.
I chose to implement the prototype in KNIME because it is an open-source visual workflow application that supports Python, which makes it easier to explain the machine learning process to management.
The workflow starts by connecting to the Oracle database over JDBC and aggregating information from different tables so that each row in the output table represents the study history of a unique student. After a series of data cleaning steps, the data is fed to a selected algorithm to train the classification model. The model then classifies unseen data and outputs the results as CSV files, which can be imported into an existing Power BI service for visualization.
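The steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the actual KNIME workflow: an in-memory SQLite database stands in for the Oracle JDBC connection, the table and column names (students, enrolments, on_time and so on) and all values are hypothetical, and a small k-nearest-neighbours vote stands in for whichever algorithm was actually selected.

```python
import csv
import io
import math
import sqlite3

# Stand-in for the Oracle/JDBC source. Schema and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, on_time INTEGER);
    CREATE TABLE enrolments (student_id INTEGER, credits INTEGER, grade_point REAL);
    INSERT INTO students VALUES
        (1, 1), (2, 0), (3, 1), (4, 0), (5, NULL), (6, NULL), (7, NULL);
    INSERT INTO enrolments VALUES
        (1, 30, 3.5), (1, 30, 3.7),
        (2, 15, 1.8), (2, 10, NULL),
        (3, 30, 3.0), (3, 25, 3.2),
        (4, 10, 2.0), (4, 15, 1.5),
        (5, 30, 3.4), (5, 30, 3.6),
        (6, 10, 1.9),
        (7, 20, NULL);
""")

# Aggregate the tables so that each output row describes one student.
rows = conn.execute("""
    SELECT s.student_id, SUM(e.credits), AVG(e.grade_point), s.on_time
    FROM students s JOIN enrolments e ON e.student_id = s.student_id
    GROUP BY s.student_id
    ORDER BY s.student_id
""").fetchall()

# Data cleaning: drop students whose aggregated features are missing.
clean = [r for r in rows if None not in r[1:3]]

# Labelled rows train the model; unlabelled rows are the unseen data.
train = [((credits, avg), label) for _, credits, avg, label in clean
         if label is not None]
unseen = [r for r in clean if r[3] is None]


def knn_predict(train, features, k=3):
    """Majority vote among the k labelled rows closest to `features`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


predictions = {sid: knn_predict(train, (credits, avg))
               for sid, credits, avg, _ in unseen}

# Export the classified rows as CSV (a file on disk in practice).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["student_id", "total_credits", "avg_grade", "predicted_on_time"])
for sid, credits, avg, _ in unseen:
    writer.writerow([sid, credits, avg, predictions[sid]])
csv_text = buffer.getvalue()
print(csv_text)
```

In this sketch the features are left unscaled, so the credit total dominates the distance. A real model would likely normalise the features and be evaluated on held-out data before its output is passed on for visualization.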
The accuracy was around 80%, with some caveats. I made suggestions on how the accuracy could be improved and gave a presentation to my team before I was transferred to another department. Of the hundred staff in the IT department, I was the only one to take up data science study and produce a feasible outcome.