Skills Identification – Collection of Ai Community Dataset

describe Skills Identification:English Score

Rajabhat Dataset: a dataset of skills identification

Abstract

How can a dataset describe the personal skills of individual students? This question motivates the research, since every university provides courses and examinations for students and staff. This study designs a skills dataset and uses Artificial Intelligence (AI) to identify skills. The dataset describes a person's abilities and expertise in a specific area, including strengths and weaknesses. The university can then use the results to provide matching learning styles, training courses or materials, and learning opportunities that develop those skills.

Many tools can identify skills, such as quizzes or examinations, interviews, and self-assessments. This study uses English skills, which are essential, as the sample data: it collects five years of existing data from the VRU-TEP (Valaya Alongkorn Rajabhat University Test of English Performance), designs the dataset, and produces visualizations.

Finally, the skills dataset is stored in the Rajabhat Dataset and used for training and testing Machine Learning algorithms. It can also be shared and reused by other Rajabhat Universities as a strategy for improving skills within a university. For example, it helps prepare suitable courses for skills that are absent, while skills that are present can still be strengthened. This shortens a time-consuming process that affects budgets and academic systems. The dataset is designed for an AI approach in which supervised Machine Learning algorithms learn and test a model. Classification algorithms such as SVC (Support Vector Classifier), Logistic Regression, K-Nearest Neighbors, or Naïve Bayes will be applied at the end of this study.

Keywords AI, Classification Algorithms, Machine Learning, Rajabhat Dataset, Skill Identification

Introduction

Skill identification has recently become a crucial tool for universities to understand quality and to improve the skills of students and staff. For example, English skills are important to organizations and businesses all over the world. This is especially true in Thailand, where English is not an official language. English is a door-opening skill: employers and governments in all countries need it to improve their performance. VRU uses the VRU-TEP (Test of English Performance) to assess students and staff in the university. It consists of Listening and Writing, but the examination does not include an interview. Therefore, this study applies AI to skills identification.

Artificial Intelligence has recently been introduced in various areas, including medicine and transportation, with applications such as image recognition (protein or cancer classification), UAVs (Unmanned Aerial Vehicles), face recognition, and EVs (Electric Vehicles). AI learns from previous data and makes predictions from it; in daily life, chatbots reply to human questions and AI models predict the weather. This study therefore aims to apply AI in education, specifically to skill identification, because a university holds a large amount of information in its databases.

For example, Academic Affairs serves students and academic staff and records academic transactions in its databases. The Language Centre provides the VRU-TEP (Reading and Writing) English exam for students and teachers, which measures their English communication skills. It is worthwhile to transform these data into a database and design the Rajabhat Dataset. This dataset is open for sharing and reuse with other Rajabhat Universities. Researchers or administrators (President, Dean, or Director) can use it to assess strengths and weaknesses in learning and teaching.

Course design requires inputs for developing a curriculum: learning materials, teaching styles, content organization, and the design of assessment or evaluation methods. It guides students to learn or choose courses based on their progression. This is the pain point where Artificial Intelligence, through Machine Learning algorithms, can support accurate skill identification.

A university improves its performance by using more data to manage learning and teaching. The purpose of this study is to develop a dataset for a data-driven approach that can be used in decision-making to improve the effectiveness of instruction and student learning outcomes. For example, faculty members can use it to identify areas of the curriculum and to provide resources. AI can monitor student progress in the Rajabhat Dataset and support interventions so that students do not fall behind.

Therefore, the dataset is developed from the tables or database of the Language Center and connected with the database of Academic Affairs. This is an example of interoperability between two main units of the university. It addresses the challenge of working across organizational boundaries.
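As a minimal sketch of this kind of connection, the following Python example joins exam scores with student records on a shared student ID. The rows, the field names beyond those in Table 1, and the function name `join_on_student_id` are assumptions made for illustration; the actual table layouts of the two units are not described in this study.

```python
# Hypothetical rows; the real table layouts of the two units are not given in the paper.
language_center = [
    {"IDNO": "S001", "PreTest": 48, "PostTest": 65, "Year": 2021},
    {"IDNO": "S002", "PreTest": 70, "PostTest": 72, "Year": 2021},
    {"IDNO": "S999", "PreTest": 55, "PostTest": 58, "Year": 2021},  # no student record
]
academic_affairs = [
    {"IDNO": "S001", "MAJOR": "Computer Science", "FACULTY": "Science and Technology"},
    {"IDNO": "S002", "MAJOR": "English", "FACULTY": "Humanities"},
]


def join_on_student_id(scores, students):
    """Inner-join exam scores with student records on the IDNO field."""
    by_id = {student["IDNO"]: student for student in students}
    joined = []
    for row in scores:
        student = by_id.get(row["IDNO"])
        if student is not None:  # keep only scores that match a known student
            joined.append({**student, **row})
    return joined


dataset = join_on_student_id(language_center, academic_affairs)
for record in dataset:
    print(record)
```

An inner join is used here so that scores without a matching student record are left out; whether such records should instead be kept and flagged is a design choice the study does not specify.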

Methodology

Objectives

Learning from data is the AI approach of finding “data patterns”. This study uses the “Rajabhat Dataset”, which supplies datasets for AI algorithms, specifically Machine Learning (ML), to learn from and test against. Visualizations such as graphs, charts, or maps help make the data more understandable. This research presents data in three stages: Hindsight, Insight, and Foresight. The Rajabhat Dataset therefore introduces a dataset template for collecting data and supporting AI algorithms. It describes and predicts notable data or trends.

Design the dataset process

A database consists of relational tables and is used to retrieve or access data, whereas a dataset is a part of a database that focuses on selected fields (columns) from the tables. A dataset is a set of data designed from existing data for future use. It consists of columns (fields of data) and rows (records of data). For example, a student record comprises student ID, full name, email, address, phone, department, faculty, age, birthday, etc. A school record presents location, title, province, postcode, level of education, etc.

This study transforms the existing database into a learning skills dataset, which takes less time and cost than developing a new dataset from scratch. It chooses columns as independent variables, called “features”, for learning and testing in the AI model. In addition, it extracts the crucial data, cleanses it (for example, by removing unnecessary records or creating new fields), and transfers it to a new dataset table or file.
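The transformation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the exported rows, the extra `PHONE` column, the choice of `FEATURES`, and the rule of dropping records with empty values are hypothetical and are not taken from the actual VRU databases.

```python
import csv
import io

# Hypothetical export of an existing table; rows and the PHONE column are assumptions.
raw_export = """IDNO,FULLNAME,PHONE,MAJOR,FACULTY,PreTest,PostTest,Year
S001,Student A,0810000001,Computer Science,Science and Technology,48,65,2021
S002,Student B,0810000002,English,Humanities,70,,2021
S003,Student C,0810000003,Accounting,Management Science,55,61,2022
"""

# Columns kept as features for the new dataset (an illustrative choice).
FEATURES = ["IDNO", "MAJOR", "FACULTY", "PreTest", "PostTest", "Year"]


def build_dataset(csv_text, features):
    """Keep only the chosen feature columns and drop records with empty values."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        selected = {name: record[name].strip() for name in features}
        if all(selected.values()):  # skip records missing any feature
            rows.append(selected)
    return rows


def to_csv(rows, features):
    """Serialize the cleaned rows as CSV text for the new dataset file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=features, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


dataset = build_dataset(raw_export, FEATURES)
print(to_csv(dataset, FEATURES))
```

Working on an export rather than on the live database keeps the original records untouched while the new dataset file is prepared.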

Table 1 English Skills dataset

Field	Description
IDNO	Student ID number
FULLNAME	Full name of the student: first name and last name
STATUS	Student status
MAJOR	Course and major of study
FACULTY	Unit/faculty of the student
PreTest	Score at the beginning of the course (3rd-year students)
PostTest	Score on the exam taken in the 3rd year
Year	Academic year
Progression	Pre-test score greater than post-test score
Results	Whether the score passes the skill level (> 60)

The table above is interconnected with the student dataset in the Rajabhat Dataset, which serves as supporting information. The student dataset describes personal background information: school, address, age, gender, etc.
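The two derived columns of Table 1 can be illustrated with a short Python sketch. Table 1 describes Progression only as a comparison of the pre-test and post-test scores and Results as a score above 60, so two assumptions are made here: Progression is stored as the signed difference between post-test and pre-test, and the pass mark is applied to the post-test score. The class name `EnglishSkillRecord` and the sample scores are hypothetical.

```python
from dataclasses import dataclass

PASS_MARK = 60  # Table 1 describes a pass as a score above 60


@dataclass
class EnglishSkillRecord:
    idno: str
    pre_test: float
    post_test: float
    year: int

    @property
    def progression(self) -> float:
        # Assumption: progression is the change from pre-test to post-test.
        return self.post_test - self.pre_test

    @property
    def result(self) -> str:
        # Assumption: the pass mark is applied to the post-test score.
        return "pass" if self.post_test > PASS_MARK else "fail"


# Hypothetical sample scores.
records = [
    EnglishSkillRecord("S001", 48, 65, 2021),
    EnglishSkillRecord("S002", 70, 60, 2021),
]
for r in records:
    print(r.idno, r.progression, r.result)
```

A negative progression value then marks a student whose pre-test score was greater than the post-test score.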

Collection Process

A university develops many tables and databases. A database supports data manipulation: retrieving, accessing, updating, inserting, or deleting information. It also helps keep the information secure. Today, applications serve their users through the database, so the database closely records the tasks behind each service transaction. For example, it collects the transactions generated when students take an examination or apply for courses, when a teacher updates grades, and so on.

How data are collected from the database depends on the amount of data. For small amounts of data, a web application collects the transactions from users. For large amounts of data, developers provide an API (Application Programming Interface) to automatically import data from various tables or databases and from sources such as web applications, IoT devices, and social media (Facebook, LINE, etc.). Finally, a data warehouse (Google BigQuery, AWS, Snowflake, etc.) provides interoperability by integrating the databases.

This study collects data from the database using two approaches: a web application and an API connection to the organization's databases. The AI dataset collection is at an early stage, so data cleansing is the main workload in preparing the dataset for analysis. For example, preparation includes filling in missing values and standardizing data formats. In addition, the Rajabhat Dataset provides storage for the collected data. It is freely available to members of every Rajabhat University at https://dataset.rajabhat.com. Alternatively, Google Sheets can be used to store data and share it with the Rajabhat Dataset.
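The two cleansing steps named above, filling in missing values and standardizing data formats, might look like the following Python sketch. The raw records are hypothetical, and filling a missing score with the median of the known scores is an assumption chosen for illustration; the study does not state which filling rule it uses.

```python
from statistics import median

# Hypothetical raw records; None marks a missing score.
raw = [
    {"IDNO": " s001 ", "FACULTY": "science and technology", "PostTest": "65"},
    {"IDNO": "S002", "FACULTY": "Humanities ", "PostTest": None},
    {"IDNO": "s003", "FACULTY": "HUMANITIES", "PostTest": "71.0"},
]


def clean(records):
    """Standardize text formats and fill missing scores with the median score."""
    known = [float(r["PostTest"]) for r in records if r["PostTest"] is not None]
    fill_value = median(known)  # assumption: median imputation
    cleaned = []
    for r in records:
        score = r["PostTest"]
        cleaned.append({
            "IDNO": r["IDNO"].strip().upper(),        # trim spaces, upper-case IDs
            "FACULTY": r["FACULTY"].strip().title(),  # consistent capitalization
            "PostTest": float(score) if score is not None else fill_value,
        })
    return cleaned


dataset = clean(raw)
for record in dataset:
    print(record)
```

Standardizing the text fields first means that the same faculty written in different ways is counted as one category in later analysis.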

Analyze Process

The main process of this study is measuring the effectiveness of skills identification in education. Normally, AI uses a dataset that has already been cleaned. Unnecessary fields are identified and removed. The analysis looks for the independent variables, called “features”, that affect the result, or “class”. For example, a student's school has an impact on the student's English score. The major of study allows scores to be compared across majors. The class for communication skills indicates presence or absence. Machine Learning is used to identify accurate values from the learning (training) and testing sets.

Normally, the algorithms are divided into two methods: classification and regression. The study chooses the algorithms that find the most accurate data patterns, since different algorithms suit different problems. The size of the dataset, that is, the number of rows and columns, matters in particular. Rows provide the volume of data used for training and testing the results (classes). Columns are the fields that affect the results. Because Machine Learning algorithms differ in their abilities, this study uses supervised machine learning algorithms to find the one best suited to the skills dataset. It reports the CA (classification accuracy, the proportion of correct predictions) for each algorithm.
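To make the idea of training, testing, and reporting CA concrete, the following is a small pure-Python sketch of one of the classifiers named in the abstract, K-Nearest Neighbors. It is not the study's implementation: the scores, the choice of PreTest and PostTest as the only features, the value k = 3, and the function names are all assumptions for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical (PreTest, PostTest) scores with a pass/fail class label.
train = [
    ((40, 45), "fail"), ((50, 52), "fail"), ((55, 58), "fail"),
    ((62, 70), "pass"), ((68, 75), "pass"), ((75, 80), "pass"),
]
test = [((45, 50), "fail"), ((70, 72), "pass"), ((58, 66), "pass")]


def knn_predict(train_rows, features, k=3):
    """Predict the majority class among the k nearest training rows."""
    nearest = sorted(train_rows, key=lambda row: dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classification_accuracy(train_rows, test_rows, k=3):
    """Share of test rows whose predicted class matches the true class."""
    correct = sum(knn_predict(train_rows, x, k) == y for x, y in test_rows)
    return correct / len(test_rows)


ca = classification_accuracy(train, test, k=3)
print(f"CA = {ca:.2f}")
```

The same train/test split and accuracy function could be reused with other classifiers, which is how the CA values of several algorithms would be compared on the skills dataset.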

Visualize Process

This study uses Google Data Studio as the tool for presenting graphs and charts. It produces three different views of the information: Hindsight, Insight, and Foresight. These views reveal pain points in identifying the skills of students and staff.

Figure 1 Learning skills dataset in Rajabhat Dataset with AI

This study aims to use data from the Rajabhat Dataset in combination with the existing databases in the university. It proposes a new dataset that describes English skills and is used to predict skills identification. Machine Learning is applied to classify the data in the dataset. The visualization is presented in the following figure.

Foresight is the ability to predict results and to understand how long they will take. Insight is the understanding or perception of an event from its situation and problems. Both differ from hindsight, which is understanding an event after it has occurred: it learns from experience and then builds a better understanding from the data. In summary, foresight and insight concern the future, whereas hindsight concerns the past. This is the reason for using three bases of visualization in the Rajabhat Dataset.

Figure 2 Basis of Visualization

Basically, AI uses Machine Learning algorithms to find the prediction result, called the “class”. It needs to find the appropriate variables, or “features”, that affect the prediction. Therefore, it requires a process for analyzing suitable features, as follows.