A Framework for data mining of structured semantic markup extracted from educational resources on University websites
Abstract
The coronavirus pandemic has forced education at all levels to shift from face-to-face to online learning. To support virtual education, universities are releasing a significant number of educational resources on the Web. End users who need these resources explore the Web through search engines such as Google, Yahoo, Yandex, or Bing; unfortunately, the results they obtain are often inaccurate and do not necessarily match their requirements. This problem arises because Web resources are published without considering their visibility, that is, how easily they can be found. One way to improve the experience of users who browse the Web is to deliver more relevant content in response to their searches. A complementary approach, which enriches the meaning of web search results, is to embed structured semantic markup in the HTML of web pages using standards such as JSON-LD and the Schema.org vocabulary, in compliance with W3C recommendations. Search engines can interpret this markup to understand the resources being published and, consequently, improve the relevance of search results. For example, Google uses structured semantic markup to display rich results (Rich Snippets) or even Knowledge Graph panels in user searches.

This research proposes a framework for systematically analyzing the websites of top-ranking universities, focusing on the educational content they provide, in order to review the semantic markup embedded with JSON-LD and the Schema.org vocabulary. To this end, a worldwide list of the universities included in the top international ranking has been compiled. Then, using Web Scraping techniques, we analyzed these universities' websites in search of educational resources and checked whether they include embedded structured markup. Finally, data mining techniques were used to describe and organize the educational resources obtained.

The contribution of this work is two-fold.
Firstly, an analysis of the embedded structured markup that uses the Schema.org vocabulary and the JSON-LD format on university websites. This analysis is relevant because previous research has not focused explicitly on the educational field or has not used a specific dataset within this context. Secondly, the proposal of a framework for carrying out this type of analysis of embedded structured markup, from the data collection phase through to results and indicators on the data; that is, it covers the data mining process from downloading the pages to the final data analysis. The proposed framework consists of eleven components distributed across three well-defined layers: a data access layer, a service layer, and an application layer. The process for developing the framework components merges two methodologies: Design Science Research (DSR), which guides the creation of an artifact, and CRISP-DM, which addresses the data mining process. The framework architecture integrates the following tools: Scrapy (Python) for web scraping and crawling; MongoDB for manipulating semi-structured data in a NoSQL management mode; Redis as an auxiliary in-memory database that is queried to determine whether the URLs extracted during Web Scraping have already been processed (duplicate control); and Apache Kafka as a communication intermediary that facilitates the flow and exchange of information among the other components.

Moreover, this work provides a dataset made up of the HTML pages of the universities' websites, which can be used for further analysis.
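The markup review described in the abstract starts by locating the JSON-LD blocks in each downloaded page and reading their Schema.org `@type` values. The following Python sketch shows one way to do that with only the standard library. It is not the authors' implementation (their framework uses Scrapy). The set of types treated as educational (`EDUCATIONAL_TYPES`) and the helper names `schema_types` and `tally_educational` are hypothetical choices made for illustration.

```python
import json
from collections import Counter
from html.parser import HTMLParser

# Assumption: the Schema.org types counted as "educational" are chosen for
# illustration; the paper's own selection is not given in the abstract.
EDUCATIONAL_TYPES = {"Course", "LearningResource", "EducationalOccupationalProgram"}


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def iter_nodes(doc):
    """Yield every JSON-LD node, flattening top-level arrays and @graph arrays."""
    if isinstance(doc, list):
        for item in doc:
            yield from iter_nodes(item)
    elif isinstance(doc, dict):
        yield doc
        if isinstance(doc.get("@graph"), list):
            yield from iter_nodes(doc["@graph"])


def schema_types(html):
    """Return the string @type values declared in a page's JSON-LD blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    types = []
    for raw in parser.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of aborting the analysis
        for node in iter_nodes(doc):
            declared = node.get("@type")
            values = declared if isinstance(declared, list) else [declared]
            types.extend(v for v in values if isinstance(v, str))
    return types


def tally_educational(pages):
    """Count educational @type occurrences across a collection of HTML pages."""
    counts = Counter()
    for html in pages:
        counts.update(t for t in schema_types(html) if t in EDUCATIONAL_TYPES)
    return counts
```

For example, applying `schema_types` to a page whose only JSON-LD block is `{"@context": "https://schema.org", "@type": "Course", "name": "Intro to Data Mining"}` yields `["Course"]`. Nodes nested in an `@graph` array are included as well. The resulting counts are the kind of descriptive indicator that the data mining phase could build on.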
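The duplicate control that the framework delegates to Redis can be sketched as a set-membership check performed before a URL is queued for scraping. The sketch below uses an in-memory Python `set` as a stand-in, so it runs without a Redis server. With the redis-py client, the comparable check would be `sadd`, which returns the number of members actually added: 1 for a new URL and 0 for a duplicate. The class `SeenUrls`, the helper `filter_new`, and the URL normalization policy (lowercased scheme and host, fragment dropped) are illustrative assumptions rather than details taken from the paper.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Canonicalize a URL so that trivial variants map to the same key.

    Assumption: lowercasing scheme and host and dropping the #fragment is an
    illustrative policy, not the paper's documented rule.
    """
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


class SeenUrls:
    """In-memory stand-in (assumption) for the Redis set used for duplicate control."""

    def __init__(self):
        self._seen = set()

    def add(self, url):
        """Record the URL and return True if it is new, False if already seen.

        Mirrors Redis SADD, which reports 1 for a newly added member and 0
        for one that was already present.
        """
        key = normalize_url(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def filter_new(urls, seen):
    """Keep only URLs not processed before, preserving their original order."""
    return [url for url in urls if seen.add(url)]
```

Normalizing before the membership test keeps trivially different links, such as `https://example.edu/courses` and `https://example.edu/courses#syllabus`, from being downloaded twice. A Redis set would play the same role if several crawler processes shared it.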
Keywords: structured semantic markup, data mining, user experience, framework, educational resources
DOI: 10.54941/ahfe1001745
More from this volume
- Tesla Model 3: Impact of Vertical Segmentation on Visual Search Time
- Methods to Promote Increased Usage of Voice Interaction in a Vehicle
- The influence of music on colour preference in vehicle environment
- Design of electric bicycle for take-away delivery based on KANO model and TRIZ Theory
- External Human-machine Interface Design for Automated Vehicles Based on Analytic Hierarchy Process
- User eXperience Heuristics for Geoportals
- User Experience Study on Self-Checkout System of Hypermarkets in Taiwan
- Eliciting potential for positive UX using psychological needs: Towards a user-centered method to identify technologies for UX in the car interior
- Designing a UX Mobile App for Hydration and Sustainability Tracking in Academia
- User Experience and Service Mode of Telecare System with Handheld Devices
- User Experience of Visual Perception for Smart Central Control System
- A User Experience Investigation on Using Augmented Reality Technology for Explaining Step-by-Step Instructions


AHFE Open Access