The Mandarine Academy Recommender System (MARS) Dataset was collected from a real-world open MOOC platform (https://dileap.com/). The dataset offers both explicit and implicit ratings for the French and English versions of the MOOC. Compared with classical recommendation datasets such as MovieLens, it is rather small because of the nature of the available content (educational). However, the dataset offers insight into real-world ratings and provides a testing ground beyond the commonly used datasets.
All items are available online for viewing in both the French and English versions. All selected users have rated at least one item. No demographic information is included. Each user is represented by an id and a job (if available).
Note that the items in the dataset are Resources (Tutorials, Use Cases, Webcasts) delivered in video format, with subtitles selectable among 11 languages. These short videos relate to a specific software, skill, or job. Resources are typically associated with courses, and one resource can be associated with multiple courses. The same set of files is available for both French and English, in .csv format. We provide the following files:
Users: contains information about user ids and their jobs.
Items: contains information about items (resources) in the selected language. Contains a mix of feature types.
Ratings: contains both explicit ratings (watch time) and implicit ratings (item page views).
The dataset files are written as comma-separated values files with a single header row. Columns that contain commas (,) are escaped using double quotes ("). These files are encoded as UTF-8.
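The file conventions above (single header row, double-quoted fields, UTF-8) match what Python's standard csv module expects by default. The following is a minimal sketch of reading such a file; the sample rows and the column names item_id, name, and duration are assumptions made for illustration and may differ from the actual headers.

```python
import csv
import io

# Hypothetical sample mimicking an items file; column names are assumptions.
SAMPLE = (
    "item_id,name,duration\n"
    '1,"Excel: sorting, filtering and charts",95\n'
    '2,"Créer une équipe",120\n'
)


def read_rows(handle):
    """Parse a CSV stream with a single header row into a list of dicts."""
    return list(csv.DictReader(handle))


rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    # The quoted name keeps its embedded commas as part of one field.
    print(row["item_id"], row["name"], row["duration"])

# For a real file, open it as UTF-8 and pass newline="" as the csv module
# documentation recommends:
# with open("items.csv", encoding="utf-8", newline="") as f:
#     rows = read_rows(f)
```

All values are returned as strings, so numeric columns need an explicit conversion before use.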
Implicit Ratings
Contains observations of each resource page visited by users.
Item id: Item unique identifier.
User id: User unique identifier.
Created at: Event date.
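Because each implicit rating is a single page-view event, a common first step is to aggregate events into a view count per user/item pair. The sketch below shows one way to do this; the event tuples and their field order are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical page-view events as (user_id, item_id, created_at).
events = [
    (10, 1, "2021-03-01 09:00:00"),
    (10, 1, "2021-03-02 14:30:00"),
    (10, 2, "2021-03-02 14:45:00"),
    (11, 1, "2021-03-05 08:15:00"),
]


def count_views(events):
    """Count how many times each user visited each item page."""
    return Counter((user_id, item_id) for user_id, item_id, _ in events)


view_counts = count_views(events)
for (user_id, item_id), views in sorted(view_counts.items()):
    print(user_id, item_id, views)
```

Whether to use the raw count, a capped count, or a binary "visited" flag as the implicit signal is a modelling choice that the dataset description leaves open.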
Explicit Ratings
Collects watch time per user/item pair.
Item id: Item unique identifier.
User id: User unique identifier.
watch_percentage: Watch time as a percentage. 0 means the video was not played; 100 means the user watched the entire video.
Created at: Event date.
rating: The watch percentage on a scale of 1-10. Each 10% watched contributes 1 point to the rating. For example, watching 30% of a video gives a rating of 3, and watching 70% of a video gives a rating of 7.
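The relationship between watch_percentage and rating can be sketched as an integer division by 10. This is an assumption based on the two examples given (30% gives 3, 70% gives 7); the description does not state how percentages that are not multiples of 10, or below 10%, are handled, so the rounding down used here is illustrative only and the function name is hypothetical.

```python
def rating_from_watch_percentage(watch_percentage):
    """Map a watch percentage (0-100) to a rating, 1 point per full 10% watched.

    Assumption: partial tenths are rounded down, so values below 10 give 0.
    """
    if not 0 <= watch_percentage <= 100:
        raise ValueError("watch_percentage must be between 0 and 100")
    return int(watch_percentage // 10)


for pct in (30, 70, 100):
    print(pct, rating_from_watch_percentage(pct))
```

When working with the actual files, the provided rating column should be preferred over a recomputed value.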
Users
User information is found in the users.csv file. Each row represents a unique user.
User id: Unique identification of a user. User ids are consistent between explicit ratings and implicit ratings (i.e., the same id refers to the same user across the dataset).
Job: The job title associated with the user.
Items
Item information is contained in the file items.csv. Each line of this file after the header row represents one item, and has the following format:
Item id: Item unique identification. Item ids are consistent between explicit ratings, implicit ratings, and items (i.e., the same id refers to the same item across the dataset).
Language: Content language.
Name: Content title.
Nb views: Number of views.
description: Content description (subtitles).
created at: Content upload date.
Difficulty: Content difficulty.
Job: Related professions.
Software: Related software.
Theme: Related theme.
Duration: Duration in seconds.
Type: Tutorial / Use Case / Webcast.
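Since item ids are consistent across the ratings and items files, item features can be attached to each rating with a lookup on the item id. The following is a rough sketch; the records and the key names item_id, user_id, rating, type, and duration are hypothetical stand-ins for the actual column headers.

```python
# Hypothetical rows, as they might come out of csv.DictReader (all strings).
items = [
    {"item_id": "1", "type": "Tutorial", "duration": "95"},
    {"item_id": "2", "type": "Webcast", "duration": "1800"},
]
ratings = [
    {"user_id": "10", "item_id": "1", "rating": "7"},
    {"user_id": "11", "item_id": "2", "rating": "3"},
    {"user_id": "12", "item_id": "3", "rating": "10"},  # no matching item
]


def attach_items(ratings, items):
    """Join ratings with item features on item_id, skipping unknown items."""
    by_id = {item["item_id"]: item for item in items}
    joined = []
    for rating in ratings:
        item = by_id.get(rating["item_id"])
        if item is None:
            continue  # defensive: drop ratings whose item is not listed
        joined.append(
            {
                "user_id": rating["user_id"],
                "item_id": rating["item_id"],
                "rating": int(rating["rating"]),
                "type": item["type"],
                "duration": int(item["duration"]),  # seconds
            }
        )
    return joined


joined = attach_items(ratings, items)
for row in joined:
    print(row)
```

The same lookup pattern would apply to joining ratings with user jobs on the user id.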
Citing
When referring to the data, please cite the following paper:
@data{DVN/BMY3UD_2022,
  author = {Hafsa, Mounir},
  publisher = {Harvard Dataverse},
  title = {{E-learning Recommender System Dataset}},
  UNF = {UNF:6:PhD+xVW2pdkKj4z7qz8dtQ==},
  year = {2022},
  version = {V2},
  doi = {10.7910/DVN/BMY3UD},
  url = {https://doi.org/10.7910/DVN/BMY3UD}
}
Request for Dataset
Use this form to request the dataset. Only non-commercial research is permitted on this dataset. This is a manual process, so please allow at least two weeks for us to respond with the data sharing agreement.
MARS Dataset Download Form