This R Markdown document describes the visualizations produced from the data processing and analysis results of the edX Learner and Course Analytics portion of the pipeline presented in the paper: Ginda, M., Richey, M. C., Cousino, M., & Börner, K. (2019). Visualizing learner engagement, performance, and trajectories to evaluate and optimize online course design. PLoS ONE, 14(5), e0215964.
The sample data and visualizations documented on this site use data processing and analytic results generated by the edX Learner and Course Analytics and Visualization Pipeline for the MITxPro Fall 2016 course, Architecture of Complex Systems (MITProfessionalX+SysEngxB1+3T2016).
The sample data sets are provided in the Data directory of the project GitHub repository and are briefly described below. A full description of each data set and its field list is given in the supplemental materials, S1 Appendix of the PLoS ONE paper.
Each figure is a statistical graph created in RStudio. Several packages from the Tidyverse, including plyr, stringr, and ggplot2, were used to build the graphs; the colorspace package was used to generate color palettes for the visualizations; and RCurl was used to import data from GitHub.
Data A was created by the script edX-1-courseStructureMeta.R. The data is a CSV of the course structure used to represent the content of the MITxPro course, Architecture of Complex Systems, Fall 2016.
## 'data.frame': 735 obs. of 14 variables:
## $ id : Factor w/ 735 levels "block-v1:MITProfessionalX+SysEngxB1+3T2016+type@chapter+block@16ff74448fa34af2a1c080c99d4e844c",..: 9 3 524 667 166 258 381 485 417 443 ...
## $ mod_hex_id : Factor w/ 735 levels "003a48b7833a46feb5b56a684a115796",..: 591 119 254 629 352 523 79 658 314 394 ...
## $ courseID : Factor w/ 1 level "MITProfessionalX+SysEngxB1+3T2016": 1 1 1 1 1 1 1 1 1 1 ...
## $ mod_type : Factor w/ 11 levels "chapter","course",..: 2 1 7 9 4 4 6 6 6 6 ...
## $ name : Factor w/ 221 levels "","Action Plan",..: 19 125 124 124 NA NA 141 144 145 146 ...
## $ markdown : Factor w/ 118 levels " >>Drive Layout: Front-wheel, rear-wheel, or all-wheel<<\n \n [[(Architectural decision), Design decision]]\n \n\n",..: NA NA NA NA NA NA NA NA NA NA ...
## $ order : int 1 2 3 4 5 6 7 8 9 10 ...
## $ childOrder : int 0 1 1 1 1 2 3 4 5 6 ...
## $ treelevel : int 0 1 2 3 4 4 4 4 4 4 ...
## $ chpModPar : Factor w/ 8 levels "16ff74448fa34af2a1c080c99d4e844c",..: NA 3 3 3 3 3 3 3 3 3 ...
## $ seqModPar : Factor w/ 54 levels "03a3841a0f684ecd93995d985355da91",..: NA NA 22 22 22 22 22 22 22 22 ...
## $ vrtModPar : Factor w/ 121 levels "079e627ca30f4899858cdcdd4574b553",..: NA NA NA 110 110 110 110 110 110 110 ...
## $ parent : Factor w/ 184 levels "03a3841a0f684ecd93995d985355da91",..: 151 151 35 81 163 163 163 163 163 163 ...
## $ modparent_childlevel: Factor w/ 735 levels "03a3841a0f684ecd93995d985355da91/1",..: 584 585 143 337 640 647 648 649 650 651 ...
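The order, childOrder, and treelevel fields together encode the course tree. As a minimal sketch, assuming rows are listed depth-first so that a module's parent is the nearest preceding row one level up (an assumption that matches the ten rows shown above, not a description of what edX-1-courseStructureMeta.R does), the child position can be rebuilt from treelevel alone. The helper find_parent_order and the columns parent_order and child_order are hypothetical names used only for illustration.

```r
# Toy copy of the first ten rows shown above (order and treelevel only).
course <- data.frame(
  order     = 1:10,
  treelevel = c(0, 1, 2, 3, 4, 4, 4, 4, 4, 4)
)

# Hypothetical helper: for each row, return the `order` of the closest
# preceding row that sits exactly one level higher in the tree.
find_parent_order <- function(order, treelevel) {
  vapply(seq_along(order), function(i) {
    candidates <- which(treelevel[seq_len(i - 1)] == treelevel[i] - 1)
    if (length(candidates) == 0) NA_integer_ else order[max(candidates)]
  }, integer(1))
}

course$parent_order <- find_parent_order(course$order, course$treelevel)

# Number the children of each parent in document order; the root gets 0.
course$child_order <- 0L
has_parent <- !is.na(course$parent_order)
course$child_order[has_parent] <- ave(
  seq_len(sum(has_parent)),
  course$parent_order[has_parent],
  FUN = seq_along
)

print(course)
```

For these ten rows the rebuilt child_order column can be compared directly against the childOrder values in the output above.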
Data B was created by the script edX-5-learnerTrajectoryNet.R. The data represents the learner trajectory network of a student that completed the MITxPro course, Architecture of Complex Systems, Fall 2016.
The network is stored in JSON format and contains a list of nodes and a list of edges. The nodes represent the course structure, specifically all content and activity modules, together with a description of the student's interaction with each module over the entire course. The edge list represents the student's transitions between course content nodes.
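To make the node and edge lists concrete, here is a small sketch in base R of how such a network could be summarized once the JSON has been parsed into two data frames. The field names (id, order, source, target), the module identifiers, and the forward/backward labels are all assumptions made for illustration; the actual field list is documented in the S1 Appendix.

```r
# Hypothetical nodes: three modules with their position in the course.
nodes <- data.frame(
  id    = c("m1", "m2", "m3"),
  order = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical edges: one row per observed transition of a single student.
edges <- data.frame(
  source = c("m1", "m2", "m3", "m2"),
  target = c("m2", "m3", "m2", "m3"),
  stringsAsFactors = FALSE
)

# Collapse repeated transitions into a weighted edge list.
edges$n <- 1L
weights <- aggregate(n ~ source + target, data = edges, FUN = sum)

# Label each transition by comparing the course order of its endpoints.
node_order <- setNames(nodes$order, nodes$id)
src <- unname(node_order[as.character(weights$source)])
tgt <- unname(node_order[as.character(weights$target)])
weights$direction <- ifelse(tgt > src, "forward", "backward")

print(weights)
```

Weighting edges this way keeps one row per distinct transition, which is usually easier to plot than the raw sequence of moves.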
Data C was generated by the script edX-6-moduleUseAnalysis.R. The data contains overall module engagement statistics for all active students in the MITxPro course, Architecture of Complex Systems, Fall 2016, based on the list of all active students in the course that was created by edX-4-eventLogFormatter.R.
## 'data.frame': 551 obs. of 43 variables:
## $ mod_hex_id : Factor w/ 551 levels "003a48b7833a46feb5b56a684a115796",..: 253 389 54 490 218 288 355 204 78 356 ...
## $ courseID : Factor w/ 1 level "MITProfessionalX+SysEngxB1+3T2016": 1 1 1 1 1 1 1 1 1 1 ...
## $ module.type : Factor w/ 7 levels "drag-and-drop-v2+block",..: 2 2 4 4 4 4 4 2 4 2 ...
## $ desc : Factor w/ 111 levels "","Action Plan",..: 1 1 72 75 76 77 78 1 79 1 ...
## $ L2 : Factor w/ 54 levels "03a3841a0f684ecd93995d985355da91",..: 22 22 22 22 22 22 22 22 22 22 ...
## $ L2label : Factor w/ 42 levels "Action Plan (20 min)",..: 32 32 32 32 32 32 32 32 32 32 ...
## $ L1 : Factor w/ 8 levels "16ff74448fa34af2a1c080c99d4e844c",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ L1label : Factor w/ 8 levels "Get Started (25 min)",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ order : int 5 6 7 8 9 10 11 12 13 14 ...
## $ unq_stu : int 1560 0 1544 1543 1543 1543 1543 0 1543 0 ...
## $ prpstu : num 0.997 0 0.987 0.986 0.986 ...
## $ session_ct : int 4510 0 1544 1543 1543 1543 1543 0 1543 0 ...
## $ days_ct : int 4020 0 1544 1543 1543 1543 1543 0 1543 0 ...
## $ events : int 8081 0 1549 1545 1545 1544 1545 0 1547 0 ...
## $ avg_evt_stu : num 5.18 0 1 1 1 ...
## $ totalTime : num 24810 0 3150 1696 2075 ...
## $ avgTimeStu : num 15.9 0 2.04 1.1 1.35 ...
## $ avgTimeEvt : num 3.07 0 2.03 1.1 1.34 ...
## $ progress.i : int 0 0 1260 1340 1356 1357 1371 0 1328 0 ...
## $ recurse.i : int 4672 0 286 205 188 187 174 0 217 0 ...
## $ forward.o : int 5828 0 1384 1332 1331 1339 1351 0 1341 0 ...
## $ backward.o : int 0 0 162 212 213 205 194 0 204 0 ...
## $ attempts : int NA NA 1549 1545 1543 1544 1545 NA 1547 NA ...
## $ avgAttempts : num NA NA 1 1 1 ...
## $ points : int NA NA 1544 1543 1542 1543 1543 NA 1543 NA ...
## $ avgPointStu : num NA NA 1 1 0.999 ...
## $ maxPointsPrb : int NA NA 1 1 1 1 1 NA 1 NA ...
## $ loads : int NA NA NA NA NA NA NA NA NA NA ...
## $ avgLoadEvents : int NA NA NA NA NA NA NA NA NA NA ...
## $ load_time : int NA NA NA NA NA NA NA NA NA NA ...
## $ avgLoadTime : logi NA NA NA NA NA NA ...
## $ plays : int NA NA NA NA NA NA NA NA NA NA ...
## $ avgPlayEvents : num NA NA NA NA NA NA NA NA NA NA ...
## $ play_time : num NA NA NA NA NA NA NA NA NA NA ...
## $ avgPlayTime : num NA NA NA NA NA NA NA NA NA NA ...
## $ pause : num NA NA NA NA NA NA NA NA NA NA ...
## $ avgPauseEvents: num NA NA NA NA NA NA NA NA NA NA ...
## $ pause_time : num NA NA NA NA NA NA NA NA NA NA ...
## $ avgPauseTime : num NA NA NA NA NA NA NA NA NA NA ...
## $ seeks : int NA NA NA NA NA NA NA NA NA NA ...
## $ avgSeekEvents : num NA NA NA NA NA NA NA NA NA NA ...
## $ seek_time : num NA NA NA NA NA NA NA NA NA NA ...
## $ avgSeekTime : num NA NA NA NA NA NA NA NA NA NA ...
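Several of the averaged fields appear to be simple ratios of the count fields. The sketch below recomputes them for the first four rows shown above, assuming that avg_evt_stu = events / unq_stu, avgTimeStu = totalTime / unq_stu, avgTimeEvt = totalTime / events, and that prpstu divides unq_stu by the 1565 students listed in Data D. These formulas are inferred from the displayed values, not taken from edX-6-moduleUseAnalysis.R, and the helper safe_ratio is a hypothetical name.

```r
# First four rows of the count fields, copied from the output above.
mods <- data.frame(
  unq_stu   = c(1560, 0, 1544, 1543),
  events    = c(8081, 0, 1549, 1545),
  totalTime = c(24810, 0, 3150, 1696)
)

# Assumed denominator for prpstu: number of rows (students) in Data D.
active_students <- 1565

# Hypothetical helper: return 0 instead of NaN when the denominator is 0,
# which mirrors the zeros shown for modules with no recorded use.
safe_ratio <- function(num, den) ifelse(den > 0, num / den, 0)

mods$prpstu      <- safe_ratio(mods$unq_stu, active_students)
mods$avg_evt_stu <- safe_ratio(mods$events, mods$unq_stu)
mods$avgTimeStu  <- safe_ratio(mods$totalTime, mods$unq_stu)
mods$avgTimeEvt  <- safe_ratio(mods$totalTime, mods$events)

print(round(mods, 3))
```

Because str() prints rounded values, small differences from the published fields are expected when recomputing from this output rather than from the CSV.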
Data D was generated by the script edX-7-moduleUseAnalysis.R. The data contains descriptive statistics for the active students enrolled in the MITxPro course, Architecture of Complex Systems, Fall 2016, based on the list of all active students in the course that was created by edX-4-eventLogFormatter.R.
Information about students that could be used for re-identification of participants has been excluded from the data set, specifically, gender, year of birth (yob), and level of education (loe).
## 'data.frame': 1565 obs. of 35 variables:
## $ user_id : logi NA NA NA NA NA NA ...
## $ grade : num 0.94 0 0.94 0.83 0.97 0.52 0 1 0.94 0.96 ...
## $ cert_status : Factor w/ 2 levels "downloadable",..: 1 2 1 1 1 2 2 1 1 1 ...
## $ gender : logi NA NA NA NA NA NA ...
## $ yob : logi NA NA NA NA NA NA ...
## $ loe : logi NA NA NA NA NA NA ...
## $ sessions : int 65 22 53 45 54 28 5 24 48 37 ...
## $ days_unq : int 38 12 29 31 28 14 4 18 36 31 ...
## $ mods_unq : int 284 141 291 255 291 150 86 289 289 274 ...
## $ vid_mods : int 46 23 47 47 47 26 12 47 48 37 ...
## $ prb_mod : int 130 64 139 102 139 65 40 139 139 136 ...
## $ oa_mods : int 5 1 5 5 5 2 1 4 5 5 ...
## $ events : int 4110 439 1200 747 954 680 254 900 832 985 ...
## $ vid_events : int 3379 138 497 273 339 315 73 302 285 280 ...
## $ prb_events : int 224 90 271 150 256 133 98 267 225 267 ...
## $ oa_events : int 43 5 47 21 47 13 3 41 47 50 ...
## $ oa_peerAccessEvents: int 9 0 9 3 9 0 0 9 9 9 ...
## $ oa_getPeerEvents : int 18 0 20 6 20 0 0 19 21 26 ...
## $ seqNextEvents : int 66 37 67 64 61 33 23 58 64 64 ...
## $ seqPrevEvents : int 13 0 7 7 12 11 3 5 11 10 ...
## $ seqGotoEvents : int 16 5 18 23 22 14 0 24 0 3 ...
## $ modAccessEvents : int 54 31 54 52 54 32 20 53 54 53 ...
## $ total_time : num 3236 1074 2738 1369 2021 ...
## $ vid_time : num 1724 265 776 436 698 ...
## $ prb_time : num 187 62.7 220.1 96.6 163.1 ...
## $ oa_time : num 200.37 3.88 168.27 12.42 199.32 ...
## $ oa_peerAccessTime : num 5.517 0 47.25 0.467 23.9 ...
## $ oa_getPeerTime : num 107.62 0 100.72 5.43 165.03 ...
## $ seqNextTime : num 462 443 816 347 458 ...
## $ seqPrevTime : num 29 0 27.68 42.73 7.68 ...
## $ seqGotoTime : num 61.55 2.75 103.07 87.1 74.03 ...
## $ modAccessTime : num 443 223 356 344 342 ...
## $ prb_attempts : int 185 85 211 142 217 96 83 202 199 222 ...
## $ prb_correct : int 119 62 122 93 150 58 72 126 121 120 ...
## $ prb_totalPoints : int 128 67 133 97 159 63 74 140 131 132 ...
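As an illustration of how these per-student fields can be summarized, the sketch below compares certified and non-certified learners using the first six rows shown above. It assumes that cert_status level 1 ("downloadable") marks a certified student and that prb_correct counts correct submissions among prb_attempts; the logical column certified and the derived column accuracy are hypothetical names, and the second cert_status label is left out because it is truncated in the output.

```r
# First six students, copied from the output above.
# certified = TRUE where cert_status is level 1 ("downloadable").
students <- data.frame(
  certified    = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE),
  grade        = c(0.94, 0, 0.94, 0.83, 0.97, 0.52),
  prb_attempts = c(185, 85, 211, 142, 217, 96),
  prb_correct  = c(119, 62, 122, 93, 150, 58)
)

# Assumed definition: share of problem attempts that were correct.
students$accuracy <- students$prb_correct / students$prb_attempts

# Mean grade and mean accuracy per certification group.
by_cert <- aggregate(
  cbind(grade, accuracy) ~ certified,
  data = students,
  FUN  = mean
)

print(by_cert)
```

Six rows are far too few to draw conclusions from; the same call applies unchanged to the full 1565-row data set.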
Data E was generated by combining three outputs of the script edX-6-moduleUseAnalysis.R, one each for active, certified, and non-certified students. The script calculates module use statistics for a given subset of the student list.
Data E combines the unq_stu, events, totalTime, and avgTimeStu variables from the module use analysis applied to three different subsets of students. The data set for active students is provided in Data C. The remaining two data sets in Data E were generated from lists of student identifiers based on an analysis of the cert_status field in Data D.
## 'data.frame': 551 obs. of 18 variables:
## $ mod_hex_id : Factor w/ 551 levels "003a48b7833a46feb5b56a684a115796",..: 253 389 54 490 218 288 355 204 78 356 ...
## $ courseID : Factor w/ 1 level "MITProfessionalX+SysEngxB1+3T2016": 1 1 1 1 1 1 1 1 1 1 ...
## $ module.type: Factor w/ 7 levels "drag-and-drop-v2+block",..: 2 2 4 4 4 4 4 2 4 2 ...
## $ desc : Factor w/ 111 levels "","Action Plan",..: 1 1 72 75 76 77 78 1 79 1 ...
## $ L2 : Factor w/ 54 levels "03a3841a0f684ecd93995d985355da91",..: 22 22 22 22 22 22 22 22 22 22 ...
## $ L2label : Factor w/ 42 levels "Action Plan (20 min)",..: 32 32 32 32 32 32 32 32 32 32 ...
## $ L1 : Factor w/ 8 levels "16ff74448fa34af2a1c080c99d4e844c",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ L1label : Factor w/ 8 levels "Get Started (25 min)",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ order : int 5 6 7 8 9 10 11 12 13 14 ...
## $ unq_stu : int 1560 0 1544 1543 1543 1543 1543 0 1543 0 ...
## $ unq_stu.1 : int 1353 0 1353 1353 1353 1353 1353 0 1353 0 ...
## $ unq_stu.2 : int 207 0 191 190 190 190 190 0 190 0 ...
## $ events : int 8081 0 1549 1545 1545 1544 1545 0 1547 0 ...
## $ events.1 : int 7146 0 1353 1353 1354 1353 1353 0 1353 0 ...
## $ events.2 : int 935 0 196 192 191 191 192 0 194 0 ...
## $ totalTime : num 24810 0 3150 1696 2075 ...
## $ totalTime.1: num 21491 0 2789 1387 1871 ...
## $ totalTime.2: num 3320 0 361 309 204 ...
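The .1 and .2 suffixes in the output above are what base R's data.frame() produces when column names repeat, which suggests, without confirming, that the three tables were bound side by side. The sketch below reproduces that naming on the first three rows and checks that the active counts equal the sum of the other two groups. It assumes .1 is the certified subset and .2 the non-certified subset, and that all three tables list modules in the same row order; mod_id and its values are hypothetical placeholders for mod_hex_id.

```r
# Active students (as in Data C), first three modules.
active <- data.frame(
  mod_id  = c("mod_a", "mod_b", "mod_c"),  # placeholder identifiers
  unq_stu = c(1560L, 0L, 1544L),
  events  = c(8081L, 0L, 1549L)
)

# Assumed certified subset (columns suffixed .1 in the output above).
certified <- data.frame(
  unq_stu = c(1353L, 0L, 1353L),
  events  = c(7146L, 0L, 1353L)
)

# Assumed non-certified subset (columns suffixed .2 in the output above).
non_certified <- data.frame(
  unq_stu = c(207L, 0L, 191L),
  events  = c(935L, 0L, 196L)
)

# data.frame() makes repeated column names unique by appending .1, .2, ...
combined <- data.frame(active, certified, non_certified)
print(names(combined))

# Consistency check: active = certified + non-certified for each module.
combined$stu_ok <- combined$unq_stu == combined$unq_stu.1 + combined$unq_stu.2
combined$evt_ok <- combined$events  == combined$events.1  + combined$events.2
print(combined[, c("mod_id", "stu_ok", "evt_ok")])
```

A keyed join such as merge() on the module identifier would be a safer alternative when the row order of the three tables cannot be guaranteed.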