Note on the Sample Data and Visualizations

This R Markdown document describes the visualizations produced from the data processing and analysis results of the edX Learner and Course Analytics portion of the pipeline presented in Ginda, M., Richey, M. C., Cousino, M., & Börner, K. (2019). Visualizing learner engagement, performance, and trajectories to evaluate and optimize online course design. PLoS ONE, 14(5), e0215964.

The sample data and visualizations documented on this site use data processing and analytic results generated by the edX Learner and Course Analytics and Visualization Pipeline for the MITxPro Fall 2016 course, Architecture of Complex Systems (MITProfessionalX+SysEngxB1+3T2016).

The sample data sets are provided in the Data directory of the project GitHub repository and are briefly described below. A full description of each data set and its field list is given in the supplemental materials, S1 Appendix of the PLoS ONE paper.

Each figure is a statistical graph created in RStudio using numerous packages from the Tidyverse, including plyr, stringr, and ggplot2. The colorspace package was used to generate color palettes for the visualizations, and RCurl was used to import data from GitHub.

Sample Data Index

Data A

Data A was created by the script edX-1-courseStructureMeta.R. The data is a CSV file of the course structure used to represent the content of the MITxPro course, Architecture of Complex Systems, Fall 2016.

## 'data.frame':    735 obs. of  14 variables:
##  $ id                  : Factor w/ 735 levels "block-v1:MITProfessionalX+SysEngxB1+3T2016+type@chapter+block@16ff74448fa34af2a1c080c99d4e844c",..: 9 3 524 667 166 258 381 485 417 443 ...
##  $ mod_hex_id          : Factor w/ 735 levels "003a48b7833a46feb5b56a684a115796",..: 591 119 254 629 352 523 79 658 314 394 ...
##  $ courseID            : Factor w/ 1 level "MITProfessionalX+SysEngxB1+3T2016": 1 1 1 1 1 1 1 1 1 1 ...
##  $ mod_type            : Factor w/ 11 levels "chapter","course",..: 2 1 7 9 4 4 6 6 6 6 ...
##  $ name                : Factor w/ 221 levels "","Action Plan",..: 19 125 124 124 NA NA 141 144 145 146 ...
##  $ markdown            : Factor w/ 118 levels " >>Drive Layout: Front-wheel, rear-wheel, or all-wheel<<\n \n [[(Architectural decision), Design decision]]\n \n\n",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ order               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ childOrder          : int  0 1 1 1 1 2 3 4 5 6 ...
##  $ treelevel           : int  0 1 2 3 4 4 4 4 4 4 ...
##  $ chpModPar           : Factor w/ 8 levels "16ff74448fa34af2a1c080c99d4e844c",..: NA 3 3 3 3 3 3 3 3 3 ...
##  $ seqModPar           : Factor w/ 54 levels "03a3841a0f684ecd93995d985355da91",..: NA NA 22 22 22 22 22 22 22 22 ...
##  $ vrtModPar           : Factor w/ 121 levels "079e627ca30f4899858cdcdd4574b553",..: NA NA NA 110 110 110 110 110 110 110 ...
##  $ parent              : Factor w/ 184 levels "03a3841a0f684ecd93995d985355da91",..: 151 151 35 81 163 163 163 163 163 163 ...
##  $ modparent_childlevel: Factor w/ 735 levels "03a3841a0f684ecd93995d985355da91/1",..: 584 585 143 337 640 647 648 649 650 651 ...
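
The parent and treelevel fields make it possible to walk the course tree. The following is a minimal sketch of that idea, not the pipeline's own code: the rows, IDs, and module type labels are made up for illustration, and the helper functions `children_of` and `path_to_root` are hypothetical names.

```r
# Toy stand-in for Data A; the IDs, types, and rows are made up for illustration.
course <- data.frame(
  mod_hex_id = c("c0", "ch1", "seq1", "vrt1", "vid1", "prb1"),
  mod_type   = c("course", "chapter", "sequential", "vertical", "video", "problem"),
  order      = 1:6,
  treelevel  = c(0, 1, 2, 3, 4, 4),
  parent     = c(NA, "c0", "ch1", "seq1", "vrt1", "vrt1"),
  stringsAsFactors = FALSE
)

# Number of modules at each depth of the course tree.
modules_per_level <- table(course$treelevel)

# Direct children of a given module, returned in course order.
children_of <- function(df, parent_id) {
  kids <- df[!is.na(df$parent) & df$parent == parent_id, ]
  kids$mod_hex_id[order(kids$order)]
}

# Follow the parent column from a module up to the root of the tree.
path_to_root <- function(df, id) {
  path <- id
  repeat {
    p <- df$parent[match(id, df$mod_hex_id)]
    if (is.na(p)) break
    path <- c(path, p)
    id <- p
  }
  path
}

children_of(course, "vrt1")
path_to_root(course, "prb1")
```

In this toy data the root row has a missing parent, which is what stops the upward walk; the real Data A may encode the root differently.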

Data B

Data B was created by the script edX-5-learnerTrajectoryNet.R. The data represents the learner trajectory network of a student who completed the MITxPro course, Architecture of Complex Systems, Fall 2016.

The network is stored in JSON format and includes a list of nodes and a list of edges. The nodes represent the course structure, specifically all content and activity modules, together with a description of the student's interaction with the content over the entire course. The edge list represents the student's transitions between course content nodes.
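
As a rough illustration of the node and edge structure, the sketch below derives both lists from an ordered sequence of module visits using base R. The visit sequence and the field names (`id`, `visits`, `source`, `target`, `n`) are assumptions made for this example; the actual field names in Data B are listed in the S1 Appendix.

```r
# Made-up visit sequence for one student (module IDs in the order visited).
visits <- c("m1", "m2", "m3", "m2", "m3", "m4")

# Node list: one row per module, with the number of visits to it.
visit_counts <- table(visits)
nodes <- data.frame(
  id     = names(visit_counts),
  visits = as.integer(visit_counts),
  stringsAsFactors = FALSE
)

# Each consecutive pair of visits is one transition.
transitions <- data.frame(
  source = head(visits, -1),
  target = tail(visits, -1),
  stringsAsFactors = FALSE
)

# Edge list: one row per distinct transition, with how often it occurred.
edges <- aggregate(n ~ source + target,
                   data = cbind(transitions, n = 1L),
                   FUN = sum)

# A list of this shape could then be serialized to JSON.
network <- list(nodes = nodes, edges = edges)
```

Repeated transitions collapse into a single weighted edge here, which keeps the edge list compact; whether the pipeline weights edges this way is not stated in this document.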

Data C

Data C was generated by the script edX-6-moduleUseAnalysis.R. The data represents an analysis of overall module engagement statistics for all active students in the MITxPro course, Architecture of Complex Systems, Fall 2016, based on a list of all active students in the course that was created by edX-4-eventLogFormatter.R.

## 'data.frame':    551 obs. of  43 variables:
##  $ mod_hex_id    : Factor w/ 551 levels "003a48b7833a46feb5b56a684a115796",..: 253 389 54 490 218 288 355 204 78 356 ...
##  $ courseID      : Factor w/ 1 level "MITProfessionalX+SysEngxB1+3T2016": 1 1 1 1 1 1 1 1 1 1 ...
##  $ module.type   : Factor w/ 7 levels "drag-and-drop-v2+block",..: 2 2 4 4 4 4 4 2 4 2 ...
##  $ desc          : Factor w/ 111 levels "","Action Plan",..: 1 1 72 75 76 77 78 1 79 1 ...
##  $ L2            : Factor w/ 54 levels "03a3841a0f684ecd93995d985355da91",..: 22 22 22 22 22 22 22 22 22 22 ...
##  $ L2label       : Factor w/ 42 levels "Action Plan (20 min)",..: 32 32 32 32 32 32 32 32 32 32 ...
##  $ L1            : Factor w/ 8 levels "16ff74448fa34af2a1c080c99d4e844c",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ L1label       : Factor w/ 8 levels "Get Started (25 min)",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ order         : int  5 6 7 8 9 10 11 12 13 14 ...
##  $ unq_stu       : int  1560 0 1544 1543 1543 1543 1543 0 1543 0 ...
##  $ prpstu        : num  0.997 0 0.987 0.986 0.986 ...
##  $ session_ct    : int  4510 0 1544 1543 1543 1543 1543 0 1543 0 ...
##  $ days_ct       : int  4020 0 1544 1543 1543 1543 1543 0 1543 0 ...
##  $ events        : int  8081 0 1549 1545 1545 1544 1545 0 1547 0 ...
##  $ avg_evt_stu   : num  5.18 0 1 1 1 ...
##  $ totalTime     : num  24810 0 3150 1696 2075 ...
##  $ avgTimeStu    : num  15.9 0 2.04 1.1 1.35 ...
##  $ avgTimeEvt    : num  3.07 0 2.03 1.1 1.34 ...
##  $ progress.i    : int  0 0 1260 1340 1356 1357 1371 0 1328 0 ...
##  $ recurse.i     : int  4672 0 286 205 188 187 174 0 217 0 ...
##  $ forward.o     : int  5828 0 1384 1332 1331 1339 1351 0 1341 0 ...
##  $ backward.o    : int  0 0 162 212 213 205 194 0 204 0 ...
##  $ attempts      : int  NA NA 1549 1545 1543 1544 1545 NA 1547 NA ...
##  $ avgAttempts   : num  NA NA 1 1 1 ...
##  $ points        : int  NA NA 1544 1543 1542 1543 1543 NA 1543 NA ...
##  $ avgPointStu   : num  NA NA 1 1 0.999 ...
##  $ maxPointsPrb  : int  NA NA 1 1 1 1 1 NA 1 NA ...
##  $ loads         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ avgLoadEvents : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ load_time     : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ avgLoadTime   : logi  NA NA NA NA NA NA ...
##  $ plays         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ avgPlayEvents : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ play_time     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avgPlayTime   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pause         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avgPauseEvents: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ pause_time    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avgPauseTime  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ seeks         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ avgSeekEvents : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ seek_time     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avgSeekTime   : num  NA NA NA NA NA NA NA NA NA NA ...
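
Several of the averaged fields appear to be simple ratios of the count and time fields. The sketch below checks this against the first and third rows of the output above. The formulas are inferred from the displayed values rather than taken from the pipeline scripts, and the denominator of 1565 active students is an assumption based on the row count of Data D.

```r
# First and third rows of Data C, copied from the str() output above.
# totalTime is displayed rounded there, so the results are approximate.
mods <- data.frame(
  unq_stu   = c(1560, 1544),
  events    = c(8081, 1549),
  totalTime = c(24810, 3150)
)

# Assumption: 1565 active students, the number of observations in Data D.
n_active <- 1565

# Inferred relationships between the raw and averaged fields.
derived <- data.frame(
  prpstu      = mods$unq_stu / n_active,       # proportion of students
  avg_evt_stu = mods$events / mods$unq_stu,    # events per student
  avgTimeStu  = mods$totalTime / mods$unq_stu, # time per student
  avgTimeEvt  = mods$totalTime / mods$events   # time per event
)

round(derived, 3)
```

The rounded results agree with the values shown for prpstu, avg_evt_stu, avgTimeStu, and avgTimeEvt in those two rows, which suggests, but does not confirm, that the fields are computed this way.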

Data D

Data D was generated by the script edX-7-moduleUseAnalysis.R. The data represents descriptive statistics for the active students enrolled in the MITxPro course, Architecture of Complex Systems, Fall 2016, based on a list of all active students in the course that was created by edX-4-eventLogFormatter.R.

Information about students that could be used for re-identification of participants has been excluded from the data set, specifically, gender, year of birth (yob), and level of education (loe).

## 'data.frame':    1565 obs. of  35 variables:
##  $ user_id            : logi  NA NA NA NA NA NA ...
##  $ grade              : num  0.94 0 0.94 0.83 0.97 0.52 0 1 0.94 0.96 ...
##  $ cert_status        : Factor w/ 2 levels "downloadable",..: 1 2 1 1 1 2 2 1 1 1 ...
##  $ gender             : logi  NA NA NA NA NA NA ...
##  $ yob                : logi  NA NA NA NA NA NA ...
##  $ loe                : logi  NA NA NA NA NA NA ...
##  $ sessions           : int  65 22 53 45 54 28 5 24 48 37 ...
##  $ days_unq           : int  38 12 29 31 28 14 4 18 36 31 ...
##  $ mods_unq           : int  284 141 291 255 291 150 86 289 289 274 ...
##  $ vid_mods           : int  46 23 47 47 47 26 12 47 48 37 ...
##  $ prb_mod            : int  130 64 139 102 139 65 40 139 139 136 ...
##  $ oa_mods            : int  5 1 5 5 5 2 1 4 5 5 ...
##  $ events             : int  4110 439 1200 747 954 680 254 900 832 985 ...
##  $ vid_events         : int  3379 138 497 273 339 315 73 302 285 280 ...
##  $ prb_events         : int  224 90 271 150 256 133 98 267 225 267 ...
##  $ oa_events          : int  43 5 47 21 47 13 3 41 47 50 ...
##  $ oa_peerAccessEvents: int  9 0 9 3 9 0 0 9 9 9 ...
##  $ oa_getPeerEvents   : int  18 0 20 6 20 0 0 19 21 26 ...
##  $ seqNextEvents      : int  66 37 67 64 61 33 23 58 64 64 ...
##  $ seqPrevEvents      : int  13 0 7 7 12 11 3 5 11 10 ...
##  $ seqGotoEvents      : int  16 5 18 23 22 14 0 24 0 3 ...
##  $ modAccessEvents    : int  54 31 54 52 54 32 20 53 54 53 ...
##  $ total_time         : num  3236 1074 2738 1369 2021 ...
##  $ vid_time           : num  1724 265 776 436 698 ...
##  $ prb_time           : num  187 62.7 220.1 96.6 163.1 ...
##  $ oa_time            : num  200.37 3.88 168.27 12.42 199.32 ...
##  $ oa_peerAccessTime  : num  5.517 0 47.25 0.467 23.9 ...
##  $ oa_getPeerTime     : num  107.62 0 100.72 5.43 165.03 ...
##  $ seqNextTime        : num  462 443 816 347 458 ...
##  $ seqPrevTime        : num  29 0 27.68 42.73 7.68 ...
##  $ seqGotoTime        : num  61.55 2.75 103.07 87.1 74.03 ...
##  $ modAccessTime      : num  443 223 356 344 342 ...
##  $ prb_attempts       : int  185 85 211 142 217 96 83 202 199 222 ...
##  $ prb_correct        : int  119 62 122 93 150 58 72 126 121 120 ...
##  $ prb_totalPoints    : int  128 67 133 97 159 63 74 140 131 132 ...

Data E

Data E was generated by combining three outputs of the script edX-6-moduleUseAnalysis.R for active, certified, and non-certified students. The script calculates module use statistics for subsets of students in the MITxPro course, Architecture of Complex Systems, Fall 2016:

  1. all active students (data was taken from Data C),
  2. students who were certified,
  3. students who were not certified.

Data E combines the unq_stu, events, totalTime, and avgTimeStu variables from the module use analysis applied to three different groups of students. The data for all active students is taken from Data C. The remaining two sets of data in Data E were generated from lists of student identifiers based on an analysis of the cert_status field in Data D.

## 'data.frame':    551 obs. of  18 variables:
##  $ mod_hex_id : Factor w/ 551 levels "003a48b7833a46feb5b56a684a115796",..: 253 389 54 490 218 288 355 204 78 356 ...
##  $ courseID   : Factor w/ 1 level "MITProfessionalX+SysEngxB1+3T2016": 1 1 1 1 1 1 1 1 1 1 ...
##  $ module.type: Factor w/ 7 levels "drag-and-drop-v2+block",..: 2 2 4 4 4 4 4 2 4 2 ...
##  $ desc       : Factor w/ 111 levels "","Action Plan",..: 1 1 72 75 76 77 78 1 79 1 ...
##  $ L2         : Factor w/ 54 levels "03a3841a0f684ecd93995d985355da91",..: 22 22 22 22 22 22 22 22 22 22 ...
##  $ L2label    : Factor w/ 42 levels "Action Plan (20 min)",..: 32 32 32 32 32 32 32 32 32 32 ...
##  $ L1         : Factor w/ 8 levels "16ff74448fa34af2a1c080c99d4e844c",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ L1label    : Factor w/ 8 levels "Get Started (25 min)",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ order      : int  5 6 7 8 9 10 11 12 13 14 ...
##  $ unq_stu    : int  1560 0 1544 1543 1543 1543 1543 0 1543 0 ...
##  $ unq_stu.1  : int  1353 0 1353 1353 1353 1353 1353 0 1353 0 ...
##  $ unq_stu.2  : int  207 0 191 190 190 190 190 0 190 0 ...
##  $ events     : int  8081 0 1549 1545 1545 1544 1545 0 1547 0 ...
##  $ events.1   : int  7146 0 1353 1353 1354 1353 1353 0 1353 0 ...
##  $ events.2   : int  935 0 196 192 191 191 192 0 194 0 ...
##  $ totalTime  : num  24810 0 3150 1696 2075 ...
##  $ totalTime.1: num  21491 0 2789 1387 1871 ...
##  $ totalTime.2: num  3320 0 361 309 204 ...
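
The combination described for Data E can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's code: the student IDs, the event log, and the `module_use` helper are hypothetical, and "not_downloadable" is a placeholder label, not the real name of the second cert_status level.

```r
# Toy stand-in for Data D; user IDs are made up and "not_downloadable"
# is a placeholder label for the second cert_status level.
students <- data.frame(
  user_id     = c("u1", "u2", "u3", "u4"),
  cert_status = c("downloadable", "downloadable", "downloadable", "not_downloadable"),
  stringsAsFactors = FALSE
)

# Lists of student identifiers derived from cert_status.
cert_ids    <- students$user_id[students$cert_status == "downloadable"]
noncert_ids <- students$user_id[students$cert_status != "downloadable"]

# Toy event log: one row per event (student, module).
events <- data.frame(
  user_id    = c("u1", "u1", "u2", "u3", "u4", "u4", "u1", "u4"),
  mod_hex_id = c("a",  "a",  "a",  "a",  "a",  "a",  "b",  "b"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the per-module statistics: unique students
# and event counts per module, restricted to the given student IDs.
module_use <- function(log, ids, modules) {
  log <- log[log$user_id %in% ids, ]
  data.frame(
    unq_stu = vapply(modules, function(m) length(unique(log$user_id[log$mod_hex_id == m])),
                     integer(1), USE.NAMES = FALSE),
    events  = vapply(modules, function(m) sum(log$mod_hex_id == m),
                     integer(1), USE.NAMES = FALSE)
  )
}

# data.frame() de-duplicates repeated column names by appending .1 and .2.
modules <- c("a", "b")
data_e <- data.frame(
  mod_hex_id = modules,
  module_use(events, students$user_id, modules),  # all active students
  module_use(events, cert_ids, modules),          # certified
  module_use(events, noncert_ids, modules)        # not certified
)

names(data_e)
```

Because certified and non-certified students partition the active students, the unsuffixed columns equal the sum of the .1 and .2 columns. The same relationship holds in the output above, for example 1560 = 1353 + 207 for unq_stu in the first row.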