E2Vec Tutorial

First, import the openla_feature_representation package under an alias of your choice; this tutorial uses lafr.

import openla_feature_representation as lafr

Initializing the class

This is the constructor:

e2Vec = lafr.E2Vec(ftmodel_path, input_csv_dir_path, course_id)
  • ftmodel_path is the path to a fastText language model trained for this task

  • input_csv_dir_path is the path to a directory with the dataset (see below)

  • course_id is a string that identifies the files for the course to analyze within the dataset directory (e.g. 'A-2023')

Once you have an e2Vec object, you can call any of the class's methods on it.
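Before constructing the object, it can help to confirm that the three arguments point at something usable. The helper below is a minimal sketch and not part of the package: check_e2vec_inputs is a hypothetical name, and it assumes that the dataset file names contain the course ID, which this tutorial does not state explicitly.

```python
from pathlib import Path


def check_e2vec_inputs(ftmodel_path, input_csv_dir_path, course_id):
    """Return a list of problems; an empty list means the inputs look usable.

    Hypothetical helper for illustration only, not part of the package.
    """
    problems = []
    if not Path(ftmodel_path).is_file():
        problems.append(f"fastText model not found: {ftmodel_path}")

    csv_dir = Path(input_csv_dir_path)
    if not csv_dir.is_dir():
        problems.append(f"dataset directory not found: {input_csv_dir_path}")
    elif not any(course_id in entry.name for entry in csv_dir.iterdir()):
        # Assumption: dataset file names contain the course ID.
        problems.append(
            f"no files mentioning course {course_id!r} in {input_csv_dir_path}"
        )
    return problems
```

If the returned list is empty, the same three values can be passed on to lafr.E2Vec as shown above.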

Generate sentences for the event log

The fastText model uses an artificial language to express event log entries as sentences. This is how you can generate them:

sentences = e2Vec.generate_sentences(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
)

To restrict sentence generation to a time span:

sentences = e2Vec.generate_sentences(
    sentences_dir_path=sentences_dir_path,
    use_timespan=True,
    start_minute=0,
    total_minutes=90,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
)
  • sentences_dir_path is the path to the directory where you want the sentence files to be written

  • eventstream_file_path is the path to the event stream csv file

  • input_csv_dir_path is the path to a directory with the dataset (see below)

  • course_id is a string that identifies the files for the course to analyze within the dataset directory

  • use_timespan, if True, makes the function use the two arguments below to extract a time span from the data (optional)

  • start_minute is the minute in the data at which sentence generation should start (optional)

  • total_minutes is the number of minutes' worth of sentences to generate (optional)

This function saves the sentences to a text file and returns a path to it.
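To make the time-span arguments concrete, here is a minimal sketch of how start_minute and total_minutes can define a window over event minutes. It does not use the package: in_timespan and the event list are made up for illustration, and the half-open window (start included, end excluded) is an assumption, since this tutorial does not say how the boundaries are treated.

```python
def in_timespan(event_minute, start_minute=0, total_minutes=90):
    """True if the event falls in [start_minute, start_minute + total_minutes).

    Assumption: the end of the window is exclusive.
    """
    return start_minute <= event_minute < start_minute + total_minutes


# Made-up (minute, operation) pairs standing in for event log entries.
events = [(0, "OPEN"), (12, "NEXT"), (89, "PREV"), (90, "CLOSE")]

selected = [
    operation
    for minute, operation in events
    if in_timespan(minute, start_minute=0, total_minutes=90)
]
print(selected)  # ['OPEN', 'NEXT', 'PREV']
```

With start_minute=0 and total_minutes=90, the event at minute 90 falls just outside the window under this assumption.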

Vectorize the sentences

This function returns a pandas DataFrame with the vectors generated from the sentences.

user_vectors = e2Vec.vectorize_sentences(sentences_file_path)
  • sentences_file_path is the path to the sentence file generated in the previous step (the value returned by generate_sentences)

Concatenation

The class can also concatenate vectors, either by time span (in minutes) or by week.

This concatenates the vectors in 10-minute spans:

vectors = e2Vec.concatenate_vectors(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
    start_minute=0,
    total_minutes=10,
)

This concatenates the vectors by week or lesson:

vectors = e2Vec.concatenate_vectors(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
    by_weeks=True,
    start_minute=0,
)
  • sentences_dir_path is the path to the directory containing the sentence files generated in the previous step

  • eventstream_file_path is the path to the event stream csv file

  • input_csv_dir_path is the path to a directory with the dataset (see below)

  • course_id is a string that identifies the files for the course to analyze within the dataset directory

  • by_weeks, if True, concatenates the vectors by week; by default they are concatenated by time span

  • start_minute is the minute in the data at which sentence generation should start (optional)

  • total_minutes is the number of minutes' worth of sentences to generate for each span (optional)
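As a rough illustration of what concatenating in fixed spans means, the sketch below splits a lecture into consecutive windows and joins one vector per window end to end. It is plain Python and makes no claim about how concatenate_vectors works internally: minute_windows, concatenate, the 25-minute lecture, and the two-dimensional vectors are all hypothetical.

```python
def minute_windows(lecture_minutes, total_minutes):
    """Yield (start_minute, span_length) pairs covering the lecture in fixed spans.

    The last span is shortened if the lecture length is not a multiple
    of total_minutes.
    """
    for start in range(0, lecture_minutes, total_minutes):
        yield start, min(total_minutes, lecture_minutes - start)


def concatenate(span_vectors):
    """Join per-span vectors end to end into one flat vector."""
    return [value for vector in span_vectors for value in vector]


windows = list(minute_windows(lecture_minutes=25, total_minutes=10))
print(windows)  # [(0, 10), (10, 10), (20, 5)]

# Made-up 2-dimensional vectors, one per window, for a single learner.
per_span = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(concatenate(per_span))  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

In this sketch the concatenated vector grows with the number of spans, so a smaller total_minutes gives a longer vector for the same lecture.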