E2Vec Tutorial

First, import the openla_feature_representation package under an alias of your choice; this tutorial uses lafr.

import openla_feature_representation as lafr

Initializing the class

This is the constructor:

e2Vec = lafr.E2Vec(ftmodel_path, input_csv_dir_path, course_id)

ftmodel_path is the path to a fastText language model trained for this task
input_csv_dir_path is the path to a directory with the dataset (see below)
course_id is a string to identify files for the course to analyze within the info_dir directory (e.g. 'A-2023')

Once you have an e2Vec object, you can call any of the methods the class provides on it.
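Because the constructor takes two filesystem paths and a course identifier, it can help to check them before building the object. The following is a minimal sketch, not part of openla_feature_representation: check_e2vec_inputs is a hypothetical helper, and it assumes that the dataset consists of .csv files whose names contain the course_id string.

```python
from pathlib import Path


def check_e2vec_inputs(ftmodel_path, input_csv_dir_path, course_id):
    """Return a list of problems found with the constructor arguments.

    Hypothetical helper, not part of openla_feature_representation.
    Assumption: dataset files are .csv files whose names contain course_id.
    """
    problems = []
    if not Path(ftmodel_path).is_file():
        problems.append(f"fastText model not found: {ftmodel_path}")
    csv_dir = Path(input_csv_dir_path)
    if not csv_dir.is_dir():
        problems.append(f"dataset directory not found: {input_csv_dir_path}")
    elif not any(course_id in p.name for p in csv_dir.glob("*.csv")):
        problems.append(
            f"no CSV file for course {course_id!r} in {input_csv_dir_path}"
        )
    return problems
```

An empty list means the three arguments look usable; otherwise each entry describes one problem to fix before calling lafr.E2Vec.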

Generate sentences for the event log

The fastText model uses an artificial language to express event log entries as sentences. This is how you can generate them:

sentences = e2Vec.generate_sentences(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
)

If you need to select or filter a time span:

sentences = e2Vec.generate_sentences(
    sentences_dir_path=sentences_dir_path,
    use_timespan=True,
    start_minute=0,
    total_minutes=90,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
)

sentences_dir_path is the path to the directory where you want the sentence files to be written
eventstream_file_path is the path to the event stream csv file
input_csv_dir_path is the path to a directory with the dataset (see below)
course_id is a string to identify files for the course to analyze within the info_dir directory
use_timespan, if True, makes the function use start_minute and total_minutes to extract a time span from the data (optional)
start_minute is the minute in the data at which sentence generation should start (optional)
total_minutes is the number of minutes' worth of sentences to generate (optional)

This function saves the sentences to a text file and returns a path to it.
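To make the start_minute and total_minutes arguments concrete, here is a small standalone sketch of time-span selection. It does not call the library and may differ from what generate_sentences does internally. It assumes that minutes are counted from the earliest timestamp and that the window is half-open. The select_timespan helper and the event labels are illustrative only.

```python
from datetime import datetime, timedelta


def select_timespan(events, start_minute=0, total_minutes=90):
    """Keep (timestamp, label) events that fall inside a window of minutes.

    Illustrative only. Assumptions: minutes are counted from the earliest
    timestamp, and the window is half-open: [start, start + total).
    """
    if not events:
        return []
    origin = min(ts for ts, _ in events)
    window_start = origin + timedelta(minutes=start_minute)
    window_end = window_start + timedelta(minutes=total_minutes)
    return [(ts, label) for ts, label in events if window_start <= ts < window_end]


# Toy event log: three events at minutes 0, 30, and 95.
t0 = datetime(2023, 4, 10, 9, 0)
events = [
    (t0, "OPEN"),
    (t0 + timedelta(minutes=30), "NEXT"),
    (t0 + timedelta(minutes=95), "CLOSE"),
]
kept = select_timespan(events, start_minute=0, total_minutes=90)
print([label for _, label in kept])  # ['OPEN', 'NEXT']
```

With start_minute=0 and total_minutes=90, the event at minute 95 falls outside the window and is dropped.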

Vectorize the sentences

This function returns a pandas DataFrame with the vectors generated from the sentences.

user_vectors = e2Vec.vectorize_sentences(sentences_file_path)

sentences_file_path is the path to the sentence files generated in the previous step

Concatenation

The class has a function to concatenate vectors by time span (in minutes) or by week.

This concatenates the vectors in 10-minute spans:

vectors = e2Vec.concatenate_vectors(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
    start_minute=0,
    total_minutes=10,
)

This concatenates the vectors by week (or lesson):

vectors = e2Vec.concatenate_vectors(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
    by_weeks=True,
    start_minute=0,
)

sentences_dir_path is the path to the sentence files generated in the previous step
eventstream_file_path is the path to the event stream csv file
input_csv_dir_path is the path to a directory with the dataset (see below)
course_id is a string to identify files for the course to analyze within the info_dir directory
by_weeks, if True, concatenates vectors by week instead of by time span (optional; by time span by default)
start_minute is the minute in the data at which sentence generation should start (optional)
total_minutes is the number of minutes' worth of sentences to generate for each span (optional)
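As a rough illustration of what concatenating by span means, the sketch below joins one fixed-length vector per time window into a single longer vector, in window order. This is one plausible reading of the description above, not the library's confirmed output layout. The concatenate_by_window helper and the toy 2-dimensional vectors are assumptions made for the example.

```python
def concatenate_by_window(window_vectors):
    """Join per-window vectors into one long vector, in window order.

    window_vectors maps a window index (for example 0 for minutes 0-9 and
    1 for minutes 10-19) to a fixed-length list of floats.
    Illustrative only; not the library's implementation.
    """
    combined = []
    for index in sorted(window_vectors):
        combined.extend(window_vectors[index])
    return combined


# Toy 2-dimensional vectors for three consecutive 10-minute windows,
# deliberately given out of order.
user_windows = {1: [0.3, 0.4], 0: [0.1, 0.2], 2: [0.5, 0.6]}
print(concatenate_by_window(user_windows))  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

Under this reading, three windows of dimension 2 give one vector of length 6, so the result grows with the number of spans (or weeks) covered.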