E2Vec Tutorial
First, import the `openla_feature_representation` package under an alias of your choice; this tutorial uses `lafr`.
```python
import openla_feature_representation as lafr
```
Initializing the class
This is the constructor:
```python
e2Vec = lafr.E2Vec(ftmodel_path, input_csv_dir_path, course_id)
```
- `ftmodel_path` is the path to a fastText language model trained for this task.
- `input_csv_dir_path` is the path to a directory with the dataset (see below).
- `course_id` is a string that identifies the files for the course to analyze within the `info_dir` directory (e.g. `'A-2023'`).
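Before constructing the object, it can help to confirm that the dataset directory exists and that files for your `course_id` are where you expect them. The helper below is a minimal sketch that does not call `lafr` at all; `find_course_files` is a hypothetical name, and it assumes the course files are CSVs whose file names contain the `course_id` (for example `A-2023_lecture.csv`). Adapt the pattern to your dataset's actual layout.

```python
from pathlib import Path


def find_course_files(input_csv_dir_path, course_id):
    """Return the CSV files under input_csv_dir_path whose names contain course_id.

    Hypothetical helper: assumes course files carry the course_id
    (e.g. 'A-2023') in their file names. Adjust to your dataset's layout.
    """
    root = Path(input_csv_dir_path)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {root}")
    # rglob also searches subdirectories, so files inside info_dir are included
    return sorted(p for p in root.rglob("*.csv") if course_id in p.name)
```

If `find_course_files(input_csv_dir_path, course_id)` returns an empty list, the course ID or the directory is probably not the one you intended.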
Once you have an `E2Vec` object, you can call any of the class's methods on it.
Generate sentences for the event log
The fastText model uses an artificial language to express event log entries as sentences. This is how you can generate them:
```python
sentences = e2Vec.generate_sentences(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
)
```
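To make the idea of an "artificial language" more concrete, the sketch below turns event-log rows into one sentence per user by ordering each user's events in time and mapping every event to a token. It is an illustration only: the token format `<operation>_p<page>` and the field names `userid`, `operationname`, `pageno` and `time` are assumptions, not the vocabulary or schema E2Vec actually uses.

```python
from collections import defaultdict


def events_to_sentences(events):
    """Turn event-log rows into one space-separated "sentence" per user.

    Illustration only: the token format "<operation>_p<page>" is a hypothetical
    stand-in for E2Vec's real artificial language.
    """
    by_user = defaultdict(list)
    # Order events chronologically so each sentence follows the user's actions
    for e in sorted(events, key=lambda e: e["time"]):
        by_user[e["userid"]].append(f'{e["operationname"]}_p{e["pageno"]}')
    return {user: " ".join(tokens) for user, tokens in by_user.items()}


events = [
    {"userid": "u1", "time": 2, "operationname": "NEXT", "pageno": 1},
    {"userid": "u1", "time": 1, "operationname": "OPEN", "pageno": 1},
    {"userid": "u2", "time": 3, "operationname": "OPEN", "pageno": 1},
]
print(events_to_sentences(events))  # {'u1': 'OPEN_p1 NEXT_p1', 'u2': 'OPEN_p1'}
```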
To generate sentences only for a specific time span, set `use_timespan=True` and specify the span:
```python
# Restrict sentence generation to 90 minutes of data, starting at minute 0
sentences = e2Vec.generate_sentences(
    sentences_dir_path=sentences_dir_path,
    use_timespan=True,
    start_minute=0,
    total_minutes=90,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
)
```
- `sentences_dir_path` is the path to the directory where the sentence files will be written.
- `eventstream_file_path` is the path to the event stream CSV file.
- `input_csv_dir_path` is the path to a directory with the dataset (see below).
- `course_id` is a string that identifies the files for the course to analyze within the `info_dir` directory.
- `use_timespan` (optional): if `True`, the arguments below are used to extract a time span from the data.
- `start_minute` (optional) is the minute in the data at which sentence generation should start.
- `total_minutes` (optional) is the number of minutes' worth of sentences to generate.
`generate_sentences` writes the sentences to a text file and returns the path to that file.
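The time-span options can be read as selecting a window of the event log. The sketch below shows one plausible interpretation, assuming minutes are counted from the first event in the log and the window is half-open, `[start_minute, start_minute + total_minutes)`. The function `select_timespan` and the `eventtime` field are hypothetical and may not match how `generate_sentences` defines the window.

```python
from datetime import timedelta


def select_timespan(events, start_minute, total_minutes):
    """Keep events in [start, start + total_minutes) minutes after the first event.

    Hypothetical sketch of use_timespan/start_minute/total_minutes; using the
    first event in the log as the reference point is an assumption.
    """
    if not events:
        return []
    t0 = min(e["eventtime"] for e in events)
    start = t0 + timedelta(minutes=start_minute)
    end = start + timedelta(minutes=total_minutes)
    # Half-open window: an event exactly at `end` belongs to the next span
    return [e for e in events if start <= e["eventtime"] < end]
```

With `start_minute=0` and `total_minutes=90`, an event 95 minutes after the first one is excluded, while `start_minute=90, total_minutes=10` would keep it.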
Vectorize the sentences
`vectorize_sentences` returns a pandas DataFrame containing the vectors generated from the sentences.
```python
user_vectors = e2Vec.vectorize_sentences(sentences_file_path)
```
- `sentences_file_path` is the path to the sentence file generated in the previous step (the path returned by `generate_sentences`).
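How a sentence becomes a vector depends on the fastText model. As a rough, hypothetical illustration, the sketch below averages the L2-normalised vectors of the tokens in a sentence, which is similar in spirit to fastText's `get_sentence_vector` for unsupervised models. The `word_vectors` dictionary is a toy stand-in for a trained model, and unlike fastText (which can build vectors for unseen words from character n-grams) it simply skips unknown tokens. It is not a description of what `vectorize_sentences` does internally.

```python
import math


def sentence_vector(sentence, word_vectors):
    """Average the L2-normalised vectors of the known tokens in a sentence.

    Toy stand-in for a fastText sentence embedding; word_vectors is a
    hypothetical {token: list[float]} lookup, not a fastText model.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    count = 0
    for token in sentence.split():
        vec = word_vectors.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary tokens in this toy version
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            continue  # a zero vector has no direction to normalise
        total = [t + x / norm for t, x in zip(total, vec)]
        count += 1
    # Return the zero vector if no token was known
    return [t / count for t in total] if count else total
```

Stacking one such vector per user, one row each, gives a table of the same shape as the DataFrame described above.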
Concatenation
`concatenate_vectors` concatenates vectors either by time span (in minutes) or by week.
The following call concatenates the vectors in 10-minute spans:
```python
# Concatenate vectors in 10-minute spans, starting at minute 0
vectors = e2Vec.concatenate_vectors(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
    start_minute=0,
    total_minutes=10,
)
```
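Conceptually, concatenating by time means computing one vector per span (here minutes 0 to 10, 10 to 20, and so on) and joining them end to end. The sketch below assumes the per-span vectors are already available in a hypothetical `{user: {span_index: vector}}` dictionary, and it fills spans with no activity with zeros so that every user gets a vector of the same length; whether `concatenate_vectors` pads this way is an assumption.

```python
def concatenate_spans(span_vectors, dim):
    """Concatenate per-span vectors into one feature vector per user.

    span_vectors: hypothetical {user: {span_index: list[float]}} where span 0
    covers minutes [0, 10), span 1 covers [10, 20), and so on.
    Spans missing for a user are filled with zeros (an assumption) so every
    user ends up with a vector of the same length.
    """
    # Number of spans = highest span index seen for any user, plus one
    n_spans = 1 + max((i for spans in span_vectors.values() for i in spans), default=-1)
    result = {}
    for user, spans in span_vectors.items():
        flat = []
        for i in range(n_spans):
            flat.extend(spans.get(i, [0.0] * dim))
        result[user] = flat
    return result
```

For example, with `dim=2`, a user active only in the second span gets `[0.0, 0.0, x, y]`, keeping span positions aligned across users.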
The following call concatenates the vectors by week (lesson):
```python
# Concatenate vectors by week instead of by time span
vectors = e2Vec.concatenate_vectors(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
    by_weeks=True,
    start_minute=0,
)
```
- `sentences_dir_path` is the path to the directory containing the sentence files generated by `generate_sentences`.
- `eventstream_file_path` is the path to the event stream CSV file.
- `input_csv_dir_path` is the path to a directory with the dataset (see below).
- `course_id` is a string that identifies the files for the course to analyze within the `info_dir` directory.
- `by_weeks` (optional): if `True`, vectors are concatenated by week; otherwise (the default) they are concatenated by time.
- `start_minute` (optional) is the minute in the data at which sentence generation should start.
- `total_minutes` (optional) is the number of minutes' worth of sentences to generate for each span.
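Concatenating by week first requires assigning each event to a week or lesson. A minimal sketch, assuming you know when each lesson starts: `lesson_index` uses `bisect_right` to find the latest lesson that started at or before the event, and `group_by_lesson` buckets events accordingly. Both helpers, the `lesson_starts` list and the `eventtime` field are hypothetical; E2Vec may derive weeks from the dataset differently.

```python
from bisect import bisect_right


def lesson_index(event_time, lesson_starts):
    """Return the 0-based lesson/week an event belongs to, or -1 if before lesson 0.

    lesson_starts is a hypothetical, sorted list of lesson start datetimes;
    an event belongs to the latest lesson that started at or before it.
    """
    return bisect_right(lesson_starts, event_time) - 1


def group_by_lesson(events, lesson_starts):
    """Bucket events by lesson index, dropping events before the first lesson."""
    groups = {}
    for e in events:
        idx = lesson_index(e["eventtime"], lesson_starts)
        if idx >= 0:
            groups.setdefault(idx, []).append(e)
    return groups
```

Each lesson's events could then be turned into sentences and vectors, and the per-lesson vectors concatenated in lesson order.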