E2Vec Tutorial
First, import the openla_feature_representation package under a short alias of your choice, here lafr.
import openla_feature_representation as lafr
Initializing the class
This is the constructor:
e2Vec = lafr.E2Vec(ftmodel_path, input_csv_dir_path, course_id)
ftmodel_path is the path to a fastText language model trained for this task
input_csv_dir_path is the path to a directory with the dataset (see below)
course_id is a string that identifies the files of the course to analyze within the info_dir directory (e.g. 'A-2023')
Once you have your own e2Vec object, you can call any of the methods the class provides on it.
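Before calling the constructor, it can help to confirm that the paths you are about to pass actually exist. The snippet below is a minimal sketch of such a check. The helper check_e2vec_inputs and the example paths are hypothetical and not part of the package, and the sketch assumes the fastText model is stored as a single file.

```python
from pathlib import Path


def check_e2vec_inputs(ftmodel_path, input_csv_dir_path):
    """Return a list of problems found with the constructor paths (empty if none).

    Hypothetical helper, not part of openla_feature_representation.
    Assumes the fastText model is a single file on disk.
    """
    problems = []
    if not Path(ftmodel_path).is_file():
        problems.append(f"fastText model file not found: {ftmodel_path}")
    if not Path(input_csv_dir_path).is_dir():
        problems.append(f"dataset directory not found: {input_csv_dir_path}")
    return problems


# Hypothetical locations; replace them with your own.
ftmodel_path = "models/e2vec_fasttext.bin"
input_csv_dir_path = "data/course_csvs"
course_id = "A-2023"

problems = check_e2vec_inputs(ftmodel_path, input_csv_dir_path)
if problems:
    for problem in problems:
        print(problem)
else:
    print("Paths look fine; the E2Vec constructor can be called with them.")
```

If the list of problems is empty, the same three values can be passed to lafr.E2Vec as shown above.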
Generate sentences for the event log
The fastText model uses an artificial language to express event log entries as sentences. This is how you can generate them:
sentences = e2Vec.generate_sentences(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
)
If you need to select or filter a time span:
sentences = e2Vec.generate_sentences(
    sentences_dir_path=sentences_dir_path,
    use_timespan=True,
    start_minute=0,
    total_minutes=90,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
)
sentences_dir_path is the path to the directory where you want the sentence files to be written
eventstream_file_path is the path to the event stream CSV file
input_csv_dir_path is the path to a directory with the dataset (see below)
course_id is a string that identifies the files of the course to analyze within the info_dir directory
use_timespan, if True, makes the arguments below select a time span from the data (optional)
start_minute is the minute in the data at which sentence generation should start (optional)
total_minutes is the number of minutes' worth of sentences that should be generated (optional)
This function saves the sentences to a text file and returns a path to it.
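To picture what start_minute and total_minutes select, here is a small standalone sketch that does not use the package. It assumes the window is measured from the earliest event and is half-open, so the end minute itself is excluded. Both points are assumptions made for illustration, and the function select_timespan, the timestamps, and the operation names are made up.

```python
from datetime import datetime, timedelta


def select_timespan(events, start_minute, total_minutes):
    """Keep events whose timestamp falls in the requested window.

    The window is measured from the earliest timestamp in `events` and is
    treated as half-open: [start_minute, start_minute + total_minutes).
    Both choices are assumptions made for this illustration.
    """
    if not events:
        return []
    origin = min(timestamp for timestamp, _ in events)
    window_start = origin + timedelta(minutes=start_minute)
    window_end = window_start + timedelta(minutes=total_minutes)
    return [
        (timestamp, operation)
        for timestamp, operation in events
        if window_start <= timestamp < window_end
    ]


# Made-up (timestamp, operation) pairs standing in for an event stream.
lecture_start = datetime(2023, 4, 10, 9, 0)
events = [
    (lecture_start + timedelta(minutes=0), "OPEN"),
    (lecture_start + timedelta(minutes=45), "NEXT"),
    (lecture_start + timedelta(minutes=89), "PREV"),
    (lecture_start + timedelta(minutes=90), "CLOSE"),
]

selected = select_timespan(events, start_minute=0, total_minutes=90)
print([operation for _, operation in selected])  # ['OPEN', 'NEXT', 'PREV']
```

With start_minute=0 and total_minutes=90, the event at minute 90 falls just outside the window under this half-open reading. How the package itself treats the boundary is not stated here, so check it against your own data.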
Vectorize the sentences
This function returns a pandas DataFrame with the vectors generated from the sentences.
user_vectors = e2Vec.vectorize_sentences(sentences_file_path)
sentences_file_path is the path to the sentence files generated in the previous step
Concatenation
The class has a function to concatenate vectors by time (minutes) or weeks.
This will concatenate the vectors in 10-minute spans.
vectors = e2Vec.concatenate_vectors(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
    start_minute=0,
    total_minutes=10,
)
This will concatenate the vectors by week or lesson.
vectors = e2Vec.concatenate_vectors(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
    by_weeks=True,
    start_minute=0,
)
sentences_dir_path is the path to the sentence files generated in the previous step
eventstream_file_path is the path to the event stream CSV file
input_csv_dir_path is the path to a directory with the dataset (see below)
course_id is a string that identifies the files of the course to analyze within the info_dir directory
by_weeks concatenates vectors by week if True (by time by default)
start_minute is the minute in the data at which sentence generation should start (optional)
total_minutes is the number of minutes' worth of sentences that should be generated each time (optional)
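One way to picture concatenation by time is sketched below, without using the package. It assumes that each span of total_minutes minutes yields one fixed-length vector and that the span vectors are joined end to end. This is an interpretation for illustration only, and the package's own implementation may differ. The helpers span_starts and concatenate and all numbers are hypothetical.

```python
def span_starts(start_minute, total_minutes, last_minute):
    """Start minutes of consecutive spans covering start_minute..last_minute."""
    return list(range(start_minute, last_minute, total_minutes))


def concatenate(span_vectors):
    """Join per-span vectors end to end into one flat vector."""
    combined = []
    for vector in span_vectors:
        combined.extend(vector)
    return combined


# Made-up 3-dimensional vectors, one per 10-minute span of a 30-minute session.
starts = span_starts(start_minute=0, total_minutes=10, last_minute=30)
span_vectors = [
    [0.1, 0.2, 0.3],  # minutes 0-9
    [0.0, 0.5, 0.1],  # minutes 10-19
    [0.4, 0.4, 0.2],  # minutes 20-29
]

combined = concatenate(span_vectors)
print(starts)         # [0, 10, 20]
print(len(combined))  # 9
```

Under this reading, the length of the combined vector grows with the number of spans: three spans of three dimensions give nine values.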