=========================================================
E2Vec Tutorial
=========================================================

.. role:: ipy(code)
   :language: python
   :class: highlight

First, import the :ipy:`openla_feature_representation` package under a name of your choice, here :ipy:`lafr`.

.. code-block:: python

    import openla_feature_representation as lafr

Initializing the class
====================================

This is the constructor:

.. code-block:: python

    e2Vec = lafr.E2Vec(ftmodel_path, input_csv_dir_path, course_id)

- :ipy:`ftmodel_path` is the path to a fastText language model trained for this task
- :ipy:`input_csv_dir_path` is the path to a directory with the dataset (see below)
- :ipy:`course_id` is a string to identify files for the course to analyze within the :ipy:`input_csv_dir_path` directory (e.g. :ipy:`'A-2023'`)

Once you have an :ipy:`e2Vec` object, you can call all of the methods the class provides on it.

Generate sentences for the event log
====================================

The fastText model uses an artificial language to express event log entries as sentences. This is how you can generate them:

.. code-block:: python

    sentences_file_path = e2Vec.generate_sentences(
        sentences_dir_path=sentences_dir_path,
        eventstream_file_path=eventstream_file_path,
        input_csv_dir_path=input_csv_dir_path,
        course_id=course_id,
    )

If you need to restrict the sentences to a time span:

.. code-block:: python

    sentences_file_path = e2Vec.generate_sentences(
        sentences_dir_path=sentences_dir_path,
        use_timespan=True,
        start_minute=0,
        total_minutes=90,
        eventstream_file_path=eventstream_file_path,
        input_csv_dir_path=input_csv_dir_path,
        course_id=course_id,
    )

- :ipy:`sentences_dir_path` is the path to the directory where you want the sentence files to be written
- :ipy:`eventstream_file_path` is the path to the event stream CSV file
- :ipy:`input_csv_dir_path` is the path to a directory with the dataset (see below)
- :ipy:`course_id` is a string to identify files for the course to analyze within the :ipy:`input_csv_dir_path` directory
- :ipy:`use_timespan` enables time span extraction: if :ipy:`True`, the arguments below are used to extract a time span from the data (optional)
- :ipy:`start_minute` is the minute in the data at which sentence generation should start (optional)
- :ipy:`total_minutes` is the number of minutes' worth of sentences that should be generated (optional)

This function saves the sentences to a text file and returns the path to that file.

Vectorize the sentences
====================================

This function returns a pandas DataFrame with the vectors generated from the sentences.

.. code-block:: python

    user_vectors = e2Vec.vectorize_sentences(sentences_file_path)

- :ipy:`sentences_file_path` is the path to the sentence file generated in the previous step

Concatenation
====================================

The class has a function to concatenate vectors by time (in minutes) or by week.

This will concatenate the vectors in 10-minute spans.

.. code-block:: python

    vectors = e2Vec.concatenate_vectors(
        sentences_dir_path=sentences_dir_path,
        eventstream_file_path=eventstream_file_path,
        input_csv_dir_path=input_csv_dir_path,
        course_id=course_id,
        start_minute=0,
        total_minutes=10,
    )

This will concatenate the vectors by week or lesson.

.. code-block:: python

    vectors = e2Vec.concatenate_vectors(
        sentences_dir_path=sentences_dir_path,
        eventstream_file_path=eventstream_file_path,
        input_csv_dir_path=input_csv_dir_path,
        course_id=course_id,
        by_weeks=True,
        start_minute=0,
    )

- :ipy:`sentences_dir_path` is the path to the directory for the generated sentence files
- :ipy:`eventstream_file_path` is the path to the event stream CSV file
- :ipy:`input_csv_dir_path` is the path to a directory with the dataset (see below)
- :ipy:`course_id` is a string to identify files for the course to analyze within the :ipy:`input_csv_dir_path` directory
- :ipy:`by_weeks` concatenates vectors by week if :ipy:`True` (by default, vectors are concatenated by time)
- :ipy:`start_minute` is the minute in the data at which sentence generation should start (optional)
- :ipy:`total_minutes` is the number of minutes' worth of sentences that should be generated for each span (optional)
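
To make the idea of concatenating by time spans more concrete, here is a minimal, library-independent sketch. It is not the implementation used by :ipy:`concatenate_vectors`. It assumes that events are split into consecutive spans of :ipy:`total_minutes` starting at :ipy:`start_minute`, that each span yields one vector per user, and that a span without events contributes a zero vector. The helper names (:ipy:`span_index`, :ipy:`concatenate_by_span`, :ipy:`toy_embed`), the event tuple layout, and the sample sentences are all hypothetical, and :ipy:`toy_embed` is only a stand-in for the fastText model.

.. code-block:: python

    from collections import defaultdict


    def span_index(minute, start_minute, total_minutes):
        """Return the index of the span a minute falls into, or None if it is too early."""
        if minute < start_minute:
            return None
        return int((minute - start_minute) // total_minutes)


    def concatenate_by_span(user_events, embed, dim, start_minute=0, total_minutes=10):
        """Build one concatenated row per user, with one `dim`-sized vector per span.

        user_events: iterable of (user_id, minute, sentence) tuples (hypothetical layout).
        embed: callable mapping a list of sentences to a list of `dim` floats.
        """
        buckets = defaultdict(lambda: defaultdict(list))
        n_spans = 0
        for user_id, minute, sentence in user_events:
            idx = span_index(minute, start_minute, total_minutes)
            if idx is None:
                continue  # event happened before the requested start
            buckets[user_id][idx].append(sentence)
            n_spans = max(n_spans, idx + 1)

        rows = {}
        for user_id, spans in buckets.items():
            row = []
            for idx in range(n_spans):
                sentences = spans.get(idx)
                # Assumption: pad empty spans with zeros so all rows have equal length.
                row.extend(embed(sentences) if sentences else [0.0] * dim)
            rows[user_id] = row
        return rows


    def toy_embed(sentences):
        """Stand-in vectorizer: [number of sentences, total number of words]."""
        return [float(len(sentences)), float(sum(len(s.split()) for s in sentences))]


    # Made-up events: (user_id, minute, sentence)
    events = [
        ("u1", 1, "open page"),
        ("u1", 12, "next page"),
        ("u2", 15, "add marker here"),
    ]
    rows = concatenate_by_span(events, toy_embed, dim=2, start_minute=0, total_minutes=10)

With these sample events there are two 10-minute spans, so each row in :ipy:`rows` holds two vectors placed side by side. The row for :ipy:`"u2"` starts with zeros because that user has no events in the first span.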