=========================================================
E2Vec Tutorial
=========================================================
.. role:: ipy(code)
   :language: python
   :class: highlight

First, import the :ipy:`openla_feature_representation` package under an alias of your choice. This tutorial uses :ipy:`lafr`.

.. code-block:: python

    import openla_feature_representation as lafr

Initializing the class
====================================

This is the constructor:

.. code-block:: python

    e2Vec = lafr.E2Vec(ftmodel_path, input_csv_dir_path, course_id)

- :ipy:`ftmodel_path` is the path to a fastText language model trained for this task
- :ipy:`input_csv_dir_path` is the path to a directory with the dataset (see below)
- :ipy:`course_id` is a string that identifies the files of the course to analyze within the :ipy:`input_csv_dir_path` directory (e.g. :ipy:`'A-2023'`)

Once you have an :ipy:`e2Vec` object, you can call all of the methods the class provides on it.
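
Because the constructor takes two filesystem paths, it can help to check them before creating the object. The following is a minimal sketch using only the standard library. The helper :ipy:`check_e2vec_inputs` and the example paths are hypothetical and are not part of :ipy:`openla_feature_representation`.

.. code-block:: python

    from pathlib import Path


    def check_e2vec_inputs(ftmodel_path, input_csv_dir_path):
        """Return a list of problems; an empty list means both paths look usable."""
        problems = []
        # The fastText model is expected to be a single file.
        if not Path(ftmodel_path).is_file():
            problems.append(f"fastText model not found: {ftmodel_path}")
        # The dataset is expected to be a directory of CSV files.
        if not Path(input_csv_dir_path).is_dir():
            problems.append(f"dataset directory not found: {input_csv_dir_path}")
        return problems


    # Hypothetical locations; replace them with your own.
    for problem in check_e2vec_inputs("models/e2vec_fasttext.bin", "data/course_csv"):
        print(problem)

If the returned list is empty, the same two paths can be passed to :ipy:`lafr.E2Vec` together with the course ID.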

Generate sentences for the event log
====================================

The fastText model uses an artificial language to express event log entries as sentences. This is how you can generate them:

.. code-block:: python

    sentences = e2Vec.generate_sentences(
        sentences_dir_path=sentences_dir_path,
        eventstream_file_path=eventstream_file_path,
        input_csv_dir_path=input_csv_dir_path,
        course_id=course_id,
    )

To generate sentences for only part of the data, select a time span:

.. code-block:: python

    sentences = e2Vec.generate_sentences(
        sentences_dir_path=sentences_dir_path,
        use_timespan=True,
        start_minute=0,
        total_minutes=90,
        eventstream_file_path=eventstream_file_path,
        input_csv_dir_path=input_csv_dir_path,
        course_id=course_id,
    )

- :ipy:`sentences_dir_path` is the path to the directory where you want the sentence files to be written
- :ipy:`eventstream_file_path` is the path to the event stream CSV file
- :ipy:`input_csv_dir_path` is the path to a directory with the dataset (see below)
- :ipy:`course_id` is a string that identifies the files of the course to analyze within the :ipy:`input_csv_dir_path` directory
- :ipy:`use_timespan` enables time span extraction: if :ipy:`True`, the two arguments below select a time span from the data (optional)
- :ipy:`start_minute` is the minute in the data at which sentence generation should start (optional)
- :ipy:`total_minutes` is the number of minutes' worth of sentences that should be generated (optional)

This function saves the sentences to a text file and returns a path to it.
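
To make the roles of :ipy:`start_minute` and :ipy:`total_minutes` concrete, here is a small illustrative sketch of selecting a time window from timestamped events. It is not the library's implementation: the function :ipy:`select_timespan`, the sample events, and the assumptions that minutes are counted from the first event and that the end of the window is exclusive are all made up for illustration.

.. code-block:: python

    from datetime import datetime, timedelta


    def select_timespan(events, start_minute, total_minutes):
        """Keep events in [start_minute, start_minute + total_minutes) after the first event."""
        if not events:
            return []
        # Assumption: minute 0 is the timestamp of the earliest event.
        origin = min(timestamp for timestamp, _ in events)
        window_start = origin + timedelta(minutes=start_minute)
        window_end = window_start + timedelta(minutes=total_minutes)
        return [(t, op) for t, op in events if window_start <= t < window_end]


    # Made-up events: (timestamp, operation name).
    lecture_start = datetime(2023, 4, 10, 9, 0)
    events = [
        (lecture_start, "OPEN"),
        (lecture_start + timedelta(minutes=30), "NEXT"),
        (lecture_start + timedelta(minutes=95), "CLOSE"),
    ]
    selected = select_timespan(events, start_minute=0, total_minutes=90)
    print([op for _, op in selected])  # ['OPEN', 'NEXT']

With :ipy:`start_minute=0` and :ipy:`total_minutes=90`, the event at minute 95 falls outside the window and is dropped.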

Vectorize the sentences
====================================

This function returns a pandas DataFrame with the vectors generated from the sentences.

.. code-block:: python

    user_vectors = e2Vec.vectorize_sentences(sentences_file_path)

- :ipy:`sentences_file_path` is the path to the sentence file returned by :ipy:`generate_sentences` in the previous step

Concatenation
====================================

The class can also concatenate vectors, either over fixed time spans (in minutes) or by week.

The following call concatenates the vectors in 10-minute spans:

.. code-block:: python

    vectors = e2Vec.concatenate_vectors(
        sentences_dir_path=sentences_dir_path,
        eventstream_file_path=eventstream_file_path,
        input_csv_dir_path=input_csv_dir_path,
        course_id=course_id,
        start_minute=0,
        total_minutes=10,
    )

The following call concatenates the vectors by week (lesson):

.. code-block:: python

    vectors = e2Vec.concatenate_vectors(
        sentences_dir_path=sentences_dir_path,
        eventstream_file_path=eventstream_file_path,
        input_csv_dir_path=input_csv_dir_path,
        course_id=course_id,
        by_weeks=True,
        start_minute=0,
    )

- :ipy:`sentences_dir_path` is the path to the directory of sentence files from the previous step
- :ipy:`eventstream_file_path` is the path to the event stream CSV file
- :ipy:`input_csv_dir_path` is the path to a directory with the dataset (see below)
- :ipy:`course_id` is a string that identifies the files of the course to analyze within the :ipy:`input_csv_dir_path` directory
- :ipy:`by_weeks` concatenates vectors by week if :ipy:`True` (the default is to concatenate by time)
- :ipy:`start_minute` is the minute in the data at which sentence generation should start (optional)
- :ipy:`total_minutes` is the number of minutes' worth of sentences that should be generated for each span (optional)
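
As a conceptual illustration of what concatenating per-span vectors means, the sketch below joins each user's vectors end to end in span order. It is a simplified assumption and not the library's implementation: the function :ipy:`concatenate_by_window`, the user IDs, and the 2-dimensional vectors are invented, and the actual output of :ipy:`concatenate_vectors` may be organized differently.

.. code-block:: python

    def concatenate_by_window(window_vectors):
        """Join each user's per-span vectors end to end, in span order."""
        concatenated = {}
        for user_id, vectors in window_vectors.items():
            joined = []
            for vector in vectors:
                joined.extend(vector)
            concatenated[user_id] = joined
        return concatenated


    # Made-up 2-dimensional vectors for three consecutive 10-minute spans.
    window_vectors = {
        "user_a": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
        "user_b": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    }
    result = concatenate_by_window(window_vectors)
    print(len(result["user_a"]))  # 6

In this sketch, three spans of 2-dimensional vectors yield one 6-dimensional vector per user, so the position of a value in the result encodes the span it came from.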