Data Conversion Module Document

This module receives event stream data and converts it into operation counts, page-wise summaries of students’ behavior, etc.

OpenLA.data_conversion.convert_into_operation_count(event_stream, user_id=None, contents_id=None, operation_name=None, separate_marker_type=False, for_each_content=True)

Convert event stream into the number of times each learner used each operation in each content.

Parameters
  • event_stream (EventStream) – EventStream instance

  • user_id (str or list[str] or None) – The user id(s) to aggregate. If it is None, all users in the argument ‘event_stream’ are used.

  • contents_id (str or list[str] or None) – The contents id(s) whose events are aggregated.

  • operation_name (str or list[str] or None) – The operation(s) to aggregate. If it is None, all operations in event stream are used.

  • separate_marker_type (bool) – Whether to count ‘MARKER’ operations separately by marker type (“difficult” or “important”).

  • for_each_content (bool) – If True, the total operation count is calculated separately for each content.

Returns

Conversion result representing how many times each learner used each operation. The DataFrame in the class has (index: row number, columns: ‘user id’, ‘contents id’, and each operation).

Return type

OperationCount
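The counting described above can be sketched in plain Python. This is only an illustration of the aggregation, not OpenLA's implementation: the event tuples, the helper name count_operations, and the “MARKER difficult”-style column names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event records: (user id, contents id, operation name, marker type)
events = [
    ("U1", "C1", "NEXT", None),
    ("U1", "C1", "NEXT", None),
    ("U1", "C1", "MARKER", "difficult"),
    ("U1", "C2", "MARKER", "important"),
    ("U2", "C1", "PREV", None),
]


def count_operations(events, separate_marker_type=False, for_each_content=True):
    """Count operations per learner, optionally per content (illustrative only)."""
    counts = defaultdict(Counter)
    for user, contents, operation, marker in events:
        # Optionally split MARKER into one counter per marker type.
        # The resulting name is an assumption made for this sketch.
        if separate_marker_type and operation == "MARKER" and marker:
            operation = f"MARKER {marker}"
        # Group by (user, contents) or by user alone.
        key = (user, contents) if for_each_content else (user,)
        counts[key][operation] += 1
    return {key: dict(counter) for key, counter in counts.items()}


per_content = count_operations(events)
per_user = count_operations(events, separate_marker_type=True, for_each_content=False)
```

With for_each_content=False the counts of the same learner are merged across contents, which is why the sketch drops the contents id from the grouping key.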

OpenLA.data_conversion.convert_into_page_transition(event_stream, user_id=None, contents_id=None, invalid_seconds=None, timeout_seconds=None, count_operation=True, operation_name=None, separate_marker_type=False)

Convert event stream into how many times each learner used each operation and how long each learner stayed in each page with consideration of page transition.

Parameters
  • event_stream (EventStream) – EventStream instance

  • user_id (str or list[str] or None) – The user id(s) to aggregate. If it is None, all users in the argument ‘event_stream’ are used.

  • contents_id (str or list[str] or None) – The contents id(s) whose events are aggregated.

  • invalid_seconds (int or None) – If the reading seconds in a page do not reach ‘invalid_seconds’, the event is not aggregated.

  • timeout_seconds (int or None) – If the reading seconds in a page exceed ‘timeout_seconds’, the event is not aggregated. When this argument is left at its default value ‘None’, all events are aggregated.

  • count_operation (bool) – Whether to count each operation in each page. If you only need the reading time in each page, setting this argument to False is recommended to speed up the aggregation.

  • operation_name (str or list[str] or None) – The operation(s) to aggregate. If it is None, all operations in event stream are used.

  • separate_marker_type (bool) – Whether to count ‘MARKER’ operations separately by marker type (“difficult” or “important”).

Returns

Conversion result representing how many times each learner used each operation and how long each learner stayed in each page, with page transitions taken into account. The DataFrame in the class has (index: row number, columns: [‘user id’, ‘contents id’, ‘page no’, ‘reading_seconds’, ‘time_of_entry’, ‘time_of_exit’, each operation]).

Return type

PageTransition
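The effect of ‘invalid_seconds’ and ‘timeout_seconds’ can be illustrated with a small stand-alone sketch. It is not OpenLA's code: the visit tuples and the helper filter_visits are hypothetical, and the handling of values exactly equal to a threshold is an assumption read from the wording “do not reach” and “exceed”.

```python
from datetime import datetime

# Hypothetical page visits: (user id, page no, time_of_entry, time_of_exit)
visits = [
    ("U1", 1, datetime(2020, 4, 1, 9, 0, 0), datetime(2020, 4, 1, 9, 0, 2)),
    ("U1", 2, datetime(2020, 4, 1, 9, 0, 2), datetime(2020, 4, 1, 9, 5, 2)),
    ("U1", 3, datetime(2020, 4, 1, 9, 5, 2), datetime(2020, 4, 1, 11, 5, 2)),
]


def filter_visits(visits, invalid_seconds=None, timeout_seconds=None):
    """Keep only visits whose reading time lies within the two thresholds."""
    kept = []
    for user, page, entry, exit_ in visits:
        reading_seconds = (exit_ - entry).total_seconds()
        # Too short: reading time does not reach invalid_seconds (assumed '<').
        if invalid_seconds is not None and reading_seconds < invalid_seconds:
            continue
        # Too long: reading time exceeds timeout_seconds (assumed '>').
        if timeout_seconds is not None and reading_seconds > timeout_seconds:
            continue
        kept.append((user, page, reading_seconds))
    return kept


all_visits = filter_visits(visits)  # both thresholds None: nothing is dropped
filtered = filter_visits(visits, invalid_seconds=3, timeout_seconds=1800)
```

Here the 2-second visit and the 2-hour visit are dropped, leaving only the 300-second visit to page 2.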

OpenLA.data_conversion.convert_into_page_wise(event_stream=None, page_transition=None, user_id=None, contents_id=None, invalid_seconds=None, timeout_seconds=None, count_operation=True, operation_name=None, separate_marker_type=False)

Convert event stream into how many times each learner used each operation and how long each learner stayed in each page. The result is equivalent to the page-wise aggregation of “convert_into_page_transition” function.

Parameters
  • event_stream (EventStream) – Instance of class “EventStream”. If you have already converted the event stream into a PageTransition instance, using the argument ‘page_transition’ instead of ‘event_stream’ is recommended.

  • page_transition (PageTransition) – Instance of class “PageTransition”. You can pass an existing PageTransition instance instead of converting from EventStream. If both ‘event_stream’ and ‘page_transition’ are specified, ‘page_transition’ is used for converting into PageWiseAggregation.

  • user_id (str or list[str] or None) – The user id(s) to aggregate. If it is None, all users in the argument ‘event_stream’ are used.

  • contents_id (str or list[str] or None) – The contents id(s) whose events are aggregated.

  • invalid_seconds (int or None) – If the reading seconds in a page do not reach ‘invalid_seconds’, the event is not aggregated.

  • timeout_seconds (int or None) – If the reading seconds in a page exceed ‘timeout_seconds’, the event is not aggregated. When this argument is left at its default value ‘None’, all events are aggregated.

  • count_operation (bool) – Whether to count each operation in each page. If you only need the reading time in each page, setting this argument to False is recommended to speed up the aggregation.

  • operation_name (str or list[str] or None) – Operation(s) to aggregate. If it is None, all operations in event stream are used.

  • separate_marker_type (bool) – Whether to count ‘MARKER’ operations separately by marker type (“difficult” or “important”).

Returns

Conversion result representing how many times each learner used each operation and how long each learner stayed in each page. The DataFrame in the class has (index: row number, columns: [‘user id’, ‘contents id’, ‘page no’, ‘reading_seconds’, each operation]).

Return type

PageWiseAggregation
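The relation to “convert_into_page_transition” stated above (a page-wise aggregation of its rows) can be pictured as follows. This is a minimal sketch under assumed data: the row tuples, the single ‘NEXT’ operation column, and the helper aggregate_page_wise are hypothetical and do not reproduce OpenLA's code.

```python
from collections import defaultdict

# Hypothetical page-transition rows:
# (user id, contents id, page no, reading_seconds, NEXT count)
transitions = [
    ("U1", "C1", 1, 30, 1),
    ("U1", "C1", 2, 45, 1),
    ("U1", "C1", 1, 20, 1),  # the learner came back to page 1
    ("U2", "C1", 1, 10, 0),
]


def aggregate_page_wise(transitions):
    """Sum reading time and operation counts per (user, contents, page)."""
    totals = defaultdict(lambda: {"reading_seconds": 0, "NEXT": 0})
    for user, contents, page, seconds, n_next in transitions:
        row = totals[(user, contents, page)]
        row["reading_seconds"] += seconds
        row["NEXT"] += n_next
    return dict(totals)


page_wise = aggregate_page_wise(transitions)
```

Repeated visits to the same page collapse into one row, so the order of page transitions is no longer visible in the page-wise result.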

OpenLA.data_conversion.convert_into_time_range(course_info, event_stream, interval_seconds, contents_id, user_id=None, time_range_basis='minutes', start_time='start_of_stream', end_time='end_of_stream', lecture_week=None, count_operation=True, operation_name=None, separate_marker_type=False)

Convert event stream into which page each learner read longest and how many times each learner used each operation in specific time intervals.

Parameters
  • course_info (CourseInformation) – CourseInformation instance.

  • event_stream (EventStream) – EventStream instance

  • interval_seconds (int) – The length of each aggregation interval, in seconds.

  • contents_id (str) – The contents id whose events are aggregated.

  • user_id (str or list[str] or None) – The user id(s) to aggregate. If it is None, all users in the argument ‘event_stream’ are used.

  • start_time (str, tuple, pandas.Timestamp, or datetime.datetime) –

    The start time to aggregate. The available values are as follows:

    ‘start_of_lecture’ … use the lecture start time

    ‘start_of_stream’ … use the oldest event time of ‘event_stream’

    (y, m, d, H, M, S) … use the time (year, month, day, hours, minutes, seconds). Each element is an int value.

    pandas.Timestamp or datetime.datetime object.

  • end_time (str, tuple, pandas.Timestamp, or datetime.datetime) –

    The end time to aggregate. The available values are as follows:

    ‘end_of_lecture’ … use the lecture end time

    ‘end_of_stream’ … use the latest event time of ‘event_stream’

    (y, m, d, H, M, S) … use the time (year, month, day, hours, minutes, seconds). Each element is an int value.

    pandas.Timestamp or datetime.datetime object.

  • lecture_week (int) – The lecture week whose events are aggregated. If you specify “start_time=‘start_of_lecture’” or “end_time=‘end_of_lecture’”, you must also specify this argument.

  • time_range_basis (str) – ‘seconds’, ‘minutes’, or ‘hours’.

  • count_operation (bool) – Whether to count each operation in each page. If you only need the page transition, setting this argument to False is recommended to speed up the aggregation.

  • operation_name (str or list[str] or None) – The operation(s) to aggregate. If it is None, all operations in event stream are used.

  • separate_marker_type (bool) – Whether to count ‘MARKER’ operations separately by marker type (“difficult” or “important”).

Returns

Conversion result representing how many times each learner used each operation and which page each learner read in each time range. The DataFrame in the class has (index: row number, columns: [‘elapsed_time’, ‘start_of_range’, ‘end_of_range’, ‘user id’, ‘contents id’, ‘page no’, each operation]). A ‘page no’ of 0 means the user did not open the contents.

Return type

TimeRangeAggregation
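How a period is cut into ranges of ‘interval_seconds’ and how the longest-read page is chosen can be sketched as below. The helpers make_time_ranges and longest_read_page are hypothetical. Measuring ‘elapsed_time’ at the end of each range, clipping the last range at the end time, and the per-page seconds dictionary are assumptions of this sketch rather than documented OpenLA behavior. Only the convention that page 0 means “contents not opened” comes from the description above.

```python
from datetime import datetime, timedelta


def make_time_ranges(start_time, end_time, interval_seconds, time_range_basis="minutes"):
    """Split the period into consecutive ranges of interval_seconds (illustrative)."""
    # A (y, m, d, H, M, S) tuple is turned into a datetime object.
    if isinstance(start_time, tuple):
        start_time = datetime(*start_time)
    if isinstance(end_time, tuple):
        end_time = datetime(*end_time)
    divisor = {"seconds": 1, "minutes": 60, "hours": 3600}[time_range_basis]
    step = timedelta(seconds=interval_seconds)
    ranges = []
    start = start_time
    while start < end_time:
        # Assumption: the final range is clipped at end_time.
        end = min(start + step, end_time)
        # Assumption: elapsed_time is measured at the end of the range.
        elapsed = (end - start_time).total_seconds() / divisor
        ranges.append((elapsed, start, end))
        start = end
    return ranges


def longest_read_page(seconds_by_page):
    """Return the page read longest in a range, or 0 if no page was opened."""
    if not seconds_by_page:
        return 0
    return max(seconds_by_page, key=seconds_by_page.get)


ranges = make_time_ranges((2020, 4, 1, 9, 0, 0), (2020, 4, 1, 9, 5, 0), interval_seconds=120)
page = longest_read_page({3: 40, 4: 80})
```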

OpenLA.data_conversion.individual_reading_time(event_stream=None, user_id=None, contents_id=None, invalid_seconds=None, timeout_seconds=None, return_aggregation_result=False, time_unit='seconds', for_each_content=False, page_transition=None, pagewise_aggregation=None)

Calculate each user’s total reading time.

Parameters
  • event_stream (EventStream) – Instance of class “EventStream”. If you have already converted the event stream into a PageTransition instance, using the argument ‘page_transition’ instead of ‘event_stream’ is recommended.

  • user_id (str or list[str] or None) – The user id(s) to aggregate. If it is None, all users in the argument ‘event_stream’ are used.

  • contents_id (str or list[str] or None) – The contents id(s) whose events are aggregated.

  • invalid_seconds (int or None) – If the reading seconds in a page do not reach ‘invalid_seconds’, the event is not aggregated.

  • timeout_seconds (int or None) – If the reading seconds in a page exceed ‘timeout_seconds’, the event is not aggregated. When this argument is left at its default value ‘None’, all events are aggregated.

  • return_aggregation_result (bool) – If True, return PageWiseAggregation instance.

  • time_unit (str) – Time unit of the returned reading time. Select from ‘seconds’, ‘minutes’, or ‘hours’.

  • for_each_content (bool) – If True, the total reading time is calculated separately for each content.

  • page_transition (PageTransition) – Instance of class “PageTransition”. You can pass an existing PageTransition instance instead of converting from EventStream. If both ‘event_stream’ and ‘page_transition’ are specified, ‘page_transition’ is used for calculating reading time.

  • pagewise_aggregation (PageWiseAggregation) – Instance of class “PageWiseAggregation”. You can pass an existing PageWiseAggregation instance instead of converting from EventStream or PageTransition. If all of ‘event_stream’, ‘page_transition’, and ‘pagewise_aggregation’ are specified, ‘page_transition’ is used for calculating reading time.

Returns

DataFrame of users’ reading time (index: row number, columns: ‘userid’, (‘contentsid’), ‘reading_seconds/minutes/hours’).

Return type

pandas.DataFrame
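The roles of ‘time_unit’ and ‘for_each_content’ can be illustrated with a short sketch. It works on hypothetical (user id, contents id, reading_seconds) rows and a hypothetical helper total_reading_time, and returns a plain dictionary rather than the pandas.DataFrame the real function returns.

```python
from collections import defaultdict

# Hypothetical page-wise rows: (user id, contents id, reading_seconds)
rows = [
    ("U1", "C1", 600),
    ("U1", "C2", 1200),
    ("U2", "C1", 90),
]


def total_reading_time(rows, time_unit="seconds", for_each_content=False):
    """Sum reading time per user (or per user and content) in the given unit."""
    divisor = {"seconds": 1, "minutes": 60, "hours": 3600}[time_unit]
    totals = defaultdict(float)
    for user, contents, seconds in rows:
        # Keep contents in the key only when totals are wanted per content.
        key = (user, contents) if for_each_content else user
        totals[key] += seconds
    return {key: value / divisor for key, value in totals.items()}


per_user_minutes = total_reading_time(rows, time_unit="minutes")
per_content_seconds = total_reading_time(rows, for_each_content=True)
```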