cycle_prediction package

cycle_prediction.t2e

class cycle_prediction.t2e.t2e(dataset, prefix, resolution, process_id_col='CaseID', event_type_col='ActivityID', extra_censored=0, end_event_list=[], dynamic_features=[], static_features=[], transform='log', fit_type='t2e', censored=True)

Bases: object

A class for preprocessing, fitting, and evaluating time-to-event data. It expects datasets in the following format:

Column             Data type  Content
CaseID             int        Case identifier
ActivityID         int        Activity identifier
CompleteTimestamp  datetime   Timestamp of the event
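
As an illustration of this input format, the following minimal sketch builds a toy event log with pandas. The values are made up for the example; only the three column names and their types come from the table above.

```python
import pandas as pd

# Toy event log: two cases, one row per event (values are illustrative only).
events = pd.DataFrame(
    {
        "CaseID": [1, 1, 1, 2, 2],
        "ActivityID": [10, 20, 30, 10, 20],
        "CompleteTimestamp": pd.to_datetime(
            [
                "2021-01-04 09:00:00",
                "2021-01-04 11:30:00",
                "2021-01-05 08:15:00",
                "2021-01-04 10:00:00",
                "2021-01-06 16:45:00",
            ]
        ),
    }
)

# Keep the events of each case in chronological order.
events = events.sort_values(["CaseID", "CompleteTimestamp"]).reset_index(drop=True)
```

A frame shaped like this is the kind of object the dataset argument is described as accepting.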

__init__(dataset, prefix, resolution, process_id_col='CaseID', event_type_col='ActivityID', extra_censored=0, end_event_list=[], dynamic_features=[], static_features=[], transform='log', fit_type='t2e', censored=True)

Initializes the t2e object with the desired setup.

Parameters
  • dataset (obj) – DataFrame of the trace dataset, in the format given in the class description.

  • prefix (int) – Number of history prefixes to train with.

  • resolution (str) – Remaining-time resolution: ‘s’ (seconds), ‘h’ (hours), or ‘d’ (days).

  • process_id_col (str) – column name to be used as process ID. default: ‘CaseID’

  • event_type_col (str) – column name to be used as event type. default: ‘ActivityID’

  • extra_censored (int) – Number of censored traces to artificially create from complete traces, default 0.

  • end_event_list (list) – list of ints identifying the process’s possible end events.

  • dynamic_features (list) – list of time varying feature columns to include in the model.

  • static_features (list) – list of time invariant feature columns to include in the model.

  • transform (str) – Transform the output to a new space where it is less biased toward short traces. Accepted values (None, ‘log’, ‘power’). Default: ‘log’

  • fit_type (str) – ‘t2e’ (default); reserved for future development.

  • censored (bool) – Whether to use/ignore the censored traces (if found).
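
To make the role of end_event_list concrete, here is a minimal sketch of how a 0/1 censored/observed indicator (the U column listed in the preprocess() output) could be derived from it. The rule used here, that a trace counts as observed when its last activity is one of the end events, is an assumption for illustration, and censoring_indicator is a hypothetical helper, not part of the package.

```python
from typing import Dict, List


def censoring_indicator(trace: List[int], end_event_list: List[int]) -> int:
    """Return 1 if the trace is observed (assumed: it ends with an end event), else 0."""
    return 1 if trace and trace[-1] in set(end_event_list) else 0


# Activity sequences per case (illustrative values).
traces: Dict[int, List[int]] = {1: [10, 20, 30], 2: [10, 20]}
end_events = [30]

# Case 1 ends with an end event, case 2 does not.
u = {case_id: censoring_indicator(acts, end_events) for case_id, acts in traces.items()}
```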

preprocess()
A method responsible for:
  1. Removing traces longer than the desired prefix.

  2. Creating dynamic and static features as specified at initialization.

Parameters

None

Returns

The updated self.dataset, a pandas DataFrame with the following format:

Column                                    Data type  Content
CaseID                                    int        Case identifier
ActivityID                                int        Activity identifier
CompleteTimestamp                         datetime   Timestamp of the event
fvt1                                      float      Time delta to the next event
fvt2                                      float      Hours since the start of the day
fvt3                                      float      Hours since the start of the week
ActivityID_0 to ActivityID_n-1            bool       Activity in one-hot form
Static_feature_0 to Static_feature_n-1    bool       Static feature in one-hot form
U                                         int        0/1: censored/observed trace
T2E/D2E/S2E                               float      Remaining time in seconds, hours, or days
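
The three time features can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the package's code: time_features is a hypothetical helper, fvt1 is expressed in hours, the last event of a trace (which has no next event) gets fvt1 = 0.0, and the week is taken to start on Monday at 00:00.

```python
from datetime import datetime
from typing import List, Tuple


def time_features(timestamps: List[datetime]) -> List[Tuple[float, float, float]]:
    """Return (fvt1, fvt2, fvt3) in hours for each event of one trace."""
    features = []
    for i, ts in enumerate(timestamps):
        # fvt1: delta to the next event; assumed 0.0 for the last event.
        if i + 1 < len(timestamps):
            fvt1 = (timestamps[i + 1] - ts).total_seconds() / 3600.0
        else:
            fvt1 = 0.0
        # fvt2: hours elapsed since midnight of the same day.
        fvt2 = ts.hour + ts.minute / 60.0 + ts.second / 3600.0
        # fvt3: hours elapsed since Monday 00:00 (weekday() is 0 for Monday).
        fvt3 = ts.weekday() * 24.0 + fvt2
        features.append((fvt1, fvt2, fvt3))
    return features


# One trace with three events; 2021-01-04 is a Monday.
trace = [
    datetime(2021, 1, 4, 9, 0),
    datetime(2021, 1, 4, 11, 30),
    datetime(2021, 1, 5, 8, 15),
]
feats = time_features(trace)
```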

split(train_prc, val_prc, scaling)

Splits the dataset into training, validation, and test sets.

The nature of the data requires a dedicated function for this purpose.

Parameters
  • train_prc (float) – Training percentage (including validation).

  • val_prc (float) – Validation percentage (% of the training set).

  • scaling (bool) – Whether to scale numerical features.

Returns

X_train (object): tensor of shape [n_examples, prefix, n_features]

X_val (object): tensor of shape [n_examples, prefix, n_features]

X_test (object): tensor of shape [n_examples, prefix, n_features]

y_train (object): tensor of shape [n_examples, 2]

y_val (object): tensor of shape [n_examples, 2]

y_test (object): tensor of shape [n_examples, 2]
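
One plausible reason a dedicated split is needed is that all events of a case should land in the same subset. The sketch below illustrates that idea by splitting case identifiers rather than rows. It is an assumption-laden illustration: split_case_ids is a hypothetical helper, the percentages are taken to be fractions in [0, 1], and the random assignment is a choice made for the example (the package may split differently, for instance chronologically).

```python
import random
from typing import List, Sequence, Tuple


def split_case_ids(
    case_ids: Sequence[int], train_prc: float, val_prc: float, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Split unique case IDs into train, validation, and test lists."""
    ids = sorted(set(case_ids))
    random.Random(seed).shuffle(ids)
    # train_prc covers training plus validation; val_prc is a share of that part.
    n_train_val = int(round(len(ids) * train_prc))
    n_val = int(round(n_train_val * val_prc))
    val_ids = ids[:n_val]
    train_ids = ids[n_val:n_train_val]
    test_ids = ids[n_train_val:]
    return train_ids, val_ids, test_ids


train_ids, val_ids, test_ids = split_case_ids(range(100), train_prc=0.8, val_prc=0.25)
```

Rows would then be assigned to a subset according to the list their CaseID falls in, so no case is shared between subsets.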

build_model(X_train, y_train, size_dyn, size_sta)

Builds a time-to-event model using a GRU network.

Parameters
  • X_train (object) – training set input features of shape [n_examples, prefix, n_features]

  • y_train (object) – training set labels of shape [n_examples, 2]

  • size_dyn (int) – GRU units size.

  • size_sta (int) – Static branch hidden layer size (optional)

Returns

Initializes self.model.

fit(X_train, y_train, X_val, y_val, bs=64, exp_dir='20210105-203414', vb=True)

Fits a time-to-event model using a GRU network.

Parameters
  • X_train (object) – training set input features of shape [n_examples, prefix, n_features]

  • y_train (object) – training set labels of shape [n_examples, 2]

  • X_val (object) – validation set input features of shape [n_examples, prefix, n_features]

  • y_val (object) – validation set labels of shape [n_examples, 2]

  • bs (int) – batch size

  • exp_dir (str) – tensorboard path

  • vb (bool) – verbose (True/False)

Returns

self, with the weights of self.model fitted.
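
This reference does not state the training objective. For orientation only, a common objective for models that output Weibull parameters from partly censored data is the censored Weibull log-likelihood, sketched below. Whether fit uses this exact form is an assumption; weibull_censored_loglik is a hypothetical helper, with alpha taken as the scale and beta as the shape parameter.

```python
import math


def weibull_censored_loglik(t: float, u: int, alpha: float, beta: float) -> float:
    """Log-likelihood of one example under a continuous Weibull(alpha, beta).

    t: remaining time (> 0); u: 1 for an observed trace, 0 for a censored one.
    """
    # Cumulative hazard (t / alpha) ** beta; its negative is the log-survival.
    cumulative_hazard = (t / alpha) ** beta
    # Log of the hazard rate; it only contributes when the event was observed.
    log_hazard = math.log(beta / alpha) + (beta - 1.0) * math.log(t / alpha)
    return u * log_hazard - cumulative_hazard


observed = weibull_censored_loglik(t=2.0, u=1, alpha=2.0, beta=1.0)
censored = weibull_censored_loglik(t=2.0, u=0, alpha=2.0, beta=1.0)
```

For an observed example this equals the log of the Weibull density; for a censored one it equals the log of the survival function, which is how censored traces still contribute information.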

predict(X)
Predicts the alpha and beta parameters for a given trace prefix, after self.model has been trained with the fit method.

Parameters

X (tensor) – Input array of size [n_examples, prefix, n_features]

Returns

y_pred (object): pandas DataFrame of shape [n_examples, 2]

evaluate(X, y)
Predicts with and evaluates self.model after it has been trained with the fit method, given a test set with known ground truth.

Parameters
  • X (tensor) – Input array of size [n_examples, prefix, n_features]

  • y (tensor) – Output array of size [n_examples, 2]

Returns

test_results_df: pandas DataFrame with the following format:

Column        Data type  Content
T             float      True remaining time
U             float      Censored indicator
alpha         float      Model prediction
beta          float      Model prediction
T_pred        float      Mode of the generated pdf
error (days)  float      Error in days
MAE           float      Absolute error in days
Accurate      boolean    For development purposes

mae (float): Mean absolute error of all predictions
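
The T_pred and mae entries can be illustrated with a short sketch. It assumes alpha is the Weibull scale and beta the shape parameter, in which case the mode of the pdf has a closed form; weibull_mode and mean_absolute_error are hypothetical helpers written for this example, and the input values are made up.

```python
import math
from typing import List


def weibull_mode(alpha: float, beta: float) -> float:
    """Mode of a Weibull pdf with scale alpha and shape beta."""
    if beta <= 1.0:
        # For shape <= 1 the density is highest at t = 0.
        return 0.0
    return alpha * ((beta - 1.0) / beta) ** (1.0 / beta)


def mean_absolute_error(t_true: List[float], t_pred: List[float]) -> float:
    """Average of the absolute differences between true and predicted times."""
    return sum(abs(a - b) for a, b in zip(t_true, t_pred)) / len(t_true)


# Two illustrative (alpha, beta) predictions and their true remaining times.
t_pred = [weibull_mode(a, b) for a, b in [(10.0, 2.0), (5.0, 1.0)]]
mae = mean_absolute_error([8.0, 1.0], t_pred)
```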

cycle_prediction.weibull_utils

cycle_prediction.weibull_utils.plot_top_predictions(result_df, lim=10, U=1)

Plots the Weibull pdf and survival curve for the remaining time.

Parameters
  • result_df – the result dataframe from the t2e.evaluate method

  • lim – number of best predictions to plot

  • U – which examples to plot: 1 for observed, 0 for censored
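
The two curves named above can be computed from a single (alpha, beta) pair as in the following sketch. It uses the standard Weibull formulas with alpha as scale and beta as shape; weibull_pdf and weibull_survival are hypothetical helpers for this example and are not claimed to be the functions in cycle_prediction.weibull_utils. Plotting is left out so the snippet stays dependency-free.

```python
import math


def weibull_survival(t: float, alpha: float, beta: float) -> float:
    """Probability that the remaining time exceeds t (t > 0)."""
    return math.exp(-((t / alpha) ** beta))


def weibull_pdf(t: float, alpha: float, beta: float) -> float:
    """Weibull density at t (t > 0) for scale alpha and shape beta."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0) * weibull_survival(t, alpha, beta)


# Evaluate both curves on a grid of remaining times for one prediction.
alpha, beta = 10.0, 2.0
grid = [0.5 * k for k in range(1, 41)]
pdf_curve = [weibull_pdf(t, alpha, beta) for t in grid]
survival_curve = [weibull_survival(t, alpha, beta) for t in grid]
```

The two lists can be passed to any plotting library, with grid on the horizontal axis.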