cycle_prediction package¶

cycle_prediction.t2e¶

class cycle_prediction.t2e.t2e(dataset, prefix, resolution, process_id_col='CaseID', event_type_col='ActivityID', extra_censored=0, end_event_list=[], dynamic_features=[], static_features=[], transform='log', fit_type='t2e', censored=True)¶

Bases: object

A class for preprocessing time-to-event data and for fitting and evaluating a model on it. It expects a dataset in the following format:

Column	Data type	Content
CaseID	int	Case identifier
ActivityID	int	Activity identifier
CompleteTimestamp	datetime	Timestamp of the event
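As an illustration of the expected layout, the sketch below parses a tiny, made-up event log into rows with the three documented columns. The sample data, the timestamp format and the `load_events` helper are assumptions for this example only; the class itself takes a dataframe, which is not constructed here.

```python
import csv
import io
from datetime import datetime

# Hypothetical three-event log in the documented column layout.
RAW_LOG = """CaseID,ActivityID,CompleteTimestamp
1,1,2021-01-04 09:00:00
1,2,2021-01-04 11:30:00
2,1,2021-01-05 08:15:00
"""


def load_events(text):
    """Parse CSV text into rows typed as (int, int, datetime)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "CaseID": int(rec["CaseID"]),
            "ActivityID": int(rec["ActivityID"]),
            # Assumed timestamp format; adjust to match the real log.
            "CompleteTimestamp": datetime.strptime(
                rec["CompleteTimestamp"], "%Y-%m-%d %H:%M:%S"),
        })
    return rows


events = load_events(RAW_LOG)
```

Each row corresponds to one event, and all events that share a CaseID form one trace.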

__init__(dataset, prefix, resolution, process_id_col='CaseID', event_type_col='ActivityID', extra_censored=0, end_event_list=[], dynamic_features=[], static_features=[], transform='log', fit_type='t2e', censored=True)¶

Initializes the t2e object with the desired setup.

Parameters

dataset (obj) – Dataframe of the trace dataset, in the column format shown in the class description.
prefix (int) – Number of history prefixes to train with.
resolution (str) – Remaining-time resolution: ‘s’ (seconds), ‘h’ (hours) or ‘d’ (days).
process_id_col (str) – column name to be used as process ID. default: ‘CaseID’
event_type_col (str) – column name to be used as event type. default: ‘ActivityID’
extra_censored (int) – Number of censored traces to artificially create from complete traces, default 0.
end_event_list (list) – list of (int) containing the process’s possible end events
dynamic_features (list) – list of time varying feature columns to include in the model.
static_features (list) – list of time invariant feature columns to include in the model.
transform (str) – Transform the output to a new space where it is less biased toward short traces. Accepted values (None, ‘log’, ‘power’). Default: ‘log’
fit_type (str) – ‘t2e’ (default); reserved for future development.
censored (bool) – Whether to use/ignore the censored traces (if found).
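The effect of the ‘log’ transform can be sketched as follows. The source does not state the exact formula, so this example assumes log(1 + t) with its inverse exp(z) - 1; the function names are hypothetical and not part of the package.

```python
import math


def to_log_space(t):
    """Map a non-negative remaining time to log space (assumed: log(1 + t))."""
    return math.log1p(t)


def from_log_space(z):
    """Invert to_log_space, returning a value in the original time unit."""
    return math.expm1(z)


# Made-up remaining times in days, from very short to very long traces.
remaining_days = [0.5, 2.0, 30.0, 365.0]
transformed = [to_log_space(t) for t in remaining_days]
restored = [from_log_space(z) for z in transformed]
```

In log space the gap between short and long traces is compressed, which is one way to read "less biased toward short traces".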

preprocess()¶

This method is responsible for:

Removing traces longer than the desired prefix.
Creating dynamic and static features as specified at initialization.

Parameters

None –

Returns

A pandas dataframe with the following format:

Column	Data type	Content
CaseID	int	Case identifier
ActivityID	int	Activity identifier
CompleteTimestamp	datetime	Timestamp of the event
fvt1	float	delta time to the next event
fvt2	float	hour since day start
fvt3	float	hour since week start
ActivityID_0	bool	Activity in one-hot form
…	…	…
ActivityID_n-1	…	…
Static_feature_0	bool	static feature in one-hot form
…	…	…
Static_feature_n-1	…	…
U	int	0/1 : censored/observed trace
T2E/D2E/S2E	float	Remaining time in seconds, hours or days

Return type

Updated self.dataset
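A minimal sketch of how the time-based columns could be derived for a single trace, using only the standard library. It assumes that the week starts on Monday and that remaining time is measured up to the last event of the trace; neither detail is confirmed by the text above, and the helper names are hypothetical.

```python
from datetime import datetime

# Seconds per unit for the documented resolutions 's', 'h' and 'd'.
SECONDS_PER_UNIT = {"s": 1, "h": 3600, "d": 86400}


def hours_since_day_start(ts):
    """Hours elapsed since midnight of the event's day (cf. fvt2)."""
    return ts.hour + ts.minute / 60 + ts.second / 3600


def hours_since_week_start(ts):
    """Hours elapsed since Monday 00:00 (cf. fvt3); weekday() is 0 on Monday."""
    return ts.weekday() * 24 + hours_since_day_start(ts)


def remaining_time(trace, resolution="d"):
    """Time from each event to the last event of the trace, in the given unit."""
    end = trace[-1]
    unit = SECONDS_PER_UNIT[resolution]
    return [(end - ts).total_seconds() / unit for ts in trace]


# Made-up trace with three events.
trace = [
    datetime(2021, 1, 4, 9, 0),   # Monday
    datetime(2021, 1, 5, 21, 0),  # Tuesday
    datetime(2021, 1, 7, 9, 0),   # Thursday
]
day_hours = [hours_since_day_start(ts) for ts in trace]
week_hours = [hours_since_week_start(ts) for ts in trace]
days_left = remaining_time(trace, "d")
```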

split(train_prc, val_prc, scaling)¶

Splits the dataset into train, validation and test sets.

The nature of the data requires a dedicated function for this purpose.

Parameters

train_prc (float) – Training percentage (including validation).
val_prc (str) – Validation percentage (% of the training set).
scaling (bool) – Whether to scale numerical features.

Returns

X_train (object): tensor of shape [n_examples, prefix, n_features]

X_val (object): tensor of shape [n_examples, prefix, n_features]

X_test (object): tensor of shape [n_examples, prefix, n_features]

y_train (object): tensor of shape [n_examples, 2]

y_val (object): tensor of shape [n_examples, 2]

y_test (object): tensor of shape [n_examples, 2]
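One plausible reason event logs need a dedicated split is that all events of a case should land in the same subset. The sketch below splits on case identifiers under that assumption. It treats train_prc and val_prc as fractions between 0 and 1 and orders cases by ID; both choices, and the `split_case_ids` helper, are illustrative and not taken from the package.

```python
def split_case_ids(case_ids, train_prc, val_prc):
    """Split unique case IDs into train, validation and test groups.

    train_prc: fraction of cases used for training, validation included.
    val_prc: fraction of the training cases held out for validation.
    """
    unique = sorted(set(case_ids))
    n_train_total = int(len(unique) * train_prc)  # train + validation
    n_val = int(n_train_total * val_prc)          # carved out of training
    train = unique[: n_train_total - n_val]
    val = unique[n_train_total - n_val: n_train_total]
    test = unique[n_train_total:]
    return train, val, test


# Made-up CaseID column with repeated IDs (several events per case).
case_column = [1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 9, 10, 10]
train_ids, val_ids, test_ids = split_case_ids(case_column, 0.8, 0.25)
```

With ten cases, this yields six training cases, two validation cases and two test cases, and no case appears in more than one group.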

build_model(X_train, y_train, size_dyn, size_sta)¶

Build time to event model using a GRU network.

Parameters

X_train (object) – training set input features of shape [n_examples, prefix, n_features]
y_train (object) – training set labels of shape [n_examples, n_features]
size_dyn (int) – GRU units size.
size_sta (int) – Static branch hidden layer size (optional)

Returns

Initializes self.model.

fit(X_train, y_train, X_val, y_val, bs=64, exp_dir='20210105-203414', vb=True)¶

Fitting a time to event model using a GRU network.

Parameters

X_train (object) – training set input features of shape [n_examples, prefix, n_features]
y_train (object) – training set labels of shape [n_examples, n_features]
X_val (object) – validation set input features of shape [n_examples, prefix, n_features]
y_val (object) – validation set labels [n_examples, n_features]
bs (int) – batch size
exp_dir (str) – tensorboard path
vb (bool) – verbose (True/False)

Returns

The object itself, with the weights of self.model fitted.

Return type

self
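The training objective is not documented here. Since the model predicts Weibull alpha and beta parameters and the labels carry a remaining time together with a censoring flag, a common choice for this setup is the censored Weibull log-likelihood. The sketch below shows that standard formula in plain Python, as an assumption about the general technique rather than a description of the package's loss.

```python
import math


def weibull_loglik(t, u, alpha, beta):
    """Log-likelihood of one sample under a Weibull(alpha, beta) model.

    t: remaining time (> 0); u: 1 if the end was observed, 0 if censored.
    Observed samples add the log-hazard; every sample subtracts the
    cumulative hazard (t / alpha) ** beta.
    """
    log_hazard = math.log(beta / alpha) + (beta - 1.0) * math.log(t / alpha)
    cumulative_hazard = (t / alpha) ** beta
    return u * log_hazard - cumulative_hazard


def mean_negative_loglik(samples, alpha, beta):
    """Average negative log-likelihood over (t, u) pairs; lower is better."""
    total = sum(weibull_loglik(t, u, alpha, beta) for t, u in samples)
    return -total / len(samples)


# Made-up labels: (remaining time, observed flag).
samples = [(2.0, 1), (2.0, 0)]
loss = mean_negative_loglik(samples, alpha=2.0, beta=1.0)
```

A censored sample only contributes the survival term, so it tells the model that the event had not yet happened at time t, without claiming it happened then.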

predict(X)¶

Predicts the alpha and beta parameters for a given trace prefix. Call this method after training self.model with the fit method.

Parameters: X (tensor) – Input array of size [n_examples, prefix, n_features]
Returns: y_pred (object) – pandas dataframe with the shape [n_examples, 2]

evaluate(X, y)¶

Predicts with and evaluates self.model on a test set with known ground truth. Call this method after the fit method.

Parameters

X (tensor) – Input array of size [n_examples, prefix, n_features]
y (tensor) – Output array of size [n_examples, 2]

Returns

test_results_df: pandas dataframe with the following format:

Column	Data type	Content
T	float	True remaining time
U	float	Censored indicator
alpha	float	model prediction
beta	float	model prediction
T_pred	float	mode value of the generated pdf
error (days)	float	Error in days
MAE	float	Absolute error in days
Accurate	boolean	For development purpose

mae (float): Mean absolute error of all predictions
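The T_pred column is described as the mode of the generated pdf. For a Weibull distribution with scale alpha and shape beta, the mode has a closed form, sketched below together with a plain mean absolute error. The function names are hypothetical, and treating alpha as scale and beta as shape is an assumption about the parameterization.

```python
def weibull_mode(alpha, beta):
    """Mode of a Weibull distribution with scale alpha and shape beta.

    For beta <= 1 the density is non-increasing, so the mode is 0.
    """
    if beta <= 1.0:
        return 0.0
    return alpha * ((beta - 1.0) / beta) ** (1.0 / beta)


def mean_absolute_error(true_times, predicted_times):
    """Average of |true - predicted| over paired values."""
    errors = [abs(t - p) for t, p in zip(true_times, predicted_times)]
    return sum(errors) / len(errors)


# Made-up predictions: (alpha, beta) pairs and the matching true times.
params = [(10.0, 2.0), (5.0, 1.0)]
t_true = [8.0, 1.0]
t_pred = [weibull_mode(a, b) for a, b in params]
mae = mean_absolute_error(t_true, t_pred)
```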

cycle_prediction.weibull_utils¶

cycle_prediction.weibull_utils.plot_top_predictions(result_df, lim=10, U=1)¶

Plot the Weibull pdf and survival curve for the remaining time.

Parameters

result_df – the result dataframe from the t2e.evaluate method
lim – limit to plot the best n predictions
U – 1:observed or 0:censored example to plot
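The two curves named above can be computed from alpha and beta with the standard Weibull formulas. The sketch below evaluates them on a time grid without plotting. It assumes alpha is the scale and beta the shape parameter, and the helper names are illustrative rather than part of weibull_utils.

```python
import math


def weibull_pdf(t, alpha, beta):
    """Weibull density at t > 0 for scale alpha and shape beta."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0) * math.exp(-((t / alpha) ** beta))


def weibull_survival(t, alpha, beta):
    """Probability that the remaining time exceeds t."""
    return math.exp(-((t / alpha) ** beta))


# Evaluate both curves on a grid from 0.5 to 20.0 in steps of 0.5.
alpha, beta = 10.0, 2.0
grid = [0.5 * i for i in range(1, 41)]
pdf_values = [weibull_pdf(t, alpha, beta) for t in grid]
survival_values = [weibull_survival(t, alpha, beta) for t in grid]
```

The resulting lists can be passed to any plotting library; the peak of the pdf marks the most likely remaining time, while the survival curve gives the probability that the case is still running at each time.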