cycle_prediction package
cycle_prediction.t2e
-
class cycle_prediction.t2e.t2e(dataset, prefix, resolution, process_id_col='CaseID', event_type_col='ActivityID', extra_censored=0, end_event_list=[], dynamic_features=[], static_features=[], transform='log', fit_type='t2e', censored=True)
Bases: object
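A class for preprocessing, fitting and evaluating time-to-event data. It expects datasets in the following format:
=================  =========  ======================
Column             Data type  Content
=================  =========  ======================
CaseID             int        Case identifier
ActivityID         int        Activity identifier
CompleteTimestamp  datetime   Timestamp of the event
=================  =========  ======================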
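As a minimal sketch of the expected input, the following builds a small event log with the three required columns using pandas. All values are invented for illustration; only the column names and data types come from the format described above, and sorting by case and time is an assumption rather than a documented requirement.

```python
import pandas as pd

# Toy event log with two cases; the values are invented for illustration only.
dataset = pd.DataFrame(
    {
        "CaseID": [1, 1, 1, 2, 2],
        "ActivityID": [0, 1, 2, 0, 2],
        "CompleteTimestamp": pd.to_datetime(
            [
                "2021-01-04 09:00:00",
                "2021-01-04 11:30:00",
                "2021-01-05 10:00:00",
                "2021-01-04 14:00:00",
                "2021-01-06 08:15:00",
            ]
        ),
    }
)

# Assumption: events are ordered chronologically within each case.
dataset = dataset.sort_values(["CaseID", "CompleteTimestamp"]).reset_index(drop=True)
print(dataset.dtypes)
```

With the package installed, a frame like this could be passed as the dataset argument, for example t2e(dataset, prefix=3, resolution='d'); the prefix and resolution values here are arbitrary.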
-
__init__(dataset, prefix, resolution, process_id_col='CaseID', event_type_col='ActivityID', extra_censored=0, end_event_list=[], dynamic_features=[], static_features=[], transform='log', fit_type='t2e', censored=True)
Initializes the t2e object with the desired setup.
- Parameters
dataset (obj) – DataFrame of the trace dataset, with the columns listed in the class description (CaseID, ActivityID, CompleteTimestamp).
prefix (int) – Number of history prefixes to train with.
resolution (str) – Remaining-time resolution: ‘s’ (seconds), ‘h’ (hours) or ‘d’ (days).
process_id_col (str) – column name to be used as process ID. default: ‘CaseID’
event_type_col (str) – column name to be used as event type. default: ‘ActivityID’
extra_censored (int) – Number of censored traces to artificially create from complete traces, default 0.
end_event_list (list) – List of ints identifying the process’s possible end events.
dynamic_features (list) – List of time-varying feature columns to include in the model.
static_features (list) – List of time-invariant feature columns to include in the model.
transform (str) – Transform the output to a new space where it is less biased toward short traces. Accepted values: None, ‘log’, ‘power’. Default: ‘log’.
fit_type (str) – ‘t2e’ (default); reserved for future development.
censored (bool) – Whether to use/ignore the censored traces (if found).
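The parameter list does not say how the ‘log’ transform is computed. The sketch below assumes a log(1 + t) mapping, chosen here only because it is defined at t = 0, and shows how such a transform compresses a long-tailed set of remaining times and can be inverted afterwards. The sample values are invented.

```python
import numpy as np

# Hypothetical remaining times in days: mostly short, a few very long.
remaining = np.array([0.5, 1.0, 2.0, 3.0, 40.0, 120.0])

# Assumed form of a 'log' transform: log(1 + t), which is defined at t = 0.
transformed = np.log1p(remaining)

# The inverse maps values in the transformed space back to the original scale.
restored = np.expm1(transformed)

print(transformed.round(3))
print(np.allclose(restored, remaining))
```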
-
preprocess()
A method responsible for:
Removing traces longer than the desired prefix.
Creating dynamic and static features as specified at initialization.
- Parameters
None –
- Returns
A pandas dataframe with the following format:
==================  =========  ========================================
Column              Data type  Content
==================  =========  ========================================
CaseID              int        Case identifier
ActivityID          int        Activity identifier
CompleteTimestamp   datetime   Timestamp of the event
fvt1                float      Delta time to the next event
fvt2                float      Hour since day start
fvt3                float      Hour since week start
ActivityID_0        bool       Activity in one-hot form
…                   …          …
ActivityID_n-1      …          …
Static_feature_0    bool       Static feature in one-hot form
…                   …          …
Static_feature_n-1  …          …
U                   int        0/1: censored/observed trace
T2E/D2E/S2E         float      Remaining time in seconds, hours or days
==================  =========  ========================================
- Return type
Updated self.dataset
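The derived columns can be illustrated with plain pandas. This is a sketch of one way to compute them, not the package's own implementation: the unit of fvt1 (hours), the week starting on Monday at 00:00, and the zero fill for the last event of a case are all assumptions, and the sample data is invented.

```python
import pandas as pd

# Toy event log in the input format (values invented for illustration).
df = pd.DataFrame(
    {
        "CaseID": [1, 1, 1],
        "ActivityID": [0, 1, 2],
        "CompleteTimestamp": pd.to_datetime(
            ["2021-01-04 09:00:00", "2021-01-04 11:30:00", "2021-01-05 10:00:00"]
        ),
    }
)
ts = df["CompleteTimestamp"]
by_case = df.groupby("CaseID")["CompleteTimestamp"]

# fvt1: hours until the next event of the same case (assumed 0 for the last event).
df["fvt1"] = (by_case.shift(-1) - ts).dt.total_seconds().fillna(0) / 3600

# fvt2: hours elapsed since midnight of the event's day.
df["fvt2"] = (ts - ts.dt.normalize()).dt.total_seconds() / 3600

# fvt3: hours elapsed since the start of the week (Monday 00:00, an assumption).
df["fvt3"] = ts.dt.dayofweek * 24 + df["fvt2"]

# One-hot encode the activity, giving columns ActivityID_0, ActivityID_1, ...
one_hot = pd.get_dummies(df["ActivityID"], prefix="ActivityID")
df = pd.concat([df, one_hot], axis=1)

# D2E: remaining time in days until the last event of the case.
df["D2E"] = (by_case.transform("max") - ts).dt.total_seconds() / 86400

print(df[["fvt1", "fvt2", "fvt3", "D2E"]])
```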
-
split(train_prc, val_prc, scaling)
Splits the dataset into training, validation and test sets.
The nature of the data requires a dedicated function for this purpose.
- Parameters
train_prc (float) – Training percentage (including validation).
val_prc (str) – Validation percentage (% of the training set).
scaling (bool) – Whether to scale numerical features.
- Returns
X_train (object): tensor of shape [n_examples, prefix, n_features]
X_val (object): tensor of shape [n_examples, prefix, n_features]
X_test (object): tensor of shape [n_examples, prefix, n_features]
y_train (object): tensor of shape [n_examples, 2]
y_val (object): tensor of shape [n_examples, 2]
y_test (object): tensor of shape [n_examples, 2]
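The description does not say why a dedicated split function is needed. One plausible reason, assumed here, is that all prefixes of a case should land in the same subset so that no trace leaks between training and test data. The sketch below splits at the case level; split_case_ids and its seed argument are hypothetical names, and the random ordering is an assumption (the package may split differently, for example chronologically).

```python
import numpy as np


def split_case_ids(case_ids, train_prc, val_prc, seed=0):
    """Split unique case IDs into train, validation and test groups.

    train_prc is the fraction used for training, including validation;
    val_prc is the fraction of that training portion held out for validation.
    """
    rng = np.random.default_rng(seed)
    unique_ids = rng.permutation(np.unique(case_ids))
    n_train_total = int(round(len(unique_ids) * train_prc))
    n_val = int(round(n_train_total * val_prc))
    val_ids = unique_ids[:n_val]
    train_ids = unique_ids[n_val:n_train_total]
    test_ids = unique_ids[n_train_total:]
    return train_ids, val_ids, test_ids


# Hypothetical log with 10 cases of 3 events each.
case_ids = np.repeat(np.arange(10), 3)
train_ids, val_ids, test_ids = split_case_ids(case_ids, train_prc=0.8, val_prc=0.25)
print(len(train_ids), len(val_ids), len(test_ids))
```

Rows would then be assigned to a subset by checking which group their case ID belongs to.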
-
build_model(X_train, y_train, size_dyn, size_sta)
Builds a time-to-event model using a GRU network.
- Parameters
X_train (object) – training set input features of shape [n_examples, prefix, n_features]
y_train (object) – training set labels of shape [n_examples, n_features]
size_dyn (int) – GRU units size.
size_sta (int) – Static branch hidden layer size (optional)
- Returns
Initializes self.model.
-
fit(X_train, y_train, X_val, y_val, bs=64, exp_dir='20210105-203414', vb=True)
Fits a time-to-event model using a GRU network.
- Parameters
X_train (object) – training set input features of shape [n_examples, prefix, n_features]
y_train (object) – training set labels of shape [n_examples, n_features]
X_val (object) – validation set input features of shape [n_examples, prefix, n_features]
y_val (object) – validation set labels [n_examples, n_features]
bs (int) – batch size
exp_dir (str) – tensorboard path
vb (bool) – Verbose output (True/False).
- Returns
fit self.model weights
- Return type
self
-
predict(X)
A method to predict the alpha and beta parameters for a given trace prefix, after the fit method has been used to train self.model.
- Parameters
X (tensor) – Input array of size [n_examples, prefix, n_features]
- Returns
y_pred (object): pandas dataframe of shape [n_examples, 2]
-
evaluate(X, y)
A method to predict with and evaluate self.model after the fit method has been used, given a test set with known ground truth.
- Parameters
X (tensor) – Input array of size [n_examples, prefix, n_features]
y (tensor) – Output array of size [n_examples, 2]
- Returns
test_results_df (object): pandas dataframe with the following format:
============  =========  ===============================
Column        Data type  Content
============  =========  ===============================
T             float      True remaining time
U             float      Censored indicator
alpha         float      Model prediction
beta          float      Model prediction
T_pred        float      Mode value of the generated pdf
error (days)  float      Error in days
MAE           float      Absolute error in days
Accurate      boolean    For development purposes
============  =========  ===============================
mae (float): Mean absolute error of all predictions
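To make the result columns concrete, the sketch below derives T_pred, the per-row errors and the overall mean absolute error from invented values of T, U, alpha and beta. It assumes alpha is the Weibull scale and beta the Weibull shape, uses the closed-form mode (valid for beta > 1), and takes the error as prediction minus truth; none of these conventions is confirmed by the reference text.

```python
import numpy as np
import pandas as pd

# Hypothetical ground truth (T, U) and predicted parameters (alpha, beta).
results = pd.DataFrame(
    {
        "T": [8.0, 3.0, 12.0],
        "U": [1.0, 1.0, 0.0],
        "alpha": [10.0, 4.0, 15.0],
        "beta": [2.0, 2.0, 2.0],
    }
)

# Mode of a Weibull pdf with scale alpha and shape beta (valid for beta > 1).
a, b = results["alpha"], results["beta"]
results["T_pred"] = a * ((b - 1) / b) ** (1 / b)

# Signed error (assumed to be prediction minus truth) and its absolute value.
results["error (days)"] = results["T_pred"] - results["T"]
results["MAE"] = results["error (days)"].abs()

# Mean absolute error over all predictions.
mae = results["MAE"].mean()
print(results.round(3))
print(round(mae, 3))
```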
-
cycle_prediction.weibull_utils
-
cycle_prediction.weibull_utils.plot_top_predictions(result_df, lim=10, U=1)
Plots the Weibull pdf and survival curve for the remaining time.
- Parameters
result_df – The result dataframe from the t2e.evaluate method.
lim – Number of best predictions to plot.
U – Type of example to plot: 1 for observed, 0 for censored.
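The two curves named above follow directly from a pair of Weibull parameters. The sketch below computes them with NumPy, assuming alpha is the scale and beta the shape; weibull_pdf and weibull_survival are hypothetical helper names, not functions of this package, and the parameter values are invented.

```python
import numpy as np


def weibull_pdf(t, alpha, beta):
    """Weibull density with scale alpha and shape beta."""
    return (beta / alpha) * (t / alpha) ** (beta - 1) * np.exp(-((t / alpha) ** beta))


def weibull_survival(t, alpha, beta):
    """Probability that the remaining time exceeds t."""
    return np.exp(-((t / alpha) ** beta))


# Hypothetical predicted parameters for one trace prefix.
alpha, beta = 10.0, 2.0
t = np.linspace(0.0, 40.0, 401)
pdf = weibull_pdf(t, alpha, beta)
survival = weibull_survival(t, alpha, beta)

# Grid point where the density peaks; it approximates the mode of the pdf.
print(t[np.argmax(pdf)])
```

The arrays t, pdf and survival can then be drawn with any plotting library.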