l3wrapper.l3wrapper

The main module for the L3 estimator.

class l3wrapper.l3wrapper.L3Classifier(min_sup=0.01, min_conf=0.5, l3_root='/home/docs/l3wrapper_data', assign_unlabeled='majority_class', match_strategy='majority_voting', max_matching=1, specialistic_rules=True, max_length=0, rule_sets_modifier='standard')

The L3-based estimator implementing the scikit-learn estimator interface.

Model training relies on the L3 binaries, which should already be available at this point. Inference, in contrast, is performed by the estimator itself, so no L3 binaries are used at classification time (see predict()).

Parameters:
  • min_sup (float, default=0.01) – The minimum support threshold to be used while training.
  • min_conf (float, default=0.5) – The minimum confidence threshold to be used while training.
  • l3_root (str, default='$HOME/l3wrapper_data') – The root folder where L3 binaries are located.
  • assign_unlabeled (str, default='majority_class') – The strategy used to assign the classification label whenever no rule matches the data point to be classified.
  • match_strategy ({'majority_voting'}, default='majority_voting') – The strategy used to pick which rule or set of rules should be used to classify a data point. The only supported value is ‘majority_voting’ (default), which chooses the label by majority voting among the class labels predicted by the top max_matching rules. In case of a tie, the label is chosen arbitrarily.
  • max_matching (int, default=1) – The number of rules to be used for choosing the final label. It is used only when match_strategy=‘majority_voting’.
  • specialistic_rules (bool, default=True) – Whether to prefer specialized rules over general ones at training time.
  • max_length (int, default=0) – The maximum length of the mined rules. As per the L3 training procedure, this limit applies to the macro-itemset mining step, so once the macro-itemsets are translated into regular itemsets, rule antecedents may be longer than max_length. The default, 0, means no limit.
  • rule_sets_modifier ({'standard', 'level1'}, default='standard') – Use this parameter to modify the extracted rule sets. Option ‘level1’ retains only the level 1 rule set, discarding level 2. If ‘standard’, the original behavior of L3 is unchanged.
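
The interplay between match_strategy, max_matching and assign_unlabeled can be sketched in plain Python. The snippet below is an illustration only, not the l3wrapper implementation: the (antecedent, label) rule representation and the helpers matching_rules and vote are hypothetical, rules are assumed to be already sorted by priority, and ties are resolved here by first occurrence, whereas the library only states that the choice is arbitrary.

```python
from collections import Counter

# Hypothetical rule representation: (antecedent items, predicted label).
# Rules are assumed to be sorted from highest to lowest priority.
RULES = [
    ({"outlook=sunny", "windy=no"}, "stay"),
    ({"outlook=sunny"}, "play"),
    ({"windy=no"}, "play"),
]

def matching_rules(record, rules):
    """Return the rules whose antecedent is contained in the record."""
    return [rule for rule in rules if rule[0] <= record]

def vote(record, rules, max_matching=1, majority_class="play"):
    """Majority voting among the top `max_matching` matching rules."""
    top = matching_rules(record, rules)[:max_matching]
    if not top:
        # No rule covers the record: fall back to the majority class,
        # mirroring assign_unlabeled='majority_class'.
        return majority_class
    counts = Counter(label for _, label in top)
    return counts.most_common(1)[0][0]

record = {"outlook=sunny", "windy=no", "humidity=high"}
label_top1 = vote(record, RULES, max_matching=1)   # only the first rule votes
label_top3 = vote(record, RULES, max_matching=3)   # all three rules vote
label_none = vote({"outlook=rainy"}, RULES, max_matching=3)  # no match
```

With max_matching=1 only the highest-priority matching rule decides the label, while larger values let lower-priority rules outvote it.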
X_

The input passed during fit().

Type: ndarray, shape (n_samples, n_features)
y_

The labels passed during fit().

Type: ndarray, shape (n_samples,)
classes_

The classes seen at fit().

Type: ndarray, shape (n_classes,)
n_items_used_

The number of different items seen.

Type: int
lvl1_rules_

The level 1 rules (Rule) mined at fit().

Type: list
lvl2_rules_

The level 2 rules (Rule) mined at fit().

Type: list
n_lvl1_rules_

The number of level 1 rules.

Type: int
n_lvl2_rules_

The number of level 2 rules.

Type: int
fit(X, y, column_names=None, save_human_readable=False, remove_files=True)

Fit the L3 classifier to the training data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The training input samples. Numerical inputs are not allowed.
  • y (array-like, shape (n_samples,)) – The target values. An array of int.
  • column_names (list, default=None) – A list containing the names to assign to the columns in the dataset. They are used when printing the rules in human-readable format.
  • remove_files (bool, default=True) – Use this parameter to remove all the files generated by the original L3 implementation at training time.
Returns:

self – Returns self.

Return type:

object
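
Since fit() does not accept numerical inputs, continuous features have to be turned into categorical values beforehand. The following is a minimal sketch of that preparation step, not part of l3wrapper: the discretize helper, the bin edges, and the train wrapper are hypothetical, and train assumes that l3wrapper is installed and that the L3 binaries are present under l3_root. It only uses the constructor and fit() arguments listed above.

```python
from bisect import bisect_right

def discretize(value, edges, names):
    """Map a number to a categorical bin name (len(names) == len(edges) + 1)."""
    return names[bisect_right(edges, value)]

# Toy data: (age, outlook) pairs with integer targets, as fit() expects.
raw = [(23, "sunny"), (41, "rainy"), (67, "sunny"), (35, "rainy")]
y = [0, 1, 1, 0]

# Illustrative bin edges; choose ones that suit the real data.
X = [[discretize(age, [30, 60], ["young", "adult", "senior"]), outlook]
     for age, outlook in raw]

def train(X, y):
    """Hypothetical wrapper: requires l3wrapper and the L3 binaries."""
    # Imported lazily so the preparation code above runs without l3wrapper.
    from l3wrapper.l3wrapper import L3Classifier
    clf = L3Classifier(min_sup=0.01, min_conf=0.5)
    return clf.fit(X, y, column_names=["age", "outlook"])
```

Calling train(X, y) would then return the fitted estimator, since fit() returns self.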

predict(X)

Predict the class labels for each sample in X.

Additionally, the method helps to characterize the rules used during the inference. Each record to be predicted is converted into a Transaction and the list of transactions is saved. From the transactions one can retrieve:

  • which level was used to classify it (level=-1 means that no rule has covered the record)
  • which Rule (or rules) was used to classify it.
Parameters: X (array-like, shape (n_samples, n_features)) – The input samples.
Returns: y – The label for each sample.
Return type: ndarray, shape (n_samples,)
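
The per-record level information described above lends itself to a quick coverage summary. The sketch below is illustrative only: this page does not name the attribute that holds the saved transactions, so TransactionInfo is a hypothetical stand-in exposing just a level field, and coverage_by_level is a hypothetical helper.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class TransactionInfo:
    """Hypothetical stand-in for a classified Transaction."""
    level: int  # 1 or 2 for the rule level used, -1 if no rule matched

def coverage_by_level(transactions):
    """Count how many records were classified by each rule level."""
    return Counter(t.level for t in transactions)

transactions = [TransactionInfo(1), TransactionInfo(1),
                TransactionInfo(2), TransactionInfo(-1)]
summary = coverage_by_level(transactions)

# Fraction of records that no rule covered (level == -1).
uncovered = summary[-1] / len(transactions)
```

A high share of level=-1 records suggests that many predictions come from the assign_unlabeled fallback rather than from mined rules.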