pre_process module

pre_process.any_in(seq_a, seq_b)[source]
Parameters:
  • seq_a (list) – A list of items
  • seq_b (list) – A list of items
Returns:

seq_a – Returns True if any item of seq_a belongs to seq_b or vice versa, and False otherwise.

Return type:

bool
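
The check described above can be illustrated with a minimal sketch. This is not the module's source, only one way to get the documented behaviour, assuming both arguments are plain Python lists.

```python
def any_in(seq_a, seq_b):
    """Return True if seq_a and seq_b share at least one item."""
    # Sharing an item is symmetric, so testing seq_a's items against
    # seq_b also covers the "vice versa" direction.
    return any(item in seq_b for item in seq_a)


print(any_in(["age", "income"], ["income", "city"]))
print(any_in(["age"], ["city"]))
```

The first call shares `"income"` between the two lists; the second call has no item in common.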

pre_process.binarizer(trfm, col_names)[source]
Parameters:
  • trfm – Contains the Sklearn’s Binarizer preprocessing instance.
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to Binarizer preprocessing.

Return type:

dictionary

pre_process.count_vectorizer(trfm, col_names)[source]
Parameters:
  • trfm – Contains the Sklearn’s CountVectorizer preprocessing instance.
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to CountVectorizer preprocessing.

Return type:

dictionary

pre_process.get_class_name(cls)[source]
Parameters:
  • cls – Contains the Sklearn’s preprocessing instance
Returns:

cls.__class__.__name__ – Returns the class name of the preprocessing object.

Return type:

String

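
As a small illustration of the `cls.__class__.__name__` lookup named above, the sketch below uses a hypothetical stand-in class rather than a real scikit-learn transformer, so it runs without any third-party package.

```python
class Binarizer:
    """Hypothetical stand-in for a scikit-learn preprocessing class."""


def get_class_name(cls):
    # Despite the parameter name, an instance is passed in; its class
    # name is read from the instance's __class__ attribute.
    return cls.__class__.__name__


print(get_class_name(Binarizer()))  # Binarizer
```
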
pre_process.get_derived_colnames(trfm_name, col_names, *args)[source]
Parameters:
  • trfm_name (String) – Name of the derived field to be assigned after preprocessing
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pml_pp – Returns a list that contains names of the preprocessed features.

Return type:

list

pre_process.get_pml_derived_flds(trfm, col_names, **kwargs)[source]
Parameters:
  • trfm – Contains the Sklearn’s preprocessing instance
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pml_pp – Returns a dictionary that contains attributes related to any preprocessing function.

Return type:

dictionary
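
One plausible way for a generic entry point like this to reach the per-transformer functions in this module is to dispatch on the transformer's class name. The sketch below is purely illustrative and is not taken from the module's source. The name `dispatch_preprocessor`, the `HANDLERS` table, the stub handlers, and the dictionary keys are all assumptions.

```python
def binarizer(trfm, col_names):
    # Hypothetical stub: the keys of the real dictionary are not documented here.
    return {"handler": "binarizer", "columns": list(col_names)}


def min_max_scaler(trfm, col_names):
    # Hypothetical stub, same caveat as above.
    return {"handler": "min_max_scaler", "columns": list(col_names)}


# Assumed lookup table: transformer class name -> handler function.
HANDLERS = {
    "Binarizer": binarizer,
    "MinMaxScaler": min_max_scaler,
}


def dispatch_preprocessor(trfm, col_names):
    """Route a transformer instance to the handler registered for its class."""
    name = trfm.__class__.__name__
    if name not in HANDLERS:
        raise NotImplementedError("No handler registered for " + name)
    return HANDLERS[name](trfm, col_names)


class Binarizer:
    """Hypothetical stand-in for the scikit-learn class of the same name."""


print(dispatch_preprocessor(Binarizer(), ["age", "income"]))
```

A table lookup keeps each transformer's logic in its own function, which matches the one-function-per-transformer layout of this module.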

pre_process.get_preprocess_val(ppln_sans_predictor, initial_colnames, model)[source]
Parameters:
  • model – Contains an instance of Sklearn model
  • ppln_sans_predictor – Contains an instance of Sklearn Pipeline
  • initial_colnames (list) – Contains list of feature/column names.
Returns:

pml_pp – Returns a dictionary that contains data related to pre-processing.

Return type:

dictionary

pre_process.imputer(trfm, col_names)[source]
Parameters:
  • trfm – Contains the Sklearn’s Imputer preprocessing instance
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to Imputer preprocessing.

Return type:

dictionary

pre_process.lbl_binarizer(trfm, col_names, **kwargs)[source]
Parameters:
  • trfm – Contains the Sklearn’s LabelBinarizer preprocessing instance.
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to LabelBinarizer preprocessing.

Return type:

dictionary

pre_process.lbl_encoder(trfm, col_names)[source]
Parameters:
  • trfm – Contains the Sklearn’s LabelEncoder preprocessing instance
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to LabelEncoder preprocessing.

Return type:

dictionary

pre_process.max_abs_scaler(trfm, col_names)[source]
Parameters:
  • trfm – Contains the Sklearn’s MaxAbsScaler preprocessing instance
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to MaxAbsScaler preprocessing.

Return type:

dictionary

pre_process.min_max_scaler(trfm, col_names)[source]
Parameters:
  • trfm – Contains the Sklearn’s MinMaxScaler preprocessing instance
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to MinMaxScaler preprocessing.

Return type:

dictionary

pre_process.pca(trfm, col_names)[source]
Parameters:
  • trfm – Contains the Sklearn’s PCA preprocessing instance
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to PCA preprocessing.

Return type:

dictionary

pre_process.polynomial_features(trfm, col_names)[source]
Parameters:
  • trfm – Contains the Sklearn’s PolynomialFeatures preprocessing instance.
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to PolynomialFeatures preprocessing.

Return type:

dictionary

pre_process.rbst_scaler(trfm, col_names)[source]
Parameters:
  • trfm – Contains the Sklearn’s RobustScaler preprocessing instance
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to RobustScaler preprocessing.

Return type:

dictionary

pre_process.std_scaler(trfm, col_names, **kwargs)[source]
Parameters:
  • trfm – Contains the Sklearn’s StandardScaler preprocessing instance
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to StandardScaler preprocessing.

Return type:

dictionary

pre_process.tfidf_vectorizer(trfm, col_names)[source]
Parameters:
  • trfm – Contains the Sklearn’s TfidfVectorizer preprocessing instance
  • col_names (list) – Contains list of feature/column names. The column names may represent the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to TfidfVectorizer preprocessing.

Return type:

dictionary

pre_process.unround_scalers(scalar_val)[source]
Parameters:
  • scalar_val (float) – A numpy float value
Returns:

unround_val – Returns a numpy floating point number with a precision of 16 digits after the decimal point.

Return type:

float
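
The effect of keeping 16 digits after the decimal point can be sketched with standard-library formatting. This is an assumption about what that precision means, not the module's implementation, and it uses a built-in `float` rather than a numpy float so that it runs without numpy.

```python
def unround_scalers(scalar_val):
    # Assumption: "16 digits after the decimal point" is modelled here as
    # fixed-point formatting with 16 fractional digits, parsed back to float.
    return float("{:.16f}".format(scalar_val))


value = unround_scalers(1 / 3)
print("{:.16f}".format(value))  # 0.3333333333333333
```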