Gamma SMOTE

This SMOTE variant, Gamma SMOTE, uses a gamma distribution to generate synthetic minority-class instances for imbalanced datasets. It interpolates between existing minority instances and their nearest neighbors, using a gamma distribution to shape where the synthetic points fall along that interpolation. The key parameters are alpha and beta, which set the shape and scale of the gamma distribution.
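The interpolation idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not sdgne's implementation: the function name `gamma_smote_sketch`, the default `alpha`/`beta` values, and the choice to cap the gamma draw at 1 (so each new point stays on the segment between a minority instance and its nearest neighbor) are all hypothetical. It relies on the standard library's `random.gammavariate(alpha, beta)`, where `alpha` is the shape and `beta` the scale.

```python
import math
import random


def gamma_smote_sketch(minority, n_new, alpha=2.0, beta=0.25, seed=None):
    """Hypothetical illustration of gamma-based SMOTE interpolation.

    minority: list of equal-length numeric tuples (minority-class rows).
    alpha, beta: shape and scale of the gamma distribution (assumed defaults).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority instance as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its nearest minority neighbor by brute force, excluding itself.
        neighbor = min(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(base, p),
        )
        # Draw the step size from a gamma distribution; capping it at 1 is an
        # assumption that keeps the new point between base and neighbor.
        t = min(rng.gammavariate(alpha, beta), 1.0)
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0), (3.0, 2.0)]
new_points = gamma_smote_sketch(minority, n_new=5, seed=0)
print(new_points)
```

With `alpha=2.0` and `beta=0.25` the gamma draw has mean 0.5, so most synthetic points land near the midpoint of each segment; changing alpha and beta shifts that placement toward the base instance or its neighbor.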

from sdgne.datagenerator.smote import Gamma_SMOTE

minority_column_label = 'class'
minority_class_label = 0

synthesizer = Gamma_SMOTE(dataset, minority_column_label, minority_class_label)

synthesized_data = synthesizer.data_generator()

Importing Gamma SMOTE

from sdgne.datagenerator.smote import Gamma_SMOTE

Creating a synthesizer

synthesizer = Gamma_SMOTE(dataset, minority_column_label, minority_class_label)

Parameters

dataset

required

pd.DataFrame

A pandas DataFrame containing both the original minority data and the original majority data.

minority_column_label

required

string

The name of the column that holds the class labels, e.g. 'class' or 'output'.

minority_class_label

required

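string or integer

The value in the label column that identifies the minority class, e.g. '1', '0', or 0 as in the example above. It should match the type of the values stored in that column.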
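The three constructor arguments must agree with one another. The snippet below builds a small, made-up DataFrame in the expected shape and counts the minority and majority rows for the chosen label. The data and feature column names are purely illustrative; note that the label's type matters, so an integer `0` will not match a string `'0'` stored in the column.

```python
import pandas as pd

# Toy imbalanced dataset: two feature columns plus a label column (made-up values).
dataset = pd.DataFrame(
    {
        "feature_1": [0.2, 0.4, 1.1, 1.3, 1.6, 1.9, 2.2, 2.5],
        "feature_2": [1.0, 0.8, 2.1, 2.4, 2.0, 2.7, 3.1, 2.9],
        "class": [0, 0, 1, 1, 1, 1, 1, 1],
    }
)

minority_column_label = "class"  # name of the label column
minority_class_label = 0         # same type as the stored labels (int here)

# Count rows per class to confirm the label exists and see the imbalance.
counts = dataset[minority_column_label].value_counts()
n_minority = int(counts[minority_class_label])
n_majority = int(counts.drop(minority_class_label).sum())
print(n_minority, n_majority)  # 2 6
```

These three values can then be passed to `Gamma_SMOTE` exactly as in the constructor call above.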

Returns

An instance of the Gamma_SMOTE class.

Synthetic Data Generation

data_generator()

The data_generator function generates synthetic data using the synthesizer. It takes one optional parameter, num_to_synthesize.

  • If num_to_synthesize is not defined, data_generator by default generates as many synthetic minority samples as are needed to balance the minority and majority classes.

  • If num_to_synthesize is defined and the dataset is already balanced, data_generator generates twice the number of original minority samples.

  • If num_to_synthesize is defined and the dataset is not balanced, data_generator generates exactly num_to_synthesize synthetic samples.
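The three rules above can be restated as a small function. This is a literal reading of the listed cases, not sdgne's source: `samples_to_generate` is a hypothetical name, and treating "balanced" as "equal class counts" is an assumption.

```python
from typing import Optional


def samples_to_generate(n_minority: int, n_majority: int,
                        num_to_synthesize: Optional[int] = None) -> int:
    """Hypothetical helper mirroring the three documented rules."""
    if num_to_synthesize is None:
        # Default: generate just enough minority samples to balance the classes.
        return n_majority - n_minority
    if n_minority == n_majority:
        # Already balanced (assumed to mean equal counts): twice the minority count.
        return 2 * n_minority
    # Imbalanced and a value was passed: generate exactly that many.
    return num_to_synthesize


print(samples_to_generate(20, 80))      # 60
print(samples_to_generate(50, 50, 10))  # 100
print(samples_to_generate(20, 80, 10))  # 10
```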

synthesized_data = synthesizer.data_generator(num_to_synthesize)

Parameters

num_to_synthesize

optional (default: None)

integer

The number of synthetic samples to generate.

Returns

A pandas DataFrame that combines the original data with the synthetic data.
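Because the returned DataFrame mixes original and synthetic rows, one way to inspect it is to compare class counts and row totals against the input. The frames below are made-up stand-ins for `dataset` and for the generator's output, so the snippet runs without sdgne. Whether the library appends synthetic rows at the end or orders them differently is not stated here, so the check relies only on counts.

```python
import pandas as pd

# Stand-in for the original dataset (made-up values).
original = pd.DataFrame({"feature_1": [0.2, 1.1, 1.3, 1.6], "class": [0, 1, 1, 1]})

# Stand-in for the combined frame data_generator() returns: here we simply
# concatenate two hypothetical synthetic minority rows onto the original.
synthetic = pd.DataFrame({"feature_1": [0.5, 0.7], "class": [0, 0]})
combined = pd.concat([original, synthetic], ignore_index=True)

# How many rows were added, and the class balance afterwards.
n_added = len(combined) - len(original)
class_counts = combined["class"].value_counts().sort_index().to_dict()
print(n_added)       # 2
print(class_counts)  # {0: 3, 1: 3}
```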

Usage

synthesizer.data_generator()

synthesizer.data_generator(num_to_synthesize=100)
