SDD SMOTE

SDD (Sample Density Distribution) SMOTE is a SMOTE variant that uses a density distribution to generate synthetic instances for the minority class of an imbalanced dataset. It considers the density of existing minority class instances together with the average distances within the minority class and between the minority and majority classes. The algorithm prioritizes regions of the feature space with lower density and larger distances, where real instances are sparser, so that synthetic samples are generated mainly in under-represented regions. The key parameter is the radius, which defines the neighborhood used for the density and distance calculations. The goal is a more diverse and better represented minority class.
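The density-based prioritization described above can be illustrated with a minimal NumPy sketch. This is not the sdgne implementation: the function names `density_weights` and `synthesize` are hypothetical, the weighting formula is an assumption, and the sketch uses only the minority density within the radius, leaving out the average-distance terms.

```python
import numpy as np


def density_weights(minority, radius):
    """Sampling probability per minority point (hypothetical helper).

    Points with fewer minority neighbours inside `radius` get a larger weight.
    The 1 / (1 + count) formula is an illustrative assumption.
    """
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Count neighbours within the radius, excluding the point itself.
    counts = (dists <= radius).sum(axis=1) - 1
    weights = 1.0 / (1.0 + counts)
    return weights / weights.sum()


def synthesize(minority, radius, n_new, seed=0):
    """Generate `n_new` synthetic points, favouring sparse regions.

    Requires at least two minority points.
    """
    rng = np.random.default_rng(seed)
    n = len(minority)
    probs = density_weights(minority, radius)
    # Pick base points in proportion to their sparsity weight.
    base_idx = rng.choice(n, size=n_new, p=probs)
    # Pick a different minority point as the interpolation partner.
    partner_idx = (base_idx + rng.integers(1, n, size=n_new)) % n
    # SMOTE-style interpolation: a random point on the segment base -> partner.
    gaps = rng.random((n_new, 1))
    return minority[base_idx] + gaps * (minority[partner_idx] - minority[base_idx])
```

With three tightly clustered points and one isolated point, the isolated point receives the largest sampling probability, so more synthetic samples start from the sparse region.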

from sdgne.datagenerator.smote import SDD_SMOTE

minority_column_label = 'class'
minority_class_label = 0

synthesizer = SDD_SMOTE(dataset, minority_column_label, minority_class_label)

synthesized_data = synthesizer.data_generator()

Importing SDD SMOTE

from sdgne.datagenerator.smote import SDD_SMOTE

Creating a synthesizer

synthesizer = SDD_SMOTE(dataset, minority_column_label, minority_class_label)

Parameters

dataset

required

pd.DataFrame

A pandas data frame containing both the original minority data and the original majority data.

minority_column_label

required

string

The label of the column that holds the class labels, e.g. 'class', 'output'.

minority_class_label

required

string

The label of the minority class, e.g. '1', '0'.

Returns

An instance of class SDD_SMOTE.
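As an illustration of the three constructor arguments, the following sketch builds a small imbalanced data frame with pandas only. The column names and values are made up for the example, and picking the rarest class with `value_counts().idxmin()` is one possible way to find the minority label, not something the library requires.

```python
import pandas as pd

# Toy dataset (made-up values): 8 majority rows (class 1), 2 minority rows (class 0).
dataset = pd.DataFrame({
    "feature_1": [0.1, 0.4, 0.35, 0.8, 0.9, 0.75, 0.6, 0.55, 0.2, 0.25],
    "feature_2": [1.2, 1.1, 1.3, 0.9, 1.0, 0.8, 1.4, 1.5, 2.1, 2.3],
    "class": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
})

# Name of the column that holds the class labels.
minority_column_label = "class"

# Count rows per class and take the rarest one as the minority class.
class_counts = dataset[minority_column_label].value_counts()
minority_class_label = class_counts.idxmin()
```

These three values are the ones that would then be passed to SDD_SMOTE in the order shown above.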

Synthetic Data Generation

data_generator()

The data_generator function generates synthetic data using the synthesizer. It takes one optional parameter, num_to_synthesize.

  • If num_to_synthesize is not defined, data_generator generates as many synthetic samples as are needed to balance the minority and majority classes.

  • If num_to_synthesize is defined and the dataset is already balanced, data_generator generates 2 × the number of original minority samples.

  • If num_to_synthesize is defined and the dataset is not balanced, data_generator generates exactly the number of synthetic samples passed.
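The three rules above can be sketched as a small standalone function. The name `resolve_num_to_synthesize` is hypothetical and not part of sdgne. The sketch assumes that "balanced" means equal class counts and that nothing is generated when num_to_synthesize is omitted on an already balanced dataset, a case the rules do not cover.

```python
def resolve_num_to_synthesize(n_minority, n_majority, num_to_synthesize=None):
    """Return how many synthetic samples the rules above would produce.

    Hypothetical helper for illustration only; not the library's code.
    """
    if num_to_synthesize is None:
        # Rule 1: generate enough samples to balance the two classes.
        # Assumption: an already balanced dataset yields 0 here.
        return max(n_majority - n_minority, 0)
    if n_minority == n_majority:
        # Rule 2: dataset already balanced (assumed: equal counts).
        return 2 * n_minority
    # Rule 3: dataset not balanced, use the requested number as-is.
    return num_to_synthesize
```

For example, with 20 minority and 100 majority rows, omitting num_to_synthesize gives 80 synthetic samples, while passing 30 gives 30.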

synthesized_data = synthesizer.data_generator(num_to_synthesize)

Parameters

num_to_synthesize

optional (default: None)

integer

The number of synthetic samples to generate.

Returns

A pandas data frame that combines the original data and the synthetic data.

Usage

synthesizer.data_generator()

synthesizer.data_generator(num_to_synthesize=100)
