SMOTE

This is an implementation of the basic SMOTE (Synthetic Minority Over-sampling Technique) method for oversampling data. It creates synthetic samples by interpolating between existing minority instances and their nearest neighbors.
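The interpolation step can be sketched with NumPy alone. This is a minimal illustration of the general SMOTE idea, not the code used inside sdgne; the function name `smote_sketch`, the parameters `k` and `seed`, and the brute-force neighbor search are all assumptions made for this example.

```python
import numpy as np

def smote_sketch(minority, num_to_synthesize, k=5, seed=0):
    """Illustrative SMOTE-style interpolation (hypothetical helper, not sdgne's code)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority points to interpolate")
    k = min(k, n - 1)  # a point cannot have more neighbors than there are other points

    # Brute-force pairwise Euclidean distances between minority points
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # k nearest neighbors per point

    # Pick a random base point and one of its neighbors for each synthetic sample
    base_idx = rng.integers(0, n, size=num_to_synthesize)
    neighbor_idx = neighbors[base_idx, rng.integers(0, k, size=num_to_synthesize)]

    # Place each synthetic sample at a random position on the connecting segment
    gap = rng.random((num_to_synthesize, 1))
    return minority[base_idx] + gap * (minority[neighbor_idx] - minority[base_idx])

minority_points = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]])
synthetic = smote_sketch(minority_points, num_to_synthesize=6, k=2)
print(synthetic.shape)
```

Because every synthetic point lies on a segment between two existing minority points, it stays within the region those points already cover.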

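from sdgne.datagenerator.smote import SMOTE

# dataset: a pandas DataFrame holding both majority and minority rows
minority_column_label = 'class'
minority_class_label = 0

synthesizer = SMOTE(dataset, minority_column_label, minority_class_label)

# With no argument, enough synthetic rows are generated to balance the classes
synthesized_data = synthesizer.data_generator()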

Importing SMOTE

from sdgne.datagenerator.smote import SMOTE

Creating a synthesizer

synthesizer = SMOTE(dataset, minority_column_label, minority_class_label)

Parameters

dataset

required

pd.DataFrame

A pandas DataFrame containing both the original minority data and the original majority data.

minority_column_label

required

string

The label of the column that holds the class values, e.g. 'class' or 'output'.

minority_class_label

required

string

The value in that column that identifies the minority class, e.g. '1' or '0'.

Returns

An instance of class SMOTE.
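As an illustration of the inputs the constructor expects, the sketch below builds a small imbalanced DataFrame. The feature column names, the row counts, and the random values are made up for this example; only the 'class' column label and the minority label 0 follow the example at the top of this page.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# 90 majority rows (class 1) and 10 minority rows (class 0); values are random placeholders
features = rng.normal(size=(100, 2))
labels = np.array([1] * 90 + [0] * 10)

dataset = pd.DataFrame(features, columns=["feature_a", "feature_b"])
dataset["class"] = labels

minority_column_label = "class"
minority_class_label = 0

# Confirm the imbalance before oversampling
print(dataset[minority_column_label].value_counts())
```

A frame like this, together with the two labels, is what gets passed to SMOTE(dataset, minority_column_label, minority_class_label).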

Synthetic Data Generation

data_generator()

The data_generator function generates synthetic data using the synthesizer. It takes one optional parameter, num_to_synthesize.

  • If num_to_synthesize is not defined, data_generator generates as many synthetic samples as are needed to balance the minority and majority classes.

  • If num_to_synthesize is defined and the dataset is already balanced, data_generator generates 2 × the number of original minority samples.

  • If num_to_synthesize is defined and the dataset is not balanced, data_generator generates exactly num_to_synthesize synthetic samples.
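The three rules above can be written as a small function. This is a sketch of the documented behavior only, not sdgne's internal code; the helper name `resolve_num_to_synthesize` and its arguments `n_minority` and `n_majority` are hypothetical.

```python
def resolve_num_to_synthesize(n_minority, n_majority, num_to_synthesize=None):
    """Return how many synthetic rows the rules listed above would produce.

    Hypothetical helper for illustration; it is not part of sdgne.
    """
    if num_to_synthesize is None:
        # Not defined: generate just enough rows to balance the two classes
        return n_majority - n_minority
    if n_minority == n_majority:
        # Defined, dataset already balanced: twice the original minority count
        return 2 * n_minority
    # Defined, dataset not balanced: exactly the requested number
    return num_to_synthesize

print(resolve_num_to_synthesize(20, 100))      # no value passed
print(resolve_num_to_synthesize(50, 50, 10))   # value passed, balanced dataset
print(resolve_num_to_synthesize(20, 100, 30))  # value passed, imbalanced dataset
```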

synthesized_data = synthesizer.data_generator(num_to_synthesize)

Parameters

num_to_synthesize

optional (default: None)

integer

The number of synthetic samples to generate.

Returns

A pandas DataFrame that combines the original data and the synthetic data.

Usage

synthesizer.data_generator()

synthesizer.data_generator(num_to_synthesize=100)
