Balanced Autoencoder

The balanced autoencoder is the simplest form of autoencoder. Its encoder and decoder have the same number of layers, and the layer sizes of the decoder mirror those of the encoder in reverse order.

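from sdgne.datagenerator.autoencoder import BalancedAutoencoder

# `dataset` is a pandas data frame, loaded beforehand, that contains
# both the minority and the majority class rows.
minority_column_label = 'class'  # column that holds the class labels
minority_class_label = 0         # value that marks the minority class

synthesizer = BalancedAutoencoder(dataset,
                               minority_column_label,
                               minority_class_label)

# Returns the original data combined with the generated synthetic rows.
synthesize_data = synthesizer.data_generator()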

Importing Balanced Autoencoder

from sdgne.datagenerator.autoencoder import BalancedAutoencoder

Creating a synthesizer

synthesizer = BalancedAutoencoder(dataset,
                               minority_column_label,
                               minority_class_label)

Parameters

dataset

required

pd.DataFrame

A pandas data frame containing both the original minority data and the original majority data.

minority_column_label

required

string

The label of the column that holds the class values, e.g. 'class' or 'output'.

minority_class_label

required

string

The label of the minority class, e.g. '1' or '0'.

Returns

An instance of class BalancedAutoencoder.

Network Architecture

Below is the network architecture for the Balanced Autoencoder. Here, the encoder has two layers, with 22 and 20 nodes respectively. The bottleneck is a single Dense layer with 16 nodes. The decoder has two layers with 20 and 22 nodes respectively. The decoder layer activation is sigmoid.

encoder_dense_layers

22, 20

bottle_neck

16

decoder_dense_layers

20, 22

decoder_activation

sigmoid
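To make the mirrored layout concrete, the following minimal sketch builds the full list of layer widths from the encoder widths and the bottleneck size. The helper name `mirrored_layer_sizes` is hypothetical and is not part of sdgne; only the values 22, 20 and 16 come from the architecture described above.

```python
def mirrored_layer_sizes(encoder_dense_layers, bottle_neck):
    """Return (encoder, bottleneck, decoder) widths for a balanced autoencoder.

    Hypothetical helper for illustration only; it is not part of sdgne.
    """
    encoder = list(encoder_dense_layers)
    decoder = encoder[::-1]  # the decoder mirrors the encoder in reverse order
    return encoder, bottle_neck, decoder


encoder, bottle_neck, decoder = mirrored_layer_sizes([22, 20], 16)
hidden_widths = encoder + [bottle_neck] + decoder
print(hidden_widths)  # [22, 20, 16, 20, 22]
```

The reversed list matches the documented decoder_dense_layers value of 20, 22.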

Synthetic Data Generation

data_generator()

The data_generator function generates synthetic data using the synthesizer. It takes one optional parameter, no_of_syntetic_data.

  • If no_of_syntetic_data is not defined, data_generator generates as many synthetic samples as are needed to balance the minority data with the majority data.

  • If no_of_syntetic_data is defined and the dataset is already balanced, data_generator generates 2 * (number of original minority samples) synthetic samples.

  • If no_of_syntetic_data is defined and the dataset is not balanced, data_generator generates exactly the number of synthetic samples passed.
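The three rules above can be sketched as a small function. This is an illustration of the documented behavior only: the helper `synthetic_sample_count` is hypothetical, is not part of sdgne, and may differ from the library's internal logic. It also assumes that "balanced" means the minority and majority classes have the same number of rows.

```python
def synthetic_sample_count(n_minority, n_majority, no_of_syntetic_data=None):
    """Number of synthetic rows implied by the three documented rules.

    Hypothetical helper for illustration only; it is not part of sdgne.
    Assumes "balanced" means equal minority and majority counts.
    """
    if no_of_syntetic_data is None:
        # Rule 1: generate enough rows to close the gap between the classes.
        return n_majority - n_minority
    if n_minority == n_majority:
        # Rule 2: a value was passed, but the dataset is already balanced.
        return 2 * n_minority
    # Rule 3: a value was passed and the dataset is not balanced.
    return no_of_syntetic_data


print(synthetic_sample_count(100, 900))       # 800
print(synthetic_sample_count(500, 500, 50))   # 1000
print(synthetic_sample_count(100, 900, 250))  # 250
```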

synthesized_data = synthesizer.data_generator(no_of_syntetic_data)

Parameters

no_of_syntetic_data

optional (default: None)

integer

The number of synthetic samples to generate.

Returns

A pandas data frame that combines original data and synthetic data.

Usage

synthesizer.data_generator()

synthesizer.data_generator(no_of_syntetic_data=100)

The data frame returned by data_generator() contains an additional column, `synthetic_data`, that marks the origin of each row:

df['synthetic_data'] = 0 : original data
df['synthetic_data'] = 1 : synthetically generated data
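As a minimal sketch of how this flag might be used, the example below separates original rows from synthetic rows with ordinary pandas boolean indexing. The data frame here is a toy stand-in with made-up values, not actual output from data_generator(); only the `synthetic_data` column name and its 0/1 meaning come from the description above.

```python
import pandas as pd

# Toy stand-in for the frame returned by data_generator(); the feature
# and class values are invented for illustration.
df = pd.DataFrame({
    "feature": [0.1, 0.4, 0.3, 0.2],
    "class": [1, 0, 0, 0],
    "synthetic_data": [0, 0, 1, 1],
})

# Split the combined frame on the flag column.
original_rows = df[df["synthetic_data"] == 0]
synthetic_rows = df[df["synthetic_data"] == 1]

print(len(original_rows), len(synthetic_rows))  # 2 2
```

Keeping the two subsets separate can be useful when, for example, synthetic rows should be used for training only and excluded from evaluation.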
