Heavy Decoder Autoencoder

The heavy decoder autoencoder has more layers in its decoder than in its encoder. This design can be useful when capturing intricate details during data reconstruction is important.

from sdgne.datagenerator.autoencoder import HeavyDecoderAutoencoder

minority_column_label = 'class'
minority_class_label = 0

synthesizer = HeavyDecoderAutoencoder(dataset,
                               minority_column_label,
                               minority_class_label)

Importing Heavy Decoder Autoencoder

from sdgne.datagenerator.autoencoder import HeavyDecoderAutoencoder

Creating a synthesizer

synthesizer = HeavyDecoderAutoencoder(dataset,
                               minority_column_label,
                               minority_class_label)

Parameters

dataset (required, pd.DataFrame): A pandas data frame containing both the original minority data and the original majority data.

minority_column_label (required, string): The label of the column that holds the class values, e.g. 'class', 'output'.

minority_class_label (required, string): The value in that column that identifies the minority class, e.g. '1', '0'.

Returns

An instance of class HeavyDecoderAutoencoder.
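The constructor expects a `dataset` that already contains both classes. As a minimal sketch of preparing those inputs, the snippet below builds a toy imbalanced data frame and looks up which class is the minority. The column names and values are made up for illustration, and the `HeavyDecoderAutoencoder` call is left as a comment so the snippet runs without sdgne installed.

```python
import pandas as pd

# Toy imbalanced dataset (hypothetical values): 8 rows of class 1, 2 rows of class 0.
dataset = pd.DataFrame({
    "feature_a": [0.10, 0.40, 0.35, 0.80, 0.75, 0.60, 0.20, 0.90, 0.05, 0.95],
    "feature_b": [1.00, 0.90, 0.70, 0.20, 0.30, 0.50, 0.80, 0.10, 0.00, 1.00],
    "class":     [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
})

minority_column_label = "class"

# Count rows per class and pick the least frequent value as the minority label.
class_counts = dataset[minority_column_label].value_counts()
minority_class_label = class_counts.idxmin()

print(class_counts.to_dict())
print(minority_class_label)

# With sdgne installed, these values would be passed as shown on this page:
# synthesizer = HeavyDecoderAutoencoder(dataset,
#                                       minority_column_label,
#                                       minority_class_label)
```

Here the looked-up label is an integer, which matches the `minority_class_label = 0` assignment at the top of this page. If the class column in your data holds strings, the same lookup returns a string.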

Network Architecture

Below is the network architecture for the Heavy Decoder Autoencoder. The encoder has two layers, with 22 and 20 nodes respectively. The bottleneck is a single Dense layer with 16 nodes. The decoder has four layers, with 18, 20, 22, and 24 nodes respectively. The decoder layers use the sigmoid activation.

encoder_dense_layers: 22, 20

bottle_neck: 16

decoder_dense_layers: 18, 20, 22, 24

decoder_activation: sigmoid
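To make the "heavy decoder" imbalance concrete, the rough sketch below counts the weights and biases in each part of the network from the layer widths listed above. It assumes every layer is fully connected with a bias term and uses a hypothetical input width of 30 features. Any final output layer after the listed decoder layers is not included, since this page does not describe one. `dense_param_count` is a helper written for this illustration, not part of sdgne.

```python
def dense_param_count(input_dim, widths):
    """Return weights plus biases for a stack of fully connected layers."""
    total = 0
    previous = input_dim
    for width in widths:
        total += (previous + 1) * width  # +1 accounts for the bias of each node
        previous = width
    return total


INPUT_DIM = 30               # hypothetical number of input features
ENCODER = [22, 20]           # encoder_dense_layers
BOTTLENECK = [16]            # bottle_neck
DECODER = [18, 20, 22, 24]   # decoder_dense_layers

encoder_params = dense_param_count(INPUT_DIM, ENCODER)
bottleneck_params = dense_param_count(ENCODER[-1], BOTTLENECK)
decoder_params = dense_param_count(BOTTLENECK[-1], DECODER)

print(encoder_params)     # 1142
print(bottleneck_params)  # 336
print(decoder_params)     # 1700
```

Under these assumptions the decoder holds roughly one and a half times as many parameters as the encoder. The exact ratio depends on the input width.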

Synthetic Data Generation

data_generator()

The data_generator function generates synthetic data using the synthesizer. It takes one optional parameter, no_of_syntetic_data.

  • If no_of_syntetic_data is not defined, data_generator generates as many synthetic rows as are needed to balance the minority data against the majority data.

  • If no_of_syntetic_data is defined and the dataset is already balanced, data_generator generates 2 * (number of original minority rows) synthetic rows.

  • If no_of_syntetic_data is defined and the dataset is not balanced, data_generator generates exactly the number of synthetic rows passed.
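The three rules above can be restated as a small function. This is only a sketch of the documented behaviour, not the library's code. `synthetic_row_count` and its argument names are hypothetical, and "balanced" is assumed to mean that the majority and minority row counts are equal.

```python
def synthetic_row_count(n_majority, n_minority, no_of_syntetic_data=None):
    """Number of synthetic rows to expect, following the three rules above."""
    if no_of_syntetic_data is None:
        # Rule 1: generate enough rows to balance the two classes.
        return n_majority - n_minority
    if n_majority == n_minority:
        # Rule 2: dataset already balanced (assumed to mean equal counts).
        return 2 * n_minority
    # Rule 3: dataset not balanced, so the requested number is used as given.
    return no_of_syntetic_data


print(synthetic_row_count(900, 100))       # 800
print(synthetic_row_count(500, 500, 50))   # 1000
print(synthetic_row_count(900, 100, 100))  # 100
```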

synthesized_data = synthesizer.data_generator(no_of_syntetic_data)

Parameters

no_of_syntetic_data (optional, default: None, integer): The number of synthetic rows to generate.

Returns

A pandas data frame that combines original data and synthetic data.

Usage

synthesizer.data_generator()

synthesizer.data_generator(no_of_syntetic_data=100)

The synthesized_data frame returned from data_generator() contains an added column, `synthetic_data`, that marks the origin of each row.

df['synthetic_data'] = 0 : original data

df['synthetic_data'] = 1 : synthetic data
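The `synthetic_data` flag makes it easy to separate the two kinds of rows afterwards. The sketch below uses a hand-built stand-in for the returned data frame, with hypothetical column names and values, so that it runs without sdgne installed. With the real library, `synthesized_data` would come from `synthesizer.data_generator()`.

```python
import pandas as pd

# Stand-in for the frame returned by synthesizer.data_generator() (hypothetical values).
synthesized_data = pd.DataFrame({
    "feature_a":      [0.10, 0.80, 0.05, 0.07, 0.06],
    "class":          [1, 1, 0, 0, 0],
    "synthetic_data": [0, 0, 0, 1, 1],
})

# Split the frame using the flag column.
original_rows = synthesized_data[synthesized_data["synthetic_data"] == 0]
synthetic_rows = synthesized_data[synthesized_data["synthetic_data"] == 1]

# Drop the flag before training so it is not treated as a feature.
training_frame = synthesized_data.drop(columns=["synthetic_data"])

print(len(original_rows), len(synthetic_rows))
print(list(training_frame.columns))
```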
