Evaluation
To evaluate the quality of synthetically generated data, we use functions from the Evaluation
class.
Importing Evaluation
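A minimal import sketch; the module path shown here is an assumption and should be adjusted to wherever the Evaluation class lives in your package:

```python
# Assumed module path; replace with the actual location of the Evaluation class.
from evaluation import Evaluation
```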
Creating an evaluator
The synthesized_data can be created using any autoencoder and should contain a 'synthesized_data' column.
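A construction sketch, assuming the constructor takes the original minority data and the synthesized data as pandas DataFrames; the file names and argument order are placeholders, not a confirmed signature:

```python
import pandas as pd

# Placeholder inputs: the original minority-class rows and the autoencoder
# output, which should include a 'synthesized_data' column.
original_minority_data = pd.read_csv("original_minority.csv")
synthesized_data = pd.read_csv("synthesized_data.csv")

# Assumed constructor signature; check the Evaluation class for the exact parameters.
evaluator = Evaluation(original_minority_data, synthesized_data)
```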
evaluator.duplicate_in_rows()
Calculates the percentage of duplicate rows between the original minority data and the synthetic minority data.
Returns
The percentage of duplicate rows.
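A usage sketch with the evaluator created above; whether the value is reported on a 0–100 or 0–1 scale is an assumption to verify:

```python
# Share of synthetic rows that duplicate rows in the original minority data.
duplicate_pct = evaluator.duplicate_in_rows()
print(f"Duplicate rows: {duplicate_pct}%")
```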
evaluator.mean_and_std()
Calculates the mean and standard deviation for each column in the original and synthetic minority data.
Returns
A pandas DataFrame containing, for each column, the mean difference, the mean of the original minority data, the mean of the synthetic minority data, the standard deviation of the original minority data, and the standard deviation of the synthetic minority data.
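A usage sketch; the returned columns follow the description above, but the exact column names are not confirmed here:

```python
# Per-column means, standard deviations, and mean differences
# between the original and synthetic minority data.
stats_df = evaluator.mean_and_std()
print(stats_df)
```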
Kullback-Leibler (KL) divergence
Kullback-Leibler (KL) divergence is a measure of how one probability distribution diverges from a second, expected probability distribution. In simple terms, it quantifies the difference between two probability distributions.
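For discrete distributions P and Q over the same outcomes, D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)). The snippet below is a standalone illustration of this quantity using scipy, not a call into the Evaluation class:

```python
import numpy as np
from scipy.stats import entropy

# Two example discrete distributions over the same three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# With two arguments, scipy's entropy computes sum(p * log(p / q)), i.e. D_KL(P || Q).
kl_pq = entropy(p, q)
print(f"KL(P || Q) = {kl_pq:.4f}")  # non-negative; 0 only when P and Q are identical
```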
evaluator.plot_kde_density_graph()
Plots the kernel density estimation (KDE) graphs for each column of the original and synthetic minority data. Also calculates the highlighted area and KL divergence for each column.
Returns
plt (matplotlib.pyplot): The KDE density plots.
column_details (pd.DataFrame): A data frame with the highlighted area and KL divergence for each column.
total_highlighted_area (float): The total highlighted area in the plot.
total_kl_divergence (float): The total KL divergence.
average_kl_divergence (float): The average KL divergence calculated over all columns.
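A usage sketch, assuming the five return values are unpacked in the order listed above:

```python
# Assumed return order: plot object, per-column details, totals, and the average.
plt, column_details, total_area, total_kl, avg_kl = evaluator.plot_kde_density_graph()

print(column_details)
print(f"Total highlighted area: {total_area}")
print(f"Average KL divergence: {avg_kl}")
plt.show()  # display the KDE density plots
```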
evaluator.plot_heat_maps()
Plots heat maps for the correlation matrices of the original and synthetic minority data.
Returns
The matplotlib.pyplot object containing the heat maps.
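A usage sketch, assuming the method returns the matplotlib.pyplot object as described:

```python
# Draw and display the correlation heat maps for the original and synthetic data.
plt = evaluator.plot_heat_maps()
plt.show()
```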