Evaluation
To evaluate the quality of synthetically generated data, we use functions from the Evaluation
class.
Importing Evaluation
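A minimal import sketch; the module path shown here is an assumption and should be adjusted to wherever the Evaluation class lives in your package:

```python
# Assumed module path; replace with the actual location of the Evaluation class.
from evaluation import Evaluation
```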
Creating an evaluator
The synthesized_data can be created using any autoencoder and should contain a 'synthesized_data' column.
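A construction sketch, assuming the constructor takes the original minority data and the synthesized data as pandas DataFrames; the file names and argument order are placeholders, not a confirmed signature:

```python
import pandas as pd

# Placeholder inputs: the original minority-class rows and the autoencoder
# output, which should include a 'synthesized_data' column.
original_minority_data = pd.read_csv("original_minority.csv")
synthesized_data = pd.read_csv("synthesized_data.csv")

# Assumed constructor signature; check the Evaluation class for the exact parameters.
evaluator = Evaluation(original_minority_data, synthesized_data)
```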
evaluator.duplicate_in_rows()
Calculates the percentage of duplicate rows between the original minority data and the synthetic minority data.
Returns
The percentage of duplicate rows.
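A usage sketch with the evaluator created above; whether the value is reported on a 0–100 or 0–1 scale is an assumption to verify:

```python
# Share of synthetic rows that duplicate rows in the original minority data.
duplicate_pct = evaluator.duplicate_in_rows()
print(f"Duplicate rows: {duplicate_pct}%")
```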
evaluator.mean_and_std()
Calculates the mean and standard deviation for each column in the original and synthetic minority data.
Returns
A pandas DataFrame containing, for each column, the mean difference, the mean of the original minority data, the mean of the synthetic minority data, the standard deviation of the original minority data, and the standard deviation of the synthetic minority data.
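A usage sketch; the returned columns follow the description above, but the exact column names are not confirmed here:

```python
# Per-column means, standard deviations, and mean differences
# between the original and synthetic minority data.
stats_df = evaluator.mean_and_std()
print(stats_df)
```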
Kullback-Leibler (KL) divergence
Kullback-Leibler (KL) divergence is a measure of how one probability distribution diverges from a second, expected probability distribution. In simple terms, it quantifies the difference between two probability distributions.
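For discrete distributions P and Q over the same outcomes, D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)). The snippet below is a standalone illustration of this quantity using scipy, not a call into the Evaluation class:

```python
import numpy as np
from scipy.stats import entropy

# Two example discrete distributions over the same three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# With two arguments, scipy's entropy computes sum(p * log(p / q)), i.e. D_KL(P || Q).
kl_pq = entropy(p, q)
print(f"KL(P || Q) = {kl_pq:.4f}")  # non-negative; 0 only when P and Q are identical
```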
evaluator.plot_kde_density_graph()
Plots the kernel density estimation (KDE) graphs for each column of the original and synthetic minority data. Also calculates the highlighted area and KL divergence for each column.
Returns
plt (matplotlib.pyplot): The KDE density plots.
column_details (pd.DataFrame): A data frame with the highlighted area and KL divergence for each column.
total_highlighted_area (float): The total highlighted area in the plot.
total_kl_divergence (float): The total KL divergence.
average_kl_divergence (float): The average KL divergence calculated over all columns.
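A usage sketch, assuming the five return values are unpacked in the order listed above:

```python
# Assumed return order: plot object, per-column details, totals, and the average.
plt, column_details, total_area, total_kl, avg_kl = evaluator.plot_kde_density_graph()

print(column_details)
print(f"Total highlighted area: {total_area}")
print(f"Average KL divergence: {avg_kl}")
plt.show()  # display the KDE density plots
```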
evaluator.plot_heat_maps()
Plots heat maps for the correlation matrices of the original and synthetic minority data.
Returns
The matplotlib.pyplot object containing the heat maps.
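A usage sketch, assuming the method returns the matplotlib.pyplot object as described:

```python
# Draw and display the correlation heat maps for the original and synthetic data.
plt = evaluator.plot_heat_maps()
plt.show()
```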