# SMOTE

This is an implementation of the basic SMOTE (Synthetic Minority Over-sampling Technique) algorithm for oversampling data. It generates synthetic samples by interpolating between existing minority instances and their nearest neighbors.

```python
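from sdgne.datagenerator.smote import SMOTE

# `dataset` is a pandas DataFrame holding both the minority and majority rows.
minority_column_label = 'class'  # column that stores the class labels
minority_class_label = 0         # value that marks the minority class

synthesizer = SMOTE(dataset, minority_column_label, minority_class_label)

# With no argument, enough synthetic rows are generated to balance the classes.
synthesized_data = synthesizer.data_generator()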
```
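
The interpolation step described above can be illustrated with a minimal, standard-library sketch. This is not the code used by `sdgne`; the function name `smote_sketch`, the parameter `k`, and the use of Euclidean distance are assumptions made for illustration only.

```python
import math
import random


def smote_sketch(minority, num_to_synthesize, k=5, seed=0):
    """Hypothetical helper: interpolate between minority points and their neighbors.

    `minority` is a list of numeric feature vectors (at least two points).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(num_to_synthesize):
        base = rng.choice(minority)
        # k nearest neighbors of the base point (Euclidean), excluding the point itself
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Pick a random position on the segment between base and neighbor
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(points, num_to_synthesize=10, k=2)
```

Because every synthetic point lies on a line segment between two existing minority points, the new samples stay inside the region already covered by the minority class.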

## Importing SMOTE

```python
from sdgne.datagenerator.smote import SMOTE
```

***

## Creating a synthesizer

```python
synthesizer = SMOTE(dataset, minority_column_label, minority_class_label)
```

### Parameters

<table data-header-hidden data-full-width="false"><thead><tr><th width="216.66666666666669">variable</th><th width="140">type</th><th>datatype</th><th>info</th></tr></thead><tbody><tr><td>dataset</td><td>required</td><td>pd.DataFrame</td><td>A pandas DataFrame containing both the original minority data and the original majority data.</td></tr><tr><td>minority_column_label</td><td>required</td><td>string</td><td>The label of the column that holds the class values, e.g. 'class', 'output'.</td></tr><tr><td>minority_class_label</td><td>required</td><td>string</td><td>The label of the minority class, e.g. '1', '0'.</td></tr></tbody></table>

### Returns

An instance of class SMOTE.

## Synthetic Data Generation

### data\_generator()

The `data_generator` function generates synthetic data using the synthesizer. It takes one optional parameter, `num_to_synthesize`.

* If `num_to_synthesize` is not defined, `data_generator` generates as many synthetic samples as are needed to balance the minority and majority classes.
* If `num_to_synthesize` is defined and the dataset is already balanced, `data_generator` generates 2 \* the number of original minority samples.
* If `num_to_synthesize` is defined and the dataset is not balanced, `data_generator` generates exactly the number of synthetic samples passed.
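
The three rules above can be restated as a small function. This is only a sketch of the documented behavior, not the library's source: the name `resolve_num_to_synthesize` and its arguments are hypothetical, and "balanced" is assumed to mean that the minority and majority classes have the same number of rows.

```python
def resolve_num_to_synthesize(n_majority, n_minority, num_to_synthesize=None):
    """Hypothetical helper: how many synthetic rows the rules above would yield."""
    if num_to_synthesize is None:
        # Not defined: generate just enough rows to balance the classes.
        return n_majority - n_minority
    if n_majority == n_minority:
        # Defined, but the dataset is already balanced (assumed: equal counts).
        return 2 * n_minority
    # Defined and the dataset is not balanced: use the requested number.
    return num_to_synthesize


print(resolve_num_to_synthesize(100, 20))      # 80
print(resolve_num_to_synthesize(50, 50, 10))   # 100
print(resolve_num_to_synthesize(100, 20, 30))  # 30
```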

```python
synthesized_data = synthesizer.data_generator(num_to_synthesize)
```

### Parameters

<table data-header-hidden><thead><tr><th width="222">variable</th><th width="154">type</th><th width="107">datatype</th><th>info</th></tr></thead><tbody><tr><td><code>num_to_synthesize</code></td><td>optional (default: None)</td><td>integer</td><td>The number of synthetic samples to generate.</td></tr></tbody></table>

### Returns

A pandas data frame that combines original data and synthetic data.

### Usage

`synthesizer.data_generator()`

`synthesizer.data_generator(num_to_synthesize=100)`
