Function to Solve the Null Value Problem: Filling Null Values

We wanted a function that fills missing values automatically: it takes a pandas DataFrame as input, finds the missing values, and fills them with appropriate values.

The function first identifies the numerical and categorical columns in the DataFrame, and then determines which of those columns contain missing values.
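As a rough illustration of this detection step, the sketch below runs the same column selection on a small toy DataFrame. The column names and values are invented for the example and are not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Toy data, invented purely for illustration.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0],
    "income": [50000, 62000, 58000],
    "city": pd.Series(["Pune", np.nan, "Delhi"], dtype=object),
    "gender": pd.Series(["F", "M", "M"], dtype=object),
})

# Number of missing values in each column.
missing_counts = df.isnull().sum()

# Split the columns by dtype, as the function does.
numeric_columns = df.select_dtypes(include=["int64", "float64"]).columns
categorical_columns = df.select_dtypes(include=["object"]).columns

# Keep only the columns that actually contain missing values.
numeric_missing = [c for c in numeric_columns if missing_counts[c] > 0]
categorical_missing = [c for c in categorical_columns if missing_counts[c] > 0]

print(numeric_missing)
print(categorical_missing)
```

Only "age" and "city" contain gaps here, so they are the only columns that would be passed on to the imputers.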

It then fills the gaps with a different method for each type: missing numerical values are replaced with the column mean, and missing categorical values are replaced with the most frequently occurring category.
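To make the two strategies concrete, here is a minimal sketch that applies scikit-learn's SimpleImputer to two tiny single-column frames. The data is made up for the example, and the missing entries are encoded as NaN, which is what SimpleImputer looks for by default.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy columns, invented for illustration; missing entries are NaN.
ages = pd.DataFrame({"age": [20.0, np.nan, 40.0]})
cities = pd.DataFrame(
    {"city": pd.Series(["Pune", "Pune", np.nan, "Delhi"], dtype=object)}
)

# Mean imputation for the numerical column.
mean_imputer = SimpleImputer(strategy="mean")
filled_ages = mean_imputer.fit_transform(ages)

# Most-frequent (mode) imputation for the categorical column.
mode_imputer = SimpleImputer(strategy="most_frequent")
filled_cities = mode_imputer.fit_transform(cities)

print(filled_ages.ravel())    # the gap becomes the mean of 20 and 40
print(filled_cities.ravel())  # the gap becomes the most common city
```

Both calls return NumPy arrays rather than DataFrames, which is why the function has to rebuild a DataFrame before writing the values back.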

To use this function, just pass a pandas DataFrame to it and it will do the rest.

Suppose the DataFrame is named d:

d = clean_missing(d)


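def clean_missing(df):
    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.compose import ColumnTransformer

    # Count the missing values in every column.
    df_missing_values = df.isnull().sum()

    # Numerical columns that contain at least one missing value.
    df_numeric_columns = df.select_dtypes(include=["int64", "float64"]).keys()
    columns_numeric_missing = [var for var in df_numeric_columns if df_missing_values[var] > 0]

    # Categorical (object) columns that contain at least one missing value.
    df_categorical_columns = df.select_dtypes(include=["object"]).keys()
    columns_categorical_missing = [var for var in df_categorical_columns if df_missing_values[var] > 0]

    # Mean imputation for numerical columns, most-frequent imputation for categorical ones.
    numeric_value_mean_imputer = Pipeline(steps=[("imputer", SimpleImputer(strategy="mean"))])
    categorical_value_mode_imputer = Pipeline(steps=[("imputer", SimpleImputer(strategy="most_frequent"))])

    preprocessing = ColumnTransformer(transformers=[
        ("mean_imputer", numeric_value_mean_imputer, columns_numeric_missing),
        ("mode_imputer", categorical_value_mode_imputer, columns_categorical_missing)])

    # Impute only the columns that have missing values.
    df_clean_null_value = preprocessing.fit_transform(df)

    # Rebuild a DataFrame on the original index so that update() aligns the rows,
    # and let pandas restore numeric dtypes where the combined array became object.
    df_missing_value_solve = pd.DataFrame(
        df_clean_null_value,
        index=df.index,
        columns=columns_numeric_missing + columns_categorical_missing).infer_objects()

    # Overwrite the affected columns in place and return the same DataFrame.
    df.update(df_missing_value_solve)
    return df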

 https://www.kaggle.com/drnitinmishra/missing-values-function


