Exploratory data analysis (EDA) from scratch

Introduction

Exploratory data analysis (EDA)

– Handling missing values
– Removing duplicates
– Treatment of outliers
– Normalization and scaling (numeric variables)
– Encoding of categorical variables (dummy variables)
– Bivariate analysis

Handling missing values
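
For the missing-value step, a minimal sketch in pandas might look like the following. The toy data, the column names (`age`, `city`) and the choice of median and mode imputation are assumptions made for illustration, not taken from the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Pune", "Delhi", None, "Delhi"],
})

# 1. Count the missing values in each column.
missing_counts = df.isna().sum()

# 2. Option A: drop every row that has at least one missing value.
dropped = df.dropna()

# 3. Option B: impute instead of dropping.
#    Median for the numeric column, most frequent value for the categorical one.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

print(missing_counts)
print(filled)
```

Dropping rows is the simplest option but discards data; imputation keeps every row at the cost of introducing estimated values, so which one fits depends on how much is missing and why.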

Handling duplicate records
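
As a rough sketch of this step, pandas can flag and drop exact duplicate rows. The toy data and the column names (`id`, `score`) are hypothetical and only serve the example.

```python
import pandas as pd

# Hypothetical toy data: the row (2, 85) appears twice.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "score": [90, 85, 85, 70],
})

# Boolean mask: True for each row that exactly repeats an earlier row.
dup_mask = df.duplicated()
n_duplicates = int(dup_mask.sum())

# Keep the first occurrence of each row and renumber the index.
deduped = df.drop_duplicates(keep="first").reset_index(drop=True)

print(n_duplicates)
print(deduped)
```

If only some columns define a duplicate (for example an ID column), both `duplicated` and `drop_duplicates` accept a `subset` argument naming those columns.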

Handling of outliers
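
One common way to treat outliers is the interquartile range (IQR) rule, the same 1.5 × IQR convention that box plot whiskers commonly use. The sketch below assumes that rule and a made-up series of values; the method used in the original article is not visible in the text, so this is an illustration and not a reproduction of it.

```python
import pandas as pd

# Hypothetical values with one obvious outlier (95).
s = pd.Series([10, 12, 11, 13, 12, 11, 95], name="value")

# Quartiles and interquartile range.
q1 = s.quantile(0.25)
q3 = s.quantile(0.75)
iqr = q3 - q1

# Fences: anything outside [lower, upper] is treated as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

is_outlier = (s < lower) | (s > upper)
cleaned = s[~is_outlier]

print(s[is_outlier])
print(cleaned)
```

Removing the flagged rows is only one option; capping values at the fences (for example with `s.clip(lower, upper)`) keeps the row count unchanged.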

Box plot before removing outliers

Box plot after removing outliers

Bivariate analysis

  1. Two categorical variables

    1. Bar chart
    2. Clustered bar chart
    3. Dot chart
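
For two categorical variables, the charts listed above are usually drawn from a table of counts. A minimal sketch of building that table with `pandas.crosstab` follows; the data and the column names (`gender`, `purchased`) are hypothetical.

```python
import pandas as pd

# Hypothetical data with two categorical variables.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "purchased": ["yes", "no", "yes", "yes", "no", "no"],
})

# Contingency table: rows are gender, columns are purchased, cells are counts.
counts = pd.crosstab(df["gender"], df["purchased"])

# Same table as row-wise proportions (each row sums to 1).
row_share = pd.crosstab(df["gender"], df["purchased"], normalize="index")

print(counts)
print(row_share)
```

If matplotlib is installed, `counts.plot(kind="bar")` should render this table as a clustered bar chart, with one group of bars per row.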

Correlation between all variables
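
A correlation matrix across all numeric columns can be sketched with `DataFrame.corr`, which uses the Pearson coefficient by default. The data and column names below are invented for the example.

```python
import pandas as pd

# Hypothetical numeric data; score is perfectly linear in hours.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 54, 56, 58, 60],
    "errors": [9, 7, 6, 3, 1],
})

# Pairwise Pearson correlation between every pair of columns.
corr = df.corr()

print(corr.round(2))
```

Values close to 1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one. The matrix is symmetric, with 1 on the diagonal.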

Normalize and scale
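
Two common choices here are min-max normalization, which maps values to the range 0 to 1, and standardization (z-scores), which gives mean 0 and standard deviation 1. The sketch below computes both with plain pandas arithmetic; the `income` column and its values are hypothetical, and the article's original code may have used a different tool.

```python
import pandas as pd

# Hypothetical numeric column.
df = pd.DataFrame({"income": [20.0, 40.0, 60.0, 80.0, 100.0]})
col = df["income"]

# Min-max normalization: (x - min) / (max - min), result lies in [0, 1].
df["income_minmax"] = (col - col.min()) / (col.max() - col.min())

# Standardization: (x - mean) / std, using the population std (ddof=0).
df["income_zscore"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

Min-max scaling is sensitive to extreme values because the minimum and maximum define the range, so it is usually applied after outliers have been treated.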

Encoding categorical variables
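
Dummy-variable (one-hot) encoding can be sketched with `pandas.get_dummies`. The `color` column and its values are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One 0/1 column per category.
dummies = pd.get_dummies(df, columns=["color"], dtype=int)

# Drop the first category to get k-1 columns; the dropped category
# is implied when all remaining dummy columns are 0.
dummies_k_minus_1 = pd.get_dummies(
    df, columns=["color"], drop_first=True, dtype=int
)

print(dummies)
print(dummies_k_minus_1)
```

Keeping k-1 columns avoids perfectly collinear features, which matters for linear models; tree-based models are generally unaffected by the choice.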

About the Author

Ritika Singh, Data Scientist

I am a data scientist by profession and a blogger by passion. I have worked on machine learning projects for over 2 years. Here you will find articles on machine learning, statistics, deep learning, NLP, and artificial intelligence.
