Data science articles 2020

Overview

  • Here is a list of the top 10 articles DataPeaker published this year.
  • The articles are sorted in descending order, based on views.
  • Feel free to suggest other articles the community should read in the comment section.

Introduction

Writing is the best way to improve retention. Putting what you learn into your own words not only leads to better understanding, it also sharpens your observation, which in turn fuels your curiosity.

In short, writing takes your learning process to a whole new level.

Writing is at the core of DataPeaker's principles. We have always tried to offer the best possible content, and 2020 was no different. With more than 500 articles published this year, the writing journey never stops for us.

In this article, we highlight the 10 articles published on our blog this year that the data science community read the most.

So let's get the ball rolling!

The best performing article on our blog is based on one of the most fundamental questions a data scientist or data analyst is asked in an interview:

“How many data science projects have you completed so far?”

The answer makes the difference. Data science is not a field where theoretical understanding alone is enough to get you started. It is the projects you carry out and the practice you put in that determine your chances of success.

Simply taking courses or obtaining certifications is not enough. Almost everyone we know is certified in various aspects of data science. Certifications add little value to your resume unless you combine them with practical experience.

But which data science project should you choose? At DataPeaker we love to collect the best data science projects each month, and in this article we compiled the best open source data science projects for June 2020.

You can check it here.

Feature scaling helps you convert variables measured in a myriad of units, such as kilograms, rupees and years, into unitless values. But which scaling method should you use?

One of the obstacles every data scientist faces is choosing between normalization and standardization. Most courses do not focus on this topic. Feature scaling is one of the most important preprocessing steps, and applying it without a proper understanding can lead to an inaccurate or biased model.

The article also explains why some machine learning models improve dramatically with feature scaling, while others barely change at all.
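
To make the two options concrete, here is a minimal NumPy sketch of both transforms, applied column by column. Min-max normalization computes (x - min) / (max - min), squeezing each feature into [0, 1]; standardization computes (x - mean) / std, centering each feature at 0 with unit standard deviation. The function names and the small weight/income/age table are hypothetical, made up only for illustration. scikit-learn's MinMaxScaler and StandardScaler implement the same two transforms.

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale each column to [0, 1]: (x - min) / (max - min).

    Assumes no column is constant (max > min), otherwise this divides by zero.
    """
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    return (x - col_min) / (col_max - col_min)

def standardize(x: np.ndarray) -> np.ndarray:
    """Center each column at 0 and scale it to unit (population) standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Hypothetical data: weight in kilograms, income in rupees, age in years
X = np.array([
    [55.0, 30000.0, 25.0],
    [70.0, 45000.0, 32.0],
    [85.0, 120000.0, 47.0],
    [62.0, 52000.0, 38.0],
])

X_norm = min_max_normalize(X)
X_std = standardize(X)
print(X_norm.round(3))
print(X_std.round(3))
```

After either transform, income no longer dwarfs weight and age simply because rupees are large numbers. In a real pipeline you would compute the min/max or mean/std on the training split only and reuse those values on the test split.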

You can read the article here.

“What are the best tools for data science tasks? And which tool should you pick up as a newcomer to data science?”

The question above captures the essence of the article. Once we have identified what we want to learn on a personal level, or what we need to do with the data on a professional level, we need to find the tools best suited to the task. This article is about choosing the right tool for the job.

Data science is a very broad subject, and each area of it requires data to be handled in its own way. And since data science models tend to have a big impact on an organization's decisions, it is important to identify which tools to use.

The article is divided into two parts. The first focuses on tools for handling Big Data in terms of volume, variety and velocity. The second covers data science tools for reporting and business intelligence, predictive modeling and machine learning, and artificial intelligence.

You can read the article here.

2020 will go down in the history books as the year that changed all of humankind. Every facet of life was affected by the coronavirus, and it was imperative that people from all domains come together to help solve this problem.

The article covers the use of Generative Adversarial Networks (GANs) as a technique for oversampling real-world, imbalanced COVID-19 data to predict mortality risk. It gives us a better understanding of how data preparation steps, such as handling imbalanced data, can improve a model's performance.

The data and the core model of this article are taken from a recent study (July 2020), “COVID-19 Patient Health Prediction Using Boosted Random Forest Algorithm” by Celestine Iwendi, Ali Kashif Bashir, Atharva Peshkar, et al. That study used a Random Forest algorithm boosted with AdaBoost and predicted the mortality of individual patients with 94% accuracy. This article keeps the same model and the same model parameters, so that any change in the model's accuracy can be attributed to the GAN-based oversampling technique.
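
A GAN oversamples by learning to generate new, synthetic minority-class records, which takes far more code than fits here. As a much simpler stand-in, the sketch below uses plain random oversampling (duplicating randomly chosen minority-class rows) to show where oversampling sits in a pipeline: it rebalances the class counts before a model is trained. This is explicitly not the GAN method from the article; the function name `random_oversample` and the toy 8-vs-2 dataset are hypothetical.

```python
import numpy as np

def random_oversample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Duplicate random minority-class rows until every class matches the majority count.

    Plain random oversampling, used here as a simple stand-in for GAN-based oversampling.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            # Indices of this (minority) class, sampled with replacement
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - count, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Hypothetical toy data: 8 "survived" (0) vs 2 "deceased" (1) records, two features each
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # -> [8 8]
```

Whatever the oversampling method, it should be applied to the training split only; oversampling before the train/test split leaks copies (or near-copies) of test records into training and inflates the reported accuracy.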

You can read the article here.

Why deep learning?

This is a perfectly valid question. We are flooded with machine learning algorithms. There is no shortage of them, and almost any kind of data problem can be tackled with one of them.

What's more, deep learning algorithms require a lot of computing power. So is it really necessary to use them?

This article addresses the questions that challenge the need for deep learning and its neural networks, such as artificial neural networks (ANNs), convolutional neural networks (CNNs) and recurrent neural networks (RNNs). It shows where deep learning has the edge over traditional machine learning: the decision boundaries it can learn and the feature engineering it can automate.
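
The decision-boundary point has a classic small illustration: XOR. A linear model can only draw one straight boundary, and no straight line separates XOR's 1s from its 0s, whereas a network with a single hidden layer of nonlinear units can. The sketch below uses hand-picked weights (not trained ones, which in practice would come from backpropagation) purely to show that such a network exists; the weight values are our own illustrative choice.

```python
import numpy as np

# The four XOR inputs and their labels: no single straight line separates the 1s from the 0s
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)

# Hand-picked (not trained) weights for one hidden layer with two ReLU units
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])   # both hidden units compute x1 + x2
b_hidden = np.array([0.0, -1.0])    # the second unit only fires when both inputs are 1
w_out = np.array([1.0, -2.0])       # subtract twice the "both on" signal

hidden = relu(X @ W_hidden + b_hidden)
output = hidden @ w_out
print(output)  # -> [0. 1. 1. 0.]
```

The hidden layer effectively builds the "both inputs on" feature that a linear model would need a data scientist to engineer by hand, which is the feature-engineering argument in miniature.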

You can read the article here.

Many of us still can't tell apart the different domains of the data industry. We use their names interchangeably, which causes a lot of confusion in communication.

Demand for both business analytics and data science is growing. Their markets are expected to reach $100 billion and $140 billion, respectively, by 2025. So it makes sense to understand what each domain really means, what its responsibilities are, and what similarities lead to the two terms being used interchangeably.

At DataPeaker, we have come across many aspiring analytics professionals who want to choose “Business Analytics” or “Data Science” as a career but are not even sure of the distinction between these two roles. Before making your choice, you should be clear about which path you want to take, right? It could be a career-defining choice!

This article explores the similarities and differences between business analytics and data science and tries to give you a better picture.

You can read the article here.

Some of the simplest tasks, such as joining tables, can seem complicated in Python. This article is a simple guide to joining two tables with the pandas library.

Our seventh best performing article will help you understand the different types of joins in pandas:

  • Inner join in pandas
  • Full join in pandas
  • Left join in pandas
  • Right join in pandas
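
All four joins can be expressed with `DataFrame.merge` by changing its `how` argument; a full join is spelled `how="outer"` in pandas. The sketch below is a minimal illustration using two small, made-up tables (`customers` and `orders` are hypothetical names) that share a `customer_id` key, with only ids 2 and 3 present in both.

```python
import pandas as pd

# Hypothetical tables: customer 1 has no orders, and order for customer 4 has no customer record
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})
orders = pd.DataFrame({"customer_id": [2, 3, 4], "amount": [250, 120, 90]})

# Inner join: only keys present in both tables (ids 2, 3)
inner = customers.merge(orders, on="customer_id", how="inner")

# Full (outer) join: every key from either table (ids 1, 2, 3, 4); gaps become NaN
full = customers.merge(orders, on="customer_id", how="outer")

# Left join: every row of the left table (ids 1, 2, 3)
left = customers.merge(orders, on="customer_id", how="left")

# Right join: every row of the right table (ids 2, 3, 4)
right = customers.merge(orders, on="customer_id", how="right")

for label, df in [("inner", inner), ("full", full), ("left", left), ("right", right)]:
    print(label, df, sep="\n")
```

Passing `indicator=True` to `merge` adds a `_merge` column recording whether each row came from the left table only, the right table only, or both, which is handy for checking what a join dropped.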

You can read the article here.

This is the second article on open source data science projects to appear on this list. We take this as a clear sign that learning hasn't taken a backseat for aspiring data scientists.

This article covered the top open source data science projects for the month of April. The list includes:

  • Convert any image into a 3D photo
  • Transform a picture into a cartoon illustration
  • Single shot multiple object tracking
  • OpenAI Jukebox: a generative model for music
  • ShyNet: privacy-friendly, cookie-free web analytics
  • Soccer Analysis Manual

You can read the article here.

Coding is a very personal experience for any data scientist, business analyst, data analyst or any programmer.

We've all reached a point in our coding journey where we feel that a particular tool is hurting our efficiency. The reason may be our coding style, where we are on our learning path, or anything else that makes the tool a poor fit for us.

That's where choosing the right IDE comes in. An IDE helps us write and run Python code for analytics, data science, software development and a host of other tasks. There are several IDEs on the market at the moment, each with its own set of features, advantages and disadvantages.

You can read the article here.

How do we present our data in a way that helps our leadership team or decision makers reach a consensus quickly?

The answer to the question above is concise visualization. You can't build a model in Excel or Python and simply hope that stakeholders understand its implications.

Excel has been a market leader in EDA and visualization tasks for more than 35 years. Most companies rely on it, especially small ones, because of its features.

In this article, we look at the following dashboards:

  • Online sales tracking
  • Marketing analysis
  • Project management
  • Income tracking
  • Human resources management

You can read the article here.

Final notes

2020 was a leap forward for the machine learning community. I hope these data science articles help you on your learning journey. Let us know your thoughts in the comments below.

Keep learning! And never stop writing!