Starting with Kaggle | The first look at Kaggle

This article was published as part of the Data Science Blogathon

Introduction

Every modern career needs a community: a group of people we can talk to about our work, our mistakes and our ideas, and learn from. Kaggle is the world's largest and most popular data science community. Having a community like this helps us feel that we "belong", which is crucial for our social interaction and our health.

In this article, we will look at Kaggle both as a complete community and as a platform: all the tools, services and resources it offers so that we can learn as well as practice data science.

Let's see the interface we get when we visit Kaggle for the first time.

[Screenshot: the Kaggle home page on a first visit]

Before you start using Kaggle, you need to create an account and then log in; you can see both options in the upper right corner. Once you are done with that, this is what it might look like.

[Screenshot: the Kaggle home page after logging in]

Some of the things visible here may be different for you, because the interface is customized according to how I have used Kaggle since I signed up.

The navigation bar shows everything we have at our disposal in Kaggle:

[Screenshot: the Kaggle navigation bar]

Once I click on 'more', these are all the things I can access from my Kaggle account.

[Screenshot: the options listed under 'more']

In my opinion, there are four important things that make Kaggle “THE BEST”.

1. Free courses and certificates available

There are many courses available across multiple domains of machine learning and data science. Besides the lessons themselves, each lesson is followed by a practice notebook (exercise) that helps you get acquainted with the topic. To get your free Kaggle certificate, you need to complete all the lessons and exercises.

[Screenshots: the courses available on Kaggle]

There are a few more courses, but my point here is that the topics are so diverse that, whenever you feel lost on an issue or problem, you don't have to go anywhere else: you can get help right here.

Let me show you what these courses look like with an example:

[Screenshots: an example Kaggle course]

At the end of each course there is a bonus lesson, which differs in content but is related to the course in its use case and the understanding it builds. These lessons mostly cover a well-known and/or powerful topic. Here we have AutoML (from Google) to automate machine learning.

2. A huge collection of public and community-contributed datasets to practice and work with

For any data science, machine learning or deep learning task we need data, and most of the time a lot of it. Instead of browsing different sites for datasets of different types and sizes, Kaggle provides a single place for a large collection of them. You can use them with one click, and they are extremely easy to work with.

[Screenshots: the Kaggle datasets page]

Once you click “Datasets” in the navigation bar, this is what you will see. You can search for a specific dataset, upload your own dataset to contribute it to the community, or explore and start working with one of the datasets shown on this page (trending datasets, popular datasets, recently viewed datasets).

For demonstration, I will search for a specific dataset (“sunspot dataset”). Let's see how it looks.

[Screenshot: search results for the sunspot dataset]

The number highlighted in red is the number of upvotes the dataset has received, that is, how many people marked it as relevant or liked it. Let's explore this dataset in detail.

The dataset page offers several ways to find out more about the data and start working right away:

  • Download the dataset.
  • Create a new Kaggle Notebook with this dataset already loaded.
  • Read details about the columns within the data.
  • See the activity involving this data.
  • Last but not least, browse all the notebooks created and shared publicly to date that use this data.

3. Data science / machine learning / deep learning competitions

Although I have not participated in any of them yet, I love how we can work on a problem in real time together with the Kaggle community and win amazing cash prizes (if we take part in that particular competition). I definitely want to participate sometime soon, and I hope the images motivate you too. Hosting a competition is not reserved for large or wealthy companies, either. You can do that too: there are certain protocols that must be followed and voila, you have your own competition hosted.

[Screenshots: Kaggle competitions]

I have sorted the competitions completed to date by their prize value. Look closely.

4. Kaggle Notebooks (code)

For any task related to data science or computer science, we have to write at least some code. Kaggle provides its own Notebook environment, with certain limits on how much we can store in notebooks (collectively, per account) and on how many GPU hours and TPU hours are available. Notebooks are fully integrated with all Kaggle services and can also be used independently, like any other notebook environment (Datalore, Google Colab, Jupyter, etc.). This means you can use them for your own practice, Kaggle competitions, Kaggle courses, analyzing Kaggle or non-Kaggle datasets, and much more. You should definitely check them out.

[Screenshot: the Kaggle notebooks page]

Click the black button to create your own notebook, or open someone else's notebook that you want to read, learn from or compare with. All the notebooks visible here were explicitly shared publicly, which means that your notebooks will not be visible to anyone unless you choose to share them.

To switch from CPU to GPU or TPU, follow this:

[Screenshot: switching the notebook accelerator from CPU to GPU or TPU]
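After switching the accelerator, it can be reassuring to confirm from inside the notebook that a GPU is actually visible. The snippet below is only a minimal sketch: it assumes the session exposes the NVIDIA `nvidia-smi` tool when a GPU is attached, the helper name `gpu_is_visible` is made up for this example, and it does not detect TPUs.

```python
import shutil
import subprocess


def gpu_is_visible() -> bool:
    """Return True if `nvidia-smi` is installed and lists at least one GPU."""
    # Assumption: a GPU session ships the NVIDIA driver tools on the PATH.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ...".
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_is_visible())
```

On a CPU-only session this should simply report that no GPU is visible.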

These are most of the functional options that you have at your disposal in this notebook:

[Screenshots: the options available in a notebook]

Let's see how to use notebooks with data (uploaded, taken directly from Kaggle, downloaded from a URL, etc.) and get started on your data science tasks.

[Screenshots: adding data to a notebook]

Here I will show you how to use the “Sunspots” dataset that we saw before. Start by searching for it.

[Screenshots: searching for and adding the Sunspots dataset]

Now the data is loaded correctly. The highlighted selection in the screenshot above shows the directory in which it is stored. Let's see a little pandas code for importing the dataset.

[Screenshot: pandas code that imports the dataset]
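Since the original code survives only as a screenshot, here is a rough sketch of what such an import can look like. It assumes that datasets attached to a Kaggle Notebook are mounted under `/kaggle/input`; the helper names `find_csv_files` and `load_first_csv` are invented for this illustration, and the actual folder and file names depend on the dataset you added.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd

# Assumption: attached datasets appear below this directory in a Kaggle Notebook.
KAGGLE_INPUT_DIR = Path("/kaggle/input")


def find_csv_files(root: Path) -> list[Path]:
    """Return every CSV file below `root`, sorted for a stable order."""
    return sorted(root.rglob("*.csv"))


def load_first_csv(root: Path = KAGGLE_INPUT_DIR) -> pd.DataFrame:
    """Load the first CSV file found below `root` into a DataFrame."""
    csv_files = find_csv_files(root)
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found under {root}")
    return pd.read_csv(csv_files[0])


if __name__ == "__main__":
    # Only runs where the input directory exists, e.g. inside a Kaggle Notebook.
    if KAGGLE_INPUT_DIR.exists():
        df = load_first_csv()
        print(df.head())
```

If you already know the exact path shown in the data panel, calling `pd.read_csv` on that path directly is simpler; the search above is just a convenience for when you are unsure of the file name.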

The last thing you can do after completing your project or task is to share it with the community on Kaggle. This is an important step, because by sharing our ideas and our work we expand the resources available to the community and support each other. We grow thanks to others.

To the left of the big blue button in the upper right, you will see a “Share” button. Click on that and select Public from the drop-down menu.

[Screenshot: the Share button and its drop-down menu]

I hope you liked what you saw in this guide and are eager to start using Kaggle.

Gargeya Sharma

B.Tech Computer Science, 3rd year
Specialized in data science and deep learning
Data Scientist Intern at Upswing Cognitive Hospitality Solutions
For more information, check my GitHub home page

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.