Kaggle Competitions | List of Kaggle Problems

Introduction

Do I have the necessary skills to participate in Kaggle competitions?

Have you ever asked yourself this question? I certainly did when I was a sophomore, when I used to fear Kaggle just by imagining the level of difficulty it offers. This fear was similar to my fear of water, which kept me from taking swimming lessons. Later, however, I learned: “Until you step into the water, you can't tell how deep it is.” A similar philosophy applies to Kaggle. Don't draw conclusions until you try it!

Kaggle, the home of data science, provides a global platform for competitions, customer solutions, and a job board. Here's the Kaggle catch: these competitions don't just make you think outside the box, they also offer attractive cash prizes.

Nevertheless, people hesitate to participate in these contests. Some of the main reasons are listed below:

  1. They underestimate their own skill level, knowledge, and acquired techniques.
  2. Regardless of their skill level, they choose the problem that offers the highest prize money.
  3. They fail to match their skill level with the difficulty level of the problem.

I think this problem stems from Kaggle itself. Kaggle.com does not provide any information that helps people choose the problem best matched to their skill set. As a result, it has become a chore for beginners and intermediates to decide which problem is the right one to start with.

What will you learn in this article?

In this article, we break the deadlock of choosing the appropriate Kaggle problem for your skill set, tools, and techniques. We describe each Kaggle problem along with its level of difficulty and the level of skills needed to solve it.

In the last part, we lay out the right approach to tackling Kaggle problems for the following cases:

Case 1: I have coding experience, but I'm new to machine learning

Case 2: I've been in the analytics industry for over 2 years, but I'm not comfortable with R / Python

Case 3: I'm good with coding and machine learning, and I need something challenging to work on

Case 4: I am a newbie to both machine learning and coding, but I want to learn

List of Kaggle Problems

1. Titanic: Machine Learning from Disaster

Objective: A classic, popular problem to start your journey with machine learning. You are given a set of attributes of the passengers on board, and you need to predict who would have survived after the ship sank.

Difficulty level in each of the attributes:

a) Machine learning skills: Easy

b) Coding skills: Easy

c) Acquiring domain skills: Easy

d) Tutorials available: Very comprehensive

2. First Steps with Julia

Objective: This is a problem of identifying characters in Google Street View images using an up-and-coming tool, Julia.

Difficulty level in each of the attributes:

a) Machine learning skills: Easy

b) Coding skills: Medium

c) Acquiring domain skills: Easy

d) Tutorials available: Comprehensive

3. Digit Recognizer

Objective: You are given data with the pixels of handwritten digits, and you need to say conclusively which digit each one is. This is a classic problem for the latent Markov model.

Difficulty level in each of the attributes:

a) Machine learning skills: Medium

b) Coding skills: Medium

c) Acquiring domain skills: Easy

d) Tutorials available: Available, but without hand-holding

4. Bag of Words Meets Bags of Popcorn

Objective: You are given a set of movie reviews, and you need to find the sentiment hidden in each one. The purpose of this problem is to introduce you to the Google package Word2Vec.

It is a fantastic package that helps you convert words into vectors in a finite-dimensional space. This way, we can build analogies just by looking at the vectors. A very simple example is that your algorithm can produce analogies like: King - Man + Woman gives you Queen.
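To make the analogy arithmetic concrete, here is a minimal sketch in plain Python. It does not use Word2Vec itself: the three-dimensional vectors are hand-made, hypothetical values chosen for illustration, whereas real embeddings are learned from text and have many more dimensions. The helper names `cosine` and `analogy` are also assumptions of this sketch.

```python
import math

# Hypothetical toy embeddings, hand-made for illustration only.
# Loosely, the three dimensions stand for: royalty, maleness, femaleness.
EMBEDDINGS = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(positive, negative, embeddings=EMBEDDINGS):
    """Return the word closest to sum(positive) - sum(negative)."""
    dim = len(next(iter(embeddings.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, embeddings[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, embeddings[word])]
    # The query words themselves are excluded from the candidates.
    excluded = set(positive) | set(negative)
    candidates = [w for w in embeddings if w not in excluded]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# King - Man + Woman
print(analogy(positive=["king", "woman"], negative=["man"]))  # prints "queen"
```

With a trained model, the same idea applies, only the vectors come from the model instead of a hand-written dictionary.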

Difficulty level in each of the attributes:

a) Machine learning skills: Hard

b) Coding skills: Medium

c) Acquiring domain skills: Easy

d) Tutorials available: Available, but without hand-holding

5. Denoising Dirty Documents

Objective: You may be familiar with a technology known as OCR. It simply converts handwritten documents into digital documents. However, it is not perfect. Your job here is to use machine learning to make it perfect.

Difficulty level in each of the attributes:

a) Machine learning skills: Hard

b) Coding skills: Hard

c) Acquiring domain skills: Hard

d) Tutorials available: No

6. San Francisco Crime Classification

Objective: Predict the category of crimes that occurred in the city by the bay.

Difficulty level in each of the attributes:

a) Machine learning skills: Very hard

b) Coding skills: Very hard

c) Acquiring domain skills: Hard

d) Tutorials available: No

7. Taxi Trip Time / Trajectory Prediction

Objective: There are two problems based on the same dataset. You are given a taxi's trip data and are supposed to predict where the taxi is going or how long it will take to complete the journey.

Difficulty level in each of the attributes:

a) Machine learning skills: Easy

b) Coding skills: Hard

c) Acquiring domain skills: Medium

d) Tutorials available: Some reference code available

8. Facebook Recruiting: Human or Bot

Objective: If you want to practice understanding a new domain, you should solve this one. You are given bidding data and are expected to classify each bidder as a bot or a human. This has the richest data source of all the problems on Kaggle.

Difficulty level in each of the attributes:

a) Machine learning skills: Medium

b) Coding skills: Medium

c) Acquiring domain skills: Medium

d) Tutorials available: No support available, as it is a recruiting contest

Note: I have not covered Kaggle contests that offer prize money in this article, since they are all tied to a specific domain. Let me know your thoughts on them in the comment section below.

Now let's look at the right approach for people with different skill sets, at different stages, to start their Kaggle journey!

Case 1: I have coding experience, but I'm new to machine learning

Step 1: The first Kaggle problem you should take on is Taxi Trajectory Prediction. The reason is that the problem has a complex dataset, including a column in JSON format that holds the set of coordinates the taxi has visited. If you can break this down, getting an initial estimate of the destination or the trip time doesn't need machine learning. Hence, you can use your coding strength to prove your worth in this industry.
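As a rough illustration of breaking down such a JSON column, the sketch below parses a coordinate list per trip and derives two machine-learning-free quantities: the time elapsed so far and the last known position, which can serve as a naive first guess at the destination. The column names `TRIP_ID` and `POLYLINE`, the 15-second sampling interval, and the sample rows are assumptions made for this example, so check them against the actual competition data.

```python
import csv
import io
import json

# Hypothetical two-row extract; column names and values are illustrative.
SAMPLE_CSV = '''TRIP_ID,POLYLINE
T1,"[[-8.61, 41.14], [-8.62, 41.15], [-8.63, 41.15]]"
T2,"[]"
'''

# Assumed time between two consecutive GPS points, in seconds.
SECONDS_PER_POINT = 15


def summarize_trip(polyline_json):
    """Parse one JSON coordinate list and compute simple baseline features."""
    points = json.loads(polyline_json)
    if not points:
        # Some trips may carry no coordinates at all.
        return {"n_points": 0, "elapsed_seconds": 0, "last_point": None}
    return {
        "n_points": len(points),
        # n points span n - 1 sampling intervals.
        "elapsed_seconds": (len(points) - 1) * SECONDS_PER_POINT,
        # Naive destination guess: the last position seen so far.
        "last_point": tuple(points[-1]),
    }


def load_trips(csv_text):
    """Read the CSV text and summarize every trip, keyed by trip id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["TRIP_ID"]: summarize_trip(row["POLYLINE"]) for row in reader}


trips = load_trips(SAMPLE_CSV)
for trip_id, summary in trips.items():
    print(trip_id, summary)
```

A baseline like this gives you a score to beat before any model is trained.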

Step 2: Your next step should be to take on Titanic. The reason is that by now you will understand how to handle complex datasets, so it is the perfect time to try solving a pure machine learning problem. With the abundance of solutions / scripts available, you will be able to build a good solution.

Step 3: Now you are ready for something big. Try Facebook Recruiting. This will help you appreciate how understanding the domain can help you get the most out of machine learning.

Once you have all these pieces in place, you can try any problem on Kaggle.

Case 2: I've been in the analytics industry for over 2 years, but I'm not comfortable with R / Python

Step 1: You should start by taking a shot at Titanic. The reason is that you already know how to build a predictive algorithm, so you should now strive to learn languages like R and Python. With the large number of solutions / scripts available, you will be able to build different types of models in both R and Python. This problem will also help you understand some advanced machine learning algorithms.

Step 2: The next step should be Facebook Recruiting. The reason is that, given the simplicity of the data structure and the richness of the content, you will be able to join the right tables and build a predictive algorithm on top of them. This will also help you appreciate how understanding the domain can help you get the most out of machine learning.
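To give a feel for joining tables before modeling, here is a minimal sketch that aggregates a bids table per bidder and joins the result to a bidders table. The two miniature tables, their field names (`bidder_id`, `auction`, `is_bot`), and the chosen features are hypothetical and only meant to show the pattern, not the competition's actual schema.

```python
from collections import defaultdict

# Hypothetical miniature tables; field names are assumptions for illustration.
BIDDERS = [
    {"bidder_id": "b1", "is_bot": 0},
    {"bidder_id": "b2", "is_bot": 1},
    {"bidder_id": "b3", "is_bot": 0},  # a bidder with no recorded bids
]
BIDS = [
    {"bid_id": 1, "bidder_id": "b1", "auction": "a1"},
    {"bid_id": 2, "bidder_id": "b2", "auction": "a1"},
    {"bid_id": 3, "bidder_id": "b2", "auction": "a2"},
    {"bid_id": 4, "bidder_id": "b2", "auction": "a2"},
]


def build_features(bidders, bids):
    """Aggregate bids per bidder, then left-join onto the bidders table."""
    n_bids = defaultdict(int)
    auctions = defaultdict(set)
    for bid in bids:
        n_bids[bid["bidder_id"]] += 1
        auctions[bid["bidder_id"]].add(bid["auction"])

    rows = []
    for bidder in bidders:
        bidder_id = bidder["bidder_id"]
        rows.append({
            "bidder_id": bidder_id,
            # Bidders without bids keep a row, with zero counts.
            "n_bids": n_bids.get(bidder_id, 0),
            "n_auctions": len(auctions.get(bidder_id, ())),
            "is_bot": bidder["is_bot"],
        })
    return rows


features = build_features(BIDDERS, BIDS)
for row in features:
    print(row)
```

Each output row is one bidder with its label and aggregated features, which is the shape most predictive algorithms expect as input.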

Suggestions: You are now ready for something well outside your comfort zone. Read through problems like Diabetic Retinopathy Detection, Avito Context Ad Clicks, and San Francisco Crime Classification, and find the domain that interests you. Now try applying what you have learned so far.

Now is the time to try something more complex to code. Try Taxi Trajectory Prediction or Denoising Dirty Documents. Once you have all these pieces in place, you can try any problem on Kaggle.

Case 3: I'm good with coding and machine learning, and I need something challenging to work on

Step 1: You have many options on Kaggle. The first option is to master a new language like Julia. You can start with First Steps with Julia. The reason is that this will give you additional exposure to what Julia can do beyond Python or R.

Step 2: The second option is to develop skills in an additional domain. You can try Avito Context, Search Relevance, or Facebook Recruiting: Human or Bot.

Case 4: I am a newbie to both machine learning and coding, but I want to learn

Step 1: You should start your Kaggle journey with Titanic. The reason is that the first step for you is to learn languages like R and Python. With the large number of solutions / scripts available, you will be able to build different types of models in both R and Python. This problem will also help you understand some machine learning algorithms.

Step 2: Then you should take on Facebook Recruiting. The reason is that, given the simplicity of the data structure and the richness of the content, you will be able to join the right tables and build a predictive algorithm on top of them. This will also help you appreciate how understanding the domain can help you get the most out of machine learning.

Once you are done with these, you can take on problems based on your interests.

A few hacks to be a fair competitor on Kaggle

This is not a complete list of hacks, but it's meant to get you off to a good start. The full list deserves a new post on its own:

  1. Make sure to submit a solution (even the sample submission will do) before the entry deadline if you want to keep participating in the contest later.
  2. Understand the domain before moving on to the data. For instance, in Human or Bot, you need to understand how the online bidding platform works before you start your journey with the data.
  3. Create your own evaluation procedure that can mimic the Kaggle test score. A simple 10-fold cross-validation generally works fine.
  4. Try to extract as many features as possible from the training data; feature engineering is usually the part that pushes you from the top 40th percentile to the top 10th percentile.
  5. As a rule, a single model does not get you into the top 10. You need to build many models and ensemble them. These can be multiple models using different algorithms or different sets of variables.
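The 10-fold cross-validation mentioned in tip 3 can be sketched in plain Python as follows. This is a simplified illustration, not a replacement for a library implementation: the function names, the majority-class placeholder model, the synthetic labels, and the use of accuracy as the metric are all assumptions of this example. In a real contest, you would plug in your own model and the competition's own evaluation metric.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle the row indices and split them into k disjoint folds."""
    if n_rows < k:
        raise ValueError("need at least as many rows as folds")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(rows, labels, fit, predict, k=10, seed=0):
    """Average held-out accuracy over k folds."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held_out_set]
        # Fit only on the rows outside the held-out fold.
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Placeholder model: always predict the most common training label.
def fit_majority(train_rows, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, row):
    return model


# Synthetic data: 100 rows, 25 of them labeled 1.
rows = list(range(100))
labels = [1 if i % 4 == 0 else 0 for i in rows]

folds = k_fold_indices(len(rows))
score = cross_val_accuracy(rows, labels, fit_majority, predict_majority)
print(f"10-fold CV accuracy: {score:.2f}")
```

If your local score moves in the same direction as the leaderboard score when you change a model, you can iterate without spending submissions.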

Final notes

There are multiple benefits that I have come to realize after working on Kaggle problems. I learned R / Python on the go, and I think that is the best way to learn them. What's more, interacting with people in the discussion forums on various problems will help you gain deeper insight into machine learning and the domain.

In this article, we described various Kaggle problems and rated their essential attributes by level of difficulty. We also looked at several real-life cases and worked out the right approach to getting involved in Kaggle.

Have you taken part in any Kaggle problem? Did you see any significant benefit from doing so? Let us know your thoughts on this guide in the comment section below.
