Spam detection: a deep learning app

This post was made public as part of the Data Science Blogathon

Every big tech company wants its customers to be safe and secure. By detecting spam in emails and messages, these companies protect their networks and build customer trust. Apple's official messaging app and Google's Gmail are prime examples of applications where spam detection and filtering work well to protect users. If you want to build a spam detection system, this post is for you.

What is spam?

Electronic messaging is a crucial means of communication for many people around the world. However, various people and companies misuse it to deliver unsolicited bulk messages, commonly referred to as spam SMS. Spam SMS can include advertisements for drugs, software, adult content, insurance, or other fraudulent offers. Spam filters provide a protection mechanism: a system designed to recognize spam.

Spam detection

After you submit personal data such as your mobile phone number or email address on a platform, that platform may start advertising its products by constantly pinging you. It sends a steady stream of emails and, using your contact details, keeps sending you messages, increasingly over WhatsApp these days. The result is a flood of spam alerts and notifications in your inbox. This is where spam detection comes in.

Spam detection means identifying spam messages or emails by analyzing the content of the text, so that you are notified only about the messages or emails that matter to you. Messages identified as spam are automatically moved to a spam folder, and you are never notified about them. This improves the user experience, since a stream of spam alerts annoys many users.

What is spam filtering?

Can you guess when you become a target for hackers? If you thought of spam, you are on the right track. Whenever spam reaches your email or message inbox, you are a potential target. When it comes to technology, humans tend to be the weakest link in most IT security situations. Attackers constantly try to trick users into clicking things they shouldn't, using a range of methods. These tricks are often carried out through email, since email can reach a large number of people and is very cheap to send. Clicking a malicious link in a spam email can expose your important personal data to hackers. Spam filtering matters because email is so widely used to exploit users and their most valuable data. Institutions should use a spam filter to reduce the risk of users clicking something they shouldn't, which in turn keeps internal data protected from cyber attacks.

Why is it important?

Implementing spam filtering is important for every institution. Its main role is to keep junk out of email inboxes. You can think of a spam filter as a friend who quietly manages your inbox, showing you only safe and wanted emails. Spam filtering also works as an anti-malware tool, because a common trick of hackers is to send malicious attachments by mail or to request your credentials. Another aspect that should not be neglected is the elimination of graymail. Graymail is email that a user previously opted in to receive but does not really want or need in the inbox. Graymail is not considered spam, since these emails are not usually used to infiltrate an organization. What counts as graymail is determined by user actions over time, and spam filtering platforms learn from those actions to work out what is or is not wanted in an inbox.

So far, you have learned what spam detection is and why it matters. I hope that was clear. Now it is time for the implementation. In this part, we train a machine learning model to detect spam messages with the help of Python. I'll start by importing the required Python libraries; the dataset needed for this task is spam.csv.

Step 1: Import dependencies

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
nltk.download('stopwords')
import re
import sklearn
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

Step 2: Get the SMS dataset

sms = pd.read_csv('Spam SMS Collection', sep='\t', names=['label','message'])
sms.head()
sms.drop_duplicates(inplace=True)
sms.reset_index(drop=True, inplace=True)
plt.figure(figsize=(8,5))
sns.countplot(x='label', data=sms)
plt.xlabel('SMS Classification')
plt.ylabel('Count')
plt.show()
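
The `read_csv` call in this step assumes a tab-separated file with one label and one message per line. As a minimal standard-library sketch of that layout, the made-up three-line sample below (the rows and variable names are illustrative, not taken from the dataset) shows what duplicate removal and the per-label count amount to:

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the assumed "label<TAB>message" layout.
raw = (
    "ham\tAre we still meeting at five?\n"
    "spam\tWIN a free prize now! Reply YES\n"
    "ham\tAre we still meeting at five?\n"
)

# Parse each line into [label, message] using the tab as the delimiter.
rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))

# Drop exact duplicate rows while keeping first-seen order.
unique_rows = list(dict.fromkeys(tuple(row) for row in rows))

# Count messages per label (the numbers a count plot would display).
label_counts = Counter(label for label, _ in unique_rows)
print(label_counts)
```

If the two counts differ widely, it is worth keeping that imbalance in mind when reading accuracy figures later on.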

Step 3: Message cleaning

corpus = []
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))  # Build the stop-word set once, not per message
for i in range(0, sms.shape[0]):
    # Replace every character that is not a letter or a space with a space
    message = re.sub(pattern='[^ a-zA-Z]', repl=" ", string=sms.message[i])
    message = message.lower()  # Convert the entire message to lower case
    words = message.split()  # Tokenize the message into words
    words = [word for word in words if word not in stop_words]  # Remove the stop words
    words = [ps.stem(word) for word in words]  # Stem the words
    message = " ".join(words)  # Join the stemmed words
    corpus.append(message)  # Build a corpus of messages
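
To see what the cleaning step does to a single message, here is a small standard-library sketch. It is a simplification under stated assumptions: the stop-word set is a tiny hand-picked one rather than NLTK's English list, stemming is left out, and `clean_message` is a hypothetical helper name.

```python
import re

# Tiny hand-picked stop-word set for illustration only;
# the post itself uses NLTK's English stop-word list.
STOP_WORDS = {"you", "have", "a", "to", "the", "now"}


def clean_message(text):
    """Keep letters only, lower-case, split on whitespace, drop stop words."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    tokens = letters_only.lower().split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


cleaned = clean_message("You have WON a $1000 prize!!! Call 0800-123 now")
print(cleaned)  # ['won', 'prize', 'call']
```

Digits and punctuation disappear entirely, which is why the phone number and the amount leave no trace in the cleaned tokens.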

Step 4: Create the Bag of Words model

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=2500)
X = cv.fit_transform(corpus).toarray()
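
A bag-of-words matrix has one row per message and one column per vocabulary word, with each cell holding how often that word occurs in that message. The sketch below rebuilds the idea with the standard library on a made-up three-message corpus. It illustrates the representation only and does not reimplement `CountVectorizer`, which additionally applies its own tokenization rules and, with `max_features`, keeps only the most frequent terms.

```python
from collections import Counter

# Hypothetical, already-cleaned messages.
corpus = ["free prize call now", "call me later", "free free entry"]

# Vocabulary: every distinct word, sorted so the column order is deterministic.
vocabulary = sorted({word for doc in corpus for word in doc.split()})


def to_count_vector(doc):
    """Return the word counts of `doc` in vocabulary order."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]


matrix = [to_count_vector(doc) for doc in corpus]

print(vocabulary)
for row in matrix:
    print(row)
```

The third row contains a 2 in the `free` column because that word appears twice in the third message; word order within a message is discarded.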

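Step 5: Extract the dependent variable from the dataset

y = pd.get_dummies(sms['label'])  # One indicator column per label, in sorted label order
y = y.iloc[:, 1].values  # Second column: the spam indicator, assuming the labels are 'ham' and 'spam'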

Step 6: train_test_split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
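
Conceptually, the split shuffles the rows and holds 20% of them back for testing. The following is a rough standard-library sketch of that idea only. `simple_split` is a hypothetical helper, and its shuffle will not reproduce the exact rows that scikit-learn selects for a given `random_state`.

```python
import random


def simple_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # Seeded, so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = simple_split(range(10))
print(len(train), len(test))  # 8 2
```

Fixing the seed plays the same role as `random_state=0` in the call above: rerunning the script gives the same split, so accuracy figures stay comparable between runs.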

Step 7: Checking the naive Bayes classifier alpha

best_accuracy = 0.0
alpha_val = 0.0
for i in np.arange(0.0,1.1,0.1):
    temp_classifier = MultinomialNB(alpha=i)
    temp_classifier.fit(X_train, y_train)
    temp_y_pred = temp_classifier.predict(X_test)
    score = accuracy_score(y_test, temp_y_pred)
    print("Accuracy score for alpha={} is: {}%".format(round(i,1), round(score*100,2)))
    if score>best_accuracy:
        best_accuracy = score
        alpha_val = i
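print('--------------------------------------------')
print('The best accuracy is {}% with alpha value as {}'.format(round(best_accuracy*100, 2), round(alpha_val,1)))
# Assumption: the post never defines `classifier`, which the prediction function calls;
# train it here on the training split with the best alpha found above
classifier = MultinomialNB(alpha=alpha_val)
classifier.fit(X_train, y_train)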
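
The `alpha` being searched here is the additive (Laplace/Lidstone) smoothing parameter of multinomial naive Bayes. It keeps a word that never appeared in a class from receiving a probability of exactly zero. A minimal sketch of the smoothed estimate, with made-up counts and a hypothetical helper name:

```python
def smoothed_probability(word_count, total_count, vocab_size, alpha):
    """Additive-smoothing estimate of P(word | class)."""
    return (word_count + alpha) / (total_count + alpha * vocab_size)


# Hypothetical counts: a word never seen in spam, 100 spam tokens in total,
# and a vocabulary of 50 distinct words.
for alpha in (0.1, 0.5, 1.0):
    p = smoothed_probability(0, 100, 50, alpha)
    print(f"alpha={alpha}: P(word | spam) = {p:.4f}")
```

Larger values of `alpha` push the estimates toward a uniform distribution over the vocabulary, while values near zero trust the raw counts almost completely. That trade-off is what the loop in this step tunes.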

Step 8: Prediction

def predict_spam(sample_message):
    # Apply the same cleaning steps used to build the training corpus
    sample_message = re.sub(pattern='[^ a-zA-Z]', repl=" ", string=sample_message)
    sample_message = sample_message.lower()
    sample_message_words = sample_message.split()
    stop_words = set(stopwords.words('english'))
    sample_message_words = [word for word in sample_message_words if word not in stop_words]
    ps = PorterStemmer()
    final_message = [ps.stem(word) for word in sample_message_words]
    final_message = " ".join(final_message)
    # Vectorize with the fitted CountVectorizer and classify
    temp = cv.transform([final_message]).toarray()
    return classifier.predict(temp)[0]  # Single prediction for the single message
result = ['Wait a minute, this is a SPAM!','Ohhh, this is a normal message.']
msg = "Hi! You are pre-qualified for Premium SBI Credit Card. Also get Rs.500 worth Amazon Gift Card*, 10X Rewards Point* & more. Click "
if predict_spam(msg):
    print(result[0])
else:
    print(result[1])

Output

Wait a minute, this is a SPAM!

msg = "[Update] Congratulations Shivani, Your account is activated for investment in Stocks. Click to invest now: "
if predict_spam(msg):
    print(result[0])
else:
    print(result[1])

Output

Wait a minute, this is a SPAM!

msg = "Your Stockbroker FALANA BROKING LIMITED reported your fund balance Rs.1500.5 & securities balance 0.0 as of the end of MAY-20. Balances do not cover your bank, DP & PMS balance with the broking entity. Check details at [email protected] If the email Id is not correct, kindly update with your broker."
if predict_spam(msg):
    print(result[0])
else:
    print(result[1])

Output

Ohhh, this is a normal message.
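
Three spot checks are anecdotal, and accuracy can look good even when many spam messages slip through. A hedged sketch of how precision and recall for the spam class could be computed from held-out labels follows. The label lists are invented for illustration and `precision_recall` is a hypothetical helper, not part of the original post.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented labels: 1 = spam, 0 = normal message.
y_true_demo = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred_demo = [1, 0, 1, 0, 0, 1, 0, 0]

precision, recall = precision_recall(y_true_demo, y_pred_demo)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of flagged messages that really were spam, and recall is the share of spam that was caught. Both matter for a filter, since a false alarm hides a wanted message and a miss lets spam through.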

Summary

This is how you can train a machine learning model (here, a multinomial naive Bayes classifier) to detect whether an email or a message is spam. A spam detector identifies spam messages or emails by analyzing the content of the text, so that you are notified only about the messages or emails that matter to you. I hope this post helps you get started with spam detection. In the current climate, we cannot afford to give up our security so easily. Let's start a campaign together with Analytics Vidhya to reduce cybercrime. Feel free to ask your questions in the comment section below.

The media shown in this post is not the property of DataPeaker and is used at the author's discretion.
