This post was made public as part of the Data Science Blogathon
Every big tech company wants to keep its customers safe and secure. By detecting spam in emails and messages, these companies protect their networks and strengthen customer trust. Apple's official messaging app and Google's Gmail are prime examples of applications where spam detection and filtering work well to protect users. If you are looking to build a spam detection system, this post is for you.
What is spam?
Electronic messages are a crucial means of communication for people around the world. However, some people and companies misuse this channel to deliver unsolicited bulk messages, commonly referred to as spam. Spam SMS can include advertisements for drugs, software, adult content, insurance, or other fraudulent offers. Spam filters provide a protection mechanism: a system designed to recognize spam.
Spam detection
After you submit personal data such as a mobile phone number or email address on a platform, companies often start advertising their products by pinging you constantly. They send a steady stream of emails, use your contact details to keep messaging you, and nowadays increasingly reach out over WhatsApp. The result is a flood of spam alerts and notifications in your inbox. This is where the task of spam detection comes in.
Spam detection means identifying spam messages or emails by analyzing the content of the text, so that you are only notified about the messages or emails that matter to you. Messages identified as spam are automatically moved to a spam folder, and you are never notified about them. This improves the user experience, since a constant stream of spam alerts annoys many users.
What is spam filtering?
Can you guess when you become a target for hackers? If you are thinking of spam, you are on the right track. Whenever spam reaches your email or message inbox, you are a potential target. When it comes to technology, humans tend to be the weakest link in most IT security situations. Attackers constantly try to trick users into clicking things they shouldn't, using a range of methods. Often these tricks are delivered by email, since email can reach a large number of people and is very cheap to send. By clicking a malicious link in a spam email, you can expose your important personal data to hackers. Spam filtering matters because email is so widely used to exploit users and their most valuable data. Institutions should use a spam filter to reduce the risk of users clicking something they shouldn't, which in turn keeps internal data protected from cyber attacks.
Why is it important?
Spam filtering is highly relevant to every institution. Its main role is to keep junk out of email mailboxes. You can think of a spam filter as a friend who quietly manages your inbox by showing only safe and wanted emails. Spam filtering also serves as an anti-malware tool, because a common trick of hackers is to send malicious attachments by email or to request your credentials. Another aspect that should not be neglected is the elimination of graymail. Graymail is email that a user previously opted in to receive but does not really want or need in the inbox. Graymail is not considered spam, since these emails are not usually used to infiltrate an organization. What counts as graymail is decided by user actions over time, and spam filtering platforms learn from those actions what is or is not wanted in an inbox.
So far, you have learned what spam detection is and why it matters. I hope that was clear. Now it is time for the implementation. In this part, we train a machine learning model to detect spam messages with the help of Python. I'll start by importing the required Python libraries; the dataset needed for this task is spam.csv.
Step 1: Import dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
nltk.download('stopwords')
import re
import sklearn
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
Step 2: Get the SMS dataset
# Tab-separated file without a header row: label first, then the message text
sms = pd.read_csv('Spam SMS Collection', sep='\t', names=['label', 'message'])
sms.head()
sms.drop_duplicates(inplace=True)
sms.reset_index(drop=True, inplace=True)

plt.figure(figsize=(8, 5))
sns.countplot(x='label', data=sms)
plt.xlabel('SMS Classification')
plt.ylabel('Count')
plt.show()
Step 3: Message cleaning
corpus = []
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))  # Build the stop-word set once, outside the loop

for i in range(0, sms.shape[0]):
    # Cleaning special characters from the message
    message = re.sub(pattern='[^ a-zA-Z]', repl=' ', string=sms.message[i])
    # Converting the entire message into lower case
    message = message.lower()
    # Tokenizing the message by words
    words = message.split()
    # Removing the stop words
    words = [word for word in words if word not in stop_words]
    # Stemming the words
    words = [ps.stem(word) for word in words]
    # Joining the stemmed words
    message = ' '.join(words)
    # Building a corpus of messages
    corpus.append(message)
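To make the cleaning step concrete, the following sketch traces one message through the same stages using only the standard library. The tiny `STOP_WORDS` set, the `clean_message` helper, and the sample message are hypothetical stand-ins that do not appear in the original code, and stemming is left out, so the result differs from what NLTK's stop-word list and `PorterStemmer` would produce.

```python
import re

# Hypothetical, hand-picked stop words for illustration only;
# the tutorial itself uses NLTK's much larger English list.
STOP_WORDS = {"you", "a", "to", "the", "have", "now"}


def clean_message(text):
    """Keep letters only, lowercase, split on whitespace, drop stop words."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    words = letters_only.lower().split()
    return " ".join(word for word in words if word not in STOP_WORDS)


sample = "WINNER!! You have won a $1000 prize. Call 09061701461 now"
print(clean_message(sample))  # winner won prize call
```

Digits and punctuation disappear in the first stage, which is why the phone number and the prize amount leave no trace in the cleaned text.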
Step 4: Creating the Bag of Words model
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(max_features=2500)
X = cv.fit_transform(corpus).toarray()
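As a rough picture of what a bag-of-words matrix looks like, the sketch below builds one by hand for a two-message toy corpus. It is a simplified stand-in, not scikit-learn's implementation: the helper name `bag_of_words` and the sample messages are made up for illustration, and it splits on whitespace only, whereas `CountVectorizer` applies its own tokenization rules and, with `max_features`, keeps only the most frequent terms.

```python
from collections import Counter


def bag_of_words(corpus):
    """Return (vocabulary, matrix) with one row of word counts per message."""
    # Vocabulary: every distinct word in the corpus, sorted alphabetically
    vocabulary = sorted({word for message in corpus for word in message.split()})
    matrix = []
    for message in corpus:
        counts = Counter(message.split())
        # A Counter returns 0 for words that do not occur in this message
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Toy corpus of already-cleaned messages (hypothetical)
vocab, X_toy = bag_of_words(["free prize call free", "call later"])
print(vocab)  # ['call', 'free', 'later', 'prize']
print(X_toy)  # [[1, 2, 0, 1], [1, 0, 1, 0]]
```

Each row is one message and each column one vocabulary word, so word order within a message is discarded and only the counts remain.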
Step 5: Extract the dependent variable from the dataset
y = pd.get_dummies(sms['label'])
y = y.iloc[:, 1].values
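`pd.get_dummies` creates one indicator column per distinct label value, in sorted order, so selecting column 1 keeps the second label. Assuming the file uses the labels `ham` and `spam` (the source does not show them), that column marks the spam messages. A plain-Python sketch of the same encoding, using made-up labels:

```python
labels = ["ham", "spam", "ham", "ham", "spam"]  # made-up sample labels

# Distinct label values in sorted order, mirroring the indicator columns
columns = sorted(set(labels))
target = columns[1]  # the label behind column 1

# 1 where the message carries that label, 0 otherwise
y_toy = [1 if label == target else 0 for label in labels]
print(target, y_toy)  # spam [0, 1, 0, 0, 1]
```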
Step 6: train_test_split
from sklearn.model_selection import train_test_split

# Hold out 20% of the messages for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
Step 7: Tuning the naive Bayes classifier's alpha
best_accuracy = 0.0
alpha_val = 0.0
for i in np.arange(0.0, 1.1, 0.1):
    temp_classifier = MultinomialNB(alpha=i)
    temp_classifier.fit(X_train, y_train)
    temp_y_pred = temp_classifier.predict(X_test)
    score = accuracy_score(y_test, temp_y_pred)
    print("Accuracy score for alpha={} is: {}%".format(round(i, 1), round(score * 100, 2)))
    if score > best_accuracy:
        best_accuracy = score
        alpha_val = i
print('--------------------------------------------')
print('The best accuracy is {}% with alpha value as {}'.format(round(best_accuracy * 100, 2), round(alpha_val, 1)))

# Fit the final classifier with the best alpha; the prediction step uses it
classifier = MultinomialNB(alpha=alpha_val)
classifier.fit(X_train, y_train)
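The `alpha` being tuned is the additive (Laplace/Lidstone) smoothing term of multinomial naive Bayes: the likelihood of a word within a class is estimated as (count + alpha) / (class total + alpha × vocabulary size). The sketch below evaluates that formula on made-up counts to show how a word never seen in a class gets zero probability at alpha = 0 and a small non-zero probability otherwise. The function name and all numbers are illustrative assumptions.

```python
def smoothed_likelihood(word_count, class_total, vocab_size, alpha):
    """Estimate P(word | class) with additive smoothing."""
    return (word_count + alpha) / (class_total + alpha * vocab_size)


# Made-up numbers: a word never seen among 100 spam tokens, vocabulary of 50 words
for alpha in (0.0, 0.1, 1.0):
    p = smoothed_likelihood(word_count=0, class_total=100, vocab_size=50, alpha=alpha)
    print(f"alpha={alpha}: P(word | spam) = {p:.5f}")
```

Without smoothing, a single unseen word would zero out the whole product of likelihoods for that class. This also makes alpha = 0.0 a corner case in the search above: depending on the scikit-learn version, it may emit a warning.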
Step 8: Prediction
def predict_spam(sample_message):
    # Apply the same cleaning steps that were used to build the training corpus
    sample_message = re.sub(pattern='[^ a-zA-Z]', repl=' ', string=sample_message)
    sample_message = sample_message.lower()
    sample_message_words = sample_message.split()
    stop_words = set(stopwords.words('english'))
    sample_message_words = [word for word in sample_message_words if word not in stop_words]
    ps = PorterStemmer()
    final_message = [ps.stem(word) for word in sample_message_words]
    final_message = ' '.join(final_message)
    # Vectorize with the fitted CountVectorizer, then classify
    temp = cv.transform([final_message]).toarray()
    return classifier.predict(temp)
result = ['Wait a minute, this is a SPAM!','Ohhh, this is a normal message.']
msg = "Hi! You are pre-qualified for Premium SBI Credit Card. Also get Rs.500 worth Amazon Gift Card*, 10X Rewards Point* & more. Click "
if predict_spam(msg):
    print(result[0])
else:
    print(result[1])
Output
Wait a minute, this is a SPAM!
msg = "[Update] Congratulations Shivani, Your account is activated for investment in Stocks. Click to invest now: "
if predict_spam(msg):
    print(result[0])
else:
    print(result[1])
Output
Wait a minute, this is a SPAM!
msg = "Your Stockbroker FALANA BROKING LIMITED reported your fund balance Rs.1500.5 & securities balance 0.0 as of the end of MAY-20. Balances do not cover your bank, DP & PMS balance with the broking entity. Check details at [email protected] If the email Id is not correct, kindly update with your broker."
if predict_spam(msg):
    print(result[0])
else:
    print(result[1])
Output
Ohhh, this is a normal message.
Summary
This is how you can train a machine learning model to detect whether an email or a message is spam. A spam detector identifies spam messages or emails by analyzing the content of the text, so that you are only notified about the messages or emails that matter to you. I hope this post helps you get started with spam detection. In the current climate, we can't afford to lose our security so easily. Let's start a campaign together with AnalyticsVidya to reduce cybercrime. Feel free to ask your questions in the comment section below.
The media shown in this post is not the property of DataPeaker and is used at the author's discretion.