This is an emerging research area that aims to test social science theories using computation, most notably with Big Data and machine learning techniques.
While checking the web analytics logs of my website, I discovered a large number of brute-force attack attempts originating from Brazil and France. Until today, I had only been using Akismet Anti-Spam to prevent spam comments, but it wasn’t enough. I then solved the problem by adding new security plugins that are already used by 100K+ WordPress users. To be honest, I am really surprised that I only ran into this problem now. Here is a list of the active security plugins on my site. You may consider using them.
1. Akismet Anti-Spam: Used by millions, Akismet is quite possibly the best way in the world to protect your blog from spam. Your site is fully configured and being protected, even while you sleep.
2. Anti-Spam by CleanTalk: Max power, all-in-one, no Captcha, premium anti-spam plugin. No comment spam, no registration spam, no contact spam, protects any WordPress forms.
3. Anti-Malware Security and Brute-Force Firewall: This Anti-Virus/Anti-Malware plugin searches for Malware and other Virus like threats and vulnerabilities on your server and helps you remove them. It’s always growing and changing to adapt to new threats so let me know if it’s not working for you.
4. Protection against DDoS
5. Stop User Enumeration: User enumeration is a technique used by hackers to get your login name if you are using permalinks. This plugin stops that.
6. WP Security Optimizer: Protects your site from vulnerability scanners and hackers
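Before installing anything, you can also spot brute-force patterns in your raw access logs yourself. The sketch below counts POST requests to wp-login.php per client IP; the log lines and the threshold are hypothetical examples, not my actual log data.

```python
import re
from collections import Counter

# Hypothetical access-log lines in combined format (illustrative only).
LOG_LINES = [
    '177.10.0.1 - - [01/Mar/2018:10:00:01 +0000] "POST /wp-login.php HTTP/1.1" 200 512',
    '177.10.0.1 - - [01/Mar/2018:10:00:02 +0000] "POST /wp-login.php HTTP/1.1" 200 512',
    '177.10.0.1 - - [01/Mar/2018:10:00:03 +0000] "POST /wp-login.php HTTP/1.1" 200 512',
    '93.184.0.7 - - [01/Mar/2018:10:01:00 +0000] "GET /index.php HTTP/1.1" 200 1024',
]

# Capture the client IP of every POST to the WordPress login page.
PATTERN = re.compile(r'^(\S+).*"POST /wp-login\.php')

def suspicious_ips(lines, threshold=3):
    """Return IPs that sent at least `threshold` login POSTs."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}

print(suspicious_ips(LOG_LINES))  # {'177.10.0.1': 3}
```

In a real setup you would feed this the full log file and block or rate-limit the offending IPs at the firewall level.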
When you discover that your time series follow a similar trend, you may want to measure how strongly they are correlated. In that case, the Pearson correlation coefficient is one of the most widely used metrics across the sciences. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. Pearson’s correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. (Source: Wikipedia) If you are interested in learning more, this paper may be relevant to you. If you just want to calculate the Pearson value, the Scipy library provides a function named “pearsonr”. Alternatively, the numpy library has a function named “corrcoef”. Here is my example:
(1) It can easily be seen that the two plots follow a similar trend, even though the scales of their y values are different.
(2) Using the pearsonr function of the Scipy library, we calculate the Pearson correlation coefficient. Here is the Python code.
from scipy.stats import pearsonr
import numpy as np
import sys

if __name__ == "__main__":
    list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90,
             81, 60, 72, 93, 80, 97, 65, 74, 71]
    list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108,
             226406, 399833, 253989, 233108, 301069, 257548, 206579,
             255322, 268418, 279106, 304694, 216643, 236923, 254406]
    if len(list1) != len(list2):
        print("error, the two series should contain the same number of elements")
        sys.exit(1)  # note: a bare "sys.exit" without parentheses would do nothing
    # scipy library
    print("scipy result: ", pearsonr(list1, list2))
    # numpy library
    print("numpy result: ", str(np.corrcoef(list1, list2)))
(3) As a result, we see that the two series are highly correlated, with a Pearson coefficient of 0.94. (Pearson’s r takes values between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.)
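The Wikipedia definition quoted above (covariance divided by the product of the standard deviations) can be checked by hand against scipy on a small toy series; the numbers below are illustrative, not the traffic data from my plots.

```python
import numpy as np
from scipy.stats import pearsonr

# Toy data (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 10.0])

# Pearson's r = cov(x, y) / (std(x) * std(y)).
# Using population statistics (ddof=0) on both covariance and standard
# deviations keeps the normalisation factors consistent, so they cancel.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_manual = cov_xy / (x.std() * y.std())

r_scipy, _ = pearsonr(x, y)
print(round(r_manual, 6) == round(r_scipy, 6))  # True
```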
(4) Alternatively, you may want to see how that value is affected when we change a single value in the series. To find out, change the last element of list1 from 71 to 710.
(5) You will observe that the Pearson score decreases significantly, from 0.94 to 0.21.
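Steps (4) and (5) can be reproduced end to end; this sketch reuses the two series from the example above and only swaps the last element of list1 for the outlier.

```python
from scipy.stats import pearsonr

list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90,
         81, 60, 72, 93, 80, 97, 65, 74, 71]
list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108,
         226406, 399833, 253989, 233108, 301069, 257548, 206579,
         255322, 268418, 279106, 304694, 216643, 236923, 254406]

r_original, _ = pearsonr(list1, list2)

# Replace the last element of list1 with an outlier (71 -> 710)
# and recompute: a single extreme value drags r down sharply.
outlier = list1[:-1] + [710]
r_outlier, _ = pearsonr(outlier, list2)

print(round(r_original, 2), round(r_outlier, 2))
```

This sensitivity to outliers is worth keeping in mind: Pearson’s r measures linear association, so one extreme point can dominate the result.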
Topic discovery is an important research area, and one of the most important algorithms in this field is Latent Dirichlet Allocation (LDA), an unsupervised, statistics-based learning algorithm invented by Columbia University Professor David M. Blei. In his research paper published in 2003 (click to view the paper), co-authored with Andrew Ng and Michael I. Jordan, he gives the details of the algorithm. Since then, many variations of the algorithm have been invented for different needs, but I mostly focused on LDA algorithms capable of discovering topics in short texts, such as Tweets. I will share the prominent extensions of LDA in this post.
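To make the mechanics concrete, here is a minimal collapsed Gibbs sampler for vanilla LDA on a toy corpus. The corpus, topic count, and hyperparameters are all illustrative; a real short-text study would use a dedicated library or one of the short-text extensions mentioned above.

```python
import numpy as np

# Toy corpus (illustrative only): two obvious themes, finance and football.
docs = [
    "stock market trading price".split(),
    "market price stock economy".split(),
    "football match goal player".split(),
    "player goal football team".split(),
]
vocab = sorted({w for d in docs for w in d})
word_id = {w: i for i, w in enumerate(vocab)}

K, V = 2, len(vocab)           # number of topics, vocabulary size
alpha, beta = 0.1, 0.01        # Dirichlet hyperparameters (assumed values)

rng = np.random.default_rng(0)
ndk = np.zeros((len(docs), K))  # doc-topic counts
nkw = np.zeros((K, V))          # topic-word counts
nk = np.zeros(K)                # total words assigned to each topic
z = []                          # current topic assignment per word

# Random initialisation.
for d, doc in enumerate(docs):
    zd = []
    for w in doc:
        t = rng.integers(K)
        zd.append(t)
        ndk[d, t] += 1
        nkw[t, word_id[w]] += 1
        nk[t] += 1
    z.append(zd)

# Collapsed Gibbs sweeps: resample each word's topic from its
# full conditional given all other assignments.
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t, v = z[d][i], word_id[w]
            ndk[d, t] -= 1; nkw[t, v] -= 1; nk[t] -= 1
            p = (ndk[d] + alpha) * (nkw[:, v] + beta) / (nk + V * beta)
            t = rng.choice(K, p=p / p.sum())
            z[d][i] = t
            ndk[d, t] += 1; nkw[t, v] += 1; nk[t] += 1

# Print the top words per topic.
for k in range(K):
    top = [vocab[i] for i in np.argsort(-nkw[k])[:3]]
    print("topic", k, ":", top)
```

On documents as short as tweets, the per-document word counts are too sparse for this vanilla model to work well, which is exactly what motivates the short-text LDA variants.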
I was thinking this morning, especially after the recent Cambridge Analytica scandal on Facebook, that there should be a new kind of privacy-first data analysis process on Facebook, one that never shares the data with external companies. In the current system, the flow is: the user accepts the application’s permission request, and the app owner collects the data on its own platform, analyzes it, sells it to another firm, and so on. Instead, the data analysis task should be executed under Facebook’s control. In the new system, when the app asks for permission, Facebook would alert: this app wants to analyze your data; we will never share the data with it; the analysis will be executed on our platform; we have reviewed and controlled its code (like the Apple App Store code review process); and we will share the result of the analysis with both you and the app owner (for example, that you support the conservative party at 80%). The app owner would also tell you how and for what purpose it will use this result.
Recently, I tried several products that extract demographic information from a profile image. My target was to obtain information about age, gender, and ethnicity. I found that the prominent companies in the sector are Clarifai and Face++. I integrated my trial software with both products, and I found Clarifai’s accuracy better than Face++’s. My reasons are:
- Clarifai provides the probability value of its predictions (e.g., the predicted gender is female with 52% probability), so it is possible to eliminate results with low prediction scores. In contrast, Face++ does not provide that value. This is an unwanted situation because, in binary classification, the prediction always has a result, even when its score is not very high.
- Clarifai correctly predicted the ethnicity of the image below as “White”, while Face++ wrongly predicted it as “Black”. On the other hand, Clarifai could not predict the gender correctly (female 51%, male 49%), while Face++ correctly marked it as male (we don’t know with what probability).
- The disadvantage of Clarifai is its low quota for free usage: it permits only 2500 API calls per month for free accounts. Face++ does not specify any upper limit for free accounts; its single limitation is one API call per second.
I hope my hands-on experience with these services will help you choose the right product.
Result of Clarifai: (https://clarifai.com/demo)
Gender: feminine (prob. score: 0.510), masculine(prob. score: 0.490)
Age: 55 (prob. score: 0.356)
Ethnicity (Multicultural appearance): White: (prob. score: 0.981)
Result of Face++: (https://www.faceplusplus.com/attributes/#demo)
Ethnicity (Multicultural appearance): Black
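My first point above, filtering out low-confidence predictions, can be sketched in a few lines. The response structure here is a simplified, hypothetical stand-in for what such an API returns (seeded with the demo scores above), not Clarifai’s actual schema.

```python
# Hypothetical, simplified demographic predictions, in the style of
# the Clarifai demo results above (not the real API response schema).
predictions = {
    "gender": [("feminine", 0.510), ("masculine", 0.490)],
    "age": [("55", 0.356)],
    "ethnicity": [("white", 0.981)],
}

def confident_predictions(preds, threshold=0.75):
    """Keep only attributes whose top prediction clears the threshold."""
    result = {}
    for attribute, candidates in preds.items():
        label, score = max(candidates, key=lambda c: c[1])
        if score >= threshold:
            result[attribute] = (label, score)
    return result

print(confident_predictions(predictions))  # {'ethnicity': ('white', 0.981)}
```

With a 0.75 threshold, only the ethnicity prediction survives; the near-coin-flip gender score (0.510) is exactly the kind of result you would want to discard, and it is impossible to apply this filtering when the API does not expose probabilities at all.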
Very inspiring research was published at the end of 2016: with the help of deep learning, it is now possible to generate images from given texts.
Can you imagine some use cases based on this technology? I found an interesting one. Imagine you are in a police station after a robbery at a bank. The thief could not be found, and you, as the unique eyewitness of the event, describe the thief’s visual profile. As you speak, a computer automatically generates an image of the thief based on the visual details you describe, and at the same time increases the precision of that image by matching it against records of past robberies.
Within the last month, the future of education was one of the main topics in Davos. There were very interesting debates, and in one of them, Jack Ma (the founder of Alibaba) said that it is strongly and urgently necessary to change the current education system due to the rising impact of robots. Since robots are able to acquire knowledge by learning from their past experiences, they will do most of the things people do today. In order to adapt ourselves to the modern world, we need to educate our children in a way that cannot be copied by robots. Rather than teaching our children mathematics or physics, we should support their more humanistic skills, such as music and art.
I agree with Jack Ma’s ideas, and I think we need to think more about people’s main advantages and disadvantages over robots in the next 20 years. Today, our children start learning to code in primary school in order to communicate better with robots and understand their logic. But when the world is dominated by robot activities, everything will change, and humans should be in a place where robots do not see them as a threat.
I started the MongoDB developer course given online by MongoDB University. I worked a lot with Mongo at Vodafone, but I was using only 10-20% of its key features. Now at Politecnico, things are more complex, so I need to pay more attention to performance issues. In my research project, I use MongoDB to store Tweets and perform text analysis over the records.
I just completed the Week 1 course. I hope to learn more in the upcoming weeks.
These days, my research motivation is to find insights by analyzing Twitter data to understand how English people react to the Brexit referendum. Various studies have already been conducted on this topic, most of them by universities in England such as Imperial College London and the University of Bristol. I find it quite an interesting research topic, since social media is an important environment for presenting our ideas to the community, and there is a need for more research to understand people’s opinions. I will give more detailed information about my study in the upcoming weeks. If you have any recommendations for me, please feel free to send me an email.