Data harvesting is the entry point for any social media analysis project. There are two main ways to collect data for an analysis: by connecting to APIs or by crawling and scraping the social networks. It is crucial to understand how the data was collected in order to be aware of the bias that might be introduced.
Different harvesting techniques will require customized approaches to data preprocessing and analysis workflow, which we will explain in further chapters.
A widely used term, API (Application Programming Interface) is defined as a set of instructions and standards for accessing a web-based software application. But what does it mean in real life? Firstly, APIs allow users to send a request for a particular resource to a provider such as Facebook or Twitter and receive some data in response.
It is worth noting that all API providers impose some limits on the quantity or type of data that users can obtain. Secondly, APIs give access to data processing resources, such as AlchemyAPI, which receives verbatim textual data in a request and sends back the results of its analysis, such as nouns, verbs, entities, and so on. In our case, the APIs are used either to get data from social networks or to execute some complex processing on them.
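The request and response cycle described above can be sketched with Python's standard library. This is only an illustration under stated assumptions: the endpoint https://api.example.com/v1/posts, the parameter names q and limit, and the JSON layout with a top-level "data" key are all hypothetical, and a real provider will additionally require authentication and enforce the limits mentioned above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used for illustration only.
BASE_URL = "https://api.example.com/v1/posts"


def build_request_url(base_url, **params):
    """Attach query parameters (the request for a resource) to a base URL."""
    return base_url + "?" + urlencode(sorted(params.items()))


def parse_response(body):
    """Decode a JSON response body and return the list stored under 'data'."""
    payload = json.loads(body)
    return payload.get("data", [])


def fetch(url, timeout=10):
    """Send the request and return the parsed records (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))


# Offline demonstration: build the URL, then parse a canned response
# written in the assumed layout instead of calling a real service.
url = build_request_url(BASE_URL, q="python", limit=2)
sample_body = '{"data": [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]}'
records = parse_response(sample_body)
```

The fetch function is defined but not called here, so the sketch runs without network access; against a real API it would be called with the URL built above.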
In order to access and manipulate APIs, we use the urllib2 library, which ships with the Python 2 standard library (in Python 3, its functionality lives in urllib.request and urllib.error). If it is not available in your environment, you can also try the requests library, which is compatible with Python 2.
Scraping, or web scraping, is a technique for extracting information from websites. To perform the task, we need a scraper that is able to extract the information we need and structure it in a predefined format.
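As a minimal sketch of what extracting information and structuring it in a predefined format can look like, the following example collects the links of a page into a list of records. It deliberately uses the standard library's html.parser module to stay free of dependencies. The LinkScraper class and the sample HTML are made up for this illustration.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collect every hyperlink as a {'text': ..., 'href': ...} record."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.records.append({"text": text, "href": self._href})
            self._href = None


# Made-up page content, standing in for a downloaded HTML document.
SAMPLE_HTML = """
<ul>
  <li><a href="/posts/1">First post</a></li>
  <li><a href="/posts/2">Second <b>post</b></a></li>
</ul>
"""

scraper = LinkScraper()
scraper.feed(SAMPLE_HTML)
links = scraper.records
```

The predefined format here is a list of dictionaries with the keys text and href, which can be written to JSON or loaded into a table afterwards.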
When we decide to build a scraping strategy, we have to take the terms and conditions into consideration, as some websites do not allow scraping. Python offers very useful tools to create scrapers and crawlers, such as beautifulsoup and scrapy.
In this section, we briefly introduce the main techniques that lie behind the social media analysis process and bring intelligence to the data. We also present how to deal with a reasonably big amount of data using our development environment. However, it is worth noting that the problem of scaling and dealing with massive data will be analyzed in Chapter 9, Social Data Analytics at Scale - Spark and Amazon Web Services.
The recent growth in the volume of data created by mobile devices and social networks has dramatically impacted the need for high performance computation and new methods of analysis.
Historically, large quantities of data (big data) were analyzed by statistical approaches based on sampling and inductive reasoning to derive knowledge from data. A more recent development in artificial intelligence, and more specifically machine learning, not only made it possible to deal with large volumes of data but also brought tremendous value to businesses and consumers by extracting valuable insights and hidden patterns.
Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. Within the field of data analytics, machine learning is a method used to devise complex models and algorithms. This approach is similar to a person who increases their knowledge of a subject by reading more and more books about it.
There are three main approaches in machine learning: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning assumes that we know the output for each data point.
Unsupervised learning is used when we do not know the outputs. Taking cars as an example, we only have technical specifications: acceleration, price, engine type. We then cluster the data points into different groups (clusters) of similar cars. In our case, we will obtain clusters of cars with similar prices and engine types, which helps us understand the similarities and differences between the cars.
The third type of machine learning is reinforcement learning, which is used more in artificial intelligence applications. It consists of devising an algorithm that learns how to behave based on a system of rewards. This kind of learning is similar to the natural human learning process. It can be used, for example, to teach an algorithm how to play chess. In the first step, we define the environment: the chess board and all possible moves. Then the algorithm starts by making random moves and earns positive or negative rewards. A positive reward means that the move was successful, and a negative reward means that the algorithm has to avoid such moves in the future. After thousands of games, it ends up knowing the best sequences of moves.
In real-life applications, many hybrid approaches are widely used, depending on the available data and the complexity of the problem. Machine learning is a basic tool to add intelligence and extract valuable insights from social media data.
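The reward-driven loop described for reinforcement learning can be illustrated with a deliberately tiny sketch. It is not a chess engine: the environment is reduced to three hypothetical moves with fixed rewards, and the learner uses a simple epsilon-greedy strategy (mostly repeat the best known move, sometimes try a random one). The function name, move names, and parameter values are assumptions made for this example.

```python
import random


def learn_best_move(rewards, episodes=200, epsilon=0.1, seed=0):
    """Learn by trial and error which move earns the highest reward.

    rewards maps each possible move to the reward the environment returns.
    """
    rng = random.Random(seed)
    moves = list(rewards)
    value = {move: 0.0 for move in moves}  # estimated reward per move
    count = {move: 0 for move in moves}    # how often each move was tried

    for _ in range(episodes):
        if rng.random() < epsilon:
            move = rng.choice(moves)           # explore: random move
        else:
            move = max(moves, key=value.get)   # exploit: best known move
        reward = rewards[move]
        count[move] += 1
        # Running average of the rewards observed for this move.
        value[move] += (reward - value[move]) / count[move]

    return max(moves, key=value.get), value


# Toy environment: one successful move (+1) and two moves to avoid (-1).
toy_rewards = {"advance_pawn": -1, "develop_knight": 1, "hang_queen": -1}
best_move, estimates = learn_best_move(toy_rewards)
```

Moves that earned a negative reward end up with a negative estimate and are no longer chosen when exploiting, which mirrors the idea of avoiding such moves in the future. A real game would also need states and sequences of moves, which this sketch leaves out.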
Other widespread concepts are also used for social media analysis: Text Analytics, Natural Language Processing, and Graph Mining.
Text analytics makes it possible to retrieve non-trivial information from textual data, such as brand or people names, relationships between words, phone numbers, URLs, hashtags, and so on.
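The simplest form of this kind of extraction can be sketched with regular expressions. The patterns below are simplified assumptions for illustration and will miss edge cases (for instance, URLs followed directly by punctuation). The sample post and the extract_entities helper are made up, and user mentions are added here as one more example of an extractable item.

```python
import re

# Simplified patterns for illustration only.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def extract_entities(text):
    """Return the hashtags, mentions, and URLs found in a piece of text."""
    return {
        "hashtags": HASHTAG.findall(text),
        "mentions": MENTION.findall(text),
        "urls": URL.findall(text),
    }


post = (
    "Loving the new release from @acme_corp! "
    "Details: https://example.com/launch #python #data"
)
entities = extract_entities(post)
```

Recognizing brand or people names and relationships between words requires more than pattern matching, which is where the language processing techniques described next come in.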
Natural Language Processing is more extensive and aims at finding the meaning of the text by analyzing text structure, semantics, and concepts, among others.
Social networks can also be represented by graph structures. Graph mining enables the structural analysis of such networks. These methods help in discovering relationships, paths, connections, and clusters of people, brands, topics, and so on, in social networks.
In our analysis, we will use some libraries that provide flexible data structures, such as pandas and sframe. The advantage of sframe over pandas is that it helps to deal with very big datasets that do not fit in RAM. We will also use the pymongo library to pull collected data from MongoDB.
Visualization is one of the most important parts of the data science process. It helps in the initial steps of analysis, for example, to choose the right method or algorithm according to the structure of the data, but above all it serves to present the obtained results in a simple way.
The Python ecosystem offers many interesting libraries, such as plotly, matplotlib, and seaborn, among others. In the following chapters, we will focus on three of them.
Once you have set up the whole environment, you can create your first project. If you use a Linux or macOS machine, you can open a terminal and go to your working directory. Then create your project directory and, at the same time, initialize an empty Git repository (in a terminal on Linux or macOS, or in Git Bash on Windows). Now it's time to start working on a real project.
The avalanche of social network data is a result of the communication platforms developed over the last two decades. These platforms evolved from chat rooms to personal information sharing and, finally, social and professional networks.
Collectively, these platforms reach more than a billion individuals across the world, who share their activities and interactions with each other.
The sharing of their data by these media through APIs and other technologies has given rise to a new field called social media analytics. It has multiple applications, such as in marketing, personalized recommendations, research, and society.
Python is one of the most used programming languages for these techniques. However, manipulating the unstructured data from social networks requires a lot of precise processing and preparation before getting to the most interesting bits. In the next chapter, we will see the way this data from social networks can be harnessed, processed, and prepared to make a sandbox for interesting analysis and applications in the subsequent chapters.
Siddhartha Chatterjee is an experienced data scientist with a strong focus on machine learning and big data applied to digital (e-commerce and CRM) and social media analytics. He has worked at OgilvyOne Worldwide, a leading global customer engagement agency in Paris, as a lead data scientist, where he set up the social media analytics and predictive analytics offering. Before that, he was a senior data scientist and head of semantic data at Publicis, France.
Michal Krystyanczuk is the co-founder of The Data Strategy, a start-up company based in Paris that builds artificial intelligence technologies to provide consumer insights from unstructured data.
Previously, he worked as a data scientist in the financial sector using machine learning and big data techniques for tasks such as pattern recognition on financial markets, credit scoring, and hedging strategies optimization. He specializes in social media analysis for brands using advanced natural language processing and machine learning algorithms. He is an enthusiast of cognitive computing and information retrieval from different types of data, such as text, image, and video.
Publication date: July
Social media produces complex and noisy data streams that pose a potent challenge to everyone when it comes to harnessing them properly and benefiting from them.
This book will introduce you to the concept of social media analytics, and how you can leverage its capabilities to empower your business. Right from acquiring data from various social networking sources such as Twitter, Facebook, YouTube, Pinterest, and social forums, you will see how to clean data and make it ready for analytical operations using various Python APIs.
You will also perform web scraping and visualize data using Scrapy and BeautifulSoup. Finally, you will be introduced to different techniques to perform analytics at scale for your social data on the cloud, using Python and Spark. By the end of this book, you will be able to utilize the power of Python to gain valuable insights from social media data and use them to enhance your business processes.