In one of my previous articles, we discussed why one might want to extract data from the Internet. I have also written a short article that demonstrates how to extract data for a real estate agency. Now we will take a step in a slightly different direction: our new target is Google.
Imagine we’re a marketing agency hired to promote a company in Google search results. In this situation, it’s important to track the results of marketing activities to understand how each of them influences the client's position in search results.
Note: According to the robots.txt available on google.com, crawling the search results is generally not desired. However, there are exceptions for some bots (such as Twitter's). A quick search on how to scrape Google turns up dozens of similar tutorials for other languages, so why not show one for Elixir? …
In one of my previous articles, we discussed why one might want to extract data from the Internet. Now it’s time to be more specific and showcase one possible use case: the real estate market.
A bit about the real estate market
The real estate market is highly competitive, with many companies trying to approach the same client. As the number of properties advertised on the internet keeps growing, publicly available data plays an increasingly important role. …
This article will describe what web scraping is, how it differs from web crawling, and finally, who uses it and why.
For not to have one meaning is to have no meaning, and if words have no meaning our reasoning with one another, and indeed with ourselves, has been annihilated. (Aristotle)
Web Scraping is the process of extracting structured data from a given web page (or a group of pages).
Crawling is the process of visiting (literally crawling) web pages whose data may then be extracted during the Web Scraping process.
These two processes normally complement each other: to extract data, you first have to visit a page, and to discover a new page, you usually have to extract data (such as links) from a page you have already visited. …
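To make the interplay concrete, here is a minimal, language-neutral sketch written in Python (it is not Crawly or Elixir code). Everything in it is an assumption made for illustration: the `PAGES` dictionary is a hypothetical in-memory stand-in for real HTTP requests, and the helper names `extract_links`, `extract_item`, and `crawl_and_scrape` are invented here. Crawling is the loop that follows links; scraping is the step that pulls a structured item out of each visited page.

```python
import re
from collections import deque

# Hypothetical in-memory "website" (URL -> HTML), used instead of real HTTP.
PAGES = {
    "/": '<a href="/page/1">1</a> <a href="/page/2">2</a>',
    "/page/1": '<h1>First</h1> <a href="/page/2">next</a>',
    "/page/2": '<h1>Second</h1> <a href="/">home</a>',
}


def extract_links(html):
    """Crawling side: discover URLs mentioned on a visited page."""
    return re.findall(r'href="([^"]+)"', html)


def extract_item(url, html):
    """Scraping side: pull one structured item out of a page, if present."""
    match = re.search(r"<h1>(.*?)</h1>", html)
    return {"url": url, "title": match.group(1)} if match else None


def crawl_and_scrape(start):
    seen = {start}          # URLs already queued, so no page is visited twice
    queue = deque([start])  # pages still waiting to be visited
    items = []
    while queue:
        url = queue.popleft()
        html = PAGES.get(url, "")  # "visit" the page
        item = extract_item(url, html)
        if item is not None:
            items.append(item)
        for link in extract_links(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return items


if __name__ == "__main__":
    for item in crawl_and_scrape("/"):
        print(item)
```

The start page yields no item of its own, but visiting it is what makes the two item pages discoverable, which is exactly the dependency between crawling and scraping.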
I will demonstrate how Crawly can extract data that is behind a login (this is not a trivial problem to solve without the right tools).
This might be required in several cases: for example, when you want to extract profile data from a social network, or when you want to extract and analyze special offers that are only available to members of a specific website.
In this article, we will show how to set up a crawler that extracts data from http://quotes.toscrape.com/ (a special website built for experimenting with web crawling). This site has some information that is only available if you can authenticate your requests. …
In one of our previous articles, we discussed how to perform web scraping with the help of Elixir. In this article, I will take another step into the interesting world of data scraping and investigate how to handle the interactivity of the modern web.
One way of solving this problem is to simulate an asynchronous request with an additional POST. In my opinion, however, this approach adds even more complexity and fragility to your code. …