Hey everyone! I am delighted that my article about scraping Google with Elixir and Crawly has gathered 200 claps! As promised, we will take extra steps to follow Google’s pagination and get more meaningful data from the results. So let’s do it in practice!
Setting the task
Many online businesses, especially eCommerce websites, tools, and service providers, rely on search engines as their primary source of revenue. It’s known that 80% of the clicks go to the website that holds the first position in the search results.
Usually, digital companies start with paid search advertising to…
In one of my previous articles, we discussed why you might want to scrape data from the Internet. We have shown how to extract data from multiple websites to organize a price monitoring solution for a real estate agency.
Here we want to show how to create a web scraper even if you don’t know how to program!
As in the previous example, we will be interested in data from one of the Swedish real estate websites. For this example, we will take the Hemnet website. And we want to get: URLs, prices, addresses, images. …
In one of my previous articles, we discussed why one might want to extract data from the Internet. In another article, we expanded the case and demonstrated how to use scraping as part of a price monitoring solution. Now, let's turn to journalism. It might sound a bit unexpected, but journalists rely on web scraping quite a bit these days. I can refer to the following articles as examples:
From our side, we can think of the following use case: Let’s imagine you’re running a local news website, and you’re…
In one of my previous articles, we discussed why one might want to extract data from the Internet. I have also written a short article that demonstrates how to extract data for a real estate agency. Now we will take a step in a slightly different direction: our new target is Google.
Imagine we’re a marketing agency working to promote a client company in Google search results. In this situation, it’s important to track results over time to understand how different marketing activities influence the client’s position in search results.
Note: According to the robots.txt…
In one of my previous articles, we discussed why one might want to extract data from the Internet. Now it’s time to be more specific and showcase one of the possible use cases: the real estate market.
A bit about the real estate market
The real estate market is highly competitive, with many companies trying to approach the same client. As the number of properties advertised on the internet keeps growing, publicly available data plays an increasingly important role. …
This article will describe what web scraping is, how it differs from web crawling, and finally, who uses it and why.
“For not to have one meaning is to have no meaning, and if words have no meaning our reasoning with one another, and indeed with ourselves, has been annihilated.” - Aristotle
Web Scraping is the process of extracting structured data from a given web page (or a group of pages).
Crawling is the process of visiting (literally crawling) web pages whose data may then be extracted during the Web Scraping process.
These two processes normally complement each other, though…
I will demonstrate how Crawly can extract data that is behind a login (this is not a trivial problem to solve without the right tools).
This might be required in several cases: for example, when you want to extract profile data from another social network, or when you want to extract and analyze special offers that are only available to members of a specific website.
In one of our previous articles, we discussed how to perform web scraping with the help of Elixir. In this article, I will take another step into the interesting world of data scraping and investigate how to handle the interactivity of the modern web.
One of the ways…
Software developer at Erlang Solutions.