
In one of my previous articles, we discussed why one might want to extract data from the Internet. I have also written a short article that demonstrates how to extract data for a real estate agency. Now we will take a step in a slightly different direction: our new target is Google.

Imagine we’re a marketing agency hired by a company to promote it in Google search results. In this situation, it’s important to track the outcome of marketing activities to understand how each of them influences the client’s position in search results.

Note: According to the robots.txt available on google.com, crawling of the search results is generally not desired. However, we also see that there are exceptions for some bots (such as Twitter’s). A quick search on how to scrape Google shows dozens of similar tutorials for other languages, so why not show one for Elixir? …
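To make the robots.txt remark concrete, here is a minimal sketch of how per-bot rules can be checked. It is written in Python with the standard library only, not in Elixir. The rules below are a made-up sample for a hypothetical example.com site and are not Google's actual robots.txt; the bot names are assumptions as well.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: search results are closed to everyone by default,
# but one named bot gets an explicit exception.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search

User-agent: examplebot
Allow: /search
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the sample rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(SAMPLE_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


generic_ok = is_allowed("SomeCrawler", "https://example.com/search")
exception_ok = is_allowed("examplebot", "https://example.com/search")
print(generic_ok, exception_ok)
```

With these sample rules, a generic crawler is refused the search page while the explicitly named bot is allowed, which mirrors the "general rule plus exceptions" pattern described in the note.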


Monitoring real estate data with Elixir. (Picture was taken from https://pixabay.com/)

In one of my previous articles, we discussed why one might want to extract data from the Internet. Now it’s time to be more specific and showcase one of the possible use cases: “The real estate market.”

A bit about the real estate market

The real estate market is highly competitive, with many companies trying to reach the same client. As the number of properties advertised on the internet keeps growing, publicly available data plays an increasingly important role. …


A scraping spider is about to extract data from the web (Image from Pixabay)

This article will describe what web scraping is, how it differs from web crawling, and finally, who uses it and why.

Starting from definitions

For not to have one meaning is to have no meaning, and if words have no meaning our reasoning with one another, and indeed with ourselves, has been annihilated— Aristotle

Web Scraping is the process of extracting structured data from a given web page (or a group of pages).

Crawling is the process of visiting (literally crawling) web pages whose data may later be extracted during the Web Scraping process.

These two processes normally complement each other: to extract data, you first have to visit a page, and to discover a new page, you usually have to extract data (such as links) from a page you have already visited. …
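The interplay between the two processes can be sketched in a few lines. The example below is a language-agnostic illustration in Python (standard library only), not the Elixir tooling the articles use. The `PAGES` dictionary is a hypothetical in-memory website so that the sketch runs without network access; the URLs and page contents are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website": URL -> HTML body.
PAGES = {
    "http://example.test/": '<h1>Home</h1><a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<h1>Page A</h1><a href="/b">B</a>',
    "http://example.test/b": '<h1>Page B</h1><a href="/">Home</a>',
}


class PageParser(HTMLParser):
    """Collects the <h1> text (scraped data) and link targets (for crawling)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


def crawl_and_scrape(start_url):
    """Visit every reachable page once and return one item per page."""
    seen, queue, items = {start_url}, [start_url], []
    while queue:
        url = queue.pop(0)
        parser = PageParser()
        parser.feed(PAGES[url])  # "visit" the page
        # Scraping: turn the page into a structured item.
        items.append({"url": url, "title": parser.title})
        # Crawling: the extracted links tell us which pages to visit next.
        for href in parser.links:
            link = urljoin(url, href)
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return items


for item in crawl_and_scrape("http://example.test/"):
    print(item)
```

The `seen` set is what keeps the crawl from looping forever: page B links back to the home page, which has already been visited.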


Entering a private area might be a challenge. Well, not for us.

In our previous articles, we covered the topic of web scraping with Elixir and showed how Crawly can simplify this work if you need to render dynamic content.

I will demonstrate how Crawly can extract data that is behind a login (this is not a trivial problem to solve without the right tools).

This might be required in several cases: for example, when you want to extract profile data from a social network, or when you want to extract and analyze special offers that are only available to members of a specific website.

In this article, we will show how to set up a crawler that will extract data from http://quotes.toscrape.com/ (a special website built to experiment with web crawling). This site has some information that is only available if you can authenticate your requests. …


Image by Gerd Altmann from Pixabay

Introduction

In one of our previous articles, we discussed how to perform web scraping with the help of Elixir. In this article, I will take another step into the interesting world of data scraping and investigate how to handle the interactivity of the modern web.

As the number of websites that use JavaScript to render content grows, so does the demand for extracting data from them. Interactivity adds complexity to the data extraction process, because a regular request made from a command-line HTTP client does not return the full content.

One way of solving this problem is to simulate the asynchronous request with an additional POST. However, in my opinion, this approach adds even more complexity and fragility to your code. …

About

Oleg Tarasenko

Software developer at Erlang Solutions.
