Unified domain-specific language for collecting and processing data of social media


Cite: Butakov, N., Petrov, M., Mukhina, K., Nasonov, D. & Kovalchuk, S. (2018). Unified domain-specific language for collecting and processing data of social media. Journal of Intelligent Information Systems, 51(2), 389-414.


Data from social media is becoming increasingly important analysis material for social scientists, market analysts, and other stakeholders. This diversity of interests has led to a variety of crawling techniques and software solutions. Nevertheless, these solutions lack the flexibility to satisfy the requirements of different users and individual crawling scenarios, which can range from a simple query to a complex multi-step workflow that requires data to be collected from several networks. To address this problem, our paper proposes an approach based on a purpose-built domain-specific language (DSL) and an architecture for a distributed crawling system. The DSL is declarative: the user describes the data needed, and the language is based on an ontological model of social networks and on the essential crawling techniques. As a result, the crawling system can collect data from different online social networks within complex workflows while exploiting various crawling methods implemented in a distributed computing environment. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
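To make the idea of a declarative, multi-step crawling description more concrete, the following is a minimal sketch in Python and not the DSL from the paper. It assumes that a workflow is a set of named data requests, each stating which network and entity type it needs and which other requests it depends on. The names `DataRequest`, `plan`, the field names, and the network labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataRequest:
    """Declarative description of needed data (hypothetical schema)."""
    network: str                                # illustrative label, e.g. "vk"
    entity: str                                 # kind of object wanted, e.g. "profile"
    where: dict = field(default_factory=dict)   # filter conditions on the entity
    depends_on: tuple = ()                      # names of requests that must run first


def plan(requests: dict) -> list:
    """Order named requests so that each one runs after its dependencies."""
    ordered, done = [], set()

    def visit(name, stack=()):
        if name in done:
            return
        if name in stack:
            raise ValueError(f"cyclic dependency at {name!r}")
        for dep in requests[name].depends_on:
            visit(dep, stack + (name,))
        done.add(name)
        ordered.append(name)

    for name in requests:
        visit(name)
    return ordered


# A two-step workflow spanning two networks (assumed example data).
workflow = {
    "posts": DataRequest("twitter", "post", {"author_in": "users"},
                         depends_on=("users",)),
    "users": DataRequest("vk", "profile", {"city": "Saint Petersburg"}),
}
steps = plan(workflow)
```

In this sketch the user states only what data is wanted and how the requests relate; the execution order is derived by `plan`. A real system of the kind the abstract describes would additionally map each step to a crawling method and distribute the work, which is outside the scope of this illustration.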

