This article takes you through an example of how to query, transform, and visualize data from social media. We are going to collect tweets from Twitter, store them in HDFS (Hadoop Distributed File System), and use Jaql and a Java MapReduce application to manipulate and transform the data. Finally, we will visualize the results using a spreadsheet-style […]
Jaql is one of the languages that help abstract the complexities of the MapReduce programming framework within Hadoop. It’s a loosely typed functional language with lazy evaluation (that is, Jaql functions are not materialized until they are needed). Jaql (JSON Query Language) has a data model based on JSON, and it’s a fully expressive programming language (compared to […]
Hadoop consists of two major components: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS is a distributed, scalable, Java-based file system that can store large volumes of unstructured data. MapReduce, which is covered in the next chapter, is a framework for performing calculations on the data stored in HDFS. Since Hadoop is a system running […]
What is Hadoop? Hadoop is an open-source project that offers a platform for working with Big Data and helps to overcome the challenges of volume and variety. From a volume perspective, it can process, store, and analyze massive amounts of data. From a variety perspective, it can work with a mixture of structured and unstructured data, which […]