I had a chance to work on a predictive analytics project for a US car manufacturer, XXX (I will keep the name of the company confidential). The goal of the project was to evaluate the feasibility of using Big Data analysis solutions in manufacturing to address various operational needs. The objective was to determine a business […]
BigSheets is a spreadsheet-style tool for business analysts provided with IBM InfoSphere BigInsights, a platform based on the open source Apache Hadoop project. BigSheets enables non-programmers to iteratively explore, manipulate, and visualize data stored in your distributed file system. This article demonstrates how to analyze and visualize data we collected from Twitter in my previous […]
This article takes you through an example of how to query, transform, and visualize data from social media. We are going to collect tweets from Twitter, store them in HDFS (Hadoop Distributed File System), and use Jaql and a Java MapReduce application to manipulate and transform the data. Finally, we will visualize the results using a spreadsheet-style […]
Jaql is a JSON query language and one of the languages that help abstract the complexities of the MapReduce programming framework within Hadoop. It’s a loosely typed functional language with lazy evaluation (meaning that Jaql functions are not materialized until they are needed). Its data model is based on JSON, and it is a fully expressive programming language (compared to […]
Hadoop consists of two major components: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS is a distributed, scalable, Java-based file system that lets you store large volumes of unstructured data. MapReduce, which is covered in the next chapter, is a framework for performing calculations on the data stored in HDFS. Since Hadoop is a system running […]
What is Hadoop? Hadoop is an open source project that offers a platform for working with Big Data and helps to overcome the challenges of volume and variety. From the volume perspective, it lets you process, store, and analyze massive amounts of data. From the variety perspective, it lets you work with a mixture of structured and unstructured data, which […]
Since I’m building my expertise in Big Data Analytics, I decided to install BigInsights on my computer (in a virtualized environment). IBM InfoSphere BigInsights brings the power of Hadoop to the enterprise. Apache Hadoop is an open source software framework used to reliably manage large volumes of structured and unstructured data. BigInsights Basic Edition is available for […]