I had a chance to work on a predictive analytics project for a US car manufacturer XXX (I will keep the name of the company confidential). The goal of the project was to evaluate the feasibility of using Big Data analysis solutions for manufacturing to address different operational needs. The objective was to determine a business […]
BigSheets is a spreadsheet-style tool for business analysts provided with IBM InfoSphere BigInsights, a platform based on the open source Apache Hadoop project. BigSheets enables non-programmers to iteratively explore, manipulate, and visualize data stored in your distributed file system. This article demonstrates how to analyze and visualize data we collected from Twitter in my previous […]
This article takes you through an example of how to query, transform, and visualize data from social media. We are going to collect tweets from Twitter, store them in HDFS (Hadoop Distributed File System), and use Jaql and a Java MapReduce application to manipulate and transform the data. Finally, we will visualize the results using a spreadsheet-style […]
Jaql is one of the languages that help to abstract the complexities of the MapReduce programming framework within Hadoop. It’s a loosely typed functional language with lazy evaluation (meaning that Jaql functions are not materialized until they are needed). Jaql’s data model is based on JSON Query Language; it’s a fully expressive programming language (compared to […]
InfoSphere Streams radically extends the state-of-the-art in big data processing; it’s a high-performance computing platform that allows users to develop and reuse applications to rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources. You can download the 90-day trial version here. Here is the list of products I used in […]
Hadoop consists of two major components: Hadoop Distributed File System (HDFS) and MapReduce. HDFS is a distributed, scalable, Java-based file system that can store large volumes of unstructured data. MapReduce, which is covered in the next chapter, is a framework for performing calculations on the data stored in HDFS. Since Hadoop is a system running […]
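To make the MapReduce idea a bit more concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce steps, using word counting as the example. It is plain Python and does not use Hadoop or HDFS at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, imitating what happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for lines of a file in HDFS.
    sample = ["big data big insights", "data in HDFS"]
    print(word_count(sample))
```

In a real cluster, the map and reduce steps would run in parallel on many machines and the input would be read from HDFS; this sketch only shows the shape of the computation.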
What is Hadoop? Hadoop is an open source project that offers a platform to work with Big Data and helps to overcome volume and variety challenges. From a volume perspective, it lets you process, store, and analyze massive amounts of data. From a variety perspective, it lets you work with a mixture of structured and unstructured data, which […]
“Early analytics adopters are extending their leadership. If you want to lead, you have to know analytics, and if you want to be on the forefront of analytics, you have to put your arms around Big Data.” [13] Due to Big Data analytics, companies are able to gain a more complete understanding of their business, customers, […]
What is Big Data? Today, information technology is producing more and more data. According to IBM [12], every day we create 2.5 quintillion (2.5 × 10^18) bytes of data, and this rapid growth brings new challenges to data analytics. Due to the lower cost of storage and other technologies, more and more companies can afford to […]