

This is the second part of my research on Big Data Analytics. Please feel free to give me feedback on both the content and the English grammar (English is not my native language). I would really appreciate it!
What is Big Data?
Today, information technology is producing more and more data. According to IBM [12], every day we create 2.5 quintillion (2.5 × 10^18) bytes of data, and this rapid growth brings new challenges to data analytics. Because storage and other technologies have become cheaper, more and more companies can afford to store, process, and analyze data that was previously ignored: social media content, web logs, climate information, digital pictures, cell phone GPS signals, and information from sensors, to name just a few.
We have reached a point where traditional data management technologies prevent us from storing, processing, and analyzing data because of its large size, the high rate at which it arrives, its lack of structure, and its uncertain quality and trustworthiness. These limits are discussed later as the four Vs: volume, velocity, variety, and veracity. The important thing to realize at this point is that Big Data is not just about large sets of data. Big Data is about adopting new technologies that enable the storage, processing, and analysis of data that was previously ignored due to the limitations of traditional data management technologies. [13] Big Data is about dealing with unstructured data, which makes up around 80% of the world's data. [14] Big Data is about dealing with large amounts of data and with streaming data, and being able to figure out what is important and what is not.
Unstructured data
In his book Big Data Analytics: Turning Big Data into Big Money [15], Frank Ohlhorst writes: “Big Data defines a situation in which data sets have grown to such enormous sizes that conventional information technologies can no longer effectively handle either the size of the data set or the scale and growth of the data set. In other words, the data set has grown so large that it is difficult to manage and even harder to garner value of it. The primary difficulties are the acquisition, storage, searching, sharing, analytics, and visualization of data.”
Big Data allows us to process and analyze all available data, rather than just a sample of it. It allows us to process and analyze all types of data, even unstructured data such as posts on social media sites. It allows us to increase the rate of analysis in order to generate more accurate and more timely insight into our business. Altogether, Big Data technologies and techniques bring together a large volume and variety of data and allow us to deliver a new level of business insight, customer service, and revenue opportunities. Big Data enhances analytics and complements traditional business analytics methods and technologies. The question is not Big Data or traditional analytics; it is how the two work together. Big Data complements Online Transaction Processing (OLTP), Online Analytical Processing (OLAP), and Decision Support Systems (DSS). It is not a replacement for them.
Big Data defined by four Vs
Big Data is typically defined by four Vs: volume, velocity, variety, and veracity. These four Vs represent the challenges that Big Data deals with: [13] [15]
Volume – In 2009, the world had about 0.8 ZB of data; in 2010, we crossed the 1 ZB mark, and at the end of 2011 that number was 1.8 ZB. Six or seven years from now, the number is estimated to be around 35 ZB. [13] Imagine how much valuable business information must be hidden in this amount of data. On the other hand, this tremendous growth brings new challenges to analytics, challenges that can be addressed only by adopting new technologies and techniques.
Velocity – Velocity is defined as the rate at which data arrives at the enterprise and is processed so that it can be understood. Being able to understand and respond to data in motion puts you in a position of power. IBM considers velocity to be the most significant capability of Big Data. “The more time that passes, the less the potential competitive advantage you have, and the less return on data (ROD) you’re going to experience. We feel this ROD metric will be one that will dominate future IT landscape in a Big Data world: we’re used to talking about return on investment (ROI), which talks about the entire solution investment; however, in a Big Data world, ROD is a finer granularization that helps fuel future Big Data investments.” [13] Areas where reacting faster gives you an advantage include the health of a traffic system, the health of a patient, the health of a network infrastructure, and the health of a loan portfolio.
Variety – Big Data is really about trying to capture all of the data that can help us make better decisions. Most of that data is semistructured or unstructured, such as images, freeform text, or sound. Imagine a customer call center where you are able to detect the change in tone of a frustrated client as well as the content of the call. Having this insight can help your company reduce customer churn.
Veracity – Veracity refers to the quality and trustworthiness of the data. Big Data gives you the opportunity to analyze all of the data, but that data contains a lot of noise. A company has to be able to transform the data into trustworthy insight and discard the noise. One example of untrustworthy noise is data generated by spambots (the fake Twitter accounts in the 2012 presidential election in Mexico are a good example).
These four Vs of Big Data demonstrate the basic challenges that Big Data tries to address, challenges that cannot be adequately addressed by traditional data management technologies.
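To get a feeling for the volume figures quoted above, here is a minimal Python sketch of the arithmetic behind them. It is only a rough illustration: the function name is my own, I assume decimal units (1 ZB = 10^21 bytes), and I read "six or seven years from now" as six or seven years after 2011. The sources do not state these assumptions themselves.

```python
# Figures quoted in the text above (worldwide data, in zettabytes).
ZB_2009 = 0.8
ZB_2011 = 1.8
ZB_PROJECTED = 35.0

# IBM's figure for data created per day, in bytes.
BYTES_PER_DAY = 2.5e18

# Assumption: decimal zettabyte, 1 ZB = 10^21 bytes.
BYTES_PER_ZB = 1e21


def annual_growth_rate(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Growth between 2009 and 2011 (two years).
observed = annual_growth_rate(ZB_2009, ZB_2011, 2)

# Assumption: the projection lies 6 or 7 years after the 2011 figure.
implied_6 = annual_growth_rate(ZB_2011, ZB_PROJECTED, 6)
implied_7 = annual_growth_rate(ZB_2011, ZB_PROJECTED, 7)

# IBM's daily figure expressed as zettabytes per year.
zb_per_year = BYTES_PER_DAY * 365 / BYTES_PER_ZB

print(f"2009-2011 growth: {observed:.0%} per year")
print(f"Implied growth to 35 ZB in 6 years: {implied_6:.0%} per year")
print(f"Implied growth to 35 ZB in 7 years: {implied_7:.0%} per year")
print(f"2.5 quintillion bytes per day is about {zb_per_year:.2f} ZB per year")
```

The 2009 and 2011 figures work out to a growth of 50% per year, since 0.8 × 1.5 × 1.5 = 1.8. The point of the sketch is simply that the projection requires this kind of compound growth to continue for several more years.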
Conclusion
We are experiencing an explosion of data, from petabytes to zettabytes. The amount of data generated every day is enormous, and most of it is unstructured. We know how to deal with structured data: we have massive data warehouses and data marts. But the world is evolving, and we need to face new challenges. Five or ten years ago we worried about reporting (business intelligence), but today it is about much more. It is about getting better insight into our businesses and about having the knowledge that helps us differentiate ourselves from our competitors and outperform them. It is about being able to listen to our customers better, to know what they think about our products and services, to know how to allocate resources optimally, to know how to manage risk in real time, and to find insight in a noisy set of sources. It is all about the valuable information hidden in data that we were not able to analyze with traditional approaches.
Bibliography
[12] IBM, “Bringing big data to the enterprise,” [Online]. Available: http://www-01.ibm.com/software/data/bigdata/. [Accessed 24 January 2013].
[13] P. C. Zikopoulos, D. deRoos, K. Parasuraman, T. Deutsch, D. Corrigan and J. Giles, Harness the Power of Big Data: The IBM Big Data Platform, McGraw-Hill, 2013.
[14] R. LeBlanc, “Why Big Data? Why Now?,” in IBM Information on Demand 2011, Las Vegas, 2011.
[15] F. J. Ohlhorst, Big Data Analytics: Turning Big Data into Big Money, New Jersey: John Wiley & Sons, Inc., 2013.