Unless you have been living in a dark cave or deep inside a rain forest, the term Big Data should be familiar. Some business executives even wear it as a status symbol (“we deal with Big Data”), as if it were an indicator of the size and wealth of their business. Consultants, on the other hand, like to promote data collection and analytics: data is the solution! Is it? Data is also the problem.
What is Big Data? It is just a buzzword for good old data that has outgrown the processing, storage, acquisition and transportation capacity of existing systems. A business that used to deal with a few thousand rows in Excel may see Big Data in data sets exceeding a few hundred million records. Databases that are several terabytes in size are Big Data to most enterprises today but may not be so in the next few years.
Big Data comprises both structured and unstructured, or loosely structured, data and may reside in plain-text files. The volume, speed of generation and variety of Big Data pose great challenges to business enterprises but hide promising potential at the same time.
Influenced by business trends and fierce marketing, fueled by competition, and assisted by the relatively low cost of data acquisition and storage, businesses tend to collect all sorts of data, thinking that super-powerful business intelligence and analytics tools will be able to extract gold from these piles of data ore. Unfortunately, most businesses find very low ROI in such endeavors, and many get trapped deep inside their data mines and spend even more on rescue operations.
Unless one is naive enough to be sold on intelligent systems that supposedly extract knowledge and derive business insights from piles of raw data, it is clear that the journey from data acquisition to actionable knowledge is paved with hard work, hard-earned money, well-defined processes, true leadership, and careful system design and implementation. There is no one-size-fits-all recipe for data collection, processing and analytics. Failing to realize this, for lack of knowledge or lack of other options, is one of the root causes of Big Data projects going south.
Collecting more (or less) data than your business needs may turn your Big Data project into a total loss, either because of the additional transportation, processing and storage power required or because of the lack of context data to use in analytics and mining models. For example, a large store selling food items may never need to know the color of its customers’ cars (unless contracted by the automobile industry), and a car dealership need not collect data on its customers’ eating habits. Collecting more data increases maintenance and processing overhead, so more is not always better.
To calculate the ROI of a Big Data project, you need to establish goals and metrics first. Measurements and indicators are equally important. Data is not an end in itself; it is a means for deriving business insights, improving processes, achieving better customer service, increasing efficiency and sharpening your competitive edge. You have to strike a balance in data collection and retention: data management and maintenance are inevitable, and they carry a considerable price tag. There are also rules and regulations to observe, including those on user privacy, and these vary across geographical regions. A good ROI is not necessarily financial; flight data and health data may improve safety and well-being, which are equally valuable.
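The financial side of that calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ProjectEstimate` name, the three cost categories and every figure below are hypothetical, and non-financial returns such as safety or well-being are not captured by a formula like this.

```python
from dataclasses import dataclass


@dataclass
class ProjectEstimate:
    """Hypothetical yearly figures for a data project, all in one currency."""

    expected_gain: float     # value attributed to the project's stated goals
    acquisition_cost: float  # buying or collecting the data
    storage_cost: float      # keeping and retaining it
    processing_cost: float   # pipelines, tools and staff time

    @property
    def total_cost(self) -> float:
        return self.acquisition_cost + self.storage_cost + self.processing_cost

    def roi(self) -> float:
        """Return ROI as a fraction: (gain - cost) / cost."""
        if self.total_cost <= 0:
            raise ValueError("total cost must be positive")
        return (self.expected_gain - self.total_cost) / self.total_cost


# Made-up numbers, purely for illustration.
estimate = ProjectEstimate(
    expected_gain=120_000.0,
    acquisition_cost=40_000.0,
    storage_cost=25_000.0,
    processing_cost=35_000.0,
)
print(f"ROI: {estimate.roi():.0%}")
```

Collecting more data raises the storage and processing terms without necessarily raising the expected gain, which is how an over-collecting project slides toward a negative figure.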
If a data model is invalid, or the data is inaccurate or incomplete, analytics tools will not be able to turn a mess into good insights, and Big Data will backfire. Logistics is probably a good example. According to Trax Technologies, 30% of invoices are problematic, leading to major errors and losses in settlements. In some industries, knowledge has to be available in real time, and even the best and most comprehensive data will be useless if those nice dashboards take ages to render, by which time decisions have already been taken or disasters are too late to prevent.
Today, Big Data is available as a commodity, either from social media platforms through open APIs or for sale by data providers, usually at low quality. Many entrepreneurs try to derive added value from such data and sell it later as market studies, mostly predicting the future. They can make some money if they manage to sell their reports, but their profits are Big Data losses for those who buy them: predicting the future is not that simple, and the source data is usually dirty and not totally authentic, which leads to misinformed decision making.
As a business, if you are not making the most of small data, the core operational data of your business, you are likely to rock the Big Data boat. Your actual customer feedback is more important than online surveys and ratings, and randomly inspecting your QA line is usually more accurate than data mining social media comments. Small data has a higher ROI for most businesses.
This is not to say that Big Data is irrelevant or unprofitable. However, many Big Data projects end up as white elephants rather than cash cows. With Big Data comes a long laundry list of data governance tasks, and the outcome could be either a cash stream or a data tsunami.