To remain competitive and viable, businesses now have to deal with a vast and rapidly growing sea of what has been termed ‘Big Data’. They need to be able to transform this raw data, often in real time, into more meaningful insights about their markets, customers and competitors, and to measure and manage their performance more accurately using techniques such as ‘Data Analytics’. In many cases this represents a paradigm shift from their comfort zone of approaches based more on experience, guesswork, or painstakingly constructed models of reality.
It used to be that Big Data and Data Analytics were the preserve of large global corporations but consider this definition for Big Data: When volume, velocity and variety of data exceed an organization’s storage or compute capacity for accurate and timely decision making. Big Data is a relative term. Every organization will rapidly reach a point where the volume, variety and velocity of their data will be something that they have to address.
This is probably one of our longest posts as it covers a lot of areas in relation to Big Data and Data Analytics. The intention is to provide a broad base of knowledge to help you understand how it can potentially help your business. As with our other posts, we originally issued this article via our website blog section, and we welcome you to take a look at our other resources there and to contact us to discuss your particular challenges and requirements.
A Brief History
Big Data is far from being a new concept; we just gave it a new name a few years back. Probably the earliest examples of Big Data date back to Mesopotamia 7,000 years ago, when accounting practices were introduced to record the growth of crops and herds. The first data-processing machine appeared in 1943 and was developed by the British to decipher Nazi codes during World War II. This device, named Colossus, searched for patterns in intercepted messages at a rate of 5,000 characters per second, thereby reducing the task from weeks to merely hours.
In 1965 the United States Government decided to build the first data centre to store over 742 million tax returns and 175 million sets of fingerprints by transferring all those records onto magnetic computer tape that had to be stored in a single location. The project was later dropped out of fear of ‘Big Brother’, but it is generally accepted that it was the beginning of the electronic data storage era.
In 1989 British computer scientist Tim Berners-Lee invented the World Wide Web. He wanted to facilitate the sharing of information via a ‘hypertext’ system. From the ’90s onwards the creation of data was catalyzed as more and more devices were connected to the internet. In 1995 the first super-computer was built, which was able to do as much work in a second as a calculator operated by a single person can do in 30,000 years.
It was only in 2005 that Roger Mougalas from O’Reilly Media first coined the term Big Data, only a year after the company created the term Web 2.0. The term referred to a large set of data that is almost impossible to manage and process using traditional business intelligence tools.
A Few Facts
- The number of bits of information stored in the digital universe is thought to have exceeded the number of stars in the physical universe
- If you burned all the data created in just one day onto DVDs and stacked them on top of each other, the stack would reach to the moon and back
- Every day we generate as much information as we did from the beginning of time to 2003
- 90% of existing data has been created in the past two years alone, meaning more data has been created in the past 24 months than in our entire prior history
- The NSA is thought to analyze 1.6% of all global internet traffic at around 30 petabytes (30 million gigabytes) per day, but quite staggeringly less than 1% of all global data is ever analyzed.
- Every minute we send 204 million emails, generate 1.8 million Facebook likes and post 278 thousand Tweets
- Facebook users share 30 billion pieces of data every day
- There are 1.2 billion smartphones in the world
- By 2020 we will have over 6.1 billion smartphone users globally and this will be greater than the number of fixed line subscriptions
- The number of connected Internet of Things devices will rise from 13 billion to 50 billion by 2020
- 570 new websites arrive every minute of every day
- The Internet of Things (IoT) will generate in excess of $300 billion revenue by 2020.
- By the year 2020 about 1.7 megabytes of new information will be produced every second for every human being and we will be dealing with 40 Zettabytes of data
- Data Centres now occupy an area of land equal to 60,000 football fields
- AT&T is believed to hold the world’s largest volume of data in a single DBMS at 312 terabytes with 2 trillion rows
- Estimates show that large companies with at least 10,000 employees store an average of 200 terabytes of data – and that figure climbs daily. That’s more data than all the information that was produced by humanity up to the 21st century
- By 2015 the demand for data and analytics resources will reach 4.4 million jobs globally, but only one-third of those jobs will be filled.
What is Big Data and Data Analytics?
Big Data describes a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques.
- Unstructured data: Information that is not organized or easily interpreted by traditional databases or data models, and typically, it’s text-heavy. Metadata, Twitter tweets, and other social media posts are good examples of unstructured data.
- Multi-structured data: Data formats and types which can be derived from interactions between people and machines, such as web applications or social networks.
Data Analytics is the term for the techniques used to interrogate Big Data to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. The findings can lead to more effective marketing, new revenue opportunities, better customer service, improved operational efficiency, competitive advantages over rival organizations and other business benefits. It relies on the simultaneous application of statistics, computer programming and operations research, together with data visualization to help communicate insight.
Firms may apply analytics to business data to describe, predict, and improve business performance. Specifically, areas within analytics include predictive analytics, enterprise decision management, retail analytics, store assortment and stock-keeping unit optimization, marketing optimization and marketing mix modeling, web analytics, sales force sizing and optimization, price and promotion modeling, predictive science, credit risk analysis, and fraud analytics. Since analytics can require extensive computation, the algorithms and software used for analytics harness the most current methods in computer science, statistics, and mathematics.
Big Data analytics is a rapidly evolving science and now drives nearly every aspect of our modern society, including mobile services, retail, manufacturing, financial services, life sciences, and physical sciences.
Key Uses of Big Data and Data Analytics
There are many applications for Big Data and Data Analytics and the following three highlight just a few of these to help explain their importance.
Market and Customer Understanding
Businesses can employ dynamic pricing to allow them to offer different prices at different times in different places to different consumers. This allows them to optimize revenue by incorporating real-time datasets, including supplier and inventory data, models of consumer likelihood to purchase and financial forecasts.
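As a rough illustration of the idea above, the following Python sketch adjusts a base price using two of the inputs mentioned: inventory data and a modelled likelihood to purchase. The `PricingInputs` fields, the equal weighting of the two signals and the 20% cap are all assumptions made for this example, not a recommended pricing policy.

```python
from dataclasses import dataclass


@dataclass
class PricingInputs:
    """Hypothetical inputs for one pricing decision."""
    base_price: float           # list price of the product
    stock_on_hand: int          # current inventory level
    target_stock: int           # inventory level the business wants to hold (> 0)
    purchase_likelihood: float  # modelled probability (0 to 1) that this customer buys


def dynamic_price(inputs: PricingInputs, max_adjustment: float = 0.20) -> float:
    """Return a price nudged up or down from the base price.

    Scarce stock and a high likelihood to purchase push the price up;
    surplus stock and a low likelihood push it down. The total change
    is capped at +/- max_adjustment of the base price.
    """
    # Positive when stock is below target (scarce), negative when above.
    scarcity = (inputs.target_stock - inputs.stock_on_hand) / inputs.target_stock
    # Centre the likelihood on 0.5 so it ranges from -0.5 to +0.5.
    demand = inputs.purchase_likelihood - 0.5
    # Assumed equal weighting of the two signals.
    raw_adjustment = 0.5 * scarcity + 0.5 * demand
    adjustment = max(-max_adjustment, min(max_adjustment, raw_adjustment))
    return round(inputs.base_price * (1 + adjustment), 2)


if __name__ == "__main__":
    scarce = PricingInputs(base_price=100.0, stock_on_hand=50,
                           target_stock=100, purchase_likelihood=0.9)
    surplus = PricingInputs(base_price=100.0, stock_on_hand=150,
                            target_stock=100, purchase_likelihood=0.5)
    print(dynamic_price(scarce))   # price raised, limited by the cap
    print(dynamic_price(surplus))  # price lowered, limited by the cap
```

In practice the likelihood figure would come from a model trained on real customer data, and the weights and cap would be set by the business rather than hard-coded.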
Businesses can monitor the web and social media for mentions of their brand by consumers. They can review the analytics for their own digital assets (website, microsites, blog, social media, and third-party signals) and use this information to identify potential product extensions.
Marketers can use Big Data to determine the optimal channels to place their products. Big Data can also be harnessed to test and predict likely consumer reaction to various marketing messages.
Internet of Things (IoT)
The Internet of Things consists of everyday objects embedded with electronics, software, sensors, and network connectivity, which enable these objects to collect and exchange data.
There are three main types of technology that enable IoT:
- Radio Frequency Identification (RFID) and Near-Field Communication (NFC): In the 2000s, RFID was the dominant technology. In more recent years NFC has become more prevalent
- Optical tags and Quick Response (QR) codes: Phone cameras decode QR codes using image-processing techniques.
- Bluetooth Low Energy (BLE): Newly released smartphones have BLE hardware in them. Tags based on BLE can signal their presence at a power budget that enables them to operate for up to one year on a lithium coin cell battery.
Each thing is uniquely identifiable through its embedded computing system but is able to interoperate within the existing Internet infrastructure. Experts estimate that the IoT will consist of almost 50 billion objects by 2020 and represents $14.4 trillion in value to companies and industries across the globe.
Sensors on a commercial aircraft can generate up to 20 terabytes of data an hour. Car manufacturers are incorporating technologies that continually report back data collected from onboard car sensors and dealer service systems. In the UK, Thames Water combines data from embedded connected sensors on its pipes and treatment facilities with data from a wide range of business systems. This enables the company to track and predict the cost, risk and performance of its assets, resulting in faster response times, saving money and improving customer service.
IT Operations Analytics (ITOA)
IT Operations Analytics (also known as Advanced Operational Analytics or IT Data Analytics) can provide the necessary insight to identify meaningful information buried in piles of complex data and can help IT operations teams to proactively determine risks, impacts, or the potential for outages that may come out of various events that take place in the environment (e.g., application and infrastructure changes). This enables a new way for operations to proactively manage IT system performance, availability, and security in complex and dynamic environments with fewer resources and greater speed.
ITOA contributes both to the top and bottom line of any organization by reducing operations costs, and increasing business value through greater user experience and reliability of business transactions. In addition, conventional analytics and problem-solving responses generally only respond to events that have occurred. The new generation ITOA solutions can use historical and real-time data to build predictive models of future behavior and help to mitigate risks.
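To make the idea of using historical data alongside real-time data a little more concrete, here is a deliberately simple Python sketch. It flags recent readings of a metric that sit far outside the range seen historically. This is only a statistical baseline check, much simpler than the predictive models that ITOA products build; the function name, the sample response times and the threshold of three standard deviations are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(history, recent, threshold=3.0):
    """Return the recent readings that lie more than `threshold` sample
    standard deviations from the mean of the historical readings."""
    baseline = mean(history)
    spread = stdev(history)  # needs at least two historical readings
    if spread == 0:
        # A perfectly flat history: anything different is unusual.
        return [x for x in recent if x != baseline]
    return [x for x in recent if abs(x - baseline) / spread > threshold]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds.
    historical = [100, 102, 98, 101, 99, 100, 103, 97]
    latest = [101, 99, 140, 104]
    print(flag_anomalies(historical, latest))  # only the 140 ms reading is flagged
```

A check like this could feed an alert before users notice a slowdown, which is the proactive behaviour described above.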
Big Data and Data Analytics provide a set of powerful tools to business but should not be seen as a ‘magic bullet’. Leadership teams who can build comprehensive strategies and plans to apply these technologies and techniques effectively, and who ensure that their teams are involved and enabled, will be able to leverage the benefits.
A recent employer survey by the Tech Partnership shows that Big Data Analytics represents the most significant area of IT skills gaps. While data may be part of the answer to the productivity gap, it also appears that barriers to accessing analytical talent are preventing businesses from fully harnessing its potential.
The problem is finding people with the right mix of skills. Data scientists who combine technical and mathematical skills with analytical and industry knowledge, along with the business acumen and soft skills to turn data into value for employers, are very hard to find, and they are starting to be referred to as ‘unicorns’.
Data analysts must be able to ask the right business questions, analyze the resulting data effectively, and understand the appropriate statistical techniques in order to harness the multitudes of unstructured data. Data analysts must also be able to apply a wide range of skills when extracting and analyzing data, and presenting the results to executive management or departmental managers, such as business acumen, presentation skills, database skills, analysis skills, and often coding abilities.
Big Data needs highly performant technologies to efficiently process large quantities of data in short elapsed times. The tools available to handle the volume, velocity, and variety of big data have improved greatly in recent years and, in general, are not now prohibitively expensive with much of the software available as open source. Key techniques include A/B testing, crowdsourcing, data fusion and integration, genetic algorithms, machine learning, natural language processing, signal processing, simulation, time series analysis and visualization. Hadoop, the most commonly used framework, combines commodity hardware with open-source software. It takes incoming streams of data and distributes them onto cheap disks. Hadoop also provides tools for analyzing the data.
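Hadoop's processing model is MapReduce: work is split into a map step applied to each piece of data and a reduce step that combines the results. The Python sketch below mimics that pattern in a single process to show the shape of the computation. It does not use Hadoop or any of its APIs, and the function names and sample records are made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all the values emitted for one key into a single total."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce over a list of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big insights"]))
```

In a real cluster the map calls would run on many machines next to the data they read, which is why the approach scales on commodity hardware.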
Big Data has increased the demand for information management specialists so much that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on software firms specializing in data management and analytics.
The availability of Big Data, low-cost commodity hardware, and new information management and analytic software have produced a unique moment in the history of data analysis. The convergence of these trends means that businesses of all sizes now have the capabilities to analyze vast data sets quickly and cost-effectively. This represents a genuine leap forward and a clear opportunity to realize enormous gains in terms of efficiency, productivity, revenue, and profitability.
Big Data and Data Analytics is a rapidly developing science and the next phase will be Predictive Analytics. Rather than react to insights gained through data analysis, businesses will use a combination of real-time, historical and third-party data to build progressively more sophisticated forecasts of what might happen in their business.
Research has indicated that predictive maintenance can generate savings of up to 12% over scheduled repairs, leading to a 30% reduction in maintenance costs and a 70% cut in downtime from equipment breakdowns. For a manufacturing plant or a transport company, achieving these results from data-driven decisions can add up to significant operational improvements and savings opportunities.
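To see what those percentages could mean in practice, the short Python sketch below applies the 30% and 70% figures quoted above to a made-up business. The starting budget of 1,000,000 and the 200 hours of annual downtime are purely hypothetical inputs, not figures from the research.

```python
def apply_reduction(value, percent):
    """Return `value` after reducing it by `percent` per cent."""
    return value * (100 - percent) / 100


if __name__ == "__main__":
    # Hypothetical starting point for a manufacturing plant.
    annual_maintenance_cost = 1_000_000  # in any currency
    annual_downtime_hours = 200

    new_cost = apply_reduction(annual_maintenance_cost, 30)
    new_downtime = apply_reduction(annual_downtime_hours, 70)

    print(f"Maintenance cost: {annual_maintenance_cost:,} -> {new_cost:,.0f}")
    print(f"Downtime hours:   {annual_downtime_hours} -> {new_downtime:.0f}")
```

Under these assumptions, maintenance spend would fall from 1,000,000 to 700,000 and downtime from 200 hours to 60 hours a year. The quoted percentages are upper bounds, so real results may be smaller.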
These new approaches will also enable businesses to capitalize on opportunities to market products to customers, such as targeting prospective customers after key events. Forrester analysts Rowan Curran and Mike Gualtieri in a Forrester Wave research paper entitled Big Data Predictive Analytics Solutions, Q2 2015 stated that predictive analytics have never been more relevant and easier to use, and offer ways for forward-thinking enterprises to succeed in competitive sectors.