Big data has created a significant shift in enterprise technology and stands to transform
much of what the modern enterprise is today. Digital data is everywhere: global data is growing at
40% per year, and 90% of it has been created in the past two years alone. Companies capture trillions of bytes of information about their customers, suppliers,
and operations, and millions of networked sensors embedded in the physical world, in
devices such as mobile phones, energy meters, and automobiles, are sensing, creating, and communicating
data.
This data comes from everywhere: sensors used to gather climate information,
banking transactions, financial market data, transaction records of online purchases,
posts to social media sites, digital pictures and videos posted online, and
cell phone GPS signals, to name a few.
But what exactly is big data? Big data encompasses a complex and large set of diverse structured
and unstructured datasets that are difficult to process using traditional data management practices
and tools.
Big data is too big, moves too fast, or doesn't fit the structures of your database architectures.
Big data spans three V's:
Variety – Big data extends beyond structured data (such as OLTP systems) to include unstructured data of all varieties: text, audio, video, click streams, log files, and more.
Velocity – Big data is often time-sensitive and must be used as it streams into the enterprise in order to maximize its value to the business.
Volume – Big data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.
VARIETY: Structured, Unstructured, Semistructured, All the above
VELOCITY: Batch, Near time, Real time, Streams
VOLUME: Terabytes, Petabytes, Records, Transactions, Tables, files
Why Big Data
Big data is more than a challenge; it is an opportunity:
To understand trends and consumer sentiments in real time.
To find insight in new and emerging types of data.
To make your business more agile.
To answer questions that, in the past, were beyond reach.
Big Data Processing - Types
Big Data Processing Is Uniquely Suited For Some Types Of Data
Traditional object-relational techniques, supplemented by very large database (VLDB) technology, will continue to meet most data management needs. Organizations must closely examine the potential size and growth of data sources when evaluating new usage opportunities. Besides the potential to grow large, certain characteristics of data make it suitable for horizontally scalable, distributed processing:
Poorly structured, lightly structured, or unstructured data. Big data processing technologies are particularly well-suited to large volumes of lightly structured data such as web pages, blogs, and messaging protocols (email, instant messaging, and microblogs). This type of data adapts well to a hierarchical schema with sparsely populated attributes, which is the basis of many big data advancements driven by Web 2.0 companies such as Google, Facebook, and Yahoo.
Simply structured data streams. Data generated from a sensor network, such as RFID or medical equipment, can be accessed as a stream of data representing a simple structure of measured values. Distributed processing is well-suited to handling sensor network traffic, making big data processing technology a natural extension.
Binary or character-encoded file data. Images and audio files are often best represented by a hierarchical database schema in which individual records are stored as objects. The structure of these systems closely parallels that of distributed networks, making this type of data a good fit for big data processing. Social networking and search companies are at the forefront of this usage scenario.
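To make the idea of horizontally scalable processing of a simply structured stream more concrete, here is a minimal Python sketch. It is an illustration only, not a description of any particular big data product: the sensor names, the readings, and the helper functions (partition_for, partition_stream, summarize, average_by_sensor) are all hypothetical. Each reading is routed to a partition by a stable hash of its sensor id, every partition is summarized on its own (as a separate worker could do), and the per-partition results are then merged.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical sensor readings: (sensor_id, measured_value).
READINGS = [
    ("meter-1", 20.0),
    ("meter-2", 30.0),
    ("meter-1", 22.0),
    ("meter-3", 10.0),
    ("meter-2", 34.0),
]


def partition_for(sensor_id, num_partitions):
    """Map a sensor id to a partition index using a stable hash."""
    return crc32(sensor_id.encode("utf-8")) % num_partitions


def partition_stream(readings, num_partitions):
    """Split the stream so each partition could go to a separate worker."""
    partitions = [[] for _ in range(num_partitions)]
    for sensor_id, value in readings:
        index = partition_for(sensor_id, num_partitions)
        partitions[index].append((sensor_id, value))
    return partitions


def summarize(partition):
    """Per-partition work: count and sum of values for each sensor."""
    totals = defaultdict(lambda: [0, 0.0])
    for sensor_id, value in partition:
        totals[sensor_id][0] += 1
        totals[sensor_id][1] += value
    return totals


def average_by_sensor(readings, num_partitions=3):
    """Merge the per-partition summaries into one average per sensor."""
    averages = {}
    for partition in partition_stream(readings, num_partitions):
        # All readings of one sensor land in the same partition,
        # so each sensor's summary is already complete here.
        for sensor_id, (count, total) in summarize(partition).items():
            averages[sensor_id] = total / count
    return averages


if __name__ == "__main__":
    print(average_by_sensor(READINGS))
```

Because the partition is chosen from the sensor id alone, adding partitions changes only where the work happens, not the result. In a real deployment each partition would be processed on a different machine rather than in a loop.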