Big Data is a term for a massive volume of structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios, the data is too voluminous, moves too fast, or exceeds current processing capacity.
Big Data has the potential to help companies improve operations and make faster, more intelligent decisions. The data is collected from many sources, including emails, mobile devices, applications, databases, and servers. Once captured, formatted, manipulated, stored, and analyzed, this data can help a company gain useful insights that increase revenue, win or retain customers, and improve operations.
Is Big Data a Volume or a Technology?
While the term may seem to reference the volume of data, that isn't always the case. The term big data, especially when used by vendors, may refer to the technology (the tools and processes) that an organization requires to handle large amounts of data and the facilities to store it. The term is believed to have originated with web search companies that needed to query very large distributed aggregations of loosely structured data.
Here’s a handy table describing the differences between small, medium, and big data:
An example of big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records of millions of people, all from different sources (e.g., web, sales, customer contact center, social media, mobile data, and so on). The data is typically loosely structured and often incomplete and inaccessible.
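To make those units concrete, here is a minimal sketch of the arithmetic, using the binary multiples quoted above (1 petabyte = 1,024 terabytes, 1 exabyte = 1,024 petabytes); the 1 KB average record size is an illustrative assumption, not a figure from the article:

```python
# Storage-unit arithmetic with binary multiples, as quoted in the text.
TB = 1024 ** 4   # bytes in a terabyte
PB = 1024 * TB   # bytes in a petabyte
EB = 1024 * PB   # bytes in an exabyte

# Assuming an average record of 1 KB (illustrative only), a single
# petabyte holds roughly a trillion records -- consistent with the
# "billions to trillions of records" scale described above.
record_size = 1024                  # bytes, assumed for illustration
records_per_pb = PB // record_size
print(records_per_pb)               # 1,099,511,627,776 (about 1.1 trillion)
```

This is why "billions to trillions of records" follows directly from petabyte-scale volumes: even generously sized records number in the trillions per petabyte.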
When dealing with larger datasets, organizations face difficulties creating, manipulating, and managing big data. Big data is a particular problem in business analytics because standard tools and procedures are not designed to search and analyze massive datasets.
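One reason standard tools struggle is that they often assume a dataset fits in memory. A common workaround is to stream the data and aggregate incrementally, so memory use stays constant no matter how large the file grows. The sketch below illustrates the idea with a hypothetical delimited file; the function name and field layout are assumptions for illustration, not a specific product's API:

```python
from collections import Counter

def count_field_values(path, field_index, delimiter=","):
    """Count distinct values of one field in a delimited file.

    Reads the file one line at a time instead of loading it whole,
    so only the running counts (not the dataset) live in memory.
    Hypothetical helper for illustration.
    """
    counts = Counter()
    with open(path) as f:
        for line in f:
            parts = line.rstrip("\n").split(delimiter)
            if len(parts) > field_index:
                counts[parts[field_index]] += 1
    return counts
```

For example, running `count_field_values("sales.csv", 0)` would tally the first column of a sales log without ever holding more than one line of it in memory; the same pattern (stream, then aggregate) underlies most tooling built for massive datasets.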