The Dilemma of Big Data

Data processing, manipulation, storage, growth, security, privacy, backup, archiving & retrieval

Summary

  Even small data sets can be a challenge to manage in some environments. Nowadays, hundreds or thousands of applications process petabytes or exabytes of data from multiple sources in real time, at high speed and with high performance. These applications require enormous compute resources and storage space while still delivering accurate and secure results. This post dives deeply into the challenges and solutions, based on known issues and predictions of future demand.

White Paper (1): Challenges and Opportunities with Big Data

Date: Nov. 2011 to Feb. 2012

Authors:

A community white paper developed by leading researchers across the United States

Divyakant Agrawal, UC Santa Barbara 

Philip Bernstein, Microsoft 

Elisa Bertino, Purdue Univ. 

Susan Davidson, Univ. of Pennsylvania 

Umeshwar Dayal, HP 

Michael Franklin, UC Berkeley 

Johannes Gehrke, Cornell Univ. 

Laura Haas, IBM 

Alon Halevy, Google 

Jiawei Han, UIUC 

H. V. Jagadish, Univ. of Michigan (Coordinator) 

Alexandros Labrinidis, Pittsburgh Univ. 

Sam Madden, MIT 

Yannis Papakonstantinou, UC San Diego 

Jignesh M. Patel, Univ. of Wisconsin 

Raghu Ramakrishnan, Yahoo! 

Kenneth Ross, Columbia Univ. 

Cyrus Shahabi, Univ. of Southern California 

Dan Suciu, Univ. of Washington 

Shiv Vaithyanathan, IBM 

Jennifer Widom, Stanford Univ.

First Impression

Before going through this paper, I thought about big data from a narrow angle, tied entirely to what I used to do as a senior system engineer: infrastructure components such as storage, backup, archiving, etc., across multiple types of applications, where only the classic big data factors of volume, variety, and velocity would apply. That short list falls short because it leaves out other factors such as privacy and usability.

The paper replaced that view with a new vision and a broader range of concepts, which necessitate rethinking many aspects of big data.

 

Abstract:

The paper was created as the result of a distributed conversation among many researchers from academia, commercial vendors, and service providers.

The subject was taken up because the promise of big data is real: the flood of data from multiple sources and applications has become an essential part of our lifestyle.

Big data has huge potential to create tremendous economic value over the coming decades; predictions call for 140,000 to 190,000 new analytics positions and 1.5 million managers who will need to become data-literate.

Research analysis:

The authors divided the research into two main parts, as shown in Figure 1 below:

  • Phases in the processing pipeline
  • Challenges in big data analysis

The paper covers many aspects of big data and offers many predictions relevant to human interests, built on high-quality distributed discussions that draw on the researchers' backgrounds and on how society uses data.

This research empowers inspired people, institutions, organizations, and others to ask questions of data and to analyze it. Its strength comes from discussing every stage of the big data processing pipeline, with many examples of what fits big data approaches and what counts as a success factor for applications and services in use today.

The research suggests effective methods for each phase of the processing pipeline:

  • Data acquisition and recording: intelligent filtering of which data is useful to collect and which can be discarded without affecting the desired results, plus writing metadata that describes the original data along with its provenance (see the sketch after this list).
  • Information extraction: deriving structured data from the raw output of source devices.
  • Data integration, aggregation, and presentation: reusing the same data set for many functions and different kinds of processing.
  • Query processing: associating more detail with queries to avoid misleading results.
  • Interpretation: delivering results with supplementary information that documents their provenance and convinces requesters of how the result was derived.
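As a minimal sketch of the acquisition-and-recording idea above, the hypothetical Python snippet below filters incoming records with a caller-supplied predicate and attaches provenance metadata (source, timestamp, filter name, checksum) to whatever is kept. The function and field names are illustrative assumptions, not anything prescribed by the paper.

```python
import hashlib
import json
from datetime import datetime, timezone

def acquire_and_record(records, source_id, keep_if):
    """Hypothetical acquisition step: keep only useful records and
    attach provenance metadata describing where each one came from."""
    kept = []
    for record in records:
        if not keep_if(record):  # discard data not needed downstream
            continue
        payload = json.dumps(record, sort_keys=True)
        kept.append({
            "data": record,
            "provenance": {
                "source": source_id,
                "acquired_at": datetime.now(timezone.utc).isoformat(),
                "filter": keep_if.__name__,
                "checksum": hashlib.sha256(payload.encode()).hexdigest(),
            },
        })
    return kept

# Example: keep only sensor readings above a threshold.
readings = [{"sensor": "t1", "value": 3.2}, {"sensor": "t1", "value": 87.5}]

def above_threshold(r):
    return r["value"] > 50

print(acquire_and_record(readings, "plant-a/line-3", above_threshold))
```

The design choice here is simply to decide what to keep at ingest time and to record how that decision was made, so later pipeline stages can trace every result back to its origin.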

What is missing from this research?

  • Big data is real and a worldwide issue, consistently highlighted as a "research frontier" that requires tremendous global effort to make progress, yet the research includes no plan to repeat this kind of teamwork.
  • There is little discussion of standardized input formats for data that may be correlated with one another; such standards could simplify query processing, facilitate integration across a broad range of applications, and improve response time.

Conclusions:

This is the big data era, and a journey of a thousand miles begins with a single step. By changing system design architectures and combining many functions or processing steps onto fewer systems, we can avoid duplicating big data and improve data set usability by reusing the same data in many ways. This research is just the start of a series of studies on this essential subject.

References

(1) Divyakant Agrawal (UC Santa Barbara), Philip Bernstein (Microsoft), Elisa Bertino (Purdue Univ.), Susan Davidson (Univ. of Pennsylvania), Umeshwar Dayal (HP), Michael Franklin (UC Berkeley), Johannes Gehrke (Cornell Univ.), Laura Haas (IBM), Alon Halevy (Google), Jiawei Han (UIUC), H. V. Jagadish (Univ. of Michigan, Coordinator), Alexandros Labrinidis (Univ. of Pittsburgh), Sam Madden (MIT), Yannis Papakonstantinou (UC San Diego), Jignesh M. Patel (Univ. of Wisconsin), Raghu Ramakrishnan (Yahoo!), Kenneth Ross (Columbia Univ.), Cyrus Shahabi (Univ. of Southern California), Dan Suciu (Univ. of Washington), Shiv Vaithyanathan (IBM), Jennifer Widom (Stanford Univ.). "Challenges and Opportunities with Big Data," Nov. 2011 to Feb. 2012. http://cra.org/ccc/wp-content/uploads/sites/2/2015/05/bigdatawhitepaper.pdf