Date: from Nov. 2011 to Feb. 2012
A community white paper developed by leading researchers across the United States
Divyakant Agrawal, UC Santa Barbara
Philip Bernstein, Microsoft
Elisa Bertino, Purdue Univ.
Susan Davidson, Univ. of Pennsylvania
Umeshwar Dayal, HP
Michael Franklin, UC Berkeley
Johannes Gehrke, Cornell Univ.
Laura Haas, IBM
Alon Halevy, Google
Jiawei Han, UIUC
H. V. Jagadish, Univ. of Michigan (Coordinator)
Alexandros Labrinidis, Pittsburgh Univ.
Sam Madden, MIT
Yannis Papakonstantinou, UC San Diego
Jignesh M. Patel, Univ. of Wisconsin
Raghu Ramakrishnan, Yahoo!
Kenneth Ross, Columbia Univ.
Cyrus Shahabi, Univ. of Southern California
Dan Suciu, Univ. of Washington
Shiv Vaithyanathan, IBM
Jennifer Widom, Stanford Univ.
Before reading this paper, I thought about big data from a narrow angle, closely tied to my work as a senior system engineer on infrastructure components (storage, backup, archiving, etc.) across many types of applications, where big data is characterized only by volume, variety, and velocity. That short list falls short because it does not take into account other factors such as privacy and usability.
The paper introduces a new vision and a broad range of concepts that necessitate rethinking many aspects of big data.
The paper was created as the result of a distributed conversation among many researchers from academia, commercial vendors, and service providers.
The subject was taken up because the promise of big data is real: the flood of data from multiple sources and applications has revolutionary potential and is becoming an essential part of our lifestyle.
Big data has huge potential to create tremendous economic value in the coming decades; the predictions cited point to 140,000 to 190,000 new jobs and 1.5 million managers who will need to be hired or become data-literate.
The authors divide the research into two main parts, as shown in Figure 1:
- Phases in processing pipeline
- Challenges in big data analyses
The paper covers many aspects of big data and offers many predictions of general interest, drawing on high-quality distributed discussions that reflect the researchers' backgrounds and the ways society uses data.
This research empowers and inspires people, institutions, organizations, and others to ask questions of data and analyze it. Its strength comes from discussing every stage of the big data processing pipeline with many examples, showing which approaches suit big data and which are considered success factors in the applications and services currently in use.
The research suggests effective methods for each phase of the processing pipeline:
- Data acquisition and recording: intelligently filter the data, deciding what is useful to collect and what can be discarded without affecting the desired results, and write metadata that describes the original data along with its provenance.
- Information extraction: derive structured data from the source devices.
- Data integration, aggregation, and representation: reuse the same data set for many functions, processed in different ways.
- Query processing: attach more detail to the results to avoid misleading conclusions.
- Interpretation: deliver results with supplementary information on their provenance, so that requesters can see and trust how each result was derived.
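To make the data acquisition and recording step more concrete, here is a minimal sketch of filtering readings at the source while attaching provenance metadata to whatever is kept. It is only an illustration under stated assumptions, not the method proposed in the paper: the `Record` class, the `acquire` function, the source name `sensor-A`, and the change-threshold rule are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    """One kept reading plus metadata describing where it came from."""
    value: float
    provenance: dict = field(default_factory=dict)


def acquire(readings, source, threshold):
    """Keep a reading only if it differs from the last kept one by >= threshold.

    The threshold rule is an assumed, simplistic stand-in for the
    "intelligent filtration" discussed above.
    """
    kept = []
    last = None
    for index, value in enumerate(readings):
        if last is not None and abs(value - last) < threshold:
            continue  # near-duplicate reading: discard it
        last = value
        kept.append(
            Record(
                value,
                {
                    "source": source,                # which device produced it
                    "index": index,                  # position in the raw stream
                    "filter": f"delta>={threshold}", # rule that let it through
                    "recorded_at": datetime.now(timezone.utc).isoformat(),
                },
            )
        )
    return kept


raw = [20.0, 20.1, 20.05, 23.5, 23.6, 19.0]
records = acquire(raw, source="sensor-A", threshold=1.0)
for record in records:
    print(record.value, record.provenance["source"], record.provenance["index"])
```

Because every kept record carries its source, its position in the raw stream, and the filter rule applied, a later analysis step can explain where a result came from even though most raw readings were discarded.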
What is missing from this research?
- Big data is a real, worldwide issue and is always highlighted as a “research frontier” that requires tremendous global effort to make progress. The research includes no plan to repeat this kind of teamwork.
- There is no discussion of standardized input formats for data sets that may be correlated with each other. Such formats could ease query processing, facilitate integration across a broad range of applications, and improve response time.
It is the big data era, and a journey of a thousand miles begins with a single step. By changing system design architectures and combining many functions or processing tasks on fewer systems, we can avoid large-scale data duplication and improve data set usability by reusing the same data in many ways. This research is just the start of a series of studies to follow on this essential subject.
http://cra.org/ccc/wp-content/uploads/sites/2/2015/05/bigdatawhitepaper.pdf