We cannot imagine a world without data storage: a world in which every fact about a person or an organization, every operation performed, and every detail worth documenting is lost immediately after use. Without stored data we lose the ability to extract valuable information and knowledge, to perform detailed analysis, and to discover new opportunities. Today, everything from customer names and addresses to available products, purchases made, and employees hired has become essential; data is the building block on which any organization depends. Big Data is similar to small data but larger in scale, and that larger scale demands different approaches, i.e. techniques, tools, and architectures; its emphasis is on solving new problems, and old problems in better ways. Big Data refers to collections of large and complex datasets that are difficult to process using traditional database management tools or data-processing applications, and its size is continuously growing.
Characteristics of Big Data
The attributes of big data (the 3 V's) are as follows:
• Volume: Volume refers to the amount of data, which is growing day by day at a very fast rate. Storing such data was initially challenging because of high storage costs; as storage costs have fallen, this problem has been largely mitigated.
• Velocity: Velocity is the rate at which data are generated and gathered for analysis. It can also be defined as the rate at which different sources produce data every day.
• Variety: Variety refers to the types of data: structured, semi-structured, and unstructured. Data now arrives in hundreds of formats, from documents and databases to spreadsheets, pictures, videos, and audio.
Big Data Architecture
The architecture of Big Data involves several distinct phases, which are discussed below:
Data Acquisition: The first phase is acquiring the data itself. As sources multiply, the rate of data generation also increases. This phase of the big data life cycle determines which platforms are needed to deliver the resulting data product. Data collection is a significant part of the process, involving the gathering of unstructured data from different sources.
Data Extraction: Not all the data generated and acquired in the first phase is useful; it includes large amounts of redundant or unimportant data. The challenges in data extraction are twofold: first, given the nature and original context of the generated data, one must decide which data to keep and which to discard; second, the lack of a common platform poses its own set of challenges. Given the broad range of data that exists, bringing it all under a common platform to standardize extraction is a major challenge.
Data Collation: Data from a single source is often not sufficient for analysis, so multiple data sources are frequently combined to give a bigger picture. For example, weather-prediction software takes data from many sources reporting daily humidity, temperature, precipitation, and so on.
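The collation step can be sketched as merging per-timestamp records from two hypothetical feeds (the feed names, timestamps, and fields are illustrative, not a real API):

```python
# Sketch: collating readings from two hypothetical weather sources,
# keyed by timestamp, into one record per observation time.
humidity_feed = {"2024-06-01T00:00": {"humidity": 78}}
temperature_feed = {"2024-06-01T00:00": {"temperature": 21.5}}

def collate(*feeds):
    """Merge per-timestamp records from several source feeds."""
    merged = {}
    for feed in feeds:
        for timestamp, reading in feed.items():
            merged.setdefault(timestamp, {}).update(reading)
    return merged

combined = collate(humidity_feed, temperature_feed)
# combined["2024-06-01T00:00"] -> {"humidity": 78, "temperature": 21.5}
```

A real system would also have to reconcile conflicting readings and differing timestamp granularities, which this toy merge sidesteps.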
Data Structuring: In this phase data is presented and stored in a structured format for further use, so that queries can easily be applied to it. Data structuring consists of organizing the data in a particular manner; many newer platforms can also query unstructured data.
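Querying structured data, as described above, can be sketched with Python's standard-library sqlite3 module (the table and values are purely illustrative):

```python
import sqlite3

# Sketch: once data is structured, standard SQL queries apply directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("alice", 30.0), ("bob", 45.5), ("alice", 12.0)],
)
# Aggregate query that would be awkward against unstructured text:
total = conn.execute(
    "SELECT SUM(amount) FROM purchases WHERE customer = ?", ("alice",)
).fetchone()[0]
conn.close()
```

The same query expressed against raw, unstructured log lines would require ad-hoc parsing, which is exactly the cost that structuring pays down.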
Data Visualization: Once the data is structured and queries have been run on it, the next step is to present it in a visual format. The data-analysis stage involves targeting areas of interest and producing results based on the structured data. Raw data cannot be used to gain insights or judge patterns, so refining the data becomes important.
Data Interpretation: The final stage of the big data life cycle involves interpreting the processed data and extracting valuable information from it. The information gained can be of two types: retrospective analysis provides insights into events and actions that have already happened, while prospective analysis judges patterns and identifies future trends from the data already generated.
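The phases above can be sketched as a minimal pipeline over toy records (all function names and data are illustrative, not a real framework):

```python
# Sketch: acquisition -> extraction -> structuring -> interpretation
# on toy sensor records.
raw = ["  temp=21 ", "temp=22", "", "temp=21"]   # acquired records

def extract(records):
    """Discard empty and duplicate records (the extraction phase)."""
    seen, kept = set(), []
    for r in records:
        r = r.strip()
        if r and r not in seen:
            seen.add(r)
            kept.append(r)
    return kept

def structure(records):
    """Parse "key=value" strings into dicts (the structuring phase)."""
    return [dict([r.split("=")]) for r in records]

def interpret(rows):
    """Derive a summary value, e.g. the mean temperature."""
    values = [float(r["temp"]) for r in rows]
    return sum(values) / len(values)

result = interpret(structure(extract(raw)))   # mean of 21 and 22 -> 21.5
```

Real pipelines distribute each stage across many machines, but the shape, a chain of transformations ending in an interpretable value, is the same.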
Big Data Analytics
Big Data Analytics mostly involves collecting data from different sources, transforming it so that it can be consumed by analysts, and finally delivering data products to users. The process of converting large amounts of unstructured raw data from different sources into a data product useful to organizations forms the core of Big Data Analytics. It is the process of collecting, organizing, and analysing large sets of data to discover patterns and other useful information. Unlike in classical statistics, we cannot design an experiment that fits our favourite statistical model; in large-scale applications of analytics, a large amount of work is needed just to clean the data before a machine learning model can use it. The most important goal of Big Data Analytics is to enable organizations to make better decisions, and it includes many software tools commonly used in advanced analytics disciplines such as predictive analytics, data mining, text analytics, and statistical analysis.
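The cleaning work described above can be sketched on toy records (the field names and imputation rule are hypothetical choices, not a standard recipe):

```python
# Sketch: cleaning raw records before modelling -
# drop exact duplicates, then impute missing values with the mean.
records = [
    {"age": 34, "spend": 120.0},
    {"age": None, "spend": 80.0},    # missing value
    {"age": 34, "spend": 120.0},     # exact duplicate
]

def clean(rows):
    seen, unique = [], []
    for r in rows:                    # de-duplicate, preserving order
        if r not in seen:
            seen.append(r)
            unique.append(r)
    known = [r["age"] for r in unique if r["age"] is not None]
    mean_age = sum(known) / len(known)
    return [{**r, "age": r["age"] if r["age"] is not None else mean_age}
            for r in unique]

cleaned = clean(records)
```

Production cleaning adds type coercion, outlier handling, and schema validation, but duplicate removal and missing-value imputation are the canonical first steps.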
Applications Of Big Data
Big data has found applications in many fields today, including the following:
Fraud detection is one of the most compelling Big Data applications. It arises mostly in transaction-processing businesses. In most cases, fraud is discovered only after the damage has been done, leaving only the options of minimizing the loss and changing policies to prevent it from happening again. Big data is widely used for fraud detection in the banking sector by uncovering malicious activity: misuse of credit and debit cards, unauthorized customers, and so on. In business more broadly, big data helps greatly in understanding customers' shopping patterns and competitors' strategies, which companies can apply to improve their sales.
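A toy stand-in for the fraud-detection idea above: flag card transactions that sit far from a customer's typical spend. The two-standard-deviation threshold and the amounts are illustrative assumptions; real systems use far richer models:

```python
import statistics

# Sketch: flag transactions more than 2 standard deviations
# from the customer's mean spend.
amounts = [23.0, 18.5, 25.0, 21.0, 19.5, 540.0, 22.5]

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)
flagged = [a for a in amounts if abs(a - mean) > 2 * stdev]
# The 540.0 transaction stands out against an otherwise ~20 profile.
```

Even this crude rule captures the essence: fraud detection is anomaly detection against a learned profile of normal behaviour.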
Conventionally, the healthcare industry lagged in using Big Data because of its limited capability to normalize and combine data. Big data analytics has since improved healthcare, for example by helping match patients with appropriate medicines. Ongoing research mines the data to see which treatments are beneficial under which circumstances, to find patterns in drug side effects, and to obtain other important information that can help patients and reduce costs. Recently developed healthcare technologies that have improved the ability to work with such data include electronic health records, imaging data, and patient-generated data.
The National Oceanic and Atmospheric Administration gathers data every minute of every day from land and sea. Weather-forecasting departments use big data daily to analyze and extract value from many terabytes of data. Given the growing evidence of global climate change, it is essential to understand everything about the weather, from what will happen tomorrow to what is coming next year. Analyzing such large datasets for weather forecasting brings several advantages: saving lives, improving quality of life, reducing risks, and more.
Challenges with Big Data
Below are a few challenges that come along with Big Data:
Data Quality – The data is often messy, inconsistent, and incomplete. Filters must be applied in such a way that useful data is not discarded.
Discovery – Finding insights in Big Data is very difficult; very powerful algorithms are needed to analyze the large data and extract its insights.
Storage – The problem of managing data becomes more serious as more data is generated, raising the question of where to store it. We need a storage system that can easily scale up or down as required.
Analytics – With Big Data, we are often unsure what kind of data we are dealing with now or will have to deal with in future, which makes the process of data analysis more difficult.
Security – Securing the stored data is a major challenge. It includes user authentication, restricting access per user, recording data-access histories, correct use of data-encryption methods, and so on.
Lack of talent – Handling all aspects of big data requires a competent team of software developers, analysts, and data scientists, and with so many Big Data projects under way in major organizations, such talent is scarce.
Big Data Privacy
The arrival of Big Data raises many challenges for data security and privacy. More and more technologies must be incorporated to handle the huge amounts of data and secure them efficiently; present technologies for securing large amounts of data are slow. Violation of privacy, especially with data relating to individuals and organizations, is a serious topic of concern. Protecting confidential data from access by unauthorized users is a very difficult task, and no complete solutions have yet been developed in this field. Organizations working with big data need to take this matter seriously and make sure that data storage and its location are properly protected from any misuse. This can be implemented by using dedicated databases and database servers, encrypting the data, employing multiple security levels and separate authentication units, and ensuring secure system operation, data transmission, and data-flow control.
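One of the protective measures mentioned above, replacing direct identifiers with salted hashes before storage, can be sketched with Python's standard hashlib. The salt value and record layout are illustrative assumptions, and salted hashing is pseudonymization rather than full encryption:

```python
import hashlib

# Sketch: pseudonymize a direct identifier before it is stored,
# so the stored record no longer reveals who the customer is.
SALT = b"per-deployment-secret"   # hypothetical, kept out of the datastore

def pseudonymize(identifier: str) -> str:
    """Deterministic salted hash: same input -> same token."""
    return hashlib.sha256(SALT + identifier.encode()).hexdigest()

record = {"customer": pseudonymize("alice@example.com"), "amount": 42.0}
```

Determinism preserves the ability to join records for the same customer, while an attacker without the salt cannot easily reverse the tokens.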
The objective of this paper is to describe, review, and reflect on big data. Big data consists of large and complex datasets generated from many sources, such as social media comments, video-game play, and email attachments. Its complexity is commonly framed in terms of velocity, variety, and volume, the three dimensions that most challenge big data analytics. This paper has elaborated on the concepts of big data, followed by its applications and the challenges it faces. Big Data is an evolving field in which much research remains to be done. At present, big data is handled by software, but many challenges in big data systems need further research attention so that new technologies can come into existence and the field can become more beneficial to society.