What is Big Data?
Goals and Objectives:
- What is Big Data?
- Parameters of Big Data
- Traditional Solutions
- Challenges of Big Data
- Hadoop - A Good Solution
- Hadoop Vs RDBMS
What is Big Data?
Big Data refers to huge volumes of data, much of it raw and unprocessed. As customers or users of a particular service, we generate lots of data every second. We need storage space to store that information and computing power to process it.
If I have generated 2 terabytes of data, is that a Big Data problem?
The answer is no: size alone does not decide it.
So, what parameters help us decide whether a particular problem is a Big Data problem?
The parameters are the 3 V’s:
- Volume: Data is generated in huge amounts, exceeding terabytes.
- Velocity: Data is generated at high speed. For example, NASA generates more than 2 gigabytes of data every hour.
- Variety: Data comes in many different forms. For example, music, video, and text are different varieties of data.
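The 3 V’s become more concrete with a little arithmetic. A steady 2 GB per hour, as in the NASA example, adds up to 2 × 24 × 365 = 17,520 GB, or about 17.5 TB, in a single year. The Python sketch below is only a rough illustration: it converts a rate into yearly volume and reports which V’s a workload exceeds. The threshold values and the function names `yearly_volume_tb` and `three_vs` are hypothetical, since the 3 V’s have no official numeric cut-offs.

```python
from __future__ import annotations

# Hypothetical thresholds, chosen only for illustration.
VOLUME_THRESHOLD_TB = 10.0             # "more than terabytes"
VELOCITY_THRESHOLD_GB_PER_HOUR = 1.0   # "generated at high speed"
VARIETY_THRESHOLD_KINDS = 3            # e.g. music, video, text


def yearly_volume_tb(rate_gb_per_hour: float) -> float:
    """Convert a steady generation rate into terabytes per year (1 TB = 1000 GB)."""
    return rate_gb_per_hour * 24 * 365 / 1000


def three_vs(volume_tb: float, rate_gb_per_hour: float, kinds: set[str]) -> dict[str, bool]:
    """Report which of the 3 V's a workload exceeds under the thresholds above."""
    return {
        "volume": volume_tb > VOLUME_THRESHOLD_TB,
        "velocity": rate_gb_per_hour > VELOCITY_THRESHOLD_GB_PER_HOUR,
        "variety": len(kinds) >= VARIETY_THRESHOLD_KINDS,
    }


if __name__ == "__main__":
    # A one-off 2 TB dump of plain text trips none of the checks.
    print(three_vs(2.0, 0.0, {"text"}))
    # A source producing 2 GB every hour, as in the NASA example.
    rate = 2.0
    print(yearly_volume_tb(rate))  # 17.52
    print(three_vs(yearly_volume_tb(rate), rate, {"music", "video", "text"}))
```

The one-off 2 TB text dump trips none of the checks, which matches the answer given earlier, while the continuous mixed-media source trips all three.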
Challenges of Big Data
We now know that people generate lots of data, and this Big Data comes with many challenges:
- Storage: Storage plays a major role. We are generating more than terabytes of data, so where do we store it? We need a database that can hold this huge amount of data.
- Computational Efficiency: A huge amount of data requires a lot of processing capability, and processing efficiency drops as the data keeps growing.
- Data Loss: Data can be lost through corruption of a physical device or hardware failure, so we need sound recovery strategies to get the data back.
- Cost: As data volume and computation needs grow, the cost of both storage and processing grows too. So we want cost-effective strategies to deal with these challenges.
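One common recovery strategy for the data loss challenge is replication: keep several copies of each block of data on different machines, so losing one machine does not lose the data. The Python sketch below illustrates the idea and also shows how it feeds the cost challenge, because every extra copy needs extra disk. It assumes a replication factor of 3 (which is also the HDFS default); the node and block names and the functions `place_replicas`, `recoverable`, and `raw_storage_gb` are hypothetical.

```python
from __future__ import annotations

import random

REPLICATION_FACTOR = 3  # assumption for this sketch


def place_replicas(blocks: list[str], nodes: list[str], factor: int, seed: int = 0) -> dict[str, set[str]]:
    """Assign each block to `factor` distinct nodes chosen at random."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, factor)) for block in blocks}


def recoverable(placement: dict[str, set[str]], failed: set[str]) -> bool:
    """Every block survives if at least one of its replicas sits on a healthy node."""
    return all(replicas - failed for replicas in placement.values())


def raw_storage_gb(data_gb: float, factor: int) -> float:
    """Replication multiplies the raw disk space needed for the same data."""
    return data_gb * factor


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(5)]
    blocks = [f"block{i}" for i in range(10)]
    placement = place_replicas(blocks, nodes, REPLICATION_FACTOR)
    print(recoverable(placement, {"node0", "node1"}))  # True: 3 copies survive 2 failures
    print(recoverable(placement, set(nodes)))          # False: nothing is left
    print(raw_storage_gb(1000, REPLICATION_FACTOR))    # 3000
```

With three copies on distinct nodes, any two node failures still leave one copy of every block, at the price of three times the raw storage.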
Traditional Solutions
Traditionally, we use relational databases such as MySQL and Oracle to structure and store data effectively. But they have limitations in terms of scalability: as the volume of data increases, the time needed to denormalize the data becomes a challenge.
Another problem with the traditional method is that it handles only structured data, which means a traditional database struggles to store unstructured data such as images, audio, and free text.
Hadoop - A Good Solution
To overcome the limitations of traditional methods, Hadoop comes into play.
Hadoop is an open-source distributed computing framework that provides:
- Support for Huge Volumes
- Storage Efficiency
- Good Data Recovery Solution
- Horizontal Scaling
- Cost-Effective
- Easy for Programmers and Non-Programmers
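Hadoop’s processing layer is built around the MapReduce model: the input is split into pieces, each piece is mapped independently (so the work can be spread across machines, which is what makes horizontal scaling possible), the intermediate results are grouped by key, and each group is reduced. The Python sketch below simulates that flow for a word count in a single process. It is not Hadoop’s own Java API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # On a cluster each split could be mapped on a different node;
    # here every phase simply runs in sequence.
    intermediate = (pair for split in splits for pair in map_phase(split))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["Big data big", "data Hadoop"]))
    # {'big': 2, 'data': 2, 'hadoop': 1}
```

Because each `map_phase` call touches only its own split, adding more machines lets more splits be processed at the same time without changing the program.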
With these features, can Hadoop replace RDBMS?
The answer is no, because Hadoop and RDBMS each have their own advantages and disadvantages.
Hadoop Vs RDBMS
- Hadoop works with a dynamic schema (schema-on-read), whereas RDBMS works with a static schema (schema-on-write).
- Hadoop scales linearly, whereas RDBMS scales non-linearly.
- Hadoop handles petabytes of data and more, whereas RDBMS typically handles gigabytes.
- Hadoop is designed for batch processing, whereas RDBMS supports both interactive and batch processing.
- Hadoop follows a ‘Write Once, Read Many Times’ model, whereas RDBMS follows ‘Read and Write Many Times’.
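The schema difference in the first point can be illustrated with a small Python sketch. Python’s built-in `sqlite3` stands in for an RDBMS: the table’s columns are fixed up front, so a record carrying an unexpected field is rejected at write time. Raw JSON lines stand in for files stored in Hadoop: records are kept as they arrive and a schema is applied only when they are read. The table, field names, and function names here are hypothetical.

```python
from __future__ import annotations

import json
import sqlite3


def schema_on_write_demo() -> str:
    """RDBMS style: columns are declared before any data arrives."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))
        # A record with an extra field does not fit the declared schema.
        conn.execute(
            "INSERT INTO users (id, name, city) VALUES (?, ?, ?)", (2, "Lin", "Pune")
        )
        return "accepted"
    except sqlite3.OperationalError:
        return "rejected"
    finally:
        conn.close()


def schema_on_read_demo(raw_lines: list[str]) -> list[tuple]:
    """Hadoop style: store raw records as-is, apply a schema only when reading."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        # Missing fields simply come back as None instead of failing the write.
        rows.append((record.get("id"), record.get("name"), record.get("city")))
    return rows


if __name__ == "__main__":
    print(schema_on_write_demo())  # rejected
    raw = ['{"id": 1, "name": "Ada"}', '{"id": 2, "name": "Lin", "city": "Pune"}']
    print(schema_on_read_demo(raw))  # [(1, 'Ada', None), (2, 'Lin', 'Pune')]
```

The trade-off mirrors the list above: the static schema catches malformed data early, while the dynamic schema accepts varied data cheaply and defers interpretation to read time.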