We live in an era of technology where data is wealth and wealth is data. Have you ever wondered how the tech giants store their data, and how they manage to do it at such scale? Today we'll learn how these companies handle their data.
What is Big Data?
Big Data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be handled by traditional data-processing application software.
Big Data is also a term that describes the large volumes of data, both structured and unstructured, that inundate a business on a day-to-day basis. But it's not the amount of data that's important; it's what organizations do with the data that matters.
Google's Big Data Storage Problem:
Take Google, one of the biggest IT companies in the world. Roughly 20 petabytes of data land on Google's servers every day. Purchasing 20 PB of hardware every day is out of the question. And even if Google did buy 20 PB of hardware per day, storing and manipulating that much data on a single system would take an enormous amount of time. By then, you would have given up on your search.
Imagine searching for something and waiting hours for the results to load. Would you still use it? No, right? These problems are called Big Data, and the solution is also called Big Data. Amazed? There's a lot more here to amaze you.
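To get a feel for the scale, here is a rough back-of-the-envelope calculation in Python. The 200 MB/s write speed is an assumption, roughly the sequential throughput of a single ordinary hard drive:

```python
PB = 10**15                      # bytes in a petabyte (decimal)
daily_ingest = 20 * PB           # ~20 PB landing per day
disk_write_speed = 200 * 10**6   # ~200 MB/s, an assumed single-HDD write speed

seconds = daily_ingest / disk_write_speed
days = seconds / 86400
print(f"One disk would need about {days:.0f} days to write one day's data")
```

A single disk would spend over three years just writing one day's worth of data, which is why the work has to be spread across many machines in parallel.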
Some Quick Stats
Google gets over 3.5 billion searches daily.
Google remains the leader of the search engine market, with 87.35% of the global search engine market share as of January 2020. Big Data stats for 2020 show that this translates into 1.2 trillion searches yearly, and more than 40,000 search queries per second.
WhatsApp users exchange up to 65 billion messages daily.
5 million businesses are actively using the WhatsApp Business app to connect with their customers, and there are over 1 billion WhatsApp groups worldwide.
Internet users generate about 2.5 quintillion bytes of data each day.
Given the amount of data we are estimated to have by 2020 (40 zettabytes), it is worth asking what our part in creating it is. So, how much data is generated every day? 2.5 quintillion bytes. That number seems enormous, but expressed in zettabytes it is only 0.0025 ZB, which doesn't seem like much at all. Set against the 40 zettabytes expected by 2020, we are generating data at a steady pace.
By 2020, every person will generate 1.7 megabytes in just a second.
In 2019, there are 2.3 billion active Facebook users, and they generate a lot of data.
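The unit conversion behind those daily-generation figures is easy to check with a couple of lines of Python:

```python
quintillion = 10**18   # 1 quintillion (short scale)
zettabyte = 10**21     # 1 ZB in bytes (decimal)

daily_bytes = 2.5 * quintillion        # data generated per day
daily_zb = daily_bytes / zettabyte
print(daily_zb)                        # 0.0025 zettabytes per day
print(daily_zb * 365)                  # roughly 0.9 ZB per year
```

At under one zettabyte per year, the 40 ZB figure represents accumulated data, not a single year's output.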
WHICH COMPANY HAS THE MOST SERVERS?
Given the answer to the last question, you'd be forgiven for thinking the answer to this one was Google. Actually, the answer is Amazon. It hosts an estimated 1,000,000,000 gigabytes of data across more than 1,400,000 servers. Google and Microsoft are both presumed to have about 1,000,000 servers each, but neither will release exact figures.
Now let us dig deeper than the stats!
Examples of how some Tech giants are using Big Data Analytics
1. Facebook

There's a lot of data stored on Facebook, much of it the users' own content. That content is the most important asset on the service, and users need to believe it's secure; otherwise they won't share. Getting storage right is critical, and it is helping define how Facebook designs its data centers.
Instead of servers that combine compute, memory, flash storage and HDD storage, Facebook's disaggregated server model splits these components across separate racks, allowing it to tune them for specific services and to use what Qin calls "smarter hardware refreshes" to extend their useful life. By separating server resources, mixes of compute, memory and storage on different racks can be combined, for example, to deliver a set of servers that can run Hadoop. As loads and usage change, the balance of components that power a service can be adjusted, keeping inefficiencies to a minimum.
Qin notes that the key to this approach is faster networking, with the latest technologies used to build Facebook's first fabric-based data center in Iowa. The system is designed to work at speeds up to the network card's line rate, though it is not yet operating at that speed, as the service doesn't need the bandwidth. Qin expects this approach to extend the life of storage modules, since Facebook can swap out memory and CPU on a different, faster schedule.
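The idea of carving logical servers out of shared resource pools can be sketched in a few lines. The pool sizes and workload mixes below are made-up numbers for illustration, not Facebook's actual configuration:

```python
from dataclasses import dataclass

# Toy model of disaggregation: instead of fixed servers, separate pools
# of CPU, memory and storage are combined per workload.

@dataclass
class Pools:
    cpu_cores: int
    ram_gb: int
    hdd_tb: int

def allocate(pools, cpu, ram, hdd):
    """Carve a logical server for one service out of the shared pools."""
    if cpu > pools.cpu_cores or ram > pools.ram_gb or hdd > pools.hdd_tb:
        raise RuntimeError("rack pools exhausted")
    pools.cpu_cores -= cpu
    pools.ram_gb -= ram
    pools.hdd_tb -= hdd
    return {"cpu": cpu, "ram": ram, "hdd": hdd}

racks = Pools(cpu_cores=512, ram_gb=4096, hdd_tb=800)
hadoop = allocate(racks, cpu=128, ram=1024, hdd=600)  # storage-heavy mix
cache = allocate(racks, cpu=64, ram=2048, hdd=4)      # memory-heavy mix
print(racks)  # whatever remains can be rebalanced as loads change
```

Each service gets a different component mix from the same hardware, which is the efficiency argument behind the disaggregated design.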
2. Amazon

The online retail giant has access to a massive amount of data on its customers: names, addresses, payments and search histories are all filed in its data bank. While this information is put to use in advertising algorithms, Amazon also uses it to improve customer relations, an area that many big data users neglect.

Whenever you contact the Amazon help desk with a question, don't be surprised when the agent on the other end already has most of the relevant information about you at hand. That data allows for a faster, more effective customer service experience that does not involve spelling out your name multiple times.
3. American Express
The American Express Company is using big data to analyze and predict consumer behavior. By studying historical transactions and incorporating more than 100 variables, the company relies on sophisticated predictive models rather than traditional business intelligence based on hindsight.

This allows a more accurate forecast of potential churn and customer loyalty. American Express has claimed that, in its Australian market, it can predict 24% of accounts that will close within four months.
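American Express's actual models are proprietary, but the core idea of predictive churn scoring can be sketched. The features, weights and threshold below are entirely made up for illustration; a real model would learn its weights from historical transaction data across 100+ variables:

```python
import math

# Hypothetical account features; real models use far more variables.
accounts = [
    {"id": "A1", "months_inactive": 3, "spend_drop_pct": 0.6, "support_calls": 4},
    {"id": "A2", "months_inactive": 0, "spend_drop_pct": 0.1, "support_calls": 0},
]

# Illustrative hand-picked weights; in practice these are fitted, not chosen.
WEIGHTS = {"months_inactive": 0.8, "spend_drop_pct": 2.5, "support_calls": 0.4}
BIAS = -3.0

def churn_probability(acct):
    score = BIAS + sum(WEIGHTS[k] * acct[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-score))  # logistic function -> probability

for acct in accounts:
    p = churn_probability(acct)
    flag = "likely to close" if p > 0.5 else "likely to stay"
    print(f"{acct['id']}: {p:.2f} ({flag})")
```

The inactive, declining-spend account scores high, while the healthy one scores low; flagging high scorers early is what enables proactive retention offers.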
4. BDO

National accounting and audit firm BDO puts big data analytics to use in identifying risk and fraud during audits. Where finding the source of a discrepancy previously involved numerous interviews and months of labor, consulting the data first allows a significantly narrowed field and a streamlined process.

In one case, BDO Consulting Director Kirstie Tiernan noted, they were able to cut a list of thousands of vendors down to a dozen and, from there, review data individually for anomalies. A specific source was usually identified quickly.
5. Capital One
Marketing is one of the most common uses for big data, and Capital One is at the top of the game, using big data management to help ensure the success of all its customer offerings.

Through analysis of the demographics and spending habits of its customers, Capital One determines the optimal times to present various offers, thus increasing the conversion rates from its communications.

Not only does this result in better uptake; marketing strategies also become far more targeted and relevant, thereby improving budget allocation.
6. General Electric (GE)
GE is using the data from sensors on machinery like gas turbines and jet engines to identify ways to improve operating processes and reliability. The resulting reports are then passed to GE's analytics team to develop tools and improvements for increased efficiency.

The company has estimated that data could boost productivity in the US by 1.5%, which, over a 20-year period, could save enough money to raise average national incomes by as much as 30%.
7. Netflix

The entertainment streaming service has an abundance of data and analytics providing insight into the viewing habits of millions of international consumers. Netflix uses this data to commission original programming content that appeals globally, as well as to purchase the rights to films and series box sets that it knows will perform well with specific audiences.

For instance, Adam Sandler has proved unpopular in the US and UK markets in recent years, yet Netflix green-lit four new films with the actor in 2015, armed with the knowledge that his previous work had been successful in Latin America.
How do IT companies manage Big Data?
To overcome such problems, IT companies use a technique called "Distributed Data Storage", built on a master-slave topology.
Master-Slave Topology:
In this topology, one system acts as the master and the other systems act as its slaves. Suppose we have to store 100 GB of data, but each system has only a 10 GB hard drive. We can then use 10 systems, one working as the master and the other 9 as slaves. The slaves contribute their hard drives to the master over a network, so together with its own disk the master behaves as if it has 100 GB of storage. This overcomes the problem of volume. And because all the systems work in parallel, the velocity problem is solved as well. The master node is also known as the Name Node, and the slave nodes are known as Data Nodes.
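The idea can be sketched with a toy Name Node / Data Node pair in Python. Here local directories stand in for slave machines and a tiny chunk size stands in for the 64-128 MB blocks real systems use; this is a heavily simplified sketch of what frameworks like Hadoop HDFS actually do (real systems add replication, fault tolerance and parallel reads):

```python
import os
import tempfile

CHUNK_SIZE = 4  # bytes per chunk; real systems use 64-128 MB blocks

class NameNode:
    """Toy master: records where each chunk lives; slaves hold the bytes."""

    def __init__(self, data_dirs):
        self.data_dirs = data_dirs  # directories standing in for slave machines
        self.block_map = {}         # filename -> list of chunk paths

    def put(self, name, data: bytes):
        locations = []
        for i in range(0, len(data), CHUNK_SIZE):
            # Round-robin chunks across the "slave" nodes.
            target = self.data_dirs[(i // CHUNK_SIZE) % len(self.data_dirs)]
            path = os.path.join(target, f"{name}.part{i // CHUNK_SIZE}")
            with open(path, "wb") as f:
                f.write(data[i:i + CHUNK_SIZE])
            locations.append(path)
        self.block_map[name] = locations

    def get(self, name) -> bytes:
        # Reassemble the file by reading chunks back in order.
        return b"".join(open(p, "rb").read() for p in self.block_map[name])

dirs = [tempfile.mkdtemp() for _ in range(3)]  # three "slave" nodes
nn = NameNode(dirs)
nn.put("hello.txt", b"hello distributed world")
print(nn.get("hello.txt"))  # b'hello distributed world'
```

No single directory holds the whole file, yet the master can reconstruct it on demand; that separation of metadata (on the master) from data (on the slaves) is the essence of the topology.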
So let's call it a wrap! There is much more to be discussed in the next blogs; till then, stay connected. If you liked my work, please show your love through that Clap.
Meanwhile, connect with me on:
Azeemushan Ali - Trainee - LinuxWorld Informatics Pvt Ltd | LinkedIn
I'm Azeemushan Ali, a software developer working in the domains of Python, Machine Learning and Cloud. I am involved in…