The biggest BOOM in the current world of technology is
maintaining the data we generate. Nowadays we humans are generating
such enormous amounts of data that gigabytes and terabytes look tiny,
almost infinitesimally small, in comparison. We must do something to
maintain this data. Data is generally generated from various sources:
sharing pics on social media, online music players, online video chats,
and so on. The characteristics of this data can be summed up as
VOLUME, VELOCITY, and VARIETY.
VOLUME: It
refers to the amount of data we are generating. Today humans are
generating very large amounts of data, on the order of zettabytes
(a zettabyte is 10^21 bytes).
Let us look at a very simple example: suppose a person uploads a
minimum of 5 pics on Facebook, each of size, say, 2 MB. In
one day not only does this person upload pictures, but millions of
other people all around the world upload pictures too. Looking at the
numbers alone, we can imagine the amount of data we are generating just
by uploading pictures.
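To put a rough number on that example, here is a back-of-the-envelope sketch in Python. The figures (5 pics, 2 MB each, one million uploaders) are the illustrative assumptions from the paragraph above, not measured statistics:

```python
# Back-of-the-envelope estimate of daily photo-upload volume.
# All figures are illustrative assumptions from the text, not real stats.
PICS_PER_USER_PER_DAY = 5
PIC_SIZE_MB = 2
USERS = 1_000_000  # "millions of other people" -- assume one million

daily_mb = PICS_PER_USER_PER_DAY * PIC_SIZE_MB * USERS
daily_tb = daily_mb / 1_000_000  # 1 TB = 10^6 MB in decimal units

print(f"{daily_mb:,} MB per day = {daily_tb:.0f} TB per day")
# -> 10,000,000 MB per day = 10 TB per day
```

Even with these conservative numbers, photo uploads alone come to roughly 10 TB every single day.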
VELOCITY: Refers to the speed at
which new data is generated and the speed at which data moves around.
Just think of social media messages going viral in seconds. For example,
Google reportedly receives around 40,000 search queries every second,
and hundreds of thousands of pics are shared on WhatsApp every minute.
So we can understand the rate at which we are generating data.
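To see what a per-second rate like that adds up to, a tiny sketch (the 40,000 queries-per-second figure is the rough estimate quoted above):

```python
# Convert a per-second event rate into a daily total.
# The rate below is the rough Google search figure quoted above.
QUERIES_PER_SECOND = 40_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
print(f"{queries_per_day:,} queries per day")
# -> 3,456,000,000 queries per day, i.e. about 3.5 billion
```

A steady stream of tens of thousands of events per second quietly becomes billions of records per day, and all of it has to be captured and stored somewhere.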
VARIETY: Refers to the various forms of
data that can be generated, for example pictures, videos, tweets, search
queries in search engines, and so on. Handling this kind of data is
really tough. As around 95% of the data generated is
unstructured (pictures, videos, etc.), we can't possibly handle it using
our traditional relational databases (e.g., MySQL); we need to develop
techniques to compress these kinds of data and new ways to store them.
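To make the structured/unstructured distinction concrete, here is a minimal sketch; the records and field names are invented for illustration:

```python
# A structured record: every row has the same fixed fields,
# so it maps cleanly onto a relational table with a fixed schema.
structured_row = {"user_id": 42, "name": "Alice", "signup_date": "2023-01-15"}

# Unstructured / semi-structured items: each one carries different
# fields, and one even holds an opaque binary blob (the photo bytes).
varied_items = [
    {"type": "photo", "bytes": b"\x89PNG...", "tags": ["beach", "sunset"]},
    {"type": "tweet", "text": "Big data is everywhere!", "retweets": 12},
    {"type": "search", "query": "what is big data"},
]

# No single column layout fits all three items above, which is why a
# fixed-schema relational table struggles with this kind of data.
for item in varied_items:
    print(item["type"], "->", sorted(item.keys()))
```

Each item has a different shape, so forcing them into one rigid table means either dozens of mostly-empty columns or constant schema changes; that is the storage problem variety creates.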
So now let us formally define what big data actually is. BIG DATA
is any kind of data that a given system finds difficult to
process. Suppose I have a system that can process around 1 GB of data
efficiently, and I ask it to process 1 TB; it definitely
wouldn't be able to process that as efficiently as it processes the 1 GB.
So for this machine we can say that the 1 TB we provided is big
data. Various technologies are emerging to handle this large amount
of unstructured data, such as Hadoop, HBase, and Hive.
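The core trick behind tools like Hadoop is to split work that won't fit on one machine into chunks, process the chunks independently, and then merge the partial results. Here is a minimal sketch of that map-and-reduce idea in plain Python, counting words; it illustrates the programming model only and is not Hadoop's actual API:

```python
from collections import Counter

def map_chunk(lines):
    """Map step: count the words in one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Each chunk could live on a different machine; a system like Hadoop
# runs the map step where the data already sits and only ships the
# small partial results over the network.
chunks = [
    ["big data is big", "data moves fast"],
    ["volume velocity variety", "big data again"],
]
partials = [map_chunk(chunk) for chunk in chunks]
print(reduce_counts(partials).most_common(3))
# -> [('big', 3), ('data', 3), ('is', 1)]
```

Because each chunk is processed on its own, the same program scales from 1 GB on a laptop to terabytes spread across a cluster, which is exactly the situation in the 1 GB vs. 1 TB example above.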