Wednesday, 1 April 2015

BIG DATA

The biggest BOOM in the current world of technology is managing the data we generate. Nowadays we humans are generating such large amounts of data that gigabytes and terabytes look very small, in fact almost negligible, next to the data we are producing. We must do something to manage all this data. Data is generated from all sorts of sources: sharing pictures on social media, online music players, online video chats, and so on. The characteristics of this data are commonly summed up as VOLUME, VELOCITY, and VARIETY.
VOLUME: Refers to the amount of data we are generating. Today humans are generating data on the order of zettabytes (a zettabyte is 10^21 bytes). Let us look at a very simple example: suppose a person uploads just 5 pictures to Facebook, each of size 2 MB. That is only 10 MB from one person, but in a single day not just this person but millions of other people all around the world upload pictures. Looking at the numbers alone, we can imagine the amount of data we generate just by uploading pictures.
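As a quick back-of-the-envelope check, here is a tiny Python sketch of that calculation (the 5 pictures, 2 MB each, and one million uploaders are the illustrative numbers from the example above, not real measurements):

# Rough estimate of daily photo-upload volume.
# All inputs are illustrative assumptions, not measured figures.
pics_per_person = 5
size_per_pic_mb = 2
people_uploading = 1_000_000

total_mb = pics_per_person * size_per_pic_mb * people_uploading
total_tb = total_mb / (1024 * 1024)  # MB -> TB

print(f"Photo data per day: {total_mb:,} MB (~{total_tb:.1f} TB)")

So even this modest scenario produces roughly 10 TB of pictures every single day.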
VELOCITY: Refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds. For example, Google handles around 40,000 search queries every second (well over 3 billion searches a day), and hundreds of millions of photos are shared on WhatsApp every day. From this we can understand the rate at which we are generating data.
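To see how fast a per-second rate piles up, here is a minimal Python sketch that scales an ingest rate to a daily total (the 40,000 queries per second matches the figure above; the 2 KB logged per query is an assumed number for illustration):

# Scale a per-second event rate to a daily data volume.
# bytes_per_query is an illustrative assumption.
queries_per_second = 40_000
bytes_per_query = 2_000          # assume ~2 KB logged per query
seconds_per_day = 24 * 60 * 60   # 86,400 seconds

events_per_day = queries_per_second * seconds_per_day
bytes_per_day = events_per_day * bytes_per_query

print(f"Events per day: {events_per_day:,}")
print(f"Data per day: ~{bytes_per_day / 1e12:.1f} TB")

At that rate, a single source of events yields several terabytes of new data every day.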
VARIETY: Refers to the many different forms the data can take, for example pictures, videos, tweets, search queries in search engines, and so on. Handling this kind of data is really tough. Since the vast majority of the data generated is unstructured (pictures, videos, etc.), we can't easily handle it with our relational databases (e.g. MySQL). We need techniques to compress these kinds of data and new ways to store them.
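To make the structured-versus-unstructured point concrete, here is a small Python sketch of why records of different shapes don't fit one fixed relational schema (the field names and the events.jsonl file are made up for illustration):

import json

# Three "events" of different shapes: a tweet, a photo upload, a search query.
# A fixed relational table would need a column for every possible field;
# schemaless storage (here, one JSON record per line) keeps each record as-is.
events = [
    {"type": "tweet", "user": "alice", "text": "hello big data"},
    {"type": "photo", "user": "bob", "size_mb": 2, "format": "jpeg"},
    {"type": "search", "user": "carol", "query": "what is hadoop"},
]

with open("events.jsonl", "w") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")

This record-per-line style of storage is exactly the sort of thing many big-data tools are built to scan in bulk.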
So now let us formally define what big data actually is: BIG DATA is any data that a given system finds difficult to process. Suppose I have a system that can process around 1 GB of data efficiently, and I ask it to process 1 TB. It definitely wouldn't be able to handle that as efficiently as it handles the 1 GB, so for this machine the 1 TB we gave it is big data. Various technologies are emerging to handle these large amounts of unstructured data, such as Hadoop, HBase, and Hive.
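To give a flavour of how a tool like Hadoop attacks data too big for one machine, here is a minimal word-count sketch in the MapReduce style, simulating the map -> sort -> reduce pipeline locally in Python (in a real Hadoop Streaming job, the mapper and reducer would run as separate scripts spread across many machines):

import sys
from itertools import groupby

def mapper(lines):
    # Map step: emit a (word, 1) pair for every word seen.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # Reduce step: pairs arrive sorted by key, so equal words are adjacent.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    mapped = sorted(mapper(sys.stdin))  # the "shuffle/sort" phase, done locally here
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")

Running echo "big data big" | python wordcount.py prints big 2 and data 1; Hadoop's real value is doing exactly this over terabytes of input split across a whole cluster.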