Monday, 19 August 2013

Big Data, MapReduce, Hadoop and related terms


Big Data is the term used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques.
Example: Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
The real issue is not acquiring and storing big data; what matters is what you do with the data once you have it.

Big Data analytics
With Big Data and Big Data analytics it is possible to:
Analyze millions of SKUs (stock keeping units) to determine optimal prices that maximize profit and clear inventory.
Recalculate entire risk portfolios in minutes and understand future possibilities to mitigate risk.
Mine customer data for insights that drive new strategies for customer acquisition, retention, campaign optimization and next best offers.
Quickly identify customers who matter the most.
Generate retail coupons at the point of sale based on the customer's current and past purchases, ensuring a higher redemption rate.
Send tailored recommendations to mobile devices at just the right time, while customers are in the right location to take advantage of offers.
Analyze data from social media to detect new market trends and changes in demand.
Use clickstream analysis and data mining to detect fraudulent behavior.
Determine root causes of failures, issues and defects by investigating user sessions, network logs and machine sensors.
Source
About 90% of the world’s data was generated in the last two or so years.
All this new data is coming from smartphones, social networks, trading platforms, etc.
This data might be structured, semi-structured or unstructured (the majority).

MapReduce
Engineers at Google designed an algorithm to help them process the massive amounts of data they were acquiring.
The large data calculations are chopped into smaller chunks and mapped to many computers; when the calculations are done, the partial results are brought back together (reduced) to produce the resulting data set. This is called MapReduce.
This algorithm was later implemented in an open source project called Hadoop.
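
To make the idea concrete, here is a minimal sketch of the map and reduce steps as a word count, written in plain Python. It is only an illustration running in a single process, not Hadoop's actual API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample chunks are made up for this example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key (the step between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different machine;
    # here the chunks are simply processed one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two small "chunks" standing in for a large data set.
chunks = ["big data is big", "hadoop handles big data"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently and each key is reduced independently, both steps can be spread over many computers, which is the whole point of the approach.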









Driving force

The two ingredients pushing businesses to look into Hadoop are:
        1. Data volumes greater than 10 TB
        2. High calculation complexity

Hadoop will play a central role in:
     1. Statistical analysis
     2. ETL processing
     3. Business intelligence






In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers. ~ Grace Hopper


                                                 
Thanks for reading....cheers :-) :-)

