Apache Mahout is an Apache project that implements scalable machine learning algorithms.
At a high level, Apache Mahout mainly deals with the following three areas.
1. Recommendation Engine.
2. Clustering.
3. Classification.
Recommendation Engine : A recommendation engine analyzes user data (preferences or ratings) and recommends products, friends, etc. that the users did not know about earlier.
Ex : Facebook suggesting friends, Amazon suggesting products that the user might be interested in.
Clustering : This family of machine learning techniques groups entities based on particular characteristics or features.
Ex : Google News groups the stories related to a news item and presents them to the user.
Classification : This part helps in automatically classifying documents and images, implementing spam filters, and can be extended to various domains.
Ex : Yahoo Mail identifying spam.
This post talks about the Mahout recommendation part and the steps to set it up on Ubuntu.
Mahout mainly focuses on collaborative filtering techniques for recommendation. Every user has different tastes and preferences, but they tend to follow some patterns. The recommendation engine tries to identify these patterns from the existing data, predicts what the users may like or prefer, and recommends items accordingly.
Mahout currently implements a collaborative filtering engine that supports user-based, item-based and Slope-One recommender systems. Other algorithms available in Mahout are the clustering algorithms k-means, fuzzy k-means, Canopy, Dirichlet and Mean-Shift. The other approach to recommendation is content-based, but a recommendation engine built around content is not generic. For example, a recommendation engine built to recommend pizzas by considering the toppings, cheese content, etc. cannot be reused to recommend books or anything in another domain. So Mahout mainly emphasizes collaborative filtering.
Installation on Ubuntu :
Prerequisites:
a. Java version 1.6 or later
b. Maven installed
c. Hadoop downloaded ( http://www.apache.org/dyn/closer.cgi/hadoop/common/ )
d. JAVA_HOME and HADOOP_HOME set (HADOOP_HOME should point to the directory extracted from the download in step c)
(The collaborative filtering algorithms are implemented on top of Apache Hadoop using the map/reduce paradigm, which is why step d is needed.)
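Step d above could look roughly like the following in a terminal or in ~/.bashrc. Both paths are assumptions made for this sketch, not values from the Mahout or Hadoop documentation; replace them with wherever Java and the extracted Hadoop directory actually live on your machine.

```shell
# Hypothetical locations: adjust both paths to your own installation.
export JAVA_HOME=/usr/lib/jvm/default-java
export HADOOP_HOME="$HOME/hadoop"

# Optional: make the hadoop launcher available on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin"

# Print the values to confirm they are set in this shell.
echo "JAVA_HOME=$JAVA_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
```

Variables exported in a terminal only last for that session, so putting the lines in ~/.bashrc saves retyping them.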
Steps :
1. Download the stable version of Mahout
https://cwiki.apache.org/confluence/display/MAHOUT/Downloads
2. Unpack the download and navigate to the Mahout directory
3. Run mvn install (the build should succeed)
Executing an Example :
The data is usually fed from a file, or it can be read directly from database tables using the JDBC data models. In the following example the data is read from a file.
Each line of the input file has the format userID,itemID,rating. If user 101 rates a book with ID 450 with a rating of 4, the entry in the input file would be
101,450,4.0
So create an input file, call it input.dat, with some relevant data
Example :
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
One more file to be supplied to the command is recommend.dat. This file just contains the ID of the user for whom the recommendations are wanted (e.g. 101; with the sample data above, the user IDs are 1 to 5).
Now run the following command on the terminal
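One way to create both files from the terminal is sketched below. The ratings are the sample lines from this post. Putting user 3 into recommend.dat is an assumption made for this sketch: the sample data only contains users 1 to 5, so one of those is used. The final awk line is just a format check and is not part of Mahout.

```shell
# Write the sample ratings (userID,itemID,rating) to input.dat.
cat > input.dat <<'EOF'
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
EOF

# The user to recommend for (assumed here: user 3 from the sample data).
echo 3 > recommend.dat

# Sanity check: every line must have exactly three comma-separated fields.
awk -F, 'NF != 3 { bad = 1 } END { exit bad + 0 }' input.dat \
  && echo "input.dat looks well-formed"
```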
$ bin/mahout recommenditembased --input input.dat --usersFile recommend.dat --numRecommendations 2 --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION
The similarity measure in the above command can be changed to any of the following
SIMILARITY_PEARSON_CORRELATION, SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD, SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK, SIMILARITY_COSINE, SIMILARITY_EUCLIDEAN_DISTANCE
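To get a feel for what an item similarity works from, here is a rough sketch of the simplest idea in the list, co-occurrence: for each pair of items, count how many users rated both. This is only an illustration written for this post, not Mahout's implementation, and the function name cooccurrence is made up. It reads lines in the userID,itemID,rating format from standard input.

```shell
# Print "itemA,itemB,count": the number of users who rated both items.
cooccurrence() {
  awk -F, '
    { items[$1] = items[$1] " " $2 }   # collect the item IDs rated by each user
    END {
      for (u in items) {
        n = split(items[u], it, " ")
        # every unordered pair of items rated by this user co-occurs once
        for (i = 1; i <= n; i++)
          for (j = i + 1; j <= n; j++) {
            a = it[i]; b = it[j]
            if (a > b) { t = a; a = b; b = t }   # keep the smaller ID first
            count[a "," b]++
          }
      }
      for (p in count) print p "," count[p]
    }' | sort
}

# Usage with the first five lines of the sample data:
printf '1,101,5.0\n1,102,3.0\n2,101,2.0\n2,102,2.5\n2,103,5.0\n' | cooccurrence
# prints:
# 101,102,2
# 101,103,1
# 102,103,1
```

Items 101 and 102 were both rated by users 1 and 2, so they get the highest count here. The other measures in the list also take the rating values into account instead of only counting.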
cheers....Thanks for Reading....... :-) :-)