Mahout Demo

Posted by Big Data Memo on May 22, 2017

下载安装及修改环境配置

# set mahout environment
export MAHOUT_HOME=/home/hadoop/mahout
export MAHOUT_CONF_DIR=/$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH

# SET MAHOUT hodoop env
export HADOOP_HOME_WARN_SUPPRESS=not_null

Test

kmeans聚类测试

# 下载测试数据
[hadoop@NN01 mahout]$ wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
# put to hadoop
[hadoop@NN01 mahout]$ hdfs dfs -put synthetic_control.data ./testdata
# 测试Kmeans
[hadoop@NN01 mahout]$ mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

结果:

1.0 : [distance=40.25481372701428]: [33.67,38.675,39.742,41.989,37.291,43.975,31.909,25.878,31.08,15.858,13.95,23.097,19.983,21.692,31.579,38.57,33.376,38.843,41.936,33.534,39.195,32.897,25.343,18.523,15.089,17.771,22.614,25.313,23.687,29.01,41.995,35.712,40.872,41.669,32.156,25.162,24.98,23.705,18.413,20.975,14.906,26.171,30.165,27.818,35.083,39.514,37.851,33.967,32.338,34.977,26.589,28.079,19.597,24.669,23.098,25.685,28.215,34.94,36.91,39.749]
17/05/22 13:23:59 INFO ClusterDumper: Wrote 6 clusters
17/05/22 13:23:59 INFO MahoutDriver: Program took 2341635 ms (Minutes: 39.02725)

查看输出:

[hadoop@NN01 mahout]$ hdfs dfs -ls ./output
Found 15 items
-rw-r--r--   2 hadoop supergroup        194 2017-05-22 13:21 output/_policy
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 13:23 output/clusteredPoints
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 12:50 output/clusters-0
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 12:54 output/clusters-1
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 13:21 output/clusters-10-final
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 12:57 output/clusters-2
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 13:01 output/clusters-3
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 13:04 output/clusters-4
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 13:07 output/clusters-5
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 13:10 output/clusters-6
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 13:13 output/clusters-7
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 13:16 output/clusters-8
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 13:19 output/clusters-9
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 12:50 output/data
drwxr-xr-x   - hadoop supergroup          0 2017-05-22 12:50 output/random-seeds
[hadoop@NN01 mahout]$ hdfs dfs -ls ./output/data
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2017-05-22 12:50 output/data/_SUCCESS
-rw-r--r--   2 hadoop supergroup     335470 2017-05-22 12:49 output/data/part-m-00000
[hadoop@NN01 mahout]$ hdfs dfs -ls ./output/clusters-0
Found 7 items
-rw-r--r--   2 hadoop supergroup        194 2017-05-22 12:50 output/clusters-0/_policy
-rw-r--r--   2 hadoop supergroup       1891 2017-05-22 12:50 output/clusters-0/part-00000
-rw-r--r--   2 hadoop supergroup       1891 2017-05-22 12:50 output/clusters-0/part-00001
-rw-r--r--   2 hadoop supergroup       1891 2017-05-22 12:50 output/clusters-0/part-00002
-rw-r--r--   2 hadoop supergroup       1891 2017-05-22 12:50 output/clusters-0/part-00003
-rw-r--r--   2 hadoop supergroup       1891 2017-05-22 12:50 output/clusters-0/part-00004
-rw-r--r--   2 hadoop supergroup       1891 2017-05-22 12:50 output/clusters-0/part-00005
[hadoop@NN01 mahout]$ mahout vectordump -i ./output/data/part-m-00000
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /home/hadoop/hadoop/bin/hadoop and HADOOP_CONF_DIR=/home/hadoop/hadoop/etc/hadoop
MAHOUT-JOB: /home/hadoop/mahout/mahout-examples-0.12.0-job.jar
.......
{0:34.3354,33:7.97526,25:34.0258,58:11.6989,49:11.0276,50:10.6075,8:26.7666,9:33.0887,24:34.8249,57:11.7426,59:10.1521,26:8.8228,16:29.2769,17:31.7612,23:26.5812,18:34.6501,35:9.20801,39:17.8322,27:12.6338,56:13.9132,47:14.3998,48:15.6964,51:15.1897,55:15.013,31:16.6505,52:9.07618,32:18.0782,34:9.27375,36:12.8791,38:6.97605,46:9.44524,40:13.3297,41:6.32571,45:10.4253,44:16.7155,42:12.1314,43:11.8421,29:6.27907,4:24.5189,37:12.7293,12:26.3833,54:9.84551,13:35.6614,53:17.9085,30:13.6444,28:12.6937,19:24.9401,21:26.8485,22:28.7144,20:33.4339,2:31.9529,6:27.6961,15:27.685,14:32.6631,11:26.2327,10:31.3705,1:30.9375,3:31.1457,7:29.8744,5:24.3933}
17/05/22 13:43:57 INFO MahoutDriver: Program took 13199 ms (Minutes: 0.21998333333333334)

clusterdump bug 攻略