Use an existing open-source project from GitHub
1) git clone https://github.com/wzhe06/ipdatabase.git
2) Build the downloaded project: mvn clean package -DskipTests
3) Install the jar into your local Maven repository
mvn install:install-file -Dfile=${path-to-built-jar}/target/ipdatabase-1.0-SNAPSHOT.jar -DgroupId=com.ggstar -DartifactId=ipdatabase -Dversion=1.0 -Dpackaging=jar
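4) Add the dependencies to the pom
<dependency>
  <groupId>com.ggstar</groupId>
  <artifactId>ipdatabase</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>3.14</version>
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi</artifactId>
  <version>3.14</version>
</dependency>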
5) Copy ipDatabase.csv and ipRegion.xlsx from main/resources in the ipdatabase source tree into the resources directory of the current project
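A quick way to confirm that the copy in step 5 worked is to check that both files are visible on the classpath before running the job. The sketch below assumes the files are expected to be loaded from the classpath, which is what placing them under resources implies; `ResourceCheck` and `missing` are hypothetical names, not part of ipdatabase.

```scala
object ResourceCheck {
  // Returns the subset of `names` that cannot be found on the classpath.
  // Names are given without a leading slash, as ClassLoader.getResource expects.
  def missing(names: Seq[String]): Seq[String] =
    names.filter(name => getClass.getClassLoader.getResource(name) == null)

  def main(args: Array[String]): Unit = {
    val notFound = missing(Seq("ipDatabase.csv", "ipRegion.xlsx"))
    if (notFound.isEmpty) println("All IP data files are on the classpath")
    else println("Missing from classpath: " + notFound.mkString(", "))
  }
}
```

Running `ResourceCheck.main` from the project after a build reports any file that was not copied.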
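6) IP parsing utility class
import com.ggstar.util.ip.IpHelper

/**
 * IP parsing utility class
 */
object IpUtils {
  // Returns the region that ipdatabase resolves for the given IP address
  def getCity(ip: String): String = {
    IpHelper.findRegionByIp(ip)
  }
}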
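Log data often contains malformed or empty IP fields, so it can help to guard the lookup. The following is a minimal sketch, not part of ipdatabase: `SafeIpLookup`, `regionOrDefault`, and the "unknown" default are hypothetical, and the resolver is passed in as a function so the example runs without the jar. How `findRegionByIp` actually behaves on bad input is not stated in the source, so both exceptions and null results are treated as "no region" here.

```scala
import scala.util.Try

object SafeIpLookup {
  // `lookup` stands in for the real resolver (for example the findRegionByIp call above).
  def regionOrDefault(ip: String,
                      lookup: String => String,
                      default: String = "unknown"): String =
    Try(lookup(ip)).toOption   // a resolver exception becomes None
      .flatMap(Option(_))      // a null result becomes None
      .map(_.trim)
      .filter(_.nonEmpty)      // an empty result also falls back to the default
      .getOrElse(default)
}
```

In the job this could be called as `SafeIpLookup.regionOrDefault(ip, IpUtils.getCity)`, assuming `getCity` returns the region string.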
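7) Package the job and run it on YARN
Mark the Scala and Spark dependencies as provided in the pom so they are left out of the packaged jar; the cluster environment already supplies them.
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>${scala.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.11</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>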
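Note that the following plugin must be added to pom.xml for packaging:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
</plugin>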
Submit and run
./bin/spark-submit \
--class com.rz.log.SparkstatcleanJobYARN \
--name SparkstatcleanJobYARN \
--master yarn \
--executor-memory 1G \
--num-executors 1 \
--files /home/hadoop/lib/ipDatabase.csv,/home/hadoop/lib/ipRegion.xlsx \
/home/hadoop/lib/sql-1.0-jar-with-dependencies.jar \
hdfs://hadoop001:8020/imooc/input/* hdfs://hadoop001:8020/imooc/clean
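The two trailing arguments of the submit command are the input and output paths, which the job's main method receives as args(0) and args(1). The source does not show that main method, so the following is only a sketch of how the arguments might be validated before any Spark work starts; `JobArgs`, `Paths`, and the usage message are hypothetical names.

```scala
object JobArgs {
  final case class Paths(input: String, output: String)

  // Accepts exactly two positional arguments: <inputPath> <outputPath>.
  def parse(args: Array[String]): Either[String, Paths] = args match {
    case Array(input, output) => Right(Paths(input, output))
    case _                    => Left("Usage: <job> <inputPath> <outputPath>")
  }

  def main(args: Array[String]): Unit = parse(args) match {
    case Right(paths) => println("input=" + paths.input + ", output=" + paths.output)
    case Left(usage)  => System.err.println(usage)
  }
}
```

Failing fast on a wrong argument count avoids waiting for YARN to allocate containers only to hit an ArrayIndexOutOfBoundsException.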