I'm using xsbt-proguard-plugin, which is an SBT plugin for working with Proguard.
I'm trying to come up with a Proguard configuration for a Hive Deserializer I've written, which has the following dependencies:
// project/Dependencies.scala
val hadoop      = "org.apache.hadoop"          %  "hadoop-core"          % V.hadoop
val hive        = "org.apache.hive"            %  "hive-common"          % V.hive
val serde       = "org.apache.hive"            %  "hive-serde"           % V.hive
val httpClient  = "org.apache.httpcomponents"  %  "httpclient"           % V.http 
val logging     = "commons-logging"            %  "commons-logging"      % V.logging
val specs2      = "org.specs2"                 %% "specs2"               % V.specs2      % "test"
Plus an unmanaged dependency:
// lib/UserAgentUtils-1.6.jar
Because most of these are either for local unit testing or are available within a Hadoop/Hive environment anyway, I want my minified jarfile to only include:
- The Java classes SnowPlowEventDeserializer.class and SnowPlowEventStruct.class
- org.apache.httpcomponents.httpclient
- commons-logging
- lib/UserAgentUtils-1.6.jar
But I'm really struggling to get the syntax right. Should I start from a whitelist of classes I want to keep, or explicitly filter out the Hadoop/Hive/Serde/Specs2 libraries? I'm aware of this SO question but it doesn't seem to apply here.
If I initially try the whitelist approach:
// Should be equivalent to sbt> package
import ProguardPlugin._
lazy val proguard = proguardSettings ++ Seq(
  proguardLibraryJars := Nil,
  proguardOptions := Seq(
    "-keepattributes *Annotation*,EnclosingMethod",
    "-dontskipnonpubliclibraryclassmembers",
    "-dontoptimize",
    "-dontshrink",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventDeserializer",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventStruct"
  )
)
Then I get a Hadoop processing error, so clearly Proguard is still trying to bundle Hadoop:
proguard: java.lang.IllegalArgumentException: Can't find common super class of [[Lorg/apache/hadoop/fs/FileStatus;] and [[Lorg/apache/hadoop/fs/s3/Block;]
Meanwhile if I try Proguard's filtering syntax to build up the blacklist of libraries I don't want to include:
import ProguardPlugin._
lazy val proguard = proguardSettings ++ Seq(
  proguardLibraryJars := Nil,
  proguardOptions := Seq(
    "-keepattributes *Annotation*,EnclosingMethod",
    "-dontskipnonpubliclibraryclassmembers",
    "-dontoptimize",
    "-dontshrink",
    "-injars  !*hadoop*.jar"
  )
)
Then this doesn't seem to work either:
proguard: java.io.IOException: Can't read [/home/dev/snowplow-log-deserializers/!*hadoop*.jar] (No such file or directory)
Any help greatly appreciated!
 
     
    