Is it possible for a Spark UDF to return more than one value? If so, how are the individual items accessed in the DataFrame API?
            Viewed 4,922 times
        
- UDFs can only return a single column value. That value can be a collection or a tuple, but it cannot be several separate values. If you really need to, you can return a tuple and then split it by selecting its fields, e.g. `$"colname._1"`, `$"colname._2"`, etc. – evan.oman Dec 27 '16 at 00:37
- Related question: http://stackoverflow.com/questions/32196207/derive-multiple-columns-from-a-single-column-in-a-spark-dataframe – savx2 Dec 27 '16 at 03:26
2 Answers
You have three options:
- Return a `Seq` of items of the same type to create an `array` column: `udf(() => Seq(1.0, 2.0, 3.0))`
- Return a `Map` to create a `map` column: `udf(() => Map("x" -> 1.0, "y" -> -1.0))`
- Return a product (a tuple or an instance of a case class) to create a `struct` column: `udf(() => (1.0, "foo", 5))`
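As a rough sketch of how the three return shapes above can be used and how their individual items can then be read back, the following assumes a local `SparkSession` and a one-row DataFrame with a column `v`. The object name, the UDF names (`asArray`, `asMap`, `asStruct`), and the column names are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object MultiValueUdf {
  // Assumption: a throwaway local session, for illustration only.
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("multi-value-udf")
    .getOrCreate()
  import spark.implicits._

  // One UDF per return shape; each takes a Double so it can be applied to a column.
  val asArray  = udf((x: Double) => Seq(x, x * 2, x * 3))     // array column
  val asMap    = udf((x: Double) => Map("x" -> x, "y" -> -x)) // map column
  val asStruct = udf((x: Double) => (x, "foo", 5))            // struct column with fields _1, _2, _3

  val df = Seq(1.0).toDF("v")
    .withColumn("arr", asArray($"v"))
    .withColumn("m", asMap($"v"))
    .withColumn("s", asStruct($"v"))

  // Individual items are read with a follow-up select.
  val items = df.select(
    $"arr".getItem(1).as("arr_1"), // second element of the array
    $"m".getItem("y").as("m_y"),   // map value stored under key "y"
    $"s._1".as("s_1"),             // first tuple field
    $"s._2".as("s_2")              // second tuple field
  )
}
```

Tuple fields surface as `_1`, `_2`, and so on; returning a case class instead would give the struct fields the names of the case class members.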
 
    
    
        user7337271
        
- Thanks. How about the second part of the question? My current solution is to add an additional select operation to access individual items. Is there another way to flatten the returned values? – savx2 Dec 27 '16 at 03:29
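One possible way to flatten a struct result, sketched here under assumptions rather than taken from the answer, is star expansion: it still uses a `select`, but it avoids naming each field. The `Stats` case class, the `stats` UDF, and the column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object FlattenStructUdf {
  // Assumption: a throwaway local session, for illustration only.
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("flatten-struct-udf")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical case class; its member names become the struct field names.
  case class Stats(half: Double, label: String)

  val stats = udf((x: Double) => Stats(x / 2, "foo"))

  // $"s.*" expands every field of the struct column into a top-level column.
  val flat = Seq(4.0).toDF("v")
    .withColumn("s", stats($"v"))
    .select($"v", $"s.*")
}
```

Here `flat` has the columns `v`, `half`, and `label`, so no per-field aliasing is needed after the UDF call.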