Example 5-13. Loading CSV with textFile() in Scala
import java.io.StringReader
import au.com.bytecode.opencsv.CSVReader
...
val input = sc.textFile(inputFile)
val result = input.map { line =>
  // Parse each line as one CSV record
  val reader = new CSVReader(new StringReader(line))
  reader.readNext()
}
Example 5-14. Loading CSV with textFile() in Java
import au.com.bytecode.opencsv.CSVReader;
import java.io.StringReader;
...
public static class ParseLine implements Function<String, String[]> {
  public String[] call(String line) throws Exception {
    CSVReader reader = new CSVReader(new StringReader(line));
    return reader.readNext();
  }
}
JavaRDD<String> csvFile1 = sc.textFile(inputFile);
JavaRDD<String[]> csvData = csvFile1.map(new ParseLine());
If fields contain embedded newlines, we need to load each file in full and
parse its entire contents, as shown in Examples 5-15 through 5-17. This is
unfortunate because each file is then read and parsed as a single record, so
large files can become a bottleneck in loading and parsing.
The different text file loading methods are described in “Loading text files” on page 73.
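To see why embedded newlines force whole-file parsing, the following minimal sketch uses only Python's standard csv module, with no Spark involved. The sample text and the helper names parse_per_line and parse_whole are hypothetical, chosen for illustration; the per-line helper stands in for what a per-line map over textFile() would do.

```python
import csv
import io

# Hypothetical sample data: the second record has a newline inside a quoted field.
text = 'holden,panda\n"spark\nfan",otter\n'

def parse_per_line(contents):
    """Parse each physical line independently, as a per-line map would."""
    rows = []
    for line in contents.splitlines():
        rows.extend(csv.reader(io.StringIO(line)))
    return rows

def parse_whole(contents):
    """Parse the full text at once so quoted newlines stay inside their field."""
    return list(csv.reader(io.StringIO(contents, newline="")))

per_line_rows = parse_per_line(text)  # the quoted record is split across rows
whole_rows = parse_whole(text)        # two records, as intended
```

Parsing the full text recovers the two intended records, while the per-line version breaks the quoted field apart, which is why the examples that follow read each file whole.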
Example 5-15. Loading CSV in full in Python
import csv
import StringIO  # Python 2 module; on Python 3, use io.StringIO instead
...
def loadRecords(fileNameContents):
    """Load all the records in a given file"""
    # fileNameContents is a (filename, contents) pair from wholeTextFiles()
    input = StringIO.StringIO(fileNameContents[1])
    reader = csv.DictReader(input, fieldnames=["name", "favoriteAnimal"])
    return reader
fullFileData = sc.wholeTextFiles(inputFile).flatMap(loadRecords)
Example 5-16. Loading CSV in full in Scala
import scala.collection.JavaConverters._
...
case class Person(name: String, favoriteAnimal: String)
val input = sc.wholeTextFiles(inputFile)
val result = input.flatMap { case (_, txt) =>
  val reader = new CSVReader(new StringReader(txt))
  // readAll() returns a java.util.List; convert it before mapping
  reader.readAll().asScala.map(x => Person(x(0), x(1)))
}