Chapter 15 References

“Apache Solr.” 2019.

“Apache Spark and CERN Open Data Analysis, an Example.” 2017.

“Apache Spark Officially Sets a New Record in Large-Scale Sorting.” 2014.

“Azure Wikipedia.” 2018.

“Big Compute Wikipedia.” 2019.

“Big Data Wikipedia.” 2019.

“Bioinformatics Applications on Apache Spark.” 2018.

Ceruzzi, Paul E. 2012. Computing: A Concise History. MIT Press.

“Churn Prediction with Apache Spark Machine Learning.” 2017.

Cleveland, William S. 2001. “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.”

“Cloudera Wikipedia.” 2018.

“Databricks Documentation.” 2018.

“Databricks Wikipedia.” 2018.

“Dataproc Wikipedia.” 2018.

Dean, Jeffrey, and Sanjay Ghemawat. 2008. “MapReduce: Simplified Data Processing on Large Clusters.” Commun. ACM 51 (1): 107–13.

French, Carl. 1996. Data Processing and Information Technology. Cengage Learning Business Press.

Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. 2003. “The Google File System.” In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. New York, NY, USA: ACM.

World Bank Group. 2016. The Data Revolution. World Bank Publications.

Hinton, Geoffrey E, Simon Osindero, and Yee-Whye Teh. 2006. “A Fast Learning Algorithm for Deep Belief Nets.” Neural Computation 18 (7): 1527–54.

“Hortonworks Microsoft.” 2018.

“Hortonworks Wikipedia.” 2018.

“IBM Cloud Wikipedia.” 2018.

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” In Advances in Neural Information Processing Systems, 1097–1105.

Kuhn, Max, and Kjell Johnson. 2019. Feature Engineering and Selection: A Practical Approach for Predictive Models.

Laudon, Kenneth C, Carol Guercio Traver, and Jane P Laudon. 1996. “Information Technology and Systems.” Cambridge, MA: Course Technology.

“MapR Wikipedia.” 2018.

“Maven Repository: Home Page.” 2019.

“Maven Repository: Repositories.” 2019.

“Profvis.” 2018.

“RStudio Connect.” 2019.

“RStudio Server Pro.” 2019.

“Running Spark on Mesos.” 2018.

“Running Spark on YARN.” 2018.

Samuel, Arthur L. 1959. “Some Studies in Machine Learning Using the Game of Checkers.” IBM Journal of Research and Development 3 (3): 210–29.

“Sort Benchmark.” 2019.

“Spark Integration with Cloud Infrastructures.” 2019.

“Spark-Solr Spark Package.” 2019.

“Spark Streaming Programming Guide.” 2018.

“Spark Wins CloudSort Benchmark as the Most Efficient Engine.” 2016.

“The History of R’s Predecessor, S, from Co-Creator Rick Becker.” 2016.

Merriam-Webster. 2006. “Merriam-Webster Online Dictionary.” Merriam-Webster.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Wu, C.F. Jeff. 1997. “Statistics = Data Science?”

Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. 1st ed. CRC Press.

Zaharia, Matei, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. “Spark: Cluster Computing with Working Sets.” HotCloud 10 (10-10): 95.

Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20.