Chapter 15 References

“Apache Solr.” 2019. http://lucene.apache.org/solr/.

“Apache Spark and CERN Open Data Analysis, an Example.” 2017. https://db-blog.web.cern.ch/blog/luca-canali/2017-08-apache-spark-and-cern-open-data-example.

“Apache Spark Officially Sets a New Record in Large-Scale Sorting.” 2014. https://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html.

“Azure Wikipedia.” 2018. https://en.wikipedia.org/wiki/Microsoft_Azure.

“Big Compute.” 2019. https://www.nimbix.net/glossary/big-compute/.

“Big Data Wikipedia.” 2019. https://en.wikipedia.org/wiki/big_data.

“Bioinformatics Applications on Apache Spark.” 2018. https://academic.oup.com/gigascience/article/7/8/giy098/5067872.

Ceruzzi, Paul E. 2012. Computing: A Concise History. MIT Press.

“Churn Prediction with Apache Spark Machine Learning.” 2017. https://mapr.com/blog/churn-prediction-sparkml/.

Cleveland, William S. 2001. “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.”

“Cloudera Wikipedia.” 2018. https://en.wikipedia.org/wiki/Cloudera.

“Databricks Documentation.” 2018. https://docs.databricks.com/spark/latest/sparkr/sparklyr.html.

“Databricks Wikipedia.” 2018. https://en.wikipedia.org/wiki/Databricks.

“Dataproc Wikipedia.” 2018. https://en.wikipedia.org/wiki/Google_Cloud_Dataproc.

Dean, Jeffrey, and Sanjay Ghemawat. 2008. “MapReduce: Simplified Data Processing on Large Clusters.” Communications of the ACM 51 (1): 107–13.

French, Carl. 1996. Data Processing and Information Technology. Cengage Learning Business Press.

Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. 2003. “The Google File System.” In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. New York, NY, USA: ACM.

World Bank Group. 2016. The Data Revolution. World Bank Publications.

Hinton, Geoffrey E, Simon Osindero, and Yee-Whye Teh. 2006. “A Fast Learning Algorithm for Deep Belief Nets.” Neural Computation 18 (7): 1527–54.

“Hortonworks Microsoft.” 2018. https://hortonworks.com/partner/microsoft/.

“Hortonworks Wikipedia.” 2018. https://en.wikipedia.org/wiki/Hortonworks.

“IBM Cloud Wikipedia.” 2018. https://en.wikipedia.org/wiki/IBM_cloud_computing.

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” In Advances in Neural Information Processing Systems, 1097–1105.

Kuhn, Max, and Kjell Johnson. 2019. “Feature Engineering and Selection: A Practical Approach for Predictive Models.” http://www.feat.engineering/.

Laudon, Kenneth C, Carol Guercio Traver, and Jane P Laudon. 1996. “Information Technology and Systems.” Cambridge, MA: Course Technology.

“MapR Wikipedia.” 2018. https://en.wikipedia.org/wiki/MapR.

“Maven Repository: Home Page.” 2019. https://mvnrepository.com.

“Maven Repository: Repositories.” 2019. https://mvnrepository.com/repos.

“Profvis.” 2018. https://rstudio.github.io/profvis/.

“RStudio Connect.” 2019. https://www.rstudio.com/products/connect/.

“RStudio Server Pro.” 2019. https://www.rstudio.com/products/rstudio-server-pro/.

“Running Spark on Mesos.” 2018. https://spark.apache.org/docs/latest/running-on-mesos.html.

“Running Spark on YARN.” 2018. https://spark.apache.org/docs/latest/running-on-yarn.html.

Samuel, Arthur L. 1959. “Some Studies in Machine Learning Using the Game of Checkers.” IBM Journal of Research and Development 3 (3): 210–29.

“Sort Benchmark.” 2019. http://sortbenchmark.org/.

“Spark Integration with Cloud Infrastructures.” 2019. https://spark.apache.org/docs/latest/cloud-integration.html.

“Spark-Solr Spark Package.” 2019. https://spark-packages.org/package/LucidWorks/spark-solr.

“Spark Streaming Programming Guide.” 2018. https://spark.apache.org/docs/latest/streaming-programming-guide.html.

“Spark Wins CloudSort Benchmark as the Most Efficient Engine.” 2016. https://spark.apache.org/news/spark-wins-cloudsort-100tb-benchmark.html.

“The History of R’s Predecessor, S, from Co-Creator Rick Becker.” 2016. https://blog.revolutionanalytics.com/2016/07/rick-becker-s-talk.html.

Merriam-Webster. 2006. “Merriam-Webster Online Dictionary.” Merriam-Webster.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Wu, C.F. Jeff. 1997. “Statistics = Data Science?”

Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. 1st ed. CRC Press.

Zaharia, Matei, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. “Spark: Cluster Computing with Working Sets.” HotCloud 10 (10-10): 95.

Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20.