Apache Spark, a significant component in the Hadoop Ecosystem, is a cluster computing engine used in Big Data. Building on top of the Hadoop YARN and HDFS ecosystem, it offers order-of-magnitude faster processing for many in-memory computing tasks compared to Map/Reduce. It can be programmed in Java, Scala, Python, and R – the favorite languages of Data Scientists – along with SQL-based front ends. With advanced libraries like Mahout and MLib for Machine Learning, GraphX or Neo4J for rich data graph processing as well as access to other NOSQL data stores, Rule engines and other Enterprise components, Spark is a lynchpin in modern Big Data and Data Science computing.Geared for experienced developers, Introduction to Apache Spark for Big Data & Machine Learning provides students with a comprehensive, hands-on exploration of enterprise-grade Spark programming, interacting with the significant components mentioned above to craft complete data science solutions. Students will leave this course armed with the skills they require to begin working with Spark in a practical, real world environment.
Date | Time | Price | Option |
---|---|---|---|
Please contact us at info@toptalentlearning.com or 469-721-6100 for this course schedule. |
Please note that this list of topics is based on our standard course offering, evolved from typical industry uses and trends. We’ll work with you to tune this course and level of coverage to target the skills you need most.
Spark Introduction
The first look at Spark
Spark Data structures
Caching
DataFrames and Datasets
Spark SQL
Spark and Hadoop
Spark API
Spark ML Overview
GraphX
Time Permitting Topics
Spark Streaming
Workshop
This “skills-centric” course is about 50% hands-on lab and 50% lecture, designed to train attendees in core big data/ Spark development and use skills, coupling the most current, effective techniques with the soundest industry practices. Throughout the course students will be led through a series of progressively advanced topics, where each topic consists of lecture, group discussion, comprehensive hands-on lab exercises, and lab review.
This course provides indoctrination in the practical use of the umbrella of technologies that are on the leading edge of data science development focused on Spark and related tools. Working in a hands-on learning environment, students will explore:
Need different skills or topics? If your team requires different topics or tools, additional skills or custom approach, this course may be further adjusted to accommodate. We offer additional Spark, big data, data science, Hadoop, AI / machine learning / deep learning, programming, analytics, and other related topics that may be blended with this course for a track that best suits your needs. Our team will collaborate with you to understand your needs and will target the course to focus on your specific learning objectives and goals.
This foundation-level course is geared for intermediate skilled, experienced Developers and Architects (with basic Python experience) who seek to be proficient in advanced, modern development skills working with Apache Spark in an enterprise data environment.
Students should have attended the course(s) below, or should have basic skills in these areas: