Friday 3 May 2019

PySpark Index


         PySpark 
        • Core module
        • SQL module
         Spark Core module index
         Part – 1: Fundamentals
        • A brief introductory discussion
        • Big Data
          • The "too much data" problem
        • Challenges with Big Data
        • Who first solved the Big Data challenges?
        • Hadoop creator profile
        • What can be done with Hadoop?
          • Advantages
          • Limitations of Hadoop
          • How Spark overcomes those limitations
        • Spark creator profile
         Part – 2: Introduction
        • What is Apache Spark?
        • Purpose of Spark
        • What language is Spark written in?
        • Can Spark integrate with Hadoop?
        • What kinds of files does Spark support?
        • Does Spark depend on Hadoop?
        • Can I install Spark on Windows?
        • History of Spark
        • Where does Spark shine?
        • Spark is Fast
        • Spark features
        • Data Processing terminology
        • Spark - Before and After
        • Why Spark is called a unified stack
         Part – 3: Spark modules and terminology
        • Apache Spark Components or modules
          • Core
          • SQL
          • Streaming
          • MLlib
          • GraphX
          • SparkR
        • Cluster Managers
        • Storage Layers for Spark
        • Spark Execution Model
        • Spark Terminology table
        • Spark follows…
        • Driver program
        • Executors
        • SparkContext
        • How many SparkContext objects can be created per application?
        • Stopping the SparkContext object
        • SparkContext responsibilities
        • Spark 1.x version
        • Solution in Spark 2.x
        • Understanding Spark Cluster Architecture
        • Anatomy of Spark Application
        • Components
        • Py4J
        • Spark clusters
          • Spark clusters: Standalone cluster
          • Spark on YARN
            • YARN - client mode
            • YARN - cluster mode
         Part – 4: RDD
        • Importance of RDD
        • Partitions in RDD     
        • Creating RDD
        • Caching
        • Persistence
        • Fault-Recovery Mechanism
        • If RAM is insufficient to store an RDD, where is it stored?
        • RDD features
        • Spark RDD Operations
        • Transformations
        • Types of Transformations
          • Narrow Transformations
          • Wide Transformations
        • Actions
        • Limitation of RDD
        • RDD Operations
          • Transformations & Actions
        • Programs
        • Coalesce and repartition
        • Internals of Job execution in Spark
        • URL for more examples on the official PySpark website
         Spark SQL module index
        • Spark SQL introduction
        • How can we write Spark SQL programs?
        • Spark SQL features
          • Integrated
          • Unified data access
          • Performance optimization
        • DataFrame
        • Introduction to DataFrame
        • Creating a DataFrame by loading a CSV file
        • What file formats does DataFrame support?
        • DataFrame characteristics
        • Programming languages for creating DataFrames
        • Why DataFrame
        • Custom memory management
        • Optimized execution plan
        • Spark SQL execution plan
        • Spark SQL terminology
        • Spark SQL programs
        • URL for more examples on the official PySpark website