Hadoop topics
1. Big Data
1.4 Big data
1.5 Who is generating...
1.6 Data measurement table
1.7 Types of Big data
1.8 Analytics
1.9 V's of Big Data
2. Terminology
2.1 Node
2.2 Rack
2.3 Network switch
2.4 Cluster and types
2.5 Master-Slave Architecture
2.6 Data Pipeline
2.7 Distributed data storage
2.8 Serial processing
2.9 Parallel processing
2.10 Scale out/in and up/down
3. Hadoop Basics
4. Introduction
4.1 What is Hadoop?
4.2 History
4.3 Why Hadoop?
4.4 Hadoop versions
4.5 Advantages & Limitations
5. HDFS
6. sixth chapter
6.1 One
6.2 Two
7. HDFS commands
7.1 Snapshot of commands
7.2 Execution of commands
8. HDFS Configurations
8.1 What is configuration
8.2 Why should we configure
8. Challenges in HDFS
8.1 Name Node failures
8.2 Secondary Name Node failures
8.3 Data Node failures
8.4 Where HDFS fits
8.5 Where HDFS may not fit
9. Hadoop clustering
9.1 Adding new nodes
9.2 Removing the existing nodes
9.3 Checking the Dead nodes
9.4 Restarting the Dead nodes
10. MAPREDUCE
10.1 Architecture of Map Reduce
10.2 JobTracker
10.3 Why JobTracker
10.4 Role of JobTracker
10.5 TaskTracker
10.6 Why TaskTracker
10.7 Role of TaskTracker
10.8 Job execution flow
11. Hadoop Data types
11.1 What is a Data type
11.2 Where Data types are useful
12. Input formats
12.1 What is input format
12.2 Importance of input format
12.3 Text
12.4 Key value
12.5 Sequence file
12.6 Nline
13. Output formats
13.1 What is output format
13.2 Importance of output format
13.3 Text
13.4 Sequence file
14. Mapper
14.1 What is Mapper
14.2 Main task of Mapper
14.3 Importance of Mapper
14.4 Advantages and limitations
14.5 Programs in Mapper
15. Reducer
15.1 What is Reducer
15.2 Main task of Reducer
15.3 Importance of Reducer
15.4 Advantages and limitations
15.5 Programs in Reducer
16. Combiner
16.1 What is Combiner
16.2 Main task of Combiner
16.3 Importance of Combiner
16.4 Advantages and limitations
16.5 Programs in Combiner
17. Partitioner
17.1 What is Partitioner
17.2 Main task of Partitioner
17.3 Importance of Partitioner
17.4 Advantages and limitations
17.5 Programs in Partitioner
18. Counter
18.1 What is Counter
18.2 Main task of Counter
18.3 Importance of Counter
18.4 Advantages and limitations
18.5 Programs in Counter
19. Distributed Cache
19.1 What is Distributed Cache
19.2 Main task of Distributed Cache
19.3 Importance of Distributed Cache
19.4 Advantages and limitations
19.5 Programs in Distributed Cache
20. Joins
20.1 Joins in Map side
20.2 Importance of Joins in Map side
20.3 Where Map side joins fit
20.4 Joins in Reduce side
20.5 Importance of Joins in Reduce side
20.6 Where Reduce side joins fit
21. Compression
21.1 What is Compression
21.2 Why Compression is required
21.3 How to enable compression
21.4 How to disable compression
22. Map Reduce Schedulers
22.1 What is scheduler
22.2 Importance
22.3 FIFO
22.4 Capacity
22.5 Fair
23. Running Map Reduce jobs
23.1 Map Reduce jobs in Java
23.2 Map Reduce jobs in local mode
23.3 Map Reduce jobs in pseudo mode
23.4 Map Reduce jobs in cluster mode
24. YARN
24.1 What is YARN
24.2 YARN importance
24.3 Where it is helpful
24.4 YARN & Map Reduce difference
25. Apache PIG
25.1 What it is
25.2 Pig vs Map Reduce
25.3 Pig vs SQL
25.4 Data types in Pig
25.5 Pig local mode execution
25.6 Pig Map Reduce mode execution
26. PIG UDFs
26.1 What is a UDF
26.2 Write UDF
26.3 Use UDF
26.4 Importance
27. Filter functions
27.1 What is a Filter function
27.2 Write Filter
27.3 Use Filter
27.4 Importance
28. Load functions
28.1 What is a Load function
28.2 Write Load function
28.3 Use Load function
28.4 Importance
29. Store functions
29.1 Use Store function
29.2 Importance
30. Apache HIVE
30.1 What it is
30.2 Architecture
30.3 Driver
30.4 Compiler
30.5 Integration with Hadoop
30.6 Hive Query Language
30.7 Hive QL vs SQL
30.8 Hive DDL and DML
31.1 CLI
31.2 Hive server
31.3 HWI (Hive Web Interface)
31.4 Metastore
31.5 Metastore configuration
32. Metastore in HIVE
32.1 Metastore
32.2 Metastore configuration
33. Hive UDFs
33.1 What is a UDF
33.2 Write UDF
33.3 Use UDF
33.4 Importance
34. Hive UDAFs
34.1 Use UDAF
34.2 Importance
35. Hive UDTFs
35.1 Use UDTF
35.2 Importance
36. Hive Partitions
36.1 What are Hive partitions
36.2 Importance
36.3 How to write
36.4 Limitations
37. Hive Buckets
37.1 What is a Hive bucket
37.2 Importance
37.3 How to write
38. Hive SerDe
38.1 What is SerDe
38.2 Importance
38.3 How to write
39. Integration
39.1 Hive and Hbase integration
40.1 What it is
40.2 Commands
41. Apache Hbase
41.1 What it is
41.2 Use cases
41.3 Basics
42. Installation of Hbase
42.1 Local mode
42.2 Pseudo mode
42.3 Cluster mode
43. Hbase Architecture
43.1 Architecture overview
43.2 Storage
44. Usage of Hbase
44.1 Key design
44.2 Bloom filter
44.3 Versioning
44.4 Co-processors
44.5 Filters
45. Hbase clients
45.1 REST
45.2 Thrift
45.3 Hive
45.4 Web based UI
46. Hbase Admin
46.1 Schema definition
46.2 CRUD operations
47. Apache SQOOP
47.1 What it is
47.2 Connecting to an RDBMS using Sqoop
47.3 Commands
48. Apache FLUME
48.1 What it is
48.2 Examples
49. Apache OOZIE
49.1 What it is
49.2 Executing workflow jobs
49.3 Monitoring workflow jobs