Sunday 2 April 2017

Hadoop

Hadoop topics

1. Big Data



2. Terminology

2.1 Node
2.2 Rack
2.3 Network switch
2.4 Cluster and types
2.5 Master-Slave Architecture
2.6 Data Pipeline
2.7 Distributed data storage
2.8 Serial processing
2.9 Parallel processing
2.10 Scale out/in and up/down

3. Hadoop Basics



4. Introduction

4.1 What is Hadoop?
4.2 History
4.3 Why Hadoop?
4.4 Hadoop versions
4.5 Advantages & Limitations

5. HDFS



6. HDFS Commands


6.1 Snapshot of commands
6.2 Execution of commands

7. HDFS Configurations


7.1 What is configuration
7.2 Why should we configure

8. Challenges in HDFS


8.1 Name Node failures
8.2 Secondary Name Node failures
8.3 Data Node failures
8.4 Where HDFS fits
8.5 Where HDFS may not fit

9. Hadoop clustering


9.1 Adding new nodes
9.2 Removing the existing nodes
9.3 Checking the Dead nodes
9.4 Restarting the Dead nodes

10. MAPREDUCE


10.1 Architecture of Map Reduce
10.2 JobTracker
10.3 Why JobTracker
10.4 Role of JobTracker
10.5 TaskTracker
10.6 Why TaskTracker
10.7 Role of TaskTracker
10.8 Job execution flow

11. Hadoop Data types


11.1 What is Data type
11.2 Where Data type is useful

12. Input formats


12.1 What is input format
12.2 Importance of input format
12.3 Text
12.4 Key value
12.5 Sequence file
12.6 NLine

13. Output formats


13.1 What is output format
13.2 Importance of output format
13.3 Text
13.4 Sequence file

14. Mapper


14.1 What is Mapper
14.2 Main task of Mapper
14.3 Importance of Mapper
14.4 Advantages and limitations
14.5 Programs in Mapper
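
A minimal sketch of a Mapper, written against the standard org.apache.hadoop.mapreduce API. The class name WordCountMapper and the word-count logic are illustrative only, not part of the course material:

  import java.io.IOException;
  import java.util.StringTokenizer;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Splits each input line into words and emits a (word, 1) pair per word.
  public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
          StringTokenizer tokens = new StringTokenizer(value.toString());
          while (tokens.hasMoreTokens()) {
              word.set(tokens.nextToken());
              context.write(word, ONE);    // key = word, value = count of 1
          }
      }
  }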

15. Reducer


15.1 What is Reducer
15.2 Main task of Reducer
15.3 Importance of Reducer
15.4 Advantages and limitations
15.5 Programs in Reducer
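
A matching Reducer sketch for the Mapper above, again on the org.apache.hadoop.mapreduce API; the class name WordCountReducer is illustrative:

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  // Receives all counts emitted for one word and writes their sum.
  public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private final IntWritable result = new IntWritable();

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable value : values) {
              sum += value.get();
          }
          result.set(sum);
          context.write(key, result);
      }
  }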

16. Combiner


16.1 What is Combiner
16.2 Main task of Combiner
16.3 Importance of Combiner
16.4 Advantages and limitations
16.5 Programs in Combiner

17. Partitioner


17.1 What is Partitioner
17.2 Main task of Partitioner
17.3 Importance of Partitioner
17.4 Advantages and limitations
17.5 Programs in Partitioner
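
A small custom Partitioner sketch, assuming Text keys and IntWritable values as in the sketches above; the vowel-based routing rule is made up purely to show how getPartition decides which reducer receives a key:

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Partitioner;

  // Routes vowel-initial words to partition 0 and every other word to the last partition.
  public class VowelPartitioner extends Partitioner<Text, IntWritable> {
      @Override
      public int getPartition(Text key, IntWritable value, int numPartitions) {
          String word = key.toString().toLowerCase();
          boolean startsWithVowel = !word.isEmpty() && "aeiou".indexOf(word.charAt(0)) >= 0;
          // With a single reducer both groups collapse into partition 0.
          return startsWithVowel ? 0 : numPartitions - 1;
      }
  }

It would be registered in the driver with job.setPartitionerClass(VowelPartitioner.class) and takes effect only when more than one reducer is configured.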

18. Counter


18.1 What is Counter
18.2 Main task of Counter
18.3 Importance of Counter
18.4 Advantages and limitations
18.5 Programs in Counter
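
A brief sketch of a user-defined counter incremented from inside a map task; the group and counter names (LineAudit, EMPTY_LINES) are arbitrary examples:

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Tallies empty input lines in a custom counter and passes other lines through unchanged.
  public class LineAuditMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
          if (value.toString().trim().isEmpty()) {
              // Counter totals from all map tasks are aggregated and reported with the job status.
              context.getCounter("LineAudit", "EMPTY_LINES").increment(1);
          } else {
              context.write(value, NullWritable.get());
          }
      }
  }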

19. Distributed Cache


 19.1 What is Distributed Cache
 19.2 Main task of Distributed Cache
 19.3 Importance of Distributed Cache
 19.4 Advantages and limitations
 19.5 Programs in Distributed Cache
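
One common Distributed Cache pattern, sketched under the assumption of the Hadoop 2 mapreduce API: the driver registers a small lookup file and each map task loads it once in setup(). The path /shared/stopwords.txt and the alias "stopwords" are hypothetical:

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.HashSet;
  import java.util.Set;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Filters out stop words that were shipped to every task via the distributed cache.
  public class StopWordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      private final Set<String> stopWords = new HashSet<>();

      @Override
      protected void setup(Context context) throws IOException {
          // The driver would register the file with:
          //   job.addCacheFile(new URI("/shared/stopwords.txt#stopwords"));
          // The fragment (#stopwords) becomes a symlink in the task's working directory.
          try (BufferedReader reader = new BufferedReader(new FileReader("stopwords"))) {
              String line;
              while ((line = reader.readLine()) != null) {
                  stopWords.add(line.trim());
              }
          }
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
              if (!token.isEmpty() && !stopWords.contains(token)) {
                  context.write(new Text(token), new IntWritable(1));
              }
          }
      }
  }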

20. Joins


 20.1 Joins in Map side
 20.2 Importance of Joins in Map side
 20.3 Where Map side joins fit
 20.4 Joins in Reduce side
 20.5 Importance of Joins in Reduce side
 20.6 Where Reduce side joins fit

21. Compression


 21.1 What is Compression
 21.2 Why Compression is required
 21.3 How to enable compression
 21.4 How to disable compression
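
A sketch of turning compression on from a job driver, assuming the built-in Gzip codec and Hadoop 2 property names; leaving these calls out keeps the output uncompressed:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.compress.GzipCodec;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class CompressionSettings {
      public static void apply(Job job) {
          Configuration conf = job.getConfiguration();
          // Compress intermediate map output to reduce shuffle traffic.
          conf.setBoolean("mapreduce.map.output.compress", true);
          // Compress the final job output files with Gzip.
          FileOutputFormat.setCompressOutput(job, true);
          FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
      }
  }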

22. Map Reduce Schedulers


 22.1 What is scheduler
 22.2 Importance
 22.3 FIFO
 22.4 Capacity
 22.5 Fair

23. Map Reduce programming model


 23.1 Map Reduce jobs in java
 23.2 Map Reduce jobs in local mode
 23.3 Map Reduce jobs in pseudo mode
 23.4 Map Reduce jobs in cluster mode
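
A minimal driver sketch that ties the earlier Mapper and Reducer sketches into a runnable Java job; the class names and the input/output path arguments are illustrative, and reusing the reducer as a combiner (chapter 16) is optional:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  // Configures and submits the job; run as: hadoop jar wordcount.jar WordCountDriver <input> <output>
  public class WordCountDriver {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Job job = Job.getInstance(conf, "word count");
          job.setJarByClass(WordCountDriver.class);
          job.setMapperClass(WordCountMapper.class);
          job.setCombinerClass(WordCountReducer.class);   // reducer doubles as combiner
          job.setReducerClass(WordCountReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }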

24. YARN


 24.1 What is YARN
 24.2 YARN importance
 24.3 Where it is helpful
 24.4 YARN & Map Reduce difference

25. Apache PIG


 25.1 What it is
 25.2 Pig vs Map Reduce
 25.3 Pig vs SQL
 25.4 Data types in Pig
 25.5 Pig local mode execution
 25.6 Pig Map Reduce mode execution

26. PIG UDFs


 26.1 What is UDF
 26.2 Write UDF
 26.3 Use UDF
 26.4 Importance

27. PIG Filters


 27.1 What is Filter
 27.2 Write Filter
 27.3 Use Filter
 27.4 Importance

28. Load functions


 28.1 What is Load function
 28.2 Write Load function
 28.3 Use Load function
 28.4 Importance

29. Store functions


 29.1 Use Store function
 29.2 Importance

30. Apache HIVE


 30.1 What it is
 30.2 Architecture
 30.3 Driver
 30.4 Compiler
 30.5 Integration with Hadoop
 30.6 Hive Query Language
 30.7 Hive QL vs SQL
 30.8 Hive DDL and DML

31. Services in HIVE


 31.1 CLI
 31.2 Hive server
 31.3 HWI (Hive Web Interface)
 31.4 Metastore
 31.5 Metastore configuration

32. Metastore in HIVE


 32.1 Metastore
 32.2 Metastore configuration

33. Hive UDFs


 33.1 What is UDF
 33.2 Write UDF
 33.3 Use UDF
 33.4 Importance

34. Hive UDAFs


 34.1 Use UDAF
 34.2 Importance

35. Hive UDTFs


 35.1 Use UDTF
 35.2 Importance

36. Hive Partitions


 36.1 What are Hive partitions
 36.2 Importance
 36.3 How to write
 36.4 Limitations

37. Hive Buckets


 37.1 What is Hive bucket
 37.2 Importance
 37.3 How to write

38. Hive SerDe


 38.1 What is SerDe
 38.2 Importance
 38.3 How to write

39. Integration


 39.1 Hive and Hbase integration

40. Apache Zookeeper


 40.1 What it is
 40.2 Commands

41. Apache Hbase


 41.1 What it is
 41.2 Use cases
 41.3 Basics

42. Installation of Hbase


 42.1 Local mode
 42.2 Pseudo mode
 42.3 Cluster mode

43. Architecture of Hbase


 43.1 Architecture overview
 43.2 Storage

44. Usage of Hbase


 44.1 Key design
 44.2 Bloom filter
 44.3 Versioning
 44.4 Co-processors
 44.5 Filters

45. Hbase clients


 45.1 REST
 45.2 Thrift
 45.3 Hive
 45.4 Web based UI

46. Hbase Admin


 46.1 Schema definition
 46.2 CRUD operations

47. Apache SQOOP


 47.1 What it is
 47.2 Connecting to RDBMS using Sqoop
 47.3 Commands

48. Apache FLUME


 48.1 What it is
 48.2 Examples

49. Apache OOZIE


 49.1 What it is
 49.2 Executing workflow jobs
 49.3 Monitoring workflow jobs