Monday, July 21, 2014

Pig script for finding max temperature


  Finding the max temperature using a Pig script


The input data set can be obtained here:
   https://drive.google.com/file/d/0BwiqVGNpnBVIbDZ6Q1V1RThxYXc/edit?usp=sharing

My Hadoop path is: /usr/local/hadoop/
The Hadoop user is hduser (home directory: /home/hduser)



Steps:
  • Start Hadoop:
        hduser@kaustuv-studio14:/home/kaustuv$ /usr/local/hadoop/bin/start-all.sh
  • Copy the input file into HDFS (using the -copyFromLocal command) and make sure it exists:
        hduser@kaustuv-studio14:/usr/local/hadoop$ bin/hadoop dfs -ls /home/hduser/
        (This should list the weather.txt file.)
  • Start the Pig Grunt shell in MapReduce mode:
        hduser@kaustuv-studio14:/home/kaustuv$ pig
  • Enter the following max temperature Pig script:
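-- Load each line of the input file as a single chararray field
A = load '/home/hduser/weather.txt' AS (f1:chararray);
-- Year = characters 4-7, temperature = characters 38-42 (0-based, end index exclusive);
-- the temperature is cast to double so that MAX compares numbers, not strings
B = foreach A generate SUBSTRING(f1, 4, 8) AS year, (double)TRIM(SUBSTRING(f1, 38, 43)) AS temp;
C = group B by year;
Max_temp = foreach C generate group, MAX(B.temp);
store Max_temp INTO 'MAX_Temp_Output';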
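If the temperature field is kept as a chararray, Pig's MAX compares the values as strings, character by character, which can disagree with numeric order (for example with values of different widths or with negative readings). The short Python illustration below shows the difference; Python's max() on str values is also lexicographic. The helper names and the padded negative values are hypothetical and are not taken from weather.txt.

```python
# Illustration only: string (chararray-style) MAX versus numeric (double-style) MAX.
# The function names and sample values are hypothetical.

def max_as_strings(values):
    """Return the lexicographically largest string, like MAX over a chararray column."""
    return max(values)

def max_as_numbers(values):
    """Return the numerically largest value after casting each string to float."""
    return max(float(v) for v in values)

if __name__ == "__main__":
    temps = ["93.8", "106.2"]          # strings of different widths
    print(max_as_strings(temps))       # 93.8  ('9' sorts after '1')
    print(max_as_numbers(temps))       # 106.2

    padded = [" 10.0", "-12.3"]        # hypothetical fixed-width values
    print(max_as_strings(padded))      # -12.3 ('-' sorts after ' ')
    print(max_as_numbers(padded))      # 10.0
```

Casting to a numeric type before calling MAX avoids depending on the exact width and sign of every value in the column.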


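Internally, Pig converts the script into MapReduce jobs, which Grunt launches when the STORE statement runs. Job progress can be followed in the JobTracker web interface, and the resulting files in HDFS can be browsed through the NameNode web interface.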
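To make the MapReduce translation more concrete, here is a simplified single-process Python model of what the LOAD/GROUP/MAX steps amount to: a map phase that extracts (year, temperature) pairs, a shuffle that groups values by year, and a reduce phase that keeps the maximum. It is a sketch, not Pig's actual execution plan. The character offsets mirror the SUBSTRING calls in the script; the record layout built by the hypothetical make_line helper is an assumption about weather.txt.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Map: emit (year, temperature) using the same offsets as the Pig SUBSTRING calls."""
    for line in lines:
        year = line[4:8]                  # SUBSTRING(f1, 4, 8)
        try:
            temp = float(line[38:43])     # SUBSTRING(f1, 38, 43), cast to a number
        except ValueError:
            continue                      # skip records whose temperature field is not numeric
        yield year, temp

def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle: collect all values for the same key, as Hadoop does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for year, temp in pairs:
        groups[year].append(temp)
    return groups

def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce: keep the maximum temperature for each year (the MAX(B.temp) step)."""
    return {year: max(temps) for year, temps in sorted(groups.items())}

def make_line(year: str, temp: str) -> str:
    """Hypothetical helper: build a record with the year at 4-7 and the temperature at 38-42."""
    return "XXXX" + year + " " * 30 + temp.rjust(5)

if __name__ == "__main__":
    sample = [make_line("1950", "22.0"), make_line("1950", "108.5"), make_line("1951", "-3.1")]
    print(reduce_phase(shuffle(map_phase(sample))))  # {'1950': 108.5, '1951': -3.1}
```

One detail the sketch leaves out: Pig's MAX is algebraic, so Hadoop can also compute partial maxima in a combiner on the map side before the shuffle, which reduces the data sent to the reducers.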

The output is stored in the MAX_Temp_Output directory under the user's HDFS home directory, here /user/hduser.

  • The output can be verified using the 'cat' command:
hduser@kaustuv-studio14:/usr/local/hadoop$ bin/hadoop dfs -cat /user/hduser/MAX_Temp_Output/part-r-00000

This prints:

Warning: $HADOOP_HOME is deprecated.

1941    106.2
1942    183.9
1943    176.7
1944    156.2
1945    130.6
1946    152.3
1947    191.1
1948    175.9
1949    181.1
1950    208.8
1951    168.8
1952    122.6
1953    126.5
1954    232.3
1955    130.2
1956    114.6
1957    187.7
1958    184.5
1959    229.9
1960    204.7
1961    173.8
1962    130.8
1963    187.9
1964    144.3
1965    186.1
1966    155.9
1967    173.8
1968     93.8
1969    146.4
1970    181.4
1971    136.4
1972    128.5
1973    119.9
1974    203.2
1975    132.3
1976    157.8
1977    150.6
1978    140.0
1979    158.9
1980    119.3
1981    217.2
1982    141.2
1983    122.1
1984    154.2
1985    146.0
1986    187.9
1987    219.2
1988    164.0
1989    120.2
1990    118.0
1991    142.4
1992    149.3
1993    190.6
1994    157.4
1995    145.6
1996    107.2
1997    219.0
1998    125.2
1999    143.0
2000    195.0
2001    147.4
2002    180.2
2003    111.4
2004    168.8
2005    194.4
2006    153.8
2007    155.2
2008    140.0
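Because STORE writes its output with PigStorage's default tab delimiter, each line of part-r-00000 is a year and its maximum separated by a tab. The Python sketch below (the function names are hypothetical) parses such lines and picks the year with the highest maximum; it skips anything that is not a year/temperature pair, such as the Hadoop warning line, and empty (null) maxima.

```python
from typing import Dict, Iterable, Tuple

def parse_pig_output(lines: Iterable[str]) -> Dict[str, float]:
    """Parse 'year<TAB>max_temp' records written by STORE with PigStorage's default delimiter."""
    maxima: Dict[str, float] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip blank lines or anything that is not a year/temperature pair
        year, temp = fields
        try:
            maxima[year.strip()] = float(temp)
        except ValueError:
            continue  # e.g. an empty (null) maximum
    return maxima

def hottest_year(maxima: Dict[str, float]) -> Tuple[str, float]:
    """Return the (year, temperature) pair with the highest maximum."""
    year = max(maxima, key=maxima.__getitem__)
    return year, maxima[year]

if __name__ == "__main__":
    # A few records copied from the output listing above
    sample = ["1953\t126.5\n", "1954\t232.3\n", "1955\t130.2\n"]
    maxima = parse_pig_output(sample)
    print(maxima)                # {'1953': 126.5, '1954': 232.3, '1955': 130.2}
    print(hottest_year(maxima))  # ('1954', 232.3)
```

To check the real file, the lines printed by the hadoop dfs -cat command could be passed to parse_pig_output instead of the sample list, for example by reading them from sys.stdin.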