Commit 03ad064

Task/scala code/ch10 (#22)
* initial commit
* Average Monoid Use Aggregate By Key
* ch10-TopNUseTakeOrdered
* ch10-TopNUseMapPartitions
* log4j
* TopNUseMapPartitions
* Task/scala code/ch10 biman (#21)
* Average Monoid Use Aggregate By Key
* Average Monoid Use Combine By Key
* Average Monoid Use group By Key and reduce by key
* created the file for DNABaseCountBasicUsingCombineByKey
* DNABaseCountBasic Using Combine By Key
* DNABaseCountBasic Using Group By Key
* DNABaseCountBasic Using Reduce By Key
* DNABaseCountBasic
* DNABaseCountBasicInMapperCombinerUsingReduceByKey
* InMapper Combiner using local Aggregation
* InMapper Combiner using Map Reduce
* InMapper Combiner using MapPartitions
* Min Max force Empty Partitions
* Min Max force Map Partitions
* Min Max force Map Partitions
* StructuredToHierarchicalToXmlRDD
* StructuredToHierarchicalToXmlRDD
* StructuredToHierarchicalToXmlDF
* added the scripts
* added the readme

Co-authored-by: Deepak <[email protected]>
1 parent 804ea2b commit 03ad064

61 files changed: +2751 -26 lines changed

Some content of this large commit is hidden.

.gitignore

+1-2
@@ -3,5 +3,4 @@
 build
 .gradle
 .idea
-!gradle-wrapper.jar
-scala
+!gradle-wrapper.jar

code/chap01/scala/.gitignore

-4
This file was deleted.

code/chap02/scala/.gitignore

-4
This file was deleted.

code/chap03/scala/.gitignore

-3
This file was deleted.

code/chap04/scala/.gitignore

-3
This file was deleted.

code/chap06/scala/.gitignore

-3
This file was deleted.

code/chap07/scala/.gitignore

-3
This file was deleted.

code/chap08/scala/.gitignore

-3
This file was deleted.

code/chap10/scala/.gitignore

+1
@@ -0,0 +1 @@
+data/*.gz

code/chap10/scala/README.md

+63-1
@@ -1 +1,63 @@
-Scala Solutions
+# Chapter 10
+
+The Program covers the following algorithms
+* ###Average Monoid Use Aggregate By Key
+* `org.data.algorithms.spark.ch10.AverageMonoidUseAggregateByKey` (Spark Program)
+* `./run_spark_applications_scripts/average_monoid_use_aggregate_by_key.sh` (Shell Script to call the spark application)
+* ###Average Monoid Use Combine By Key
+* `org.data.algorithms.spark.ch10.AverageMonoidUseCombineByKey` (Spark Program)
+* `./run_spark_applications_scripts/average_monoid_use_combine_by_key.sh` (Shell Script to call the spark application)
+* ###Average Monoid Use Group By Key
+* `org.data.algorithms.spark.ch10.AverageMonoidUseGroupByKey` (Spark Program)
+* `./run_spark_applications_scripts/average_monoid_use_group_by_key.sh` (Shell Script to call the spark application)
+* ###Average Monoid Use Reduce By Key
+* `org.data.algorithms.spark.ch10.AverageMonoidUseReduceByKey` (Spark Program)
+* `./run_spark_applications_scripts/average_monoid_use_reduce_by_key.sh` (Shell Script to call the spark application)
+* ###DNA Base Count Basic inmapper Combiner Using Combine By Key
+* `org.data.algorithms.spark.ch10.DNABaseCountBasicInMapperCombinerUsingCombineByKey` (Spark Program)
+* `./run_spark_applications_scripts/dna_base_count_basic_in_mapper_combiner_using_combine_by_key.sh` (Shell Script to call the spark application)
+* ###DNA Base Count Basic inmapper Combiner Using Group By Key
+* `org.data.algorithms.spark.ch10.DNABaseCountBasicInMapperCombinerUsingGroupByKey` (Spark Program)
+* `./run_spark_applications_scripts/dna_base_count_basic_in_mapper_combiner_using_group_by_key.sh` (Shell Script to call the spark application)
+* ###DNA Base Count Basic inmapper Combiner Using Reduce By Key
+* `org.data.algorithms.spark.ch10.DNABaseCountBasicInMapperCombinerUsingReduceByKey` (Spark Program)
+* `./run_spark_applications_scripts/dna_base_count_basic_in_mapper_combiner_using_reduce_by_key.sh` (Shell Script to call the spark application)
+* ###DNA Base Count Basic Using Combine By Key
+* `org.data.algorithms.spark.ch10.DNABaseCountBasicUsingCombineByKey` (Spark Program)
+* `./run_spark_applications_scripts/dna_base_count_basic_using_combine_by_key.sh` (Shell Script to call the spark application)
+* ###DNA Base Count Basic Using Group By Key
+* `org.data.algorithms.spark.ch10.DNABaseCountBasicUsingGroupByKey` (Spark Program)
+* `./run_spark_applications_scripts/dna_base_count_basic_using_group_by_key.sh` (Shell Script to call the spark application)
+* ###DNA Base Count Basic Using Mappartitions
+* `org.data.algorithms.spark.ch10.DNABaseCountBasicUsingMappartitions` (Spark Program)
+* `./run_spark_applications_scripts/dna_base_count_basic_using_mappartitions.sh` (Shell Script to call the spark application)
+* ###DNA Base Count Basic Using Reduce By Key
+* `org.data.algorithms.spark.ch10.DNABaseCountBasicUsingReduceByKey` (Spark Program)
+* `./run_spark_applications_scripts/dna_base_count_basic_using_reduce_by_key.sh` (Shell Script to call the spark application)
+* ###inmapper Combiner Use Mappartitions
+* `org.data.algorithms.spark.ch10.InMapperCombinerUseMappartitions` (Spark Program)
+* `./run_spark_applications_scripts/in_mapper_combiner_use_mappartitions.sh` (Shell Script to call the spark application)
+* ###inmapper Combiner Using Local Aggregation
+* `org.data.algorithms.spark.ch10.InMapperCombinerUsingLocalAggregation` (Spark Program)
+* `./run_spark_applications_scripts/in_mapper_combiner_using_local_aggregation.sh` (Shell Script to call the spark application)
+* ###inmapper Combiner Using Map Reduce
+* `org.data.algorithms.spark.ch10.InMapperCombinerUsingMapReduce` (Spark Program)
+* `./run_spark_applications_scripts/in_mapper_combiner_using_map_reduce.sh` (Shell Script to call the spark application)
+* ###Min Max Force Empty Partitions
+* `org.data.algorithms.spark.ch10.MinMaxForceEmptyPartitions` (Spark Program)
+* `./run_spark_applications_scripts/min_max_force_empty_partitions.sh` (Shell Script to call the spark application)
+* ###Min Max Use Mappartitions
+* `org.data.algorithms.spark.ch10.MinMaxUseMappartitions` (Spark Program)
+* `./run_spark_applications_scripts/min_max_use_mappartitions.sh` (Shell Script to call the spark application)
+* ###Structured To Hierarchical To Xml Dataframe
+* `org.data.algorithms.spark.ch10.StructuredToHierarchicalToXmlDataframe` (Spark Program)
+* `./run_spark_applications_scripts/structured_to_hierarchical_to_xml_dataframe.sh` (Shell Script to call the spark application)
+* ###Structured To Hierarchical To Xml RDD
+* `org.data.algorithms.spark.ch10.StructuredToHierarchicalToXmlRDD` (Spark Program)
+* `./run_spark_applications_scripts/structured_to_hierarchical_to_xml_rdd.sh` (Shell Script to call the spark application)
+* ###Top N Use Map Partitions
+* `org.data.algorithms.spark.ch10.TopNUseMapPartitions` (Spark Program)
+* `./run_spark_applications_scripts/top_n_use_map_partitions.sh` (Shell Script to call the spark application)
+* ###Top N Use Take Ordered
+* `org.data.algorithms.spark.ch10.TopNUseTakeOrdered` (Spark Program)
+* `./run_spark_applications_scripts/top_n_use_take_ordered.sh` (Shell Script to call the spark application)

code/chap10/scala/build.gradle

+24
@@ -0,0 +1,24 @@
+apply plugin: 'scala'
+apply plugin: 'application'
+
+ext.scalaClassifier = '2.13'
+ext.scalaVersion = '2.13.7'
+ext.sparkVersion = '3.2.0'
+
+group 'org.data.algorithms.spark.ch10'
+version '1.0-SNAPSHOT'
+
+repositories {
+    mavenLocal()
+    mavenCentral()
+}
+
+dependencies {
+    implementation "org.scala-lang:scala-library:$scalaVersion"
+    implementation "org.apache.spark:spark-core_$scalaClassifier:$sparkVersion"
+    implementation "org.apache.spark:spark-sql_$scalaClassifier:$sparkVersion"
+}
+
+application {
+    mainClass = project.hasProperty("mainClass") ? project.getProperty("mainClass") : "NULL"
+}
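
The `application` block in this build file takes the main class from a `mainClass` project property and falls back to the literal string `"NULL"`, so a run without that property would presumably fail to find a class. The sketch below only prints the command line one might use; it is a dry run. It assumes the programs are started through the `run` task that the `application` plugin adds (the repository's own run scripts are not shown on this page), and both the helper name `gradle_run_cmd` and the input path are hypothetical.

```shell
#!/usr/bin/env sh
# Print the Gradle command for one chapter 10 program (dry run only).
# Assumption: programs are launched through the `run` task of the
# `application` plugin; the repository's run scripts may do it differently.
gradle_run_cmd() {
    main_class="$1"
    shift
    cmd="./gradlew run -PmainClass=$main_class"
    # --args forwards the remaining words to the program's main method.
    if [ "$#" -gt 0 ]; then
        cmd="$cmd --args=\"$*\""
    fi
    printf '%s\n' "$cmd"
}

# Hypothetical input path and N, for illustration only.
gradle_run_cmd org.data.algorithms.spark.ch10.TopNUseTakeOrdered data/input.txt 3
```

Passing the class name with `-P` is what makes `project.hasProperty("mainClass")` true, so one build file can serve all twenty programs.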
+11
@@ -0,0 +1,11 @@
+ATCGGGATCCGGG
+ATTCCGGGATTCCCC
+ATGGCCCCCGGGATCGGG
+CGGTATCCGGGGAAAAA
+aaattCCGGAACCGGGGGTTT
+CCTTTTATCGGGCAAATTTTCCCGG
+attttcccccggaaaAAATTTCCGGG
+ACTGACTAGCTAGCTAACTG
+GCATCGTAGCTAGCTACGAT
+AATTCCCGCATCGATCGTACGTACGTAG
+ATCGATCGATCGTACGATCG
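
These sequences look like input for the DNA base count programs named in the README, although the file name is hidden on this page. As a rough illustration of what such a count computes, here is a minimal shell sketch that keeps a local tally per base and emits each total once, in the spirit of an in-mapper combiner. It is not the Spark code from the commit. Folding lower case into upper case is an assumption prompted by the mixed-case records, and `count_bases` is a hypothetical name.

```shell
#!/usr/bin/env sh
# Count bases case-insensitively, keeping one running total per base
# in an awk array instead of emitting a (base, 1) pair per character.
# Assumption: lower-case and upper-case letters denote the same base.
count_bases() {
    awk '
        {
            line = toupper($0)
            for (i = 1; i <= length(line); i++) {
                counts[substr(line, i, 1)]++
            }
        }
        END {
            for (base in counts) print base "\t" counts[base]
        }
    ' | sort
}

# Two short records in the same style as the sample file.
printf 'ATCG\naatt\n' | count_bases
```

The `sort` at the end is only there to make the output order stable, since awk does not define the iteration order of an array.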
+12
@@ -0,0 +1,12 @@
+a,2
+a,3
+a,4
+a,5
+a,7
+b,4
+b,5
+b,6
+c,3
+c,4
+c,5
+c,6
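
These key,value records look like input for the Average Monoid programs, though the file name is hidden here. A mean of partial means is generally wrong, whereas `(sum, count)` pairs can be added component-wise in any grouping and divided once at the end. The shell sketch below illustrates that idea on two made-up partitions of the records above. It is not the Spark implementation, and `partial_sums` and `merge_partials` are hypothetical names.

```shell
#!/usr/bin/env sh
# Map side: reduce one partition to "key,sum,count" records.
partial_sums() {
    awk -F',' '
        { sum[$1] += $2; count[$1]++ }
        END { for (k in sum) print k "," sum[k] "," count[k] }
    '
}

# Reduce side: add (sum, count) pairs component-wise, then divide once.
merge_partials() {
    awk -F',' '
        { sum[$1] += $2; count[$1] += $3 }
        END { for (k in sum) printf "%s\t%.2f\n", k, sum[k] / count[k] }
    ' | sort
}

# Two hypothetical partitions of the sample records.
# Key a averages 3 in the first and 6 in the second; the mean of those
# two means is 4.5, but the true mean of 2,3,4,5,7 is 21/5 = 4.2.
{
    printf 'a,2\na,3\na,4\n' | partial_sums
    printf 'a,5\na,7\nb,4\nb,5\nb,6\n' | partial_sums
} | merge_partials
```

Carrying the count alongside the sum is what makes the combine step associative, which is the property that per-key aggregation across partitions relies on.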
+11
@@ -0,0 +1,11 @@
+23,24,22,44,66,77,44,44,555,666
+12,4,555,66,67,68,57,55,56,45,45,45,66,77
+34,35,36,97300,78,79
+120,44,444,445,345,345,555
+11,33,34,35,36,37,47,7777,8888,6666,44,55
+10,11,44,66,77,78,79,80,90,98,99,100,102,103,104,105
+6,7,8,9,10
+8,9,10,12,12
+7777
+10,11,44,66,77,78,79,80,90,98,99,100,102,103,104,105,222,333,444,555,666,111,112,5,113,114
+5555,4444,24
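
Comma-separated numbers like these could feed the Min Max programs listed in the README; that pairing is an assumption, because the file name is hidden on this page. The shell sketch below shows the general shape of a per-partition summary. Each partition is reduced to one `min max count` record, an empty partition contributes nothing, and a final step merges the records. The function names are hypothetical and this is not the Spark code from the commit.

```shell
#!/usr/bin/env sh
# Summarize one partition as "min max count".
# An empty partition prints nothing, so the merge step never sees it.
partition_min_max() {
    tr ',' '\n' | awk '
        NF == 0 { next }
        n == 0  { min = max = $1 + 0 }
        {
            v = $1 + 0
            if (v < min) min = v
            if (v > max) max = v
            n++
        }
        END { if (n > 0) print min, max, n }
    '
}

# Merge the per-partition summaries into one global "min max count".
merge_min_max() {
    awk '
        NR == 1 { min = $1 + 0; max = $2 + 0 }
        {
            if ($1 + 0 < min) min = $1 + 0
            if ($2 + 0 > max) max = $2 + 0
            n += $3
        }
        END { if (NR > 0) print min, max, n }
    '
}

{
    printf '23,24,22,44\n' | partition_min_max
    printf '' | partition_min_max      # empty partition
    printf '6,7,8,9,10\n' | partition_min_max
} | merge_min_max
# prints: 6 44 9
```

Emitting one small record per partition, rather than one per number, keeps the amount of data passed to the final merge proportional to the number of partitions.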
Binary file not shown.
@@ -0,0 +1,5 @@
+distributionBase=GRADLE_USER_HOME
+distributionPath=wrapper/dists
+distributionUrl=https\://services.gradle.org/distributions/gradle-6.8-bin.zip
+zipStoreBase=GRADLE_USER_HOME
+zipStorePath=wrapper/dists

code/chap10/scala/gradlew

+185
@@ -0,0 +1,185 @@
+#!/usr/bin/env sh
+
+#
+# Copyright 2015 the original author or authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+##############################################################################
+##
+## Gradle start up script for UN*X
+##
+##############################################################################
+
+# Attempt to set APP_HOME
+# Resolve links: $0 may be a link
+PRG="$0"
+# Need this for relative symlinks.
+while [ -h "$PRG" ] ; do
+    ls=`ls -ld "$PRG"`
+    link=`expr "$ls" : '.*-> \(.*\)$'`
+    if expr "$link" : '/.*' > /dev/null; then
+        PRG="$link"
+    else
+        PRG=`dirname "$PRG"`"/$link"
+    fi
+done
+SAVED="`pwd`"
+cd "`dirname \"$PRG\"`/" >/dev/null
+APP_HOME="`pwd -P`"
+cd "$SAVED" >/dev/null
+
+APP_NAME="Gradle"
+APP_BASE_NAME=`basename "$0"`
+
+# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
+DEFAULT_JVM_OPTS='"-Xmx64m" "-Xms64m"'
+
+# Use the maximum available, or set MAX_FD != -1 to use that value.
+MAX_FD="maximum"
+
+warn () {
+    echo "$*"
+}
+
+die () {
+    echo
+    echo "$*"
+    echo
+    exit 1
+}
+
+# OS specific support (must be 'true' or 'false').
+cygwin=false
+msys=false
+darwin=false
+nonstop=false
+case "`uname`" in
+  CYGWIN* )
+    cygwin=true
+    ;;
+  Darwin* )
+    darwin=true
+    ;;
+  MSYS* | MINGW* )
+    msys=true
+    ;;
+  NONSTOP* )
+    nonstop=true
+    ;;
+esac
+
+CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar
+
+
+# Determine the Java command to use to start the JVM.
+if [ -n "$JAVA_HOME" ] ; then
+    if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
+        # IBM's JDK on AIX uses strange locations for the executables
+        JAVACMD="$JAVA_HOME/jre/sh/java"
+    else
+        JAVACMD="$JAVA_HOME/bin/java"
+    fi
+    if [ ! -x "$JAVACMD" ] ; then
+        die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME
+
+Please set the JAVA_HOME variable in your environment to match the
+location of your Java installation."
+    fi
+else
+    JAVACMD="java"
+    which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
+
+Please set the JAVA_HOME variable in your environment to match the
+location of your Java installation."
+fi
+
+# Increase the maximum file descriptors if we can.
+if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then
+    MAX_FD_LIMIT=`ulimit -H -n`
+    if [ $? -eq 0 ] ; then
+        if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
+            MAX_FD="$MAX_FD_LIMIT"
+        fi
+        ulimit -n $MAX_FD
+        if [ $? -ne 0 ] ; then
+            warn "Could not set maximum file descriptor limit: $MAX_FD"
+        fi
+    else
+        warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
+    fi
+fi
+
+# For Darwin, add options to specify how the application appears in the dock
+if $darwin; then
+    GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
+fi
+
+# For Cygwin or MSYS, switch paths to Windows format before running java
+if [ "$cygwin" = "true" -o "$msys" = "true" ] ; then
+    APP_HOME=`cygpath --path --mixed "$APP_HOME"`
+    CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`
+
+    JAVACMD=`cygpath --unix "$JAVACMD"`
+
+    # We build the pattern for arguments to be converted via cygpath
+    ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
+    SEP=""
+    for dir in $ROOTDIRSRAW ; do
+        ROOTDIRS="$ROOTDIRS$SEP$dir"
+        SEP="|"
+    done
+    OURCYGPATTERN="(^($ROOTDIRS))"
+    # Add a user-defined pattern to the cygpath arguments
+    if [ "$GRADLE_CYGPATTERN" != "" ] ; then
+        OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
+    fi
+    # Now convert the arguments - kludge to limit ourselves to /bin/sh
+    i=0
+    for arg in "$@" ; do
+        CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
+        CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option
+
+        if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition
+            eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
+        else
+            eval `echo args$i`="\"$arg\""
+        fi
+        i=`expr $i + 1`
+    done
+    case $i in
+        0) set -- ;;
+        1) set -- "$args0" ;;
+        2) set -- "$args0" "$args1" ;;
+        3) set -- "$args0" "$args1" "$args2" ;;
+        4) set -- "$args0" "$args1" "$args2" "$args3" ;;
+        5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
+        6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
+        7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
+        8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
+        9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
+    esac
+fi
+
+# Escape application args
+save () {
+    for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
+    echo " "
+}
+APP_ARGS=`save "$@"`
+
+# Collect all arguments for the java command, following the shell quoting and substitution rules
+eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS"
+
+exec "$JAVACMD" "$@"
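
The `save` helper near the end of the wrapper script is what lets arguments containing spaces or single quotes survive the later `eval set --`. The sketch below copies that function unchanged and round-trips three made-up arguments through it. Everything outside the function body is illustrative and not part of the wrapper.

```shell
#!/usr/bin/env sh
# Same helper as in the wrapper script: wrap each argument in single
# quotes, rewrite embedded single quotes as '\'' and end each line
# with a backslash so the lines join when evaluated.
save () {
    for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
    echo " "
}

# Illustrative arguments: one with a space, one with a quote, one plain.
set -- "two words" "it's" plain
APP_ARGS=`save "$@"`

# Re-parse the saved string; the original arguments come back intact.
eval set -- "$APP_ARGS"
printf '%s\n' "$#" "$1" "$2" "$3"
```

Without this escaping step, `eval` would split `two words` into two arguments and treat the quote in `it's` as the start of a string.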
