forked from sergutsan/groovyck
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathw2-simpleAndComplexDataTypes.tex
More file actions
737 lines (622 loc) · 28.4 KB
/
Copy pathw2-simpleAndComplexDataTypes.tex
File metadata and controls
737 lines (622 loc) · 28.4 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
% TODO: Notes on style
%
% Identation
% Variable names
%
% git commit file
%
% Reciprocal of Integer.parseInt() -> String.valueOf(int)
%
% TODO: Comment that "class definitions are like blueprints" and they
% do not use memory, only using new or calling methods uses actual
% memory (simplifying things a bit).
%
% PART-TIMERS
% Day 2: create account on GitHub like bbk-pij-2012-XX
% for loops
%
%
% Java Decaf should have arrays
\section{Introduction}
\label{sec:introduction}
Programs have lots of data, they are basically
ways of getting data, transforming data, and giving data back to the
world.
%
Data types, as we have already seen, tell you which kinds of data you
have in your program.
Computers store data in their memory in the form of bits. The word
``bit'' is a contraction of ``BInary digiT'', so a bit is
either~1~or~0, true or false, high or low, on or off. Bits are
organised in groups of 8 called \emph{bytes}. These bytes ---groups of
eight bits--- are used to store any data in the memory of the
computer. When you have got~1024~bytes, you have got a kilobyte or kB;
when you have~1024~kB you have got a megabyte or MB, and so on for
gigabytes~(GB), terabytes~(TB), and petabytes~(PB). Note that in
computing everything is measured in powers of~2 ($2^3 = 8, 2^{10} =
1024$) and not in powers of~10 as in normal life (10, 100,
1000\ldots). That is because
computers count with bits~(2) and human beings\footnote{Not all humans
used only the ten fingers in their hands to count. Some languages like
French or Irish have remains of base-20 counting.} count with their
fingers~(10).
\section{Simple data types}
\label{sec:simple-data-types}
Simple data types can be thought as boxes in the computer's
memory. Every time you declare a variable in your program, you can
think\footnote{This is only a metaphor and is not supposed to be an
accurate description of how memory is managed on a modern
computer. Explaining how things like the registers, the stack, and
the heap work, the differences between actual machines and virtual
machines, etc; are out of the scope of this document.} of the
computer as creating a little box in its memory to store your
variable. That box has two tags on it: one of them holds the name of
the variable and the other holds the type
(Figure~\ref{fig:var1}). In the same way, you can think of
assignment as putting a value inside that box (Figure~\ref{fig:var2}).
\begin{figure}[hbtp]
\centering
\includegraphics[width=\textwidth]{gfx/variables1}
\caption{Declaring a variable can be seen as creating a
box. The box has a tag for the name and another for the type.}
\label{fig:var1}
\end{figure}
\begin{figure}[hbtp]
\centering
\includegraphics[width=\textwidth]{gfx/variables2}
\caption{Assigning a value to a variable can be seen as putting a
value in the box. If there was something in the box, it is
overwritten and lost forever.}
\label{fig:var2}
\end{figure}
\subsection{Static typing and dynamic typing}
\label{sec:strong-typing-weak}
You have already seen that Groovy, as most languages (including Java
and Java Decaf), puts some
restrictions to what you can use as a ``name'' tag for your boxes. You
know that they have to start with a letter: ``count'' is a valid name
but ``/me'' or ``5thNumber'' are not. You also know that some words
cannot be used by the programmer for variables names because it would
be confusing for the computer, e.g. ``while'', ``if'', ``System'', or
``println''.
Many programming languages also place one restriction on the ``type''
tag: once you decide the type of a variable, you cannot change
it. This is like people having static opinions that they do not want
to change. This type of languages are called \emph{statically typed
languages} and, contrary to people with fixed ideas, they are
not necessarily bad or obnoxious. Java is an example of
statically typed language.
Some programming languages allow you to change the type of your
variables as you go along, so you can have a variable that sometimes
is an int and later in time is a boolean or a String. These languages
are called \emph{dynamically typed languages}. They have pros and cons
compared to their statically typed counterparts. Groovy is an example
of a dynamically typed language. The following code excerpt is valid
in Groovy but not in Java. In Java a variable cannot be a String,
then become a boolean, and then a String again.
Note that \emph{true} is a
boolean while \emph{``true''} (in quotes) is a String.
\begin{verbatim}
String str = "This is a string";
str = true; // This is different to: str = "true"
if (str) {
str = "This is a different string";
}
\end{verbatim}
There are many more things to know about typing in programming
languages (strong vs. weak, inferred vs. manifest, duck typing, and
much more) but for now it will not be necessary to go into those
details.
\subsection{Most common simple types}
\label{sec:most-common-simple}
\subsubsection{Integer numbers}
\label{sec:integers}
Integers (\verb+int+) are probably the most used simple data type, as
integer numbers are used for two of the most common operations in
computing: counting and indexing. Integers use~32~bits of memory, and
an integer variable can hold values between~-2,147,483,648 and
2,147,483,647~(inclusive). This data type is large enough for the
numbers needed in~90\%~of programs most people write. We have already
seen how to use it:
\begin{verbatim}
int count = 1
\end{verbatim}
There are two other kinds of integers for some special uses. When a
program needs very large (positive or negative) values, there is a
type \verb+long+, for \emph{long integer}. It is called ``long''
because it uses 64~bits instead of 32, which means a long integer
variable can hold values between \mbox{$-9.22 \cdot 10^{18}$} and
\mbox{$9.22 \cdot 10^{18}$}. There is also a ``short''
integer that uses only 16~bits of memory (\verb+short+); this was
sometimes useful to save memory when Java was first released in 1995
and computers were more limited but is hardly ever used with modern
computers.
\subsubsection{Floating-point (decimal/rational) numbers}
\label{sec:float-point-decim}
Not all numbers are integer. Examples of common non-integer numbers
include the result of a division of two integers where one is not a
multiple of the other, and real-measurements like your height, your
weight, and the distance between your workplace and your home (unless
you work at home). In maths, these are called real numbers. In
computing they are usually called floating-point numbers and are
represented as a list of significant numbers and an exponent. The term
``floating-point'' refers to the fact that the decimal point can "float",
that is, it can be placed anywhere in the number as long as the the
exponent is changed accordingly.
$$ 1.23 \cdot 10^{-3} = 123 \cdot 10^{-5} = 0.00123 \cdot 10^0 $$
We cannot use superindex notation in a plain-text file, so we need a
special way of writing these numbers.
Those three ways of depicting the number above can be written
as \verb+1.23E-3+, \verb+123E-5+, and \verb+0.00123+, and the three are
equivalent.
In Groovy and Java real numbers are usually represented with the
simple data type \verb+double+, that uses 64~bits. There is also a
32-bit version called \verb+float+ but, as with \verb+short+, it is
hardly ever used today.
\paragraph{Important note.}
Floating-point numbers do \emph{not} have infinite precision, and
operating with them can cause rounding errors. There is a special type
for representing real values where precision is paramount (like in
banking): \verb+BigDecimal+.
\paragraph{Equality. }
Due to rounding errors, it does not make sense to test for equality
among real numbers as we can do with integers. Two numbers could be
notionally the same but be different due to rounding errors, so they
are never compared with equality with ``==''. Instead, what is
usually done is test whether the difference is less that some
precision limit appropriate for the application (e.g.~1.0E-6), as in
the example below. Note: Math.abs() returns the absolute value of the
number inside the brackets, e.g. \verb+Math.abs(-3)+ returns \verb+3+.
% Example used to be
% double d1 = 2 / 3
% double d2 = 2 * 1000 / 9000 * 3
% but it did not work with Java Decaf. Oliver Smart was helpful in
% finding a better example.
\begin{verbatim}
double d1 = (0.1+0.1)/0.3;
double d2 = 2.0* 1000.0/9000.0*3.0;
// WRONG!
if (d1 == d2) {
// this is not printed due due to rounding errors
println "They are the same (wrong comparison)"
}
// RIGHT!
if (Math.abs(d1 - d2) < 10E-6) {
println "They are the same (right comparison)"
}
\end{verbatim}
\subsubsection{Boolean (binary) values}
\label{sec:bool-binary-valu}
This simple data type represents one bit of information. It can hold
the values \verb+true+ and \verb+false+.
\subsubsection{Characters}
\label{sec:characters}
Text is composed of characters: 'a', 'b', 'c'\ldots The \verb+char+
simple type is
used to represent characters. It uses 16~bits, meaning it can
represent any of~65,536~different characters.
Actually, the 16~bits of a char represent a Unicode symbol. Unicode is
a computing industry standard for the consistent encoding,
representation and handling of text expressed in most of the world's
writing systems. It includes symbols from most writing systems in
the world, including alphabets like Latin, Cyrillic, Arabic, or
Hebrew; syllabaries like Japanese katakana and hiragana, or
Cherokee; and many more.
You may have noticed that we have not mentioned String yet. This is
because String is a complex type.
\section{Complex types}
\label{sec:complex-types}
Complex types are types of data that do not fit in a box, not even in
one of the big 64-bit boxes used for \verb+double+. Because they do
not fit in the ``boxes'', computers have to store them somewhere
else. However, they also need to know where they are\ldots~and that is
what the boxes are used for.
Modern computers have \emph{a lot} of memory. Long forgotten are the
days when Bill Gates said: ``640kB of memory should be enough for
everything''. Part of a computer's memory is used for the boxes (in a part of
memory called ``the stack'') and most of the rest is used for
everything else, including complex data (that part is called, quite
unceremoniously, ``the heap'').
When your Groovy, Java Decaf, or Java code uses some complex data, the computer
stores that data in some region of the heap ---identified by a \emph{memory
address}; then it stores the address in a box in the stack, much in
the way it stores integers and booleans. This looks similar to
Figure~\ref{fig:compledata}. The memory address in the box can be seen
as ``pointing to'' the place in memory where the real data is
stored. For this reason, we will call it a \emph{pointer}.
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{gfx/variables-string}
\caption{A String is a type of complex data type. The data itself is
stored in the heap and its address is stored in box in the stack
---pointing to it. }
\label{fig:compledata}
\end{figure}
Using complex data types is different from using simple types in
several ways. For starters, as complex types are not stored in the
stack but in the heap, you have to \emph{allocate} memory-space in the
heap to store the data. This is done with the reserved word
\verb+new+; you also need brackets to pass arguments (as if using a
method: this is because complex types have something called a
\emph{constructor method} that we will study in more detail later). We are
going to see now several examples, starting with the pervasive
Strings.
\paragraph{One final note about names.}
Complex data types usually have names that start with capital
letters. This is not compulsory but it is what everybody expects:
simple types with non-capital letters (e.g.~\verb+int+, \verb+double+)
and complex letters with capital letters (e.g.~\verb+String+,
\verb+Array+, \verb+List+, \verb+Customer+\ldots). If you do not
capitalise your complex types other people will get very confused when
they read your code. This is not only impolite, it is also
unprofessional.
\subsection{Declaration and initialisation}
\label{sec:decl-init}
When working with simple types, the memory is always used as soon as
you declare the variable. In other words, your program will use the
same amount of memory if you type \verb+int i+ and if you type
\verb+int i = 1+: it always uses 32~bits of your computer's memory.
This is not true with complex types. When a complex type is declared
(e.g.~\verb+String str+) the computer only reserves the box for the pointer,
but nothing else. It is only when the variable is \emph{allocated}
(e.g.~\verb+str = new String()+
or \verb+str = new String("This is a String")+) that the total amount
of memory is used. Note that the difference is huge. Boxes are 32-bit
or 64-bits long, but there is no limit to the size of a complex type:
it could be several kB or even MB (types so big are a rarity,
though).
Sometimes the pointer will
not point to any address in memory. This can be useful in
some cases that will become clearer as we learn more about
programming, including error detection and release of memory that is
no longer needed (remember that complex types can use a lot of
memory). To make a pointer point to nowhere, we use the
reserved word~\verb+null+ (with non-capital letters).
\begin{verbatim}
String str
str = null
\end{verbatim}
A pointer pointing to null is called a \emph{null pointer}. This
basically means that the address in the box is zero (rather than an
obscure hexadecimal number like 0x1a3ec74); the computer
knows there is never anything at address zero, so it knows a null
pointer is not being used to point to any real
data~(Figure~\ref{fig:nullpiounter}).
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{gfx/variables-string-null}
\caption{A null pointer is basically a zero address, pointing nowhere.}
\label{fig:nullpiounter}
\end{figure}
As a null pointer does not point to any real data, if you try to
access a variable that is pointing to null, the computer will complain
with a \verb+NullPointerException+, as in the following example
(try it!):
\begin{verbatim}
String str = null
println str.length()
\end{verbatim}
\subsection{String}
\label{sec:string}
Strings are everywhere. Every program, except the most trivial, uses
strings: user names, passwords, addresses, configuration options, data
input from the keyboard, a webpage read through the Wi-Fi
connection\ldots~almost anything is a String. It is the most widely
used complex type.
At their most basic, strings are sequences of characters. You already
know how to read a piece of text from the user:
\begin{description}
\item{In Groovy:} String str = System.console().readLine()
\item{In Java Decaf: } String str = nextLine()
\end{description}
If you want to create a string in your program without reading it from
the user, the basic form of creating a string is like this:
\begin{verbatim}
String str = new String("This is a String")
\end{verbatim}
but you already know that this can be made in an easier way:
\begin{verbatim}
String str = "This is a String"
\end{verbatim}
The former two statements are equivalent in Groovy\footnote{They are
NOT equivalent in Java, but the difference is subtle and we will
learn more about it when the time comes.},
only the second is more convenient and most people use it
instead of the other.
It is important to note that a String with only one character is
still a String.
% Strings are enclosed in double quotes, while
% characters are enclosed in simple quotes:
% FIXME: this is
% \begin{verbatim}
% char oneChar = 'a'
% String oneCharString = "a"
% \end{verbatim}
% Simple proof of non-equivalence in Java:
%
% public class StringTest {
% public static void main(String args[]) {
% String s1 = System.console().readLine();
% String s2 = new String("QWE");
% String s3 = "QWE";
% if (s1 == s2) System.out.println( "s1 == s2");
% if (s2 == s3) System.out.println( "s2 == s3");
% if (s3 == s1) System.out.println( "s3 == s1");
% }
% }
You already know that you can do some things with strings using
\emph{methods}. Although we are going to learn more about methods on
the next section, we can introduce some of them now to help with the
exercises. Assuming you have a String called \verb+str+, you can use
the following methods:
\begin{description}
\item[str.length(): ] this method returns the length of the string.
\item[str.charAt(): ] this method requires an integer inside the
brackets, and returns the character at the specified position;
note that the first character is at position 0, not 1.
\item[str.substring(): ] this method requires two integers inside the
brackets, separated by a comma, and returns another string that
begins at the character specified by the first number and extends to
the character specified by the second number minus one\footnote{This
results in the substring having a length of
secondNumber~-~firstNumber.}.
\end{description}
Here is a brief example of how they are used. Note that if you want to
use double quotes inside double quotes, you have to \emph{escape} them
using the backslash (``\textbackslash'') sign: the backslash will make
Groovy and Java treat the next double quote as a literal double quote
and not as the end of the string.
\begin{verbatim}
String str;
str = new String("This is an example")
println "Initial string: \"" + str + "\"" // Note the escaped quotes!
int l = str.length()
println "The length of the string is " + l
char c = str.charAt(0)
println "The first character is " + c
String str2 = str.substring(8,18)
println "The substring from char 8 to char 17 is \"" + str2 + "\""
\end{verbatim}
The result of this little program is:
\begin{verbatim}
Initial string: "This is an example"
The length of the string is 18
The first character is T
The substring from char 8 to char 17 is "an example"
\end{verbatim}
\subsection{You own structures: classes}
\label{sec:you-own-structures}
String is the most common complex type, but it is not the only one. As
a matter of fact, programmers can create their own complex
types very easily. In order to create new types of complex data, we use the
keyword \verb+class+. You can think of it as creating new classes of
data.
A new class of data must have a name (remember: by convention complex
data types start with capital letters, classes too) and be defined in
between curly brackets. A complex type is composed of several other
types, simple of complex. Let's see an example:
\begin{verbatim}
class Person {
String name;
int age;
}
\end{verbatim}
This code defines a new class of data that represents a person, and
it is composed of a String for the name and an integer for the age. As
you can see, both simple and complex types can be used inside a
class. You can even have data on a specific class inside the same
class, as in the extended example below:
\begin{verbatim}
class Person {
String name;
int age;
Person father;
Person mother;
}
\end{verbatim}
Now a person has a String for a name, an integer for the age, and two
variables of type Person for the father and the mother. Once you start
working with complex data on a regular basis (which means from now on)
you can see that it is crucial to use good identifiers for your
names. Compare the former class with this very-badly-named example:
\begin{verbatim}
class P {
String s;
int n;
P p1;
P p2;
}
\end{verbatim}
Both examples contain the same information at a logical level, and the
computer will behave equally well with both. However, human
programmers reading the code will have a hell of a time trying to
understand what the second class really represents, while the first
example is obvious. Remember: \textbf{source is written once but is read many
times}, so your code should be as clear as possible.
Variables in a class are often called \emph{fields}. Sometimes they
are called \emph{member fields} because they can be seen as being
``part of'' (members of) the class. So the class \verb+Person+ can be
said to have four fields: one of type \verb+String+, one of type
\verb+int+, and two of type \verb+Person+. Fields in a class are
accessed using a dot, as in the following example:
\begin{verbatim}
Person employee = new Person();
employee.name = "John Smith";
employee.age = 45;
println "BOSS: How old are you, " + employee.name + "?"
println "EMPLOYEE: I am " + employee.age + " years old.";
\end{verbatim}
In this example we have created one variable of class Person, a
so-called
\emph{instance} of the class. Each of these instances of a class
are also called \emph{objects}. Every time you use the
keyword \verb+new+ you are creating another object (this involves
looking for a place in memory to store it and place the corresponding
pointer pointing to it). You can think of a \emph{class} as a
blueprint, and thus creating a new instance (by using \verb+new+)
is creating a new object according to that blueprint.
Note that fields on an object are accessed using a dot to
separate the name of the variable that identifies the object
(e.g.~employee) and the name of the variable inside the object, the
field (e.g.~name), as in \verb+employee.name+.
We can make the example slightly more complicated by using more than
one instance of the class Person.
\begin{verbatim}
Person john = new Person();
john.name = "John Smith";
john.age = 35;
Person mary = new Person();
mary.name = "Mary Smith";
mary.age = 32;
Person student = new Person();
student.name = "John Smith, Jr.";
student.age = 5;
student.father = john
student.mother = mary
println "TEACHER: How old are you, " + student.name + "?"
println "LITTLE JOHN: I am " + student.age + " years old, sir.";
println "TEACHER: Who is your mother?"
println "LITTLE JOHN: " + student.mother.name + ", sir.";
\end{verbatim}
There are two important things to learn from this example. First, we
can access data inside an object that is inside another object.
Actually, there is no limit to the level of
encapsulation you can achieve with complex data (as long as you do not
run out of memory). If you want to access a field of a field you only
need to use two dots, as in \verb+student.mother.name+.
Second, we
have not initialised all fields in the objects of class Person. When
you create a new object, the fields of the object have to be
initialised as any other variable, otherwise they will not have the
values you want them to have. In this example, we do not know who are
the father and the mother of objects \verb+john+ and \verb+mary+. If
we try to access them, the computer will complain. On the other hand,
if the mother of object \verb+mary+ was properly initialised, we could
know the name of the grandfather of the student by typing something
like:
\begin{verbatim}
println "The father of my mother is " + student.mother.father.name;
\end{verbatim}
Next day we will see how to add methods to your classes,
like those that you know from String.
\subsection{Boxed types: Integer, Double, and Character}
\label{sec:boxed-types:-integer}
For every simple type in Groovy, Java Decaf, and Java, there is a ``boxed'' complex
version. The most important ones are summarised in
Table~\ref{tab:jajksdfj}.
Note that the name of the complex type uses capital letters and is
sometimes a longer, complete-word version of the simple type's name.
\begin{table}[htbp]
\centering
\begin{tabular}{|l|l|}
\hline
Simple & Complex \\
\hline
\hline
int & Integer \\
double & Double \\
char & Character \\
\hline
\end{tabular}
\caption{Most important simple types and their boxed counterparts}
\label{tab:jajksdfj}
\end{table}
The complex version of a simple type works mostly like any other
complex type. The variable itself contains a pointer that points to
the address in memory where the actual value is stored
(Figure~\ref{fig:doubled}). Why have
complex versions of simple types? Other data types have a lot of
information (i.e. many characters in a String) and that is why they
need to be stored indirectly, but this is not the case with boxed
types. Why then? Remember that complex types of data
can have their own methods, and sometimes this is very useful. You
have already come across one of the most used ones:
\verb+Integer.parseInt()+,
which is used to convert strings that contain only a number into an
\verb+int+. Boxed types have several other methods, and we will see
some of them, but \verb+parseInt()+ is the one more frequently used.
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{gfx/variables-double}
\caption{A boxed type, like any complex type, consists of a pointer
to a place in memory where the actual data is stored}
\label{fig:doubled}
\end{figure}
Why are these complex types called ``boxed''? This is because they can
be seen as a box wrapping around a simple type. In the old days, using
boxed types was cumbersome and boring. You could need to type
something like:
\begin{verbatim}
int i = 1;
Integer boxedI = new Integer(i);
int i2 = boxedI.intValue()
\end{verbatim}
You had to explicitly create the boxed type like any other complex
type by using \verb+new+, and then you needed a method of the boxed
type to get the simple type from inside. This was a lot of work when
you had many variables and had to perform conversion from
simple type to boxed type very often, for example if you have to make
comparisons of data between two different sources (one simple type,
the other boxed type). This why Java (and Groovy) now
allow what is called \emph{auto-boxing}, which is just a fancy name to
say that you can use them in exactly the same way and the computer
takes care of all the boxing-in and boxing-out. In other words, the
following code is perfectly legal:
\begin{verbatim}
int i1 = 1
Integer i2 = 1
if (i1 == i2) {
println "Nowadays you can use boxed and simple types together!"
}
\end{verbatim}
Note that you do not need to write
explicitly \verb+i2 = new Integer(1)+, but i2 is still a complex boxed
type, its box contains just a pointer to some place in memory where a
``1'' is stored. The variable i1, on the other hand, is a \emph{bona
fide} integer variable, a box with
a ``1'' inside (see Figure~\ref{fig:integervars}). However, thanks
to auto-boxing, you can compare one with the other: the computer will
automatically extract the ``1'' inside the boxed type to perform the
comparison with the other ``1''. You do not need to do it yourself!
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{gfx/variables-integer}
\caption{Even if i1 and i2 are, strictly speaking, two very
different things, auto-boxing makes it easy to work with both types
at the same time.}
\label{fig:integervars}
\end{figure}
\subsection{A final note on terminology}
\label{sec:final-note-name}
If you read other books or web pages about Groovy or Java, you may
notice that they use terms that are different from the ones we have
used in this section. We go through some of them here for the sake of
clarity.
\begin{description}
\item[Simple type: ] A type that is stored in its own box, usually
32-bit or 64-bit long. These are usually called \emph{primitive}
types in the Java world. I prefer to call them simple data types to
distinguish them clearly from complex data types.
\item[Complex type: ] A type that has two parts: the box contains a
pointer, and it points to a place in memory where the actual value
is stored. In the Java world, these are simply called \emph{classes}
and \emph{objects}, which is precise but fails to be explicit on the
difference between simple and complex data types, and this can be a
source of confusion.
\item[Pointer: ] The content of the box in a complex type, a memory
address where actual data is stored. This is sometimes called a
\emph{reference} or a \emph{handle} to prevent confusion with
pointers of other languages (like C) that behave in a slightly
different way. I think many programming languages have constructs
that share the same name and behave in slightly different ways
(Strings is a major example) so this is not of much
concern. Besides, it looks to me like very confusing to
access a null ``reference'' or a null ``handle'' and
get a Null\textbf{Pointer}Exception as a consequence.
\end{description}
%%%%%%%%%%%%%%5
% Methods, functions
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: