= Migrating your Data from SQL
:description: Using MySQL as a starting point, this guide demonstrates \
how to migrate your existing data from SQL tables to documents stored in a Couchbase bucket.
:page-topic-type: guide
:page-pagination: full

[abstract]
{description}

== Introduction

Couchbase offers a number of strategies for migrating your existing data.
In this example, we will begin with a sample student record database stored in MySQL,
and use the Couchbase Server tool `cbimport` to copy the data into a Couchbase cluster.

== Prerequisites

Before you begin this exercise, you should have installed and set up a Couchbase cluster on your local machine.
You will find instructions for creating a fresh cluster in xref:getting-started:do-a-quick-install.adoc[Couchbase Server Installation].

To use `cbimport`, you will need to install the Couchbase `CLI` package.
You will find the location of the package, and instructions for installing it, in xref:cli:cli-intro.adoc[].

If you're running through the examples,
you will also need a MySQL installation containing the table structure
defined in xref:student-record-sql-database-section[the following section].

[#student-record-sql-database-section]
== Student Record database

The database we will convert has a relational structure with three tables: `student`, `course`, and `enrollment`.

[plantuml,student-record-erd]
.Student records SQL database
....
include::partial$diagrams/student-record-erd.puml[]
....

We will convert it to a document model suitable for storage in our Couchbase bucket:

[#document-model]
[plantuml,student-document-database-design]
.Student document model
....
include::partial$diagrams/student-document-database-design.puml[]
....

Our document model is not an exact mapping of the SQL database:
rather than keeping the `enrollment` records separate,
we have added them directly as a list of sub-documents within each student record.
A student with no enrollments has an empty list:

[source, json]
----
[
{"student-id": 1, "enrollments": [{"course-id": 3, "final-score": null, "date-enrolled": "2024-04-18", "date-completed": null}, {"course-id": 1, "final-score": null, "date-enrolled": "2025-03-05", "date-completed": null}], "student-name": "Harriet Hill", "date-of-birth": "1970-03-06"},
{"student-id": 2, "enrollments": [{"course-id": 1, "final-score": null, "date-enrolled": "2025-03-01", "date-completed": null}], "student-name": "Steven Morris", "date-of-birth": "1984-03-05"},
{"student-id": 3, "enrollments": [], "student-name": "Jenny Mills", "date-of-birth": "1969-11-06"}
]
----

TIP: In the early stages of your migration, it is a good idea to design the new structure of your Couchbase collections.
This will make it easier to work out the `cbimport` command parameters you will need for the migration.


== Step {counter:step}: Extract your `Course` data from MySQL

The first stage of your migration is to extract the data into a file format that the `cbimport` utility can work with.
`cbimport` can import comma-separated value (CSV) files or JSON files.
Because we already know that we will be embedding the `enrollment` records into the record for each student,
it makes sense to use the more versatile JSON structure.

Fortunately, MySQL has a number of SQL functions that make working with JSON data fairly straightforward,
so we'll start by migrating the `course` table into a JSON file:

[source, mysql]
.Extract the `course` table into the file: `/var/lib/mysql-files/courses.json`
----
SELECT JSON_OBJECT(
    'course-id', course.`course-id`,
    'course-name', course.`course-name`,
    'faculty', course.faculty,
    'credit-points', course.`credit-points`
  ) FROM course
INTO OUTFILE '/var/lib/mysql-files/courses.json';
----

The `JSON_OBJECT` function converts each row of the `course` table into a JSON object,
and `INTO OUTFILE` writes the results to a file on the MySQL server host.
(`INTO OUTFILE` will not overwrite an existing file, and on many Linux installations
the `secure_file_priv` setting only permits writing to `/var/lib/mysql-files/`.)
Each line of the file corresponds to a single record:

[source,jsonlines]
----
{"faculty": "Art", "course-id": 1, "course-name": "Art History", "credit-points": 50}
{"faculty": "Art", "course-id": 2, "course-name": "Fine Art", "credit-points": 30}
{"faculty": "Design", "course-id": 3, "course-name": "Graphic Design", "credit-points": 70}
{"faculty": "English", "course-id": 4, "course-name": "Creative Writing", "credit-points": 70}
----

NOTE: Strictly speaking, the file as a whole is not a single well-formed JSON document, because it contains one JSON object per line rather than an array.
Nevertheless, when run with `--format lines`, `cbimport` reads each line as a separate record.
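
Before importing, you may want to confirm that every line of the exported file parses as JSON, since MySQL's default `INTO OUTFILE` escaping can alter values containing characters such as backslashes.
The following minimal Python sketch (not part of the Couchbase or MySQL tooling; the function names are hypothetical) reads the file one record per line, which is the layout `--format lines` expects.
It can also rewrite the file as a single JSON array, if you would rather import it with `--format list`.

[source,python]
----
import json
from pathlib import Path


def read_json_lines(path):
    """Parse a JSON Lines file: one JSON object per non-blank line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, such as a trailing empty line
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the file and line number so the bad record is easy to find
                raise ValueError(f"{path}:{line_number}: {err.msg}") from err
    return records


def lines_to_list(src, dest):
    """Rewrite a JSON Lines file as one JSON array; return the record count."""
    records = read_json_lines(src)
    Path(dest).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return len(records)
----

For example, `len(read_json_lines("/var/lib/mysql-files/courses.json"))` should equal the number of rows in the `course` table.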

== Step {counter:step}: Extract your `Student` data from MySQL

This case is slightly different, because we want to include the enrollment details with each student record
(see the xref:document-model[]).

We can build this nested JSON structure with a more involved `SELECT`:
as well as extracting the student records, the query simultaneously pulls in the enrollments for each student:

[source,mysql]
.Extract `students` and their `enrollments`.
----
SELECT JSON_OBJECT(
    'student-id', student.`student-id`,
    'student-name', student.name,
    'date-of-birth', student.`date-of-birth`,
    'enrollments', IF(COUNT(enrollment.`course-id`) = 0, JSON_ARRAY(), JSON_ARRAYAGG(
        JSON_OBJECT(
            'course-id', enrollment.`course-id`,
            'date-enrolled', enrollment.`date-enrolled`,
            'date-completed', enrollment.`date-completed`,
            'final-score', enrollment.`score`
        )
    ))
  )
FROM student
  LEFT OUTER JOIN enrollment ON enrollment.`student-id` = student.`student-id`
GROUP BY student.`student-id`
INTO OUTFILE '/var/lib/mysql-files/students.json';
----

In addition to the `JSON_OBJECT` function call that extracts the student details,
we use the `JSON_ARRAYAGG` function to build an array of enrollments within each student record.
The `LEFT OUTER JOIN` links each student to their enrollment records through the `student-id` foreign key.
Because it is an _outer_ join, students with no enrollment records are still included,
and the `GROUP BY` clause collapses the joined rows back into a single output row per student.

For a student with no enrollments, the outer join returns one row in which every `enrollment` column is `NULL`,
so `JSON_ARRAYAGG` on its own would produce a list containing a single object full of null values.
The ``IF(COUNT(enrollment.`course-id`) = 0`` check prevents this:
`COUNT` ignores `NULL` values, so it returns `0` for such a student, and the query uses `JSON_ARRAY()` to return an empty list instead.

The resulting file contains one student record per line:

[source, jsonlines]
----
{"student-id": 1, "enrollments": [{"course-id": 3, "final-score": 0, "date-enrolled": "2025-03-10", "date-completed": null}, {"course-id": 1, "final-score": 0, "date-enrolled": "2025-03-10", "date-completed": null}], "student-name": "Hilary Wells", "date-of-birth": "1990-08-09"}
{"student-id": 2, "enrollments": [{"course-id": 2, "final-score": 0, "date-enrolled": "2025-03-10", "date-completed": null}], "student-name": "Ashley Matthews", "date-of-birth": "1987-07-01"}
{"student-id": 3, "enrollments": [{"course-id": 1, "final-score": 0, "date-enrolled": "2025-03-10", "date-completed": null}], "student-name": "Boregard Johnson", "date-of-birth": "1985-03-23"}
{"student-id": 4, "enrollments": [], "student-name": "Toni Jones", "date-of-birth": "1984-10-02"}
----
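
The same nesting can also be expressed outside the database, which may help when checking the query's output or when your source data does not come from MySQL.
This Python sketch is illustrative only: it assumes the `student` and `enrollment` rows have already been loaded as dictionaries keyed by the column names used in the query above, and `embed_enrollments` is a hypothetical helper name.

[source,python]
----
from collections import defaultdict


def embed_enrollments(students, enrollments):
    """Nest enrollment rows inside their student rows.

    Every student appears exactly once (like the LEFT OUTER JOIN plus GROUP BY),
    and a student with no matching enrollments gets an empty list rather than
    a single all-null entry (like the IF(COUNT(...) = 0, JSON_ARRAY(), ...) guard).
    """
    by_student = defaultdict(list)
    for row in enrollments:
        by_student[row["student-id"]].append({
            "course-id": row["course-id"],
            "date-enrolled": row["date-enrolled"],
            "date-completed": row["date-completed"],
            "final-score": row["score"],  # renamed, as in the SQL query
        })
    return [
        {
            "student-id": s["student-id"],
            "student-name": s["name"],
            "date-of-birth": s["date-of-birth"],
            # .get() does not trigger the defaultdict factory; it returns a fresh []
            "enrollments": by_student.get(s["student-id"], []),
        }
        for s in students
    ]
----

Writing each resulting dictionary with `json.dumps` on its own line would produce a file in the same one-record-per-line layout as `students.json`.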

== Step {counter:step}: Create your bucket, scope, and collections

You will need to create the bucket, scope, and collections that will hold the data on your Couchbase cluster.

For information on creating buckets, scopes, and collections,
read the sections on xref:manage:manage-buckets/bucket-management-overview.adoc[Managing Buckets]
and xref:manage:manage-scopes-and-collections/manage-scopes-and-collections.adoc[Managing Scopes and Collections].

'''

.Set up your cluster

. Using the Couchbase Web Console, the command-line tools, or the REST API,
create a new bucket on your cluster called `student-bucket`.

. Create a new scope called `art-school-scope` within `student-bucket`.

. Create two new collections (`student-record-collection` and `course-record-collection`) inside `art-school-scope`.
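
If you prefer to script these steps, the following Python sketch builds the REST API calls for the bucket, scope, and collections, and sends them using only the standard library.
It is a sketch under stated assumptions: the cluster runs locally on port 8091, the bucket is given a 256 MiB memory quota, and the helper names `setup_requests` and `run` are hypothetical.
Check the parameters against the REST API reference for your Couchbase Server version before relying on it.

[source,python]
----
import base64
import urllib.parse
import urllib.request


def setup_requests(bucket, scope, collections,
                   cluster="http://127.0.0.1:8091", ram_quota_mb=256):
    """Return (url, form_fields) pairs that create a bucket, a scope, and its collections."""
    buckets = f"{cluster}/pools/default/buckets"
    steps = [(buckets, {"name": bucket,
                        "bucketType": "couchbase",
                        "ramQuota": str(ram_quota_mb)})]  # assumption: 256 MiB is enough
    steps.append((f"{buckets}/{bucket}/scopes", {"name": scope}))
    for collection in collections:
        steps.append((f"{buckets}/{bucket}/scopes/{scope}/collections",
                      {"name": collection}))
    return steps


def run(steps, username, password):
    """POST each step in order, using HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    for url, fields in steps:
        request = urllib.request.Request(
            url,
            data=urllib.parse.urlencode(fields).encode(),  # form-encoded body
            headers={"Authorization": f"Basic {token}"},
            method="POST",
        )
        # urlopen raises urllib.error.HTTPError on a 4xx/5xx response
        with urllib.request.urlopen(request) as response:
            print(response.status, url)
----

To use it, call `run(setup_requests("student-bucket", "art-school-scope", ["student-record-collection", "course-record-collection"]), "Administrator", "password")`.
Bucket creation completes asynchronously, so in practice the scope request may need to be retried if it runs before the new bucket is ready.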


== Step {counter:step}: Import your data

In this step, you will use `cbimport` to load your two JSON files into your cluster.

'''

.Import the course data

Use the following command to import `courses.json` into your cluster:

[source,console]
----
./cbimport json --cluster 127.0.0.1:8091 \
  --username Administrator --password password \
  --bucket student-bucket \
  --dataset file:///var/lib/mysql-files/courses.json \
  --format lines \
  --generate-key %course-id% \
  --scope-collection-exp art-school-scope.course-record-collection
----

The parameters used are as follows:

[horizontal,labelwidth=25,itemwidth=75]
`--cluster`::
The address and port of the Couchbase cluster receiving the imported data.

`--username`::
A valid admin-level user to log on to the cluster.

`--password`::
The password for the user named in `--username`.

`--bucket`::
The name of the destination bucket for the imported data.

`--dataset`::
The full path of the JSON file containing the data to import.
+
IMPORTANT: Remember to include the `file://` prefix.

`--format`::
The format of the JSON data that `cbimport` is importing.
The value can be `lines`, `list`, or `sample`.
For this exercise, set the value to `lines`, because each line of the exported files holds one record.
+
For a detailed explanation of the formats, see xref:tools:cbimport-json.adoc#DATASET_FORMATS[Dataset formats].

`--generate-key`::
This tells `cbimport` how to generate the key for each imported document.
You can use any combination of fields in the data to generate the key.
In this exercise, we simply set the key to the value of the `course-id` field in the imported data.

`--scope-collection-exp`::
This defines an expression that tells `cbimport` which scope and collection the data will be imported into.
The expression can be a static value (as we have used above), or a combination of field identifiers from the import data.
+
For more information, see the section on the xref:tools:cbimport-json.adoc#SCOPE_COLLECTION_PARSER[Scope/Collection Parser].


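
To preview the keys that a `%field%` template will produce, and to spot records that would end up with the same key, you can model the substitution in a few lines of Python.
This is a simplified illustration of the template idea, not `cbimport`'s implementation: it handles only `%field%` placeholders mixed with literal text, and the helper names `preview_key` and `duplicate_keys` are hypothetical.

[source,python]
----
import re

# Matches a %field-name% placeholder in a key template.
PLACEHOLDER = re.compile(r"%([^%]+)%")


def preview_key(template, record):
    """Replace each %field% in the template with that field's value from the record."""
    def substitute(match):
        field = match.group(1)
        if field not in record:
            raise KeyError(f"field {field!r} not found in record")
        return str(record[field])
    return PLACEHOLDER.sub(substitute, template)


def duplicate_keys(template, records):
    """Return the keys that more than one record would generate."""
    seen, dupes = set(), set()
    for record in records:
        key = preview_key(template, record)
        (dupes if key in seen else seen).add(key)
    return dupes
----

Two records that produce the same key would map to the same document, so an empty result from `duplicate_keys("%course-id%", records)` is a useful check before you import.
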
'''

.Import the student data

The student JSON file can be imported in much the same way, changing only the dataset, the key field, and the target collection:

[source, console]
----
./cbimport json --cluster 127.0.0.1:8091 \
  --username Administrator --password password \
  --bucket student-bucket \
  --dataset file:///var/lib/mysql-files/students.json \
  --format lines \
  --generate-key %student-id% \
  --scope-collection-exp art-school-scope.student-record-collection
----
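
Because the two commands differ only in the dataset, key field, and collection, you may find it convenient to generate them from one place.
The Python sketch below is a hypothetical wrapper, not a Couchbase tool: it assumes `cbimport` is in the current directory (as in the commands above) and reuses the example credentials, which you should replace with your own.

[source,python]
----
import shlex
import subprocess

# Connection settings from the examples above; change them for your cluster.
CONNECTION = ["--cluster", "127.0.0.1:8091",
              "--username", "Administrator", "--password", "password"]

# (dataset path, key template, target collection) for each import
IMPORTS = [
    ("/var/lib/mysql-files/courses.json", "%course-id%", "course-record-collection"),
    ("/var/lib/mysql-files/students.json", "%student-id%", "student-record-collection"),
]


def cbimport_command(dataset, key_template, collection,
                     bucket="student-bucket", scope="art-school-scope"):
    """Build the argument list for one `cbimport json` run."""
    return ["./cbimport", "json", *CONNECTION,
            "--bucket", bucket,
            "--dataset", f"file://{dataset}",  # absolute path gives file:///...
            "--format", "lines",
            "--generate-key", key_template,
            "--scope-collection-exp", f"{scope}.{collection}"]


def run_all(dry_run=True):
    """Print each command; also run it unless dry_run is True."""
    for dataset, key, collection in IMPORTS:
        command = cbimport_command(dataset, key, collection)
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # raises on a non-zero exit status
----

`run_all()` only prints the commands; `run_all(dry_run=False)` also runs them and stops at the first import that exits with an error.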

== Step {counter:step}: Check your data

Use the Couchbase Web Console to examine your imported records and make sure they are correct.

image::tutorials:cbimported-data.png[]
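
A quick way to confirm that nothing was dropped is to compare the document count shown for each collection in the console with the number of keys you expect from each export file.
This small Python sketch assumes the one-record-per-line files from the earlier steps and the key fields used with `--generate-key`; `expected_keys` is a hypothetical helper name.

[source,python]
----
import json


def expected_keys(path, key_field):
    """Collect the keys that `--generate-key %<key_field>%` should produce for a file."""
    keys = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # ignore blank lines
                keys.add(str(json.loads(line)[key_field]))
    return keys
----

For example, `len(expected_keys("/var/lib/mysql-files/students.json", "student-id"))` should match the document count for `student-record-collection`, and the set itself tells you which keys to look up.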

== Further reading

For more information about `cbimport`, read the xref:tools:cbimport.adoc[cbimport guide].

For a comprehensive reference to MySQL's JSON functions,
see the https://dev.mysql.com/doc/refman/9.2/en/json-function-reference.html[MySQL JSON function reference].