Importing data

MySQL has a very powerful data import method that loads your raw file content into database tables in large chunks, swift and efficient.

First, we will need to have a look at the source file containing the data we want to import.

Investigating the source file

A typical source file is structured like a table with rows and columns. Each line in the file correlates to a row in the database table, except for the header line which contains the column names and not data values. All data lines are then divided into exactly N fields, which correlates to the table columns.

The following example shows a data file with population figures for some countries in the world in the years 2005, 2010 and 2015. The data file has 13 rows and 4 columns:

Aruba;100031;101597;103889
Andorra;81223;84419;70473
Afghanistan;24399948;27962207;32526562
Angola;17912942;21219954;25021974
Albania;3011487;2913021;2889167
Arab World;313430911;353112237;392022276
Argentina;39145491;41222875;43416755
Armenia;3014917;2963496;3017712
American Samoa;59117;55636;55538
Antigua and Barbuda;82565;87233;91818
Australia;20394800;22031750;23781169
Austria;8227829;8363404;8611088
Azerbaijan;8391850;9054332;9651349

Every line has three semicolons that separates the fields from each other. The last field is terminated by the line ending. Thus, we say that the fields are terminated by semicolons. In theory, it could have been any character serving as a separator between the fields.

The next important step is to determine what data types are in play.

Loading the data into the database

The LOAD DATA INFILE syntax is your friend. This data manipulation statement reads rows from a text file into a table at very high speed.

Besides specifying the source file, you must also define what character separates each field in the data row (usually a semicolon or a tab), and whether to skip the first row if it’s the header row with column titles.

LOAD DATA
  INFILE '/tmp/smiley_supervision.csv'
  INTO TABLE
    supervision
  FIELDS TERMINATED BY ';'
  IGNORE 1 ROWS;

Line 2 specifies the input file. The file path must be single-quoted.

Line 4 is the name of a table created specially for this dataset. In our example, it needs to have exactly as many columns as in the text file, and the data types and -lengths must match.

Line 5 specifies which character is used as field separator. Here, it’s semicolon, but other characters, such as tab, are just as common. The character must be single-quoted. To specify tab, use the escape sequence ‘/t’.

Line 6 instructs the engine to skip the first line (ignore one row), because this is not a data row, only column headers with field names.