MySQL has a very powerful data import method that loads your raw file content into database tables in large chunks, swift and efficient.
First, we will need to have a look at the source file containing the data we want to import.
Investigating the source file
A typical source file is structured like a table with rows and columns. Each line in the file correlates to a row in the database table, except for the header line which contains the column names and not data values. All data lines are then divided into exactly N fields, which correlates to the table columns.
The following example shows a data file with population figures for some countries in the world in the years 2005, 2010 and 2015. The data file has 13 rows and 4 columns:
Aruba;100031;101597;103889 Andorra;81223;84419;70473 Afghanistan;24399948;27962207;32526562 Angola;17912942;21219954;25021974 Albania;3011487;2913021;2889167 Arab World;313430911;353112237;392022276 Argentina;39145491;41222875;43416755 Armenia;3014917;2963496;3017712 American Samoa;59117;55636;55538 Antigua and Barbuda;82565;87233;91818 Australia;20394800;22031750;23781169 Austria;8227829;8363404;8611088 Azerbaijan;8391850;9054332;9651349
Every line has three semicolons that separates the fields from each other. The last field is terminated by the line ending. Thus, we say that the fields are terminated by semicolons. In theory, it could have been any character serving as a separator between the fields.
The next important step is to determine what data types are in play.
Loading the data into the database
The LOAD DATA INFILE syntax is your friend. This data manipulation statement reads rows from a text file into a table at very high speed.
Besides specifying the source file, you must also define what character separates each field in the data row (usually a semicolon or a tab), and whether to skip the first row if it’s the header row with column titles.
LOAD DATA INFILE '/tmp/smiley_supervision.csv' INTO TABLE supervision FIELDS TERMINATED BY ';' IGNORE 1 ROWS;
Line 2 specifies the input file. The file path must be single-quoted.
Line 4 is the name of a table created specially for this dataset. In our example, it needs to have exactly as many columns as in the text file, and the data types and -lengths must match.
Line 5 specifies which character is used as field separator. Here, it’s semicolon, but other characters, such as tab, are just as common. The character must be single-quoted. To specify tab, use the escape sequence ‘/t’.
Line 6 instructs the engine to skip the first line (ignore one row), because this is not a data row, only column headers with field names.