How to Insert 1 Million Records Lightning-Fast
When working with large datasets, inserting a massive number of records into a database can become painfully slow. Fortunately, MySQL provides an optimized solution: the LOAD DATA statement. This method is significantly faster than traditional insertion methods, making it ideal for bulk data imports.
In this tutorial, I'll demonstrate how to generate, export, and import one million records into a MySQL database using Laravel. The principles discussed can be applied to various PHP applications.
Creating the Database Schema
Let's start by defining a schema for a shops table:
Schema::create('shops', function (Blueprint $table) {
    $table->id();
    $table->string('name');
    $table->text('description');
    $table->string('email')->unique();
    $table->date('established_date');
    $table->time('opening_time');
    $table->decimal('rating', 3, 2);
    $table->boolean('is_open');
});
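In a Laravel project this definition would normally live inside a migration. Here is a minimal skeleton, assuming a recent Laravel version that supports anonymous class migrations (the filename is just a placeholder):

// database/migrations/2024_01_01_000000_create_shops_table.php (placeholder filename)
use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        // The Schema::create('shops', ...) call from the snippet above goes here.
    }

    public function down(): void
    {
        Schema::dropIfExists('shops');
    }
};

Running php artisan migrate creates the table before we generate any data.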
Generating Test Data
To populate our database, we'll use the Faker library to generate realistic sample data:
$faker = \Faker\Factory::create();

$shops = [];

for ($i = 0; $i < 1_000_000; $i++) {
    $shops[] = [
        'name' => $faker->company,
        'description' => $faker->paragraph,
        'email' => $faker->unique()->safeEmail,
        'established_date' => $faker->date,
        'opening_time' => $faker->time,
        'rating' => $faker->randomFloat(2, 0, 5),
        'is_open' => $faker->boolean,
    ];

    // insert in batches to avoid memory issues
    if (count($shops) === 1000) {
        DB::table('shops')->insert($shops);
        $shops = [];
    }
}

// insert any remaining records
if (!empty($shops)) {
    DB::table('shops')->insert($shops);
}
Exporting Data to CSV
Next, we'll export the records to a CSV file:
$filePath = storage_path('app/shops.csv');

$file = fopen($filePath, 'w');

DB::table('shops')->chunkById(10_000, function ($shops) use ($file) {
    foreach ($shops as $shop) {
        $row = [
            $shop->id,
            $shop->name,
            $shop->description,
            $shop->email,
            $shop->established_date,
            $shop->opening_time,
            $shop->rating,
            $shop->is_open,
        ];

        fputcsv($file, $row, eol: "\n");
    }
});

fclose($file);
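For one million rows, the exported file comes out to roughly 233 MB (see the performance numbers below). If you want to confirm the size of yours from PHP, a quick check looks like this:

// Print the size of the exported CSV in megabytes.
echo round(filesize($filePath) / 1024 / 1024, 1) . ' MB' . PHP_EOL;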
Now that we have all of our data in a CSV file, we can empty the shops table:
TRUNCATE TABLE shops;
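If you'd rather do this from Laravel than from a SQL client, the query builder offers an equivalent:

// Empties the shops table and resets the auto-increment counter.
DB::table('shops')->truncate();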
Importing Data Using Batched Inserts (slow)
The standard approach is to read the CSV file line by line and insert the rows in batches:
DB::statement('ALTER TABLE shops DISABLE KEYS');

$handle = fopen(storage_path('app/shops.csv'), 'r');

$shops = [];

// a length of 0 means no line-length limit, so long descriptions are not truncated
while (($data = fgetcsv($handle, 0, ',')) !== false) {
    $shops[] = [
        'id' => $data[0],
        'name' => $data[1],
        'description' => $data[2],
        'email' => $data[3],
        'established_date' => $data[4],
        'opening_time' => $data[5],
        'rating' => $data[6],
        'is_open' => $data[7],
    ];

    // insert in batches to avoid memory issues
    if (count($shops) === 1000) {
        DB::table('shops')->insert($shops);
        $shops = [];
    }
}

// insert any remaining records
if (count($shops) > 0) {
    DB::table('shops')->insert($shops);
}

fclose($handle);

DB::statement('ALTER TABLE shops ENABLE KEYS');
The issue with this approach is that it’s slow. It took approximately 7 minutes and 40 seconds to import all the data.
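Timings like this are easy to capture with a simple wrapper around the import code (a sketch using plain microtime(); exact numbers will of course vary with your hardware and MySQL configuration):

$start = microtime(true);

// ... run the import code from above here ...

$seconds = microtime(true) - $start;
echo sprintf('Import took %.1f seconds', $seconds) . PHP_EOL;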
Preparing for Import
Before running a LOAD DATA import, we need to make a few configuration changes:
1. Check and Enable local_infile
First, check whether the local_infile feature is enabled on the MySQL server:
SHOW GLOBAL VARIABLES LIKE 'local_infile';
If it's OFF, enable it:
SET GLOBAL local_infile = TRUE;
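You can also run this check from inside Laravel, for example in a tinker session (a quick sketch):

// Returns one row with Variable_name and Value ('ON' or 'OFF') columns.
$result = DB::select("SHOW GLOBAL VARIABLES LIKE 'local_infile'");

echo $result[0]->Value . PHP_EOL;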
2. Configure Laravel Database Connection
In config/database.php, add the local_infile option:
'mysql' => [
    // ... other configurations
    'options' => extension_loaded('pdo_mysql') ? array_filter([
        PDO::MYSQL_ATTR_LOCAL_INFILE => true,
    ]) : [],
],
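If you cache your configuration in production (php artisan config:cache), remember to refresh it so the new option is actually picked up:

php artisan config:clear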
Importing Data Using LOAD DATA (fast)
Now we're ready to import the data efficiently:
$filePath = storage_path('app/shops.csv');

DB::statement('ALTER TABLE shops DISABLE KEYS');

DB::unprepared("
    LOAD DATA LOCAL INFILE '$filePath'
    INTO TABLE shops
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 0 ROWS
    (id, name, description, email, established_date, opening_time, rating, is_open)
");

DB::statement('ALTER TABLE shops ENABLE KEYS');
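To sanity-check that every record made it in, a simple count against the table is enough:

// Compare against the expected 1,000,000 rows.
$imported = DB::table('shops')->count();

echo "Imported {$imported} rows" . PHP_EOL;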
Performance Insights
With this approach, importing a 233 MB CSV file with one million records took approximately 40 seconds, a remarkable improvement over traditional insertion methods.
Some Considerations
- Disable keys before bulk import and re-enable them after import.
- For security reasons, local_infile should be disabled when it's not in use (see the snippet after this list).
- Large CSV file imports slow down over time, so it's better to import two 100-million record files than one 200-million record file.
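For that security point, turning the flag back off once you're done importing is a one-liner:

SET GLOBAL local_infile = FALSE;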