How to Insert 1 Million Records Lightning-Fast
When working with large datasets, inserting a massive number of records into a database can become painfully slow. Fortunately, MySQL provides an optimized solution: the LOAD DATA statement. This method is significantly faster than traditional insertion methods, making it ideal for bulk data imports.
In this tutorial, I'll demonstrate how to generate, export, and import one million records into a MySQL database using Laravel. The principles discussed can be applied to various PHP applications.
Creating the Database Schema
Let's start by defining a schema for a shops table:
Schema::create('shops', function (Blueprint $table) {
    $table->id();
    $table->string('name');
    $table->text('description');
    $table->string('email')->unique();
    $table->date('established_date');
    $table->time('opening_time');
    $table->decimal('rating', 3, 2);
    $table->boolean('is_open');
});
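In a Laravel project this definition would normally live inside a migration. Here is a minimal skeleton, assuming a recent Laravel version that supports anonymous class migrations (the filename is just a placeholder):

// database/migrations/2024_01_01_000000_create_shops_table.php (placeholder filename)
use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        // The Schema::create('shops', ...) call from the snippet above goes here.
    }

    public function down(): void
    {
        Schema::dropIfExists('shops');
    }
};

Running php artisan migrate creates the table before we generate any data.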
Generating Test Data
To populate our database, we'll use the Faker library to generate realistic sample data:
$faker = \Faker\Factory::create();

$shops = [];

for ($i = 0; $i < 1_000_000; $i++) {
    $shops[] = [
        'name' => $faker->company,
        'description' => $faker->paragraph,
        'email' => $faker->unique()->safeEmail,
        'established_date' => $faker->date,
        'opening_time' => $faker->time,
        'rating' => $faker->randomFloat(2, 0, 5),
        'is_open' => $faker->boolean,
    ];

    // insert in batches to avoid memory issues
    if (count($shops) === 1000) {
        DB::table('shops')->insert($shops);
        $shops = [];
    }
}

// insert any remaining records
if (!empty($shops)) {
    DB::table('shops')->insert($shops);
}
Exporting Data to CSV
Next, we'll export the records to a CSV file:
$filePath = storage_path('app/shops.csv');

$file = fopen($filePath, 'w');

DB::table('shops')->chunkById(10_000, function ($shops) use ($file) {
    foreach ($shops as $shop) {
        $row = [
            $shop->id,
            $shop->name,
            $shop->description,
            $shop->email,
            $shop->established_date,
            $shop->opening_time,
            $shop->rating,
            $shop->is_open,
        ];

        fputcsv($file, $row, eol: "\n");
    }
});

fclose($file);
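For one million rows, the exported file comes out to roughly 233 MB (see the performance numbers below). If you want to confirm the size of yours from PHP, a quick check looks like this:

// Print the size of the exported CSV in megabytes.
echo round(filesize($filePath) / 1024 / 1024, 1) . ' MB' . PHP_EOL;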
Now that we have all of our data in a CSV file, we can empty the shops table:
TRUNCATE TABLE shops;
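If you'd rather do this from Laravel than from a SQL client, the query builder offers an equivalent:

// Empties the shops table and resets the auto-increment counter.
DB::table('shops')->truncate();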
Importing Data Using Batched Inserts (slow)
The standard approach is to read the CSV file line by line and insert the rows in batches:
DB::statement('ALTER TABLE shops DISABLE KEYS');

$handle = fopen(storage_path('app/shops.csv'), 'r');

$shops = [];

// a length of 0 means no line-length limit, so long descriptions are not truncated
while (($data = fgetcsv($handle, 0, ',')) !== false) {
    $shops[] = [
        'id' => $data[0],
        'name' => $data[1],
        'description' => $data[2],
        'email' => $data[3],
        'established_date' => $data[4],
        'opening_time' => $data[5],
        'rating' => $data[6],
        'is_open' => $data[7],
    ];

    // insert in batches to avoid memory issues
    if (count($shops) === 1000) {
        DB::table('shops')->insert($shops);
        $shops = [];
    }
}

// insert any remaining records
if (count($shops) > 0) {
    DB::table('shops')->insert($shops);
}

fclose($handle);

DB::statement('ALTER TABLE shops ENABLE KEYS');
The issue with this approach is that it’s slow. It took approximately 7 minutes and 40 seconds to import all the data.
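Timings like this are easy to capture with a simple wrapper around the import code (a sketch using plain microtime(); exact numbers will of course vary with your hardware and MySQL configuration):

$start = microtime(true);

// ... run the import code from above here ...

$seconds = microtime(true) - $start;
echo sprintf('Import took %.1f seconds', $seconds) . PHP_EOL;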
Preparing for Import
Before running a LOAD DATA import, we need to make a few configuration changes:
1. Check and Enable local_infile
First, check whether the local_infile feature is enabled on the MySQL server:
SHOW GLOBAL VARIABLES LIKE 'local_infile';
If it's OFF, enable it:
SET GLOBAL local_infile = TRUE;
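You can also run this check from inside Laravel, for example in a tinker session (a quick sketch):

// Returns one row with Variable_name and Value ('ON' or 'OFF') columns.
$result = DB::select("SHOW GLOBAL VARIABLES LIKE 'local_infile'");

echo $result[0]->Value . PHP_EOL;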
2. Configure Laravel Database Connection
In config/database.php, add the local_infile option:
'mysql' => [
    // ... other configurations
    'options' => extension_loaded('pdo_mysql') ? array_filter([
        PDO::MYSQL_ATTR_LOCAL_INFILE => true,
    ]) : [],
],
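If you cache your configuration in production (php artisan config:cache), remember to refresh it so the new option is actually picked up:

php artisan config:clear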
Importing Data Using LOAD DATA (fast)
Now we're ready to import the data efficiently:
$filePath = storage_path('app/shops.csv');

DB::statement('ALTER TABLE shops DISABLE KEYS');

DB::unprepared("
    LOAD DATA LOCAL INFILE '$filePath'
    INTO TABLE shops
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 0 ROWS
    (id, name, description, email, established_date, opening_time, rating, is_open)
");

DB::statement('ALTER TABLE shops ENABLE KEYS');
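To sanity-check that every record made it in, a simple count against the table is enough:

// Compare against the expected 1,000,000 rows.
$imported = DB::table('shops')->count();

echo "Imported {$imported} rows" . PHP_EOL;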
Performance Insights
With this approach, importing a 233 MB CSV file with one million records took approximately 40 seconds, a remarkable improvement over traditional insertion methods.
Some Considerations
- Disable keys before bulk import and re-enable them after import.
- For security reasons, local_infile should be disabled when it's not in use (see the snippet after this list).
- Large CSV file imports slow down over time, so it's better to import two 100-million record files than one 200-million record file.
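For that security point, turning the flag back off once you're done importing is a one-liner:

SET GLOBAL local_infile = FALSE;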