Introduction
When we last left off, we finished a small PHP script than handle some US zipcode management. We were able to do a couple of simple things such as finding our whether or not a zipcode exist and retrieving information about the zipcode. However, it wasn't truly complete. There are some other functionalities that are missing. First, what if we need to find a list of zipcodes within a given radius of a zipcode? Second, what about within a given coordinate? And lastly, we need to improve our code to optimize efficiency without losing any speed.
Before we continue, lets take a look at our source code from where we left off:
<?php
// Load the necessary classes
loadClass('DLCommon');
class DLZipcode {
/**
* The mysqli object used for interaction with the database.
*
* @var mysqli
*/
private $mysqli;
/**
* The table in the database where the zipcodes are stroed.
*
* @var string
*/
private $table = 'zipcode';
/**
* Create a new DLZipcode object.
*
* @param string $host The host to connect to the database.
* @param string $user The username to connect to the database.
* @param string $pass The password to connect to the database
* @param string $db The database.
* @return DLZipcode
*/
public function __construct($host, $user, $pass, $db) {
// Create the mysqli object
DLCommon::mysqli($this->mysqli, $host, $user, $pass, $db);
}
/**
* Retrieve the given zipcode information. The retrned value is an
* associative array with the following keys: zipcode, latitude, longitude,
* city, and state.
*
* @param string $zipcode The zipcode to retrieve the information for.
* @return array|NULL
*/
public function getZipcode($zipcode) {
// First, make sure the zipcode is valid
if(!$this->validZipcode($zipcode)) {
// Try to fetch zipcode from database
$sql = 'SELECT * FROM %s WHERE `zipcode`="%s" LIMIT 1';
$result = DLCommon::query($this->mysqli, $sql, $this->table, $zipcode);
$result = $result->fetch_assoc();
if(is_array($result)) { return $result; }
}
// If we gotten here, then the zipcode was not found on the database
return NULL;
}
/**
* Determine whether or not the given input is a valid zipcode.
*
* @param string $zipcode The string to validate.
* @return boolean
*/
public function validZipcode($zipcode) {
$result = is_numeric($zipcode);
$result &= isset($zipcode[4]);
return $result;
}
/**
* Determine whether or not the zipcode exist in the database.
* @param string $zipcode The zipcode to validate.
* @return boolean
*/
public function zipcodeExist($zipcode) {
// Fetch the zipcode from the database, if it exist, then we should get
// an array back, otherwise, it is false
if($this->getZipcode($zipcode)) { return TRUE; }
else { return FALSE; }
}
}
?>
Improving Performance
So, first thing to notice is that the method zipcodeExist is relying on the method getZipcode to find out whether or not a given zipcode exist. Though this doesn't really matter much, but in most cases, people would need to find our whether or not a zipcode exist before they want to get the zipcode information. So, we should change that around. Also, what if the user uses the methods more than once on the same zipcode? The current version will make a connection to the database each time and check the zipcode information. That is not a good way to go about things since connection to the database are expensive and with a lot of people accessing it, it can get overloaded and breaks down. It is better to cache the result so the next time we need it, we can just get it from the cache. So, using those ideas, our updated methods now look like the following:
<?php
...
/**
* Create a new DLZipcode object.
*
* @param string $host The host to connect to the database.
* @param string $user The username to connect to the database.
* @param string $pass The password to connect to the database
* @param string $db The database.
* @return DLZipcode
*/
public function __construct($host, $user, $pass, $db) {
// Create the mysqli object
DLCommon::mysqli($this->mysqli, $host, $user, $pass, $db);
// Initialize the cache
$this->cache = array();
}
/**
* Retrieve the given zipcode information. The retrned value is an
* associative array with the following keys: zipcode, latitude, longitude,
* city, and state.
*
* @param string $zipcode The zipcode to retrieve the information for.
* @return array|NULL
*/
public function getZipcode($zipcode) {
// See if the zipcode exist
if(!$this->zipcodeExist($zipcode)) { return NULL; }
// zipcodeExist will have stored the zipcode info if we gotten here
return $this->cache[$zipcode];
}
/**
* Determine whether or not the zipcode exist in the database.
*
* @param string $zipcode The zipcode to validate.
* @return boolean
*/
public function zipcodeExist($zipcode) {
// First, make sure the zipcode is valid
if(!$this->validZipcode($zipcode)) { return FALSE; }
// See if we have cache data
if(isset($DLZ->cache[$zipcode])) { return TRUE; }
// MySQL work
$sql = 'SELECT * FROM %s WHERE `zipcode` = "%s" LIMIT 1;';
$result = DLCommon::query($this->mysqli, $sql, $this->table, $zipcode);
$result = $result->fetch_assoc();
// Validate result
if($result === NULL) { return FALSE; }
// Build result array
foreach($result as $key => $value) { $data[$key] = $value; }
// Cache the result
$this->cache[$zipcode] = $data;
return TRUE;
}
...
?>
Caching data and using it will greatly improve your performance especially if the methods are used a lot within the execution of the script. Of course, when the script finished its execution, the cache will be thrown away. So, what is there to do? We can keep the cache in the filesystem. And since the zipcode database shouldn't be updated very often, holding a cache for this data for a long time shouldn't be a concern. Obviously, reading from the filesystem isn't much better than reading off the database, if not worse in some situations, but we are only reading it off the filesystem once compared to the multiple times if it was in the database. Obviously, we don't want to waste time writing to the filesystem if no new updates were added to the cache and we don't want to write it each time a zipcode is fetched from the database. Doing that would be the same as just sticking with using the database. Ideally, what we need to do is restore the information from the cache once (at the start when the script is initialize) and write to it once (at the termination of the script and only if the cache has been updated). Fortunately, we can do both. We can restore the cache in the constructor and write to the cache with the magic function __destruct and an extra variable which will serve as a flag as to whether or not the cache has been updated. One last problem though, what should we name our file so it doesn't conflict with any of our other files on our filesystem. The answer is to hash a string and use that hash as the filename to ensure that no one touches our file. It is not 100% fool proof but for the most part, it should work. Our updated code is now:
<?php
...
/**
* Create a new DLZipcode object.
*
* @param string $host The host to connect to the database.
* @param string $user The username to connect to the database.
* @param string $pass The password to connect to the database
* @param string $db The database.
* @return DLZipcode
*/
public function __construct($host, $user, $pass, $db) {
// Create the mysqli object
DLCommon::mysqli($this->mysqli, $host, $user, $pass, $db);
// See if we have cache data
$filename = hash('sha256', 'DLZipcode Cache');
if(file_exists($filename) {
$this->cache = unserialize(file_get_contents($filename));
// We should remove the cache if it starts to get to big - 1MB
if(filesize($filename) > 1048576) { unlink($filename); }
} else { $this->cache = array(); }
// Set the flag that no updates has been made to the cache
$this->cacheUpdated = FALSE;
}
/**
* Write to the filesystem the cache data if the cache has been updated upon
* the termination of the script.
*/
private function __destruct() {
// Only write to the filesystem if the cache has been updated
if($this->cacheUpdated) {
$filename = hash('sha256', 'DLZipcode Cache');
file_put_contents($filename, serialize($this->cache));
}
}
/**
* Determine whether or not the zipcode exist in the database.
*
* @param string $zipcode The zipcode to validate.
* @return boolean
*/
public function zipcodeExist($zipcode) {
// First, make sure the zipcode is valid
if(!$this->validZipcode($zipcode)) { return FALSE; }
// See if we have cache data
if(isset($DLZ->cache[$zipcode])) { return TRUE; }
// MySQL work
$sql = 'SELECT * FROM %s WHERE `zipcode` = "%s" LIMIT 1;';
$result = DLCommon::query($this->mysqli, $sql, $this->table, $zipcode);
$result = $result->fetch_assoc();
// Validate result
if($result === NULL) { return FALSE; }
// Build result array
foreach($result as $key => $value) { $data[$key] = $value; }
// Cache the result
$this->cache[$zipcode] = $data;
// Set the flag that the cache has been updated
$this->cacheUpdated = TRUE;
return TRUE;
}
...
?>
And that is it. Our methods are now much more optimized and thanks to the cache, the more we use this script, the faster it should become. However, we don't want to have our cache be too large because then we would run out of memory. So, to limit the cache, I added in an extra check to make it delete the cache once its filesize is over 1MB.
Searching Time
Now that we are caching our data and have improved our simpler methods, it is time to work on with the more compicated methods. We will begin by creating a method that will retrieve a list of zipcodes within a given radius for a given coordinate. Since our coordinates are stored as degrees in the database, we should be expecting them as degrees in our methods.
The big question though is how do we do it. When I started this script, I began by researching the various methods people have used to get the zipcodes. My initial thought was that there have to be some form of trigonomic method that would work. It turns out, there was but as I read what other people have said, it isn't the way to go because it was slow as it have to traverse the entire list of zipcodes and do the test.
Another option was to select all the zipcodes within a square and then using that smaller list of zipcodes to calculate the distance. I personally didn't like this method since it will require two SQL statements each time someone needs the information. However, logistically, this seems like the most reasonable way to go about it since we will be working with a smaller set of zipcodes by initially removing zipcodes that are too far away.
The last method I found people to use was to overload the database by doing all the calulation beforehand and storing the results within the database. Although this is probably the fastest way for when the user needs to retrieve the information, it really is not a feasible way of doing things. It requires too much space as you need to make every single calculation. Plus, you will be limited in the radius. Lets say you only calculated the radius up to 20 miles. Now, what if you need a list of zipcodes that is within 50 miles, what can you do? Do you plan on calculating the entire list again? This just takes too much time and is too limited in its capabilities.
Of the three options, I removed the third option right off the bat because it is just not feasible. I tried the second method first and the results were good. It was fast and is flexible but I still didn't like the fact that it requires two SQL statements. Lastly, I tried by method of using that trigonomic function. Surprisingly, it was everybit as fast as the second method if not faster (since it requires only one SQL statement). I tried getting a list of zipcodes within a 100 mile radius and it look only about two-tenth of a second! My zipcode database consist over 32,000 records so it is not like it is small. I couldn't understand why so many other people didn't like my method. Although I understand their logic, it is a fact that it is still fast. I wonder if any of the people that said it was slow actually tried it before they gave their information.
I am not going to go into the details of how the trigonomic function works, that can be better explained by a math major. You can do research about it yourself. Anyway, where is the code for the within radius method:
<?php
...
/**
* Retrieve an associative array of all the zipcodes that is within the
* given radius of the given coordinates.
*
* @param numeric $latitude The latitude of the center point (in
* degrees).
* @param numeric $longitude The longitude of the center point (in
* degrees).
* @param numeric $radius The radius of the search in miles.
* @return array
*/
public function withinRadiusCoordinate($latitude, $longitude, $radius) {
// MySQL work
$sql = 'SELECT zipcode, city, state, TRUNCATE(acos('.
'cos(radians(%s))*cos(radians(%s))*'.
'cos(radians(latitude))*cos(radians(longitude)) + '.
'cos(radians(%s))*sin(radians(%s))*'.
'cos(radians(latitude))*sin(radians(longitude)) + '.
'sin(radians(%s))*sin(radians(latitude)))*3437.777078, 6) '.
'AS distance FROM %s HAVING distance<=%s';
$result = DLCommon::query($DLZ->mysqli, $sql,
$latitude, $longitude, $latitude, $longitude, $latitude,
$this->table, $radius
);
// Build result
$data = array();
while($row = $result->fetch_assoc()) {
$zip = $row['zipcode'];
foreach($row as $key => $value) { $data[$zip][$key] = $value; }
}
return $data;
}
...
?>
Basically, this method just require some research. But once you find the formula, just plug it in and you should get the result you are looking for. Now that we got this method, we can easily implement the method withinRadiusZipcode since all we need to do is get the coordinate of the given zipcode, then call this withinRadiusCoordinate method.
<?php
...
/**
* Retrieve an associative array of all the zipcodes that is within the
* given radius of the given zipcode.
*
* @param numeric $zipcode The zipcode to be used as the center
* @param numeric $radius The radius of the search in miles.
* @return array
*/
public function withinRadiusZipcode($zipcode, $radius) {
// Get the zipcode
$zipcode = $this->getZipcode($zipcode);
// Fetch the list of zipcodes within the specified $radius
$result = $this->withinRadiusCoordinate(
$zipcode['latitude'], $zipcode['longitude'], $radius
);
return $result;
}
...
?>
And that is it. Now that we have these two methods added, our script just became a lot more functional. Obviously, some tinkering can still be done to make it better but it is good enough. It has good performance and does what it needs to.
Conclusion
Although this tutorial has shown you how to create an script that will handle management of US zipcodes, I hope that you also learn some important techniques about programming from it. First, it is always a good idea to plan out how your script is going to be link. You don't necessary need to plan every single detail, but you should have an understanding of what core methods should be needed. You should also always go back and review your code. Look for ways to improve its performance if possible and clean it up when you can. Neatly document your source code so that when you come back to it at a later time, you can understand the logic. Just take a look at how some of my methods has more lines of documentation than lines of code. The Internet is a resource so use it to do some research to see how some other people are tackling the problem. Most of the time, you'll end up finding something.
Source Code
Below is the final source code for this tutorial:
<?php
// Load the necessary classes
loadClass('DLCommon');
class DLZipcode {
/**
* Cached data.
*
* @var array
*/
private cache;
/**
* The mysqli object used for interaction with the database.
*
* @var mysqli
*/
private $mysqli;
/**
* The table in the database where the zipcodes are stroed.
*
* @var string
*/
private $table = 'zipcode';
/**
* Create a new DLZipcode object.
*
* @param string $host The host to connect to the database.
* @param string $user The username to connect to the database.
* @param string $pass The password to connect to the database
* @param string $db The database.
* @return DLZipcode
*/
public function __construct($host, $user, $pass, $db) {
// Create the mysqli object
DLCommon::mysqli($this->mysqli, $host, $user, $pass, $db);
// See if we have cache data
$filename = hash('sha256', 'DLZipcode Cache');
if(file_exists($filename) {
$this->cache = unserialize(file_get_contents($filename));
// We should remove the cache if it starts to get to big - 1MB
if(filesize($filename) > 1048576) { unlink($filename); }
} else { $this->cache = array(); }
// Set the flag that no updates has been made to the cache
$this->cacheUpdated = FALSE;
}
/**
* Write to the filesystem the cache data if the cache has been updated upon
* the termination of the script.
*/
private function __destruct() {
// Only write to the filesystem if the cache has been updated
if($this->cacheUpdated) {
$filename = hash('sha256', 'DLZipcode Cache');
file_put_contents($filename, serialize($this->cache));
}
}
/**
* Retrieve the given zipcode information. The retrned value is an
* associative array with the following keys: zipcode, latitude, longitude,
* city, and state.
*
* @param string $zipcode The zipcode to retrieve the information for.
* @return array|NULL
*/
public function getZipcode($zipcode) {
// See if the zipcode exist
if(!$this->zipcodeExist($zipcode)) { return NULL; }
// zipcodeExist will have stored the zipcode info if we gotten here
return $this->cache[$zipcode];
}
/**
* Determine whether or not the given input is a valid zipcode.
*
* @param string $zipcode The string to validate.
* @return boolean
*/
public function validZipcode($zipcode) {
$result = is_numeric($zipcode);
$result &= isset($zipcode[4]);
return $result;
}
/**
* Retrieve an associative array of all the zipcodes that is within the
* given radius of the given coordinates.
*
* @param numeric $latitude The latitude of the center point (in
* degrees).
* @param numeric $longitude The longitude of the center point (in
* degrees).
* @param numeric $radius The radius of the search in miles.
* @return array
*/
public function withinRadiusCoordinate($latitude, $longitude, $radius) {
// MySQL work
$sql = 'SELECT zipcode, city, state, TRUNCATE(acos('.
'cos(radians(%s))*cos(radians(%s))*'.
'cos(radians(latitude))*cos(radians(longitude)) + '.
'cos(radians(%s))*sin(radians(%s))*'.
'cos(radians(latitude))*sin(radians(longitude)) + '.
'sin(radians(%s))*sin(radians(latitude)))*3437.777078, 6) '.
'AS distance FROM %s HAVING distance<=%s';
$result = DLCommon::query($DLZ->mysqli, $sql,
$latitude, $longitude, $latitude, $longitude, $latitude,
$this->table, $radius
);
// Build result
$data = array();
while($row = $result->fetch_assoc()) {
$zip = $row['zipcode'];
foreach($row as $key => $value) { $data[$zip][$key] = $value; }
}
return $data;
}
/**
* Retrieve an associative array of all the zipcodes that is within the
* given radius of the given zipcode.
*
* @param numeric $zipcode The zipcode to be used as the center
* @param numeric $radius The radius of the search in miles.
* @return array
*/
public function withinRadiusZipcode($zipcode, $radius) {
// Get the zipcode
$zipcode = $this->getZipcode($zipcode);
// Fetch the list of zipcodes within the specified $radius
$result = $this->withinRadiusCoordinate(
$zipcode['latitude'], $zipcode['longitude'], $radius
);
return $result;
}
/**
* Determine whether or not the zipcode exist in the database.
*
* @param string $zipcode The zipcode to validate.
* @return boolean
*/
public function zipcodeExist($zipcode) {
// First, make sure the zipcode is valid
if(!$this->validZipcode($zipcode)) { return FALSE; }
// See if we have cache data
if(isset($DLZ->cache[$zipcode])) { return TRUE; }
// MySQL work
$sql = 'SELECT * FROM %s WHERE `zipcode` = "%s" LIMIT 1;';
$result = DLCommon::query($this->mysqli, $sql, $this->table, $zipcode);
$result = $result->fetch_assoc();
// Validate result
if($result === NULL) { return FALSE; }
// Build result array
foreach($result as $key => $value) { $data[$key] = $value; }
// Cache the result
$this->cache[$zipcode] = $data;
// Set the flag that the cache has been updated
$this->cacheUpdated = TRUE;
return TRUE;
}
}