This project uses PHP, SQL and mySQLi to analyse page hits and spider (robot) activity on your web site. With this page, I will attempt to describe how I undertook the project, in case you should wish to try such a project (or similar). This is a good and useful way to learn about PHP, and how it interfaces with mySQL.
You will need PHP and mySQL/mySQLi facilities via your web hosting company. The mySQLi content can be replaced by mySQL if required. You will need PHP GD to produce the graphics (optional). This project is suitable as a first PHP project, but a working knowledge of HTML is assumed. I strongly recommend doing the PHP and SQL tutorials on W3Schools.com, and using this excellent site as a reference at all times.
Click here to see the latest analysis on my domain
(1) Set up a mySQL table to store the data
This is easily done via phpMyAdmin, or alternatively can be done in SQL by the CREATE TABLE 'yourdb' command. Use an auto-incrementing integer
field to index the table as a primary key. Create fields for the data that you will need to store:
(2) Create php files: prog.php, dbload.php and analysis.php
"prog.php" is the program file that you wish to analyse. For each program, you will need to add the following code
within the PHP. All PHP code should be entered between the usual PHP tags <?php and ?>.
Note that each file must have the extension .php, so any existing .htm files will have to be renamed, and re-directions put in place. It is worth it though! PHP works server-side, and is compiled (parsed) at the server. This means that you will have to load the files to the server each time you wish to check them. Also, it means that confidential information such as passwords cannot be seen by viewing the source.
(3) Make the connection to the database, and obtain key information
The first part of the dbload.php program should make the link to the mySQL database, and obtain ip, spider,
browser and link information. You will need to enter your own userid, password and database in the first line below.
(4) Store data in the mySQL database
The second part of the dbload.php program will write info to the mySQL database.
Here I introduce two derived variables: 'browser' (which will contain spider info too) and 'link'. The following code should be inserted into the code above, just before the close statement.
xday (incremental day counter) and xmonth (incremental month counter) will need calculating and storing if you wish to produce a graph of hits/crawls by day (see sections 7 and 8).
Over time, you can analyse the variables browser and link where they contain 'Other' to see which spiders are crawling your site, and who is linking to you; and then expand the code above.
(5) Analyse the data - simple data dump
Now, you can start on the program analysis.php, which basically can be very simple, or as advanced as you like.
Firstly, make the connection to the database as before. The following is a simple dump of info for the last 100 records.
(6) Analyse the data - spider analysis
Make the connection to the database as before. This is a table containing a count of spider crawls.
(7) Introduce graphics
Here I use PHP GD2 to produce a dynamic graph of hits over the past month.
I introduce a variable xday which increments by 1 each day. This will need adding to the database load program.
The following code should go in a new program called analysis2.php, which is called from analysis.php as follows:
Program analysis2.php:
(8) xmonth and xday
Here is an example of code to calculate xmonth and xday. The code is set as at March 2010, with a value of xday=1 equating to 1st Jan 2010.
It will need amendment going forwards to cater for future dates, leap years etc.
(9) Pie charts
PHP scripts to produce pie charts are available free on the internet.
An example is provided below :
Click here for atokar.net solution (this is what I used).
(10) Identifying the visitor's country
There are several resources on the internet providing free look-up databases of ip address to country.
An example is provided below :
Click here for webnet77.com solution (this is what I used).
(11) Feedback
Please feed back any comments or mistakes or areas where clarification is required. Thanks.