Serv-U logs analysis with Awstats: Design Document

Update: For a simple and immediate solution see: Using Awstats to analyse Serv-U logs

Using Awstats to monitor Serv-U, the FTP server from Rhino Software.
After investigating this a "bit", it appears we will have to write a process/script/app that will read the Serv-U logs and convert them to a format that Awstats can read. This is not difficult, but probably a few days of effort. In this document I will outline the design. Maybe some kind "angel" programmer will offer to write it. In any case, if I get something that works, I will post it here.

1) The script/app needs to parse the Serv-U logs and create a new log that is Awstats compatible.
The goal is to output one line item per transfer in the new log, containing:
%time1 %host %logname %method %url %code %bytesd

Algorithm:
a) Read the Serv-U log file. When we encounter a new connection line:
[5] Thu 15Dec05 03:07:46 - (005901) Connected to 195.115.114.37 (Local address 63.123.139.203)
Create a structure/array to hold the FTP connection details, using the Connection_ID (ex: 005901) as the key, and store the IP (%host in Awstats).

b) When you encounter the line:
[2] Thu 15Dec05 01:05:36 - (005869) USER christophe
Add to the primary structure/array the UserID (%logname in Awstats).
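Steps a) and b) could be sketched in Python along these lines. This is a minimal sketch with assumed names: the regexes are my reading of the two line shapes quoted above, and `connections` plays the role of the primary structure/array.

```python
import re

# Assumed line shapes, based on the sample log lines above.
CONNECT_RE = re.compile(
    r"\[\d\] \w{3} \d{1,2}\w{3}\d{2} \d{2}:\d{2}:\d{2} - \((\d+)\) Connected to (\S+)"
)
USER_RE = re.compile(
    r"\[\d\] \w{3} \d{1,2}\w{3}\d{2} \d{2}:\d{2}:\d{2} - \((\d+)\) USER (\S+)"
)

connections = {}  # primary structure: Connection_ID -> {host, logname}

def handle_line(line):
    m = CONNECT_RE.match(line)
    if m:
        conn_id, ip = m.groups()
        connections[conn_id] = {"host": ip, "logname": "-"}  # %host
        return
    m = USER_RE.match(line)
    if m and m.group(1) in connections:
        connections[m.group(1)]["logname"] = m.group(2)      # %logname

handle_line("[5] Thu 15Dec05 03:07:46 - (005869) "
            "Connected to 195.115.114.37 (Local address 63.123.139.203)")
handle_line("[2] Thu 15Dec05 01:05:36 - (005869) USER christophe")
```

After the two calls, `connections["005869"]` holds both the client IP and the user name.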

c) When you encounter the line:
[4] Thu 15Dec05 01:05:40 - (005869) Receiving file \\83.132.131.203\domlyon\wwwroot\js.cfm

Add an entry to a secondary structure/array, indexed back (parent key) to the Connection_ID (ex: 005869). The key here should be "\\83.132.131.203\domlyon\wwwroot\js.cfm".
Store the following data in the secondary structure:
Receiving: It might be an optimization to store "R" for "Receiving" and "S" for "Sending" (%method in Awstats)
\\83.132.131.203\domlyon\wwwroot\js.cfm: Our key in the secondary structure (%url in Awstats)
Thu 15Dec05 01:05:40: Convert this to a supported Awstats date format. Three to pick from:
%time1 Date and time with format: [dd/mon/yyyy:hh:mm:ss +0000] or [dd/mon/yyyy:hh:mm:ss]
%time2 Date and time with format: yyyy-mm-dd hh-mm-ss
%time3 Date and time with format: Mon dd hh:mm:ss or Mon dd hh:mm:ss yyyy
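Step c) might look like the sketch below; I picked the %time1 format for the date conversion. The names `transfers`, `servu_to_time1`, and `start_transfer` are my own placeholders, not anything from Serv-U or Awstats.

```python
from datetime import datetime

transfers = {}  # secondary structure: (Connection_ID, path) -> pending transfer

def servu_to_time1(stamp):
    # "15Dec05 01:05:40" -> "[15/Dec/2005:01:05:40 +0000]" (Awstats %time1)
    dt = datetime.strptime(stamp, "%d%b%y %H:%M:%S")
    return dt.strftime("[%d/%b/%Y:%H:%M:%S +0000]")

def start_transfer(conn_id, direction, path, stamp):
    transfers[(conn_id, path)] = {
        "method": "R" if direction == "Receiving" else "S",  # %method
        "url": path,                                         # %url
        "time1": servu_to_time1(stamp),                      # %time1
    }
```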

d) When you encounter the line:
[4] Thu 15Dec05 01:05:41 - (005869) Received file \\83.132.131.203\domlyon\wwwroot\js.cfm successfully (6.00 kB/sec - 2646 Bytes)

Update the secondary structure/array with:
successfully: Convert to 226 (the standard FTP "transfer complete" code); for any other value, store a code in the 400s or 500s to indicate "not-successfully".
(%code in Awstats)
Note: Here it would be nice to know all Serv-U status strings and translate them to the standard FTP codes.
2646 Bytes: Store only the numeric value. (%bytesd in Awstats)
Now write a line to the new Awstats compatible log. You should have:
%time1 %host %logname %method %url %code %bytesd
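A sketch of step d), assuming the `connections`/`transfers` structures built in the earlier steps; 550 is my arbitrary pick for the failure code (anything in the 400s/500s would do):

```python
def finish_transfer(conn_id, path, ok, nbytes, connections, transfers):
    # Pop the pending transfer and emit one Awstats-compatible line:
    # %time1 %host %logname %method %url %code %bytesd
    t = transfers.pop((conn_id, path))
    c = connections.get(conn_id, {"host": "-", "logname": "-"})
    code = 226 if ok else 550  # 226 = transfer complete; 550 = assumed failure code
    return "%s %s %s %s %s %d %d" % (
        t["time1"], c["host"], c["logname"], t["method"], t["url"], code, nbytes)
```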

e) When you encounter a "Closing connection" line:
[5] Thu 15Dec05 03:26:34 - (005905) Closing connection for user UPDATE (00:00:09 connected)
Loop through the secondary structure/array and dump a line for each remaining file; these transfers did not complete. Use a code value in the 400s or 500s (i.e. failure).
Default %bytesd to 0 (zero).

Same thing when you encounter a line indicating a server start:
[1] Sat 17Dec05 09:49:16 - FTP Server listening on port number 21, IP 63.123.139.203
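Step e) could be sketched like so, reusing the structures from the earlier sketches; 426 ("connection closed; transfer aborted") is my pick from the 400s:

```python
def flush_connection(conn_id, connections, transfers, out):
    # On "Closing connection" (or a server restart), dump every transfer that
    # never completed: code 426, %bytesd defaulted to 0.
    c = connections.pop(conn_id, {"host": "-", "logname": "-"})
    for key in [k for k in transfers if k[0] == conn_id]:
        t = transfers.pop(key)
        out.append("%s %s %s %s %s 426 0" %
                   (t["time1"], c["host"], c["logname"], t["method"], t["url"]))
```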

That should be it for the parsing.

Now for tracking the logs processed, many implementations are possible. I will continue aiming for the KISS (keep it simple, stupid) method. ;-)

Need a global parameter storing the location of the logs (ex: String Log_Folder).
Need a global parameter storing the naming schema of the logs (ex: String Log_Naming_Format).
Here, I propose to simplify: most people who need to analyze their logs have high traffic (ISPs), so log rotation is a must.
If further simplification is required impose daily logs in a certain format like: ftp%Y%N%D.log.

The script needs to track the last processed log (ex: String Last_Processed_Log), like ftp20051220.log.
When it runs, it processes the Last_Processed_Log file; at the end, it looks for the next log (according to the log naming schema) and, if it finds one, updates the value of Last_Processed_Log (ex: ftp20051221.log). The first time it runs, Last_Processed_Log will be empty; if so, look for the oldest file in Log_Folder, and if found, update Last_Processed_Log and process that log.
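This tracking logic is easy to sketch once the daily naming schema is imposed, since ftpYYYYMMDD.log names sort chronologically as plain strings (`next_log` is a hypothetical helper name):

```python
import os

def next_log(log_folder, last_processed):
    # Daily logs named ftpYYYYMMDD.log sort chronologically as plain strings,
    # so "the next log" is the smallest name greater than the last one.
    logs = sorted(f for f in os.listdir(log_folder)
                  if f.startswith("ftp") and f.endswith(".log"))
    if not last_processed:            # first run: start with the oldest log
        return logs[0] if logs else None
    later = [f for f in logs if f > last_processed]
    return later[0] if later else None
```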
-
Special case handling:
*) Orphan operations:
If you encounter a line like c):
[4] Thu 15Dec05 01:05:40 - (005869) Receiving file \\83.132.131.203\domlyon\wwwroot\js.cfm

and you have no corresponding primary structure/array entry for the Connection_ID (ex: (005869)), parse the previous log (according to the FTP log naming schema) to get the values: UserID (%logname) and Client_IP (%host). Once found, we are back to "normal".
If those values cannot be found, dump a line with "-" in place of Client_IP and UserID when you reach line d).
This special case handles log rotation while users are connected.
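A rough sketch of the orphan lookup, assuming the previous log's path is already known (`recover_orphan` is a hypothetical helper; the "-" fallbacks are the defaults described above):

```python
def recover_orphan(prev_log_path, conn_id):
    # Scan the previous day's log for this Connection_ID's "Connected to"
    # and "USER" lines; fall back to "-" when they cannot be found.
    host, logname = "-", "-"
    marker = "(%s)" % conn_id
    with open(prev_log_path) as f:
        for line in f:
            if marker not in line:
                continue
            if " Connected to " in line:
                host = line.split(" Connected to ", 1)[1].split()[0]
            elif " USER " in line:
                logname = line.split(" USER ", 1)[1].strip()
    return host, logname
```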