Saturday, May 13, 2006

patent: call detail record (cdr) analysis

Recent news coverage of the government's acquisition and analysis of civilian phone call data has been woefully short on detail about what was captured. I've found a US patent (USP 6385301) that gives some insight into what can be divined from the sort of raw data that flows off of a switch, if you go the extra mile to put it into a useful analytical state.

The holders of this patent, Tom Nolting and Rich LaPearl, were looking for a novel way to analyze call traffic in order to improve the efficiency of their network. I sold them the software they used to do this work (the software package is cited in the patent) while they were at Bell Atlantic in the late 90s, and in the process learned a lot about just how much you can do with this information.

It's important to note that at no time did any of the information captured, stored or analyzed in this process have anything to do with the content of the call(s).

At the raw level, CDR data gives you access to the following:

a) Whether the call is terminating or originating

b) The total carrier elapsed time (time between when the call was initiated and when it was released)

c) The total customer elapsed time (time between when the call was answered and when it was released)

d) Whether the call was answered and how it was cleared (i.e., busy, normal clearing, etc.)

e) The date and time the call began

f) The originating and terminating number (NPA-NXX-XXXX)

Here NPA is the 3-digit Numbering Plan Area (area code), NXX identifies the central office exchange within the NPA, and XXXX is the last 4 digits of a 7-digit local phone number (N is any digit from 2-9, and X any digit from 0-9). This format is standard across North America per the North American Numbering Plan (NANP), and is managed by the creatively-named North American Numbering Plan Administration.
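To make the field list concrete, here is a minimal Python sketch of what one record might look like once it's been parsed. The class and field names are my own invention for illustration (they don't come from the patent or from any switch vendor's format), and the number check only enforces the N and X digit ranges described above, not every NANP assignment rule.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import NamedTuple

# NPA and NXX both start with N (2-9); every other digit is X (0-9).
NANP_PATTERN = re.compile(r"([2-9]\d{2})-([2-9]\d{2})-(\d{4})")


class NanpNumber(NamedTuple):
    npa: str   # area code
    nxx: str   # central office exchange
    line: str  # last four digits


def parse_nanp(number: str) -> NanpNumber:
    """Split an NPA-NXX-XXXX string into its parts, or raise ValueError."""
    match = NANP_PATTERN.fullmatch(number)
    if match is None:
        raise ValueError(f"not a valid NPA-NXX-XXXX number: {number!r}")
    return NanpNumber(*match.groups())


@dataclass
class CallDetailRecord:
    """Hypothetical record layout covering items (a) through (f)."""
    direction: str               # "originating" or "terminating"
    carrier_elapsed: timedelta   # call initiated -> call released
    customer_elapsed: timedelta  # call answered -> call released
    answered: bool
    clearing: str                # e.g. "normal" or "busy"
    start: datetime
    originating: NanpNumber
    terminating: NanpNumber

    @property
    def setup_time(self) -> timedelta:
        """Time spent before the answer: carrier minus customer elapsed."""
        return self.carrier_elapsed - self.customer_elapsed


# Made-up example values, using 555-01XX style fictional numbers.
cdr = CallDetailRecord(
    direction="originating",
    carrier_elapsed=timedelta(seconds=95),
    customer_elapsed=timedelta(seconds=80),
    answered=True,
    clearing="normal",
    start=datetime(2006, 5, 13, 9, 30),
    originating=parse_nanp("703-555-0123"),
    terminating=parse_nanp("202-555-0187"),
)
print(cdr.originating.npa, cdr.setup_time)
```

Note that nothing in the record touches the content of the call. Even so, the difference between the two elapsed times already tells you how long the phone rang before someone picked up.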

While Nolting and LaPearl specified a multidimensional database in their patent, I've seen other sorts of data management systems used with success for this sort of work, specifically the Sybase Adaptive Server IQ database, which employs bitmapped indexes for high- and low-cardinality data types to speed analysis. The hard part of this sort of work isn't necessarily data storage - it's data analysis, especially if you want to correlate the CDRs with external data. That's where the multidimensional database came in handy.
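If you haven't run into bitmapped indexes before, the general idea can be sketched in a few lines of Python. This is a toy illustration of the technique only, not a description of how Sybase IQ implements it internally; the class name, helper function, and sample rows are all made up.

```python
from collections import defaultdict


class BitmapIndex:
    """One bitmap per distinct value; bit i is set if row i has that value."""

    def __init__(self):
        # value -> Python int used as an arbitrarily long bitset
        self.bitmaps = defaultdict(int)

    def add(self, row_id: int, value) -> None:
        self.bitmaps[value] |= 1 << row_id

    def rows_equal(self, value) -> int:
        return self.bitmaps.get(value, 0)


def row_ids(bitmap: int) -> list[int]:
    """Expand a bitset into the sorted list of row numbers it contains."""
    ids = []
    row = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row)
        bitmap >>= 1
        row += 1
    return ids


# Hypothetical sample rows: (how the call cleared, originating NPA)
rows = [
    ("busy", "703"),
    ("normal", "703"),
    ("busy", "202"),
    ("busy", "703"),
]

by_clearing = BitmapIndex()
by_npa = BitmapIndex()
for i, (clearing, npa) in enumerate(rows):
    by_clearing.add(i, clearing)
    by_npa.add(i, npa)

# A two-condition filter becomes a single bitwise AND of two bitmaps.
busy_from_703 = by_clearing.rows_equal("busy") & by_npa.rows_equal("703")
print(row_ids(busy_from_703))  # prints [0, 3]
```

The appeal for CDR work is that a question like "busy calls out of a given area code" is answered by combining bitmaps rather than scanning every record.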
