AVR Phone Recorder & Telephony Platform. Part 2

In the first part we have got acquainted with the device hardware. Have considered used interfaces and reports and the general principles of work of system.

Device Usage
When first turned on, the device tries to initialize the SD/MMC card. First, it repeatedly sends the card reset sequence until it succeeds. This happens “in background”, so it doesn’t hang; in other words, the device’s features that do not depend on the card will work normally.
When the card is ready, it reads the card status descriptor and identification fields to determine its capacity and serial number. Next, it reads the first sector of the card and checks if it already contains our “index table”. If it does not, an empty one is overwritten. This means any filesystem previously present in the card will be destroyed. (Obviously, we can always reformat the card later.)

The device monitors the phone line status and displays it on the LCD, saying “DISCONNECTED” if it did not sense voltage in the line, “IDLE” if it senses the on-hook voltage or “IN USE” if it is off-hook.

The keypad consists of a center “Enter” key and four “arrows”. Using the left and right arrows allows us to seek a previously recorded call for playback. The “Enter” key may be used to interrupt playback. When the device is nor playing back, the “Enter” key cycles between audio monitor modes: monitor off, monitor only when offhook or monitor always on. The up and down keys are currently unused, reserved for a to-be-implemented menu system.

An incoming call is started either by the reception of the ring tone or DTMF digits for the caller ID, which are shown on the second line of the LCD just as they are received. The LCD’s first line shows the ring counter and the call timer. The device records the whole call, from the very first ring or DTMF digit that started. If the call is taken, recording stops when the user brings the phone offhook again. If the call is not taken, recording stops after a few seconds of no activity.

Taking the phone offhook when there is no incoming call starts an outgoing call. Recording starts from the moment the phone is taken offhook and stops when it is brought onhook again. During the call, the LCD shows the dialed number on the first line and the call timer in the second.

DTMF Commands
The device accepts a few dialed-in commands. The command prefix is “**”. The currently implemented commands are:

  • 01HHMMSS: Sets the time to HH hours, MM minutes, SS seconds;
  • 02YYMMDD: Sets the date to day DD, month MM, year MM;
  • ##: Stops recording the current call. This is needed because the monitored tech support attendant may need to make a private call that he/se does wish recorded – say, to his/her phone banking service, which may involve passwords or PINs. The device confirms the command by sending five short beeps to the line.

The device has no way to differentiate between commands issued locally or remotely. In our usage scenario, this was not considered a problem – the only protection measure we implemented was that the user could not reset the clock after the computer has set it. It would be trivial, however, to implement a password-based scheme to prevent unauthorized command execution.

Phone Recorder Firmware
The basic timing element in our system is the audio sampling rate, generated using Timer/Counter0’s Clear Timer on Compare Match (CTC) mode. The initialization code sets the Timer0 prescaler to CLK/8, making its clock 1.3842 MHz. The Timer0 Compare Match interrupt service routine sets the Output Compare 0 register to 171 every five times it runs or 172 otherwise. On average, this effectively implements a division by (4×173+172)/5 = 172.8, giving exactly 8,000 Hz with 0.46% maximum jitter, which is negligible for our purposes.

The trick to get all the features working together is to have them all synchronous to this basic rate that we call “the (audio) sample processing cycle” or “main cycle”. For simplicity, we assume this cycle is 1,376 cycles long or 172 Timer0 counts. Everything we must do must fit within this time or else we’ll miss an audio sample.

ADC and Audio Sampling
The ADC control registers are set up to make it auto-triggered by the same Timer0 Output Compare Match and we get an interrupt every time the ADC delivers a result. This happens regularly 832 CPU cycles after the conversion started (normal ADC conversions take 13 ADC cycles and we have set the ADC prescaler to CLK/64).

The ADC interrupt service routine presents the ADC results in two ways:

  • As an unsigned byte value in the range 0-255 where 0V is 128 (because of the DC bias imposed by the input circuitry). This value is called sample8 throughout the code.
  • As a signed 16-bit value where 0V is actually 0 – that is, we subtract the DC bias. In the code, this value is called sample16. Although presented as a 16 bit value, recall that the ADC result is actually only 10 bits long.

The unsigned value will be the one we report over the serial link, save to the memory card and use for playback. The top 8 bits of the signed value will be used in the DTMF detection algorithm. The full signed 16 bit value is used in the experimental ADPCM compressor.
We call a group of 206 audio samples an “audio block”. This is the period where the DTMF detector issues a new set of results. Many other things are synchronized to specific points in this cycle just for convenience and to limit the maximum number of cycles the several features require, in a kind of “static multitasking”.

Dialing Tones and Digits Detector
Modern telephones transmit the numbers we dial using a technique called Dual Tone Multi Frequency (DTMF), in which each key pressed in the telephone’s keypad generates a tone composed of two sine waves of specific frequencies.

 

1209 Hz

1336 Hz

1477 Hz

1633 Hz

697 Hz

1

2

3

A

770 Hz

4

5

6

B

852 Hz

7

8

9

C

941 Hz

*

0

#

D

Most telephone keypads do not have the keys corresponding to “A” to “D” digits. However, they are used in various kinds of control signaling within the telephone network. In our country, for instance, phone operators transmit Caller ID as a series of DTMF digits before the first ring.

Implementing a tone detector thus has to do with measuring the intensities of the signal at those particular frequencies. If we detect intense signals for a certain period of time simultaneously at the particular frequencies at the table above, we say the corresponding digit has been dialed.

In hardware, this is typically done with bandpass filters. In software, we will use digital filters. When we want to measure the signal intensity at just a few known-in-advance frequencies, there is a procedure called “Goertzel’s algorithm” that does precisely that. It is beyond the scope of this text to provide a general introduction to the theory of digital filters; the reader is referred to one of the many good introductions on the subject available on the Internet  or any good textbook on digital signal processing out there. The remaining of this section focuses on efficiently implementing Goertzel’s algorithm.

This algorithm needs some precomputed constants and has two parts: the “inner” part, executed for every sample; and the “outer” or “finalization” part, executed at the end of each block.

The basic parameter is the block size. The larger the block, the more sensitive the filter and the narrower the bandpass. We use a block size of N = 206 samples, which we call an “audio block”. This makes the minimum duration necessary for a detectable signal 206 / 8 KHz = 25.75 ms and the bandpass approximately the inverse of that or about 40 Hz.

From that, for each frequency we want to measure, we precompute:

k = int(0.5+f×N/S) and c = 2×cos(2 ×k × pi / N);

Where N is the block size, f is the frequency we want to calculate the intensity of and S is the sampling rate. The value c is called the “cosine coefficient” and it will be used throughout the rest of the algorithm.

Our implementation uses fixed point arithmetic with 9-bit coefficients, interpreted as 1 binary digit before the decimal point and 8 digits after; in other words, everything is scaled by 256. So, computing those values for our frequencies of interest gives the following results:

 

c in fixed point

frequency

k

c

decimal

hex

440

11

1.8885

483

1E3

697

18

1.7061

437

1B4

770

20

1.6393

420

1A3

852

22

1.5664

401

190

941

24

1.4876

381

17C

1209

31

1.1706

300

12B

1336

34

1.0176

260

104

1477

38

0.8004

205

0CC

1633

42

0.5714

146

092

Notice that we added another frequency: 440Hz. This is the frequency of the dialing, busy and remote ring tones. It may prove useful when we implement functionality that originates calls and needs to know if the called party took the call.

Now for the inner part of the Goertzel algorithm, for each of those frequencies and for every sample, we must compute:

y = d1 × c - d2 + s
d2=d1
d1=y

Where d1 and d2 are intermediate results, c is the cosine coefficient calculated above and s is the current sample. The first computation above is called the “multiply-and-accumulate” (MAC) operation.

Converting that equation to our fixed-point units, it becomes:

y=((d1×c)>>8)-d2+ss8

Where ss8 is the current sample as a signed 8-bit (that is, the 8 most significant bits of sample16) value and the >> operator is the shift right operation, as in C. The variables d1, d2 and c must be signed 16 bits will in fact be arrays, because we need to have them for each frequency. So, what we really need to do is actually this:

for (n=0;n<9;n++) {
y=((d1[n]*c[n])>>8)-d2[n]+ss8;
d2[n]=d1[n]; d1[n]=y;
}

After the end of an audio block, we have to perform the finalization computations:

m = d1×d1 + d2×d2 - d1×d2×s

The result m is the magnitude or intensity of the signal at that frequency. Converting that to fixed point, we have:

d1=d1>>8;
d2=d2>>8;
mag=d1*d1+d2*d2-d1*((d2*c[n])>>8);
d1=d2=0;

Let’s recap what we learned so far: Goertzel’s algorithm allows us to detect signal intensities at particular frequencies by making one 16 by 16 bit multiplication, one addition, one subtraction and one 8-bit shift right per sample per frequency being detected. At the end of the 206-sample block there’s additional computations involving three eight bit right shifts, three 16×16 bit multiplications, one addition and one subtraction. Considering the sophistication of that it does, it doesn’t sound that much computation. But the question is: can we compute that fast enough? This sounds like the perfect job for AVR’s built-in hardware multiplier, capable of multiplying two 8×8 bit operands in just 2 cycles.

However, trying to code this in C results in multiplying a 16 bit number (d1) by another 16 bit number (d2), resulting in a 32 bit number. Not only this is wasteful because it requires four 8x8 bit multiplications, but we will throw out the 8 least significant bits right away. Worst, from now on GCC generates code to evaluate the entire expression with 32 bit precision. Casting to short, forcing the rest of the computation to 16 bit arithmetic helps a bit, but there is a lot of overhead in dereferencing the arrays and moving data around. When we tried that with GCC, that single loop took about 100 cycles! Recall that we have to do that for 9 frequencies and we get an estimate of around 900 cycles to perform this computation – 65% of our 1376 cycle budget.

And we didn’t even get to the finalization part. Since there are about three times more operations in the finalization than in the inner part, a rough estimate is that it would take around 2700 cycles. That definitely won’t fit.

However, we can easily make the finalization fit by reordering things. The key to that is the observation that we don’t need all the results right after the end of the audio block. We can have two sets of d1 and d2 arrays, one in use by the inner part and other being handled by the finalization part with the results of the previous block. Besides, we can spread the finalization computations over the whole duration of the 206-sample block cycle, doing just one or two single operations per sample. Let’s divide the finalization operation in parts controlled by two counters:

if (nn switch (cc) {
case 0: d1[n]>>=8; break;
case 1: d2[n]>>=8; break;
case 2: mag =d1[n]*d1[n]; break;
case 3: mag+=d2[n]*d2[n]; break;
case 4: coef=c[n]; break;
case 5: coef*=d2[n]; coef>>=8; break;
case 6: coef*=d1[n]; break;
case 7: mag-=coef; break;
case 8: // results interpretation,
// omitted for brevity
case 9: d1[n]=d2[n]=0; break;
}
cc++; if (cc==10) cc=0,n++;

Now if d1 and d2 are actually pointers to one of two sets of arrays and we exchange each working set at the end of every block, we achieve precisely the spreading of the workload we wanted. Using this scheme, the finalization step takes less than 60 cycles per sample. The disadvantage is that we use twice as much memory, but since d1 and d2 are 9 short ints each, we’re talking going from 36 bytes to 72 bytes. That’s cheap.

Another disadvantage is that our results come out with a latency of slightly less than the time of one audio block – 25.8. No big deal either.
The next step is to optimize the inner loop. First we combined the d1 and d2 values in a single array d to simplify indexing, so we could move over the array using a single cycle 8-bit increment operation. (This also means that the d array cannot cross a 256-byte boundary). We unrolled the loop and implemented the multiplications in assembler ourselves, eliminating any operations on bytes that would be discarded by the shift rights. That trimmed down the ~100 cycle inner Goertzel operation to 42 cycles.

Another optimization is that the last two c values fit in 8 bits, so we can shove some more operations out. The last to items of the loop are implemented that way, saving an additional 6 cycles.

Yet another optimization is to notice that the c value for the 1336Hz frequency is very close to 1 (this happens whenever the frequecy to be detected is one-sixth of the sampling frequency). So we replace it by 1, transforming the multiplications into no-ops. So that particular frequency can be computed in just 27 cycles.

Summing up, we managed to trim the Goertzel inner loop to 42×6+27+36×2 = 351 cycles – a 60% improvement over the original 900 cycles. The full Goertzel algorithm, as implemented in the final program, including the finalization step and interpretation of the results, takes less than 500 cycles per sample or 37% of our cycle budget.

After the Goertzel algorithm does its work, we have signal intensities at each frequency. If they’re above a minimum threshold, we accumulate them in another as long as the signal is present. When the signal disappears, we report detection of the digits whose accumulated intensities were the highest for each group (rows or columns) and clear the accumulators.

As we said earlier, the caller ID message in our country is a stream of DTMF digits just before the first ring. Making a long story short, the format of the message is AXYYZZZZZZZZC, where “X” is a number whose meaning we are still to figure out (it sometimes comes as “1”, other times it’s a “2”), “YY” is the area code and “ZZZZZZZZ” is the phone number. Reception of the “A” digit signals the beginning of an incoming call and it starts the recording.

Although we do support detecting the dial tone (and the busy and remote ring tone), this feature is unused for now. It will be useful when we teach the device to call other parties.

Serial Communication
The serial protocol has to multiplex audio and control data on the same channel. This is achieved by using a escaping/bytestuffing technique: all bytes with the exception of a specific one called “command prefix” (0xFE in our implementation) are interpreted as audio data.

The following outbound (recorder to computer) “commands” have been defined:

Name

Format

Meaning

CMD_NOTIFY_ON_HOOK

FE 01

The line was just hung up

CMD_NOTIFY_OFF_HOOK

FE 02

The line was taken offhook

CMD_NOTIFY_DISCONNECTED

FE 03

No voltage on the line.

CMD_DTMF_DETECTED

FE 10 XX

DTMF digit XX (in ASCII) detected

CMD_LOCAL_RING_DETECTED

FE 11

Incoming ring detected

CMD_ADPCM_STATE

FE 00 XX YY YY

Start of ADPCM compressed audio block

CMD_RAW_BLOCK

FE 0F

Start of raw uncompressed audio block

CMD_LITERAL

FE FE

Literal 0xFE in the audio stream

The following inbound (computer to recorder) commands have are available:

Name

Format

Meaning

CMD_LCD_BACKLIGHT_ON

FE 80

Turns the LCD backlight on

CMD_LCD_BACKLIGHT_OFF

FE 81

Turns the LCD backlight off

CMD_GO_OFF_HOOK

FE 82

Enables hook relay, takes line offhook

CMD_GO_ON_HOOK

FE 83

Disables hook relay, brings line onhook

CMD_SET_TIME

FE 90 HH MM SS

Sets time to HH:MM:SS (in BCD)

CMD_SET_DATE

FE 91 YY MM DD

Sets date to MM/DD/YY (in BCD)

CMD_LITERAL

FE FE

Literal 0xFE in the audio stream

Transmission speed is 115,200 bps – the smallest standard serial speed that could accommodate the 8KB/s rate at which the audio data is generated. Notice that the command encoding above means we actually transmit at slightly more than 8KB/s. In the discussion below, bear in mind that a sample may mean more than one byte.

At the beginning of each 206-sample audio block we transmit a start block command. This allows the computer to know whether we’re sending the data compressed or uncompressed and, in case of the former, send the compression state information.
The rationale behind this simple encoding scheme is that it is simple to implement – requires only a modest-sized transmit FIFO in firmware, is quite easy to parse in real time.

Since much of the code is rather time critical, it is important that we spend as little time as possible worrying about serial transmission. So instead of an interrupt-based transmission system, we used a synchronous arrangement: when a new byte is to be transmitted, it is stashed in the transmit FIFO. Two strategic spots in the audio sample processing routine send anything pending in the FIFO to the UART. We do not check the computer’s CTS (Clear to Send) line – in fact, it’s not even wired up; we assume the PC can keep pace: any modern PC can easily handle 8KB/s.

Serial reception, on the other hand, is interrupt-based. However, we implement the hardware flow control “Request to Send” line to prevent the PC form sending data faster than we can consume. This is mainly needed because we there is also support for receiving audio data over the serial line for playing back through the monitor speaker or the phone line. We made little use this feature in our phone recorder application but it might become important when the device evolves to become a full-featured interactive voice response system.

MMC/SD card code
Although there are many publicly available libraries for interfacing with MMC/SD cards, the ones we found [17] assumed they had no time constraints, making them unsuitable for our application. We then set out to write our own code, using the same idea of splitting the operations over many audio processing cycles using many state machines controlled by global variables. The code works, but its long cascading ifs and complicated state transitions make it a bit hard to follow.

The MMC/SD code is the real memory hog: it uses nearly 80% of the available RAM. The MMC/SD block size is 512 bytes and we keep three of them in memory: two for the double buffer used for recording/playback and one for the table of pointers to the start clusters of each recording.

This memory restriction ruled out full-fledged traditional filesystems such as FAT, so we had to resort to a simple scheme where the memory card is viewed as a huge ring buffer. When it fills, older calls are simply overwritten.

In order to make the 230 pointers fit in a single 512-byte index sector, we invented our own card addressing scheme: we divide the card in 16-sector clusters (8K or approximately one second of audio) and each cluster number has is 16 bits wide. Thus we can support up cards with up to 2^20 sectors or 512MB, giving a theoretical maximum capacity of 18 hours and 38 minutes of raw uncompressed audio. In practice, this figure is a bit smaller because a 512 memory card reports only 1,003,776 sectors.

A nice finesse of the double buffering is that we can have both recording and playback simultaneously. This is possible because the playback and recording rates are identical and the code forces their boundaries to be synchronized as well.

We read data from the card in buffer A, and as each byte is sent to the playback PWM we immediately replace it with another one from fresh from the ADC. At the same time, the previously recorded data in buffer B is written to the card and the next sector to be played back is read – these two operations complete before the buffer A finishes being played back/recorded. When it does finish, the pointers to buffers A and B are exchanged and the whole process starts again. In this context, samples and bytes are the same. Also notice that the playback/record cycle is 512 samples long and completely unrelated to the 206-sample cycle of the DMTF detector.

Build instructions
The main MCU’s software was compiled using avr-gcc 3.4.3 and libc-1.2.3 from WinAVR-20050214. Download and install them first. Unpack the sources package in a directory of your choice and go there with your shell (we use Cygwin’s Bash).

Then take a look at the Makefile to see if you need to change something, such as paths or the name of your programmer passed to avrdude. After that, type just type make. If all goes we, do a make fuses to program the AVR to use the external crystal and disable the JTAG port (it interferes with the LCD). Finally, do a make upload to send the firmware to the MCU.

The initial part of the code has lots of #defines allowing us to disable specific features (we used that to make the timing analysis graphs) and turn on debugging features. Feel free to experiment with them.

Appendix: the description of parametres of the program for the computer, the scheme, source codes and.hex-files for microcontrollers, an source code and the program for the computer, the description of “Goertzel’s algorithm”, charging algorithm.

circuitcellar.com