Improving Embedded Systems with Logic and Flexibility Hardware Peripherals
June 08, 2023
Overview: On-chip logic and flexibility peripherals improve applications by reducing code size, lowering power consumption, and improving system performance by implementing discrete logic and/or connecting other peripherals together.
On-chip hardware peripherals are well known for their ability to reduce power consumption, enhance performance, increase device capabilities, and reduce code size. These peripherals come in a variety of flavors, from Operational Amplifiers and Enhanced Analog to Digital Converters (ADC) to Pulse Width Modulators (PWM) and Universal Timers (UTMR).
Among the most powerful of these peripherals are those that can implement discrete logic or connect other peripherals together. This article explores the Configurable Logic Cell (CLC), Configurable Custom Logic (CCL), Event System (EVSYS), and Signal Routing (SR) Port peripherals and how they can add value to your design.
CLC / CCL
The Configurable Logic Cell (CLC) and Configurable Custom Logic (CCL) peripherals are programmable Look-Up Tables (LUT), with each being effectively equivalent to a single cell of an FPGA. The logic function configured inside of each peripheral instance is defined at runtime. The CLC/CCL can operate independently of the CPU, which allows it to replace discrete logic chips in the design. As for the difference between a CLC and a CCL, there are minor implementation differences. The CLC is PIC® microcontroller specific while the CCL is AVR® microcontroller specific. The basic operation of each peripheral remains the same.
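As a rough illustration of what "programmable look-up table" means, the sketch below models a three-input LUT in plain C. The function name, the three-input width, and the 8-bit truth-table layout are assumptions chosen for illustration; they are not register definitions for either peripheral.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a 3-input look-up table.
 * Bit n of `truth` is the output for input combination n,
 * where n = (in2 << 2) | (in1 << 1) | in0. */
bool lut3_eval(uint8_t truth, bool in0, bool in1, bool in2)
{
    uint8_t index = (uint8_t)(((uint8_t)in2 << 2) |
                              ((uint8_t)in1 << 1) |
                              (uint8_t)in0);
    return ((truth >> index) & 1u) != 0u;
}

/* Example truth tables (illustrative names, not vendor macros). */
#define LUT3_AND 0x80u /* high only when all three inputs are high */
#define LUT3_OR  0xFEu /* high when any input is high */
#define LUT3_XOR 0x96u /* high when an odd number of inputs are high */
```

Changing the logic function only means loading a different truth-table value, which is why the function can be chosen at runtime.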
Debouncing Buttons and Switches
One of the most common use cases for the CLC/CCL is to implement hardware-level debouncing in conjunction with a timer/oscillator. Application Note 2805 (AN2805) discusses three ways to implement debouncing using CLCs. Of the three, the two-CLC version strikes a good balance between hardware resource usage and performance. Source code for the three versions is available on GitHub.
To implement the debouncer, one of the CLCs is set up as a D flip-flop to latch the value from the button or switch. The second CLC logically ANDs the latched value from the previous stage with the direct input and then latches the resulting value. The clock source for both flip-flops is a low-frequency clock source generated by a timer or oscillator on the device. The implementation is shown below.
Figure 1 – Two CLC Debouncer
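The behavior of the two-stage debouncer can be sketched as a small software model. This is only a simulation of the logic described above, not CLC configuration code; the type and function names are hypothetical, and each call stands for one edge of the slow debounce clock.

```c
#include <stdbool.h>

/* Hypothetical model of the two-flip-flop debouncer. */
typedef struct {
    bool q1; /* stage 1: raw input latched on the previous clock edge */
    bool q2; /* stage 2: debounced output */
} debouncer_t;

/* Advance the model by one edge of the slow debounce clock.
 * Both flip-flops update on the same edge, so stage 2 sees the
 * value stage 1 held before this edge. */
bool debouncer_clock(debouncer_t *d, bool raw_input)
{
    bool next_q2 = d->q1 && raw_input; /* AND of latched and direct input */
    d->q1 = raw_input;
    d->q2 = next_q2;
    return d->q2;
}
```

In this model the output only goes high after the input has been high on two consecutive clock edges, so a bounce that lasts less than one clock period never reaches the output.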
For the CCLs on AVR MCUs, it’s even simpler than this. The CCLs contain an input filtering option which effectively performs the same two-cycle filtering operation that was implemented on the CLCs. Additionally, the CCLs can be clocked from a 1 kHz oscillator on the device, which is slow enough to perform debouncing.
Quadrature Decoding
Another use case of the CLCs is to perform quadrature decoding. Incremental quadrature encoders generate two square waves where one phase leads the other by 90 degrees. The amount of rotation is determined by the number of pulses, while the phase relationship between the two signals (which one leads) indicates the direction. The image below shows an example of a quadrature encoded signal.
Figure 2 – Sample Quadrature Waveform
To decode this, the CLCs convert this waveform into two outputs, representing clockwise and counterclockwise pulses. Two timers in the microcontroller count the number of pulses received. When the microcontroller needs to know the net change in position, some simple math can be performed on the counts of each timer to determine the net change since the last read.
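The "simple math" on the two timer counts might look like the sketch below. It assumes two free-running 16-bit counters, one per direction, that are read but never cleared; the structure and function names are hypothetical and do not come from a vendor driver.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for two free-running 16-bit pulse counters. */
typedef struct {
    uint16_t cw_last;  /* clockwise count at the previous read */
    uint16_t ccw_last; /* counterclockwise count at the previous read */
} quad_tracker_t;

/* Return the net number of pulses since the last call.
 * Positive means net clockwise movement. The unsigned 16-bit
 * subtraction stays correct across a single counter rollover. */
int32_t quad_net_change(quad_tracker_t *t, uint16_t cw_now, uint16_t ccw_now)
{
    uint16_t cw_delta = (uint16_t)(cw_now - t->cw_last);
    uint16_t ccw_delta = (uint16_t)(ccw_now - t->ccw_last);

    t->cw_last = cw_now;
    t->ccw_last = ccw_now;

    return (int32_t)cw_delta - (int32_t)ccw_delta;
}
```

Reading often enough that neither counter can wrap more than once between reads keeps the result unambiguous.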
Event System
The Event System (EVSYS) is designed to select an output signal from another peripheral and route it to other peripheral(s) internal to the microcontroller. This interconnectivity can happen independently of the CPU, which saves power when in sleep or idle mode and improves performance. EVSYS is only found in AVR microcontrollers.
SR Port
On PIC microcontrollers, there is a peripheral known as the Signal Routing (SR) Port. The SR Port resembles an output I/O port, but its "pins" exist only inside the device. Software can manually set or clear the bits inside, as with a standard I/O output register, but the port also supports peripheral output signals and shift register functionality.
The SR Port works well with the Peripheral Pin Select (PPS) feature of PIC microcontrollers. PPS gives the designer flexibility in assigning I/O by allowing digital I/O signals to be moved to different pins of the microcontroller. Similarly, PPS allows peripherals to select individual SR Port “pins” as inputs. This allows advanced state machines to be constructed and controlled using this peripheral.
On-Device Signal Select
One possible use case of the SR Port and CLC is to implement an internal signal select multiplexer. This could be used for internal self-test or to select one of N signals to process. To implement this, a CLC is used to create a 2:1 multiplexer. A 4:1 multiplexer is also possible but uses three CLCs and two bits from the SR Port. To control the multiplexer, a bit from the SR Port is used as a select line. The logical implementation is shown below.
Figure 3 – 2:1 Multiplexer (As implemented within the CLC logic; unused inputs hidden)
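The multiplexer logic can be written out as Boolean expressions. The sketch below is a plain C model with hypothetical function names, not CLC or SR Port configuration code; it also shows how a 4:1 multiplexer could be composed from three 2:1 stages and two select bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2:1 multiplexer as AND-OR logic: sel = 0 passes in0, sel = 1 passes in1. */
bool mux2(bool sel, bool in0, bool in1)
{
    return (in0 && !sel) || (in1 && sel);
}

/* 4:1 multiplexer built from three 2:1 stages.
 * Bit 0 of sel drives the two first-stage multiplexers,
 * bit 1 drives the final stage. */
bool mux4(uint8_t sel, const bool in[4])
{
    bool sel0 = (sel & 0x01u) != 0u;
    bool sel1 = (sel & 0x02u) != 0u;

    bool low_pair = mux2(sel0, in[0], in[1]);
    bool high_pair = mux2(sel0, in[2], in[3]);

    return mux2(sel1, low_pair, high_pair);
}
```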
This implementation has two advantages over PPS: speed and flexibility. PPS can be locked to prevent inadvertent changes at runtime, and a configuration bit can be set so that PPS can only be unlocked once. The CLC multiplexer lets the program change inputs without going through an unlock sequence every time. This approach is used in the Morse Code demo for Microchip’s PIC18F56Q71 family. The demo creates a simple Morse code transmitter and receiver; the multiplexer selects between the transmitter output and an external input signal for receiving.
Figure 4 – Block Diagram of the Morse Code Demo
Inside of the program, single bit set/clear/toggle macros for the SR Port pins are defined for readability.
//Select input to decoder
#define SELECT_TX_DECODE() do { RW0_SetLow(); } while(0)
#define SELECT_USER_DECODE() do { RW0_SetHigh(); } while(0)
#define SWITCH_DECODE_SOURCE() do { RW0_Toggle(); } while(0)
The code snippet below is responsible for cleanly switching between the input sources. When the transmitter and receiver are idle, and the user has sent a ‘#’ to the terminal, it will switch input sources.
if (morseTx_isSwitchRequested() && morseRx_isIdle() && morseTx_isIdle())
{
    //Request to switch input sources
    SWITCH_DECODE_SOURCE();
    morseTx_clearSwitchRequest();
    if (IS_USER_INPUT_ACTIVE())
    {
        //User Input
        printf("User input is now active.\r\n");
    }
    else
    {
        //TX Input
        printf("Transmitter input is now active.\r\n");
    }
}
The source code for this program is available on GitHub.
Hardware Accelerated Parity Calculation
In some cases, it is necessary to generate a parity bit for data transmission or communication. Calculating the parity in software is trivial, but slower to execute than in hardware. A simple function is shown below.
Note: The test pattern evaluated is stored globally for these simple examples.
bool isOdd_SW(void)
{
    bool isOdd = false;
    uint8_t temp;
    //Byte Scan
    for (uint8_t byIndex = 0; byIndex < DATA_SCAN_LENGTH; byIndex++)
    {
        //Bit Scan
        temp = data[byIndex];
        for (uint8_t biIndex = 0; biIndex != 8; biIndex++)
        {
            if (temp & 0b1)
            {
                //Count
                isOdd = !isOdd;
            }
            //Shift bits
            temp >>= 1;
        }
    }
    return (isOdd);
}
To accelerate computation, an SPI peripheral can be used with a CLC to build a hardware parity calculator. The SPI hardware contains a serial shift register to transmit and receive data. The output of the SPI hardware (the shift register) can be fed into a CLC to create a parity calculator that operates at a substantially faster rate than the software version.
Figure 5 – Hardware Implementation
This implementation can also reuse an SPI peripheral that is already in the design, further lowering the implementation cost. This works because other SPI devices only respond while their Chip Select (CS) line is asserted; with CS deasserted, they ignore the transfer.
To run the hardware-accelerated version, first capture the value currently latched in the CLC. Next, transmit the data whose parity is needed with CS deasserted. Finally, read the new value latched in the CLC. If the new value equals the previous value, an even number of ones was counted. If the values differ, the count was odd. The software to check for odd parity is shown below.
bool isOdd_HW(void)
{
bool initialState = CLC3_OutputStatusGet();
SPI1_BufferWrite(&data[0], DATA_SCAN_LENGTH);
return (!(CLC3_OutputStatusGet() == initialState));
}
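The before-and-after comparison works because the latch only needs to toggle once per transmitted one bit. The sketch below models that idea in plain C. It assumes the CLC behaves like a toggle flip-flop clocked by the SPI clock with the serial data as its toggle input; the exact CLC configuration is not given here, so treat this as a hypothetical model with made-up function names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model one byte leaving the SPI shift register, MSB first.
 * The latch toggles on every bit that is a one. */
bool parity_latch_shift_byte(bool latch, uint8_t byte)
{
    for (uint8_t bit = 0; bit < 8; bit++)
    {
        if (byte & 0x80u)
        {
            latch = !latch;
        }
        byte = (uint8_t)(byte << 1);
    }
    return latch;
}

/* Mirror of the hardware sequence: remember the latch state, shift the
 * buffer through, then compare. The latch is never reset, which is why
 * the initial state has to be captured first. */
bool is_odd_model(bool *latch, const uint8_t *buf, size_t len)
{
    bool initial = *latch;

    for (size_t i = 0; i < len; i++)
    {
        *latch = parity_latch_shift_byte(*latch, buf[i]);
    }
    return *latch != initial;
}
```

Because only the change in state matters, the result is the same whether the latch starts high or low.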
To demonstrate the speed advantage, a small demo program was built on Microchip’s PIC16F18146 family of microcontrollers. One of the internal timers, Timer 1, was used to count the number of clock cycles (FOSC/4) it takes to execute the hardware and software parity calculations. The results are printed to a serial terminal for analysis. A copy of the performance results at different compiler optimization levels is included below.
| Optimization Level | Software Time | Hardware Time | Performance Difference (%) |
| --- | --- | --- | --- |
| Level 0 (None) | 1726 | 552 | +313% |
| Level 1 | 1371 | 533 | +257% |
| Level 2 | 1371 | 533 | +257% |
| Level 3 (Speed) | 1019 | 465 | +219% |
| Level S (Size) | 1019 | 465 | +219% |
Table 1 – Performance Differences between Software and Hardware Implementations, 10-byte sequence
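The percentages in Table 1 appear to match the software count divided by the hardware count, rounded to the nearest percent. On that reading, the Level 0 software routine takes roughly 3.13 times as long as the hardware version. The helper below is a hypothetical sketch of that arithmetic, not code from the demo.

```c
#include <stdint.h>

/* Software time as a percentage of hardware time, rounded to the
 * nearest whole percent. hw_counts must be nonzero. */
uint32_t sw_to_hw_percent(uint32_t sw_counts, uint32_t hw_counts)
{
    return (sw_counts * 100u + hw_counts / 2u) / hw_counts;
}
```

Feeding in the table's count pairs (1726 and 552, 1371 and 533, 1019 and 465) gives 313, 257, and 219.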
The performance improvement shown in this example depends on the clock speeds of the microcontroller and of the SPI peripheral. The source code of this program is available on GitHub.
Concluding Thoughts
Hardware peripherals are an important feature in embedded systems. Using them, microcontrollers can become more powerful, efficient, and capable. Logic and flexibility peripherals are especially powerful tools for handling simple tasks like debouncing or quadrature decoding. Creative use of the hardware peripherals enhances designs and pushes the boundaries of what is possible on a microcontroller.