The recent effort by bushing’s team to develop an open-source USB protocol analyzer reminded me of a quick hack I did previously. I was debugging a tricky USB problem but only had an oscilloscope.
If you’ve been following this blog, you know that one of my hobby projects has been designing a USB interface for old Commodore floppy drives. The goal is to archive old data, including the copy-protection bits, before the media fails. Back in January 2009, I was debugging the first prototype board. Most of the commands succeeded, but one failed immediately every time I sent it.
I tried a software USB analyzer, but it didn’t show any more information: the command returned almost immediately with no data. Debugging output on the device’s UART didn’t show anything abnormal, except that the device never received the problem command. So the problem had to be somewhere between the host’s and target’s USB stacks, possibly in the AVR’s hardware USB state machine. Only a bus analyzer could reveal what was going on.
Like other hobby developers, I couldn’t justify the cost of a dedicated USB analyzer just to troubleshoot this one problem, especially in a design I would be releasing for free. Since I did have an oscilloscope at work, I decided to build a USB decoding stack on top of it.
USB, like Ethernet and TCP/IP, is a combination of protocols. The lowest layer is the physical cabling and bit signalling. On top of this is packet framing and device addressing. Next, each device has a set of endpoints. These are analogous to TCP/UDP ports and support control, bulk, interrupt, or isochronous transfers. The standard control endpoint (endpoint 0) handles a set of common configuration messages. Other endpoints are device-specific.
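To make the framing and addressing layers concrete, here is a minimal Python sketch that parses a full/low-speed token packet (the three bytes that follow the sync field). The field layout comes from the USB 2.0 specification: an 8-bit PID whose high nibble is the complement of the low nibble, then a 7-bit device address, a 4-bit endpoint number, and a 5-bit CRC, all sent least-significant bit first. The function names are my own and hypothetical, the PID table covers only the common PIDs, and SOF tokens reuse the same 11 bits as a frame number, which this sketch doesn't special-case.

```python
# Hypothetical helpers for decoding a USB token packet (sync already stripped).
PID_NAMES = {
    0x1: "OUT", 0x9: "IN", 0x5: "SOF", 0xD: "SETUP",   # tokens
    0x3: "DATA0", 0xB: "DATA1",                        # data
    0x2: "ACK", 0xA: "NAK", 0xE: "STALL",              # handshakes
}


def parse_pid(byte: int) -> str:
    """Check the PID's complement nibble and return the PID's name."""
    pid, check = byte & 0x0F, byte >> 4
    if check != (~pid & 0x0F):
        raise ValueError(f"corrupt PID byte {byte:#04x}")
    return PID_NAMES.get(pid, f"PID {pid:#x}")


def crc5_usb(value: int, nbits: int = 11) -> int:
    """USB CRC5 (x^5 + x^2 + 1) over `nbits` bits fed LSB first.

    Computed in reflected form, so the result compares directly with the
    CRC field as it is read back LSB first from the packet.
    """
    crc = 0x1F
    for i in range(nbits):
        if (crc ^ (value >> i)) & 1:
            crc = (crc >> 1) ^ 0x14   # 0x14 is polynomial 0x05 bit-reversed
        else:
            crc >>= 1
    return crc ^ 0x1F


def parse_token(packet: bytes):
    """Return (pid_name, address, endpoint) for a 3-byte token packet."""
    if len(packet) != 3:
        raise ValueError("token packets are a PID byte plus 2 bytes")
    name = parse_pid(packet[0])
    word = packet[1] | (packet[2] << 8)       # bytes arrive LSB first
    addr, endp, crc = word & 0x7F, (word >> 7) & 0x0F, word >> 11
    if crc != crc5_usb(word & 0x7FF):
        raise ValueError("CRC5 mismatch")
    return name, addr, endp
```

For example, the bytes `2D 00 10` decode as a SETUP token addressed to device 0, endpoint 0, which is what a host sends while enumerating a newly attached device.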
High-speed signalling (480 Mbit/s) is a bit different from full speed (12 Mbit/s) and low speed (1.5 Mbit/s), so I won’t describe it here. Suffice it to say, you can put a USB 1.1 hub between your device and host to force the device to drop down to full or low speed. Unless you’re trying to debug a problem with high-speed signalling itself, this is sufficient to debug protocol-level issues.
The USB physical layer signals bits differentially on the D+ and D- lines. This balances the charge, decreasing the latency for line transitions and increasing noise rejection. I hooked up probes to the D+ and D- lines and saw a trace like this:
Each zero bit in USB is signalled by a transition, low to high or high to low. A one bit is signalled by no transition for the clock period. (This is called NRZI encoding.) Obviously, the sender and receiver clocks could drift out of sync if there were too many one bits in a row, so a zero bit is stuffed into the frame after every six consecutive one bits. The receiver discards it. An end-of-packet (EOP) is signalled by a single-ended zero (SE0): both lines held low for about two bit times. You can see this at the beginning of the trace above.
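To make these rules concrete, here is a minimal Python sketch of what a transmitter does: stuff a zero after six consecutive ones, then NRZI-encode so that a zero toggles the line and a one holds it. I'm modelling line levels as 1 = J (the idle state, equivalently D+ high at full speed) and 0 = K; that abstraction and the function names are my own, not from any particular stack.

```python
from typing import List


def stuff_bits(bits: List[int]) -> List[int]:
    """Insert a 0 after every run of six consecutive 1s (USB bit stuffing)."""
    out: List[int] = []
    ones = 0
    for bit in bits:
        out.append(bit)
        ones = ones + 1 if bit else 0
        if ones == 6:
            out.append(0)   # stuffed bit: guarantees a transition
            ones = 0
    return out


def nrzi_encode(bits: List[int], idle_level: int = 1) -> List[int]:
    """Return one line level per bit: a 0 toggles the level, a 1 holds it."""
    level = idle_level
    out: List[int] = []
    for bit in bits:
        if bit == 0:
            level ^= 1
        out.append(level)
    return out
```

Seven ones in a row would otherwise hold the line flat for seven bit times; after stuffing, `nrzi_encode(stuff_bits([1] * 7))` produces a transition after the sixth bit that the receiver can resynchronize on.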
To start each packet, USB sends a sync byte of 0x80, least-significant bit first. This is seven zero bits (seven transitions) followed by a one bit (no transition), allowing the receiver to synchronize its clock. You can see this in the trace above, just after the end-of-packet from the previous frame. After the sync bits, the rest of the frame is byte-oriented.
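The sync pattern falls straight out of the NRZI and LSB-first rules: send 0x80 least-significant bit first and you get seven zeros followed by a one. This small sketch (helper names are hypothetical) shows the resulting J/K line states starting from the idle J state, plus the inverse step of reassembling a byte from bits received LSB first.

```python
from typing import List

SYNC_BYTE = 0x80


def lsb_first_bits(value: int) -> List[int]:
    """The 8 bits of a byte in wire order (least-significant bit first)."""
    return [(value >> i) & 1 for i in range(8)]


def line_states(bits: List[int], start: str = "J") -> str:
    """NRZI-encode bits as J/K line states, starting from `start`."""
    state = start
    out: List[str] = []
    for bit in bits:
        if bit == 0:   # a zero is a transition
            state = "K" if state == "J" else "J"
        out.append(state)
    return "".join(out)


def bits_to_byte(bits: List[int]) -> int:
    """Reassemble a byte from 8 bits received LSB first."""
    return sum(bit << i for i, bit in enumerate(bits))
```

`line_states(lsb_first_bits(SYNC_BYTE))` gives `"KJKJKJKK"`: the alternating states give the receiver edges to lock onto, and the final repeated K marks where byte-aligned data begins.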
The host initiates every transaction. In a control transfer, it sends the command (a SETUP packet carrying an 8-byte request), runs an optional data phase (in or out from the device), and ends with a status phase. If the request fails, the device responds with a STALL handshake, a one-byte packet consisting only of a PID.
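The command in a control transfer is an 8-byte request carried in the DATA0 packet that follows the SETUP token. Here's a minimal Python sketch of decoding it. The field names (bmRequestType, bRequest, wValue, wIndex, wLength) and bit layout follow chapter 9 of the USB 2.0 specification; `parse_setup` and `SetupRequest` are hypothetical names of my own.

```python
import struct
from typing import NamedTuple

REQUEST_TYPES = ("standard", "class", "vendor", "reserved")
RECIPIENTS = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}


class SetupRequest(NamedTuple):
    device_to_host: bool   # bmRequestType bit 7
    kind: str              # bmRequestType bits 6..5
    recipient: str         # bmRequestType bits 4..0
    request: int           # bRequest
    value: int             # wValue
    index: int             # wIndex
    length: int            # wLength: max bytes in the optional data phase


def parse_setup(payload: bytes) -> SetupRequest:
    """Decode the 8-byte payload that follows a SETUP token."""
    if len(payload) != 8:
        raise ValueError("SETUP payload must be exactly 8 bytes")
    # Multi-byte USB fields are little-endian.
    bm, req, value, index, length = struct.unpack("<BBHHH", payload)
    return SetupRequest(
        device_to_host=bool(bm & 0x80),
        kind=REQUEST_TYPES[(bm >> 5) & 0x03],
        recipient=RECIPIENTS.get(bm & 0x1F, "reserved"),
        request=req, value=value, index=index, length=length,
    )
```

For example, `80 06 00 01 00 00 40 00` decodes as a standard, device-to-host request 6 (GET_DESCRIPTOR) with wValue 0x0100 (device descriptor) and wLength 64, so the host expects an IN data phase of up to 64 bytes before the status phase.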
My decoding script implemented all the layers in the quickest way possible. After taking a scope trace, I’d dump the samples to a file. The script would then scan through them, looking for the first edge. If this edge was part of a sync byte, it would begin byte-aligned decoding of a frame and pass the bytes up to higher-level functions. At the end of the packet, it would go back to scanning for the next edge. Python’s generators made this quite easy, since the decoder was just a series of nested loops instead of a complicated state machine.
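Here's a minimal sketch of that generator-pipeline structure in Python. It is not my original script; it assumes the capture has already been thresholded to 0/1 logic levels on a single line, that the sample rate is an integer multiple (`spb`, samples per bit) of the bit rate, and that the clocks don't drift noticeably over one packet. All names are hypothetical. Each stage is a generator, so the caller pulls only as many bytes as it expects.

```python
from itertools import islice
from typing import Iterator, Optional, Sequence


def first_edge(samples: Sequence[int], start: int = 0) -> Optional[int]:
    """Index of the first sample that differs from the one before it."""
    for i in range(max(start, 1), len(samples)):
        if samples[i] != samples[i - 1]:
            return i
    return None


def bit_cells(samples: Sequence[int], edge: int, spb: int) -> Iterator[int]:
    """Yield the line level sampled mid-cell for each bit after `edge`."""
    pos = edge + spb // 2
    while pos < len(samples):
        yield samples[pos]
        pos += spb


def nrzi_decode(levels: Iterator[int], prev: int) -> Iterator[int]:
    """No transition decodes as 1, a transition as 0."""
    for level in levels:
        yield 1 if level == prev else 0
        prev = level


def unstuff(bits: Iterator[int]) -> Iterator[int]:
    """Drop the 0 the sender inserted after six consecutive 1s."""
    ones = 0
    for bit in bits:
        if ones == 6:
            if bit:
                raise ValueError("bit-stuffing violation")
            ones = 0
            continue
        yield bit
        ones = ones + 1 if bit else 0


def to_bytes(bits: Iterator[int]) -> Iterator[int]:
    """Group bits into bytes, least-significant bit first."""
    while True:
        chunk = list(islice(bits, 8))
        if len(chunk) < 8:
            return
        yield sum(bit << i for i, bit in enumerate(chunk))


def decode_frame(samples: Sequence[int], spb: int,
                 start: int = 0) -> Optional[Iterator[int]]:
    """Find the next edge; if it begins a sync byte, return a byte iterator."""
    edge = first_edge(samples, start)
    if edge is None:
        return None
    idle = samples[edge - 1]                   # line level before the edge
    frame = to_bytes(unstuff(nrzi_decode(bit_cells(samples, edge, spb), idle)))
    try:
        sync = next(frame, None)
    except ValueError:                         # noise that breaks stuffing
        return None
    return frame if sync == 0x80 else None     # 0x80 is the sync byte
```

Because `decode_frame` returns a lazy iterator, an upper layer can read the PID byte first and then decide how many more bytes to pull, for example `list(islice(frame, 2))` for the rest of a token packet.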
Since this was a quick hack, I cut corners. To detect the SE0 end-of-packet properly, you need to monitor both D+ and D-, since on one line alone an SE0 looks like an ordinary low level. At higher speeds, the peaks are smaller because less current is exchanged. At lower speeds, however, you can skip this and put a single scope probe on the D- line. Instead of properly decoding the SE0, I decoded each frame until no more data was expected and then yielded a fake EOP symbol to the upper layers.
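The proper way to find the end of a packet is to classify each pair of D+/D- samples into a bus state. Below is a minimal sketch, assuming both channels are already thresholded to 0/1 and sampled at an integer `spb` samples per bit. It uses the full-speed convention (J is D+ high and D- low, K is the reverse; low speed swaps J and K) and treats an SE0 lasting at least one bit time, followed by J, as an EOP. The one-bit-time cutoff is my own assumption, meant to ignore brief single-ended glitches while the two lines cross; the transmitted EOP itself is about two bit times of SE0 followed by J. Function names are hypothetical.

```python
from typing import List, Sequence


def line_state(dp: int, dm: int, full_speed: bool = True) -> str:
    """Classify one D+/D- sample pair (already thresholded to 0/1)."""
    if dp and dm:
        return "SE1"                    # not a legal bus state
    if not dp and not dm:
        return "SE0"
    j_line = dp if full_speed else dm   # low speed swaps J and K
    return "J" if j_line else "K"


def find_eops(dp: Sequence[int], dm: Sequence[int], spb: int,
              full_speed: bool = True) -> List[int]:
    """Sample indexes where an SE0 of at least `spb` samples gives way to J."""
    eops: List[int] = []
    se0_run = 0
    for i, (p, m) in enumerate(zip(dp, dm)):
        state = line_state(p, m, full_speed)
        if state == "SE0":
            se0_run += 1
            continue
        if state == "J" and se0_run >= spb:
            eops.append(i)
        se0_run = 0
    return eops
```

With only a D- probe at full speed, the SE0 and the idle J that follows it both read as D- low, which is why a single-channel decoder has to infer the end of a packet from its expected length instead.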
After a few days of debugging, I found the problem. The LUFA USB stack I was using in my firmware had a bug. It had a filter for standard control requests (such as endpoint configuration) that it handled for you, while class-specific requests were passed up to a handler in my firmware. The bug was that the filter was too permissive: all control transfers of type 6 (bRequest = 6, which is GET_DESCRIPTOR among the standard requests) were captured by LUFA, even if they were class-specific. LUFA then returned an error without ever passing the message to my firmware. (By the way, the LUFA stack is excellent, and this bug has long since been fixed.)
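The class of bug is easy to show in a few lines. This is a hypothetical Python sketch, not LUFA’s actual code, and it assumes "type 6" refers to the bRequest code. A stack-side filter that matches on bRequest alone also swallows class requests that happen to reuse code 6; checking the type bits (bits 6..5 of bmRequestType) first fixes it.

```python
# Hypothetical illustration of the filter bug; not LUFA's actual source.
REQ_TYPE_MASK = 0x60       # bmRequestType bits 6..5
REQ_TYPE_STANDARD = 0x00
GET_DESCRIPTOR = 6         # standard request code


def stack_claims_buggy(bm_request_type: int, b_request: int) -> bool:
    """Too permissive: ignores bm_request_type and matches the code alone."""
    return b_request == GET_DESCRIPTOR


def stack_claims_fixed(bm_request_type: int, b_request: int) -> bool:
    """Only claims *standard* GET_DESCRIPTOR requests."""
    is_standard = (bm_request_type & REQ_TYPE_MASK) == REQ_TYPE_STANDARD
    return is_standard and b_request == GET_DESCRIPTOR
```

A class request to an interface with bmRequestType 0xA1 and bRequest 6 is claimed by the buggy filter but passed through by the fixed one, while a standard GET_DESCRIPTOR (bmRequestType 0x80) is claimed by both.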
Back in the present, I’m glad to see the OpenVizsla project creating a cheaper USB analyzer. It should be a great product. Based on my experience, I have some questions about their approach that I hope are helpful.
It seems kind of strange that they are going for high-speed support. Since the higher-level protocol messages you might want to reverse-engineer are the same regardless of speed, it would be cheaper to just handle low/full speed and use a hub to force devices to downgrade. I guess they might be dealing with proprietary devices, such as the Kinect, that refuse to operate at lower speeds. But if that isn’t the case, their namesake, the Beagle 12, is a great product for only $400.
I have used the Total Phase Beagle USB analyzers, and they’re really nice. As with most products these days, the software makes the difference. They support Windows, Mac, and Linux and have a useful API. They can output data in CSV or binary formats. They will be supporting USB 3.0 (5 Gbps) soon.
I am glad OpenVizsla will be driving down the price of USB analyzers and providing an option for hobbyists. At the same time, I am a bit concerned that it will draw business away from a company that provides open APIs and well-supported software. Hopefully, Total Phase’s move upstream to USB 3.0 will keep them competitive for people doing commercial development, and OpenVizsla will fill an underserved niche.
