I needed some debug and tracing information on Zephyr RTOS. It was a bit of a struggle, but I did retrieve some useful information, so here is a compilation of my findings about tracing.
Disclaimer: writing this on Zephyr V3.6, if you are me from the future, check your software version first.
What does it do
Traces only export data from the kernel itself, namely threads, semaphores, and interrupts. It won’t trace functions, so it will mostly be useful for timing debugging, typically getting a view on how much time is spent in idle thread versus work threads.
There is a very useful blog post explaining how to add custom traces to functions, but it is quite heavy work:
https://jonathan.rico.live/blog/2023-11-13/
How to set it up
In my case, I have an ST-Link debugger. There are some more advanced trace options with Segger chips, but I will have to use the most basic features:
- CTF format is Common Trace Files format, a quite crude but efficient way to pack up trace data in a binary file.
- Backend RAM means that the trace will not be sent through UART, or USB, but will be recorded into a RAM buffer. The total size is set by CONFIG_RAM_TRACING_BUFFER_SIZE. I chose the max value I could, 65536. And since we are writing directly to RAM, it should be fast enough to avoid disrupting the program execution.
Tracing is activated in prj.conf:
CONFIG_TRACING=y
CONFIG_TRACING_CTF=y
CONFIG_TRACING_BACKEND_RAM=y
CONFIG_TRACING_BUFFER_SIZE=2048
CONFIG_RAM_TRACING_BUFFER_SIZE=65536
CONFIG_TRACING_SYNC=y
You can also toggle trace types on or off:
CONFIG_TRACING_ISR=n
CONFIG_TRACING_SEMAPHORE=y
After building, you can check the ram buffer was indeed created in build/zephyr/zephyr.map:
.bss.ram_tracing
0x0000000020001964 0x1000 zephyr/libzephyr.a
0x0000000020001964 ram_tracing
How to retrieve your RAM buffer
Well, you will need GDB, and dump the trace with:
(gdb) dump binary memory data/mytrace.ctf ram_tracing ram_tracing+65536
But wait, is this buffer circular, how do I know where I am? The answer is: this buffer just records everything since the start of program and stops when the buffer is full. If you need startup data, it’s perfect… but if you don’t, keep reading.
How to choose what to trace
The “buffer full” event is handled in tracing_backend_ram.c:
zephyr/subsys/tracing/tracing_backend_ram.c
At this point, the pos cursor reach its max value and we set buffer full to true.
We can use GDB to force those variables back to pristine state:
(gdb) set variable pos=0
(gdb) set variable buffer_full=0
For instance, to get an infinite buffer retrieval, it is possible to break when buffer is full, download, and reset variables. You need to have a folder called “data” in the path from where you are calling gdb.
(gdb) break tracing_backend_ram.c:27
(gdb) continue
(gdb) dump binary memory data/mytrace.ctf ram_tracing ram_tracing+65536
(gdb) set variable pos=0
(gdb) set variable buffer_full=0
(gdb) continue
Or, it is also possible to break before something interesting happens, reset variables, and break again after the something.
And, since I am ashamed of nothing, I wrote a script for that:
https://github.com/everedero/misc_scripts/blob/main/script.gdb
(gdb) source script.gdb
For complete download:
(gdb) dltrace
For tracing between two points:
(gdb) tracebetween main.c:331 main.c:333
Update: In upcoming versions (> 3.6.0) it will be possible to reset the RAM content by calling init().
How to convert traces
CTF to text with babeltrace2
You got yourself a bunch of bytes, awesome. But how to convert it to something useful?
First, fun fact, CTF format is not complete without a “metadata” file which describes the CTF packets formats. In order to decypher the CTF file, it is necessary to put it in the same folder as its metadata file:
data
├── mytrace.ctf
├── metadata
Zephyr provides its metadata file:
cp ~/zephyrproject/zephyr/subsys/tracing/ctf/tsdl/metadata ./data
Now, a parser is required. I chose babeltrace2 to simply convert ctf files to text.
To install babeltrace2:
sudo apt-add-repository ppa:lttng/ppa
sudo apt-get update
sudo apt-get install babeltrace2
To convert the CTF to text:
babeltrace2 convert --component=ctf:source.ctf.fs --params='inputs=["data"]' --component=muxer:filter.utils.muxer --component=pretty:sink.text.pretty > trace.txt
This takes the data folder as input, and converts it from “ctf.fs” to “text.pretty”. Syntax looks like GStreamer CLI, and I have always been scared of GStreamer CLI.
Since we downloaded the full buffer with potential 0-padding or remaining data from previous iteration, babeltrace2 will complain that it cannot parse the end of the file.
02-26 13:36:46.555 24379 24379 E PLUGIN/CTF/MSG-ITER set_current_event_class@msg-iter.c:1276 [ctf] No event class with ID of event class ID to use in stream class: msg-it-addr=0x55bd46d4e850, stream-class-id=0, event-class-id=0, trace-cla
ss-addr=0x55bd46cd1af0
02-26 13:36:46.555 24379 24379 E PLUGIN/CTF/MSG-ITER ctf_msg_iter_get_next_message@msg-iter.c:2897 [ctf] Cannot handle state: msg-it-addr=0x55bd46d4e850, state=AFTER_EVENT_HEADER
02-26 13:36:46.555 24379 24379 E PLUGIN/SRC.CTF.FS ctf_fs_iterator_next_one@fs.c:108 [ctf] Failed to get next message from CTF message iterator.
This is what happens with 0-paddings: stream-class-id=0 and event-class-id=0 suggests that it could only find zeros instead of valid classes defined by the metadata file.
But it works all right, it’s just like me on roller skates: I don’t know how to brake, so I crash into lamp posts instead.
Result
[01:00:02.724926184] (+0.000004238) semaphore_give_enter: { id = 536871028 }
[01:00:02.724930250] (+0.000004066) semaphore_give_exit: { id = 536871028 }
[01:00:02.724935964] (+0.000005714) semaphore_take_enter: { id = 536871028, timeout = 4294967196 }
[01:00:02.724940398] (+0.000004434) semaphore_take_exit: { id = 536871028, timeout = 4294967196, ret = 0 }
[01:00:02.724948363] (+0.000007965) semaphore_take_enter: { id = 536871048, timeout = 200000 }
[01:00:02.724952809] (+0.000004446) semaphore_take_blocking: { id = 536871048, timeout = 200000 }
[01:00:02.724963767] (+0.000010958) isr_enter:
[01:00:02.724971285] (+0.000007518) isr_exit:
[01:00:02.724975517] (+0.000004232) isr_enter:
[01:00:02.724982285] (+0.000006768) isr_exit:
[01:00:02.724986523] (+0.000004238) isr_enter:
[01:00:02.724993285] (+0.000006762) isr_exit:
[01:00:02.724997523] (+0.000004238) isr_enter:
[01:00:02.725006386] (+0.000008863) semaphore_give_enter: { id = 536871048 }
[01:00:02.725014005] (+0.000007619) semaphore_give_exit: { id = 536871048 }
[01:00:02.725018375] (+0.000004370) isr_exit:
[01:00:02.725021857] (+0.000003482) isr_enter:
[01:00:02.725028363] (+0.000006506) semaphore_give_enter: { id = 536871048 }
[01:00:02.725033166] (+0.000004803) semaphore_give_exit: { id = 536871048 }
[01:00:02.725036809] (+0.000003643) isr_exit:
[01:00:02.725042505] (+0.000005696) thread_switched_out: { thread_id = 536871360, name = "main" }
[01:00:02.725052303] (+0.000009798) thread_switched_in: { thread_id = 536871360, name = "main" }
[01:00:02.725057720] (+0.000005417) semaphore_take_exit: { id = 536871048, timeout = 200000, ret = 0 }
[01:00:02.725061952] (+0.000004232) semaphore_give_enter: { id = 536871028 }
[01:00:02.725066005] (+0.000004053) semaphore_give_exit: { id = 536871028 }
[01:00:02.725072910] (+0.000006905) semaphore_take_enter: { id = 536871028, timeout = 4294967196 }
[01:00:02.725077351] (+0.000004441) semaphore_take_exit: { id = 536871028, timeout = 4294967196, ret = 0 }
“id” gives you a RAM address, you can use the GDB session to find what lives here:
(gdb) x /1x 536871012
0x20000064 <spi_stm32_dev_data_0+16>: 0x20000064
(gdb) print spi_stm32_dev_data_0
$1 = {ctx = {config = 0x8007d40 <ili9xxx_config_0+8>, owner = 0x0, cs_gpios = 0x8008028 <__compound_literal.0>, num_cs_gpios = 1, lock = {wait_q = {waitq = {{head = 0x20000064 <spi_stm32_dev_data_0+16>, next = 0x20000064 <spi_stm32_dev_data_0+16>}, {tail = 0x20000064 <spi_stm32_dev_data_0+16>, prev = 0x20000064 <spi_stm32_dev_data_0+16>}}}, count = 1, limit = 1, _obj_track_next = 0x0}, sync = {wait_q = {waitq = {{head = 0x20000078 <spi_stm32_dev_data_0+36>, next = 0x20000078 <spi_stm32_dev_data_0+36>}, {tail = 0x20000078 <spi_stm32_dev_data_0+36>, prev = 0x20000078 <spi_stm32_dev_data_0+36>}}}, count = 0, limit = 1, _obj_track_next = 0x0}, sync_status = 0, current_tx = 0x200119b8 <z_main_stack+888>, tx_count = 0, current_rx = 0x0, rx_count = 0, tx_buf = 0x0, tx_len = 0, rx_buf = 0x0, rx_len = 0}}
Some of those addresses can also be recovered from the zephyr.map file, here with a semaphore with id=536936612=0x200100a4:
grep -rnw ./build/zephyr/zephyr.map -e "0x.*200100a4"
2706:k_sem_area 0x00000000200100a4 0x28 load address 0x0000000008004cfc
2707: 0x00000000200100a4 _k_sem_list_start = .
2710: 0x00000000200100a4 0x14 app/libapp.a(main.c.obj)
2711: 0x00000000200100a4 threadA_sem
In-tree CTF parsing script
There is a parsing script in Zephyr directory:
./scripts/tracing/parse_ctf.py
It is calling BabelTrace2 via its Python bindings. It also has the same “there’s an error, just ignore it” issue.
https://github.com/everedero/zephyr/tree/eve/trace_experiments
python3 ../scripts/tracing/parse_ctf.py -t data > trace.txt
Result
Example trace with the sample/subsys/tracing directory, using a custom RAM trace config (prj_ram.conf). It is slightly more readable, but does not show all the fields.
1970-01-01 01:00:00.005947 (+0.000000 s): thread_create: thread_a
1970-01-01 01:00:00.005954 (+0.000007 s): thread_info (Stack size: 2048)
1970-01-01 01:00:00.005977 (+0.000023 s): thread_switched_out: main
1970-01-01 01:00:00.005991 (+0.000013 s): thread_switched_in: thread_a
1970-01-01 01:00:00.006003 (+0.000012 s): thread_create: unknown
1970-01-01 01:00:00.006010 (+0.000007 s): thread_info (Stack size: 2048)
1970-01-01 01:00:00.006024 (+0.000014 s): thread_name_set
1970-01-01 01:00:00.006032 (+0.000008 s): semaphore_take_enter
1970-01-01 01:00:00.006038 (+0.000006 s): semaphore_take_exit
1970-01-01 01:00:00.009628 (+0.003590 s): thread_switched_out: thread_a
1970-01-01 01:00:00.009641 (+0.000013 s): thread_switched_in: thread_b
1970-01-01 01:00:00.009649 (+0.000008 s): semaphore_take_enter
1970-01-01 01:00:00.009656 (+0.000006 s): semaphore_take_blocking
1970-01-01 01:00:00.009668 (+0.000013 s): thread_switched_out: thread_b
1970-01-01 01:00:00.009682 (+0.000013 s): thread_switched_in: idle
1970-01-01 01:00:00.009690 (+0.000008 s): idle
1970-01-01 01:00:00.087132 (+0.077442 s): idle
1970-01-01 01:00:00.164637 (+0.077505 s): idle
[...]
CTF to Google Perfetto CTF
In order to be displayed in Perfetto, traces have to be converted to a specific JSON format.
The ctf2ctf.py converter exists in an unmerged branch here and is also described in the same blog post: https://jonathan.rico.live/blog/2023-11-13/
python3 ctf2ctf.py -t ./data/
It ends with a “Trace does not terminate cleanly” message, because of the exact same issue as before, but still outputs an out.json file.
Result
"traceEvents": [{"pid": 0, "tid": 0, "name": "thread_create", "ph": "i", "ts": 5946.912, "args": {"dummy": 0}},
{"pid": 0, "tid": 0, "name": "thread_info", "ph": "i", "ts": 5954.342, "args": {"dummy": 0}},
{"pid": 0, "tid": 536937200, "name": "running", "ph": "E", "ts": 5977.495, "args": {"dummy": 0}},
{"pid": 0, "tid": 536936656, "name": "running", "ph": "B", "ts": 5990.782, "args": {"dummy": 0}},
{"pid": 0, "tid": 0, "name": "thread_create", "ph": "i", "ts": 6003.111, "args": {"dummy": 0}},
{"pid": 0, "tid": 0, "name": "thread_info", "ph": "i", "ts": 6009.976, "args": {"dummy": 0}},
{"pid": 0, "tid": 0, "name": "thread_name_set", "ph": "i", "ts": 6024.296, "args": {"dummy": 0}},
{"pid": 0, "tid": 3, "name": "semaphore_take_enter", "ph": "i", "ts": 6032.236, "args": {"dummy": 0}},
{"pid": 0, "tid": 3, "name": "semaphore_take_exit", "ph": "i", "ts": 6038.069, "args": {"dummy": 0}},
{"pid": 0, "tid": 536936656, "name": "running", "ph": "E", "ts": 9628.083, "args": {"dummy": 0}},
{"pid": 0, "tid": 536936832, "name": "running", "ph": "B", "ts": 9641.314, "args": {"dummy": 0}},
{"pid": 0, "tid": 3, "name": "semaphore_take_enter", "ph": "i", "ts": 9649.462, "args": {"dummy": 0}},
{"pid": 0, "tid": 3, "name": "semaphore_take_blocking", "ph": "i", "ts": 9655.773, "args": {"dummy": 0}},
{"pid": 0, "tid": 536936832, "name": "running", "ph": "E", "ts": 9668.462, "args": {"dummy": 0}},
{"pid": 0, "tid": 536937024, "name": "running", "ph": "B", "ts": 9681.745, "args": {"dummy": 0}},
[...]
Trace GUI
You can also download and use a GUI to visualize the trace.
I tried two options:
- Tracecompass
- Perfetto
Tracecompass
Tracecompass is open-source (EPL V2 license), it is based on Eclipse, hence on Java, except I don’t have the right Java version, oh well.
After a bit of fiddling around, I found a suitable Java for it, and I cannot remember how.
Result
Now, it would need a plugin to display the start and end of an event and create advanced visualizations.
An example of such scripts for FreeRTOS: https://github.com/gpollo/freertos-barectf/tree/master/scripts
I dug up the Internet for Zephyr TraceCompass parser scripts, but I could not find any, so I made a draft parser for threads. Here it what it looks like:
https://github.com/everedero/misc_scripts/tree/main/tracecompass_scripts
Perfetto
Perfetto is a web-based UI from Google, and it is open-source, with Apache 2.0 license.
The output json file generated by ctf2ctf.py can be opened in https://ui.perfetto.dev:
Conclusion
GDB scripting, binary file formats, GStreamer-like CLIs, Java version issues… there are lots of things I struggled with. But I am glad I made it this far and I can now use some tracing when needed!
And if you have any updates or projects, come find me on zephyrproject Discord channel #tracing.