This isn’t the next One New Thing a Quarter I thought I’d be writing, but I’ve learned some things on this topic that I want to record, as much for myself as for anyone.
This started with an offhand remark on the Polymorphic Podcast that there was an accelerated feed available. Listening to it, I realized that a lot of technical podcasts could be easily digested at, say, 120-130% of normal speed. For some, it’s a lot more tolerable for me to listen to them this way.
This led to the Dream Application. It would run on a web server and refactor a requested RSS feed to point back to itself. When a feed reader went to download an mp3, the application would download the mp3 from the actual source, speed it up and stream it down to the reader. After about 2 weeks of light work on that application, someone pointed me to PodShifter, which is exactly what I was building. My only problem with their solution, and I do not think it is their implementation, is that it is slow. The feed reader requests the mp3, it gets 404ed and then, maybe 15 minutes later, the mp3 is actually available to download.
So I’ve refactored my approach… what I want now is a service that watches the podcast directories I want to accelerate. When it sees a new mp3 stored there, it should pick it up, accelerate it and store it back in the same filename so that when I plug my Zune in, the Zune software will pick up the accelerated version.
Bricks in the Wall
There are really 3 bits to this solution and, luckily, there are libraries available for each of them. I don’t have to learn all about audio formats to do this! (Or DO I?)
- Decoding an existing mp3. Libmpg123 takes care of this. It’s a fairly straightforward C-style interface that lends itself to PInvoke fairly well.
- Accelerating the data (without pitch shifting it). SoundTouch seems to be the default solution to this. Unfortunately, it is implemented as a C++ class rather than a C library. There’s no real way to PInvoke to a C++ class… unless you want to do all the work of digging the mangled function names out of the DLL and passing the “this” pointer around. Not Fun.
- Encoding the decoded data. LAME is the standard “free” solution to this problem, and there is a liblame implementation that is also fairly PInvoke friendly.
Since SoundTouch threw a wrench into my initial plan of doing C# and PInvoke, I decided to reboot: write a managed C++ wrapper class for each of the libraries and use C# to wire the wrappers together. Each wrapper’s interface would basically take an input stream and an output stream.
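The "input stream in, output stream out" shape can be sketched in plain C++ as follows. This is not the managed wrapper code from the project: Stage, readAll and runPipeline are hypothetical names, and each stage here is just a function standing in for a decoder, tempo changer or encoder wrapper.

```cpp
#include <functional>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stage signature: consume everything from `in`, write results to `out`.
using Stage = std::function<void(std::istream& in, std::ostream& out)>;

// Reads the remaining bytes of a stream into a string (used as a byte buffer).
std::string readAll(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Runs the stages in order, handing each stage's output to the next one
// through an in-memory buffer, then writes the final result to `sink`.
void runPipeline(const std::vector<Stage>& stages,
                 std::istream& source, std::ostream& sink) {
    std::string data = readAll(source);
    for (const auto& stage : stages) {
        std::istringstream in(data);
        std::ostringstream out;
        stage(in, out);
        data = out.str();
    }
    sink.write(data.data(), static_cast<std::streamsize>(data.size()));
}
```

Buffering a whole file between stages keeps the sketch short; a real pipeline would more likely pass fixed-size chunks so that a long podcast never has to sit in memory all at once.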
Managing the unmanageable
The only real HARD part of all this was re-learning C++ and figuring out how C++ works in a managed environment. There are 2 things I learned that were really key:
- To work with framework objects, you wind up using ^ in the declaration… like this:
array<System::Byte> ^ inputBufferLeftManaged ; //A handle (^) to a managed byte array
This tells C++ that this is a managed, garbage collected class and you get all the benefits of doing that, just like you would in C#. If you need to create a new managed class, you use gcnew rather than new. At that point, your work is done… no need to call delete on such an item.
- Creating unmanaged pointers is more interesting. Obviously, the C and C++ libraries I am calling don’t know anything about a System.Byte pointer. More interestingly, managed objects can be moved around in memory at the whim of the runtime… which doesn’t work out so well for C libraries that just want some memory they can dump stuff into. The solution is to pin the object in memory with pin_ptr. For example, to create an array of bytes that I can get to with managed and unmanaged code, I do the following:
array<System::Byte> ^ inputBufferLeftManaged ; //Managed byte buffer
inputBufferLeftManaged = gcnew array<System::Byte>(327680) ; //Allocate the byte buffer
cli::pin_ptr<unsigned char> inputBufferLeft = &inputBufferLeftManaged[0] ; //Pin the array and get an unmanaged pointer to its first element. The array can’t move while this pin_ptr is in scope
You can then cast the pinned pointer to other kinds of pointers, and pass it into the relevant functions.
NOTE: All of this is recently learned, so I suspect I have a memory leak in here. I’ll update this if I learn this is the case and how to deal with it.
Whatever floats your boat
After getting things wired up, I was getting nothing but garbage out of the sound modification library. If I just wired the decoder and encoder together, I got a reasonable mp3, but running it through the tempo adjustment meant that I got nothing but static. Some investigation later revealed that recompiling that library to deal with float samples rather than the 16-bit unsigned samples I had been using resolved this issue… so I changed the other libraries to work with float samples instead so I have a consistent data stream throughout the process.
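To make the relationship between the two sample formats concrete, here is a minimal conversion sketch. It assumes signed 16-bit PCM (the usual layout) mapped onto floats in the range -1.0 to just under 1.0; the function name toFloatSamples is hypothetical. The post changed the libraries themselves to produce floats rather than converting by hand, so this only illustrates what "float samples" means.

```cpp
#include <cstdint>
#include <vector>

// Converts signed 16-bit PCM samples to floats in [-1.0, 1.0).
// Assumption: the input is signed 16-bit; dividing by 32768 maps the
// most negative sample to exactly -1.0.
std::vector<float> toFloatSamples(const std::vector<std::int16_t>& pcm) {
    std::vector<float> out;
    out.reserve(pcm.size());
    for (std::int16_t s : pcm) {
        out.push_back(static_cast<float>(s) / 32768.0f);
    }
    return out;
}
```

Whatever convention is picked, every stage has to agree on it; a stage that expects one sample format and receives another will happily process the bytes and produce noise.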
Don’t Cross the Streams
A side effect of the change to floats is that the LAME encoding library doesn’t support a single interleaved buffer when dealing with floats… instead I have to pass in a buffer for the left channel and a buffer for the right channel. I wound up writing a loop after the audio modification step to pull data off the stream and drop it onto a left or right stream, de-interleaving the data myself. I did this in the C# code for simplicity… less chance of overrunning a buffer off into infinity.
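The de-interleaving loop described above looks roughly like this. The original was written in C#; this is the same idea sketched in C++, assuming stereo data laid out as L, R, L, R… The names StereoChannels and deinterleave are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Holds the two separate channel buffers a split-channel encoder call expects.
struct StereoChannels {
    std::vector<float> left;
    std::vector<float> right;
};

// Splits interleaved stereo samples (L, R, L, R, ...) into two buffers.
StereoChannels deinterleave(const std::vector<float>& interleaved) {
    assert(interleaved.size() % 2 == 0);  // stereo data must come in L/R pairs
    StereoChannels ch;
    ch.left.reserve(interleaved.size() / 2);
    ch.right.reserve(interleaved.size() / 2);
    for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
        ch.left.push_back(interleaved[i]);       // even positions: left channel
        ch.right.push_back(interleaved[i + 1]);  // odd positions: right channel
    }
    return ch;
}
```

Using growable vectors instead of raw arrays sidesteps the buffer-overrun worry in the same way the managed C# version does.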
Wall of Silence
At the moment, I’ve hit something of a wall with this project. Everything seems to work OK, but ever since switching to float audio data, I’ve had nothing but silence from the encoded mp3. It appears there’s real data going INTO it, but nothing but silence coming out. After several hours of debugging, I’ve set it to the side for the time being. Hopefully I’ll be back to this before too long, but there’s so much out there to do I’ll probably just live with the quirks of PodShifter (which IS really good).
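One cheap check for a "real data in, silence out" symptom is to measure the peak amplitude of the buffers just before they reach the encoder. The scaling convention may also be worth a look: LAME’s header is understood to document lame_encode_buffer_float as taking floats scaled to the 16-bit range (roughly ±32768) rather than ±1.0, in which case samples normalized to ±1.0 would encode as near silence. Treat that as an assumption to verify against lame.h, not a confirmed diagnosis. The helper names below (peakAmplitude, scaled) are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Returns the largest absolute sample value in the buffer (0 for an empty buffer).
float peakAmplitude(const std::vector<float>& samples) {
    float peak = 0.0f;
    for (float s : samples) {
        peak = std::max(peak, std::fabs(s));
    }
    return peak;
}

// Returns a copy of the buffer with every sample multiplied by `factor`,
// e.g. 32768.0f to move from a +/-1.0 range to a 16-bit-style range.
std::vector<float> scaled(const std::vector<float>& samples, float factor) {
    std::vector<float> out;
    out.reserve(samples.size());
    for (float s : samples) {
        out.push_back(s * factor);
    }
    return out;
}
```

If the peak going into the encoder is around 1.0 or below and the encoder call expects the 16-bit range, rescaling with scaled(buffer, 32768.0f) before encoding would be the experiment to try.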