Audiobooks
Turn some audiobook resources into podcast RSS feeds
Type in a book title, copy the RSS link, paste it into a podcast player.
1. Context
This page is mostly autogenerated from the http://litteratureaudio.com website.
They do a fantastic job reading books and publishing mp3 … in WordPress articles. I like the way http://librivox.org publishes each book as an RSS feed, for everyone to enjoy in their favorite podcast player.
2. Create RSS from the website
RSS is a must-have for audiobooks: it lets a podcast player keep track of progress and sync between devices.
litteratureaudio.com is a sad WordPress instance with little to no automation options. The content seems to be mostly organized by hand. There is no apparent full RSS feed, nor JSON or XML-RDF of any sort. To produce an RSS feed, we first need to scrape the website.
2.1. Get the full list of books
http://www.litteratureaudio.com/notre-bibliotheque-de-livres-audio-gratuits is all we need. There are 10000+ books.
2.2. Handle the different kinds of books
2.2.1. Books aren't always laid out and structured the same.
Most books embed the MP3 list in a container with the .link-roman-mp3-file class, but if there is only one episode, the MP3 link just sits directly in the page.
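A minimal sketch of a link extractor that copes with both layouts described above. It assumes the MP3 links appear as plain href="….mp3" attributes in the page HTML, so it ignores the container entirely and simply collects every such link; the function name extract_mp3_links is hypothetical and this is not necessarily how the original script worked.

```shell
# Print every distinct .mp3 URL found in the HTML read from stdin,
# in order of first appearance.
# Assumption: links are written as href="....mp3" attributes, whether or
# not they sit inside the .link-roman-mp3-file container.
extract_mp3_links() {
    grep -o 'href="[^"]*\.mp3"' \
        | sed -e 's/^href="//' -e 's/"$//' \
        | awk '!seen[$0]++'
}
```

Typical use would be something like `curl -s "$book_url" | extract_mp3_links`, where `$book_url` is one of the book pages from the library list.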
2.2.2. New version
Of course, they had to change everything… Now the website got updated, the script breaks, and the provided download links don't work with Apple Podcasts. The good news is that there is an API, which should make things easier.
- the entry point is https://www.litteratureaudio.com/wp-json/
- the "book" entry point is https://www.litteratureaudio.com/wp-json/wp/v2/posts, items reference their "episodes", named stations. This endpoint doesn't bring in much value. I will probably stick with the HTML thing, which makes it easier to get images, author names and such.
- the "episodes" entry point is https://www.litteratureaudio.com/wp-json/wp/v2/station, they don't seem to reference their parent quite directly BUT they provide a proper link to the MP3 file, not the nonce obfuscated one.
The API seems to have search features though, so maybe I could batch requests. Nope, actually it's bad enough that it embeds stations in posts.
Anyway. https://github.com/jeromenerf/ab2rss
It took a good hour to fetch everything.
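For illustration, a sketch of how the station endpoint could be walked page by page. It assumes the site keeps the standard WordPress REST pagination (the per_page and page query parameters, and the X-WP-TotalPages response header); the function names and the stations-N.json file names are hypothetical, and this is not necessarily what ab2rss does.

```shell
API_BASE="https://www.litteratureaudio.com/wp-json/wp/v2"

# Build the URL of one page of the "station" collection.
# $1: page number (1-based), $2: items per page (WordPress caps this at 100)
station_page_url() {
    printf '%s/station?per_page=%s&page=%s\n' "$API_BASE" "${2:-100}" "$1"
}

# Read HTTP response headers on stdin, print the X-WP-TotalPages value.
total_pages_from_headers() {
    tr -d '\r' | awk -F': *' 'tolower($1) == "x-wp-totalpages" { print $2 }'
}

# Download every page into stations-N.json in the current directory.
fetch_all_stations() {
    # -D - dumps the response headers to stdout, the body is discarded.
    total=$(curl -s -D - -o /dev/null "$(station_page_url 1)" | total_pages_from_headers)
    page=1
    while [ "$page" -le "${total:-1}" ]; do
        curl -s -o "stations-$page.json" "$(station_page_url "$page")"
        page=$((page + 1))
        sleep 1   # be gentle with the server
    done
}
```

Nothing is fetched until `fetch_all_stations` is called. The one second pause is an arbitrary choice to stay polite.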
2.3. Extract more information about each book
Tags, author name, publication date, an image for the feed.
2.4. Validate the RSS against what Apple requires
Use https://castfeedvalidator.com/ as it seems good enough for our needs.
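As a reference point for what the validator looks at, here is a rough sketch of a feed skeleton: an RSS 2.0 document declaring the iTunes namespace, with one enclosure per episode carrying the MP3 URL, its size in bytes and its MIME type. The function names are hypothetical, the `fr` language code is an assumption based on the source site, and Apple documents further channel tags (category, explicit flag) that are left out here.

```shell
# Escape the characters that are special in XML text and attributes.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Print one <item>. $1: episode title, $2: MP3 URL, $3: file size in bytes
rss_item() {
    printf '    <item>\n'
    printf '      <title>%s</title>\n' "$(xml_escape "$1")"
    printf '      <guid>%s</guid>\n' "$(xml_escape "$2")"
    printf '      <enclosure url="%s" length="%s" type="audio/mpeg"/>\n' \
        "$(xml_escape "$2")" "$3"
    printf '    </item>\n'
}

# Wrap the items read from stdin into a feed.
# $1: book title, $2: book page URL, $3: cover image URL
rss_feed() {
    printf '<?xml version="1.0" encoding="UTF-8"?>\n'
    printf '<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">\n'
    printf '  <channel>\n'
    printf '    <title>%s</title>\n' "$(xml_escape "$1")"
    printf '    <link>%s</link>\n' "$(xml_escape "$2")"
    printf '    <description>%s</description>\n' "$(xml_escape "$1")"
    printf '    <language>fr</language>\n'   # assumption: French content
    printf '    <itunes:image href="%s"/>\n' "$(xml_escape "$3")"
    cat   # pass the <item> elements through unchanged
    printf '  </channel>\n'
    printf '</rss>\n'
}
```

For example, `rss_item 'Chapitre 1' "$mp3_url" "$size" | rss_feed "$title" "$page_url" "$cover_url" > book.rss` would write a one-episode feed that can then be pasted into the validator.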
3. Generate the RSS index JSON file for the search / fuzzy finder engine
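    #!/bin/sh
    # Build index.json: one {title, uri} entry per .rss file.
    here=$(pwd)
    cd ~/public_html/audiobooks || exit 1
    printf '[' > index.json
    for file in *.rss; do
        # Skip the literal pattern when no .rss file exists.
        [ -e "$file" ] || continue
        # Turn separators into spaces so the fuzzy finder matches on words.
        parts=$(echo "$file" | sed 's/[_,.-]/ /g')
        echo "{\"title\": \"$parts\", \"uri\": \"$file\"}," >> index.json
    done
    # The trailing empty object keeps the array valid despite the last comma.
    printf '{}]' >> index.json
    cp index.json "$here"/
    date
    exit 0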
4. What about text to speech to create audiobooks?
4.1. Goals
- create audiobook from text
- ensure minimal voice quality
- probably use the podcast format because I prefer it this way
Non-goals: sharing those, copyright infringement, blah blah.
4.2. Some text-to-speech
4.2.1. Festival
4.2.2. Cloud
Hmm, sure, but most models seem to be GCP/AWS/Azure services I don't see myself using. Maybe something from the recent AI/ML overlords.
4.2.3. Mozilla
4.2.3.1. TTS
https://github.com/mozilla/TTS
Seems pretty low-level, with no support for French.
4.2.3.2. DeepSpeech
https://github.com/mozilla/DeepSpeech
Seems older, and it is a speech-to-text engine, so it goes the wrong way for this purpose.