Audiobooks

Turn audiobook resources into podcast RSS feeds.

Type in a book title, copy the RSS link, and paste it into your podcast player.

1. Context

This page is mostly autogenerated from the http://litteratureaudio.com website.

They do a fantastic job reading books and publishing the mp3 … in WordPress articles. I like the way http://librivox.org publishes each book as an RSS feed, for everyone to enjoy in their favorite podcast player.

2. Create RSS from the website

RSS is a must-have for audiobooks: it lets you keep track of your progress and sync between devices.

litteratureaudio.com is a sad WordPress instance with little to no automation options. The content seems to be organized mostly by hand. There is no apparent full RSS feed, nor any JSON or XML-RDF export of any sort. To produce an RSS feed, we first need to scrape the website.
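Once the scraping step has collected the episodes, the feed itself is just text. The following is a minimal sketch, not the actual script: it assumes the scraper outputs one episode per line as `title<TAB>mp3 url<TAB>size in bytes`. The function names `xml_escape` and `rss_feed`, the `Arts`/`Books` category, the `fr` language and the `0` fallback for an unknown size are all illustrative assumptions.

```shell
#!/bin/sh
# Sketch: build a podcast-style RSS 2.0 feed from scraped episodes.
# stdin: one episode per line, "title<TAB>mp3 url<TAB>size in bytes".
# Arguments (hypothetical): feed title, book page URL, cover image URL.

# Escape the characters that would break XML text or attribute values.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

rss_feed() {
    feed_title=$(xml_escape "$1")
    feed_link=$(xml_escape "$2")
    feed_image=$(xml_escape "$3")
    tab=$(printf '\t')
    # Channel header; the description reuses the title as a placeholder.
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
<channel>
<title>$feed_title</title>
<link>$feed_link</link>
<description>$feed_title</description>
<language>fr</language>
<itunes:image href="$feed_image"/>
<itunes:category text="Arts"><itunes:category text="Books"/></itunes:category>
<itunes:explicit>false</itunes:explicit>
EOF
    # One <item> per input line; a missing size falls back to length="0".
    while IFS="$tab" read -r ep_title ep_url ep_size; do
        ep_title=$(xml_escape "$ep_title")
        ep_url=$(xml_escape "$ep_url")
        cat <<EOF
<item>
<title>$ep_title</title>
<enclosure url="$ep_url" length="${ep_size:-0}" type="audio/mpeg"/>
<guid isPermaLink="false">$ep_url</guid>
</item>
EOF
    done
    printf '</channel>\n</rss>\n'
}
```

For example: `printf 'Chapitre 1\thttps://example.com/ch1.mp3\t1234\n' | rss_feed "Mon livre" "https://example.com/livre" "https://example.com/cover.jpg" > livre.rss`. Here-documents keep the XML readable, and escaping every interpolated value avoids broken feeds when a title contains `&`.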

2.2. Handle the different kinds of books

2.2.1. Books aren't always laid out and structured the same way.

Most books embed the mp3 list in a .link-roman-mp3-file class container, but if there is only one episode, the mp3 link just sits on its own in the article.
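Both layouts can be handled with a simple fallback: use the container when it exists, otherwise take every mp3 link on the page. This is a rough grep/sed sketch, not the real scraper. It assumes the container ends at the first line containing `</div>` after the class name; nested markup would need a real HTML parser. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: list the mp3 URLs of a book page (HTML on stdin).
# Assumption: the .link-roman-mp3-file container closes at the first
# "</div>" line after it.

# Print every distinct http(s) URL ending in .mp3, in page order.
mp3_urls() {
    grep -o 'https\{0,1\}://[^"'"'"' <>]*\.mp3' | awk '!seen[$0]++'
}

book_mp3_urls() {
    page=$(cat)
    if printf '%s\n' "$page" | grep -q 'link-roman-mp3-file'; then
        # Multi-episode layout: only keep the links inside the container.
        printf '%s\n' "$page" | sed -n '/link-roman-mp3-file/,/<\/div>/p' | mp3_urls
    else
        # Single-episode layout: the mp3 link sits directly in the article.
        printf '%s\n' "$page" | mp3_urls
    fi
}
```

Usage: `curl -s "$book_page_url" | book_mp3_urls`, where `$book_page_url` is whichever article is being converted. Deduplicating matters for single-episode pages, where the same file often appears both as a download link and as a player source.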

2.2.2. [2023-04-15 Sat] New version

Of course, they had to change everything… The website got updated, the script broke, and the provided download links don't work with Apple Podcasts. The good news is that there is now an API, which should make things easier.
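The notes don't say which API this is. Since the site runs WordPress, one plausible guess is the standard WordPress REST API under `/wp-json/wp/v2/`, whose `media` endpoint can list the audio files attached to a post. The sketch below rests entirely on that assumption. The base URL, the post ID argument and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch, ASSUMING the site exposes the standard WordPress REST API.
API_BASE="https://litteratureaudio.com/wp-json/wp/v2"   # hypothetical base URL

# List the audio attachments of one post (argument: post ID). Needs network.
fetch_post_audio() {
    curl -s --get "$API_BASE/media" \
        --data-urlencode "parent=$1" \
        --data-urlencode "media_type=audio" \
        --data-urlencode "per_page=100"
}

# Extract every "source_url" value from a compact JSON response on stdin.
# WordPress escapes "/" as "\/" in its JSON output, so undo that.
extract_source_urls() {
    grep -o '"source_url":"[^"]*"' \
        | sed -e 's/^"source_url":"//' -e 's/"$//' -e 's|\\/|/|g'
}
```

Usage: `fetch_post_audio 1234 | extract_source_urls`. A real JSON parser such as `jq` would be more robust. The grep version only avoids an extra dependency and assumes compact output with no spaces around the colons.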

2.3. Extract more information about each book

Tags, author name, publication date, an image for the feed.
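Much of this metadata is usually available in `<meta property="..." content="...">` tags, for example `og:image` for the feed image, `article:published_time` for the date and `article:tag` for tags. It is an assumption that litteratureaudio.com emits these tags, as WordPress SEO plugins commonly do. The helper below is a hypothetical sketch that also assumes `property` comes before `content`, with double-quoted values.

```shell
#!/bin/sh
# Sketch: print every content value of <meta property="$1" ...> from HTML
# on stdin, one per line. Assumes property precedes content and both
# attributes use double quotes.
meta_content() {
    grep -o "<meta property=\"$1\" content=\"[^\"]*\"" \
        | sed 's/.*content="\([^"]*\)"/\1/'
}
```

Usage: `meta_content og:image < page.html | head -n 1` for the cover, or `meta_content article:tag < page.html` for all tags.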

2.4. Validate the RSS against Apple's requirements

Use https://castfeedvalidator.com/ as it seems good enough for our needs.
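Before pasting feeds into the validator one by one, a cheap local pre-check can catch feeds that lack a required tag entirely. The sketch below is only a presence check. The tag list is based on Apple's documented required channel and episode tags, and it does not check XML well-formedness, image size or anything else the validator looks at. `check_feed` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: report tags missing from a feed file (argument: path).
# Presence check only; castfeedvalidator.com remains the real test.
check_feed() {
    status=0
    for tag in '<title>' '<description>' '<language>' '<itunes:image ' \
               '<itunes:category ' '<itunes:explicit>' '<enclosure '; do
        if ! grep -q "$tag" "$1"; then
            echo "$1: missing $tag" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: `for f in ~/public_html/audiobooks/*.rss; do check_feed "$f"; done`.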

3. Generate the RSS index JSON file for the search / fuzzy finder engine

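    #!/bin/sh
    # Rebuild index.json (one {title, uri} entry per .rss feed) for the fuzzy finder.
    set -e
    here=$(pwd)
    cd ~/public_html/audiobooks
    {
        printf '['
        sep=''
        for file in *.rss; do
            [ -e "$file" ] || continue   # no .rss file: the glob stays unexpanded
            # Searchable title: drop the extension, turn _ , . - into spaces.
            parts=$(printf '%s' "${file%.rss}" | sed 's/[_,.-]/ /g')
            printf '%s{"title": "%s", "uri": "%s"}' "$sep" "$parts" "$file"
            sep=','
        done
        printf ']\n'
    } > index.json
    cp index.json "$here/"
    date
    exit 0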

4. What about text to speech to create audiobooks?

4.1. Goals

  • create audiobook from text
  • ensure minimal voice quality
  • probably use the podcast format because I prefer it this way

Non-goals: sharing those, copyright infringement, etc.

4.2. Some text-to-speech engines

4.2.1. Festival
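Festival ships a `text2wave` script that renders a text file to a WAV file. A chapter-per-episode pipeline could look like the sketch below. It assumes `text2wave` and `lame` are installed and that the book is already split into one `.txt` file per chapter; that layout, the function names and the `DRY_RUN` switch are hypothetical. Whether a usable French voice is installed is not checked here.

```shell
#!/bin/sh
# Sketch: turn chapter text files into mp3 episodes with Festival + lame.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: directory of chapter .txt files, output directory.
tts_chapters() {
    src_dir=$1
    out_dir=$2
    run mkdir -p "$out_dir"
    for txt in "$src_dir"/*.txt; do
        [ -e "$txt" ] || continue   # no .txt file: the glob stays unexpanded
        name=$(basename "$txt" .txt)
        run text2wave "$txt" -o "$out_dir/$name.wav"   # text -> WAV
        run lame "$out_dir/$name.wav" "$out_dir/$name.mp3"   # WAV -> mp3
        run rm "$out_dir/$name.wav"
    done
}
```

The resulting mp3 files could then be wrapped in a feed like the scraped books, which matches the podcast-format goal above.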

4.2.2. Cloud

Hmm, sure, but most models seem to be GCP/AWS/Azure services I don't see myself using. Maybe something from the recent AI/ML overlords.

4.2.3. Mozilla

4.2.3.1. TTS

https://github.com/mozilla/TTS

Seems pretty low-level, with no support for French.