Extract chapters from YouTube Media

YouTube recently introduced a “chapter” concept that splits a long video into chapters. I think this data might be parsed from the video description, as YouTube has already been parsing any timestamp found there for a while now.

Thanks to youtube-dl, we can download the video along with its metadata, which now contains this chapter data.

$ youtube-dl --write-info-json -x --audio-format mp3 https://www.youtube.com/watch?v=HZTStHzWRxM
[youtube] HZTStHzWRxM: Downloading webpage
[info] Writing video description metadata as JSON to: The New Youtube Chapter Timestamp Feature-HZTStHzWRxM.info.json
[download] Destination: The New Youtube Chapter Timestamp Feature-HZTStHzWRxM.webm
[download] 100% of 3.22MiB in 00:00
[ffmpeg] Destination: The New Youtube Chapter Timestamp Feature-HZTStHzWRxM.mp3
Deleting original file The New Youtube Chapter Timestamp Feature-HZTStHzWRxM.webm (pass -k to keep)

We will use https://www.youtube.com/watch?v=HZTStHzWRxM as an example.

The command above downloads the media file, transcodes it to mp3 and also downloads the metadata in JSON format. We now have 2 files:

  • The New Youtube Chapter Timestamp Feature-HZTStHzWRxM.info.json that contains the metadata
  • The New Youtube Chapter Timestamp Feature-HZTStHzWRxM.mp3 that is the media

jq is a wonderful command-line tool to manipulate JSON from bash. For example, we can get the title of the video like this:

$ cat The\ New\ Youtube\ Chapter\ Timestamp\ Feature-HZTStHzWRxM.info.json | jq -r .title  | sed -e 's/[^A-Za-z0-9._-]/_/g'

The_New_Youtube_Chapter_Timestamp_Feature

The sed here makes sure we won’t have special characters that might lead to errors later.

The -r option tells jq to return “raw text”. Without it, jq prints strings as JSON literals, with their surrounding double quotes and escape sequences, which might lead to issues later.
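To make the cleanup step easier to reuse, here is a minimal sketch that wraps the same sed expression in a shell function. The name sanitize is only an assumption for illustration; it does not come from the original commands.

```shell
#!/bin/sh
# sanitize: hypothetical helper name. Replaces every character that is not
# a letter, a digit, a dot, an underscore or a hyphen with an underscore.
sanitize() {
    printf '%s\n' "$1" | sed -e 's/[^A-Za-z0-9._-]/_/g'
}

sanitize "The New Youtube Chapter Timestamp Feature"
sanitize "Problems / suggestions for the future"
```

Note that the space, the slash and the second space in the last title each become an underscore, which is why three underscores appear in a row in the resulting name.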

If available, the youtube-dl info JSON contains a chapters array that lists all the chapters with their start_time, end_time and title.

$ cat The\ New\ Youtube\ Chapter\ Timestamp\ Feature-HZTStHzWRxM.info.json |\
jq -r '.chapters[]'

{
  "start_time": 0,
  "end_time": 17,
  "title": "The new feature"
}
{
  "start_time": 17,
  "end_time": 76,
  "title": "Slow roll-out"
}
{
  "start_time": 76,
  "end_time": 124,
  "title": "How it works"
}
{
  "start_time": 124,
  "end_time": 180,
  "title": "Problems / suggestions for the future"
}
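Since the chapters array is only present when the video has chapters, it can be useful to check for it before going further. The sketch below assumes jq is installed. The two inline JSON samples and the chapter_count helper are hypothetical, made up for the example.

```shell
#!/bin/sh
# Hypothetical sample metadata: one document with chapters, one without.
with_chapters='{"title":"A","chapters":[{"start_time":0,"end_time":17,"title":"Intro"}]}'
without_chapters='{"title":"B"}'

# chapter_count: hypothetical helper. Reads info JSON on stdin and prints the
# number of chapters, falling back to an empty array when the key is missing.
chapter_count() {
    jq -r '(.chapters // []) | length'
}

for doc in "$with_chapters" "$without_chapters"; do
    count=$(printf '%s\n' "$doc" | chapter_count)
    if [ "$count" -gt 0 ]; then
        echo "$count chapter(s) found"
    else
        echo "no chapters, nothing to split"
    fi
done
```

The `//` operator in jq returns its right-hand side when the left-hand side is null or false, so a missing chapters key is counted as zero chapters instead of raising an error.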

The idea now is to use each entry here as parameters for ffmpeg, to split the media according to the chapter data. As we are in bash, the current JSON representation would be quite hard to use as is, so we need to transform it a little so that the output of jq can be used in a pipe with xargs.

We also need to take into account that ffmpeg cuts a media file using the -ss option for where to start and -t for the duration of the cut, not the end time. As the JSON gives us a start and an end time, we need to perform a simple subtraction to get the duration.
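As a quick worked example of this subtraction, here is the second chapter of the example video (“Slow roll-out”, from 17 to 76 seconds) computed with plain shell arithmetic. The file names input.mp3 and output.mp3 are placeholders, and the ffmpeg command is only printed, not executed.

```shell
#!/bin/sh
# Values taken from the "Slow roll-out" chapter of the example metadata.
start=17
end=76

# ffmpeg's -t expects a duration, so subtract the start from the end.
duration=$((end - start))

# Print the command that would be run (input.mp3/output.mp3 are placeholders).
echo "ffmpeg -ss $start -i input.mp3 -t $duration -codec:a copy output.mp3"
```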

$ cat The\ New\ Youtube\ Chapter\ Timestamp\ Feature-HZTStHzWRxM.info.json  |\
jq -r '.chapters[] | .start_time,.end_time-.start_time,.title ' |\
sed 's/"//g'

0
17
The new feature
17
59
Slow roll-out
76
48
How it works
124
56
Problems / suggestions for the future

Thanks to jq, we can perform simple math operations directly in the command to compute the duration. The sed here is again only for cleanup: it removes any double quote left in the output.

Now we can pipe this into the wonderful xargs, which uses the output as parameters and triggers an ffmpeg command:

$ cat The\ New\ Youtube\ Chapter\ Timestamp\ Feature-HZTStHzWRxM.info.json|\
jq -r '.chapters[] | .start_time,.end_time-.start_time,.title ' |\
sed -e 's/[^A-Za-z0-9._-]/_/g' |\
xargs -n3 -t -d'\n' sh -c 'ffmpeg -y -ss $0 -i "The New Youtube Chapter Timestamp Feature-HZTStHzWRxM.mp3" -t $1 -codec:a copy "$2.mp3"'
  • -n3 tells xargs to take the parameters 3 by 3
  • -t is only for debugging, as it prints each command xargs executes
  • -d'\n' indicates that the parameters are separated by \n (this is a GNU xargs option)

What is cool is that we could potentially parallelize the process by adding the -P X parameter to xargs, to run multiple ffmpeg invocations in parallel.
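To see what -P changes without touching any media, here is a dry-run sketch where echo stands in for ffmpeg. It assumes GNU xargs (for -d). The chapter values are the first three from the example, already cleaned up, and run_parallel is a hypothetical helper name.

```shell
#!/bin/sh
# Dry run: "echo" replaces ffmpeg so the example runs without any media file.
# -P 2 lets xargs run up to two commands at the same time.
run_parallel() {
    printf '%s\n' 0 17 The_new_feature 17 59 Slow_roll-out 76 48 How_it_works |
    xargs -n3 -P 2 -d'\n' sh -c 'echo "start=$0 duration=$1 out=$2.mp3"'
}

# With -P the completion order is not guaranteed, so sort for a stable display.
run_parallel | sort
```

As each chapter is written to its own output file, the jobs do not depend on each other, which is what makes running them in parallel safe here.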

On the ffmpeg side, nothing tremendous:

  • -ss and -t have already been explained as the start time and the duration,
  • -codec:a copy indicates that we keep the same codec as the original file, so there is no re-encoding for the output file, which means it goes fast
  • -y avoids the prompt and forces overwriting any existing output file

That works quite well. It might be possible to turn it into a full one-liner, but let’s write a proper script to make this easier to use.

#!/bin/sh
set -x

# Download media + metadata
youtube-dl --write-info-json -x --audio-format mp3 -o "tmp_out.%(ext)s" "$1"

# There may be a way to get the file names from the previous command
INFO="tmp_out.info.json"
AUDIO="tmp_out.mp3"
echo ":: $INFO $AUDIO ::"
# Fetch the title
TITLE=$(cat "$INFO" | jq -r .title | sed -e 's/[^A-Za-z0-9._-]/_/g' )
                                                #  ^--- Replace all weird characters as we want to use it as a file name
# We will put all chapters into a directory
mkdir -p "$TITLE"

# Chapterization
cat "$INFO" |\
jq -r '.chapters[] | .start_time,.end_time-.start_time,.title ' |\
sed -e 's/[^A-Za-z0-9._-]/_/g' |\
xargs -n3 -t -d'\n' sh -c "ffmpeg -y -ss \$0 -i \"$AUDIO\" -to \$1 -codec:a copy  -f mp3  \"$TITLE/\$2.mp3\""

#Remove tmp file
rm tmp_out*

The script file is available here: https://gist.github.com/totetmatt/b4bf50c62642e5a9e1bf6365a47e19c6

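No big change in the overall approach, just one thing to be careful about: yes, there is a hellish quote-escaping game to play, and it might not be pleasant… Note also that the script uses -to where the one-liner used -t. As far as I understand it, since -ss is placed before -i, the output timestamps start again at 0, so the value still behaves as a duration.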

To explain the last part: as far as I understand it, the string is evaluated multiple times:

  • The first time is at “script level”, where any $VARIABLE present in the script, like $AUDIO and $TITLE, gets replaced
  • The second time is at the xargs / sh -c invocation, where it becomes possible to use $0, $1 and $2. But if we don’t escape them first, these variables would be evaluated during the first round, which is why we need to backslash them: \$0, \$1, \$2.

You can see the result of the string after the 1st evaluation thanks to the -t option of xargs :

sh -c 'ffmpeg -y -ss $0 -i "The New Youtube Chapter Timestamp Feature-HZTStHzWRxM.mp3" -to $1 -codec:a copy  -f mp3  "The_New_Youtube_Chapter_Timestamp_Feature/$2.mp3"' 124 56 Problems___suggestions_for_the_future
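The two evaluation rounds can be reproduced in a small dry run, where echo stands in for ffmpeg. The second variant is only a sketch of one possible way to sidestep the escaping, not what the script does: the script variables are handed to sh -c as extra arguments, so the command string can stay in single quotes. Because xargs appends the chapter values after those arguments, the positions shift: $0 is the audio file, $1 the directory, and $2 to $4 the chapter values.

```shell
#!/bin/sh
AUDIO="tmp_out.mp3"
TITLE="The_New_Youtube_Chapter_Timestamp_Feature"

# Variant 1: double quotes, as in the script. $AUDIO and $TITLE are expanded
# by the outer shell; \$0, \$1 and \$2 are left for the inner "sh -c".
escaped=$(printf '%s\n' 124 56 Problems___suggestions_for_the_future |
    xargs -n3 sh -c "echo ffmpeg -ss \$0 -i $AUDIO -t \$1 $TITLE/\$2.mp3")

# Variant 2: single quotes, nothing to escape. AUDIO and TITLE are passed as
# the first arguments of "sh -c", then xargs appends the three chapter values.
positional=$(printf '%s\n' 124 56 Problems___suggestions_for_the_future |
    xargs -n3 sh -c 'echo ffmpeg -ss "$2" -i "$0" -t "$3" "$1/$4.mp3"' "$AUDIO" "$TITLE")

echo "$escaped"
echo "$positional"
```

Both variants print the same command line. The second one trades the backslashes for a shift in the argument numbering, which is a matter of taste.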

There might be other and better ways to deal with the args parsing, the string escaping and the string cleanup, but the current solution works well enough :)
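As one possible alternative, here is a sketch that avoids xargs altogether: jq emits one tab-separated line per chapter with @tsv, cleans the title itself with gsub, and a while read loop consumes the fields. It assumes jq is installed with regex support. The inline sample JSON and the plan_cuts helper are hypothetical, and the ffmpeg commands are only printed.

```shell
#!/bin/sh
# Hypothetical sample metadata with two chapters from the example video.
sample_json='{"chapters": [
  {"start_time": 0, "end_time": 17, "title": "The new feature"},
  {"start_time": 124, "end_time": 180, "title": "Problems / suggestions for the future"}
]}'

tab=$(printf '\t')

# plan_cuts: hypothetical helper. Reads info JSON on stdin and prints one
# ffmpeg command per chapter instead of running it.
plan_cuts() {
    jq -r '.chapters[]
           | [.start_time,
              .end_time - .start_time,
              (.title | gsub("[^A-Za-z0-9._-]"; "_"))]
           | @tsv' |
    while IFS="$tab" read -r start duration title; do
        echo "ffmpeg -y -ss $start -i tmp_out.mp3 -t $duration -codec:a copy $title.mp3"
    done
}

printf '%s\n' "$sample_json" | plan_cuts
```

Replacing echo with the real command would run the cuts. In that case ffmpeg should be given -nostdin or have its input redirected from /dev/null, otherwise it may consume the lines meant for read.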