Export Kindle notes and highlights using Linux grep

If you own an Amazon Kindle, you have probably wanted to export the notes and highlights from the books you have read. The Kindle keeps all of this information in a single file named “My Clippings.txt” [1], which luckily is plain text. Unluckily, the notes and highlights from all your books are mixed together in this file, which makes it hard to locate the relevant information (that is, if you do go back to your notes and highlights, and are not just taking them to help your mind digest what you’re reading at that moment).

Spoiler alert: what follows is for Linux users’ eyes only. We are going to see how to use the grep command [2] to extract the relevant information from the “My Clippings.txt” file.


How to export the notes and highlights

The “My Clippings.txt” file has a format that resembles the following:

==========
Abolish Silicon Valley: How to Liberate Technology From Capitalism (Wendy Liu)
– Your Highlight on page 182 | Location 2776–2777 | Added on Saturday, August 1, 2020 11:07:20 PM

You’re not supposed to enjoy anything without being reminded that these corporations exist and would like you to buy their products.

==========
The Systems View of Life: A Unifying Vision (Fritjof Capra;Pier Luigi Luisi)
– Your Highlight on page 684 | Location 10481–10482 | Added on Sunday, March 15, 2020 11:26:03 PM

Organic farming is sustainable because it embodies ecological principles that have been tested by evolution for billions of years (see Section 16.3.2).

Say that you wish to export the Kindle notes and highlights for the book “Abolish Silicon Valley” by Wendy Liu. You would then run the following (or a similar) command in the folder where you have copied the “My Clippings.txt” file:

grep -A 3 Wendy My\ Clippings.txt | grep -v Wendy > Abolish\ Silicon\ Valley\ notes.txt

But let’s decode this:

  • grep -A 3 Wendy My\ Clippings.txt: this part of the command line finds the lines that contain the word “Wendy” in the file “My Clippings.txt” (the “\” escapes the space character in the file name) and prints each of these lines together with the three lines that follow it (the -A 3 option). The empty line counts as one of the three.

So if we only used that part of the command line, we would get multiple paragraphs like this one, separated from each other by a line containing only “--”:

Abolish Silicon Valley: How to Liberate Technology From Capitalism (Wendy Liu)
– Your Highlight on page 182 | Location 2776–2777 | Added on Saturday, August 1, 2020 11:07:20 PM

You’re not supposed to enjoy anything without being reminded that these corporations exist and would like you to buy their products.
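If you want to see this first step in action without touching your real clippings, here is a minimal sketch. It assumes a throwaway sample file (the name sample-clippings.txt and the temporary folder are made up for this example) that holds an ASCII-only copy of the two entries shown earlier:

```shell
# Create a throwaway folder and a small sample file (hypothetical names).
workdir=$(mktemp -d)
sample="$workdir/sample-clippings.txt"

cat > "$sample" <<'EOF'
==========
Abolish Silicon Valley: How to Liberate Technology From Capitalism (Wendy Liu)
- Your Highlight on page 182 | Location 2776-2777 | Added on Saturday, August 1, 2020 11:07:20 PM

You're not supposed to enjoy anything without being reminded that these corporations exist and would like you to buy their products.

==========
The Systems View of Life: A Unifying Vision (Fritjof Capra;Pier Luigi Luisi)
- Your Highlight on page 684 | Location 10481-10482 | Added on Sunday, March 15, 2020 11:26:03 PM

Organic farming is sustainable because it embodies ecological principles that have been tested by evolution for billions of years (see Section 16.3.2).
EOF

# Print every line containing "Wendy" plus the three lines that follow it.
first_pass=$(grep -A 3 Wendy "$sample")
printf '%s\n' "$first_pass"
```

With this sample, only the four lines of the Wendy Liu entry should be printed; the Capra and Luisi entry is left out.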

Cleaning up the generated file

But we don’t want the book title line repeated above every entry. We just want to keep the “flesh” of the highlights, i.e. the page number / date line and the highlight or note itself. This is where the rest of the command line comes in handy:

  • | grep -v Wendy: the so-called pipe (symbol: |) sends the output of the preceding command to the command that follows it. In other words, the second grep is applied to the output of the first one. Thanks to the -v option it omits every line that contains the term “Wendy”, which is what we actually want, since that removes the title lines. Keep in mind that it would also drop a highlight that itself contains the word “Wendy”.
  • > Abolish\ Silicon\ Valley\ notes.txt: the “>” operator redirects the output of the whole pipeline to a text file named “Abolish Silicon Valley notes.txt”, creating it or overwriting it if it already exists. Here we use the “\” again to escape the space characters in the file name.
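Putting the pieces together, the whole command can be tried on a sample file. This is only a sketch under a few assumptions: the file names are made up, the second Wendy Liu entry is placeholder text, and the file names are quoted instead of backslash-escaped (both forms mean the same to the shell). It also adds a third grep, which is not part of the command above, to drop the lines containing only “--” that grep prints between groups of matches:

```shell
# Throwaway folder and sample file (hypothetical names, partly placeholder content).
demo_dir=$(mktemp -d)
clippings="$demo_dir/sample-clippings.txt"
notes="$demo_dir/Abolish Silicon Valley notes.txt"

cat > "$clippings" <<'EOF'
==========
Abolish Silicon Valley: How to Liberate Technology From Capitalism (Wendy Liu)
- Your Highlight on page 182 | Location 2776-2777 | Added on Saturday, August 1, 2020 11:07:20 PM

You're not supposed to enjoy anything without being reminded that these corporations exist and would like you to buy their products.

==========
The Systems View of Life: A Unifying Vision (Fritjof Capra;Pier Luigi Luisi)
- Your Highlight on page 684 | Location 10481-10482 | Added on Sunday, March 15, 2020 11:26:03 PM

Organic farming is sustainable because it embodies ecological principles that have been tested by evolution for billions of years (see Section 16.3.2).

==========
Abolish Silicon Valley: How to Liberate Technology From Capitalism (Wendy Liu)
- (placeholder metadata line for a second highlight)

(placeholder text for a second highlight)
EOF

# 1st grep: title lines plus the three lines after each of them.
# 2nd grep: drop the title lines again.
# 3rd grep (an addition): drop lines that consist only of "--".
grep -A 3 'Wendy' "$clippings" \
  | grep -v 'Wendy' \
  | grep -v -x -e '--' \
  > "$notes"

cat "$notes"
```

The resulting file should hold two blocks of three lines each (page/date line, empty line, highlight), with no title lines and no “--” separators.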

That was it! If you change the search term in both grep commands (e.g. to Systems\ View), you can select a different book title or author and place the content of that specific book in a separate .txt file.
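If you do this for several books, typing the search term twice gets tedious. One possible way around it is a small shell function. The name extract_book is hypothetical, and both the -F option (treat the search term as a literal string rather than a regular expression) and the extra “--” filter are additions that the command above does not use:

```shell
# extract_book SEARCH_TERM CLIPPINGS_FILE
# Hypothetical helper: prints the three lines that follow every line
# containing SEARCH_TERM, without the matching lines themselves.
extract_book() {
  # $1 = part of the title or author, $2 = path to the clippings file
  grep -F -A 3 -e "$1" "$2" \
    | grep -F -v -e "$1" \
    | grep -v -x -e '--'
}
```

In the folder that holds your copy of the file you could then run, for example, extract_book 'Systems View' 'My Clippings.txt' > 'Systems View notes.txt' (the output file name is just an example).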


[1] To find this file you need to connect your Kindle to your laptop/PC via a USB cable and browse for it in the file manager. I suggest you copy this file to a local directory before trying this.

[2] For a thorough description of the grep command you can use man grep in the terminal, or visit https://www.gnu.org/software/grep/manual/grep.html
