Blog of Daniel Ruf

#wordpress

create a list of all public WordPress plugins

01.08.2024

There is a simple and mostly undocumented way to enumerate all public WordPress plugins by using their shortlinks.

In the markup of single plugin pages at https://wordpress.org/plugins/ you can see an element like this:

<link rel="shortlink" href="https://wordpress.org/plugins/?p=151633">

That means you can count up from 1 until you reach the newest entry and store the results in a file or a database.
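You can check a single ID by hand before automating it. The following one-liner follows the shortlink redirect and prints the final URL; the ID is the one from the element above, and the expected result matches the CSV sample further down:

# resolve one shortlink and print the final URL after all redirects
curl -sLI -o /dev/null -w '%{url_effective}\n' "https://wordpress.org/plugins/?p=151633"
# https://wordpress.org/plugins/cryptocurrency-payments-using-metamask-for-woocommerce/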

For that you can use a simple bash script (fetch.sh):

#!/usr/bin/env bash
# fetch.sh: resolve a single shortlink ID ($1) and append the result to data.csv

echo "$1"

# -s silent, -L follow redirects, -I send a HEAD request only
# %{url_effective} is the final URL after all redirects, %{http_code} the final status code
ENTRY=$(curl -sLI -o /dev/null -w "$1,\"https://wordpress.org/plugins/?p=$1\",\"%{url_effective}\",%{http_code}" "https://wordpress.org/plugins/?p=$1")

echo "$ENTRY" >> data.csv
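To test it, make the script executable and run it with an ID, for example the one from the shortlink element above; the appended line should match the CSV format shown later in this post:

chmod +x fetch.sh
./fetch.sh 151633
tail -n 1 data.csv
# 151633,"https://wordpress.org/plugins/?p=151633","https://wordpress.org/plugins/cryptocurrency-payments-using-metamask-for-woocommerce/",200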

Running it in an endless loop would be the next logical step (loop.sh):

#!/usr/bin/env bash
# loop.sh: walk the IDs sequentially, starting at 1, and append each result to data.csv
COUNTER=1
for (( ; ; ))
do
    echo "$COUNTER"

    ENTRY=$(curl -sLI -o /dev/null -w "$COUNTER,\"https://wordpress.org/plugins/?p=$COUNTER\",\"%{url_effective}\",%{http_code}" "https://wordpress.org/plugins/?p=$COUNTER")
    echo "$ENTRY" >> data.csv

    COUNTER=$((COUNTER+1))
done

But you will notice that this sequential approach has a flaw: a single slow request can drastically reduce the number of results per hour.

To improve the throughput, you can use GNU Parallel.
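GNU Parallel is usually available as a regular package, for example on Debian-based systems:

sudo apt install parallel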

With 20 requests running in parallel, the same range of IDs takes much less time to process:

#!/usr/bin/env bash
# read the last processed ID from .counter and fetch the next 10000 IDs, 20 at a time
start=$(cat .counter)

seq -f "%.0f" $((start + 1)) $((start + 10000)) | parallel -j20 ./fetch.sh

# sort numerically, drop duplicates and remember the highest ID for the next run
sort -u -g -o data.csv data.csv
last_line=$(tail -n 1 data.csv)
IFS=',' read -ra newarr <<< "$last_line"
echo "${newarr[0]}" > .counter
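The script expects a .counter file containing the last processed ID. Before the first run you can simply seed it with 0:

echo 0 > .counter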

The content of the resulting CSV file looks like this:

151633,"https://wordpress.org/plugins/?p=151633","https://wordpress.org/plugins/cryptocurrency-payments-using-metamask-for-woocommerce/",200
151634,"https://wordpress.org/plugins/?p=151634","https://wordpress.org/plugins/search/%3Fp%3D151634/",200
151635,"https://wordpress.org/plugins/?p=151635","https://wordpress.org/plugins/my-templates-thumbnails-for-elementor/",200
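The second entry redirects to the search page, which suggests that this ID does not belong to a public plugin page. If you only want the rows that resolve to an actual plugin URL, a simple grep is enough (plugins.csv is just an example name for the filtered output):

grep -v '/plugins/search/' data.csv > plugins.csv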

In my case I also convert the CSV file to a SQLite database, which makes it easier for me to work with the data:

#!/usr/bin/env bash
# remove any previous database; -f avoids an error when the file does not exist yet
rm -f -- ./database.sqlite3

# create the table from schema.sql and import the CSV data into it
sqlite3 -cmd ".open ./database.sqlite3" \
    -cmd ".mode csv" \
    -cmd ".read schema.sql" \
    -cmd ".import data.csv entries" \
    -cmd ".exit"

For schema.sql I use this:

create table entries (
  number integer primary key not null,
  url_short varchar(255) not null,
  url_final varchar(255) not null,
  http_code integer not null
);
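Once the database exists, it can be queried directly from the shell. A small sketch, reusing the observation about search redirects from the CSV example above:

# IDs that resolved to an actual plugin page
sqlite3 ./database.sqlite3 "select count(*) from entries where url_final not like '%/plugins/search/%';"

# overview of the returned HTTP status codes
sqlite3 ./database.sqlite3 "select http_code, count(*) from entries group by http_code;"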