create a list of all public WordPress plugins
01.08.2024
There is a simple and mostly undocumented way to enumerate all public WordPress plugins using shortlinks.
In the markup of individual plugin pages at https://wordpress.org/plugins/ you can see an element like this:
<link rel="shortlink" href="https://wordpress.org/plugins/?p=151633">
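To find the numeric ID of a specific plugin, you can grep its page for that element. The slug below is just an example taken from the results further down:
# Extract the shortlink element (and with it the post ID) from a plugin page.
curl -s "https://wordpress.org/plugins/my-templates-thumbnails-for-elementor/" \
  | grep -o '<link rel="shortlink" href="[^"]*"'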
Since the p parameter is a plain incrementing ID, you can count up from 1 until you reach the newest entry and store the results in a file or database.
For that you can use a simple bash script (fetch.sh):
#!/usr/bin/env bash
# Resolve one shortlink ID ($1) and append the result as a CSV row.
echo "$1"
ENTRY=$(curl -sLI -o /dev/null -w "$1,\"https://wordpress.org/plugins/?p=$1\",\"%{url_effective}\",%{http_code}" "https://wordpress.org/plugins/?p=$1")
echo "$ENTRY" >> data.csv
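A quick manual test with one of the IDs from the results below shows what a single run appends to data.csv:
chmod +x fetch.sh
./fetch.sh 151633
tail -n 1 data.csv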
Running it in an endless loop would be the next logical step (loop.sh):
#!/usr/bin/env bash
# Walk through the IDs sequentially, one request after the other.
COUNTER=1
for (( ; ; ))
do
    echo $COUNTER
    ENTRY=$(curl -sLI -o /dev/null -w "$COUNTER,\"https://wordpress.org/plugins/?p=$COUNTER\",\"%{url_effective}\",%{http_code}" "https://wordpress.org/plugins/?p=$COUNTER")
    echo "$ENTRY" >> data.csv
    COUNTER=$(($COUNTER+1))
done
But you will notice that this sequential logic has a flaw: single slow requests drastically reduce the number of results per hour.
To improve the throughput, you can use GNU parallel.
With 20 requests running at the same time, the whole run takes much less time:
#!/usr/bin/env bash
# Continue from the last processed ID stored in .counter.
start=$(cat .counter)
# Fetch the next 10000 IDs with 20 parallel curl requests.
seq -f "%.0f" $((start + 1)) $((start + 10000)) | parallel -j20 ./fetch.sh
# Sort numerically and remove duplicate lines.
sort -u -g -o data.csv data.csv
# Remember the highest processed ID for the next run.
last_line=$(tail -n 1 data.csv)
IFS=',' read -ra newarr <<< "$last_line"
echo "${newarr[0]}" > .counter
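To keep the enumeration going, this batch can simply be run again and again. A minimal wrapper, assuming the script above is saved as batch.sh and .counter has been seeded once, could look like this:
#!/usr/bin/env bash
# Seed the counter once before the first run: echo 0 > .counter
# Then process one 10000-ID batch after the other.
while true; do
    ./batch.sh
done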
The content of the resulting CSV file looks like this:
151633,"https://wordpress.org/plugins/?p=151633","https://wordpress.org/plugins/cryptocurrency-payments-using-metamask-for-woocommerce/",200
151634,"https://wordpress.org/plugins/?p=151634","https://wordpress.org/plugins/search/%3Fp%3D151634/",200
151635,"https://wordpress.org/plugins/?p=151635","https://wordpress.org/plugins/my-templates-thumbnails-for-elementor/",200
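As the second line shows, IDs that do not resolve to a plugin end up on the search page. The final URL can therefore be used to filter the list down to actual plugin pages, for example:
# Keep only rows whose final URL points to a real plugin page,
# dropping the redirects to the search page.
grep -v '"https://wordpress.org/plugins/search/' data.csv > plugins.csv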
In my case I also convert the CSV file to a SQLite database, which makes it easier for me to work with the data:
#!/usr/bin/env bash
# Rebuild the database from scratch on every run.
rm -f -- ./database.sqlite3
sqlite3 -cmd ".open ./database.sqlite3" \
    -cmd ".mode csv" \
    -cmd ".read schema.sql" \
    -cmd ".import data.csv entries" \
    -cmd ".exit"
As schema.sql I use this:
create table entries (
number integer primary key not null,
url_short varchar(255) not null,
url_final varchar(255) not null,
http_code integer not null
);
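With the data in SQLite, filtering and counting becomes a simple query. For example, to count all IDs that resolve to a real plugin page (just an illustration of the kind of query the table supports):
# Count all entries whose final URL is not a search-page redirect.
sqlite3 ./database.sqlite3 \
  "select count(*) from entries where url_final not like 'https://wordpress.org/plugins/search/%';"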