Recently I was thinking about writing a Shopify application, but couldn't think of a good one to start with. Instead I went to the Shopify app store and started checking applications with bad ratings. The idea was that if an application is getting bad reviews and it's not improving over time, then there can be an opportunity to write a better one.
This post explains the crawler created to collect Shopify App ratings. The crawler is written in Ruby and the data is stored in an SQLite database. The full code and an example database is available on GitHub.
Applications on the Shopify app store are divided into 9 main categories and each application falls into one or more of these. The categories are:
Every category page is also paginated, since hundreds of applications fall into each. To store the main categories, I've created a categories table saving the links to them with the time and the page of the last scrape.
CREATE TABLE IF NOT EXISTS "categories" (
"id" integer PRIMARY KEY AUTOINCREMENT NOT NULL,
"name" varchar NOT NULL,
"url" varchar NOT NULL,
"last_page_scraped" integer default 0,
"last_scraped_at" datetime
);
The idea is, that the crawler will periodically go through each category and get the applications listed under them.
Data about the applications are stored in two different tables. One of them is called application_links
, which contains the URL to a single application page found while parsing a category.
CREATE TABLE IF NOT EXISTS "application_links" (
"id" integer PRIMARY KEY AUTOINCREMENT NOT NULL,
"url" varchar NOT NULL,
"last_seen_in_category" datetime,
"last_scraped_at" datetime
);
Every time the crawler finds an application on the categories page, it either saves a new row into this table or updates the last_scraped_at
value of an existing record.
The other table is called application_snapshots
. This table is used to track the changes in the number of reviews. Shopify apps can be rated from 1 to 5 stars:
CREATE TABLE IF NOT EXISTS "application_snapshots" (
"id" integer PRIMARY KEY AUTOINCREMENT NOT NULL,
"application_link_id" integer NOT NULL,
"name" varchar,
"one_star" integer,
"two_stars" integer,
"three_stars" integer,
"four_stars" integer,
"five_stars" integer,
"created_at" datetime,
FOREIGN KEY ("application_link_id") REFERENCES "application_links" ("id")
);
The records in this table are never updated, only new records are inserted. A single application will have multiple snapshots saved to track the changes in their ratings.
Lets check out a Shopify App called Product Reviews in the example database:
sqlite> select * from application_snapshots where application_link_id = (select id from application_links where url = 'https://apps.shopify.com/product-reviews');
id application_link_id name one_star two_stars three_stars four_stars five_stars created_at
---------- ------------------- --------------- ---------- ---------- ----------- ---------- ---------- --------------------------
109 204 Product Reviews 153 55 95 177 985 2021-07-10 22:06:03.424475
6084 204 Product Reviews 153 55 95 177 986 2021-07-15 11:22:03.452666
12363 204 Product Reviews 153 55 95 177 986 2021-07-23 23:10:03.614955
18739 204 Product Reviews 156 56 95 178 987 2021-08-05 08:13:04.055920
25237 204 Product Reviews 157 56 97 179 990 2021-08-18 11:08:03.081740
31834 204 Product Reviews 156 56 99 179 992 2021-08-31 19:13:03.364446
38515 204 Product Reviews 158 56 102 179 993 2021-09-14 09:13:03.851465
45276 204 Product Reviews 160 58 102 180 994 2021-09-28 05:03:03.094413
52116 204 Product Reviews 164 58 104 182 994 2021-10-12 06:16:04.027297
59015 204 Product Reviews 165 59 108 184 994 2021-10-26 13:20:03.445861
Between 10th of July and 26th of October the application got 12 one star ratings, 4 two star ratings, 13 three star ratings, 7 four star ratings and 9 five star ratings. The average rating in this time period is less than 3 stars!
The crawler has two parts: crawling the categories and crawling the application ratings. To start the crawlers, use ruby parse_category.rb
and ruby parse_app_link.rb
. I used a cron job to run them periodically.
Happy crawling and let me know if you find some interesting changes in the ratings!
P.S. I'm ptrlaszlo on twitter, follow me for more stories.