Crawling Shopify application ratings

Recently I was thinking about writing a Shopify application, but couldn't think of a good one to start with. Instead I went to the Shopify app store and started checking applications with bad ratings. The idea was that if an application is getting bad reviews and it's not improving over time, then there can be an opportunity to write a better one.

This post explains the crawler created to collect Shopify App ratings. The crawler is written in Ruby and the data is stored in an SQLite database. The full code and an example database is available on GitHub.

Categories

Applications on the Shopify app store are divided into 9 main categories and each application falls into one or more of these. The categories are:

Every category page is also paginated, since hundreds of applications fall into each. To store the main categories, I've created a categories table saving the links to them with the time and the page of the last scrape.

CREATE TABLE IF NOT EXISTS "categories" (
  "id" integer PRIMARY KEY AUTOINCREMENT NOT NULL,
  "name" varchar NOT NULL,
  "url" varchar NOT NULL,
  "last_page_scraped" integer default 0,
  "last_scraped_at" datetime
);

The idea is, that the crawler will periodically go through each category and get the applications listed under them.

Applications

Data about the applications are stored in two different tables. One of them is called application_links, which contains the URL to a single application page found while parsing a category.

CREATE TABLE IF NOT EXISTS "application_links" (
  "id" integer PRIMARY KEY AUTOINCREMENT NOT NULL,
  "url" varchar NOT NULL,
  "last_seen_in_category" datetime,
  "last_scraped_at" datetime
);

Every time the crawler finds an application on the categories page, it either saves a new row into this table or updates the last_scraped_at value of an existing record.

The other table is called application_snapshots. This table is used to track the changes in the number of reviews. Shopify apps can be rated from 1 to 5 stars:

CREATE TABLE IF NOT EXISTS "application_snapshots" (
  "id" integer PRIMARY KEY AUTOINCREMENT NOT NULL,
  "application_link_id" integer NOT NULL,
  "name" varchar,
  "one_star" integer,
  "two_stars" integer,
  "three_stars" integer,
  "four_stars" integer,
  "five_stars" integer,
  "created_at" datetime,
  FOREIGN KEY ("application_link_id") REFERENCES "application_links" ("id")
);

The records in this table are never updated, only new records are inserted. A single application will have multiple snapshots saved to track the changes in their ratings.

Lets check out a Shopify App called Product Reviews in the example database:

sqlite> select * from application_snapshots where application_link_id = (select id from application_links where url = 'https://apps.shopify.com/product-reviews');
id          application_link_id  name             one_star    two_stars   three_stars  four_stars  five_stars  created_at
----------  -------------------  ---------------  ----------  ----------  -----------  ----------  ----------  --------------------------
109         204                  Product Reviews  153         55          95           177         985         2021-07-10 22:06:03.424475
6084        204                  Product Reviews  153         55          95           177         986         2021-07-15 11:22:03.452666
12363       204                  Product Reviews  153         55          95           177         986         2021-07-23 23:10:03.614955
18739       204                  Product Reviews  156         56          95           178         987         2021-08-05 08:13:04.055920
25237       204                  Product Reviews  157         56          97           179         990         2021-08-18 11:08:03.081740
31834       204                  Product Reviews  156         56          99           179         992         2021-08-31 19:13:03.364446
38515       204                  Product Reviews  158         56          102          179         993         2021-09-14 09:13:03.851465
45276       204                  Product Reviews  160         58          102          180         994         2021-09-28 05:03:03.094413
52116       204                  Product Reviews  164         58          104          182         994         2021-10-12 06:16:04.027297
59015       204                  Product Reviews  165         59          108          184         994         2021-10-26 13:20:03.445861

Between 10th of July and 26th of October the application got 12 one star ratings, 4 two star ratings, 13 three star ratings, 7 four star ratings and 9 five star ratings. The average rating in this time period is less than 3 stars!

Using the crawler

The crawler has two parts: crawling the categories and crawling the application ratings. To start the crawlers, use ruby parse_category.rb and ruby parse_app_link.rb. I used a cron job to run them periodically.

Happy crawling and let me know if you find some interesting changes in the ratings!

P.S. I'm ptrlaszlo on twitter, follow me for more stories.