To create an exporter, first create an “exporters” directory in your project’s root folder. Inside this folder, create a file called “products_json.yaml” with the following content:

exporter_name: products_json # Must be unique
exporter_type: json
collection: products
write_mode: pretty_array # can be `line`, `pretty`, `pretty_array`, or `array`
offset: 0 # the record offset at which the export starts
order: desc # can be ascending `asc` or descending `desc`
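
For a sense of what these modes produce: `line` appears to write one compact JSON object per line (convenient for streaming large exports), while `pretty_array` wraps all records in a single pretty-printed JSON array, roughly like this (the records here are purely illustrative):

[
  {
    "title": "Example Movie Title"
  },
  {
    "title": "Another Movie Title"
  }
]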

Update your config.yaml file with this exporter’s location so that it looks like the following:

seeder:
  file: ./seeder/seeder.rb
parsers:
  - page_type: listings
    file: ./parsers/listings.rb
  - page_type: products
    file: ./parsers/products.rb
exporters:
  - file: ./exporters/products_json.yaml

Commit this to Git, push it to your remote Git repository, and deploy once again. Assuming the scraper was deployed as walmart-movies earlier in this tutorial (your remote and branch names may differ), the sequence looks roughly like this:
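
$ git add .
$ git commit -m "Add products_json exporter"
$ git push origin master
$ datahen scraper deploy walmart-movies

Once the deploy finishes, check that the exporter is present with the following command: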

$ datahen scraper exporter list walmart-movies

After that, let’s start the exporter:

$ datahen scraper exporter start walmart-movies products_json  

This will return a hash with information about our export, which should look like the following:

{
  "id": "c700cb749f4e45eeb53609927e21da56", # Export ID here
  "job_id": 852,
  "scraper_id": 20,
  "exporter_name": "products_json",
  "exporter_type": "json",
  "config": {
    "collection": "products",
    "exporter_name": "products_json",
    "exporter_type": "json",
    "limit": 10,
    "offset": 0,
    "order": "desc",
    "write_mode": "pretty_array"
  },
  "status": "enqueued", # the status of the export
  "created_at": "2019-02-05T06:19:56.815979Z"
}

Note that the returned config includes a “limit” of 10 even though we never set one in our YAML; this appears to be a default, and you can set your own limit in the exporter YAML if you need a different cap. Using the value of “id”, we can check the status of the export and then download the file when it is done.

$ datahen scraper export show c700cb749f4e45eeb53609927e21da56  

Then, once the export is done, download it:

$ datahen scraper export download c700cb749f4e45eeb53609927e21da56

This will automatically download a compressed file with your JSON data inside. You should now have a working scraper for pulling data from Walmart, since the category in the seeder link can be changed from movies to whatever you like, as sketched below. Just make sure to keep testing your parsers locally, as subtle differences between categories can surface errors.
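
For example, here is a minimal seeder sketch, assuming the pages pattern used earlier in this tutorial; the category URL below is only a placeholder, so substitute a real Walmart browse URL:

# ./seeder/seeder.rb
# Enqueue the starting listings page for whichever category you want to scrape.
pages << {
  page_type: "listings", # handled by ./parsers/listings.rb
  url: "https://www.walmart.com/browse/some-other-category" # placeholder URL
}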

This tutorial should give you an idea of what DataHen is capable of. We were able to get a lot of data from Walmart.com, a site that can be difficult to scrape, in a short period of time. We also didn’t need to worry about running the scraper on a server or where to save the data; DataHen took care of that for us. All the hard parts of scraping have been solved, which allows us to focus on getting the exact data we need. Hopefully this tutorial has been helpful in guiding you on how to use Ruby and Nokogiri to get that data. You should now be able to create your own scraper for whatever site you want, using this tutorial as a reference.