Google Trends Data via Embedded Charts

It’s been a while since I posted; I’ve been very busy! One of the projects I’m currently working on required obtaining Google Trends data, but I noticed that there is no API, that downloading the CSV files requires logging in, and that scraping is against the Terms of Service. Since I need this data for legitimate use in a project analyzing the usage of “Legal Highs” or “Research Chemicals” and correlating multiple sources, I wanted to use a data source that was already freely available rather than screen scraping (the web application already uses embedded Google Trends charts). Since Google allows the embedding of charts that contain the data (in JavaScript form), I figured it must be possible to extract the data from the code. Turns out I was right. Here’s the chart the data is extracted from:

Disclaimer: I actually don’t know whether this is against the Terms of Service or not. Given that the data is already provided, I’m assuming it isn’t as long as full attribution is given, but I’m still reading through them to make sure that’s the case.

The result of running the script, for instance, is the following (output truncated for visibility purposes):

Some notes: what this script actually does is take the embedded chart’s JS code, extract the data held in the chartData variable, look for the hash element named “rows”, and use ExecJS to convert it to a Ruby hash. It then formats the dates into a friendly format. The reason for using ExecJS rather than Ruby’s own JSON parser is that the rows element contains Date() objects, which are not valid JSON and have to be evaluated before the data can be converted. Hope this is useful to someone else!
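ExecJS needs a JavaScript runtime to evaluate the Date() calls. As a hedged alternative, this sketch swaps in a different technique: it rewrites each `new Date(y, m, d)` call into an ISO-8601 string with a regular expression, so Ruby’s standard JSON parser can handle the rest. The sample markup, the exact shape of `chartData`, and the helpers `js_dates_to_json` and `extract_rows` are all hypothetical; the real embedded code may differ (dates with extra arguments, for example, would need a wider regex).

```ruby
require 'json'
require 'date'

# Hypothetical snippet shaped like an embedded chart's JS; the real markup may differ.
SAMPLE_JS = <<~JS
  var chartData = {"cols":[{"id":"date","type":"date"},{"id":"q","type":"number"}],
    "rows":[{"c":[{"v":new Date(2013,0,6)},{"v":42.0}]},
            {"c":[{"v":new Date(2013,0,13)},{"v":57.0}]}]};
JS

# Replace JS `new Date(y, m, d)` calls with quoted ISO-8601 strings so the
# remainder is plain JSON. JS months are zero-based, hence month + 1.
def js_dates_to_json(js)
  js.gsub(/new Date\((\d+),\s*(\d+),\s*(\d+)\)/) do
    year, month, day = $1.to_i, $2.to_i, $3.to_i
    %("#{Date.new(year, month + 1, day).iso8601}")
  end
end

# Pull out the object literal assigned to chartData and return its
# "rows" element as [Date, value] pairs.
def extract_rows(js)
  literal = js[/var chartData\s*=\s*(\{.*\});/m, 1] or raise ArgumentError, 'chartData not found'
  data = JSON.parse(js_dates_to_json(literal))
  data.fetch('rows').map do |row|
    date_cell, value_cell = row.fetch('c')
    [Date.iso8601(date_cell['v']), value_cell['v']]
  end
end
```

Calling `extract_rows(SAMPLE_JS)` yields pairs such as `[Date 2013-01-06, 42.0]`; `date.strftime('%b %-d, %Y')` then gives a friendly string like “Jan 6, 2013”. The trade-off is that a regex only understands the Date() form it was written for, whereas evaluating the code with ExecJS handles anything valid JavaScript can express.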

Single Table Inheritance in Rails

Single Table Inheritance is an object-relational mapping pattern that emulates object-oriented class inheritance in a relational database. Relational databases don’t provide a mechanism for inheritance, so mapping an inheritance hierarchy in an application requires some scheme for representing it in tables. This pattern can be found in Martin Fowler’s Patterns of Enterprise Application Architecture, where he states:

Relational databases don’t support inheritance, so when mapping from objects to databases we have to consider how to represent our nice inheritance structures in relational tables. When mapping to a relational database, we try to minimize the joins that can quickly mount up when processing an inheritance structure in multiple tables. Single Table Inheritance maps all fields of all classes of an inheritance structure into a single table.

Without an example this might seem a little abstract. Rails supports Single Table Inheritance through a ‘type’ column (you may at some point have tried to create a column named type via a migration, only to run into errors because Rails reserves that name for this purpose).

A set of models that differ only slightly (i.e. are mostly homogeneous) works well for this purpose, as it doesn’t result in a large table full of fields that will often be null. An easy example might be user updates on a social networking site.

Take Facebook, for example: each “status update” can be of a different type. In this example a base class of “Update” will be used, with the classes inheriting from it being Photo and Video, allowing the upload of images and video with different models using the same table. This serves a few purposes. Updates can be fetched with a single SQL query regardless of the model by using the user_id key, and “Update” logic that is independent of type can be provided to the child classes. An additional advantage is the ability to easily add new types of update using the functionality already provided in the base class.

A Photo uploading class inheriting from Update, using Paperclip for a file attachment.

A Video uploading class inheriting from Update. This model uses the same ‘media’ field but instead can contain video-specific logic.

The table schema contains the following columns:

The media columns are added by the Paperclip gem and are treated in the model as a single ‘media’ property. ‘type’ is filled in automatically by Rails and contains the class name. ‘content’ contains any text annotation to the update, and the user_id column references the Users table.
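The core idea behind STI can be illustrated in plain Ruby. This is a hedged sketch, not ActiveRecord’s implementation: a single “table” (an array of hashes holding hypothetical sample rows) stores every update, and the `type` column decides which class each row becomes. `Update`, `Photo` and `Video` follow the post; `Update.instantiate`, `updates_for`, `summary`, `media_kind` and `UPDATES_TABLE` are hypothetical names chosen for the example.

```ruby
# Base class: behaviour shared by every kind of update.
class Update
  attr_reader :id, :user_id, :content, :media_file_name

  def initialize(row)
    @id = row[:id]
    @user_id = row[:user_id]
    @content = row[:content]
    @media_file_name = row[:media_file_name]
  end

  # Type-independent logic inherited by all subclasses.
  def summary
    "#{self.class.name.downcase}: #{content}"
  end

  # Build the right subclass from a row using the `type` discriminator column.
  def self.instantiate(row)
    klass = Object.const_get(row.fetch(:type))
    raise ArgumentError, "#{klass} is not an Update" unless klass <= Update

    klass.new(row)
  end
end

class Photo < Update
  def media_kind
    'image'
  end
end

class Video < Update
  def media_kind
    'video'
  end
end

# One table holds every kind of update; `type` is the column Rails fills in.
UPDATES_TABLE = [
  { id: 1, type: 'Photo', user_id: 7, content: 'Holiday snap', media_file_name: 'beach.jpg' },
  { id: 2, type: 'Video', user_id: 7, content: 'Concert clip', media_file_name: 'gig.mp4' },
  { id: 3, type: 'Photo', user_id: 9, content: 'Lunch', media_file_name: 'soup.jpg' }
].freeze

# Same idea as a single query: SELECT * FROM updates WHERE user_id = ?
def updates_for(user_id)
  UPDATES_TABLE.select { |row| row[:user_id] == user_id }
               .map { |row| Update.instantiate(row) }
end
```

Here `updates_for(7)` returns a Photo and a Video from the same table. In Rails, `Update.where(user_id: 7)` does the equivalent through ActiveRecord, which instantiates each row as the subclass named in its `type` column.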

Free tech books: Part 1

I’m splitting this post up because there are far too many to list in one and keep things tidy, so expect future posts with more resources.

  • Dive into Python 3 – Dive Into Python 3 covers Python 3 and its differences from Python 2. Compared to Dive Into Python, it’s about 20% revised and 80% new material. The book is now complete.
  • Think Python – Think Python is an introduction to Python programming for beginners. It starts with basic concepts of programming, and is carefully designed to define all terms when they are first used and to develop each new concept in a logical progression.
  • Clojure: Functional Programming for the JVM – The goal of this article is to provide a fairly comprehensive introduction to the Clojure programming language. A large number of features are covered, each in a fairly brief manner.
  • Linux from Scratch – A guide to building your own Linux system from source.
  • Learn Version Control with Git – Version control is an essential tool if you want to be successful in today’s web & software world. This book will help you master it with ease.
  • Eloquent JavaScript – “The first 11 chapters discuss the JavaScript language itself. The next eight chapters are about web browsers and the way JavaScript is used to program them. Finally, two chapters are devoted to Node.js, another environment to program JavaScript in.”
  • Scalable and Modular Architecture for CSS – Learn how to structure your CSS to allow for flexibility and maintainability as your project and your team grows.
  • Learn You a Haskell – This tutorial is aimed at people who have experience in imperative programming languages (C, C++, Java, Python) but haven’t programmed in a functional language before.
  • An introduction to Go – Computer programming is the art, craft and science of writing programs which define how computers operate. This book will teach you how to write computer programs using a programming language designed by Google named Go.
  • Go By Example – Go by Example is a hands-on introduction to Go using annotated example programs.

Ruby features: Object#tap

The #tap method yields its receiver to a block and then returns the receiver itself, so it can be inserted at any point in a chain of method calls. This essentially provides a ‘tap’ on that point, where the data can be debugged, logged, communicated or otherwise used without affecting the return value.

Outputs:

A slightly more useful example might be adding a #tap call mid-way through a chain of expressions to examine the value without modifying it. For instance, look at the following chain of nonsensical expressions performing transformations on a “hello, world!” string.
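Here is one chain that produces exactly the output shown in this post, assuming “swapping L and R” means replacing every L with R via `String#tr` and later turning the Rs back into Ls. It is a reconstruction, not necessarily the author’s code, and the method name `shout_backwards` is hypothetical.

```ruby
# Hypothetical reconstruction of the tapped chain. Each #tap prints the
# value at that point and passes the same object along unchanged:
#   tap 1 runs after #upcase,
#   tap 2 runs after reversing and replacing every L with R.
# The final #tr turns the Rs back into Ls.
def shout_backwards(text)
  text.upcase.tap { |s| puts s }.reverse.tr('L', 'R').tap { |s| puts s }.tr('R', 'L')
end

puts shout_backwards('hello, world!')
```

Deleting both `.tap { |s| puts s }` calls leaves the return value identical, which is why only the final line is printed without them.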

You can see that the line has been ‘tapped’ in two places: once after the #upcase call, and again after swapping L and R. We can ‘listen in’ on the data at those points and, as I’ve chosen to do in the block, simply output it.

With the #tap calls in place, you get the following output:

HELLO, WORLD!
!DRROW ,ORREH
!DLLOW ,OLLEH

With the taps removed, you simply get the following:


!DLLOW ,OLLEH

Artificial Intelligence is coming to get you

Just kidding. But thanks to modern media portrayals of artificial intelligence, there’s a very real worry that if we created sufficiently intelligent systems, they would overthrow or enslave us.

This reasoning is faulty for numerous reasons, chiefly because it assumes that any artificially intelligent entity would possess the same traits that we humans consider central to “intelligence”. The problem with this assumption is that our brains formed via competitive evolution, both interspecies and intraspecies (natural selection and sexual selection). Speculation on human intelligence and creativity seems to suggest that it may have evolved via sexual selection, i.e. as a means of courtship display, similar to how a Bowerbird creates “art” in order to impress a mate.

It makes very little sense for many human traits to be written into AI entities. What use is there for jealousy or greed in a robot that cleans your home? Or dominance/submission and hierarchical needs in a medical AI entity?

To make this more concrete, here’s an example of the fallacy in one of the most popular AI dystopian movies of the last few decades: The Matrix. The Machines have chosen an entirely irrational course of action by enslaving the human race out of a seeming need for dominance and even vengeance, a strategy they know will waste resources and lead to disruption. A more rational choice would simply be to use their superior intellect and resources to leave Earth and settle on a planet where humans couldn’t reach them. Given their inherent hardiness, they could survive natively on planets we couldn’t. It makes no sense for them to pursue the strategy they did, unless we give them the same flaws that competitive evolution created in humans.

Quick and Dirty Yahoo Finance Stock Quote Lookup (with metaprogramming)

Here’s a quick class I whipped together as part of an experiment for a project I’m working on; it may be useful to some of you. The class uses Yahoo Finance’s API. Warning: the API can sometimes be temperamental, so you’ll want to check the resulting values before using them in anything important.

Using with method_missing

For a breakdown of the data you can access using the f query string variable, this page offers a great table breakdown of the options.
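A minimal sketch of how `method_missing` can turn method calls into lookups of those `f` field codes. Everything here is an assumption for illustration: the `StockQuote` class, the `FIELDS` mapping (`s`, `n` and `l1` for symbol, name and last trade price, taken from the commonly documented codes), the `fetcher:` hook and the endpoint URL. The CSV quote service may no longer respond, so the injectable fetcher lets the metaprogramming and parsing run without the network.

```ruby
require 'uri'
require 'net/http'

# Hypothetical wrapper: method names map onto Yahoo Finance "f" field codes.
# The mapping and endpoint are assumptions; check the field table first.
class StockQuote
  FIELDS = {
    symbol: 's',      # ticker symbol
    name: 'n',        # company name
    last_trade: 'l1'  # last trade price
  }.freeze

  BASE_URL = 'http://download.finance.yahoo.com/d/quotes.csv'.freeze

  # fetcher: optional callable (url -> response body), handy for testing.
  def initialize(ticker, fetcher: nil)
    @ticker = ticker
    @fetcher = fetcher || ->(url) { Net::HTTP.get(URI(url)) }
  end

  # Builds the request URL for one field code, e.g. "...?s=AAPL&f=l1".
  def url_for(code)
    "#{BASE_URL}?#{URI.encode_www_form(s: @ticker, f: code)}"
  end

  # quote.name, quote.last_trade, ... are resolved here at call time.
  def method_missing(method_name, *args)
    code = FIELDS[method_name]
    return super unless code && args.empty?

    # The service answers with one CSV line; strip whitespace and quotes.
    @fetcher.call(url_for(code)).strip.delete_prefix('"').delete_suffix('"')
  end

  # Keeps respond_to? consistent with method_missing.
  def respond_to_missing?(method_name, include_private = false)
    FIELDS.key?(method_name) || super
  end
end
```

Usage would look like `StockQuote.new('AAPL').last_trade`, which returns a String; in line with the warning above, convert and sanity-check it before relying on it. Adding a field is just a new entry in `FIELDS`, and unknown names still raise `NoMethodError` through `super`.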

Web Accessibility

A subject that has been on my mind for some time now is web accessibility for people with impairments of various kinds. Exploring this world has been somewhat depressing: on numerous occasions, while conducting informal interviews with people who have impairments, I’ve been told that there are simply parts of the internet they have had to resign themselves to never being able to access, and services they’re unable to use. In some cases these services or sites are targeted at people with disabilities, yet fail to meet even minimal standards.

With increasingly complex frontend UIs, the challenges of creating accessible websites have grown as well, but the need for accessibility has not decreased.

So what do we do?

As developers, we find this a hard question to answer. There is low-hanging fruit that can be fixed or added easily (Wikipedia’s guidelines that make it easier for people with screen readers to make use of content, for instance, can be used as an example). Semantic tags and WAI-ARIA have also made it easier to create a structured web application usable by partially sighted or severely visually impaired users. Personally, my understanding of the subject area is still expanding and in its early stages, but this has become a subject of interest and an area I will continue to explore. I’m including a list of resources I’ve found so far that may be useful for other people who take an interest:

  • Capybara Accessible – If you’re a Rails developer using Capybara for your integration tests, this runs Google’s Accessibility Developer Tools audits during each test run. It’s a limited set of audits but not
  • WebAIM – An online auditing tool for accessibility
  • WP Accessibility – A WordPress plugin to improve accessibility (note: this is not a panacea).

A last note: one way to research what can be done better is to try navigating the internet using the same devices and technology that people with visual or navigation impairments are restricted to. You’ll pretty quickly get an idea of the frustration of using sites and webapps you take for granted.

Design & Development: Link digest

This is part of a series I plan to release every week covering interesting and useful links I’ve come across recently that are relevant, along with some that are just interesting.

Spaced repetition learning: optimal memorization.

This post may seem a little off-topic, but anyone who’s worked in the tech industry knows the sheer amount of new information generated each week, the changes made to your existing toolset, and the constant need to refine your skills. Part of the learning process involves memorizing information. Many people already use spaced repetition as a way of learning; for those who aren’t familiar, here’s a brief breakdown:

Spaced repetition is a learning technique that incorporates increasing intervals of time between subsequent review of previously learned material in order to exploit the psychological spacing effect. In the field of psychology, the spacing effect is the phenomenon whereby animals (including humans) more easily remember or learn items when they are studied a few times spaced over a long time span (“spaced presentation”) rather than repeatedly studied in a short span of time (“massed presentation”).

If you’re into the empirical side of things rather than a brief summary, you can find a more detailed treatment here.

You can think of the ‘forgetting curve’ as being like a chart of a radioactive half-life: each review bumps your memory up in strength 50% of the chart, say, but review doesn’t do much in the early days because the memory simply hasn’t decayed much! (Why does the spacing effect work, on a biological level? There are clear neurochemical differences between massed and spaced in animal models, with spacing (>1 hour) enhancing long-term potentiation but not massed.)


The Forgetting Curve; from Stahl et al CNS Spectr. 2010;15(8):491-504
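The half-life analogy in the quote above can be written down with a simplified exponential forgetting model. This is an assumption for illustration, not the model behind the Stahl et al. figure: retention $R$ after time $t$ decays according to a memory stability $S$, which makes the “half-life” $h$ the time at which $R = 1/2$.

```latex
R(t) = e^{-t/S} = 2^{-t/h}, \qquad h = S \ln 2
```

In models of this kind, each well-timed review is treated as increasing $S$, so the curve flattens and the next review can be pushed further out.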

Algorithms have been devised that apply this method, and they have been incorporated into a wide range of software. I’ve personally used Mnemosyne a great deal, but after looking for a program that allows syncing between devices, I have begun using Anki.
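Both Mnemosyne and Anki are generally described as building on variants of the SM-2 algorithm from SuperMemo. This is a minimal sketch of SM-2 as it is commonly described, not either program’s actual scheduler: each card tracks a repetition count, an interval in days and an ease factor, and a review graded 0 to 5 either grows the interval or resets it. The `Card` struct and `review` function are hypothetical names; the starting ease of 2.5, the 1-day and 6-day first intervals and the 1.3 ease floor follow the usual description.

```ruby
# State for one flashcard: successful repetitions in a row, current
# interval in days, and ease factor (how fast the interval grows).
Card = Struct.new(:repetitions, :interval, :ease) do
  # A brand-new card: never reviewed, default ease factor of 2.5.
  def self.fresh
    new(0, 0, 2.5)
  end
end

# quality: 0 (total blackout) .. 5 (perfect recall).
# Returns a new Card; its interval is the number of days until the next review.
def review(card, quality)
  raise ArgumentError, 'quality must be 0..5' unless (0..5).cover?(quality)

  if quality >= 3
    interval =
      case card.repetitions
      when 0 then 1
      when 1 then 6
      else (card.interval * card.ease).round
      end
    repetitions = card.repetitions + 1
  else
    # Failed recall: start the item's schedule over.
    interval = 1
    repetitions = 0
  end

  # Ease rises slightly for perfect answers and drops for poor ones,
  # but never below 1.3.
  miss = 5 - quality
  ease = card.ease + (0.1 - miss * (0.08 + miss * 0.02))
  Card.new(repetitions, interval, [ease, 1.3].max)
end
```

Four perfect (quality 5) reviews of a fresh card give intervals of 1, 6, 16 and 45 days, which shows the widening gaps the spacing effect rewards; a single failed review drops the card back to a 1-day interval.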

As applied to programming, there is some useful information available thanks to Jack Kinsella. His first blog entry went over what he called the Janki Method, followed by an update after two years of use on what he’d discovered in the process in terms of efficiency. His own deck on Web Development is available here.

Google’s new ‘promotional’ filter for Gmail

Google do have a track record of pulling APIs or making changes without warning, in ways that affect people’s businesses. It’s an unfortunate reality given that many of their products are still in experimental stages, and you can sometimes get screwed over if you build a product around an experimental project. Here are some of the discontinued products:

  • Google Reader – The shutdown was inexplicable, as they’d cornered the market; I and many others used it, and we struggled to find an adequate replacement until competitors like Feedly came along.
  • iGoogle – Personalised Google homepage
  • Google Health – “The service allowed Google users to volunteer their health records – either manually or by logging into their accounts at partnered health services providers – into the Google Health system, thereby merging potentially separate health records into one centralized Google Health profile.” – wasn’t personally aware this one existed
  • Google Buzz – http://en.wikipedia.org/wiki/Google_Buzz
  • Google Aardvark – http://en.wikipedia.org/wiki/Aardvark_(search_engine)

For the full list go here. Less public-facing casualties of the “API spring cleaning” included the Google Finance and Google Translate APIs (fortunately, the latter was brought back as a paid service).

So why is this relevant? Gmail recently changed its service to filter emails into separate tabs, and the change was switched on automatically. I spent several hours banging my head against a wall trying to figure out why my emails were being reported as delivered (both by the webapp and by the mailer service Mandrill) yet never seemed to arrive, until I found that all the post-registration emails containing instructions had been moved to the “Promotions” tab. There’s no reason they should be placed there, so now a lot of experimentation with the content will be needed to figure out exactly how to get that particular email out of the Promotions tab; confirmation and other emails seem to reach the main inbox fine. It’s going to be a matter of black-box testing to figure out what’s tripping the heuristic.


Either way, Google are once again making an unprompted change that makes life more difficult. You now have to avoid both the spam folder and Gmail’s Promotions tab. I’ll be doing a follow-up post once I’ve mapped out the heuristic and some ways to avoid it under legitimate circumstances.