How many people have TOGAF 9, ArchiMate 2 and BABOK (IIBA) certificates?
The other day I was watching an action movie, and there was a scene in which the main character found a list of people and, in order to find the suspects of a crime, had to cross-check it against other lists (the suspects being the people present in all of them). In mathematics, finding the common part of two (or more) sets is trivial. In real life, however, that is not always the case. Despite that, in the movie it took her about 5 seconds to find the group of people present in all the lists.
This gave me an idea: how long would a similar exercise take in real life? That's a great question, I thought; I just needed a real case. It didn't take long to come up with one: How many people have both TOGAF 9 and ArchiMate 2 certificates (at any level)? But those are only two lists, so to make it harder I asked the question:
How many people have TOGAF 9, ArchiMate 2 and BABOK (IIBA) certificates (at any level)?
The question can be presented as a Venn diagram. I'm asking for the number in the middle, the common part of all three sets. The common parts of each pair of sets are also interesting.
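In code, that middle number is a plain set intersection. Here is a minimal Java sketch, assuming each registry has already been loaded as a set of names. The names and the `CertificateOverlap` class are made up for illustration and are not taken from the actual application:

```java
import java.util.Set;
import java.util.TreeSet;

class CertificateOverlap {

    /** Returns the names that appear in every one of the given sets. */
    @SafeVarargs
    static Set<String> inAll(Set<String> first, Set<String>... others) {
        // Copy the first set so the inputs are never modified.
        Set<String> common = new TreeSet<>(first);
        for (Set<String> other : others) {
            common.retainAll(other); // keep only names also present in 'other'
        }
        return common;
    }

    public static void main(String[] args) {
        // Made-up names, purely for illustration.
        Set<String> togaf = Set.of("Ann", "Bob", "Cid", "Dee");
        Set<String> archimate = Set.of("Bob", "Cid", "Eve");
        Set<String> babok = Set.of("Cid", "Dee", "Eve");

        System.out.println("TOGAF and ArchiMate: " + inAll(togaf, archimate)); // [Bob, Cid]
        System.out.println("All three: " + inAll(togaf, archimate, babok));    // [Cid]
    }
}
```

The pairwise overlaps come from calling the same method with two sets, and the middle of the diagram from calling it with all three.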
How to solve that case?
First of all, I want to assure you that I am not some movie-like super-duper hacker, but I used to program in Java a lot and I still have the skills. The solution I came up with is probably not the best one (in terms of time spent), but anyway it was fun to program a little bit again.
The ground rule that I set for myself here was: no manual work. The computer must work for me and give me the answer; in other words, no Excel allowed. In movies they don’t use Excel, right?
Steps that I visualized for solving the task were pretty simple:
- Aggregate the data
- Analyse the data
The first problem with the data was that the TOGAF and ArchiMate registries of certified people are available only as listings on a website (or as search results), and you cannot download the whole list of people at once. And I needed full lists for my analysis. So what did I do?
I fired up Eclipse and started writing code. How do you download information from a website? Simple: you need a web scraper that goes through the website (as a normal user would with a browser) and saves the data. For that I used Selenium WebDriver and the PhantomJS headless browser (a web browser without a GUI). After one evening I had an application that was able to scrape information about each publicly available TOGAF and ArchiMate certificate. I should mention here that The Open Group allows each certified person to hide information about their certificate from the public registry.
So I started my application and directed it to scrape TOGAF certificates and save them into my database for further analysis. You know how long it took to get information about 38 thousand certificates? Two and a half hours! Mostly because Selenium is quite slow on big pages (and there are listings with a few thousand rows in a table), and secondly because the website is not very fast either. While my application was working for me I had time to watch more movies 🙂 For ArchiMate it went much faster, but I had to deal with a captcha before each search.
The list of IIBA certifications was much simpler to get. You just go to the website and download an Excel file. I wrote one class in my app to read the Excel file contents and save them into my database. The first step was finished. I had all three “lists” in my database. It was time to start the analysis.
First of all, before analysing data you need to understand it. Each certification has its own levels, which can be presented like this:
For each field (TOGAF, ArchiMate, BABOK) there are two levels of certification. Any combination of them is valid for a given person. For example I can have:
- TOGAF 9 Foundation or
- TOGAF 9 Certified or
- Both certificates.
Normally you need to have the Foundation level to be able to take the exam for the Certified level, but The Open Group (TOG) allows people to take both exams at once in a Combined Examination. If you pass both, you get “only” one certificate, at the second level (Certified).
IIBA has different certification rules but for my analysis the principle was the same: you can have CCBA or CBAP or both.
Let’s move back to the analysis part. My stated question was about people, not certificates. As you can see from the table above, one person can have from one to six different certificates. So I needed to shift my focus from the Certificate object to the Person object, which has many Certificates. (That’s why the picture below with the answer shows two numbers: people are in red, and certificates are in blue.)
During this aggregation of certificates per person I also had to deal with different people who share the same name; I distinguished them by the date on which the certificate was issued.
Once I had the list of people with their certificates, there was only one thing left to do: calculate the number of distinct certificates per person. By a distinct certificate I mean that you have one or two certificates from one particular field. This logic is based on the fact that it does not matter whether you have the level 1 certificate, the level 2 certificate or both: you still possess knowledge about one field. Let’s take an example from the table above:
- Person 1 has 3 distinct certificates
- Person 2 has 2 distinct certificates, although he or she passed 4 examinations.
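The counting rule above can be sketched in a few lines of Java. This is only an illustration under assumptions: the `Certificate` shape, the `DistinctCertificates` class and the sample rows are hypothetical, and people are keyed by name alone, which skips the namesake handling described earlier:

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DistinctCertificates {

    enum Field { TOGAF, ARCHIMATE, BABOK }

    /** One registry entry; a hypothetical shape, not the real schema. */
    static final class Certificate {
        final String person;
        final Field field;
        final String level;

        Certificate(String person, Field field, String level) {
            this.person = person;
            this.field = field;
            this.level = level;
        }
    }

    /** Maps each person to the set of fields they are certified in, ignoring levels. */
    static Map<String, Set<Field>> fieldsPerPerson(List<Certificate> certificates) {
        Map<String, Set<Field>> fields = new LinkedHashMap<>();
        for (Certificate c : certificates) {
            // A set absorbs duplicates, so two levels in one field count once.
            fields.computeIfAbsent(c.person, name -> EnumSet.noneOf(Field.class)).add(c.field);
        }
        return fields;
    }

    /** Counts the people who hold a certificate in every field. */
    static long countWithAllFields(List<Certificate> certificates) {
        return fieldsPerPerson(certificates).values().stream()
                .filter(held -> held.size() == Field.values().length)
                .count();
    }

    public static void main(String[] args) {
        // Made-up sample rows, chosen to match the two cases described above.
        List<Certificate> sample = List.of(
                new Certificate("Person 1", Field.TOGAF, "Certified"),
                new Certificate("Person 1", Field.ARCHIMATE, "Foundation"),
                new Certificate("Person 1", Field.BABOK, "CBAP"),
                new Certificate("Person 2", Field.TOGAF, "Foundation"),
                new Certificate("Person 2", Field.TOGAF, "Certified"),
                new Certificate("Person 2", Field.ARCHIMATE, "Foundation"),
                new Certificate("Person 2", Field.ARCHIMATE, "Certified"));

        fieldsPerPerson(sample).forEach((person, held) ->
                System.out.println(person + " has " + held.size() + " distinct certificates"));
        System.out.println("People with all three: " + countWithAllFields(sample));
    }
}
```

Storing the fields in a set rather than a list is what makes four passed exams collapse into two distinct certificates.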
At this point my application was finished, and I ran the analysis task. Below is part of the console output of the program, showing the only 4 people with 3 distinct certificates.
So, the answer to the main question:
- How many people have TOGAF 9, ArchiMate 2 and BABOK (IIBA) certificates (at any level)?
- It’s 4 in the whole world.
Well, I must admit that I was hoping for 42 🙂
Below is a picture with all the results. And how long did this whole exercise take me? Around 3 evenings (approx. 12 hours). And it was fun to program in Java 🙂
Blue numbers represent certificates. Red numbers represent people.
Of course, 4 is not the final answer, as there are new certifications every day. Also, as I already mentioned, TOG allows people to hide their certification information from the public listings on the website, so my analysis was not conducted on the complete data set. The graphs are made in Word, so no Excel.