New Role: Data Engineer at Skyscanner
Posted by Michael Okarimia in Skyscanner on February 3, 2018
I was delighted to start a new role as a Data Engineer at Skyscanner in London this January. I’m looking forward to learning how to run data pipelines that are at the Internet Economy scale. This will definitely be Big Data! 😀
R&D Kafka Catalogue Cloud write up
Posted by Michael Okarimia in 7digital on October 30, 2016
Avoiding the move of a monolithic database into the cloud
Music labels send 7digital not just the audio recordings but also the data pertaining to the audio, such as the artwork, track listings, performing artists, release dates, prices, and specific rights to stream or download the music in various territories around the world. Approximately 250,000 tracks per week are received by 7digital and added to its catalogue, which is stored in a database. This process is called ingestion.
At the outset of the project there was a single database that stored catalogue and sales data and was used for multiple, unrelated purposes. New albums sent to 7digital by music labels were written to this database, along with the licensing rules governing who can access the music.
Slow queries cause the web applications that use the database to time out and fail, which returns errors to end users. Changing the database structure would help to resolve these errors and failures; however, it would necessitate rewriting nearly every other web application that 7digital owns, since the web applications are tightly coupled to the database. The database is very large and uses proprietary, licensed technology that cannot be easily moved to data centres around the world.
By separating the catalogue data from other data in our system, it would become possible not only to write a much more efficient database schema that failed less often, but also to move this database into a cloud provider’s platform. With a database in the cloud, we could build part of 7digital’s Web API platform in the same cloud provider’s platform and therefore deploy our platform nearer to our customers in Asia.
Creating a separate catalogue database in London and moving the applications to AWS in Asia might help solve the problems with concurrent reads and writes to the database, but it would not help reduce the latency experienced by customers in Asia.
The key issue becomes one of figuring out how to transport the relevant catalogue data from the ingestion process in London and send this data into an AWS region.
Moving the ingestion process into AWS is a far larger piece of work and would not by itself yield performance improvements for customers in Asia, so it was decided not to move it out of the London datacentre in 2015.
Failures of the project
During 2015 we did not complete the final link in the chain: the application which could read messages from the Kafka service and persist their contents into the AWS database. We could not deploy a fully fledged version of the London-based Gateway API service, as it was too complex. Instead we made a naive implementation of this Gateway using a technology called nginx.
The Catalogue Persister service was eventually abandoned. An instance of Kafka was built in late 2015 and the Catalogue consumer was due to be started in early 2016.
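To make the missing "final link" more concrete, here is a minimal sketch of a consume-and-persist loop. It is not the 7digital code: a plain Python list stands in for the Kafka topic, an in-memory SQLite database stands in for the AWS database, and the message fields (`track_id`, `title`, `version`) are hypothetical. The one idea it tries to show is that the persister should tolerate redelivered or out-of-order messages by only applying a message if it is newer than what is already stored.

```python
import json
import sqlite3

# Hypothetical messages; a list stands in for the Kafka topic.
MESSAGES = [
    json.dumps({"track_id": 1, "title": "First Song", "version": 1}),
    json.dumps({"track_id": 2, "title": "Second Song", "version": 1}),
    json.dumps({"track_id": 1, "title": "First Song (Remastered)", "version": 2}),
    json.dumps({"track_id": 1, "title": "First Song", "version": 1}),  # redelivered
]


def create_store():
    """Create an in-memory SQLite database standing in for the cloud database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracks ("
        "track_id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "version INTEGER NOT NULL)"
    )
    return conn


def persist(conn, raw_message):
    """Apply one message; return True if it changed the stored catalogue."""
    msg = json.loads(raw_message)
    row = conn.execute(
        "SELECT version FROM tracks WHERE track_id = ?", (msg["track_id"],)
    ).fetchone()
    if row is not None and row[0] >= msg["version"]:
        return False  # stale or duplicate message: ignore it
    conn.execute(
        "INSERT OR REPLACE INTO tracks (track_id, title, version) VALUES (?, ?, ?)",
        (msg["track_id"], msg["title"], msg["version"]),
    )
    conn.commit()
    return True


def consume(messages, conn):
    """Persist every message in order and return how many were applied."""
    return sum(1 for raw in messages if persist(conn, raw))


if __name__ == "__main__":
    store = create_store()
    print(consume(MESSAGES, store), "messages applied")
```

With a real Kafka client the `messages` iterable would be replaced by the client's consumer loop; the persistence logic would stay much the same.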
Socrates UK 2016
Posted by Michael Okarimia in Conferences on August 17, 2016
I was fortunate to attend Socrates UK 2016 this year, which was hosted in the beautiful Wotton House, near Dorking in Surrey.
I’ve not attended a Socrates conference before but I’d heard great things about it, as Socrates has at its heart the idea of promoting software craftsmanship.
After arriving in the early evening, and after a hearty meal, we were ushered into the main conference room for the evening Lightning Talks.
Two talks from that evening stuck in my mind. One was about team smells, which have nothing to do with hygiene but are signs that a team is not operating optimally. A rather informative mind map of team smells was created and discussed.
In the other, Franziska Sauerwein showed her presentation on Outside-in TDD, something I’ve done throughout much of my career.
As I met more and more attendees, I was struck by how friendly and welcoming they were to newcomers like myself. They shared a desire to treat software as a craft, rather than getting code out of the door as soon as possible, and to improve the professionalism of the industry.
Day One
The conference used the Unconference format: the agenda was not planned in advance, and sessions were decided upon at the beginning of each day.
On the first day of sessions I opted to attend the following:
- Anti-patterns anonymous
This was a session where the attendees arranged themselves to sit in a large circle and take turns to share their tales of software anti-patterns. It generally was a case of recounting war stories of incompetence and project failure, as we all took turns to tell our account of when a project didn’t go well, and then what we thought could have solved the problem.
Now I had an appetite for some technical sessions, so I went on to attend a session about an application built on the SMACK platform.
SMACK is an acronym listing the stack of technologies used in a single platform, namely:
Spark, a distributed data processing framework, often used to analyse large datasets
Mesos, a cluster manager (its authors describe it as a distributed systems kernel) designed to operate on many nodes and host distributed systems
Akka, a toolkit for the JVM optimised for message based distributed systems
Cassandra, a highly scalable database
Kafka, a high throughput distributed messaging system
The application built on this stack was a monitoring tool that logged and analysed metrics from cloud based servers.
Having worked on an application at 7digital which uses Kafka, I was interested to hear the problems that other session attendees had encountered and solved in their own platforms.
The next session I attended was titled Microservices vs Monoliths which became a discussion over the problems encountered with either architecture. I related my experiences of how breaking out a monolithic application into smaller APIs whilst sharing the same data store was not ideal, and how one had to consider fault tolerance when doing so.
Mashooq Badar of Codurance led a session on Serverless Architecture which I found very interesting. He talked about how his team built the AWS Loft website using only AWS technology, namely Lambda and API Gateway.
One of the main advantages of using Lambda is the very low cost. Rather than hosting your application inside EC2 or Beanstalk instances, you pay only for the resources used to execute requests. Amazon bills you per request, rather than for the resources used to keep instances running when they are not being used.
After the hour was up, I moved on to a session titled Mentorship Patterns. Mentoring is a role I’ve performed at 7digital, helping apprentices improve their skills and become more proficient software developers. I learnt that I could improve as a mentor by setting out what I expect of the apprentice from the outset. Giving regular feedback was also essential, and this was something I’d do via regular 1-2-1’s.
As the first day wound up, dinner was served, and in the evening the lightning talks began. Attendees were encouraged to give a short five-minute talk.
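The pricing argument can be sketched as a small calculation. This is only an illustration: the function names are my own, every price passed in is a placeholder rather than a quoted AWS rate, and the model assumes a pay-per-use bill made up of a per-request charge plus a charge for execution time multiplied by allocated memory.

```python
def per_request_cost(requests, price_per_request, seconds_per_request,
                     memory_gb, price_per_gb_second):
    """Total pay-per-use cost: a request charge plus a compute charge."""
    compute = requests * seconds_per_request * memory_gb * price_per_gb_second
    return requests * price_per_request + compute


def always_on_cost(instances, hourly_rate, hours):
    """Total cost of keeping instances running, whether or not they are used."""
    return instances * hourly_rate * hours


def break_even_requests(fixed_cost, price_per_request, seconds_per_request,
                        memory_gb, price_per_gb_second):
    """Number of requests at which pay-per-use costs as much as the fixed cost."""
    unit_cost = price_per_request + (
        seconds_per_request * memory_gb * price_per_gb_second
    )
    return fixed_cost / unit_cost


if __name__ == "__main__":
    # Placeholder figures only; substitute the provider's current prices.
    fixed = always_on_cost(instances=2, hourly_rate=0.5, hours=10)
    print("always-on:", fixed)
    print("break-even requests:",
          break_even_requests(fixed, 0.001, 0.5, 2, 0.01))
```

Below the break-even request count the pay-per-use model is cheaper; above it, always-on instances may win, which is why the saving is greatest for lightly used applications.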
Domain Driven Development Strategic Patterns was a great talk by @Ouarzy, whose blog is well worth reading.
Radical Candor: Training guidance vs feedback. I wanted this talk to continue for more than its allotted five minutes. The concept of Radical Candor is to tell your team members constructively when they need to improve. The speaker linked to this excellent article:
It sounds so simple to say that bosses need to tell employees when they’re screwing up. But it very rarely happens.
Forty Days of Fixing, by @suzyhamilton: commit to making a single change to a project to improve it, one change a day. Small changes to a large, messy project can slowly make it better by following the boy scout rule: when making a change, always leave that part of the code base in a better state than you found it.
Discussion of the book Non-violent Communication
There was a story of introducing Agile into an enterprise waterfall project, which led to a discussion of the book The Phoenix Project, a novel about how an IT project was turned around to save the company. It has a contemporary setting and is heavily inspired by The Goal, the classic novel on the Theory of Constraints by Eliyahu M. Goldratt.
Antony Marcano finished the lightning talks by demoing how to apply SOLID principles to PageObjects when writing acceptance web tests using Selenium & WebDriver.
Day Two
Serverless architecture was a subject that had piqued my interest, so I spent a double session working on a hands-on exercise.
Mashooq Badar led a hands-on Lambda session where we set up our own AWS web app powered by Lambda. The lab was based upon his blog post on Codurance’s site.
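For readers who have not seen one, a Lambda function behind API Gateway can be very small. The sketch below is not the code from the lab or from my repository; it is a minimal handler written in Python, assuming API Gateway's Lambda proxy integration, where the query string arrives in `event["queryStringParameters"]` and the function returns a dictionary with `statusCode`, `headers` and `body`. The `name` parameter and the greeting are made up for illustration.

```python
import json


def handler(event, context):
    """Return a JSON greeting for an API Gateway proxy request."""
    # The proxy integration sends null (None) when there is no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke test; in AWS the event and context are supplied by Lambda.
    print(handler({"queryStringParameters": {"name": "Socrates"}}, None))
```

Because the handler is an ordinary function, it can be called and tested locally with a hand-built event before it is deployed.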
When building applications on AWS, it’s worth considering how to make them fit easily within the AWS ecosystem.
Amazon have created a guide to show how to do this.
During the session I pushed my version of the Lambda gateway application up to my GitHub account for future reference. This was probably my favourite technology session at SocratesUK.
Moving back onto the soft skills required to be a good developer, I headed over to join the session titled YOU’RE a developer?!
which was a discussion on the lack of diversity in the technology sector, in particular the lack of women. There is still a cultural barrier that puts women off an industry which could do much more to become more professional. A lot of sexism still goes unchallenged, and we exchanged incidents of this occurring. Then we moved on to ideas for encouraging the changes in attitude that can help the situation. There’s clearly much more that can be done.
After another hearty dinner it was time again for the final five minute long Evening Lightning Talks
My former colleague Matt Butt spoke about The Dangers of Empathy and Emotional Contagion, and how to avoid empathy burnout. He recommended having a chat/Slack room in which to vent one’s negative feelings. One should try to cultivate compassion without getting emotionally involved.
Finally Matthew Forrester demoed his code which could create diagrams of database schema from a YAML file
Socrates UK was the best conference I’ve attended. It really opened my eyes to the software craftsmanship movement. Some of the practices I was already familiar with and use every day, but software craftsmanship seems to tie them together succinctly. The attendees I met really seemed to care about writing great code that solves the right problem, and were a friendly and welcoming bunch to boot.
I want to attend next year’s conference and I wholeheartedly encourage others to do so as well.
- Further reading; information I learnt during the conference:
Tips for allowing software developers to develop and grow. Seems to be informed by one of my favourite software books, Peopleware by Tom DeMarco and Tim Lister.
Useful links:
Monitoring tool with SMACK architecture: instana.com
Secor is a tool to move Kafka logs into S3, created by Pinterest
Software Craftsmanship Newsletter was created by @alebaffa
https://github.com/lscc/socrates-uk/wiki