Text recognition using Vision and Core ML

Introduction

Machine learning allows computers to learn and make decisions without being explicitly programmed to do so. This is accomplished by algorithms that iteratively learn from the data provided. It's a complex topic and an exciting field for researchers, data scientists and academics, but lately it's becoming a must-know skill for tech people in general. Apple expects us to catch up with these technologies too, and has announced Core ML, a brand new framework that enables the integration of already trained machine learning models into iOS apps. Developers can use trained models from popular deep learning frameworks, like Caffe, Keras, scikit-learn, LibSVM and XGBoost. Using coremltools, provided by Apple, you can convert trained models from the frameworks above to the Core ML model format, which can easily be integrated into an app. The predictions then happen on the device, using the GPU or the CPU (depending on what's more appropriate at the moment). This means you don't need an internet connection or an external web service to provide intelligence to your apps, and the predictions are pretty fast. It's a pretty powerful framework, but with a lot of restrictions, as we will see below.
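
To give a feel for the integration, here's a minimal Swift sketch of running a converted model through Vision, the approach this post builds on. The MyTextClassifier name is hypothetical — it stands in for whatever .mlmodel file you add to the Xcode project, for which Xcode generates a class automatically.

```swift
import Vision
import CoreML

// Classify an image with a Core ML model wrapped in Vision.
// MyTextClassifier is a hypothetical model converted with coremltools.
func classify(image: CGImage) {
    guard let model = try? VNCoreMLModel(for: MyTextClassifier().model) else { return }

    // Vision takes care of scaling and cropping the image to the model's input size.
    let request = VNCoreMLRequest(model: model) { request, _ in
        guard let results = request.results as? [VNClassificationObservation],
              let top = results.first else { return }
        print("Prediction: \(top.identifier) (\(top.confidence))")
    }

    // Predictions run entirely on the device, on the CPU or GPU.
    try? VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
}
```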

Continue reading “Text recognition using Vision and Core ML”

A year of tech blogging

On this day last year, I published my first blog post. In one year, I've published (including this one) 23 posts, which is around one post every two weeks. The content is mostly about iOS development, as you can see from the keywords extracted from my blog posts in the Natural Language Processing tutorial. It's been quite an interesting year, and I'm positively surprised by the benefits that tech blogging brings to my engineering career. Here are some insights from the first year.

Continue reading “A year of tech blogging”

Natural Language Processing in iOS

Natural Language Processing (NLP) is a field in computer science that tries to analyze and understand the meaning of human language. It's quite a challenging topic, since computers find it pretty hard to understand what we are trying to say (although they are perfect at executing commands well known to them). By utilizing established techniques, NLP analyzes text, enabling real-world applications such as automatic text summarization, sentiment analysis, topic extraction, named entity recognition, parts-of-speech tagging, relationship extraction, stemming, and more. NLP is commonly used for text mining, machine translation, and automated question answering.
NLP is also becoming important in the mobile world. With the rise of conversational interfaces, extracting the correct meaning from the user's spoken input is crucial. For this reason, there are many NLP solutions on the two most popular platforms, iOS and Android. Since iOS 5, Apple has provided the NSLinguisticTagger class, which offers a lot of natural language processing functionality in different languages. NSLinguisticTagger can be used to segment natural language text into paragraphs, sentences, or words, and to tag information about those tokens, such as part of speech, lexical class, lemma, script, and language. There's a great presentation at this year's WWDC about NSLinguisticTagger, which discusses the new enhancements to the class.
In this post, we will create a simple app that lists all the posts from my blog. When a post is selected, the app will open it in a web view, along with details at the bottom about the detected language of the post, as well as its most important words. We will accomplish this using the NSLinguisticTagger class and a simple implementation of the TF-IDF algorithm.
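
To illustrate the building blocks, here's a minimal sketch (using the iOS 11 APIs) of NSLinguisticTagger detecting the dominant language and enumerating lemmas — the raw material a TF-IDF implementation would then score. The sample text is just a placeholder.

```swift
import Foundation

let text = "Natural Language Processing tries to analyze and understand the meaning of human language."

// Create a tagger for the schemes we care about: lemmas and language detection.
let tagger = NSLinguisticTagger(tagSchemes: [.lemma, .language], options: 0)
tagger.string = text

// Dominant language of the whole text (e.g. "en").
print("Language: \(tagger.dominantLanguage ?? "unknown")")

// Enumerate word tokens and print their lemmas, skipping punctuation and whitespace.
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
let range = NSRange(location: 0, length: text.utf16.count)
tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) { tag, tokenRange, _ in
    if let lemma = tag?.rawValue {
        print(lemma)
    }
}
```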

Continue reading “Natural Language Processing in iOS”

Creating lists with SiriKit on iOS11

Last year, Apple announced SiriKit, which gives developers a lot of opportunities to expose their apps' functionality through Siri. However, there are restrictions on the domains that can utilize this feature, and if your app isn't part of those domains, you can't make much use of it. This year, Apple announced more domains, which means new opportunities for developers. There are updates in the Payments domain, meaning users can now send/transfer money between their accounts, pay bills and search through their accounts. A new domain called Visual Codes has also been introduced, which gives your app a chance to show a meaningful visual code in the Siri context. This can be very handy when your app stores digital tickets (for transport, cinemas, sport events, etc.): when the user is near a place that validates the ticket, they can just say 'Hey Siri, show my MovieTicketsApp ticket' and Siri will ask your app to provide it. Another new domain is Lists and Notes, which can be used for adding/removing items in a to-do list, or adding notes. It's a really handy domain, which we will explore in more detail in this post. We will create an app that adds and removes items in a grocery list.
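
As a taste of what's involved, here's a minimal sketch of handling the add-tasks intent from the Lists and Notes domain in an Intents app extension. GroceryListHandler is a hypothetical name, and the resolve/confirm steps are omitted for brevity.

```swift
import Intents

// Handles "Add milk to my grocery list in <AppName>" style requests.
class GroceryListHandler: NSObject, INAddTasksIntentHandling {

    func handle(intent: INAddTasksIntent,
                completion: @escaping (INAddTasksIntentResponse) -> Void) {
        // The spoken item names arrive as INSpeakableString values.
        let titles = intent.taskTitles ?? []

        // Turn each title into an INTask that Siri can show back to the user.
        let tasks = titles.map { title in
            INTask(title: title,
                   status: .notCompleted,
                   taskType: .completable,
                   spatialEventTrigger: nil,
                   temporalEventTrigger: nil,
                   createdDateComponents: nil,
                   modifiedDateComponents: nil,
                   identifier: nil)
        }

        // Here the app would persist the items to its grocery list store.
        let response = INAddTasksIntentResponse(code: .success, userActivity: nil)
        response.addedTasks = tasks
        completion(response)
    }
}
```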

Continue reading “Creating lists with SiriKit on iOS11”

Everything Apple announced at WWDC 2017

It's that time of the year: Apple's annual Worldwide Developers Conference (WWDC), where new technologies and updates to existing ones across Apple's platforms are announced to the public. It's the time when iOS developers start testing their apps to see which of the introduced changes might have broken their products. It's also the time innovators eagerly anticipate, ready to explore the new technologies and how they might be utilized in their existing apps or in completely new ideas. It's an exciting time with lots of expectations and wishes, and Apple always delivers (and sometimes even surprises us). So let's see what's new in this edition.

Continue reading “Everything Apple announced at WWDC 2017”

Attending CodeMobile UK

From the 18th to the 20th of April, I had the chance to attend the CodeMobile conference in Chester, UK. This was the first edition of the conference, and in this post I will share my impressions of what happened over those three days.

The organizers had the idea of holding a conference in Chester, since there are not many developer conferences outside of London. Chester is a lovely town in north-west England, around 40 miles from Manchester. Getting there was pretty easy: we took a plane to Manchester and then the train to Chester, which was about an hour's ride. The town itself has interesting architecture, with bits of Roman influence.

Continue reading “Attending CodeMobile UK”

Exploring Conversational Interfaces

People and computers speak different languages: the former use words and sentences, while the latter are more into ones and zeros. As we know, this gap in communication is bridged by a mediator, which knows how to translate all the information flowing between the two parties. These mediators are called graphical user interfaces (GUIs).

Finding an appropriate GUI can be quite a challenge, and it's basically the key factor in determining whether your software will be used. If users don't understand the interactions they need to perform in order to get the most out of it, they won't use it. That's why GUIs must be intuitive and easy to learn.

Continue reading “Exploring Conversational Interfaces”