A pretty useful application of the YOLO algorithm is to localize and classify garbage in pictures.
Here we used YOLO to do exactly that, and built a full API (with labeling / review tools) and an app (Android / iOS).
The default one is Thistlethwaite's 4-phase algorithm. But instead of using huge, pre-computed pruning tables, we implemented 4 heuristics matching the 4 standard algorithm groups, which provide good enough estimates for an IDA* (iterative-deepening depth-first search), performing quite well on randomly scrambled cubes. This implementation finds short solutions (20-50 moves), but may take from a few seconds to a few minutes to return.
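As a sketch of the search itself, here is a generic IDA* skeleton. The cube states, move generator and the four phase heuristics are abstracted away behind plain callables; the toy demo at the bottom just walks a number line, standing in for cube moves:

```python
import math

def ida_star(start, is_goal, neighbors, h):
    """Iterative-deepening A*: repeated depth-first searches bounded by f = g + h.
    With an admissible heuristic h, the first solution found is optimal."""
    bound = h(start)
    path = [start]

    def search(g):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return f                      # bound exceeded: report the overshoot
        if is_goal(node):
            return True
        minimum = math.inf
        for nxt, cost in neighbors(node):
            if nxt in path:               # avoid trivial cycles
                continue
            path.append(nxt)
            result = search(g + cost)
            if result is True:
                return True
            minimum = min(minimum, result)
            path.pop()
        return minimum

    while True:
        result = search(0)
        if result is True:
            return path                   # sequence of states, start to goal
        if result == math.inf:
            return None                   # no solution reachable
        bound = result                    # deepen to the smallest f that overran

# Toy demo: reach 7 from 0 with moves +1 and +3 (a stand-in for cube moves).
moves = lambda n: [(n + d, 1) for d in (1, 3) if n + d <= 7]
estimate = lambda n: max(0, (7 - n + 2) // 3)  # admissible: at least ceil(dist/3)
solution = ida_star(0, lambda n: n == 7, moves, estimate)
```

The appeal of IDA* here is exactly what the post describes: it needs almost no memory (just the current path), so a few cheap per-phase heuristics can replace the giant pruning tables.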
The second one is a "Human" algorithm, based on the "CFOP" method. It applies a set of known algorithms to recognized configurations as it progresses towards the solution.
For this purpose, I used Keras and re-implemented the models with it. It was challenging, especially for YOLO (localisation), which needs a custom loss function because it predicts bounding boxes instead of a single label…
Classification is the simplest task and uses only simple convolutional layers followed by some Dense (fully connected) layers. The number of layers depends on your problem's complexity… You can also output more than one label per image (imagine a self-driving car that decides to turn left and to speed up from the same picture… like I did before).
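A minimal sketch of such a classifier in Keras, under assumed sizes (64 x 64 RGB input, 10 classes — not the original model's dimensions); for the multi-label case, swap the final softmax for per-class sigmoids:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical problem: 64x64 RGB images, 10 mutually exclusive classes.
model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu"),   # convolutional feature extractors
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),      # fully connected head
    layers.Dense(10, activation="softmax"),    # "sigmoid" instead for multi-label
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```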
Segmentation is as simple as the previous task, but this time we classify the pixels themselves (or small groups of pixels)… So the network should output W x H x C elements, with W the image's width, H the image's height and C the number of classes you want to distinguish. I used a U-Net Keras implementation for this purpose:
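A full U-Net adds an encoder-decoder with skip connections; but even a minimal fully-convolutional sketch shows the W x H x C output described above (the image size and class count here are made-up placeholders):

```python
from tensorflow import keras
from tensorflow.keras import layers

H, W, C = 128, 128, 5  # hypothetical image size and number of classes

inputs = keras.Input(shape=(H, W, 3))
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
# 1x1 convolution + softmax: a class distribution for every pixel.
outputs = layers.Conv2D(C, 1, activation="softmax")(x)
model = keras.Model(inputs, outputs)
```

The key point is the last layer: `padding="same"` keeps the spatial dimensions, so the network maps an H x W x 3 image to an H x W x C grid of per-pixel class probabilities.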
Last but not least, maybe the most challenging and interesting one in my opinion… YOLO!
You Only Look Once is an algorithm which splits the image into 7 x 7 cells and, for each cell, predicts the probability of an object having its center there, the width and height of its bounding box, along with a class label and a confidence score… So the output of the model should be S x S x (5 + C) if we detect only one object per cell… (the "5" represents the 4 coordinates (x, y, w, h) of a bounding box plus the model's confidence in its prediction).
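To make that output shape concrete, here is a sketch in plain NumPy (with made-up S and C) of how ground-truth boxes can be encoded into an S x S x (5 + C) target tensor — one box per cell, coordinates expressed relative to the owning cell:

```python
import numpy as np

S, C = 7, 3  # grid size and number of classes (hypothetical values)

def encode_targets(boxes):
    """Encode (x, y, w, h, class_id) boxes, coords normalised to [0, 1],
    into an S x S x (5 + C) YOLO-style target tensor."""
    target = np.zeros((S, S, 5 + C), dtype=np.float32)
    for x, y, w, h, cls in boxes:
        col, row = int(x * S), int(y * S)                         # cell owning the centre
        target[row, col, 0:4] = [x * S - col, y * S - row, w, h]  # centre offsets + size
        target[row, col, 4] = 1.0                                 # objectness / confidence
        target[row, col, 5 + cls] = 1.0                           # one-hot class label
    return target
```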
Example on a 3 x 3 grid:
All you have to do is shape your data accordingly and you're good to go!
Then you need to compute the loss by hand, because Keras can't do it for you… Basically, it sums the squared differences between the predicted x, y, w, h values, classes and confidence and the ground truth. It also uses "Intersection over Union" (IoU) to keep only one box per object…
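A simplified version of that loss, in plain NumPy and assuming one predicted box per cell (the original YOLO formulation additionally takes the square roots of w and h, and uses IoU to pick the responsible box when several are predicted per cell), could look like:

```python
import numpy as np

def yolo_loss(y_true, y_pred, lambda_coord=5.0, lambda_noobj=0.5):
    """Simplified sum-squared-error YOLO loss for one box per cell.
    Tensors have shape (S, S, 5 + C): [x, y, w, h, conf, classes...]."""
    obj = y_true[..., 4]              # 1.0 where a cell owns an object
    noobj = 1.0 - obj
    # Coordinate error, only where an object exists (heavily weighted).
    coord = np.sum(obj[..., None] * (y_true[..., :4] - y_pred[..., :4]) ** 2)
    # Confidence error, split between object and no-object cells.
    conf_obj = np.sum(obj * (y_true[..., 4] - y_pred[..., 4]) ** 2)
    conf_noobj = np.sum(noobj * (y_true[..., 4] - y_pred[..., 4]) ** 2)
    # Classification error, only where an object exists.
    cls = np.sum(obj[..., None] * (y_true[..., 5:] - y_pred[..., 5:]) ** 2)
    return lambda_coord * coord + conf_obj + lambda_noobj * conf_noobj + cls
```

The λ weights rebalance the terms: most cells contain no object, so without down-weighting them the "empty" confidence errors would drown out the box-coordinate signal.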
But they are very good at learning patterns and reproducing them, so let's try to build an "AI" (or whatever you want to call it…) that can produce original pieces of music!
And like always, everything starts with data…
While computers are not that good at dealing with raw audio, the MIDI format fits the job perfectly (for non-musicians: MIDI files are like text files where the notes, durations, velocities, etc. are written as successive numbers). So we take a bunch of solo piano MIDI files, concatenate them, then split them into chunks of N notes…
We feed those successive chunks of notes to an artificial neural network, which has to predict the next single note, knowing the N previous notes.
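That windowing step can be sketched as follows (a hypothetical pre-processing helper — here each note is reduced to its pitch number, and `make_training_pairs` is my name, not the project's):

```python
import numpy as np

def make_training_pairs(notes, n=8):
    """Slide a window of n notes over the sequence; each window is paired
    with the single note that follows it, as the prediction target."""
    X, y = [], []
    for i in range(len(notes) - n):
        X.append(notes[i:i + n])   # the n past notes
        y.append(notes[i + n])     # the next note to predict
    return np.array(X), np.array(y)
```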
In order to do so, we pre-process the MIDI files so the neural net can read them, and, with the help of several LSTM (Long Short-Term Memory) layers, after approximately 10 epochs of training (one epoch being a pass over the whole dataset), we achieved the results you can hear here:
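A sketch of such a stacked-LSTM predictor (the layer sizes and window length are my assumptions, not the original model's): each window of N past pitches yields a probability distribution over the next MIDI pitch.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_STEPS, N_PITCHES = 8, 128  # window length and MIDI pitch range (assumed)

model = keras.Sequential([
    layers.Input(shape=(N_STEPS, 1)),          # one scalar pitch per timestep
    layers.LSTM(256, return_sequences=True),   # stacked LSTMs keep the sequence
    layers.LSTM(256),                          # last LSTM summarises the window
    layers.Dense(N_PITCHES, activation="softmax"),  # next-note distribution
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Generation then works autoregressively: sample a note from the output distribution, append it to the window, and predict again — which is exactly how the model "improvises" beyond its training data.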
These 3 piano pieces were entirely improvised by the neural network; no post-processing was applied, except that we used a piano patch to make them sound better…
Not that bad for a machine!
But one of these pieces turned out to be a perfect copy of an original (a beautiful example of "overfitting"). Can you guess which one?